Science.gov

Sample records for gene set statistics

  1. Self-Contained Statistical Analysis of Gene Sets

    PubMed Central

    Cannon, Judy L.; Ricoy, Ulises M.; Johnson, Christopher

    2016-01-01

    Microarrays are a powerful tool for studying differential gene expression. However, lists of many differentially expressed genes are often generated, and unraveling meaningful biological processes from the lists can be challenging. For this reason, investigators have sought to quantify the statistical probability of compiled gene sets rather than individual genes. The gene sets typically are organized around a biological theme or pathway. We compute correlations between different gene set tests and elect to use Fisher’s self-contained method for gene set analysis. We improve Fisher’s differential expression analysis of a gene set by limiting the p-value of an individual gene within the gene set to prevent a small percentage of genes from determining the statistical significance of the entire set. In addition, we also compute dependencies among genes within the set to determine which genes are statistically linked. The method is applied to T-ALL (T-lineage Acute Lymphoblastic Leukemia) to identify differentially expressed gene sets between T-ALL and normal patients and T-ALL and AML (Acute Myeloid Leukemia) patients. PMID:27711232
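The idea of limiting individual p-values before combining them with Fisher's method can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names and the `floor` parameter are hypothetical, genes are treated as independent, and the chi-square reference distribution is kept even after flooring (which makes the combined p-value conservative, because floored p-values can only shrink the statistic).

```python
import math

def chi2_sf_even_df(x, k):
    """Upper tail of a chi-square variable with 2*k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total

def fisher_combined(pvalues, floor=0.0):
    """Fisher's combined p-value, with each p-value floored at `floor`.

    `floor` is a hypothetical parameter: it stops a few extreme genes
    from dominating the set-level statistic. All p-values must be > 0
    after flooring.
    """
    clipped = [max(p, floor) for p in pvalues]
    stat = -2.0 * sum(math.log(p) for p in clipped)
    return chi2_sf_even_df(stat, len(clipped))

# One extreme gene among otherwise unremarkable ones.
gene_pvalues = [1e-10, 0.5, 0.5]
unfloored = fisher_combined(gene_pvalues)
floored = fisher_combined(gene_pvalues, floor=0.01)
```

With the floor in place, the single extreme gene no longer makes the whole set look significant on its own, so `floored` is larger than `unfloored`.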

  2. GeneSetDB: A comprehensive meta-database, statistical and visualisation framework for gene set analysis

    PubMed Central

    Araki, Hiromitsu; Knapp, Christoph; Tsai, Peter; Print, Cristin

    2012-01-01

    Most “omics” experiments require comprehensive interpretation of the biological meaning of gene lists. To address this requirement, a number of gene set analysis (GSA) tools have been developed. Although the biological value of GSA is strictly limited by the breadth of the gene sets used, very few methods exist for simultaneously analysing multiple publicly available gene set databases. Therefore, we constructed GeneSetDB (http://genesetdb.auckland.ac.nz/haeremai.html), a comprehensive meta-database, which integrates 26 public databases containing diverse biological information with a particular focus on human disease and pharmacology. GeneSetDB enables users to search for gene sets containing a gene identifier or keyword, generate their own gene sets, or statistically test for enrichment of an uploaded gene list across all gene sets, and visualise gene set enrichment and overlap using a clustered heat map. PMID:23650583

  3. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2015-03-02

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.

  4. FLAGS: A Flexible and Adaptive Association Test for Gene Sets Using Summary Statistics

    PubMed Central

    Huang, Jianfei; Wang, Kai; Wei, Peng; Liu, Xiangtao; Liu, Xiaoming; Tan, Kai; Boerwinkle, Eric; Potash, James B.; Han, Shizhong

    2016-01-01

    Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Despite remarkable success in uncovering many risk variants and providing novel insights into disease biology, genetic variants identified to date fail to explain the vast majority of the heritability for most complex diseases. One explanation is that there are still a large number of common variants that remain to be discovered, but their effect sizes are generally too small to be detected individually. Accordingly, gene set analysis of GWAS, which examines a group of functionally related genes, has been proposed as a complementary approach to single-marker analysis. Here, we propose a flexible and adaptive test for gene sets (FLAGS), using summary statistics. Extensive simulations showed that this method has an appropriate type I error rate and outperforms existing methods with increased power. As a proof of principle, through real data analyses of Crohn’s disease GWAS data and bipolar disorder GWAS meta-analysis results, we demonstrated the superior performance of FLAGS over several state-of-the-art association tests for gene sets. Our method allows for the more powerful application of gene set analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available. PMID:26773050

  5. A statistical approach towards the derivation of predictive gene sets for potency ranking of chemicals in the mouse embryonic stem cell test.

    PubMed

    Schulpen, Sjors H W; Pennings, Jeroen L A; Tonk, Elisa C M; Piersma, Aldert H

    2014-03-21

    The embryonic stem cell test (EST) is applied as a model system for detection of embryotoxicants. The application of transcriptomics allows a more detailed effect assessment compared to the morphological endpoint. Genes involved in cell differentiation, modulated by chemical exposures, may be useful as biomarkers of developmental toxicity. We describe a statistical approach to obtain a predictive gene set for toxicity potency ranking of compounds within one class. This resulted in a gene set based on differential gene expression across concentration-response series of phthalatic monoesters. We determined the concentration at which gene expression was changed at least 1.5-fold. Genes responding with the same potency ranking in vitro and in vivo embryotoxicity were selected. A leave-one-out cross-validation showed that the relative potency of each phthalate was always predicted correctly. The classical morphological 50% effect level (ID50) in EST was similar to the predicted concentration using gene set expression responses. A general down-regulation of development-related genes and up-regulation of cell-cycle related genes was observed, reminiscent of the differentiation inhibition in EST. This study illustrates the feasibility of applying dedicated gene set selections as biomarkers for developmental toxicity potency ranking on the basis of in vitro testing in the EST.

  6. Seeing sets: representation by statistical properties.

    PubMed

    Ariely, D

    2001-03-01

    Sets of similar objects are common occurrences--a crowd of people, a bunch of bananas, a copse of trees, a shelf of books, a line of cars. Each item in the set may be distinct, highly visible, and discriminable. But when we look away from the set, what information do we have? The current article starts to address this question by introducing the idea of a set representation. This idea was tested using two new paradigms: mean discrimination and member identification. Three experiments using sets of different-sized spots showed that observers know a set's mean quite accurately but know little about the individual items, except their range. Taken together, these results suggest that the visual system represents the overall statistical, and not individual, properties of sets.

  7. Probabilities for separating sets of order statistics.

    PubMed

    Glueck, D H; Karimpour-Fard, A; Mandel, J; Muller, K E

    2010-04-01

    Consider a set of order statistics that arise from sorting samples from two different populations, each with their own, possibly different distribution functions. The probability that these order statistics fall in disjoint, ordered intervals and that of the smallest statistics, a certain number come from the first population is given in terms of the two distribution functions. The result is applied to computing the joint probability of the number of rejections and the number of false rejections for the Benjamini-Hochberg false discovery rate procedure.
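For readers unfamiliar with the Benjamini-Hochberg procedure named above, a minimal sketch of the standard step-up rule follows. It shows only the rejection rule itself, not the joint probability derived in the paper; the function name and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices of hypotheses rejected by the BH step-up rule.

    Sort the p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])

# Illustrative p-values (made up for this sketch).
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(example, q=0.05)
```

The number of rejections is `len(rejected)`; how many of those are false rejections depends on which hypotheses are truly null, which is the quantity whose joint distribution the paper addresses.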

  8. Statistical mechanics of maximal independent sets

    NASA Astrophysics Data System (ADS)

    Dall'Asta, Luca; Pin, Paolo; Ramezanpour, Abolfazl

    2009-12-01

    The graph theoretic concept of maximal independent set arises in several practical problems in computer science as well as in game theory. A maximal independent set is defined by the set of occupied nodes that satisfy some packing and covering constraints. It is known that finding minimum and maximum-density maximal independent sets are hard optimization problems. In this paper, we use cavity method of statistical physics and Monte Carlo simulations to study the corresponding constraint satisfaction problem on random graphs. We obtain the entropy of maximal independent sets within the replica symmetric and one-step replica symmetry breaking frameworks, shedding light on the metric structure of the landscape of solutions and suggesting a class of possible algorithms. This is of particular relevance for the application to the study of strategic interactions in social and economic networks, where maximal independent sets correspond to pure Nash equilibria of a graphical game of public goods allocation.

  9. On asymptotically generalized statistical equivalent set sequences

    NASA Astrophysics Data System (ADS)

    Savas, Ekrem

    2013-10-01

    In this paper we shall study the asymptotically λ-statistical equivalent (Wijsman sense) of multiple L. In addition to these definitions, natural inclusion theorems shall also be presented. This approach has not been considered in any context before.

  10. On asymptotically lacunary invariant statistical equivalent set sequences

    NASA Astrophysics Data System (ADS)

    Pancaroglu, Nimet; Nuray, Fatih; Savas, Ekrem

    2013-10-01

    In this paper, we define asymptotically invariant equivalence, strongly asymptotically invariant equivalence, asymptotically invariant statistical equivalence, asymptotically lacunary invariant statistical equivalence, strongly asymptotically lacunary invariant equivalence, asymptotically lacunary invariant equivalence (Wijsman sense) for sequences of sets. Also we investigate some relations between asymptotically lacunary invariant statistical equivalence and asymptotically invariant statistical equivalence for sequences of sets. We introduce some notions and theorems as follows, asymptotically lacunary invariant statistical equivalence, strongly asymptotically lacunary invariant equivalence, asymptotically lacunary invariant equivalence (Wijsman sense) for sequences of sets.

  11. An Independent Filter for Gene Set Testing Based on Spectral Enrichment.

    PubMed

    Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H

    2015-01-01

    Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.

  12. Gene set analyses for interpreting microarray experiments on prokaryotic organisms.

    SciTech Connect

    Tintle, Nathan; Best, Aaron; Dejongh, Matthew; VanBruggen, Dirk; Heffron, Fred; Porwollik, Steffen; Taylor, Ronald C.

    2008-11-05

    Background: Recent advances in microarray technology have brought with them the need for enhanced methods of biologically interpreting gene expression data. Recently, methods like Gene Set Enrichment Analysis (GSEA) and variants of Fisher’s exact test have been proposed which utilize a priori biological information. Typically, these methods are demonstrated with a priori biological information from the Gene Ontology. Results: Alternative gene set definitions are presented based on gene sets inferred from the SEED: open-source software environment for comparative genome annotation and analysis of microbial organisms. Many of these gene sets are then shown to provide consistent expression across a series of experiments involving Salmonella Typhimurium. Implementation of the gene sets in an analysis of microarray data is then presented for the Salmonella Typhimurium data. Conclusions: SEED inferred gene sets can be naturally defined based on subsystems in the SEED. The consistent expression values of these SEED inferred gene sets suggest their utility for statistical analyses of gene expression data based on a priori biological information.

  13. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGES

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

    2014-01-01

    Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure is to identify gene sets that can accurately predict samples from different experimental conditions or are associated with the continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.

  14. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for

  15. Sets, Probability and Statistics: The Mathematics of Life Insurance.

    ERIC Educational Resources Information Center

    Clifford, Paul C.; And Others

    The practical use of such concepts as sets, probability and statistics are considered by many to be vital and necessary to our everyday life. This student manual is intended to familiarize students with these concepts and to provide practice using real life examples. It also attempts to illustrate how the insurance industry uses such mathematic…

  16. Online Updating of Statistical Inference in the Big Data Setting.

    PubMed

    Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui

    2016-01-01

    We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
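The general idea of updating a linear model as data arrive, without keeping the historical records, can be illustrated with the textbook approach of accumulating sufficient statistics. The sketch below is an assumption-laden simplification (one predictor plus an intercept, a hypothetical class name, no handling of rank deficiency) and is not the estimator or the residual tests proposed in the paper.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = b0 + b1 * x from streamed chunks.

    Only five running sums are stored, so earlier chunks can be
    discarded as soon as they have been processed.
    """

    def __init__(self):
        self.n = 0
        self.sx = 0.0
        self.sy = 0.0
        self.sxx = 0.0
        self.sxy = 0.0

    def update(self, xs, ys):
        """Fold one chunk of (x, y) observations into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (intercept, slope) from the accumulated sums."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("x has no variation yet; slope is undefined")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope

# Two chunks drawn from the exact line y = 1 + 2x.
model = OnlineSimpleRegression()
model.update([0, 1, 2], [1, 3, 5])
model.update([3, 4, 5], [7, 9, 11])
intercept, slope = model.coefficients()
```

The estimate after the second chunk matches what a single fit on all six points would give, which is the property that makes this kind of updating storage-efficient.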

  17. Comparing Data Sets: Implicit Summaries of the Statistical Properties of Number Sets

    ERIC Educational Resources Information Center

    Morris, Bradley J.; Masnick, Amy M.

    2015-01-01

    Comparing datasets, that is, sets of numbers in context, is a critical skill in higher order cognition. Although much is known about how people compare single numbers, little is known about how number sets are represented and compared. We investigated how subjects compared datasets that varied in their statistical properties, including ratio of…

  18. Gene set analyses for interpreting microarray experiments on prokaryotic organisms

    PubMed Central

    Tintle, Nathan L; Best, Aaron A; DeJongh, Matthew; Van Bruggen, Dirk; Heffron, Fred; Porwollik, Steffen; Taylor, Ronald C

    2008-01-01

    Background Despite the widespread usage of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed which use biologically defined sets of genes in interpretation, instead of examining results gene-by-gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, as is typically the case for prokaryotes. Results We extend five methods of gene set analysis from use on experiments with multiple replicates, for use on experiments with few replicates. We then use simulated and real data to compare these methods with each other and with the Fisher's exact test (FET) method. As a result of the simulation we find that a method named MAXMEAN-NR, maintains the nominal rate of false positive findings (type I error rate) while offering good statistical power and robustness to a variety of gene set distributions for set sizes of at least 10. Other methods (ABSSUM-NR or SUM-NR) are shown to be powerful for set sizes less than 10. Analysis of three sets of experimental data shows similar results. Furthermore, the MAXMEAN-NR method is shown to be able to detect biologically relevant sets as significant, when other methods (including FET) cannot. We also find that the popular GSEA-NR method performs poorly when compared to MAXMEAN-NR. Conclusion MAXMEAN-NR is a method of gene set analysis for experiments with few replicates, as is common for prokaryotes. Results of simulation and real data analysis suggest that the MAXMEAN-NR method offers increased robustness and biological relevance of findings as compared to FET and other methods, while maintaining the nominal type I error rate. PMID:18986519
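The Fisher's exact test (FET) baseline mentioned above is commonly computed as a one-sided hypergeometric tail probability for the overlap between a gene set and a list of selected genes. The sketch below shows that standard calculation only; the function and argument names are hypothetical, and it does not implement MAXMEAN-NR or the other methods compared in the paper.

```python
from math import comb

def fisher_exact_enrichment(total, in_set, selected, overlap):
    """One-sided p-value for over-representation of a gene set.

    total    -- number of genes measured
    in_set   -- number of those genes that belong to the gene set
    selected -- number of genes called differentially expressed
    overlap  -- number of selected genes that belong to the gene set

    Returns P(X >= overlap) for X hypergeometric, i.e. the chance of
    seeing at least this much overlap if genes were selected at random.
    """
    denom = comb(total, selected)
    upper = min(in_set, selected)
    tail = sum(
        comb(in_set, k) * comb(total - in_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom

# 20 genes, a set of 5, 5 genes selected, all 5 in the set.
p_full_overlap = fisher_exact_enrichment(20, 5, 5, 5)
```

Because the test needs only a list of selected genes, it can be applied when there are too few replicates for sample-permutation methods, which is the situation the abstract describes for prokaryotic experiments.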

  19. WebGestalt: an integrated system for exploring gene sets in various biological contexts.

    PubMed

    Zhang, Bing; Kirov, Stefan; Snoddy, Jay

    2005-07-01

    High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted our research focus from 'single genes' to 'gene sets'. We have developed a web-based integrated data mining system, WebGestalt (http://genereg.ornl.gov/webgestalt/), to help biologists in exploring large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, as well as performs Boolean operations to generate the unions, intersections or differences between different gene sets. The information retrieval module currently retrieves information for up to 20 attributes for all genes in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression pattern, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. In order to demonstrate the use of WebGestalt, we have generated 48 gene sets with genes over-represented in various human tissue types. Exploration of all the 48 gene sets using WebGestalt is available for the public at http://genereg.ornl.gov/webgestalt/wg_enrich.php.

  20. Analysis of gene set using shrinkage covariance matrix approach

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Aripin, Rasimah

    2013-09-01

    Microarray methodology has been exploited for different applications such as gene discovery and disease diagnosis. This technology is also used for quantitative and highly parallel measurements of gene expression. Recently, microarrays have been one of the main interests of statisticians because they provide a perfect example of the paradigms of modern statistics. In this study, an alternative approach to estimating the covariance matrix has been proposed to solve the high dimensionality problem in microarrays. An extension of the traditional Hotelling's T2 statistic is constructed for determining the significant gene sets across experimental conditions using a shrinkage approach. Real data sets were used as illustrations to compare the performance of the proposed methods with other methods. The results across the methods are consistent, implying that this approach provides an alternative to existing techniques.

  1. The limitations of simple gene set enrichment analysis assuming gene independence.

    PubMed

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. 2009 as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis's on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored due to the significant variance inflation they produced on the enrichment scores and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods.
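Why gene-gene correlation inflates variance can be seen from a small calculation on the mean of m unit-variance gene scores: its variance is the sum of all entries of the correlation matrix divided by m squared, which equals (1/m)(1 + (m - 1) * mean pairwise correlation). The sketch below checks this on a toy matrix; it is a generic illustration with hypothetical function names, not the enrichment score or benchmark used in the paper.

```python
def mean_score_variance(corr):
    """Variance of the average of unit-variance scores with correlation matrix `corr`."""
    m = len(corr)
    return sum(sum(row) for row in corr) / (m * m)

def variance_inflation(m, mean_rho):
    """Inflation relative to independent genes: 1 + (m - 1) * mean pairwise correlation."""
    return 1.0 + (m - 1) * mean_rho

# Toy gene set: 4 genes with pairwise correlation 0.5 (made-up values).
m, rho = 4, 0.5
corr = [[1.0 if i == j else rho for j in range(m)] for i in range(m)]

independent_variance = 1.0 / m
observed_ratio = mean_score_variance(corr) / independent_variance
```

Here `observed_ratio` equals `variance_inflation(4, 0.5)`, i.e. 2.5. For a set of 50 genes, even a mean correlation of 0.1 gives a factor of about 5.9, so a test that assumes independence would understate the null variance considerably.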

  2. WebGestalt: an integrated system for exploring gene sets in various biological contexts

    PubMed Central

    Zhang, Bing; Kirov, Stefan; Snoddy, Jay

    2005-01-01

    High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted our research focus from ‘single genes’ to ‘gene sets’. We have developed a web-based integrated data mining system, WebGestalt (), to help biologists in exploring large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, as well as performs Boolean operations to generate the unions, intersections or differences between different gene sets. The information retrieval module currently retrieves information for up to 20 attributes for all genes in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression pattern, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. In order to demonstrate the use of WebGestalt, we have generated 48 gene sets with genes over-represented in various human tissue types. Exploration of all the 48 gene sets using WebGestalt is available for the public at . PMID:15980575

  3. What's statistical about learning? Insights from modelling statistical learning as a set of memory processes.

    PubMed

    Thiessen, Erik D

    2017-01-05

    Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926-1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745-2750; Thiessen & Yee 2010 Child Development 81, 1287-1303; Saffran 2002 Journal of Memory and Language 47, 172-196; Misyak & Christiansen 2012 Language Learning 62, 302-331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246-263; Thiessen et al. 2013 Psychological Bulletin 139, 792-814). From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik

  4. The Gene Set Builder: collation, curation, and distribution of sets of genes

    PubMed Central

    Yusuf, Dimas; Lim, Jonathan S; Wasserman, Wyeth W

    2005-01-01

    Background In bioinformatics and genomics, there are many applications designed to investigate the common properties for a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another; or to recreate a published set. In this paper, we present a simple online tool which – with the help of the gene catalogs Ensembl and GeneLynx – can help researchers build and annotate sets of genes quickly and easily. Description The Gene Set Builder is a database-driven, web-based tool designed to help researchers compile, store, export, and share sets of genes. This application supports the 17 eukaryotic genomes found in version 32 of the Ensembl database, which includes species from yeast to human. User-created information such as sets and customized annotations are stored to facilitate easy access. Gene sets stored in the system can be "exported" in a variety of output formats – as lists of identifiers, in tables, or as sequences. In addition, gene sets can be "shared" with specific users to facilitate collaborations or fully released to provide access to published results. The application also features a Perl API (Application Programming Interface) for direct connectivity to custom analysis tools. A downloadable Quick Reference guide and an online tutorial are available to help new users learn its functionalities. Conclusion The Gene Set Builder is an Ensembl-facilitated online tool designed to help researchers compile and manage sets of

  5. A test statistic for the affected-sib-set method.

    PubMed

    Lange, K

    1986-07-01

    This paper discusses generalizations of the affected-sib-pair method. First, the requirement that sib identity-by-descent relations be known unambiguously is relaxed by substituting sib identity-by-state relations. This permits affected sibs to be used even when their parents are unavailable for typing. In the limit of an infinite number of marker alleles each of infinitesimal population frequency, the identity-by-state relations coincide with the usual identity-by-descent relations. Second, a weighted pairs test statistic is proposed that covers affected sib sets of size greater than two. These generalizations make the affected-sib-pair method a more powerful technique for detecting departures from independent segregation of disease and marker phenotypes. A sample calculation suggests such a departure for tuberculoid leprosy and the HLA D locus.

  6. STATISTICS OF DARK MATTER HALOS FROM THE EXCURSION SET APPROACH

    SciTech Connect

    Lapi, A.; Salucci, P.; Danese, L.

    2013-08-01

    We exploit the excursion set approach in integral formulation to derive novel, accurate analytic approximations of the unconditional and conditional first crossing distributions for random walks with uncorrelated steps and general shapes of the moving barrier; we find the corresponding approximations of the unconditional and conditional halo mass functions for cold dark matter (DM) power spectra to represent very well the outcomes of state-of-the-art cosmological N-body simulations. In addition, we apply these results to derive, and confront with simulations, other quantities of interest in halo statistics, including the rates of halo formation and creation, the average halo growth history, and the halo bias. Finally, we discuss how our approach and main results change when considering random walks with correlated instead of uncorrelated steps, and warm instead of cold DM power spectra.

  7. Using the Gene Ontology to Scan Multi-Level Gene Sets for Associations in Genome Wide Association Studies

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; Jenkins, Gregory D.; McDonnell, Shannon K.; Ingle, James N.; Kubo, Michiaki; Goss, Paul E.; Costantino, Joseph P.; Wickerham, D. Lawrence; Weinshilboum, Richard M.

    2011-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc “fixes”. To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted p-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. PMID:22161999

  8. Gene Sets Net Correlations Analysis (GSNCA): a multivariate differential coexpression test for gene sets

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank; Glazko, Galina

    2014-01-01

    Motivation: To date, gene set analysis approaches primarily focus on identifying differentially expressed gene sets (pathways). Methods for identifying differentially coexpressed pathways also exist but are mostly based on aggregated pairwise correlations or other pairwise measures of coexpression. Instead, we propose Gene Sets Net Correlations Analysis (GSNCA), a multivariate differential coexpression test that accounts for the complete correlation structure between genes. Results: In GSNCA, weight factors are assigned to genes in proportion to the genes’ cross-correlations (intergene correlations). The problem of finding the weight vectors is formulated as an eigenvector problem with a unique solution. GSNCA tests the null hypothesis that for a gene set there is no difference in the weight vectors of the genes between two conditions. In simulation studies and the analyses of experimental data, we demonstrate that GSNCA captures changes in the structure of genes’ cross-correlations rather than differences in the averaged pairwise correlations. Thus, GSNCA infers differences in coexpression networks, however, bypassing method-dependent steps of network inference. As an additional result from GSNCA, we define hub genes as genes with the largest weights and show that these genes correspond frequently to major and specific pathway regulators, as well as to genes that are most affected by the biological difference between two conditions. In summary, GSNCA is a new approach for the analysis of differentially coexpressed pathways that also evaluates the importance of the genes in the pathways, thus providing unique information that may result in the generation of novel biological hypotheses. Availability and implementation: Implementation of the GSNCA test in R is available upon request from the authors. Contact: YRahmatallah@uams.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24292935
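
    The eigenvector formulation described above can be illustrated with a minimal sketch. It is not the authors' R implementation: it assumes the weights are the leading eigenvector of the matrix of absolute intergene correlations with a zero diagonal, scales them to sum to 1 (a choice made here), and compares two conditions by the L1 distance between their weight vectors. The function names are hypothetical, and the significance assessment is not shown.

```python
def gene_weights(abs_corr, iterations=500, tol=1e-12):
    """Leading eigenvector of a symmetric, non-negative matrix, scaled to sum to 1.

    abs_corr[i][j] holds |r_ij|; the diagonal is ignored (treated as zero).
    """
    n = len(abs_corr)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply by (A + I): the shift keeps the eigenvectors of A but
        # prevents the power iteration from oscillating.
        nxt = [w[i] + sum(abs_corr[i][j] * w[j] for j in range(n) if j != i)
               for i in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w


def weight_distance(abs_corr_a, abs_corr_b):
    """L1 distance between the weight vectors of two conditions."""
    wa = gene_weights(abs_corr_a)
    wb = gene_weights(abs_corr_b)
    return sum(abs(a - b) for a, b in zip(wa, wb))


# Toy example: in condition B the first gene is strongly correlated with the
# other two, so it receives the largest weight (a "hub" in the abstract's terms).
condition_a = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
condition_b = [[0, 0.9, 0.9], [0.9, 0, 0.1], [0.9, 0.1, 0]]
print(gene_weights(condition_b))
print(weight_distance(condition_a, condition_b))
```

    In this sketch the averaged pairwise correlation can stay similar between conditions while the weight vector still shifts, which is the kind of structural change the abstract says the test targets.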

  9. A statistical mechanics analysis of the set covering problem

    NASA Astrophysics Data System (ADS)

    Fontanari, J. F.

    1996-02-01

    The dependence of the average cost of the optimal solution of the set covering problem on the density of 1's of the incidence matrix and on the number of constraints (P) is investigated in the limit where the number of items (N) goes to infinity. The annealed approximation is employed to study two stochastic models: the constant density model, where the elements of the incidence matrix are statistically independent random variables, and the Karp model, where the rows of the incidence matrix possess the same number of 1's. Lower bounds for the average cost are presented in the case that P scales with ln N and the density is of order 1, as well as in the case that P scales linearly with N and the density is of order 1/N. It is shown that in the case that P scales with exp N and the density is of order 1 the annealed approximation yields exact results for both models.

  10. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175

  11. Statistical Mechanics of the Minimum Dominating Set Problem

    NASA Astrophysics Data System (ADS)

    Zhao, Jin-Hua; Habibulla, Yusupjan; Zhou, Hai-Jun

    2015-06-01

    The minimum dominating set (MDS) problem has wide applications in network science and related fields. It aims at constructing a node set of smallest size such that any node of the network is either in this set or is adjacent to at least one node of this set. Although this optimization problem is generally very difficult, we show it can be exactly solved by a generalized leaf-removal (GLR) process if the network contains no core. We present a percolation theory to describe the GLR process on random networks, and solve a spin glass model by mean field method to estimate the MDS size. We also implement a message-passing algorithm and a local heuristic algorithm that combines GLR with greedy node-removal to obtain near-optimal solutions for single random networks. Our algorithms also perform well on real-world network instances.
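
    To make the problem definition concrete, here is a minimal sketch of a plain greedy heuristic for dominating sets. It is not the generalized leaf-removal process or the message-passing algorithm of the abstract, and it does not guarantee a minimum set. The graph representation (a dictionary mapping each node to the set of its neighbours) and the function names are assumptions made for illustration.

```python
def greedy_dominating_set(adjacency):
    """Return a dominating set of an undirected graph {node: set_of_neighbours}."""
    uncovered = set(adjacency)
    chosen = set()
    while uncovered:
        # Pick the node whose closed neighbourhood covers the most uncovered
        # nodes; iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(adjacency),
                   key=lambda v: len(({v} | adjacency[v]) & uncovered))
        chosen.add(best)
        uncovered -= {best} | adjacency[best]
    return chosen


def is_dominating(adjacency, nodes):
    """Check that every node is in the set or adjacent to a node of the set."""
    return all(v in nodes or adjacency[v] & nodes for v in adjacency)


# A star graph is dominated by its centre alone.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(greedy_dominating_set(star))
```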

  12. Studying the complex expression dependences between sets of coexpressed genes.

    PubMed

    Huerta, Mario; Casanova, Oriol; Barchino, Roberto; Flores, Jose; Querol, Enrique; Cedano, Juan

    2014-01-01

    Organisms simplify the orchestration of gene expression by coregulating genes whose products function together in the cell. The use of clustering methods to obtain sets of coexpressed genes from expression arrays is very common; nevertheless there are no appropriate tools to study the expression networks among these sets of coexpressed genes. The aim of the developed tools is to allow studying the complex expression dependences that exist between sets of coexpressed genes. For this purpose, we start by detecting the nonlinear expression relationships between pairs of genes, plus the coexpressed genes. Next, we form networks among sets of coexpressed genes that maintain nonlinear expression dependences between all of them. The expression relationship between the sets of coexpressed genes is defined by the expression relationship between the skeletons of these sets, where this skeleton represents the coexpressed genes with a well-defined nonlinear expression relationship with the skeleton of the other sets. As a result, we can study the nonlinear expression relationships between a target gene and other sets of coexpressed genes, or start the study from the skeleton of the sets, to study the complex relationships of activation and deactivation between the sets of coexpressed genes that carry out the different cellular processes present in the expression experiments.

  13. Correcting Transcription Factor Gene Sets for Copy Number and Promoter Methylation Variations

    PubMed Central

    Rathi, Komal S.; Gaykalova, Daria A.; Hennesey, Patrick; Califano, Joseph A.; Ochs, Michael F.

    2014-01-01

    Gene set analysis provides a method to generate statistical inferences across sets of linked genes, primarily using high-throughput expression data. Common gene sets include biological pathways, operons, and targets of transcriptional regulators. In higher eukaryotes, especially when dealing with diseases with strong genetic and epigenetic components such as cancer, copy number loss and gene silencing through promoter methylation can eliminate the possibility that a gene is transcribed. This, in turn, can adversely affect the estimation of transcription factor or pathway activity from a set of target genes, since some of the targets may not be responsive to transcriptional regulation. Here we introduce a simple filtering approach that removes genes from consideration if they show copy number loss or promoter methylation and demonstrate the improvement in inference of transcription factor activity in a simulated data set based on the background expression observed in normal head and neck tissue. PMID:25195578
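
    The filtering step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the copy number loss and promoter methylation calls are already available as collections of gene identifiers, and the function name and gene labels are hypothetical.

```python
def filter_targets(targets, copy_number_loss, promoter_methylated):
    """Keep only targets showing neither copy number loss nor promoter methylation."""
    silenced = set(copy_number_loss) | set(promoter_methylated)
    return [gene for gene in targets if gene not in silenced]


# Hypothetical target list for one transcription factor.
targets = ["geneA", "geneB", "geneC", "geneD"]
usable = filter_targets(targets,
                        copy_number_loss={"geneB"},
                        promoter_methylated={"geneD"})
print(usable)
```

    Only the remaining targets would then be passed to the gene set statistic, so that genes that cannot respond to transcriptional regulation do not dilute the estimate.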

  14. The Effect of Distributed Practice in Undergraduate Statistics Homework Sets: A Randomized Trial

    ERIC Educational Resources Information Center

    Crissinger, Bryan R.

    2015-01-01

    Most homework sets in statistics courses are constructed so that students concentrate or "mass" their practice on a certain topic in one problem set. Distributed practice homework sets include review problems in each set so that practice on a topic is distributed across problem sets. There is a body of research that points to the…

  15. Excursion sets and non-Gaussian void statistics

    SciTech Connect

    D'Amico, Guido; Musso, Marcello; Paranjape, Aseem; Norena, Jorge

    2011-01-15

    Primordial non-Gaussianity (NG) affects the large scale structure (LSS) of the Universe by leaving an imprint on the distribution of matter at late times. Much attention has been focused on using the distribution of collapsed objects (i.e. dark matter halos and the galaxies and galaxy clusters that reside in them) to probe primordial NG. An equally interesting and complementary probe however is the abundance of extended underdense regions or voids in the LSS. The calculation of the abundance of voids using the excursion set formalism in the presence of primordial NG is subject to the same technical issues as the one for halos, which were discussed e.g. in Ref. [51][G. D'Amico, M. Musso, J. Norena, and A. Paranjape, arXiv:1005.1203.]. However, unlike the excursion set problem for halos which involved random walks in the presence of one barrier δ_c, the void excursion set problem involves two barriers δ_v and δ_c. This leads to a new complication introduced by what is called the 'void-in-cloud' effect discussed in the literature, which is unique to the case of voids. We explore a path integral approach which allows us to carefully account for all these issues, leading to a rigorous derivation of the effects of primordial NG on void abundances. The void-in-cloud issue, in particular, makes the calculation conceptually rather different from the one for halos. However, we show that its final effect can be described by a simple yet accurate approximation. Our final void abundance function is valid on larger scales than the expressions of other authors, while being broadly in agreement with those expressions on smaller scales.

  16. Correcting transcription factor gene sets for copy number and promoter methylation variations.

    PubMed

    Rathi, Komal S; Gaykalova, Daria A; Hennessey, Patrick; Califano, Joseph A; Ochs, Michael F

    2014-09-01

    Gene set analysis provides a method to generate statistical inferences across sets of linked genes, primarily using high-throughput expression data. Common gene sets include biological pathways, operons, and targets of transcriptional regulators. In higher eukaryotes, especially when dealing with diseases with strong genetic and epigenetic components such as cancer, copy number loss and gene silencing through promoter methylation can eliminate the possibility that a gene is transcribed. This, in turn, can adversely affect the estimation of transcription factor or pathway activity from a set of target genes, as some of the targets may not be responsive to transcriptional regulation. Here we introduce a simple filtering approach that removes genes from consideration if they show copy number loss or promoter methylation, and demonstrate the improvement in inference of transcription factor activity in a simulated dataset based on the background expression observed in normal head and neck tissue.

  17. Chronic periodontitis genome-wide association studies: gene-centric and gene set enrichment analyses.

    PubMed

    Rhodin, K; Divaris, K; North, K E; Barros, S P; Moss, K; Beck, J D; Offenbacher, S

    2014-09-01

    Recent genome-wide association studies (GWAS) of chronic periodontitis (CP) offer rich data sources for the investigation of candidate genes, functional elements, and pathways. We used GWAS data of CP (n = 4,504) and periodontal pathogen colonization (n = 1,020) from a cohort of adult Americans of European descent participating in the Atherosclerosis Risk in Communities study and employed a MAGENTA approach (i.e., meta-analysis gene set enrichment of variant associations) to obtain gene-centric and gene set association results corrected for gene size, number of single-nucleotide polymorphisms, and local linkage disequilibrium characteristics based on the human genome build 18 (National Center for Biotechnology Information build 36). We used the Gene Ontology, Ingenuity, KEGG, Panther, Reactome, and Biocarta databases for gene set enrichment analyses. Six genes showed evidence of statistically significant association: 4 with severe CP (NIN, p = 1.6 × 10^-7; ABHD12B, p = 3.6 × 10^-7; WHAMM, p = 1.7 × 10^-6; AP3B2, p = 2.2 × 10^-6) and 2 with high periodontal pathogen colonization (red complex-KCNK1, p = 3.4 × 10^-7; Porphyromonas gingivalis-DAB2IP, p = 1.0 × 10^-6). Top-ranked genes for moderate CP were HGD (p = 1.4 × 10^-5), ZNF675 (p = 1.5 × 10^-5), TNFRSF10C (p = 2.0 × 10^-5), and EMR1 (p = 2.0 × 10^-5). Loci containing NIN, EMR1, KCNK1, and DAB2IP had shown suggestive evidence of association in the earlier single-nucleotide polymorphism-based analyses, whereas WHAMM and AP3B2 emerged as novel candidates. The top gene sets included severe CP ("endoplasmic reticulum membrane," "cytochrome P450," "microsome," and "oxidation reduction") and moderate CP ("regulation of gene expression," "zinc ion binding," "BMP signaling pathway," and "ruffle"). Gene-centric analyses offer a promising avenue for efficient interrogation of large-scale GWAS data. These results highlight genes in previously identified loci and new candidate genes and pathways

  18. Sets, Probability and Statistics: The Mathematics of Life Insurance. [Computer Program.] Second Edition.

    ERIC Educational Resources Information Center

    King, James M.; And Others

    The materials described here represent the conversion of a highly popular student workbook "Sets, Probability and Statistics: The Mathematics of Life Insurance" into a computer program. The program is designed to familiarize students with the concepts of sets, probability, and statistics, and to provide practice using real life examples. It also…

  19. Principles for the organization of gene-sets.

    PubMed

    Li, Wentian; Freudenberg, Jan; Oswald, Michaela

    2015-12-01

    A gene-set, an important concept in microarray expression analysis and systems biology, is a collection of genes and/or their products (i.e. proteins) that have some features in common. There are many different ways to construct gene-sets, but a systematic organization of these ways is lacking. Gene-sets are mainly organized ad hoc in current public-domain databases, with group header names often determined by practical reasons (such as the types of technology in obtaining the gene-sets or a balanced number of gene-sets under a header). Here we aim at providing a gene-set organization principle according to the level at which genes are connected: homology, physical map proximity, chemical interaction, biological, and phenotypic-medical levels. We also distinguish two types of connections between genes: actual connection versus sharing of a label. Actual connections denote direct biological interactions, whereas shared label connection denotes shared membership in a group. Some extensions of the framework are also addressed such as overlapping of gene-sets, modules, and the incorporation of other non-protein-coding entities such as microRNAs.

  20. SiBIC: a web server for generating gene set networks based on biclusters obtained by maximal frequent itemset mining.

    PubMed

    Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi

    2013-01-01

    Detecting biclusters from expression data is useful, since biclusters are coexpressed genes under only part of all given experimental conditions. We present a software called SiBIC, which from a given expression dataset, first exhaustively enumerates biclusters, which are then merged into rather independent biclusters, which finally are used to generate gene set networks, in which a gene set assigned to one node has coexpressed genes. We evaluated each step of this procedure: 1) significance of the generated biclusters biologically and statistically, 2) biological quality of merged biclusters, and 3) biological significance of gene set networks. We emphasize that gene set networks, in which nodes are not genes but gene sets, can be more compact than usual gene networks, meaning that gene set networks are more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp.

  1. Gene coexpression measures in large heterogeneous samples using count statistics.

    PubMed

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
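
    The idea of counting local rank patterns can be illustrated with a simplified sketch. It is not the statistics defined by the authors: it assumes a contiguous sliding window of length k over two expression profiles, counts the windows in which both genes show the same rank pattern (or, optionally, the exactly reversed one), and assumes no tied values. The function names are hypothetical.

```python
def rank_pattern(values):
    """Rank of each value within its window (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return tuple(ranks)


def count_matching_windows(x, y, k, allow_reversed=True):
    """Count length-k windows where x and y share a rank pattern."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    count = 0
    for start in range(len(x) - k + 1):
        pattern_x = rank_pattern(x[start:start + k])
        window_y = y[start:start + k]
        if pattern_x == rank_pattern(window_y):
            count += 1  # same local ordering
        elif allow_reversed and pattern_x == rank_pattern([-v for v in window_y]):
            count += 1  # opposite local ordering
    return count


# The two profiles agree in the first window, disagree in the second,
# and are exactly reversed in the third.
gene_x = [1, 3, 2, 5, 4]
gene_y = [10, 30, 20, 1, 2]
print(count_matching_windows(gene_x, gene_y, k=3))
```

    Because only ranks within each window are compared, a single extreme value changes at most the few windows that contain it, which is consistent with the robustness to outliers mentioned in the abstract.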

  2. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool.

    PubMed

    Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi

    2015-11-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.

  3. Identifying the optimal gene and gene set in hepatocellular carcinoma based on differential expression and differential co-expression algorithm.

    PubMed

    Dong, Li-Yang; Zhou, Wei-Zhong; Ni, Jun-Wei; Xiang, Wei; Hu, Wen-Hao; Yu, Chang; Li, Hai-Yan

    2017-02-01

    The objective of this study was to identify the optimal gene and gene set for hepatocellular carcinoma (HCC) utilizing differential expression and differential co-expression (DEDC) algorithm. The DEDC algorithm consisted of four parts: calculating differential expression (DE) by absolute t-value in t-statistics; computing differential co-expression (DC) based on Z-test; determining optimal thresholds on the basis of Chi-squared (χ2) maximization and the corresponding gene was the optimal gene; and evaluating functional relevance of genes categorized into different partitions to determine the optimal gene set with highest mean minimum functional information (FI) gain (Δ*G). The optimal thresholds divided genes into four partitions, high DE and high DC (HDE-HDC), high DE and low DC (HDE-LDC), low DE and high DC (LDE-HDC), and low DE and low DC (LDE-LDC). In addition, the optimal gene was validated by conducting reverse transcription-polymerase chain reaction (RT-PCR) assay. The optimal thresholds for DC and DE were 1.032 and 1.911, respectively. Using the optimal gene, the genes were divided into four partitions including: HDE-HDC (2,053 genes), HDE-LDC (2,822 genes), LDE-HDC (2,622 genes), and LDE-LDC (6,169 genes). The optimal gene was microtubule-associated protein RP/EB family member 1 (MAPRE1), and RT-PCR assay validated the significant difference between the HCC and normal state. The optimal gene set was nucleoside metabolic process (GO:0009116) with Δ*G = 18.681 and 24 HDE-HDC partitions in total. In conclusion, we successfully investigated the optimal gene, MAPRE1, and gene set, nucleoside metabolic process, which may be potential biomarkers for targeted therapy and provide significant insight for revealing the pathological mechanism underlying HCC.
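
    The four-way partition can be sketched in a few lines. The thresholds are the values reported in the abstract (1.911 for DE, 1.032 for DC); treating a score equal to the threshold as "high", the input layout, and the gene labels are assumptions made for illustration, and the threshold search by Chi-squared maximization is not shown.

```python
DE_THRESHOLD = 1.911  # reported optimal threshold for differential expression
DC_THRESHOLD = 1.032  # reported optimal threshold for differential co-expression


def partition_genes(scores, de_threshold=DE_THRESHOLD, dc_threshold=DC_THRESHOLD):
    """Assign each gene to one of the four DE/DC partitions.

    scores maps a gene identifier to a (DE score, DC score) pair.
    """
    partitions = {"HDE-HDC": [], "HDE-LDC": [], "LDE-HDC": [], "LDE-LDC": []}
    for gene, (de, dc) in scores.items():
        de_label = "HDE" if de >= de_threshold else "LDE"
        dc_label = "HDC" if dc >= dc_threshold else "LDC"
        partitions[de_label + "-" + dc_label].append(gene)
    return partitions


# Hypothetical scores for four genes, one per partition.
example = {
    "geneA": (2.5, 1.5),
    "geneB": (2.5, 0.4),
    "geneC": (0.7, 1.5),
    "geneD": (0.7, 0.4),
}
print(partition_genes(example))
```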

  4. Curated eutherian third party data gene data sets

    PubMed Central

    Premzl, Marko

    2015-01-01

    The freely available eutherian genomic sequence data sets advanced the scientific field of genomics. Of note, future revisions of gene data sets were expected, due to the incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets. PMID:26862561

  5. FDR-FET: an optimizing gene set enrichment analysis method.

    PubMed

    Ji, Rui-Ru; Ott, Karl-Heinz; Yordanova, Roumyana; Bruccoleri, Robert E

    2011-01-01

    Gene set enrichment analysis for analyzing large profiling and screening experiments can reveal unifying biological schemes based on previously accumulated knowledge represented as "gene sets". Most of the existing implementations use a fixed fold-change or P value cutoff to generate regulated gene lists. However, the threshold selection in most cases is arbitrary, and has a significant effect on the test outcome and interpretation of the experiment. We developed a new gene set enrichment analysis method, ie, FDR-FET, which dynamically optimizes the threshold choice and improves the sensitivity and selectivity of gene set enrichment analysis. The procedure translates experimental results into a series of regulated gene lists at multiple false discovery rate (FDR) cutoffs, and computes the P value of the overrepresentation of a gene set using a Fisher's exact test (FET) in each of these gene lists. The lowest P value is retained to represent the significance of the gene set. We also implemented improved methods to define a more relevant global reference set for the FET. We demonstrate the validity of the method using a published microarray study of three protease inhibitors of the human immunodeficiency virus and compare the results with those from other popular gene set enrichment analysis algorithms. Our results show that combining FDR with multiple cutoffs allows us to control the error while retaining genes that increase information content. We conclude that FDR-FET can selectively identify significant affected biological processes. Our method can be used for any user-generated gene list in the area of transcriptome, proteome, and other biological and scientific applications.
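
    The procedure described above (regulated lists at several FDR cutoffs, a Fisher's exact test per list, lowest P value retained) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the published implementation: it assumes Benjamini-Hochberg adjustment for the FDR, a one-sided test for overrepresentation, and a reference set equal to all tested genes (the abstract's improved reference set is not reproduced). All function names are hypothetical.

```python
from math import comb


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values for a {gene: p} mapping."""
    ordered = sorted(pvalues, key=pvalues.get)
    m = len(ordered)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        gene = ordered[rank - 1]
        running_min = min(running_min, pvalues[gene] * m / rank)
        adjusted[gene] = running_min
    return adjusted


def fisher_right_tail(overlap, list_size, set_size, universe_size):
    """One-sided Fisher's exact test: P(X >= overlap) under the hypergeometric null."""
    upper = min(list_size, set_size)
    tail = sum(comb(set_size, k) * comb(universe_size - set_size, list_size - k)
               for k in range(overlap, upper + 1))
    return tail / comb(universe_size, list_size)


def fdr_fet(pvalues, gene_set, cutoffs):
    """Return (lowest P value, cutoff that produced it), or None if no list is non-empty."""
    adjusted = bh_adjust(pvalues)
    universe = set(pvalues)
    members = set(gene_set) & universe
    best = None
    for cutoff in cutoffs:
        regulated = {g for g in universe if adjusted[g] <= cutoff}
        if not regulated:
            continue
        p = fisher_right_tail(len(regulated & members), len(regulated),
                              len(members), len(universe))
        if best is None or p < best[0]:
            best = (p, cutoff)
    return best


# Hypothetical input: ten genes, five clearly regulated and all in the gene set.
pvalues = {f"g{i}": (0.001 if i < 5 else 0.9) for i in range(10)}
print(fdr_fet(pvalues, {f"g{i}" for i in range(5)}, cutoffs=[0.01, 0.05]))
```

    Scanning several cutoffs removes the need to commit to a single arbitrary threshold, which is the motivation given in the abstract.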

  6. Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line

    PubMed Central

    Blayney, Jaine K.; Davison, Timothy; McCabe, Nuala; Walker, Steven; Keating, Karen; Delaney, Thomas; Greenan, Caroline; Williams, Alistair R.; McCluggage, W. Glenn; Capes-Davis, Amanda; Harkin, D. Paul; Gourley, Charlie; Kennedy, Richard D.

    2016-01-01

    Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package. PMID:27353327

  7. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering

    PubMed Central

    Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect disease severity and to provide expression profiles of gene sets for individuals. We developed an application called IGSA that combines gene set enrichment with sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy so that samples gather into sets according to disease progression from mild to severe. We evaluate IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared the results of IGSA in KEGG pathway enrichment with DAVID, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering under different similarity measurement methods. Notably, IGSA proved to be more sensitive and specific in finding significant pathways, and can indicate pathway changes related to disease severity. In addition, IGSA provides a profile of significant gene sets for each sample. PMID:27764138

  8. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    PubMed

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease and to provide gene set expression profiles for individuals. We developed a software application called IGSA that offers powerful analytical capacity for gene set enrichment and sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy that gathers samples into the set according to the progression of disease from mild to severe. We evaluate the performance of IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared the results of IGSA in KEGG pathway enrichment with DAVID, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering under different similarity measures. Notably, IGSA proved to be more sensitive and specific in finding significant pathways, and can indicate changes in pathways related to the severity of disease. In addition, IGSA provides a profile of significant gene sets for each sample.

  9. FDR-FET: an optimizing gene set enrichment analysis method

    PubMed Central

    Ji, Rui-Ru; Ott, Karl-Heinz; Yordanova, Roumyana; Bruccoleri, Robert E

    2011-01-01

    Gene set enrichment analysis for analyzing large profiling and screening experiments can reveal unifying biological schemes based on previously accumulated knowledge represented as “gene sets”. Most of the existing implementations use a fixed fold-change or P value cutoff to generate regulated gene lists. However, the threshold selection in most cases is arbitrary, and has a significant effect on the test outcome and interpretation of the experiment. We developed a new gene set enrichment analysis method, ie, FDR-FET, which dynamically optimizes the threshold choice and improves the sensitivity and selectivity of gene set enrichment analysis. The procedure translates experimental results into a series of regulated gene lists at multiple false discovery rate (FDR) cutoffs, and computes the P value of the overrepresentation of a gene set using a Fisher’s exact test (FET) in each of these gene lists. The lowest P value is retained to represent the significance of the gene set. We also implemented improved methods to define a more relevant global reference set for the FET. We demonstrate the validity of the method using a published microarray study of three protease inhibitors of the human immunodeficiency virus and compare the results with those from other popular gene set enrichment analysis algorithms. Our results show that combining FDR with multiple cutoffs allows us to control the error while retaining genes that increase information content. We conclude that FDR-FET can selectively identify significant affected biological processes. Our method can be used for any user-generated gene list in the area of transcriptome, proteome, and other biological and scientific applications. PMID:21918636
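    The multi-cutoff procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default cutoffs, the use of Benjamini-Hochberg adjustment to obtain FDR values, and the choice of all measured genes as the reference set are assumptions made here (the abstract mentions a refined reference set that is not reproduced).

```python
from math import comb

def fisher_right_tail(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def fdr_fet(gene_pvalues, gene_set, cutoffs=(0.01, 0.05, 0.1, 0.2)):
    """Return (lowest FET p-value, cutoff that produced it) for one gene set.

    gene_pvalues: dict mapping gene -> per-gene p-value (hypothetical input).
    gene_set: set of genes; the reference set is assumed to be all keys.
    """
    genes = list(gene_pvalues)
    fdr = dict(zip(genes, bh_adjust([gene_pvalues[g] for g in genes])))
    N = len(genes)
    K = sum(1 for g in genes if g in gene_set)
    best_p, best_cutoff = 1.0, None
    for cutoff in cutoffs:
        regulated = [g for g in genes if fdr[g] <= cutoff]
        if not regulated:
            continue
        k = sum(1 for g in regulated if g in gene_set)
        p = fisher_right_tail(k, len(regulated), K, N)
        if p < best_p:                    # keep the lowest p-value across cutoffs
            best_p, best_cutoff = p, cutoff
    return best_p, best_cutoff

# Toy data: three strongly regulated genes, all in the gene set of interest.
toy_pvalues = {f"g{i}": (0.001 if i < 3 else 0.9) for i in range(10)}
toy_result = fdr_fet(toy_pvalues, {"g0", "g1", "g2"})
```

    In the toy data the three regulated genes coincide with the gene set, so every cutoff yields the same regulated list and the first cutoff is the one reported.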

  10. Statistical Approaches for Gene Selection, Hub Gene Identification and Module Interaction in Gene Co-Expression Network Analysis: An Application to Aluminum Stress in Soybean (Glycine max L.)

    PubMed Central

    Das, Samarendra; Meher, Prabina Kumar; Bhar, Lal Mohan; Mandal, Baidya Nath

    2017-01-01

    Selection of informative genes is an important problem in gene expression studies. The small sample size and the large number of genes in gene expression data make the selection process complex. Further, the selected informative genes may act as a vital input for gene co-expression network analysis. Moreover, the identification of hub genes and module interactions in gene co-expression networks is yet to be fully explored. This paper presents a statistically sound gene selection technique based on the support vector machine algorithm for selecting informative genes from high-dimensional gene expression data. Also, an attempt has been made to develop a statistical approach for identification of hub genes in the gene co-expression network. Besides, a differential hub gene analysis approach has also been developed to group the identified hub genes into various groups based on their gene connectivity in a case vs. control study. Based on this proposed approach, an R package, i.e., dhga (https://cran.r-project.org/web/packages/dhga) has been developed. The comparative performance of the proposed gene selection technique as well as the hub gene identification approach was evaluated on three different crop microarray datasets. The proposed gene selection technique outperformed most of the existing techniques for selecting a robust set of informative genes. Based on the proposed hub gene identification approach, only a small number of hub genes were identified compared with the existing approach, which is in accordance with the scale-free property of real networks. In this study, some key genes along with their Arabidopsis orthologs have been reported, which can be used for Aluminum toxic stress response engineering in soybean. The functional analysis of various selected key genes revealed the underlying molecular mechanisms of Aluminum toxic stress response in soybean. PMID:28056073

  11. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics

    USGS Publications Warehouse

    Antweiler, R.C.; Taylor, H.E.

    2008-01-01

    The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except substituting zero or the detection limit value to censored data.
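    One of the substitution treatments named in this abstract, assigning one-half the detection limit to censored values, can be sketched as below. The function name, the use of None to mark a censored result, and the single shared detection limit are assumptions made for illustration; the 70% threshold is taken from the abstract and is only used to flag heavily censored data.

```python
from statistics import mean, median

def summarize_left_censored(values, detection_limit, fraction=0.5):
    """Summary statistics after replacing censored values by fraction * detection_limit.

    values: measurements, with None marking a result reported as below detection.
    detection_limit: a single detection limit shared by all values (assumption).
    """
    censored_fraction = sum(1 for v in values if v is None) / len(values)
    filled = [fraction * detection_limit if v is None else v for v in values]
    return {
        "censored_fraction": censored_fraction,
        # Above 70% censoring the abstract reports that no technique did well.
        "heavily_censored": censored_fraction > 0.70,
        "mean": mean(filled),
        "median": median(filled),
    }

# Two of four results fall below a detection limit of 1.0.
example_summary = summarize_left_censored([None, None, 2.0, 4.0], detection_limit=1.0)
```

    The abstract ranks the Kaplan-Meier technique above substitution for summary statistics, so this sketch shows the simplest alternative rather than the recommended one.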

  12. Gene regulatory network inference using out of equilibrium statistical mechanics

    PubMed Central

    Benecke, Arndt

    2008-01-01

    Spatiotemporal control of gene expression is fundamental to multicellular life. Despite prodigious efforts, the encoding of gene expression regulation in eukaryotes is not understood. Gene expression analyses nourish the hope to reverse engineer effector-target gene networks using inference techniques. Inference from noisy and circumstantial data relies on using robust models with few parameters for the underlying mechanisms. However, a systematic path to gene regulatory network reverse engineering from functional genomics data is still impeded by fundamental problems. Recently, Johannes Berg from the Theoretical Physics Institute of Cologne University has made two remarkable contributions that significantly advance the gene regulatory network inference problem. Berg, who uses gene expression data from yeast, has demonstrated a nonequilibrium regime for mRNA concentration dynamics and was able to map the gene regulatory process upon simple stochastic systems driven out of equilibrium. The impact of his demonstration is twofold, affecting both the understanding of the operational constraints under which transcription occurs and the capacity to extract relevant information from highly time-resolved expression data. Berg has used his observation to predict target genes of selected transcription factors, and thereby, in principle, demonstrated applicability of his out of equilibrium statistical mechanics approach to the gene network inference problem. PMID:19404429

  13. Regulation of SET Gene Expression by NFkB.

    PubMed

    Feng, Yi; Li, Xiaoyong; Zhou, Weitao; Lou, Dandan; Huang, Daochao; Li, Yanhua; Kang, Yu; Xiang, Yan; Li, Tingyu; Zhou, Weihui; Song, Weihong

    2016-06-28

    SET is elevated and mislocalized in the neuronal cytoplasm in brains of Alzheimer's disease (AD) and Down syndrome (DS) patients. Cytoplasm SET leads to inhibition of protein phosphatase 2A and is involved in the tau pathology. However, the regulation of SET gene expression remains elusive. In the present study, we cloned a 1399-bp segment of the 5' flanking region of the human SET gene and identified that the transcription start site (TSS) of SET transcript 1 is located at 123 bp upstream of the translation start site ATG in exon 1. Sequence analysis reveals several putative regulatory elements including NFkB, Sp1, and HSE. Luciferase assay and electrophoretic mobility shift assay (EMSA) identified a functional cis-acting NFkB-responsive element in the SET gene promoter. Overexpression and activation of NFkB upregulate transcription of SET isoform 1 but not isoform 2, indicating that the expression of these two isoforms is differentially regulated. The results demonstrate that NFkB plays an important role in regulation of the human SET gene expression. Our findings suggest that oxidative stress and inflammatory responses could result in abnormal SET gene expression, contributing to the tauopathy in AD pathogenesis.

  14. On sufficient statistics of least-squares superposition of vector sets.

    PubMed

    Konagurthu, Arun S; Kasarapu, Parthan; Allison, Lloyd; Collier, James H; Lesk, Arthur M

    2015-06-01

    The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from its constituent vector sets under addition or deletion operation, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of the methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
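    The additivity property can be illustrated with a short sketch. The statistics chosen here (pair count, coordinate sums, the sum of outer products and the sum of squared norms) are one natural choice for least-squares superposition and are an assumption of this example, not a transcription of the article's derivation; the function names are hypothetical, and the final rotation step (for example by SVD) is omitted.

```python
def stats_of(pairs):
    """Additive statistics of a list of corresponding 3D point pairs (x, y)."""
    n = len(pairs)
    sx = [sum(x[i] for x, _ in pairs) for i in range(3)]
    sy = [sum(y[j] for _, y in pairs) for j in range(3)]
    sxy = [[sum(x[i] * y[j] for x, y in pairs) for j in range(3)] for i in range(3)]
    sq = sum(sum(c * c for c in x) + sum(c * c for c in y) for x, y in pairs)
    return (n, sx, sy, sxy, sq)

def merge(a, b):
    """Statistics of the union of two disjoint sets, without revisiting any point."""
    return (
        a[0] + b[0],
        [p + q for p, q in zip(a[1], b[1])],
        [p + q for p, q in zip(a[2], b[2])],
        [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[3], b[3])],
        a[4] + b[4],
    )

def cross_covariance(s):
    """Centered 3x3 cross-covariance sum, the input to the optimal-rotation step."""
    n, sx, sy, sxy, _ = s
    return [[sxy[i][j] - sx[i] * sy[j] / n for j in range(3)] for i in range(3)]

# Two previously processed fragments (toy coordinates).
frag_a = [((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)), ((2.0, 1.0, 0.0), (-1.0, 2.0, 0.0))]
frag_b = [((0.0, 3.0, 1.0), (-3.0, 0.0, 1.0))]
merged = merge(stats_of(frag_a), stats_of(frag_b))
```

    Because merging touches only a fixed number of sums, its cost does not depend on how many pairs each fragment contains; deleting a fragment would subtract the same quantities.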

  15. Differential Effects of Goal Setting and Value Reappraisal on College Women's Motivation and Achievement in Statistics

    ERIC Educational Resources Information Center

    Acee, Taylor Wayne

    2009-01-01

    The purpose of this dissertation was to investigate the differential effects of goal setting and value reappraisal on female students' self-efficacy beliefs, value perceptions, exam performance and continued interest in statistics. It was hypothesized that the Enhanced Goal Setting Intervention (GS-E) would positively impact students'…

  16. Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

    NASA Astrophysics Data System (ADS)

    Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

    2010-10-01

    Studying periodic patterns would be expected to be a standard line of attack for recognizing DNA sequences in gene identification and similar problems. Curiously, however, very little significant work has been done in this direction. This paper studies statistical properties of DNA sequences of a complete genome using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and a standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
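    The mapping-plus-Fourier idea can be sketched as follows. The abstract does not say which mappings were used, so the binary indicator mapping (one 0/1 sequence per base) is an assumption of this example, as are the function names; the choice of period 3 reflects the well-known codon periodicity of coding DNA rather than a parameter taken from the paper.

```python
import cmath

def indicator(seq, base):
    """Binary indicator mapping: 1.0 where seq has the given base, else 0.0."""
    return [1.0 if c == base else 0.0 for c in seq]

def power_at_period(seq, period):
    """Total Fourier power at frequency 1/period, summed over the four bases."""
    total = 0.0
    for base in "ACGT":
        u = indicator(seq, base)
        coeff = sum(
            u[t] * cmath.exp(-2j * cmath.pi * t / period) for t in range(len(seq))
        )
        total += abs(coeff) ** 2
    return total

# A perfectly periodic toy sequence: strong power at period 3, none at period 4.
toy_seq = "ATG" * 20
power_3 = power_at_period(toy_seq, 3)
power_4 = power_at_period(toy_seq, 4)
```

    On real sequences the power at period 3 would be compared between windows, with higher values hinting at coding regions; any threshold would have to be calibrated on annotated data.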

  17. Integrated gene set analysis for microRNA studies

    PubMed Central

    Garcia-Garcia, Francisco; Panadero, Joaquin; Dopazo, Joaquin; Montaner, David

    2016-01-01

    Motivation: Functional interpretation of miRNA expression data is currently done in a three-step procedure: select differentially expressed miRNAs, find their target genes, and carry out gene set overrepresentation analysis. Nevertheless, major limitations of this approach have already been described at the gene level, while some new ones arise in the miRNA scenario. Here, we propose an enhanced methodology that builds on the well-established gene set analysis paradigm. Evidence for differential expression at the miRNA level is transferred to a gene differential inhibition score which is easily interpretable in terms of gene sets or pathways. Such transferred indexes account for the additive effect of several miRNAs targeting the same gene, and also incorporate cancellation effects between cases and controls. Together, these two desirable characteristics allow for more accurate modeling of regulatory processes. Results: We analyze high-throughput sequencing data from 20 different cancer types and provide exhaustive reports of gene and Gene Ontology-term deregulation by miRNA action. Availability and Implementation: The proposed methodology was implemented in the Bioconductor library mdgsa. http://bioconductor.org/packages/mdgsa. For the purpose of reproducibility all of the scripts are available at https://github.com/dmontaner-papers/gsa4mirna Contact: david.montaner@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27324197
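    The transfer from miRNA-level evidence to a gene-level score can be illustrated with a simple additive rule. This sketch is not the mdgsa API: the function name, the input dictionaries and the plain summation of signed miRNA statistics are assumptions chosen to show how additive and cancelling effects could arise.

```python
def transfer_to_genes(mirna_stat, targets):
    """Sum signed miRNA differential-expression statistics over each target gene.

    mirna_stat: dict miRNA -> signed statistic (positive = higher in cases).
    targets: dict miRNA -> iterable of target genes (hypothetical annotation).
    """
    scores = {}
    for mirna, genes in targets.items():
        stat = mirna_stat.get(mirna)
        if stat is None:          # miRNA not measured in this experiment
            continue
        for gene in genes:
            scores[gene] = scores.get(gene, 0.0) + stat
    return scores

toy_stats = {"miR-a": 2.0, "miR-b": 1.5, "miR-c": -2.0}
toy_targets = {"miR-a": ["GENE1", "GENE2"], "miR-b": ["GENE1"], "miR-c": ["GENE2"]}
toy_scores = transfer_to_genes(toy_stats, toy_targets)
```

    In the toy data GENE1 accumulates evidence from two up-regulated miRNAs, while for GENE2 an up-regulated and a down-regulated miRNA cancel out.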

  18. Analysis of Gene Sets Based on the Underlying Regulatory Network

    PubMed Central

    Michailidis, George

    2009-01-01

    Networks are often used to represent the interactions among genes and proteins. These interactions are known to play an important role in vital cell functions and should be included in the analysis of genes that are differentially expressed. Methods of gene set analysis take advantage of external biological information and analyze a priori defined sets of genes. These methods can potentially preserve the correlation among genes; however, they do not directly incorporate the information about the gene network. In this paper, we propose a latent variable model that directly incorporates the network information. We then use the theory of mixed linear models to present a general inference framework for the problem of testing the significance of subnetworks. Several possible test procedures are introduced and a network based method for testing the changes in expression levels of genes as well as the structure of the network is presented. The performance of the proposed method is compared with methods of gene set analysis using both simulation studies, as well as real data on genes related to the galactose utilization pathway in yeast. PMID:19254181

  19. A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

    PubMed

    Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

  20. Hepatic vessel segmentation using variational level set combined with non-local robust statistics.

    PubMed

    Lu, Siyu; Huang, Hui; Liang, Ping; Chen, Gang; Xiao, Liang

    2017-02-01

    Hepatic vessel segmentation is a challenging step in therapy guided by magnetic resonance imaging (MRI). This paper presents an improved variational level set method, which uses non-local robust statistics to suppress the influence of noise in MR images. The non-local robust statistics, which represent vascular features, are learned adaptively from seeds provided by users. K-means clustering in neighborhoods of seeds is utilized to exclude inappropriate seeds, which are obviously corrupted by noise. The neighborhoods of appropriate seeds are placed in an array to calculate the non-local robust statistics, and the variational level set formulation can be constructed. Bias correction is utilized in the level set formulation to reduce the influence of intensity inhomogeneity of MRI. Experiments were conducted over real MR images, and showed that the proposed method performed better on small hepatic vessel segmentation compared with other segmentation methods.

  1. GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics.

    PubMed

    Piovesan, Allison; Caracausi, Maria; Antonaros, Francesca; Pelleri, Maria Chiara; Vitale, Lorenza

    2016-01-01

    We release GeneBase 1.1, a local tool with a graphical interface useful for parsing, structuring and indexing data from the National Center for Biotechnology Information (NCBI) Gene data bank. Compared to its predecessor GeneBase (1.0), GeneBase 1.1 now allows dynamic calculation and summarization in terms of median, mean, standard deviation and total for many quantitative parameters associated with genes, gene transcripts and gene features (exons, introns, coding sequences, untranslated regions). GeneBase 1.1 thus offers the opportunity to perform analyses of the main gene structure parameters also following the search for any set of genes with the desired characteristics, allowing unique functionalities not provided by the NCBI Gene itself. In order to show the potential of our tool for local parsing, structuring and dynamic summarizing of publicly available databases for data retrieval, analysis and testing of biological hypotheses, we provide as a sample application a revised set of statistics for human nuclear genes, gene transcripts and gene features. In contrast with previous estimations strongly underestimating the length of human genes, a 'mean' human protein-coding gene is 67 kbp long, has eleven 309 bp long exons and ten 6355 bp long introns. Median, mean and extreme values are provided for many other features offering an updated reference source for human genome studies, data useful to set parameters for bioinformatic tools and interesting clues to the biomedical meaning of the gene features themselves. Database URL: http://apollo11.isto.unibo.it/software/.

  2. GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics

    PubMed Central

    Piovesan, Allison; Caracausi, Maria; Antonaros, Francesca; Pelleri, Maria Chiara; Vitale, Lorenza

    2016-01-01

    We release GeneBase 1.1, a local tool with a graphical interface useful for parsing, structuring and indexing data from the National Center for Biotechnology Information (NCBI) Gene data bank. Compared to its predecessor GeneBase (1.0), GeneBase 1.1 now allows dynamic calculation and summarization in terms of median, mean, standard deviation and total for many quantitative parameters associated with genes, gene transcripts and gene features (exons, introns, coding sequences, untranslated regions). GeneBase 1.1 thus offers the opportunity to perform analyses of the main gene structure parameters also following the search for any set of genes with the desired characteristics, allowing unique functionalities not provided by the NCBI Gene itself. In order to show the potential of our tool for local parsing, structuring and dynamic summarizing of publicly available databases for data retrieval, analysis and testing of biological hypotheses, we provide as a sample application a revised set of statistics for human nuclear genes, gene transcripts and gene features. In contrast with previous estimations strongly underestimating the length of human genes, a ‘mean’ human protein-coding gene is 67 kbp long, has eleven 309 bp long exons and ten 6355 bp long introns. Median, mean and extreme values are provided for many other features offering an updated reference source for human genome studies, data useful to set parameters for bioinformatic tools and interesting clues to the biomedical meaning of the gene features themselves. Database URL: http://apollo11.isto.unibo.it/software/ PMID:28025344

  3. An unusually simple HP1 gene set in Hymenopteran insects.

    PubMed

    Fang, C; Schmitz, L; Ferree, P M

    2015-12-01

    The heterochromatin protein 1 (HP1) gene family includes a set of paralogs in higher eukaryotes that serve fundamental roles in heterochromatin structure and maintenance, and other chromatin-related functions. At least 10 full and 16 partial HP1 genes exist among Drosophila species, with multiple gene gains, losses, and sub-functionalizations within this insect group. An important question is whether this diverse set of HP1 genes and their dynamic evolution represent the standard rule in eukaryotic groups. Here we have begun to address this question by bio-informatically identifying the HP1 family genes in representative species of the insect order Hymenoptera, which includes all ants, bees, wasps, and sawflies. Compared to Drosophila species, Hymenopterans have a much simpler set of HP1 genes, including one full and two partial HP1s. All 3 genes appear to have been present in the common ancestor of the Hymenopterans and they derive from a Drosophila HP1B-like gene. In ants, a partial HP1 gene containing only a chromoshadow domain harbors amino acid changes at highly conserved sites within the PxVxL recognition region, suggesting that this gene has undergone sub-functionalization. In the jewel wasp Nasonia vitripennis, the full HP1 and partial chromoshadow-only HP1 are expressed in both germ line and somatic tissues. However, the partial chromodomain-only HP1 is expressed exclusively in the ovary and testis, suggesting that it may have a specialized chromatin role during gametogenesis. Our findings demonstrate that the HP1 gene family is much simpler and evolutionarily less dynamic within the Hymenopterans compared to the much younger Drosophila group, a pattern that may reflect major differences in the range of chromatin-related functions present in these and perhaps other insect groups.

  4. A review of statistical methods for data sets with multiple censoring points

    SciTech Connect

    Gilbert, R.O.

    1995-07-06

    This report reviews and summarizes recent literature on statistical methods for analyzing data sets that are censored by multiple censoring points. This report is organized as follows. Following the introductory comments in Section 2, a brief discussion of detection limits is given in Section 3. Sections 4 and 5 focus on data analysis methods for estimating parameters and testing hypotheses, respectively, when data sets are left censored with multiple censoring points. A list of publications that deal with a variety of other applications for censored data sets is provided in Section 6. Recommendations on future research for developing new or improved tools for statistically analyzing multiple left-censored data sets are provided in Section 7. The list of references is in Section 8.

  5. Turning publicly available gene expression data into discoveries using gene set context analysis.

    PubMed

    Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai

    2016-01-08

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data.

  6. A statistical approach to set classification by feature selection with applications to classification of histopathology images.

    PubMed

    Jung, Sungkyu; Qiao, Xingye

    2014-09-01

    Set classification problems arise when classification tasks are based on sets of observations as opposed to individual observations. In set classification, a classification rule is trained with N sets of observations, where each set is labeled with class information, and the prediction of a class label is performed also with a set of observations. Data sets for set classification appear, for example, in diagnostics of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, the method of principal component analysis is used to extract the features of major variation. Multidimensional scaling is used to represent features as vector-valued points on which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.

  7. Joint Clustering and Component Analysis of Correspondenceless Point Sets: Application to Cardiac Statistical Modeling.

    PubMed

    Gooya, Ali; Lekadir, Karim; Alba, Xenia; Swift, Andrew J; Wild, Jim M; Frangi, Alejandro F

    2015-01-01

    Construction of Statistical Shape Models (SSMs) from arbitrary point sets is a challenging problem due to significant shape variation and lack of explicit point correspondence across the training data set. In medical imaging, point sets can generally represent different shape classes that span healthy and pathological exemplars. In such cases, the constructed SSM may not generalize well, largely because the probability density function (pdf) of the point sets deviates from the underlying assumption of Gaussian statistics. To this end, we propose a generative model for unsupervised learning of the pdf of point sets as a mixture of distinctive classes. A Variational Bayesian (VB) method is proposed for making joint inferences on the labels of point sets, and the principal modes of variations in each cluster. The method provides a flexible framework to handle point sets with no explicit point-to-point correspondences. We also show that by maximizing the marginalized likelihood of the model, the optimal number of clusters of point sets can be determined. We illustrate this work in the context of understanding the anatomical phenotype of the left and right ventricles in heart. To this end, we use a database containing hearts of healthy subjects, patients with Pulmonary Hypertension (PH), and patients with Hypertrophic Cardiomyopathy (HCM). We demonstrate that our method can outperform traditional PCA in both generalization and specificity measures.

  8. Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership.

    PubMed

    Iacucci, Ernesto; Zingg, Hans H; Perkins, Theodore J

    2012-01-01

    High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an "interesting" set of genes - say, genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speed up in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover "gold standard" annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
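    A generic dynamic program of the kind this abstract describes can be sketched as follows. The null model assumed here (a gene set of fixed size drawn uniformly without replacement from the population), the restriction to non-negative integer weights, and the function name are assumptions of this example; the optimizations mentioned in the abstract are not reproduced.

```python
from math import comb

def exact_pvalue(weights, set_size, observed):
    """P(sum of weights of a random size-`set_size` gene set >= observed).

    weights: one non-negative integer weight per gene in the population.
    """
    max_total = sum(sorted(weights, reverse=True)[:set_size])
    # ways[k][t] = number of k-gene subsets (of genes seen so far) with total weight t
    ways = [[0] * (max_total + 1) for _ in range(set_size + 1)]
    ways[0][0] = 1
    for w in weights:
        # Descending k so that each gene is used at most once per subset.
        for k in range(set_size, 0, -1):
            smaller, current = ways[k - 1], ways[k]
            for t in range(w, max_total + 1):
                current[t] += smaller[t - w]
    start = max(observed, 0)
    favourable = sum(ways[set_size][start:])
    return favourable / comb(len(weights), set_size)

# With 0/1 weights the statistic is a plain membership count, so the result
# matches the one-sided hypergeometric (Fisher) p-value.
p_membership = exact_pvalue([1, 1, 0, 0], set_size=2, observed=2)
# Graded weights: subsets of [2, 1, 0] of size 2 have totals 3, 2 and 1.
p_weighted = exact_pvalue([2, 1, 0], set_size=2, observed=2)
```

    The table has (set size + 1) x (maximum total + 1) entries and is updated once per gene, which keeps the computation exact without enumerating subsets.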

  9. TransFind—predicting transcriptional regulators for gene sets

    PubMed Central

    Kiełbasa, Szymon M.; Klein, Holger; Roider, Helge G.; Vingron, Martin; Blüthgen, Nils

    2010-01-01

    The analysis of putative transcription factor binding sites in promoter regions of coregulated genes makes it possible to infer the transcription factors that underlie observed changes in gene expression. While such analyses constitute a central component of the in-silico characterization of transcriptional regulatory networks, there is still a lack of simple-to-use web servers that combine state-of-the-art prediction methods with phylogenetic analysis and appropriate multiple-testing-corrected statistics, and that return results within a short time. With these aims in mind, we developed TransFind, which is freely available at http://transfind.sys-bio.net/. PMID:20511592

  10. Reproducibility-optimized test statistic for ranking genes in microarray studies.

    PubMed

    Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero

    2008-01-01

    A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on an Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.

  11. Improving Gene-Set Enrichment Analysis of RNA-Seq Data with Small Replicates.

    PubMed

    Yoon, Sora; Kim, Seon-Young; Nam, Dougu

    2016-01-01

    Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq data so far have only a small number of replicates. This forces the use of the gene-permuting GSEA method (or preranked GSEA), which results in a great number of false positives due to the inter-gene correlation in each gene-set. We demonstrate that incorporating the absolute gene statistic in one-tailed GSEA considerably improves the false-positive control and the overall discriminatory ability of the gene-permuting GSEA methods for RNA-seq data. To test the performance, a simulation method to generate correlated read counts within a gene-set was newly developed, and a dozen currently available RNA-seq enrichment analysis methods were compared, where the proposed methods outperformed others that do not account for the inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false positive control, ranks of true positives and biological relevance. An efficient R package (AbsFilterGSEA) coded with C++ (Rcpp) is available from CRAN.

  12. Gene expression data analysis using closed item set mining for labeled data.

    PubMed

    Rotter, Ana; Novak, Petra Kralj; Baebler, Spela; Toplak, Natasa; Blejec, Andrej; Lavrac, Nada; Gruden, Kristina

    2010-04-01

    This article presents an approach to microarray data analysis using discretised expression values in combination with a methodology of closed item set mining for class labeled data (RelSets). A statistical 2 x 2 factorial design analysis was run in parallel. The approach was validated on two independent sets of two-color microarray experiments using potato plants. Our results demonstrate that the two different analytical procedures, applied on the same data, are adequate for solving two different biological questions being asked. Statistical analysis is appropriate if an overview of the consequences of treatments and their interaction terms on the studied system is needed. If, on the other hand, a list of genes whose expression (upregulation or downregulation) differentiates between classes of data is required, the use of the RelSets algorithm is preferred. The used algorithms are freely available upon request to the authors.

  13. A Complete Set of Nascent Transcription Rates for Yeast Genes

    PubMed Central

    Pelechano, Vicent; Chávez, Sebastián; Pérez-Ortín, José E.

    2010-01-01

    The amount of mRNA in a cell is the result of two opposite reactions: transcription and mRNA degradation. These reactions are governed by kinetics laws, and the most regulated step for many genes is the transcription rate. The transcription rate, which is assumed to be exercised mainly at the RNA polymerase recruitment level, can be calculated using the RNA polymerase densities determined either by run-on or immunoprecipitation using specific antibodies. The yeast Saccharomyces cerevisiae is the ideal model organism to generate a complete set of nascent transcription rates that will prove useful for many gene regulation studies. By combining genomic data from both the GRO (Genomic Run-on) and the RNA pol ChIP-on-chip methods we generated a new, more accurate nascent transcription rate dataset. By comparing this dataset with the indirect ones obtained from the mRNA stabilities and mRNA amount datasets, we are able to obtain biological information about posttranscriptional regulation processes and a genomic snapshot of the location of the active transcriptional machinery. We have obtained nascent transcription rates for 4,670 yeast genes. The median RNA polymerase II density in the genes is 0.078 molecules/kb, which corresponds to an average of 0.096 molecules/gene. Most genes have transcription rates of between 2 and 30 mRNAs/hour and less than 1% of yeast genes have >1 RNA polymerase molecule/gene. Histone and ribosomal protein genes are the highest transcribed groups of genes and other than these exceptions the transcription of genes is an infrequent phenomenon in a yeast cell. PMID:21103382

  14. CoGA: An R Package to Identify Differentially Co-Expressed Gene Sets by Analyzing the Graph Spectra.

    PubMed

    Santos, Suzana de Siqueira; Galatro, Thais Fernanda de Almeida; Watanabe, Rodrigo Akira; Oba-Shinjo, Sueli Mieko; Nagahashi Marie, Suely Kazue; Fujita, André

    2015-01-01

    Gene set analysis aims to identify predefined sets of functionally related genes that are differentially expressed between two conditions. Although gene set analysis has been very successful, by incorporating biological knowledge about the gene sets and enhancing statistical power over gene-by-gene analyses, it does not take into account the correlation (association) structure among the genes. In this work, we present CoGA (Co-expression Graph Analyzer), an R package for the identification of groups of differentially associated genes between two phenotypes. The analysis is based on concepts of Information Theory applied to the spectral distributions of the gene co-expression graphs, such as the spectral entropy to measure the randomness of a graph structure and the Jensen-Shannon divergence to discriminate classes of graphs. The package also includes common measures to compare gene co-expression networks in terms of their structural properties, such as centrality, degree distribution, shortest path length, and clustering coefficient. Besides the structural analyses, CoGA also includes graphical interfaces for visual inspection of the networks, ranking of genes according to their "importance" in the network, and the standard differential expression analysis. We show by both simulation experiments and analyses of real data that the statistical tests performed by CoGA indeed control the rate of false positives and are able to identify differentially co-expressed genes that other methods failed to detect.

  15. Key genes and pathways in thyroid cancer based on gene set enrichment analysis.

    PubMed

    He, Wenwu; Qi, Bin; Zhou, Qiuxi; Lu, Chuansen; Huang, Qi; Xian, Lei; Chen, Mingwu

    2013-09-01

    The incidence of thyroid cancer and its associated morbidity has shown the most rapid increase among all cancers since 1982, but the mechanisms involved in thyroid cancer, particularly significant key genes induced in thyroid cancer, remain undefined. In many studies, gene probes have been used to search for key genes involved in causing and facilitating thyroid cancer. As a result, many possible virulence genes and pathways have been identified. However, these studies lack a case contrast for selecting the most possible virulence genes and pathways, as well as conclusive results with which to clarify the mechanisms of cancer development. In the present study, we used gene set enrichment and meta-analysis to select key genes and pathways. Based on gene set enrichment, we identified 5 downregulated and 4 upregulated mixed pathways in 6 tissue datasets. Based on the meta-analysis, there were 17 common pathways in the tissue datasets. One pathway, the p53 signaling pathway, which includes 13 genes, was identified by both the gene set enrichment analysis and meta-analysis. Genes are important elements that form key pathways. These pathways can induce the development of thyroid cancer later in life. The key pathways and genes identified in the present study can be used in the next stage of research, which will involve gene elimination and other methods of experimentation.

  16. A statistical framework for improving genomic annotations of prokaryotic essential genes.

    PubMed

    Deng, Jingyuan; Su, Shengchang; Lin, Xiaodong; Hassett, Daniel J; Lu, Long Jason

    2013-01-01

    Large-scale systematic analysis of gene essentiality is an important step closer toward unraveling the complex relationship between genotypes and phenotypes. Such analysis cannot be accomplished without unbiased and accurate annotations of essential genes. In current genomic databases, most of the essential gene annotations are derived from whole-genome transposon mutagenesis (TM), the most frequently used experimental approach for determining essential genes in microorganisms under defined conditions. However, there are substantial systematic biases associated with TM experiments. In this study, we developed a novel Poisson model-based statistical framework to simulate the TM insertion process and subsequently correct the experimental biases. We first quantitatively assessed the effects of major factors that potentially influence the accuracy of TM and subsequently incorporated relevant factors into the framework. Through iteratively optimizing parameters, we inferred the actual insertion events occurred and described each gene's essentiality on probability measure. Evaluated by the definite mapping of essential gene profile in Escherichia coli, our model significantly improved the accuracy of original TM datasets, resulting in more accurate annotations of essential genes. Our method also showed encouraging results in improving subsaturation level TM datasets. To test our model's broad applicability to other bacteria, we applied it to Pseudomonas aeruginosa PAO1 and Francisella tularensis novicida TM datasets. We validated our predictions by literature as well as allelic exchange experiments in PAO1. Our model was correct on six of the seven tested genes. Remarkably, among all three cases that our predictions contradicted the TM assignments, experimental validations supported our predictions. In summary, our method will be a promising tool in improving genomic annotations of essential genes and enabling large-scale explorations of gene essentiality. Our

  17. Parallel evolution of nacre building gene sets in molluscs.

    PubMed

    Jackson, Daniel J; McDougall, Carmel; Woodcroft, Ben; Moase, Patrick; Rose, Robert A; Kube, Michael; Reinhardt, Richard; Rokhsar, Daniel S; Montagnani, Caroline; Joubert, Caroline; Piquemal, David; Degnan, Bernard M

    2010-03-01

    The capacity to biomineralize is closely linked to the rapid expansion of animal life during the early Cambrian, with many skeletonized phyla first appearing in the fossil record at this time. The appearance of disparate molluscan forms during this period leaves open the possibility that shells evolved independently and in parallel in at least some groups. To test this proposition and gain insight into the evolution of structural genes that contribute to shell fabrication, we compared genes expressed in nacre (mother-of-pearl) forming cells in the mantle of the bivalve Pinctada maxima and the gastropod Haliotis asinina. Despite both species having highly lustrous nacre, we find extensive differences in these expressed gene sets. Following the removal of housekeeping genes, less than 10% of all gene clusters are shared between these molluscs, with some being conserved biomineralization genes that are also found in deuterostomes. These differences extend to secreted proteins that may localize to the organic shell matrix, with less than 15% of this secretome being shared. Despite these differences, H. asinina and P. maxima both secrete proteins with repetitive low-complexity domains (RLCDs). Pinctada maxima RLCD proteins-for example, the shematrins-are predominated by silk/fibroin-like domains, which are absent from the H. asinina data set. Comparisons of shematrin genes across three species of Pinctada indicate that this gene family has undergone extensive divergent evolution within pearl oysters. We also detect fundamental bivalve-gastropod differences in extracellular matrix proteins involved in mollusc-shell formation. Pinctada maxima expresses a chitin synthase at high levels and several chitin deacetylation genes, whereas only one protein involved in chitin interactions is present in the H. asinina data set, suggesting that the organic matrix on which calcification proceeds differs fundamentally between these species. 
Large-scale differences in genes expressed

  18. The essential gene set of a photosynthetic organism

    PubMed Central

    Rubin, Benjamin E.; Wetmore, Kelly M.; Price, Morgan N.; Diamond, Spencer; Shultzaberger, Ryan K.; Lowe, Laura C.; Curtin, Genevieve; Arkin, Adam P.; Deutschbauer, Adam; Golden, Susan S.

    2015-01-01

    Synechococcus elongatus PCC 7942 is a model organism used for studying photosynthesis and the circadian clock, and it is being developed for the production of fuel, industrial chemicals, and pharmaceuticals. To identify a comprehensive set of genes and intergenic regions that impacts fitness in S. elongatus, we created a pooled library of ∼250,000 transposon mutants and used sequencing to identify the insertion locations. By analyzing the distribution and survival of these mutants, we identified 718 of the organism’s 2,723 genes as essential for survival under laboratory conditions. The validity of the essential gene set is supported by its tight overlap with well-conserved genes and its enrichment for core biological processes. The differences noted between our dataset and these predictors of essentiality, however, have led to surprising biological insights. One such finding is that genes in a large portion of the TCA cycle are dispensable, suggesting that S. elongatus does not require a cyclic TCA process. Furthermore, the density of the transposon mutant library enabled individual and global statements about the essentiality of noncoding RNAs, regulatory elements, and other intergenic regions. In this way, a group I intron located in tRNALeu, which has been used extensively for phylogenetic studies, was shown here to be essential for the survival of S. elongatus. Our survey of essentiality for every locus in the S. elongatus genome serves as a powerful resource for understanding the organism’s physiology and defines the essential gene set required for the growth of a photosynthetic organism. PMID:26508635

  19. Statistical plant set estimation using Schroeder-phased multisinusoidal input design

    NASA Technical Reports Server (NTRS)

    Bayard, D. S.

    1992-01-01

    A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.

  20. Quantum Statistical Mechanical Derivation of the Second Law of Thermodynamics: A Hybrid Setting Approach.

    PubMed

    Tasaki, Hal

    2016-04-29

    Based on quantum statistical mechanics and microscopic quantum dynamics, we prove Planck's and Kelvin's principles for macroscopic systems in a general and realistic setting. We consider a hybrid quantum system that consists of the thermodynamic system, which is initially in thermal equilibrium, and the "apparatus" which operates on the former, and assume that the whole system evolves autonomously. This provides a satisfactory derivation of the second law for macroscopic systems.

  1. GeneTopics - interpretation of gene sets via literature-driven topic models

    PubMed Central

    2013-01-01

    Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. 
Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly

  2. Statistical Mechanics of Horizontal Gene Transfer in Evolutionary Ecology

    NASA Astrophysics Data System (ADS)

    Chia, Nicholas; Goldenfeld, Nigel

    2011-04-01

    The biological world, especially its majority microbial component, is strongly interacting and may be dominated by collective effects. In this review, we provide a brief introduction for statistical physicists of the way in which living cells communicate genetically through transferred genes, as well as the ways in which they can reorganize their genomes in response to environmental pressure. We discuss how genome evolution can be thought of as related to the physical phenomenon of annealing, and describe the sense in which genomes can be said to exhibit an analogue of information entropy. As a direct application of these ideas, we analyze the variation with ocean depth of transposons in marine microbial genomes, predicting trends that are consistent with recent observations using metagenomic surveys.

  3. Exact statistical tests for the intersection of independent lists of genes

    PubMed Central

    NATARAJAN, LOKI; PU, MINYA; MESSER, KAREN

    2012-01-01

    Public data repositories have enabled researchers to compare results across multiple genomic studies in order to replicate findings. A common approach is to first rank genes according to a hypothesis of interest within each study. Then, lists of the top-ranked genes within each study are compared across studies. Genes recaptured as highly ranked (usually above some threshold) in multiple studies are considered to be significant. However, this comparison strategy often remains informal, in that Type I error and false discovery rate are usually uncontrolled. In this paper, we formalize an inferential strategy for this kind of list-intersection discovery test. We show how to compute a p-value associated with a 'recaptured' set of genes, using a closed-form Poisson approximation to the distribution of the size of the recaptured set. The distribution of the test statistic depends on the rank threshold and the number of studies within which a gene must be recaptured. We use a Poisson approximation to investigate operating characteristics of the test. We give practical guidance on how to design a bioinformatic list-intersection study with prespecified control of Type I error (at the set level) and false discovery rate (at the gene level). We show how choice of test parameters will affect the expected proportion of significant genes identified. We present a strategy for identifying optimal choice of parameters, depending on the particular alternative hypothesis which might hold. We illustrate our methods using prostate cancer gene-expression datasets from the curated Oncomine database. PMID:23335952
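
    A Poisson approximation of the kind mentioned in the abstract above might be sketched as follows. The null model here is an assumption made for illustration: each study ranks the same genes independently and uniformly at random, so a gene lands in a given top list with probability top_r / n_genes. The function and parameter names are hypothetical, and the paper's own closed-form expression may differ.

```python
from math import comb, exp


def recapture_pvalue(n_genes, n_studies, top_r, min_lists, observed):
    """Approximate P(at least `observed` genes are recaptured) under the null.

    A gene counts as recaptured if it appears in the top `top_r` list of
    at least `min_lists` of the `n_studies` studies (assumed independent).
    """
    q = top_r / n_genes  # chance that a given gene is in one study's top list
    # Binomial tail: chance that a given gene is in >= min_lists top lists.
    p_gene = sum(
        comb(n_studies, j) * q**j * (1 - q) ** (n_studies - j)
        for j in range(min_lists, n_studies + 1)
    )
    lam = n_genes * p_gene  # expected size of the recaptured set
    # Upper tail of Poisson(lam): 1 - sum_{x < observed} e^-lam lam^x / x!
    term, cdf = exp(-lam), 0.0
    for x in range(observed):
        cdf += term
        term *= lam / (x + 1)
    return max(0.0, 1.0 - cdf)


# Two studies, 1000 genes, top-100 lists, genes required in both lists:
# about 10 recaptured genes are expected by chance alone.
p = recapture_pvalue(1000, 2, 100, 2, 25)
```

    Raising the rank threshold or lowering the number of lists required increases the expected size of the chance overlap, which is why the abstract treats both as design parameters of the test.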

  4. Experiences Running a Parallel Answer Set Solver on Blue Gene

    NASA Astrophysics Data System (ADS)

    Schneidenbach, Lars; Schnor, Bettina; Gebser, Martin; Kaminski, Roland; Kaufmann, Benjamin; Schaub, Torsten

    This paper presents the concept of parallelisation of a solver for Answer Set Programming (ASP). While there already exist some approaches to parallel ASP solving, there was a lack of a parallel version of the powerful clasp solver. We implemented a parallel version of clasp based on message-passing. Experimental results on Blue Gene P/L indicate the potential of such an approach.

  5. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread use of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. A microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T2 statistic is not positive definite and becomes singular, so it cannot be inverted. In this research, Hotelling's T2 statistic is combined with a shrinkage approach as an alternative way to estimate the covariance matrix in order to detect significant gene sets. The use of a shrinkage covariance matrix overcomes the singularity problem by converting an unbiased estimator of the covariance matrix into an improved biased one. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
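
    The singularity problem and its shrinkage remedy can be illustrated with a small sketch. It assumes a diagonal shrinkage target and a fixed, user-chosen shrinkage intensity `lam` (both are illustrative choices, not taken from the paper), and it uses ordinary means, so the robust trimmed mean step described in the abstract is not included. All function names are hypothetical.

```python
def column_means(rows):
    """Per-gene mean over the samples (rows) of one group."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(g1, g2):
    """Pooled sample covariance matrix of two groups (samples x genes)."""
    p, dof = len(g1[0]), len(g1) + len(g2) - 2
    cov = [[0.0] * p for _ in range(p)]
    for rows in (g1, g2):
        means = column_means(rows)
        for row in rows:
            c = [x - m for x, m in zip(row, means)]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += c[i] * c[j] / dof
    return cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def shrinkage_t2(g1, g2, lam=0.2):
    """Two-sample Hotelling-type T2 using (1 - lam) * S + lam * diag(S)."""
    n1, n2 = len(g1), len(g2)
    d = [a - b for a, b in zip(column_means(g1), column_means(g2))]
    s = pooled_covariance(g1, g2)
    p = len(d)
    # Shrinking toward diag(S) keeps variances and damps covariances, which
    # makes the matrix invertible when lam > 0 and all variances are positive.
    shrunk = [[s[i][j] if i == j else (1 - lam) * s[i][j] for j in range(p)]
              for i in range(p)]
    w = solve(shrunk, d)
    return (n1 * n2 / (n1 + n2)) * sum(di * wi for di, wi in zip(d, w))


# Three genes but only four samples: the unshrunk covariance is singular.
group1 = [[1.0, 2.0, 0.0], [3.0, 1.0, 1.0]]
group2 = [[0.0, 0.0, 2.0], [2.0, 1.0, 4.0]]
t2 = shrinkage_t2(group1, group2, lam=0.5)
```

    Significance of such a statistic would normally be assessed by permuting sample labels, since the usual F distribution for Hotelling's T2 does not apply to the shrunk estimator.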

  6. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128

  7. Statistical methods for mapping quantitative trait loci from a dense set of markers.

    PubMed Central

    Dupuis, J; Siegmund, D

    1999-01-01

    Lander and Botstein introduced statistical methods for searching an entire genome for quantitative trait loci (QTL) in experimental organisms, with emphasis on a backcross design and QTL having only additive effects. We extend their results to intercross and other designs, and we compare the power of the resulting test as a function of the magnitude of the additive and dominance effects, the sample size and intermarker distances. We also compare three methods for constructing confidence regions for a QTL: likelihood regions, Bayesian credible sets, and support regions. We show that with an appropriate evaluation of the coverage probability a support region is approximately a confidence region, and we provide a theoretical explanation of the empirical observation that the size of the support region is proportional to the sample size, not the square root of the sample size, as one might expect from standard statistical theory. PMID:9872974

  8. Fully moderated T-statistic for small sample size gene expression arrays.

    PubMed

    Yu, Lianbo; Gulati, Parul; Fernandez, Soledad; Pennell, Michael; Kirschner, Lawrence; Jarjoura, David

    2011-09-15

    Gene expression microarray experiments with few replications lead to great variability in estimates of gene variances. Several Bayesian methods have been developed to reduce this variability and to increase power. Thus far, moderated t methods assumed a constant coefficient of variation (CV) for the gene variances. We provide evidence against this assumption, and extend the method by allowing the CV to vary with gene expression. Our CV varying method, which we refer to as the fully moderated t-statistic, was compared to three other methods (ordinary t, and two moderated t predecessors). A simulation study and a familiar spike-in data set were used to assess the performance of the testing methods. The results showed that our CV varying method had higher power than the other three methods, identified a greater number of true positives in spike-in data, fit simulated data under varying assumptions very well, and in a real data set better identified higher expressing genes that were consistent with functional pathways associated with the experiments.

  9. Gene set analysis for self-contained tests: complex null and specific alternative hypotheses

    PubMed Central

    Rahmatallah, Y.; Glazko, G.

    2012-01-01

    Motivation: The analysis of differentially expressed gene sets became a routine in the analyses of gene expression data. There is a multitude of tests available, ranging from aggregation tests that summarize gene-level statistics for a gene set to true multivariate tests, accounting for intergene correlations. Most of them detect complex departures from the null hypothesis but when the null hypothesis is rejected, the specific alternative leading to the rejection is not easily identifiable. Results: In this article we compare the power and Type I error rates of minimum-spanning tree (MST)-based non-parametric multivariate tests with several multivariate and aggregation tests, which are frequently used for pathway analyses. In our simulation study, we demonstrate that MST-based tests have power that is for many settings comparable with the power of conventional approaches, but outperform them in specific regions of the parameter space corresponding to biologically relevant configurations. Further, we find for simulated and for gene expression data that MST-based tests discriminate well against shift and scale alternatives. As a general result, we suggest a two-step practical analysis strategy that may increase the interpretability of experimental data: first, apply the most powerful multivariate test to find the subset of pathways for which the null hypothesis is rejected and second, apply MST-based tests to these pathways to select those that support specific alternative hypotheses. Contact: gvglazko@uams.edu or yrahmatallah@uams.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23044539

  10. Regionalisation of statistical model outputs creating gridded data sets for Germany

    NASA Astrophysics Data System (ADS)

    Höpp, Simona Andrea; Rauthe, Monika; Deutschländer, Thomas

    2016-04-01

    The goal of the German research program ReKliEs-De (regional climate projection ensembles for Germany, http://.reklies.hlug.de) is to distribute robust information about the range and the extremes of future climate for Germany and its neighbouring river catchment areas. This joint research project is supported by the German Federal Ministry of Education and Research (BMBF) and was initiated by the German Federal States. The Project results are meant to support the development of adaptation strategies to mitigate the impacts of future climate change. The aim of our part of the project is to adapt and transfer the regionalisation methods of the gridded hydrological data set (HYRAS) from daily station data to the station based statistical regional climate model output of WETTREG (regionalisation method based on weather patterns). The WETTREG model output covers the period of 1951 to 2100 with a daily temporal resolution. For this, we generate a gridded data set of the WETTREG output for precipitation, air temperature and relative humidity with a spatial resolution of 12.5 km x 12.5 km, which is common for regional climate models. Thus, this regionalisation allows comparing statistical to dynamical climate model outputs. The HYRAS data set was developed by the German Meteorological Service within the German research program KLIWAS (www.kliwas.de) and consists of daily gridded data for Germany and its neighbouring river catchment areas. It has a spatial resolution of 5 km x 5 km for the entire domain for the hydro-meteorological elements precipitation, air temperature and relative humidity and covers the period of 1951 to 2006. After conservative remapping the HYRAS data set is also convenient for the validation of climate models. 
    The presentation will consist of two parts to present the current state of the adaptation of the HYRAS regionalisation methods to the statistical regional climate model WETTREG: First, an overview of the HYRAS data set and the regionalisation

  11. Discriminatory power of game-related statistics in 14-15 year age group male volleyball, according to set.

    PubMed

    García-Hermoso, Antonio; Dávila-Romero, Carlos; Saavedra, Jose M

    2013-02-01

    This study compared volleyball game-related statistics by outcome (winners and losers of sets) and set number (total, initial, and last) to identify characteristics that discriminated game performance. Game-related statistics from 314 sets (44 matches) played by teams of male 14- to 15-year-olds in a regional volleyball championship were analysed (2011). Differences between contexts (winning or losing teams) and "set number" (total, initial, and last) were assessed. A discriminant analysis was then performed according to outcome (winners and losers of sets) and "set number" (total, initial, and last). The results showed differences (winning or losing sets) in several variables of Complexes I (attack point and error reception) and II (serve and aces). Game-related statistics which discriminate performance in the sets index the serve, positive reception, and attack point. The predictors of performance at these ages when players are still learning could help coaches plan their training.

  12. MeDiA: Mean Distance Association and Its Applications in Nonlinear Gene Set Analysis.

    PubMed

    Peng, Hesen; Ma, Junjie; Bai, Yun; Lu, Jianwei; Yu, Tianwei

    2015-01-01

    Probabilistic association discovery aims at identifying the association between random vectors, regardless of number of variables involved or linear/nonlinear functional forms. Recently, applications in high-dimensional data have generated rising interest in probabilistic association discovery. We developed a framework based on functions on the observation graph, named MeDiA (Mean Distance Association). We generalize its property to a group of functions on the observation graph. The group of functions encapsulates major existing methods in association discovery, e.g. mutual information and Brownian Covariance, and can be expanded to more complicated forms. We conducted numerical comparison of the statistical power of related methods under multiple scenarios. We further demonstrated the application of MeDiA as a method of gene set analysis that captures a broader range of responses than traditional gene set analysis methods.

  13. Statistics of power injection in a plate set into chaotic vibration

    NASA Astrophysics Data System (ADS)

    Cadot, O.; Boudaoud, A.; Touzé, C.

    2008-12-01

    A vibrating plate is set into a chaotic state of wave turbulence by either a periodic or a random local forcing. Correlations between the forcing and the local velocity response of the plate at the forcing point are studied. Statistical models with fairly good agreement with the experiments are proposed for each forcing. Both distributions of injected power have a logarithmic cusp for zero power, while the tails are Gaussian for the periodic driving and exponential for the random one. The distributions of injected work over long time intervals are investigated in the framework of the fluctuation theorem, also known as the Gallavotti-Cohen theorem. It appears that the conclusions of the theorem are verified only for the periodic, deterministic forcing. Using independent estimates of the phase space contraction, this result is discussed in the light of available theoretical framework.

  14. Breast cancer diagnosis using level-set statistics and support vector machines.

    PubMed

    Liu, Jianguo; Yuan, Xiaohui; Buckles, Bill P

    2008-01-01

    Breast cancer diagnosis based on microscopic biopsy images and machine learning has demonstrated great promise in the past two decades. Various feature selection (or extraction) and classification algorithms have been attempted with success. However, some feature selection processes are complex and the number of features used can be quite large. We propose a new feature selection method based on level-set statistics. This procedure is simple and, when used with support vector machines (SVM), only a small number of features is needed to achieve satisfactory accuracy that is comparable to those using more sophisticated features. Therefore, the classification can be completed in much shorter time. We use multi-class support vector machines as the classification tool. Numerical results are reported to support the viability of this new procedure.

  15. Statistical criteria to set alarm levels for continuous measurements of ground contamination.

    PubMed

    Brandl, A; Jimenez, A D Herrera

    2008-08-01

    In the course of the decommissioning of the ASTRA research reactor at the site of the Austrian Research Centers at Seibersdorf, the operator and licensee, Nuclear Engineering Seibersdorf, conducted an extensive site survey and characterization to demonstrate compliance with regulatory site release criteria. This survey included radiological characterization of approximately 400,000 m² of open land on the Austrian Research Centers premises. Part of this survey was conducted using a mobile large-area gas proportional counter, continuously recording measurements while it was moved at a speed of 0.5 m s⁻¹. In order to set reasonable investigation levels, two alarm levels based on statistical considerations were developed. This paper describes the derivation of these alarm levels and the operational experience gained by detector deployment in the field.
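    The abstract does not say how the two alarm levels were derived. As a minimal sketch only, assuming background counts per counting interval follow a Poisson distribution, an alarm level can be taken as the smallest count that background alone would reach with at most a chosen false-alarm probability. The function names, the background mean of 4.0 counts, and the two probabilities below are hypothetical illustration values, not figures from the paper.

```python
import math


def poisson_tail(c: int, mean: float) -> float:
    """Return P(X >= c) for X ~ Poisson(mean)."""
    if c <= 0:
        return 1.0
    # Accumulate P(X = 0), ..., P(X = c - 1) term by term to avoid factorials.
    term = math.exp(-mean)
    cdf = term
    for k in range(1, c):
        term *= mean / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def alarm_level(background_mean: float, false_alarm_prob: float) -> int:
    """Smallest count c with P(X >= c | background only) <= false_alarm_prob."""
    if background_mean < 0 or not 0.0 < false_alarm_prob < 1.0:
        raise ValueError("need background_mean >= 0 and 0 < false_alarm_prob < 1")
    c = 0
    while poisson_tail(c, background_mean) > false_alarm_prob:
        c += 1
    return c


# Hypothetical values: a lower "investigate" level and a stricter second level.
investigation_level = alarm_level(background_mean=4.0, false_alarm_prob=0.05)
action_level = alarm_level(background_mean=4.0, false_alarm_prob=0.01)
```

    A stricter false-alarm probability always gives an equal or higher count threshold, which is one way two nested alarm levels could arise from a single background estimate.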

  16. Three gene expression vector sets for concurrently expressing multiple genes in Saccharomyces cerevisiae.

    PubMed

    Ishii, Jun; Kondo, Takashi; Makino, Harumi; Ogura, Akira; Matsuda, Fumio; Kondo, Akihiko

    2014-05-01

    Yeast has the potential to be used in bulk-scale fermentative production of fuels and chemicals due to its tolerance for low pH and robustness for autolysis. However, expression of multiple external genes in one host yeast strain is considerably labor-intensive due to the lack of polycistronic transcription. To promote the metabolic engineering of yeast, we generated systematic and convenient genetic engineering tools to express multiple genes in Saccharomyces cerevisiae. We constructed a series of multi-copy and integration vector sets for concurrently expressing two or three genes in S. cerevisiae by embedding three classical promoters. The comparative expression capabilities of the constructed vectors were monitored with green fluorescent protein, and the concurrent expression of genes was monitored with three different fluorescent proteins. Our multiple gene expression tool will be helpful to the advanced construction of genetically engineered yeast strains in a variety of research fields other than metabolic engineering.

  17. Analyzing Planck and low redshift data sets with advanced statistical methods

    NASA Astrophysics Data System (ADS)

    Eifler, Tim

    The recent ESA/NASA Planck mission has provided a key data set to constrain cosmology that is most sensitive to physics of the early Universe, such as inflation and primordial non-Gaussianity (Planck 2015 results XIII). In combination with cosmological probes of the Large-Scale Structure (LSS), the Planck data set is a powerful source of information to investigate late time phenomena (Planck 2015 results XIV), e.g. the accelerated expansion of the Universe, the impact of baryonic physics on the growth of structure, and the alignment of galaxies in their dark matter halos. It is the main objective of this proposal to re-analyze the archival Planck data, 1) with different, more recently developed statistical methods for cosmological parameter inference, and 2) to combine Planck and ground-based observations in an innovative way. We will make the corresponding analysis framework publicly available and believe that it will set a new standard for future CMB-LSS analyses. Advanced statistical methods, such as the Gibbs sampler (Jewell et al 2004, Wandelt et al 2004), have been critical in the analysis of Planck data. More recently, Approximate Bayesian Computation (ABC, see Weyant et al 2012, Akeret et al 2015, Ishida et al 2015, for cosmological applications) has matured into an interesting tool in cosmological likelihood analyses. It circumvents several assumptions that enter the standard Planck (and most LSS) likelihood analyses, most importantly, the assumption that the functional form of the likelihood of the CMB observables is a multivariate Gaussian. Beyond applying new statistical methods to Planck data in order to cross-check and validate existing constraints, we plan to combine Planck and DES data in a new and innovative way and run multi-probe likelihood analyses of CMB and LSS observables. The complexity of multi-probe likelihood analyses scales (non-linearly) with the level of correlations amongst the individual probes that are included. For the multi

  18. PECA: a novel statistical tool for deconvoluting time-dependent gene expression regulation.

    PubMed

    Teo, Guoshou; Vogel, Christine; Ghosh, Debashis; Kim, Sinae; Choi, Hyungwon

    2014-01-03

    Protein expression varies as a result of intricate regulation of synthesis and degradation of messenger RNAs (mRNA) and proteins. Studies of dynamic regulation typically rely on time-course data sets of mRNA and protein expression, yet there are no statistical methods that integrate these multiomics data and deconvolute individual regulatory processes of gene expression control underlying the observed concentration changes. To address this challenge, we developed Protein Expression Control Analysis (PECA), a method to quantitatively dissect protein expression variation into the contributions of mRNA synthesis/degradation and protein synthesis/degradation, termed RNA-level and protein-level regulation respectively. PECA computes the rate ratios of synthesis versus degradation as the statistical summary of expression control during a given time interval at each molecular level and computes the probability that the rate ratio changed between adjacent time intervals, indicating regulation change at the time point. Along with the associated false-discovery rates, PECA gives the complete description of dynamic expression control, that is, which proteins were up- or down-regulated at each molecular level and each time point. Using PECA, we analyzed two yeast data sets monitoring the cellular response to hyperosmotic and oxidative stress. The rate ratio profiles reported by PECA highlighted a large magnitude of RNA-level up-regulation of stress response genes in the early response and concordant protein-level regulation with time delay. However, the contributions of RNA- and protein-level regulation and their temporal patterns were different between the two data sets. We also observed several cases where protein-level regulation counterbalanced transcriptomic changes in the early stress response to maintain the stability of protein concentrations, suggesting that proteostasis is a proteome-wide phenomenon mediated by post-transcriptional regulation.

  19. Partitioning large data sets: Use of statistical methods applied to a set of Russian igneous-rock chemical analyses

    NASA Astrophysics Data System (ADS)

    Hernández Encinas, L.

    1994-12-01

    A method of cluster analysis has been applied, based on the following ideas: (a) stabilizing the variances of the variables; (b) reducing the number of variables by principal component analysis; and (c) generating a moderate number of groups containing large numbers of samples and applying a method of cluster analysis to them. This method was applied to a large data set to divide it into groups of samples without taking into consideration their origin. The sample set available consists of 1271 rock chemical analyses from 37 massifs in the Ural Mountains (Russia), which were then divided into 6 differentiated groups. Later, discriminant functions were calculated to assign new samples to the groups determined.
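    The three ideas in this abstract (variance stabilization, principal component analysis, clustering) can be sketched roughly as below. This is an illustration under stated assumptions, not the author's procedure: the abstract names no specific transform or clustering algorithm, so a log transform with z-scoring and a plain k-means are swapped in, the two-stage grouping of step (c) is collapsed into a single clustering pass, and the data are synthetic.

```python
import numpy as np


def stabilize(X):
    """Log-transform, then z-score each variable (one simple stabilizing choice).

    Assumes non-negative inputs and no constant columns.
    """
    Z = np.log1p(X)
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)


def pca_scores(Z, n_components):
    """Project centered data onto its leading principal components via SVD."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T


def kmeans(P, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; a stand-in for the unnamed clustering method."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    labels = np.zeros(len(P), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for chemical analyses: two well-separated compositions.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(50.0, 1.0, size=(30, 4)),
    rng.normal(5.0, 0.5, size=(30, 4)),
])
scores = pca_scores(stabilize(X), n_components=2)
labels, _ = kmeans(scores, k=2)
```

    Clustering on the principal component scores rather than the raw variables keeps the distance computation cheap, which matters when the sample set is large.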

  20. Variance component score test for time-course gene set analysis of longitudinal RNA-seq data.

    PubMed

    Agniel, Denis; Hejblum, Boris P

    2017-03-10

    As gene expression measurement technology is shifting from microarrays to sequencing, the statistical tools available for their analysis must be adapted since RNA-seq data are measured as counts. It has been proposed to model RNA-seq counts as continuous variables using nonparametric regression to account for their inherent heteroscedasticity. In this vein, we propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. The method identifies those gene sets whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the (transformed) counts. We demonstrate that despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, tcgsaseq is shown to exhibit very good statistical properties, with an increase in stability and power when compared to state-of-the-art methods ROAST (rotation gene set testing), edgeR, and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available for the community in the R package tcgsaseq.

  1. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.

    PubMed

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.

  2. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set

    PubMed Central

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a “summary statistical representation” over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4. PMID:27242622

  3. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    PubMed Central

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (e.g., Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), which displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification of the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122
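    The phrase "cross-fold validation scheme with in-fold feature selection" means that genes are re-selected inside every training fold, so the held-out genomes never influence which genes are kept. The sketch below illustrates only that idea. It is not the BacFier model: the ranking rule, the voting classifier, all function names, and the synthetic presence/absence matrix are assumptions made for the example.

```python
import numpy as np


def select_features(X, y, n_keep):
    """Rank genes by |presence frequency in pathogens - in non-pathogens|."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    top = np.argsort(-np.abs(diff))[:n_keep]
    return top, np.sign(diff[top])


def predict(X, genes, signs):
    """Each selected gene casts a +1/-1 vote; a positive total means pathogen."""
    votes = (2 * X[:, genes] - 1) * signs
    return (votes.sum(axis=1) > 0).astype(int)


def cross_validated_accuracy(X, y, n_folds=5, n_keep=3, seed=0):
    """K-fold accuracy with feature selection repeated inside each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Selection sees the training genomes only, never the held-out fold.
        genes, signs = select_features(X[train_mask], y[train_mask], n_keep)
        correct += int((predict(X[test_idx], genes, signs) == y[test_idx]).sum())
    return correct / len(y)


# Synthetic data: 100 genomes, 20 genes; the first 3 genes track the label exactly.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 50)
X = rng.integers(0, 2, size=(100, 20))
X[:, :3] = y[:, None]
accuracy = cross_validated_accuracy(X, y)
```

    Selecting genes once on the full data set and then cross-validating would let the test genomes leak into the model, which tends to inflate the reported accuracy.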

  4. Along signal paths: an empirical gene set approach exploiting pathway topology

    PubMed Central

    Martini, Paolo; Sales, Gabriele; Massa, M. Sofia; Chiogna, Monica; Romualdi, Chiara

    2013-01-01

    Gene set analysis using biological pathways has become a widely used statistical approach for gene expression analysis. A biological pathway can be represented through a graph where genes and their interactions are, respectively, nodes and edges of the graph. From a biological point of view only some portions of a pathway are expected to be altered; however, few methods using pathway topology have been proposed and none of them tries to identify the signal paths, within a pathway, mostly involved in the biological problem. Here, we present a novel algorithm for pathway analysis, clipper, that tries to fill this gap. clipper implements a two-step empirical approach based on the exploitation of graph decomposition into a junction tree to reconstruct the most relevant signal path. In the first step clipper selects significant pathways according to statistical tests on the means and the concentration matrices of the graphs derived from pathway topologies. Then, it identifies within these pathways the signal paths having the greatest association with a specific phenotype. We test our approach on simulated and two real expression datasets. Our results demonstrate the efficacy of clipper in the identification of signal transduction paths totally coherent with the biological problem. PMID:23002139

  5. Can You Explain that in Plain English? Making Statistics Group Projects Work in a Multicultural Setting

    ERIC Educational Resources Information Center

    Sisto, Michelle

    2009-01-01

    Students increasingly need to learn to communicate statistical results clearly and effectively, as well as to become competent consumers of statistical information. These two learning goals are particularly important for business students. In line with reform movements in Statistics Education and the GAISE guidelines, we are working to implement…

  6. Statistics

    Cancer.gov

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  7. Impact of benchmark data set topology on the validation of virtual screening methods: exploration and quantification by spatial statistics.

    PubMed

    Rohrer, Sebastian G; Baumann, Knut

    2008-04-01

    A common finding of many reports evaluating ligand-based virtual screening methods is that validation results vary considerably with changing benchmark data sets. It is widely assumed that these data set specific effects are caused by the redundancy, self-similarity, and cluster structure inherent to those data sets. These phenomena manifest themselves in the data sets' representation in descriptor space, which is termed the data set topology. A methodology for the characterization of data set topology based on spatial statistics is introduced. The method is nonparametric and can deal with arbitrary distributions of descriptor values. With this methodology it is possible to associate differences in virtual screening performance on different data sets with differences in data set topology. Moreover, the better virtual screening performance of certain descriptors can be explained by their ability of representing the benchmark data sets by a more favorable topology. Finally it is shown, that the composition of some benchmark data sets causes topologies that lead to overoptimistic validation results even in very "simple" descriptor spaces. Spatial statistics analysis as proposed here facilitates the detection of such biased data sets and may provide a tool for the future design of unbiased benchmark data sets.

  8. Extracting transcription factor binding sites from unaligned gene sequences with statistical models

    PubMed Central

    Lu, Chung-Chin; Yuan, Wei-Hao; Chen, Te-Ming

    2008-01-01

    Background: Transcription factor binding sites (TFBSs) are crucial in the regulation of gene transcription. Recently, chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-chip array) has been used to identify potential regulatory sequences, but the procedure can only map the probable protein-DNA interaction loci within 1–2 kb resolution. To find the exact binding motifs, it is necessary to build a computational method to examine the ChIP-chip array binding sequences and search for possible motifs representing the transcription factor binding sites. Results: We developed a program to find accurate motif sites from a set of unaligned DNA sequences in the yeast genome. Compared with MDscan, the prediction results suggest that, overall, our algorithm outperforms MDscan since the predicted motifs are more consistent with previously known specificities reported in the literature and have better prediction ranks. Our program also outperforms the constraint-less Cosmo program, especially in the elimination of false positives. Conclusion: In this study, an improved sampling algorithm is proposed to incorporate the binomial probability model to build significant initial candidate motif sets. By investigating the statistical dependence between base positions in TFBSs, the method of dependency graphs and their expanded Bayesian networks is combined. The results show that our program satisfactorily extracts transcription factor binding sites from unaligned gene sequences. PMID:19091030

  9. Human Effector / Initiator Gene Sets That Regulate Myometrial Contractility During Term and Preterm Labor

    PubMed Central

    WEINER, Carl P.; MASON, Clifford W.; DONG, Yafeng; BUHIMSCHI, Irina A.; SWAAN, Peter W.; BUHIMSCHI, Catalin S.

    2010-01-01

    Objective: Distinct processes govern transition from quiescence to activation during term (TL) and preterm labor (PTL). We sought gene sets responsible for TL and PTL, along with the effector genes necessary for labor independent of gestation and underlying trigger. Methods: Expression was analyzed in term and preterm +/− labor (n = 6 subjects/group). Gene sets were generated using logic operations. Results: 34 genes were similarly expressed in PTL/TL but absent from nonlabor samples (Effector Set). 49 genes were specific to PTL (Preterm Initiator Set) and 174 to TL (Term Initiator Set). The gene ontogeny processes comprising Term Initiator and Effector Sets were diverse, though inflammation was represented in 4 of the top 10; inflammation dominated the Preterm Initiator Set. Comments: TL and PTL differ dramatically in initiator profiles. Though inflammation is part of the Term Initiator and the Effector Sets, it is an overwhelming part of PTL associated with intraamniotic inflammation. PMID:20452493

  10. Inferring biological functions and associated transcriptional regulators using gene set expression coherence analysis

    PubMed Central

    Kim, Tae-Min; Chung, Yeun-Jun; Rhyu, Mun-Gan; Ho Jung, Myeong

    2007-01-01

    Background: Gene clustering has been widely used to group genes with similar expression patterns in microarray data analysis. Subsequent enrichment analysis using predefined gene sets can provide clues on which functional themes or regulatory sequence motifs are associated with individual gene clusters. In spite of the potential utility, gene clustering and enrichment analysis have been used in separate platforms; thus, the development of an integrative algorithm linking both methods is highly challenging. Results: In this study, we propose an algorithm for discovery of molecular functions and elucidation of transcriptional logics using two kinds of gene information, functional and regulatory motif gene sets. The algorithm, termed gene set expression coherence analysis (GSECA), first selects functional gene sets with significantly high expression coherences. Those candidate gene sets are further processed into a number of functionally related themes or functional clusters according to the expression similarities. Each functional cluster is then investigated for the enrichment of transcriptional regulatory motifs using modified gene set enrichment analysis and regulatory motif gene sets. The method was tested on two publicly available expression profiles representing murine myogenesis and erythropoiesis. For the respective profiles, our algorithm identified myocyte- and erythrocyte-related molecular functions, along with the putative transcriptional regulators for the corresponding molecular functions. Conclusion: As an integrative and comprehensive method for the analysis of large-scale gene expression profiles, our method is able to generate a set of testable hypotheses: the transcriptional regulator X regulates function Y under cellular condition Z. The GSECA algorithm is implemented in a freely available software package. PMID:18021416

  11. New cyt b gene universal primer set for forensic analysis.

    PubMed

    Lopez-Oceja, A; Gamarra, D; Borragan, S; Jiménez-Moreno, S; de Pancorbo, M M

    2016-07-01

    Analysis of mitochondrial DNA, and in particular the cytochrome b gene (cyt b), has become an essential tool for species identification in routine forensic practice. In cases of degraded samples, where the DNA is fractionated, universal primers that are highly efficient for the amplification of the target region are necessary. Therefore, in the present study a new universal cyt b primer set with high species identification capabilities, even in samples with highly degraded DNA, has been developed. In order to achieve this objective, the primers were designed following the alignment of complete sequences of the cyt b from 751 species from the Class of Mammalia listed in GenBank. A highly variable region of 148bp flanked by highly conserved sequences was chosen for placing the primers. The effectiveness of the new pair of primers was examined in 63 animal species belonging to 38 Families from 14 Orders and 5 Classes (Mammalia, Aves, Reptilia, Actinopterygii, and Malacostraca). Species determination was possible in all cases, which shows that the fragment analyzed provided a high capability for species identification. Furthermore, to ensure the efficiency of the 148bp fragment, the intraspecific variability was analyzed by calculating the concordance between individuals with the BLAST tool from the NCBI (National Center for Biotechnological Information). The intraspecific concordance levels were superior to 97% in all species. Likewise, the phylogenetic information from the selected fragment was confirmed by obtaining the phylogenetic tree from the sequences of the species analyzed. Evidence of the high power of phylogenetic discrimination of the analyzed fragment of the cyt b was obtained, as 93.75% of the species were grouped within their corresponding Orders. Finally, the analysis of 40 degraded samples with small-size DNA fragments showed that the new pair of primers permits identifying the species, even when the DNA is highly degraded as it is very common in

  12. snpGeneSets: An R Package for Genome-Wide Study Annotation

    PubMed Central

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-01-01

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048

  13. snpGeneSets: An R Package for Genome-Wide Study Annotation.

    PubMed

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-12-07

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/.

  14. Gene-set activity toolbox (GAT): A platform for microarray-based cancer diagnosis using an integrative gene-set analysis approach.

    PubMed

    Engchuan, Worrawat; Meechai, Asawin; Tongsima, Sissades; Doungpan, Narumol; Chan, Jonathan H

    2016-08-01

    Cancer is a complex disease that cannot be diagnosed reliably using only single gene expression analysis. Using gene-set analysis on high throughput gene expression profiling controlled by various environmental factors is a commonly adopted technique used by the cancer research community. This work develops a comprehensive gene expression analysis tool, the gene-set activity toolbox (GAT), that is implemented with a data retriever, traditional data pre-processing, several gene-set analysis methods, network visualization and data mining tools. The gene-set analysis methods are used to identify subsets of phenotype-relevant genes that will be used to build a classification model. To evaluate GAT performance, we performed a cross-dataset validation study on three common cancers, namely colorectal, breast and lung cancers. The results show that GAT can be used to build a reasonable disease diagnostic model and the predicted markers have biological relevance. GAT can be accessed from http://gat.sit.kmutt.ac.th where GAT's Java library for gene-set analysis, simple classification and a database with three cancer benchmark datasets can be downloaded.

  15. GeneSet2miRNA: finding the signature of cooperative miRNA activities in the gene lists

    PubMed Central

    Antonov, Alexey V.; Dietmann, Sabine; Wong, Philip; Lutter, Dominik; Mewes, Hans W.

    2009-01-01

    GeneSet2miRNA is the first web-based tool which is able to identify whether or not a gene list has a signature of miRNA-regulatory activity. As input, GeneSet2miRNA accepts a list of genes. As output, a list of miRNA-regulatory models is provided. A miRNA-regulatory model is a group of miRNAs (single, pair, triplet or quadruplet) that is predicted to regulate a significant subset of genes from the submitted list. GeneSet2miRNA provides a user friendly dialog-driven web page submission available for several model organisms. GeneSet2miRNA is freely available at http://mips.helmholtz-muenchen.de/proj/gene2mir/. PMID:19420064

  16. From biophysics to evolutionary genetics: statistical aspects of gene regulation

    PubMed Central

    Lässig, Michael

    2007-01-01

    This is an introductory review on how genes interact to produce biological functions. Transcriptional interactions involve the binding of proteins to regulatory DNA. Specific binding sites can be identified by genomic analysis, and these undergo a stochastic evolution process governed by selection, mutations, and genetic drift. We focus on the links between the biophysical function and the evolution of regulatory elements. In particular, we infer fitness landscapes of binding sites from genomic data, leading to a quantitative evolutionary picture of regulation. PMID:17903288

  17. PEDSTATS: descriptive statistics, graphics and quality assessment for gene mapping data.

    PubMed

    Wigginton, Janis E; Abecasis, Gonçalo R

    2005-08-15

    We describe a tool that produces summary statistics and basic quality assessments for gene-mapping data, accommodating either pedigree or case-control datasets. Our tool can also produce graphic output in the PDF format.

  18. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

    PubMed

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.

  19. Toward a comprehensive set of asthma susceptibility genes.

    PubMed

    Bossé, Yohan; Hudson, Thomas J

    2007-01-01

    Epidemiological and twin studies have demonstrated that asthma is under genetic and environmental influences. Numerous candidate gene association studies as well as genome-wide linkage scans have followed, aiming to elucidate the genetic architecture underlying this complex disease. Several promising asthma susceptibility genes were identified, and a comprehensive catalogue of these genes seems a realistic goal within 5 to 10 years. However, a key challenge is to understand the combination of genes and environmental factors that gives rise to the disease in a specific individual. Currently, most of the reports of asthma susceptibility genes are either preliminary or controversial, with little knowledge about the genetic mechanisms leading to abnormal function of the gene that promotes the development of asthma. Replications of published associations are relatively few. Many factors, including the inherent complexity of asthma as well as methodological issues, can explain these inconsistencies. Promising genetic tools are emerging with the completion of the International HapMap Project that will increase the scope of gene-discovery investigations. It is hoped that these tools, combined with validation studies in additional populations, will enable the creation of a comprehensive catalogue of susceptibility genes for asthma. Notwithstanding the difficulties in making sense of the vast amount of new genetic data, we already see the emergence of new biological pathways of atopy, airway remodeling, and asthma that may lead to novel therapeutic approaches.

  20. Accurate Gene Expression-Based Biodosimetry Using a Minimal Set of Human Gene Transcripts

    SciTech Connect

    Tucker, James D.; Joiner, Michael C.; Thomas, Robert A.; Grever, William E.; Bakhmutsky, Marina V.; Chinkhota, Chantelle N.; Smolinski, Joseph M.; Divine, George W.; Auner, Gregory W.

    2014-03-15

    Purpose: Rapid and reliable methods for conducting biological dosimetry are a necessity in the event of a large-scale nuclear event. Conventional biodosimetry methods lack the speed, portability, ease of use, and low cost required for triaging numerous victims. Here we address this need by showing that polymerase chain reaction (PCR) on a small number of gene transcripts can provide accurate and rapid dosimetry. The low cost and relative ease of PCR compared with existing dosimetry methods suggest that this approach may be useful in mass-casualty triage situations. Methods and Materials: Human peripheral blood from 60 adult donors was acutely exposed to cobalt-60 gamma rays at doses of 0 (control) to 10 Gy. mRNA expression levels of 121 selected genes were obtained 0.5, 1, and 2 days after exposure by reverse-transcriptase real-time PCR. Optimal dosimetry at each time point was obtained by stepwise regression of dose received against individual gene transcript expression levels. Results: Only 3 to 4 different gene transcripts, ASTN2, CDKN1A, GDF15, and ATM, are needed to explain ≥0.87 of the variance (R²). Receiver-operator characteristics, a measure of sensitivity and specificity, of 0.98 for these statistical models were achieved at each time point. Conclusions: The actual and predicted radiation doses agree very closely up to 6 Gy. Dosimetry at 8 and 10 Gy shows some effect of saturation, thereby slightly diminishing the ability to quantify higher exposures. Analyses of these gene transcripts may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations or in clinical radiation emergencies.

  1. Set statistics in conductive bridge random access memory device with Cu/HfO₂/Pt structure

    SciTech Connect

    Zhang, Meiyun; Long, Shibing; Wang, Guoming; Xu, Xiaoxin; Li, Yang; Liu, Qi; Lv, Hangbing; Liu, Ming; Lian, Xiaojuan; Miranda, Enrique; Suñé, Jordi

    2014-11-10

    The switching parameter variation of resistive switching memory is one of the most important challenges in its application. In this letter, we have studied the set statistics of conductive bridge random access memory with a Cu/HfO₂/Pt structure. The experimental distributions of the set parameters in several off resistance ranges are shown to nicely fit a Weibull model. The Weibull slopes of the set voltage and current increase and decrease logarithmically with off resistance, respectively. This experimental behavior is perfectly captured by a Monte Carlo simulator based on the cell-based set voltage statistics model and the Quantum Point Contact electron transport model. Our work provides indications for the improvement of the switching uniformity.
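
    The Weibull slope mentioned above is commonly read off a Weibull plot, where ln(-ln(1 - F)) is linear in ln(x). The following is a minimal sketch of that generic procedure, not the authors' fitting code. The function names, the use of Bernard's median-rank estimate for F, and the synthetic sample are all assumptions made for illustration.

```python
import math


def weibull_slope(samples):
    """Estimate the Weibull slope (shape) and scale from a Weibull plot.

    Fits a least-squares line to ln(-ln(1 - F)) versus ln(x), where F is
    Bernard's median-rank estimate of the cumulative probability.
    """
    xs = sorted(samples)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        points.append((math.log(x), math.log(-math.log(1.0 - f))))
    mean_u = sum(u for u, _ in points) / n
    mean_w = sum(w for _, w in points) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in points)
    sxy = sum((u - mean_u) * (w - mean_w) for u, w in points)
    slope = sxy / sxx
    intercept = mean_w - slope * mean_u
    scale = math.exp(-intercept / slope)  # since intercept = -slope * ln(scale)
    return slope, scale


def weibull_quantiles(shape, scale, n):
    """Synthetic, noise-free sample placed at the median-rank quantiles."""
    return [
        scale * (-math.log(1.0 - (i - 0.3) / (n + 0.4))) ** (1.0 / shape)
        for i in range(1, n + 1)
    ]


# Hypothetical values, not measured data.
slope, scale = weibull_slope(weibull_quantiles(4.0, 1.5, 50))
```

    With real set-voltage data the points scatter around the line, and repeating the fit per off-resistance range would show how the slope changes with off resistance.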

  2. Detection of viruses via statistical gene expression analysis.

    PubMed

    Chen, Minhua; Carlson, David; Zaas, Aimee; Woods, Christopher W; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Carin, Lawrence

    2011-03-01

    We develop a new Bayesian construction of the elastic net (ENet), with variational Bayesian analysis. This modeling framework is motivated by analysis of gene expression data for viruses, with a focus on H3N2 and H1N1 influenza, as well as rhinovirus and RSV (respiratory syncytial virus). Our objective is to understand the biological pathways responsible for the host response to such viruses, with the ultimate objective of developing a clinical test to distinguish subjects infected by such viruses from subjects with other symptom causes (e.g., bacteria). In addition to analyzing these new datasets, we provide a detailed analysis of the Bayesian ENet and compare it to related models.

  3. Fundamental Limitations of High Contrast Imaging Set by Small Sample Statistics

    NASA Astrophysics Data System (ADS)

    Mawet, D.; Milli, J.; Wahhaj, Z.; Pelat, D.; Absil, O.; Delacroix, C.; Boccaletti, A.; Kasper, M.; Kenworthy, M.; Marois, C.; Mennesson, B.; Pueyo, L.

    2014-09-01

    In this paper, we review the impact of small sample statistics on detection thresholds and corresponding confidence levels (CLs) in high-contrast imaging at small angles. When looking close to the star, the number of resolution elements decreases rapidly toward small angles. This reduction of the number of degrees of freedom dramatically affects CLs and false alarm probabilities. Naively using the same ideal hypothesis and methods as for larger separations, which are well understood and commonly assume Gaussian noise, can yield up to one order of magnitude error in contrast estimations at fixed CL. The statistical penalty exponentially increases toward very small inner working angles. Even at 5-10 resolution elements from the star, false alarm probabilities can be significantly higher than expected. Here we present a rigorous statistical analysis that ensures robustness of the CL, but also imposes a substantial limitation on corresponding achievable detection limits (thus contrast) at small angles. This unavoidable fundamental statistical effect has a significant impact on current coronagraphic and future high-contrast imagers. Finally, the paper concludes with practical recommendations to account for small number statistics when computing the sensitivity to companions at small angles and when exploiting the results of direct imaging planet surveys.

  4. Fundamental limitations of high contrast imaging set by small sample statistics

    SciTech Connect

    Mawet, D.; Milli, J.; Wahhaj, Z.; Pelat, D.; Absil, O.; Delacroix, C.; Boccaletti, A.; Kasper, M.; Kenworthy, M.; Marois, C.; Mennesson, B.; Pueyo, L.

    2014-09-10

    In this paper, we review the impact of small sample statistics on detection thresholds and corresponding confidence levels (CLs) in high-contrast imaging at small angles. When looking close to the star, the number of resolution elements decreases rapidly toward small angles. This reduction of the number of degrees of freedom dramatically affects CLs and false alarm probabilities. Naively using the same ideal hypothesis and methods as for larger separations, which are well understood and commonly assume Gaussian noise, can yield up to one order of magnitude error in contrast estimations at fixed CL. The statistical penalty exponentially increases toward very small inner working angles. Even at 5-10 resolution elements from the star, false alarm probabilities can be significantly higher than expected. Here we present a rigorous statistical analysis that ensures robustness of the CL, but also imposes a substantial limitation on corresponding achievable detection limits (thus contrast) at small angles. This unavoidable fundamental statistical effect has a significant impact on current coronagraphic and future high-contrast imagers. Finally, the paper concludes with practical recommendations to account for small number statistics when computing the sensitivity to companions at small angles and when exploiting the results of direct imaging planet surveys.

  5. A reference gene set for chemosensory receptor genes of Manduca sexta.

    PubMed

    Koenig, Christopher; Hirsh, Ariana; Bucks, Sascha; Klinner, Christian; Vogel, Heiko; Shukla, Aditi; Mansfield, Jennifer H; Morton, Brian; Hansson, Bill S; Grosse-Wilde, Ewald

    2015-11-01

    The order of Lepidoptera has historically been crucial for chemosensory research, with many important advances coming from the analysis of species like Bombyx mori or the tobacco hornworm, Manduca sexta. Specifically M. sexta has long been a major model species in the field, especially regarding the importance of olfaction in an ecological context, mainly the interaction with its host plants. In recent years transcriptomic data has led to the discovery of members of all major chemosensory receptor families in the species, but the data was fragmentary and incomplete. Here we present the analysis of the newly available high-quality genome data for the species, supplemented by additional transcriptome data to generate a high quality reference gene set for the three major chemosensory receptor gene families, the gustatory (GR), olfactory (OR) and antennal ionotropic receptors (IR). Coupled with gene expression analysis our approach allows association of specific receptor types and behaviors, like pheromone and host detection. The dataset will provide valuable support for future analysis of these essential chemosensory modalities in this species and in Lepidoptera in general.

  6. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

    PubMed

    Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi

    2016-07-08

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

  7. Size distribution of function-based human gene sets and the split-merge model.

    PubMed

    Li, Wentian; Fontanelli, Oscar; Miramontes, Pedro

    2016-08-01

    The sizes of paralogues (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families where genes are grouped by a similar function or share a common property. The size distribution of Human Gene Nomenclature Committee (HGNC) gene sets deviates from the power-law, and can be fitted much better by a beta rank function. We propose a simple mechanism to break a power-law size distribution by a combination of splitting and merging operations. The largest gene sets are split into two to account for the subfunctional categories, and a small proportion of other gene sets are merged into larger sets as new common themes might be realized. These operations are not uncommon for a curator of gene sets. A simulation shows that iteration of these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate the application of the beta rank function by the example of the distribution of transcription factors and drug target genes among HGNC gene families.
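
    To make the contrast between the two rank-size forms concrete, the sketch below evaluates a beta rank function in its commonly written form, size(r) = c (n + 1 - r)^b / r^a, next to a pure power law. The parameter names and the numerical values are assumptions for illustration. They are not taken from the paper's fits.

```python
def beta_rank(rank, n, a, b, c=1.0):
    """Beta rank function: c * (n + 1 - rank)**b / rank**a for rank in 1..n."""
    if not 1 <= rank <= n:
        raise ValueError("rank must lie in 1..n")
    return c * (n + 1 - rank) ** b / rank ** a


def power_law(rank, a, c=1.0):
    """Pure power-law rank-size relation: c / rank**a."""
    return c / rank ** a


# Hypothetical parameters: 100 gene sets, exponent a = 1.2, tail exponent b = 0.5.
n_sets = 100
beta_sizes = [beta_rank(r, n_sets, 1.2, 0.5, c=500.0) for r in range(1, n_sets + 1)]
power_sizes = [power_law(r, 1.2, c=500.0) for r in range(1, n_sets + 1)]
```

    Setting b = 0 recovers the power law exactly. A positive b bends the low-rank tail downward, which is the kind of deviation from a straight line on a log-log plot that the abstract describes.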

  8. Size distribution of function-based human gene sets and the split–merge model

    PubMed Central

    Fontanelli, Oscar; Miramontes, Pedro

    2016-01-01

    The sizes of paralogues (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families where genes are grouped by a similar function or share a common property. The size distribution of Human Gene Nomenclature Committee (HGNC) gene sets deviates from the power-law, and can be fitted much better by a beta rank function. We propose a simple mechanism to break a power-law size distribution by a combination of splitting and merging operations. The largest gene sets are split into two to account for the subfunctional categories, and a small proportion of other gene sets are merged into larger sets as new common themes might be realized. These operations are not uncommon for a curator of gene sets. A simulation shows that iteration of these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate the application of the beta rank function by the example of the distribution of transcription factors and drug target genes among HGNC gene families. PMID:27853602

  9. Comparison of three summary statistics for ranking genes in genome-wide association studies.

    PubMed

    Freytag, Saskia; Bickeböller, Heike

    2014-05-20

    Problems associated with insufficient power have haunted the analysis of genome-wide association studies and are likely to be the main challenge for the analysis of next-generation sequencing data. Ranking genes according to their strength of association with the investigated phenotype is one solution. To obtain rankings for genes, researchers can draw from a wide range of statistics summarizing the relationships between variants mapped to a gene and the phenotype. Hence, it is of interest to explore the performance of these statistics in the context of rankings. To this end, we conducted a simulation study (limited to genes of equal sizes) of three different summary statistics examining the ability to rank genes in a meaningful order. The weighted sum of squared marginal score test (Pan, 2009), RareCover algorithm (Bhatia et al., 2010) and the elastic net regularization (Zou and Hastie, 2005) were chosen, because they can handle common as well as rare variants. The test based on the score statistic outperformed both other methods in almost all investigated scenarios. It was the only measure to consistently detect genes with interacting causal variants. However, the RareCover algorithm proved better at identifying genes including causal variants with small effect sizes and low minor allele frequency than the weighted sum of squared marginal score test. The performance of the elastic net regularization was unimpressive for all but the simplest scenarios.

  10. An Efficient and Robust Statistical Modeling Approach to Discover Differentially Expressed Genes Using Genomic Expression Profiles

    PubMed Central

    Thomas, Jeffrey G.; Olson, James M.; Tapscott, Stephen J.; Zhao, Lue Ping

    2001-01-01

    We have developed a statistical regression modeling approach to discover genes that are differentially expressed between two predefined sample groups in DNA microarray experiments. Our model is based on well-defined assumptions, uses rigorous and well-characterized statistical measures, and accounts for the heterogeneity and genomic complexity of the data. In contrast to cluster analysis, which attempts to define groups of genes and/or samples that share common overall expression profiles, our modeling approach uses known sample group membership to focus on expression profiles of individual genes in a sensitive and robust manner. Further, this approach can be used to test statistical hypotheses about gene expression. To demonstrate this methodology, we compared the expression profiles of 11 acute myeloid leukemia (AML) and 27 acute lymphoblastic leukemia (ALL) samples from a previous study (Golub et al. 1999) and found 141 genes differentially expressed between AML and ALL with a 1% significance at the genomic level. Using this modeling approach to compare different sample groups within the AML samples, we identified a group of genes whose expression profiles correlated with that of thrombopoietin and found that genes whose expression associated with AML treatment outcome lie in recurrent chromosomal locations. Our results are compared with those obtained using t-tests or Wilcoxon rank sum statistics. PMID:11435405

  11. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-06

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling
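
    As a rough illustration of the rank product idea, the sketch below ranks features within each replicate and takes the geometric mean of each feature's ranks. Missing values are simply skipped, so the mean runs over the replicates in which the feature was observed. That handling is an assumption chosen for brevity and is not necessarily the improvement the authors describe. The feature names and values are hypothetical.

```python
import math


def rank_products(values):
    """Geometric mean of per-replicate ranks (rank 1 = largest value).

    values maps a feature name to a list of per-replicate log-ratios,
    with None marking a missing measurement. Features never observed
    are omitted from the result.
    """
    n_replicates = len(next(iter(values.values())))
    ranks = {name: [] for name in values}
    for j in range(n_replicates):
        # Rank only the features actually observed in replicate j.
        observed = sorted(
            ((v[j], name) for name, v in values.items() if v[j] is not None),
            reverse=True,
        )
        for rank, (_, name) in enumerate(observed, start=1):
            ranks[name].append(rank)
    return {
        name: math.exp(sum(math.log(r) for r in rs) / len(rs))
        for name, rs in ranks.items()
        if rs
    }


# Hypothetical triplicate data with missing values.
toy_ratios = {
    "p1": [3.0, 2.5, None],
    "p2": [1.0, 2.0, 0.5],
    "p3": [2.0, None, 1.5],
}
toy_rp = rank_products(toy_ratios)
```

    A small rank product marks a feature that is consistently near the top of the replicates. In practice significance is usually judged against permuted ranks, which this sketch omits.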

  12. Mechanical Unloading of Mouse Bone in Microgravity Significantly Alters Cell Cycle Gene Set Expression

    NASA Astrophysics Data System (ADS)

    Blaber, Elizabeth; Dvorochkin, Natalya; Almeida, Eduardo; Kaplan, Warren; Burns, Brendan

    2012-07-01

    unloading in spaceflight, we conducted genome wide microarray analysis of total RNA isolated from the mouse pelvis. Specifically, 16 week old mice were subjected to 15 days of spaceflight onboard NASA's STS-131 space shuttle mission. The pelvis of the mice was dissected, the bone marrow was flushed and the bones were briefly stored in RNAlater. The pelves were then homogenized, and RNA was isolated using TRIzol. RNA concentration and quality were measured using a Nanodrop spectrometer and 0.8% agarose gel electrophoresis. Samples of cDNA were analyzed using an Affymetrix GeneChip® Gene 1.0 ST (Sense Target) Array System for Mouse and GenePattern Software. We normalized the ST gene arrays using Robust Multichip Average (RMA) normalization, which summarizes perfectly matched spots on the array through the median polish algorithm, rather than normalizing according to mismatched spots. We also used Limma for statistical analysis, using the BioConductor Limma Library by Gordon Smyth, and differential expression analysis to identify genes with significant changes in expression between the two experimental conditions. Finally we used GSEApreRanked for Gene Set Enrichment Analysis (GSEA), with Kolmogorov-Smirnov style statistics to identify groups of genes that are regulated together using the t-statistics derived from Limma. Preliminary results show that 6,603 genes expressed in pelvic bone had statistically significant alterations in spaceflight compared to ground controls. These prominently included cell cycle arrest molecules p21 and p18, cell survival molecule Crbp1, and cell cycle molecules cyclin D1 and Cdk1. Additionally, GSEA results indicated alterations in molecular targets of cyclin D1 and Cdk4, senescence pathways resulting from abnormal laminin maturation, cell-cell contacts via E-cadherin, and several pathways relating to protein translation and metabolism. In total 111 gene sets out of 2,488, about 4%, showed statistically significant set alterations. These
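
    The Kolmogorov-Smirnov style statistic referred to above can be sketched as a running sum over a pre-ranked gene list: the sum steps up at each gene in the set and down at each gene outside it, and the enrichment score is the largest deviation from zero. This is the simple unweighted form, shown only to illustrate the idea. The GSEA software typically weights steps by the ranking metric and assesses significance by permutation, neither of which is reproduced here. The gene symbols in the usage lines are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted Kolmogorov-Smirnov style enrichment score.

    ranked_genes is ordered from most up-regulated to most down-regulated
    (for example by t-statistic). Returns the signed maximum deviation of
    the running sum from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step sizes are chosen so the running sum returns to zero at the end.
        running += (1.0 / n_hit) if is_hit else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder ranking: g1 has the largest t-statistic, g6 the smallest.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranking, {"g1", "g2"})
bottom_score = enrichment_score(ranking, {"g5", "g6"})
```

    A set concentrated at the top of the ranking scores near +1 and one concentrated at the bottom scores near -1.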

  13. GSA-PCA: gene set generation by principal component analysis of the Laplacian matrix of a metabolic network

    PubMed Central

    2012-01-01

    Background: Gene Set Analysis (GSA) has proven to be a useful approach to microarray analysis. However, most of the method development for GSA has focused on the statistical tests to be used rather than on the generation of sets that will be tested. Existing methods of set generation are often overly simplistic. The creation of sets from individual pathways (in isolation) is a poor reflection of the complexity of the underlying metabolic network. We have developed a novel approach to set generation via the use of Principal Component Analysis of the Laplacian matrix of a metabolic network. We have analysed a relatively simple data set to show the difference in results between our method and the current state-of-the-art pathway-based sets. Results: The sets generated with this method are semi-exhaustive and capture much of the topological complexity of the metabolic network. The semi-exhaustive nature of this method has also allowed us to design a hypergeometric enrichment test to determine which genes are likely responsible for set significance. We show that our method finds significant aspects of biology that would be missed (i.e. false negatives) and addresses the false positive rates found with the use of simple pathway-based sets. Conclusions: The set generation step for GSA is often neglected but is a crucial part of the analysis as it defines the full context for the analysis. As such, set generation methods should be robust and yield as complete a representation of the extant biological knowledge as possible. The method reported here achieves this goal and is demonstrably superior to previous set analysis methods. PMID:22876834
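
    For readers unfamiliar with hypergeometric enrichment, the sketch below computes the standard upper-tail probability of seeing at least a given number of set members in a list drawn without replacement. It shows the generic test only. How the authors apply it to their PCA-derived sets is not specified in the abstract, and the function and argument names are assumptions.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable X.

    population: total number of genes considered
    successes:  genes in the population that belong to the set
    draws:      size of the gene list being tested
    observed:   list genes that belong to the set
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 10 genes, 4 in the set, a list of 3 with 2 set members.
p_value = hypergeom_upper_tail(10, 4, 3, 2)
```

    With these toy counts the probability is 40/120, or one third, so two hits out of three would not count as enrichment.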

  14. Degrees of separation as a statistical tool for evaluating candidate genes.

    PubMed

    Nelson, Ronald M; Pettersson, Mats E

    2014-12-01

    Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, it can cause misleading results if not validated properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes to currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validation of interacting genes by comparing their connectedness with the average connectedness in the gene network will provide support for said interactions by utilising the growing amount of gene network information available.
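
    The degree of separation between two genes can be read as the shortest path length between them in the network, and the network-wide average gives the baseline against which a candidate pair is compared. The sketch below computes both with a breadth-first search. It does not use or reproduce the CandidateBacon package. The function names and the toy network are hypothetical.

```python
from collections import deque
from itertools import combinations


def separation(network, source, target):
    """Shortest path length between two genes, or None if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def average_separation(network):
    """Mean shortest path length over all connected gene pairs."""
    distances = [
        separation(network, a, b) for a, b in combinations(sorted(network), 2)
    ]
    reachable = [d for d in distances if d is not None]
    return sum(reachable) / len(reachable)


# Hypothetical undirected network stored as adjacency sets: A - B - C - D.
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
pair_dos = separation(toy_network, "A", "D")
mean_dos = average_separation(toy_network)
```

    A candidate pair whose separation is clearly below the network average is better connected than a typical pair. A pair near the average gains little support from the network alone.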

  15. Re-Conceptualization of Modified Angoff Standard Setting: Unified Statistical, Measurement, Cognitive, and Social Psychological Theories

    ERIC Educational Resources Information Center

    Iyioke, Ifeoma Chika

    2013-01-01

    This dissertation describes a design for training, in accordance with probability judgment heuristics principles, for the Angoff standard setting method. The new training with instruction, practice, and feedback tailored to the probability judgment heuristics principles was called the Heuristic training and the prevailing Angoff method training…

  16. Boosted leave-many-out cross-validation: the effect of training and test set diversity on PLS statistics.

    PubMed

    Clark, Robert D

    2003-01-01

    It is becoming increasingly common in quantitative structure/activity relationship (QSAR) analyses to use external test sets to evaluate the likely stability and predictivity of the models obtained. In some cases, such as those involving variable selection, an internal test set--i.e., a cross-validation set--is also used. Care is sometimes taken to ensure that the subsets used exhibit response and/or property distributions similar to those of the data set as a whole, but more often the individual observations are simply assigned 'at random.' In the special case of MLR without variable selection, it can be analytically demonstrated that this strategy is inferior to others. Most particularly, D-optimal design performs better if the form of the regression equation is known and the variables involved are well behaved. This report introduces an alternative, non-parametric approach termed 'boosted leave-many-out' (boosted LMO) cross-validation. In this method, relatively small training sets are chosen by applying optimizable k-dissimilarity selection (OptiSim) using a small subsample size (k = 4, in this case), with the unselected observations being reserved as a test set for the corresponding reduced model. Predictive errors for the full model are then estimated by aggregating results over several such analyses. The countervailing effects of training and test set size, diversity, and representativeness on PLS model statistics are described for CoMFA analysis of a large data set of COX2 inhibitors.

  17. Micro-foundations for macroeconomics: New set-up based on statistical physics

    NASA Astrophysics Data System (ADS)

    Yoshikawa, Hiroshi

    2016-12-01

    Modern macroeconomics is built on "micro foundations": the optimization of micro agents such as consumers and firms is explicitly analyzed in the model. Toward this goal, the standard model presumes a "representative" consumer/firm and analyzes its behavior in detail. However, the macroeconomy consists of 10^7 consumers and 10^6 firms. For the purpose of analyzing such a macro system, it is meaningless to pursue the micro behavior in detail. In this respect, there is no essential difference between economics and physics. The method of statistical physics can be usefully applied to the macroeconomy, and provides Keynesian economics with correct micro-foundations.

  18. Integrated Data Collection Analysis (IDCA) Program - Statistical Analysis of RDX Standard Data Sets

    SciTech Connect

    Sandstrom, Mary M.; Brown, Geoffrey W.; Preston, Daniel N.; Pollard, Colin J.; Warner, Kirstin F.; Sorensen, Daniel N.; Remmers, Daniel L.; Phillips, Jason J.; Shelley, Timothy J.; Reyes, Jose A.; Hsu, Peter C.; Reynolds, John G.

    2015-10-30

    The Integrated Data Collection Analysis (IDCA) program is conducting a Proficiency Test for Small-Scale Safety and Thermal (SSST) testing of homemade explosives (HMEs). Described here are statistical analyses of the results for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of the RDX Type II Class 5 standard. The material was tested as a well-characterized standard several times during the proficiency study to assess differences among participants and the range of results that may arise for well-behaved explosive materials. The analyses show that there are detectable differences among the results from IDCA participants. While these differences are statistically significant, most of them can be disregarded for comparison purposes to assess potential variability when laboratories attempt to measure identical samples using methods assumed to be nominally the same. The results presented in this report include the average sensitivity results for the IDCA participants and the ranges of values obtained. The ranges represent variation about the mean values of the tests of between 26% and 42%. The magnitude of this variation is attributed to differences in operator, method, and environment as well as the use of different instruments that are also of varying age. The results appear to be a good representation of the broader safety testing community based on the range of methods, instruments, and environments included in the IDCA Proficiency Test.

  19. Statistics of dark matter halos in the excursion set peak framework

    SciTech Connect

    Lapi, A.; Danese, L.

    2014-07-01

    We derive approximate yet very accurate analytical expressions for the abundance and clustering properties of dark matter halos in the excursion set peak framework; the latter relies on the standard excursion set approach, but also includes the effects of a realistic filtering of the density field, a mass-dependent threshold for collapse, and the prescription from peak theory that halos tend to form around density maxima. We find that our approximations work excellently for diverse power spectra, collapse thresholds and density filters. Moreover, when adopting a cold dark matter power spectrum, a tophat filtering and a mass-dependent collapse threshold (supplemented with conceivable scatter), our approximated halo mass function and halo bias represent very well the outcomes of cosmological N-body simulations.

  20. A power comparison of generalized additive models and the spatial scan statistic in a case-control setting

    PubMed Central

    2010-01-01

    Background A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing log odds with distance from the source. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. Conclusions The GAM permutation testing methods

  1. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics.

    PubMed

    Lamparter, David; Marbach, Daniel; Rueedi, Rico; Kutalik, Zoltán; Bergmann, Sven

    2016-01-01

    Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which not only offers a significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.
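
    For orientation, the classical (unmodified) Fisher method for combining independent p-values can be sketched as below. This is not Pascal's modified procedure, whose details are not given in the abstract; the function names are hypothetical and the p-values are assumed to be independent.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


def fisher_combined_p(p_values):
    """Classical Fisher method: -2 * sum(ln p) ~ chi-squared with 2k df."""
    statistic = -2.0 * sum(log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))


# Toy gene-level p-values for a hypothetical pathway.
pathway_p = fisher_combined_p([0.1, 0.1])
```

    Because the statistic uses every gene's p-value directly, no significance threshold has to be chosen to decide which genes count as "hits".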

  2. Comprehensive analysis of SET domain gene family in foxtail millet identifies the putative role of SiSET14 in abiotic stress tolerance

    PubMed Central

    Yadav, Chandra Bhan; Muthamilarasan, Mehanathan; Dangi, Anand; Shweta, Shweta; Prasad, Manoj

    2016-01-01

    SET domain-containing genes catalyse histone lysine methylation, which alters chromatin structure and regulates the transcription of genes that are involved in various developmental and physiological processes. The present study identified 53 SET domain-containing genes in the C4 panicoid model, foxtail millet (Setaria italica), and the genes were physically mapped onto nine chromosomes. Phylogenetic and structural analyses classified SiSET proteins into five classes (I–V). RNA-seq derived expression profiling showed that SiSET genes were differentially expressed in four tissues, namely leaf, root, stem and spica. Expression analysis using qRT-PCR was performed for 21 SiSET genes under different abiotic stress and hormonal treatments, which showed differential expression of these genes during the late phase of stress and hormonal treatments. Significant upregulation of SiSET gene was observed during cold stress, which has been confirmed by over-expressing a candidate gene, SiSET14, in yeast. Interestingly, hypermethylation was observed in the gene body of highly differentially expressed genes, whereas methylation was completely absent in their transcription start sites. This suggested the occurrence of demethylation events during various abiotic stresses, which enhance the gene expression. Altogether, the present study would serve as a base for further functional characterization of SiSET genes towards understanding their molecular roles in conferring stress tolerance. PMID:27585852

  3. Cross-cultural adaptation of research instruments: language, setting, time and statistical considerations

    PubMed Central

    2010-01-01

    Background Research questionnaires are not always translated appropriately before they are used in new temporal, cultural or linguistic settings. The results based on such instruments may therefore not accurately reflect what they are supposed to measure. This paper aims to illustrate the process and required steps involved in the cross-cultural adaptation of a research instrument using the adaptation process of an attitudinal instrument as an example. Methods A questionnaire was needed for the implementation of a study in Norway in 2007. There were no appropriate instruments available in Norwegian, thus an Australian-English instrument was cross-culturally adapted. Results The adaptation process included investigation of conceptual and item equivalence. Two forward and two back-translations were synthesized and compared by an expert committee. Thereafter the instrument was pretested and adjusted accordingly. The final questionnaire was administered to opioid maintenance treatment staff (n=140) and harm reduction staff (n=180). The overall response rate was 84%. The original instrument failed confirmatory analysis. Instead a new two-factor scale was identified and found valid in the new setting. Conclusions The failure of the original scale highlights the importance of adapting instruments to current research settings. It also emphasizes the importance of ensuring that concepts within an instrument are equal between the original and target language, time and context. If the described stages in the cross-cultural adaptation process had been omitted, the findings would have been misleading, even if presented with apparent precision. Thus, it is important to consider possible barriers when making a direct comparison between different nations, cultures and times. PMID:20144247

  4. A statistical model for bacterial speciation triggered by lateral gene transfer

    NASA Astrophysics Data System (ADS)

    Sidhu, Sunjeet; Peng, Wequin

    2006-03-01

    The process of bacterial speciation has been a major unresolved issue in the study of bacterial evolution. It has been proposed that lateral gene transfer and homologous recombination play critical and complementary roles in speciation. We introduce a statistical model of the evolution of a population under lateral gene transfer and local homologous recombination. We examine the evolutionary dynamics and its dependence on various evolutionary operators. J. G. Lawrence, Theor. Popul. Biol. 61, 449 (2002).

  5. Modelling gene and protein regulatory networks with answer set programming.

    PubMed

    Fayruzov, Timur; Janssen, Jeroen; Vermeir, Dirk; Cornelis, Chris; De Cock, Martine

    2011-01-01

    Recently, many approaches to model regulatory networks have been proposed in the systems biology domain. However, the task is far from being solved. In this paper, we propose an Answer Set Programming (ASP)-based approach to model interaction networks. We build a general ASP framework that describes the network semantics and allows modelling specific networks with little effort. ASP provides a rich and flexible toolbox that allows expanding the framework with desired features. In this paper, we tune our framework to mimic Boolean network behaviour and apply it to model the Budding Yeast and Fission Yeast cell cycle networks. The obtained steady states of these networks correspond to those of the Boolean networks.
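
    To make the notion of a Boolean network steady state concrete, the sketch below enumerates all states of a tiny network and keeps those that map to themselves under a synchronous update. It uses plain Python rather than Answer Set Programming, and the three-node network and its rules are invented for illustration; they are not the yeast cell cycle networks from the paper.

```python
from itertools import product

# Hypothetical update rules: each maps the current state to a node's next value.
RULES = {
    "A": lambda s: not s["B"],           # A is repressed by B
    "B": lambda s: not s["A"],           # B is repressed by A
    "C": lambda s: s["A"] and not s["B"],  # C needs A and the absence of B
}


def steady_states(rules):
    """Return every state that is unchanged by one synchronous update."""
    names = sorted(rules)
    fixed = []
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        if all(rules[name](state) == state[name] for name in names):
            fixed.append(state)
    return fixed


states = steady_states(RULES)
```

    Exhaustive enumeration grows as 2^n in the number of nodes, which is one reason a declarative solver-based formulation such as ASP is attractive for larger networks.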

  6. A statistical investigation into the stability of iris recognition in diverse population sets

    NASA Astrophysics Data System (ADS)

    Howard, John J.; Etter, Delores M.

    2014-05-01

    Iris recognition is increasingly being deployed on population wide scales for important applications such as border security, social service administration, criminal identification and general population management. The error rates for this incredibly accurate form of biometric identification are established using well known, laboratory quality datasets. However, it has long been acknowledged in biometric theory that not all individuals have the same likelihood of being correctly serviced by a biometric system. Typically, techniques for identifying clients that are likely to experience a false non-match or a false match error are carried out on a per-subject basis. This research makes the novel hypothesis that certain ethnical denominations are more or less likely to experience a biometric error. Through established statistical techniques, we demonstrate this hypothesis to be true and document the notable effect that the ethnicity of the client has on iris similarity scores. Understanding the expected impact of ethnical diversity on iris recognition accuracy is crucial to the future success of this technology as it is deployed in areas where the target population consists of clientele from a range of geographic backgrounds, such as border crossings and immigration check points.

  7. An abdominal aortic aneurysm segmentation method: level set with region and statistical information.

    PubMed

    Zhuge, Feng; Rubin, Geoffrey D; Sun, Shaohua; Napel, Sandy

    2006-05-01

    We present a system for segmenting the human aortic aneurysm in CT angiograms (CTA), which, in turn, allows measurements of volume and morphological aspects useful for treatment planning. The system estimates a rough "initial surface," and then refines it using a level set segmentation scheme augmented with two external analyzers: The global region analyzer, which incorporates a priori knowledge of the intensity, volume, and shape of the aorta and other structures, and the local feature analyzer, which uses voxel location, intensity, and texture features to train and drive a support vector machine classifier. Each analyzer outputs a value that corresponds to the likelihood that a given voxel is part of the aneurysm, which is used during level set iteration to control the evolution of the surface. We tested our system using a database of 20 CTA scans of patients with aortic aneurysms. The mean and worst case values of volume overlap, volume error, mean distance error, and maximum distance error relative to human tracing were 95.3% +/- 1.4% (s.d.); worst case = 92.9%, 3.5% +/- 2.5% (s.d.); worst case = 7.0%, 0.6 +/- 0.2 mm (s.d.); worst case = 1.0 mm, and 5.2 +/- 2.3 mm (s.d.); worst case = 9.6 mm, respectively. When implemented on a 2.8 GHz Pentium IV personal computer, the mean time required for segmentation was 7.4 +/- 3.6 min (s.d.). We also performed experiments that suggest that our method is insensitive to parameter changes within 10% of their experimentally determined values. This preliminary study proves feasibility for an accurate, precise, and robust system for segmentation of the abdominal aneurysm from CTA data, and may be of benefit to patients with aortic aneurysms.

  8. An abdominal aortic aneurysm segmentation method: Level set with region and statistical information

    SciTech Connect

    Zhuge Feng; Rubin, Geoffrey D.; Sun Shaohua; Napel, Sandy

    2006-05-15

    We present a system for segmenting the human aortic aneurysm in CT angiograms (CTA), which, in turn, allows measurements of volume and morphological aspects useful for treatment planning. The system estimates a rough 'initial surface', and then refines it using a level set segmentation scheme augmented with two external analyzers: The global region analyzer, which incorporates a priori knowledge of the intensity, volume, and shape of the aorta and other structures, and the local feature analyzer, which uses voxel location, intensity, and texture features to train and drive a support vector machine classifier. Each analyzer outputs a value that corresponds to the likelihood that a given voxel is part of the aneurysm, which is used during level set iteration to control the evolution of the surface. We tested our system using a database of 20 CTA scans of patients with aortic aneurysms. The mean and worst case values of volume overlap, volume error, mean distance error, and maximum distance error relative to human tracing were 95.3% ± 1.4% (s.d.); worst case = 92.9%, 3.5% ± 2.5% (s.d.); worst case = 7.0%, 0.6 ± 0.2 mm (s.d.); worst case = 1.0 mm, and 5.2 ± 2.3 mm (s.d.); worst case = 9.6 mm, respectively. When implemented on a 2.8 GHz Pentium IV personal computer, the mean time required for segmentation was 7.4 ± 3.6 min (s.d.). We also performed experiments that suggest that our method is insensitive to parameter changes within 10% of their experimentally determined values. This preliminary study proves feasibility for an accurate, precise, and robust system for segmentation of the abdominal aneurysm from CTA data, and may be of benefit to patients with aortic aneurysms.

  9. CAsubtype: An R Package to Identify Gene Sets Predictive of Cancer Subtypes and Clinical Outcomes.

    PubMed

    Kong, Hualei; Tong, Pan; Zhao, Xiaodong; Sun, Jielin; Li, Hua

    2017-01-21

    In the past decade, molecular classification of cancer has gained high popularity owing to its high predictive power on clinical outcomes as compared with traditional methods commonly used in clinical practice. In particular, using gene expression profiles, recent studies have successfully identified a number of gene sets for the delineation of cancer subtypes that are associated with distinct prognosis. However, identification of such gene sets remains a laborious task due to the lack of tools with flexibility, integration and ease of use. To reduce the burden, we have developed an R package, CAsubtype, to efficiently identify gene sets predictive of cancer subtypes and clinical outcomes. By integrating more than 13,000 annotated gene sets, CAsubtype provides a comprehensive repertoire of candidates for new cancer subtype identification. For easy data access, CAsubtype further includes the gene expression and clinical data of more than 2000 cancer patients from TCGA. CAsubtype first employs principal component analysis to identify gene sets (from user-provided or package-integrated ones) with robust principal components representing significantly large variation between cancer samples. Based on these principal components, CAsubtype visualizes the sample distribution in low-dimensional space for better understanding of the distinction between samples and classifies samples into subgroups with prevalent clustering algorithms. Finally, CAsubtype performs survival analysis to compare the clinical outcomes between the identified subgroups, assessing their clinical value as potentially novel cancer subtypes. In conclusion, CAsubtype is a flexible and well-integrated tool in the R environment to identify gene sets for cancer subtype identification and clinical outcome prediction. Its simple R commands and comprehensive data sets enable efficient examination of the clinical value of any given gene set, thus facilitating hypothesis generating and testing in biological and

  10. Statistical inference of selection and divergence of rice blast resistance gene Pi-ta

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The resistance gene Pi-ta has been effectively used to control rice blast disease worldwide. A few recent studies have described the possible evolution of Pi-ta in cultivated and weedy rice. However, evolutionary statistics used for the studies are too limited to precisely understand selection and d...

  11. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks

    PubMed Central

    Blatti, Charles; Sinha, Saurabh

    2016-01-01

    Motivation: Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or ‘properties’ such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene–gene or gene–property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. Results: We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. Availability and Implementation: DRaWR was implemented as
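
    The basic random walk with restart underlying this kind of method can be sketched as below. This is a single-stage walk on a small homogeneous toy graph, not the two-stage heterogeneous-network procedure the abstract describes; the graph, node names, restart probability, and function name are all assumptions, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W spreads each node's probability evenly over its neighbours, and p0 is
    uniform over the seed nodes (the query gene set).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical chain of four genes with g1 as the only query gene.
toy_graph = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
scores = random_walk_with_restart(toy_graph, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

    Nodes closer to the seed set receive higher stationary probability, which is what makes the scores usable as a relevance ranking.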

  12. Statistical Analysis of Hurst Exponents of Essential/Nonessential Genes in 33 Bacterial Genomes

    PubMed Central

    Liu, Xiao; Wang, Baojin; Xu, Luo

    2015-01-01

    Methods for identifying essential genes currently depend predominantly on biochemical experiments. However, there is demand for improved computational methods for determining gene essentiality. In this study, we used the Hurst exponent, a characteristic parameter to describe long-range correlation in DNA, and analyzed its distribution in 33 bacterial genomes. In most genomes (31 out of 33) the significance levels of the Hurst exponents of the essential genes were significantly higher than for the corresponding full-gene-set, whereas the significance levels of the Hurst exponents of the nonessential genes remained unchanged or increased only slightly. All of the Hurst exponents of essential genes followed a normal distribution, with one exception. We therefore propose that the distribution feature of Hurst exponents of essential genes can be used as a classification index for essential gene prediction in bacteria. For computer-aided design in the field of synthetic biology, this feature can build a restraint for pre- or post-design checking of bacterial essential genes. Moreover, considering the relationship between gene essentiality and evolution, the Hurst exponents could be used as a descriptive parameter related to evolutionary level, or be added to the annotation of each gene. PMID:26067107

  13. Prioritization of candidate disease genes by enlarging the seed set and fusing information of the network topology and gene expression.

    PubMed

    Zhang, Shao-Wu; Shao, Dong-Dong; Zhang, Song-Yao; Wang, Yi-Bin

    2014-06-01

    The identification of disease genes is very important not only to provide greater understanding of gene function and cellular mechanisms which drive human disease, but also to enhance human disease diagnosis and treatment. Recently, high-throughput techniques have been applied to detect dozens or even hundreds of candidate genes. However, experimental approaches to validate the many candidates are usually time-consuming, tedious and expensive, and sometimes lack reproducibility. Therefore, numerous theoretical and computational methods (e.g. network-based approaches) have been developed to prioritize candidate disease genes. Many network-based approaches implicitly utilize the observation that genes causing the same or similar diseases tend to correlate with each other in gene-protein relationship networks. Of these network approaches, the random walk with restart algorithm (RWR) is considered to be a state-of-the-art approach. To further improve the performance of RWR, we propose a novel method named ESFSC to identify disease-related genes, by enlarging the seed set according to the centrality of disease genes in a network and fusing information of the protein-protein interaction (PPI) network topological similarity and the gene expression correlation. The ESFSC algorithm restarts at all of the nodes in the seed set consisting of the known disease genes and their k-nearest neighbor nodes, then walks in the global network separately guided by the similarity transition matrix constructed with PPI network topological similarity properties and the correlational transition matrix constructed with the gene expression profiles. As a result, all the genes in the network are ranked by weighted fusing the above results of the RWR guided by two types of transition matrices. Comprehensive simulation results of the 10 diseases with 97 known disease genes collected from the Online Mendelian Inheritance in Man (OMIM) database show that ESFSC outperforms existing methods for

  14. Gene integrated set profile analysis: a context-based approach for inferring biological endpoints.

    PubMed

    Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M; Pauly, Rini; Gutman, David A; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S; Rossi, Michael R; Vertino, Paula M; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H

    2016-04-20

    The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an 'integrate by intersection' (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples with the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance.

  15. Gene expression profiling of peripheral blood mononuclear cells in the setting of peripheral arterial disease

    PubMed Central

    2012-01-01

    Background Peripheral arterial disease (PAD) is a relatively common manifestation of systemic atherosclerosis that leads to progressive narrowing of the lumen of leg arteries. Circulating monocytes are in contact with the arterial wall and can serve as reporters of vascular pathology in the setting of PAD. We performed gene expression analysis of peripheral blood mononuclear cells (PBMC) in patients with PAD and controls without PAD to identify differentially regulated genes. Methods PAD was defined as an ankle brachial index (ABI) ≤0.9 (n = 19) while age and gender matched controls had an ABI > 1.0 (n = 18). Microarray analysis was performed using Affymetrix HG-U133 plus 2.0 gene chips and analyzed using GeneSpring GX 11.0. Gene expression data were normalized using the Robust Multichip Analysis (RMA) normalization method; differential expression was defined as a fold change ≥1.5, followed by an unpaired Mann-Whitney test (P < 0.05) and correction for multiple testing by the Benjamini and Hochberg False Discovery Rate. Meta-analysis of differentially expressed genes was performed using an integrated bioinformatics pipeline with tools for enrichment analysis using Gene Ontology (GO) terms, pathway analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG), molecular event enrichment using Reactome annotations and network analysis using Ingenuity Pathway Analysis suite. Extensive biocuration was also performed to understand the functional context of genes. Results We identified 87 genes differentially expressed in the setting of PAD; 40 genes were upregulated and 47 genes were downregulated. We employed an integrated bioinformatics pipeline coupled with literature curation to characterize the functional coherence of differentially regulated genes. Conclusion Notably, upregulated genes mediate immune response, inflammation, apoptosis, stress response, phosphorylation, hemostasis, platelet activation and platelet aggregation. Downregulated genes included several genes from
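
    The Benjamini and Hochberg correction named in the Methods can be illustrated with a short sketch. The study itself used GeneSpring GX; the code below is only a generic implementation of the standard step-up adjustment, with a hypothetical function name and made-up p-values.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input.

    Each sorted p-value p_(i) is scaled by m / i, then a running minimum
    taken from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up per-gene p-values, e.g. from Mann-Whitney tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

    Genes whose adjusted value falls below the chosen false discovery rate (0.05 in this toy example) would be reported as differentially expressed.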

  16. Identifying stably expressed genes from multiple RNA-Seq data sets

    PubMed Central

    Emerson, Sarah; Chang, Jeff H.; Di, Yanming

    2016-01-01

    We examined RNA-Seq data on 211 biological samples from 24 different Arabidopsis experiments carried out by different labs. We grouped the samples according to tissue types, and in each of the groups, we identified genes that are stably expressed across biological samples, treatment conditions, and experiments. We fit a Poisson log-linear mixed-effect model to the read counts for each gene and decomposed the total variance into between-sample, between-treatment and between-experiment variance components. Identifying stably expressed genes is useful for count normalization and differential expression analysis. The variance component analysis that we explore here is a first step towards understanding the sources and nature of the RNA-Seq count variation. When using a numerical measure to identify stably expressed genes, the outcome depends on multiple factors: the background sample set and the reference gene set used for count normalization, the technology used for measuring gene expression, and the specific numerical stability measure used. Since differential expression (DE) is measured by relative frequencies, we argue that DE is a relative concept. We advocate using an explicit reference gene set for count normalization to improve interpretability of DE results, and recommend using a common reference gene set when analyzing multiple RNA-Seq experiments to avoid potential inconsistent conclusions. PMID:28028467

  17. Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data

    PubMed Central

    2013-01-01

    Background High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed “contrast data” for short) in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.). Results To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are

  18. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data

    PubMed Central

    Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-01-01

    Abstract Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®—the human gene database; the MalaCards—the human diseases database; and the PathCards—the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®—the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics

  19. Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers.

    PubMed

    Labaj, Wojciech; Papiez, Anna; Polanski, Andrzej; Polanska, Joanna

    2017-03-01

    Large collections of data in studies on cancer such as leukaemia provoke the necessity of applying tailored analysis algorithms to ensure supreme information extraction. In this work, a custom-fit pipeline is demonstrated for thorough investigation of the voluminous MILE gene expression data set. Three analyses are accomplished, each for gaining a deeper understanding of the processes underlying leukaemia types and subtypes. First, the main disease groups are tested for differential expression against the healthy control as in a standard case-control study. Here, the basic knowledge on molecular mechanisms is confirmed quantitatively and by literature references. Second, pairwise comparison testing is performed for juxtaposing the main leukaemia types among each other. In this case by means of the Dice coefficient similarity measure the general relations are pointed out. Moreover, lists of candidate main leukaemia group biomarkers are proposed. Finally, with this approach being successful, the third analysis provides insight into all of the studied subtypes, followed by the emergence of four leukaemia subtype biomarkers. In addition, the class enhanced DEG signature obtained on the basis of novel pipeline processing leads to significantly better classification power of multi-class data classifiers. The developed methodology consisting of batch effect adjustment, adaptive noise and feature filtration coupled with adequate statistical testing and biomarker definition proves to be an effective approach towards knowledge discovery in high-throughput molecular biology experiments.
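
    The Dice coefficient mentioned above measures the overlap between two lists, here two lists of differentially expressed genes. A minimal sketch follows; the function name and the convention of returning 0.0 for two empty lists are assumptions, and the gene symbols in the usage note are placeholders rather than results from the study.

```python
def dice_coefficient(genes_a, genes_b):
    """Dice similarity between two gene lists: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(genes_a), set(genes_b)
    if not a and not b:
        # Assumed convention: two empty lists are reported as no overlap.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))
```

    For example, lists of three and four genes that share two genes give 2 * 2 / (3 + 4) = 4/7; identical lists give 1.0 and disjoint lists give 0.0.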

  20. Comparison of statistics for candidate-gene association studies using cases and parents

    SciTech Connect

    Schaid, D.J.; Sommer, S.S.

    1994-08-01

    Studies of association between candidate genes and disease can be designed to use cases with disease, and in place of nonrelated controls, their parents. The advantage of this design is the elimination of spurious differences due to ethnic differences between cases and nonrelated controls. However, several statistical methods of analysis have been proposed in the literature, and the choice of analysis is not always clear. The authors review some of the statistical methods currently developed and present two new statistical methods aimed at specific genetic hypotheses of dominance and recessivity of the candidate gene. These new methods can be more powerful than other current methods, as demonstrated by simulations. The basis of these new statistical methods is a likelihood approach. The advantage of the likelihood framework is that regression models can be developed to assess genotype-environment interactions, as well as the relative contribution that alleles at the candidate-gene locus make to the relative risk (RR) of disease. This latter development allows testing of (1) whether interactions between alleles exist, on the scale of log RR, and (2) whether alleles originating from the mother or father of a case impart different risks, i.e., genomic imprinting. 13 refs., 2 figs., 2 tabs.

  1. Consistently altered expression of gene sets in postmortem brains of individuals with major psychiatric disorders

    PubMed Central

    Darby, M M; Yolken, R H; Sabunciyan, S

    2016-01-01

    The measurement of gene expression in postmortem brain is an important tool for understanding the pathogenesis of serious psychiatric disorders. We hypothesized that major molecular deficits associated with psychiatric disease would affect the entire brain, and such deficits may be shared across disorders. We performed RNA sequencing and quantified gene expression in the hippocampus of 100 brains in the Stanley Array Collection followed by replication in the orbitofrontal cortex of 57 brains in the Stanley Neuropathology Consortium. We then identified genes and canonical pathway gene sets with significantly altered expression in schizophrenia and bipolar disorder in the hippocampus and in schizophrenia, bipolar disorder and major depression in the orbitofrontal cortex. Although expression of individual genes varied, gene sets were significantly enriched in both of the brain regions, and many of these were consistent across diagnostic groups. Further examination of core gene sets with consistently increased or decreased expression in both of the brain regions and across target disorders revealed that ribosomal genes are overexpressed while genes involved in neuronal processes, GABAergic signaling, endocytosis and antigen processing have predominantly decreased expression in affected individuals compared to controls without a psychiatric disorder. Our results highlight pathways of central importance to psychiatric health and emphasize messenger RNA processing and protein synthesis as potential therapeutic targets for all three of the disorders. PMID:27622934

  2. Fold change rank ordering statistics: a new method for detecting differentially expressed genes

    PubMed Central

    2014-01-01

    Background Different methods have been proposed for analyzing differentially expressed (DE) genes in microarray data. Methods based on statistical tests that incorporate expression level variability are used more commonly than those based on fold change (FC). However, FC based results are more reproducible and biologically relevant. Results We propose a new method based on fold change rank ordering statistics (FCROS). We exploit the variation in calculated FC levels using combinatorial pairs of biological conditions in the datasets. A statistic is associated with the ranks of the FC values for each gene, and the resulting probability is used to identify the DE genes within an error level. The FCROS method is deterministic, requires a low computational runtime and also solves the problem of multiple tests which usually arises with microarray datasets. Conclusion We compared the performance of FCROS with those of other methods using synthetic and real microarray datasets. We found that FCROS is well suited for DE gene identification from noisy datasets when compared with existing FC based methods. PMID:24423217
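
    The core idea described above, ranking genes by fold change within every control/test sample pair and then summarizing each gene's ranks, can be illustrated with a simplified sketch. This is not the published FCROS statistic: it takes a plain mean of normalized ranks and omits the probability calculation, and the function name and input layout are assumptions.

```python
from itertools import product


def average_fc_ranks(control, test):
    """Mean normalized fold-change rank per gene over all sample pairs.

    control, test: dict mapping gene -> list of positive expression
    values on the linear scale (hypothetical layout), same genes in both.
    """
    genes = list(control)
    n_genes = len(genes)
    n_control = len(control[genes[0]])
    n_test = len(test[genes[0]])
    pairs = list(product(range(n_control), range(n_test)))
    totals = {g: 0.0 for g in genes}
    for i, j in pairs:
        # Fold change of every gene for this control/test sample pair.
        fc = {g: test[g][j] / control[g][i] for g in genes}
        ordered = sorted(genes, key=lambda g: fc[g])
        for rank, g in enumerate(ordered, start=1):
            totals[g] += rank / n_genes
    return {g: totals[g] / len(pairs) for g in genes}
```

    In this sketch, genes whose mean normalized rank is close to 1 are consistently among the most increased across pairs, and values close to 1/n (for n genes) indicate consistent decrease.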

  3. EVE (external variance estimation) increases statistical power for detecting differentially expressed genes.

    PubMed

    Wille, Anja; Gruissem, Wilhelm; Bühlmann, Peter; Hennig, Lars

    2007-11-01

    Accurately identifying differentially expressed genes from microarray data is not a trivial task, partly because of poor variance estimates of gene expression signals. Here, after analyzing 380 replicated microarray experiments, we found that probesets have typical, distinct variances that can be estimated based on a large number of microarray experiments. These probeset-specific variances depend at least in part on the function of the probed gene: genes for ribosomal or structural proteins often have a small variance, while genes implicated in stress responses often have large variances. We used these variance estimates to develop a statistical test for differentially expressed genes called EVE (external variance estimation). The EVE algorithm performs better than the t-test and LIMMA on some real-world data, where external information from appropriate databases is available. Thus, EVE helps to maximize the information gained from a typical microarray experiment. Nonetheless, only a large number of replicates will guarantee to identify nearly all truly differentially expressed genes. However, our simulation studies suggest that even limited numbers of replicates will usually result in good coverage of strongly differentially expressed genes.

  4. Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)?

    PubMed Central

    Barrett, Craig F.; Specht, Chelsea D.; Leebens-Mack, Jim; Stevenson, Dennis Wm.; Zomlefer, Wendy B.; Davis, Jerrold I.

    2014-01-01

    Background and Aims Zingiberales comprise a clade of eight tropical monocot families including approx. 2500 species and are hypothesized to have undergone an ancient, rapid radiation during the Cretaceous. Zingiberales display substantial variation in floral morphology, and several members are ecologically and economically important. Deep phylogenetic relationships among primary lineages of Zingiberales have proved difficult to resolve in previous studies, representing a key region of uncertainty in the monocot tree of life. Methods Next-generation sequencing was used to construct complete plastid gene sets for nine taxa of Zingiberales, which were added to five previously sequenced sets in an attempt to resolve deep relationships among families in the order. Variation in taxon sampling, process partition inclusion and partition model parameters were examined to assess their effects on topology and support. Key Results Codon-based likelihood analysis identified a strongly supported clade of ((Cannaceae, Marantaceae), (Costaceae, Zingiberaceae)), sister to (Musaceae, (Lowiaceae, Strelitziaceae)), collectively sister to Heliconiaceae. However, the deepest divergences in this phylogenetic analysis comprised short branches with weak support. Additionally, manipulation of matrices resulted in differing deep topologies in an unpredictable fashion. Alternative topology testing allowed statistical rejection of some of the topologies. Saturation fails to explain observed topological uncertainty and low support at the base of Zingiberales. Evidence for conflict among the plastid data was based on a support metric that accounts for conflicting resampled topologies. Conclusions Many relationships were resolved with robust support, but the paucity of character information supporting the deepest nodes and the existence of conflict suggest that plastid coding regions are insufficient to resolve and support the earliest divergences among families of Zingiberales. 
Whole plastomes

  5. Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses.

    PubMed Central

    Ina, Y; Gojobori, T

    1994-01-01

    To examine whether positive selection operates on the hemagglutinin 1 (HA1) gene of human influenza A viruses (H1 subtype), 21 nucleotide sequences of the HA1 gene were statistically analyzed. The nucleotide sequences were divided into antigenic and nonantigenic sites. The nucleotide diversities for antigenic and nonantigenic sites of the HA1 gene were computed at synonymous and nonsynonymous sites separately. For nonantigenic sites, the nucleotide diversities were larger at synonymous sites than at nonsynonymous sites. This is consistent with the neutral theory of molecular evolution. For antigenic sites, however, the nucleotide diversities at nonsynonymous sites were larger than those at synonymous sites. These results suggest that positive selection operates on antigenic sites of the HA1 gene of human influenza A viruses (H1 subtype). PMID:8078892
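
    Nucleotide diversity is commonly computed as the average number of differences per site over all pairs of aligned sequences. The sketch below illustrates that calculation for a chosen subset of alignment columns, such as a set of antigenic positions. It is a simplification under stated assumptions: it does not separate synonymous from nonsynonymous sites, which requires codon-level counting, and the function name and the short sequences in the usage note are made up for illustration.

```python
from itertools import combinations


def nucleotide_diversity(sequences, sites=None):
    """Average pairwise differences per site over aligned sequences.

    sequences: list of equal-length strings.
    sites: optional iterable of 0-based column indices to restrict to.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    sites = range(length) if sites is None else list(sites)
    pairs = list(combinations(sequences, 2))
    if not pairs or not sites:
        return 0.0
    # Count mismatching columns over every pair of sequences.
    diffs = sum(1 for a, b in pairs for k in sites if a[k] != b[k])
    return diffs / (len(pairs) * len(sites))
```

    For the three toy sequences "ACGT", "ACGA" and "TCGA", the pairs differ at 1, 2 and 1 positions, so the diversity over all four sites is 4 / (3 * 4) = 1/3.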

  6. Gene Set Signature of Reversal Reaction Type I in Leprosy Patients

    PubMed Central

    Orlova, Marianna; Cobat, Aurélie; Huong, Nguyen Thu; Ba, Nguyen Ngoc; Van Thuc, Nguyen; Spencer, John; Nédélec, Yohann; Barreiro, Luis; Thai, Vu Hong; Abel, Laurent; Alcaïs, Alexandre; Schurr, Erwin

    2013-01-01

    Leprosy reversal reactions type 1 (T1R) are acute immune episodes that affect a subset of leprosy patients and remain a major cause of nerve damage. Little is known about the relative importance of innate versus environmental factors in the pathogenesis of T1R. In a retrospective design, we evaluated innate differences in response to Mycobacterium leprae between healthy individuals and former leprosy patients affected or free of T1R by analyzing the transcriptome response of whole blood to M. leprae sonicate. Validation of results was conducted in a subsequent prospective study. We observed the differential expression of 581 genes upon exposure of whole blood to M. leprae sonicate in the retrospective study. We defined a 44 T1R gene set signature of differentially regulated genes. The majority of the T1R set genes were represented by three functional groups: i) pro-inflammatory regulators; ii) arachidonic acid metabolism mediators; and iii) regulators of anti-inflammation. The validity of the T1R gene set signature was replicated in the prospective arm of the study. The T1R genetic signature encompasses genes encoding pro- and anti-inflammatory mediators of innate immunity. This suggests an innate defect in the regulation of the inflammatory response to M. leprae antigens. The identified T1R gene set represents a critical first step towards a genetic profile of leprosy patients who are at increased risk of T1R and concomitant nerve damage. PMID:23874223

  7. A brain region-specific predictive gene map for autism derived by profiling a reference gene set.

    PubMed

    Kumar, Ajay; Swanwick, Catherine Croft; Johnson, Nicole; Menashe, Idan; Basu, Saumyendra N; Bales, Michael E; Banerjee-Basu, Sharmila

    2011-01-01

    Molecular underpinnings of complex psychiatric disorders such as autism spectrum disorders (ASD) remain largely unresolved. Increasingly, structural variations in discrete chromosomal loci are implicated in ASD, expanding the search space for its disease etiology. We exploited the high genetic heterogeneity of ASD to derive a predictive map of candidate genes by an integrated bioinformatics approach. Using a reference set of 84 Rare and Syndromic candidate ASD genes (AutRef84), we built a composite reference profile based on both functional and expression analyses. First, we created a functional profile of AutRef84 by performing Gene Ontology (GO) enrichment analysis which encompassed three main areas: 1) neurogenesis/projection, 2) cell adhesion, and 3) ion channel activity. Second, we constructed an expression profile of AutRef84 by conducting DAVID analysis which found enrichment in brain regions critical for sensory information processing (olfactory bulb, occipital lobe), executive function (prefrontal cortex), and hormone secretion (pituitary). Disease specificity of this dual AutRef84 profile was demonstrated by comparative analysis with control, diabetes, and non-specific gene sets. We then screened the human genome with the dual AutRef84 profile to derive a set of 460 potential ASD candidate genes. Importantly, the power of our predictive gene map was demonstrated by capturing 18 existing ASD-associated genes which were not part of the AutRef84 input dataset. The remaining 442 genes are entirely novel putative ASD risk genes. Together, we used a composite ASD reference profile to generate a predictive map of novel ASD candidate genes which should be prioritized for future research.

  8. StemChecker: a web-based tool to discover and explore stemness signatures in gene sets

    PubMed Central

    Pinto, José P.; Kalathur, Ravi K.; Oliveira, Daniel V.; Barata, Tânia; Machado, Rui S.R.; Machado, Susana; Pacheco-Leyva, Ivette; Duarte, Isabel; Futschik, Matthias E.

    2015-01-01

    Stem cells present unique regenerative abilities, offering great potential for treatment of prevalent pathologies such as diabetes, neurodegenerative and heart diseases. Various research groups dedicated significant effort to identify sets of genes, so-called stemness signatures, considered essential to define stem cells. However, their usage has been hindered by the lack of comprehensive resources and easy-to-use tools. To address this, we developed StemChecker, a novel stemness analysis tool, based on the curation of nearly fifty published stemness signatures defined by gene expression, RNAi screens, Transcription Factor (TF) binding sites, literature reviews and computational approaches. StemChecker allows researchers to explore the presence of stemness signatures in user-defined gene sets, without carrying out lengthy literature curation or data processing. To assist in exploring underlying regulatory mechanisms, we collected over 80 target gene sets of TFs associated with pluri- or multipotency. StemChecker presents an intuitive graphical display, as well as detailed statistical results in table format, which help reveal transcriptional regulatory programs, indicating the putative involvement of stemness-associated processes in diseases like cancer. Overall, StemChecker substantially expands the available repertoire of online tools, designed to assist the stem cell biology, developmental biology, regenerative medicine and human disease research community. StemChecker is freely accessible at http://stemchecker.sysbiolab.eu. PMID:26007653

  9. Hox gene Ultrabithorax regulates distinct sets of target genes at successive stages of Drosophila haltere morphogenesis.

    PubMed

    Pavlopoulos, Anastasios; Akam, Michael

    2011-02-15

    Hox genes encode highly conserved transcription factors that regionalize the animal body axis by controlling complex developmental processes. Although they are known to operate in multiple cell types and at different stages, we are still missing the batteries of genes targeted by any one Hox gene over the course of a single developmental process to achieve a particular cell and organ morphology. The transformation of wings into halteres by the Hox gene Ultrabithorax (Ubx) in Drosophila melanogaster presents an excellent model system to study the Hox control of transcriptional networks during successive stages of appendage morphogenesis and cell differentiation. We have used an inducible misexpression system to switch on Ubx in the wing epithelium at successive stages during metamorphosis--in the larva, prepupa, and pupa. We have then used extensive microarray expression profiling and quantitative RT-PCR to identify the primary transcriptional responses to Ubx. We find that Ubx targets range from regulatory genes like transcription factors and signaling components to terminal differentiation genes affecting a broad repertoire of cell behaviors and metabolic reactions. Ubx up- and down-regulates hundreds of downstream genes at each stage, mostly in a subtle manner. Strikingly, our analysis reveals that Ubx target genes are largely distinct at different stages of appendage morphogenesis, suggesting extensive interactions between Hox genes and hormone-controlled regulatory networks to orchestrate complex genetic programs during metamorphosis.

  10. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth.

    PubMed

    Chaillou, Thomas; Jackson, Janna R; England, Jonathan H; Kirby, Tyler J; Richards-White, Jena; Esser, Karyn A; Dupont-Versteegden, Esther E; McCarthy, John J

    2015-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes was upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading.
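
    The expression criterion stated above (a ≥2-fold increase or a ≥50% decrease relative to control) and the idea of a gene set conserved across both growth models can be sketched as follows. The function names and dictionary layout are hypothetical, and requiring the same direction of change in both models is an assumption about what "conserved" means here.

```python
def classify_change(treated, control):
    """Label a gene by its expression ratio relative to control."""
    ratio = treated / control
    if ratio >= 2.0:        # at least a 2-fold increase
        return "up"
    if ratio <= 0.5:        # at least a 50% decrease
        return "down"
    return "unchanged"


def conserved_genes(hypertrophy, regrowth, control):
    """Genes changed in the same direction in both growth models.

    Each argument maps gene -> positive expression value (hypothetical layout).
    """
    conserved = {}
    for gene, baseline in control.items():
        call_h = classify_change(hypertrophy[gene], baseline)
        call_r = classify_change(regrowth[gene], baseline)
        # Assumption: "conserved" means the same non-neutral call in both.
        if call_h == call_r and call_h != "unchanged":
            conserved[gene] = call_h
    return conserved
```

    A gene that doubles in both models is kept as "up", while a gene that changes in only one model, or in opposite directions, is dropped.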

  11. Mechanism-based biomarker gene sets for glutathione depletion-related hepatotoxicity in rats

    SciTech Connect

    Gao Weihua; Mizukawa, Yumiko; Nakatsu, Noriyuki; Minowa, Yosuke; Yamada, Hiroshi; Ohno, Yasuo; Urushidani, Tetsuro

    2010-09-15

    Chemical-induced glutathione depletion is thought to be caused by two types of toxicological mechanisms: PHO-type glutathione depletion [glutathione conjugated with chemicals such as phorone (PHO) or diethyl maleate (DEM)], and BSO-type glutathione depletion [i.e., glutathione synthesis inhibited by chemicals such as L-buthionine-sulfoximine (BSO)]. In order to identify mechanism-based biomarker gene sets for glutathione depletion in rat liver, male SD rats were treated with various chemicals including PHO (40, 120 and 400 mg/kg), DEM (80, 240 and 800 mg/kg), BSO (150, 450 and 1500 mg/kg), and bromobenzene (BBZ, 10, 100 and 300 mg/kg). Liver samples were taken 3, 6, 9 and 24 h after administration and examined for hepatic glutathione content, physiological and pathological changes, and gene expression changes using Affymetrix GeneChip Arrays. To identify differentially expressed probe sets in response to glutathione depletion, we focused on the following two courses of events for the two types of mechanisms of glutathione depletion: a) gene expression changes occurring simultaneously in response to glutathione depletion, and b) gene expression changes after glutathione was depleted. The gene expression profiles of the identified probe sets for the two types of glutathione depletion differed markedly at times during and after glutathione depletion, whereas Srxn1 was markedly increased for both types as glutathione was depleted, suggesting that Srxn1 is a key molecule in oxidative stress related to glutathione. The extracted probe sets were refined and verified using various compounds including 13 additional positive or negative compounds, and they established two useful marker sets. One contained three probe sets (Akr7a3, Trib3 and Gstp1) that could detect conjugation-type glutathione depletors any time within 24 h after dosing, and the other contained 14 probe sets that could detect glutathione depletors by any mechanism. These two sets, with appropriate scoring

  12. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data.

    PubMed

    Ganesh Kumar, Pugalendhi; Kavitha, Muthu Subash; Ahn, Byeong-Cheol

    2016-01-01

    This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, in this study, we introduced two levels of gene selection such as filtering and embedding for selection of potential genes and the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA showed the smallest number of 16 most relevant genes associated with cancer using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. The simple interpretable rules with most relevant genes and effectively classified

  13. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data

    PubMed Central

    Ahn, Byeong-Cheol

    2016-01-01

    This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, in this study, we introduced two levels of gene selection such as filtering and embedding for selection of potential genes and the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA showed the smallest number of 16 most relevant genes associated with cancer using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. The simple interpretable rules with most relevant genes and effectively classified

  14. A set-based association test identifies sex-specific gene sets associated with type 2 diabetes

    PubMed Central

    He, Tao; Zhong, Ping-Shou; Cui, Yuehua

    2014-01-01

    Single variant analysis in genome-wide association studies (GWAS) has proven successful in identifying thousands of genetic variants associated with hundreds of complex diseases. However, these identified variants explain only a small fraction of heritable variability in many diseases, suggesting that other resources, such as multilevel genetic variations, may contribute to disease susceptibility. In this work, we proposed to combine genetic variants that belong to a gene set, such as at the gene and pathway levels, to form an integrated signal aimed at identifying major players that function in a coordinated manner to confer disease risk. The integrated analysis provides novel insight into disease etiology, while individual signals could be easily missed by single variant analysis. We applied our approach to a genome-wide association study of type 2 diabetes (T2D) with male and female data analyzed separately. Novel sex-specific genes and pathways were identified to increase the risk of T2D. We also demonstrated the performance of signal integration through simulation studies. PMID:25429300

  15. A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks

    PubMed Central

    Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng

    2009-01-01

    Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885

  16. Identification of a set of genes showing regionally enriched expression in the mouse brain

    PubMed Central

    D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa LC; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven JM

    2008-01-01

    Background: The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. Results: We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion: Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression. PMID:18625066

  17. Associations between DNA methylation and schizophrenia-related intermediate phenotypes - a gene set enrichment analysis.

    PubMed

    Hass, Johanna; Walton, Esther; Wright, Carrie; Beyer, Andreas; Scholz, Markus; Turner, Jessica; Liu, Jingyu; Smolka, Michael N; Roessner, Veit; Sponheim, Scott R; Gollub, Randy L; Calhoun, Vince D; Ehrlich, Stefan

    2015-06-03

    Multiple genetic approaches have identified microRNAs as key effectors in psychiatric disorders as they post-transcriptionally regulate expression of thousands of target genes. However, their role in specific psychiatric diseases remains poorly understood. In addition, epigenetic mechanisms such as DNA methylation, which affect the expression of both microRNAs and coding genes, are critical for our understanding of molecular mechanisms in schizophrenia. Using clinical, imaging, genetic, and epigenetic data of 103 patients with schizophrenia and 111 healthy controls of the Mind Clinical Imaging Consortium (MCIC) study of schizophrenia, we conducted gene set enrichment analysis to identify markers for schizophrenia-associated intermediate phenotypes. Genes were ranked based on the correlation between DNA methylation patterns and each phenotype, and then searched for enrichment in 221 predicted microRNA target gene sets. We found the predicted hsa-miR-219a-5p target gene set to be significantly enriched for genes (EPHA4, PKNOX1, ESR1, among others) whose methylation status is correlated with hippocampal volume independent of disease status. Our results were strengthened by significant associations between hsa-miR-219a-5p target gene methylation patterns and hippocampus-related neuropsychological variables. IPA pathway analysis of the respective predicted hsa-miR-219a-5p target genes revealed associated network functions in behavior and developmental disorders. Altered methylation patterns of predicted hsa-miR-219a-5p target genes are associated with a structural aberration of the brain that has been proposed as a possible biomarker for schizophrenia. The (dys)regulation of microRNA target genes by epigenetic mechanisms may confer additional risk for developing psychiatric symptoms. Further study is needed to understand possible interactions between microRNAs and epigenetic changes and their impact on risk for brain-based disorders such as schizophrenia.
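
    The ranking-then-enrichment step described in the abstract above can be illustrated with a small running-sum statistic in the style of gene set enrichment analysis. This is only a sketch under stated assumptions: the abstract does not give the implementation used, and the function name `enrichment_score`, the weighting exponent `p`, and the toy gene names and scores are all hypothetical. Significance testing (for example by permutation) is not shown.

```python
from typing import Dict, Set


def enrichment_score(scores: Dict[str, float], gene_set: Set[str], p: float = 1.0) -> float:
    """Running-sum enrichment score (illustrative sketch, not the study's code).

    scores   -- hypothetical per-gene ranking metric, e.g. a correlation
    gene_set -- hypothetical set of gene names to test for enrichment
    p        -- weighting exponent applied to |score| for genes in the set
    """
    # Rank genes from the highest to the lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")

    # Total weight of the genes in the set, used to normalise the hit steps.
    hit_weight = sum(abs(scores[g]) ** p for g in hits)
    if hit_weight == 0:
        raise ValueError("all genes in the set have a zero score")

    running, best = 0.0, 0.0
    for g in ranked:
        if g in gene_set:
            running += abs(scores[g]) ** p / hit_weight  # step up on a hit
        else:
            running -= 1.0 / n_miss                      # step down on a miss
        if abs(running) > abs(best):
            best = running                               # keep the largest deviation from zero
    return best


if __name__ == "__main__":
    # Made-up scores for five made-up genes.
    toy = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0}
    print(enrichment_score(toy, {"A", "B"}))  # set concentrated at the top of the ranking
    print(enrichment_score(toy, {"D", "E"}))  # set concentrated at the bottom of the ranking
```

    A positive score indicates that the set clusters toward the top of the ranking and a negative score that it clusters toward the bottom.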

  18. Different gene sets contribute to different symptom dimensions of depression and anxiety.

    PubMed

    van Veen, Tineke; Goeman, Jelle J; Monajemi, Ramin; Wardenaar, Klaas J; Hartman, Catharina A; Snieder, Harold; Nolte, Ilja M; Penninx, Brenda W J H; Zitman, Frans G

    2012-07-01

    Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom-dimensions provide a more homogeneous phenotype. Furthermore, as effects of individual genes are small, analysis of genetic data at the pathway-level provides more power to detect associations and yields valuable biological insight. In 1,398 individuals with a Major Depressive Disorder, the symptom dimensions of the tripartite model of anxiety and depression, General Distress, Anhedonic Depression, and Anxious Arousal, were measured with the Mood and Anxiety Symptoms Questionnaire (30-item Dutch adaptation; MASQ-D30). Association of these symptom dimensions with candidate gene sets and gene sets from two public pathway databases was tested using the Global test. One pathway was associated with General Distress, and concerned molecules expressed in the endoplasmic reticulum lumen. Seven pathways were associated with Anhedonic Depression. Important themes were neurodevelopment, neurodegeneration, and cytoskeleton. Furthermore, three gene sets associated with Anxious Arousal regarded development, morphology, and genetic recombination. The individual pathways explained up to 1.7% of the variance. These data demonstrate mechanisms that influence the specific dimensions. Moreover, they show the value of using dimensional phenotypes on one hand and gene sets on the other hand.

  19. Regulatory Genes Controlling Anthocyanin Pigmentation Are Functionally Conserved among Plant Species and Have Distinct Sets of Target Genes.

    PubMed Central

    Quattrocchio, F; Wing, JF; Leppen, H; Mol, J; Koes, RE

    1993-01-01

    In this study, we demonstrate that in petunia at least four regulatory genes (anthocyanin-1 [an1], an2, an4, and an11) control transcription of a subset of structural genes from the anthocyanin pathway by using a combination of RNA gel blot analysis, transcription run-on assays, and transient expression assays. an2- and an11- mutants could be transiently complemented by the maize regulatory genes Leaf color (Lc) or Colorless-1 (C1), respectively, whereas an1- mutants only by Lc and C1 together. In addition, the combination of Lc and C1 induces pigment accumulation in young leaves. This indicates that Lc and C1 are both necessary and sufficient to produce pigmentation in leaf cells. Regulatory pigmentation genes in maize and petunia control different sets of structural genes. The maize Lc and C1 genes expressed in petunia differentially activate the promoters of the chalcone synthase genes chsA and chsJ in the same way that the homologous petunia genes do. This suggests that the regulatory proteins in both species are functionally similar and that the choice of target genes is determined by their promoter sequences. We present an evolutionary model that explains the differences in regulation of pigmentation pathways of maize, petunia, and snapdragon. PMID:12271045

  20. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    PubMed Central

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants

  1. Protein and gene model inference based on statistical modeling in k-partite graphs.

    PubMed

    Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter

    2010-07-06

    One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.

  2. Statistical inference of selection and divergence of the rice blast resistance gene Pi-ta.

    PubMed

    Amei, Amei; Lee, Seonghee; Mysore, Kirankumar S; Jia, Yulin

    2014-10-21

    The resistance gene Pi-ta has been effectively used to control rice blast disease, but some populations of cultivated and wild rice have evolved resistance. Insights into the evolutionary processes that led to this resistance during crop domestication may be inferred from the population history of domesticated and wild rice strains. In this study, we applied a recently developed statistical method, time-dependent Poisson random field model, to examine the evolution of the Pi-ta gene in cultivated and weedy rice. Our study suggests that the Pi-ta gene may have more recently introgressed into cultivated rice, indica and japonica, and U.S. weedy rice from the wild species, O. rufipogon. In addition, the Pi-ta gene is under positive selection in japonica, tropical japonica, U.S. cultivars and U.S. weedy rice. We also found that sequences of two domains of the Pi-ta gene, the nucleotide binding site and leucine-rich repeat domain, are highly conserved among all rice accessions examined. Our results provide a valuable analytical tool for understanding the evolution of disease resistance genes in crop plants.

  3. Statistical methods in detecting differential expressed genes, analyzing insertion tolerance for genes and group selection for survival data

    NASA Astrophysics Data System (ADS)

    Liu, Fangfang

    The thesis is composed of three independent projects: (i) analyzing transposon-sequencing data to infer functions of genes on bacteria growth (chapter 2), (ii) developing a semi-parametric Bayesian method for differential gene expression analysis with RNA-sequencing data (chapter 3), (iii) solving a group selection problem for survival data (chapter 4). All projects are motivated by statistical challenges raised in biological research. The first project is motivated by the need to develop statistical models to accommodate the transposon insertion sequencing (Tn-Seq) data. Tn-Seq data consist of sequence reads around each transposon insertion site. The detection of transposon insertion at a given site indicates that the disruption of genomic sequence at this site does not cause essential function loss and the bacteria can still grow. Hence, such measurements have been used to infer the functions of each gene on bacteria growth. We propose a zero-inflated Poisson regression method for analyzing the Tn-Seq count data, and derive an Expectation-Maximization (EM) algorithm to obtain parameter estimates. We also propose a multiple testing procedure that categorizes genes into each of the three states, hypo-tolerant, tolerant, and hyper-tolerant, while controlling false discovery rate. Simulation studies show our method provides good estimation of model parameters and inference on gene functions. In the second project, we model the count data from an RNA-sequencing experiment for each gene using a Poisson-Gamma hierarchical model, or equivalently, a negative binomial (NB) model. We derive a full semi-parametric Bayesian approach with Dirichlet process as the prior for the fold changes between two treatment means. An inference strategy using Gibbs algorithm is developed for differential expression analysis. We evaluate our method with several simulation studies, and the results demonstrate that our method outperforms other methods including the popularly applied ones such as edge

  4. The Core Mouse Response to Infection by Neospora Caninum Defined by Gene Set Enrichment Analyses

    PubMed Central

    Ellis, John; Goodswen, Stephen; Kennedy, Paul J; Bush, Stephen

    2012-01-01

    In this study, the BALB/c and Qs mouse responses to infection by the parasite Neospora caninum were investigated in order to identify host response mechanisms. Investigation was done using gene set (enrichment) analyses of microarray data. GSEA, MANOVA, Romer, subGSE and SAM-GS were used to study the contrasts Neospora strain type, Mouse type (BALB/c and Qs) and time post infection (6 hours post infection and 10 days post infection). The analyses show that the major signal in the core mouse response to infection is from time post infection and can be defined by gene ontology terms Protein Kinase Activity, Cell Proliferation and Transcription Initiation. Several terms linked to signaling, morphogenesis, response and fat metabolism were also identified. At 10 days post infection, genes associated with fatty acid metabolism were identified as up regulated in expression. The value of gene set (enrichment) analyses in the analysis of microarray data is discussed. PMID:23012496

  5. A prognosis classifier for breast cancer based on conserved gene regulation between mammary gland development and tumorigenesis: a multiscale statistical model.

    PubMed

    Tian, Yingpu; Chen, Baozhen; Guan, Pengfei; Kang, Yujia; Lu, Zhongxian

    2013-01-01

    Identification of novel cancer genes for molecular therapy and diagnosis is a current focus of breast cancer research. Although a few small gene sets were identified as prognosis classifiers, more powerful models are still needed for the definition of effective gene sets for the diagnosis and treatment guidance in breast cancer. In the present study, we have developed a novel statistical approach for systematic analysis of intrinsic correlations of gene expression between development and tumorigenesis in mammary gland. Based on this analysis, we constructed a predictive model for prognosis in breast cancer that may be useful for therapy decisions. We first defined developmentally associated genes from a mouse mammary gland epithelial gene expression database. Then, we found that the cancer modulated genes were enriched in this list of developmentally associated genes. Furthermore, the developmentally associated genes had a specific expression profile, which was associated with the molecular characteristics and histological grade of the tumor. These results suggested that the processes of mammary gland development and tumorigenesis share gene regulatory mechanisms. Then, the list of regulatory genes involved in both the developmental and tumorigenesis processes was defined as an 835-member prognosis classifier, which showed an exciting ability to predict clinical outcome of three groups of breast cancer patients (the predictive accuracy 64∼72%) with a robust prognosis prediction (hazard ratio 3.3∼3.8, higher than that of other clinical risk factors (around 2.0-2.8)). In conclusion, our results identified the conserved molecular mechanisms between mammary gland development and neoplasia, and provided a unique potential model for mining unknown cancer genes and predicting the clinical status of breast tumors. These findings also suggested that developmental roles of genes may be important criteria for selecting genes for prognosis prediction in breast cancer.

  6. Gene regulatory network inference using fused LASSO on multiple data sets.

    PubMed

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M O; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-02-11

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed in a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions.
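
    As a rough illustration of how the first two constraints in the abstract above could enter a fused LASSO objective, one generic form is written out below. This is a sketch, not the authors' formulation: every symbol is an assumption introduced here. For a single target gene, y^{(k)} denotes its expression in data set k, X^{(k)} the expression of transcription factor coding genes in that data set, \beta^{(k)} the regression coefficients (candidate regulatory links), and \lambda_1, \lambda_2 tuning parameters. The third constraint is not represented.

```latex
\min_{\beta^{(1)},\dots,\beta^{(K)}}
\sum_{k=1}^{K} \bigl\| y^{(k)} - X^{(k)} \beta^{(k)} \bigr\|_2^2
+ \lambda_1 \sum_{k=1}^{K} \bigl\| \beta^{(k)} \bigr\|_1
+ \lambda_2 \sum_{k < k'} \bigl\| \beta^{(k)} - \beta^{(k')} \bigr\|_1
```

    In this form the \lambda_1 term favors a small number of regulators per gene (constraint 1), and the \lambda_2 term penalizes differences between the coefficients inferred from different data sets (constraint 2).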

  7. Yeast genome-wide screen reveals dissimilar sets of host genes affecting replication of RNA viruses

    PubMed Central

    Panavas, Tadas; Serviene, Elena; Brasher, Jeremy; Nagy, Peter D.

    2005-01-01

    Viruses are devastating pathogens of humans, animals, and plants. To further our understanding of how viruses use the resources of infected cells, we systematically tested the yeast single-gene-knockout library for the effect of each host gene on the replication of tomato bushy stunt virus (TBSV), a positive-strand RNA virus of plants. The genome-wide screen identified 96 host genes whose absence either reduced or increased the accumulation of the TBSV replicon. The identified genes are involved in the metabolism of nucleic acids, lipids, proteins, and other compounds and in protein targeting/transport. Comparison with published genome-wide screens reveals that the replication of TBSV and brome mosaic virus (BMV), which belongs to a different supergroup among plus-strand RNA viruses, is affected by vastly different yeast genes. Moreover, a set of yeast genes involved in vacuolar targeting of proteins and vesicle-mediated transport both affected replication of the TBSV replicon and enhanced the cytotoxicity of the Parkinson's disease-related α-synuclein when this protein was expressed in yeast. In addition, a set of host genes involved in ubiquitin-dependent protein catabolism affected both TBSV replication and the cytotoxicity of a mutant huntingtin protein, a candidate agent in Huntington's disease. This finding suggests that virus infection and disease-causing proteins might use or alter similar host pathways and may suggest connections between chronic diseases and prior virus infection. PMID:15883361

  8. A PLSPM-based test statistic for detecting gene-gene co-association in genome-wide association study with case-control design.

    PubMed

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    In genome-wide association data analysis, two genes in a pathway, two SNPs in two linked gene regions, or two SNPs in two linked exons within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to effects due not only to traditional interaction under a nearly independent condition but also to the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic performs better than the single SNP-based logistic model, the PCA-based logistic model, and other gene-based methods.

  9. Primer Sets Developed for Functional Genes Reveal Shifts in Functionality of Fungal Community in Soils

    PubMed Central

    Hannula, S. Emilia; van Veen, Johannes A.

    2016-01-01

    Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in carbon cycling were designed and/or validated and the functioning of soil fungal communities along a chronosequence of land abandonment from agriculture was evaluated using them. We hypothesized that changes in fungal community structure during secondary succession would lead to difference in the types of genes present in soils and that these changes would be directional. We expected an increase in genes involved in degradation of recalcitrant organic matter in time since agriculture. Out of the investigated genes, the richness of the genes related to carbon cycling was significantly higher in fields abandoned for longer time. The composition of six of the genes analyzed revealed significant differences between fields abandoned for shorter and longer time. However, all genes revealed significant variance over the fields studied, and this could be related to other parameters than the time since agriculture such as pH, organic matter, and the amount of available nitrogen. Contrary to our initial hypothesis, the genes significantly different between fields were not related to the decomposition of more recalcitrant matter but rather involved in degradation of cellulose and hemicellulose. PMID:27965632

  10. Primer Sets Developed for Functional Genes Reveal Shifts in Functionality of Fungal Community in Soils.

    PubMed

    Hannula, S Emilia; van Veen, Johannes A

    2016-01-01

    Phylogenetic diversity of soil microbes is a hot topic at the moment. However, the molecular tools for the assessment of functional diversity in the fungal community are less developed than tools based on genes encoding the ribosomal operon. Here 20 sets of primers targeting genes involved mainly in carbon cycling were designed and/or validated and the functioning of soil fungal communities along a chronosequence of land abandonment from agriculture was evaluated using them. We hypothesized that changes in fungal community structure during secondary succession would lead to difference in the types of genes present in soils and that these changes would be directional. We expected an increase in genes involved in degradation of recalcitrant organic matter in time since agriculture. Out of the investigated genes, the richness of the genes related to carbon cycling was significantly higher in fields abandoned for longer time. The composition of six of the genes analyzed revealed significant differences between fields abandoned for shorter and longer time. However, all genes revealed significant variance over the fields studied, and this could be related to other parameters than the time since agriculture such as pH, organic matter, and the amount of available nitrogen. Contrary to our initial hypothesis, the genes significantly different between fields were not related to the decomposition of more recalcitrant matter but rather involved in degradation of cellulose and hemicellulose.

  11. Meta gene set enrichment analyses link miR-137-regulated pathways with schizophrenia risk

    PubMed Central

    Wright, Carrie; Calhoun, Vince D.; Ehrlich, Stefan; Wang, Lei; Turner, Jessica A.; Perrone-Bizzozero, Nora I.

    2015-01-01

    Background: A single nucleotide polymorphism (SNP) within MIR137, the host gene for miR-137, has been identified repeatedly as a risk factor for schizophrenia. Previous genetic pathway analyses suggest that potential targets of this microRNA (miRNA) are also highly enriched in schizophrenia-relevant biological pathways, including those involved in nervous system development and function. Methods: In this study, we evaluated the schizophrenia risk of miR-137 target genes within these pathways. Gene set enrichment analysis of pathway-specific miR-137 targets was performed using the stage 1 (21,856 subjects) schizophrenia genome wide association study data from the Psychiatric Genomics Consortium and a small independent replication cohort (244 subjects) from the Mind Clinical Imaging Consortium and Northwestern University. Results: Gene sets of potential miR-137 targets were enriched with variants associated with schizophrenia risk, including target sets involved in axonal guidance signaling, Ephrin receptor signaling, long-term potentiation, PKA signaling, and Sertoli cell junction signaling. The schizophrenia-risk association of SNPs in PKA signaling targets was replicated in the second independent cohort. Conclusions: These results suggest that these biological pathways may be involved in the mechanisms by which this MIR137 variant enhances schizophrenia risk. SNPs in targets and the miRNA host gene may collectively lead to dysregulation of target expression and aberrant functioning of such implicated pathways. Pathway-guided gene set enrichment analyses should be useful in evaluating the impact of other miRNAs and target genes in different diseases. PMID:25941532

  12. Protein Interaction Networks Reveal Novel Autism Risk Genes within GWAS Statistical Noise

    PubMed Central

    Correia, Catarina; Oliveira, Guiomar; Vicente, Astrid M.

    2014-01-01

    Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) have thus far met with limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical “noise” that warrant further analysis for causal variants. PMID:25409314

  13. Protein interaction networks reveal novel autism risk genes within GWAS statistical noise.

    PubMed

    Correia, Catarina; Oliveira, Guiomar; Vicente, Astrid M

    2014-01-01

    Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) have thus far met with limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical "noise" that warrant further analysis for causal variants.

  14. Deciphering causal and statistical relations of molecular aberrations and gene expressions in NCI-60 cell lines

    PubMed Central

    2011-01-01

    Background Cancer cells harbor a large number of molecular alterations such as mutations, amplifications and deletions on DNA sequences and epigenetic changes on DNA methylations. These aberrations may dysregulate gene expressions, which in turn drive the malignancy of tumors. Deciphering the causal and statistical relations of molecular aberrations and gene expressions is critical for understanding the molecular mechanisms of clinical phenotypes. Results In this work, we proposed a computational method to reconstruct association modules containing driver aberrations, passenger mRNA or microRNA expressions, and putative regulators that mediate the effects from drivers to passengers. By applying the module-finding algorithm to the integrated datasets of NCI-60 cancer cell lines, we found that gene expressions were driven by diverse molecular aberrations including chromosomal segments' copy number variations, gene mutations and DNA methylations, microRNA expressions, and the expressions of transcription factors. In-silico validation indicated that passenger genes were enriched with the regulator binding motifs, functional categories or pathways where the drivers were involved, and co-citations with the driver/regulator genes. Moreover, 6 of 11 predicted MYB targets were down-regulated in an MYB-siRNA treated leukemia cell line. In addition, microRNA expressions were driven by distinct mechanisms from mRNA expressions. Conclusions The results provide rich mechanistic information regarding molecular aberrations and gene expressions in cancer genomes. This kind of integrative analysis will become an important tool for the diagnosis and treatment of cancer in the era of personalized medicine. PMID:22051105

  15. Application of a statistical software package for analysis of large patient dose data sets obtained from RIS.

    PubMed

    Fazakerley, J; Charnock, P; Wilde, R; Jones, R; Ward, M

    2010-01-01

    For the purpose of patient dose audit, clinical audit and radiology workload analysis, data from Radiology Information Systems (RIS) at many hospitals are collected using a database and the analysis was automated using a statistical package and Visual Basic coding. The database is a Structured Query Language database, which can be queried using an off-the-shelf statistical package, Statistica. Macros were created to automatically format the data to a consistent format between different hospitals ready for analysis. These macros can also be used to automate further analysis such as detailing mean kV, mAs and entrance surface dose per room and per gender. Standard deviation and standard error of the mean are also generated. Graphs can also be generated to illustrate the trends in doses between different variables such as room and gender. Collectively, this information can be used to generate a report. A process that once could take up to 1 d to complete now takes around 1 h. A major benefit in providing the service to hospital trusts is that less resource is now required to report on RIS data, making the possibility of continuous dose audit more likely. Time that was spent on sorting through data can now be spent on improving the analysis to provide benefit to the customer. Using data sets from RIS is a good way to perform dose audits as the huge numbers of data available provide the bases for very accurate analysis. Using macros written in Statistica Visual Basic has helped sort and consistently analyse these data. Being able to analyse by exposure factors has provided a more detailed report to the customer.
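    The per-room and per-gender summaries described above (mean, standard deviation, standard error of the mean) can be sketched as follows. This is an illustrative Python version, not the Statistica Visual Basic macros used in the study; the field names (`room`, `gender`, `esd_mGy`) and the values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def summarise(records, key, value):
    """Group `records` by `key` and report n, mean, SD and SEM of `value`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])

    summary = {}
    for name, vals in groups.items():
        n = len(vals)
        mean = statistics.mean(vals)
        # The sample SD is undefined for a single observation.
        sd = statistics.stdev(vals) if n > 1 else float("nan")
        summary[name] = {"n": n, "mean": mean, "sd": sd, "sem": sd / math.sqrt(n)}
    return summary


# Hypothetical entrance surface dose records (values are made up).
records = [
    {"room": "A", "gender": "F", "esd_mGy": 1.0},
    {"room": "A", "gender": "M", "esd_mGy": 2.0},
    {"room": "A", "gender": "F", "esd_mGy": 3.0},
    {"room": "B", "gender": "M", "esd_mGy": 4.0},
    {"room": "B", "gender": "F", "esd_mGy": 6.0},
]

by_room = summarise(records, "room", "esd_mGy")
by_gender = summarise(records, "gender", "esd_mGy")
print(by_room)
print(by_gender)
```

    The same function serves both groupings because the grouping field is passed as a parameter, which mirrors the idea of one macro reused across hospitals once the data share a consistent format.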

  16. Infrequently transcribed long genes depend on the Set2/Rpd3S pathway for accurate transcription

    PubMed Central

    Li, Bing; Gogol, Madelaine; Carey, Mike; Pattenden, Samantha G.; Seidel, Chris; Workman, Jerry L.

    2007-01-01

    The presence of Set2-mediated methylation of H3K36 (K36me) correlates with transcription frequency throughout the yeast genome. K36me targets the Rpd3S complex to deacetylate transcribed regions and suppress cryptic transcription initiation at certain genes. Here, using a genome-wide approach, we report that the Set2–Rpd3S pathway is generally required for controlling acetylation at coding regions. When using acetylation as a functional readout for this pathway, we discovered that longer genes and, surprisingly, genes transcribed at lower frequency exhibit a stronger dependency. Moreover, a systematic screen using high-resolution tiling microarrays allowed us to identify a group of genes that rely on Set2–Rpd3S to suppress spurious transcripts. Interestingly, most of these genes are within the group that depend on the same pathway to maintain a hypoacetylated state at coding regions. These data highlight the importance of using the functional readout of histone codes to define the roles of specific pathways. PMID:17545470

  17. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  18. The Use of Multi-Component Statistical Techniques in Understanding Subduction Zone Arc Granitic Geochemical Data Sets

    NASA Astrophysics Data System (ADS)

    Pompe, L.; Clausen, B. L.; Morton, D. M.

    2015-12-01

    Multi-component statistical techniques and GIS visualization are emerging trends in understanding large data sets. Our research applies these techniques to a large igneous geochemical data set from southern California to better understand magmatic and plate tectonic processes. A set of 480 granitic samples collected by Baird from this area were analyzed for 39 geochemical elements. Of these samples, 287 are from the Peninsular Ranges Batholith (PRB) and 164 from part of the Transverse Ranges (TR). Principal component analysis (PCA) summarized the 39 variables into 3 principal components (PC) by matrix multiplication and for the PRB are interpreted as follows: PC1 with about 30% of the variation included mainly compatible elements and SiO2 and indicates extent of differentiation; PC2 with about 20% of the variation included HFS elements and may indicate crustal contamination as usually identified by Sri; PC3 with about 20% of the variation included mainly HRE elements and may indicate magma source depth as often displayed using REE spider diagrams and possibly Sr/Y. Several elements did not fit well in any of the three components: Cr, Ni, U, and Na2O. For the PRB, the PC1 correlation with SiO2 was r=-0.85, the PC2 correlation with Sri was r=0.80, and the PC3 correlation with Gd/Yb was r=-0.76 and with Sr/Y was r=-0.66. Extending this method to the TR, correlations were r=-0.85, -0.21, -0.06, and -0.64, respectively. A similar extent of correlation for both areas was visually evident using GIS interpolation. PC1 seems to do well at indicating differentiation index for both the PRB and TR and correlates very well with SiO2, Al2O3, MgO, FeO*, CaO, K2O, Sc, V, and Co, but poorly with Na2O and Cr. If the crustal component is represented by Sri, PC2 correlates well and less expensively with this indicator in the PRB, but not in the TR. Source depth has been related to the slope on REE spidergrams, and PC3 based on only the HREE and using the Sr/Y ratios gives a reasonable

  19. Learning contextual gene set interaction networks of cancer with condition specificity

    PubMed Central

    2013-01-01

    Background Identifying similarities and differences in the molecular constitutions of various types of cancer is one of the key challenges in cancer research. The appearances of a cancer depend on complex molecular interactions, including gene regulatory networks and gene-environment interactions. This complexity makes it challenging to decipher the molecular origin of the cancer. In recent years, many studies reported methods to uncover heterogeneous depictions of complex cancers, which are often categorized into different subtypes. The challenge is to identify diverse molecular contexts within a cancer, to relate them to different subtypes, and to learn underlying molecular interactions specific to molecular contexts so that we can recommend context-specific treatment to patients. Results In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches that build modular networks of individual genes, our focus is to identify cancer-generic and subtype-specific interactions between contextual gene sets, where a contextual gene set is a gene set whose members share coherent transcriptional patterns across a subset of samples. We then apply a novel formulation for quantitating the effect of the samples from each subtype on the calculated strength of interactions observed. Two cancer data sets were analyzed to support the validity of condition-specificity of identified interactions. When compared to an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions even in heterogeneous data sets. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components. We found not only results that are consistent with previous studies, but also new hypotheses on the biological mechanisms specific to certain cancer types that warrant further

  20. Repeated observation of immune gene sets enrichment in women with non-small cell lung cancer.

    PubMed

    Araujo, Jhajaira M; Prado, Alexandra; Cardenas, Nadezhda K; Zaharia, Mayer; Dyer, Richard; Doimi, Franco; Bravo, Leny; Pinillos, Luis; Morante, Zaida; Aguilar, Alfredo; Mas, Luis A; Gomez, Henry L; Vallejos, Carlos S; Rolfo, Christian; Pinto, Joseph A

    2016-04-12

    There are different biological and clinical patterns of lung cancer between genders indicating intrinsic differences leading to increased sensitivity to cigarette smoke-induced DNA damage, mutational patterns of KRAS and better clinical outcomes in women, while differences between genders at gene-expression levels were not previously reported. Here we show an enrichment of immune genes in NSCLC in women compared to men. We found in a GSEA analysis (by biological processes annotated from Gene Ontology) of six public datasets a repeated observation of immune gene sets enrichment in women. "Immune system process", "immune response", "defense response", "cellular defense response" and "regulation of immune system process" were the gene sets most over-represented while APOBEC3G, APOBEC3F, LAT, CD1D and CCL5 represented the top-five core genes. Characterization of immune cell composition with the platform CIBERSORT showed no differences between genders; however, there were differences when tumor tissues were compared to normal tissues. Our results suggest different immune responses in NSCLC between genders that could be related to the different clinical outcome.

  1. Comprehensive set of integrative plasmid vectors for copper-inducible gene expression in Myxococcus xanthus.

    PubMed

    Gómez-Santos, Nuria; Treuner-Lange, Anke; Moraleda-Muñoz, Aurelio; García-Bravo, Elena; García-Hernández, Raquel; Martínez-Cayuela, Marina; Pérez, Juana; Søgaard-Andersen, Lotte; Muñoz-Dorado, José

    2012-04-01

    Myxococcus xanthus is widely used as a model system for studying gliding motility, multicellular development, and cellular differentiation. Moreover, M. xanthus is a rich source of novel secondary metabolites. The analysis of these processes has been hampered by the limited set of tools for inducible gene expression. Here we report the construction of a set of plasmid vectors to allow copper-inducible gene expression in M. xanthus. Analysis of the effect of copper on strain DK1622 revealed that copper concentrations of up to 500 μM during growth and 60 μM during development do not affect physiological processes such as cell viability, motility, or aggregation into fruiting bodies. Of the copper-responsive promoters in M. xanthus reported so far, the multicopper oxidase cuoA promoter was used to construct expression vectors, because no basal expression is observed in the absence of copper and induction linearly depends on the copper concentration in the culture medium. Four different plasmid vectors have been constructed, with different marker selection genes and sites of integration in the M. xanthus chromosome. The vectors have been tested and gene expression quantified using the lacZ gene. Moreover, we demonstrate the functional complementation of the motility defect caused by lack of PilB by the copper-induced expression of the pilB gene. These versatile vectors are likely to deepen our understanding of the biology of M. xanthus and may also have biotechnological applications.

  2. Repeated observation of immune gene sets enrichment in women with non-small cell lung cancer

    PubMed Central

    Araujo, Jhajaira M.; Prado, Alexandra; Cardenas, Nadezhda K.; Zaharia, Mayer; Dyer, Richard; Doimi, Franco; Bravo, Leny; Pinillos, Luis; Morante, Zaida; Aguilar, Alfredo; Mas, Luis A.; Gomez, Henry L.; Vallejos, Carlos S.; Rolfo, Christian; Pinto, Joseph A.

    2016-01-01

    There are different biological and clinical patterns of lung cancer between genders indicating intrinsic differences leading to increased sensitivity to cigarette smoke-induced DNA damage, mutational patterns of KRAS and better clinical outcomes in women, while differences between genders at gene-expression levels were not previously reported. Here we show an enrichment of immune genes in NSCLC in women compared to men. We found in a GSEA analysis (by biological processes annotated from Gene Ontology) of six public datasets a repeated observation of immune gene sets enrichment in women. “Immune system process”, “immune response”, “defense response”, “cellular defense response” and “regulation of immune system process” were the gene sets most over-represented while APOBEC3G, APOBEC3F, LAT, CD1D and CCL5 represented the top-five core genes. Characterization of immune cell composition with the platform CIBERSORT showed no differences between genders; however, there were differences when tumor tissues were compared to normal tissues. Our results suggest different immune responses in NSCLC between genders that could be related to the different clinical outcome. PMID:26958810

  3. det1, cop1, and cop9 mutations cause inappropriate expression of several gene sets.

    PubMed Central

    Mayer, R; Raventos, D; Chua, N H

    1996-01-01

    Genetic studies using Arabidopsis offer a promising approach to investigate the mechanisms of light signal transduction during seedling development. Several mutants, called det/cop, have been isolated based on their deetiolated/constitutive photomorphogenic phenotypes in the dark. This study examines the specificity of the det/cop mutations with respect to their effects on genes regulated by other signal transduction pathways. Steady state mRNA levels of a number of differently regulated gene sets were compared between mutants and the wild type. We found that det2, cop2, cop3, and cop4 mutants displayed a gene expression pattern similar to that of the wild type. By contrast, det1, cop1, and cop9 mutations exhibited pleiotropic effects. In addition to light-responsive genes, genes normally inducible by plant pathogens, hypoxia, and developmental programs were inappropriately expressed in these mutants. Our data provide evidence that DET1, COP1, and COP9 most likely act as negative regulators of several sets of genes, not just those involved in light-regulated seedling development. PMID:8953766

  4. det1, cop1, and cop9 mutations cause inappropriate expression of several gene sets.

    PubMed

    Mayer, R; Raventos, D; Chua, N H

    1996-11-01

    Genetic studies using Arabidopsis offer a promising approach to investigate the mechanisms of light signal transduction during seedling development. Several mutants, called det/cop, have been isolated based on their deetiolated/constitutive photomorphogenic phenotypes in the dark. This study examines the specificity of the det/cop mutations with respect to their effects on genes regulated by other signal transduction pathways. Steady state mRNA levels of a number of differently regulated gene sets were compared between mutants and the wild type. We found that det2, cop2, cop3, and cop4 mutants displayed a gene expression pattern similar to that of the wild type. By contrast, det1, cop1, and cop9 mutations exhibited pleiotropic effects. In addition to light-responsive genes, genes normally inducible by plant pathogens, hypoxia, and developmental programs were inappropriately expressed in these mutants. Our data provide evidence that DET1, COP1, and COP9 most likely act as negative regulators of several sets of genes, not just those involved in light-regulated seedling development.

  5. Use of the gamma method for self-contained gene-set analysis of SNP data

    PubMed Central

    Biernacka, Joanna M; Jenkins, Gregory D; Wang, Liewei; Moyer, Ann M; Fridley, Brooke L

    2012-01-01

    Gene-set analysis (GSA) evaluates the overall evidence of association between a phenotype and all genotyped single nucleotide polymorphisms (SNPs) in a set of genes, as opposed to testing for association between a phenotype and each SNP individually. We propose using the Gamma Method (GM) to combine gene-level P-values for assessing the significance of GS association. We performed simulations to compare the GM with several other self-contained GSA strategies, including both one-step and two-step GSA approaches, in a variety of scenarios. We denote a ‘one-step’ GSA approach to be one in which all SNPs in a GS are used to derive a test of GS association without consideration of gene-level effects, and a ‘two-step’ approach to be one in which all genotyped SNPs in a gene are first used to evaluate association of the phenotype with all measured variation in the gene and then the gene-level tests of association are aggregated to assess the GS association with the phenotype. The simulations suggest that, overall, two-step methods provide higher power than one-step approaches and that combining gene-level P-values using the GM with a soft truncation threshold between 0.05 and 0.20 is a powerful approach for conducting GSA, relative to the competing approaches assessed. We also applied all of the considered GSA methods to data from a pharmacogenomic study of cisplatin, and obtained evidence suggesting that the glutathione metabolism GS is associated with cisplatin drug response. PMID:22166939
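    The second step of a two-step approach, combining gene-level P-values into one gene-set P-value, can be illustrated with Fisher's method, which is commonly described as the special case of the Gamma Method with shape parameter 1. The sketch below is an illustration under stated assumptions and not the authors' implementation: it assumes independent gene-level P-values, the function name is hypothetical, and the general Gamma Method with other shape parameters would need a gamma quantile function that the Python standard library does not provide.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    Under the null, T = -sum(ln p_i) follows a Gamma(k, 1) distribution
    (equivalently, 2T is chi-square with 2k degrees of freedom).
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    t = -sum(math.log(p) for p in p_values)

    # Survival function of Gamma(k, 1) for integer k:
    #   P(T >= t) = exp(-t) * sum_{i=0}^{k-1} t**i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= t / i
        total += term
    return math.exp(-t) * total


# Hypothetical gene-level P-values for one gene set.
gene_p = [0.04, 0.20, 0.60, 0.01]
print(fisher_combined_p(gene_p))
```

    Because every gene contributes through ln p, a single very small P-value can dominate the sum; giving less weight to such extremes is one motivation for choosing a different shape parameter (soft truncation) in the Gamma Method.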

  6. A rough set based rational clustering framework for determining correlated genes.

    PubMed

    Jeyaswamidoss, Jeba Emilyn; Thangaraj, Kesavan; Ramar, Kadarkarai; Chitra, Muthusamy

    2016-06-01

    Cluster analysis plays a foremost role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed for identifying gene behaviors and to understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique, which will efficiently remove the irrelevant dimensions in a high-dimensional space and obtain appropriate meaningful clusters. This paper proposes a novel biclustering technique that is based on rough set theory. The proposed algorithm uses correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix and mean squared residue to generate the initial biclusters. Furthermore, the biclusters are refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using mean squared residue. The algorithm is illustrated with yeast gene expression data and the experiment proves the effectiveness of the method. The main advantage is that it overcomes the problem of selection of initial clusters and also the restriction of one object belonging to only one cluster by allowing overlapping of biclusters.
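    The mean squared residue used above to score biclusters can be sketched as follows, in the form commonly attributed to Cheng and Church: each entry is compared with what its row mean, column mean and overall mean would predict for a purely additive pattern. This is a minimal illustration with made-up values; whether the paper uses exactly this formulation, and how it sets the rough-set lower and upper boundaries, is not stated in the abstract.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by `rows` x `cols`.

    residue(i, j) = a_ij - row_mean_i - col_mean_j + overall_mean
    A score of 0 means the submatrix is perfectly additive (coherent).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical expression matrix: genes in rows, conditions in columns.
# Rows 0 and 1 differ by a constant shift; row 2 breaks the pattern.
expr = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [9.0, 1.0, 5.0],
]

coherent = mean_squared_residue(expr, rows=[0, 1], cols=[0, 1, 2])
noisy = mean_squared_residue(expr, rows=[0, 1, 2], cols=[0, 1, 2])
print(coherent, noisy)
```

    A low score for rows 0 and 1 and a higher score once row 2 is added shows how the measure can decide whether a gene belongs in a bicluster.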

  7. Genome-Wide Identification, Phylogenetic and Co-Expression Analysis of OsSET Gene Family in Rice

    PubMed Central

    Lu, Zhanhua; Huang, Xiaolong; Ouyang, Yidan; Yao, Jialing

    2013-01-01

    Background The SET domain is responsible for the catalytic activity of histone lysine methyltransferases (HKMTs) during developmental processes. Histone lysine methylation plays crucial and diverse regulatory roles in chromatin organization and genome function. Although several SET genes have been identified and characterized in plants, understanding of the OsSET gene family in rice is still very limited. Methodology/Principal Findings In this study, a systematic analysis revealed the presence of at least 43 SET genes in the rice genome. Phylogenetic and structural analysis grouped the SET proteins into five classes and suggested that domains outside the SET domain are important for the specificity of histone lysine methylation, as well as for the recognition of methylated histone lysine. Based on global microarray data, gene expression profiles revealed that transcripts of OsSET genes accumulated differentially during vegetative and reproductive developmental stages and were preferentially up- or down-regulated in different tissues. Cis-element identification, co-expression analysis and GO analysis of expression correlation of 12 OsSET genes suggested that OsSET genes might be involved in cell cycle regulation and feedback. Conclusions/Significance This study will facilitate further studies on the OsSET family and provide useful clues for functional validation of OsSETs. PMID:23762371

  8. Isolation and characterization of a diverse set of genes from carrot somatic embryos.

    PubMed Central

    Lin, X; Hwang, G J; Zimmerman, J L

    1996-01-01

    The early events in plant embryogenesis are critical for pattern formation, since it is during this process that the primary apical meristems and the embryo polarity axis are established. However, little is known about the molecular events that are unique to the early stages of embryogenesis. This study of gene expression during plant embryogenesis is focused on identifying molecular markers from carrot (Daucus carota) somatic embryos and characterizing the expression and regulation of these genes through embryo development. A cDNA library, prepared from polysomal mRNA of globular embryos, was screened using a subtracted probe; 49 clones were isolated and preliminarily characterized. Sequence analysis revealed a large set of genes, including many new genes, that are expressed in a variety of patterns during embryogenesis and may be regulated by different molecular mechanisms. To our knowledge, this group of clones represents the largest collection of embryo-enhanced genes isolated thus far, and demonstrates the utility of the subtracted-probe approach to the somatic embryo system. It is anticipated that many of these genes may serve as useful molecular markers for early embryo development. PMID:8938424

  9. Developmental Control of Stress Stimulons in Streptomyces coelicolor Revealed by Statistical Analyses of Global Gene Expression Patterns

    PubMed Central

    Vohradsky, J.; Li, X.-M.; Dale, G.; Folcher, M.; Nguyen, L.; Viollier, P. H.; Thompson, C. J.

    2000-01-01

    Stress-induced regulatory networks coordinated with a procaryotic developmental program were revealed by two-dimensional gel analyses of global gene expression. Four developmental stages were identified by their distinctive protein synthesis patterns using principal component analysis. Statistical analyses focused on five stress stimulons (induced by heat, cold, salt, ethanol, or antibiotic shock) and their synthesis during development. Unlike other bacteria, for which various stresses induce expression of similar sets of protein spots, in Streptomyces coelicolor heat, salt, and ethanol stimulons were composed of independent sets of proteins. This suggested independent control by different physiological stress signals and their corresponding regulatory systems. These stress proteins were also under developmental control. Cluster analysis of stress protein synthesis profiles identified 10 different developmental patterns or “synexpression groups.” Proteins induced by cold, heat, or salt shock were enriched in three developmental synexpression groups. In addition, certain proteins belonging to the heat and salt shock stimulons were coregulated during development. Thus, stress regulatory systems controlling these stimulons were implicated as integral parts of the developmental program. This correlation suggested that thermal shock and salt shock stress response regulatory systems either allow the cell to adapt to stresses associated with development or directly control the developmental program. PMID:10940043

  10. Detection of RTX toxin genes in gram-negative bacteria with a set of specific probes.

    PubMed Central

    Kuhnert, P; Heyberger-Meyer, B; Burnens, A P; Nicolet, J; Frey, J

    1997-01-01

    The family of RTX (RTX representing repeats in the structural toxin) toxins is composed of several protein toxins with a characteristic nonapeptide glycine-rich repeat motif. Most of its members were shown to have cytolytic activity. By comparing the genetic relationships of the RTX toxin genes we established a set of 10 gene probes to be used for screening as-yet-unknown RTX toxin genes in bacterial species. The probes include parts of apxIA, apxIIA, and apxIIIA from Actinobacillus pleuropneumoniae, cyaA from Bordetella pertussis, frpA from Neisseria meningitidis, prtC from Erwinia chrysanthemi, hlyA and elyA from Escherichia coli, aaltA from Actinobacillus actinomycetemcomitans and lktA from Pasteurella haemolytica. A panel of pathogenic and nonpathogenic gram-negative bacteria were investigated for the presence of RTX toxin genes. The probes detected all known genes for RTX toxins. Moreover, we found potential RTX toxin genes in several pathogenic bacterial species for which no such toxins are known yet. This indicates that RTX or RTX-like toxins are widely distributed among pathogenic gram-negative bacteria. The probes generated by PCR and the hybridization method were optimized to allow broad-range screening for RTX toxin genes in one step. This included the binding of unlabelled probes to a nylon filter and subsequent hybridization of the filter with labelled genomic DNA of the strain to be tested. The method constitutes a powerful tool for the assessment of the potential pathogenicity of poorly characterized strains intended to be used in biotechnological applications. Moreover, it is useful for the detection of already-known or new RTX toxin genes in bacteria of medical importance. PMID:9172345

  11. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    DTIC Science & Technology

    2013-08-16

    gene and gene cluster deletions. To date, we have removed approximately 234 kb from the Mycoplasma mycoides JCVI-syn1.0 genome. The resultant 844 kb...categories to make steady progress with gene and gene cluster deletions. To date, we have removed approximately 234 kb from the Mycoplasma mycoides JCVI...only the set of genes that are essential for life under ideal laboratory conditions. We are working to minimize Mycoplasma mycoides JCVI-syn1.0

  12. Expression map of a complete set of gustatory receptor genes in chemosensory organs of Bombyx mori.

    PubMed

    Guo, Huizhen; Cheng, Tingcai; Chen, Zhiwei; Jiang, Liang; Guo, Youbing; Liu, Jianqiu; Li, Shenglong; Taniai, Kiyoko; Asaoka, Kiyoshi; Kadono-Okuda, Keiko; Arunkumar, Kallare P; Wu, Jiaqi; Kishino, Hirohisa; Zhang, Huijie; Seth, Rakesh K; Gopinathan, Karumathil P; Montagné, Nicolas; Jacquin-Joly, Emmanuelle; Goldsmith, Marian R; Xia, Qingyou; Mita, Kazuei

    2017-03-01

    Most lepidopteran species are herbivores, and interaction with host plants affects their gene expression and behavior as well as their genome evolution. Gustatory receptors (Grs) are expected to mediate host plant selection, feeding, oviposition and courtship behavior. However, due to their high diversity, sequence divergence and extremely low level of expression it has been difficult to identify precisely a complete set of Grs in Lepidoptera. By manual annotation and BAC sequencing, we improved annotation of 43 gene sequences compared with previously reported Grs in the most studied lepidopteran model, the silkworm, Bombyx mori, and identified 7 new tandem copies of BmGr30 on chromosome 7, bringing the total number of BmGrs to 76. Among these, we mapped 68 genes to chromosomes in a newly constructed chromosome distribution map and 8 genes to scaffolds; we also found new evidence for large clusters of BmGrs, especially from the bitter receptor family. RNA-seq analysis of diverse BmGr expression patterns in chemosensory organs of larvae and adults enabled us to draw a precise organ specific map of BmGr expression. Interestingly, most of the clustered genes were expressed in the same tissues and more than half of the genes were expressed in larval maxillae, larval thoracic legs and adult legs. For example, BmGr63 showed high expression levels in all organs in both larval and adult stages. By contrast, some genes showed expression limited to specific developmental stages or organs and tissues. BmGr19 was highly expressed in larval chemosensory organs (especially antennae and thoracic legs), the single exon genes BmGr53 and BmGr67 were expressed exclusively in larval tissues, the BmGr27-BmGr31 gene cluster on chr7 displayed a high expression level limited to adult legs and the candidate CO2 receptor BmGr2 was highly expressed in adult antennae, where few other Grs were expressed. Transcriptional analysis of the Grs in B. mori provides a valuable new reference for

  13. Identification of the Core Set of Carbon-Associated Genes in a Bioenergy Grassland Soil

    PubMed Central

    Howe, Adina; Yang, Fan; Williams, Ryan J.; Meyer, Folker; Hofmockel, Kirsten S.

    2016-01-01

    Despite the central role of soil microbial communities in global carbon (C) cycling, little is known about soil microbial community structure and even less about their metabolic pathways. Efforts to characterize soil communities often focus on identifying differences in gene content across environmental gradients, but an alternative question is what genes are similar in soils. These genes may indicate critical species or potential functions that are required in all soils. Here we identified the “core” set of C cycling sequences widely present in multiple soil metagenomes from a fertilized prairie (FP). Of 226,887 sequences associated with known enzymes involved in the synthesis, metabolism, and transport of carbohydrates, 843 were identified to be consistently prevalent across four replicate soil metagenomes. This core metagenome was functionally and taxonomically diverse, representing five enzyme classes and 99 enzyme families within the CAZy database. Though it only comprised 0.4% of all CAZy-associated genes identified in FP metagenomes, the core was found to be comprised of functions similar to those within cumulative soils. The FP CAZy-associated core sequences were present in multiple publicly available soil metagenomes and most similar to soils sharing geographic proximity. In soil ecosystems, where high diversity remains a key challenge for metagenomic investigations, these core genes represent a subset of critical functions necessary for carbohydrate metabolism, which can be targeted to evaluate important C fluxes in these and other similar soils. PMID:27855202
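
    The notion of a "core" set shared across replicate metagenomes can be sketched as a plain set intersection. This is a minimal illustration only: it assumes "core" means "present in every replicate", whereas the study's own prevalence criteria are not given in the abstract. The function names and the toy sequence identifiers are hypothetical.

```python
def core_set(replicates):
    """Return the items found in every replicate (assumed definition of 'core')."""
    sets = [set(r) for r in replicates]
    if not sets:
        return set()
    return set.intersection(*sets)


def core_fraction(replicates):
    """Fraction of all distinct items (union over replicates) that are in the core."""
    union = set().union(*(set(r) for r in replicates))
    if not union:
        return 0.0
    return len(core_set(replicates)) / len(union)


# Hypothetical sequence identifiers observed in four replicate metagenomes.
toy_replicates = [
    {"seqA", "seqB", "seqC"},
    {"seqB", "seqC", "seqD"},
    {"seqC", "seqB"},
    {"seqB", "seqC", "seqE"},
]
toy_core = core_set(toy_replicates)
toy_fraction = core_fraction(toy_replicates)
```

    In this toy input, two of the five distinct identifiers occur in all four replicates, so the core fraction is 0.4.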

  14. gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels.

    PubMed

    Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catolona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

    2017-02-16

    Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results.

  15. Gene-set meta-analysis of lung cancer identifies pathway related to systemic lupus erythematosus

    PubMed Central

    Sohns, Melanie; Friedrichs, Stefanie; Hung, Rayjean J.; Fehringer, Gord; McLaughlin, John; Amos, Christopher I.; Brennan, Paul; Risch, Angela; Brüske, Irene; Caporaso, Neil E.; Landi, Maria Teresa; Christiani, David C.; Wei, Yongyue; Bickeböller, Heike

    2017-01-01

    Introduction: Gene-set analysis (GSA) is an approach using the results of single-marker genome-wide association studies when investigating pathways as a whole with respect to the genetic basis of a disease. Methods: We performed a meta-analysis of seven GSAs for lung cancer, applying the method META-GSA. Overall, the information taken from 11,365 cases and 22,505 controls from within the TRICL/ILCCO consortia was used to investigate a total of 234 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Results: META-GSA reveals the systemic lupus erythematosus KEGG pathway hsa05322, driven by the gene region 6p21-22, as also implicated in lung cancer (p = 0.0306). This gene region is known to be associated with squamous cell lung carcinoma. The most important genes driving the significance of this pathway belong to the genomic areas HIST1-H4L, -1BN, -2BN, -H2AK, -H4K and C2/C4A/C4B. Within these areas, the markers most significantly associated with LC are rs13194781 (located within HIST12BN) and rs1270942 (located between C2 and C4A). Conclusions: We have discovered a pathway currently marked as specific to systemic lupus erythematosus as being significantly implicated in lung cancer. The gene region 6p21-22 in this pathway appears to be more extensively associated with lung cancer than previously assumed. Given wide-stretched linkage disequilibrium to the area APOM/BAG6/MSH5, there is currently simply not enough information or evidence to conclude whether the potential pleiotropy of lung cancer and systemic lupus erythematosus is spurious, biological, or mediated. Further research into this pathway and gene region will be necessary. PMID:28273134

  16. Transcriptomic Analysis Identifies Candidate Genes and Gene Sets Controlling the Response of Porcine Peripheral Blood Mononuclear Cells to Poly I:C Stimulation.

    PubMed

    Wang, Jiying; Wang, Yanping; Wang, Huaizhong; Wang, Haifei; Liu, Jian-Feng; Wu, Ying; Guo, Jianfeng

    2016-05-03

    Polyinosinic-polycytidylic acid (poly I:C), a synthetic dsRNA analog, has been demonstrated to have stimulatory effects similar to viral dsRNA. To gain deep knowledge of the host transcriptional response of pigs to poly I:C stimulation, in the present study, we cultured and stimulated peripheral blood mononuclear cells (PBMC) of piglets of one Chinese indigenous breed (Dapulian) and one modern commercial breed (Landrace) with poly I:C, and compared their transcriptional profiles using RNA-sequencing (RNA-seq). Our results indicated that poly I:C stimulation can elicit significantly differentially expressed (DE) genes in Dapulian (g = 290) as well as Landrace (g = 85). We also performed gene set analysis using the Gene Set Enrichment Analysis (GSEA) package, and identified some significantly enriched gene sets in Dapulian (g = 18) and Landrace (g = 21). Most of the shared DE genes and gene sets were immune-related, and may play crucial roles in the immune response to poly I:C stimulation. In addition, we detected large sets of significantly DE genes and enriched gene sets when comparing the gene expression profile between the two breeds, including control and poly I:C stimulation groups. Besides immune-related functions, some of the DE genes and gene sets between the two breeds were involved in development and growth of various tissues, which may be correlated with the different characteristics of the two breeds. The DE genes and gene sets detected herein provide crucial information towards understanding the immune regulation of antiviral responses, and the molecular mechanisms of different genetic resistance to viral infection, in modern and indigenous pigs.

  17. Transcriptomic Analysis Identifies Candidate Genes and Gene Sets Controlling the Response of Porcine Peripheral Blood Mononuclear Cells to Poly I:C Stimulation

    PubMed Central

    Wang, Jiying; Wang, Yanping; Wang, Huaizhong; Wang, Haifei; Liu, Jian-Feng; Wu, Ying; Guo, Jianfeng

    2016-01-01

    Polyinosinic-polycytidylic acid (poly I:C), a synthetic dsRNA analog, has been demonstrated to have stimulatory effects similar to viral dsRNA. To gain deep knowledge of the host transcriptional response of pigs to poly I:C stimulation, in the present study, we cultured and stimulated peripheral blood mononuclear cells (PBMC) of piglets of one Chinese indigenous breed (Dapulian) and one modern commercial breed (Landrace) with poly I:C, and compared their transcriptional profiles using RNA-sequencing (RNA-seq). Our results indicated that poly I:C stimulation can elicit significantly differentially expressed (DE) genes in Dapulian (g = 290) as well as Landrace (g = 85). We also performed gene set analysis using the Gene Set Enrichment Analysis (GSEA) package, and identified some significantly enriched gene sets in Dapulian (g = 18) and Landrace (g = 21). Most of the shared DE genes and gene sets were immune-related, and may play crucial roles in the immune response to poly I:C stimulation. In addition, we detected large sets of significantly DE genes and enriched gene sets when comparing the gene expression profile between the two breeds, including control and poly I:C stimulation groups. Besides immune-related functions, some of the DE genes and gene sets between the two breeds were involved in development and growth of various tissues, which may be correlated with the different characteristics of the two breeds. The DE genes and gene sets detected herein provide crucial information towards understanding the immune regulation of antiviral responses, and the molecular mechanisms of different genetic resistance to viral infection, in modern and indigenous pigs. PMID:26935416

  18. Functional classification of genes using semantic distance and fuzzy clustering approach: evaluation with reference sets and overlap analysis.

    PubMed

    Devignes, Marie-Dominique; Benabderrahmane, Sidahmed; Smaïl-Tabbone, Malika; Napoli, Amedeo; Poch, Olivier

    2012-01-01

    Functional classification aims at grouping genes according to their molecular function or the biological process they participate in. Evaluating the validity of such unsupervised gene classification remains a challenge given the variety of distance measures and classification algorithms that can be used. We evaluate here functional classification of genes with the help of reference sets: KEGG (Kyoto Encyclopaedia of Genes and Genomes) pathways and Pfam clans. These sets represent ground truth for any distance based on GO (Gene Ontology) biological process and molecular function annotations respectively. Overlaps between clusters and reference sets are estimated by the F-score method. We test our previously described IntelliGO semantic distance with hierarchical and fuzzy C-means clustering and we compare results with the state-of-the-art DAVID (Database for Annotation Visualisation and Integrated Discovery) functional classification method. Finally, study of best matching clusters to reference sets leads us to propose a set-difference method for discovering missing information.
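
    The overlap between a cluster and a reference set, as scored by the F-score, can be sketched as follows. This is a minimal illustration assuming the standard definition (harmonic mean of precision and recall over set membership); the abstract does not state which variant the authors used. The helper names and the toy gene identifiers are hypothetical.

```python
def f_score(cluster, reference):
    """Harmonic mean of precision and recall of a cluster against a reference set."""
    cluster, reference = set(cluster), set(reference)
    overlap = len(cluster & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)    # share of the cluster that is in the reference
    recall = overlap / len(reference)     # share of the reference that the cluster recovers
    return 2 * precision * recall / (precision + recall)


def best_matches(clusters, references):
    """Map each reference name to its best-matching (cluster name, F-score)."""
    result = {}
    for ref_name, ref_genes in references.items():
        scored = ((name, f_score(genes, ref_genes)) for name, genes in clusters.items())
        result[ref_name] = max(scored, key=lambda pair: pair[1])
    return result


# Hypothetical clusters and one hypothetical reference pathway.
toy_clusters = {"c1": {"g1", "g2", "g3"}, "c2": {"g4", "g5"}}
toy_references = {"pathA": {"g2", "g3", "g4", "g5"}}
toy_best = best_matches(toy_clusters, toy_references)
```

    Here cluster c2 lies entirely inside pathA but recovers only half of it (F = 2/3), which still beats c1 (F = 4/7). Note that `best_matches` expects at least one cluster.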

  19. Feature selection in gene expression data using principal component analysis and rough set theory.

    PubMed

    Mishra, Debahuti; Dash, Rajashree; Rath, Amiya Kumar; Acharya, Milu

    2011-01-01

    In many fields such as data mining, machine learning, pattern recognition and signal processing, data sets containing huge number of features are often involved. Feature selection is an essential data preprocessing technique for such high-dimensional data classification tasks. Traditional dimensionality reduction approach falls into two categories: Feature Extraction (FE) and Feature Selection (FS). Principal component analysis is an unsupervised linear FE method for projecting high-dimensional data into a low-dimensional space with minimum loss of information. It discovers the directions of maximal variances in the data. The Rough set approach to feature selection is used to discover the data dependencies and reduction in the number of attributes contained in a data set using the data alone, requiring no additional information. For selecting discriminative features from principal components, the Rough set theory can be applied jointly with PCA, which guarantees that the selected principal components will be the most adequate for classification. We call this method Rough PCA. The proposed method is successfully applied for choosing the principal features and then applying the Upper and Lower Approximations to find the reduced set of features from a gene expression data.
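
    The statement that principal component analysis finds the directions of maximal variance can be illustrated with a short sketch. It covers only the PCA step, not the rough-set reduction, and it is not the authors' implementation: it assumes power iteration on the sample covariance matrix, which recovers only the first component and needs a start vector that is not orthogonal to it. The function name and toy data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalise.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Variance of the data projected onto v (the Rayleigh quotient v^T C v).
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


# Toy data lying exactly on the line y = 2x, so all variance is along (1, 2).
toy_rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
toy_direction, toy_variance = first_principal_component(toy_rows)
```

    For the toy data the recovered direction is proportional to (1, 2), and the variance along it equals the total variance of the data.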

  20. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify a special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy is used to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special type of rules is then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on large datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data

  1. Primer Sets Developed To Amplify Conserved Genes from Filamentous Ascomycetes Are Useful in Differentiating Fusarium Species Associated with Conifers

    PubMed Central

    Donaldson, G. C.; Ball, L. A.; Axelrood, P. E.; Glass, N. L.

    1995-01-01

    We examined the usefulness of primer sets designed to amplify introns within conserved genes in filamentous ascomycetes to differentiate 35 isolates representing six different species of Fusarium commonly found in association with conifer seedlings. We analyzed restriction fragment length polymorphisms (RFLP) in five amplified PCR products from each Fusarium isolate. The primers used in this study were constructed on the basis of sequence information from the H3, H4, and β-tubulin genes in Neurospora crassa. Primers previously developed for the intergenic transcribed spacer region of the ribosomal DNA were also used. The degree of interspecific polymorphism observed in the PCR products from the six Fusarium species allowed differentiation by a limited number of amplifications and restriction endonuclease digestions. The level of intraspecific RFLP variation in the five PCR products was low in both Fusarium proliferatum and F. avenaceum but was high in a population sample of F. oxysporum isolates. Clustering of the 35 isolates by statistical analyses gave similar dendrograms for H3, H4, and β-tubulin RFLP analysis, but a dendrogram produced by intergenic transcribed spacer analysis varied in the placement of some F. oxysporum isolates. PMID:16534991

  2. Gene set based association analyses for the WSSV resistance of Pacific white shrimp Litopenaeus vannamei.

    PubMed

    Yu, Yang; Liu, Jingwen; Li, Fuhua; Zhang, Xiaojun; Zhang, Chengsong; Xiang, Jianhai

    2017-01-17

    White Spot Syndrome Virus (WSSV) is regarded as the virus with the strongest pathogenicity to shrimp. For threshold traits such as disease resistance, marker-assisted selection (MAS) was considered to be a more effective approach. In the present study, association analyses of single nucleotide polymorphisms (SNPs) located in a set of immune-related genes were conducted to identify markers associated with WSSV resistance. SNPs were detected by bioinformatics analysis of RNA sequencing data generated by the Illumina sequencing platform and Roche 454 sequencing technology. A total of 681 SNPs located in the exons of immune-related genes were selected as candidate SNPs. Among these SNPs, 77 loci were genotyped in a WSSV-susceptible group and a resistant group. Association analysis was performed based on the logistic regression method under an additive and dominance model in the GenABEL package. As a result, five SNPs showed associations with WSSV resistance at a significance level of 0.05. In addition, SNP-SNP interaction analysis was conducted. The combination of SNP loci in TRAF6, Cu/Zn SOD and nLvALF2 exhibited a significant effect on the WSSV resistance of shrimp. Gene expression analysis revealed that these SNPs might influence the expression of these immune-related genes. This study provides a useful method for performing MAS in shrimp.

  3. A Minimal Set of Glycolytic Genes Reveals Strong Redundancies in Saccharomyces cerevisiae Central Metabolism.

    PubMed

    Solis-Escalante, Daniel; Kuijpers, Niels G A; Barrajon-Simancas, Nuria; van den Broek, Marcel; Pronk, Jack T; Daran, Jean-Marc; Daran-Lapujade, Pascale

    2015-08-01

    As a result of ancestral whole-genome and small-scale duplication events, the genomes of Saccharomyces cerevisiae and many eukaryotes still contain a substantial fraction of duplicated genes. In all investigated organisms, metabolic pathways, and more particularly glycolysis, are specifically enriched for functionally redundant paralogs. In ancestors of the Saccharomyces lineage, the duplication of glycolytic genes is purported to have played an important role leading to S. cerevisiae's current lifestyle favoring fermentative metabolism even in the presence of oxygen and characterized by a high glycolytic capacity. In modern S. cerevisiae strains, the 12 glycolytic reactions leading to the biochemical conversion from glucose to ethanol are encoded by 27 paralogs. In order to experimentally explore the physiological role of this genetic redundancy, a yeast strain with a minimal set of 14 paralogs was constructed (the "minimal glycolysis" [MG] strain). Remarkably, a combination of a quantitative systems approach and semiquantitative analysis in a wide array of growth environments revealed the absence of a phenotypic response to the cumulative deletion of 13 glycolytic paralogs. This observation indicates that duplication of glycolytic genes is not a prerequisite for achieving the high glycolytic fluxes and fermentative capacities that are characteristic of S. cerevisiae and essential for many of its industrial applications and argues against gene dosage effects as a means of fixing minor glycolytic paralogs in the yeast genome. The MG strain was carefully designed and constructed to provide a robust prototrophic platform for quantitative studies and has been made available to the scientific community.

  4. Gene set based association analyses for the WSSV resistance of Pacific white shrimp Litopenaeus vannamei

    PubMed Central

    Yu, Yang; Liu, Jingwen; Li, Fuhua; Zhang, Xiaojun; Zhang, Chengsong; Xiang, Jianhai

    2017-01-01

    White Spot Syndrome Virus (WSSV) is regarded as the virus with the strongest pathogenicity to shrimp. For threshold traits such as disease resistance, marker-assisted selection (MAS) was considered to be a more effective approach. In the present study, association analyses of single nucleotide polymorphisms (SNPs) located in a set of immune-related genes were conducted to identify markers associated with WSSV resistance. SNPs were detected by bioinformatics analysis of RNA sequencing data generated by the Illumina sequencing platform and Roche 454 sequencing technology. A total of 681 SNPs located in the exons of immune-related genes were selected as candidate SNPs. Among these SNPs, 77 loci were genotyped in a WSSV-susceptible group and a resistant group. Association analysis was performed based on the logistic regression method under an additive and dominance model in the GenABEL package. As a result, five SNPs showed associations with WSSV resistance at a significance level of 0.05. In addition, SNP-SNP interaction analysis was conducted. The combination of SNP loci in TRAF6, Cu/Zn SOD and nLvALF2 exhibited a significant effect on the WSSV resistance of shrimp. Gene expression analysis revealed that these SNPs might influence the expression of these immune-related genes. This study provides a useful method for performing MAS in shrimp. PMID:28094323

  5. Globularity and language-readiness: generating new predictions by expanding the set of genes of interest

    PubMed Central

    Boeckx, Cedric; Benítez-Burraco, Antonio

    2014-01-01

    This study builds on the hypothesis put forth in Boeckx and Benítez-Burraco (2014), according to which the developmental changes expressed at the levels of brain morphology and neural connectivity that resulted in a more globular braincase in our species were crucial to understand the origins of our language-ready brain. Specifically, this paper explores the links between two well-known ‘language-related’ genes like FOXP2 and ROBO1 implicated in vocal learning and the initial set of genes of interest put forth in Boeckx and Benítez-Burraco (2014), with RUNX2 as focal point. Relying on the existing literature, we uncover potential molecular links that could be of interest to future experimental inquiries into the biological foundations of language and the testing of our initial hypothesis. Our discussion could also be relevant for clinical linguistics and for the interpretation of results from paleogenomics. PMID:25505436

  6. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer

    SciTech Connect

    Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Highlights: • The identified stomach lineage specific gene set (SLSGS) was found to be under expressed in gastric tumors. • Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer. • In silico pathway scanning identified estrogen-α signaling as a putative regulator of SLSGS in gastric cancer. • Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. -- Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.

  7. Gene-Environment Interaction in the Etiology of Mathematical Ability Using SNP Sets

    PubMed Central

    Kovas, Yulia; Plomin, Robert

    2010-01-01

    Mathematics ability and disability are as heritable as other cognitive abilities and disabilities; however, their genetic etiology has received relatively little attention. In our recent genome-wide association study of mathematical ability in 10-year-old children, 10 SNP associations were nominated from scans of pooled DNA and validated in an individually genotyped sample. In this paper, we use a ‘SNP set’ composite of these 10 SNPs to investigate gene-environment (GE) interaction, examining whether the association between the 10-SNP set and mathematical ability differs as a function of ten environmental measures in the home and school in a sample of 1888 children with complete data. We found two significant GE interactions for environmental measures in the home and the school, both in the direction of the diathesis-stress type of GE interaction: the 10-SNP set was more strongly associated with mathematical ability in chaotic homes and when parents are negative. PMID:20978832

  8. Identification of prognostic genes and gene sets for early-stage non-small cell lung cancer using bi-level selection methods

    PubMed Central

    Tian, Suyan; Wang, Chi; Chang, Howard H.; Sun, Jianguo

    2017-01-01

    In contrast to feature selection and gene set analysis, bi-level selection is a process of selecting not only important gene sets but also important genes within those gene sets. Depending on the order of selections, a bi-level selection method can be classified into one of three categories: forward selection, which first selects relevant gene sets and then relevant individual genes; backward selection, which takes the reverse order; and simultaneous selection, which performs the two tasks simultaneously, usually with the aid of a penalized regression model. To test the existence of subtype-specific prognostic genes for non-small cell lung cancer (NSCLC), we had previously proposed the Cox-filter method, which examines the association between patients’ survival time after diagnosis and one specific gene, the disease subtypes, and their interaction terms. In this study, we further extend it to carry out forward and backward bi-level selection. Using simulations and an NSCLC application, we demonstrate that the forward selection outperforms the backward selection and other relevant algorithms in our setting. Both proposed methods are readily understandable and interpretable. Therefore, they represent useful tools for researchers who are interested in exploring the prognostic value of gene expression data for specific subtypes or stages of a disease. PMID:28387364

  9. Gene Selection Integrated with Biological Knowledge for Plant Stress Response Using Neighborhood System and Rough Set Theory.

    PubMed

    Meng, Jun; Zhang, Jing; Luan, Yushi

    2015-01-01

    Mining knowledge from gene expression data is a hot research topic and direction in bioinformatics. Gene selection and sample classification are significant research trends, due to the large number of genes and small number of samples in gene expression data. Rough set theory has been successfully applied to gene selection, as it can select attributes without redundancy. To improve the interpretability of the selected genes, some researchers introduced biological knowledge. In this paper, we first employ a neighborhood system to deal directly with the new information table formed by integrating gene expression data with biological knowledge, which can simultaneously present the information from multiple perspectives and does not weaken the information of individual genes for selection and classification. Then, we give a novel framework for gene selection and propose a significant gene selection method based on this framework by employing a reduction algorithm from rough set theory. The proposed method is applied to the analysis of plant stress response. Experimental results on three data sets show that the proposed method is effective, as it can select significant gene subsets without redundancy and achieve high classification accuracy. Biological analysis of the results shows that the interpretability is good.

  10. Schizophrenia-Associated MIR204 Regulates Noncoding RNAs and Affects Neurotransmitter and Ion Channel Gene Sets

    PubMed Central

    Cammaerts, Sophia; Strazisar, Mojca; Smets, Bart; Weckhuysen, Sarah; Nordin, Annelie; De Jonghe, Peter; Adolfsson, Rolf; De Rijk, Peter; Del Favero, Jurgen

    2015-01-01

    As regulators of gene expression, microRNAs (miRNAs) are likely to play an important role in the development of disease. In this study we present a large-scale strategy to identify miRNAs with a role in the regulation of neuronal processes. Thereby we found variant rs7861254 located near the MIR204 gene to be significantly associated with schizophrenia. This variant resulted in reduced expression of miR-204 in neuronal-like SH-SY5Y cells. Analysis of the consequences of the altered miR-204 expression on the transcriptome of these cells uncovered a new mode of action for miR-204, being the regulation of noncoding RNAs (ncRNAs), including several miRNAs, such as MIR296. Furthermore, pathway analysis showed downstream effects of miR-204 on neurotransmitter and ion channel related gene sets, potentially mediated by miRNAs regulated through miR-204. PMID:26714269

  11. A modular framework for gene set analysis integrating multilevel omics data

    PubMed Central

    Sass, Steffen; Buettner, Florian; Mueller, Nikola S.; Theis, Fabian J.

    2013-01-01

    Modern high-throughput methods allow the investigation of biological functions across multiple ‘omics’ levels. Levels include mRNA and protein expression profiling as well as additional knowledge on, for example, DNA methylation and microRNA regulation. The reason for this interest in multi-omics is that actual cellular responses to different conditions are best explained mechanistically when taking all omics levels into account. To map gene products to their biological functions, public ontologies like Gene Ontology are commonly used. Many methods have been developed to identify terms in an ontology, overrepresented within a set of genes. However, these methods are not able to appropriately deal with any combination of several data types. Here, we propose a new method to analyse integrated data across multiple omics-levels to simultaneously assess their biological meaning. We developed a model-based Bayesian method for inferring interpretable term probabilities in a modular framework. Our Multi-level ONtology Analysis (MONA) algorithm performed significantly better than conventional analyses of individual levels and yields best results even for sophisticated models including mRNA fine-tuning by microRNAs. The MONA framework is flexible enough to allow for different underlying regulatory motifs or ontologies. It is ready-to-use for applied researchers and is available as a standalone application from http://icb.helmholtz-muenchen.de/mona. PMID:23975194

  12. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer.

    PubMed

    Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.

  13. Detecting Cooperativity between Transcription Factors Based on Functional Coherence and Similarity of Their Target Gene Sets

    PubMed Central

    Wu, Wei-Sheng; Lai, Fu-Jou

    2016-01-01

    In eukaryotic cells, transcriptional regulation of gene expression is usually achieved by cooperative transcription factors (TFs). Therefore, knowing cooperative TFs is the first step toward uncovering the molecular mechanisms of gene expression regulation. Many algorithms based on different rationales have been proposed to predict cooperative TF pairs in yeast. Although various types of rationales have been used in the existing algorithms, functional coherence has not yet been used. This prompts us to develop a new algorithm based on functional coherence and similarity of the target gene sets to identify cooperative TF pairs in yeast. The proposed algorithm predicted 40 cooperative TF pairs. Among them, three (Pdc2-Thi2, Hot1-Msn1 and Leu3-Met28) are novel predictions, which have not been predicted by any existing algorithms. Strikingly, two (Pdc2-Thi2 and Hot1-Msn1) of the three novel predictions have been experimentally validated, demonstrating the power of the proposed algorithm. Moreover, we show that the predictions of the proposed algorithm are more biologically meaningful than the predictions of 17 existing algorithms under four evaluation indices. In summary, our study suggests that new algorithms based on novel rationales are worth developing for detecting previously unidentifiable cooperative TF pairs. PMID:27623007

  14. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  15. Two Novel Sets of Genes Essential for Nicotine Degradation by Sphingomonas melonis TY.

    PubMed

    Wang, Haixia; Xie, Cuixiao; Zhu, Panpan; Zhou, Ning-Yi; Lu, Zhenmei

    2016-01-01

    Nicotine is a type of environmental pollutant present in the tobacco waste that is generated during tobacco manufacturing. Sphingomonas melonis TY can utilize nicotine as a sole source of carbon, nitrogen and energy via a variant of the pyridine and pyrrolidine pathway (the VPP pathway). In this study, we report the identification of two novel sets of genes, ndrA1A2A3, and ndrB1B2B3B4, which are crucial for nicotine degradation by strain TY. ndrA1A2A3 and ndrB1B2B3B4 exhibit similarity with both nicotine dehydrogenase ndh from Arthrobacter nicotinovorans and nicotine hydroxylase vppA from Ochrobactrum sp. SJY1. The transcriptional levels of ndrA1A2A3 and ndrB1B2B3B4 in strain TY were significantly upregulated in the presence of nicotine. Furthermore, ndrA1 or ndrB2 knockout resulted in a loss of the ability to degrade nicotine, whereas gene complementation restored the capacity of each mutant to utilize nicotine for growth. Biodegradation assays indicated that the mutant strains retained the ability to degrade the first intermediate in the pathway, 6-hydroxynicotine (6 HN). However, heterologous expression of ndrA1A2A3 and ndrB1B2B3B4 did not confer nicotine dehydrogenase activity to E. coli DH5α, Pseudomonas putida KT2440 or Sphingomonas aquatilis. These results provide information on the VPP pathway of nicotine degradation in S. melonis TY, and we conclude that these two sets of genes have essential functions in the conversion of nicotine to 6 HN in strain TY.

  16. Two Novel Sets of Genes Essential for Nicotine Degradation by Sphingomonas melonis TY

    PubMed Central

    Wang, Haixia; Xie, Cuixiao; Zhu, Panpan; Zhou, Ning-Yi; Lu, Zhenmei

    2017-01-01

    Nicotine is a type of environmental pollutant present in the tobacco waste that is generated during tobacco manufacturing. Sphingomonas melonis TY can utilize nicotine as a sole source of carbon, nitrogen and energy via a variant of the pyridine and pyrrolidine pathway (the VPP pathway). In this study, we report the identification of two novel sets of genes, ndrA1A2A3, and ndrB1B2B3B4, which are crucial for nicotine degradation by strain TY. ndrA1A2A3 and ndrB1B2B3B4 exhibit similarity with both nicotine dehydrogenase ndh from Arthrobacter nicotinovorans and nicotine hydroxylase vppA from Ochrobactrum sp. SJY1. The transcriptional levels of ndrA1A2A3 and ndrB1B2B3B4 in strain TY were significantly upregulated in the presence of nicotine. Furthermore, ndrA1 or ndrB2 knockout resulted in a loss of the ability to degrade nicotine, whereas gene complementation restored the capacity of each mutant to utilize nicotine for growth. Biodegradation assays indicated that the mutant strains retained the ability to degrade the first intermediate in the pathway, 6-hydroxynicotine (6 HN). However, heterologous expression of ndrA1A2A3 and ndrB1B2B3B4 did not confer nicotine dehydrogenase activity to E. coli DH5α, Pseudomonas putida KT2440 or Sphingomonas aquatilis. These results provide information on the VPP pathway of nicotine degradation in S. melonis TY, and we conclude that these two sets of genes have essential functions in the conversion of nicotine to 6 HN in strain TY. PMID:28144232

  17. Statistical Analysis of a Large Sample Size Pyroshock Test Data Set Including Post Flight Data Assessment. Revision 1

    NASA Technical Reports Server (NTRS)

    Hughes, William O.; McNelis, Anne M.

    2010-01-01

    The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.

  18. Statistical analysis of the effective factors on the 28 days compressive strength and setting time of the concrete

    PubMed Central

    Abolpour, Bahador; Mehdi Afsahi, Mohammad; Hosseini, Saeed Gharib

    2014-01-01

    In this study, the effects of various factors (weight fractions of SiO2, Al2O3, Fe2O3, Na2O, K2O, CaO, MgO, Cl and SO3, and the Blaine fineness of the cement particles) on the concrete compressive strength and initial setting time have been investigated. Compressive strength and setting time tests have been carried out based on DIN standards. Interactions among these factors have been obtained by analysis of variance, and regression equations have been derived to predict the concrete compressive strength and initial setting time. Also, simple and applicable formulas with less than 6% absolute mean error have been developed using the genetic algorithm to predict these parameters. Finally, the effect of each factor has been investigated when the other factors are at their low or high level. PMID:26425360

  19. [Comparison of several Russian populations by vital statistics and frequency of genes, causing hereditary diseases].

    PubMed

    El'chinova, G I; Mamedova, R A; Brusintseva, O V; Ginter, E K

    1994-11-01

    Distances computed from vital statistics using the Euclidean distance formula, and thus termed "vital", are proposed for use in population studies. An example of the use of these statistics to compare four large, geographically separated Russian populations is given.
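
    As a minimal sketch of the kind of distance described, the snippet below computes pairwise Euclidean distances between populations represented as vectors of vital statistics. The abstract does not say which statistics were used or whether they were rescaled, so the function name and the vector layout are assumptions.

```python
from itertools import combinations
from math import dist


def pairwise_vital_distances(populations):
    """Return {(name_a, name_b): Euclidean distance} for every pair.

    `populations` maps a population name to a tuple of vital statistics
    (a hypothetical layout; all vectors must have the same length).
    """
    return {
        (a, b): dist(populations[a], populations[b])
        for a, b in combinations(sorted(populations), 2)
    }
```

    Because Euclidean distance is sensitive to units, statistics measured on different scales would normally be standardised first; whether the authors did so is not stated.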

  20. TPMS: a set of utilities for querying collections of gene trees

    PubMed Central

    2013-01-01

    Background The information in large collections of phylogenetic trees is useful for many comparative genomic studies. Therefore, there is a need for flexible tools that allow exploration of such collections in order to retrieve relevant data as quickly as possible. Results In this paper, we present TPMS (Tree Pattern-Matching Suite), a set of programs for handling and retrieving gene trees according to different criteria. The programs from the suite include utilities for tree collection building, specific tree-pattern search strategies and tree rooting. Use of TPMS is illustrated through three examples: systematic search for incongruencies in a large tree collection, a short study on the Coelomata/Ecdysozoa controversy and an evaluation of the level of support for a recently published Mammal phylogeny. Conclusion TPMS is a powerful suite that allows quick retrieval of sets of trees matching complex patterns in large collections, and rooting of trees using more rigorous approaches than the classical midpoint method. As it is made of a set of command-line programs, it can be easily integrated into any sequence analysis pipeline for automated use. PMID:23530580

  1. An ancient dental gene set governs development and continuous regeneration of teeth in sharks.

    PubMed

    Rasch, Liam J; Martin, Kyle J; Cooper, Rory L; Metscher, Brian D; Underwood, Charlie J; Fraser, Gareth J

    2016-07-15

    The evolution of oral teeth is considered a major contributor to the overall success of jawed vertebrates. This is especially apparent in cartilaginous fishes including sharks and rays, which develop elaborate arrays of highly specialized teeth, organized in rows and retain the capacity for life-long regeneration. Perpetual regeneration of oral teeth has been either lost or highly reduced in many other lineages including important developmental model species, so cartilaginous fishes are uniquely suited for deep comparative analyses of tooth development and regeneration. Additionally, sharks and rays can offer crucial insights into the characters of the dentition in the ancestor of all jawed vertebrates. Despite this, tooth development and regeneration in chondrichthyans is poorly understood and remains virtually uncharacterized from a developmental genetic standpoint. Using the emerging chondrichthyan model, the catshark (Scyliorhinus spp.), we characterized the expression of genes homologous to those known to be expressed during stages of early dental competence, tooth initiation, morphogenesis, and regeneration in bony vertebrates. We have found that expression patterns of several genes from Hh, Wnt/β-catenin, Bmp and Fgf signalling pathways indicate deep conservation over ~450 million years of tooth development and regeneration. We describe how these genes participate in the initial emergence of the shark dentition and how they are redeployed during regeneration of successive tooth generations. We suggest that at the dawn of the vertebrate lineage, teeth (i) were most likely continuously regenerative structures, and (ii) utilised a core set of genes from members of key developmental signalling pathways that were instrumental in creating a dental legacy redeployed throughout vertebrate evolution. These data lay the foundation for further experimental investigations utilizing the unique regenerative capacity of chondrichthyan models to answer evolutionary

  2. Transcriptional Shift Identifies a Set of Genes Driving Breast Cancer Chemoresistance

    PubMed Central

    Vera-Ramirez, Laura; Sanchez-Rovira, Pedro; Ramirez-Tortosa, Cesar L.; Quiles, Jose L.; Ramirez-Tortosa, MCarmen; Lorente, Jose A.

    2013-01-01

    Background Distant recurrences after antineoplastic treatment remain a serious problem for breast cancer clinical management and threaten patients’ lives. Systemic therapy is administered to eradicate cancer cells from the organism, both at the site of the primary tumor and at any other potential location. Despite this intervention, a significant proportion of breast cancer patients relapse even many years after their primary tumor has been successfully treated according to current clinical standards, evidencing the existence of a chemoresistant cell subpopulation originating from the primary tumor. Methods/Findings To identify key molecules and signaling pathways which drive breast cancer chemoresistance we performed gene expression analysis before and after anthracycline and taxane-based chemotherapy and compared the results between different histopathological response groups (good-, mid- and bad-response), established according to the Miller & Payne grading system. Two cohorts of 33 and 73 breast cancer patients receiving neoadjuvant chemotherapy were recruited for whole-genome expression analysis and validation assay, respectively. Identified genes were subjected to a bioinformatic analysis in order to ascertain the molecular function of the proteins they encode and the signaling in which they participate. High throughput technologies identified 65 gene sequences which were over-expressed in all groups (P ≤ 0.05, Bonferroni test). Notably we found that, after chemotherapy, a significant proportion of these genes were over-expressed in the good responders group, making their tumors indistinguishable from those of the bad responders in their expression profile (P ≤ 0.05, Benjamini-Hochberg’s method). Conclusions These data identify a set of key molecular pathways selectively up-regulated in post-chemotherapy cancer cells, which may become appropriate targets for the development of future directed therapies against breast cancer. PMID:23326553
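
    The two multiple-testing corrections named in this abstract, Bonferroni and Benjamini-Hochberg, differ in how strict they are. The sketch below shows the generic textbook procedures at a 0.05 level; it is not the authors' analysis pipeline, and the function names are illustrative assumptions.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at alpha."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected
```

    Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate and typically rejects more hypotheses on the same p-values.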

  3. Statistical Epistasis and Progressive Brain Change in Schizophrenia: An Approach for Examining the Relationships Between Multiple Genes

    PubMed Central

    Andreasen, Nancy C.; Wilcox, Marsha A.; Ho, Beng-Choon; Epping, Eric; Ziebell, Steven; Zeien, Eugene; Weiss, Brett; Wassink, Thomas

    2011-01-01

    Although schizophrenia is generally considered to occur as a consequence of multiple genes that interact with one another, very few methods have been developed to model epistasis. Phenotype definition has also been a major challenge for research on the genetics of schizophrenia. In this report we use novel statistical techniques to address the high dimensionality of genomic data, and we apply a refinement in phenotype definition by basing it on the occurrence of brain changes during the early course of the illness, as measured by repeated MR scans (i.e., an “intermediate phenotype”). The method combines a machine learning algorithm, the ensemble method using stochastic gradient boosting, with traditional general linear model statistics. We began with fourteen genes that are relevant to schizophrenia based on association studies or their role in neurodevelopment and then used statistical techniques to reduce them to five genes and 17 SNPs that had a significant statistical interaction: 5 for PDE4B, 4 for RELN, 4 for ERBB4, 3 for DISC1, and one for NRG1. Five of the SNPs involved in these interactions replicate previous research, in that these five SNPs have previously been identified as schizophrenia vulnerability markers or implicate cognitive processes relevant to schizophrenia. This ability to replicate previous work suggests that our method has potential for detecting meaningful epistatic relationships among the genes that influence brain abnormalities in schizophrenia. PMID:21876540

  4. Statistical epistasis and progressive brain change in schizophrenia: an approach for examining the relationships between multiple genes.

    PubMed

    Andreasen, N C; Wilcox, M A; Ho, B-C; Epping, E; Ziebell, S; Zeien, E; Weiss, B; Wassink, T

    2012-11-01

    Although schizophrenia is generally considered to occur as a consequence of multiple genes that interact with one another, very few methods have been developed to model epistasis. Phenotype definition has also been a major challenge for research on the genetics of schizophrenia. In this report, we use novel statistical techniques to address the high dimensionality of genomic data, and we apply a refinement in phenotype definition by basing it on the occurrence of brain changes during the early course of the illness, as measured by repeated magnetic resonance scans (i.e., an 'intermediate phenotype'). The method combines a machine-learning algorithm, the ensemble method using stochastic gradient boosting, with traditional general linear model statistics. We began with 14 genes that are relevant to schizophrenia, based on association studies or their role in neurodevelopment, and then used statistical techniques to reduce them to five genes and 17 single nucleotide polymorphisms (SNPs) that had a significant statistical interaction: five for PDE4B, four for RELN, four for ERBB4, three for DISC1 and one for NRG1. Five of the SNPs involved in these interactions replicate previous research, in that these five SNPs have previously been identified as schizophrenia vulnerability markers or implicate cognitive processes relevant to schizophrenia. This ability to replicate previous work suggests that our method has potential for detecting a meaningful epistatic relationship among the genes that influence brain abnormalities in schizophrenia.

  5. Linking Hematopoietic Differentiation to Co-Expressed Sets of Pluripotency-Associated and Imprinted Genes and to Regulatory microRNA-Transcription Factor Motifs

    PubMed Central

    Hamed, Mohamed; Trumm, Johannes; Spaniol, Christian; Sethi, Riccha; Irhimeh, Mohammad R.; Fuellen, Georg; Paulsen, Martina

    2017-01-01

    Maintenance of cell pluripotency, differentiation, and reprogramming are regulated by complex gene regulatory networks (GRNs) including monoallelically-expressed imprinted genes. Besides transcriptional control, epigenetic modifications and microRNAs contribute to cellular differentiation. As a model system for studying the capacity of cells to preserve their pluripotency state and the onset of differentiation and subsequent specialization, murine hematopoiesis was used and compared to embryonic stem cells (ESCs) as a control. Using published microarray data, the expression profiles of two sets of genes, pluripotent and imprinted, were compared to a third set of known hematopoietic genes. We found that more than half of the pluripotent and imprinted genes are clearly upregulated in ESCs but subsequently repressed during hematopoiesis. The remaining genes were either upregulated in hematopoietic progenitors or in differentiated blood cells. The three gene sets each consist of three similarly behaving gene groups with similar expression profiles in various lineages of the hematopoietic system as well as in ESCs. To explain this co-regulation behavior, we explored the transcriptional and post-transcriptional mechanisms of pluripotent and imprinted genes and their regulator/target miRNAs in six different hematopoietic lineages. Therewith, lineage-specific transcription factor (TF)-miRNA regulatory networks were generated and their topologies and functional impacts during hematopoiesis were analyzed. This led to the identification of TF-miRNA co-regulatory motifs, for which we validated the contribution to the cellular development of the corresponding lineage in terms of statistical significance and relevance to biological evidence. This analysis also identified key miRNAs and TFs/genes that might play important roles in the derived lineage networks. These molecular associations suggest new aspects of the cellular regulation of the onset of cellular differentiation and

  6. Inter-species inference of gene set enrichment in lung epithelial cells from proteomic and large transcriptomic datasets

    PubMed Central

    Hormoz, Sahand; Bhanot, Gyan; Biehl, Michael; Bilal, Erhan; Meyer, Pablo; Norel, Raquel; Rhrissorrakrai, Kahn; Dayarian, Adel

    2015-01-01

    Motivation: Translating findings in rodent models to human models has been a cornerstone of modern biology and drug development. However, in many cases, a naive ‘extrapolation’ between the two species has not succeeded. As a result, clinical trials of new drugs sometimes fail even after considerable success in the mouse or rat stage of development. In addition to in vitro studies, inter-species translation requires analytical tools that can predict the enriched gene sets in human cells under various stimuli from corresponding measurements in animals. Such tools can improve our understanding of the underlying biology and optimize the allocation of resources for drug development. Results: We developed an algorithm to predict differential gene set enrichment as part of the sbv IMPROVER (systems biology verification in Industrial Methodology for Process Verification in Research) Species Translation Challenge, which focused on phosphoproteomic and transcriptomic measurements of normal human bronchial epithelial (NHBE) primary cells under various stimuli and corresponding measurements in rat (NRBE) primary cells. We find that gene sets exhibit a higher inter-species correlation compared with individual genes, and are potentially more suited for direct prediction. Furthermore, in contrast to a similar cross-species response in protein phosphorylation states 5 and 25 min after exposure to stimuli, gene set enrichment 6 h after exposure is significantly different in NHBE cells compared with NRBE cells. In spite of this difference, we were able to develop a robust algorithm to predict gene set activation in NHBE with high accuracy using simple analytical methods. Availability and implementation: Implementation of all algorithms is available as source code (in Matlab) at http://bhanot.biomaps.rutgers.edu/wiki/codes_SC3_Predicting_GeneSets.zip, along with the relevant data used in the analysis. Gene sets, gene expression and protein phosphorylation data are available on

  7. Building a statistical emulator for prediction of crop yield response to climate change: a global gridded panel data set approach

    NASA Astrophysics Data System (ADS)

    Mistry, Malcolm; De Cian, Enrica; Wing, Ian Sue

    2015-04-01

    There is widespread concern that trends and variability in weather induced by climate change will detrimentally affect global agricultural productivity and food supplies. Reliable quantification of the risks of negative impacts at regional and global scales is a critical research need, which has so far been met by forcing state-of-the-art global gridded crop models with outputs of global climate model (GCM) simulations in exercises such as the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP)-Fastrack. Notwithstanding such progress, it remains challenging to use these simulation-based projections to assess agricultural risk because their gridded fields of crop yields are fundamentally denominated as discrete combinations of warming scenarios, GCMs and crop models, and not as model-specific or model-averaged yield response functions of meteorological shifts, which may have their own independent probability of occurrence. By contrast, the empirical climate economics literature has adeptly represented agricultural responses to meteorological variables as reduced-form statistical response surfaces which identify the crop productivity impacts of additional exposure to different intervals of temperature and precipitation [cf Schlenker and Roberts, 2009]. This raises several important questions: (1) what do the equivalent reduced-form statistical response surfaces look like for crop model outputs, (2) do they exhibit systematic variation over space (e.g., crop suitability zones) or across crop models with different characteristics, (3) how do they compare to estimates based on historical observations, and (4) what are the implications for the characterization of climate risks? We address these questions by estimating statistical yield response functions for four major crops (maize, rice, wheat and soybeans) over the historical period (1971-2004) as well as future climate change scenarios (2005-2099) using ISIMIP-Fastrack data for five GCMs and seven crop models

  8. Evaluation of daily precipitation statistics and monsoon onset/retreat over western Sahel in multiple data sets

    NASA Astrophysics Data System (ADS)

    Diaconescu, Emilia Paula; Gachon, Philippe; Scinocca, John; Laprise, René

    2015-09-01

    The West Africa rainfall regime constitutes a considerable challenge for Regional Climate Models (RCMs) due to the complexity of dynamical and physical processes that characterise the West African Monsoon. In this paper, daily precipitation statistics are evaluated from the contributions to the AFRICA-CORDEX experiment from two ERA-Interim driven Canadian RCMs: CanRCM4, developed at the Canadian Centre for Climate Modelling and Analysis (CCCma) and CRCM5, developed at the University of Québec at Montréal. These modelled precipitation statistics are evaluated against three gridded observed datasets—the Global Precipitation Climatology Project (GPCP), the Tropical Rainfall Measuring Mission (TRMM), and the Africa Rainfall Climatology (ARC2)—and four reanalysis products (ECMWF ERA-Interim, NCEP/DOE Reanalysis II, NASA MERRA and NOAA-CIRES Twentieth Century Reanalysis). The two RCMs share the same dynamics from the Environment Canada GEM forecast model, but have two different physics' packages: CanRCM4 obtains its physics from CCCma's global atmospheric model (CanAM4), while CRCM5 shares a number of its physics modules with the limited-area version of GEM forecast model. The evaluation is focused on various daily precipitation statistics (maximum number of consecutive wet days, number of moderate and very heavy precipitation events, precipitation frequency distribution) and on the monsoon onset and retreat over the Sahel region. We find that the CRCM5 has a good representation of daily precipitation statistics over the southern Sahel, with spatial distributions close to GPCP dataset. Some differences are observed in the northern part of the Sahel, where the model is characterised by a dry bias. CanRCM4 and the ERA-Interim and MERRA reanalysis products overestimate the number of wet days over Sahel with a shift in the frequency distribution toward smaller daily precipitation amounts than in observations. Both RCMs and reanalyses have difficulties in reproducing

  9. Meta-analysis of Gene-Level Associations for Rare Variants Based on Single-Variant Statistics

    PubMed Central

    Hu, Yi-Juan; Berndt, Sonja I.; Gustafsson, Stefan; Ganna, Andrea; Berndt, Sonja I.; Gustafsson, Stefan; Mägi, Reedik; Ganna, Andrea; Wheeler, Eleanor; Feitosa, Mary F.; Justice, Anne E.; Monda, Keri L.; Croteau-Chonka, Damien C.; Day, Felix R.; Esko, Tõnu; Fall, Tove; Ferreira, Teresa; Gentilini, Davide; Jackson, Anne U.; Luan, Jian’an; Randall, Joshua C.; Vedantam, Sailaja; Willer, Cristen J.; Winkler, Thomas W.; Wood, Andrew R.; Workalemahu, Tsegaselassie; Hu, Yi-Juan; Lee, Sang Hong; Liang, Liming; Lin, Dan-Yu; Min, Josine L.; Neale, Benjamin M.; Thorleifsson, Gudmar; Yang, Jian; Albrecht, Eva; Amin, Najaf; Bragg-Gresham, Jennifer L.; Cadby, Gemma; den Heijer, Martin; Eklund, Niina; Fischer, Krista; Goel, Anuj; Hottenga, Jouke-Jan; Huffman, Jennifer E.; Jarick, Ivonne; Johansson, Åsa; Johnson, Toby; Kanoni, Stavroula; Kleber, Marcus E.; König, Inke R.; Kristiansson, Kati; Kutalik, Zoltán; Lamina, Claudia; Lecoeur, Cecile; Li, Guo; Mangino, Massimo; McArdle, Wendy L.; Medina-Gomez, Carolina; Müller-Nurasyid, Martina; Ngwa, Julius S.; Nolte, Ilja M.; Paternoster, Lavinia; Pechlivanis, Sonali; Perola, Markus; Peters, Marjolein J.; Preuss, Michael; Rose, Lynda M.; Shi, Jianxin; Shungin, Dmitry; Smith, Albert Vernon; Strawbridge, Rona J.; Surakka, Ida; Teumer, Alexander; Trip, Mieke D.; Tyrer, Jonathan; Van Vliet-Ostaptchouk, Jana V.; Vandenput, Liesbeth; Waite, Lindsay L.; Zhao, Jing Hua; Absher, Devin; Asselbergs, Folkert W.; Atalay, Mustafa; Attwood, Antony P.; Balmforth, Anthony J.; Basart, Hanneke; Beilby, John; Bonnycastle, Lori L.; Brambilla, Paolo; Bruinenberg, Marcel; Campbell, Harry; Chasman, Daniel I.; Chines, Peter S.; Collins, Francis S.; Connell, John M.; Cookson, William; de Faire, Ulf; de Vegt, Femmie; Dei, Mariano; Dimitriou, Maria; Edkins, Sarah; Estrada, Karol; Evans, David M.; Farrall, Martin; Ferrario, Marco M.; Ferrières, Jean; Franke, Lude; Frau, Francesca; Gejman, Pablo V.; Grallert, Harald; Grönberg, Henrik; Gudnason, Vilmundur; Hall, Alistair S.; Hall, Per; Hartikainen, Anna-Liisa; Hayward, Caroline; Heard-Costa, Nancy L.; Heath, Andrew C.; Hebebrand, Johannes; Homuth, Georg; Hu, Frank B.; Hunt, Sarah E.; Hyppönen, Elina; Iribarren, Carlos; Jacobs, Kevin B.; Jansson, John-Olov; Jula, Antti; Kähönen, Mika; Kathiresan, Sekar; Kee, Frank; Khaw, Kay-Tee; Kivimaki, Mika; Koenig, Wolfgang; Kraja, Aldi T.; Kumari, Meena; Kuulasmaa, Kari; Kuusisto, Johanna; Laitinen, Jaana H.; Lakka, Timo A.; Langenberg, Claudia; Launer, Lenore J.; Lind, Lars; Lindström, Jaana; Liu, Jianjun; Liuzzi, Antonio; Lokki, Marja-Liisa; Lorentzon, Mattias; Madden, Pamela A.; Magnusson, Patrik K.; Manunta, Paolo; Marek, Diana; März, Winfried; Leach, Irene Mateo; McKnight, Barbara; Medland, Sarah E.; Mihailov, Evelin; Milani, Lili; Montgomery, Grant W.; Mooser, Vincent; Mühleisen, Thomas W.; Munroe, Patricia B.; Musk, Arthur W.; Narisu, Narisu; Navis, Gerjan; Nicholson, George; Nohr, Ellen A.; Ong, Ken K.; Oostra, Ben A.; Palmer, Colin N.A.; Palotie, Aarno; Peden, John F.; Pedersen, Nancy; Peters, Annette; Polasek, Ozren; Pouta, Anneli; Pramstaller, Peter P.; Prokopenko, Inga; Pütter, Carolin; Radhakrishnan, Aparna; Raitakari, Olli; Rendon, Augusto; Rivadeneira, Fernando; Rudan, Igor; Saaristo, Timo E.; Sambrook, Jennifer G.; Sanders, Alan R.; Sanna, Serena; Saramies, Jouko; Schipf, Sabine; Schreiber, Stefan; Schunkert, Heribert; Shin, So-Youn; Signorini, Stefano; Sinisalo, Juha; Skrobek, Boris; Soranzo, Nicole; Stančáková, Alena; Stark, Klaus; Stephens, Jonathan C.; Stirrups, Kathleen; Stolk, Ronald P.; Stumvoll, Michael; Swift, Amy J.; Theodoraki, Eirini V.; Thorand, Barbara; Tregouet, David-Alexandre; Tremoli, Elena; Van der Klauw, Melanie M.; van Meurs, Joyce B.J.; Vermeulen, Sita H.; Viikari, Jorma; Virtamo, Jarmo; Vitart, Veronique; Waeber, Gérard; Wang, Zhaoming; Widén, Elisabeth; Wild, Sarah H.; Willemsen, Gonneke; Winkelmann, Bernhard R.; Witteman, Jacqueline C.M.; Wolffenbuttel, Bruce H.R.; Wong, Andrew; Wright, Alan F.; Zillikens, M. Carola; Amouyel, Philippe; Boehm, Bernhard O.; Boerwinkle, Eric; Boomsma, Dorret I.; Caulfield, Mark J.; Chanock, Stephen J.; Cupples, L. Adrienne; Cusi, Daniele; Dedoussis, George V.; Erdmann, Jeanette; Eriksson, Johan G.; Franks, Paul W.; Froguel, Philippe; Gieger, Christian; Gyllensten, Ulf; Hamsten, Anders; Harris, Tamara B.; Hengstenberg, Christian; Hicks, Andrew A.; Hingorani, Aroon; Hinney, Anke; Hofman, Albert; Hovingh, Kees G.; Hveem, Kristian; Illig, Thomas; Jarvelin, Marjo-Riitta; Jöckel, Karl-Heinz; Keinanen-Kiukaanniemi, Sirkka M.; Kiemeney, Lambertus A.; Kuh, Diana; Laakso, Markku; Lehtimäki, Terho; Levinson, Douglas F.; Martin, Nicholas G.; Metspalu, Andres; Morris, Andrew D.; Nieminen, Markku S.; Njølstad, Inger; Ohlsson, Claes; Oldehinkel, Albertine J.; Ouwehand, Willem H.; Palmer, Lyle J.; Penninx, Brenda; Power, Chris; Province, Michael A.; Psaty, Bruce M.; Qi, Lu; Rauramaa, Rainer; Ridker, Paul M.; Ripatti, Samuli; Salomaa, Veikko; Samani, Nilesh J.; Snieder, Harold; Sørensen, Thorkild I.A.; Spector, Timothy D.; Stefansson, Kari; Tönjes, Anke; Tuomilehto, Jaakko; Uitterlinden, André G.; Uusitupa, Matti; van der Harst, Pim; Vollenweider, Peter; Wallaschofski, Henri; Wareham, Nicholas J.; Watkins, Hugh; Wichmann, H.-Erich; Wilson, James F.; Abecasis, Goncalo R.; Assimes, Themistocles L.; Barroso, Inês; Boehnke, Michael; Borecki, Ingrid B.; Deloukas, Panos; Fox, Caroline S.; Frayling, Timothy; Groop, Leif C.; Haritunian, Talin; Heid, Iris M.; Hunter, David; Kaplan, Robert C.; Karpe, Fredrik; Moffatt, Miriam; Mohlke, Karen L.; O’Connell, Jeffrey R.; Pawitan, Yudi; Schadt, Eric E.; Schlessinger, David; Steinthorsdottir, Valgerdur; Strachan, David P.; Thorsteinsdottir, Unnur; van Duijn, Cornelia M.; Visscher, Peter M.; Di Blasio, Anna Maria; Hirschhorn, Joel N.; Lindgren, Cecilia M.; Morris, Andrew P.; Meyre, David; Scherag, André; McCarthy, Mark I.; Speliotes, Elizabeth K.; North, Kari E.; Loos, Ruth J.F.; Ingelsson, Erik; Hirschhorn, Joel; North, Kari E.; Ingelsson, Erik; Lin, Dan-Yu

    2013-01-01

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying “causal” rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. PMID:23891470

  10. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

    PubMed

    Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu

    2013-08-08

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available.
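
    The idea that a gene-level statistic can be rebuilt from single-variant results plus their correlation matrix can be illustrated with a burden-type test. The sketch below is not the authors' software; it assumes that per-variant z-scores are available, that `corr` is an estimate of their correlation under the null, and that the weights act on the z-score scale. The function name `burden_z_from_summary` and its parameters are hypothetical.

```python
import math

import numpy as np


def burden_z_from_summary(z, corr, weights=None):
    """Burden-type gene-level z-score from single-variant z-scores.

    Assumes z ~ N(0, corr) under the null, so the weighted sum w'z
    has variance w' corr w. Names and weighting are illustrative only.
    """
    z = np.asarray(z, dtype=float)
    corr = np.asarray(corr, dtype=float)
    w = np.ones_like(z) if weights is None else np.asarray(weights, dtype=float)

    numerator = float(w @ z)          # weighted sum of variant z-scores
    variance = float(w @ corr @ w)    # its null variance given the correlation
    if variance <= 0.0:
        raise ValueError("w' corr w must be positive")

    z_gene = numerator / math.sqrt(variance)
    # Two-sided normal p-value: P(|N(0,1)| > |z_gene|)
    p_value = math.erfc(abs(z_gene) / math.sqrt(2.0))
    return z_gene, p_value
```

    With two uncorrelated variants the gene-level score grows by a factor of sqrt(2) over each single z-score, whereas two perfectly correlated variants add no information; ignoring the correlation matrix would overstate the evidence in the second case.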

  11. An optimized gene set for transcriptomics based neurodevelopmental toxicity prediction in the neural embryonic stem cell test.

    PubMed

    Pennings, Jeroen L A; Theunissen, Peter T; Piersma, Aldert H

    2012-10-28

    The murine neural embryonic stem cell test (ESTn) is an in vitro model for neurodevelopmental toxicity testing. Recent studies have shown that application of transcriptomics analyses in the ESTn is useful for obtaining more accurate predictions as well as mechanistic insights. Gene expression responses due to stem cell neural differentiation versus toxicant exposure could be distinguished using the Principal Component Analysis based differentiation track algorithm. In this study, we performed a de novo analysis on combined raw data (10 compounds, 19 exposures) from three previous transcriptomics studies to identify an optimized gene set for neurodevelopmental toxicity prediction in the ESTn. By evaluating predictions of 200,000 randomly selected gene sets, we identified genes which significantly contributed to the prediction reliability. A set of 100 genes was obtained, predominantly involved in (neural) development. Further stringency restrictions resulted in a set of 29 genes that allowed for 84% prediction accuracy (area under the curve 94%). We anticipate these gene sets will contribute to further improve ESTn transcriptomics studies aimed at compound risk assessment.

  12. Statistical tests against systematic errors in data sets based on the equality of residual means and variances from control samples: theory and applications.

    PubMed

    Henn, Julian; Meindl, Kathrin

    2015-03-01

    Statistical tests are applied for the detection of systematic errors in data sets from least-squares refinements or other residual-based reconstruction processes. Samples of the residuals of the data are tested against the hypothesis that they belong to the same distribution. For this it is necessary that they show the same mean values and variances within the limits given by statistical fluctuations. When the samples differ significantly from each other, they are not from the same distribution within the limits set by the significance level. Therefore they cannot originate from a single Gaussian function in this case. It is shown that a significance cutoff results in exactly this case. Significance cutoffs are still frequently used in charge-density studies. The tests are applied to artificial data with and without systematic errors and to experimental data from the literature.

  13. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    DTIC Science & Technology

    2013-02-19

    Status Report (Quarterly). Dates covered: 17-11-2012 to 19-02-2013. ...creating a bacterium that contains only the set of genes that are essential for life. Toward that end, we have continued to delete genes and gene...

  14. An Integrated Statistical Approach to Compare Transcriptomics Data Across Experiments: A Case Study on the Identification of Candidate Target Genes of the Transcription Factor PPARα

    PubMed Central

    Ullah, Mohammad Ohid; Müller, Michael; Hooiveld, Guido J.E.J.

    2012-01-01

    An effective strategy to elucidate the signal transduction cascades activated by a transcription factor is to compare the transcriptional profiles of wild type and transcription factor knockout models. Many statistical tests have been proposed for analyzing gene expression data, but most tests are based on pair-wise comparisons. Since the analysis of microarrays involves the testing of multiple hypotheses within one study, it is generally accepted that one should control for false positives by the false discovery rate (FDR). However, it has been reported that this may be an inappropriate metric for comparing data across different experiments. Here we propose an approach that addresses the above-mentioned problem by the simultaneous testing and integration of three hypotheses (contrasts) using the cell means ANOVA model. These three contrasts test for the effect of a treatment in wild type, in gene knockout, and globally over all experimental groups. We illustrate our approach on microarray experiments that focused on the identification of candidate target genes and biological processes governed by the fatty acid sensing transcription factor PPARα in liver. Compared to the often applied FDR-based across-experiment comparison, our approach identified a conservative but less noisy set of candidate genes with the same sensitivity and specificity. Moreover, our method had the advantage of properly adjusting for multiple testing while integrating data from two experiments, and was driven by biological inference. Taken together, in this study we present a simple, yet efficient strategy to compare differential expression of genes across experiments while controlling for multiple hypothesis testing. PMID:22783064
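
    As a rough illustration of testing contrasts in a cell means ANOVA model, the sketch below computes a contrast estimate and its t statistic for one gene from four groups, using the pooled within-group variance. The group order (wild type control, wild type treated, knockout control, knockout treated) and the coding of the "global" contrast as the average treatment effect are assumptions made for illustration, not details taken from the abstract; `contrast_t` is a hypothetical name.

```python
import numpy as np


def contrast_t(groups, coeffs):
    """Estimate and t statistic for a linear contrast of group means.

    groups: list of 1-D sequences, one per experimental cell.
    coeffs: contrast coefficients, one per cell, in the same order.
    Returns (estimate, t statistic, residual degrees of freedom).
    """
    means = np.array([np.mean(g) for g in groups])
    sizes = np.array([len(g) for g in groups], dtype=float)
    df = int(sizes.sum()) - len(groups)

    # Pooled within-cell variance (mean squared error of the cell means model)
    sse = sum(float(np.sum((np.asarray(g, dtype=float) - m) ** 2))
              for g, m in zip(groups, means))
    mse = sse / df

    c = np.asarray(coeffs, dtype=float)
    estimate = float(c @ means)
    std_err = float(np.sqrt(mse * np.sum(c ** 2 / sizes)))
    return estimate, estimate / std_err, df


# Assumed cell order: WT control, WT treated, KO control, KO treated
CONTRASTS = {
    "treatment_in_wild_type": [-1.0, 1.0, 0.0, 0.0],
    "treatment_in_knockout": [0.0, 0.0, -1.0, 1.0],
    "treatment_global": [-0.5, 0.5, -0.5, 0.5],
}
```

    A gene that responds to treatment in wild type but not in the knockout would show a large statistic for the first contrast and one near zero for the second, which is the pattern expected of a candidate target of the deleted transcription factor.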

  15. Shedding of clinical-grade lentiviral vectors is not detected in a gene therapy setting.

    PubMed

    Cesani, M; Plati, T; Lorioli, L; Benedicenti, F; Redaelli, D; Dionisio, F; Biasco, L; Montini, E; Naldini, L; Biffi, A

    2015-06-01

    Gene therapy using viral vectors that stably integrate into ex vivo cultured cells holds great promise for the treatment of monogenic diseases as well as cancer. However, carry-over of infectious vector particles has been reported to occur upon ex vivo transduction of target cells. This, in turn, may lead to inadvertent spreading of viral particles to off-target cells in vivo, raising concerns for potential adverse effects, such as toxicity of ectopic transgene expression, immunogenicity from in vivo transduced antigen-presenting cells and, possibly, gene transfer to germline cells. Here, we have investigated factors influencing the extent of lentiviral vector (LV) shedding upon ex vivo transduction of human hematopoietic stem and progenitor cells. Our results indicate that, although vector carry-over is detectable when using laboratory-grade vector stocks, the use of clinical-grade vector stocks strongly decreases the extent of inadvertent transduction of secondary targets, likely because of the higher degree of purification. These data provide supportive evidence for the safe use of the LV platform in clinical settings.

  16. Statistical generation of training sets for measuring NO3(-), NH4(+) and major ions in natural waters using an ion selective electrode array.

    PubMed

    Mueller, Amy V; Hemond, Harold F

    2016-05-18

    Knowledge of ionic concentrations in natural waters is essential to understand watershed processes. Inorganic nitrogen, in the form of nitrate and ammonium ions, is a key nutrient as well as a participant in redox, acid-base, and photochemical processes of natural waters, leading to spatiotemporal patterns of ion concentrations at scales as small as meters or hours. Current options for measurement in situ are costly, relying primarily on instruments adapted from laboratory methods (e.g., colorimetric, UV absorption); free-standing and inexpensive ISE sensors for NO3(-) and NH4(+) could be attractive alternatives if interferences from other constituents were overcome. Multi-sensor arrays, coupled with appropriate non-linear signal processing, offer promise in this capacity but have not yet successfully achieved signal separation for NO3(-) and NH4(+) in situ at naturally occurring levels in unprocessed water samples. A novel signal processor, underpinned by an appropriate sensor array, is proposed that overcomes previous limitations by explicitly integrating basic chemical constraints (e.g., charge balance). This work further presents a rationalized process for the development of such in situ instrumentation for NO3(-) and NH4(+), including a statistical-modeling strategy for instrument design, training/calibration, and validation. Statistical analysis reveals that historical concentrations of major ionic constituents in natural waters across New England strongly covary and are multi-modal. This informs the design of a statistically appropriate training set, suggesting that the strong covariance of constituents across environmental samples can be exploited through appropriate signal processing mechanisms to further improve estimates of minor constituents. Two artificial neural network architectures, one expanded to incorporate knowledge of basic chemical constraints, were tested to process outputs of a multi-sensor array, trained using datasets of varying degrees of

  17. A transcriptomic approach to identify regulatory genes involved in fruit set of wild-type and parthenocarpic tomato genotypes.

    PubMed

    Ruiu, Fabrizio; Picarella, Maurizio Enea; Imanishi, Shunsuke; Mazzucato, Andrea

    2015-10-01

    The tomato parthenocarpic fruit (pat) mutation associates a strong competence for parthenocarpy with homeotic transformation of anthers and aberrancy of ovules. To dissect this complex floral phenotype, genes involved in the pollination-independent fruit set of the pat mutant were investigated by microarray analysis using wild-type and mutant ovaries. Normalized expression data were subjected to one-way ANOVA, and 2499 differentially expressed genes (DEGs) displaying a >1.5 log-fold change in at least one of the pairwise comparisons analyzed were detected. DEGs were categorized into 20 clusters, and the clusters were classified into five groups representing transcripts with similar expression dynamics. The "regulatory function" group (685 DEGs) contained putative negative or positive fruit set regulators, the "pollination-dependent" group (411 DEGs) included genes activated by pollination, and the "fruit growth-related" group (815 DEGs) contained genes activated at early fruit growth. The last groups listed genes with different or similar expression patterns at all stages in the two genotypes. qRT-PCR validation of 20 DEGs plus four other selected genes confirmed the high reliability of the microarray expression data; the average correlation coefficient for the 20 DEGs was 0.90. All groups contained relevant transcription factor genes encoding proteins that regulate meristem differentiation and floral organ development, genes involved in hormone metabolism, transport and response, and genes involved in cell division and in primary and secondary metabolism. Among pathways related to secondary metabolites, genes related to the synthesis of flavonoids emerged, supporting the recent evidence that these compounds are important at the fruit set phase. Selected genes showing a de-regulated expression pattern in pat were studied in four other parthenocarpic genotypes, either genetically anonymous or carrying lesions in known gene sequences. This comparative approach offered novel insights for improving the present

  18. Repressors Nrg1 and Nrg2 regulate a set of stress-responsive genes in Saccharomyces cerevisiae.

    PubMed

    Vyas, Valmik K; Berkey, Cristin D; Miyao, Takenori; Carlson, Marian

    2005-11-01

    The yeast Saccharomyces cerevisiae responds to environmental stress by rapidly altering the expression of large sets of genes. We report evidence that the transcriptional repressors Nrg1 and Nrg2 (Nrg1/Nrg2), which were previously implicated in glucose repression, regulate a set of stress-responsive genes. Genome-wide expression analysis identified 150 genes that were upregulated in nrg1Delta nrg2Delta double mutant cells, relative to wild-type cells, during growth in glucose. We found that many of these genes are regulated by glucose repression. Stress response elements (STREs) and STRE-like elements are overrepresented in the promoters of these genes, and a search of available expression data sets showed that many are regulated in response to a variety of environmental stress signals. In accord with these findings, mutation of NRG1 and NRG2 enhanced the resistance of cells to salt and oxidative stress and decreased tolerance to freezing. We present evidence that Nrg1/Nrg2 not only contribute to repression of target genes in the absence of stress but also limit induction in response to salt stress. We suggest that Nrg1/Nrg2 fine-tune the regulation of a set of stress-responsive genes.

  19. Gene set analyses of genome-wide association studies on 49 quantitative traits measured in a single genetic epidemiology dataset.

    PubMed

    Kim, Jihye; Kwon, Ji-Sun; Kim, Sangsoo

    2013-09-01

    Gene set analysis is a powerful tool for interpreting a genome-wide association study result and is gaining popularity. Comparison of the gene sets obtained for a variety of traits measured from a single genetic epidemiology dataset may give insights into the biological mechanisms underlying these traits. Based on the previously published single nucleotide polymorphism (SNP) genotype data on 8,842 individuals enrolled in the Korea Association Resource project, we performed a series of systematic genome-wide association analyses for 49 quantitative traits of basic epidemiological, anthropometric, or blood chemistry parameters. Each analysis result was subjected to subsequent gene set analyses based on Gene Ontology (GO) terms using the gene set analysis software GSA-SNP, identifying a set of GO terms significantly associated with each trait (pcorr < 0.05). Pairwise comparison of the traits in terms of the semantic similarity in their GO sets revealed surprising cases where phenotypically uncorrelated traits showed high similarity in terms of biological pathways. For example, the pH level was related to 7 other traits that showed low phenotypic correlations with it. A literature survey implies that these traits may be regulated partly by common pathways that involve neuronal or nerve systems.

  20. Introduction to Bayesian statistical approaches to compositional analyses of transgenic crops 1. Model validation and setting the stage.

    PubMed

    Harrison, Jay M; Breeze, Matthew L; Harrigan, George G

    2011-08-01

    Statistical comparisons of compositional data generated on genetically modified (GM) crops and their near-isogenic conventional (non-GM) counterparts typically rely on classical significance testing. This manuscript presents an introduction to Bayesian methods for compositional analysis along with recommendations for model validation. The approach is illustrated using protein and fat data from two herbicide tolerant GM soybeans (MON87708 and MON87708×MON89788) and a conventional comparator grown in the US in 2008 and 2009. Guidelines recommended by the US Food and Drug Administration (FDA) in conducting Bayesian analyses of clinical studies on medical devices were followed. This study is the first Bayesian approach to GM and non-GM compositional comparisons. The evaluation presented here supports a conclusion that a Bayesian approach to analyzing compositional data can provide meaningful and interpretable results. We further describe the importance of method validation and approaches to model checking if Bayesian approaches to compositional data analysis are to be considered viable by scientists involved in GM research and regulation.

  1. A novel proposal of a simplified bacterial gene set and the neo-construction of a general minimized metabolic network

    PubMed Central

    Ye, Yuan-Nong; Ma, Bin-Guang; Dong, Chuan; Zhang, Hong; Chen, Ling-Ling; Guo, Feng-Biao

    2016-01-01

    A minimal gene set (MGS) is critical for the assembly of a minimal artificial cell. We propose simplifying a bacterial gene set to approximate a bacterial MGS by the following procedure. First, we base our simplified bacterial gene set (SBGS) on experimentally determined essential genes to ensure that the genes included in the SBGS are critical. Second, we introduce a half-retaining strategy to extract persistent essential genes to ensure stability. Third, we construct a viable metabolic network to supplement the SBGS. The proposed SBGS includes 327 genes and requires 431 reactions. This report describes an SBGS that preserves both self-replication and self-maintenance systems. In the minimized metabolic network, we identified five novel hub metabolites and confirmed 20 known hubs. Highly essential genes were found to distribute the connecting metabolites into more reactions. Based on our SBGS, we expanded the pool of targets for designing broad-spectrum antibacterial drugs to reduce pathogen resistance. We also suggest a rough semi-de novo strategy to synthesize an artificial cell, with potential applications in industry. PMID:27713529

  2. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    PubMed Central

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histological subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR can indeed serve as a feature selection algorithm. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between the two resulting gene signatures illustrates that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. PMID:27446945

  3. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm.

    PubMed

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu; Tian, Suyan

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histological subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR can indeed serve as a feature selection algorithm. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between the two resulting gene signatures illustrates that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  4. Comparison of different statistical methods for estimation of extreme sea levels with wave set-up contribution

    NASA Astrophysics Data System (ADS)

    Kergadallan, Xavier; Bernardara, Pietro; Benoit, Michel; Andreewsky, Marc; Weiss, Jérôme

    2013-04-01

    Estimating the probability of occurrence of extreme sea levels is a central issue for the protection of the coast. Return periods of sea level with wave set-up contribution are estimated here at one site: Cherbourg, France, in the English Channel. The methodology follows two steps: the first is the computation of the joint probability of simultaneous wave height and still sea level; the second is the interpretation of those joint probabilities to assess a sea level for a given return period. Two different approaches were evaluated to compute the joint probability of simultaneous wave height and still sea level: the first is a multivariate extreme value distribution of logistic type, in which all components of the variables become large simultaneously; the second is a conditional approach for multivariate extreme values, in which only one component of the variables has to be large. Two different methods were applied to estimate sea level with wave set-up contribution for a given return period: Monte Carlo simulation, in which estimation is more accurate but needs more computation time, and classical ocean engineering design contours of the inverse-FORM type, in which the method is simpler and allows more complex estimation of the wave set-up part (wave propagation to the coast, for example). We compare results from the two different approaches with the two different methods. To be able to use both the Monte Carlo simulation and the design contour methods, wave set-up is estimated with a simple empirical formula. We show the advantages of the conditional approach compared to the multivariate extreme values approach when extreme sea levels occur when either surge or wave height is large. We discuss the validity of the ocean engineering design contours method, which is an alternative when computation of sea levels is too complex to use the Monte Carlo simulation method.
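
    The Monte Carlo route to a return level can be sketched as follows: simulate many years of jointly distributed still sea level and wave height, add a wave set-up term, and read the T-year level off the empirical quantile at 1 - 1/T. Everything numeric here is a placeholder and not taken from the study: the Gaussian dependence with correlation `rho`, the marginal distributions, the treatment of each draw as an annual maximum, and the set-up formula (20% of wave height) are all hypothetical choices made only to keep the example self-contained.

```python
import numpy as np


def return_level_monte_carlo(n_years, return_period, rho=0.5, seed=0):
    """Empirical T-year return level of still sea level plus wave set-up.

    All distributions and coefficients below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Correlated standard normal drivers for the two variables (assumed dependence)
    cov = [[1.0, rho], [rho, 1.0]]
    drivers = rng.multivariate_normal([0.0, 0.0], cov, size=n_years)

    still_level = 4.0 + 0.3 * drivers[:, 0]          # metres, hypothetical margin
    wave_height = np.exp(1.0 + 0.4 * drivers[:, 1])  # metres, hypothetical margin
    wave_setup = 0.2 * wave_height                   # hypothetical empirical formula

    total_level = still_level + wave_setup
    # The level exceeded on average once every `return_period` simulated years
    return float(np.quantile(total_level, 1.0 - 1.0 / return_period))
```

    The accuracy of such an estimate depends on the number of simulated years relative to the return period, which is why the abstract notes the higher computational cost of this method compared with design contours.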

  5. ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented.

    PubMed

    Saraswathi, Saras; Sundaram, Suresh; Sundararajan, Narasimhan; Zimmermann, Michael; Nilsen-Hamilton, Marit

    2011-01-01

    A combination of Integer-Coded Genetic Algorithm (ICGA) and Particle Swarm Optimization (PSO), coupled with the neural-network-based Extreme Learning Machine (ELM), is used for gene selection and cancer classification. ICGA is used with PSO-ELM to select an optimal set of genes, which is then used to build a classifier to develop an algorithm (ICGA_PSO_ELM) that can handle sparse data and sample imbalance. We evaluate the performance of ICGA-PSO-ELM and compare our results with existing methods in the literature. An investigation into the functions of the selected genes, using a systems biology approach, revealed that many of the identified genes are involved in cell signaling and proliferation. An analysis of these gene sets shows a larger representation of genes that encode secreted proteins than found in randomly selected gene sets. Secreted proteins constitute a major means by which cells interact with their surroundings. Mounting biological evidence has identified the tumor microenvironment as a critical factor that determines tumor survival and growth. Thus, the genes identified by this study that encode secreted proteins might provide important insights to the nature of the critical biological features in the microenvironment of each tumor type that allow these cells to thrive and proliferate.

  6. Bayesian Statistical Analyses for Presence of Single Genes Affecting Meat Quality Traits in a Crossed Pig Population

    PubMed Central

    Janss, LLG.; Van-Arendonk, JAM.; Brascamp, E. W.

    1997-01-01

    Presence of single genes affecting meat quality traits was investigated in F(2) individuals of a cross between Chinese Meishan and Western pig lines using phenotypic measurements on 11 traits. A Bayesian approach was used for inference about a mixed model of inheritance, postulating effects of polygenic background genes, action of a biallelic autosomal single gene and various nongenetic effects. Cooking loss, drip loss, two pH measurements, intramuscular fat, shearforce and back-fat thickness were traits found to be likely influenced by a single gene. In all cases, a recessive allele was found, which likely originates from the Meishan breed and is absent in the Western founder lines. By studying associations between genotypes assigned to individuals based on phenotypic measurements for various traits, it was concluded that cooking loss, two pH measurements and possibly backfat thickness are influenced by one gene, and that a second gene influences intramuscular fat and possibly shearforce and drip loss. Statistical findings were supported by demonstrating marked differences in variances of families of fathers inferred as carriers and those inferred as noncarriers. It is concluded that further molecular genetic research effort to map single genes affecting these traits based on the same experimental data has a high probability of success. PMID:9071593

  7. META-GSA: Combining Findings from Gene-Set Analyses across Several Genome-Wide Association Studies

    PubMed Central

    Rosenberger, Albert; Friedrichs, Stefanie; Amos, Christopher I.; Brennan, Paul; Fehringer, Gordon; Heinrich, Joachim; Hung, Rayjean J.; Muley, Thomas; Müller-Nurasyid, Martina; Risch, Angela; Bickeböller, Heike

    2015-01-01

    Introduction: Gene-set analysis (GSA) methods are used as complementary approaches to genome-wide association studies (GWASs). The single marker association estimates of a predefined set of genes are either contrasted with those of all remaining genes or with a null non-associated background. To pool the p-values from several GSAs, it is important to take into account the concordance of the observed patterns resulting from single marker association point estimates across any given gene set. Here we propose an enhanced version of Fisher’s inverse χ2-method, META-GSA, which weights each study to account for imperfect correlation between association patterns. Simulation and Power: We investigated the performance of META-GSA by simulating GWASs with 500 cases and 500 controls at 100 diallelic markers in 20 different scenarios, simulating different relative risks between 1 and 1.5 in gene sets of 10 genes. Wilcoxon’s rank sum test was applied as GSA for each study. We found that META-GSA has greater power to discover truly associated gene sets than simple pooling of the p-values, e.g., 59% versus 37% when the true relative risk for 5 of 10 genes was assumed to be 1.5. Under the null hypothesis of no difference in the true association pattern between the gene set of interest and the set of remaining genes, the results of both approaches are almost uncorrelated. We recommend not relying on p-values alone when combining the results of independent GSAs. Application: We applied META-GSA to pool the results of four case-control GWASs of lung cancer risk (Central European Study and Toronto/Lunenfeld-Tanenbaum Research Institute Study; German Lung Cancer Study and MD Anderson Cancer Center Study), which had already been analyzed separately with four different GSA methods (EASE, SLAT, mSUMSTAT and GenGen). This application revealed the pathway GO0015291 “transmembrane transporter activity” as significantly enriched with associated genes (GSA-method: EASE, p = 0
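
    The pooling step that this abstract builds on can be illustrated with a minimal sketch of the plain, unweighted Fisher inverse χ2 combination of independent p-values. This is not the META-GSA method itself: the study weighting described above is not implemented, and the function name `fisher_combined_p` is a hypothetical label chosen for illustration.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with the unweighted Fisher inverse chi-square method."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    # Test statistic X = -2 * sum(ln p_i); under the null it follows a
    # chi-square distribution with 2k degrees of freedom. We keep X / 2.
    half_x = -sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k the chi-square survival function is
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


# Illustrative input only: two studies, each reporting p = 0.05 for a gene set.
pooled = fisher_combined_p([0.05, 0.05])
```

    With a single study the function returns that study's p-value unchanged, which is a quick sanity check on the closed form used here.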

  8. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.

    PubMed

    Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy

    2013-08-01

    Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association

  9. Transcriptome Analysis Reveals Candidate Genes Involved in Gibberellin-Induced Fruit Setting in Triploid Loquat (Eriobotrya japonica)

    PubMed Central

    Jiang, Shuang; Luo, Jun; Xu, Fanjie; Zhang, Xueying

    2016-01-01

    The triploid loquat (Eriobotrya japonica) is a new germplasm with a high edible fruit rate. Under natural conditions, the triploid loquat has a low fruit setting ratio (not more than 10 fruits per tree), reflecting fertilization failure. To unravel the molecular mechanism by which gibberellin (GA) treatment induces parthenocarpy in triploid loquats, the transcriptome of fruit setting induced by GA3 was analyzed using RNA-seq at four different stages during the development of young fruit. Approximately 344 million high quality reads in seven libraries were de novo assembled, yielding 153,900 unique transcripts with more than 79.9% functionally annotated transcripts. A total of 2,220, 2,974, and 1,614 differentially expressed genes (DEGs) were observed at 3, 7, and 14 days after GA treatment, respectively. The weighted gene co-expression network and Venn diagram analysis of DEGs revealed that sixteen candidate genes may play critical roles in the fruit setting after GA treatment. Five genes were related to auxin, in which one auxin synthesis gene of yucca was upregulated, suggesting that auxin may act as a signal for fruit setting. Furthermore, ABA 8′-hydroxylase was upregulated, while ethylene-forming enzyme was downregulated, suggesting that multiple hormones may be involved in GA signaling. Four transcription factors, NAC7, NAC23, bHLH35, and HD16, were potentially negatively regulated in fruit setting, and two cell division-related genes, arr9 and CYCA3, were upregulated. In addition, the expression of the GA receptor gid1 was downregulated by GA treatment, suggesting that the negative feedback mechanism in GA signaling may be regulated by gid1. Altogether, the results of the present study provide information from a comprehensive gene expression analysis and insight into the molecular mechanism underlying fruit setting under GA treatment in E. japonica. PMID:28066478

  10. Nutritional status of breastfed infants in rural Zambia: comparison of the National Center for Health Statistics growth reference versus the WHO 12-month breastfed pooled data set.

    PubMed Central

    Hautvast, J. L.; Pandor, A.; Burema, J.; Tolboom, J. J.; Chishimba, N.; Monnens, L. A.; van Staveren, W. A.

    2000-01-01

    Cross-sectional data for breastfed infants in rural Zambia were used to evaluate the effect of applying two different data sets as a reference, i.e. the WHO 12-month breastfed pooled data set and the National Center for Health Statistics (NCHS) growth reference in terms of prevalence of malnutrition (stunting, underweight, and wasting). A total of 518 infants who were attending mother-and-child health clinics were included. Age, weight and length were recorded. Anthropometric Z-scores were calculated in two ways: by applying the NCHS growth reference and by using the WHO breastfed data set. Anthropometric Z-scores calculated using the breastfed data set were lower during the first 6-7 months of life compared with those calculated by applying the NCHS growth reference. This resulted in a higher proportion of children aged 0-6 months being classified as stunted and underweight using the breastfed data set versus the NCHS growth reference. After the age of 7 months, similar prevalences of stunting or underweight were observed. Relatively few infants were classified as wasted. In order to adequately assess the prevalence of stunting and underweight in breastfed infants, it is recommended that a new growth reference be developed, as has been initiated by WHO. PMID:10885182
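
    The effect described in this abstract, where the same measurement is classified differently depending on the growth reference, can be sketched as follows. This is a simplified illustration, not the study's procedure: it assumes a Z-score of the form (observed value - reference median) / reference SD and the conventional cutoff of Z < -2 for stunting. All reference numbers are hypothetical placeholders, not NCHS or WHO values, and real references are specific to age and sex.

```python
def z_score(value, ref_median, ref_sd):
    """Anthropometric Z-score relative to a reference median and standard deviation."""
    return (value - ref_median) / ref_sd


def is_stunted(length_cm, ref_median, ref_sd, cutoff=-2.0):
    """Classify as stunted when the length-for-age Z-score falls below the cutoff."""
    return z_score(length_cm, ref_median, ref_sd) < cutoff


# Hypothetical infant length and hypothetical reference parameters for one age.
length_cm = 61.0
reference_a = {"median": 65.0, "sd": 2.5}  # stands in for one growth reference
reference_b = {"median": 66.5, "sd": 2.5}  # stands in for a reference with a higher median

z_a = z_score(length_cm, reference_a["median"], reference_a["sd"])
z_b = z_score(length_cm, reference_b["median"], reference_b["sd"])
stunted_a = is_stunted(length_cm, reference_a["median"], reference_a["sd"])
stunted_b = is_stunted(length_cm, reference_b["median"], reference_b["sd"])
```

    In this made-up case the infant falls above the cutoff under the first reference and below it under the second, which is the kind of shift in prevalence the abstract reports for the first months of life.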

  11. Nutritional status of breastfed infants in rural Zambia: comparison of the National Center for Health Statistics growth reference versus the WHO 12-month breastfed pooled data set.

    PubMed

    Hautvast, J L; Pandor, A; Burema, J; Tolboom, J J; Chishimba, N; Monnens, L A; van Staveren, W A

    2000-01-01

    Cross-sectional data for breastfed infants in rural Zambia were used to evaluate the effect of applying two different data sets as a reference, i.e. the WHO 12-month breastfed pooled data set and the National Center for Health Statistics (NCHS) growth reference in terms of prevalence of malnutrition (stunting, underweight, and wasting). A total of 518 infants who were attending mother-and-child health clinics were included. Age, weight and length were recorded. Anthropometric Z-scores were calculated in two ways: by applying the NCHS growth reference and by using the WHO breastfed data set. Anthropometric Z-scores calculated using the breastfed data set were lower during the first 6-7 months of life compared with those calculated by applying the NCHS growth reference. This resulted in a higher proportion of children aged 0-6 months being classified as stunted and underweight using the breastfed data set versus the NCHS growth reference. After the age of 7 months, similar prevalences of stunting or underweight were observed. Relatively few infants were classified as wasted. In order to adequately assess the prevalence of stunting and underweight in breastfed infants, it is recommended that a new growth reference be developed, as has been initiated by WHO.

  12. Statistical inference and reverse engineering of gene regulatory networks from observational expression data.

    PubMed

    Emmert-Streib, Frank; Glazko, Galina V; Altay, Gökmen; de Matos Simoes, Ricardo

    2012-01-01

    In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms.

  13. Setting limits on homeotic gene function: restraint of Sex combs reduced activity by teashirt and other homeotic genes.

    PubMed Central

    Andrew, D J; Horner, M A; Petitt, M G; Smolik, S M; Scott, M P

    1994-01-01

    Each of the homeotic genes of the HOM or HOX complexes is expressed in a limited domain along the anterior-posterior axis. Each homeotic protein directs the formation of characteristic structures, such as wings or ribs. In flies, when a heat shock-inducible homeotic gene is used to produce a homeotic protein in all cells of the embryo, only some cells respond by altering their fates. We have identified genes that limit where the homeotic gene Sex combs reduced (Scr) can affect cell fates in the Drosophila embryo. In the abdominal cuticle Scr is prevented from inducing prothoracic structures by the three bithorax complex (BX-C) homeotic genes. However, two of the BX-C homeotic genes, Ultrabithorax (Ubx) and abdominal-A (abd-A), have no effect on the ability of Scr to direct the formation of salivary glands. Instead, salivary gland induction by Scr is limited in the trunk by the homeotic gene teashirt (tsh) and in the last abdominal segment by the third BX-C gene, Abdominal-B (AbdB). Therefore, spatial restrictions on homeotic gene activity differ between tissues and result both from the regulation of homeotic gene transcription and from restraints on where homeotic proteins can function. PMID:7907545

  14. Correlation of a set of gene variants, life events and personality features on adult ADHD severity.

    PubMed

    Müller, Daniel J; Chiesa, Alberto; Mandelli, Laura; De Luca, Vincenzo; De Ronchi, Diana; Jain, Umesh; Serretti, Alessandro; Kennedy, James L

    2010-07-01

    Increasing evidence suggests that symptoms of attention deficit hyperactivity disorder (ADHD) could persist into adult life in a substantial proportion of cases. The aim of the present study was to investigate the impact of (1) adverse events, (2) personality traits and (3) genetic variants chosen on the basis of previous findings and (4) their possible interactions on adult ADHD severity. One hundred and ten individuals diagnosed with adult ADHD were evaluated for occurrence of adverse events in childhood and adulthood, and for personality traits by the Temperament and Character Inventory (TCI). Common polymorphisms within a set of nine important candidate genes (SLC6A3, DBH, DRD4, DRD5, HTR2A, CHRNA7, BDNF, PRKG1 and TAAR9) were genotyped for each subject. Life events, personality traits and genetic variations were analyzed in relation to severity of current symptoms, according to the Brown Attention Deficit Disorder Scale (BADDS). Genetic variations were not significantly associated with severity of ADHD symptoms. Life stressors displayed only a minor effect as compared to personality traits. Indeed, symptom severity was significantly correlated with the temperamental trait of Harm avoidance and the character trait of Self-directedness. The results of the present work are in line with previous evidence of a significant correlation between some personality traits and adult ADHD. However, several limitations, such as the small sample size and the exclusion of patients with other severe comorbid psychiatric disorders, could have influenced the significance of the present findings.

  15. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312

  16. Understanding the transcriptional regulation of cervix cancer using microarray gene expression data and promoter sequence analysis of a curated gene set.

    PubMed

    Srivastava, Prashant; Mangal, Manu; Agarwal, Subhash Mohan

    2014-02-10

    Cervical cancer, the malignant neoplasm of the cervix uteri, is the second most common cancer among women worldwide and the top-most cancer in India. Several factors are responsible for causing cervical cancer, which alter the expression of oncogenic genes, resulting in up- or down-regulation of gene expression and inactivation of tumor-suppressor genes/gene products. Gene expression is regulated by interactions between transcription factors (TFs) and specific regulatory elements in the promoter regions of target genes. Thus, it is important to decipher and analyze TFs that bind to regulatory regions of diseased genes and regulate their expression. In the present study, computational methods involving the combination of gene expression data from microarray experiments and promoter sequence analysis of a curated gene set involved in cervical cancer causation have been utilized for identifying potential regulatory elements. Consensus predictions of two approaches led to the identification of twelve TFs that might be crucial to the regulation of cervical cancer progression. Subsequently, TF enrichment and Oncomine expression analysis suggested that the transcription factor family E2F played an important role in the regulation of genes involved in cervical carcinogenesis. Our results suggest that E2F possesses diagnostic/prognostic value and can act as a potential drug target in cervical cancer.

  17. Counteracting H3K4 methylation modulators Set1 and Jhd2 co-regulate chromatin dynamics and gene transcription

    PubMed Central

    Ramakrishnan, Saravanan; Pokhrel, Srijana; Palani, Sowmiya; Pflueger, Christian; Parnell, Timothy J.; Cairns, Bradley R.; Bhaskara, Srividya; Chandrasekharan, Mahesh B.

    2016-01-01

    Histone H3K4 methylation is connected to gene transcription from yeast to humans, but its mechanistic roles in transcription and chromatin dynamics remain poorly understood. We investigated the functions of Set1 and Jhd2, the sole H3K4 methyltransferase and H3K4 demethylase, respectively, in S. cerevisiae. Here, we show that Set1 and Jhd2 predominantly co-regulate genome-wide transcription. We find that the combined activities of Set1 and Jhd2 via H3K4 methylation contribute to positive or negative transcriptional regulation. Providing mechanistic insights, our data reveal that Set1 and Jhd2 together control nucleosomal turnover and occupancy during transcriptional co-regulation. Moreover, we find a genome-wide co-regulation of chromatin structure by Set1 and Jhd2 at different groups of transcriptionally active or inactive genes and at different regions within yeast genes. Overall, our study puts forth a model wherein combined actions of Set1 and Jhd2 via modulating H3K4 methylation−demethylation together control chromatin dynamics during various facets of transcriptional regulation. PMID:27325136

  18. Regions of Unusual Statistical Properties as Tools in the Search for Horizontally Transferred Genes in Escherichia coli

    NASA Astrophysics Data System (ADS)

    Putonti, Catherine; Chumakov, Sergei; Chavez, Arturo; Luo, Yi; Graur, Dan; Fox, George E.; Fofanov, Yuriy

    2006-09-01

    The observed diversity of statistical characteristics along genomic sequences is the result of the influences of a variety of ongoing processes including horizontal gene transfer, gene loss, genome rearrangements, and evolution. The rate at which various processes affect the genome typically varies between different genomic regions. Thus, variations in statistical properties seen in different regions of a genome are often associated with its evolution and functional organization. Analysis of such properties is therefore relevant to many ongoing biomedical research efforts. Similarity Plot or S-plot is a Windows-based application for large-scale comparisons and 2D visualization of similarities between genomic sequences. This application combines two approaches widely used in genomics: window analysis of statistical characteristics along genomes and dot-plot visual representation. S-plot is effective in detecting highly similar regions between two genomes. Within a single genome, S-plot has the ability to identify highly dissimilar regions displaying unusual compositional properties. The application was used to perform a comparative analysis of 50+ microbial genomes as well as many eukaryote genomes including human, rat, mouse, and drosophila. We illustrate the uses of S-plot in a comparison involving Escherichia coli K12 and E. coli O157:H7.

  19. Organization of a β and α globin gene set in the teleost Atlantic cod, Gadus morhua.

    PubMed

    Halldórsdóttir, Katrín; Árnason, Einar

    2009-12-01

    Developmental globin gene expression and gene switching in vertebrates have been extensively studied. Globin gene regions have been characterized in some fish species and show linked α and β loci. Understanding coordinated expression between α and β globin genes in fish is of importance for further insights into globin gene regulation in teleosts and higher vertebrates. We characterize linked β and α globin genes in Atlantic cod, pulled from the Atlantic cod genome with a PCR research strategy, by screening a genomic λ library and primer walking. The genes are oriented tail-to-head (5'-3'), differing from the head-to-head orientation in transcriptional polarity characteristic of teleostean globin genes. Four tandem repeats are found in an intergenic region of 1500 base pairs. One microsatellite, which consists primarily of atg tandem repeats, has an open reading frame. The globin genes and open reading frame have a CCAAT promoter element and TATA boxes. The promoters of the open reading frame and the β gene share an 89-bp block (with 100% identity) that probably regulates transcription.

  20. Statistical controversies in clinical research: prognostic gene signatures are not (yet) useful in clinical practice

    PubMed Central

    Michiels, S.; Ternès, N.; Rotolo, F.

    2016-01-01

    With the genomic revolution and the era of targeted therapy, prognostic and predictive gene signatures are becoming increasingly important in clinical research. They are expected to assist prognosis assessment and therapeutic decision making. Notwithstanding, an evidence-based approach is needed to bring gene signatures from the laboratory to clinical practice. In early breast cancer, multiple prognostic gene signatures are commercially available without having formally reached the highest levels of evidence-based criteria. We discuss specific concepts for developing and validating a prognostic signature and illustrate them with contemporary examples in breast cancer. When a prognostic signature has not been developed for predicting the magnitude of relative treatment benefit through an interaction effect, it may be wishful thinking to test its predictive value. We propose that new gene signatures be built specifically for predicting treatment effects for future patients and outline an approach for this using a cross-validation scheme in a standard phase III trial. Replication in an independent trial remains essential. PMID:27634691

  1. Direct methylation of FXR by Set7/9, a lysine methyltransferase, regulates the expression of FXR target genes

    PubMed Central

    Balasubramaniyan, Natarajan; Ananthanarayanan, Meena

    2012-01-01

    The farnesoid X receptor (FXR) is a ligand (bile acid)-dependent nuclear receptor that regulates target genes involved in every aspect of bile acid homeostasis. Upon binding of ligand, FXR recruits an array of coactivators and associated proteins, some of which have intrinsic enzymatic activity that modify histones or even components of the transcriptional complex. In this study, we show chromatin occupancy by the Set7/9 methyltransferase at the FXR response element (FXRE) and direct methylation of FXR in vivo and in vitro at lysine 206. siRNA depletion of Set7/9 in the Huh-7 liver cell line decreased endogenous mRNAs of the FXR target genes, the short heterodimer partner (SHP) and bile salt export pump (BSEP). Mutation of the methylation site at K206 of FXR to an arginine prevented methylation by Set7/9. A pan-methyllysine antibody recognized the wild-type FXR but not the K206R mutant form. An electromobility shift assay showed that methylation by Set7/9 enhanced binding of FXR/retinoic X receptor-α to the FXRE. Interaction between hinge domain of FXR (containing K206) and Set7/9 was confirmed by coimmunoprecipitation, GST pull down, and mammalian two-hybrid experiments. Set7/9 overexpression in Huh-7 cells significantly enhanced transactivation of the SHP and BSEP promoters in a ligand-dependent fashion by wild-type FXR but not the K206R mutant FXR. A Set7/9 mutant deficient in methyltransferase activity was also not effective in increasing transactivation of the BSEP promoter. These studies demonstrate that posttranslational methylation of FXR by Set7/9 contributes to the transcriptional activation of FXR-target genes. PMID:22345554

  2. No associations of a set of SNPs in the Vascular Endothelial Growth Factor (VEGF) and Matrix Metalloproteinase (MMP) genes with survival of colorectal cancer patients.

    PubMed

    Dan, Lydia A; Werdyani, Salem; Xu, Jingxiong; Shestopaloff, Konstantin; Hyde, Angela; Dicks, Elizabeth; Younghusband, Ban; Green, Jane; Parfrey, Patrick; Xu, Wei; Savas, Sevtap

    2016-09-01

    In this study, we aimed to investigate the associations of genetic variations within select genes functioning in angiogenesis, lymph-angiogenesis, and metastasis pathways and the risk of outcome in colorectal cancer patients. We followed a two-stage analysis: First, 381 polymorphisms from 30 genes (eight Vascular Endothelial Growth Factor [VEGF] and 22 Matrix Metalloproteinase [MMP] genes) were investigated in the discovery cohort (n = 505). Then, the 16 polymorphisms with the lowest P-values in this analysis were investigated in a separate replication cohort (n = 247). Genotypes were obtained using the Illumina® HumanOmni-1-Quad (discovery cohort) and Sequenom MassArray® (replication cohort) platforms. The primary outcome measure was overall survival (OS). Kaplan-Meier, univariate and multivariable Cox regression methods were used to test the associations between genotypes and OS. Four SNPs (rs12365082, rs11225389, rs11225388, and rs2846707) had a univariate P < 0.05 in both the discovery and replication cohorts. These SNPs are in linkage disequilibrium with each other to varying extents and are located in the MMP8 and MMP27 genes. In the multivariable analysis adjusting for age, stage, and microsatellite instability status, three of these SNPs (rs12365082, rs11225389, rs11225388) were independent predictors of OS (P < 0.05) in the discovery cohort. However, the same analysis in the replication cohort did not yield statistically significant results. Overall, while the genetic variations in the VEGF and MMP genes are attractive candidates as prognostic markers, our study showed no evidence of associations between a large set of SNPs in these genes and overall survival of colorectal cancer patients.

  3. The sex-inducing pheromone and wounding trigger the same set of genes in the multicellular green alga Volvox.

    PubMed

    Amon, P; Haas, E; Sumper, M

    1998-05-01

    The sex-inducing pheromone of the multicellular green alga Volvox carteri is a glycoprotein that triggers development of males and females at a concentration <10^-16 M. By differential screening of a cDNA library, two novel genes were identified that are transcribed under the control of this pheromone. Unexpectedly, one gene product was characterized as a lysozyme/chitinase, and the other gene product was shown to encode a polypeptide with a striking modular composition. This polypeptide has a cysteine protease domain separated by an extensin-like module from three repeats of a chitin binding domain. In higher plants, similar protein families are known to play an important role in defense against fungi. Indeed, we found that the same set of genes triggered by the sexual pheromone was also inducible in V. carteri by wounding.

  4. The sex-inducing pheromone and wounding trigger the same set of genes in the multicellular green alga Volvox.

    PubMed Central

    Amon, P; Haas, E; Sumper, M

    1998-01-01

    The sex-inducing pheromone of the multicellular green alga Volvox carteri is a glycoprotein that triggers development of males and females at a concentration <10^-16 M. By differential screening of a cDNA library, two novel genes were identified that are transcribed under the control of this pheromone. Unexpectedly, one gene product was characterized as a lysozyme/chitinase, and the other gene product was shown to encode a polypeptide with a striking modular composition. This polypeptide has a cysteine protease domain separated by an extensin-like module from three repeats of a chitin binding domain. In higher plants, similar protein families are known to play an important role in defense against fungi. Indeed, we found that the same set of genes triggered by the sexual pheromone was also inducible in V. carteri by wounding. PMID:9596636

  5. CURLY LEAF Regulates Gene Sets Coordinating Seed Size and Lipid Biosynthesis

    PubMed Central

    Wang, Huan; Ye, Jian; Wu, Hui-Wen; Sun, Hai-Xi; Chua, Nam-Hai

    2016-01-01

    CURLY LEAF (CLF), a histone methyltransferase of Polycomb Repressive Complex 2 (PRC2) for trimethylation of histone H3 Lys 27 (H3K27me3), has been thought of as a negative regulator controlling mainly postgermination growth in Arabidopsis (Arabidopsis thaliana). Approximately 14% to 29% of genic regions are decorated by H3K27me3 in the Arabidopsis genome; however, transcriptional repression activities of PRC2 on a majority of these regions remain unclear. Here, by analysis of transcriptome profiles, we found that approximately 11.6% of genes in the Arabidopsis genome were repressed by CLF in various organs. Unexpectedly, approximately 54% of these genes were preferentially repressed in siliques. Further analyses of 118 transcriptome datasets uncovered a group of genes that was preferentially expressed and repressed by CLF in embryos at the mature-green stage. This observation suggests that CLF mediates a large-scale H3K27me3 programming/reprogramming event during embryonic development. Plants of clf-28 produced bigger and heavier seeds with higher oil content, larger oil bodies, and altered long-chain fatty acid composition compared with wild type. Around 46% of CLF-repressed genes were associated with H3K27me3 marks; moreover, we verified histone modification and transcriptional repression by CLF on regulatory genes. Our results suggest that CLF silences specific gene expression modules. Genes operating within a module have various molecular functions, but they cooperate to regulate a similar physiological function during embryo development. PMID:26945048

  6. A Complete Set of Flagellar Genes Acquired by Horizontal Transfer Coexists with the Endogenous Flagellar System in Rhodobacter sphaeroides

    PubMed Central

    Poggio, Sebastian; Abreu-Goodger, Cei; Fabela, Salvador; Osorio, Aurora; Dreyfus, Georges; Vinuesa, Pablo; Camarena, Laura

    2007-01-01

    Bacteria swim in liquid environments by means of a complex rotating structure known as the flagellum. Approximately 40 proteins are required for the assembly and functionality of this structure. Rhodobacter sphaeroides has two flagellar systems. One of these systems has been shown to be functional and is required for the synthesis of the well-characterized single subpolar flagellum, while the other was found only after the genome sequence of this bacterium was completed. In this work we found that the second flagellar system of R. sphaeroides can be expressed and produces a functional flagellum. In many bacteria with two flagellar systems, one is required for swimming, while the other allows movement in denser environments by producing a large number of flagella over the entire cell surface. In contrast, the second flagellar system of R. sphaeroides produces polar flagella that are required for swimming. Expression of the second set of flagellar genes seems to be positively regulated under anaerobic growth conditions. Phylogenic analysis suggests that the flagellar system that was initially characterized was in fact acquired by horizontal transfer from a γ-proteobacterium, while the second flagellar system contains the native genes. Interestingly, other α-proteobacteria closely related to R. sphaeroides have also acquired a set of flagellar genes similar to the set found in R. sphaeroides, suggesting that a common ancestor received this gene cluster. PMID:17293429

  7. Identification of candidate genes for prostate cancer-risk SNPs utilizing a normal prostate tissue eQTL data set

    PubMed Central

    Thibodeau, S. N.; French, A. J.; McDonnell, S. K.; Cheville, J.; Middha, S.; Tillmans, L.; Riska, S.; Baheti, S.; Larson, M. C.; Fogarty, Z.; Zhang, Y.; Larson, N.; Nair, A.; O'Brien, D.; Wang, L.; Schaid, D J.

    2015-01-01

    Multiple studies have identified loci associated with the risk of developing prostate cancer but the associated genes are not well studied. Here we create a normal prostate tissue-specific eQTL data set and apply this data set to previously identified prostate cancer (PrCa)-risk SNPs in an effort to identify candidate target genes. The eQTL data set is constructed by the genotyping and RNA sequencing of 471 samples. We focus on 146 PrCa-risk SNPs, including all SNPs in linkage disequilibrium with each risk SNP, resulting in 100 unique risk intervals. We analyse cis-acting associations where the transcript is located within 2 Mb (±1 Mb) of the risk SNP interval. Of all SNP–gene combinations tested, 41.7% of SNPs demonstrate a significant eQTL signal after adjustment for sample histology and 14 expression principal component covariates. Of the 100 PrCa-risk intervals, 51 have a significant eQTL signal and these are associated with 88 genes. This study provides a rich resource to study biological mechanisms underlying genetic risk to PrCa. PMID:26611117
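
    The cis window described above (a transcript within ±1 Mb of a risk SNP interval) can be sketched as a simple coordinate test. This is only an illustration: the `Interval` type, the function name, and the choice to count any overlap with the flanked interval (rather than full containment) are assumptions, not details taken from the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval; a hypothetical helper type for this sketch."""
    chrom: str
    start: int  # inclusive, base pairs
    end: int    # inclusive, base pairs


CIS_FLANK_BP = 1_000_000  # +/- 1 Mb, as stated in the abstract


def is_cis(risk: Interval, transcript: Interval, flank: int = CIS_FLANK_BP) -> bool:
    """Return True if the transcript overlaps the risk interval extended by `flank`.

    Assumption: any overlap with the extended interval counts as cis.
    """
    if risk.chrom != transcript.chrom:
        return False
    return (transcript.start <= risk.end + flank
            and transcript.end >= risk.start - flank)


# Made-up coordinates, for illustration only.
risk_interval = Interval("chr8", 128_000_000, 128_050_000)
nearby = Interval("chr8", 128_900_000, 128_950_000)
distant = Interval("chr8", 130_000_000, 130_010_000)
print(is_cis(risk_interval, nearby), is_cis(risk_interval, distant))
```

    A stricter variant would require the whole transcript, or only its start site, to fall inside the window; the abstract does not say which definition was used.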

  8. Gene set differential analysis of time course expression profiles via sparse estimation in functional logistic model with application to time-dependent biomarker detection.

    PubMed

    Kayano, Mitsunori; Matsui, Hidetoshi; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2016-04-01

    High-throughput time course expression profiles have been available in the last decade due to developments in measurement techniques and devices. Functional data analysis, which treats smoothed curves instead of originally observed discrete data, is effective for the time course expression profiles in terms of dimension reduction, robustness, and applicability to data measured at small and irregularly spaced time points. However, the statistical method of differential analysis for time course expression profiles has not been well established. We propose a functional logistic model based on elastic net regularization (F-Logistic) in order to identify the genes with dynamic alterations in case/control study. We employ a mixed model as a smoothing method to obtain functional data; then F-Logistic is applied to time course profiles measured at small and irregularly spaced time points. We evaluate the performance of F-Logistic in comparison with another functional data approach, i.e. functional ANOVA test (F-ANOVA), by applying the methods to real and synthetic time course data sets. The real data sets consist of the time course gene expression profiles for long-term effects of recombinant interferon β on disease progression in multiple sclerosis. F-Logistic distinguishes dynamic alterations, which cannot be found by competitive approaches such as F-ANOVA, in case/control study based on time course expression profiles. F-Logistic is effective for time-dependent biomarker detection, diagnosis, and therapy.

  9. Assessing the Association of Mitochondrial Genetic Variation With Primary Open-Angle Glaucoma Using Gene-Set Analyses

    PubMed Central

    Khawaja, Anthony P.; Cooke Bailey, Jessica N.; Kang, Jae Hee; Allingham, R. Rand; Hauser, Michael A.; Brilliant, Murray; Budenz, Donald L.; Christen, William G.; Fingert, John; Gaasterland, Douglas; Gaasterland, Terry; Kraft, Peter; Lee, Richard K.; Lichter, Paul R.; Liu, Yutao; Medeiros, Felipe; Moroi, Syoko E.; Richards, Julia E.; Realini, Tony; Ritch, Robert; Schuman, Joel S.; Scott, William K.; Singh, Kuldev; Sit, Arthur J.; Vollrath, Douglas; Wollstein, Gadi; Zack, Donald J.; Zhang, Kang; Pericak-Vance, Margaret; Weinreb, Robert N.; Haines, Jonathan L.; Pasquale, Louis R.; Wiggs, Janey L.

    2016-01-01

    Purpose: Recent studies indicate that mitochondrial proteins may contribute to the pathogenesis of primary open-angle glaucoma (POAG). In this study, we examined the association between POAG and common variations in genes encoding mitochondrial proteins. Methods: We examined genetic data from 3430 POAG cases and 3108 controls derived from the combination of the GLAUGEN and NEIGHBOR studies. We constructed biological-system coherent mitochondrial nuclear-encoded protein gene-sets by intersecting the MitoCarta database with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We examined the mitochondrial gene-sets for association with POAG and with normal-tension glaucoma (NTG) and high-tension glaucoma (HTG) subsets using Pathway Analysis by Randomization Incorporating Structure. Results: We identified 22 KEGG pathways with significant mitochondrial protein-encoding gene enrichment, belonging to six general biological classes. Among the pathway classes, mitochondrial lipid metabolism was associated with POAG overall (P = 0.013) and with NTG (P = 0.0006), and mitochondrial carbohydrate metabolism was associated with NTG (P = 0.030). Examining the individual KEGG pathway mitochondrial gene-sets, fatty acid elongation and synthesis and degradation of ketone bodies, both lipid metabolism pathways, were significantly associated with POAG (P = 0.005 and P = 0.002, respectively) and NTG (P = 0.0004 and P < 0.0001, respectively). Butanoate metabolism, a carbohydrate metabolism pathway, was significantly associated with POAG (P = 0.004), NTG (P = 0.001), and HTG (P = 0.010). Conclusions: We present an effective approach for assessing the contributions of mitochondrial genetic variation to open-angle glaucoma. Our findings support a role for mitochondria in POAG pathogenesis and specifically point to lipid and carbohydrate metabolism pathways as being important. PMID:27661856
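
    The gene-set construction step (intersecting a mitochondrial gene catalogue with pathway gene lists) amounts to a per-pathway set intersection. The sketch below assumes both resources have already been loaded into plain Python collections; the function name, the `min_size` parameter, and the placeholder gene and pathway names are hypothetical and do not come from MitoCarta or KEGG.

```python
def restrict_pathways(pathways, reference_genes, min_size=1):
    """Keep, for each pathway, only the genes also present in `reference_genes`.

    pathways: dict mapping pathway name -> iterable of gene symbols
    reference_genes: iterable of gene symbols (e.g. a mitochondrial catalogue)
    min_size: drop pathways whose overlap is smaller than this (an assumption)
    """
    reference = set(reference_genes)
    restricted = {}
    for name, genes in pathways.items():
        overlap = set(genes) & reference
        if len(overlap) >= min_size:
            restricted[name] = overlap
    return restricted


# Placeholder identifiers, not real pathway content.
example_pathways = {
    "pathway_A": ["GENE1", "GENE2", "GENE3"],
    "pathway_B": ["GENE4", "GENE5"],
}
example_mito_genes = ["GENE2", "GENE3", "GENE9"]
print(restrict_pathways(example_pathways, example_mito_genes))
```

    Pathways with no mitochondrial members drop out, which is why only a subset of pathways remains for testing.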

  10. Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium.

    PubMed

    Hohman, Timothy J; Bush, William S; Jiang, Lan; Brown-Gentry, Kristin D; Torstenson, Eric S; Dudek, Scott M; Mukherjee, Shubhabrata; Naj, Adam; Kunkle, Brian W; Ritchie, Marylyn D; Martin, Eden R; Schellenberg, Gerard D; Mayeux, Richard; Farrer, Lindsay A; Pericak-Vance, Margaret A; Haines, Jonathan L; Thornton-Wells, Tricia A

    2016-02-01

    Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis.

  11. Testing for Multivariate Normality in Mass Spectrometry Imaging Data: A Robust Statistical Approach for Clustering Evaluation and the Generation of Synthetic Mass Spectrometry Imaging Data Sets.

    PubMed

    Dexter, Alex; Race, Alan M; Styles, Iain B; Bunch, Josephine

    2016-11-15

    Spatial clustering is a powerful tool in mass spectrometry imaging (MSI) and has been demonstrated to be capable of differentiating tumor types, visualizing intratumor heterogeneity, and segmenting anatomical structures. Several clustering methods have been applied to mass spectrometry imaging data, but a principled comparison and evaluation of different clustering techniques presents a significant challenge. We propose that testing whether the data has a multivariate normal distribution within clusters can be used to evaluate the performance when using algorithms that assume normality in the data, such as k-means clustering. In cases where clustering has been performed using the cosine distance, conversion of the data to polar coordinates prior to normality testing should be performed to ensure normality is tested in the correct coordinate system. In addition to these evaluations of internal consistency, we demonstrate that the multivariate normal distribution can then be used as a basis for statistical modeling of MSI data. This allows the generation of synthetic MSI data sets with known ground truth, providing a means of external clustering evaluation. To demonstrate this, reference data from seven anatomical regions of an MSI image of a coronal section of mouse brain were modeled. From this, a set of synthetic data based on this model was generated. Results of r^2 fitting of the chi-squared quantile-quantile plots on the seven anatomical regions confirmed that the data acquired from each spatial region was found to be closer to normally distributed in polar space than in Euclidean. Finally, principal component analysis was applied to a single data set that included synthetic and real data. No significant differences were found between the two data types, indicating the suitability of these methods for generating realistic synthetic data.

  12. HoxBlinc RNA recruits Set1/MLL complexes to activate Hox gene expression patterns and mesoderm lineage development

    PubMed Central

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Nao; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2015-01-01

    Summary: Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1+ mesoderm and then promotes hematopoietic differentiation through regulating hoxb gene pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated KD or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb gene expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1+ precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1+ precursors and differentiation of Flk1+ cells into hematopoietic lineages. PMID:26725110

  13. HDR: a statistical two-step approach successfully identifies disease genes in autosomal recessive families.

    PubMed

    Imai, Atsuko; Kohda, Masakazu; Nakaya, Akihiro; Sakata, Yasushi; Murayama, Kei; Ohtake, Akira; Lathrop, Mark; Okazaki, Yasushi; Ott, Jurg

    2016-11-01

    In the search for sequence variants underlying disease, commonly applied filtering steps usually result in a number of candidate variants that cannot further be narrowed down. In autosomal recessive families, disease usually occurs only in one generation so that genetic linkage analysis is unlikely to help. Because homozygous recessive mutations tend to be inherited together with flanking homozygous variants, we developed a statistical method to detect pathogenic variants in autosomal recessive families: We look for differences in patterns of homozygosity around candidate variants between patients and control individuals and expect that such differences are greater for pathogenic variants than random candidate variants. In six autosomal recessive mitochondrial disease families, in which pathogenic homozygous variants have already been identified, our approach succeeded in prioritizing pathogenic mutations. Our method is applicable to single patients from recessive families with at least a few dozen control individuals from the same population; it is easy to use and is highly effective for detecting causative mutations in autosomal recessive families.

  14. A gateway cloning vector set for high-throughput functional analysis of genes in planta.

    PubMed

    Curtis, Mark D; Grossniklaus, Ueli

    2003-10-01

    The current challenge, now that two plant genomes have been sequenced, is to assign a function to the increasing number of predicted genes. In Arabidopsis, approximately 55% of genes can be assigned a putative function, however, less than 8% of these have been assigned a function by direct experimental evidence. To identify these functions, many genes will have to undergo comprehensive analyses, which will include the production of chimeric transgenes for constitutive or inducible ectopic expression, for antisense or dominant negative expression, for subcellular localization studies, for promoter analysis, and for gene complementation studies. The production of such transgenes is often hampered by laborious conventional cloning technology that relies on restriction digestion and ligation. With the aim of providing tools for high throughput gene analysis, we have produced a Gateway-compatible Agrobacterium sp. binary vector system that facilitates fast and reliable DNA cloning. This collection of vectors is freely available, for noncommercial purposes, and can be used for the ectopic expression of genes either constitutively or inducibly. The vectors can be used for the expression of protein fusions to the Aequorea victoria green fluorescent protein and to the beta-glucuronidase protein so that the subcellular localization of a protein can be identified. They can also be used to generate promoter-reporter constructs and to facilitate efficient cloning of genomic DNA fragments for complementation experiments. All vectors were derived from pCambia T-DNA cloning vectors, with the exception of a chemically inducible vector, for Agrobacterium sp.-mediated transformation of a wide range of plant species.

  15. ABAEnrichment: an R package to test for gene set expression enrichment in the adult and developing human brain.

    PubMed

    Grote, Steffi; Prüfer, Kay; Kelso, Janet; Dannemann, Michael

    2016-10-15

    We present ABAEnrichment, an R package that tests for expression enrichment in specific brain regions at different developmental stages using expression information gathered from multiple regions of the adult and developing human brain, together with ontologically organized structural information about the brain, both provided by the Allen Brain Atlas. We validate ABAEnrichment by successfully recovering the origin of gene sets identified in specific brain cell-types and developmental stages.

  16. Gene Set-Based Integrative Analysis Revealing Two Distinct Functional Regulation Patterns in Four Common Subtypes of Epithelial Ovarian Cancer.

    PubMed

    Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Yi-Ping; Chuang, Jen-Hua; Yang, Ming-Jie; Yen, Ming-Shyen; Chiou, Shih-Hwa; Chang, Cheng-Chang

    2016-08-05

    Clear cell (CCC), endometrioid (EC), mucinous (MC) and high-grade serous carcinoma (SC) are the four most common subtypes of epithelial ovarian carcinoma (EOC). The widely accepted dualistic model of ovarian carcinogenesis divided EOCs into type I and II categories based on the molecular features. However, this hypothesis has not been experimentally demonstrated. We carried out a gene set-based analysis by integrating the microarray gene expression profiles downloaded from the publicly available databases. These quantified biological functions of EOCs were defined by 1454 Gene Ontology (GO) term and 674 Reactome pathway gene sets. The pathogenesis of the four EOC subtypes was investigated by hierarchical clustering and exploratory factor analysis. The patterns of functional regulation among the four subtypes containing 1316 cases could be accurately classified by machine learning. The results revealed that the ERBB and PI3K-related pathways played important roles in the carcinogenesis of CCC, EC and MC; while deregulation of cell cycle was more predominant in SC. The study revealed that two different functional regulation patterns exist among the four EOC subtypes, which were compatible with the type I and II classifications proposed by the dualistic model of ovarian carcinogenesis.

  17. Using RNAi in C. "elegans" to Demonstrate Gene Knockdown Phenotypes in the Undergraduate Biology Lab Setting

    ERIC Educational Resources Information Center

    Roy, Nicole M.

    2013-01-01

    RNA interference (RNAi) is a powerful technology used to knock down genes in basic research and medicine. In 2006 RNAi technology using "Caenorhabditis elegans" ("C. elegans") was awarded the Nobel Prize in medicine and thus students graduating in the biological sciences should have experience with this technology. However,…

  18. SBERIA: Set Based gene EnviRonment InterAction test for rare and common variants in complex diseases

    PubMed Central

    Jiao, Shuo; Hsu, Li; Bézieau, Stéphane; Brenner, Hermann; Chan, Andrew T.; Chang-Claude, Jenny; Le Marchand, Loic; Lemire, Mathieu; Newcomb, Polly A.; Slattery, Martha L.; Peters, Ulrike

    2013-01-01

    Identification of gene-environment interaction (GxE) is important in understanding the etiology of complex diseases. However, partially due to the lack of power, there have been very few replicated GxE findings compared to the success in marginal association studies. The existing GxE testing methods mainly focus on improving the power for individual markers. In this paper, we took a different strategy and proposed a Set Based gene EnviRonment InterAction test (SBERIA), which can improve the power by reducing the multiple testing burdens and aggregating signals within a set. The major challenge of the signal aggregation within a set is how to tell signals from noise and how to determine the direction of the signals. SBERIA takes advantage of the established correlation screening for GxE to guide the aggregation of genotypes within a marker set. The correlation screening has been shown to be an efficient way of selecting potential GxE candidate SNPs in case-control studies for complex diseases. Importantly, the correlation screening in case-control combined samples is independent of the interaction test. With this desirable feature, SBERIA maintains the correct type I error level and can be easily implemented in a regular logistic regression setting. We showed that SBERIA had higher power than benchmark methods in various simulation scenarios, both for common and rare variants. We also applied SBERIA to real GWAS data of 10,729 colorectal cancer cases and 13,328 controls and found evidence of interaction between the set of known colorectal cancer susceptibility loci and smoking. PMID:23720162
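
    The screening-guided aggregation idea can be illustrated with a small sketch. It is not the published SBERIA procedure: the weighting rule (the sign of the genotype-exposure correlation when its magnitude passes a cutoff, a small fixed weight otherwise), the cutoff value, and all function names are assumptions made for illustration. In the approach described, the aggregated genotype would then enter a logistic regression with an interaction term, which is not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screening_weights(genotypes, exposure, cutoff=0.1, small=0.0):
    """One weight per SNP from genotype-exposure correlation in the combined sample.

    genotypes: list of per-SNP genotype lists, each of length n_samples
    cutoff, small: hypothetical tuning values, not taken from the paper
    """
    weights = []
    for snp in genotypes:
        r = pearson(snp, exposure)
        weights.append(math.copysign(1.0, r) if abs(r) >= cutoff else small)
    return weights


def aggregate_genotype(genotypes, weights):
    """Weighted sum of SNP genotypes for each sample."""
    n_samples = len(genotypes[0])
    return [sum(w * snp[i] for w, snp in zip(weights, genotypes))
            for i in range(n_samples)]


# Toy data: three SNPs, four samples (cases and controls combined).
exposure = [0, 0, 1, 1]
genotypes = [[0, 0, 1, 2], [2, 1, 0, 0], [1, 0, 1, 0]]
weights = screening_weights(genotypes, exposure)
print(weights, aggregate_genotype(genotypes, weights))
```

    Because the weights are computed without reference to case-control status, the screening step stays separate from the interaction test, which is the property the abstract highlights.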

  19. The analysis of translation-related gene set boosts debates around origin and evolution of mimiviruses

    PubMed Central

    Colson, Philippe; La Scola, Bernard

    2017-01-01

    The giant mimiviruses challenged the well-established concept of viruses, blurring the roots of the tree of life, mainly due to their genetic content. Along with other nucleo-cytoplasmic large DNA viruses, they compose a new proposed order—named Megavirales—whose origin and evolution generate heated debate in the scientific community. The presence of an arsenal of genes not widespread in the virosphere related to important steps of the translational process, including transfer RNAs, aminoacyl-tRNA synthetases, and translation factors for peptide synthesis, constitutes an important element of this debate. In this review, we highlight the main findings to date about the translational machinery of the mimiviruses and compare their distribution along the distinct members of the family Mimiviridae. Furthermore, we discuss how the presence and/or absence of the translation-related genes among mimiviruses raises important insights to boost the debate on their origin and evolutionary history. PMID:28207761

  20. Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data

    PubMed Central

    Sharov, Alexei A; Masui, Shinji; Sharova, Lioudmila V; Piao, Yulan; Aiba, Kazuhiro; Matoba, Ryo; Xin, Li; Niwa, Hitoshi; Ko, Minoru SH

    2008-01-01

    Background: Target genes of a transcription factor (TF) Pou5f1 (Oct3/4 or Oct4), which is essential for pluripotency maintenance and self-renewal of embryonic stem (ES) cells, have previously been identified based on their response to Pou5f1 manipulation and occurrence of Chromatin-immunoprecipitation (ChIP)-binding sites in promoters. However, many responding genes with binding sites may not be direct targets because response may be mediated by other genes and ChIP-binding site may not be functional in terms of transcription regulation. Results: To reduce the number of false positives, we propose to separate responding genes into groups according to direction, magnitude, and time of response, and to apply the false discovery rate (FDR) criterion to each group individually. Applying this novel algorithm with stringent statistical criteria (FDR < 0.2) to a compendium of published and new microarray data (3, 6, 12, and 24 hr after Pou5f1 suppression) and published ChIP data, we identified 420 tentative target genes (TTGs) for Pou5f1. The majority of TTGs (372) were down-regulated after Pou5f1 suppression, indicating that Pou5f1 functions as an activator of gene expression when it binds to promoters. Interestingly, many activated genes are potent suppressors of transcription, which include polycomb genes, zinc finger TFs, chromatin remodeling factors, and suppressors of signaling. Similar analysis showed that Sox2 and Nanog also function mostly as transcription activators in cooperation with Pou5f1. Conclusion: We have identified the most reliable sets of direct target genes for key pluripotency genes – Pou5f1, Sox2, and Nanog, and found that they predominantly function as activators of downstream gene expression. Thus, most genes related to cell differentiation are suppressed indirectly. PMID:18522731
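
    Applying an FDR criterion to each response group separately can be sketched as follows. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names, the record layout, and the group labels in the example.

```python
from collections import defaultdict


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def select_by_group(records, threshold=0.2):
    """Apply the FDR threshold within each group independently.

    records: iterable of (gene, group, pvalue) tuples
    threshold: 0.2 mirrors the FDR < 0.2 criterion quoted in the abstract
    """
    grouped = defaultdict(list)
    for gene, group, pvalue in records:
        grouped[group].append((gene, pvalue))
    selected = {}
    for group, items in grouped.items():
        qvalues = bh_qvalues([p for _, p in items])
        selected[group] = [gene for (gene, _), q in zip(items, qvalues)
                           if q < threshold]
    return selected


# Hypothetical genes and group labels, for illustration only.
example = [("geneA", "down_early", 0.01),
           ("geneB", "down_early", 0.5),
           ("geneC", "up_late", 0.3)]
print(select_by_group(example))
```

    Correcting within groups means a group rich in true responders is not penalized by the number of tests in a group that contains mostly noise.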

  1. Toxicity mechanisms identification via gene set enrichment analysis of time-series toxicogenomics data: impact of time and concentration.

    PubMed

    Gao, Ce; Weisman, David; Lan, Jiaqi; Gou, Na; Gu, April Z

    2015-04-07

    The advance in high-throughput "toxicogenomics" technologies, which allows for concurrent monitoring of cellular responses globally upon exposure to chemical toxicants, presents promises for next-generation toxicity assessment. It is recognized that cellular responses to toxicants have a highly dynamic nature, and exhibit both temporal complexity and dose-response shifts. Most current gene enrichment or pathway analyses lack the recognition of the inherent correlation within time series data, and may potentially miss important pathways or yield biased and inconsistent results that ignore dynamic patterns and time-sensitivity. In this study, we investigated the application of two score metrics for GSEA (gene set enrichment analysis) to rank the genes that consider the temporal gene expression profile. One applies a novel time series CPCA (common principal components analysis) to generate scores for genes based on their contributions to the common temporal variation among treatments for a given chemical at different concentrations. The other employs an integrated altered gene expression quantifier, TELI (transcriptional effect level index), that integrates altered gene expression magnitude over the exposure time. By comparing the GSEA results using two different ranking metrics for examining the dynamic responses of reporter cells treated with various dose levels of three model toxicants, mitomycin C, hydrogen peroxide, and lead nitrate, the analysis identified and revealed different toxicity mechanisms of these chemicals that exhibit a chemical-specific, as well as time-aware and dose-sensitive, nature. The ability, advantages, and disadvantages of varying ranking metrics were discussed. These findings support the notion that toxicity bioassays should account for the cells' complex dynamic responses, thereby implying that both data acquisition and data analysis should look beyond simple traditional end point responses.
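
    The idea of integrating altered expression magnitude over the exposure time can be sketched with a trapezoidal rule. This is not the published TELI formula: using the absolute natural log of the fold change as the magnitude, the normalization by total exposure time, and the function name are all assumptions chosen for illustration.

```python
import math


def time_integrated_change(times, fold_changes):
    """Average magnitude of expression change over the exposure period.

    times: strictly increasing measurement times
    fold_changes: treated/control expression ratios at those times (> 0)
    Assumption: magnitude is |ln(fold change)|, integrated by the trapezoidal rule.
    """
    if len(times) != len(fold_changes) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    magnitudes = [abs(math.log(fc)) for fc in fold_changes]
    area = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        area += 0.5 * (magnitudes[k] + magnitudes[k - 1]) * dt
    return area / (times[-1] - times[0])


# Made-up profile: induction that peaks and then decays.
score = time_integrated_change([0, 30, 60, 120], [1.0, 2.0, 4.0, 1.5])
print(score)
```

    Scores of this kind, computed per gene, could serve as the ranking metric passed to a gene set enrichment analysis; up- and down-regulation of equal fold contribute equally because of the absolute value.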

  2. The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications.

    PubMed

    Stucken, Karina; John, Uwe; Cembella, Allan; Murillo, Alejandro A; Soto-Liebe, Katia; Fuentes-Valdés, Juan J; Friedel, Maik; Plominsky, Alvaro M; Vásquez, Mónica; Glöckner, Gernot

    2010-02-16

    Cyanobacterial morphology is diverse, ranging from unicellular spheres or rods to multicellular structures such as colonies and filaments. Multicellular species represent an evolutionary strategy to differentiate and compartmentalize certain metabolic functions for reproduction and nitrogen (N2) fixation into specialized cell types (e.g. akinetes, heterocysts and diazocytes). Only a few filamentous, differentiated cyanobacterial species, with genome sizes over 5 Mb, have been sequenced. We sequenced the genomes of two strains of closely related filamentous cyanobacterial species to yield further insights into the molecular basis of the traits of N2 fixation, filament formation and cell differentiation. Cylindrospermopsis raciborskii CS-505 is a cylindrospermopsin-producing strain from Australia, whereas Raphidiopsis brookii D9 from Brazil synthesizes neurotoxins associated with paralytic shellfish poisoning (PSP). Despite their different morphology, toxin composition and disjunct geographical distribution, these strains form a monophyletic group. With genome sizes of approximately 3.9 (CS-505) and 3.2 (D9) Mb, these are the smallest genomes described for free-living filamentous cyanobacteria. We observed remarkable gene order conservation (synteny) between these genomes despite the difference in repetitive element content, which accounts for most of the genome size difference between them. We show here that the strains share a specific set of 2539 genes with >90% average nucleotide identity. The fact that the CS-505 and D9 genomes are small and streamlined compared to those of other filamentous cyanobacterial species and the lack of the ability for heterocyst formation in strain D9 allowed us to define a core set of genes responsible for each trait in filamentous species. We presume that in strain D9 the ability to form proper heterocysts was secondarily lost together with N2 fixation capacity. Further comparisons to all available cyanobacterial genomes

  3. The Smallest Known Genomes of Multicellular and Toxic Cyanobacteria: Comparison, Minimal Gene Sets for Linked Traits and the Evolutionary Implications

    PubMed Central

    Stucken, Karina; John, Uwe; Cembella, Allan; Murillo, Alejandro A.; Soto-Liebe, Katia; Fuentes-Valdés, Juan J.; Friedel, Maik; Plominsky, Alvaro M.; Vásquez, Mónica; Glöckner, Gernot

    2010-01-01

    Cyanobacterial morphology is diverse, ranging from unicellular spheres or rods to multicellular structures such as colonies and filaments. Multicellular species represent an evolutionary strategy to differentiate and compartmentalize certain metabolic functions for reproduction and nitrogen (N2) fixation into specialized cell types (e.g. akinetes, heterocysts and diazocytes). Only a few filamentous, differentiated cyanobacterial species, with genome sizes over 5 Mb, have been sequenced. We sequenced the genomes of two strains of closely related filamentous cyanobacterial species to yield further insights into the molecular basis of the traits of N2 fixation, filament formation and cell differentiation. Cylindrospermopsis raciborskii CS-505 is a cylindrospermopsin-producing strain from Australia, whereas Raphidiopsis brookii D9 from Brazil synthesizes neurotoxins associated with paralytic shellfish poisoning (PSP). Despite their different morphology, toxin composition and disjunct geographical distribution, these strains form a monophyletic group. With genome sizes of approximately 3.9 (CS-505) and 3.2 (D9) Mb, these are the smallest genomes described for free-living filamentous cyanobacteria. We observed remarkable gene order conservation (synteny) between these genomes despite the difference in repetitive element content, which accounts for most of the genome size difference between them. We show here that the strains share a specific set of 2539 genes with >90% average nucleotide identity. The fact that the CS-505 and D9 genomes are small and streamlined compared to those of other filamentous cyanobacterial species and the lack of the ability for heterocyst formation in strain D9 allowed us to define a core set of genes responsible for each trait in filamentous species. We presume that in strain D9 the ability to form proper heterocysts was secondarily lost together with N2 fixation capacity. Further comparisons to all available cyanobacterial genomes covering

  4. Genetic diversity of the conserved motifs of six bacterial leaf blight resistance genes in a set of rice landraces

    PubMed Central

    2014-01-01

    Background: Bacterial leaf blight (BLB) caused by the vascular pathogen Xanthomonas oryzae pv. oryzae (Xoo) is one of the most serious diseases leading to crop failure in rice growing countries. A total of 37 resistance genes against Xoo has been identified in rice. Of these, ten BLB resistance genes have been mapped on rice chromosomes, while 6 have been cloned, sequenced and characterized. Diversity analysis at the resistance gene level of this disease is scanty, and the landraces from West Bengal and North Eastern states of India have received little attention so far. The objective of this study was to assess the genetic diversity at conserved domains of 6 BLB resistance genes in a set of 22 rice accessions including landraces and check genotypes collected from the states of Assam, Nagaland, Mizoram and West Bengal. Results: In this study 34 pairs of primers were designed from conserved domains of 6 BLB resistance genes; Xa1, xa5, Xa21, Xa21(A1), Xa26 and Xa27. The designed primer pairs were used to generate PCR based polymorphic DNA profiles to detect and elucidate the genetic diversity of the six genes in the 22 diverse rice accessions of known disease phenotype. A total of 140 alleles were identified including 41 rare and 26 null alleles. The average polymorphism information content (PIC) value was 0.56/primer pair. The DNA profiles identified each of the rice landraces unequivocally. The amplified polymorphic DNA bands were used to calculate genetic similarity of the rice landraces in all possible pair combinations. The similarity among the rice accessions ranged from 18% to 89% and the dendrogram produced from the similarity values was divided into 2 major clusters. The conserved domains identified within the sequenced rare alleles include Leucine-Rich Repeat, BED-type zinc finger domain, sugar transferase domain and the domain of the carbohydrate esterase 4 superfamily. Conclusions: This study revealed high genetic diversity at conserved domains of six BLB

  5. Gene set enrichment analysis of microarray data from Pimephales promelas (Rafinesque), a non-mammalian model organism

    PubMed Central

    2011-01-01

    Background Methods for gene-class testing, such as Gene Set Enrichment Analysis (GSEA), incorporate biological knowledge into the analysis and interpretation of microarray data by comparing gene expression patterns to pathways, systems and emergent phenotypes. However, to use GSEA to its full capability with non-mammalian model organisms, a microarray platform must be annotated with human gene symbols. Doing so makes it possible to relate a model organism's gene expression, in response to a given treatment, to potential human health consequences of that treatment. We enhanced the annotation of a microarray platform from a non-mammalian model organism, and then used the GSEA approach in a reanalysis of a study examining the biological significance of acute and chronic methylmercury exposure on liver tissue of fathead minnow (Pimephales promelas). Using GSEA, we tested the hypothesis that fathead livers, in response to methylmercury exposure, would exhibit gene expression patterns similar to diseased human livers. Results We describe an enhanced annotation of the fathead minnow microarray platform with human gene symbols. This resource is now compatible with the GSEA approach for gene-class testing. We confirmed that GSEA, using this enhanced microarray platform, is able to recover results consistent with a previous analysis of fathead minnow exposure to methylmercury using standard analytical approaches. Using GSEA to compare fathead gene expression profiles to human phenotypes, we also found that fathead methylmercury-treated livers exhibited expression profiles that are homologous to human systems and pathways and that result in damage similar to the human liver damage associated with hepatocellular carcinoma and hepatitis B. Conclusions This study describes a powerful resource for enabling the use of non-mammalian model organisms in the study of human health significance. Results of microarray gene expression studies involving fathead minnow, typically

  6. Choosing the right path: enhancement of biologically relevant sets of genes or proteins using pathway structure

    PubMed Central

    Thomas, Reuben; Gohlke, Julia M; Stopper, Geffrey F; Parham, Frederick M; Portier, Christopher J

    2009-01-01

    A method is proposed that finds enriched pathways relevant to a studied condition using the measured molecular data and also the structural information of the pathway viewed as a network of nodes and edges. Tests are performed using simulated data and genomic data sets and the method is compared to two existing approaches. The analysis provided demonstrates the method proposed is very competitive with the current approaches and also provides biologically relevant results. PMID:19393085

  7. Five histidine kinases perceive osmotic stress and regulate distinct sets of genes in Synechocystis.

    PubMed

    Paithoonrangsarid, Kalyanee; Shoumskaya, Maria A; Kanesaki, Yu; Satoh, Syusei; Tabata, Satoshi; Los, Dmitry A; Zinchenko, Vladislav V; Hayashi, Hidenori; Tanticharoen, Morakot; Suzuki, Iwane; Murata, Norio

    2004-12-17

    Microorganisms respond to hyperosmotic stress via changes in the levels of expression of large numbers of genes. Such responses are essential for acclimation to a new osmotic environment. To identify factors involved in the perception and transduction of signals caused by hyperosmotic stress, we examined the response of Synechocystis sp. PCC 6803, which has proven to be a particularly useful microorganism in similar analyses. We screened knockout libraries of histidine kinases (Hiks) and response regulators (Rres) in Synechocystis by DNA microarray and slot-blot hybridization analyses, and we identified several two-component systems, which we designated Hik-Rre systems, namely, Hik33-Rre31, Hik34-Rre1, and Hik10-Rre3, as well as Hik16-Hik41-Rre17, as the transducers of hyperosmotic stress. We also identified Hik2-Rre1 as a putative additional two-component system. Each individual two-component system regulated the transcription of a specific group of genes that were responsive to hyperosmotic stress.

  8. Extended triplet set C343 of DNA sequences and its application to the p53 gene

    NASA Astrophysics Data System (ADS)

    Yan, Yan-Yan; Zhu, Ping

    2011-01-01

    Recently, much research has indicated that more and more cancers pose a threat to human life. Cancers are caused by oncogenes. Many human oncogenes have been found and most of them are located on chromosomes. The discovery of the oncogene plays a significant role in the treatment of cancer. The p53 tumor suppressor gene has received much attention because it frequently mutates or deletes in tumor cells of most people. Thus, the study of oncogenes is significant. In order to establish the Galois field (GF(7)), the indefinite gene is introduced as D and oncogene is introduced as O, and P. Taking the polynomial coefficients a0, a1, a2 in GF(7) and the bijective function f: GF(7) → {D,A,C,O,G,T,P}, where f(0) = D, f(1) = A, f(2) = C, f(3) = O, f(4) = G, f(5) = T, and f(6) = P, the bijective phi may be written as phi(a0 + a1x + a2x^2). Based on the algebraic structure, we can not only analyse the DNA sequence of oncogenes, but also predict possible new cancers.
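
    The mapping between GF(7) polynomials and extended triplets can be sketched in Python. This is only an illustration under stated assumptions: the abstract does not say which coefficient corresponds to which triplet position, so the order a0, a1, a2 is assumed here, and the coefficient-wise addition shown is ordinary polynomial addition over GF(7), not necessarily the operation the authors use. All function names are hypothetical.

```python
from itertools import product

# Bijection f: GF(7) -> extended base alphabet, as listed in the abstract.
F = {0: "D", 1: "A", 2: "C", 3: "O", 4: "G", 5: "T", 6: "P"}
F_INV = {symbol: value for value, symbol in F.items()}


def coeffs_to_triplet(a0, a1, a2):
    """Map the coefficients of a0 + a1*x + a2*x^2 over GF(7) to a triplet.

    Assumption: the coefficient order a0, a1, a2 gives the base order.
    """
    return F[a0 % 7] + F[a1 % 7] + F[a2 % 7]


def triplet_to_coeffs(triplet):
    """Inverse mapping: a three-letter triplet back to (a0, a1, a2)."""
    return tuple(F_INV[symbol] for symbol in triplet)


def add_triplets(t1, t2):
    """Coefficient-wise addition modulo 7 (polynomial addition over GF(7))."""
    summed = [(a + b) % 7
              for a, b in zip(triplet_to_coeffs(t1), triplet_to_coeffs(t2))]
    return coeffs_to_triplet(*summed)


# Enumerating all coefficient choices yields 7**3 = 343 distinct triplets,
# which matches the size implied by the name "C343".
ALL_TRIPLETS = [coeffs_to_triplet(*c) for c in product(range(7), repeat=3)]
```

    Under these assumptions, `coeffs_to_triplet(1, 2, 4)` gives the ordinary codon-like triplet `ACG`, and `DDD` acts as the additive identity.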

  9. HoxBlinc RNA Recruits Set1/MLL Complexes to Activate Hox Gene Expression Patterns and Mesoderm Lineage Development.

    PubMed

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Naohiro; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2016-01-05

    Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1(+) mesoderm and then promotes hematopoietic differentiation through regulation of hoxb pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated knockdown or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1(+) precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1(+) precursors and differentiation of Flk1(+) cells into hematopoietic lineages.

  10. The transcriptional response to encystation stimuli in Giardia lamblia is restricted to a small set of genes.

    PubMed

    Morf, Laura; Spycher, Cornelia; Rehrauer, Hubert; Fournier, Catharine Aquino; Morrison, Hilary G; Hehl, Adrian B

    2010-10-01

    The protozoan parasite Giardia lamblia undergoes stage differentiation in the small intestine of the host to an environmentally resistant and infectious cyst. Encystation involves the secretion of an extracellular matrix comprised of cyst wall proteins (CWPs) and a β(1-3)-GalNAc homopolymer. Upon the induction of encystation, genes coding for CWPs are switched on, and mRNAs coding for a Myb transcription factor and enzymes involved in cyst wall glycan synthesis are upregulated. Encystation in vitro is triggered by several protocols, which call for changes in bile concentrations or availability of lipids, and elevated pH. However, the conditions for induction are not standardized and we predicted significant protocol-specific side effects. This makes reliable identification of encystation factors difficult. Here, we exploited the possibility of inducing encystation with two different protocols, which we show to be equally effective, for a comparative mRNA profile analysis. The standard encystation protocol induced a bipartite transcriptional response with surprisingly minor involvement of stress genes. A comparative analysis revealed a core set of only 18 encystation genes and showed that a majority of genes was indeed upregulated as a side effect of inducing conditions. We also established a Myb binding sequence as a signature motif in encystation promoters, suggesting coordinated regulation of these factors.

  11. Of Genes and Machines: Application of a Combination of Machine Learning Tools to Astronomy Data Sets

    NASA Astrophysics Data System (ADS)

    Heinis, S.; Kumar, S.; Gezari, S.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Waters, C.

    2016-04-01

    We apply a combination of genetic algorithm (GA) and support vector machine (SVM) machine learning algorithms to solve two important problems faced by the astronomical community: star-galaxy separation and photometric redshift estimation of galaxies in survey catalogs. We use the GA to select the relevant features in the first step, followed by optimization of SVM parameters in the second step to obtain an optimal set of parameters to classify or regress, in the process of which we avoid overfitting. We apply our method to star-galaxy separation in Pan-STARRS1 data. We show that our method correctly classifies 98% of objects down to i_P1 = 24.5, with a completeness (or true positive rate) of 99% for galaxies and 88% for stars. By combining colors with morphology, our star-galaxy separation method yields better results than the new SExtractor classifier spread_model, in particular at the faint end (i_P1 > 22). We also use our method to derive photometric redshifts for galaxies in the COSMOS bright multiwavelength data set down to an error in (1+z) of σ = 0.013, which compares well with estimates from spectral energy distribution fitting on the same data (σ = 0.007) while making a significantly smaller number of assumptions.

  12. Joint optimization of segmentation and shape prior from level-set-based statistical shape model, and its application to the automated segmentation of abdominal organs.

    PubMed

    Saito, Atsushi; Nawano, Shigeru; Shimizu, Akinobu

    2016-02-01

    The goal of this study is to provide a theoretical framework for accurately optimizing the segmentation energy considering all of the possible shapes generated from the level-set-based statistical shape model (SSM). The proposed algorithm solves the well-known open problem, in which a shape prior may not be optimal in terms of an objective functional that needs to be minimized during segmentation. The algorithm allows the selection of an optimal shape prior from among all possible shapes generated from an SSM by conducting a branch-and-bound search over an eigenshape space. The proposed algorithm does not require predefined shape templates or the construction of a hierarchical clustering tree before graph-cut segmentation. It jointly optimizes an objective functional in terms of both the shape prior and segmentation labeling, and finds an optimal solution by considering all possible shapes generated from an SSM. We apply the proposed algorithm to both pancreas and spleen segmentation using multiphase computed tomography volumes, and we compare the results obtained with those produced by a conventional algorithm employing a branch-and-bound search over a search tree of predefined shapes, which were sampled discretely from an SSM. The proposed algorithm significantly improves the segmentation performance in terms of the Jaccard index and Dice similarity index. In addition, we compare the results with the state-of-the-art multiple abdominal organs segmentation algorithm, and confirmed that the performances of both algorithms are comparable to each other. We discuss the high computational efficiency of the proposed algorithm, which was determined experimentally using a normalized number of traversed nodes in a search tree, and the extensibility of the proposed algorithm to other SSMs or energy functionals.
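
    The two overlap measures named in this abstract, the Jaccard index and the Dice similarity index, have standard definitions that can be sketched briefly. The example below is a generic illustration on sets of voxel coordinates, not the authors' evaluation code; the function names and the toy voxel sets are made up for the example.

```python
def jaccard_index(predicted, reference):
    """Jaccard index |A ∩ B| / |A ∪ B| of two voxel sets."""
    a, b = set(predicted), set(reference)
    union = a | b
    if not union:
        return 1.0  # convention: two empty segmentations agree perfectly
    return len(a & b) / len(union)


def dice_index(predicted, reference):
    """Dice similarity index 2|A ∩ B| / (|A| + |B|) of two voxel sets."""
    a, b = set(predicted), set(reference)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # same convention as above
    return 2 * len(a & b) / total


# Toy example: two 3-voxel segmentations sharing 2 voxels.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
reference = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
j = jaccard_index(predicted, reference)
d = dice_index(predicted, reference)
```

    The two measures are monotonically related by D = 2J / (1 + J), so they always rank segmentations in the same order; reporting both mainly eases comparison with other studies.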

  13. An overlapping set of genes is regulated by both NFIB and the glucocorticoid receptor during lung maturation

    PubMed Central

    2014-01-01

    Background Lung maturation is a late fetal developmental event in both mice and humans. Because of this, lung immaturity is a serious problem in premature infants. Disruption of genes for either the glucocorticoid receptor (Nr3c1) or the NFIB transcription factors results in perinatal lethality due to lung immaturity. In both knockouts, the phenotype includes excess cell proliferation, failure of saccularization and reduced expression of markers of epithelial differentiation. This similarity suggests that the two genes may co-regulate a specific set of genes essential for lung maturation. Results We analyzed the roles of these two transcription factors in regulating transcription using ChIP-seq data for NFIB, and RNA expression data and motif analysis for both. Our new ChIP-seq data for NFIB in lung at E16.5 shows that NFIB binds to a NFI motif. This motif is over-represented in the promoters of genes that are under-expressed in Nfib-KO mice at E18.5, suggesting an activator role for NFIB. Using available microarray data from Nr3c1-KO mice, we further identified 52 genes that are under-expressed in both Nfib and Nr3c1 knockouts, an overlap which is 13.1 times larger than what would be expected by chance. Finally, we looked for enrichment of 738 recently published transcription factor motifs in the promoters of these putative target genes and found that the NFIB and glucocorticoid receptor motifs were among the most enriched, suggesting that a subset of these genes may be directly activated by Nfib and Nr3c1. Conclusions Our data provide the first evidence for Nfib and Nr3c1 co-regulating genes related to lung maturation. They also establish that the in vivo DNA-binding specificity of NFIB is the same as previously seen in vitro, and highly similar to that of the other NFI-family members NFIA, NFIC and NFIX. PMID:24661679
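
    A statement such as "an overlap 13.1 times larger than expected by chance" is usually a ratio of the observed overlap to the overlap expected if the two gene lists were drawn independently from the same background. The sketch below shows that calculation together with a hypergeometric tail probability. It is a generic illustration, not the authors' procedure, and the counts used are hypothetical toy values, since the abstract does not report the list sizes or the background size.

```python
from math import comb


def expected_overlap(n_a, n_b, n_background):
    """Mean overlap of two independent random gene lists of sizes n_a and n_b."""
    return n_a * n_b / n_background


def fold_enrichment(observed, n_a, n_b, n_background):
    """Observed overlap divided by the overlap expected by chance."""
    return observed / expected_overlap(n_a, n_b, n_background)


def overlap_p_value(observed, n_a, n_b, n_background):
    """Hypergeometric probability of an overlap of at least `observed` genes."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, k) * comb(n_background - n_a, n_b - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(n_background, n_b)


# Hypothetical toy counts (NOT taken from the paper).
N_BACKGROUND, N_A, N_B, OBSERVED = 1000, 50, 40, 10
fold = fold_enrichment(OBSERVED, N_A, N_B, N_BACKGROUND)
p_value = overlap_p_value(OBSERVED, N_A, N_B, N_BACKGROUND)
```

    With these toy counts the expected overlap is 2 genes, so an observed overlap of 10 corresponds to a five-fold enrichment.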

  14. The Effects of Violation of Data Set Assumptions when Using the Oneway, Fixed Effects Analysis of Variance and the One Concomitant Analysis of Covariance Statistical Procedures.

    ERIC Educational Resources Information Center

    Johnson, Colleen Cook

    The purpose of this study is to help define the precise nature and limits of the tolerable range in which a researcher may be relatively confident about the statistical validity of his or her research findings, focusing specifically on the statistical validity of results when violating the assumptions associated with the one-way, fixed-effects…

  15. Characterisation of SNP haplotype structure in chemokine and chemokine receptor genes using CEPH pedigrees and statistical estimation.

    PubMed

    Clark, Vanessa J; Dean, Michael

    2004-03-01

    Chemokine signals and their cell-surface receptors are important modulators of HIV-1 disease and cancer. To aid future case/control association studies, the authors aim to further characterise the haplotype structure of variation in chemokine and chemokine receptor genes. To perform haplotype analysis in a population-based association study, haplotypes must be determined by estimation, in the absence of family information or laboratory methods to establish phase. Here, the authors test the accuracy of estimates of haplotype frequency and linkage disequilibrium by comparing estimated haplotypes generated with the expectation maximisation (EM) algorithm to haplotypes determined from Centre d'Etude Polymorphisme Humain (CEPH) pedigree data. To do this, they have characterised haplotypes comprising alleles at 11 biallelic loci in four chemokine receptor genes (CCR3, CCR2, CCR5 and CCRL2), which span 150 kb on chromosome 3p21, and haplotypes of nine biallelic loci in six chemokine genes [MCP-1(CCL2), Eotaxin(CCL11), RANTES(CCL5), MPIF-1(CCL23), PARC(CCL18) and MIP-1alpha(CCL3)] on chromosome 17q11-12. Forty multi-generation CEPH families, totalling 489 individuals, were genotyped by the TaqMan 5'-nuclease assay. Phased haplotypes and haplotypes estimated from unphased genotypes were compared in 103 grandparents who were assumed to have mated at random. For the 3p21 single nucleotide polymorphism (SNP) data, haplotypes determined by pedigree analysis and haplotypes generated by the EM algorithm were nearly identical. Linkage disequilibrium, measured by the D' statistic, was nearly maximal across the 150 kb region, with complete disequilibrium maintained at the extremes between CCR3-Y17Y and CCRL2-I243V. D'-values calculated from estimated haplotypes on 3p21 had high concordance with pairwise comparisons between pedigree-phased chromosomes. Conversely, there was less agreement between analyses of haplotype frequencies and linkage disequilibrium using estimated haplotypes when compared with

  16. BAT3 and SET1A Form a Complex with CTCFL/BORIS To Modulate H3K4 Histone Dimethylation and Gene Expression▿ †

    PubMed Central

    Nguyen, Phuongmai; Bar-Sela, Gil; Sun, Lunching; Bisht, Kheem S.; Cui, Hengmi; Kohn, Elise; Feinberg, Andrew P.; Gius, David

    2008-01-01

    Chromatin status is characterized in part by covalent posttranslational modifications of histones that regulate chromatin dynamics and direct gene expression. BORIS (brother of the regulator of imprinted sites) is an insulator DNA-binding protein that is thought to play a role in chromatin organization and gene expression. BORIS is a cancer-germ line gene; these are genes normally present in male germ cells (testis) that are also expressed in cancer cell lines as well as primary tumors. This work identifies SET1A, an H3K4 methyltransferase, and BAT3, a cochaperone recruiter, as binding partners for BORIS, and these proteins bind to the upstream promoter regions of two well-characterized procarcinogenic genes, Myc and BRCA1. RNA interference (RNAi) knockdown of BAT3, as well as SET1A, decreased Myc and BRCA1 gene expression but did not affect the binding properties of BORIS, but RNAi knockdown of BORIS prevented the assembly of BAT3 and SET1A at the Myc and BRCA1 promoters. Finally, chromatin analysis suggested that BORIS and BAT3 exert their effects on gene expression by recruiting proteins such as SET1A that are linked to changes in H3K4 dimethylation. Thus, we propose that BORIS acts as a platform upon which BAT3 and SET1A assemble and exert effects upon chromatin structure and gene expression. PMID:18765639

  17. Development of gene-based markers and construction of an integrated linkage map in eggplant by using Solanum orthologous (SOL) gene sets.

    PubMed

    Fukuoka, Hiroyuki; Miyatake, Koji; Nunome, Tsukasa; Negoro, Satomi; Shirasawa, Kenta; Isobe, Sachiko; Asamizu, Erika; Yamaguchi, Hirotaka; Ohyama, Akio

    2012-06-01

    We constructed an integrated DNA marker linkage map of eggplant (Solanum melongena L.) using DNA marker segregation data sets obtained from two independent intraspecific F(2) populations. The linkage map consisted of 12 linkage groups and encompassed 1,285.5 cM in total. We mapped 952 DNA markers, including 313 genomic SSR markers developed by random sequencing of simple sequence repeat (SSR)-enriched genomic libraries, and 623 single-nucleotide polymorphisms (SNP) and insertion/deletion polymorphisms (InDels) found in eggplant-expressed sequence tags (ESTs) and related genomic sequences [introns and untranslated regions (UTRs)]. Because of their co-dominant inheritance and their highly polymorphic and multi-allelic nature, the SSR markers may be more versatile than the SNP and InDel markers for map-based genetic analysis of any traits of interest using segregating populations derived from any intraspecific crosses of practical breeding materials. However, we found that the distribution of microsatellites in the genome was biased to some extent, and therefore a considerable part of the eggplant genome was first detected when gene-derived SNP and InDel markers were mapped. Of the 623 SNP and InDel markers mapped onto the eggplant integrated map, 469 were derived from eggplant unigenes contained within Solanum orthologous (SOL) gene sets (i.e., sets of orthologous unigenes from eggplant, tomato, and potato). Out of the 469 markers, 326 could also be mapped onto the tomato map. These common markers will be informative landmarks for the transfer of tomato's more saturated genomic information to eggplant and will also provide comparative information on the genome organization of the two solanaceous species. The data are available from the DNA marker database of vegetables, VegMarks (http://vegmarks.nivot.affrc.go.jp).

  18. HPV-16 E1, E2 and E6 each complement the Ad5 helper gene set, increasing rAAV2 and wt AAV2 production.

    PubMed

    Cao, M; Zhu, H; Bandyopadhyay, S; You, H; Hermonat, P L

    2012-04-01

    Adeno-associated virus type 2 (AAV) is a popular vector for human gene therapy, because of its safety record and ability to express genes long term. Yet large-scale recombinant (r) AAV production remains problematic because of low particle yield. The adenovirus (Ad) and herpes (simplex) virus helper genes for AAV have been widely used and studied, but the helper genes of human papillomavirus (HPV) have not. HPV-16 E1, E2 and E6 help wild-type (wt) AAV productive infection in differentiating keratinocytes, however, HEK293 cells are the standard cell line used for generating rAAV. Here we demonstrate that the three HPV genes were unable to stimulate significant rAAV replication in HEK293 cells when used alone. However, when used in conjunction (complementation) with the standard Ad5 helper gene set, E1, E2 and E6 were each capable of significantly boosting rAAV DNA replication and virus particle yield. Moreover, wt AAV DNA replication and virion yield were also significantly boosted by each HPV gene along with wt Ad5 virus co-infection. Mild-to-moderate changes in rep- and cap-encoded protein levels were evident in the presence of the E1, E2 and E6 genes. Higher wt AAV DNA replication was not matched by similar increases in the levels of rep-encoded protein. Moreover, although rep mRNA was upregulated, cap mRNA was upregulated more. Higher virus yields did correlate most consistently with increased Rep52-, VP3- and VP-related 21/31 kDa species. The observed boost in wt and rAAV production by HPV genes was not unexpected, as the Ad and HPV helper gene sets do not seem to recapitulate each other. These results raise the possibility of generating improved helper gene sets derived from both the Ad and HPV helper gene sets.

  19. HPV-16 E1, E2 and E6 each complement the Ad5 helper gene set, increasing rAAV2 and wt AAV2 production

    PubMed Central

    Cao, M.; Zhu, H.; Bandyopadhyay, S; You, H; Hermonat, P.L.

    2011-01-01

    Adeno-associated virus type 2 (AAV) is a popular vector for human gene therapy, because of its safety record and ability to express genes long term. Yet large scale recombinant (r)AAV production remains problematic due to low particle yield. The adenovirus (Ad) and herpes (simplex) virus (HSV) helper genes for AAV have been widely used and studied, but the helper genes of human papillomavirus (HPV) have not. HPV-16 E1, E2 and E6 help wild type (wt) AAV productive infection in differentiating keratinocytes, however HEK293 cells are the standard cell line used for generating rAAV. Here we demonstrate that the three HPV genes were unable to stimulate significant rAAV replication in HEK293 cells when used alone. However, when used in conjunction (complementation) with the standard Ad5 helper gene set, E1, E2 and E6 were each capable of significantly boosting rAAV DNA replication and virus particle yield. Moreover, wt AAV DNA replication and virion yield were also significantly boosted by each HPV gene along with wt Ad5 virus co-infection. Mild to moderate changes in rep- and cap–encoded protein levels were evident in the presence of the E1, E2 and E6 genes. Higher wt AAV DNA replication was not matched by similar increases in the levels of rep-encoded protein. Moreover, while rep mRNA was up-regulated, cap mRNA was up-regulated more. Higher virus yields did correlate most consistently with increased Rep52, VP3 and VP-related 21/31 kDa species. The observed boost in wt and rAAV production by HPV genes was not unexpected, as the Ad and HPV helper gene sets do not seem to recapitulate each other. These results raise the possibility of generating improved helper gene sets derived from both the Ad and HPV helper gene sets. PMID:21850053

  20. Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and Gene-Set Based Genetic Algorithm

    PubMed Central

    Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui

    2016-01-01

    Phellinus is a kind of fungus known as one of the basic components of drugs for preventing cancer. To find optimized culture conditions for Phellinus production in the laboratory, many single-factor experiments were carried out, generating a large volume of experimental data. In this work, we use the data collected from these experiments for regression analysis and obtain a mathematical model for predicting Phellinus production. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of the parameters involved in the culture conditions, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. The optimized parameter values agree with biological experimental results, which indicates that our method has good predictive ability for culture-condition optimization. PMID:27610365
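
    The general pattern described here, fitting a regression model to experimental data and then searching its parameter space with a genetic algorithm, can be sketched as follows. This is a plain real-valued genetic algorithm with elitism, swapped in for illustration; it is not the paper's gene-set based variant, whose details the abstract does not give. The quadratic `predicted_yield` function, the two parameters, and their bounds are hypothetical stand-ins for a fitted model.

```python
import random

# Hypothetical parameter bounds (not taken from the paper).
BOUNDS = {"temperature": (20.0, 35.0), "ph": (4.0, 8.0)}


def predicted_yield(params):
    """Stand-in for a fitted regression model: a concave quadratic whose
    maximum lies at temperature = 28, pH = 6 (illustrative values only)."""
    t, ph = params["temperature"], params["ph"]
    return 10.0 - 0.05 * (t - 28.0) ** 2 - 0.8 * (ph - 6.0) ** 2


def random_individual(rng):
    """Draw each parameter uniformly within its bounds."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is copied from one of the parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(individual, rng, rate=0.2, scale=0.1):
    """Gaussian mutation, clipped so that values stay within bounds."""
    out = dict(individual)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            step = rng.gauss(0.0, scale * (hi - lo))
            out[k] = min(hi, max(lo, out[k] + step))
    return out


def optimize(generations=60, pop_size=40, seed=0):
    """Run the genetic algorithm and return the best parameter set found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=predicted_yield, reverse=True)
        elite = population[: pop_size // 4]  # best quarter survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=predicted_yield)


best = optimize()
```

    Because the best individuals are carried over unchanged, the best predicted yield never decreases from one generation to the next. In practice the stand-in function would be replaced by the regression model fitted to the experimental data, and the bounds by the experimentally feasible ranges.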

  1. Gene Sets for Utilization of Primary and Secondary Nutrition Supplies in the Distal Gut of Endangered Iberian Lynx

    PubMed Central

    Alcaide, María; Messina, Enzo; Richter, Michael; Bargiela, Rafael; Peplies, Jörg; Huws, Sharon A.; Newbold, Charles J.; Golyshin, Peter N.; Simón, Miguel A.; López, Guillermo; Yakimov, Michail M.; Ferrer, Manuel

    2012-01-01

    Recent studies have indicated the existence of an extensive trans-genomic trans-mural co-metabolism between gut microbes and animal hosts that is diet-, host phylogeny- and provenance-influenced. Here, we analyzed the biodiversity at the level of small subunit rRNA gene sequence and the metabolic composition of 18 Mbp of consensus metagenome sequences and activity characteristics of bacterial intra-cellular extracts, in wild Iberian lynx (Lynx pardinus) fecal samples. Bacterial signatures (14.43% of all of the Firmicutes reads and 6.36% of total reads) related to the uncultured anaerobic commensals Anaeroplasma spp., which are typically found in ovine and bovine rumen, were first identified. The lynx gut was further characterized by an over-representation of ‘presumptive’ aquaporin aqpZ genes and genes encoding ‘active’ lysosomal-like digestive enzymes that are possibly needed to acquire glycerol, sugars and amino acids from glycoproteins, glyco(amino)lipids, glyco(amino)glycans and nucleoside diphosphate sugars. Lynx gut was highly enriched (28% of the total glycosidases) in genes encoding α-amylase and related enzymes, although it exhibited low rate of enzymatic activity indicative of starch degradation. The preponderance of β-xylosidase activity in protein extracts further suggests lynx gut microbes being most active for the metabolism of β-xylose containing plant N-glycans, although β-xylosidases sequences constituted only 1.5% of total glycosidases. These collective and unique bacterial, genetic and enzymatic activity signatures suggest that the wild lynx gut microbiota not only harbors gene sets underpinning sugar uptake from primary animal tissues (with the monotypic dietary profile of the wild lynx consisting of 80–100% wild rabbits) but also for the hydrolysis of prey-derived plant biomass. Although, the present investigation corresponds to a single sample and some of the statements should be considered qualitative, the data most likely

  2. Gene sets for utilization of primary and secondary nutrition supplies in the distal gut of endangered Iberian lynx.

    PubMed

    Alcaide, María; Messina, Enzo; Richter, Michael; Bargiela, Rafael; Peplies, Jörg; Huws, Sharon A; Newbold, Charles J; Golyshin, Peter N; Simón, Miguel A; López, Guillermo; Yakimov, Michail M; Ferrer, Manuel

    2012-01-01

    Recent studies have indicated the existence of an extensive trans-genomic trans-mural co-metabolism between gut microbes and animal hosts that is diet-, host phylogeny- and provenance-influenced. Here, we analyzed the biodiversity at the level of small subunit rRNA gene sequence and the metabolic composition of 18 Mbp of consensus metagenome sequences and activity characteristics of bacterial intra-cellular extracts, in wild Iberian lynx (Lynx pardinus) fecal samples. Bacterial signatures (14.43% of all of the Firmicutes reads and 6.36% of total reads) related to the uncultured anaerobic commensals Anaeroplasma spp., which are typically found in ovine and bovine rumen, were first identified. The lynx gut was further characterized by an over-representation of 'presumptive' aquaporin aqpZ genes and genes encoding 'active' lysosomal-like digestive enzymes that are possibly needed to acquire glycerol, sugars and amino acids from glycoproteins, glyco(amino)lipids, glyco(amino)glycans and nucleoside diphosphate sugars. Lynx gut was highly enriched (28% of the total glycosidases) in genes encoding α-amylase and related enzymes, although it exhibited low rate of enzymatic activity indicative of starch degradation. The preponderance of β-xylosidase activity in protein extracts further suggests lynx gut microbes being most active for the metabolism of β-xylose containing plant N-glycans, although β-xylosidases sequences constituted only 1.5% of total glycosidases. These collective and unique bacterial, genetic and enzymatic activity signatures suggest that the wild lynx gut microbiota not only harbors gene sets underpinning sugar uptake from primary animal tissues (with the monotypic dietary profile of the wild lynx consisting of 80-100% wild rabbits) but also for the hydrolysis of prey-derived plant biomass. Although, the present investigation corresponds to a single sample and some of the statements should be considered qualitative, the data most likely suggests a

  3. The Effects of Single and Compound Violations of Data Set Assumptions when Using the Oneway, Fixed Effects Analysis of Variance and the One Concomitant Analysis of Covariance Statistical Models.

    ERIC Educational Resources Information Center

    Johnson, Colleen Cook

    This study integrates into one comprehensive Monte Carlo simulation a vast array of previously defined and substantively interrelated research studies of the robustness of analysis of variance (ANOVA) and analysis of covariance (ANCOVA) statistical procedures. Three sets of balanced ANOVA and ANCOVA designs (group sizes of 15, 30, and 45) and one…

  4. Analysis of protein gene products in cells with altered chromosome sets for the purpose of genetic mapping

    SciTech Connect

    Shishkin, S.S.; Zakharov, S.F.; Gromov, P.S.; Shcheglova, M.V.; Kukharenko, V.I.; Shilov, A.G.; Matveeva, N.M.; Zhdanova, N.S.; Efimochkin, A.S.; Krokhina, T.B.

    1994-12-01

    Two-dimensional electrophoresis was used for analyzing proteins in hybrid cells that contained single human chromosomes (chromosome 5, chromosome 21, or chromosomes 5 and 21) against the background of the mouse genome. By comparing the protein patterns of hybrid and parent cells (about 1000 protein fractions for each kind of cell), five fractions among proteins of hybrid cells were supposedly identified as human proteins. The genes of two of them are probably located on chromosome 5, and those of the other three on chromosome 21. Moreover, analysis of proteins in fibroblasts of patients with the cri-du-chat syndrome (5p-) revealed a decrease in the content of two proteins as compared with those in preparations of diploid fibroblasts. This fact was regarded as evidence that two corresponding genes are located on the short arm of chromosome 5. Methodological problems associated with the use of protein pattern analysis in cells with altered chromosome sets for the purposes of genetic mapping are discussed.

  5. The STATFLUX code: a statistical method for calculation of flow and set of parameters, based on the Multiple-Compartment Biokinetical Model

    NASA Astrophysics Data System (ADS)

    Garcia, F.; Mesa, J.; Arruda-Neto, J. D. T.; Helene, O.; Vanin, V.; Milian, F.; Deppman, A.; Rodrigues, T. E.; Rodriguez, O.

    2007-03-01

    The code STATFLUX, implementing a new and simple statistical procedure for the calculation of transfer coefficients in radionuclide transport to animals and plants, is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Flow parameters were estimated by employing two different least-squares procedures: Derivative and Gauss-Marquardt methods, with the available experimental data of radionuclide concentrations as the input functions of time. The solution of the inverse problem, which relates a given set of flow parameters with the time evolution of concentration functions, is achieved via a Monte Carlo simulation procedure.

    Program summary
    Title of program: STATFLUX
    Catalogue identifier: ADYS_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYS_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Licensing provisions: none
    Computer for which the program is designed and others on which it has been tested: Micro-computer with Intel Pentium III, 3.0 GHz
    Installation: Laboratory of Linear Accelerator, Department of Experimental Physics, University of São Paulo, Brazil
    Operating system: Windows 2000 and Windows XP
    Programming language used: Fortran-77 as implemented in Microsoft Fortran 4.0. NOTE: Microsoft Fortran includes non-standard features which are used in this program. Standard Fortran compilers such as g77, f77, ifort and NAG95 are not able to compile the code and therefore it has not been possible for the CPC Program Library to test the program.
    Memory required to execute with typical data: 8 Mbytes of RAM memory and 100 MB of hard disk memory
    No. of bits in a word: 16
    No. of lines in distributed program, including test data, etc.: 6912
    No. of bytes in distributed program, including test data, etc.: 229 541
    Distribution format: tar.gz
    Nature of the physical problem: The investigation of transport mechanisms for

  6. A Set of miRNAs, Their Gene and Protein Targets and Stromal Genes Distinguish Early from Late Onset ER Positive Breast Cancer

    PubMed Central

    Bastos, E. P.; Brentani, H.; Pereira, C. A. B.; Polpo, A.; Lima, L.; Puga, R. D.; Pasini, F. S.; Osorio, C. A. B. T.; Roela, R. A.; Achatz, M. I.; Trapé, A. P.; Gonzalez-Angulo, A. M.; Brentani, M. M.

    2016-01-01

    Breast cancer (BC) in young adult patients (YA) has a more aggressive biological behavior and is associated with a worse prognosis than BC arising in middle aged patients (MA). We proposed that differentially expressed miRNAs could regulate genes and proteins underlying aggressive phenotypes of breast tumors in YA patients when compared to those arising in MA patients. Objective: Using integrated expression analyses of miRs, their mRNA and protein targets and stromal gene expression, we aimed to identify differentially expressed profiles between tumors from YA-BC and MA-BC. Methodology and Results: Samples of ER+ invasive ductal breast carcinomas, divided into two groups: YA-BC (35 years or less) or MA-BC (50–65 years) were evaluated. Screening for BRCA1/2 status according to the BOADICEA program indicated low risk of patients being carriers of these mutations. Aggressive characteristics were more evident in YA-BC versus MA-BC. Performing qPCR, we identified eight miRs differentially expressed (miR-9, 18b, 33b, 106a, 106b, 210, 518a-3p and miR-372) between YA-BC and MA-BC tumors with high confidence statement, which were associated with aggressive clinicopathological characteristics. The expression profiles by microarray identified 602 predicted target genes associated to proliferation, cell cycle and development biological functions. Performing RPPA, 24 target proteins differed between both groups and 21 were interconnected within a network protein-protein interactions associated with proliferation, development and metabolism pathways over represented in YA-BC. Combination of eight mRNA targets or the combination of eight target proteins defined indicators able to classify individual samples into YA-BC or MA-BC groups. Fibroblast-enriched stroma expression profile analysis resulted in 308 stromal genes differentially expressed between YA-BC and MA-BC. Conclusion: We defined a set of differentially expressed miRNAs, their mRNAs and protein targets and stromal

  7. Expressed genes for plant-type ribulose 1,5-bisphosphate carboxylase/oxygenase in the photosynthetic bacterium Chromatium vinosum, which possesses two complete sets of the genes.

    PubMed Central

    Viale, A M; Kobayashi, H; Akazawa, T

    1989-01-01

    Two sets of genes for the large and small subunits of ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCO) were detected in the photosynthetic purple sulfur bacterium Chromatium vinosum by hybridization analysis with RuBisCO gene probes, cloned by using the lambda Fix vector, and designated rbcL-rbcS and rbcA-rbcB. rbcL and rbcA encode the large subunits, and rbcS and rbcB encode the small subunits. rbcL-rbcS was the same as that reported previously (A. M. Viale, H. Kobayashi, T. Takabe, and T. Akazawa, FEBS Lett. 192:283-288, 1985). A DNA fragment bearing rbcA-rbcB was subcloned in plasmid vectors and sequenced. We found that rbcB was located 177 base pairs downstream of the rbcA coding region, and both genes were preceded by plausible procaryotic ribosome-binding sites. rbcA and rbcB encoded polypeptides of 472 and 118 amino acids, respectively. Edman degradation analysis of the subunits of RuBisCO isolated from C. vinosum showed that rbcA-rbcB encoded the enzyme present in this bacterium. The large- and small-subunit polypeptides were posttranslationally processed to remove 2 and 1 amino acid residues from their N-termini, respectively. Among hetero-oligomeric RuBisCOs, the C. vinosum large subunit exhibited higher homology to that from cyanobacteria, eucaryotic algae, and higher plants (71.6 to 74.2%) than to that from the chemolithotrophic bacterium Alcaligenes eutrophus (56.6%). A similar situation has been observed for the C. vinosum small subunit, although the homology among small subunits from different organisms was lower than that among the large subunits. PMID:2708310

  8. Comparative genomic analysis of Brucella abortus vaccine strain 104M reveals a set of candidate genes associated with its virulence attenuation.

    PubMed

    Yu, Dong; Hui, Yiming; Zai, Xiaodong; Xu, Junjie; Liang, Long; Wang, Bingxiang; Yue, Junjie; Li, Shanhu

    2015-01-01

    The Brucella abortus strain 104M, a spontaneously attenuated strain, has been used as a vaccine strain in humans against brucellosis for 6 decades in China. Despite many studies, the molecular mechanisms that cause the attenuation are still unclear. Here, we determined the whole-genome sequence of 104M and conducted a comprehensive comparative analysis against the whole genome sequences of the virulent strain, A13334, and other reference strains. This analysis revealed a highly similar genome structure between 104M and A13334. The further comparative genomic analysis between 104M and A13334 revealed a set of genes missing in 104M. Some of these genes were identified to be directly or indirectly associated with virulence. Similarly, a set of mutations in the virulence-related genes was also identified, which may be related to virulence alteration. This study provides a set of candidate genes associated with virulence attenuation in B.abortus vaccine strain 104M.

  9. A Research Methodology for Future Summative Evaluation Studies: Incorporating the Component of Multiple Sets of Matched Samples into the Statistical Control Modeling

    ERIC Educational Resources Information Center

    Li, Yuan H.; Modarresi, Shahpar; Yang, Yu N.

    2006-01-01

    Summative evaluations have often been undertaken to determine the impact of educational programs on student academic achievement employing a quasi-experimental design. The summative finding is expected to be less misleading if a statistical model is performed on a dataset including a sound matched sample as a control group. This is because an…

  10. Validation of the Lung Subtyping Panel in Multiple Fresh-Frozen and Formalin-Fixed, Paraffin-Embedded Lung Tumor Gene Expression Data Sets.

    PubMed

    Faruki, Hawazin; Mayhew, Gregory M; Fan, Cheng; Wilkerson, Matthew D; Parker, Scott; Kam-Morgan, Lauren; Eisenberg, Marcia; Horten, Bruce; Hayes, D Neil; Perou, Charles M; Lai-Goldman, Myla

    2016-06-01

    Context: A histologic classification of lung cancer subtypes is essential in guiding therapeutic management. Objective: To complement morphology-based classification of lung tumors, a previously developed lung subtyping panel (LSP) of 57 genes was tested using multiple public fresh-frozen gene-expression data sets and a prospectively collected set of formalin-fixed, paraffin-embedded lung tumor samples. Design: The LSP gene-expression signature was evaluated in multiple lung cancer gene-expression data sets totaling 2177 patients collected from 4 platforms: Illumina RNAseq (San Diego, California), Agilent (Santa Clara, California) and Affymetrix (Santa Clara) microarrays, and quantitative reverse transcription-polymerase chain reaction. Gene centroids were calculated for each of 3 genomic-defined subtypes: adenocarcinoma, squamous cell carcinoma, and neuroendocrine, the latter of which encompassed both small cell carcinoma and carcinoid. Classification by LSP into 3 subtypes was evaluated in both fresh-frozen and formalin-fixed, paraffin-embedded tumor samples, and agreement with the original morphology-based diagnosis was determined. Results: The LSP-based classifications demonstrated overall agreement with the original clinical diagnosis ranging from 78% (251 of 322) to 91% (492 of 538 and 869 of 951) in the fresh-frozen public data sets and 84% (65 of 77) in the formalin-fixed, paraffin-embedded data set. The LSP performance was independent of tissue-preservation method and gene-expression platform. Secondary, blinded pathology review of formalin-fixed, paraffin-embedded samples demonstrated concordance of 82% (63 of 77) with the original morphology diagnosis. Conclusions: The LSP gene-expression signature is a reproducible and objective method for classifying lung tumors and demonstrates good concordance with morphology-based classification across multiple data sets. The LSP panel can supplement morphologic assessment of lung cancers, particularly
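
    The centroid-based classification described in this record can be illustrated with a minimal sketch. The abstract states only that gene centroids were calculated per subtype; assigning a sample to the centroid with the highest Pearson correlation is an assumption made here for illustration, and the four-gene expression values are hypothetical toy data, not LSP data. The agreement percentages at the end are recomputed from the counts reported in the abstract.

```python
from math import sqrt
from statistics import mean


def centroid(samples):
    """Per-gene mean expression across the training samples of one subtype."""
    return [mean(values) for values in zip(*samples)]


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify(sample, centroids):
    """Assumed rule: pick the subtype whose centroid correlates best."""
    return max(centroids, key=lambda name: pearson(sample, centroids[name]))


# Hypothetical toy training data: 2 samples per subtype, 4 genes each.
training = {
    "adenocarcinoma": [[5.0, 1.0, 1.0, 2.0], [6.0, 2.0, 1.0, 1.0]],
    "squamous cell carcinoma": [[1.0, 5.0, 2.0, 1.0], [2.0, 6.0, 1.0, 1.0]],
    "neuroendocrine": [[1.0, 1.0, 6.0, 5.0], [2.0, 1.0, 5.0, 6.0]],
}
centroids = {name: centroid(rows) for name, rows in training.items()}
predicted = classify([5.5, 1.2, 0.9, 1.4], centroids)


def agreement_percent(matches, total):
    """Agreement with the original diagnosis, rounded to a whole percent."""
    return round(100 * matches / total)


# Counts taken from the abstract above.
reported = [(251, 322), (492, 538), (869, 951), (65, 77), (63, 77)]
percents = [agreement_percent(m, n) for m, n in reported]
```

    Under these assumptions the toy sample is assigned to the subtype whose centroid it tracks most closely, and the recomputed percentages match the rounded figures quoted in the abstract.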

  11. The complex set of late transcripts from the Drosophila sex determination gene sex-lethal encodes multiple related polypeptides.

    PubMed Central

    Samuels, M E; Schedl, P; Cline, T W

    1991-01-01

    Sex-lethal (Sxl), a key sex determination gene in Drosophila melanogaster, is known to express a set of three early transcripts arising during early embryogenesis and a set of seven late transcripts occurring from midembryogenesis through adulthood. Among the late transcripts, male-specific mRNAs were distinguished from their female counterparts by the presence of an extra exon interrupting an otherwise long open reading frame (ORF). We have now analyzed the structures of the late Sxl transcripts by cDNA sequencing, Northern (RNA) blotting, primer extension, and RNase protection. The late transcripts appear to use a common 5' end but differ at their 3' ends by the use of alternative polyadenylation sites. Two of these sites lack canonical AATAAA sequences, and their use correlates in females with the presence of a functional germ line, suggesting possible tissue-specific polyadenylation. Besides the presence of the male-specific exon, no additional sex-specific splicing events were detected, although a number of non-sex-specific splicing variants were observed. In females, the various forms of late Sxl transcript potentially encode up to six slightly different polypeptides. All of the protein-coding differences occur outside the previously defined ribonucleoprotein motifs. One class of Sxl mRNAs also includes a second long ORF in the same frame as the first ORF but separated from it by a single ochre codon. The function of this second ORF is unknown. Significant amounts of apparently partially processed Sxl RNAs were observed, consistent with the hypothesis that the regulated Sxl splices occur relatively slowly. PMID:1710769

  12. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits

    PubMed Central

    Bakshi, Andrew; Zhu, Zhihong; Vinkhuyzen, Anna A. E.; Hill, W. David; McRae, Allan F.; Visscher, Peter M.; Yang, Jian

    2016-01-01

    We propose a method (fastBAT) that performs a fast set-based association analysis for human complex traits using summary-level data from genome-wide association studies (GWAS) and linkage disequilibrium (LD) data from a reference sample with individual-level genotypes. We demonstrate using simulations and analyses of real datasets that fastBAT is more accurate and orders of magnitude faster than the prevailing methods. Using fastBAT, we analyze summary data from the latest meta-analyses of GWAS on 150,064–339,224 individuals for height, body mass index (BMI), and schizophrenia. We identify 6 novel gene loci for height, 2 for BMI, and 3 for schizophrenia at P_fastBAT < 5 × 10^-8. The gain of power is due to multiple small independent association signals at these loci (e.g. the THRB and FOXP1 loci for schizophrenia). The method is general and can be applied to GWAS data for all complex traits and diseases in humans and to such data in other species. PMID:27604177
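
    The idea of a set-based test built from summary statistics and an LD reference can be sketched as follows. This is not the fastBAT algorithm, whose details the abstract does not give. It is a generic illustration that assumes the set statistic is the sum of squared per-SNP z-scores and that its null distribution is obtained by simulating correlated z-scores from the LD correlation matrix. The LD matrix, z-scores, and function names are hypothetical.

```python
import random
from math import sqrt


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def set_pvalue(z_scores, ld, n_sim=20000, seed=1):
    """Empirical p-value for the sum of squared z-scores of a SNP set.

    Null z-scores are drawn as L @ e, with e standard normal and L the
    Cholesky factor of the LD matrix, so they carry the LD correlation.
    """
    rng = random.Random(seed)
    lower = cholesky(ld)
    m = len(z_scores)
    observed = sum(z * z for z in z_scores)
    exceed = 0
    for _ in range(n_sim):
        noise = [rng.gauss(0.0, 1.0) for _ in range(m)]
        null_z = [sum(lower[i][k] * noise[k] for k in range(i + 1))
                  for i in range(m)]
        if sum(v * v for v in null_z) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical LD correlation matrix for three SNPs in one gene.
ld = [[1.0, 0.5, 0.2],
      [0.5, 1.0, 0.5],
      [0.2, 0.5, 1.0]]
p_strong = set_pvalue([4.0, 4.0, 4.0], ld)   # strongly associated set
p_weak = set_pvalue([0.1, 0.1, 0.1], ld)     # essentially null set
```

    A simulation of this kind cannot resolve p-values below 1/(n_sim + 1), so reaching a threshold such as 5 × 10^-8 would need either an impractical number of draws or an analytic approximation to the null distribution.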

  13. Sex determination of early medieval individuals through nested PCR using a new primer set in the SRY gene.

    PubMed

    Luptáková, Lenka; Bábelová, Andrea; Omelka, Radoslav; Kolena, Branislav; Vondráková, Mária; Bauerová, Mária

    2011-04-15

    One of the first questions asked about excavated human skeletal remains is the sex. As the morphological sex determination is complicated in cases involving fragmentary bones and in skeletons from infants and children, the development of DNA-based techniques has led to improvements in sex determination. This study is focused on sex determination from ancient DNA obtained from 25 skeletons found in medieval burials in western Slovakia. We performed separate amplifications of DXZ4 repetitive satellite sequences on the X chromosome, and the SRY gene (testis-determining factor) on the Y chromosome, using nested PCR. Our results showed that DXZ4 was amplified in the case of 23 individuals. With newly designed internal and external primer sets for SRY detection with internal PCR products in lengths of 102 bp and 85 bp we succeeded in detecting the SRY locus in 9 samples. Finally, the gender was determined in 23 individuals (14 females and 9 males). In 20 samples, the gender was determined by morphological and molecular methods. Sex determination of 17 samples using nested PCR matched the morphological one, providing evidence of the authenticity and ancient origin of the PCR amplifications. The DXZ4/SRY nested PCR method represents a useful technique in sex determination of medieval human remains and it is a critical addition to anthropological studies.
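
    The reported counts are consistent with a simple two-marker decision rule, sketched below. The rule itself is an assumption inferred from the numbers in the abstract (DXZ4 serving as an amplification control and SRY as the male marker). The per-sample breakdown is a reconstruction that assumes all 9 SRY-positive samples were among the 23 DXZ4-positive ones. The function name is hypothetical.

```python
from collections import Counter


def call_sex(dxz4_amplified, sry_detected):
    """Assumed rule: no X-chromosome signal means no call is made."""
    if not dxz4_amplified:
        return "undetermined"
    return "male" if sry_detected else "female"


# Reconstructed from the abstract's totals (assumption, see lead-in):
# 9 samples with both markers, 14 with DXZ4 only, 2 with neither.
samples = ([(True, True)] * 9
           + [(True, False)] * 14
           + [(False, False)] * 2)
counts = Counter(call_sex(dxz4, sry) for dxz4, sry in samples)
```

    Requiring the X-linked signal before making a call guards against reading a failed amplification as a female result.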

  14. Multiple phytohormones promote root hair elongation by regulating a similar set of genes in the root epidermis in Arabidopsis

    PubMed Central

    Zhang, Shan; Huang, Linli; Yan, An; Liu, Yihua; Liu, Bohan; Yu, Chunyan; Zhang, Aidong; Schiefelbein, John; Gan, Yinbo

    2016-01-01

    Multiple phytohormones, including auxin, ethylene, and cytokinin, play vital roles in regulating cell development in the root epidermis. However, their interactions in specific root hair cell developmental stages are largely unexplored. To bridge this gap, we employed genetic and pharmacological approaches as well as transcriptional analysis in order to dissect their distinct and overlapping roles in root hair initiation and elongation in Arabidopsis thaliana. Our results show that among auxin, ethylene, and cytokinin, only ethylene induces ectopic root hair cells in wild-type plants, implying a special role of ethylene in the hair initiation stage. In the subsequent elongation stage, however, auxin, ethylene, and cytokinin enhance root hair tip growth equally. Our data also suggest that the effect of cytokinin is independent from auxin and ethylene in this process. Exogenous cytokinin restores root hair elongation when the auxin and ethylene signal is defective, whereas auxin and ethylene also sustain elongation in the absence of the cytokinin signal. Notably, transcriptional analyses demonstrated that auxin, ethylene, and cytokinin regulate a similar set of root hair-specific genes. Together these analyses provide important clues regarding the mechanism of hormonal interactions and regulation in the formation of single-cell structures. PMID:27799284

  15. A MultiSite Gateway™ vector set for the functional analysis of genes in the model Saccharomyces cerevisiae

    PubMed Central

    2012-01-01

    Background: Recombinatorial cloning using the Gateway™ technology has been the method of choice for high-throughput omics projects, resulting in the availability of entire ORFeomes in Gateway™-compatible vectors. The MultiSite Gateway™ system allows combining multiple genetic fragments such as promoter, ORF and epitope tag in one single reaction. To date, this technology has not been accessible in the yeast Saccharomyces cerevisiae, one of the most widely used experimental systems in molecular biology, due to the lack of appropriate destination vectors. Results: Here, we present a set of three-fragment MultiSite Gateway™ destination vectors that have been developed for gene expression in S. cerevisiae and that allow the assembly of any promoter, open reading frame, epitope tag arrangement in combination with any of four auxotrophic markers and three distinct replication mechanisms. As an example of its applicability, we used yeast three-hybrid to provide evidence for the assembly of a ternary complex of plant proteins involved in jasmonate signalling and consisting of the JAZ, NINJA and TOPLESS proteins. Conclusion: Our vectors make MultiSite Gateway™ cloning accessible in S. cerevisiae and implement a fast and versatile cloning method for the high-throughput functional analysis of (heterologous) proteins in one of the most widely used model organisms for molecular biology research. PMID:22994806

  16. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    PubMed

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  17. Intron loss in interferon genes follows a distinct set of stages, and may confer an evolutionary advantage.

    PubMed

    Krause, Christopher D

    2016-07-01

    The promoter-intron-exon structure of genes evolves. While the structures of some IFN genes (e.g., piscine and amphibian Type I IFNs, most tetrapod IFN-λ genes) resemble those of other class II cytokines (e.g., interleukins-10, 19, 20, 22, 24, 26), the structures of other IFN genes differ significantly. Although all bony vertebrate IFN-γ genes lack the canonical third intron, and all amniote Type I IFN genes lack introns, only some IFN-λ genes lost their introns. Interestingly, these intronless IFN-λ genes are not preferentially related to one another nor are they clustered with canonical multi-intron IFN-λ genes. Hypothesizing that intronless IFN-λ genes repeatedly and independently evolved and transposed throughout the genome, we sought to understand the genetic processes involved in their intron loss and genomic migration. Utilizing the high conservation of the promoters, the UTRs and the ORFs of the IFN-λ genes, we collected data from two families of intronless IFN-λ genes, and developed a model supported by these data to explain how intronless IFN-λ genes evolved. (1) A cytoplasmic IFN-λ cDNA generated by reverse transcriptional activity enters the nucleus and attempts to recombine with its multi-exon progenitor. (2) Nuclear DNA synthesis at the 5' and 3' ends within recombination intermediates affixes the promoter onto the cDNA and preserves its 3' UTR. (3) Resolution of the recombination complex releases the promoter-associated cDNA. (4) The released intronless gene co-integrates with a highly duplicated sequence undergoing transposition. We propose that this process explains not only the evolution of the gene structure of IFN genes, but also the increased transposition of intronless genes in genomes, and may confer an evolutionary advantage.

  18. The statistics of natural ELF/VLF waves derived from a long continuous set of ground-based observations at high latitude

    NASA Astrophysics Data System (ADS)

    Smith, A. J.; Horne, R. B.; Meredith, N. P.

    2010-04-01

    This paper analyses a unique set of continuous high-quality well-calibrated observations of natural ELF/VLF radio waves, in the range 0.3-10 kHz, made at Halley Research Station, Antarctica (76°S, 27°W, L=4.5) over one and a half solar cycles (1992-2007). Reference is also made to similar but shorter data sets obtained from other Antarctic stations. The observed waves vary over a very wide dynamic range, from the receiver noise level of wrt (at 1 kHz) up to 40-50 dB above it, over a wide range of timescales. However, the long continuous data set allows us to average out the random and aperiodic variations to extract the underlying dependence of the wave characteristics on local time, time of year, solar cycle, etc. Below about 5 kHz the received waves are predominantly whistler-mode waves, notably chorus, which are generated in the magnetosphere and propagate on geomagnetic field-aligned ("ducted") paths to low altitudes. At the top of the frequency range the observed waves are mostly atmospherics from tropical lightning. The spectrum, and dependence on local time and season, are discussed in terms of a source function and a propagation function from the source region through the ionosphere (in the case of the magnetospheric waves) and under the ionosphere. The dependence of the waves on latitude, geomagnetic activity, solar cycle and day of the week are also described.

  19. Risks of spontaneously and IVF-conceived singleton and twin pregnancies differ, requiring reassessment of statistical premises favoring elective single embryo transfer (eSET).

    PubMed

    Gleicher, Norbert; Kushnir, Vitally A; Barad, David H

    2016-05-03

    A published review of the literature by Dutch investigators in 2004 suggested significant outcome differences between spontaneously - and in vitro fertilization (IVF) - conceived singleton and twin pregnancies. Here we review whether later studies between 2004-2015 confirmed these findings. Though methodologies of here reviewed studies varied, and all were retrospective, they overall confirmed results of the 2004 review, and supported significant outcome variances between spontaneously- and IVF-conceived pregnancies: IVF singletons demonstrate significantly poorer and IVF twins significantly better perinatal outcomes than spontaneously conceived singletons and twins, with differences stable over time, and with overall obstetrical outcomes significantly improved. Exaggerations of severe IVF twin risks are likely in the 50 % range, while exaggerations of milder perinatal risks are approximately in 25 % range. Though elective single embryo transfers (eSET) have been confirmed to reduce pregnancy chances, they are, nevertheless, increasingly utilized. eSET, equally unquestionably, however, reduces twin pregnancies. Because twin pregnancies have been alleged to increase outcome risks in comparison to singleton pregnancies, here reported findings should affect the ongoing discussion whether increased twin risks are factual. With no risk excess, eSET significantly reduces IVF pregnancy chances without compensatory benefits and, therefore, is not advisable in IVF, unless patients do not wish to conceive twins or have medical contraindications to conceiving twins.

  20. SAP domain-dependent Mkl1 signaling stimulates proliferation and cell migration by induction of a distinct gene set indicative of poor prognosis in breast cancer patients

    PubMed Central

    2014-01-01

    Background: The main cause of death of breast cancer patients is not the primary tumor itself but the metastatic disease. Identifying breast cancer-specific signatures for metastasis and learning more about the nature of the genes involved in the metastatic process would 1) improve our understanding of the mechanisms of cancer progression and 2) reveal new therapeutic targets. Previous studies showed that the transcriptional regulator megakaryoblastic leukemia-1 (Mkl1) induces tenascin-C expression in normal and transformed mammary epithelial cells. Tenascin-C is known to be expressed in metastatic niches, is highly induced in cancer stroma and promotes breast cancer metastasis to the lung. Methods: Using HC11 mammary epithelial cells overexpressing different Mkl1 constructs, we devised a subtractive transcript profiling screen to identify the mechanism by which Mkl1 induces a gene set co-regulated with tenascin-C. We performed computational analysis of the Mkl1 target genes and used cell biological experiments to confirm the effect of these gene products on cell behavior. To analyze whether this gene set is prognostic of accelerated cancer progression in human patients, we used the bioinformatics tool GOBO that allowed us to investigate a large breast tumor data set linked to patient data. Results: We discovered a breast cancer-specific set of genes including tenascin-C, which is regulated by Mkl1 in a SAP domain-dependent, serum response factor-independent manner and is strongly implicated in cell proliferation, cell motility and cancer. Downregulation of this set of transcripts by overexpression of Mkl1 lacking the SAP domain inhibited cell growth and cell migration. Many of these genes are direct Mkl1 targets since their promoter-reporter constructs were induced by Mkl1 in a SAP domain-dependent manner. Transcripts most strongly reduced in the absence of the SAP domain were mechanoresponsive. Finally, expression of this gene set is associated with high

  1. Different Sets of Post-Embryonic Development Genes Are Conserved or Lost in Two Caryophyllales Species (Reaumuria soongorica and Agriophyllum squarrosum).

    PubMed

    Zhao, Pengshan; Zhang, Jiwei; Zhao, Xin; Chen, Guoxiong; Ma, Xiao-Fei

    2016-01-01

    Reaumuria soongorica and sand rice (Agriophyllum squarrosum) belong to the clade of Caryophyllales and are widely distributed in the desert regions of north China. Both plants have evolved many specific traits and adaptation strategies to cope with recurring environmental threats. However, the genetic basis that underpins their unique traits and adaptation remains unknown. In this study, the transcriptome data of R. soongorica and sand rice were compared with three other species with previously sequenced genomes (Arabidopsis thaliana, Oryza sativa, and Beta vulgaris). Four different gene sets were identified, namely, the genes conserved in both species, those lost in both species, those conserved in R. soongorica only, and those conserved in sand rice only. Gene ontology showed that post-embryonic development genes (PEDGs) were enriched in all gene sets, and different sets of PEDGs were conserved or lost in both the R. soongorica and sand rice genomes. Expression profiles of Arabidopsis orthologs further provided some clues to the function of the species-specific conserved PEDGs. Such orthologs included LEAFY PETIOLE, which could be a candidate gene involved in the development of branch priority in sand rice.

  2. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    DTIC Science & Technology

    2014-11-11

    We replaced essential gene MMYC_0361 with the rlmH gene from Bacillus subtilis. Mycoplasma mycoides containing the B. subtilis rlmH was viable...A) As an example of this approach, we swapped a synthetic expression module for Bacillus subtilis pseudouridine methyltransferase gene rlmH with

  3. Gene-set based genome-wide association analysis for the speed of sound in two skeletal sites of Korean women

    PubMed Central

    Kwon, Ji-Sun; Kim, Sangsoo

    2014-01-01

    The speed of sound (SOS) value is an indicator of bone mineral density (BMD). Previous genome-wide association (GWA) studies have identified a number of genes, whose variations may affect BMD levels. However, their biological implications have been elusive. We re-analyzed the GWA study dataset for the SOS values in skeletal sites of 4,659 Korean women, using a gene-set analysis software, GSA-SNP. We identified 10 common representative GO terms, and 17 candidate genes between these two traits (PGS < 0.05). Implication of these GO terms and genes in the bone mechanism is well supported by the literature survey. Interestingly, the significance levels of some member genes were inversely related, in several gene-sets that were shared between two skeletal sites. This implies that biological process, rather than SNP or gene, is the substantial unit of genetic association for SOS in bone. In conclusion, our findings may provide new insights into the biological mechanisms for BMD. [BMB Reports 2014; 47(6): 348-353] PMID:24286325

  4. Analysis of the seven-member AAD gene set demonstrates that genetic redundancy in yeast may be more apparent than real.

    PubMed Central

    Delneri, D; Gardner, D C; Oliver, S G

    1999-01-01

    Saccharomyces cerevisiae has seven genes encoding proteins with a high degree (>85%) of amino-acid sequence identity to the aryl-alcohol dehydrogenase of the lignin-degrading, filamentous fungus, Phanerochaete chrysosporium. All but one member of this gene set are telomere associated. Moreover, all contain a sequence similar to the DNA-binding site of the Yap1p transcriptional activator either upstream of or within their coding sequences. The expression of the AAD genes was found to be induced by chemicals, such as diamide and diethyl maleic acid ester (DEME), that cause an oxidative shock by inactivating the glutathione (GSH) reservoir of the cells. In contrast, the oxidizing agent hydrogen peroxide has no effect on the expression of these genes. We found that the response to anti-GSH agents was Yap1p dependent. The very high level of nucleotide sequence similarity between the AAD genes makes it difficult to determine if they are all involved in the oxidative-stress response. The use of single and multiple aad deletants demonstrated that only AAD4 (YDL243c) and AAD6 (YFL056/57c) respond to the oxidative stress. Of these two genes, only AAD4 is likely to be functional since the YFL056/57c open reading frame is interrupted by a stop codon. Thus, in terms of the function in response to oxidative stress, the sevenfold redundancy of the AAD gene set is more apparent than real. PMID:10581269

  5. Identification of key genes in hepatocellular carcinoma and validation of the candidate gene, cdc25a, using gene set enrichment analysis, meta-analysis and cross-species comparison.

    PubMed

    Lu, Xiaoxu; Sun, Wen; Tang, Yanping; Zhu, Lingqun; Li, Yuan; Ou, Chao; Yang, Chun; Su, Jianjia; Luo, Chengpiao; Hu, Yanling; Cao, Ji

    2016-02-01

    The aim of the present study was to determine key pathways and genes involved in the pathogenesis of hepatocellular carcinoma (HCC) through bioinformatic analyses of HCC microarray data based on cross-species comparison. Microarray data of gene expression in HCC in different species were analyzed using gene set enrichment analysis (GSEA) and meta-analysis. Reverse transcription-quantitative polymerase chain reaction and western blotting were performed to determine the mRNA and protein expression levels of cdc25a, one of the identified candidate genes, in human, rat and tree shrew samples. The cell cycle pathway had the largest overlap between the GSEA and meta-analysis. Meta-analyses showed that 25 genes, including cdc25a, in the cell cycle pathway were differentially expressed. Cdc25a mRNA levels in HCC tissues were higher than those in normal liver tissues in humans, rats and tree shrews, and the expression level of cdc25a in HCC tissues was higher than in corresponding paraneoplastic tissues in humans and rats. In human HCC tissues, the cdc25a mRNA level was significantly correlated with clinical stage, portal vein tumor thrombosis and extrahepatic metastasis. Western blotting showed that cdc25a protein levels were significantly upregulated in HCC tissues in humans, rats and tree shrews. In conclusion, GSEA and meta-analysis can be combined to identify key molecules and pathways involved in HCC. This study demonstrated that the cell cycle pathway and the cdc25a gene may be crucial in the pathogenesis and progression of HCC.

  6. A Method for Gene-Based Pathway Analysis Using Genomewide Association Study Summary Statistics Reveals Nine New Type 1 Diabetes Associations

    PubMed Central

    Evangelou, Marina; Smyth, Deborah J; Fortune, Mary D; Burren, Oliver S; Walker, Neil M; Guo, Hui; Onengut-Gumuscu, Suna; Chen, Wei-Min; Concannon, Patrick; Rich, Stephen S; Todd, John A; Wallace, Chris

    2014-01-01

    Pathway analysis can complement point-wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease-associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene-based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene-based pathway analysis of a type 1 diabetes (T1D) meta-analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty-one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P-values were skewed () with 12 of the 22 SNPs showing . Support, including replication evidence, was obtained for nine T1D associated variants in genes ITGB7 (rs11170466, ), NRP1 (rs722988, ), BAD (rs694739, ), CTSB (rs1296023, ), FYN (rs11964650, ), UBE2G1 (rs9906760, ), MAP3K14 (rs17759555, ), ITGB1 (rs1557150, ), and IL7R (rs1445898, ). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available. PMID:25371288
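
    The abstract above reports pathways declared enriched "at a 5% false discovery rate" but does not say which FDR procedure was used. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure (an assumption, not the authors' stated method) and using made-up p-values, the selection step could look like this:

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha.

    Benjamini-Hochberg step-up rule: find the largest rank k (1-based,
    in ascending p-value order) with p_(k) <= k / m * alpha, then reject
    the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical pathway p-values, for illustration only.
pathway_p = [0.001, 0.008, 0.039, 0.041, 0.60]
significant = benjamini_hochberg(pathway_p, alpha=0.05)
```

    Because the rule is step-up, a pathway whose p-value misses its own rank threshold can still be rejected if a larger p-value further down the list passes.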

  7. A clustered set of three Sp-family genes is ancestral in the Metazoa: evidence from sequence analysis, protein domain structure, developmental expression patterns and chromosomal location

    PubMed Central

    2010-01-01

    Background: The Sp-family of transcription factors are evolutionarily conserved zinc finger proteins present in many animal species. The orthology of the Sp genes in different animals is unclear and their evolutionary history therefore remains controversial. This is especially the case for the Sp gene buttonhead (btd), which plays a key role in head development in Drosophila melanogaster and has been proposed to have originated by a recent gene duplication. The purpose of the presented study was to trace orthologs of btd in other insects and reconstruct the evolutionary history of the Sp genes within the Metazoa. Results: We isolated Sp genes from representatives of a holometabolous insect (Tribolium castaneum), a hemimetabolous insect (Oncopeltus fasciatus), primitively wingless hexapods (Folsomia candida and Thermobia domestica), and an amphipod crustacean (Parhyale hawaiensis). We supplemented this data set with data from fully sequenced animal genomes. Phylogenetic sequence analysis showed that all Sp factors fall into three monophyletic clades. These clades are also supported by protein domain structure, gene expression, and chromosomal location. We show that clear orthologs of the D. melanogaster btd gene are present even in the basal insects, and that the Sp5-related genes in the genome sequence of several deuterostomes and the basal metazoans Trichoplax adhaerens and Nematostella vectensis are also orthologs of btd. Conclusions: All available data provide strong evidence for an ancestral cluster of three Sp-family genes as well as synteny of this Sp cluster and the Hox cluster. The ancestral Sp gene cluster already contained a Sp5/btd ortholog, which strongly suggests that btd is not the result of a recent gene duplication, but directly traces back to an ancestral gene already present in the metazoan ancestor. PMID:20353601

  8. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    DTIC Science & Technology

    2013-11-18

    gene categories to make steady progress with gene and gene cluster deletions. To date, we have removed approximately 273 kb from the Mycoplasma ...the Mycoplasma mycoides JCVI-syn1.0 genome. The resultant 806 kb genome is viable and grows with a normal growth rate. Results on the Bottom Up...for life under ideal laboratory conditions. We are working to minimize Mycoplasma mycoides JCVI-syn1.0 (the synthetic version of Mycoplasma mycoides

  9. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    DTIC Science & Technology

    2014-02-18

    gene categories to make steady progress with gene and gene cluster deletions. To date, we have removed approximately 283 kb from the Mycoplasma ...the Mycoplasma mycoides JCVI-syn1.0 genome by this method. The resultant 795 kb genome is viable and grows with a normal growth rate. Our main...essential for life under ideal laboratory conditions. We are working to minimize Mycoplasma mycoides JCVI-syn1.0 (the synthetic version of Mycoplasma

  10. Influence of statistical estimators of mutual information and data heterogeneity on the inference of gene regulatory networks.

    PubMed

    de Matos Simoes, Ricardo; Emmert-Streib, Frank

    2011-01-01

    The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
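
    The best-performing combination named above (Miller-Madow estimator with equal width discretization) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the C3NET code: each variable is binned separately into `n_bins` equal-width bins, mutual information is computed as H(X) + H(Y) - H(X,Y) in nats, and the Miller-Madow bias correction (m - 1) / (2n), with m the number of non-empty bins, is applied to each entropy term. The function names and the default of four bins are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to one of n_bins equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # constant variable: everything falls in one bin
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clip it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def miller_madow_entropy(symbols):
    """Empirical (plug-in) entropy in nats plus the Miller-Madow correction."""
    n = len(symbols)
    counts = Counter(symbols)
    h_emp = -sum(c / n * math.log(c / n) for c in counts.values())
    return h_emp + (len(counts) - 1) / (2 * n)


def mutual_information(x, y, n_bins=4):
    """MI estimate: H(X) + H(Y) - H(X, Y) on discretized data."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    joint = list(zip(bx, by))
    return (miller_madow_entropy(bx) + miller_madow_entropy(by)
            - miller_madow_entropy(joint))
```

    Swapping `miller_madow_entropy` for the uncorrected plug-in entropy gives the "Empirical" estimator; the Shrink and Schürmann-Grassberger estimators mentioned in the abstract are not shown here.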

  11. Characterizing differential individual response to Porcine Reproductive and Respiratory Virus infection through statistical and functional analysis of gene expression

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We evaluated differences in gene expression in pigs from the Porcine Reproductive and Respiratory Syndrome (PRRS) Host Genetics Consortium initiative showing a range of responses to PRRS virus (PRRSV) infection. Pigs were allocated into four phenotypic groups according to their serum viral level and...

  12. Time series analysis of benzo[A]pyrene-induced transcriptome changes suggests that a network of transcription factors regulates the effects on functional gene sets.

    PubMed

    van Delft, Joost H M; Mathijs, Karen; Staal, Yvonne C M; van Herwijnen, Marcel H M; Brauers, Karen J J; Boorsma, André; Kleinjans, Jos C S

    2010-10-01

    Chemical carcinogens may cause a multitude of effects inside cells, thereby affecting transcript levels of genes by direct activation of transcription factors (TF) or indirectly through the formation of DNA damage. As the temporal profiles of these responses may be profoundly different, examining time-dependent changes may provide new insights into TF networks related to cellular responses to chemical carcinogens. Therefore, we investigated gene expression changes caused by benzo[a]pyrene (BaP) in human hepatoma cells at 12 time points after exposure, in relation to DNA adduct levels and cell cycle. Temporal profiles for functional gene sets demonstrate both early and late effects in up- and downregulation of relevant gene sets involved in cell cycle, apoptosis, DNA repair, and metabolism of amino acids and lipids. Many significant transcription regulation networks appeared to be centered on TF that are proto-oncogenes or tumor suppressor genes. The time series analysis tool Short Time-series Expression Miner (STEM) was used to identify time-dependent correlation of pathways, gene sets, TF networks, and biological parameters. Most correlations are with DNA adduct levels, which is an early response, and fewer with the later responses on G1 and S phase cells. The majority of the modulated genes in the Reactome pathways can be regulated by several of these TF, e.g., 73% by nuclear factor-kappa B and 34-42% by c-MYC, SRF, AP1, and E2F1. All these TF can also regulate one or more of the others. Our data indicate that a complex network of a few TF is responsible for the majority of the transcriptional changes induced by BaP. This network hardly changes over time, even though the transcriptional profiles clearly alter, suggesting that other regulatory mechanisms are also involved.

  13. A connected set of genes associated with programmed cell death implicated in controlling the hypersensitive response in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Rp1-D21 is a maize auto-active resistance gene that confers a spontaneous hypersensitive response (HR). Depending on the genetic background in which it operates; variable levels of HR are observed. This offers a convenient system to identify alleles that modulate HR and genes involved in disease res...

  14. Selection and validation of a set of reliable reference genes for quantitative RT-PCR studies in the brain of the Cephalopod Mollusc Octopus vulgaris

    PubMed Central

    Sirakov, Maria; Zarrella, Ilaria; Borra, Marco; Rizzo, Francesca; Biffali, Elio; Arnone, Maria Ina; Fiorito, Graziano

    2009-01-01

    Background: Quantitative real-time polymerase chain reaction (RT-qPCR) is valuable for studying the molecular events underlying physiological and behavioral phenomena. Normalization of real-time PCR data is critical for reliable mRNA quantification. Here we identify reference genes to be utilized in RT-qPCR experiments to normalize and monitor the expression of target genes in the brain of the cephalopod mollusc Octopus vulgaris, an invertebrate. Such an approach is novel for this taxon and of advantage in future experiments given the complexity of the behavioral repertoire of this species when compared with its relatively simple neural organization. Results: We chose 16S and 18S rRNA, actB, EEF1A, tubA and ubi as candidate reference genes (housekeeping genes, HKG). The expression of 16S and 18S was highly variable and did not meet the requirements of candidate HKG. The expression of the other genes was almost stable and uniform among samples. We analyzed the expression of HKG in two different sets of animals using tissues taken from the central nervous system (brain parts) and mantle (here considered as control tissue) by BestKeeper, geNorm and NormFinder. We found that HKG expression differed considerably with respect to brain area and octopus samples in an HKG-specific manner. However, when the mantle is treated as control tissue and the entire central nervous system is considered, NormFinder revealed tubA and ubi as the most suitable HKG pair. These two genes were utilized to evaluate the relative expression of the genes FoxP, creb, dat and TH in O. vulgaris. Conclusion: We analyzed the expression profiles of some genes here identified for O. vulgaris by applying RT-qPCR analysis for the first time in cephalopods. We validated candidate reference genes and found the expression of ubi and tubA to be the most appropriate to evaluate the expression of target genes in the brain of different octopuses. Our results also underline the importance of choosing a proper
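
    Of the three tools named above, geNorm ranks candidate reference genes by a stability measure M: for each gene, the average over all other candidates of the standard deviation, across samples, of the log2 expression ratio between the two genes. Lower M means more stable expression. The sketch below is a plain re-implementation of that idea for illustration, not the geNorm software itself; the function name, the input layout (relative quantities on a linear scale, one list per gene) and the toy data are assumptions.

```python
import math
from statistics import stdev


def genorm_m(expression):
    """Compute a geNorm-style stability value M for each candidate gene.

    expression: dict mapping gene name -> list of relative quantities
    (linear scale, same sample order for every gene).
    """
    genes = list(expression)
    m_values = {}
    for gene in genes:
        pairwise_sd = []
        for other in genes:
            if other == gene:
                continue
            # Log2 ratio of the two genes in each sample.
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expression[gene], expression[other])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Toy data: geneA and geneB keep a constant ratio, geneC does not.
toy = {"geneA": [1, 2, 4], "geneB": [2, 4, 8], "geneC": [1, 4, 2]}
stability = genorm_m(toy)
```

    In the toy data the constant-ratio pair receives the lowest M. NormFinder, which selected tubA and ubi in the study, uses a different, model-based approach that is not reproduced here.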

  15. The map-1 gene family in root-knot nematodes, Meloidogyne spp.: a set of taxonomically restricted genes specific to clonal species.

    PubMed

    Tomalova, Iva; Iachia, Cathy; Mulet, Karine; Castagnone-Sereno, Philippe

    2012-01-01

    Taxonomically restricted genes (TRGs), i.e., genes that are restricted to a limited subset of phylogenetically related organisms, may be important in adaptation. In parasitic organisms, TRG-encoded proteins are possible determinants of the specificity of host-parasite interactions. In the root-knot nematode (RKN) Meloidogyne incognita, the map-1 gene family encodes expansin-like proteins that are secreted into plant tissues during parasitism, thought to act as effectors to promote successful root infection. MAP-1 proteins exhibit a modular architecture, with variable number and arrangement of 58 and 13-aa domains in their central part. Here, we address the evolutionary origins of this gene family using a combination of bioinformatics and molecular biology approaches. Map-1 genes were solely identified in one single member of the phylum Nematoda, i.e., the genus Meloidogyne, and not detected in any other nematode, thus indicating that the map-1 gene family is indeed a TRG family. A phylogenetic analysis of the distribution of map-1 genes in RKNs further showed that these genes are specifically present in species that reproduce by mitotic parthenogenesis, with the exception of M. floridensis, and could not be detected in RKNs reproducing by either meiotic parthenogenesis or amphimixis. These results highlight the divergence between mitotic and meiotic RKN species as a critical transition in the evolutionary history of these parasites. Analysis of the sequence conservation and organization of repeated domains in map-1 genes suggests that gene duplication(s) together with domain loss/duplication have contributed to the evolution of the map-1 family, and that some strong selection mechanism may be acting upon these genes to maintain their functional role(s) in the specificity of the plant-RKN interactions.

  16. Design and experimental application of a novel non-degenerate universal primer set that amplifies prokaryotic 16S rRNA genes with a low possibility to amplify eukaryotic rRNA genes.

    PubMed

    Mori, Hiroshi; Maruyama, Fumito; Kato, Hiromi; Toyoda, Atsushi; Dozono, Ayumi; Ohtsubo, Yoshiyuki; Nagata, Yuji; Fujiyama, Asao; Tsuda, Masataka; Kurokawa, Ken

    2014-01-01

    The deep sequencing of 16S rRNA genes amplified by universal primers has revolutionized our understanding of microbial communities by allowing the characterization of the diversity of the uncultured majority. However, some universal primers also amplify eukaryotic rRNA genes, leading to a decrease in the efficiency of sequencing of prokaryotic 16S rRNA genes with possible mischaracterization of the diversity in the microbial community. In this study, we compared 16S rRNA gene sequences from genome-sequenced strains and identified candidates for non-degenerate universal primers that could be used for the amplification of prokaryotic 16S rRNA genes. The 50 identified candidates were investigated to calculate their coverage for prokaryotic and eukaryotic rRNA genes, including those from uncultured taxa and eukaryotic organelles, and a novel universal primer set, 342F-806R, covering many prokaryotic, but not eukaryotic, rRNA genes was identified. This primer set was validated by the amplification of 16S rRNA genes from a soil metagenomic sample and subsequent pyrosequencing using the Roche 454 platform. The same sample was also used for pyrosequencing of the amplicons by employing a commonly used primer set, 338F-533R, and for shotgun metagenomic sequencing using the Illumina platform. Our comparison of the taxonomic compositions inferred by the three sequencing experiments indicated that the non-degenerate 342F-806R primer set can characterize the taxonomic composition of the microbial community without substantial bias, and is highly expected to be applicable to the analysis of a wide variety of microbial communities.

  17. IMGT/HighV-QUEST Statistical Significance of IMGT Clonotype (AA) Diversity per Gene for Standardized Comparisons of Next Generation Sequencing Immunoprofiles of Immunoglobulins and T Cell Receptors

    PubMed Central

    Aouinti, Safa; Malouche, Dhafer; Giudicelli, Véronique; Kossida, Sofia; Lefranc, Marie-Paule

    2015-01-01

    The adaptive immune responses of humans and of other jawed vertebrate species (Gnathostomata) are characterized by the B and T cells and their specific antigen receptors, the immunoglobulins (IG) or antibodies and the T cell receptors (TR) (up to 2 × 10^12 different IG and TR per individual). IMGT, the international ImMunoGeneTics information system (http://www.imgt.org), was created in 1989 by Marie-Paule Lefranc (Montpellier University and CNRS) to manage the huge and complex diversity of these antigen receptors. IMGT, built on IMGT-ONTOLOGY concepts of identification (keywords), description (labels), classification (gene and allele nomenclature) and numerotation (IMGT unique numbering), is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. IMGT/HighV-QUEST, the first web portal, and so far the only one, for the next generation sequencing (NGS) analysis of IG and TR, is the paradigm for immune repertoire standardized outputs and immunoprofiles of the adaptive immune responses. It provides the identification of the variable (V), diversity (D) and joining (J) genes and alleles, analysis of the V-(D)-J junction and complementarity determining region 3 (CDR3) and the characterization of the ‘IMGT clonotype (AA)’ (AA for amino acid) diversity and expression. IMGT/HighV-QUEST compares outputs of different batches, up to one million nucleotide sequences for the statistical module. These high throughput IG and TR repertoire immunoprofiles are of prime importance in vaccination, cancer, infectious diseases, autoimmunity and lymphoproliferative disorders; however, their comparative statistical analysis still remains a challenge. We present a standardized statistical procedure to analyze IMGT/HighV-QUEST outputs for the evaluation of the significance of the IMGT clonotype (AA) diversity differences in proportions, per gene of a given group, between NGS IG and TR repertoire immunoprofiles. The procedure is generic and

  18. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    DTIC Science & Technology

    2012-08-16

    are essential for life. Toward that end, we have conducted transposon mutagenesis experiments, designed to identify genes that are not needed for...project were successfully completed. The global transposon experiments, using Tn4001-tetM and Tn5-puromycin, have been completed. The results have...been incorporated into the transposon map contained in Attachment A. The 6 IS elements, 6 R-M systems, the ICE element and 7 other large gene

  19. Monitoring of gene expression in bacteria during infections using an adaptable set of bioluminescent, fluorescent and colorigenic fusion vectors.

    PubMed

    Uliczka, Frank; Pisano, Fabio; Kochut, Annika; Opitz, Wiebke; Herbst, Katharina; Stolz, Tatjana; Dersch, Petra

    2011-01-01

    A family of versatile promoter-probe plasmids for gene expression analysis was developed based on a modular expression plasmid system (pZ). The vectors contain different replicons with exchangeable antibiotic cassettes to allow compatibility and expression analysis on a low-, midi- and high-copy number basis. Suicide vector variants also permit chromosomal integration of the reporter fusion and stable vector derivatives can be used for in vivo or in situ expression studies under non-selective conditions. Transcriptional and translational fusions to the reporter genes gfp(mut3.1), amCyan, dsRed2, luxCDABE, phoA or lacZ can be constructed, and presence of identical multiple cloning sites in the vector system facilitates the interchange of promoters or reporter genes between the plasmids of the series. The promoter of the constitutively expressed gapA gene of Escherichia coli was included to obtain fluorescent and bioluminescent expression constructs. A combination of the plasmids allows simultaneous detection and gene expression analysis in individual bacteria, e.g. in bacterial communities or during mouse infections. To test our vector system, we analyzed and quantified expression of Yersinia pseudotuberculosis virulence genes under laboratory conditions, in association with cells and during the infection process.

  20. Speeding up directed evolution: Combining the advantages of solid-phase combinatorial gene synthesis with statistically guided reduction of screening effort.

    PubMed

    Hoebenreich, Sabrina; Zilly, Felipe E; Acevedo-Rocha, Carlos G; Zilly, Matías; Reetz, Manfred T

    2015-03-20

    Efficient and economic methods in directed evolution at the protein, metabolic, and genome level are needed for biocatalyst development and the success of synthetic biology. In contrast to random strategies, semirational approaches such as saturation mutagenesis explore the sequence space in a focused manner. Although several combinatorial libraries based on saturation mutagenesis have been reported using solid-phase gene synthesis, direct comparison with traditional PCR-based methods is currently lacking. In this work, we compare combinatorial protein libraries created in-house via PCR versus those generated by commercial solid-phase gene synthesis. Using descriptive statistics and probabilistic distributions on amino acid occurrence frequencies, the quality of the libraries was assessed and compared, revealing that the outsourced libraries are characterized by less bias and outliers than the PCR-based ones. Afterward, we screened all libraries following a traditional algorithm for almost complete library coverage and compared this approach with an emergent statistical concept suggesting screening a lower portion of the protein sequence space. Upon analyzing the biocatalytic landscapes and best hits of all combinatorial libraries, we show that the screening effort could have been reduced in all cases by more than 50%, while still finding at least one of the best mutants.
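
    The trade-off described above, screening for near-complete library coverage versus screening fewer clones while still expecting a top variant, can be illustrated with two standard sampling formulas. This is a sketch under simplifying assumptions that the abstract does not state: all `n_variants` library members are equally likely to be picked, and picks are independent. The function names are hypothetical and the formulas are not claimed to be the exact ones used by the authors.

```python
import math


def transformants_for_coverage(n_variants, coverage=0.95):
    """Clones to screen so the expected fraction of variants seen >= coverage.

    Uses F = 1 - (1 - 1/V)**T, solved for T and rounded up.
    """
    return math.ceil(math.log(1 - coverage) / math.log(1 - 1 / n_variants))


def prob_hit_top_k(n_variants, n_screened, k):
    """Probability that at least one of the k best variants is screened."""
    return 1 - (1 - k / n_variants) ** n_screened


# Example: one NNK-randomized site has 32 codon variants.
full_effort = transformants_for_coverage(32, coverage=0.95)
# Chance of seeing at least one of the top 3 variants with only 32 clones.
reduced_hit = prob_hit_top_k(32, 32, k=3)
```

    Under these assumptions, accepting "one of the best few" instead of "every variant" lets the screening effort shrink substantially while the chance of finding a good mutant stays high, which is the kind of reduction the abstract reports.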

  1. Human longevity and variation in DNA damage response and repair: study of the contribution of sub-processes using competitive gene-set analysis.

    PubMed

    Debrabant, Birgit; Soerensen, Mette; Flachsbart, Friederike; Dato, Serena; Mengel-From, Jonas; Stevnsner, Tinna; Bohr, Vilhelm A; Kruse, Torben A; Schreiber, Stefan; Nebel, Almut; Christensen, Kaare; Tan, Qihua; Christiansen, Lene

    2014-09-01

    DNA-damage response and repair are crucial to maintain genetic stability, and are consequently considered central to aging and longevity. Here, we investigate whether this pathway overall is associated with longevity, and whether specific sub-processes are more strongly associated with longevity than others. The analysis used 592 SNPs from 77 genes involved in nine sub-processes: DNA-damage response, base excision repair (BER), nucleotide excision repair, mismatch repair, non-homologous end-joining, homologous recombinational repair (HRR), RecQ helicase activities (RECQ), telomere functioning and mitochondrial DNA processes. The study population was 1089 long-lived and 736 middle-aged Danes. A self-contained set-based test of all SNPs displayed association with longevity (P-value=9.9 × 10^-5), supporting that the overall pathway could affect longevity. Investigation of the nine sub-processes using the competitive gene-set analysis by Wang et al indicated that BER, HRR and RECQ were more strongly associated with longevity than the respective remaining genes of the pathway (P-values=0.004-0.048). For HRR and RECQ, only one gene contributed to the significance, whereas for BER several genes contributed. These associations did not, however, generally pass correction for multiple testing. Still, these findings indicate that, of the entire pathway, variation in BER might influence longevity the most. These modest P-values were not replicated in a German sample. This might, though, be due to differences in genotyping procedures and investigated SNPs, potentially inducing differences in the coverage of gene regions. Specifically, five genes were not covered at all in the German data. Therefore, investigations in additional study populations are needed before final conclusions can be drawn.

  2. CORE_TF: a user-friendly interface to identify evolutionary conserved transcription factor binding sites in sets of co-regulated genes

    PubMed Central

    Hestand, Matthew S; van Galen, Michiel; Villerius, Michel P; van Ommen, Gert-Jan B; den Dunnen, Johan T; 't Hoen, Peter AC

    2008-01-01

    Background: The identification of transcription factor binding sites is difficult since they are only a small number of nucleotides in size, resulting in large numbers of false positives and false negatives in current approaches. Computational methods to reduce false positives are to look for over-representation of transcription factor binding sites in a set of similarly regulated promoters or to look for conservation in orthologous promoter alignments. Results: We have developed a novel tool, "CORE_TF" (Conserved and Over-REpresented Transcription Factor binding sites), that identifies common transcription factor binding sites in promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for position weight matrices from the TRANSFAC® database that are over-represented in an experimental set compared to a random set of promoters and identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated with expression and chromatin-immunoprecipitation on microarray data. We also implement and demonstrate the importance of matching the random set of promoters to the experimental promoters by GC content, which is a unique feature of our tool. Conclusion: The program CORE_TF is accessible in a user-friendly web interface at . It provides a table of over-represented transcription factor binding sites in the promoters of the user's input genes and a graphical view of evolutionarily conserved transcription factor binding sites. In our test data sets it successfully predicts target transcription factors and their binding sites. PMID:19036135
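
    The over-representation step described above compares how often a binding-site matrix hits the experimental promoters versus a random (background) promoter set. The abstract does not name the statistical test, so the sketch below assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher exact test), a common choice for this kind of comparison. The function name and the counts are illustrative only.

```python
from math import comb


def hypergeom_upper_tail(k, n_exp, n_hits_total, n_total):
    """P(X >= k) for a hypergeometric draw.

    k            : experimental promoters containing the site
    n_exp        : number of experimental promoters
    n_hits_total : promoters with the site in experimental + random sets
    n_total      : total promoters in experimental + random sets
    """
    upper = min(n_exp, n_hits_total)
    denom = comb(n_total, n_exp)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    numer = sum(comb(n_hits_total, i) * comb(n_total - n_hits_total, n_exp - i)
                for i in range(k, upper + 1))
    return numer / denom


# Toy counts: 5 experimental and 5 random promoters, 5 of the 10 contain
# the site, and 4 of those 5 are in the experimental set.
p_value = hypergeom_upper_tail(k=4, n_exp=5, n_hits_total=5, n_total=10)
```

    A small p-value indicates that the site occurs in the experimental promoters more often than expected from the background. Matching the background to the experimental promoters by GC content, as the abstract emphasizes, changes which promoters enter the random set but not the form of the test.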

  3. Heat Stress and Lipopolysaccharide Stimulation of Chicken Macrophage-Like Cell Line Activates Expression of Distinct Sets of Genes

    PubMed Central

    Slawinska, Anna; Hsieh, John C.; Schmidt, Carl J.; Lamont, Susan J.

    2016-01-01

    Acute heat stress requires immediate adjustment of the stressed individual to sudden changes of ambient temperatures. Chickens are particularly sensitive to heat stress due to development of insufficient physiological mechanisms to mitigate its effects. One of the symptoms of heat stress is endotoxemia that results from release of the lipopolysaccharide (LPS) from the guts. Heat-related cytotoxicity is mitigated by the innate immune system, which is comprised mostly of phagocytic cells such as monocytes and macrophages. The objective of this study was to analyze the molecular responses of the chicken macrophage-like HD11 cell line to combined heat stress and lipopolysaccharide treatment in vitro. The cells were heat-stressed and then allowed a temperature-recovery period, during which the gene expression was investigated. LPS was added to the cells to mimic the heat-stress-related endotoxemia. Semi high-throughput gene expression analysis was used to study a gene panel comprised of heat shock proteins, stress-related genes, signaling molecules and immune response genes. HD11 cell line responded to heat stress with increased mRNA abundance of the HSP25, HSPA2 and HSPH1 chaperones as well as DNAJA4 and DNAJB6 co-chaperones. The anti-apoptotic gene BAG3 was also highly up-regulated, providing evidence that the cells expressed pro-survival processes. The immune response of the HD11 cell line to LPS in the heat stress environment (up-regulation of CCL4, CCL5, IL1B, IL8 and iNOS) was higher than in thermoneutral conditions. However, the peak in the transcriptional regulation of the immune genes was after two hours of temperature-recovery. Therefore, we propose the potential influence of the extracellular heat shock proteins not only in mitigating effects of abiotic stress but also in triggering the higher level of the immune responses. Finally, use of correlation networks for the data analysis aided in discovering subtle differences in the gene expression (i.e. the role

  4. Ikaros sets the potential for Th17 lineage gene expression through effects on chromatin state in early T cell development.

    PubMed

    Wong, Larry Y; Hatfield, Julianne K; Brown, Melissa A

    2013-12-06

    Th17 cells are important effectors of immunity to extracellular pathogens, particularly at mucosal surfaces, but they can also contribute to pathologic tissue inflammation and autoimmunity. Defining the multitude of factors that influence their development is therefore of paramount importance. Our previous studies using Ikaros(-/-) CD4+ T cells implicated Ikaros in Th1 versus Th2 lineage decisions. Here we demonstrate that Ikaros also regulates Th17 differentiation through its ability to promote expression of multiple Th17 lineage-determining genes, including Ahr, Runx1, Rorc, Il17a, and Il22. Ikaros exerts its influence on the chromatin remodeling of these loci at two distinct stages in CD4+ T helper cell development. In naive cells, Ikaros is required to limit repressive chromatin modifications at these gene loci, thus maintaining the potential for expression of the Th17 gene program. Subsequently, Ikaros is essential for the acquisition of permissive histone marks in response to Th17 polarizing signals. Additionally, Ikaros represses the expression of genes that limit Th17 development, including Foxp3 and Tbx21. These data define new targets of the action of Ikaros and indicate that Ikaros plays a critical role in CD4+ T cell differentiation by integrating specific cytokine cues and directing epigenetic modifications that facilitate activation or repression of relevant genes that drive T cell lineage choice.

  5. The parthenocarpic hydra mutant reveals a new function for a SPOROCYTELESS-like gene in the control of fruit set in tomato.

    PubMed

    Rojas-Gracia, Pilar; Roque, Edelin; Medina, Mónica; Rochina, Maricruz; Hamza, Rim; Angarita-Díaz, María Pilar; Moreno, Vicente; Pérez-Martín, Fernando; Lozano, Rafael; Cañas, Luis; Beltrán, José Pío; Gómez-Mena, Concepción

    2017-05-01

    Fruit set is an essential process to ensure successful sexual plant reproduction. The development of the flower into a fruit is actively repressed in the absence of pollination. However, some cultivars from a few species are able to develop seedless fruits overcoming the standard restriction of unpollinated ovaries to growth. We report here the identification of the tomato hydra mutant that produces seedless (parthenocarpic) fruits. Seedless fruit production in hydra plants is linked to the absence of both male and female sporocyte development. The HYDRA gene is therefore essential for the initiation of sporogenesis in tomato. Using positional cloning, virus-induced gene silencing and expression analysis experiments, we identified the HYDRA gene and demonstrated that it encodes the tomato orthologue of SPOROCYTELESS/NOZZLE (SPL/NZZ) of Arabidopsis. We found that the precocious growth of the ovary is associated with changes in the expression of genes involved in gibberellin (GA) metabolism. Our results support the conservation of the function of SPL-like genes in the control of sporogenesis in plants. Moreover, this study uncovers a new function for the tomato SlSPL/HYDRA gene in the control of fruit initiation.

  6. Effect on in vitro cell response of the statistical insertion of N-(2-hydroxypropyl) methacrylamide on linear pro-dendronic polyamine's gene carriers.

    PubMed

    Redondo, Juan Alfonso; Martínez-Campos, Enrique; Navarro, Rodrigo; Reinecke, Helmut; Elvira, Carlos; López-Lacomba, José Luis; Gallardo, Alberto

    2015-06-01

    Statistical copolymers of N-(2-hydroxypropyl) methacrylamide (HPMA) and the dendronic methacrylic monomer 2-(3-(Bis(2-(diethylamino)ethyl)amino)propanamido)ethyl methacrylate (TEDETAMA, derived from N,N,N',N'-tetraethyldiethylenetriamine, TEDETA), were synthesized through radical copolymerization and evaluated in vitro as non-viral gene carriers. Three copolymers with nominal molar percentages of HPMA of 25%, 50% and 75% were prepared and studied comparatively to the positive controls poly-TEDETAMA and hyperbranched polyethyleneimine (PEI, 25kDa). Their ability to complex DNA at different N/P molar ratios, from 1/1 up to 8/1, was determined through agarose gel electrophoresis and Dynamic Light Scattering. The resulting complexes (polyplexes) were characterized and evaluated in vitro as possible non-viral gene carriers for Swiss-3T3 fibroblasts, using luciferase as reporter gene and a calcein cytocompatibility assay. All the copolymers, except the one with highest HPMA proportion (75 molar %) at the lowest N/P ratio, condensed DNA to a particle size between 100 and 300 nm. The copolymers with 25 and 50 molar % of HPMA displayed higher transfection efficiency and cytocompatibility than the positive controls poly-TEDETAMA and PEI. A higher proportion of HPMA (75 molar %) led to copolymers that displayed very low transfection efficiency, despite their full cytocompatibility even at the highest N/P ratio. These results indicate that the statistical combination of TEDETAMA and HPMA and its fine compositional tuning in the copolymers may fulfill the fine balance of transfection efficiency and cytocompatibility in a superior way to the control poly-TEDETAMA and PEI.

  7. Plasmid pCAR3 Contains Multiple Gene Sets Involved in the Conversion of Carbazole to Anthranilate†

    PubMed Central

    Urata, Masaaki; Uchimura, Hiromasa; Noguchi, Haruko; Sakaguchi, Tomoya; Takemura, Tetsuo; Eto, Kaori; Habe, Hiroshi; Omori, Toshio; Yamane, Hisakazu; Nojiri, Hideaki

    2006-01-01

    The carbazole degradative car-I gene cluster (carAaIBaIBbICIAcI) of Sphingomonas sp. strain KA1 is located on the 254-kb circular plasmid pCAR3. Carbazole conversion to anthranilate is catalyzed by carbazole 1,9a-dioxygenase (CARDO; CarAaIAcI), meta-cleavage enzyme (CarBaIBbI), and hydrolase (CarCI). CARDO is a three-component dioxygenase, and CarAaI and CarAcI are its terminal oxygenase and ferredoxin components. The car-I gene cluster lacks the gene encoding the ferredoxin reductase component of CARDO. In the present study, based on the draft sequence of pCAR3, we found multiple carbazole degradation genes dispersed in four loci on pCAR3, including a second copy of the car gene cluster (carAaIIBaIIBbIICIIAcII) and the ferredoxin/reductase genes fdxI-fdrI and fdrII. Biotransformation experiments showed that FdrI (or FdrII) could drive the electron transfer chain from NAD(P)H to CarAaI (or CarAaII) with the aid of ferredoxin (CarAcI, CarAcII, or FdxI). Because this electron transfer chain showed phylogenetic relatedness to that consisting of putidaredoxin and putidaredoxin reductase of the P450cam monooxygenase system of Pseudomonas putida, CARDO systems of KA1 can be classified in the class IIA Rieske non-heme iron oxygenase system. Reverse transcription-PCR (RT-PCR) and quantitative RT-PCR analyses revealed that two car gene clusters constituted operons, and their expression was induced when KA1 was exposed to carbazole, although the fdxI-fdrI and fdrII genes were expressed constitutively. Both terminal oxygenases of KA1 showed roughly the same substrate specificity as that from the well-characterized carbazole degrader Pseudomonas resinovorans CA10, although slight differences were observed. PMID:16672458

  8. Arabidopsis histone methyltransferase SET DOMAIN GROUP8 mediates induction of the jasmonate/ethylene pathway genes in plant defense response to necrotrophic fungi.

    PubMed

    Berr, Alexandre; McCallum, Emily J; Alioua, Abdelmalek; Heintz, Dimitri; Heitz, Thierry; Shen, Wen-Hui

    2010-11-01

    As sessile organisms, plants have to endure a wide variety of biotic and abiotic stresses, and accordingly they have evolved intricate and rapidly inducible defense strategies associated with the activation of a battery of genes. Among other mechanisms, changes in chromatin structure are thought to provide a flexible, global, and stable means for the regulation of gene transcription. In support of this idea, we demonstrate here that the Arabidopsis (Arabidopsis thaliana) histone methyltransferase SET DOMAIN GROUP8 (SDG8) plays a crucial role in plant defense against fungal pathogens by regulating a subset of genes within the jasmonic acid (JA) and/or ethylene signaling pathway. We show that the loss-of-function mutant sdg8-1 displays reduced resistance to the necrotrophic fungal pathogens Alternaria brassicicola and Botrytis cinerea. While levels of JA, a primary phytohormone involved in plant defense, and camalexin, a major phytoalexin against fungal pathogens, remain unchanged or even above normal in sdg8-1, induction of several defense genes within the JA/ethylene signaling pathway is severely compromised in response to fungal infection or JA treatment in mutant plants. Both downstream genes and, remarkably, also upstream mitogen-activated protein kinase kinase genes MKK3 and MKK5 are misregulated in sdg8-1. Accordingly, chromatin immunoprecipitation analysis shows that sdg8-1 impairs dynamic changes of histone H3 lysine 36 methylation at defense marker genes as well as at MKK3 and MKK5, which normally occurs upon infection with fungal pathogens or methyl JA treatment in wild-type plants. Our data indicate that SDG8-mediated histone H3 lysine 36 methylation may serve as a memory of permissive transcription for a subset of defense genes, allowing rapid establishment of transcriptional induction.

  9. Whole-Transcriptome RNA-seq, Gene Set Enrichment Pathway Analysis, and Exon Coverage Analysis of Two Plastid RNA Editing Mutants.

    PubMed

    Hackett, Justin B; Lu, Yan

    2017-04-07

    In land plants, plastid and mitochondrial RNAs are subject to post-transcriptional C-to-U RNA editing. T-DNA insertions in the ORGANELLE RNA RECOGNITION MOTIF PROTEIN6 gene resulted in reduced photosystem II (PSII) activity and smaller plant and leaf sizes. Exon coverage analysis of the ORRM6 gene showed that orrm6-1 and orrm6-2 are loss-of-function mutants. Compared to other ORRM proteins, ORRM6 affects a relatively small number of RNA editing sites. Sanger sequencing of reverse transcription-PCR products of plastid transcripts revealed two plastid RNA editing sites that are substantially affected in the orrm6 mutants: psbF-C77 and accD-C794. The psbF gene encodes the beta subunit of cytochrome b559, an essential component of PSII. The accD gene encodes the beta subunit of acetyl-CoA carboxylase, a protein required in plastid fatty acid biosynthesis. Whole-transcriptome RNA-seq demonstrated that editing at psbF-C77 is nearly absent and the editing extent at accD-C794 was significantly reduced. Gene set enrichment pathway analysis showed that expression of multiple gene sets involved in photosynthesis, especially photosynthetic electron transport, is significantly up-regulated in both orrm6 mutants. The up-regulation could be a mechanism to compensate for the reduced PSII electron transport rate in the orrm6 mutants. These results further demonstrated that Organelle RNA Recognition Motif protein ORRM6 is required in editing of specific RNAs in the Arabidopsis (Arabidopsis thaliana) plastid.

  10. Cold exposure induces the acquisition of brown adipocyte gene expression profiles in cattle inguinal fat normalized with a new set of reference genes for qRT-PCR.

    PubMed

    Cao, K X; Hao, D; Wang, J; Peng, W W; Yan, Y J; Cao, H X; Sun, F; Chen, H

    2017-02-24

    The last few years have seen great advances in our understanding of browning in white adipose tissue (WAT), where white adipocytes take on characteristics of brown adipocytes. At present, the economic significance of browning for animal husbandry is beginning to be realized with the emerging evidence that browning affects body weight not only in humans and rodents but also in farm animals. Quantitative RT-PCR provides a quick and sensitive way to make a preliminary determination of browning of WAT. However, no condition-specific reference genes have been established for browning of cattle WAT. The results showed that the two most stable reference genes for diet treatment were Wdr33 (M=0.38) and Hdac3 (M=0.43), while the three most stable internal controls for temperature treatment were Hdac3 (M=0.28), Wdr33 (M=0.32), and Hprt1 (M=0.39) among the ten candidates. The relative mRNA expression levels of selected marker genes were normalized by a normalization factor (geometric mean of control gene quantities). Cold exposure rather than high energy diet induced transcript elevations of brite specific markers (Cited1, Tbx1), thermoregulatory markers (brown and beige versus white markers, i.e., Cidea, Cox7a1, Ucp1), mitochondrial biogenesis markers (Nrf1, Nrf2, Tfam), and transcription regulators (brown and beige versus white markers, i.e., Pgc1α) (P<0.05) in cattle inguinal fat (iWAT). Quantitative RT-PCR is a preliminary study for WAT browning. In conclusion, cattle inguinal fat acquired brown adipocyte gene expression features upon cold acclimation with prerequisite identification of stable reference genes.
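
    The normalization step named in this abstract (dividing a marker gene's quantity by the geometric mean of the reference-gene quantities) can be sketched as follows. This is only an illustration, not the authors' pipeline: the Cq values, the calibrator sample, the function names and the assumed amplification efficiency of 2 are all hypothetical.

```python
from math import prod

def relative_quantity(cq, cq_calibrator, efficiency=2.0):
    """Relative quantity of a gene versus a calibrator sample.

    Assumes (hypothetically) a fixed amplification efficiency per cycle.
    """
    return efficiency ** (cq_calibrator - cq)

def normalization_factor(ref_quantities):
    """Geometric mean of the reference-gene relative quantities."""
    return prod(ref_quantities) ** (1.0 / len(ref_quantities))

def normalized_expression(target_quantity, ref_quantities):
    """Target quantity divided by the normalization factor."""
    return target_quantity / normalization_factor(ref_quantities)

# Hypothetical Cq values, invented for illustration only.
calibrator = {"Hdac3": 24.0, "Wdr33": 25.0, "Hprt1": 23.0, "Ucp1": 30.0}
cold = {"Hdac3": 24.0, "Wdr33": 24.0, "Hprt1": 24.0, "Ucp1": 27.0}

reference_genes = ["Hdac3", "Wdr33", "Hprt1"]
ref_q = [relative_quantity(cold[g], calibrator[g]) for g in reference_genes]
ucp1_q = relative_quantity(cold["Ucp1"], calibrator["Ucp1"])
ucp1_normalized = normalized_expression(ucp1_q, ref_q)
print(ucp1_normalized)
```

    Using a geometric rather than an arithmetic mean keeps a single highly abundant reference gene from dominating the factor.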

  11. Polymorphisms in sodium-dependent vitamin C transporter genes and plasma, aqueous humor and lens nucleus ascorbate concentrations in an ascorbate depleted setting.

    PubMed

    Senthilkumari, Srinivasan; Talwar, Badri; Dharmalingam, Kuppamuthu; Ravindran, Ravilla D; Jayanthi, Ramamurthy; Sundaresan, Periasamy; Saravanan, Charu; Young, Ian S; Dangour, Alan D; Fletcher, Astrid E

    2014-07-01

    We have previously reported low concentrations of plasma ascorbate and low dietary vitamin C intake in the older Indian population and a strong inverse association of these with cataract. Little is known about ascorbate levels in aqueous humor and lens in populations habitually depleted of ascorbate and no studies in any setting have investigated whether genetic polymorphisms influence ascorbate levels in ocular tissues. Our objectives were to investigate relationships between ascorbate concentrations in plasma, aqueous humor and lens and whether these relationships are influenced by Single Nucleotide Polymorphisms (SNPs) in sodium-dependent vitamin C transporter genes (SLC23A1 and SLC23A2). We enrolled sixty patients (equal numbers of men and women, mean age 63 years) undergoing small incision cataract surgery in southern India. We measured ascorbate concentrations in plasma, aqueous humor and lens nucleus using high performance liquid chromatography. SLC23A1 SNPs (rs4257763, rs6596473) and SLC23A2 SNPs (rs1279683 and rs12479919) were genotyped using a TaqMan assay. Patients were interviewed for lifestyle factors which might influence ascorbate. Plasma vitamin C was normalized by a log10 transformation. Statistical analysis used linear regression with the slope of the within-subject associations estimated using beta (β) coefficients. The ascorbate concentrations (μmol/L) were: plasma ascorbate, median and inter-quartile range (IQR), 15.2 (7.8, 34.5), mean (SD) of aqueous humor ascorbate, 1074 (545) and lens nucleus ascorbate, 0.42 (0.16) (μmol/g lens nucleus wet weight). Minor allele frequencies were: rs1279683 (0.28), rs12479919 (0.30), rs6596473 (0.48). Decreasing concentrations of ocular ascorbate from the common to the rare genotype were observed for rs6596473 and rs12479919. The per allele difference in aqueous humor ascorbate for rs6596473 was -217 μmol/L, p < 0.04 and a per allele difference in lens nucleus ascorbate of -0.085 μmol/g, p < 0

  12. Niemeyer Virus: A New Mimivirus Group A Isolate Harboring a Set of Duplicated Aminoacyl-tRNA Synthetase Genes

    PubMed Central

    Boratto, Paulo V. M.; Arantes, Thalita S.; Silva, Lorena C. F.; Assis, Felipe L.; Kroon, Erna G.; La Scola, Bernard; Abrahão, Jônatas S.

    2015-01-01

    It is well recognized that gene duplication/acquisition is a key factor for molecular evolution, being directly related to the emergence of new genetic variants. The importance of such phenomena can also be expanded to the viral world, with impacts on viral fitness and environmental adaptations. In this work we describe the isolation and characterization of Niemeyer virus, a new mimivirus isolate obtained from water samples of an urban lake in Brazil. Genomic data showed that Niemeyer harbors duplicated copies of three of its four aminoacyl-tRNA synthetase genes (cysteinyl, methionyl, and tyrosyl RS). Gene expression analysis showed that such duplications allowed significantly increased expression of methionyl and tyrosyl aaRS mRNA by Niemeyer in comparison to APMV. Remarkably, phylogenetic data revealed that Niemeyer duplicated gene pairs are different, each one clustering with a different group of mimivirus strains. Taken together, our results raise new questions about the origins and selective pressures involving events of aaRS gain and loss among mimiviruses. PMID:26635738

  13. A universal primer set for PCR amplification of nuclear histone H4 genes from all animal species.

    PubMed

    Pineau, Pascal; Henry, Michel; Suspène, Rodolphe; Marchio, Agnès; Dettai, Agnès; Debruyne, Régis; Petit, Thierry; Lécu, Alexis; Moisson, Pierre; Dejean, Anne; Wain-Hobson, Simon; Vartanian, Jean-Pierre

    2005-03-01

    To control the quality of genomic DNA of samples from a wide variety of animals, a heminested PCR assay specifically targeting a nuclear gene has been developed. The histone H4 gene family comprises a small number of genes considered among the most conserved genes in living organisms. Tissue samples from necropsies and from cells belonging to 43 different species were studied, eight samples from invertebrates and 35 samples from vertebrates covering all classes. Ancient DNA samples from three Siberian woolly mammoths (Mammuthus primigenius) dating between 40,000 and 49,000 years before present were also tested for PCR amplification. Performance of HIST2H4 amplification was also compared with that of previously published universal PCRs (28S rRNA, 18S rRNA, and cytochrome b). Overall, 95% of species studied yielded an amplification product, including some old samples from gorilla and chimpanzees. The data indicate that the HIST2H4 amplimers are thus suitable for both DNA quality testing and species identification in the animal kingdom.

  14. Statistical and Biological Gene-Lifestyle Interactions of MC4R and FTO with Diet and Physical Activity on Obesity: New Effects on Alcohol Consumption

    PubMed Central

    Covas, M. Isabel; Carrasco, Paula; Salas-Salvadó, Jordi; Martínez-González, Miguel Ángel; Arós, Fernando; Lapetra, José; Serra-Majem, Lluís; Lamuela-Raventos, Rosa; Gómez-Gracia, Enrique; Fiol, Miquel; Pintó, Xavier; Ros, Emilio; Martí, Amelia; Coltell, Oscar; Ordovás, Jose M.; Estruch, Ramon

    2012-01-01

    Background Fat mass and obesity (FTO) and melanocortin-4 receptor (MC4R) are relevant genes associated with obesity. This could be through food intake, but results are contradictory. Modulation by diet or other lifestyle factors is also not well understood. Objective To investigate whether MC4R and FTO associations with body-weight are modulated by diet and physical activity (PA), and to study their association with alcohol and food intake. Methods Adherence to Mediterranean diet (AdMedDiet) and physical activity (PA) were assessed by validated questionnaires in 7,052 high cardiovascular risk subjects. MC4R rs17782313 and FTO rs9939609 were determined. Independent and joint associations (aggregate genetic score) as well as statistical and biological gene-lifestyle interactions were analyzed. Results FTO rs9939609 was associated with higher body mass index (BMI), waist circumference (WC) and obesity (P<0.05 for all). A similar but non-significant trend was found for MC4R rs17782313. Their additive effects (aggregate score) were significant and we observed a 7% per-allele increase in the odds of being obese (OR = 1.07; 95%CI 1.01–1.13). We found relevant statistical interactions (P<0.05) with PA: in active individuals, the associations with higher BMI, WC or obesity were not detected. A biological (non-statistical) interaction between AdMedDiet and rs9939609 and the aggregate score was found. Greater AdMedDiet in individuals carrying 4 or 3 risk alleles counterbalanced their genetic predisposition, exhibiting a BMI similar (P = 0.502) to that of individuals with no risk alleles and lower AdMedDiet. They also had lower BMI (P = 0.021) than their counterparts with low AdMedDiet. We did not find any consistent association with energy or macronutrients, but found a novel association between these polymorphisms and lower alcohol consumption in variant-allele carriers (B+/−SE: −0.57+/−0.16 g/d per-score-allele; P = 0.001). Conclusion Statistical and biological

  15. Sequencing-based gene network analysis provides a core set of gene resource for understanding thermal adaptation in Zhikong scallop Chlamys farreri.

    PubMed

    Fu, X; Sun, Y; Wang, J; Xing, Q; Zou, J; Li, R; Wang, Z; Wang, S; Hu, X; Zhang, L; Bao, Z

    2014-01-01

    Marine organisms are commonly exposed to variable environmental conditions, and many of them are under threat from increased sea temperatures caused by global climate change. Generating transcriptomic resources under different stress conditions is crucial for understanding molecular mechanisms underlying thermal adaptation. In this study, we conducted transcriptome-wide gene expression profiling of the scallop Chlamys farreri challenged by acute and chronic heat stress. Of the 13 953 unique tags, more than 850 were significantly differentially expressed at each time point after acute heat stress, which was more than the number of tags differentially expressed (320-350) under chronic heat stress. To obtain a systemic view of gene expression alterations during thermal stress, a weighted gene coexpression network was constructed. Six modules were identified as acute heat stress-responsive modules. Among them, four modules involved in apoptosis regulation, mRNA binding, mitochondrial envelope formation and oxidation reduction were downregulated. The remaining two modules were upregulated. One was enriched with chaperone and the other with microsatellite sequences, whose coexpression may originate from a transcription factor binding site. These results indicated that C. farreri triggered several cellular processes to acclimate to elevated temperature. No modules responded to chronic heat stress, suggesting that the scallops might have acclimated to elevated temperature within 3 days. This study represents the first sequencing-based gene network analysis in a nonmodel aquatic species and provides valuable gene resources for the study of thermal adaptation, which should assist in the development of heat-tolerant scallop lines for aquaculture.

  16. Evidences of two different sets of histone genes active during embryogenesis of the sea urchin Paracentrotus lividus.

    PubMed Central

    Spinelli, G; Gianguzza, F; Casano, C; Acierno, P; Burckhardt, J

    1979-01-01

    Histone mRNAs at different stages of development were purified by hybridization with the cloned homologous histone genes. The electrophoretic patterns of oocytes, 2-4 blastomeres, 64 cells and morula histone mRNAs were found to be identical, whereas the electrophoretic pattern of mesenchyme blastula histone mRNA was markedly different. The cloned histone DNA of P. lividus was hybridized with the RNA of each stage. The Tm was 74 degrees C in all cases except for the mesenchyme histone mRNAs whose Tm was 59 degrees C, thus suggesting that at least two different clusters of histone genes are active in the course of sea urchin development. PMID:424304

  17. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    DTIC Science & Technology

    2013-02-19

    chemical equation) that transforms A into the desired product B. The product and all byproducts would exit the system to avoid equilibrium. The...recipient cells have reached an exact state of competence that will result in a successful transplantation. The Gershenfeld team at the MIT Center...genes of a minimal bacterium. Proceedings of the National Academy of Sciences of the United States of America 103(2): 425-30. "Hail Mary" Genome has

  18. Construction of a Bacterial Cell that Contains Only the Set of Essential Genes Necessary to Impart Life

    DTIC Science & Technology

    2013-05-16

    additional transposon studies and used the data to construct a new, fast growing parental strain. The effort to modularize the genome has been initiated...in future experiments, we have conducted additional transposon studies and used the data to construct a new, fast growing parental strain. The...be viable but too slow growing for continued use as a parental strain for additional deletions o A new transposon study was conducted and genes were

  19. p-Aminobenzoic acid and chloramphenicol biosynthesis in Streptomyces venezuelae: gene sets for a key enzyme, 4-amino-4-deoxychorismate synthase.

    PubMed

    Chang, Z; Sun, Y; He, J; Vining, L C

    2001-08-01

    Amplification of sequences from Streptomyces venezuelae ISP5230 genomic DNA using PCR with primers based on conserved prokaryotic pabB sequences gave two main products. One matched pabAB, a locus previously identified in S. venezuelae. The second closely resembled the conserved pabB sequence consensus and hybridized with a 3.8 kb NcoI fragment of S. venezuelae ISP5230 genomic DNA. Cloning and sequence analysis of the 3.8 kb fragment detected three ORFs, and their deduced amino acid sequences were used in BLAST searches of the GenBank database. The ORF1 product was similar to PabB in other bacteria and to the PabB domain encoded by S. venezuelae pabAB. The ORF2 product resembled PabA of other bacteria. ORF3 was incomplete; its deduced partial amino acid sequence placed it in the MocR group of GntR-type transcriptional regulators. Introducing vectors containing the 3.8 kb NcoI fragment of S. venezuelae DNA into pabA and pabB mutants of Escherichia coli, or into the Streptomyces lividans pab mutant JG10, enhanced sulfanilamide resistance in the host strains. The increased resistance was attributed to expression of the pair of discrete translationally coupled p-aminobenzoic acid biosynthesis genes (designated pabB/pabA) cloned in the 3.8 kb fragment. These represent a second set of genes encoding 4-amino-4-deoxychorismate synthase in S. venezuelae ISP5230. In contrast to the fused pabAB set previously isolated from this species, they do not participate in chloramphenicol biosynthesis, but like pabAB they can be disrupted without affecting growth on minimal medium. The gene disruption results suggest that S. venezuelae may have a third set of genes encoding PABA synthase.

  20. Independent evolution of the core and accessory gene sets in the genus Neisseria: insights gained from the genome of Neisseria lactamica isolate 020-06

    PubMed Central

    2010-01-01

    Background The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally. Results Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendents of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent. Conclusion The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed. PMID:21092259

  1. Statistical Inference at Work: Statistical Process Control as an Example

    ERIC Educational Resources Information Center

    Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia

    2008-01-01

    To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…

  2. The Influence of DNA Extraction Procedure and Primer Set on the Bacterial Community Analysis by Pyrosequencing of Barcoded 16S rRNA Gene Amplicons

    PubMed Central

    Starke, Ingo C.; Vahjen, Wilfried; Pieper, Robert; Zentek, Jürgen

    2014-01-01

    In this study, the effect of different DNA extraction procedures and primer sets on pyrosequencing results regarding the composition of bacterial communities in the ileum of piglets was investigated. Ileal chyme from piglets fed a diet containing different amounts of zinc oxide was used to evaluate a pyrosequencing study with barcoded 16S rRNA PCR products. Two DNA extraction methods (bead beating versus silica gel columns) and two primer sets targeting variable regions of bacterial 16S rRNA genes (8f-534r versus 968f-1401r) were considered. The SEED viewer software of the MG-RAST server was used for automated sequence analysis. A total of 5.2 × 10^5 sequences were used for analysis after processing for read length (150 bp), minimum sequence occurrence (5), and exclusion of eukaryotic and unclassified/uncultured sequences. DNA extraction procedures and primer sets differed significantly in total sequence yield. The distribution of bacterial order and main bacterial genera was influenced significantly by both parameters. However, this study has shown that the results of pyrosequencing studies using barcoded PCR amplicons of bacterial 16S rRNA genes depend on DNA extraction and primer choice, as well as on the manner of downstream sequence analysis. PMID:25120931

  3. Single-cell genomics of uncultivated deep-branching magnetotactic bacteria reveals a conserved set of magnetosome genes.

    PubMed

    Kolinko, Sebastian; Richter, Michael; Glöckner, Frank-Oliver; Brachmann, Andreas; Schüler, Dirk

    2016-01-01

    While magnetosome biosynthesis within the magnetotactic Proteobacteria is increasingly well understood, much less is known about the genetic control within deep-branching phyla, which have a unique ultrastructure and biosynthesize up to several hundreds of bullet-shaped magnetite magnetosomes arranged in multiple bundles of chains, but have no cultured representatives. Recent metagenomic analysis identified magnetosome genes in the genus 'Candidatus Magnetobacterium' homologous to those in Proteobacteria. However, metagenomic analysis has been limited to highly abundant members of the community, and therefore only little is known about the magnetosome biosynthesis, ecophysiology and metabolic capacity in deep-branching MTB. Here we report the analysis of single-cell derived draft genomes of three deep-branching uncultivated MTB. Single-cell sorting followed by whole genome amplification generated draft genomes of Candidatus Magnetobacterium bavaricum and Candidatus Magnetoovum chiemensis CS-04 of the Nitrospirae phylum. Furthermore, we present the first, nearly complete draft genome of a magnetotactic representative from the candidate phylum Omnitrophica, tentatively named Candidatus Omnitrophus magneticus SKK-01. Besides key metabolic features consistent with a common chemolithoautotrophic lifestyle, we identified numerous, partly novel genes most likely involved in magnetosome biosynthesis of bullet-shaped magnetosomes and their arrangement in multiple bundles of chains.

  4. Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

    PubMed Central

    Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B

    2008-01-01

    Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. 
Conclusion Taverna can

  5. An entropy-based statistic for genomewide association studies.

    PubMed

    Zhao, Jinying; Boerwinkle, Eric; Xiong, Momiao

    2005-07-01

    Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard chi-square statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard chi-square statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard chi-square statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard chi-square statistic.
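
    The abstract contrasts the standard chi-square statistic (written chi2 in the record), which is linear in allele frequencies, with a statistic built on a nonlinear Shannon-entropy transform. The sketch below is only an illustration under stated assumptions: it computes the textbook Pearson chi-square for a 2 x k table of allele counts, plus the per-allele entropy terms -p*log(p). It does not reproduce the authors' test statistic, whose exact form is not given in the abstract, and the function names and counts are hypothetical.

```python
import math

def allele_chi2(case_counts, control_counts):
    """Pearson chi-square for a 2 x k table of allele (or haplotype) counts."""
    n_case = sum(case_counts)
    n_ctrl = sum(control_counts)
    total = n_case + n_ctrl
    stat = 0.0
    for case, ctrl in zip(case_counts, control_counts):
        column = case + ctrl
        if column == 0:
            continue  # allele unobserved in both groups contributes nothing
        for observed, row in ((case, n_case), (ctrl, n_ctrl)):
            expected = row * column / total
            stat += (observed - expected) ** 2 / expected
    return stat

def entropy_terms(counts):
    """Per-allele Shannon entropy terms -p*log(p): a nonlinear function of
    the allele frequencies (illustrative only, not the published statistic)."""
    n = sum(counts)
    return [-(c / n) * math.log(c / n) if c else 0.0 for c in counts]

# Hypothetical allele counts for one biallelic marker.
cases, controls = [60, 40], [40, 60]
print(allele_chi2(cases, controls))
print(entropy_terms(cases), entropy_terms(controls))
```

    The chi-square value would be compared against a chi-square distribution with k - 1 degrees of freedom for k alleles.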

  6. The Drosophila Ash1 Gene Product, Which Is Localized at Specific Sites on Polytene Chromosomes, Contains a Set Domain and a Phd Finger

    PubMed Central

    Tripoulas, N.; LaJeunesse, D.; Gildea, J.; Shearn, A.

    1996-01-01

    The determined state of Drosophila imaginal discs depends on stable patterns of homeotic gene expression. The stability of these patterns requires the function of the ash1 gene, a member of the trithorax group. The primary translation product of the 7.5-kb ash1 transcript is predicted to be a basic protein of 2144 amino acids. The ASH1 protein contains a SET domain and a PHD finger. Both of these motifs are found in the products of some trithorax group and Polycomb group genes. We have determined the nucleotide sequence alterations in 10 ash1 mutant alleles and have examined their mutant phenotype. The best candidate for a null allele is ash1(22). The truncated protein product of this mutant allele is predicted to contain only 47 amino acids. The ASH1 protein is localized on polytene chromosomes of larval salivary glands at >100 sites. The chromosomal localization of ASH1 implies that it functions at the transcriptional level to maintain the expression pattern of homeotic selector genes. PMID:8725238

  7. RutR is the uracil/thymine-sensing master regulator of a set of genes for synthesis and degradation of pyrimidines.

    PubMed

    Shimada, Tomohiro; Hirao, Kiyo; Kori, Ayako; Yamamoto, Kaneyoshi; Ishihama, Akira

    2007-11-01

    Using the genomic SELEX, a total of six Escherichia coli DNA fragments have been identified, which formed complexes with transcription factor RutR. The RutR regulon was found to include a large number of genes encoding components for not only degradation of pyrimidines but also transport of glutamate, synthesis of glutamine, synthesis of pyrimidine nucleotides and arginine, and degradation of purines. DNase I footprinting indicated that RutR recognizes a palindromic sequence of TTGACCAnnTGGTCAA. The RutR box in P1 promoter of carAB encoding carbamoyl phosphate synthetase, a key enzyme of pyrimidine synthesis, overlaps with the PepA (CarP) repressor binding site, implying competition between RutR and PepA. Adding either uracil or thymine abolished RutR binding in vitro to the carAB P1 promoter. Accordingly, in the rutR-deletion mutant or in the presence of uracil, the activation in vivo of carAB P1 promoter was markedly reduced. Northern blot analysis of the RutR target genes indicated that RutR represses the Gad system genes involved in glutamate-dependent acid resistance and allantoin degradation. Altogether we propose that RutR is the pyrimidine sensor and the master regulator for a large set of the genes involved in the synthesis and degradation of pyrimidines.

  8. Association Between Single-Nucleotide Polymorphisms in Hormone Metabolism and DNA Repair Genes and Epithelial Ovarian Cancer: Results from Two Australian Studies and an Additional Validation Set

    PubMed Central

    Beesley, Jonathan; Jordan, Susan J.; Spurdle, Amanda B.; Song, Honglin; Ramus, Susan J.; Kjaer, Suzanne Kruger; Hogdall, Estrid; DiCioccio, Richard A.; McGuire, Valerie; Whittemore, Alice S.; Gayther, Simon A.; Pharoah, Paul D.P.; Webb, Penelope M.; Chenevix-Trench, Georgia

    2009-01-01

    Although some high-risk ovarian cancer genes have been identified, it is likely that common low penetrance alleles exist that confer some increase in ovarian cancer risk. We have genotyped nine putative functional single-nucleotide polymorphisms (SNP) in genes involved in steroid hormone synthesis (SRD5A2, CYP19A1, HSD17B1, and HSD17B4) and DNA repair (XRCC2, XRCC3, BRCA2, and RAD52) using two Australian ovarian cancer case-control studies, comprising a total of 1,466 cases and 1,821 controls of Caucasian origin. Genotype frequencies in cases and controls were compared using logistic regression. The only SNP we found to be associated with ovarian cancer risk in both of these two studies was SRD5A2 V89L (rs523349), which showed a significant trend of increasing risk per rare allele (P = 0.00002). We then genotyped another SNP in this gene (rs632148; r2 = 0.945 with V89L) in an attempt to validate this finding in an independent set of 1,479 cases and 2,452 controls from the United Kingdom, United States, and Denmark. There was no association between rs632148 and ovarian cancer risk in the validation samples, and overall, there was no significant heterogeneity between the results of the five studies. Further analyses of SNPs in this gene are therefore warranted to determine whether SRD5A2 plays a role in ovarian cancer predisposition. PMID:18086758

  9. Comprehensive tissue-specific gene set enrichment analysis and transcription factor analysis of breast cancer by integrating 14 gene expression datasets

    PubMed Central

    Dai, Shao-Xing; Li, Gong-Hua; Lv, Wen-Wen; Guo, Yi-Cheng; An, San-Qi; Wu, Guo-Ying; Liu, Dahai; Huang, Jing-Fei

    2017-01-01

    Breast cancer is the most commonly diagnosed malignancy in women. Several key genes and pathways have been proven to correlate with breast cancer pathology. This study sought to explore the differences in key transcription factors (TFs), transcriptional regulation networks and dysregulated pathways in different tissues in breast cancer. We employed 14 breast cancer datasets from NCBI-GEO and performed an integrated analysis in three different tissues including breast, blood and saliva. The results showed that there were eight genes (CEBPD, EGR1, EGR2, EGR3, FOS, FOSB, ID1 and NFIL3) down-regulated in breast tissue but up-regulated in blood tissue. Furthermore, we identified several unreported tissue-specific TFs that may contribute to breast cancer, including ATOH8, DMRT2, TBX15 and ZNF367. The dysregulation of these TFs damaged lipid metabolism, development, cell adhesion, proliferation, differentiation and metastasis processes. Among these pathways, the breast tissue showed the most serious impairment and the blood tissue showed a relatively moderate damage, whereas the saliva tissue was almost unaffected. This study could be helpful for future biomarker discovery, drug design, and therapeutic and predictive applications in breast cancers. PMID:28036274

  10. Allele Mining and Selective Patterns of Pi9 Gene in a Set of Rice Landraces from India

    PubMed Central

    Imam, Jahangir; Mandal, Nimai P.; Variar, Mukund; Shukla, Pratyoosh

    2016-01-01

    Allelic variants of the broad-spectrum blast resistance gene, Pi9 (nucleotide binding site-leucine-rich repeat region), have been analyzed in Indian rice landraces. They were selected from the list of 338 rice landraces phenotyped in the rice blast nursery at Central Rainfed Upland Rice Research Station, Hazaribag. Six of them were further selected on the basis of their resistance and susceptibility patterns for virulence analysis and selective pattern study of the Pi9 gene. The sequence analysis and phylogenetic study illustrated that such sequences are vastly homologous and clustered into two groups. All the blast resistance Pi9 alleles were grouped into one cluster, whereas Pi9 alleles of susceptible landraces formed another cluster even though these landraces have a low level of DNA polymorphisms. A total of 136 polymorphic sites comprising transitions, transversions, and insertions and deletions (InDels) were identified in the 2.9 kb sequence of Pi9 alleles. Lower variation in the form of mutations (77) (Transition + Transversion) and InDels (59) was observed in the Pi9 alleles isolated from the rice landraces studied. The results showed that the Pi9 alleles of the selected rice landraces were less variable, suggesting that the rice landraces would have been exposed to fewer pathotypes across the country. The positive Tajima’s D (0.33580), P > 0.10 (not significant), was observed among the seven rice landraces, which suggests the balancing selection of Pi9 alleles. The value of synonymous substitution (-0.43337) was less than the non-synonymous substitution (0.78808). The greater non-synonymous substitution than the synonymous means that the coding region, mainly the leucine-rich repeat domain, was under diversified selection. In this study, the Pi9 gene has been subjected to balancing selection with low nucleotide diversity, which is different from the earlier reports; this may be because of the closeness of the rice landraces, cultivated in the same

  11. A set of genes critical to development is epigenetically poised in mouse germ cells from fetal stages through completion of meiosis.

    PubMed

    Lesch, Bluma J; Dokshin, Gregoriy A; Young, Richard A; McCarrey, John R; Page, David C

    2013-10-01

    In multicellular organisms, germ cells carry the hereditary material from one generation to the next. Developing germ cells are unipotent gamete precursors, and mature gametes are highly differentiated, specialized cells. However, upon gamete union at fertilization, their genomes drive a totipotent program, giving rise to a complete embryo as well as extraembryonic tissues. The biochemical basis for the ability to transition from differentiated cell to totipotent zygote is unknown. Here we report that a set of developmentally critical genes is maintained in an epigenetically poised (bivalent) state from embryonic stages through the end of meiosis. We performed ChIP-seq and RNA-seq analysis on flow-sorted male and female germ cells during embryogenesis at three time points surrounding sexual differentiation and female meiotic initiation, and then extended our analysis to meiotic and postmeiotic male germ cells. We identified a set of genes that is highly enriched for regulators of differentiation and retains a poised state (high H3K4me3, high H3K27me3, and lack of expression) across sexes and across developmental stages, including in haploid postmeiotic cells. The existence of such a state in embryonic stem cells has been well described. We now demonstrate that a subset of genes is maintained in a poised state in the germ line from the initiation of sexual differentiation during fetal development and into postmeiotic stages. We propose that the epigenetically poised condition of these developmental genes is a fundamental property of the mammalian germ-line nucleus, allowing differentiated gametes to unleash a totipotent program following fertilization.

  12. Analysis of Antibiotic Resistance Genes and its Associated SCCmec Types among Nasal Carriage of Methicillin Resistant Coagulase Negative Staphylococci from Community Settings, Chennai, Southern India

    PubMed Central

    Murugesan, Saravanan; Perumal, Nagaraj; Mahalingam, Surya Prakash; Dilliappan, Selva Kumar

    2015-01-01

    Objective The study was designed to find the distribution of SCCmec types and the various antibiotic resistance genes amongst MR-CoNS isolates from asymptomatic individuals. Materials and Methods A total of 145 nasal swabs were collected from asymptomatic healthy individuals from community settings. Identification and speciation of CoNS were done by standard biochemical methods. Screening of methicillin resistance (mecA gene) and detection of various antibiotic resistant genes were done using multiplex PCR method. SCCmec types (I - V) were determined using multiplex PCR. Results 50 (44.6%) isolates were found to be methicillin resistant both by cefoxitin method and multiplex PCR. S. epidermidis (40%) was the predominant species followed by S. haemolyticus (28%), S. hominis (20%) and S. warneri (12%). Highest resistance was shown for cotrimoxazole (26%), followed by ciprofloxacin (24%), tetracycline (20%), erythromycin (18%), fusidic acid (10%) and mupirocin (6%). Among SCCmec types, 44 isolates showed single type, including type I (30%), type IV (24%), type II (18%), type V (14%) and type III (2%). 6 isolates showed two types, III+IV (n= 2), II+V (n=2), IV+V (n=1) and type I+V (n=1). Conclusion In conclusion, to the best of our knowledge, this is the first study in India to study the distribution of antibiotic resistant genes and SCCmec types among MR-CoNS from community settings. This study highlights high prevalence of MR-CoNS in community and its role in harbouring genetically diverse SCCmec elements as antibiotic resistance determinant. PMID:26435940

  13. Descriptive statistics.

    PubMed

    Shi, Runhua; McLarty, Jerry W

    2009-10-01

    In this article, we introduced basic concepts of statistics, type of distributions, and descriptive statistics. A few examples were also provided. The basic concepts presented herein are only a fraction of the concepts related to descriptive statistics. Also, there are many commonly used distributions not presented herein, such as Poisson distributions for rare events and exponential distributions, F distributions, and logistic distributions. More information can be found in many statistics books and publications.

  14. Statistical Software.

    ERIC Educational Resources Information Center

    Callamaras, Peter

    1983-01-01

    This buyer's guide to seven major types of statistics software packages for microcomputers reviews Edu-Ware Statistics 3.0; Financial Planning; Speed Stat; Statistics with DAISY; Human Systems Dynamics package of Stats Plus, ANOVA II, and REGRESS II; Maxistat; and Moore-Barnes' MBC Test Construction and MBC Correlation. (MBR)

  15. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2008-01-01

    As a branch of knowledge, Statistics is ubiquitous and its applications can be found in (almost) every field of human endeavour. In this article, the authors track down the possible source of the link between the "Siren song" and applications of Statistics. Answers to their previous five questions and five new questions on Statistics are presented.

  16. The mannose-specific lectins from ramsons (Allium ursinum L.) are encoded by three sets of genes.

    PubMed

    Van Damme, J M; Smeets, K; Torrekens, S; Van Leuven, F; Peumans, W J

    1993-10-01

    Lectin cDNA clones encoding the two mannose-binding lectins from ramsons (Allium ursinum L.) bulbs, AUAI and AUAII (AUA, Allium ursinum agglutinin), were isolated and characterized. Sequence comparison of the different cDNA clones isolated revealed three types of lectin clones called LECAUAG0, LECAUAG1 and LECAUAG2, which besides the obvious differences in their sequences also differ from each other in the number of potential glycosylation sites within the C-terminal peptide of the lectin precursor. In vivo biosynthesis studies of the ramson lectins have shown that glycosylated lectin precursors occur in the organelle fraction of radioactively labeled ramson bulbs. Despite the similarities between the A. ursinum and the A. sativum (garlic) lectins at the protein level, molecular cloning of the two ramson lectins has shown that the lectin genes in A. ursinum are organized differently. Whereas in A. sativum the lectin polypeptides of the heterodimeric ASAI are encoded by one large precursor, those of the heterodimeric AUAI lectin are derived from two different precursors. These results are confirmed by Northern blot hybridization of A. ursinum RNA which, after hybridization with a labeled lectin cDNA, reveals only one band of 800 nucleotides in contrast to A. sativum RNA which yields two bands of 1400 and 800 nucleotides. Furthermore it is shown that the two mannose-binding lectins are differentially expressed.

  17. A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification

    PubMed Central

    Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan

    2015-01-01

    Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a novel approach using the maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop the celebrated Akaike's information criterion (AIC), consistent Akaike's information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate to perform the PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions. PMID:25838836
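
    The abstract names AIC and CAIC as criteria for choosing how many probabilistic PCs to retain. As a rough sketch, the standard definitions are AIC = -2 log L + 2k and CAIC = -2 log L + k(log n + 1), where k is the number of free parameters and n the sample size; the candidate with the smallest value is preferred. The log-likelihoods, parameter counts, and sample size below are hypothetical placeholders, not values from the paper, and ICOMP is omitted because its form is not given in the abstract.

```python
import math

def aic(log_lik, k):
    """Akaike's information criterion: -2 log L + 2k."""
    return -2.0 * log_lik + 2.0 * k

def caic(log_lik, k, n):
    """Consistent AIC: -2 log L + k (log n + 1)."""
    return -2.0 * log_lik + k * (math.log(n) + 1.0)

def best_model(candidates, criterion):
    """Return the key of the candidate with the smallest criterion value."""
    return min(candidates, key=lambda q: criterion(*candidates[q]))

# Hypothetical candidates: number of retained PPCs -> (log-likelihood, free parameters).
candidates = {1: (-520.0, 10), 2: (-500.0, 19), 3: (-498.0, 27)}
n_samples = 40  # assumed sample size for illustration

best_by_aic = best_model(candidates, aic)
best_by_caic = best_model(candidates, lambda ll, k: caic(ll, k, n_samples))
print(best_by_aic, best_by_caic)
```

    With these made-up numbers the two criteria disagree, because CAIC penalizes each extra parameter more heavily than AIC when n is larger than about e.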

  18. HU participates in expression of a specific set of genes required for growth and survival at acidic pH in Escherichia coli.

    PubMed

    Bi, Hongkai; Sun, Lianle; Fukamachi, Toshihiko; Saito, Hiromi; Kobayashi, Hiroshi

    2009-05-01

    The major histone-like Escherichia coli protein, HU, is composed of alpha and beta subunits respectively encoded by hupA and hupB in Escherichia coli. A mutant deficient in both hupA and hupB grew at a slightly slower rate than the wild type at pH 7.5. Growth of the mutant diminished with a decrease in pH, and no growth was observed at pH 4.6. Mutants of either hupA or hupB grew at all pH levels tested. The arginine-dependent survival at pH 2.5 was diminished approximately 60-fold by the deletion of both hupA and hupB, whereas the survival was slightly affected by the deletion of either hupA or hupB. The mRNA levels of adiA and adiC, which respectively encode arginine decarboxylase and arginine/agmatine antiporter, were low in the mutant deficient in both hupA and hupB. The deletion of both hupA and hupB had little effect on survival at pH 2.5 in the presence of glutamate or lysine, and expression of the genes for glutamate and lysine decarboxylases was not impaired by the deletion of the HU genes. These results suggest that HU regulates expression of the specific set of genes required for growth and survival in acidic environments.

  19. OPLS-DA as a suitable method for selecting a set of gene transcripts discriminating RAS- and PTPN11-mutated cells in acute lymphoblastic leukaemia.

    PubMed

    Musumarra, Giuseppe; Condorelli, Daniele F; Fortuna, Cosimo G

    2011-01-01

    OPLS discriminant analysis (OPLS-DA) was successfully applied for the selection of a limited number of gene transcripts necessary to discriminate PTPN11 and RAS mutated cells in acute lymphoblastic leukaemia (ALL) patients. The original set of 273 variables with VIP (1) values higher than 2.0 in the OPLS-DA model could be further reduced to 200 by elimination of less informative variables in the PCA class models adopted for SIMCA classification. The above 200 transcripts not only achieve a satisfactory discrimination accuracy between PTPN11 and RAS mutated cells but also indicate clearly that wild type samples belong to none of the mutated class models. In this list it was possible to identify candidate genes that could be involved in the molecular mechanisms discriminating PTPN11 and RAS mutations in ALL. Among them CBFA2T2, a member of the "ETO" family, is known because of its homology and association with the product of RUNX1-CBFA2T1 gene fusion generated by t(8;21) translocation, one frequent cause of acute myeloid leukemia.

  20. Chromatin H3K27me3/H3K4me3 histone marks define gene sets in high-grade serous ovarian cancer that distinguish malignant, tumour-sustaining and chemo-resistant ovarian tumour cells.

    PubMed

    Chapman-Rothe, N; Curry, E; Zeller, C; Liber, D; Stronach, E; Gabra, H; Ghaem-Maghami, S; Brown, R

    2013-09-19

    In embryonic stem (ES) cells, bivalent chromatin domains containing H3K4me3 and H3K27me3 marks silence developmental genes, while keeping them poised for activation following differentiation. We have identified gene sets associated with H3K27me3 and H3K4me3 marks at transcription start sites in a high-grade ovarian serous tumour and examined their association with epigenetic silencing and malignant progression. This revealed novel silenced bivalent marked genes, not described previously for ES cells, which are significantly enriched for the PI3K (P<10^-7) and TGF-β signalling pathways (P<10^-5). We matched histone marked gene sets to gene expression sets of eight normal fallopian tubes and 499 high-grade serous malignant ovarian samples. This revealed a significant decrease in gene expression for the H3K27me3 and bivalent gene sets in malignant tissue. We then correlated H3K27me3 and bivalent gene sets to gene expression data of ovarian tumour 'stem cell-like' sustaining cells versus non-sustaining cells. This showed a significantly lower expression for the H3K27me3 and bivalent gene sets in the tumour-sustaining cells. Similarly, comparison of matched chemo-sensitive and chemo-resistant ovarian cell lines showed a significantly lower expression of H3K27me3/bivalent marked genes in the chemo-resistant compared with the chemo-sensitive cell line. Our analysis supports the hypothesis that bivalent marks are associated with epigenetic silencing in ovarian cancer. However it also suggests that additional tumour specific bivalent marks, to those known in ES cells, are present in tumours and may potentially influence the subsequent development of drug resistance and tumour progression.

  1. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2008-01-01

    In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…

  2. A multistep approach to single nucleotide polymorphism-set analysis: an evaluation of power and type I error of gene-based tests of association after pathway-based association tests.

    PubMed

    Valcarcel, Alessandra; Grinde, Kelsey; Cook, Kaitlyn; Green, Alden; Tintle, Nathan

    2016-01-01

    The aggregation of functionally associated variants given a priori biological information can aid in the discovery of rare variants associated with complex diseases. Many methods exist that aggregate rare variants into a set and compute a single p value summarizing association between the set of rare variants and a phenotype of interest. These methods are often called gene-based, rare variant tests of association because the variants in the set are often all contained within the same gene. A reasonable extension of these approaches involves aggregating variants across an even larger set of variants (eg, all variants contained in genes within a pathway). Testing sets of variants such as pathways for association with a disease phenotype reduces multiple testing penalties, may increase power, and allows for straightforward biological interpretation. However, a significant variant-set association test does not indicate precisely which variants contained within that set are causal. Because pathways often contain many variants, it may be helpful to follow-up significant pathway tests by conducting gene-based tests on each gene in that pathway to narrow in on the region of causal variants. In this paper, we propose such a multistep approach for variant-set analysis that can also account for covariates and complex pedigree structure. We demonstrate this approach on simulated phenotypes from Genetic Analysis Workshop 19. We find generally better power for the multistep approach when compared to a more conventional, single-step approach that simply runs gene-based tests of association on each gene across the genome. Further work is necessary to evaluate the multistep approach on different data sets with different characteristics.
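
    The multistep idea described above (test pathways first, then run gene-based tests only inside significant pathways) can be sketched as a simple two-stage filter. This is a minimal illustration, not the authors' procedure: the p-values are assumed to come from some upstream association test, the Bonferroni thresholds at each stage are an assumption made here for concreteness, and all names and numbers are hypothetical.

```python
def multistep_hits(pathway_p, pathway_genes, gene_p, alpha=0.05):
    """Two-stage screen: keep pathways passing a Bonferroni threshold, then
    keep genes within each retained pathway passing a per-pathway threshold."""
    pathway_threshold = alpha / len(pathway_p)
    hits = {}
    for pathway, p_value in pathway_p.items():
        if p_value > pathway_threshold:
            continue  # pathway not significant: its genes are never tested
        genes = pathway_genes[pathway]
        gene_threshold = alpha / len(genes)
        hits[pathway] = [g for g in genes if gene_p[g] <= gene_threshold]
    return hits

# Hypothetical inputs.
pathway_p = {"pathway_A": 0.001, "pathway_B": 0.40}
pathway_genes = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3"]}
gene_p = {"g1": 0.01, "g2": 0.20, "g3": 0.0001}

result = multistep_hits(pathway_p, pathway_genes, gene_p)
print(result)
```

    Note that g3 is never reported even though its own p-value is small, because its pathway fails the first stage; this is the trade-off a multistep design makes in exchange for fewer second-stage tests.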

  3. miR-124, -128, and -137 Orchestrate Neural Differentiation by Acting on Overlapping Gene Sets Containing a Highly Connected Transcription Factor Network.

    PubMed

    Santos, Márcia C T; Tegge, Allison N; Correa, Bruna R; Mahesula, Swetha; Kohnke, Luana Q; Qiao, Mei; Ferreira, Marco A R; Kokovay, Erzsebet; Penalva, Luiz O F

    2016-01-01

    The ventricular-subventricular zone harbors neural stem cells (NSCs) that can differentiate into neurons, astrocytes, and oligodendrocytes. This process requires loss of stem cell properties and gain of characteristics associated with differentiated cells. miRNAs function as important drivers of this transition; miR-124, -128, and -137 are among the most relevant ones and have been shown to share commonalities and act as proneurogenic regulators. We conducted biological and genomic analyses to dissect their target repertoire during neurogenesis and tested the hypothesis that they act cooperatively to promote differentiation. To map their target genes, we transfected NSCs with antagomiRs and analyzed differences in their mRNA profile throughout differentiation with respect to controls. This strategy led to the identification of 910 targets for miR-124, 216 for miR-128, and 652 for miR-137. The target sets show extensive overlap. Inspection by gene ontology and network analysis indicated that transcription factors are a major component of these miRNAs target sets. Moreover, several of these transcription factors form a highly interconnected network. Sp1 was determined to be the main node of this network and was further investigated. Our data suggest that miR-124, -128, and -137 act synergistically to regulate Sp1 expression. Sp1 levels are dramatically reduced as cells differentiate and silencing of its expression reduced neuronal production and affected NSC viability and proliferation. In summary, our results show that miRNAs can act cooperatively and synergistically to regulate complex biological processes like neurogenesis and that transcription factors are heavily targeted to branch out their regulatory effect.

  4. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  5. Thy1.2 driven expression of transgenic His₆-SUMO2 in the brain of mice alters a restricted set of genes.

    PubMed

    Rossner, Moritz J; Tirard, Marilyn

    2014-08-05

    Protein SUMOylation is a post-translational protein modification with a key regulatory role in nerve cell development and function, but its function in mammals in vivo has only been studied cursorily. We generated two new transgenic mouse lines that express His6-tagged SUMO1 and SUMO2 driven by the Thy1.2 promoter. The brains of mice of the two lines express transgenic His6-SUMO peptides and conjugate them to substrates in vivo but cytoarchitecture and synaptic organization of adult transgenic mouse brains are indistinguishable from the wild-type situation. We investigated the impact of transgenic SUMO expression on gene transcription in the hippocampus by performing genome wide analyses using microarrays. Surprisingly, no changes were observed in Thy1.2::His6-SUMO1 transgenic mice and only a restricted set of genes were upregulated in Thy1.2::His6-SUMO2 mice. Among these, Penk1 (Preproenkephalin 1), which encodes Met-enkephalin neuropeptides, showed the highest degree of alteration. Accordingly, a significant increase in Met-enkephalin peptide levels in the hippocampus of Thy1.2::His6-SUMO2 was detected, but the expression levels and cellular localization of Met-enkephalin receptors were not changed. Thus, transgenic neuronal expression of His6-SUMO1 or His6-SUMO2 only induces very minor phenotypical changes in mice.

  6. Fine-Scale Linkage Mapping Reveals a Small Set of Candidate Genes Influencing Honey Bee Grooming Behavior in Response to Varroa Mites

    PubMed Central

    Arechavaleta-Velasco, Miguel E.; Alcala-Escamilla, Karla; Robles-Rios, Carlos; Tsuruda, Jennifer M.; Hunt, Greg J.

    2012-01-01

    Populations of honey bees in North America have been experiencing high annual colony mortality for 15–20 years. Many apicultural researchers believe that introduced parasites called Varroa mites (V. destructor) are the most important factor in colony deaths. One important resistance mechanism that limits mite population growth in colonies is the ability of some lines of honey bees to groom mites from their bodies. To search for genes influencing this trait, we used an Illumina Bead Station genotyping array to determine the genotypes of several hundred worker bees at over a thousand single-nucleotide polymorphisms in a family that was apparently segregating for alleles influencing this behavior. Linkage analyses provided a genetic map with 1,313 markers anchored to genome sequence. Genotypes were analyzed for association with grooming behavior, measured as the time that individual bees took to initiate grooming after mites were placed on their thoraces. Quantitative-trait-locus interval mapping identified a single chromosomal region that was significant at the chromosome-wide level (p<0.05) on chromosome 5 with a LOD score of 2.72. The 95% confidence interval for quantitative trait locus location contained only 27 genes (honey bee official gene annotation set 2) including Atlastin, Ataxin and Neurexin-1 (AmNrx1), which have potential neurodevelopmental and behavioral effects. Atlastin and Ataxin homologs are associated with neurological diseases in humans. AmNrx1 codes for a presynaptic protein with many alternatively spliced isoforms. Neurexin-1 influences the growth, maintenance and maturation of synapses in the brain, as well as the type of receptors most prominent within synapses. Neurexin-1 has also been associated with autism spectrum disorder and schizophrenia in humans, and self-grooming behavior in mice. PMID:23133594

  7. Mental Illness Statistics

    MedlinePlus

    ... of benign genes ID’s ASD suspects More Additional Mental Health Information from NIMH Medications Statistics Clinical Trials Coping ... Finder Publicaciones en Español The National Institute of Mental Health (NIMH) is part of the National Institutes of ...

  8. Rapid detection and statistical differentiation of KPC gene variants in Gram-negative pathogens by use of high-resolution melting and ScreenClust analyses.

    PubMed

    Roth, Amanda L; Hanson, Nancy D

    2013-01-01

    In the United States, the production of the Klebsiella pneumoniae carbapenemase (KPC) is an important mechanism of carbapenem resistance in Gram-negative pathogens. Infections with KPC-producing organisms are associated with increased morbidity and mortality; therefore, the rapid detection of KPC-producing pathogens is critical in patient care and infection control. We developed a real-time PCR assay complemented with traditional high-resolution melting (HRM) analysis, as well as statistically based genotyping, using the Rotor-Gene ScreenClust HRM software to both detect the presence of bla(KPC) and differentiate between KPC-2-like and KPC-3-like alleles. A total of 166 clinical isolates of Enterobacteriaceae, Pseudomonas aeruginosa, and Acinetobacter baumannii with various β-lactamase susceptibility patterns were tested in the validation of this assay; 66 of these organisms were known to produce the KPC β-lactamase. The real-time PCR assay was able to detect the presence of bla(KPC) in all 66 of these clinical isolates (100% sensitivity and specificity). HRM analysis demonstrated that 26 had KPC-2-like melting peak temperatures, while 40 had KPC-3-like melting peak temperatures. Sequencing of 21 amplified products confirmed the melting peak results, with 9 isolates carrying bla(KPC-2) and 12 isolates carrying bla(KPC-3). This PCR/HRM assay can identify KPC-producing Gram-negative pathogens in as little as 3 h after isolation of pure colonies and does not require post-PCR sample manipulation for HRM analysis, and ScreenClust analysis easily distinguishes bla(KPC-2-like) and bla(KPC-3-like) alleles. Therefore, this assay is a rapid method to identify the presence of bla(KPC) enzymes in Gram-negative pathogens that can be easily integrated into busy clinical microbiology laboratories.
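
    The reported 100% sensitivity and specificity can be reproduced from the counts in the abstract. The sketch below is illustrative only: the 66 true positives come from the abstract, while the 100 true negatives and zero false results are inferred here from the stated totals (166 isolates, 66 KPC producers) and the stated 100% figures. The function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive isolates that the assay flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly negative isolates that the assay flags as negative."""
    return true_neg / (true_neg + false_pos)


# Counts: 66 known KPC producers, all detected (from the abstract).
# The remaining 166 - 66 = 100 isolates are assumed to be true negatives
# with no false positives, which is what 100% specificity implies.
total_isolates = 166
kpc_producers = 66
non_producers = total_isolates - kpc_producers

sens = sensitivity(true_pos=kpc_producers, false_neg=0)
spec = specificity(true_neg=non_producers, false_pos=0)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```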

  9. Quick Statistics

    MedlinePlus

    ... population, or about 25 million Americans, has experienced tinnitus lasting at least five minutes in the past ... by NIDCD Epidemiology and Statistics Program staff: (1) tinnitus prevalence was obtained from the 2008 National Health ...

  10. A Molecular Approach to Nested RT-PCR Using a New Set of Primers for the Detection of the Human Immunodeficiency Virus Protease Gene

    PubMed Central

    Zarei, Mohammad; Ravanshad, Mehrdad; Bagban, Ashraf; Fallahi, Shahab

    2016-01-01

    Background: The human immunodeficiency virus (HIV-1) is the etiologic agent of AIDS. The disease can be transmitted via blood in the window period prior to the development of antibodies to the disease. Thus, an appropriate method for the detection of HIV-1 during this window period is very important. Objectives: This descriptive study proposes a sensitive, efficient, inexpensive, and easy method to detect HIV-1. Patients and Methods: In this study 25 serum samples of patients under treatment and also 10 positive and 10 negative control samples were studied. Twenty-five blood samples were obtained from HIV-1-infected individuals who were receiving treatment at the acquired immune deficiency syndrome (AIDS) research center of Imam Khomeini hospital in Tehran. The identification of HIV-1-positive samples was done by using reverse transcription to produce copy deoxyribonucleic acid (cDNA) and then optimizing the nested polymerase chain reaction (PCR) method. Two pairs of primers were then designed specifically for the protease gene fragment of the nested real time-PCR (RT-PCR) samples. Electrophoresis was used to examine the PCR products. The results were analyzed using statistical tests, including Fisher’s exact test, and SPSS17 software. Results: The 325 bp band of the protease gene was observed in all the positive control samples and in none of the negative control samples. The proposed method correctly identified HIV-1 in 23 of the 25 samples. Conclusions: These results suggest that, in comparison with viral cultures, antibody detection by enzyme linked immunosorbent assay (ELISAs), and conventional PCR methods, the proposed method has high sensitivity and specificity for the detection of HIV-1. PMID:27679699

  11. High Nasal Carriage Rate of Staphylococcus aureus Containing Panton-Valentine leukocidin- and EDIN-Encoding Genes in Community and Hospital Settings in Burkina Faso

    PubMed Central

    Ouedraogo, Abdoul-Salam; Dunyach-Remy, Catherine; Kissou, Aimée; Sanou, Soufiane; Poda, Armel; Kyelem, Carole G.; Solassol, Jérôme; Bañuls, Anne-Laure; Van De Perre, Philippe; Ouédraogo, Rasmata; Jean-Pierre, Hélène; Lavigne, Jean-Philippe; Godreuil, Sylvain

    2016-01-01

    The objectives of the present study were to investigate the rate of S. aureus nasal carriage and molecular characteristics in hospital and community settings in Bobo Dioulasso, Burkina Faso. Nasal samples (n = 219) were collected from 116 healthy volunteers and 103 hospitalized patients in July and August 2014. Samples were first screened using CHROMagar Staph aureus chromogenic agar plates, and S. aureus strains were identified by mass spectrometry. Antibiotic susceptibility was tested using the disk diffusion method on Müller-Hinton agar. All S. aureus isolates were genotyped using DNA microarray. Overall, the rate of S. aureus nasal carriage was 32.9% (72/219), with 29% in healthy volunteers and 37% in hospital patients. Among the S. aureus isolates, only four methicillin-resistant S. aureus (MRSA) strains were identified, all in hospital patients (3.9%). The 72 S. aureus isolates from nasal samples belonged to 16 different clonal complexes, particularly to CC 152-MSSA (22 clones) and CC1-MSSA (nine clones). Two clones were significantly associated with community settings: CC1-MSSA and CC45-MSSA. The MRSA strains belonged to the ST88-MRSA-IV or the CC8-MRSA-V complex. A very high prevalence of toxinogenic strains (52.2%, 36/69), containing Panton-Valentine leucocidin- and EDIN-encoding genes, was identified among the S. aureus isolates in community and hospital settings. This study provides the first characterization of S. aureus clones and their genetic characteristics in Burkina Faso. Altogether, it highlights the low prevalence of antimicrobial resistance, high diversity of methicillin-sensitive S. aureus clones and high frequency of toxinogenic S. aureus strains. PMID:27679613

  12. Set points, settling points and some alternative models: theoretical options to understand how genes and environments combine to regulate body adiposity

    PubMed Central

    Speakman, John R.; Levitsky, David A.; Allison, David B.; Bray, Molly S.; de Castro, John M.; Clegg, Deborah J.; Clapham, John C.; Dulloo, Abdul G.; Gruer, Laurence; Haw, Sally; Hebebrand, Johannes; Hetherington, Marion M.; Higgs, Susanne; Jebb, Susan A.; Loos, Ruth J. F.; Luckman, Simon; Luke, Amy; Mohammed-Ali, Vidya; O’Rahilly, Stephen; Pereira, Mark; Perusse, Louis; Robinson, Tom N.; Rolls, Barbara; Symonds, Michael E.; Westerterp-Plantenga, Margriet S.

    2011-01-01

    The close correspondence between energy intake and expenditure over prolonged time periods, coupled with an apparent protection of the level of body adiposity in the face of perturbations of energy balance, has led to the idea that body fatness is regulated via mechanisms that control intake and energy expenditure. Two models have dominated the discussion of how this regulation might take place. The set point model is rooted in physiology, genetics and molecular biology, and suggests that there is an active feedback mechanism linking adipose tissue (stored energy) to intake and expenditure via a set point, presumably encoded in the brain. This model is consistent with many of the biological aspects of energy balance, but struggles to explain the many significant environmental and social influences on obesity, food intake and physical activity. More importantly, the set point model does not effectively explain the ‘obesity epidemic’ – the large increase in body weight and adiposity of a large proportion of individuals in many countries since the 1980s. An alternative model, called the settling point model, is based on the idea that there is passive feedback between the size of the body stores and aspects of expenditure. This model accommodates many of the social and environmental characteristics of energy balance, but struggles to explain some of the biological and genetic aspects. The shortcomings of these two models reflect their failure to address the gene-by-environment interactions that dominate the regulation of body weight. We discuss two additional models – the general intake model and the dual intervention point model – that address this issue and might offer better ways to understand how body fatness is controlled. PMID:22065844

  13. High Nasal Carriage Rate of Staphylococcus aureus Containing Panton-Valentine leukocidin- and EDIN-Encoding Genes in Community and Hospital Settings in Burkina Faso.

    PubMed

    Ouedraogo, Abdoul-Salam; Dunyach-Remy, Catherine; Kissou, Aimée; Sanou, Soufiane; Poda, Armel; Kyelem, Carole G; Solassol, Jérôme; Bañuls, Anne-Laure; Van De Perre, Philippe; Ouédraogo, Rasmata; Jean-Pierre, Hélène; Lavigne, Jean-Philippe; Godreuil, Sylvain

    2016-01-01

    The objectives of the present study were to investigate the rate of S. aureus nasal carriage and molecular characteristics in hospital and community settings in Bobo Dioulasso, Burkina Faso. Nasal samples (n = 219) were collected from 116 healthy volunteers and 103 hospitalized patients in July and August 2014. Samples were first screened using CHROMagar Staph aureus chromogenic agar plates, and S. aureus strains were identified by mass spectrometry. Antibiotic susceptibility was tested using the disk diffusion method on Müller-Hinton agar. All S. aureus isolates were genotyped using DNA microarray. Overall, the rate of S. aureus nasal carriage was 32.9% (72/219), with 29% in healthy volunteers and 37% in hospital patients. Among the S. aureus isolates, only four methicillin-resistant S. aureus (MRSA) strains were identified, all in hospital patients (3.9%). The 72 S. aureus isolates from nasal samples belonged to 16 different clonal complexes, particularly to CC 152-MSSA (22 clones) and CC1-MSSA (nine clones). Two clones were significantly associated with community settings: CC1-MSSA and CC45-MSSA. The MRSA strains belonged to the ST88-MRSA-IV or the CC8-MRSA-V complex. A very high prevalence of toxinogenic strains (52.2%, 36/69), containing Panton-Valentine leucocidin- and EDIN-encoding genes, was identified among the S. aureus isolates in community and hospital settings. This study provides the first characterization of S. aureus clones and their genetic characteristics in Burkina Faso. Altogether, it highlights the low prevalence of antimicrobial resistance, high diversity of methicillin-sensitive S. aureus clones and high frequency of toxinogenic S. aureus strains.

  14. UpSet: Visualization of Intersecting Sets

    PubMed Central

    Lex, Alexander; Gehlenborg, Nils; Strobelt, Hendrik; Vuillemot, Romain; Pfister, Hanspeter

    2016-01-01

    Understanding relationships between sets is an important analysis task that has received widespread attention in the visualization community. The major challenge in this context is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. In this paper we introduce UpSet, a novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections. UpSet is focused on creating task-driven aggregates, communicating the size and properties of aggregates and intersections, and a duality between the visualization of the elements in a dataset and their set membership. UpSet visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes. Sorting according to various measures enables a task-driven analysis of relevant intersections and aggregates. The elements represented in the sets and their associated attributes are visualized in a separate view. Queries based on containment in specific intersections, aggregates or driven by attribute filters are propagated between both views. We also introduce several advanced visual encodings and interaction methods to overcome the problems of varying scales and to address scalability. UpSet is web-based and open source. We demonstrate its general utility in multiple use cases from various domains. PMID:26356912

  15. Statistical Theory of Breakup Reactions

    NASA Astrophysics Data System (ADS)

    Bertulani, Carlos A.; Descouvemont, Pierre; Hussein, Mahir S.

    2014-04-01

    We propose an alternative for Coupled-Channels calculations with loosely bound exotic nuclei (CDCC), based on the Random Matrix Model of the statistical theory of nuclear reactions. The coupled-channels equations are divided into two sets: the first set is described by the CDCC, and the other set is treated with RMT. The resulting theory is a Statistical CDCC (CDCCs), able in principle to take into account many pseudo channels.

  16. Quartiles in Elementary Statistics

    ERIC Educational Resources Information Center

    Langford, Eric

    2006-01-01

    The calculation of the upper and lower quartile values of a data set in an elementary statistics course is done in at least a dozen different ways, depending on the text or computer/calculator package being used (such as SAS, JMP, MINITAB, "Excel," and the TI-83 Plus). In this paper, we examine the various methods and offer a suggestion for a new…
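
    To make the point about competing quartile conventions concrete, here is a minimal sketch comparing three of them on one small data set. The two interpolation rules are the `exclusive` and `inclusive` methods of Python's standard `statistics.quantiles`; the third, `median_of_halves`, is a hypothetical helper written here for the common textbook rule of taking the median of each half. None of these is claimed to match a specific package named in the abstract.

```python
from statistics import median, quantiles


def median_of_halves(values):
    """Textbook rule: Q1 and Q3 are the medians of the lower and upper halves.

    For an odd number of points the middle value is left out of both halves
    (one of several conventions; an assumption made for this sketch).
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)


data = [1, 2, 3, 4, 5, 6, 7, 8]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q_exclusive = quantiles(data, n=4, method="exclusive")
q_inclusive = quantiles(data, n=4, method="inclusive")
q_halves = median_of_halves(data)

print("exclusive interpolation:", q_exclusive[0], q_exclusive[2])
print("inclusive interpolation:", q_inclusive[0], q_inclusive[2])
print("median of halves:       ", q_halves[0], q_halves[1])
```

    On this data set the three rules give three different lower quartiles (2.25, 2.75 and 2.5), while all agree on the median.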

  17. CELF Family RNA–Binding Protein UNC-75 Regulates Two Sets of Mutually Exclusive Exons of the unc-32 Gene in Neuron-Specific Manners in Caenorhabditis elegans

    PubMed Central

    Kuroyanagi, Hidehito; Watanabe, Yohei; Hagiwara, Masatoshi

    2013-01-01

    An enormous number of alternative pre–mRNA splicing patterns in multicellular organisms are coordinately defined by a limited number of regulatory proteins and cis elements. Mutually exclusive alternative splicing should be strictly regulated and is a challenging model for elucidating regulation mechanisms. Here we provide models of the regulation of two sets of mutually exclusive exons, 4a–4c and 7a–7b, of the Caenorhabditis elegans uncoordinated (unc)-32 gene, encoding the a subunit of V0 complex of vacuolar-type H+-ATPases. We visualize selection patterns of exon 4 and exon 7 in vivo by utilizing a trio and a pair of symmetric fluorescence splicing reporter minigenes, respectively, to demonstrate that they are regulated in tissue-specific manners. Genetic analyses reveal that RBFOX family RNA–binding proteins ASD-1 and FOX-1 and a UGCAUG stretch in intron 7b are involved in the neuron-specific selection of exon 7a. Through further forward genetic screening, we identify UNC-75, a neuron-specific CELF family RNA–binding protein of unknown function, as an essential regulator for the exon 7a selection. Electrophoretic mobility shift assays specify a short fragment in intron 7a as the recognition site for UNC-75 and demonstrate that UNC-75 specifically binds via its three RNA recognition motifs to the element including a UUGUUGUGUUGU stretch. The UUGUUGUGUUGU stretch in the reporter minigenes is actually required for the selection of exon 7a in the nervous system. We compare the amounts of partially spliced RNAs in the wild-type and unc-75 mutant backgrounds and raise a model for the mutually exclusive selection of unc-32 exon 7 by the RBFOX family and UNC-75. The neuron-specific selection of unc-32 exon 4b is also regulated by UNC-75 and the unc-75 mutation suppresses the Unc phenotype of the exon-4b-specific allele of unc-32 mutants. Taken together, UNC-75 is the neuron-specific splicing factor and regulates both sets of the mutually exclusive exons of

  18. Evaluating a set of reference genes for expression normalization in multiple tissues and skeletal muscle at different development stages in pigs using quantitative real-time polymerase chain reaction.

    PubMed

    Zhang, Jing; Tang, Zhonglin; Wang, Ning; Long, Liangqi; Li, Kui

    2012-01-01

    Gene expression analysis requires the use of reference genes consistently expressed under various conditions. In many cases, however, the commonly used reference genes are not uniformly expressed independently of tissues or environmental conditions. To provide a set of reliable reference genes in pigs, we used quantitative polymerase chain reaction to examine expression of six common reference genes (GAPDH, ACTB, H3F3A, HPRT1, RPL32, and RPS18) in adult tissues and prenatal skeletal muscles at 33, 65, and 90 days postcopulation from Tongcheng (obese-type) and Landrace (lean-type) pigs. The expression stability of these reference genes was evaluated by NormFinder, BestKeeper, and geNorm methods. Our data suggest that the reference genes were expressed variably in different tissues, developmental stages and breeds. RPS18, RPL32, and H3F3A could be used as internal controls to normalize gene expression in pig tissues and developmental skeletal muscle. The combination of internal control genes was necessary for accurate expression normalization. During skeletal muscle development, H3F3A and RPS18 would be the most appropriate combination to normalize gene expression in Tongcheng pigs, whereas the combination of RPL32 and RPS18 would be more suitable in Landrace pigs. In different tissues, the expression of RPL32 and RPS18 was the most consistent, and the combination of three genes (RPL32, RPS18, and H3F3A) is the most suitable for accurate normalization.

  19. FOXL2 gene mutations and blepharophimosis-ptosis-epicanthus inversus syndrome (BPES): a novel mutation detected in a Chinese family and a statistic model for summarizing previous reported records.

    PubMed

    Xu, Yan; Lei, Huo; Dong, Hong; Zhang, Liping; Qin, Qionglian; Gao, Jianmei; Zou, Yunlian; Yan, Xinmin

    2009-09-01

    Previous studies found that forkhead transcription factor 2 (FOXL2) gene mutations are responsible for both types of blepharophimosis-ptosis-epicanthus inversus syndrome (BPES) but have not established any systematic statistical model for the complex and even contradictory results about genotype-phenotype correlations between them. This study aimed to find possible mutations of the FOXL2 gene in a Chinese family with type II BPES by using DNA sequencing, and to further clarify genotype-phenotype correlations between FOXL2 mutations and BPES by using a systematic statistical method, namely Multifactor Dimensionality Reduction (MDR). A novel mutation (g.933_965dup), which could result in an expansion of the polyalanine (polyAla) tract, was detected in all patients of this family. MDR analysis of intragenic mutations of the FOXL2 gene reported in previous BPES studies indicated that mutations leading to much stronger disturbance of the amino acid sequence were responsible for more type I BPES, while other kinds of mutation were responsible for more type II BPES. In conclusion, the present study found a novel FOXL2 gene mutation in a Chinese BPES family and a new general genotype-phenotype correlation tendency between FOXL2 intragenic mutations and BPES, both of which expand the knowledge about the FOXL2 gene and BPES.

  20. Statistics Revelations

    ERIC Educational Resources Information Center

    Chicot, Katie; Holmes, Hilary

    2012-01-01

    The use, and misuse, of statistics is commonplace, yet in the printed format data representations can be either over simplified, supposedly for impact, or so complex as to lead to boredom, supposedly for completeness and accuracy. In this article the link to the video clip shows how dynamic visual representations can enliven and enhance the…

  1. Statistical Inference

    NASA Astrophysics Data System (ADS)

    Khan, Shahjahan

    Scientific information on various data-generating processes is often presented in the form of numerical and categorical data. Except on very rare occasions, such data represent a small part of the population, or selected outcomes of a data-generating process. Although valuable and useful information lurks in the array of scientific data, it is generally unavailable to users. Appropriate statistical methods are essential to reveal the hidden "jewels" in the mess of the raw data. Exploratory data analysis methods are used to uncover such valuable characteristics of the observed data. Statistical inference provides techniques to make valid conclusions about the unknown characteristics or parameters of the population from which scientifically drawn sample data are selected. Usually, statistical inference includes estimation of population parameters as well as performing tests of hypotheses on the parameters. However, prediction of future responses and determining the prediction distributions are also part of statistical inference. Both Classical (Frequentist) and Bayesian approaches are used in statistical inference. The commonly used Classical approach is based on the sample data alone. In contrast, the increasingly popular Bayesian approach uses a prior distribution on the parameters along with the sample data to make inferences. Non-parametric and robust methods are also used in situations where commonly used model assumptions are unsupported. In this chapter, we cover the philosophical and methodological aspects of both the Classical and Bayesian approaches. Moreover, some aspects of predictive inference are also included. In the absence of any evidence to support assumptions regarding the distribution of the underlying population, or if the variable is measured only on an ordinal scale, non-parametric methods are used. Robust methods are employed to avoid any significant changes in the results due to deviations from the model

  2. Statistical Inference

    NASA Astrophysics Data System (ADS)

    Khan, Shahjahan

    Scientific information on various data-generating processes is often presented in the form of numerical and categorical data. Except on very rare occasions, such data represent a small part of the population, or selected outcomes of a data-generating process. Although valuable and useful information lurks in the array of scientific data, it is generally unavailable to users. Appropriate statistical methods are essential to reveal the hidden “jewels” in the mess of the raw data. Exploratory data analysis methods are used to uncover such valuable characteristics of the observed data. Statistical inference provides techniques to make valid conclusions about the unknown characteristics or parameters of the population from which scientifically drawn sample data are selected. Usually, statistical inference includes estimation of population parameters as well as performing tests of hypotheses on the parameters. However, prediction of future responses and determining the prediction distributions are also part of statistical inference. Both Classical (Frequentist) and Bayesian approaches are used in statistical inference. The commonly used Classical approach is based on the sample data alone. In contrast, the increasingly popular Bayesian approach uses a prior distribution on the parameters along with the sample data to make inferences. Non-parametric and robust methods are also used in situations where commonly used model assumptions are unsupported. In this chapter, we cover the philosophical and methodological aspects of both the Classical and Bayesian approaches. Moreover, some aspects of predictive inference are also included. In the absence of any evidence to support assumptions regarding the distribution of the underlying population, or if the variable is measured only on an ordinal scale, non-parametric methods are used. Robust methods are employed to avoid any significant changes in the results due to deviations from the model

  3. Ideal statistically quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Savas, Ekrem; Cakalli, Huseyin

    2016-08-01

    An ideal I is a family of subsets of N, the set of positive integers, that is closed under taking finite unions and subsets of its elements. A sequence $(x_k)$ of real numbers is said to be S(I)-statistically convergent to a real number L if, for each $\varepsilon > 0$ and each $\delta > 0$, the set $\{ n \in \mathbb{N} : \frac{1}{n} |\{ k \le n : |x_k - L| \ge \varepsilon \}| \ge \delta \}$ belongs to I. We introduce S(I)-statistically ward compactness of a subset of R, the set of real numbers, and S(I)-statistically ward continuity of a real function, in the following senses: a subset E of R is S(I)-statistically ward compact if any sequence of points in E has an S(I)-statistically quasi-Cauchy subsequence, and a real function is S(I)-statistically ward continuous if it preserves S(I)-statistically quasi-Cauchy sequences, where a sequence $(x_k)$ is called S(I)-statistically quasi-Cauchy when $(\Delta x_k)$ is S(I)-statistically convergent to 0. We obtain results related to S(I)-statistically ward continuity, S(I)-statistically ward compactness, $N_\theta$-ward continuity, and slowly oscillating continuity.
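
    The defining set in this abstract can be explored numerically. The sketch below is an illustration only, under two assumptions that are not in the abstract: I is taken to be the ideal of finite subsets of N (so membership in I means "finite"), and the test sequence is the indicator of the perfect squares with L = 0. A finite scan cannot prove convergence; it only shows the exceptional indices dying out. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def exceptional_indices(x, limit, eps, delta, n_max):
    """Return every n <= n_max with (1/n) * #{k <= n : |x(k) - limit| >= eps} >= delta."""
    bad = []
    count = 0  # running size of {k <= n : |x(k) - limit| >= eps}
    for n in range(1, n_max + 1):
        if abs(x(n) - limit) >= eps:
            count += 1
        # Exact rational comparison avoids floating-point edge cases.
        if Fraction(count, n) >= delta:
            bad.append(n)
    return bad


def indicator_of_squares(k):
    """x_k = 1 when k is a perfect square, else 0 (example sequence, assumed here)."""
    return 1 if isqrt(k) ** 2 == k else 0


bad = exceptional_indices(
    indicator_of_squares,
    limit=0,
    eps=Fraction(1, 2),
    delta=Fraction(1, 10),
    n_max=10_000,
)
print("largest exceptional index up to 10000:", max(bad))
```

    Although the sequence takes the value 1 infinitely often, the exceptional indices found by the scan stop at n = 100, which is the behaviour expected when the set is finite and therefore belongs to the ideal of finite sets.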

  4. Activated glucocorticoid receptor interacts with the INHAT component Set/TAF-Ibeta and releases it from a glucocorticoid-responsive gene promoter, relieving repression: implications for the pathogenesis of glucocorticoid resistance in acute undifferentiated leukemia with Set-Can translocation.

    PubMed

    Ichijo, Takamasa; Chrousos, George P; Kino, Tomoshige

    2008-02-13

    Set/template-activating factor (TAF)-Ibeta, part of the Set-Can oncogene product found in acute undifferentiated leukemia, is a component of the inhibitor of acetyltransferases (INHAT) complex. Set/TAF-Ibeta interacted with the DNA-binding domain of the glucocorticoid receptor (GR) in yeast two-hybrid screening, and repressed GR-induced transcriptional activity of a chromatin-integrated glucocorticoid-responsive and a natural promoter. Set/TAF-Ibeta was co-precipitated with glucocorticoid response elements (GREs) of these promoters in the absence of dexamethasone, while addition of the hormone caused dissociation of Set/TAF-Ibeta from and attraction of the p160-type coactivator GRIP1 to the promoter GREs. Set-Can fusion protein, on the other hand, did not interact with GR, was constitutively co-precipitated with GREs and suppressed GRIP1-induced enhancement of GR transcriptional activity and histone acetylation. Thus, Set/TAF-Ibeta acts as a ligand-activated GR-responsive transcriptional repressor, while Set-Can does not retain physiologic responsiveness to ligand-bound GR, possibly contributing to the poor responsiveness of Set-Can-harboring leukemic cells to glucocorticoids.

  5. The PR/SET domain zinc finger protein Prdm4 regulates gene expression in embryonic stem cells but plays a nonessential role in the developing mouse embryo.

    PubMed

    Bogani, Debora; Morgan, Marc A J; Nelson, Andrew C; Costello, Ita; McGouran, Joanna F; Kessler, Benedikt M; Robertson, Elizabeth J; Bikoff, Elizabeth K

    2013-10-01

    Prdm4 is a highly conserved member of the Prdm family of PR/SET domain zinc finger proteins. Many well-studied Prdm family members play critical roles in development and display striking loss-of-function phenotypes. Prdm4 functional contributions have yet to be characterized. Here, we describe its widespread expression in the early embryo and adult tissues. We demonstrate that DNA binding is exclusively mediated by the Prdm4 zinc finger domain, and we characterize its tripartite consensus sequence via SELEX (systematic evolution of ligands by exponential enrichment) and ChIP-seq (chromatin immunoprecipitation-sequencing) experiments. In embryonic stem cells (ESCs), Prdm4 regulates key pluripotency and differentiation pathways. Two independent strategies, namely, targeted deletion of the zinc finger domain and generation of a EUCOMM LacZ reporter allele, resulted in functional null alleles. However, homozygous mutant embryos develop normally and adults are healthy and fertile. Collectively, these results strongly suggest that Prdm4 functions redundantly with other transcriptional partners to cooperatively regulate gene expression in the embryo and adult animal.

  6. [Descriptive statistics].

    PubMed

    Rendón-Macías, Mario Enrique; Villasís-Keever, Miguel Ángel; Miranda-Novales, María Guadalupe

    2016-01-01

    Descriptive statistics is the branch of statistics that gives recommendations on how to summarize clearly and simply research data in tables, figures, charts, or graphs. Before performing a descriptive analysis it is paramount to summarize its goal or goals, and to identify the measurement scales of the different variables recorded in the study. Tables or charts aim to provide timely information on the results of an investigation. The graphs show trends and can be histograms, pie charts, "box and whiskers" plots, line graphs, or scatter plots. Images serve as examples to reinforce concepts or facts. The choice of a chart, graph, or image must be based on the study objectives. Usually it is not recommended to use more than seven in an article, also depending on its length.

  7. Order Statistics and Nonparametric Statistics.

    DTIC Science & Technology

    2014-09-26

    Topics investigated include the following: Probability that a fuze will fire; moving order statistics; distribution theory and properties of the...problem posed by an Army Scientist: A fuze will fire when at least n-1 (or n-2) of n detonators function within time span t. What is the probability of

  8. Statistical Optics

    NASA Astrophysics Data System (ADS)

    Goodman, Joseph W.

    2000-07-01

    The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson, The Statistical Analysis of Time Series; T. S. Arthanari & Yadolah Dodge, Mathematical Programming in Statistics; Emil Artin, Geometric Algebra; Norman T. J. Bailey, The Elements of Stochastic Processes with Applications to the Natural Sciences; Robert G. Bartle, The Elements of Integration and Lebesgue Measure; George E. P. Box & Norman R. Draper, Evolutionary Operation: A Statistical Method for Process Improvement; George E. P. Box & George C. Tiao, Bayesian Inference in Statistical Analysis; R. W. Carter, Finite Groups of Lie Type: Conjugacy Classes and Complex Characters; R. W. Carter, Simple Groups of Lie Type; William G. Cochran & Gertrude M. Cox, Experimental Designs, Second Edition; Richard Courant, Differential and Integral Calculus, Volume I; Richard Courant, Differential and Integral Calculus, Volume II; Richard Courant & D. Hilbert, Methods of Mathematical Physics, Volume I; Richard Courant & D. Hilbert, Methods of Mathematical Physics, Volume II; D. R. Cox, Planning of Experiments; Harold S. M. Coxeter, Introduction to Geometry, Second Edition; Charles W. Curtis & Irving Reiner, Representation Theory of Finite Groups and Associative Algebras; Charles W. Curtis & Irving Reiner, Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I; Charles W. Curtis & Irving Reiner, Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II; Cuthbert Daniel, Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition; Bruno de Finetti, Theory of Probability, Volume I; Bruno de Finetti, Theory of Probability, Volume 2; W. Edwards Deming, Sample Design in Business Research

  9. Conditional statistical model building

    NASA Astrophysics Data System (ADS)

    Hansen, Mads Fogtmann; Hansen, Michael Sass; Larsen, Rasmus

    2008-03-01

    We present a new statistical deformation model suited for parameterized grids with different resolutions. Our method models the covariances between multiple grid levels explicitly, and allows for very efficient fitting of the model to data on multiple scales. The model is validated on a data set consisting of 62 annotated MR images of the corpus callosum. One fifth of the data set was used as a training set, whose images were non-rigidly registered to one another without a shape prior. From the non-rigidly registered training set, a shape prior was constructed by performing principal component analysis on each grid level and using the results to construct a conditional shape model, conditioning the finer parameters on the coarser grid levels. The remaining shapes were registered with the constructed shape prior. The Dice measures for the registration without a prior and the registration with a prior were 0.875 +/- 0.042 and 0.8615 +/- 0.051, respectively.
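
    The Dice measure reported in this abstract is a standard overlap score, 2|A ∩ B| / (|A| + |B|), between two segmentations. The following is a minimal sketch of that score only, not the authors' evaluation code; the function name, the flat binary-mask representation, and the toy masks are assumptions made for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A n B| / (|A| + |B|) for two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    # Count positions labelled as foreground in both masks.
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Assumed convention: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks (flattened), not data from the study.
reference = [0, 1, 1, 1, 0, 0]
registered = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(reference, registered)
```

    A score of 1 means the two masks coincide exactly and 0 means they do not overlap at all.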

  10. Genes

    MedlinePlus

    ... URL of this page: //medlineplus.gov/ency/article/ ...

  11. Representational Versatility in Learning Statistics

    ERIC Educational Resources Information Center

    Graham, Alan T.; Thomas, Michael O. J.

    2005-01-01

    Statistical data can be represented in a number of qualitatively different ways, the choice depending on the following three conditions: the concepts to be investigated; the nature of the data; and the purpose for which they were collected. This paper begins by setting out frameworks that describe the nature of statistical thinking in schools, and…

  12. 1979 DOE statistical symposium

    SciTech Connect

    Gardiner, D.A.; Truett, T.

    1980-09-01

    The 1979 DOE Statistical Symposium was the fifth in the series of annual symposia designed to bring together statisticians and other interested parties who are actively engaged in helping to solve the nation's energy problems. The program included presentations of technical papers centered around exploration and disposal of nuclear fuel, general energy-related topics, and health-related issues, and workshops on model evaluation, risk analysis, analysis of large data sets, and resource estimation.

  13. Statistical Neurodynamics.

    NASA Astrophysics Data System (ADS)

    Paine, Gregory Harold

    1982-03-01

    The primary objective of the thesis is to explore the dynamical properties of small nerve networks by means of the methods of statistical mechanics. To this end, a general formalism is developed and applied to elementary groupings of model neurons which are driven by either constant (steady state) or nonconstant (nonsteady state) forces. Neuronal models described by a system of coupled, nonlinear, first-order, ordinary differential equations are considered. A linearized form of the neuronal equations is studied in detail. A Lagrange function corresponding to the linear neural network is constructed which, through a Legendre transformation, provides a constant of motion. By invoking the Maximum-Entropy Principle with the single integral of motion as a constraint, a probability distribution function for the network in a steady state can be obtained. The formalism is implemented for some simple networks driven by a constant force; accordingly, the analysis focuses on a study of fluctuations about the steady state. In particular, a network composed of N noninteracting neurons, termed Free Thinkers, is considered in detail, with a view to interpretation and numerical estimation of the Lagrange multiplier corresponding to the constant of motion. As an archetypical example of a net of interacting neurons, the classical neural oscillator, consisting of two mutually inhibitory neurons, is investigated. It is further shown that in the case of a network driven by a nonconstant force, the Maximum-Entropy Principle can be applied to determine a probability distribution functional describing the network in a nonsteady state. The above examples are reconsidered with nonconstant driving forces which produce small deviations from the steady state. Numerical studies are performed on simplified models of two physical systems: the starfish central nervous system and the mammalian olfactory bulb. Discussions are given as to how statistical neurodynamics can be used to gain a better

  14. Phylogenomics reveals surprising sets of essential and dispensable clades of MIKC(c)-group MADS-box genes in flowering plants.

    PubMed

    Gramzow, Lydia; Theißen, Günter

    2015-06-01

    MIKC(C)-group MADS-box genes are involved in the control of many developmental processes in flowering plants. All of these genes are members of one of 17 clades that had already been established in the most recent common ancestor (MRCA) of extant angiosperms. These clades trace back to 11 seed plant-specific superclades that were present in the MRCA of extant seed plants. Due to their important role in plant development and evolution, the origin of the clades of MIKC(C)-group genes has been studied in great detail. In contrast, whether any of these ancestral clades has ever been lost completely in any species has not been investigated so far. Here, we determined the presence of these clades by BLAST, PSI-BLAST, and Hidden Markov Model searches and by phylogenetic methods in the whole genomes of 27 floweri