Science.gov

Sample records for gene set statistics

  1. Multiset Statistics for Gene Set Analysis

    PubMed Central

    Newton, Michael A.; Wang, Zhishi

    2015-01-01

    An important data analysis task in statistical genomics involves the integration of genome-wide gene-level measurements with preexisting data on the same genes. A wide variety of statistical methodologies and computational tools have been developed for this general task. We emphasize one particular distinction among methodologies, namely whether they process gene sets one at a time (uniset) or simultaneously via some multiset technique. Owing to the complexity of collections of gene sets, the multiset approach offers some advantages, as it naturally accommodates set-size variations and among-set overlaps. However, this approach presents both computational and inferential challenges. After reviewing some statistical issues that arise in uniset analysis, we examine two model-based multiset methods for gene list data. PMID:25914887

  2. FLAGS: A Flexible and Adaptive Association Test for Gene Sets Using Summary Statistics.

    PubMed

    Huang, Jianfei; Wang, Kai; Wei, Peng; Liu, Xiangtao; Liu, Xiaoming; Tan, Kai; Boerwinkle, Eric; Potash, James B; Han, Shizhong

    2016-03-01

    Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Despite remarkable success in uncovering many risk variants and providing novel insights into disease biology, genetic variants identified to date fail to explain the vast majority of the heritability for most complex diseases. One explanation is that there are still a large number of common variants that remain to be discovered, but their effect sizes are generally too small to be detected individually. Accordingly, gene set analysis of GWAS, which examines a group of functionally related genes, has been proposed as a complementary approach to single-marker analysis. Here, we propose a FLexible and Adaptive test for Gene Sets (FLAGS), using summary statistics. Extensive simulations showed that this method has an appropriate type I error rate and outperforms existing methods with increased power. As a proof of principle, through real data analyses of Crohn's disease GWAS data and bipolar disorder GWAS meta-analysis results, we demonstrated the superior performance of FLAGS over several state-of-the-art association tests for gene sets. Our method allows for the more powerful application of gene set analysis to complex diseases, which will have broad use given that GWAS summary results are increasingly publicly available. PMID:26773050

  3. Gene set enrichment; a problem of pathways

    PubMed Central

    Meaburn, Emma L.; Schalkwyk, Leonard C.

    2010-01-01

    Gene Set Enrichment (GSE) is a computational technique which determines whether an a priori defined set of genes shows statistically significant differential expression between two phenotypes. Currently, the gene sets used for GSE are derived from annotation or pathway databases, which often contain computationally based and unrepresentative data. Here, we propose a novel approach for the generation of comprehensive and biologically derived gene sets, deriving sets through the application of machine learning techniques to gene expression data. These gene sets can be produced for specific tissues, developmental stages or environments. They provide a powerful and functionally meaningful way in which to mine genomewide association and next generation sequencing data in order to identify disease-associated variants and pathways. PMID:20861160

  4. Statistical mechanics of maximal independent sets

    NASA Astrophysics Data System (ADS)

    Dall'Asta, Luca; Pin, Paolo; Ramezanpour, Abolfazl

    2009-12-01

    The graph theoretic concept of maximal independent set arises in several practical problems in computer science as well as in game theory. A maximal independent set is defined by the set of occupied nodes that satisfy some packing and covering constraints. It is known that finding minimum and maximum-density maximal independent sets are hard optimization problems. In this paper, we use the cavity method of statistical physics and Monte Carlo simulations to study the corresponding constraint satisfaction problem on random graphs. We obtain the entropy of maximal independent sets within the replica symmetric and one-step replica symmetry breaking frameworks, shedding light on the metric structure of the landscape of solutions and suggesting a class of possible algorithms. This is of particular relevance for the application to the study of strategic interactions in social and economic networks, where maximal independent sets correspond to pure Nash equilibria of a graphical game of public goods allocation.
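
    The packing and covering constraints mentioned in this abstract can be illustrated with a minimal sketch. It is a plain greedy construction, not the cavity method or the Monte Carlo simulations the authors use, and it returns some maximal independent set rather than a minimum- or maximum-density one. The function names and the dict-of-sets graph format are assumptions made for illustration.

```python
def greedy_maximal_independent_set(adjacency):
    """Return one maximal independent set of an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
    (a hypothetical input format chosen for this sketch).
    """
    chosen = set()
    blocked = set()  # nodes that are chosen or adjacent to a chosen node
    for node in sorted(adjacency):
        if node not in blocked:
            chosen.add(node)
            blocked.add(node)
            blocked.update(adjacency[node])
    return chosen


def is_maximal_independent(adjacency, chosen):
    """Check the packing and covering constraints for a candidate set."""
    # Packing: no chosen node has a chosen neighbour.
    packing = all(adjacency[u].isdisjoint(chosen) for u in chosen)
    # Covering: every unchosen node has at least one chosen neighbour.
    covering = all(adjacency[v] & chosen for v in adjacency if v not in chosen)
    return packing and covering


# A path graph 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
mis = greedy_maximal_independent_set(path)
print(mis, is_maximal_independent(path, mis))
```

    On the same path graph, the sets {1, 3} and {0, 3} also satisfy both constraints, which is why counting such sets (their entropy) is a meaningful question.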

  5. Statistical considerations in setting product specifications.

    PubMed

    Dong, Xiaoyu; Tsong, Yi; Shen, Meiyu

    2015-01-01

    According to ICH Q6A (1999), a specification is defined as a list of tests, references to analytical procedures, and appropriate acceptance criteria, which are numerical limits, ranges, or other criteria for the tests described. For drug products, specifications usually consist of test methods and acceptance criteria for assay, impurities, pH, dissolution, moisture, and microbial limits, depending on the dosage forms. They are usually proposed by the manufacturers and subject to regulatory approval for use. When the acceptance criteria in product specifications cannot be pre-defined based on prior knowledge, the conventional approach is to use data from a limited number of clinical batches during the clinical development phases. Often, such an acceptance criterion is set as an interval bounded by the sample mean plus and minus two to four standard deviations. This interval may be revised with the accumulated data collected from released batches after drug approval. In this article, we describe and discuss the statistical issues of commonly used approaches in setting or revising specifications (usually tightening the limits), including reference interval, (Min, Max) method, tolerance interval, and confidence limit of percentiles. We also compare their performance in terms of the interval width and the intended coverage. Based on our study results and review experiences, we make some recommendations on how to select the appropriate statistical methods in setting product specifications to better ensure the product quality. PMID:25358110
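
    Two of the approaches named in this abstract, the mean plus and minus k standard deviations interval and the (Min, Max) method, can be sketched as follows. The batch values, the function names, and the choice k = 3 are hypothetical; the tolerance interval and percentile confidence limit methods are not shown because they need additional distributional factors.

```python
import statistics


def mean_sd_interval(values, k):
    """Interval bounded by the sample mean plus and minus k sample SDs."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return (centre - k * spread, centre + k * spread)


def min_max_interval(values):
    """(Min, Max) method: the observed extremes become the limits."""
    return (min(values), max(values))


# Hypothetical assay results (% of label claim) from five clinical batches.
batches = [98.0, 100.0, 102.0, 99.0, 101.0]

low, high = mean_sd_interval(batches, k=3)
print("mean +/- 3 SD:", (round(low, 2), round(high, 2)))
print("(Min, Max):   ", min_max_interval(batches))
```

    With so few batches the sample standard deviation is itself uncertain, which is one reason the interval width and the intended coverage can disagree.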

  6. Statistical mechanics of typical set decoding

    NASA Astrophysics Data System (ADS)

    Kabashima, Yoshiyuki; Nakamura, Kazutaka; van Mourik, Jort

    2002-09-01

    The performance of "typical set (pairs) decoding" for ensembles of Gallager's linear code is investigated using statistical physics. In this decoding method, errors occur, either when the information transmission is corrupted by atypical noise, or when multiple typical sequences satisfy the parity check equation as provided by the received corrupted codeword. We show that the average error rate for the second type of error over a given code ensemble can be accurately evaluated using the replica method, including the sensitivity to message length. Our approach generally improves the existing analysis known in the information theory community, which was recently reintroduced in IEEE Trans. Inf. Theory 45, 399 (1999), and is believed to be the most accurate to date.

  7. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    PubMed Central

    de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710

  8. Importance of data management with statistical analysis set division.

    PubMed

    Wang, Ling; Li, Chan-juan; Jiang, Zhi-wei; Xia, Jie-lai

    2015-11-01

    Hypothesis testing is affected by the division of statistical analysis sets, which is an important data management task before database lock. Objective division of statistical analysis sets under blinding guarantees a scientifically sound trial conclusion. All subjects who received the trial treatment at least once after randomization should be included in the safety set. The full analysis set should be as close as possible to the intention-to-treat principle. Division of the per-protocol set is the most difficult to control in blinded review because it involves more subjectivity than the other two. The objectivity of statistical analysis set division must be guaranteed by accurate raw data, comprehensive data checks, and scientific discussion, all of which are strict requirements of data management. Dividing statistical analysis sets properly, objectively, and scientifically is an important way to improve data management quality. PMID:26911044

  9. Transformations on Data Sets and Their Effects on Descriptive Statistics

    ERIC Educational Resources Information Center

    Fox, Thomas B.

    2005-01-01

    The activity asks students to examine the effects on the descriptive statistics of a data set that has undergone either a translation or a scale change. They make conjectures relative to the effects on the statistics of a transformation on a data set and then they defend their conjectures and deductively verify several of them.
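
    The conjectures described in this record can be checked numerically. The sketch below uses a made-up data set and a hypothetical helper, describe; it shows that a translation shifts the mean and median but leaves the spread unchanged, while a scale change multiplies all four statistics.

```python
import statistics


def describe(values):
    """Collect a few descriptive statistics for a data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.pstdev(values),  # population SD
    }


data = [2, 4, 4, 6, 9]                # made-up data set
shifted = [x + 10 for x in data]      # translation by 10
scaled = [3 * x for x in data]        # scale change by a factor of 3

for label, values in [("original", data), ("shifted", shifted), ("scaled", scaled)]:
    print(label, describe(values))
```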

  10. Gene set analyses for interpreting microarray experiments on prokaryotic organisms.

    SciTech Connect

    Tintle, Nathan; Best, Aaron; Dejongh, Matthew; VanBruggen, Dirk; Heffron, Fred; Porwollik, Steffen; Taylor, Ronald C.

    2008-11-05

    Background: Recent advances in microarray technology have brought with them the need for enhanced methods of biologically interpreting gene expression data. Recently, methods like Gene Set Enrichment Analysis (GSEA) and variants of Fisher’s exact test have been proposed which utilize a priori biological information. Typically, these methods are demonstrated with a priori biological information from the Gene Ontology. Results: Alternative gene set definitions are presented based on gene sets inferred from the SEED: open-source software environment for comparative genome annotation and analysis of microbial organisms. Many of these gene sets are then shown to provide consistent expression across a series of experiments involving Salmonella Typhimurium. Implementation of the gene sets in an analysis of microarray data is then presented for the Salmonella Typhimurium data. Conclusions: SEED inferred gene sets can be naturally defined based on subsystems in the SEED. The consistent expression values of these SEED inferred gene sets suggest their utility for statistical analyses of gene expression data based on a priori biological information.

  11. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGESBeta

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; Chen, James J.

    2014-01-01

    Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure is to identify gene sets that can accurately predict samples from different experimental conditions or are associated with the continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.

  12. Assessment of gene set analysis methods based on microarray data.

    PubMed

    Alavi-Majd, Hamid; Khodakarim, Soheila; Zayeri, Farid; Rezaei-Tavirani, Mostafa; Tabatabaei, Seyyed Mohammad; Heydarpour-Meymeh, Maryam

    2014-01-25

    Gene set analysis (GSA) incorporates biological information into statistical knowledge to identify gene sets differently expressed between two or more phenotypes. It allows us to gain an insight into the functional working mechanism of cells beyond the detection of differently expressed gene sets. In order to evaluate the competence of GSA approaches, three self-contained GSA approaches with different statistical methods were chosen: Category, Globaltest and Hotelling's T², together with their assayed power to identify the differences expressed via simulation and real microarray data. The Category does not take care of the correlation structure, while the other two deal with correlations. In order to perform these methods, R and Bioconductor were used. Furthermore, venous thromboembolism and acute lymphoblastic leukemia microarray data were applied. The results of three GSAs showed that the competence of these methods depends on the distribution of gene expression in a dataset. It is very important to assay the distribution of gene expression data before choosing the GSA method to identify gene sets differently expressed between phenotypes. On the other hand, assessment of common genes among significant gene sets indicated that there was a significant agreement between the result of GSA and the findings of biologists. PMID:24012817

  13. Network enrichment analysis: extension of gene-set enrichment analysis to gene networks

    PubMed Central

    2012-01-01

    Background Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in terms of gene interaction network is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis. Results We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study. Conclusions The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity rather than by simple overlaps. PMID:22966941
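
    The overlap statistic that this abstract attributes to traditional gene-set enrichment analysis is commonly assessed with a hypergeometric (one-sided Fisher) test. The abstract does not name the test, so that choice is an assumption here, and the sketch does not implement the network-based NEA method. The function name and the toy numbers are hypothetical.

```python
from math import comb


def overlap_p_value(universe_size, category_size, experimental_size, overlap):
    """Upper-tail hypergeometric probability of an overlap this large.

    Probability that a random experimental set of the given size shares
    at least `overlap` genes with a functional category, when both are
    drawn from a universe of `universe_size` genes.
    """
    total = comb(universe_size, experimental_size)
    largest = min(category_size, experimental_size)
    tail = sum(
        comb(category_size, k)
        * comb(universe_size - category_size, experimental_size - k)
        for k in range(overlap, largest + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, a category of 4, an experimental set of 3,
# of which 2 fall in the category.
print(overlap_p_value(10, 4, 3, 2))
```

    Because this test counts only shared members, two sets with no genes in common score as unrelated even when their genes are densely linked in an interaction network, which is the gap the abstract says NEA addresses.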

  14. Association Between a Prognostic Gene Signature and Functional Gene Sets

    PubMed Central

    Hummel, Manuela; Metzeler, Klaus H.; Buske, Christian; Bohlander, Stefan K.; Mansmann, Ulrich

    2008-01-01

    Background The development of expression-based gene signatures for predicting prognosis or class membership is a popular and challenging task. Besides their stringent validation, signatures need a functional interpretation and must be placed in a biological context. Popular tools such as Gene Set Enrichment have drawbacks because they are restricted to annotated genes and are unable to capture the information hidden in the signature’s non-annotated genes. Methodology We propose concepts to relate a signature with functional gene sets like pathways or Gene Ontology categories. The connection between single signature genes and a specific pathway is explored by hierarchical variable selection and gene association networks. The risk score derived from an individual patient’s signature is related to expression patterns of pathways and Gene Ontology categories. Global tests are useful for these tasks, and they adjust for other factors. GlobalAncova is used to explore the effect on gene expression in specific functional groups from the interaction of the score and selected mutations in the patient’s genome. Results We apply the proposed methods to an expression data set and a corresponding gene signature for predicting survival in Acute Myeloid Leukemia (AML). The example demonstrates strong relations between the signature and cancer-related pathways. The signature-based risk score was found to be associated with development-related biological processes. Conclusions Many authors interpret the functional aspects of a gene signature by linking signature genes to pathways or relevant functional gene groups. The method of gene set enrichment is preferred to annotating signature genes to specific Gene Ontology categories. The strategies proposed in this paper go beyond the restriction of annotation and deepen the insights into the biological mechanisms reflected in the information given by a signature. PMID:19812786

  15. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data

    PubMed Central

    Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe

    2015-01-01

    Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets finally found to be linked with the influenza vaccine too, although they were found to be associated with the pneumococcal vaccine only in previous analyses. In our simulation study TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package. PMID:26111374

  16. WebGestalt: an integrated system for exploring gene sets in various biological contexts.

    PubMed

    Zhang, Bing; Kirov, Stefan; Snoddy, Jay

    2005-07-01

    High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted our research focus from 'single genes' to 'gene sets'. We have developed a web-based integrated data mining system, WebGestalt (http://genereg.ornl.gov/webgestalt/), to help biologists in exploring large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, as well as performs Boolean operations to generate the unions, intersections or differences between different gene sets. The information retrieval module currently retrieves information for up to 20 attributes for all genes in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression pattern, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. In order to demonstrate the use of WebGestalt, we have generated 48 gene sets with genes over-represented in various human tissue types. Exploration of all the 48 gene sets using WebGestalt is available for the public at http://genereg.ornl.gov/webgestalt/wg_enrich.php. PMID:15980575

  17. Establishment of an attentional set via statistical learning.

    PubMed

    Cosman, Joshua D; Vecera, Shaun P

    2014-02-01

    The ability to overcome attentional capture and attend goal-relevant information is typically viewed as a volitional, effortful process that relies on the maintenance of current task priorities or "attentional sets" in working memory. However, the visual system possesses statistical learning mechanisms that can incidentally encode probabilistic associations between goal-relevant objects and the attributes likely to define them. Thus, it is possible that statistical learning may contribute to the establishment of a given attentional set and modulate the effects of attentional capture. Here we provide evidence for such a mechanism, showing that implicitly learned associations between a search target and its likely color directly influence the ability of a salient color precue to capture attention in a classic attentional capture task. This indicates a novel role for statistical learning in the modulation of attentional capture, and emphasizes the role that this learning may play in goal-directed attentional control more generally. PMID:24099589

  18. Analysis of gene set using shrinkage covariance matrix approach

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Aripin, Rasimah

    2013-09-01

    Microarray methodology has been exploited for different applications such as gene discovery and disease diagnosis. This technology is also used for quantitative and highly parallel measurements of gene expression. Recently, microarrays have been one of the main interests of statisticians because they provide a perfect example of the paradigms of modern statistics. In this study, an alternative approach to estimating the covariance matrix is proposed to solve the high dimensionality problem in microarrays. An extension of the traditional Hotelling's T² statistic is constructed for determining the significant gene sets across experimental conditions using a shrinkage approach. Real data sets were used as illustrations to compare the performance of the proposed methods with other methods. The results across the methods are consistent, implying that this approach provides an alternative to existing techniques.

  19. The limitations of simple gene set enrichment analysis assuming gene independence.

    PubMed

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. 2009 as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored due to the significant variance inflation they produced on the enrichment scores and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. PMID:23070592
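
    The variance inflation mentioned in this abstract can be illustrated with a simplified, textbook case that is not taken from the paper: if n gene-level scores each have variance sigma^2 and every pair has the same correlation rho, the variance of their mean is (sigma^2 / n) * (1 + (n - 1) * rho). The function names and the example values are assumptions, and real gene sets have far more complex correlation structure.

```python
def variance_of_mean(sigma2, n, rho):
    """Variance of the mean of n scores with equal pairwise correlation rho."""
    return sigma2 / n * (1 + (n - 1) * rho)


def inflation_factor(n, rho):
    """Ratio of the correlated variance to the independence-based variance."""
    return 1 + (n - 1) * rho


# A hypothetical gene set of 50 genes with unit-variance scores.
for rho in (0.0, 0.05, 0.1):
    print(rho, variance_of_mean(1.0, 50, rho), inflation_factor(50, rho))
```

    Under these assumptions, even a modest correlation of 0.1 in a 50-gene set makes the true variance of the set-level mean about 5.9 times the value obtained by assuming independence.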

  20. Caipirini: using gene sets to rank literature

    PubMed Central

    2012-01-01

    Background Keeping up-to-date with bioscience literature is becoming increasingly challenging. Several recent methods help meet this challenge by allowing literature search to be launched based on lists of abstracts that the user judges to be 'interesting'. Some methods go further by allowing the user to provide a second input set of 'uninteresting' abstracts; these two input sets are then used to search and rank literature by relevance. In this work we present the service 'Caipirini' (http://caipirini.org) that also allows two input sets, but takes the novel approach of allowing ranking of literature based on one or more sets of genes. Results To evaluate the usefulness of Caipirini, we used two test cases, one related to the human cell cycle, and a second related to disease defense mechanisms in Arabidopsis thaliana. In both cases, the new method achieved high precision in finding literature related to the biological mechanisms underlying the input data sets. Conclusions To our knowledge Caipirini is the first service enabling literature search directly based on biological relevance to gene sets; thus, Caipirini gives the research community a new way to unlock hidden knowledge from gene sets derived via high-throughput experiments. PMID:22297131

  1. Comparing Data Sets: Implicit Summaries of the Statistical Properties of Number Sets

    ERIC Educational Resources Information Center

    Morris, Bradley J.; Masnick, Amy M.

    2015-01-01

    Comparing datasets, that is, sets of numbers in context, is a critical skill in higher order cognition. Although much is known about how people compare single numbers, little is known about how number sets are represented and compared. We investigated how subjects compared datasets that varied in their statistical properties, including ratio of…

  2. Statistical Software for spatial analysis of stratigraphic data sets

    SciTech Connect

    2003-04-08

    Stratistics is a tool for statistical analysis of spatially explicit data sets and model output for description and for model-data comparisons. It is intended for the analysis of data sets commonly used in geology, such as gamma ray logs and lithologic sequences, as well as 2-D data such as maps. Stratistics incorporates a far wider range of spatial analysis methods, drawn from multiple disciplines, than are currently available in other packages. These include incorporation of techniques from spatial and landscape ecology, fractal analysis, and mathematical geology. Its use should substantially reduce the risk associated with the use of predictive models.

  3. Statistical Software for spatial analysis of stratigraphic data sets

    Energy Science and Technology Software Center (ESTSC)

    2003-04-08

    Stratistics is a tool for statistical analysis of spatially explicit data sets and model output for description and for model-data comparisons. It is intended for the analysis of data sets commonly used in geology, such as gamma ray logs and lithologic sequences, as well as 2-D data such as maps. Stratistics incorporates a far wider range of spatial analysis methods, drawn from multiple disciplines, than are currently available in other packages. These include incorporation of techniques from spatial and landscape ecology, fractal analysis, and mathematical geology. Its use should substantially reduce the risk associated with the use of predictive models.

  4. De-correlating expression in gene-set analysis

    PubMed Central

    Nam, Dougu

    2010-01-01

    Motivation: Group-wise pattern analysis of genes, known as gene-set analysis (GSA), addresses the differential expression pattern of biologically pre-defined gene sets. GSA exhibits high statistical power and has revealed many novel biological processes associated with specific phenotypes. In most cases, however, GSA relies on the invalid assumption that the members of each gene set are sampled independently, which increases false predictions. Results: We propose an algorithm, termed DECO, to remove (or alleviate) the bias caused by the correlation of the expression data in GSAs. This is accomplished through the eigenvalue-decomposition of covariance matrixes and a series of linear transformations of data. In particular, moderate de-correlation methods that truncate or re-scale eigenvalues were proposed for a more reliable analysis. Tests of simulated and real experimental data show that DECO effectively corrects the correlation structure of gene expression and improves the prediction accuracy (specificity and sensitivity) for both gene- and sample-randomizing GSA methods. Availability: The MATLAB codes and the tested data sets are available at ftp://deco.nims.re.kr/pub or from the author. Contact: dougnam@unist.ac.kr PMID:20823315

  5. Subdimensional geo-localization from finite set statistics

    NASA Astrophysics Data System (ADS)

    Boyle, Frank

    2013-05-01

    In practical circumstances, a problem that often occurs is to geo-localize an entity from surface-level imagery given wide area overhead information and other a priori information that might be used to relate the two views. Given a finite set of GMTI returns and surface-level imagery of a common region of space, we propose a statistical algorithm for the association of surface-level one-dimensional measurements of the finite set to entities of the shared-dimensional wide area overview. Specifically, the problem of fused tracking without reliable range information from a surface-level view of a subset of entities is solved by the association of projections of 3-dimensional movement and position measurements of the GMTI and surface-level imagery. In this process the position of the surface level observer is refined. We expand this algorithm to a set of surface level observers distributed over the region of interest and propose a system of continuous tracking of entities over congested areas. The fusion search algorithm exploits the invariant metric properties of projection in a matched-filter procedure as well as the partial ordering of local apparent depth of objects. We achieve O(N) convergence thereby making this algorithm practical for large-N searches. The algorithm is demonstrated analytically and by simulation.

  6. STATISTICS OF DARK MATTER HALOS FROM THE EXCURSION SET APPROACH

    SciTech Connect

    Lapi, A.; Salucci, P.; Danese, L.

    2013-08-01

    We exploit the excursion set approach in integral formulation to derive novel, accurate analytic approximations of the unconditional and conditional first crossing distributions for random walks with uncorrelated steps and general shapes of the moving barrier; we find the corresponding approximations of the unconditional and conditional halo mass functions for cold dark matter (DM) power spectra to represent very well the outcomes of state-of-the-art cosmological N-body simulations. In addition, we apply these results to derive, and confront with simulations, other quantities of interest in halo statistics, including the rates of halo formation and creation, the average halo growth history, and the halo bias. Finally, we discuss how our approach and main results change when considering random walks with correlated instead of uncorrelated steps, and warm instead of cold DM power spectra.

  7. Applying Statistical Process Quality Control Methodology to Educational Settings.

    ERIC Educational Resources Information Center

    Blumberg, Carol Joyce

    A subset of Statistical Process Control (SPC) methodology known as Control Charting is introduced. SPC methodology is a collection of graphical and inferential statistics techniques used to study the progress of phenomena over time. The types of control charts covered are the X-bar (mean), R (Range), X (individual observations), MR (moving…
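
As a rough illustration of the mean and range charts named in this abstract, the sketch below computes centre lines and control limits from subgroups of five observations. The constants are the commonly tabulated values for subgroup size 5. The function name `xbar_r_limits` and the scores are made up for the example and are not taken from the paper.

```python
# Commonly tabulated control-chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return (lower limit, centre line, upper limit) for the mean and range charts."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * mean_range, grand_mean, grand_mean + A2 * mean_range),
        "r": (D3 * mean_range, mean_range, D4 * mean_range),
    }

# Hypothetical quiz scores, two subgroups of five students each.
scores = [[10, 12, 11, 13, 9], [11, 10, 12, 10, 12]]
limits = xbar_r_limits(scores)
```

A subgroup mean or range falling outside its limits would be flagged for closer inspection.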

  8. Chronic periodontitis genome-wide association studies: gene-centric and gene set enrichment analyses.

    PubMed

    Rhodin, K; Divaris, K; North, K E; Barros, S P; Moss, K; Beck, J D; Offenbacher, S

    2014-09-01

    Recent genome-wide association studies (GWAS) of chronic periodontitis (CP) offer rich data sources for the investigation of candidate genes, functional elements, and pathways. We used GWAS data of CP (n = 4,504) and periodontal pathogen colonization (n = 1,020) from a cohort of adult Americans of European descent participating in the Atherosclerosis Risk in Communities study and employed a MAGENTA approach (i.e., meta-analysis gene set enrichment of variant associations) to obtain gene-centric and gene set association results corrected for gene size, number of single-nucleotide polymorphisms, and local linkage disequilibrium characteristics based on the human genome build 18 (National Center for Biotechnology Information build 36). We used the Gene Ontology, Ingenuity, KEGG, Panther, Reactome, and Biocarta databases for gene set enrichment analyses. Six genes showed evidence of statistically significant association: 4 with severe CP (NIN, p = 1.6 × 10⁻⁷; ABHD12B, p = 3.6 × 10⁻⁷; WHAMM, p = 1.7 × 10⁻⁶; AP3B2, p = 2.2 × 10⁻⁶) and 2 with high periodontal pathogen colonization (red complex-KCNK1, p = 3.4 × 10⁻⁷; Porphyromonas gingivalis-DAB2IP, p = 1.0 × 10⁻⁶). Top-ranked genes for moderate CP were HGD (p = 1.4 × 10⁻⁵), ZNF675 (p = 1.5 × 10⁻⁵), TNFRSF10C (p = 2.0 × 10⁻⁵), and EMR1 (p = 2.0 × 10⁻⁵). Loci containing NIN, EMR1, KCNK1, and DAB2IP had shown suggestive evidence of association in the earlier single-nucleotide polymorphism-based analyses, whereas WHAMM and AP3B2 emerged as novel candidates. The top gene sets included severe CP ("endoplasmic reticulum membrane," "cytochrome P450," "microsome," and "oxidation reduction") and moderate CP ("regulation of gene expression," "zinc ion binding," "BMP signaling pathway," and "ruffle"). Gene-centric analyses offer a promising avenue for efficient interrogation of large-scale GWAS data. These results highlight genes in previously identified loci and new candidate genes and pathways

  9. GO-based Functional Dissimilarity of Gene Sets

    PubMed Central

    2011-01-01

    Background The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, identifying the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes, and a quantification of functional coherence can help to clarify the role of a group of genes working together. Results To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Conclusions Results show that GFD performs robustly when applied to gene sets of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD. PMID:21884611

  10. Simple Data Sets for Distinct Basic Summary Statistics

    ERIC Educational Resources Information Center

    Lesser, Lawrence M.

    2011-01-01

    It is important to avoid ambiguity with numbers because unfortunate choices of numbers can inadvertently make it possible for students to form misconceptions or make it difficult for teachers to tell if students obtained the right answer for the right reason. Therefore, it is important to make sure when introducing basic summary statistics that…

  11. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
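
For orientation, the unweighted baseline that this abstract contrasts LEGO with can be sketched as a one-sided Fisher's exact (hypergeometric) test. This is not LEGO itself, which adds network-based gene weights. The function name `ora_pvalue` and the counts in the usage lines are hypothetical.

```python
from math import comb

def ora_pvalue(N, K, n, k):
    """One-sided over-representation p-value, P(X >= k).

    N: genes in the background, K: genes in the pathway,
    n: genes in the query list, k: query genes that fall in the pathway.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical counts: all 3 query genes fall in a 3-gene pathway out of 10 genes.
p_small = ora_pvalue(10, 3, 3, 3)
# An overlap of zero or more is certain, so the p-value is 1.
p_trivial = ora_pvalue(20, 5, 5, 0)
```

Every gene in the pathway counts equally here, which is the simplification the abstract criticizes.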

  12. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448

  13. Using an ensemble of statistical metrics to quantify large sets of plant transcription factor binding sites

    PubMed Central

    2013-01-01

    Background From initial seed germination through reproduction, plants continuously reprogram their transcriptional repertoire to facilitate growth and development. This dynamic is mediated by a diverse but inextricably-linked catalog of regulatory proteins called transcription factors (TFs). Statistically quantifying TF binding site (TFBS) abundance in promoters of differentially expressed genes can be used to identify binding site patterns in promoters that are closely related to stress-response. Output from today’s transcriptomic assays necessitates statistically-oriented software to handle large promoter-sequence sets in a computationally tractable fashion. Results We present Marina, an open-source software for identifying over-represented TFBSs from amongst large sets of promoter sequences, using an ensemble of 7 statistical metrics and binding-site profiles. Through software comparison, we show that Marina can identify considerably more over-represented plant TFBSs compared to a popular software alternative. Conclusions Marina was used to identify over-represented TFBSs in a two time-point RNA-Seq study exploring the transcriptomic interplay between soybean (Glycine max) and soybean rust (Phakopsora pachyrhizi). Marina identified numerous abundant TFBSs recognized by transcription factors that are associated with defense-response such as WRKY, HY5 and MYB2. Comparing results from Marina to that of a popular software alternative suggests that regardless of the number of promoter-sequences, Marina is able to identify significantly more over-represented TFBSs. PMID:23578135

  14. SiBIC: A Web Server for Generating Gene Set Networks Based on Biclusters Obtained by Maximal Frequent Itemset Mining

    PubMed Central

    Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi

    2013-01-01

    Detecting biclusters from expression data is useful, since biclusters are coexpressed genes under only part of all given experimental conditions. We present a software called SiBIC, which from a given expression dataset, first exhaustively enumerates biclusters, which are then merged into rather independent biclusters, which finally are used to generate gene set networks, in which a gene set assigned to one node has coexpressed genes. We evaluated each step of this procedure: 1) significance of the generated biclusters biologically and statistically, 2) biological quality of merged biclusters, and 3) biological significance of gene set networks. We emphasize that gene set networks, in which nodes are not genes but gene sets, can be more compact than usual gene networks, meaning that gene set networks are more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp. PMID:24386124

  15. SiBIC: a web server for generating gene set networks based on biclusters obtained by maximal frequent itemset mining.

    PubMed

    Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi

    2013-01-01

    Detecting biclusters from expression data is useful, since biclusters are coexpressed genes under only part of all given experimental conditions. We present a software called SiBIC, which from a given expression dataset, first exhaustively enumerates biclusters, which are then merged into rather independent biclusters, which finally are used to generate gene set networks, in which a gene set assigned to one node has coexpressed genes. We evaluated each step of this procedure: 1) significance of the generated biclusters biologically and statistically, 2) biological quality of merged biclusters, and 3) biological significance of gene set networks. We emphasize that gene set networks, in which nodes are not genes but gene sets, can be more compact than usual gene networks, meaning that gene set networks are more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp. PMID:24386124

  16. Principles for the organization of gene-sets.

    PubMed

    Li, Wentian; Freudenberg, Jan; Oswald, Michaela

    2015-12-01

    A gene-set, an important concept in microarray expression analysis and systems biology, is a collection of genes and/or their products (i.e. proteins) that have some features in common. There are many different ways to construct gene-sets, but a systematic organization of these ways is lacking. Gene-sets are mainly organized ad hoc in current public-domain databases, with group header names often determined by practical reasons (such as the types of technology in obtaining the gene-sets or a balanced number of gene-sets under a header). Here we aim at providing a gene-set organization principle according to the level at which genes are connected: homology, physical map proximity, chemical interaction, biological, and phenotypic-medical levels. We also distinguish two types of connections between genes: actual connection versus sharing of a label. Actual connections denote direct biological interactions, whereas shared label connection denotes shared membership in a group. Some extensions of the framework are also addressed such as overlapping of gene-sets, modules, and the incorporation of other non-protein-coding entities such as microRNAs. PMID:26188561

  17. The Effect of Distributed Practice in Undergraduate Statistics Homework Sets: A Randomized Trial

    ERIC Educational Resources Information Center

    Crissinger, Bryan R.

    2015-01-01

    Most homework sets in statistics courses are constructed so that students concentrate or "mass" their practice on a certain topic in one problem set. Distributed practice homework sets include review problems in each set so that practice on a topic is distributed across problem sets. There is a body of research that points to the…

  18. Excursion sets and non-Gaussian void statistics

    NASA Astrophysics Data System (ADS)

    D'Amico, Guido; Musso, Marcello; Noreña, Jorge; Paranjape, Aseem

    2011-01-01

    Primordial non-Gaussianity (NG) affects the large scale structure (LSS) of the Universe by leaving an imprint on the distribution of matter at late times. Much attention has been focused on using the distribution of collapsed objects (i.e. dark matter halos and the galaxies and galaxy clusters that reside in them) to probe primordial NG. An equally interesting and complementary probe however is the abundance of extended underdense regions or voids in the LSS. The calculation of the abundance of voids using the excursion set formalism in the presence of primordial NG is subject to the same technical issues as the one for halos, which were discussed e.g. in Ref. [G. D’Amico, M. Musso, J. Noreña, and A. Paranjape, arXiv:1005.1203.]. However, unlike the excursion set problem for halos which involved random walks in the presence of one barrier δc, the void excursion set problem involves two barriers δv and δc. This leads to a new complication introduced by what is called the “void-in-cloud” effect discussed in the literature, which is unique to the case of voids. We explore a path integral approach which allows us to carefully account for all these issues, leading to a rigorous derivation of the effects of primordial NG on void abundances. The void-in-cloud issue, in particular, makes the calculation conceptually rather different from the one for halos. However, we show that its final effect can be described by a simple yet accurate approximation. Our final void abundance function is valid on larger scales than the expressions of other authors, while being broadly in agreement with those expressions on smaller scales.

  19. Excursion sets and non-Gaussian void statistics

    SciTech Connect

    D'Amico, Guido; Musso, Marcello; Paranjape, Aseem; Norena, Jorge

    2011-01-15

    Primordial non-Gaussianity (NG) affects the large scale structure (LSS) of the Universe by leaving an imprint on the distribution of matter at late times. Much attention has been focused on using the distribution of collapsed objects (i.e. dark matter halos and the galaxies and galaxy clusters that reside in them) to probe primordial NG. An equally interesting and complementary probe however is the abundance of extended underdense regions or voids in the LSS. The calculation of the abundance of voids using the excursion set formalism in the presence of primordial NG is subject to the same technical issues as the one for halos, which were discussed e.g. in Ref. [51][G. D'Amico, M. Musso, J. Norena, and A. Paranjape, arXiv:1005.1203.]. However, unlike the excursion set problem for halos which involved random walks in the presence of one barrier δc, the void excursion set problem involves two barriers δv and δc. This leads to a new complication introduced by what is called the 'void-in-cloud' effect discussed in the literature, which is unique to the case of voids. We explore a path integral approach which allows us to carefully account for all these issues, leading to a rigorous derivation of the effects of primordial NG on void abundances. The void-in-cloud issue, in particular, makes the calculation conceptually rather different from the one for halos. However, we show that its final effect can be described by a simple yet accurate approximation. Our final void abundance function is valid on larger scales than the expressions of other authors, while being broadly in agreement with those expressions on smaller scales.

  20. Gene coexpression measures in large heterogeneous samples using count statistics.

    PubMed

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance. PMID:25288767
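
The idea of counting local patterns of expression ranks can be illustrated with a simplified sketch. It is not the authors' statistic. It only counts contiguous windows of length `k` in which two expression profiles have identical or exactly reversed rank patterns, and the function names and profiles are hypothetical.

```python
def rank_pattern(values):
    """Return the rank of each value, e.g. [0.3, 0.1, 0.9] -> (1, 0, 2)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return tuple(ranks)

def local_pattern_count(x, y, k=3):
    """Count length-k windows where x and y share a rank pattern or its reverse."""
    matches = 0
    for start in range(len(x) - k + 1):
        px = rank_pattern(x[start:start + k])
        py = rank_pattern(y[start:start + k])
        reversed_py = tuple(k - 1 - r for r in py)   # pattern of a negatively related gene
        if px == py or px == reversed_py:
            matches += 1
    return matches

# Hypothetical profiles that agree over the first samples only.
gene_x = [1, 2, 3, 4, 0, 5]
gene_y = [2, 4, 6, 8, 1, 3]
count = local_pattern_count(gene_x, gene_y, k=3)
```

Because only ranks within each window matter, a single extreme value changes few windows, which is in the spirit of the robustness to outliers the abstract reports.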

  1. Gene coexpression measures in large heterogeneous samples using count statistics

    PubMed Central

    Wang, Y. X. Rachel; Waterman, Michael S.; Huang, Haiyan

    2014-01-01

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the “big data” challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance. PMID:25288767

  2. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool

    PubMed Central

    Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi

    2016-01-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405

  3. Identification of pleiotropic genes and gene sets underlying growth and immunity traits: a case study on Meishan pigs.

    PubMed

    Zhang, Z; Wang, Z; Yang, Y; Zhao, J; Chen, Q; Liao, R; Chen, Z; Zhang, X; Xue, M; Yang, H; Zheng, Y; Wang, Q; Pan, Y

    2016-04-01

    Both growth and immune capacity are important traits in animal breeding. The animal quantitative trait loci (QTL) database is a valuable resource and can be used for interpreting the genetic mechanisms that underlie growth and immune traits. However, QTL intervals often involve too many candidate genes to find the true causal genes. Therefore, the aim of this study was to provide an effective annotation pipeline that can make full use of the information of Gene Ontology terms annotation, linkage gene blocks and pathways to further identify pleiotropic genes and gene sets in the overlapping intervals of growth-related and immunity-related QTLs. In total, 55 non-redundant QTL overlapping intervals were identified, 1893 growth-related genes and 713 immunity-related genes were further classified into overlapping intervals and 405 pleiotropic genes shared by the two gene sets were determined. In addition, 19 pleiotropic gene linkage blocks and 67 pathways related to immunity and growth traits were discovered. A total of 343 growth-related genes and 144 immunity-related genes involved in pleiotropic pathways were also identified, respectively. We also sequenced and genotyped 284 individuals from Chinese Meishan pigs and European pigs and mapped the single nucleotide polymorphisms (SNPs) to the pleiotropic genes and gene sets that we identified. A total of 971 high-confidence SNPs were mapped to the pleiotropic genes and gene sets that we identified, and among them 743 SNPs were statistically significant in allele frequency between Meishan and European pigs. This study explores the relationship between growth and immunity traits from the view of QTL overlapping intervals and can be generalized to explore the relationships between other traits. PMID:26689779

  4. Coexpression analysis of human genes across many microarray data sets.

    PubMed

    Lee, Homin K; Hsu, Amy K; Sajdak, Jon; Qin, Jie; Pavlidis, Paul

    2004-06-01

    We present a large-scale analysis of mRNA coexpression based on 60 large human data sets containing a total of 3924 microarrays. We sought pairs of genes that were reliably coexpressed (based on the correlation of their expression profiles) in multiple data sets, establishing a high-confidence network of 8805 genes connected by 220,649 "coexpression links" that are observed in at least three data sets. Confirmed positive correlations between genes were much more common than confirmed negative correlations. We show that confirmation of coexpression in multiple data sets is correlated with functional relatedness, and show how cluster analysis of the network can reveal functionally coherent groups of genes. Our findings demonstrate how the large body of accumulated microarray data can be exploited to increase the reliability of inferences about gene function. PMID:15173114
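
The confirmation rule described here (keep a coexpression link only if it is observed in at least three data sets) can be sketched as a simple tally. The gene names and link lists are toy values, and the function name `confirmed_links` is hypothetical. How each data set's links are called from correlations is outside this sketch.

```python
from collections import Counter

def confirmed_links(links_per_dataset, min_support=3):
    """Return gene pairs that appear as links in at least min_support data sets."""
    support = Counter()
    for links in links_per_dataset:
        # frozenset makes (a, b) and (b, a) the same link; the set removes repeats
        support.update({frozenset(pair) for pair in links})
    return {pair for pair, n in support.items() if n >= min_support}

# Toy example: four data sets, each a list of coexpressed gene pairs.
datasets = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("A", "B"), ("B", "C")],
    [("C", "B"), ("A", "D")],
]
network = confirmed_links(datasets)
```

In this toy input the A-B and B-C links each appear in three data sets and are kept, while C-D and A-D appear once and are dropped.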

  5. Functional-Network-Based Gene Set Analysis Using Gene-Ontology

    PubMed Central

    Chang, Billy; Kustra, Rafal; Tian, Weidong

    2013-01-01

    To account for the functional non-equivalence among a set of genes within a biological pathway when performing gene set analysis, we introduce GOGANPA, a network-based gene set analysis method, which up-weights genes with functions relevant to the gene set of interest. Each gene is weighted according to its degree within a genome-scale functional network constructed using the functional annotations available from the gene ontology database. By benchmarking GOGANPA using a well-studied P53 data set and three breast cancer data sets, we will demonstrate the power and reproducibility of our proposed method over traditional unweighted approaches and a competing network-based approach that involves a complex integrated network. GOGANPA’s sole reliance on gene ontology further allows GOGANPA to be widely applicable to the analysis of any gene-ontology-annotated genome. PMID:23418449

  6. Statistical Assessment of Crosstalk Enrichment between Gene Groups in Biological Networks

    PubMed Central

    Alexeyenko, Andrey; Sonnhammer, Erik L. L.

    2013-01-01

    Motivation Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Results Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). Availability and Implementation CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions. PMID:23372799
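
The z-score at the centre of this abstract compares an observed number of links between two gene groups with the counts obtained from randomized networks. The sketch below assumes the null counts are already available. The randomization procedure is not reproduced, and the edge list, groups, and null counts are placeholder values rather than CrossTalkZ output.

```python
from statistics import mean, stdev

def crosstalk_links(edges, group_a, group_b):
    """Count edges with one endpoint in group_a and the other in group_b."""
    a, b = set(group_a), set(group_b)
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def z_score(observed, null_counts):
    """Standard z-score of the observed count against the null distribution."""
    return (observed - mean(null_counts)) / stdev(null_counts)

# Toy network and groups (hypothetical gene identifiers).
edges = [("g1", "g4"), ("g2", "g5"), ("g1", "g2"), ("g3", "g6")]
observed = crosstalk_links(edges, ["g1", "g2", "g3"], ["g4", "g5", "g6"])

# Placeholder link counts from five randomized networks.
null_counts = [1, 2, 1, 0, 1]
z = z_score(observed, null_counts)
```

A large positive `z` indicates more links between the groups than the randomized networks produce. Converting it to a p-value relies on the null counts being approximately normal, which is presumably why the tool also reports a test of normality.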

  7. Statistical aspect of trait mapping using a dense set of markers: A partial review

    SciTech Connect

    Dupuis, J.

    1996-12-31

    This paper presents a review of statistical methods used to locate trait loci using maps of markers spanning the whole genome. Such maps are becoming readily available and can be especially useful in mapping traits that are non-Mendelian. Genome-wide search for a trait locus is often called a "global search". Global search methods include, but are not restricted to, identifying disease susceptibility genes using affected relative pairs, finding quantitative trait loci in experimental organisms and locating quantitative trait loci in humans. For human linkage, we concentrate on methods using pairs of affected relatives rather than pedigree analysis. We begin in the next section with a review of work on the use of affected pairs of relatives to identify gene loci that increase susceptibility to a particular disease. We first review Risch's 1990 series of papers. Risch's method can be used to search the entire genome for such susceptibility genes. Using Risch's idea Elston explored the issue of how many pairs and markers are necessary to reach a certain probability of detecting a locus if there exists one. He proposed a more economical two stage design that uses few markers at the first stage but adds markers around the "promising" area of the genome at the second stage. However, Risch and Elston do not use multipoint linkage analysis, which takes into account all markers at once (rather than one at a time) in the calculation of the test statistic. Such multipoint methods for affected relatives have been developed by Feingold and Feingold et al. The last authors' multipoint method is based on a continuous specification of identity by descent between the affected relatives but can also be used for a set of linked markers spanning the genome. A brief description of their method and treatment of more complex issues such as combining relative pairs is included. 29 refs., 4 tabs.

  8. A Bayesian variable selection procedure to rank overlapping gene sets

    PubMed Central

    2012-01-01

    Background Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations. Conclusions Bayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize. PMID:22554182

  9. From Biophysics to Evolutionary Genetics: Statistical Aspects of Gene Regulation

    NASA Astrophysics Data System (ADS)

    Lässig, Michael

    Genomic functions often cannot be understood at the level of single genes but require the study of gene networks. This systems biology credo is nearly commonplace by now. Evidence comes from the comparative analysis of entire genomes: current estimates put, for example, the number of human genes at around 22,000, hardly more than the 14,000 of the fruit fly, and not even an order of magnitude higher than the 6,000 of baker's yeast. The complexity and diversity of higher animals, therefore, cannot be explained in terms of their gene numbers. If, however, a biological function requires the concerted action of several genes, and conversely, a gene takes part in several functional contexts, an organism may be defined less by its individual genes than by their interactions. The emerging picture of the genome as a strongly interacting system with many degrees of freedom brings new challenges for experiment and theory, many of which are of a statistical nature. And indeed, this picture continues to make the subject attractive to a growing number of statistical physicists.

  10. Validation of a set of reference genes to study response to herbicide stress in grasses

    PubMed Central

    2012-01-01

    Background Non-target-site based resistance to herbicides is a major threat to the chemical control of agronomically noxious weeds. This adaptive trait is endowed by differences in the expression of a number of genes in plants that are resistant or sensitive to herbicides. Quantification of the expression of such genes requires normalising qPCR data using reference genes with stable expression in the system studied as internal standards. The aim of this study was to validate reference genes in Alopecurus myosuroides, a grass (Poaceae) weed of economic and agronomic importance with no genomic resources. Results The stability of 11 candidate reference genes was assessed in plants resistant or sensitive to herbicides subjected or not to herbicide stress using the complementary statistical methods implemented by NormFinder, BestKeeper and geNorm. Ubiquitin, beta-tubulin and glyceraldehyde-3-phosphate dehydrogenase were identified as the best reference genes. The reference gene set accuracy was confirmed by analysing the expression of the gene encoding acetyl-coenzyme A carboxylase, a major herbicide target enzyme, and of an herbicide-induced gene encoding a glutathione-S-transferase. Conclusions This is the first study describing a set of reference genes (ubiquitin, beta-tubulin and glyceraldehyde-3-phosphate dehydrogenase) with a stable expression under herbicide stress in grasses. These genes are also candidate reference genes of choice for studies seeking to identify stress-responsive genes in grasses. PMID:22233533

  11. Integrated gene set analysis for microRNA studies

    PubMed Central

    Garcia-Garcia, Francisco; Panadero, Joaquin; Dopazo, Joaquin; Montaner, David

    2016-01-01

    Motivation: Functional interpretation of miRNA expression data is currently done in a three-step procedure: select differentially expressed miRNAs, find their target genes, and carry out gene set overrepresentation analysis. Nevertheless, major limitations of this approach have already been described at the gene level, while some newer ones arise in the miRNA scenario. Here, we propose an enhanced methodology that builds on the well-established gene set analysis paradigm. Evidence for differential expression at the miRNA level is transferred to a gene differential inhibition score which is easily interpretable in terms of gene sets or pathways. Such transferred indexes account for the additive effect of several miRNAs targeting the same gene, and also incorporate cancellation effects between cases and controls. Together, these two desirable characteristics allow for more accurate modeling of regulatory processes. Results: We analyze high-throughput sequencing data from 20 different cancer types and provide exhaustive reports of gene and Gene Ontology-term deregulation by miRNA action. Availability and Implementation: The proposed methodology was implemented in the Bioconductor library mdgsa (http://bioconductor.org/packages/mdgsa). For the purpose of reproducibility all of the scripts are available at https://github.com/dmontaner-papers/gsa4mirna Contact: david.montaner@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27324197
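
The final step of the conventional three-step procedure named in this abstract, gene set overrepresentation analysis, is commonly carried out with a one-sided hypergeometric test. The block below is a minimal sketch of that generic test only; it is not the mdgsa methodology, and the function name, argument names and the numbers in the usage lines are illustrative assumptions.

```python
from math import comb


def overrepresentation_pvalue(universe_size, set_size, selected_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: number of genes considered in total
    set_size:      number of those genes annotated to the gene set
    selected_size: number of genes selected (e.g. targets of selected miRNAs)
    overlap:       number of selected genes that belong to the gene set
    """
    total = comb(universe_size, selected_size)
    upper = min(set_size, selected_size)
    # Sum the probabilities of seeing `overlap` or more set members by chance.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes in total, 5 in the set, 5 selected, all 5 overlap.
p = overrepresentation_pvalue(10, 5, 5, 5)
print(p)
```

In practice one p-value is computed per gene set and the results are corrected for multiple testing; that step is omitted here.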

  12. A Hybrid Approach of Gene Sets and Single Genes for the Prediction of Survival Risks with Gene Expression Data

    PubMed Central

    Seok, Junhee; Davis, Ronald W.; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn’t been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge. PMID:25933378

  13. Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

    NASA Astrophysics Data System (ADS)

    Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

    2010-10-01

    Studying periodic patterns would be expected to be a standard line of attack for recognizing DNA sequences in gene identification and similar problems, yet surprisingly little significant work has been done in this direction. This paper studies statistical properties of DNA sequences of complete genomes using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and a standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
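
The general idea of mapping a DNA sequence to numbers and inspecting its Fourier spectrum can be sketched as follows. This is an illustration under stated assumptions, not the authors' technique: it assumes a binary indicator mapping (one 0/1 sequence per base), which is only one of many possible mappings, and the function name and toy sequence are hypothetical.

```python
import cmath


def indicator_power(seq, period):
    """Spectral power of a DNA string at frequency 1/period.

    Each base (A, C, G, T) is mapped to a 0/1 indicator sequence; the squared
    magnitudes of the four Fourier coefficients at the chosen frequency are
    summed and divided by the sequence length.
    """
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of the indicator sequence for this base.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / period)
            for i, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n


# Toy sequence with an exact 3-base repeat: power concentrates at period 3.
toy = "ATG" * 30
print(indicator_power(toy, 3), indicator_power(toy, 4))
```

A pronounced peak at period 3 is the classic signature associated with protein-coding regions; a real analysis would scan windows along the genome rather than a single toy string.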

  14. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics

    USGS Publications Warehouse

    Antweiler, R.C.; Taylor, H.E.

    2008-01-01

    The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except substituting zero or the detection limit value for censored data.
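
One of the substitution methods described in this abstract, assigning one-half the detection limit to censored values, can be sketched in a few lines. This is a minimal sketch only: it assumes a single detection limit and represents censored observations as `None`, and all names and numbers are illustrative rather than taken from the study.

```python
from statistics import mean


def censored_fraction(values):
    """Fraction of observations reported as below the detection limit (None)."""
    return sum(v is None for v in values) / len(values)


def substitute_half_dl(values, detection_limit):
    """Replace each censored observation (None) with half the detection limit."""
    return [detection_limit / 2 if v is None else v for v in values]


# Hypothetical measurements with a detection limit of 1.0; None = below detection.
obs = [None, 2.0, 4.0]
filled = substitute_half_dl(obs, 1.0)
print(censored_fraction(obs), mean(filled))
```

Checking the censored fraction first matters because, per the abstract, no treatment gave good summary statistics above 70% censoring.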

  15. Turning publicly available gene expression data into discoveries using gene set context analysis

    PubMed Central

    Ji, Zhicheng; Vokes, Steven A.; Dang, Chi V.; Ji, Hongkai

    2016-01-01

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. PMID:26350211

  16. WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis

    PubMed Central

    Glez-Peña, Daniel; Gómez-López, Gonzalo; Pisano, David G.; Fdez-Riverola, Florentino

    2009-01-01

    WhichGenes is a web-based interactive gene set building tool offering a very simple interface to extract always-updated gene lists from multiple databases and unstructured biological data sources. While the user can specify new gene sets of interest by following a simple four-step wizard, the tool is able to run several queries in parallel. Every time a new set is generated, it is automatically added to the private gene-set cart and the user is notified by an e-mail containing a direct link to the new set stored in the server. WhichGenes provides functionalities to edit, delete and rename existing sets as well as the capability of generating new ones by combining previous existing sets (intersection, union and difference operators). The user can export his sets configuring the output format and selecting among multiple gene identifiers. In addition to the user-friendly environment, WhichGenes allows programmers to access its functionalities in a programmatic way through a Representational State Transfer web service. WhichGenes front-end is freely available at http://www.whichgenes.org/, WhichGenes API is accessible at http://www.whichgenes.org/api/. PMID:19406925

  17. Differential Effects of Goal Setting and Value Reappraisal on College Women's Motivation and Achievement in Statistics

    ERIC Educational Resources Information Center

    Acee, Taylor Wayne

    2009-01-01

    The purpose of this dissertation was to investigate the differential effects of goal setting and value reappraisal on female students' self-efficacy beliefs, value perceptions, exam performance and continued interest in statistics. It was hypothesized that the Enhanced Goal Setting Intervention (GS-E) would positively impact students'…

  18. A unified set-based test with adaptive filtering for gene-environment interaction analyses.

    PubMed

    Liu, Qianying; Chen, Lin S; Nicolae, Dan L; Pierce, Brandon L

    2016-06-01

    In genome-wide gene-environment interaction (GxE) studies, a common strategy to improve power is to first conduct a filtering test and retain only the SNPs that pass the filtering in the subsequent GxE analyses. Inspired by two-stage tests and gene-based tests in GxE analysis, we consider the general problem of jointly testing a set of parameters when only a few are truly from the alternative hypothesis and when filtering information is available. We propose a unified set-based test that simultaneously considers filtering on individual parameters and testing on the set. We derive the exact distribution and approximate the power function of the proposed unified statistic in simplified settings, and use them to adaptively calculate the optimal filtering threshold for each set. In the context of gene-based GxE analysis, we show that although the empirical power function may be affected by many factors, the optimal filtering threshold corresponding to the peak of the power curve primarily depends on the size of the gene. We further propose a resampling algorithm to calculate P-values for each gene given the estimated optimal filtering threshold. The performance of the method is evaluated in simulation studies and illustrated via a genome-wide gene-gender interaction analysis using pancreatic cancer genome-wide association data. PMID:26496228

  19. A unified set-based test with adaptive filtering for gene-environment interaction analyses

    PubMed Central

    Liu, Qianying; Chen, Lin S.; Nicolae, Dan L.; Pierce, Brandon L.

    2015-01-01

    Summary In genome-wide gene-environment interaction (GxE) studies, a common strategy to improve power is to first conduct a filtering test and retain only the SNPs that pass the filtering in the subsequent GxE analyses. Inspired by two-stage tests and gene-based tests in GxE analysis, we consider the general problem of jointly testing a set of parameters when only a few are truly from the alternative hypothesis and when filtering information is available. We propose a unified set-based test that simultaneously considers filtering on individual parameters and testing on the set. We derive the exact distribution and approximate the power function of the proposed unified statistic in simplified settings, and use them to adaptively calculate the optimal filtering threshold for each set. In the context of gene-based GxE analysis, we show that although the empirical power function may be affected by many factors, the optimal filtering threshold corresponding to the peak of the power curve primarily depends on the size of the gene. We further propose a resampling algorithm to calculate p-values for each gene given the estimated optimal filtering threshold. The performance of the method is evaluated in simulation studies and illustrated via a genome-wide gene-gender interaction analysis using pancreatic cancer genome-wide association data. PMID:26496228

  20. TransFind—predicting transcriptional regulators for gene sets

    PubMed Central

    Kiełbasa, Szymon M.; Klein, Holger; Roider, Helge G.; Vingron, Martin; Blüthgen, Nils

    2010-01-01

    The analysis of putative transcription factor binding sites in promoter regions of coregulated genes allows one to infer the transcription factors that underlie observed changes in gene expression. While such analyses constitute a central component of the in-silico characterization of transcriptional regulatory networks, there is still a lack of simple-to-use web servers that combine state-of-the-art prediction methods with phylogenetic analysis and appropriate multiple-testing-corrected statistics and that return results within a short time. With these aims in mind, we developed TransFind, which is freely available at http://transfind.sys-bio.net/. PMID:20511592

  1. Statistical framework for phylogenomic analysis of gene family expression profiles.

    PubMed

    Gu, Xun

    2004-05-01

    Microarray technology has produced massive expression data that are invaluable for investigating the genome-wide evolutionary pattern of gene expression. To this end, phylogenetic expression analysis is highly desirable. On the basis of the Brownian process, we developed a statistical framework (called the E(0) model), assuming the independent expression of evolution between lineages. Several evolutionary mechanisms are integrated to characterize the pattern of expression diversity after gene duplications, including gradual drift and dramatic shift (punctuated equilibrium). When the phylogeny of a gene family is given, we show that the likelihood function follows a multivariate normal distribution; the variance-covariance matrix is determined by the phylogenetic topology and evolutionary parameters. Maximum-likelihood methods for multiple microarray experiments are developed, and likelihood-ratio tests are designed for testing the evolutionary pattern of gene expression. To reconstruct the evolutionary trace of expression diversity after gene (or genome) duplications, we developed a Bayesian-based method and use the posterior mean as predictors. Potential applications in evolutionary genomics are discussed. PMID:15166175

  2. Parallel evolution of nacre building gene sets in molluscs.

    PubMed

    Jackson, Daniel J; McDougall, Carmel; Woodcroft, Ben; Moase, Patrick; Rose, Robert A; Kube, Michael; Reinhardt, Richard; Rokhsar, Daniel S; Montagnani, Caroline; Joubert, Caroline; Piquemal, David; Degnan, Bernard M

    2010-03-01

    The capacity to biomineralize is closely linked to the rapid expansion of animal life during the early Cambrian, with many skeletonized phyla first appearing in the fossil record at this time. The appearance of disparate molluscan forms during this period leaves open the possibility that shells evolved independently and in parallel in at least some groups. To test this proposition and gain insight into the evolution of structural genes that contribute to shell fabrication, we compared genes expressed in nacre (mother-of-pearl) forming cells in the mantle of the bivalve Pinctada maxima and the gastropod Haliotis asinina. Despite both species having highly lustrous nacre, we find extensive differences in these expressed gene sets. Following the removal of housekeeping genes, less than 10% of all gene clusters are shared between these molluscs, with some being conserved biomineralization genes that are also found in deuterostomes. These differences extend to secreted proteins that may localize to the organic shell matrix, with less than 15% of this secretome being shared. Despite these differences, H. asinina and P. maxima both secrete proteins with repetitive low-complexity domains (RLCDs). Pinctada maxima RLCD proteins (for example, the shematrins) are predominated by silk/fibroin-like domains, which are absent from the H. asinina data set. Comparisons of shematrin genes across three species of Pinctada indicate that this gene family has undergone extensive divergent evolution within pearl oysters. We also detect fundamental bivalve-gastropod differences in extracellular matrix proteins involved in mollusc-shell formation. Pinctada maxima expresses a chitin synthase at high levels and several chitin deacetylation genes, whereas only one protein involved in chitin interactions is present in the H. asinina data set, suggesting that the organic matrix on which calcification proceeds differs fundamentally between these species. Large-scale differences in genes expressed…

  3. 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns

    PubMed Central

    Hastie, Trevor; Tibshirani, Robert; Eisen, Michael B; Alizadeh, Ash; Levy, Ronald; Staudt, Louis; Chan, Wing C; Botstein, David; Brown, Patrick

    2000-01-01

    Background: Large gene expression studies, such as those conducted using DNA arrays, often provide millions of different pieces of data. To address the problem of analyzing such data, we describe a statistical method, which we have called 'gene shaving'. The method identifies subsets of genes with coherent expression patterns and large variation across conditions. Gene shaving differs from hierarchical clustering and other widely used methods for analyzing gene expression studies in that genes may belong to more than one cluster, and the clustering may be supervised by an outcome measure. The technique can be 'unsupervised', that is, the genes and samples are treated as unlabeled, or partially or fully supervised by using known properties of the genes or samples to assist in finding meaningful groupings. Results: We illustrate the use of the gene shaving method to analyze gene expression measurements made on samples from patients with diffuse large B-cell lymphoma. The method identifies a small cluster of genes whose expression is highly predictive of survival. Conclusions: The gene shaving method is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worth further investigation. PMID:11178228

  4. The essential gene set of a photosynthetic organism

    PubMed Central

    Rubin, Benjamin E.; Wetmore, Kelly M.; Price, Morgan N.; Diamond, Spencer; Shultzaberger, Ryan K.; Lowe, Laura C.; Curtin, Genevieve; Arkin, Adam P.; Deutschbauer, Adam; Golden, Susan S.

    2015-01-01

    Synechococcus elongatus PCC 7942 is a model organism used for studying photosynthesis and the circadian clock, and it is being developed for the production of fuel, industrial chemicals, and pharmaceuticals. To identify a comprehensive set of genes and intergenic regions that impacts fitness in S. elongatus, we created a pooled library of ∼250,000 transposon mutants and used sequencing to identify the insertion locations. By analyzing the distribution and survival of these mutants, we identified 718 of the organism’s 2,723 genes as essential for survival under laboratory conditions. The validity of the essential gene set is supported by its tight overlap with well-conserved genes and its enrichment for core biological processes. The differences noted between our dataset and these predictors of essentiality, however, have led to surprising biological insights. One such finding is that genes in a large portion of the TCA cycle are dispensable, suggesting that S. elongatus does not require a cyclic TCA process. Furthermore, the density of the transposon mutant library enabled individual and global statements about the essentiality of noncoding RNAs, regulatory elements, and other intergenic regions. In this way, a group I intron located in tRNA(Leu), which has been used extensively for phylogenetic studies, was shown here to be essential for the survival of S. elongatus. Our survey of essentiality for every locus in the S. elongatus genome serves as a powerful resource for understanding the organism’s physiology and defines the essential gene set required for the growth of a photosynthetic organism. PMID:26508635

  5. Textrous!: Extracting Semantic Textual Meaning from Gene Sets

    PubMed Central

    Daimon, Caitlin M.; Siddiqui, Sana; Luttrell, Louis M.; Maudsley, Stuart

    2013-01-01

    The un-biased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically-relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily-appreciable scientific textual ‘tokens’ from large gene sets either rely on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employ Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! has the ability to generate meaningful output data with even very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data. PMID:23646135

  6. Textrous!: extracting semantic textual meaning from gene sets.

    PubMed

    Chen, Hongyu; Martin, Bronwen; Daimon, Caitlin M; Siddiqui, Sana; Luttrell, Louis M; Maudsley, Stuart

    2013-01-01

    The un-biased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically-relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily-appreciable scientific textual 'tokens' from large gene sets either rely on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employ Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! has the ability to generate meaningful output data with even very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data. PMID:23646135

  7. The essential gene set of a photosynthetic organism.

    PubMed

    Rubin, Benjamin E; Wetmore, Kelly M; Price, Morgan N; Diamond, Spencer; Shultzaberger, Ryan K; Lowe, Laura C; Curtin, Genevieve; Arkin, Adam P; Deutschbauer, Adam; Golden, Susan S

    2015-12-01

    Synechococcus elongatus PCC 7942 is a model organism used for studying photosynthesis and the circadian clock, and it is being developed for the production of fuel, industrial chemicals, and pharmaceuticals. To identify a comprehensive set of genes and intergenic regions that impacts fitness in S. elongatus, we created a pooled library of ∼ 250,000 transposon mutants and used sequencing to identify the insertion locations. By analyzing the distribution and survival of these mutants, we identified 718 of the organism's 2,723 genes as essential for survival under laboratory conditions. The validity of the essential gene set is supported by its tight overlap with well-conserved genes and its enrichment for core biological processes. The differences noted between our dataset and these predictors of essentiality, however, have led to surprising biological insights. One such finding is that genes in a large portion of the TCA cycle are dispensable, suggesting that S. elongatus does not require a cyclic TCA process. Furthermore, the density of the transposon mutant library enabled individual and global statements about the essentiality of noncoding RNAs, regulatory elements, and other intergenic regions. In this way, a group I intron located in tRNA(Leu), which has been used extensively for phylogenetic studies, was shown here to be essential for the survival of S. elongatus. Our survey of essentiality for every locus in the S. elongatus genome serves as a powerful resource for understanding the organism's physiology and defines the essential gene set required for the growth of a photosynthetic organism. PMID:26508635

  8. GeneTopics - interpretation of gene sets via literature-driven topic models

    PubMed Central

    2013-01-01

    Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. 
    Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly…

  9. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets

    PubMed Central

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-01-01

    Purpose: With the emergence of clinical outcomes databases as tools used routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426

  10. Joint Clustering and Component Analysis of Correspondenceless Point Sets: Application to Cardiac Statistical Modeling.

    PubMed

    Gooya, Ali; Lekadir, Karim; Alba, Xenia; Swift, Andrew J; Wild, Jim M; Frangi, Alejandro F

    2015-01-01

    Construction of Statistical Shape Models (SSMs) from arbitrary point sets is a challenging problem due to significant shape variation and lack of explicit point correspondence across the training data set. In medical imaging, point sets can generally represent different shape classes that span healthy and pathological exemplars. In such cases, the constructed SSM may not generalize well, largely because the probability density function (pdf) of the point sets deviates from the underlying assumption of Gaussian statistics. To this end, we propose a generative model for unsupervised learning of the pdf of point sets as a mixture of distinctive classes. A Variational Bayesian (VB) method is proposed for making joint inferences on the labels of point sets, and the principal modes of variations in each cluster. The method provides a flexible framework to handle point sets with no explicit point-to-point correspondences. We also show that by maximizing the marginalized likelihood of the model, the optimal number of clusters of point sets can be determined. We illustrate this work in the context of understanding the anatomical phenotype of the left and right ventricles in heart. To this end, we use a database containing hearts of healthy subjects, patients with Pulmonary Hypertension (PH), and patients with Hypertrophic Cardiomyopathy (HCM). We demonstrate that our method can outperform traditional PCA in both generalization and specificity measures. PMID:26221669

  11. Imputing gene expression from optimally reduced probe sets

    PubMed Central

    Donner, Yoni; Feng, Ting; Benoist, Christophe; Koller, Daphne

    2012-01-01

    Measuring complete gene expression profiles for a large number of experiments is costly. We propose an approach in which a small subset of probes is selected based on a preliminary set of full expression profiles. In subsequent experiments, only the subset is measured, and the missing values are imputed. We develop several algorithms to simultaneously select probes and impute missing values, and demonstrate that these probe selection for imputation (PSI) algorithms can successfully reconstruct missing gene expression values in a wide variety of applications, as evaluated using multiple metrics of biological importance. We analyze the performance of PSI methods under varying conditions, provide guidelines for choosing the optimal method based on the experimental setting, and indicate how to estimate imputation accuracy. Finally, we apply our approach to a large-scale study of immune system variation. PMID:23064520

  12. Gene Set Analysis: A Step-By-Step Guide

    PubMed Central

    Mooney, Michael A.; Wilmot, Beth

    2015-01-01

    To maximize the potential of genome-wide association studies, many researchers are performing secondary analyses to identify sets of genes jointly associated with the trait of interest. Although methods for gene-set analyses (GSA), also called pathway analyses, have been around for more than a decade, the field is still evolving. There are numerous algorithms available for testing the cumulative effect of multiple SNPs, yet no real consensus in the field about the best way to perform a GSA. This paper provides an overview of the factors that can affect the results of a GSA, the lessons learned from past studies, and suggestions for how to make analysis choices that are most appropriate for different types of data. PMID:26059482

  13. Individualized Math Problems in Calculus and Statistics. Oregon Vo-Tech Mathematics Problem Sets.

    ERIC Educational Resources Information Center

    Cosler, Norma, Ed.

    This is one of eighteen sets of individualized mathematics problems developed by the Oregon Vo-Tech Math Project. Each of these problem packages is organized around a mathematical topic and contains problems related to diverse vocations. Solutions are provided for all problems. Problems in which calculus and statistics are applied to forestry,…

  14. Using Stimulus Equivalence Technology to Teach Statistical Inference in a Group Setting

    ERIC Educational Resources Information Center

    Critchfield, Thomas S.; Fienup, Daniel M.

    2010-01-01

    Computerized lessons employing stimulus equivalence technology, used previously under laboratory conditions to teach inferential statistics concepts to college students, were employed in a group setting for the first time. Students showed the same directly taught and emergent learning gains as in laboratory studies. A brief paper-and-pencil…

  15. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis

    PubMed Central

    Fan, Jean; Salathia, Neeraj; Liu, Rui; Kaeser, Gwendolyn E.; Yung, Yun C.; Herman, Joseph L.; Kaper, Fiona; Fan, Jian-Bing; Zhang, Kun; Chun, Jerold; Kharchenko, Peter V.

    2016-01-01

    The transcriptional state of a cell reflects a variety of biological factors, from persistent cell-type specific features to transient processes such as cell cycle. Depending on biological context, all such aspects of transcriptional heterogeneity may be of interest, but detecting them from noisy single-cell RNA-seq data remains challenging. We developed PAGODA to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability amongst measured cells. PMID:26780092

  16. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread use of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. A microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T2 statistic is not positive definite and becomes singular, so it cannot be inverted. In this research, Hotelling's T2 statistic is combined with a shrinkage approach as an alternative estimator of the covariance matrix to detect significant gene sets. The shrinkage covariance matrix overcomes the singularity problem by replacing the unbiased estimator with an improved biased estimator of the covariance matrix. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
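
    The singularity problem described above can be illustrated with a minimal sketch. It assumes a two-sample Hotelling's T2, a diagonal shrinkage target, and a fixed shrinkage intensity `lam`; none of these choices is stated in the abstract, and the robust trimmed mean step is deliberately left out. The function name `shrinkage_hotelling_t2` is hypothetical.

```python
import numpy as np


def shrinkage_hotelling_t2(x, y, lam=0.2):
    """Two-sample Hotelling's T2 using a shrunken pooled covariance.

    x, y : arrays of shape (n_samples, n_genes), one per group.
    lam  : shrinkage intensity in (0, 1]; fixed here purely for illustration
           (an assumption, not the estimator used in the abstract).
    """
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled sample covariance; rank-deficient when n1 + n2 - 2 < n_genes.
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    # Shrink toward the diagonal of the pooled covariance (assumed target).
    target = np.diag(np.diag(pooled))
    shrunk = (1.0 - lam) * pooled + lam * target
    # Solve a linear system instead of forming the explicit inverse.
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(shrunk, diff)
    return t2, pooled, shrunk


rng = np.random.default_rng(0)
group_a = rng.normal(size=(8, 20))        # 8 samples, 20 genes
group_b = rng.normal(size=(8, 20)) + 0.5  # shifted mean in every gene
t2, pooled, shrunk = shrinkage_hotelling_t2(group_a, group_b)
print("rank of pooled covariance:", np.linalg.matrix_rank(pooled))
print("shrinkage T2:", t2)
```

    With 16 samples and 20 genes the pooled covariance has rank at most 14, so the classical statistic is undefined, while the shrunken matrix is positive definite and the statistic can be computed.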

  17. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in the form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarray practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power and lower robustness to sample heterogeneity than self-contained methods, leading to poor reproducibility of results. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting the GSA approaches that perform best under particular experimental settings in the context of RNA-seq. PMID:26342128

  18. Gene set internal coherence in the context of functional profiling

    PubMed Central

    Montaner, David; Minguez, Pablo; Al-Shahrour, Fátima; Dopazo, Joaquín

    2009-01-01

    Background Functional profiling methods have been extensively used in the context of high-throughput experiments and, in particular, in microarray data analysis. Such methods use available biological information to define different types of functional gene modules (e.g. gene ontology -GO-, KEGG pathways, etc.) whose representation in a pre-defined list of genes is further studied. In the most popular type of microarray experimental designs (e.g. up- or down-regulated genes, clusters of co-expressing genes, etc.) or in other genomic experiments (e.g. Chip-on-chip, epigenomics, etc.) these lists are composed of genes with a high degree of co-expression. Therefore, an implicit assumption in the application of functional profiling methods within this context is that the genes corresponding to the modules tested are effectively defining sets of co-expressing genes. Nevertheless, not all the functional modules are biologically coherent entities in terms of co-expression, which will eventually hinder their detection with conventional methods of functional enrichment. Results Using a large collection of microarray data we have carried out a detailed survey of internal correlation in GO terms and KEGG pathways, providing a coherence index to be used for measuring functional module co-regulation. An unexpectedly low level of internal correlation was found among the modules studied. Only around 30% of the modules defined by GO terms and 57% of the modules defined by KEGG pathways display an internal correlation higher than expected by chance. This information on the internal correlation of the genes within the functional modules can be used in the context of a logistic regression model in a simple way to improve their detection in gene expression experiments. Conclusion For the first time, an exhaustive study on the internal co-expression of the most popular functional categories has been carried out. 
Interestingly, the real level of coexpression within many of them is lower

  19. Statistical Mechanics of Horizontal Gene Transfer in Evolutionary Ecology

    NASA Astrophysics Data System (ADS)

    Chia, Nicholas; Goldenfeld, Nigel

    2011-04-01

    The biological world, especially its majority microbial component, is strongly interacting and may be dominated by collective effects. In this review, we provide a brief introduction for statistical physicists of the way in which living cells communicate genetically through transferred genes, as well as the ways in which they can reorganize their genomes in response to environmental pressure. We discuss how genome evolution can be thought of as related to the physical phenomenon of annealing, and describe the sense in which genomes can be said to exhibit an analogue of information entropy. As a direct application of these ideas, we analyze the variation with ocean depth of transposons in marine microbial genomes, predicting trends that are consistent with recent observations using metagenomic surveys.

  20. A statistical method to incorporate biological knowledge for generating testable novel gene regulatory interactions from microarray experiments

    PubMed Central

    Larsen, Peter; Almasri, Eyad; Chen, Guanrao; Dai, Yang

    2007-01-01

    Background The incorporation of prior biological knowledge in the analysis of microarray data has become important in the reconstruction of transcription regulatory networks in a cell. Most of the current research has been focused on the integration of multiple sets of microarray data as well as curated databases for a genome scale reconstruction. However, individual researchers are more interested in the extraction of the most useful information from the data of their hypothesis-driven microarray experiments. How to compile the prior biological knowledge from literature to facilitate new hypothesis generation from a microarray experiment is the focus of this work. We propose a novel method based on the statistical analysis of reported gene interactions in PubMed literature. Results Using Gene Ontology (GO) Molecular Function annotation for reported gene regulatory interactions in PubMed literature, a statistical analysis method was proposed for the derivation of a likelihood of interaction (LOI) score for a pair of genes. The LOI-score and the Pearson correlation coefficient of gene profiles were utilized to check if a pair of query genes would be in the above specified interaction. The method was validated in the analysis of two gene sets formed from the yeast Saccharomyces cerevisiae cell cycle microarray data. It was found that a high percentage of identified interactions share GO Biological Process annotations (39.5% for a 102 interaction enriched gene set and 23.0% for a larger 999 cyclically expressed gene set). Conclusion This method can uncover novel biologically relevant gene interactions. With stringent confidence levels, small interaction networks can be identified for further establishment of a hypothesis testable by biological experiment. This procedure is computationally inexpensive and can be used as a preprocessing procedure for screening potential biologically relevant gene pairs subject to the analysis with sophisticated statistical methods. PMID

  1. CarGene: characterisation of sets of genes based on metabolic pathways analysis.

    PubMed

    Aguilar-Ruiz, Jesus S; Rodriguez-Baena, Domingo S; Diaz-Diaz, Norberto; Nepomuceno-Chamorro, Isabel A

    2011-01-01

    The great amount of biological information provides scientists with an incomparable framework for testing the results of new algorithms. Several tools have been developed for analysing gene-enrichment and most of them are Gene Ontology-based tools. We developed a Kyoto Encyclopedia of Genes and Genomes (KEGG)-based tool that provides a friendly graphical environment for analysing gene-enrichment. The tool integrates two statistical corrections and simultaneously analyses the information about many groups of genes in both a visual and a textual manner. We tested the usefulness of our approach on a previous analysis (Huttenshower et al.). Furthermore, our tool is freely available (http://www.upo.es/eps/bigs/cargene.html). PMID:22145534
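
    Pathway-based gene-enrichment of the kind described above is commonly computed as a hypergeometric over-representation test followed by a multiple-testing correction. The sketch below is a generic illustration and not CarGene's code: the abstract does not name its two corrections, so Bonferroni and Benjamini-Hochberg are used here as assumed stand-ins, and all function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total


def bonferroni(pvals):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input: (genes from the list found in the pathway, pathway size).
# The gene list has 50 genes and the background has 2000 (made-up numbers).
pathways = {"pathway_a": (8, 40), "pathway_b": (2, 60), "pathway_c": (5, 25)}
raw = [hypergeom_upper_tail(k, 50, size, 2000) for k, size in pathways.values()]
for name, p, p_bonf, p_bh in zip(pathways, raw, bonferroni(raw),
                                 benjamini_hochberg(raw)):
    print(f"{name}: raw={p:.3g} bonferroni={p_bonf:.3g} bh={p_bh:.3g}")
```

    Exact integer binomials keep the sketch dependency-free; for genome-scale backgrounds a library implementation of the hypergeometric survival function would normally be preferred.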

  2. Statistical plant set estimation using Schroeder-phased multisinusoidal input design

    NASA Technical Reports Server (NTRS)

    Bayard, D. S.

    1992-01-01

    A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.
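
    The property that a Schroeder-phased multisinusoid places energy only at the chosen frequency points can be checked numerically. The sketch below assumes equal-amplitude harmonics 1..N over one period and the standard flat-spectrum Schroeder phase rule, phi_k = -pi k (k - 1) / N; the report's actual input design may differ, and the helper names are hypothetical.

```python
import math


def schroeder_phases(num_harmonics):
    """Schroeder phases for a flat amplitude spectrum: -pi * k * (k - 1) / N."""
    return [-math.pi * k * (k - 1) / num_harmonics
            for k in range(1, num_harmonics + 1)]


def multisine(num_harmonics, num_samples, phases):
    """One period of a sum of unit-amplitude cosines at harmonics 1..N."""
    return [
        sum(math.cos(2 * math.pi * k * n / num_samples + phases[k - 1])
            for k in range(1, num_harmonics + 1))
        for n in range(num_samples)
    ]


def crest_factor(signal):
    """Peak absolute value divided by the RMS value."""
    rms = math.sqrt(sum(v * v for v in signal) / len(signal))
    return max(abs(v) for v in signal) / rms


def dft_magnitude(signal, bin_index):
    """Magnitude of a single DFT bin, computed directly from the definition."""
    m = len(signal)
    re = sum(v * math.cos(2 * math.pi * bin_index * n / m)
             for n, v in enumerate(signal))
    im = -sum(v * math.sin(2 * math.pi * bin_index * n / m)
              for n, v in enumerate(signal))
    return math.hypot(re, im)


N, M = 16, 256
schroeder = multisine(N, M, schroeder_phases(N))
zero_phase = multisine(N, M, [0.0] * N)
print("crest factor, Schroeder phases:", crest_factor(schroeder))
print("crest factor, zero phases:     ", crest_factor(zero_phase))
print("|DFT| at excited bin 5:  ", dft_magnitude(schroeder, 5))
print("|DFT| at unexcited bin 20:", dft_magnitude(schroeder, 20))
```

    Both signals have the same flat magnitude spectrum; the Schroeder phases only spread the energy over time, which lowers the peak amplitude relative to the zero-phase sum.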

  3. Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts

    NASA Astrophysics Data System (ADS)

    Pisias, Nicklas G.; Murray, Richard W.; Scudder, Rachel P.

    2013-10-01

    Multivariate statistical treatments of large data sets in sedimentary geochemical and other fields are rapidly becoming more popular as analytical and computational capabilities expand. Because geochemical data sets present a unique set of conditions (e.g., the closed array), application of generic off-the-shelf applications is not straightforward and can yield misleading results. We present here annotated MATLAB scripts (and specific guidelines for their use) for Q-mode factor analysis, a constrained least squares multiple linear regression technique, and a total inversion protocol, that are based on the well-known approaches taken by Dymond (1981), Leinen and Pisias (1984), Kyte et al. (1993), and their predecessors. Although these techniques have been used by investigators for the past decades, their application has been neither consistent nor transparent, as their code has remained in-house or in formats not commonly used by many of today's researchers (e.g., FORTRAN). In addition to providing the annotated scripts and instructions for use, we discuss general principles to be considered when performing multivariate statistical treatments of large geochemical data sets, provide a brief contextual history of each approach, explain their similarities and differences, and include a sample data set for the user to test their own manipulation of the scripts.

  4. Coupled level set segmentation using a point-based statistical shape model relying on correspondence probabilities

    NASA Astrophysics Data System (ADS)

    Hufnagel, Heike; Ehrhardt, Jan; Pennec, Xavier; Schmidt-Richberg, Alexander; Handels, Heinz

    2010-03-01

    In this article, we propose a unified statistical framework for image segmentation with shape prior information. The approach combines an explicitly parameterized point-based probabilistic statistical shape model (SSM) with a segmentation contour which is implicitly represented by the zero level set of a higher dimensional surface. These two aspects are unified in a Maximum a Posteriori (MAP) estimation where the level set is evolved to converge towards the boundary of the organ to be segmented based on the image information while taking into account the prior given by the SSM information. The optimization of the energy functional obtained by the MAP formulation leads to an alternate update of the level set and an update of the fitting of the SSM. We then adapt the probabilistic SSM for multi-shape modeling and extend the approach to multiple-structure segmentation by introducing a level set function for each structure. During segmentation, the evolution of the different level set functions is coupled by the multi-shape SSM. First experimental evaluations indicate that our method is well suited for the segmentation of topologically complex, non-spherical and multiple-structure shapes. We demonstrate the effectiveness of the method by experiments on kidney segmentation as well as on hip joint segmentation in CT images.

  5. Quantum Statistical Mechanical Derivation of the Second Law of Thermodynamics: A Hybrid Setting Approach

    NASA Astrophysics Data System (ADS)

    Tasaki, Hal

    2016-04-01

    Based on quantum statistical mechanics and microscopic quantum dynamics, we prove Planck's and Kelvin's principles for macroscopic systems in a general and realistic setting. We consider a hybrid quantum system that consists of the thermodynamic system, which is initially in thermal equilibrium, and the "apparatus" which operates on the former, and assume that the whole system evolves autonomously. This provides a satisfactory derivation of the second law for macroscopic systems.

  6. Quantum Statistical Mechanical Derivation of the Second Law of Thermodynamics: A Hybrid Setting Approach.

    PubMed

    Tasaki, Hal

    2016-04-29

    Based on quantum statistical mechanics and microscopic quantum dynamics, we prove Planck's and Kelvin's principles for macroscopic systems in a general and realistic setting. We consider a hybrid quantum system that consists of the thermodynamic system, which is initially in thermal equilibrium, and the "apparatus" which operates on the former, and assume that the whole system evolves autonomously. This provides a satisfactory derivation of the second law for macroscopic systems. PMID:27176507

  7. geneCommittee: a web-based tool for extensively testing the discriminatory power of biologically relevant gene sets in microarray data classification

    PubMed Central

    2014-01-01

    Background The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. Results geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. Conclusions geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/. PMID:24475928

  8. Interpolation of a surface from sets of discrete height data of different statistical characteristics

    NASA Technical Reports Server (NTRS)

    Leberl, F.

    1976-01-01

    This paper presents and analyzes a method for the interpolation of a unique surface from two sets of independent digital height data of differing statistical characteristics. This method is based on linear prediction and thus relies on the concepts of auto- and cross-covariance functions. The linear prediction algorithm for two sets of digital height measurements is first derived and then evaluated using the method of moving averages and bilinear interpolation for comparison. It is found that the overall root mean square interpolation errors of linear prediction are similar to those from moving averages and bilinear interpolation. This accuracy performance, together with the well known potential for controlled filtering of measuring errors and good-behavior in areas of poor control, makes linear prediction a versatile and general method for interpolating a unique surface from two sets of digital height data, with applications in photogrammetric mapping, remote sensing, and other fields.

  9. Statistical analysis of sets of random walks: how to resolve their generating mechanism.

    PubMed

    Coscoy, Sylvie; Huguet, Etienne; Amblard, François

    2007-11-01

    The analysis of experimental random walks aims at identifying the process(es) that generate(s) them. It is in general a difficult task, because statistical dispersion within an experimental set of random walks is a complex combination of the stochastic nature of the generating process, and the possibility to have more than one simple process. In this paper, we study by numerical simulations how the statistical distribution of various geometric descriptors such as the second, third and fourth order moments of two-dimensional random walks depends on the stochastic process that generates that set. From these observations, we derive a method to classify complex sets of random walks, and resolve the generating process(es) by the systematic comparison of experimental moment distributions with those numerically obtained for candidate processes. In particular, various processes such as Brownian diffusion combined with convection, noise, confinement, anisotropy, or intermittency, can be resolved by using high order moment distributions. In addition, finite-size effects are observed that are useful for treating short random walks. As an illustration, we describe how the present method can be used to study the motile behavior of epithelial microvilli. The present work should be of interest in biology for all possible types of single particle tracking experiments. PMID:17896161
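
    The idea of comparing moment statistics between candidate generating processes can be sketched with a small simulation. This is a simplified illustration and not the authors' method: it uses ensemble moments of the end-to-end displacement instead of the per-walk geometric descriptors in the abstract, and the step model, drift value, and function names are all assumptions.

```python
import random


def simulate_walk(n_steps, drift=(0.0, 0.0), sigma=1.0, rng=random):
    """2D walk with Gaussian steps; a nonzero drift mimics convection."""
    x = y = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        x += drift[0] + rng.gauss(0.0, sigma)
        y += drift[1] + rng.gauss(0.0, sigma)
        path.append((x, y))
    return path


def displacement_moments(walks):
    """Second and fourth moments of end-to-end displacement over a set of walks."""
    r2 = [(p[-1][0] - p[0][0]) ** 2 + (p[-1][1] - p[0][1]) ** 2 for p in walks]
    m2 = sum(r2) / len(r2)
    m4 = sum(v * v for v in r2) / len(r2)
    return m2, m4


rng = random.Random(0)
diffusive = [simulate_walk(100, rng=rng) for _ in range(500)]
convective = [simulate_walk(100, drift=(0.5, 0.0), rng=rng) for _ in range(500)]

for label, walks in (("diffusion", diffusive),
                     ("diffusion + convection", convective)):
    m2, m4 = displacement_moments(walks)
    # m4 / m2**2 is a scale-free shape descriptor of the displacement distribution.
    print(f"{label}: <r^2>={m2:.1f}  <r^4>/<r^2>^2={m4 / m2 ** 2:.2f}")
```

    For pure Gaussian diffusion in two dimensions the ratio <r^4>/<r^2>^2 is expected to be close to 2, while a dominant drift pushes it toward 1, which is the kind of signature a moment-based classifier can exploit.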

  10. Multivariate statistical approach to a data set of dioxin and furan contaminations in human milk

    SciTech Connect

    Lindstrom, G.U.M.; Sjostrom, M.; Swanson, S.E.; Furst, P.; Kruger, C.; Meemken, H.A.; Groebel, W.

    1988-05-01

    The levels of chlorinated dibenzodioxins (PCDDs) and dibenzofurans (PCDFs) in human milk have been of great concern since the discovery of the toxic 2,3,7,8-substituted isomers in milk of European origin. As knowledge of environmental contamination of human breast milk increases, questions will continue to be asked about possible risks from breast feeding. Before any recommendations can be made, there must be knowledge of contaminant levels in mothers' breast milk. Researchers have measured PCB and 17 different dioxins and furans in human breast milk samples. To date, the data have been analyzed only by univariate and bivariate statistical methods. However, to extract as much information as possible from this data set, multivariate statistical methods must be used. Here the authors present a multivariate analysis where the relationships between the polychlorinated compounds and the personalia of the mothers have been studied. For the data analysis, partial least squares (PLS) modelling has been used.

  11. Regionalisation of statistical model outputs creating gridded data sets for Germany

    NASA Astrophysics Data System (ADS)

    Höpp, Simona Andrea; Rauthe, Monika; Deutschländer, Thomas

    2016-04-01

    The goal of the German research program ReKliEs-De (regional climate projection ensembles for Germany, http://.reklies.hlug.de) is to distribute robust information about the range and the extremes of future climate for Germany and its neighbouring river catchment areas. This joint research project is supported by the German Federal Ministry of Education and Research (BMBF) and was initiated by the German Federal States. The Project results are meant to support the development of adaptation strategies to mitigate the impacts of future climate change. The aim of our part of the project is to adapt and transfer the regionalisation methods of the gridded hydrological data set (HYRAS) from daily station data to the station based statistical regional climate model output of WETTREG (regionalisation method based on weather patterns). The WETTREG model output covers the period of 1951 to 2100 with a daily temporal resolution. For this, we generate a gridded data set of the WETTREG output for precipitation, air temperature and relative humidity with a spatial resolution of 12.5 km x 12.5 km, which is common for regional climate models. Thus, this regionalisation allows comparing statistical to dynamical climate model outputs. The HYRAS data set was developed by the German Meteorological Service within the German research program KLIWAS (www.kliwas.de) and consists of daily gridded data for Germany and its neighbouring river catchment areas. It has a spatial resolution of 5 km x 5 km for the entire domain for the hydro-meteorological elements precipitation, air temperature and relative humidity and covers the period of 1951 to 2006. After conservative remapping the HYRAS data set is also convenient for the validation of climate models. 
The presentation will consist of two parts to present the actual state of the adaptation of the HYRAS regionalisation methods to the statistical regional climate model WETTREG: First, an overview of the HYRAS data set and the regionalisation

  12. Statistical evaluation of synchronous spike patterns extracted by frequent item set mining

    PubMed Central

    Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja

    2013-01-01

    We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classify patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in the form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows reliable detection of active assemblies in massively parallel spike trains. PMID:24167487

  13. Statistical evaluation of a new air dispersion model against AERMOD using the Prairie Grass data set.

    PubMed

    Armani, Fernando Augusto Silveira; de Almeida, Ricardo Carvalho; Dias, Nelson Luís da Costa

    2014-02-01

    In this work, the authors present a statistical assessment of two atmospheric dispersion models. One of them, AERMOD (American Meteorological Society/Environmental Protection Agency Regulatory Model), adopted by the U.S. Environmental Protection Agency, is widely used in many countries and here is taken as a baseline to assess the performance of a newly proposed model, MODELAR (Modelo Regulatório de Qualidade do Ar). In terms of parameterizations and modeling options, MODELAR is a somewhat simple model. It is currently being considered for adoption as the regulatory model in Paraná State, Brazil. The well-known Prairie Grass data set, already used in earlier evaluations of the same version of AERMOD analyzed here, was used to perform model assessment. The evaluations employed well-established statistical performance descriptors and techniques. The results indicate that MODELAR is a slightly better predictor, for the Prairie Grass data set, of concentrations under unstable conditions, whereas AERMOD has a better performance under near-neutral and stable conditions. Moreover, cases of severe overestimation and underestimation, as detected by the Factor of Two index, are clearly associated with extreme stability conditions (both unstable and stable), stressing the need for better parameterizations under these conditions. PMID:24654389
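
    The abstract names the Factor of Two index without defining it. A common definition in dispersion-model evaluation is the fraction of predictions that fall within a factor of two of the paired observation. The sketch below assumes that definition; the function name, the handling of non-positive observations, and the sample values are illustrative assumptions and are not taken from the paper.

```python
def factor_of_two(observed, predicted):
    """Fraction of predictions with 0.5 <= predicted / observed <= 2.0.
    Pairs with a non-positive observation are skipped (an assumption)."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0]
    if not pairs:
        raise ValueError("no pairs with positive observations")
    within = sum(1 for o, p in pairs if 0.5 <= p / o <= 2.0)
    return within / len(pairs)

# Hypothetical concentrations (arbitrary units).
obs = [1.0, 2.0, 4.0, 8.0]
pred = [1.5, 5.0, 2.0, 3.0]
fac2 = factor_of_two(obs, pred)  # ratios: 1.5, 2.5, 0.5, 0.375
```

    Two of the four ratios lie inside [0.5, 2.0], so `fac2` is 0.5 here. Ratios above 2 indicate severe overestimation and ratios below 0.5 severe underestimation.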

  14. A cross-study gene set enrichment analysis identifies critical pathways in endometriosis

    PubMed Central

    Zhao, Hongbo; Wang, Qishan; Bai, Chunyan; He, Kan; Pan, Yuchun

    2009-01-01

    Background: Endometriosis is an enigmatic disease. Gene expression profiling of endometriosis has been used in several studies, but few studies went further to classify subtypes of endometriosis based on expression patterns and to identify possible pathways involved in endometriosis. Some of the observed pathways are inconsistent between the studies, and these candidate pathways presumably only represent a fraction of the pathways involved in endometriosis. Methods: We applied a standardised microarray preprocessing and gene set enrichment analysis to six independent studies, and demonstrated increased concordance between these gene datasets. Results: We find 16 up-regulated and 19 down-regulated pathways common in ovarian endometriosis data sets, 22 up-regulated and one down-regulated pathway common in peritoneal endometriosis data sets. Among them, 12 up-regulated and 1 down-regulated were found consistent between ovarian and peritoneal endometriosis. The main canonical pathways identified are related to immunological and inflammatory disease. Early secretory phase has the most over-represented pathways in the three uterine cycle phases. There are no overlapping significant pathways between the dataset from human endometrial endothelial cells and the datasets from ovarian endometriosis which used whole tissues. Conclusion: The study of complex diseases through pathway analysis is able to highlight genes weakly connected to the phenotype which may be difficult to detect by using classical univariate statistics. By standardised microarray preprocessing and GSEA, we have increased the concordance in identifying many biological mechanisms involved in endometriosis. The identified gene pathways will shed light on the understanding of endometriosis and promote the development of novel therapies. PMID:19735579

  15. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    PubMed Central

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (e.g., Actinobacteria, Gammaproteobacteria, Firmicutes). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), which displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122

  16. Tri-mean-based statistical differential gene expression detection.

    PubMed

    Ji, Zhaohua; Wu, Chunguo; Wang, Yao; Guan, Renchu; Tu, Huawei; Wu, Xiaozhou; Liang, Yanchun

    2012-01-01

    Based on the assumption that only a subset of disease group has differential gene expression, traditional detection of differentially expressed genes is under the constraint that cancer genes are up- or down-regulated in all disease samples compared with normal samples. However, in 2005, Tomlins assumed and discussed the situation that only a subset of disease samples would be activated, which are often referred to as outliers. PMID:23155761

  17. A clone-based statistical test for localizing disease genes using genomic mismatch scanning

    SciTech Connect

    Palmer, C.G.S.; Woodward, A.; Smalley, S.L.

    1994-09-01

    Genomic mismatch scanning (GMS) is a technique for isolating regions of DNA that are identical-by-descent (IBD) within pairs of relatives. GMS selected data are hybridized to an ordered array of DNA, e.g., metaphase chromosomes, YACs, to identify and localize enhanced region(s) of IBD across pairs of relatives affected with a trait of interest. If the trait has a genetic basis, it is reasonable to assume that the trait gene(s) will be located in these enhanced regions. Our approach to localize these enhanced regions is based on the availability of an ordered array of clones, e.g., YACs, which span the entire human genome. We use an exact binomial order statistic to develop a test for enhanced regions of IBD in sets of clones 1 cM in size selected for being biologically independent (i.e., separated by 50 cM). The test statistic is the maximum proportion of IBD pairs selected from the independent YACs within a set. Thus far, we have defined the power of the test under the alternative hypothesis of a single gene conditional on the maximum proportion IBD being located at the disease locus. As an example, for 60 grandparent-grandchild pairs, the exact power of the test with alpha=0.001 is 0.83 when the relative risk of the disease is 4.0 and the maximum proportion is at the disease locus. This method can be used in small samples and is not dependent on any specific mapping function.
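
    The test statistic in this abstract is the maximum proportion of IBD pairs over a set of independent clones. A minimal sketch of how a null p-value for such a maximum could be computed follows. It assumes each clone's IBD count is an independent Binomial(n_pairs, p0) variable under the null hypothesis. The function names and example numbers (60 clones, p0 = 0.5, an observed maximum of 45) are illustrative assumptions, not values from the paper.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    if x < 0:
        return 0.0
    upper = min(x, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))

def max_order_pvalue(m, n_pairs, n_clones, p0):
    """P(max of n_clones independent Binomial(n_pairs, p0) counts >= m).
    Uses P(max >= m) = 1 - P(X <= m - 1) ** n_clones."""
    return 1.0 - binom_cdf(m - 1, n_pairs, p0) ** n_clones

# Hypothetical example: 60 relative pairs, 60 independent clones,
# null IBD probability 0.5, observed maximum of 45 IBD pairs at one clone.
p_value = max_order_pvalue(45, n_pairs=60, n_clones=60, p0=0.5)
```

    Because the calculation is exact rather than asymptotic, it remains valid for small numbers of relative pairs, which matches the abstract's remark that the method can be used in small samples.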

  18. PECA: a novel statistical tool for deconvoluting time-dependent gene expression regulation.

    PubMed

    Teo, Guoshou; Vogel, Christine; Ghosh, Debashis; Kim, Sinae; Choi, Hyungwon

    2014-01-01

    Protein expression varies as a result of intricate regulation of synthesis and degradation of messenger RNAs (mRNA) and proteins. Studies of dynamic regulation typically rely on time-course data sets of mRNA and protein expression, yet there are no statistical methods that integrate these multiomics data and deconvolute individual regulatory processes of gene expression control underlying the observed concentration changes. To address this challenge, we developed Protein Expression Control Analysis (PECA), a method to quantitatively dissect protein expression variation into the contributions of mRNA synthesis/degradation and protein synthesis/degradation, termed RNA-level and protein-level regulation respectively. PECA computes the rate ratios of synthesis versus degradation as the statistical summary of expression control during a given time interval at each molecular level and computes the probability that the rate ratio changed between adjacent time intervals, indicating regulation change at the time point. Along with the associated false-discovery rates, PECA gives the complete description of dynamic expression control, that is, which proteins were up- or down-regulated at each molecular level and each time point. Using PECA, we analyzed two yeast data sets monitoring the cellular response to hyperosmotic and oxidative stress. The rate ratio profiles reported by PECA highlighted a large magnitude of RNA-level up-regulation of stress response genes in the early response and concordant protein-level regulation with time delay. However, the contributions of RNA- and protein-level regulation and their temporal patterns were different between the two data sets. We also observed several cases where protein-level regulation counterbalanced transcriptomic changes in the early stress response to maintain the stability of protein concentrations, suggesting that proteostasis is a proteome-wide phenomenon mediated by post-transcriptional regulation. PMID:24229407

  19. Statistical criteria to set alarm levels for continuous measurements of ground contamination.

    PubMed

    Brandl, A; Jimenez, A D Herrera

    2008-08-01

    In the course of the decommissioning of the ASTRA research reactor at the site of the Austrian Research Centers at Seibersdorf, the operator and licensee, Nuclear Engineering Seibersdorf, conducted an extensive site survey and characterization to demonstrate compliance with regulatory site release criteria. This survey included radiological characterization of approximately 400,000 m² of open land on the Austrian Research Centers premises. Part of this survey was conducted using a mobile large-area gas proportional counter, continuously recording measurements while it was moved at a speed of 0.5 m s⁻¹. In order to set reasonable investigation levels, two alarm levels based on statistical considerations were developed. This paper describes the derivation of these alarm levels and the operational experience gained by detector deployment in the field. PMID:18617795

  20. Speed reading for genes: bookmarks set the pace.

    PubMed

    Follmer, Nicole E; Francis, Nicole J

    2011-11-15

    During mitosis, most transcription ceases. Mitotic gene bookmarking marks genes for reactivation to ensure reestablishment of transcription states and cell-cycle progression. In a recent issue of Nature Cell Biology, Zhao et al. (2011) investigate how gene bookmarking leads to accelerated kinetics of transcriptional reactivation after mitosis. PMID:22075142

  1. Human Effector / Initiator Gene Sets That Regulate Myometrial Contractility During Term and Preterm Labor

    PubMed Central

    WEINER, Carl P.; MASON, Clifford W.; DONG, Yafeng; BUHIMSCHI, Irina A.; SWAAN, Peter W.; BUHIMSCHI, Catalin S.

    2010-01-01

    Objective: Distinct processes govern transition from quiescence to activation during term (TL) and preterm labor (PTL). We sought gene sets responsible for TL and PTL, along with the effector genes necessary for labor independent of gestation and underlying trigger. Methods: Expression was analyzed in term and preterm +/− labor (n = 6 subjects/group). Gene sets were generated using logic operations. Results: 34 genes were similarly expressed in PTL/TL but absent from nonlabor samples (Effector Set). 49 genes were specific to PTL (Preterm Initiator Set) and 174 to TL (Term Initiator Set). The gene ontogeny processes comprising Term Initiator and Effector Sets were diverse, though inflammation was represented in 4 of the top 10; inflammation dominated the Preterm Initiator Set. Comments: TL and PTL differ dramatically in initiator profiles. Though inflammation is part of the Term Initiator and the Effector Sets, it is an overwhelming part of PTL associated with intraamniotic inflammation. PMID:20452493
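
    The abstract states that the gene sets were generated using logic operations but does not spell them out. One plausible reading, sketched below with Python set operations, treats each group as a set of expressed genes. The exact definitions, the function name, and the placeholder gene labels are assumptions made for illustration, not the authors' procedure.

```python
def labor_gene_sets(tl, ptl, tnl, ptnl):
    """Derive three gene sets from per-group sets of expressed genes.
    tl / ptl: term / preterm labor; tnl / ptnl: term / preterm, no labor."""
    nonlabor = tnl | ptnl                    # expressed in any nonlabor group
    effector = (tl & ptl) - nonlabor         # shared by TL and PTL only
    term_initiator = tl - ptl - nonlabor     # specific to TL
    preterm_initiator = ptl - tl - nonlabor  # specific to PTL
    return effector, term_initiator, preterm_initiator

# Hypothetical placeholder gene labels.
tl = {"g1", "g2", "g3", "g6"}
ptl = {"g1", "g4", "g5", "g6"}
tnl = {"g3"}
ptnl = {"g5", "g6"}
effector, term_init, preterm_init = labor_gene_sets(tl, ptl, tnl, ptnl)
```

    With these toy inputs, `g1` is the only effector gene, `g2` the only term initiator, and `g4` the only preterm initiator, because `g3`, `g5`, and `g6` are also expressed in a nonlabor group.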

  2. JAG: A Computational Tool to Evaluate the Role of Gene-Sets in Complex Traits

    PubMed Central

    Lips, Esther S.; Kooyman, Maarten; de Leeuw, Christiaan; Posthuma, Danielle

    2015-01-01

    Gene-set analysis has been proposed as a powerful tool to deal with the highly polygenic architecture of complex traits, as well as with the small effect sizes typically found in GWAS studies for complex traits. We developed a tool, Joint Association of Genetic variants (JAG), which can be applied to Genome Wide Association (GWA) data and tests for the joint effect of all single nucleotide polymorphisms (SNPs) located in a user-specified set of genes or biological pathway. JAG assigns SNPs to genes and incorporates self-contained and/or competitive tests for gene-set analysis. JAG uses permutation to evaluate gene-set significance, which implicitly controls for linkage disequilibrium, sample size, gene size, the number of SNPs per gene and the number of genes in the gene-set. We conducted a power analysis using the Wellcome Trust Case Control Consortium (WTCCC) Crohn’s disease data set and show that JAG correctly identifies validated gene-sets for Crohn’s disease and has more power than currently available tools for gene-set analysis. JAG is a powerful, novel tool for gene-set analysis, and can be freely downloaded from the CTG Lab website. PMID:26110313
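
    The permutation idea in this abstract can be sketched as a self-contained test: phenotype labels are shuffled while each subject's genotype row is left intact, so correlations among SNPs (linkage disequilibrium) are preserved in every permutation. The per-SNP statistic used below (absolute difference in mean allele count between cases and controls), the function names, and the toy data are illustrative assumptions. This is not JAG's actual statistic or code.

```python
import random

def set_statistic(genotypes, phenotype, snps):
    """Sum over SNPs of |mean allele count in cases - mean in controls|."""
    cases = [i for i, y in enumerate(phenotype) if y == 1]
    controls = [i for i, y in enumerate(phenotype) if y == 0]
    total = 0.0
    for s in snps:
        mean_case = sum(genotypes[i][s] for i in cases) / len(cases)
        mean_ctrl = sum(genotypes[i][s] for i in controls) / len(controls)
        total += abs(mean_case - mean_ctrl)
    return total

def permutation_pvalue(genotypes, phenotype, snps, n_perm=199, seed=0):
    """Empirical p-value from shuffling phenotype labels across subjects."""
    rng = random.Random(seed)
    observed = set_statistic(genotypes, phenotype, snps)
    labels = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # genotype rows stay intact, so LD is preserved
        if set_statistic(genotypes, labels, snps) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Hypothetical toy data: 20 cases and 20 controls, 3 SNPs per subject.
genotypes = [[2, 2, 0]] * 20 + [[0, 0, 0]] * 20
phenotype = [1] * 20 + [0] * 20
p = permutation_pvalue(genotypes, phenotype, snps=[0, 1])
```

    A competitive test would instead compare the set statistic against statistics of randomly drawn gene sets of matching size; that variant is not shown here.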

  3. Gene-set activity toolbox (GAT): A platform for microarray-based cancer diagnosis using an integrative gene-set analysis approach.

    PubMed

    Engchuan, Worrawat; Meechai, Asawin; Tongsima, Sissades; Doungpan, Narumol; Chan, Jonathan H

    2016-08-01

    Cancer is a complex disease that cannot be diagnosed reliably using only single gene expression analysis. Using gene-set analysis on high throughput gene expression profiling controlled by various environmental factors is a commonly adopted technique used by the cancer research community. This work develops a comprehensive gene expression analysis tool (gene-set activity toolbox, GAT) that is implemented with a data retriever, traditional data pre-processing, several gene-set analysis methods, network visualization and data mining tools. The gene-set analysis methods are used to identify subsets of phenotype-relevant genes that will be used to build a classification model. To evaluate GAT performance, we performed a cross-dataset validation study on three common cancers, namely colorectal, breast and lung cancers. The results show that GAT can be used to build a reasonable disease diagnostic model and the predicted markers have biological relevance. GAT can be accessed from http://gat.sit.kmutt.ac.th where GAT's java library for gene-set analysis, simple classification and a database with three cancer benchmark datasets can be downloaded. PMID:27102089

  4. New cyt b gene universal primer set for forensic analysis.

    PubMed

    Lopez-Oceja, A; Gamarra, D; Borragan, S; Jiménez-Moreno, S; de Pancorbo, M M

    2016-07-01

    Analysis of mitochondrial DNA, and in particular the cytochrome b gene (cyt b), has become an essential tool for species identification in routine forensic practice. In cases of degraded samples, where the DNA is fractionated, universal primers that are highly efficient for the amplification of the target region are necessary. Therefore, in the present study a new universal cyt b primer set with high species identification capabilities, even in samples with highly degraded DNA, has been developed. In order to achieve this objective, the primers were designed following the alignment of complete sequences of the cyt b from 751 species from the Class of Mammalia listed in GenBank. A highly variable region of 148 bp flanked by highly conserved sequences was chosen for placing the primers. The effectiveness of the new pair of primers was examined in 63 animal species belonging to 38 Families from 14 Orders and 5 Classes (Mammalia, Aves, Reptilia, Actinopterygii, and Malacostraca). Species determination was possible in all cases, which shows that the fragment analyzed provided a high capability for species identification. Furthermore, to ensure the efficiency of the 148 bp fragment, the intraspecific variability was analyzed by calculating the concordance between individuals with the BLAST tool from the NCBI (National Center for Biotechnology Information). The intraspecific concordance levels were above 97% in all species. Likewise, the phylogenetic information from the selected fragment was confirmed by obtaining the phylogenetic tree from the sequences of the species analyzed. Evidence of the high power of phylogenetic discrimination of the analyzed fragment of the cyt b was obtained, as 93.75% of the species were grouped within their corresponding Orders. Finally, the analysis of 40 degraded samples with small-size DNA fragments showed that the new pair of primers permits identifying the species, even when the DNA is highly degraded as it is very common in

  5. Tracking Difference in Gene Expression in a Time-Course Experiment Using Gene Set Enrichment Analysis

    PubMed Central

    Wong, Pui Shan; Tanaka, Michihiro; Sunaga, Yoshihiko; Tanaka, Masayoshi; Taniguchi, Takeaki; Yoshino, Tomoko; Tanaka, Tsuyoshi; Fujibuchi, Wataru; Aburatani, Sachiyo

    2014-01-01

    Fistulifera sp. strain JPCC DA0580 is a newly sequenced pennate diatom that is capable of simultaneously growing and accumulating lipids. This is a unique trait, not found in other related microalgae so far. It is able to accumulate between 40 and 60% of its cell weight in lipids, making it a strong candidate for the production of biofuel. To investigate this characteristic, we used RNA-Seq data gathered at four different times while Fistulifera sp. strain JPCC DA0580 was grown in oil accumulating and non-oil accumulating conditions. We then adapted gene set enrichment analysis (GSEA) to investigate the relationship between the difference in gene expression of 7,822 genes and metabolic functions in our data. We utilized information in the KEGG pathway database to create the gene sets and changed GSEA to use re-sampling so that data from the different time points could be included in the analysis. Our GSEA method identified photosynthesis, lipid synthesis and amino acid synthesis related pathways as processes that play a significant role in oil production and growth in Fistulifera sp. strain JPCC DA0580. In addition to GSEA, we visualized the results by creating a network of compounds and reactions, and plotted the expression data on top of the network. This made existing graph algorithms available to us which we then used to calculate a path that metabolizes glucose into triacylglycerol (TAG) in the smallest number of steps. By visualizing the data this way, we observed a separate up-regulation of genes at different times instead of a concerted response. We also identified two metabolic paths that used fewer reactions than the one shown in KEGG and showed that the reactions were up-regulated during the experiment. The combination of analysis and visualization methods successfully analyzed time-course data, identified important metabolic pathways and provided new hypotheses for further research. PMID:25268590
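
    Finding a path with the smallest number of steps in a compound-reaction network is a standard unweighted shortest-path problem, for which breadth-first search is one suitable graph algorithm. The abstract does not say which algorithm the authors used, so the sketch below is only an illustration. The intermediate node names `A`, `B`, and `C` are hypothetical placeholders, not real metabolites or KEGG entries.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Breadth-first search on a directed graph given as an adjacency dict.
    Returns a path with the fewest edges, or None if target is unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:  # first visit is the shortest route
                parents[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical toy network: each edge stands for one reaction step.
network = {
    "glucose": ["A", "B"],
    "A": ["C"],
    "B": ["TAG"],
    "C": ["TAG"],
}
path = shortest_path(network, "glucose", "TAG")
```

    Here the two-step route through `B` is returned in preference to the three-step route through `A` and `C`. Expression values could then be looked up for the reactions along the returned path.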

  6. Accurate Gene Expression-Based Biodosimetry Using a Minimal Set of Human Gene Transcripts

    SciTech Connect

    Tucker, James D.; Joiner, Michael C.; Thomas, Robert A.; Grever, William E.; Bakhmutsky, Marina V.; Chinkhota, Chantelle N.; Smolinski, Joseph M.; Divine, George W.; Auner, Gregory W.

    2014-03-15

    Purpose: Rapid and reliable methods for conducting biological dosimetry are a necessity in the event of a large-scale nuclear event. Conventional biodosimetry methods lack the speed, portability, ease of use, and low cost required for triaging numerous victims. Here we address this need by showing that polymerase chain reaction (PCR) on a small number of gene transcripts can provide accurate and rapid dosimetry. The low cost and relative ease of PCR compared with existing dosimetry methods suggest that this approach may be useful in mass-casualty triage situations. Methods and Materials: Human peripheral blood from 60 adult donors was acutely exposed to cobalt-60 gamma rays at doses of 0 (control) to 10 Gy. mRNA expression levels of 121 selected genes were obtained 0.5, 1, and 2 days after exposure by reverse-transcriptase real-time PCR. Optimal dosimetry at each time point was obtained by stepwise regression of dose received against individual gene transcript expression levels. Results: Only 3 to 4 different gene transcripts, ASTN2, CDKN1A, GDF15, and ATM, are needed to explain ≥0.87 of the variance (R²). Receiver-operator characteristics, a measure of sensitivity and specificity, of 0.98 for these statistical models were achieved at each time point. Conclusions: The actual and predicted radiation doses agree very closely up to 6 Gy. Dosimetry at 8 and 10 Gy shows some effect of saturation, thereby slightly diminishing the ability to quantify higher exposures. Analyses of these gene transcripts may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations or in clinical radiation emergencies.

  7. NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data.

    PubMed

    Xia, Jianguo; Gill, Erin E; Hancock, Robert E W

    2015-06-01

    Meta-analysis of gene expression data sets is increasingly performed to help identify robust molecular signatures and to gain insights into underlying biological processes. The complicated nature of such analyses requires both advanced statistics and innovative visualization strategies to support efficient data comparison, interpretation and hypothesis generation. NetworkAnalyst (http://www.networkanalyst.ca) is a comprehensive web-based tool designed to allow bench researchers to perform various common and complex meta-analyses of gene expression data via an intuitive web interface. By coupling well-established statistical procedures with state-of-the-art data visualization techniques, NetworkAnalyst allows researchers to easily navigate large complex gene expression data sets to determine important features, patterns, functions and connections, thus leading to the generation of new biological hypotheses. This protocol provides a step-wise description of how to effectively use NetworkAnalyst to perform network analysis and visualization from gene lists; to perform meta-analysis on gene expression data while taking into account multiple metadata parameters; and, finally, to perform a meta-analysis of multiple gene expression data sets. NetworkAnalyst is designed to be accessible to biologists rather than to specialist bioinformaticians. The complete protocol can be executed in ∼1.5 h. Compared with other similar web-based tools, NetworkAnalyst offers a unique visual analytics experience that enables data analysis within the context of protein-protein interaction networks, heatmaps or chord diagrams. All of these analysis methods provide the user with supporting statistical and functional evidence. PMID:25950236

  8. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set

    PubMed Central

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a “summary statistical representation” over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4. PMID:27242622

  9. Can You Explain that in Plain English? Making Statistics Group Projects Work in a Multicultural Setting

    ERIC Educational Resources Information Center

    Sisto, Michelle

    2009-01-01

    Students increasingly need to learn to communicate statistical results clearly and effectively, as well as to become competent consumers of statistical information. These two learning goals are particularly important for business students. In line with reform movements in Statistics Education and the GAISE guidelines, we are working to implement…

  10. Pancreatic beta cells express a diverse set of homeobox genes.

    PubMed Central

    Rudnick, A; Ling, T Y; Odagiri, H; Rutter, W J; German, M S

    1994-01-01

    Homeobox genes, which are found in all eukaryotic organisms, encode transcriptional regulators involved in cell-type differentiation and development. Several homeobox genes encoding homeodomain proteins that bind and activate the insulin gene promoter have been described. In an attempt to identify additional beta-cell homeodomain proteins, we designed primers based on the sequences of beta-cell homeobox genes cdx3 and lmx1 and the Drosophila homeodomain protein Antennapedia and used these primers to amplify inserts by PCR from an insulinoma cDNA library. The resulting amplification products include sequences encoding 10 distinct homeodomain proteins; 3 of these proteins have not been described previously. In addition, an insert was obtained encoding a splice variant of engrailed-2, a homeodomain protein previously identified in the central nervous system. Northern analysis revealed a distinct pattern of expression for each homeobox gene. Interestingly, the PCR-derived clones do not represent a complete sampling of the beta-cell library because no inserts encoding cdx3 or lmx1 protein were obtained. Beta cells probably express additional homeobox genes. The abundance and diversity of homeodomain proteins found in beta cells illustrate the remarkable complexity and redundancy of the machinery controlling beta-cell development and differentiation. PMID:7991607

  11. Gene set analysis of genome-wide association studies: methodological issues and perspectives

    PubMed Central

    Wang, Lily; Jia, Peilin; Wolfinger, Russell D; Chen, Xi; Zhao, Zhongming

    2013-01-01

    Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still in an early stage of development in that analysis results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the particular methodology challenges at each stage. In addition to providing a survey of recently developed tools, we also classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis. PMID:21565265

  12. Constellation Map: Downstream visualization and interpretation of gene set enrichment results

    PubMed Central

    Tamayo, Pablo; Haining, W. Nicholas; Mesirov, Jill P.

    2015-01-01

    Summary: Gene set enrichment analysis (GSEA) approaches are widely used to identify coordinately regulated genes associated with phenotypes of interest. Here, we present Constellation Map, a tool to visualize and interpret the results when enrichment analyses yield a long list of significantly enriched gene sets. Constellation Map identifies commonalities that explain the enrichment of multiple top-scoring gene sets and maps the relationships between them. Constellation Map can help investigators take full advantage of GSEA and facilitates the biological interpretation of enrichment results. Availability: Constellation Map is freely available as a GenePattern module at http://www.genepattern.org. PMID:26594333

  13. Highly informative marker sets consisting of genes with low individual degree of differential expression

    PubMed Central

    Galatenko, V. V.; Shkurnikov, M. Yu.; Samatov, T. R.; Galatenko, A. V.; Mityakina, I. A.; Kaprin, A. D.; Schumacher, U.; Tonevitsky, A. G.

    2015-01-01

    Genes with significant differential expression are traditionally used to reveal the genetic background underlying phenotypic differences between cancer cells. We hypothesized that informative marker sets can be obtained by combining genes with a relatively low degree of individual differential expression. We developed a method for construction of highly informative gene combinations aimed at the maximization of the cumulative informative power and identified sets of 2–5 genes efficiently predicting recurrence for ER-positive breast cancer patients. The gene combinations constructed on the basis of microarray data were successfully applied to data acquired by RNA-seq. The developed method provides the basis for the generation of highly efficient prognostic and predictive gene signatures for cancer and other diseases. The identified gene sets can potentially reveal novel essential segments of gene interaction networks and pathways implied in cancer progression. PMID:26446398

  14. Performance of Single and Concatenated Sets of Mitochondrial Genes at Inferring Metazoan Relationships Relative to Full Mitogenome Data

    PubMed Central

    Havird, Justin C.; Santos, Scott R.

    2014-01-01

    Mitochondrial (mt) genes are some of the most popular and widely-utilized genetic loci in phylogenetic studies of metazoan taxa. However, their linked nature has raised questions on whether using the entire mitogenome for phylogenetics is overkill (at best) or pseudoreplication (at worst). Moreover, no studies have addressed the comparative phylogenetic utility of mitochondrial genes across individual lineages within the entire Metazoa. To comment on the phylogenetic utility of individual mt genes as well as concatenated subsets of genes, we analyzed mitogenomic data from 1865 metazoan taxa in 372 separate lineages spanning genera to subphyla. Specifically, phylogenies inferred from these datasets were statistically compared to ones generated from all 13 mt protein-coding (PC) genes (i.e., the “supergene” set) to determine which single genes performed “best” at, and the minimum number of genes required to, recover the “supergene” topology. Surprisingly, the popular marker COX1 performed poorest, while ND5, ND4, and ND2 were most likely to reproduce the “supergene” topology. Averaged across all lineages, the longest ∼2 mt PC genes were sufficient to recreate the “supergene” topology, although this average increased to ∼5 genes for datasets with 40 or more taxa. Furthermore, concatenation of the three “best” performing mt PC genes outperformed that of the three longest mt PC genes (i.e., ND5, COX1, and ND4). Taken together, while not all mt PC genes are equally interchangeable in phylogenetic studies of the metazoans, some subset can serve as a proxy for the 13 mt PC genes. However, the exact number and identity of these genes is specific to the lineage in question and cannot be applied indiscriminately across the Metazoa. PMID:24454717

  15. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

    PubMed Central

    Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi

    2016-01-01

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961

  16. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

    PubMed

    Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi

    2016-07-01

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961

  17. Statistical mechanics of chromatin: Inferring free energies of nucleosome formation from high-throughput data sets

    NASA Astrophysics Data System (ADS)

    Morozov, Alexandre

    2009-03-01

    Formation of nucleosome core particles is a first step towards packaging genomic DNA into chromosomes in living cells. Nucleosomes are formed by wrapping 147 base pairs of DNA around a spool of eight histone proteins. It is reasonable to assume that formation of single nucleosomes in vitro is determined by DNA sequence alone: it costs less elastic energy to wrap a flexible DNA polymer around the histone octamer, and more if the polymer is rigid. However, it is unclear to which extent this effect is important in living cells. Cells have evolved chromatin remodeling enzymes that expend ATP to actively reposition nucleosomes. In addition, nucleosome positioning on long DNA sequences is affected by steric exclusion - many nucleosomes have to form simultaneously without overlap. Currently available bioinformatics methods for predicting nucleosome positions are trained on in vivo data sets and are thus unable to distinguish between extrinsic and intrinsic nucleosome positioning signals. In order to see the relative importance of such signals for nucleosome positioning in vivo, we have developed a model based on a large collection of DNA sequences from nucleosomes reconstituted in vitro by salt dialysis. We have used these data to infer the free energy of nucleosome formation at each position along the genome. The method uses an exact result from the statistical mechanics of classical 1D fluids to infer the free energy landscape from nucleosome occupancy. We will discuss the degree to which in vitro nucleosome occupancy profiles are predictive of in vivo nucleosome positions, and will estimate how many nucleosomes are sequence-specific and how many are positioned purely by steric exclusion. Our approach to nucleosome energetics should be applicable across multiple organisms and genomic regions.

  18. Statistics

    Cancer.gov

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  19. Gene-Set Local Hierarchical Clustering (GSLHC)—A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups

    PubMed Central

    Hsu, Tzu-Ting; Hsu, Chueh-Lin; Liu, Hsueh-Chuan; Lee, Hoong-Chien

    2015-01-01

    Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights in the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases. PMID:26473729

  20. Gene-Set Local Hierarchical Clustering (GSLHC)--A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups.

    PubMed

    Chung, Feng-Hsiang; Jin, Zhen-Hua; Hsu, Tzu-Ting; Hsu, Chueh-Lin; Liu, Hsueh-Chuan; Lee, Hoong-Chien

    2015-01-01

    Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights in the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases. PMID:26473729

  1. Mechanical Unloading of Mouse Bone in Microgravity Significantly Alters Cell Cycle Gene Set Expression

    NASA Astrophysics Data System (ADS)

    Blaber, Elizabeth; Dvorochkin, Natalya; Almeida, Eduardo; Kaplan, Warren; Burns, Brendan

    2012-07-01

    unloading in spaceflight, we conducted genome wide microarray analysis of total RNA isolated from the mouse pelvis. Specifically, 16 week old mice were subjected to 15 days spaceflight onboard NASA's STS-131 space shuttle mission. The pelvis of the mice was dissected, the bone marrow was flushed and the bones were briefly stored in RNAlater. The pelvii were then homogenized, and RNA was isolated using TRIzol. RNA concentration and quality were measured using a Nanodrop spectrometer and 0.8% agarose gel electrophoresis. Samples of cDNA were analyzed using an Affymetrix GeneChip Gene 1.0 ST (Sense Target) Array System for Mouse and GenePattern Software. We normalized the ST gene arrays using Robust Multichip Average (RMA) normalization, which summarizes perfectly matched spots on the array through the median polish algorithm, rather than normalizing according to mismatched spots. We also used Limma for statistical analysis, using the BioConductor Limma Library by Gordon Smyth, and differential expression analysis to identify genes with significant changes in expression between the two experimental conditions. Finally we used GSEApreRanked for Gene Set Enrichment Analysis (GSEA), with Kolmogorov-Smirnov style statistics to identify groups of genes that are regulated together using the t-statistics derived from Limma. Preliminary results show that 6,603 genes expressed in pelvic bone had statistically significant alterations in spaceflight compared to ground controls. These prominently included cell cycle arrest molecules p21 and p18, cell survival molecule Crbp1, and cell cycle molecules cyclin D1 and Cdk1. Additionally, GSEA results indicated alterations in molecular targets of cyclin D1 and Cdk4, senescence pathways resulting from abnormal laminin maturation, cell-cell contacts via E-cadherin, and several pathways relating to protein translation and metabolism. In total 111 gene sets out of 2,488, about 4%, showed statistically significant set alterations.
These
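
    The Kolmogorov-Smirnov style statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the classical unweighted running-sum form; the GSEA software itself can weight steps by the ranking metric and assesses significance by permutation, neither of which is reproduced here. The function names and the toy gene labels are hypothetical.

```python
def rank_by_statistic(t_stats):
    """Order gene names from the largest to the smallest t-statistic."""
    return sorted(t_stats, key=t_stats.get, reverse=True)


def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-style running-sum statistic over a ranked gene list.

    Walk down the ranking, stepping up for genes inside the set and down
    for genes outside it; return the running sum with the largest
    absolute value (positive = set concentrated at the top).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical per-gene t-statistics, e.g. as produced by a limma-style analysis.
toy_t = {"g1": 5.0, "g2": 3.0, "g3": 0.5, "g4": -1.0, "g5": -4.0}
toy_ranking = rank_by_statistic(toy_t)
```

    A set whose members sit at the top of the ranking scores close to +1, and one concentrated at the bottom scores close to -1.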

  2. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

    PubMed

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters. PMID:26820646

  3. Set statistics in conductive bridge random access memory device with Cu/HfO2/Pt structure

    SciTech Connect

    Zhang, Meiyun; Long, Shibing; Wang, Guoming; Xu, Xiaoxin; Li, Yang; Liu, Qi; Lv, Hangbing; Liu, Ming; Lian, Xiaojuan; Miranda, Enrique; Suñé, Jordi

    2014-11-10

    The switching parameter variation of resistive switching memory is one of the most important challenges in its application. In this letter, we have studied the set statistics of conductive bridge random access memory with a Cu/HfO2/Pt structure. The experimental distributions of the set parameters in several off resistance ranges are shown to nicely fit a Weibull model. The Weibull slopes of the set voltage and current increase and decrease logarithmically with off resistance, respectively. This experimental behavior is perfectly captured by a Monte Carlo simulator based on the cell-based set voltage statistics model and the Quantum Point Contact electron transport model. Our work provides indications for the improvement of the switching uniformity.
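
    As a rough illustration of what a Weibull slope is, the sketch below estimates it from a Weibull plot, i.e. a straight-line fit of ln(-ln(1 - F)) against ln(V). The median-rank estimate of F and the least-squares fit are assumptions chosen for illustration; the abstract does not say how the authors fitted their distributions, and the function name is hypothetical.

```python
import math


def weibull_slope(values):
    """Estimate the Weibull slope (shape) and scale from a Weibull plot.

    For F(v) = 1 - exp(-(v / scale) ** slope), the transform
    ln(-ln(1 - F)) = slope * ln(v) - slope * ln(scale) is linear in ln(v).
    """
    ordered = sorted(values)
    n = len(ordered)
    xs, ys = [], []
    for i, v in enumerate(ordered, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        xs.append(math.log(v))
        ys.append(math.log(-math.log(1.0 - f)))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    scale = math.exp(-intercept / slope)
    return slope, scale
```

    Applied separately to the set voltages measured in each off-resistance range, such a fit would give one slope per range, which is the quantity the abstract reports as varying logarithmically with off resistance.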

  4. Using genomic annotations increases statistical power to detect eGenes

    PubMed Central

    Duong, Dat; Zou, Jennifer; Hormozdiari, Farhad; Sul, Jae Hoon; Ernst, Jason; Han, Buhm; Eskin, Eleazar

    2016-01-01

    Motivation: Expression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes or genes whose expressions are associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and the permutation test to correct for multiple testing. The standard method however does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power. Results: We applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detect more candidate eGenes. Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation method. Contact: buhm.han@amc.seoul.kr or eeskin@cs.ucla.edu PMID:27307612

  5. Degrees of separation as a statistical tool for evaluating candidate genes.

    PubMed

    Nelson, Ronald M; Pettersson, Mats E

    2014-12-01

    Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, it can cause misleading results if not validated properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes to currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validation of interacting genes by comparing their connectedness with the average connectedness in the gene network will provide support for said interactions by utilising the growing amount of gene network information available. PMID:25450218
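
    The degree of separation between two genes is the length of the shortest path joining them in a gene network. The sketch below computes it by breadth-first search and averages it over gene pairs. It is an illustration of the concept only, not the CandidateBacon package's interface; the function names and the toy network are hypothetical.

```python
from collections import deque
from itertools import combinations


def degrees_of_separation(network, source, target):
    """Shortest path length (in edges) between two genes, or None if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        gene, dist = queue.popleft()
        for neighbour in network.get(gene, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def average_dos(network, genes):
    """Mean degree of separation over all connected pairs among `genes`."""
    dists = [degrees_of_separation(network, a, b) for a, b in combinations(genes, 2)]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else None


# Toy undirected network with hypothetical gene names, stored as adjacency sets.
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}
```

    A candidate pair whose separation is clearly smaller than the network-wide average would count as supporting evidence in the sense described above; how to judge "clearly smaller" statistically is left open here.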

  6. Fundamental Limitations of High Contrast Imaging Set by Small Sample Statistics

    NASA Astrophysics Data System (ADS)

    Mawet, D.; Milli, J.; Wahhaj, Z.; Pelat, D.; Absil, O.; Delacroix, C.; Boccaletti, A.; Kasper, M.; Kenworthy, M.; Marois, C.; Mennesson, B.; Pueyo, L.

    2014-09-01

    In this paper, we review the impact of small sample statistics on detection thresholds and corresponding confidence levels (CLs) in high-contrast imaging at small angles. When looking close to the star, the number of resolution elements decreases rapidly toward small angles. This reduction of the number of degrees of freedom dramatically affects CLs and false alarm probabilities. Naively using the same ideal hypothesis and methods as for larger separations, which are well understood and commonly assume Gaussian noise, can yield up to one order of magnitude error in contrast estimations at fixed CL. The statistical penalty exponentially increases toward very small inner working angles. Even at 5-10 resolution elements from the star, false alarm probabilities can be significantly higher than expected. Here we present a rigorous statistical analysis that ensures robustness of the CL, but also imposes a substantial limitation on corresponding achievable detection limits (thus contrast) at small angles. This unavoidable fundamental statistical effect has a significant impact on current coronagraphic and future high-contrast imagers. Finally, the paper concludes with practical recommendations to account for small number statistics when computing the sensitivity to companions at small angles and when exploiting the results of direct imaging planet surveys.

  7. Fundamental limitations of high contrast imaging set by small sample statistics

    SciTech Connect

    Mawet, D.; Milli, J.; Wahhaj, Z.; Pelat, D.; Absil, O.; Delacroix, C.; Boccaletti, A.; Kasper, M.; Kenworthy, M.; Marois, C.; Mennesson, B.; Pueyo, L.

    2014-09-10

    In this paper, we review the impact of small sample statistics on detection thresholds and corresponding confidence levels (CLs) in high-contrast imaging at small angles. When looking close to the star, the number of resolution elements decreases rapidly toward small angles. This reduction of the number of degrees of freedom dramatically affects CLs and false alarm probabilities. Naively using the same ideal hypothesis and methods as for larger separations, which are well understood and commonly assume Gaussian noise, can yield up to one order of magnitude error in contrast estimations at fixed CL. The statistical penalty exponentially increases toward very small inner working angles. Even at 5-10 resolution elements from the star, false alarm probabilities can be significantly higher than expected. Here we present a rigorous statistical analysis that ensures robustness of the CL, but also imposes a substantial limitation on corresponding achievable detection limits (thus contrast) at small angles. This unavoidable fundamental statistical effect has a significant impact on current coronagraphic and future high-contrast imagers. Finally, the paper concludes with practical recommendations to account for small number statistics when computing the sensitivity to companions at small angles and when exploiting the results of direct imaging planet surveys.

  8. Comprehensive analysis of SET domain gene family in foxtail millet identifies the putative role of SiSET14 in abiotic stress tolerance.

    PubMed

    Yadav, Chandra Bhan; Muthamilarasan, Mehanathan; Dangi, Anand; Shweta, Shweta; Prasad, Manoj

    2016-01-01

    SET domain-containing genes catalyse histone lysine methylation, which alters chromatin structure and regulates the transcription of genes that are involved in various developmental and physiological processes. The present study identified 53 SET domain-containing genes in C4 panicoid model, foxtail millet (Setaria italica) and the genes were physically mapped onto nine chromosomes. Phylogenetic and structural analyses classified SiSET proteins into five classes (I-V). RNA-seq derived expression profiling showed that SiSET genes were differentially expressed in four tissues namely, leaf, root, stem and spica. Expression analyses using qRT-PCR were performed for 21 SiSET genes under different abiotic stress and hormonal treatments, which showed differential expression of these genes during late phase of stress and hormonal treatments. Significant upregulation of SiSET gene was observed during cold stress, which has been confirmed by over-expressing a candidate gene, SiSET14 in yeast. Interestingly, hypermethylation was observed in gene body of highly differentially expressed genes, whereas methylation event was completely absent in their transcription start sites. This suggested the occurrence of demethylation events during various abiotic stresses, which enhance the gene expression. Altogether, the present study would serve as a base for further functional characterization of SiSET genes towards understanding their molecular roles in conferring stress tolerance. PMID:27585852

  9. Comprehensive analysis of SET domain gene family in foxtail millet identifies the putative role of SiSET14 in abiotic stress tolerance

    PubMed Central

    Yadav, Chandra Bhan; Muthamilarasan, Mehanathan; Dangi, Anand; Shweta, Shweta; Prasad, Manoj

    2016-01-01

    SET domain-containing genes catalyse histone lysine methylation, which alters chromatin structure and regulates the transcription of genes that are involved in various developmental and physiological processes. The present study identified 53 SET domain-containing genes in C4 panicoid model, foxtail millet (Setaria italica) and the genes were physically mapped onto nine chromosomes. Phylogenetic and structural analyses classified SiSET proteins into five classes (I–V). RNA-seq derived expression profiling showed that SiSET genes were differentially expressed in four tissues namely, leaf, root, stem and spica. Expression analyses using qRT-PCR were performed for 21 SiSET genes under different abiotic stress and hormonal treatments, which showed differential expression of these genes during late phase of stress and hormonal treatments. Significant upregulation of SiSET gene was observed during cold stress, which has been confirmed by over-expressing a candidate gene, SiSET14 in yeast. Interestingly, hypermethylation was observed in gene body of highly differentially expressed genes, whereas methylation event was completely absent in their transcription start sites. This suggested the occurrence of demethylation events during various abiotic stresses, which enhance the gene expression. Altogether, the present study would serve as a base for further functional characterization of SiSET genes towards understanding their molecular roles in conferring stress tolerance. PMID:27585852

  10. An 80-gene set to predict response to preoperative chemoradiotherapy for rectal cancer by principle component analysis

    PubMed Central

    EMPUKU, SHINICHIRO; NAKAJIMA, KENTARO; AKAGI, TOMONORI; KANEKO, KUNIHIKO; HIJIYA, NAOKI; ETOH, TSUYOSHI; SHIRAISHI, NORIO; MORIYAMA, MASATSUGU; INOMATA, MASAFUMI

    2016-01-01

    Preoperative chemoradiotherapy (CRT) for locally advanced rectal cancer not only improves the postoperative local control rate, but also induces downstaging. However, it has not been established how to individually select patients who receive effective preoperative CRT. The aim of this study was to identify a predictor of response to preoperative CRT for locally advanced rectal cancer. This study is additional to our multicenter phase II study evaluating the safety and efficacy of preoperative CRT using oral fluorouracil (UMIN ID: 03396). From April, 2009 to August, 2011, 26 biopsy specimens obtained prior to CRT were analyzed by cyclopedic microarray analysis. Response to CRT was evaluated according to a histological grading system using surgically resected specimens. To decide on the number of genes for dividing into responder and non-responder groups, we statistically analyzed the data using a dimension reduction method, principal component analysis. Of the 26 cases, 11 were responders and 15 non-responders. No significant difference was found in clinical background data between the two groups. We determined that the optimal number of genes for the prediction of response was 80 of 40,000 and the functions of these genes were analyzed. When comparing non-responders with responders, genes expressed at a high level functioned in alternative splicing, whereas those expressed at a low level functioned in the septin complex. Thus, an 80-gene expression set that predicts response to preoperative CRT for locally advanced rectal cancer was identified using a novel statistical method. PMID:27123272

  11. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics

    PubMed Central

    Rueedi, Rico; Kutalik, Zoltán; Bergmann, Sven

    2016-01-01

    Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the maximum and the sum of chi-squared statistics, which measure the strongest and the average association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which offers not only significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries. PMID:26808494
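
    For orientation, the classical (unmodified) Fisher method for combining independent p-values can be sketched as follows. This is not Pascal's modified version, whose details the abstract does not give; the function name is hypothetical, and the inputs are assumed to be independent p-values strictly between 0 and 1.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's classical method.

    Returns the test statistic X = -2 * sum(ln p_i) and its p-value
    under a chi-squared distribution with 2k degrees of freedom.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    x = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom (2k) the chi-squared
    # survival function has the closed form
    #   exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total
```

    Because every gene-level p-value contributes to the statistic, no cut-off is needed to decide which genes count as "significant" members of a pathway, which is the threshold-free property the abstract highlights.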

  12. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics.

    PubMed

    Lamparter, David; Marbach, Daniel; Rueedi, Rico; Kutalik, Zoltán; Bergmann, Sven

    2016-01-01

    Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the maximum and the sum of chi-squared statistics, which measure the strongest and the average association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which offers not only significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries. PMID:26808494

  13. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic

    PubMed Central

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters. PMID:26820646

  14. A Demonstration of Using Person-Fit Statistics in Standard Setting.

    ERIC Educational Resources Information Center

    Bay, Luz; Nering, Michael L.

    The use of person-fit methods to determine the extent to which a panelist's ratings fit the item response theory (IRT) models used in the National Assessment of Educational Progress (NAEP) is demonstrated. Person-fit methods are statistical methods that allow the identification of nonfitting response vectors. To determine whether panelists'…

  15. Application of biclustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials

    PubMed Central

    Halappanavar, Sabina

    2015-01-01

    Summary Background: The presence of diverse types of nanomaterials (NMs) in commerce is growing at an exponential pace. As a result, human exposure to these materials in the environment is inevitable, necessitating the need for rapid and reliable toxicity testing methods to accurately assess the potential hazards associated with NMs. In this study, we applied biclustering and gene set enrichment analysis methods to derive essential features of altered lung transcriptome following exposure to NMs that are associated with lung-specific diseases. Several datasets from public microarray repositories describing pulmonary diseases in mouse models following exposure to a variety of substances were examined and functionally related biclusters of genes showing similar expression profiles were identified. The identified biclusters were then used to conduct a gene set enrichment analysis on pulmonary gene expression profiles derived from mice exposed to nano-titanium dioxide (nano-TiO2), carbon black (CB) or carbon nanotubes (CNTs) to determine the disease significance of these data-driven gene sets. Results: Biclusters representing inflammation (chemokine activity), DNA binding, cell cycle, apoptosis, reactive oxygen species (ROS) and fibrosis processes were identified. All of the NM studies were significant with respect to the bicluster related to chemokine activity (DAVID; FDR p-value = 0.032). The bicluster related to pulmonary fibrosis was enriched in studies where toxicity induced by CNT and CB studies was investigated, suggesting the potential for these materials to induce lung fibrosis. The pro-fibrogenic potential of CNTs is well established. Although CB has not been shown to induce fibrosis, it induces stronger inflammatory, oxidative stress and DNA damage responses than nano-TiO2 particles. Conclusion: The results of the analysis correctly identified all NMs to be inflammogenic and only CB and CNTs as potentially fibrogenic. In addition to identifying several

  16. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-01

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling
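
    The rank product idea mentioned in this abstract can be illustrated with a minimal sketch. The handling of missing values below (ranking only the features observed in each replicate, scaling ranks by the number observed, and averaging over the replicates where a feature is present) is an assumption made for illustration; the abstract does not describe the authors' actual modification. The function name `rank_products` is hypothetical.

```python
import math


def rank_products(log_ratios):
    """Geometric mean of scaled ranks per feature, skipping missing values.

    log_ratios: list of features, each a list of per-replicate log ratios,
    with None marking a missing value. Smaller results mean the feature is
    more consistently ranked near the top (largest log ratios).
    Assumption: missing values are simply left out of both the ranking and
    the geometric mean; this is an illustrative choice, not the published one.
    """
    n_features = len(log_ratios)
    n_reps = len(log_ratios[0])
    ranks = [[None] * n_reps for _ in range(n_features)]

    for r in range(n_reps):
        # Rank only the features that were actually observed in replicate r.
        observed = [f for f in range(n_features) if log_ratios[f][r] is not None]
        observed.sort(key=lambda f: log_ratios[f][r], reverse=True)
        for position, f in enumerate(observed, start=1):
            # Scale by the number observed so replicates stay comparable.
            ranks[f][r] = position / len(observed)

    result = []
    for f in range(n_features):
        present = [x for x in ranks[f] if x is not None]
        if not present:
            result.append(None)  # feature never observed
        else:
            mean_log = sum(math.log(x) for x in present) / len(present)
            result.append(math.exp(mean_log))
    return result
```

    In practice the scores would be compared against a permutation-based null distribution to obtain p-values; that step is omitted here.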

  17. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    PubMed

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards--the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  18. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks

    PubMed Central

    Blatti, Charles; Sinha, Saurabh

    2016-01-01

    Motivation: Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or ‘properties’ such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene–gene or gene–property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. Results: We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. Availability and Implementation: DRaWR was implemented as
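
    A random walk with restart of the kind described in this abstract can be sketched as a simple power iteration. This is a generic single-stage illustration, not the DRaWR implementation: the graph representation, the restart probability of 0.3, the treatment of isolated nodes and the function name are all assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at the seeds.

    adjacency: dict mapping node -> dict of neighbor -> edge weight. Every
    neighbor must also appear as a key. Nodes may be genes or properties,
    so a heterogeneous network is just a dict with mixed node types.
    seeds: set of nodes forming the query gene set.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adjacency)
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(restart_vec)
    out_weight = {n: sum(adjacency[n].values()) for n in nodes}

    for _ in range(max_iter):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            if out_weight[n] == 0:
                # Isolated node: send its probability mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * prob[n] / len(seeds)
                continue
            for m, w in adjacency[n].items():
                # Move along each edge in proportion to its weight.
                nxt[m] += (1 - restart) * prob[n] * w / out_weight[n]
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob
```

    A two-stage procedure like the one the abstract outlines could call this function once on the full network, keep the highest-scoring property nodes, and call it again on the reduced network to re-rank genes.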

  19. GeneMarker® Genotyping Software: Tools to Increase the Statistical Power of DNA Fragment Analysis

    PubMed Central

    Hulce, D.; Li, X.; Snyder-Leiby, T.; Jonathan Liu, C.S.

    2011-01-01

    The discriminatory power of post-genotyping analyses, such as kinship or clustering analysis, is dependent on the amount of genetic information obtained from the DNA fragment/genotyping analysis. The number of microsatellite loci amplified in one multiplex is limited by the number of dyes and overlapping loci boundaries, requiring researchers to amplify replicate samples with 2 or more multiplexes in order to obtain a genotype for 12–15 loci. AFLP is another method that is limited by the number of dyes, often requiring multiple amplifications of replicate samples to obtain more complete results. Traditionally, researchers export the genotyping results into a spreadsheet, manually combine the results for each individual and then import into a third software package for post-genotyping analysis. GeneMarker is highly accurate, user-friendly genotyping software that allows all of these steps to be done in one software package, avoiding potential errors from data transfer to different programs and decreasing the amount of time needed to process the results. The Merge Project tool automatically combines the results from replicate samples processed with different primer sets. Replicate animal (diploid) DNA samples were amplified with three different multiplexes; each multiplex provided information on 4–6 loci. The kinship analysis using the merged results provided a 10^17 increase in statistical power, with a range of 10^8 when 5 loci were used versus 10^25 when 15 loci were used to determine potential relationship levels with identity by descent calculations. These same sample sets were used in clustering analysis to diagram dendrograms. The dendrogram based on a single multiplex resulted in three branches at a given Euclidean distance. In comparison, the dendrogram that was constructed using the merged results had eight branches at the same Euclidean distance.

  20. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data

    PubMed Central

    Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-01-01

    Abstract Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®—the human gene database; the MalaCards—the human diseases database; and the PathCards—the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®—the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics

  1. Gene expression profiling of peripheral blood mononuclear cells in the setting of peripheral arterial disease

    PubMed Central

    2012-01-01

    Background Peripheral arterial disease (PAD) is a relatively common manifestation of systemic atherosclerosis that leads to progressive narrowing of the lumen of leg arteries. Circulating monocytes are in contact with the arterial wall and can serve as reporters of vascular pathology in the setting of PAD. We performed gene expression analysis of peripheral blood mononuclear cells (PBMC) in patients with PAD and controls without PAD to identify differentially regulated genes. Methods PAD was defined as an ankle brachial index (ABI) ≤0.9 (n = 19) while age and gender matched controls had an ABI > 1.0 (n = 18). Microarray analysis was performed using Affymetrix HG-U133 plus 2.0 gene chips and analyzed using GeneSpring GX 11.0. Gene expression data were normalized using the Robust Multichip Analysis (RMA) normalization method; differential expression was defined as a fold change ≥1.5, followed by unpaired Mann-Whitney test (P < 0.05) and correction for multiple testing by Benjamini and Hochberg False Discovery Rate. Meta-analysis of differentially expressed genes was performed using an integrated bioinformatics pipeline with tools for enrichment analysis using Gene Ontology (GO) terms, pathway analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG), molecular event enrichment using Reactome annotations and network analysis using Ingenuity Pathway Analysis suite. Extensive biocuration was also performed to understand the functional context of genes. Results We identified 87 genes differentially expressed in the setting of PAD; 40 genes were upregulated and 47 genes were downregulated. We employed an integrated bioinformatics pipeline coupled with literature curation to characterize the functional coherence of differentially regulated genes. Conclusion Notably, upregulated genes mediate immune response, inflammation, apoptosis, stress response, phosphorylation, hemostasis, platelet activation and platelet aggregation. Downregulated genes included several genes from
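
    The Benjamini and Hochberg correction named in the Methods can be sketched in a few lines. This is a generic illustration of the step-up procedure, not the GeneSpring implementation; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is multiplied by m / rank (rank 1 = smallest p-value),
    then a running minimum is taken from the largest rank downward so the
    adjusted values never decrease as the raw p-values increase.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

    A gene would then be called differentially expressed if it passes both the fold-change filter and a chosen cutoff on its adjusted p-value.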

  2. Studying gene and gene-environment effects of uncommon and common variants on continuous traits: a marker-set approach using gene-trait similarity regression.

    PubMed

    Tzeng, Jung-Ying; Zhang, Daowen; Pongpanich, Monnat; Smith, Chris; McCarthy, Mark I; Sale, Michèle M; Worrall, Bradford B; Hsu, Fang-Chi; Thomas, Duncan C; Sullivan, Patrick F

    2011-08-12

    Genomic association analyses of complex traits demand statistical tools that are capable of detecting small effects of common and rare variants and modeling complex interaction effects and yet are computationally feasible. In this work, we introduce a similarity-based regression method for assessing the main genetic and interaction effects of a group of markers on quantitative traits. The method uses genetic similarity to aggregate information from multiple polymorphic sites and integrates adaptive weights that depend on allele frequencies to accommodate common and uncommon variants. Collapsing information at the similarity level instead of the genotype level avoids canceling signals that have the opposite etiological effects and is applicable to any class of genetic variants without the need for dichotomizing the allele types. To assess gene-trait associations, we regress trait similarities for pairs of unrelated individuals on their genetic similarities and assess association by using a score test whose limiting distribution is derived in this work. The proposed regression framework allows for covariates, has the capacity to model both main and interaction effects, can be applied to a mixture of different polymorphism types, and is computationally efficient. These features make it an ideal tool for evaluating associations between phenotype and marker sets defined by linkage disequilibrium (LD) blocks, genes, or pathways in whole-genome analysis. PMID:21835306

  3. Statistical inference of selection and divergence of rice blast resistance gene Pi-ta

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The resistance gene Pi-ta has been effectively used to control rice blast disease worldwide. A few recent studies have described the possible evolution of Pi-ta in cultivated and weedy rice. However, evolutionary statistics used for the studies are too limited to precisely understand selection and d...

  4. GeneBrowser 2: an application to explore and identify common biological traits in a set of genes

    PubMed Central

    2010-01-01

    Background The development of high-throughput laboratory techniques created a demand for computer-assisted result analysis tools. Many of these techniques return lists of genes whose interpretation requires finding relevant biological roles for the problem at hand. The required information is typically available in public databases, and usually, this information must be manually retrieved to complement the analysis. This process is a very time-consuming task that should be automated as much as possible. Results GeneBrowser is a web-based tool that, for a given list of genes, combines data from several public databases with visualisation and analysis methods to help identify the most relevant and common biological characteristics. The functionalities provided include the following: a central point with the most relevant biological information for each inserted gene; a list of the most related papers in PubMed and gene expression studies in ArrayExpress; and an extended approach to functional analysis applied to Gene Ontology, homologies, gene chromosomal localisation and pathways. Conclusions GeneBrowser provides a unique entry point to several visualisation and analysis methods, providing fast and easy analysis of a set of genes. GeneBrowser fills the gap between Web portals that analyse one gene at a time and functional analysis tools that are limited in scope and usually desktop-based. PMID:20663121

  5. Gene integrated set profile analysis: a context-based approach for inferring biological endpoints

    PubMed Central

    Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M.; Pauly, Rini; Gutman, David A.; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S.; Rossi, Michael R.; Vertino, Paula M.; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H.

    2016-01-01

    The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples with the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710

  6. Gene integrated set profile analysis: a context-based approach for inferring biological endpoints.

    PubMed

    Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M; Pauly, Rini; Gutman, David A; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S; Rossi, Michael R; Vertino, Paula M; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H

    2016-04-20

    The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an 'integrate by intersection' (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples with the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710

  7. Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data

    PubMed Central

    2013-01-01

    Background High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed “contrast data” for short) in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.). Results To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are
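
    Over-Representation Analysis is commonly computed as a one-sided hypergeometric test; the sketch below shows that calculation. Whether Confero uses this exact test is not stated in the abstract, so the choice of test, the function name and the parameter names are assumptions.

```python
from math import comb


def ora_p_value(universe_size, set_size, selected_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes measured in the experiment
    set_size: number of those genes annotated to the gene set
    selected_size: number of differentially expressed genes
    overlap: number of differentially expressed genes inside the gene set
    """
    upper = min(set_size, selected_size)
    total = comb(universe_size, selected_size)
    # Count the ways to draw k set members and the rest from outside the set.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

    The resulting p-values, one per gene set, would normally be corrected for multiple testing before interpretation.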

  8. Statistical Analysis of Hurst Exponents of Essential/Nonessential Genes in 33 Bacterial Genomes

    PubMed Central

    Liu, Xiao; Wang, Baojin; Xu, Luo

    2015-01-01

    Methods for identifying essential genes currently depend predominantly on biochemical experiments. However, there is demand for improved computational methods for determining gene essentiality. In this study, we used the Hurst exponent, a characteristic parameter to describe long-range correlation in DNA, and analyzed its distribution in 33 bacterial genomes. In most genomes (31 out of 33) the significance levels of the Hurst exponents of the essential genes were significantly higher than for the corresponding full-gene-set, whereas the significance levels of the Hurst exponents of the nonessential genes remained unchanged or increased only slightly. All of the Hurst exponents of essential genes followed a normal distribution, with one exception. We therefore propose that the distribution feature of Hurst exponents of essential genes can be used as a classification index for essential gene prediction in bacteria. For computer-aided design in the field of synthetic biology, this feature can build a restraint for pre- or post-design checking of bacterial essential genes. Moreover, considering the relationship between gene essentiality and evolution, the Hurst exponents could be used as a descriptive parameter related to evolutionary level, or be added to the annotation of each gene. PMID:26067107
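
    A Hurst exponent can be estimated by rescaled range (R/S) analysis, sketched below. The abstract does not say how DNA was converted to a numeric series or which estimator was used, so the purine/pyrimidine mapping, the window sizes and all function names here are assumptions for illustration only.

```python
import math


def purine_pyrimidine_walk(sequence):
    """Map A/G to +1 and C/T to -1 (an assumed encoding, uppercase input)."""
    return [1.0 if base in "AG" else -1.0 for base in sequence]


def rescaled_range(series):
    """Range of cumulative mean-centred sums divided by the standard deviation."""
    n = len(series)
    mean = sum(series) / n
    cumulative, running = [], 0.0
    for x in series:
        running += x - mean
        cumulative.append(running)
    spread = max(cumulative) - min(cumulative)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return spread / std if std > 0 else None


def hurst_exponent(series, window_sizes=(8, 16, 32, 64)):
    """Slope of log(mean R/S) against log(window size), by least squares."""
    xs, ys = [], []
    for size in window_sizes:
        values = []
        # Use non-overlapping windows of the given size.
        for start in range(0, len(series) - size + 1, size):
            rs = rescaled_range(series[start:start + size])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            xs.append(math.log(size))
            ys.append(math.log(sum(values) / len(values)))
    if len(xs) < 2:
        raise ValueError("need at least two usable window sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

    Values near 0.5 indicate no long-range correlation, values above 0.5 indicate persistence, and values below 0.5 indicate anti-persistence; a strictly alternating series, for example, gives an estimate of 0 with this sketch.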

  9. Re-Conceptualization of Modified Angoff Standard Setting: Unified Statistical, Measurement, Cognitive, and Social Psychological Theories

    ERIC Educational Resources Information Center

    Iyioke, Ifeoma Chika

    2013-01-01

    This dissertation describes a design for training, in accordance with probability judgment heuristics principles, for the Angoff standard setting method. The new training with instruction, practice, and feedback tailored to the probability judgment heuristics principles was called the Heuristic training and the prevailing Angoff method training…

  10. Intertwining Threshold Settings, Biological Data and Database Knowledge to Optimize the Selection of Differentially Expressed Genes from Microarray

    PubMed Central

    Chuchana, Paul; Holzmuller, Philippe; Vezilier, Frederic; Berthier, David; Chantal, Isabelle; Severac, Dany; Lemesre, Jean Loup; Cuny, Gerard; Nirdé, Philippe; Bucheton, Bruno

    2010-01-01

    Background Many tools used to analyze microarrays in different conditions have been described. However, the integration of deregulated genes within coherent metabolic pathways is lacking. Currently no objective selection criterion based on biological functions exists to determine a threshold demonstrating that a gene is indeed differentially expressed. Methodology/Principal Findings To improve transcriptomic analysis of microarrays, we propose a new statistical approach that takes into account biological parameters. We present an iterative method to optimise the selection of differentially expressed genes in two experimental conditions. The stringency level of gene selection was associated simultaneously with the p-value of expression variation and the occurrence rate parameter associated with the percentage of donors whose transcriptomic profile is similar. Our method intertwines stringency level settings, biological data and a knowledge database to highlight molecular interactions using networks and pathways. Analysis performed during iterations helped us to select the optimal threshold required for the most pertinent selection of differentially expressed genes. Conclusions/Significance We have applied this approach to the well documented mechanism of human macrophage response to lipopolysaccharide stimulation. We thus verified that our method was able to determine with the highest degree of accuracy the best threshold for selecting genes that are truly differentially expressed. PMID:20976008

  11. Integrated Data Collection Analysis (IDCA) Program - Statistical Analysis of RDX Standard Data Sets

    SciTech Connect

    Sandstrom, Mary M.; Brown, Geoffrey W.; Preston, Daniel N.; Pollard, Colin J.; Warner, Kirstin F.; Sorensen, Daniel N.; Remmers, Daniel L.; Phillips, Jason J.; Shelley, Timothy J.; Reyes, Jose A.; Hsu, Peter C.; Reynolds, John G.

    2015-10-30

    The Integrated Data Collection Analysis (IDCA) program is conducting a Proficiency Test for Small-Scale Safety and Thermal (SSST) testing of homemade explosives (HMEs). Described here are statistical analyses of the results for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of the RDX Type II Class 5 standard. The material was tested as a well-characterized standard several times during the proficiency study to assess differences among participants and the range of results that may arise for well-behaved explosive materials. The analyses show that there are detectable differences among the results from IDCA participants. While these differences are statistically significant, most of them can be disregarded for comparison purposes to assess potential variability when laboratories attempt to measure identical samples using methods assumed to be nominally the same. The results presented in this report include the average sensitivity results for the IDCA participants and the ranges of values obtained. The ranges represent variation about the mean values of the tests of between 26% and 42%. The magnitude of this variation is attributed to differences in operator, method, and environment as well as the use of different instruments that are also of varying age. The results appear to be a good representation of the broader safety testing community based on the range of methods, instruments, and environments included in the IDCA Proficiency Test.

  12. Allele diversity for abiotic stress responsive candidate genes in chickpea reference set using gene based SNP markers

    PubMed Central

    Roorkiwal, Manish; Nayak, Spurthi N.; Thudi, Mahendar; Upadhyaya, Hari D.; Brunel, Dominique; Mournet, Pierre; This, Dominique; Sharma, Prakash C.; Varshney, Rajeev K.

    2014-01-01

    Chickpea is an important food legume crop for the semi-arid regions; however, its productivity is adversely affected by various biotic and abiotic stresses. Identification of candidate genes associated with abiotic stress response will help breeding efforts aiming to enhance its productivity. With this objective, 10 abiotic stress responsive candidate genes were selected on the basis of prior knowledge of this complex trait. These 10 genes were subjected to allele specific sequencing across a chickpea reference set comprising 300 genotypes including 211 genotypes of chickpea mini core collection. A total of 1.3 Mbp sequence data were generated. Multiple sequence alignment (MSA) revealed 79 SNPs and 41 indels in nine genes while the CAP2 gene was found to be conserved across all the genotypes. Among 10 candidate genes, the maximum number of SNPs (34) was observed in abscisic acid stress and ripening (ASR) gene including 22 transitions, 11 transversions and one tri-allelic SNP. Nucleotide diversity varied from 0.0004 to 0.0029 while polymorphism information content (PIC) values ranged from 0.01 (AKIN gene) to 0.43 (CAP2 promoter). Haplotype analysis revealed that alleles were represented by more than two haplotype blocks, except alleles of the CAP2 and sucrose synthase (SuSy) gene, where only one haplotype was identified. These genes can be used for association analysis and if validated, may be useful for enhancing tolerance to abiotic stresses, including drought, through molecular breeding. PMID:24926299
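
    Polymorphism information content is usually computed from allele frequencies with the formula of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The sketch below assumes that formula; the abstract does not state which definition the authors used, and the function name is hypothetical.

```python
def polymorphism_information_content(frequencies):
    """PIC for one locus given its allele (or haplotype) frequencies.

    frequencies: list of frequencies that must sum to 1.
    """
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in frequencies)
    # Correction for matings that are uninformative despite heterozygosity.
    correction = sum(
        2 * frequencies[i] ** 2 * frequencies[j] ** 2
        for i in range(len(frequencies))
        for j in range(i + 1, len(frequencies))
    )
    return 1.0 - homozygosity - correction
```

    Under this formula a locus with two equally frequent alleles has PIC = 0.375, which is the maximum for a biallelic marker, and a monomorphic locus has PIC = 0.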

  13. Statistics of dark matter halos in the excursion set peak framework

    SciTech Connect

    Lapi, A.; Danese, L.

    2014-07-01

    We derive approximate yet very accurate analytical expressions for the abundance and clustering properties of dark matter halos in the excursion set peak framework; the latter relies on the standard excursion set approach, but also includes the effects of a realistic filtering of the density field, a mass-dependent threshold for collapse, and the prescription from peak theory that halos tend to form around density maxima. We find that our approximations work excellently for diverse power spectra, collapse thresholds and density filters. Moreover, when adopting a cold dark matter power spectrum, a tophat filtering and a mass-dependent collapse threshold (supplemented with conceivable scatter), our approximated halo mass function and halo bias represent very well the outcomes of cosmological N-body simulations.

  14. Gene Set Signature of Reversal Reaction Type I in Leprosy Patients

    PubMed Central

    Orlova, Marianna; Cobat, Aurélie; Huong, Nguyen Thu; Ba, Nguyen Ngoc; Van Thuc, Nguyen; Spencer, John; Nédélec, Yohann; Barreiro, Luis; Thai, Vu Hong; Abel, Laurent; Alcaïs, Alexandre; Schurr, Erwin

    2013-01-01

    Leprosy reversal reactions type 1 (T1R) are acute immune episodes that affect a subset of leprosy patients and remain a major cause of nerve damage. Little is known about the relative importance of innate versus environmental factors in the pathogenesis of T1R. In a retrospective design, we evaluated innate differences in response to Mycobacterium leprae between healthy individuals and former leprosy patients affected or free of T1R by analyzing the transcriptome response of whole blood to M. leprae sonicate. Validation of results was conducted in a subsequent prospective study. We observed the differential expression of 581 genes upon exposure of whole blood to M. leprae sonicate in the retrospective study. We defined a 44 T1R gene set signature of differentially regulated genes. The majority of the T1R set genes were represented by three functional groups: i) pro-inflammatory regulators; ii) arachidonic acid metabolism mediators; and iii) regulators of anti-inflammation. The validity of the T1R gene set signature was replicated in the prospective arm of the study. The T1R genetic signature encompasses genes encoding pro- and anti-inflammatory mediators of innate immunity. This suggests an innate defect in the regulation of the inflammatory response to M. leprae antigens. The identified T1R gene set represents a critical first step towards a genetic profile of leprosy patients who are at increased risk of T1R and concomitant nerve damage. PMID:23874223

  15. Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)?

    PubMed Central

    Barrett, Craig F.; Specht, Chelsea D.; Leebens-Mack, Jim; Stevenson, Dennis Wm.; Zomlefer, Wendy B.; Davis, Jerrold I.

    2014-01-01

    Background and Aims Zingiberales comprise a clade of eight tropical monocot families including approx. 2500 species and are hypothesized to have undergone an ancient, rapid radiation during the Cretaceous. Zingiberales display substantial variation in floral morphology, and several members are ecologically and economically important. Deep phylogenetic relationships among primary lineages of Zingiberales have proved difficult to resolve in previous studies, representing a key region of uncertainty in the monocot tree of life. Methods Next-generation sequencing was used to construct complete plastid gene sets for nine taxa of Zingiberales, which were added to five previously sequenced sets in an attempt to resolve deep relationships among families in the order. Variation in taxon sampling, process partition inclusion and partition model parameters were examined to assess their effects on topology and support. Key Results Codon-based likelihood analysis identified a strongly supported clade of ((Cannaceae, Marantaceae), (Costaceae, Zingiberaceae)), sister to (Musaceae, (Lowiaceae, Strelitziaceae)), collectively sister to Heliconiaceae. However, the deepest divergences in this phylogenetic analysis comprised short branches with weak support. Additionally, manipulation of matrices resulted in differing deep topologies in an unpredictable fashion. Alternative topology testing allowed statistical rejection of some of the topologies. Saturation fails to explain observed topological uncertainty and low support at the base of Zingiberales. Evidence for conflict among the plastid data was based on a support metric that accounts for conflicting resampled topologies. Conclusions Many relationships were resolved with robust support, but the paucity of character information supporting the deepest nodes and the existence of conflict suggest that plastid coding regions are insufficient to resolve and support the earliest divergences among families of Zingiberales. 
Whole plastomes

  16. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth

    PubMed Central

    Chaillou, Thomas; Jackson, Janna R.; England, Jonathan H.; Kirby, Tyler J.; Richards-White, Jena; Esser, Karyn A.; Dupont-Versteegden, Esther E.

    2014-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes were upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading. PMID:25554798

  17. StemChecker: a web-based tool to discover and explore stemness signatures in gene sets

    PubMed Central

    Pinto, José P.; Kalathur, Ravi K.; Oliveira, Daniel V.; Barata, Tânia; Machado, Rui S.R.; Machado, Susana; Pacheco-Leyva, Ivette; Duarte, Isabel; Futschik, Matthias E.

    2015-01-01

    Stem cells present unique regenerative abilities, offering great potential for treatment of prevalent pathologies such as diabetes, neurodegenerative and heart diseases. Various research groups dedicated significant effort to identify sets of genes—so-called stemness signatures—considered essential to define stem cells. However, their usage has been hindered by the lack of comprehensive resources and easy-to-use tools. For this we developed StemChecker, a novel stemness analysis tool, based on the curation of nearly fifty published stemness signatures defined by gene expression, RNAi screens, Transcription Factor (TF) binding sites, literature reviews and computational approaches. StemChecker allows researchers to explore the presence of stemness signatures in user-defined gene sets, without carrying out lengthy literature curation or data processing. To assist in exploring underlying regulatory mechanisms, we collected over 80 target gene sets of TFs associated with pluri- or multipotency. StemChecker presents an intuitive graphical display, as well as detailed statistical results in table format, which helps reveal transcriptional regulatory programs, indicating the putative involvement of stemness-associated processes in diseases like cancer. Overall, StemChecker substantially expands the available repertoire of online tools, designed to assist the stem cell biology, developmental biology, regenerative medicine and human disease research community. StemChecker is freely accessible at http://stemchecker.sysbiolab.eu. PMID:26007653

  18. Gene-set Analysis with CGI Information for Differential DNA Methylation Profiling

    PubMed Central

    Chang, Chia-Wei; Lu, Tzu-Pin; She, Chang-Xian; Feng, Yen-Chen; Hsiao, Chuhsing Kate

    2016-01-01

    DNA methylation is a well-established epigenetic biomarker for many diseases. Studying the relationships among a group of genes and their methylations may help to unravel the etiology of diseases. Since CpG-islands (CGIs) play a crucial role in the regulation of transcription during methylation, including them in the analysis may provide further information in understanding the pathogenesis of cancers. Such CGI information, however, has usually been overlooked in existing gene-set analyses. Here we aimed to include both pathway information and CGI status to rank competing gene-sets and identify among them the genes most likely contributing to DNA methylation changes. To accomplish this, we devised a Bayesian model for matched case-control studies with parameters for CGI status and pathway associations, while incorporating intra-gene-set information. Three cancer studies with candidate pathways were analyzed to illustrate this approach. The strength of association for each candidate pathway and the influence of each gene were evaluated. Results show that, based on probabilities, the importance of pathways and genes can be determined. The findings confirm that some of these genes are cancer-related and may hold the potential to be targeted in drug development. PMID:27090937

  19. Cross-cultural adaptation of research instruments: language, setting, time and statistical considerations

    PubMed Central

    2010-01-01

    Background Research questionnaires are not always translated appropriately before they are used in new temporal, cultural or linguistic settings. The results based on such instruments may therefore not accurately reflect what they are supposed to measure. This paper aims to illustrate the process and required steps involved in the cross-cultural adaptation of a research instrument using the adaptation process of an attitudinal instrument as an example. Methods A questionnaire was needed for the implementation of a study in Norway in 2007. There were no appropriate instruments available in Norwegian, thus an Australian-English instrument was cross-culturally adapted. Results The adaptation process included investigation of conceptual and item equivalence. Two forward and two back-translations were synthesized and compared by an expert committee. Thereafter the instrument was pretested and adjusted accordingly. The final questionnaire was administered to opioid maintenance treatment staff (n=140) and harm reduction staff (n=180). The overall response rate was 84%. The original instrument failed confirmatory analysis. Instead a new two-factor scale was identified and found valid in the new setting. Conclusions The failure of the original scale highlights the importance of adapting instruments to current research settings. It also emphasizes the importance of ensuring that concepts within an instrument are equal between the original and target language, time and context. If the described stages in the cross-cultural adaptation process had been omitted, the findings would have been misleading, even if presented with apparent precision. Thus, it is important to consider possible barriers when making a direct comparison between different nations, cultures and times. PMID:20144247

  20. Statistical stage transition detection method for small sample gene expression time series data.

    PubMed

    Tominaga, Daisuke

    2014-08-01

    In terms of their internal (genetic) and external (phenotypic) states, living cells are always changing at varying rates. Periods of stable or low rate of change are often called States, Stages, or Phases, whereas high-rate periods are called Transitions or Transients. States and transitions that are observed phenotypically, such as cell differentiation and cancer progression, are related to gene expression levels. On the other hand, stages of gene expression are definable based on changes of expression levels. Analyzing relations between state changes of phenotypes and stage transitions of gene expression levels is a general approach to elucidate mechanisms of life phenomena. Herein, we propose an algorithm to detect stage transitions in a time series of expression levels of a gene by defining statistically optimal division points. The algorithm demonstrates detection ability on simulated datasets. An annotation-based analysis of the detection results for a dataset of initial development of Caenorhabditis elegans agrees with results presented in the literature. PMID:24960588
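
    The idea of an optimal division point in a short expression time series can be sketched as follows. This is not the algorithm from the paper, whose optimality criterion is not given in the abstract; it assumes a single division point chosen by least squares, and the names `sse` and `best_division_point` and the example series are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_division_point(series):
    """Return (k, cost) where splitting into series[:k] and series[k:]
    minimizes the total within-segment squared error.

    Assumption: one division point and a least-squares criterion.
    """
    if len(series) < 2:
        raise ValueError("need at least two time points")
    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Hypothetical expression levels with a jump after the fourth time point.
expression = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9]
split, cost = best_division_point(expression)
```

    With very few time points, a criterion of this kind would normally be paired with a penalty or a significance test so that noise is not reported as a transition.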

  1. Neighborhood Rough Set Reduction-Based Gene Selection and Prioritization for Gene Expression Profile Analysis and Molecular Cancer Classification

    PubMed Central

    Hou, Mei-Ling; Wang, Shu-Lin; Li, Xue-Ling; Lei, Ying-Ke

    2010-01-01

    Selection of reliable cancer biomarkers is crucial for gene expression profile-based precise diagnosis of cancer type and successful treatment. However, current studies are confronted with overfitting and the curse of dimensionality in tumor classification and false positives in the identification of cancer biomarkers. Here, we developed a novel gene-ranking method based on neighborhood rough set reduction for molecular cancer classification based on gene expression profile. Compared with other methods such as PAM, ClaNC, Kruskal-Wallis rank sum test, and Relief-F, our method shows that only a few top-ranked genes can achieve higher tumor classification accuracy. Moreover, although the selected genes are not typical of known oncogenes, they are found to play a crucial role in the occurrence of tumor through searching the scientific literature and analyzing protein interaction partners, which may be used as candidate cancer biomarkers. PMID:20625410

  2. Mechanism-based biomarker gene sets for glutathione depletion-related hepatotoxicity in rats

    SciTech Connect

    Gao Weihua; Mizukawa, Yumiko; Nakatsu, Noriyuki; Minowa, Yosuke; Yamada, Hiroshi; Ohno, Yasuo; Urushidani, Tetsuro

    2010-09-15

    Chemical-induced glutathione depletion is thought to be caused by two types of toxicological mechanisms: PHO-type glutathione depletion [glutathione conjugated with chemicals such as phorone (PHO) or diethyl maleate (DEM)], and BSO-type glutathione depletion [i.e., glutathione synthesis inhibited by chemicals such as L-buthionine-sulfoximine (BSO)]. In order to identify mechanism-based biomarker gene sets for glutathione depletion in rat liver, male SD rats were treated with various chemicals including PHO (40, 120 and 400 mg/kg), DEM (80, 240 and 800 mg/kg), BSO (150, 450 and 1500 mg/kg), and bromobenzene (BBZ, 10, 100 and 300 mg/kg). Liver samples were taken 3, 6, 9 and 24 h after administration and examined for hepatic glutathione content, physiological and pathological changes, and gene expression changes using Affymetrix GeneChip Arrays. To identify differentially expressed probe sets in response to glutathione depletion, we focused on the following two courses of events for the two types of mechanisms of glutathione depletion: a) gene expression changes occurring simultaneously in response to glutathione depletion, and b) gene expression changes after glutathione was depleted. The gene expression profiles of the identified probe sets for the two types of glutathione depletion differed markedly at times during and after glutathione depletion, whereas Srxn1 was markedly increased for both types as glutathione was depleted, suggesting that Srxn1 is a key molecule in oxidative stress related to glutathione. The extracted probe sets were refined and verified using various compounds including 13 additional positive or negative compounds, and they established two useful marker sets. One contained three probe sets (Akr7a3, Trib3 and Gstp1) that could detect conjugation-type glutathione depletors any time within 24 h after dosing, and the other contained 14 probe sets that could detect glutathione depletors by any mechanism. These two sets, with appropriate scoring

  3. A statistical investigation into the stability of iris recognition in diverse population sets

    NASA Astrophysics Data System (ADS)

    Howard, John J.; Etter, Delores M.

    2014-05-01

    Iris recognition is increasingly being deployed on population wide scales for important applications such as border security, social service administration, criminal identification and general population management. The error rates for this incredibly accurate form of biometric identification are established using well known, laboratory quality datasets. However, it has long been acknowledged in biometric theory that not all individuals have the same likelihood of being correctly serviced by a biometric system. Typically, techniques for identifying clients that are likely to experience a false non-match or a false match error are carried out on a per-subject basis. This research makes the novel hypothesis that certain ethnical denominations are more or less likely to experience a biometric error. Through established statistical techniques, we demonstrate this hypothesis to be true and document the notable effect that the ethnicity of the client has on iris similarity scores. Understanding the expected impact of ethnical diversity on iris recognition accuracy is crucial to the future success of this technology as it is deployed in areas where the target population consists of clientele from a range of geographic backgrounds, such as border crossings and immigration check points.

  4. Identification of a set of genes showing regionally enriched expression in the mouse brain

    PubMed Central

    D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa LC; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven JM

    2008-01-01

    Background The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. Results We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression. PMID:18625066
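
    GO term over-representation of the kind mentioned above is often assessed with a one-sided hypergeometric test. The sketch below assumes that test; the abstract does not state which test was used, and the function name and the annotation counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, of which n_annotated carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the tail sum.
    tail = sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 30,000 genes in the universe, 237 selected,
# 400 annotated with a term, 12 of the selected genes annotated.
p_value = hypergeom_enrichment_pvalue(30000, 400, 237, 12)
```

    Because many GO terms are tested at once, p-values of this kind would usually be corrected for multiple testing before interpretation.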

  5. Associations between DNA methylation and schizophrenia-related intermediate phenotypes: a gene set enrichment analysis

    PubMed Central

    Hass, Johanna; Walton, Esther; Wright, Carrie; Beyer, Andreas; Scholz, Markus; Turner, Jessica; Liu, Jingyu; Smolka, Michael N.; Roessner, Veit; Sponheim, Scott R.; Gollub, Randy L.; Calhoun, Vince D.; Ehrlich, Stefan

    2015-01-01

    Multiple genetic approaches have identified microRNAs as key effectors in psychiatric disorders as they post-transcriptionally regulate expression of thousands of target genes. However, their role in specific psychiatric diseases remains poorly understood. In addition, epigenetic mechanisms such as DNA methylation, which affect the expression of both microRNAs and coding genes, are critical for our understanding of molecular mechanisms in schizophrenia. Using clinical, imaging, genetic, and epigenetic data of 103 patients with schizophrenia and 111 healthy controls of the Mind Clinical Imaging Consortium (MCIC) study of schizophrenia, we conducted gene set enrichment analysis to identify markers for schizophrenia-associated intermediate phenotypes. Genes were ranked based on the correlation between DNA methylation patterns and each phenotype, and then searched for enrichment in 221 predicted microRNA target gene sets. We found the predicted hsa-miR-219a-5p target gene set to be significantly enriched for genes (EPHA4, PKNOX1, ESR1, amongst others) whose methylation status is correlated with hippocampal volume independent of disease status. Our results were strengthened by significant associations between hsa-miR-219a-5p target gene methylation patterns and hippocampus-related neuropsychological variables. IPA pathway analysis of the respective predicted hsa-miR-219a-5p target genes revealed associated network functions in behaviour and developmental disorders. Altered methylation patterns of predicted hsa-miR-219a-5p target genes are associated with a structural aberration of the brain that has been proposed as a possible biomarker for schizophrenia. The (dys)regulation of microRNA target genes by epigenetic mechanisms may confer additional risk for developing psychiatric symptoms. Further study is needed to understand possible interactions between microRNAs and epigenetic changes and their impact on risk for brain-based disorders such as schizophrenia. PMID

  6. An abdominal aortic aneurysm segmentation method: Level set with region and statistical information

    SciTech Connect

    Zhuge Feng; Rubin, Geoffrey D.; Sun Shaohua; Napel, Sandy

    2006-05-15

    We present a system for segmenting the human aortic aneurysm in CT angiograms (CTA), which, in turn, allows measurements of volume and morphological aspects useful for treatment planning. The system estimates a rough 'initial surface', and then refines it using a level set segmentation scheme augmented with two external analyzers: The global region analyzer, which incorporates a priori knowledge of the intensity, volume, and shape of the aorta and other structures, and the local feature analyzer, which uses voxel location, intensity, and texture features to train and drive a support vector machine classifier. Each analyzer outputs a value that corresponds to the likelihood that a given voxel is part of the aneurysm, which is used during level set iteration to control the evolution of the surface. We tested our system using a database of 20 CTA scans of patients with aortic aneurysms. The mean and worst case values of volume overlap, volume error, mean distance error, and maximum distance error relative to human tracing were 95.3%±1.4% (s.d.); worst case=92.9%, 3.5%±2.5% (s.d.); worst case=7.0%, 0.6±0.2 mm (s.d.); worst case=1.0 mm, and 5.2±2.3 mm (s.d.); worst case=9.6 mm, respectively. When implemented on a 2.8 GHz Pentium IV personal computer, the mean time required for segmentation was 7.4±3.6 min (s.d.). We also performed experiments that suggest that our method is insensitive to parameter changes within 10% of their experimentally determined values. This preliminary study proves feasibility for an accurate, precise, and robust system for segmentation of the abdominal aneurysm from CTA data, and may be of benefit to patients with aortic aneurysms.

  7. Different gene sets contribute to different symptom dimensions of depression and anxiety.

    PubMed

    van Veen, Tineke; Goeman, Jelle J; Monajemi, Ramin; Wardenaar, Klaas J; Hartman, Catharina A; Snieder, Harold; Nolte, Ilja M; Penninx, Brenda W J H; Zitman, Frans G

    2012-07-01

    Although many genetic association studies have been carried out, it remains unclear which genes contribute to depression. This may be due to heterogeneity of the DSM-IV category of depression. Specific symptom-dimensions provide a more homogeneous phenotype. Furthermore, as effects of individual genes are small, analysis of genetic data at the pathway-level provides more power to detect associations and yield valuable biological insight. In 1,398 individuals with a Major Depressive Disorder, the symptom dimensions of the tripartite model of anxiety and depression, General Distress, Anhedonic Depression, and Anxious Arousal, were measured with the Mood and Anxiety Symptoms Questionnaire (30-item Dutch adaptation; MASQ-D30). Association of these symptom dimensions with candidate gene sets and gene sets from two public pathway databases was tested using the Global test. One pathway was associated with General Distress, and concerned molecules expressed in the endoplasmic reticulum lumen. Seven pathways were associated with Anhedonic Depression. Important themes were neurodevelopment, neurodegeneration, and cytoskeleton. Furthermore, three gene sets associated with Anxious Arousal regarded development, morphology, and genetic recombination. The individual pathways explained up to 1.7% of the variance. These data demonstrate mechanisms that influence the specific dimensions. Moreover, they show the value of using dimensional phenotypes on one hand and gene sets on the other hand. PMID:22573416

  8. Effective gene selection method with small sample sets using gradient-based and point injection techniques.

    PubMed

    Huang, D; Chow, Tommy W S

    2007-01-01

    Microarray gene expression data usually consist of a large number of genes. Among these genes, only a small fraction is informative for performing cancer diagnostic tests. This paper focuses on effective identification of informative genes. We analyze gene selection models from the perspective of optimization theory. As a result, a new strategy is designed to modify conventional search engines. Also, as overfitting is likely to occur in microarray data because of their small sample set, a point injection technique is developed to address the problem of overfitting. The proposed strategies have been evaluated on three kinds of cancer diagnosis. Our results show that the proposed strategies can improve the performance of gene selection substantially. The experimental results also indicate that the proposed methods are very robust under all the investigated cases. PMID:17666766

  9. Discrimination of white ginseng origins using multivariate statistical analysis of data sets

    PubMed Central

    Song, Hyuk-Hwan; Moon, Ji Young; Ryu, Hyung Won; Noh, Bong-Soo; Kim, Jeong-Han; Lee, Hyeong-Kyu; Oh, Sei-Ryang

    2014-01-01

    Background White ginseng (Panax ginseng Meyer) is commonly distributed as a health food in food markets. However, there is no practical method for distinguishing Korean white ginseng (KWG) from Chinese white ginseng (CWG), except for relying on the traceability system in the market. Methods Ultra-performance liquid chromatography quadrupole time-of-flight mass spectrometry combined with orthogonal partial least squares discrimination analysis (OPLS-DA) was employed to discriminate between KWG and CWG. Results The origins of white ginsengs in two test sets (1.0 μL and 0.2 μL injections) could be successfully discriminated by the OPLS-DA analysis. From OPLS-DA S-plots, KWG exhibited tentative markers derived from ginsenoside Rf and notoginsenoside R3 isomer, whereas CWG exhibited tentative markers derived from ginsenoside Ro and chikusetsusaponin IVa. Conclusion Results suggest that ultra-performance liquid chromatography quadrupole time-of-flight mass spectrometry coupled with OPLS-DA is an efficient tool for identifying the difference between the geographical origins of white ginsengs. PMID:25378993

  10. A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks

    PubMed Central

    Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng

    2009-01-01

    Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885

  11. Global adaptive rank truncated product method for gene-set analysis in association studies.

    PubMed

    Vilor-Tejedor, Natalia; Calle, M Luz

    2014-08-01

    Gene set analysis (GSA) aims to assess the overall association of a set of genetic variants with a phenotype and has the potential to detect subtle effects of variants in a gene or a pathway that might be missed when assessed individually. We present a new implementation of the Adaptive Rank Truncated Product method (ARTP) for analyzing the association of a set of Single Nucleotide Polymorphisms (SNPs) in a gene or pathway. The new implementation, referred to as globalARTP, improves the original one by allowing the different SNPs in the set to have different modes of inheritance. We perform a simulation study for exploring the power of the proposed methodology in a set of scenarios with different numbers of causal SNPs with different effect sizes. Moreover, we show the advantage of using the gene set approach in the context of an Alzheimer's disease case-control study where we explore the endocytosis pathway. The new method is implemented in the R function globalARTP of the globalGSA package available at http://cran.r-project.org. PMID:25082012
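
    The rank truncated product statistic that ARTP builds on can be sketched as below. This is a simplified illustration and not the globalARTP implementation: it uses one fixed truncation point K rather than an adaptive choice, and it assumes independent SNP-level p-values so that the null can be simulated from uniform draws, whereas real analyses typically permute phenotypes to respect correlation among SNPs. All function names and the example p-values are hypothetical.

```python
import math
import random


def rtp_statistic(pvalues, k):
    """Log of the product of the k smallest p-values (smaller = stronger)."""
    smallest = sorted(pvalues)[:k]
    return sum(math.log(p) for p in smallest)


def rtp_pvalue(pvalues, k, n_sim=2000, seed=0):
    """Monte Carlo p-value for the rank truncated product.

    Assumption: p-values are independent and uniform under the null.
    """
    rng = random.Random(seed)
    observed = rtp_statistic(pvalues, k)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in pvalues]
        if rtp_statistic(null, k) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Hypothetical SNP-level p-values for two gene sets.
signal_set = rtp_pvalue([0.001, 0.002, 0.5, 0.8], k=2)
null_like_set = rtp_pvalue([0.4, 0.6, 0.7, 0.9], k=2)
```

    An adaptive version would evaluate several values of K, take the smallest resulting p-value, and calibrate that minimum with a second layer of the same resampling.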

  12. A beta-complex statistical four body contact potential combined with a hydrogen bond statistical potential recognizes the correct native structure from protein decoy sets.

    PubMed

    Sánchez-González, Gilberto; Kim, Jae-Kwan; Kim, Deok-Soo; Garduño-Juárez, Ramón

    2013-08-01

    We present a new four-body knowledge-based potential for recognizing the native state of proteins from their misfolded states. This potential was extracted from a large set of protein structures determined by X-ray crystallography using BetaMol, a software based on the recent theory of the beta-complex (β-complex) and quasi-triangulation of the Voronoi diagram of spheres. This geometric construct reflects the size difference among atoms in their full Euclidean metric, a property not accounted for in a typical 3D Delaunay triangulation. The ability of this potential to identify the native conformation over a large set of decoys was evaluated. Experiments show that this potential outperforms a potential constructed with a classical Delaunay triangulation in decoy discrimination tests. The addition of a statistical hydrogen bond potential to our four-body potential allows a significant improvement in the decoy discrimination, in such a way that we are able to predict successfully the native structure in 90% of cases. PMID:23568277

  13. The Core Mouse Response to Infection by Neospora Caninum Defined by Gene Set Enrichment Analyses

    PubMed Central

    Ellis, John; Goodswen, Stephen; Kennedy, Paul J; Bush, Stephen

    2012-01-01

    In this study, the BALB/c and Qs mouse responses to infection by the parasite Neospora caninum were investigated in order to identify host response mechanisms. Investigation was done using gene set (enrichment) analyses of microarray data. GSEA, MANOVA, Romer, subGSE and SAM-GS were used to study the contrasts Neospora strain type, Mouse type (BALB/c and Qs) and time post infection (6 hours post infection and 10 days post infection). The analyses show that the major signal in the core mouse response to infection is from time post infection and can be defined by gene ontology terms Protein Kinase Activity, Cell Proliferation and Transcription Initiation. Several terms linked to signaling, morphogenesis, response and fat metabolism were also identified. At 10 days post infection, genes associated with fatty acid metabolism were identified as up regulated in expression. The value of gene set (enrichment) analyses in the analysis of microarray data is discussed. PMID:23012496

  14. Statistical methods in detecting differential expressed genes, analyzing insertion tolerance for genes and group selection for survival data

    NASA Astrophysics Data System (ADS)

    Liu, Fangfang

    The thesis is composed of three independent projects: (i) analyzing transposon-sequencing data to infer functions of genes on bacteria growth (chapter 2), (ii) developing a semi-parametric Bayesian method for differential gene expression analysis with RNA-sequencing data (chapter 3), and (iii) solving the group selection problem for survival data (chapter 4). All projects are motivated by statistical challenges raised in biological research. The first project is motivated by the need to develop statistical models to accommodate transposon insertion sequencing (Tn-Seq) data. Tn-Seq data consist of sequence reads around each transposon insertion site. The detection of a transposon insertion at a given site indicates that the disruption of genomic sequence at this site does not cause essential function loss and the bacteria can still grow. Hence, such measurements have been used to infer the functions of each gene on bacteria growth. We propose a zero-inflated Poisson regression method for analyzing the Tn-Seq count data, and derive an Expectation-Maximization (EM) algorithm to obtain parameter estimates. We also propose a multiple testing procedure that categorizes genes into each of the three states, hypo-tolerant, tolerant, and hyper-tolerant, while controlling the false discovery rate. Simulation studies show our method provides good estimation of model parameters and inference on gene functions. In the second project, we model the count data from an RNA-sequencing experiment for each gene using a Poisson-Gamma hierarchical model, or equivalently, a negative binomial (NB) model. We derive a full semi-parametric Bayesian approach with a Dirichlet process as the prior for the fold changes between two treatment means. An inference strategy using the Gibbs algorithm is developed for differential expression analysis. We evaluate our method with several simulation studies, and the results demonstrate that our method outperforms other methods including the popularly applied ones such as edge
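
    As a rough illustration of the EM idea mentioned in this record, the sketch below fits an intercept-only zero-inflated Poisson model, where a count is a structural zero with probability pi and otherwise Poisson with mean lam. This is a simplification and an assumption on our part: the thesis describes a regression model with covariates, and the function name `fit_zip` and the toy counts are hypothetical.

```python
import math

def fit_zip(counts, n_iter=200):
    """EM for an intercept-only zero-inflated Poisson (illustrative sketch).

    Model: P(y = 0) = pi + (1 - pi) * exp(-lam); for y > 0 the count is
    Poisson(lam) scaled by (1 - pi). Assumes at least one positive count.
    """
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    pi, lam = 0.5, total / n + 1e-9  # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = n_zero * w
        # M-step: closed-form updates for the mixing weight and Poisson mean.
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam

# Hypothetical toy data with clearly more zeros than a plain Poisson would give.
toy_counts = [0] * 60 + [1] * 5 + [2] * 10 + [3] * 12 + [4] * 8 + [5] * 5
pi_hat, lam_hat = fit_zip(toy_counts)
```

    After every M-step the fitted mean lam * (1 - pi) equals the sample mean, which is a convenient sanity check on the updates.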

  15. Histone H4 Lys 20 monomethylation by histone methylase SET8 mediates Wnt target gene activation.

    PubMed

    Li, Zhenfei; Nie, Fen; Wang, Sheng; Li, Lin

    2011-02-22

    Histone methylation has an important role in transcriptional regulation. However, unlike H3K4 and H3K9 methylation, the role of H4K20 monomethylation (H4K20me-1) in transcriptional regulation remains unclear. Here, we show that Wnt3a specifically stimulates H4K20 monomethylation at the T cell factor (TCF)-binding element through the histone methylase SET8. Additionally, SET8 is crucial for activation of the Wnt reporter gene and target genes in both mammalian cells and zebrafish. Furthermore, SET8 interacts with lymphoid enhancing factor-1 (LEF1)/TCF4 directly, and this interaction is regulated by Wnt3a. Therefore, we conclude that SET8 is a Wnt signaling mediator and is recruited by LEF1/TCF4 to regulate the transcription of Wnt-activated genes, possibly through H4K20 monomethylation at the target gene promoters. Our findings also indicate that H4K20me-1 is a marker for gene transcription activation, at least in canonical Wnt signaling. PMID:21282610

  16. Insights into Colon Cancer Etiology via a Regularized Approach to Gene Set Analysis of GWAS Data

    PubMed Central

    Chen, Lin S.; Hutter, Carolyn M.; Potter, John D.; Liu, Yan; Prentice, Ross L.; Peters, Ulrike; Hsu, Li

    2010-01-01

    Genome-wide association studies (GWAS) have successfully identified susceptibility loci from marginal association analysis of SNPs. Valuable insight into genetic variation underlying complex diseases will likely be gained by considering functionally related sets of genes simultaneously. One approach is to further develop gene set enrichment analysis methods, which are initiated in gene expression studies, to account for the distinctive features of GWAS data. These features include the large number of SNPs per gene, the modest and sparse SNP associations, and the additional information provided by linkage disequilibrium (LD) patterns within genes. We propose a “gene set ridge regression in association studies (GRASS)” algorithm. GRASS summarizes the genetic structure for each gene as eigenSNPs and uses a novel form of regularized regression technique, termed group ridge regression, to select representative eigenSNPs for each gene and assess their joint association with disease risk. Compared with existing methods, the proposed algorithm greatly reduces the high dimensionality of GWAS data while still accounting for multiple hits and/or LD in the same gene. We show by simulation that this algorithm performs well in situations in which there are a large number of predictors compared to sample size. We applied the GRASS algorithm to a genome-wide association study of colon cancer and identified nicotinate and nicotinamide metabolism and transforming growth factor beta signaling as the top two significantly enriched pathways. Elucidating the role of variation in these pathways may enhance our understanding of colon cancer etiology. PMID:20560206

  17. A prognosis classifier for breast cancer based on conserved gene regulation between mammary gland development and tumorigenesis: a multiscale statistical model.

    PubMed

    Tian, Yingpu; Chen, Baozhen; Guan, Pengfei; Kang, Yujia; Lu, Zhongxian

    2013-01-01

    Identification of novel cancer genes for molecular therapy and diagnosis is a current focus of breast cancer research. Although a few small gene sets were identified as prognosis classifiers, more powerful models are still needed for the definition of effective gene sets for the diagnosis and treatment guidance in breast cancer. In the present study, we have developed a novel statistical approach for systematic analysis of intrinsic correlations of gene expression between development and tumorigenesis in mammary gland. Based on this analysis, we constructed a predictive model for prognosis in breast cancer that may be useful for therapy decisions. We first defined developmentally associated genes from a mouse mammary gland epithelial gene expression database. Then, we found that the cancer modulated genes were enriched in this developmentally associated genes list. Furthermore, the developmentally associated genes had a specific expression profile, which associated with the molecular characteristics and histological grade of the tumor. These results suggested that the processes of mammary gland development and tumorigenesis share gene regulatory mechanisms. Then, the list of regulatory genes both on the developmental and tumorigenesis process was defined as an 835-member prognosis classifier, which showed an exciting ability to predict clinical outcome of three groups of breast cancer patients (the predictive accuracy 64∼72%) with a robust prognosis prediction (hazard ratio 3.3∼3.8, higher than that of other clinical risk factors (around 2.0-2.8)). In conclusion, our results identified the conserved molecular mechanisms between mammary gland development and neoplasia, and provided a unique potential model for mining unknown cancer genes and predicting the clinical status of breast tumors. These findings also suggested that developmental roles of genes may be important criteria for selecting genes for prognosis prediction in breast cancer. PMID:23565194

  18. Noisy attractors and ergodic sets in models of gene regulatory networks.

    PubMed

    Ribeiro, Andre S; Kauffman, Stuart A

    2007-08-21

    We investigate the hypothesis that cell types are attractors. This hypothesis was criticized on the grounds that real gene networks are noisy systems and, thus, do not have attractors [Kadanoff, L., Coppersmith, S., Aldana, M., 2002. Boolean Dynamics with Random Couplings. http://www.citebase.org/abstract?id=oai:arXiv.org:nlin/0204062]. Given the concept of "ergodic set" as a set of states from which the system, once entering, does not leave when subject to internal noise, first, using the Boolean network model, we show that if all nodes of states on attractors are subject to internal state change with a probability p due to noise, multiple ergodic sets are very unlikely. Thereafter, we show that if a fraction of those nodes are "locked" (not subject to state fluctuations caused by internal noise), multiple ergodic sets emerge. Finally, we present an example of a gene network, modelled with a realistic model of transcription and translation and gene-gene interaction, driven by a stochastic simulation algorithm with multiple time-delayed reactions, which has internal noise and that we also subject to external perturbations. We show that, in this case, two distinct ergodic sets exist and are stable within a wide range of parameter variations and, to some extent, to external perturbations. PMID:17543998
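
    The ergodic-set idea in this record can be sketched on a toy Boolean network. The code below is a minimal sketch under our own assumptions, not the authors' implementation: noise is modelled as a single bit flip applied to a state on an attractor, the network then relaxes deterministically to an attractor, and an ergodic set is taken to be a group of attractors that noise cannot leave. The two-node update rule and all function names are hypothetical.

```python
from itertools import product

def step(state):
    """Toy synchronous update (hypothetical): node 0 keeps its value, node 1 copies node 0."""
    x0, _x1 = state
    return (x0, x0)

def attractor_of(state, update):
    """Follow the deterministic dynamics and return the cycle that is reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = update(state)
    return frozenset(seen[seen.index(state):])

def ergodic_sets(update, n, locked=frozenset()):
    """Groups of attractors that single-bit-flip noise on unlocked nodes cannot leave."""
    attractors = {attractor_of(s, update) for s in product((0, 1), repeat=n)}
    edges = {a: set() for a in attractors}
    for a in attractors:
        for s in a:
            for i in range(n):
                if i in locked:
                    continue  # locked nodes are not subject to noise
                flipped = s[:i] + (1 - s[i],) + s[i + 1:]
                edges[a].add(attractor_of(flipped, update))

    def reach(a):
        # All attractors reachable from a through repeated noise events.
        stack, found = [a], {a}
        while stack:
            for b in edges[stack.pop()]:
                if b not in found:
                    found.add(b)
                    stack.append(b)
        return frozenset(found)

    reachable = {a: reach(a) for a in attractors}
    # Keep the closed groups: everything reachable from a can also return to a.
    return {reachable[a] for a in attractors
            if all(a in reachable[b] for b in reachable[a])}

all_noisy = ergodic_sets(step, 2)
node0_locked = ergodic_sets(step, 2, locked=frozenset({0}))
```

    In this toy network the two fixed points merge into one ergodic set when both nodes are noisy, and separate into two ergodic sets once node 0 is locked, which mirrors the qualitative claim of the abstract.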

  19. Transcriptomic Sequencing Reveals a Set of Unique Genes Activated by Butyrate-Induced Histone Modification.

    PubMed

    Li, Cong-Jun; Li, Robert W; Baldwin, Ransom L; Blomberg, Le Ann; Wu, Sitao; Li, Weizhong

    2016-01-01

    Butyrate is a nutritional element with strong epigenetic regulatory activity as a histone deacetylase inhibitor. Based on the analysis of differentially expressed genes in the bovine epithelial cells using RNA sequencing technology, a set of unique genes that are activated only after butyrate treatment were revealed. A complementary bioinformatics analysis of the functional category, pathway, and integrated network, using Ingenuity Pathways Analysis, indicated that these genes activated by butyrate treatment are related to major cellular functions, including cell morphological changes, cell cycle arrest, and apoptosis. Our results offered insight into the butyrate-induced transcriptomic changes and will accelerate our discerning of the molecular fundamentals of epigenomic regulation. PMID:26819550

  20. Transcriptomic Sequencing Reveals a Set of Unique Genes Activated by Butyrate-Induced Histone Modification

    PubMed Central

    Li, Cong-Jun; Li, Robert W.; Baldwin, Ransom L.; Blomberg, Le Ann; Wu, Sitao; Li, Weizhong

    2016-01-01

    Butyrate is a nutritional element with strong epigenetic regulatory activity as a histone deacetylase inhibitor. Based on the analysis of differentially expressed genes in the bovine epithelial cells using RNA sequencing technology, a set of unique genes that are activated only after butyrate treatment were revealed. A complementary bioinformatics analysis of the functional category, pathway, and integrated network, using Ingenuity Pathways Analysis, indicated that these genes activated by butyrate treatment are related to major cellular functions, including cell morphological changes, cell cycle arrest, and apoptosis. Our results offered insight into the butyrate-induced transcriptomic changes and will accelerate our discerning of the molecular fundamentals of epigenomic regulation. PMID:26819550

  1. Gene regulatory network inference using fused LASSO on multiple data sets

    PubMed Central

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M. O.; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-01-01

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed in a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions. PMID:26864687

  2. Meta gene set enrichment analyses link miR-137-regulated pathways with schizophrenia risk

    PubMed Central

    Wright, Carrie; Calhoun, Vince D.; Ehrlich, Stefan; Wang, Lei; Turner, Jessica A.; Perrone-Bizzozero, Nora I.

    2015-01-01

    Background: A single nucleotide polymorphism (SNP) within MIR137, the host gene for miR-137, has been identified repeatedly as a risk factor for schizophrenia. Previous genetic pathway analyses suggest that potential targets of this microRNA (miRNA) are also highly enriched in schizophrenia-relevant biological pathways, including those involved in nervous system development and function. Methods: In this study, we evaluated the schizophrenia risk of miR-137 target genes within these pathways. Gene set enrichment analysis of pathway-specific miR-137 targets was performed using the stage 1 (21,856 subjects) schizophrenia genome wide association study data from the Psychiatric Genomics Consortium and a small independent replication cohort (244 subjects) from the Mind Clinical Imaging Consortium and Northwestern University. Results: Gene sets of potential miR-137 targets were enriched with variants associated with schizophrenia risk, including target sets involved in axonal guidance signaling, Ephrin receptor signaling, long-term potentiation, PKA signaling, and Sertoli cell junction signaling. The schizophrenia-risk association of SNPs in PKA signaling targets was replicated in the second independent cohort. Conclusions: These results suggest that these biological pathways may be involved in the mechanisms by which this MIR137 variant enhances schizophrenia risk. SNPs in targets and the miRNA host gene may collectively lead to dysregulation of target expression and aberrant functioning of such implicated pathways. Pathway-guided gene set enrichment analyses should be useful in evaluating the impact of other miRNAs and target genes in different diseases. PMID:25941532

  3. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  4. Multiple divergent haplotypes express completely distinct sets of class I MHC genes in zebrafish.

    PubMed

    McConnell, Sean C; Restaino, Anthony C; de Jong, Jill L O

    2014-03-01

    The zebrafish is an important animal model for stem cell biology, cancer, and immunology research. Histocompatibility represents a key intersection of these disciplines; however, histocompatibility in zebrafish remains poorly understood. We examined a set of diverse zebrafish class I major histocompatibility complex (MHC) genes that segregate with specific haplotypes at chromosome 19, and for which donor-recipient matching has been shown to improve engraftment after hematopoietic transplantation. Using flanking gene polymorphisms, we identified six distinct chromosome 19 haplotypes. We describe several novel class I U lineage genes and characterize their sequence properties, expression, and haplotype distribution. Altogether, ten full-length zebrafish class I genes were analyzed, mhc1uba through mhc1uka. Expression data and sequence properties indicate that most are candidate classical genes. Several substitutions in putative peptide anchor residues, often shared with deduced MHC molecules from additional teleost species, suggest flexibility in antigen binding. All ten zebrafish class I genes were uniquely assigned among the six haplotypes, with dominant or codominant expression of one to three genes per haplotype. Interestingly, while the divergent MHC haplotypes display variable gene copy number and content, the different genes appear to have ancient origin, with extremely high levels of sequence diversity. Furthermore, haplotype variability extends beyond the MHC genes to include divergent forms of psmb8. The many disparate haplotypes at this locus therefore represent a remarkable form of genomic region configuration polymorphism. Defining the functional MHC genes within these divergent class I haplotypes in zebrafish will provide an important foundation for future studies in immunology and transplantation. PMID:24291825

  5. General approach for in vivo recovery of cell type-specific effector gene sets

    PubMed Central

    Barsi, Julius C.; Tu, Qiang; Davidson, Eric H.

    2014-01-01

    Differentially expressed, cell type-specific effector gene sets hold the key to multiple important problems in biology, from theoretical aspects of developmental gene regulatory networks (GRNs) to various practical applications. Although individual cell types of interest have been recovered by various methods and analyzed, systematic recovery of multiple cell type-specific gene sets from whole developing organisms has remained problematic. Here we describe a general methodology using the sea urchin embryo, a material of choice because of the large-scale GRNs already solved for this model system. This method utilizes the regulatory states expressed by given cells of the embryo to define cell type and includes a fluorescence activated cell sorting (FACS) procedure that results in no perturbation of transcript representation. We have extensively validated the method by spatial and qualitative analyses of the transcriptome expressed in isolated embryonic skeletogenic cells and as a consequence, generated a prototypical cell type-specific transcriptome database. PMID:24604781

  6. General approach for in vivo recovery of cell type-specific effector gene sets.

    PubMed

    Barsi, Julius C; Tu, Qiang; Davidson, Eric H

    2014-05-01

    Differentially expressed, cell type-specific effector gene sets hold the key to multiple important problems in biology, from theoretical aspects of developmental gene regulatory networks (GRNs) to various practical applications. Although individual cell types of interest have been recovered by various methods and analyzed, systematic recovery of multiple cell type-specific gene sets from whole developing organisms has remained problematic. Here we describe a general methodology using the sea urchin embryo, a material of choice because of the large-scale GRNs already solved for this model system. This method utilizes the regulatory states expressed by given cells of the embryo to define cell type and includes a fluorescence activated cell sorting (FACS) procedure that results in no perturbation of transcript representation. We have extensively validated the method by spatial and qualitative analyses of the transcriptome expressed in isolated embryonic skeletogenic cells and as a consequence, generated a prototypical cell type-specific transcriptome database. PMID:24604781

  7. Learning contextual gene set interaction networks of cancer with condition specificity

    PubMed Central

    2013-01-01

    Background Identifying similarities and differences in the molecular constitutions of various types of cancer is one of the key challenges in cancer research. The appearances of a cancer depend on complex molecular interactions, including gene regulatory networks and gene-environment interactions. This complexity makes it challenging to decipher the molecular origin of the cancer. In recent years, many studies reported methods to uncover heterogeneous depictions of complex cancers, which are often categorized into different subtypes. The challenge is to identify diverse molecular contexts within a cancer, to relate them to different subtypes, and to learn underlying molecular interactions specific to molecular contexts so that we can recommend context-specific treatment to patients. Results In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches to build modular networks of individual genes, our focus is to identify cancer-generic and subtype-specific interactions between contextual gene sets, of which each gene set share coherent transcriptional patterns across a subset of samples, termed contextual gene set. We then apply a novel formulation for quantitating the effect of the samples from each subtype on the calculated strength of interactions observed. Two cancer data sets were analyzed to support the validity of condition-specificity of identified interactions. When compared to an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions even in heterogeneous data set. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components. We found not only the results that are consistent with previous studies, but also new hypotheses on the biological mechanisms specific to certain cancer types that warrant further

  8. Repeated observation of immune gene sets enrichment in women with non-small cell lung cancer.

    PubMed

    Araujo, Jhajaira M; Prado, Alexandra; Cardenas, Nadezhda K; Zaharia, Mayer; Dyer, Richard; Doimi, Franco; Bravo, Leny; Pinillos, Luis; Morante, Zaida; Aguilar, Alfredo; Mas, Luis A; Gomez, Henry L; Vallejos, Carlos S; Rolfo, Christian; Pinto, Joseph A

    2016-04-12

    There are different biological and clinical patterns of lung cancer between genders indicating intrinsic differences leading to increased sensitivity to cigarette smoke-induced DNA damage, mutational patterns of KRAS and better clinical outcomes in women, while differences between genders at gene-expression levels were not previously reported. Here we show an enrichment of immune genes in NSCLC in women compared to men. We found in a GSEA analysis (by biological processes annotated from Gene Ontology) of six public datasets a repeated observation of immune gene sets enrichment in women. "Immune system process", "immune response", "defense response", "cellular defense response" and "regulation of immune system process" were the gene sets most over-represented while APOBEC3G, APOBEC3F, LAT, CD1D and CCL5 represented the top-five core genes. Characterization of immune cell composition with the platform CIBERSORT showed no differences between genders; however, there were differences when tumor tissues were compared to normal tissues. Our results suggest different immune responses in NSCLC between genders that could be related to the different clinical outcome. PMID:26958810

  9. Repeated observation of immune gene sets enrichment in women with non-small cell lung cancer

    PubMed Central

    Araujo, Jhajaira M.; Prado, Alexandra; Cardenas, Nadezhda K.; Zaharia, Mayer; Dyer, Richard; Doimi, Franco; Bravo, Leny; Pinillos, Luis; Morante, Zaida; Aguilar, Alfredo; Mas, Luis A.; Gomez, Henry L.; Vallejos, Carlos S.; Rolfo, Christian; Pinto, Joseph A.

    2016-01-01

    There are different biological and clinical patterns of lung cancer between genders indicating intrinsic differences leading to increased sensitivity to cigarette smoke-induced DNA damage, mutational patterns of KRAS and better clinical outcomes in women, while differences between genders at gene-expression levels were not previously reported. Here we show an enrichment of immune genes in NSCLC in women compared to men. We found in a GSEA analysis (by biological processes annotated from Gene Ontology) of six public datasets a repeated observation of immune gene sets enrichment in women. “Immune system process”, “immune response”, “defense response”, “cellular defense response” and “regulation of immune system process” were the gene sets most over-represented while APOBEC3G, APOBEC3F, LAT, CD1D and CCL5 represented the top-five core genes. Characterization of immune cell composition with the platform CIBERSORT showed no differences between genders; however, there were differences when tumor tissues were compared to normal tissues. Our results suggest different immune responses in NSCLC between genders that could be related to the different clinical outcome. PMID:26958810

  10. Protein interaction networks reveal novel autism risk genes within GWAS statistical noise.

    PubMed

    Correia, Catarina; Oliveira, Guiomar; Vicente, Astrid M

    2014-01-01

    Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) thus far met limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical "noise" that warrant further analysis for causal variants. PMID:25409314

  11. Protein Interaction Networks Reveal Novel Autism Risk Genes within GWAS Statistical Noise

    PubMed Central

    Correia, Catarina; Oliveira, Guiomar; Vicente, Astrid M.

    2014-01-01

    Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) thus far met limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical “noise” that warrant further analysis for causal variants. PMID:25409314

  12. Deciphering causal and statistical relations of molecular aberrations and gene expressions in NCI-60 cell lines

    PubMed Central

    2011-01-01

    Background Cancer cells harbor a large number of molecular alterations such as mutations, amplifications and deletions on DNA sequences and epigenetic changes on DNA methylations. These aberrations may dysregulate gene expressions, which in turn drive the malignancy of tumors. Deciphering the causal and statistical relations of molecular aberrations and gene expressions is critical for understanding the molecular mechanisms of clinical phenotypes. Results In this work, we proposed a computational method to reconstruct association modules containing driver aberrations, passenger mRNA or microRNA expressions, and putative regulators that mediate the effects from drivers to passengers. By applying the module-finding algorithm to the integrated datasets of NCI-60 cancer cell lines, we found that gene expressions were driven by diverse molecular aberrations including chromosomal segments' copy number variations, gene mutations and DNA methylations, microRNA expressions, and the expressions of transcription factors. In-silico validation indicated that passenger genes were enriched with the regulator binding motifs, functional categories or pathways where the drivers were involved, and co-citations with the driver/regulator genes. Moreover, 6 of 11 predicted MYB targets were down-regulated in an MYB-siRNA treated leukemia cell line. In addition, microRNA expressions were driven by distinct mechanisms from mRNA expressions. Conclusions The results provide rich mechanistic information regarding molecular aberrations and gene expressions in cancer genomes. This kind of integrative analysis will become an important tool for the diagnosis and treatment of cancer in the era of personalized medicine. PMID:22051105

  13. A set-covering approach to specific search for literature about human genes.

    PubMed

    Jenssen, T K; Vinterbo, S

    2000-01-01

    With the advent of the cDNA microarray and oligonucleotide array technologies it has become possible to study a large number of genes in a single experiment. While experiments with thousands of genes are routinely performed, searching for literature about several genes by traditional methods is time consuming and error-prone. In addition to the inherent limitations of free text search, use of the conventional Boolean operators often results in either no hits (when AND'ing terms) or far too many (when OR'ing terms). We have created a two-step procedure as an approach to meeting the challenge of multi-gene queries. Our results so far show that the returned sets of articles score high on relevance. PMID:11079910
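
    To make the set-covering idea in this record concrete, the sketch below uses the standard greedy heuristic to pick a small group of articles that together mention every queried gene. This is an assumption for illustration only: the abstract does not describe the authors' two-step procedure in detail, and the function name, article identifiers and gene lists are hypothetical.

```python
def greedy_cover(genes, article_genes):
    """Greedy set cover: repeatedly take the article that covers the most uncovered genes.

    genes: iterable of query gene symbols.
    article_genes: dict mapping an article id to the set of genes it mentions.
    Returns the chosen article ids and any genes that no article mentions.
    """
    uncovered = set(genes)
    chosen = []
    while uncovered and article_genes:
        best = max(article_genes, key=lambda a: len(article_genes[a] & uncovered))
        gained = article_genes[best] & uncovered
        if not gained:
            break  # the remaining genes appear in no article
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

# Hypothetical index of which genes each article mentions.
index = {"A1": {"TP53", "BRCA1"}, "A2": {"BRCA1"}, "A3": {"MYC"}}
picked, missing = greedy_cover(["TP53", "BRCA1", "MYC", "XYZ"], index)
```

    Compared with AND'ing all gene names (which here matches no single article) or OR'ing them (which returns every article), the cover keeps the result list short while still touching each gene that has any literature.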

  14. Identification of Genes Expressed in Hyperpigmented Skin Using Meta-Analysis of Microarray Data Sets.

    PubMed

    Yin, Lanlan; Coelho, Sergio G; Valencia, Julio C; Ebsen, Dominik; Mahns, Andre; Smuda, Christoph; Miller, Sharon A; Beer, Janusz Z; Kolbe, Ludger; Hearing, Vincent J

    2015-10-01

    More than 375 genes have been identified that are involved in regulating skin pigmentation and these act during development, survival, differentiation, and/or responses of melanocytes to the environment. Many of these genes have been cloned, and disruptions of their functions are associated with various pigmentary diseases; however, many remain to be identified. We have performed a series of microarray analyses of hyperpigmented compared with less pigmented skin to identify genes responsible for these differences. The rationale and goal for this study was to perform a meta-analysis on these microarray databases to identify genes that may be significantly involved in regulating skin phenotype either directly or indirectly that might not have been identified due to subtle differences by any of these individual studies alone. The meta-analysis demonstrates that 1,271 probes representing 921 genes are differentially expressed at significant levels in the 5 microarray data sets compared, providing new insights into the variety of genes involved in determining skin phenotype. Immunohistochemistry was used to validate two of these markers at the protein level (TRIM63 and QPCT), and we discuss the possible functions of these genes in regulating skin physiology. PMID:25950827

  15. Multivariate Risk Adjustment of Primary Care Patient Panels in a Public Health Setting: A Comparison of Statistical Models.

    PubMed

    Hirozawa, Anne M; Montez-Rath, Maria E; Johnson, Elizabeth C; Solnit, Stephen A; Drennan, Michael J; Katz, Mitchell H; Marx, Rani

    2016-01-01

    We compared prospective risk adjustment models for adjusting patient panels at the San Francisco Department of Public Health. We used 4 statistical models (linear regression, two-part model, zero-inflated Poisson, and zero-inflated negative binomial) and 4 subsets of predictor variables (age/gender categories, chronic diagnoses, homelessness, and a loss to follow-up indicator) to predict primary care visit frequency. Predicted visit frequency was then used to calculate patient weights and adjusted panel sizes. The two-part model using all predictor variables performed best (R = 0.20). This model, designed specifically for safety net patients, may prove useful for panel adjustment in other public health settings. PMID:27576054
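
    The abstract says predicted visit frequency was converted into patient weights and adjusted panel sizes but gives no formula. One plausible convention, assumed here purely for illustration, is to weight each patient by predicted visits divided by the population mean and to sum the weights within each panel. The function name and the toy numbers are hypothetical.

```python
def adjusted_panel_sizes(predicted_visits, panel_of):
    """Sum per-patient weights within each panel.

    predicted_visits -- dict: patient ID -> model-predicted visits per year
    panel_of         -- dict: patient ID -> panel (provider) label
    Assumed weight: predicted visits / mean predicted visits over all patients,
    so an average patient counts as 1.0 and the weights sum to the patient count.
    """
    mean_pred = sum(predicted_visits.values()) / len(predicted_visits)
    sizes = {}
    for patient, pred in predicted_visits.items():
        weight = pred / mean_pred
        panel = panel_of[patient]
        sizes[panel] = sizes.get(panel, 0.0) + weight
    return sizes


# Invented example: two panels of two patients each.
predicted = {"p1": 2.0, "p2": 4.0, "p3": 6.0, "p4": 4.0}
panels = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
sizes = adjusted_panel_sizes(predicted, panels)
print(sizes)
```

    Both panels have two patients, yet under this assumed weighting panel B carries more expected workload than panel A because its patients are predicted to visit more often.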

  16. GSVA: gene set variation analysis for microarray and RNA-Seq data

    PubMed Central

    2013-01-01

    Background Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. Results To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. Conclusions GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org. PMID:23323831

  17. A rough set based rational clustering framework for determining correlated genes.

    PubMed

    Jeyaswamidoss, Jeba Emilyn; Thangaraj, Kesavan; Ramar, Kadarkarai; Chitra, Muthusamy

    2016-06-01

    Cluster analysis plays a foremost role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed for identifying gene behaviors and to understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique, which will efficiently remove the irrelevant dimensions in a high-dimensional space and obtain appropriate meaningful clusters. This paper proposes a novel biclustering technique that is based on rough set theory. The proposed algorithm uses correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix and mean squared residue to generate the initial biclusters. Furthermore, the biclusters are refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using mean squared residue. The algorithm is illustrated with yeast gene expression data and the experiment proves the effectiveness of the method. The main advantage is that it overcomes the problem of selection of initial clusters and also the restriction of one object belonging to only one cluster by allowing overlapping of biclusters. PMID:27352972
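
    For readers unfamiliar with the mean squared residue mentioned above, here is a minimal sketch of the score as it is commonly defined in the biclustering literature (Cheng and Church): each cell's residue is its value minus the row mean, minus the column mean, plus the overall mean. The sketch covers only the scoring step, not the rough-set refinement proposed in the paper, and the example matrices are invented.

```python
def mean_squared_residue(matrix):
    """Mean squared residue of a bicluster given as a list of equal-length rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: the part of the cell not explained by additive row and column effects.
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Rows that differ only by a constant shift form a perfectly coherent bicluster.
additive = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
# A small block whose rows do not follow a shared additive pattern.
noisy = [[1.0, 2.0], [2.0, 5.0]]
print(mean_squared_residue(additive), mean_squared_residue(noisy))
```

    A score near zero indicates that the selected genes rise and fall together across the selected conditions, which is why low-residue submatrices are natural candidates for initial biclusters.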

  18. Interrogating differences in expression of targeted gene sets to predict breast cancer outcome

    PubMed Central

    2013-01-01

    Background Genomics provides opportunities to develop precise tests for diagnostics, therapy selection and monitoring. From analyses of our studies and those of published results, 32 candidate genes were identified, whose expression appears related to clinical outcome of breast cancer. Expression of these genes was validated by qPCR and correlated with clinical follow-up to identify a gene subset for development of a prognostic test. Methods RNA was isolated from 225 frozen invasive ductal carcinomas, and qRT-PCR was performed. Univariate hazard ratios and 95% confidence intervals for breast cancer mortality and recurrence were calculated for each of the 32 candidate genes. A multivariable gene expression model for predicting each outcome was determined using the LASSO, with 1000 splits of the data into training and testing sets to determine predictive accuracy based on the C-index. Models with gene expression data were compared to models with standard clinical covariates and models with both gene expression and clinical covariates. Results Univariate analyses revealed over-expression of RABEP1, PGR, NAT1, PTP4A2, SLC39A6, ESR1, EVL, TBC1D9, FUT8, and SCUBE2 were all associated with reduced time to disease-related mortality (HR between 0.8 and 0.91, adjusted p < 0.05), while RABEP1, PGR, SLC39A6, and FUT8 were also associated with reduced recurrence times. Multivariable analyses using the LASSO revealed PGR, ESR1, NAT1, GABRP, TBC1D9, SLC39A6, and LRBA to be the most important predictors for both disease mortality and recurrence. Median C-indexes on test data sets for the gene expression, clinical, and combined models were 0.65, 0.63, and 0.65 for disease mortality and 0.64, 0.63, and 0.66 for disease recurrence, respectively. Conclusions Molecular signatures consisting of five genes (PGR, GABRP, TBC1D9, SLC39A6 and LRBA) for disease mortality and of six genes (PGR, ESR1, GABRP, TBC1D9, SLC39A6 and LRBA) for disease recurrence were identified. These signatures

  19. Gene set enrichment analysis and ingenuity pathway analysis of metastatic clear cell renal cell carcinoma cell line.

    PubMed

    Khan, Mohammed I; Dębski, Konrad J; Dabrowski, Michał; Czarnecka, Anna M; Szczylik, Cezary

    2016-08-01

    In recent years, genome-wide RNA expression analysis has become a routine tool that offers a great opportunity to study and understand the key role of genes that contribute to carcinogenesis. Various microarray platforms and statistical approaches can be used to identify genes that might serve as prognostic biomarkers and be developed as antitumor therapies in the future. Metastatic renal cell carcinoma (mRCC) is a serious, life-threatening disease, and there are few treatment options for patients. In this study, we performed one-color microarray gene expression (4×44K) analysis of the mRCC cell line Caki-1 and the healthy kidney cell line ASE-5063. A total of 1,921 genes were differentially expressed in the Caki-1 cell line (1,023 upregulated and 898 downregulated). Gene Set Enrichment Analysis (GSEA) and Ingenuity Pathway Analysis (IPA) approaches were used to analyze the differential-expression data. The objective of this research was to identify complex biological changes that occur during metastatic development using Caki-1 as a model mRCC cell line. Our data suggest that there are multiple deregulated pathways associated with metastatic clear cell renal cell carcinoma (mccRCC), including integrin-linked kinase (ILK) signaling, leukocyte extravasation signaling, IGF-I signaling, CXCR4 signaling, and phosphoinositol 3-kinase/AKT/mammalian target of rapamycin signaling. The IPA upstream analysis predicted top transcriptional regulators that are either activated or inhibited, such as estrogen receptors, TP53, KDM5B, SPDEF, and CDKN1A. The GSEA approach was used to further confirm enriched pathway data following IPA. PMID:27279483

  20. Statistical detection of differentially expressed genes based on RNA-seq: from biological to phylogenetic replicates.

    PubMed

    Gu, Xun

    2016-03-01

    RNA-seq has been an increasingly popular high-throughput platform to identify differentially expressed (DE) genes, which is much more reproducible and accurate than the previous microarray technology. Yet, a number of statistical issues remain to be resolved in data analysis, largely due to the high-throughput data volume and over-dispersion of read counts. These problems become more challenging for those biologists who use RNA-seq to measure genome-wide expression profiles in different combinations of sampling resources (species or genotypes) or treatments. In this paper, the author first reviews the statistical methods available for detecting DE genes, which have implemented negative binomial (NB) models and/or quasi-likelihood (QL) approaches to account for the over-dispersion problem in RNA-seq samples. The author then studies how to carry out the DE test in the context of phylogeny, i.e., RNA-seq samples are from a range of species as phylogenetic replicates. The author proposes a computational framework to solve this phylo-DE problem: While an NB model is used to account for data over-dispersion within biological replicates, over-dispersion among phylogenetic replicates is taken into account by QL, plus some special treatments for phylogenetic bias. This work helps to design cost-effective RNA-seq experiments in the field of biodiversity or phenotype plasticity that may involve hundreds of species under a phylogenetic framework. PMID:26108230
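
    To make the over-dispersion problem concrete: under a Poisson model the variance of a read count equals its mean, whereas a widely used negative binomial parameterization adds a quadratic term, variance = mean + dispersion × mean². The sketch below estimates the dispersion for one gene by the method of moments. It is a simplified illustration under that assumed parameterization, not the phylogenetic framework proposed in the paper, and the counts are invented.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments NB dispersion: solve var = m + phi * m**2 for phi.

    Returns 0.0 when the sample is not over-dispersed (variance <= mean)
    or when the mean is zero.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # unbiased sample variance (divides by n - 1)
    return max(0.0, (v - m) / (m * m))


# Invented replicate counts for a single gene.
counts = [10, 30, 20, 40, 0]
phi = moment_dispersion(counts)
print(phi)
```

    Here the sample mean is 20 and the sample variance is 250, far above what a Poisson model would allow, so the estimated dispersion is clearly positive. Production tools typically stabilize such per-gene estimates by sharing information across genes, which this sketch does not attempt.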

  1. A conserved set of maternal genes? Insights from a molluscan transcriptome.

    PubMed

    Liu, M Maureen; Davey, John W; Jackson, Daniel J; Blaxter, Mark L; Davison, Angus

    2014-01-01

    The early animal embryo is entirely reliant on maternal gene products for a 'jump-start' that transforms a transcriptionally inactive embryo into a fully functioning zygote. Despite extensive work on model species, it has not been possible to perform a comprehensive comparison of maternally-provisioned transcripts across the Bilateria because of the absence of a suitable dataset from the Lophotrochozoa. As part of an ongoing effort to identify the maternal gene that determines left-right asymmetry in snails, we have generated transcriptome data from 1 to 2-cell and ~32-cell pond snail (Lymnaea stagnalis) embryos. Here, we compare these data to maternal transcript datasets from other bilaterian metazoan groups, including representatives of the Ecdysozoa and Deuterostomia. We found that between 5 and 10% of all L. stagnalis maternal transcripts (~300-400 genes) are also present in the equivalent arthropod (Drosophila melanogaster), nematode (Caenorhabditis elegans), urochordate (Ciona intestinalis) and chordate (Homo sapiens, Mus musculus, Danio rerio) datasets. While the majority of these conserved maternal transcripts ("COMATs") have housekeeping gene functions, they are a non-random subset of all housekeeping genes, with an overrepresentation of functions associated with nucleotide binding, protein degradation and activities associated with the cell cycle. We conclude that a conserved set of maternal transcripts and their associated functions may be a necessary starting point of early development in the Bilateria. For the wider community interested in discovering conservation of gene expression in early bilaterian development, the list of putative COMATs may be a useful resource. PMID:25690965

  2. Detection of RTX toxin genes in gram-negative bacteria with a set of specific probes.

    PubMed Central

    Kuhnert, P; Heyberger-Meyer, B; Burnens, A P; Nicolet, J; Frey, J

    1997-01-01

    The family of RTX (RTX representing repeats in the structural toxin) toxins is composed of several protein toxins with a characteristic nonapeptide glycine-rich repeat motif. Most of its members were shown to have cytolytic activity. By comparing the genetic relationships of the RTX toxin genes we established a set of 10 gene probes to be used for screening as-yet-unknown RTX toxin genes in bacterial species. The probes include parts of apxIA, apxIIA, and apxIIIA from Actinobacillus pleuropneumoniae, cyaA from Bordetella pertussis, frpA from Neisseria meningitidis, prtC from Erwinia chrysanthemi, hlyA and elyA from Escherichia coli, aaltA from Actinobacillus actinomycetemcomitans and lktA from Pasteurella haemolytica. A panel of pathogenic and nonpathogenic gram-negative bacteria were investigated for the presence of RTX toxin genes. The probes detected all known genes for RTX toxins. Moreover, we found potential RTX toxin genes in several pathogenic bacterial species for which no such toxins are known yet. This indicates that RTX or RTX-like toxins are widely distributed among pathogenic gram-negative bacteria. The probes generated by PCR and the hybridization method were optimized to allow broad-range screening for RTX toxin genes in one step. This included the binding of unlabelled probes to a nylon filter and subsequent hybridization of the filter with labelled genomic DNA of the strain to be tested. The method constitutes a powerful tool for the assessment of the potential pathogenicity of poorly characterized strains intended to be used in biotechnological applications. Moreover, it is useful for the detection of already-known or new RTX toxin genes in bacteria of medical importance. PMID:9172345

  3. Exploration of the cell-cycle genes found within the RIKEN FANTOM2 data set.

    PubMed

    Forrest, Alistair R R; Taylor, Darrin; Grimmond, Sean

    2003-06-01

    The cell cycle is one of the most fundamental processes within a cell. Phase-dependent expression and cell-cycle checkpoints require a high level of control. A large number of genes with varying functions and modes of action are responsible for this biology. In a targeted exploration of the FANTOM2-Variable Protein Set, a number of mouse homologs to known cell-cycle regulators as well as novel members of cell-cycle families were identified. Focusing on two prototype cell-cycle families, the cyclins and the NIMA-related kinases (NEKs), we believe we have identified all of the mouse members of these families, 24 cyclins and 10 NEKs, and mapped them to ENSEMBL transcripts. To attempt to globally identify all potential cell cycle-related genes within mouse, the MGI (Mouse Genome Database) assignments for the RIKEN Representative Set (RPS) and the results from two homology-based queries were merged. We identified 1415 genes with possible cell-cycle roles, and 1758 potential paralogs. We comment on the genes identified in this screen and evaluate the merits of each approach. PMID:12819135

  4. Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill

    2016-01-01

    Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no confident markers for the identification of cancer stem cells, and comparative methods examining adherent and sphere cells are widely used to investigate the mechanisms underlying cancer stem cells, because sphere cells have been known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets, such as cytokine-cytokine receptor interaction, that demonstrated significance in both datasets. Our observations demonstrated that among the genes of the four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. An additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. Distinct from other meta-analyses, by using gene set analysis, we selected possible markers which can explain the biological

  5. The Use of Multi-Component Statistical Techniques in Understanding Subduction Zone Arc Granitic Geochemical Data Sets

    NASA Astrophysics Data System (ADS)

    Pompe, L.; Clausen, B. L.; Morton, D. M.

    2015-12-01

    Multi-component statistical techniques and GIS visualization are emerging trends in understanding large data sets. Our research applies these techniques to a large igneous geochemical data set from southern California to better understand magmatic and plate tectonic processes. A set of 480 granitic samples collected by Baird from this area was analyzed for 39 geochemical elements. Of these samples, 287 are from the Peninsular Ranges Batholith (PRB) and 164 from part of the Transverse Ranges (TR). Principal component analysis (PCA) summarized the 39 variables into 3 principal components (PC) by matrix multiplication and for the PRB are interpreted as follows: PC1 with about 30% of the variation included mainly compatible elements and SiO2 and indicates extent of differentiation; PC2 with about 20% of the variation included HFS elements and may indicate crustal contamination as usually identified by Sri; PC3 with about 20% of the variation included mainly HRE elements and may indicate magma source depth as often displayed using REE spider diagrams and possibly Sr/Y. Several elements did not fit well in any of the three components: Cr, Ni, U, and Na2O. For the PRB, the PC1 correlation with SiO2 was r=-0.85, the PC2 correlation with Sri was r=0.80, and the PC3 correlation with Gd/Yb was r=-0.76 and with Sr/Y was r=-0.66. Extending this method to the TR, correlations were r=-0.85, -0.21, -0.06, and -0.64, respectively. A similar extent of correlation for both areas was visually evident using GIS interpolation. PC1 seems to do well at indicating differentiation index for both the PRB and TR and correlates very well with SiO2, Al2O3, MgO, FeO*, CaO, K2O, Sc, V, and Co, but poorly with Na2O and Cr. If the crustal component is represented by Sri, PC2 correlates well and less expensively with this indicator in the PRB, but not in the TR. Source depth has been related to the slope on REE spidergrams, and PC3 based on only the HREE and using the Sr/Y ratios gives a reasonable

  6. Defining the optimal animal model for translational research using gene set enrichment analysis.

    PubMed

    Weidner, Christopher; Steinfath, Matthias; Opitz, Elisa; Oelgeschläger, Michael; Schönfelder, Gilbert

    2016-01-01

    The mouse is the main model organism used to study the functions of human genes because most biological processes in the mouse are highly conserved in humans. Recent reports that compared identical transcriptomic datasets of human inflammatory diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. To reduce susceptibility to biased interpretation, all genes of interest for the biological question under investigation should be considered. Thus, standardized approaches for systematic data analysis are needed. We analyzed the same datasets using gene set enrichment analysis focusing on pathways assigned to inflammatory processes in either humans or mice. The analyses revealed a moderate overlap between all human and mouse datasets, with average positive and negative predictive values of 48 and 57% significant correlations. Subgroups of the septic mouse models (i.e., Staphylococcus aureus injection) correlated very well with most human studies. These findings support the applicability of targeted strategies to identify the optimal animal model and protocol to improve the success of translational research. PMID:27311961
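
    The positive and negative predictive values quoted above can be illustrated with a small sketch. It assumes, as one plausible reading of the abstract, that a pathway counts as "positive" when it is significantly enriched, that the human dataset serves as the reference, and that the mouse model plays the role of the test. The pathway names and set contents are invented.

```python
def predictive_values(human_sig, mouse_sig, all_pathways):
    """Return (PPV, NPV) of the mouse model with the human dataset as reference.

    human_sig, mouse_sig -- sets of pathways called significant in each dataset
    all_pathways         -- set of every pathway tested in both datasets
    """
    tp = len(mouse_sig & human_sig)        # significant in both
    fp = len(mouse_sig - human_sig)        # significant in mouse only
    mouse_negative = all_pathways - mouse_sig
    tn = len(mouse_negative - human_sig)   # significant in neither
    fn = len(mouse_negative & human_sig)   # significant in human only
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Invented example with six tested pathways.
tested = {"p1", "p2", "p3", "p4", "p5", "p6"}
human = {"p1", "p2", "p3"}
mouse = {"p2", "p3", "p4"}
ppv, npv = predictive_values(human, mouse, tested)
print(ppv, npv)
```

    In this toy case two of the three mouse-significant pathways are also significant in humans, and two of the three mouse-negative pathways are also negative in humans, so both values come out at two thirds.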

  7. Transcriptomic Analysis Identifies Candidate Genes and Gene Sets Controlling the Response of Porcine Peripheral Blood Mononuclear Cells to Poly I:C Stimulation

    PubMed Central

    Wang, Jiying; Wang, Yanping; Wang, Huaizhong; Wang, Haifei; Liu, Jian-Feng; Wu, Ying; Guo, Jianfeng

    2016-01-01

    Polyinosinic-polycytidylic acid (poly I:C), a synthetic dsRNA analog, has been demonstrated to have stimulatory effects similar to viral dsRNA. To gain deep knowledge of the host transcriptional response of pigs to poly I:C stimulation, in the present study, we cultured and stimulated peripheral blood mononuclear cells (PBMC) of piglets of one Chinese indigenous breed (Dapulian) and one modern commercial breed (Landrace) with poly I:C, and compared their transcriptional profiling using RNA-sequencing (RNA-seq). Our results indicated that poly I:C stimulation can elicit significantly differentially expressed (DE) genes in Dapulian (g = 290) as well as Landrace (g = 85). We also performed gene set analysis using the Gene Set Enrichment Analysis (GSEA) package, and identified some significantly enriched gene sets in Dapulian (g = 18) and Landrace (g = 21). Most of the shared DE genes and gene sets were immune-related, and may play crucial roles in the immune response to poly I:C stimulation. In addition, we detected large sets of significantly DE genes and enriched gene sets when comparing the gene expression profile between the two breeds, including control and poly I:C stimulation groups. Besides immune-related functions, some of the DE genes and gene sets between the two breeds were involved in development and growth of various tissues, which may be correlated with the different characteristics of the two breeds. The DE genes and gene sets detected herein provide crucial information towards understanding the immune regulation of antiviral responses, and the molecular mechanisms of different genetic resistance to viral infection, in modern and indigenous pigs. PMID:26935416

  8. Transcriptomic Analysis Identifies Candidate Genes and Gene Sets Controlling the Response of Porcine Peripheral Blood Mononuclear Cells to Poly I:C Stimulation.

    PubMed

    Wang, Jiying; Wang, Yanping; Wang, Huaizhong; Wang, Haifei; Liu, Jian-Feng; Wu, Ying; Guo, Jianfeng

    2016-01-01

    Polyinosinic-polycytidylic acid (poly I:C), a synthetic dsRNA analog, has been demonstrated to have stimulatory effects similar to viral dsRNA. To gain deep knowledge of the host transcriptional response of pigs to poly I:C stimulation, in the present study, we cultured and stimulated peripheral blood mononuclear cells (PBMC) of piglets of one Chinese indigenous breed (Dapulian) and one modern commercial breed (Landrace) with poly I:C, and compared their transcriptional profiling using RNA-sequencing (RNA-seq). Our results indicated that poly I:C stimulation can elicit significantly differentially expressed (DE) genes in Dapulian (g = 290) as well as Landrace (g = 85). We also performed gene set analysis using the Gene Set Enrichment Analysis (GSEA) package, and identified some significantly enriched gene sets in Dapulian (g = 18) and Landrace (g = 21). Most of the shared DE genes and gene sets were immune-related, and may play crucial roles in the immune response to poly I:C stimulation. In addition, we detected large sets of significantly DE genes and enriched gene sets when comparing the gene expression profile between the two breeds, including control and poly I:C stimulation groups. Besides immune-related functions, some of the DE genes and gene sets between the two breeds were involved in development and growth of various tissues, which may be correlated with the different characteristics of the two breeds. The DE genes and gene sets detected herein provide crucial information towards understanding the immune regulation of antiviral responses, and the molecular mechanisms of different genetic resistance to viral infection, in modern and indigenous pigs. PMID:26935416

  9. Powerful Set-Based Gene-Environment Interaction Testing Framework for Complex Diseases.

    PubMed

    Jiao, Shuo; Peters, Ulrike; Berndt, Sonja; Bézieau, Stéphane; Brenner, Hermann; Campbell, Peter T; Chan, Andrew T; Chang-Claude, Jenny; Lemire, Mathieu; Newcomb, Polly A; Potter, John D; Slattery, Martha L; Woods, Michael O; Hsu, Li

    2015-12-01

    Identification of gene-environment interaction (G × E) is important in understanding the etiology of complex diseases. Based on our previously developed Set Based gene EnviRonment InterAction test (SBERIA), in this paper we propose a powerful framework for enhanced set-based G × E testing (eSBERIA). The major challenge of signal aggregation within a set is how to tell signals from noise. eSBERIA tackles this challenge by adaptively aggregating the interaction signals within a set weighted by the strength of the marginal and correlation screening signals. eSBERIA then combines the screening-informed aggregate test with a variance component test to account for the residual signals. Additionally, we develop a case-only extension for eSBERIA (coSBERIA) and an existing set-based method, which boosts the power not only by exploiting the G-E independence assumption but also by avoiding the need to specify main effects for a large number of variants in the set. Through extensive simulation, we show that coSBERIA and eSBERIA are considerably more powerful than existing methods within the case-only and the case-control method categories across a wide range of scenarios. We conduct a genome-wide G × E search by applying our methods to Illumina HumanExome Beadchip data of 10,446 colorectal cancer cases and 10,191 controls and identify two novel interactions between nonsteroidal anti-inflammatory drugs (NSAIDs) and MINK1 and PTCHD3. PMID:26095235

  10. Statistical mapping analysis of lesion location and neurological disability in multiple sclerosis: application to 452 patient data sets.

    PubMed

    Charil, Arnaud; Zijdenbos, Alex P; Taylor, Jonathan; Boelman, Cyrus; Worsley, Keith J; Evans, Alan C; Dagher, Alain

    2003-07-01

    In multiple sclerosis (MS), the correlation between disability and the volume of white matter lesions on magnetic resonance imaging (MRI) is usually weak. This may be because lesion location also influences the extent and type of functional disability. We applied an automatic lesion-detection algorithm to 452 MRI scans of patients with relapsing-remitting MS to identify the regions preferentially responsible for different types of clinical deficits. Statistical parametric maps were generated by performing voxel-wise linear regressions between lesion probability and different clinical disability scores. There was a clear distinction between lesion locations causing physical and cognitive disability. Lesion likelihood correlated with the Expanded Disability Status Scale (EDSS) in the left internal capsule and in periventricular white matter mostly in the left hemisphere. Pyramidal deficits correlated with only one area in the left internal capsule that was also present in the EDSS correlation. Cognitive dysfunction correlated with lesion location at the grey-white junction of associative, limbic, and prefrontal cortex. Coordination impairment correlated with areas in interhemispheric and pyramidal periventricular white matter tracts, and in the inferior and superior longitudinal fascicles. Bowel and bladder scores correlated with lesions in the medial frontal lobes, cerebellum, insula, dorsal midbrain, and pons, areas known to be involved in the control of micturition. This study demonstrates for the first time a relationship between the site of lesions and the type of disability in a large-scale MRI data set in MS. PMID:12880785

  11. TOPS: a versatile software tool for statistical analysis and visualization of combinatorial gene-gene and gene-drug interaction screens

    PubMed Central

    2014-01-01

    Background Measuring the impact of combinations of genetic or chemical perturbations on cellular fitness, sometimes referred to as synthetic lethal screening, is a powerful method for obtaining novel insights into gene function and drug action. Especially when performed at large scales, gene-gene or gene-drug interaction screens can reveal complex genetic interactions or drug mechanism of action or even identify novel therapeutics for the treatment of diseases. The results of such large-scale screens can be represented as a matrix with a numeric score indicating the cellular fitness (e.g. viability or doubling time) for each double perturbation. In a typical screen, the majority of combinations do not impact the cellular fitness. Thus, it is critical to first discern true "hits" from noise. Subsequent data exploration and visualization methods can help extract meaningful biological information from the data. However, despite the increasing interest in combination perturbation screens, no user-friendly open-source program exists that combines statistical analysis, data exploration tools and visualization. Results We developed TOPS (Tool for Combination Perturbation Screen Analysis), a Java and R-based software tool with a simple graphical user interface that allows the user to import, analyze, filter and plot data from double perturbation screens as well as other compatible data. TOPS was designed in a modular fashion to allow the user to add alternative importers for data formats or custom analysis scripts not covered by the original release. We demonstrate the utility of TOPS on two datasets derived from functional genetic screens using different methods. Dataset 1 is a gene-drug interaction screen and is based on Luminex xMAP technology. Dataset 2 is a gene-gene short hairpin (sh)RNAi screen exploring the interactions between deubiquitinating enzymes and a number of prominent oncogenes using massive parallel sequencing (MPS). Conclusions TOPS provides

  12. Average Rank-Based Score to Measure Deregulation of Molecular Pathway Gene Sets

    PubMed Central

    Zhang, Wei

    2011-01-01

    Background Deregulation of biological pathways has been shown to be involved in the tumorigenesis of a variety of cancers. The co-regulation of pathways in tumor and normal tissues has not been studied in a systematic manner. Results In this study we propose a novel statistic named AR-score (average rank based score) to measure pathway activities based on microarray gene expression profiles. We calculate and compare the AR-scores of pathways in microarray datasets containing expression profiles for a wide range of cancer types as well as the corresponding normal tissues. We find that many pathways undergo significant activity changes in tumors with respect to normal tissues. AR-scores for a small subset of pathways are capable of distinguishing tumor from normal tissues or classifying tumor subtypes. In normal tissues many pathways are highly correlated in their activities, whereas their correlations reduce significantly in tumors and cancer cell lines. The co-expression of genes in the same pathways was also significantly perturbed in tumors. Conclusions The co-regulation of genes in the same pathways and co-regulation of different pathways are significantly perturbed in tumors versus normal tissues. Our method provides a useful tool for better understanding the mechanistic changes in tumors, which can also be used for exploring other biological problems. PMID:22096597
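
    The abstract names the AR-score but does not give its formula. The sketch below is one plausible reading, offered only as an illustration: genes are ranked by expression within a single sample, ranks are divided by the number of genes, and the score is the mean normalized rank of the pathway's genes. The function name `ar_score`, the tie-breaking rule, and the normalization are assumptions, not the authors' definition.

```python
def ar_score(expression, pathway_genes):
    """Average normalized rank of pathway genes within one sample.

    expression: dict mapping gene name -> expression value (one sample).
    pathway_genes: iterable of gene names that make up the pathway.
    """
    # Rank genes from lowest (rank 1) to highest expression. Ties are
    # broken by gene name, an arbitrary choice made here for determinism.
    ordered = sorted(expression, key=lambda g: (expression[g], g))
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}

    # Only pathway genes actually measured in this sample contribute.
    members = [g for g in pathway_genes if g in rank]
    if not members:
        raise ValueError("no pathway gene is present in the sample")

    n_genes = len(ordered)
    return sum(rank[g] for g in members) / (len(members) * n_genes)


# Hypothetical toy sample with five genes.
sample = {"G1": 2.0, "G2": 9.5, "G3": 4.1, "G4": 7.3, "G5": 0.4}
score = ar_score(sample, ["G2", "G4"])
```

    Under this reading a score near 1 means the pathway's genes sit near the top of the sample's expression ranking, and a score near 0 means they sit near the bottom.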

  13. Statistical inference of the time-varying structure of gene-regulation networks

    PubMed Central

    2010-01-01

    Background Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions. Methods To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). Results We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and efficiently exploits time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning. Conclusions ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provides a natural starting point to learn and investigate their dynamics in greater detail. PMID:20860793

  14. A Minimal Set of Glycolytic Genes Reveals Strong Redundancies in Saccharomyces cerevisiae Central Metabolism

    PubMed Central

    Solis-Escalante, Daniel; Kuijpers, Niels G. A.; Barrajon-Simancas, Nuria; van den Broek, Marcel; Pronk, Jack T.; Daran, Jean-Marc

    2015-01-01

    As a result of ancestral whole-genome and small-scale duplication events, the genomes of Saccharomyces cerevisiae and many eukaryotes still contain a substantial fraction of duplicated genes. In all investigated organisms, metabolic pathways, and more particularly glycolysis, are specifically enriched for functionally redundant paralogs. In ancestors of the Saccharomyces lineage, the duplication of glycolytic genes is purported to have played an important role leading to S. cerevisiae's current lifestyle favoring fermentative metabolism even in the presence of oxygen and characterized by a high glycolytic capacity. In modern S. cerevisiae strains, the 12 glycolytic reactions leading to the biochemical conversion from glucose to ethanol are encoded by 27 paralogs. In order to experimentally explore the physiological role of this genetic redundancy, a yeast strain with a minimal set of 14 paralogs was constructed (the “minimal glycolysis” [MG] strain). Remarkably, a combination of a quantitative systems approach and semiquantitative analysis in a wide array of growth environments revealed the absence of a phenotypic response to the cumulative deletion of 13 glycolytic paralogs. This observation indicates that duplication of glycolytic genes is not a prerequisite for achieving the high glycolytic fluxes and fermentative capacities that are characteristic of S. cerevisiae and essential for many of its industrial applications and argues against gene dosage effects as a means of fixing minor glycolytic paralogs in the yeast genome. The MG strain was carefully designed and constructed to provide a robust prototrophic platform for quantitative studies and has been made available to the scientific community. PMID:26071034

  15. MIR137 variants identified in psychiatric patients affect synaptogenesis and neuronal transmission gene sets.

    PubMed

    Strazisar, M; Cammaerts, S; van der Ven, K; Forero, D A; Lenaerts, A-S; Nordin, A; Almeida-Souza, L; Genovese, G; Timmerman, V; Liekens, A; De Rijk, P; Adolfsson, R; Callaerts, P; Del-Favero, J

    2015-04-01

    Sequence analysis of 13 microRNA (miRNA) genes expressed in the human brain and located in genomic regions associated with schizophrenia and/or bipolar disorder, in a northern Swedish patient/control population, resulted in the discovery of two functional variants in the MIR137 gene. On the basis of their location and the allele frequency differences between patients and controls, we explored the hypothesis that the discovered variants impact the expression of the mature miRNA and consequently influence global mRNA expression affecting normal brain functioning. Using neuronal-like SH-SY5Y cells, we demonstrated significantly reduced mature miR-137 levels in the cells expressing the variant miRNA gene. Subsequent transcriptome analysis showed that the reduction in miR-137 expression led to the deregulation of gene sets involved in synaptogenesis and neuronal transmission, all implicated in psychiatric disorders. Our functional findings add to the growing data indicating that miR-137 has an important role in the etiology of psychiatric disorders, and they emphasize its involvement in nervous system development and proper synaptic function. PMID:24888363

  16. Globularity and language-readiness: generating new predictions by expanding the set of genes of interest

    PubMed Central

    Boeckx, Cedric; Benítez-Burraco, Antonio

    2014-01-01

    This study builds on the hypothesis put forth in Boeckx and Benítez-Burraco (2014), according to which the developmental changes expressed at the levels of brain morphology and neural connectivity that resulted in a more globular braincase in our species were crucial to understand the origins of our language-ready brain. Specifically, this paper explores the links between two well-known ‘language-related’ genes like FOXP2 and ROBO1 implicated in vocal learning and the initial set of genes of interest put forth in Boeckx and Benítez-Burraco (2014), with RUNX2 as focal point. Relying on the existing literature, we uncover potential molecular links that could be of interest to future experimental inquiries into the biological foundations of language and the testing of our initial hypothesis. Our discussion could also be relevant for clinical linguistics and for the interpretation of results from paleogenomics. PMID:25505436

  17. Gene Ontology Analysis of GWA Study Data Sets Provides Insights into the Biology of Bipolar Disorder

    PubMed Central

    Holmans, Peter; Green, Elaine K.; Pahwa, Jaspreet Singh; Ferreira, Manuel A.R.; Purcell, Shaun M.; Sklar, Pamela; Owen, Michael J.; O'Donovan, Michael C.; Craddock, Nick

    2009-01-01

    We present a method for testing overrepresentation of biological pathways, indexed by gene-ontology terms, in lists of significant SNPs from genome-wide association studies. This method corrects for linkage disequilibrium between SNPs, variable gene size, and multiple testing of nonindependent pathways. The method was applied to the Wellcome Trust Case-Control Consortium Crohn disease (CD) data set. At a general level, the biological basis of CD is relatively well known for a complex genetic trait, and it thus acted as a test of the method. The method, known as ALIGATOR (Association LIst Go AnnoTatOR), successfully detected biological pathways implicated in CD. The method was also applied to a meta-analysis of bipolar disorder, and it implicated the modulation of transcription and cellular activity, including that which occurs via hormonal action, as an important player in pathogenesis. PMID:19539887
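
    As a point of reference for what "overrepresentation" means here, the sketch below computes a plain hypergeometric upper-tail probability for one category. It is not ALIGATOR: it makes none of the corrections the abstract describes (linkage disequilibrium, variable gene size, nonindependent pathways), and the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_total, n_category, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_total:    genes in the background
    n_category: background genes annotated to the category
    n_selected: genes in the significant list
    n_overlap:  significant genes that fall in the category
    """
    denominator = comb(n_total, n_selected)
    upper = min(n_category, n_selected)
    numerator = sum(
        # comb returns 0 when more non-category genes are requested
        # than exist, so impossible terms drop out automatically.
        comb(n_category, i) * comb(n_total - n_category, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return numerator / denominator


# Toy example: 10 background genes, 4 in the category,
# 3 significant genes of which 2 are in the category.
p_value = hypergeom_upper_tail(10, 4, 3, 2)
```

    A naive test of this kind treats genes as independent and equally likely to be selected, which is exactly the assumption the corrections listed in the abstract are meant to relax.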

  18. Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder.

    PubMed

    Holmans, Peter; Green, Elaine K; Pahwa, Jaspreet Singh; Ferreira, Manuel A R; Purcell, Shaun M; Sklar, Pamela; Owen, Michael J; O'Donovan, Michael C; Craddock, Nick

    2009-07-01

    We present a method for testing overrepresentation of biological pathways, indexed by gene-ontology terms, in lists of significant SNPs from genome-wide association studies. This method corrects for linkage disequilibrium between SNPs, variable gene size, and multiple testing of nonindependent pathways. The method was applied to the Wellcome Trust Case-Control Consortium Crohn disease (CD) data set. At a general level, the biological basis of CD is relatively well known for a complex genetic trait, and it thus acted as a test of the method. The method, known as ALIGATOR (Association LIst Go AnnoTatOR), successfully detected biological pathways implicated in CD. The method was also applied to a meta-analysis of bipolar disorder, and it implicated the modulation of transcription and cellular activity, including that which occurs via hormonal action, as an important player in pathogenesis. PMID:19539887

  19. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how accurately the evolved rules are able to describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  20. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how accurately the evolved rules are able to describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  1. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer

    SciTech Connect

    Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Highlights:
    • Identified stomach lineage specific gene set (SLSGS) was found to be underexpressed in gastric tumors.
    • Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer.
    • In silico pathway scanning identified estrogen-α signaling as a putative regulator of SLSGS in gastric cancer.
    • Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients.

    Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be underexpressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the underexpression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.

  2. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  3. A Joint Location-Scale Test Improves Power to Detect Associated SNPs, Gene Sets, and Pathways

    PubMed Central

    Soave, David; Corvol, Harriet; Panjwani, Naim; Gong, Jiafen; Li, Weili; Boëlle, Pierre-Yves; Durie, Peter R.; Paterson, Andrew D.; Rommens, Johanna M.; Strug, Lisa J.; Sun, Lei

    2015-01-01

    Gene-based, pathway, and other multivariate association methods are motivated by the possibility of GxG and GxE interactions; however, accounting for such interactions is limited by the challenges associated with adequate modeling information. Here we propose an easy-to-implement joint location-scale (JLS) association testing framework for single-variant and multivariate analysis that accounts for interactions without explicitly modeling them. We apply the JLS method to a gene-set analysis of cystic fibrosis (CF) lung disease, which is influenced by multiple environmental and genetic factors. We identify and replicate an association between the constituents of the apical plasma membrane and CF lung disease (p = 0.0099 and p = 0.0180, respectively) and highlight a role for the SLC9A3-SLC9A3R1/2-EZR complex in contributing to CF lung disease. Many association studies could benefit from re-analysis with the JLS method that leverages complex genetic architecture for SNP, gene, and pathway identification. Analytical verification, simulation, and additional proof-of-principle applications support our approach. PMID:26140448
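
    The abstract does not say how the location (mean) and scale (variance) evidence are joined. One standard option, assumed here purely for illustration, is Fisher's method for two p-values: under the null, and if the two tests are independent, W = -2(ln p_L + ln p_S) follows a chi-square distribution with 4 degrees of freedom, whose tail probability has the closed form exp(-W/2)(1 + W/2). The function name is hypothetical.

```python
from math import exp, log


def fisher_combine_two(p_location, p_scale):
    """Combine a location-test and a scale-test p-value (Fisher's method).

    Assumes the two p-values are independent under the null hypothesis.
    """
    if not (0.0 < p_location <= 1.0 and 0.0 < p_scale <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    w = -2.0 * (log(p_location) + log(p_scale))
    # Survival function of a chi-square variable with 4 degrees of freedom.
    return exp(-w / 2.0) * (1.0 + w / 2.0)


# Two individually borderline results reinforce each other.
p_joint = fisher_combine_two(0.05, 0.05)
```

    The point of such a combination is that a variant acting through an unmodeled interaction can shift the variance of a trait as well as its mean, so pooling both signals can recover power that a mean-only test would miss.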

  4. A Joint Location-Scale Test Improves Power to Detect Associated SNPs, Gene Sets, and Pathways.

    PubMed

    Soave, David; Corvol, Harriet; Panjwani, Naim; Gong, Jiafen; Li, Weili; Boëlle, Pierre-Yves; Durie, Peter R; Paterson, Andrew D; Rommens, Johanna M; Strug, Lisa J; Sun, Lei

    2015-07-01

    Gene-based, pathway, and other multivariate association methods are motivated by the possibility of GxG and GxE interactions; however, accounting for such interactions is limited by the challenges associated with adequate modeling information. Here we propose an easy-to-implement joint location-scale (JLS) association testing framework for single-variant and multivariate analysis that accounts for interactions without explicitly modeling them. We apply the JLS method to a gene-set analysis of cystic fibrosis (CF) lung disease, which is influenced by multiple environmental and genetic factors. We identify and replicate an association between the constituents of the apical plasma membrane and CF lung disease (p = 0.0099 and p = 0.0180, respectively) and highlight a role for the SLC9A3-SLC9A3R1/2-EZR complex in contributing to CF lung disease. Many association studies could benefit from re-analysis with the JLS method that leverages complex genetic architecture for SNP, gene, and pathway identification. Analytical verification, simulation, and additional proof-of-principle applications support our approach. PMID:26140448

  5. Schizophrenia-Associated MIR204 Regulates Noncoding RNAs and Affects Neurotransmitter and Ion Channel Gene Sets

    PubMed Central

    Cammaerts, Sophia; Strazisar, Mojca; Smets, Bart; Weckhuysen, Sarah; Nordin, Annelie; De Jonghe, Peter; Adolfsson, Rolf; De Rijk, Peter; Del Favero, Jurgen

    2015-01-01

    As regulators of gene expression, microRNAs (miRNAs) are likely to play an important role in the development of disease. In this study we present a large-scale strategy to identify miRNAs with a role in the regulation of neuronal processes. Thereby we found variant rs7861254 located near the MIR204 gene to be significantly associated with schizophrenia. This variant resulted in reduced expression of miR-204 in neuronal-like SH-SY5Y cells. Analysis of the consequences of the altered miR-204 expression on the transcriptome of these cells uncovered a new mode of action for miR-204, being the regulation of noncoding RNAs (ncRNAs), including several miRNAs, such as MIR296. Furthermore, pathway analysis showed downstream effects of miR-204 on neurotransmitter and ion channel related gene sets, potentially mediated by miRNAs regulated through miR-204. PMID:26714269

  6. A comprehensive time-course-based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set.

    PubMed

    Sweeney, Timothy E; Shidham, Aaditya; Wong, Hector R; Khatri, Purvesh

    2015-05-13

    Although several dozen studies of gene expression in sepsis have been published, distinguishing sepsis from a sterile systemic inflammatory response syndrome (SIRS) is still largely up to clinical suspicion. We hypothesized that a multicohort analysis of the publicly available sepsis gene expression data sets would yield a robust set of genes for distinguishing patients with sepsis from patients with sterile inflammation. A comprehensive search for gene expression data sets in sepsis identified 27 data sets matching our inclusion criteria. Five data sets (n = 663 samples) compared patients with sterile inflammation (SIRS/trauma) to time-matched patients with infections. We applied our multicohort analysis framework that uses both effect sizes and P values in a leave-one-data set-out fashion to these data sets. We identified 11 genes that were differentially expressed (false discovery rate ≤1%, inter-data set heterogeneity P > 0.01, summary effect size >1.5-fold) across all discovery cohorts with excellent diagnostic power [mean area under the receiver operating characteristic curve (AUC), 0.87; range, 0.7 to 0.98]. We then validated these 11 genes in 15 independent cohorts comparing (i) time-matched infected versus noninfected trauma patients (4 cohorts), (ii) ICU/trauma patients with infections over the clinical time course (3 cohorts), and (iii) healthy subjects versus sepsis patients (8 cohorts). In the discovery Glue Grant cohort, SIRS plus the 11-gene set improved prediction of infection (compared to SIRS alone) with a continuous net reclassification index of 0.90. Overall, multicohort analysis of time-matched cohorts yielded 11 genes that robustly distinguish sterile inflammation from infectious inflammation. PMID:25972003
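
    The leave-one-data-set-out idea can be sketched for a single gene as follows. The abstract does not specify the pooling model, so a fixed-effect inverse-variance average is assumed here as a stand-in; the function names and the toy numbers are hypothetical and do not come from the study.

```python
def pooled_effect(effects, variances):
    """Fixed-effect (inverse-variance weighted) summary effect size."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances):
    """Summary effect recomputed with each data set removed in turn."""
    summaries = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        summaries.append(pooled_effect(kept_effects, kept_variances))
    return summaries


# Hypothetical per-data-set effect sizes and variances for one gene.
effects = [1.0, 2.0, 3.0]
variances = [1.0, 1.0, 1.0]
overall = pooled_effect(effects, variances)
without_each = leave_one_out(effects, variances)
```

    A gene whose summary effect stays on the same side of the chosen threshold in every leave-one-out run is not being driven by any single data set, which is the robustness property the abstract emphasizes.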

  7. Cscan: finding common regulators of a set of genes by using a collection of genome-wide ChIP-seq datasets

    PubMed Central

    Zambelli, Federico; Prazzoli, Gian Marco; Pesole, Graziano; Pavesi, Giulio

    2012-01-01

    The regulation of transcription of eukaryotic genes is a very complex process, which involves interactions between transcription factors (TFs) and DNA, as well as other epigenetic factors like histone modifications, DNA methylation, and so on, which nowadays can be studied and characterized with techniques like ChIP-Seq. Cscan is a web resource that includes a large collection of genome-wide ChIP-Seq experiments performed on TFs, histone modifications, RNA polymerases and others. Enriched peak regions from the ChIP-Seq experiments are crossed with the genomic coordinates of a set of input genes, to identify which of the experiments present a statistically significant number of peaks within the input genes’ loci. The input can be a cluster of co-expressed genes, or any other set of genes sharing a common regulatory profile. Users can thus single out which TFs are likely to be common regulators of the genes, and their respective correlations. Also, by examining results on promoter activation, transcription, histone modifications, polymerase binding and so on, users can investigate the effect of the TFs (activation or repression of transcription) as well as of the cell or tissue specificity of the genes’ regulation and expression. The web interface is free for use, and there is no login requirement. Available at: http://www.beaconlab.it/cscan. PMID:22669907
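
    The "crossing" of peak regions with gene coordinates amounts to an interval-overlap count. The sketch below shows that step only, under stated assumptions: coordinates are half-open (start inclusive, end exclusive), a gene's locus is given directly as one interval, and all names and coordinates are made up. It does not reproduce Cscan's own locus definition or its significance test.

```python
def genes_with_peak(genes, peaks):
    """Return the names of genes whose locus overlaps at least one peak.

    genes: dict mapping gene name -> (chromosome, start, end)
    peaks: list of (chromosome, start, end)
    Intervals are half-open: [start, end).
    """
    hits = set()
    for name, (g_chrom, g_start, g_end) in genes.items():
        for p_chrom, p_start, p_end in peaks:
            # Two half-open intervals overlap when each starts
            # before the other one ends.
            if p_chrom == g_chrom and p_start < g_end and g_start < p_end:
                hits.add(name)
                break
    return hits


genes = {
    "A": ("chr1", 100, 200),
    "B": ("chr1", 500, 600),
    "C": ("chr2", 100, 200),
}
peaks = [("chr1", 150, 160), ("chr2", 300, 400), ("chr1", 600, 650)]
hits = genes_with_peak(genes, peaks)
```

    The number of input genes hit in one experiment can then be compared with the number expected from the genome-wide background to judge whether that experiment is enriched.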

  8. An ancient dental gene set governs development and continuous regeneration of teeth in sharks.

    PubMed

    Rasch, Liam J; Martin, Kyle J; Cooper, Rory L; Metscher, Brian D; Underwood, Charlie J; Fraser, Gareth J

    2016-07-15

    The evolution of oral teeth is considered a major contributor to the overall success of jawed vertebrates. This is especially apparent in cartilaginous fishes including sharks and rays, which develop elaborate arrays of highly specialized teeth, organized in rows and retain the capacity for life-long regeneration. Perpetual regeneration of oral teeth has been either lost or highly reduced in many other lineages including important developmental model species, so cartilaginous fishes are uniquely suited for deep comparative analyses of tooth development and regeneration. Additionally, sharks and rays can offer crucial insights into the characters of the dentition in the ancestor of all jawed vertebrates. Despite this, tooth development and regeneration in chondrichthyans is poorly understood and remains virtually uncharacterized from a developmental genetic standpoint. Using the emerging chondrichthyan model, the catshark (Scyliorhinus spp.), we characterized the expression of genes homologous to those known to be expressed during stages of early dental competence, tooth initiation, morphogenesis, and regeneration in bony vertebrates. We have found that expression patterns of several genes from Hh, Wnt/β-catenin, Bmp and Fgf signalling pathways indicate deep conservation over ~450 million years of tooth development and regeneration. We describe how these genes participate in the initial emergence of the shark dentition and how they are redeployed during regeneration of successive tooth generations. We suggest that at the dawn of the vertebrate lineage, teeth (i) were most likely continuously regenerative structures, and (ii) utilised a core set of genes from members of key developmental signalling pathways that were instrumental in creating a dental legacy redeployed throughout vertebrate evolution. These data lay the foundation for further experimental investigations utilizing the unique regenerative capacity of chondrichthyan models to answer evolutionary

  9. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

    PubMed

    Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu

    2013-08-01

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. PMID:23891470
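
    The key insight, that a gene-level statistic can be rebuilt from single-variant statistics plus their correlation matrix, can be illustrated with a simplified burden-type test. The sketch assumes each variant contributes a z-score with unit variance and that z ~ N(0, R) under the null, so a weighted sum w'z has variance w'Rw. The weights, the function name, and the toy inputs are hypothetical; this is not the authors' software or their exact statistic.

```python
from math import erfc, sqrt


def burden_from_z(z_scores, correlation, weights):
    """Gene-level burden statistic from single-variant z-scores.

    z_scores:    per-variant z-scores
    correlation: matrix (list of lists) of correlations between them
    weights:     per-variant weights
    Returns (statistic, two-sided p-value).
    """
    m = len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    # Variance of the weighted sum under the null: w' R w.
    variance = sum(
        weights[i] * correlation[i][j] * weights[j]
        for i in range(m)
        for j in range(m)
    )
    statistic = numerator / sqrt(variance)
    # Two-sided normal tail probability.
    p_value = erfc(abs(statistic) / sqrt(2.0))
    return statistic, p_value


# Two correlated variants with equal weights.
stat, p = burden_from_z([1.0, 2.0], [[1.0, 0.5], [0.5, 1.0]], [1.0, 1.0])
```

    Ignoring the off-diagonal correlation in this toy case would give a variance of 2 instead of 3 and overstate the evidence, which is why the correlation matrix is needed alongside the single-variant results.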

  10. Meta-analysis of Gene-Level Associations for Rare Variants Based on Single-Variant Statistics

    PubMed Central

    Hu, Yi-Juan; Berndt, Sonja I.; Gustafsson, Stefan; Ganna, Andrea; Berndt, Sonja I.; Gustafsson, Stefan; Mägi, Reedik; Ganna, Andrea; Wheeler, Eleanor; Feitosa, Mary F.; Justice, Anne E.; Monda, Keri L.; Croteau-Chonka, Damien C.; Day, Felix R.; Esko, Tõnu; Fall, Tove; Ferreira, Teresa; Gentilini, Davide; Jackson, Anne U.; Luan, Jian’an; Randall, Joshua C.; Vedantam, Sailaja; Willer, Cristen J.; Winkler, Thomas W.; Wood, Andrew R.; Workalemahu, Tsegaselassie; Hu, Yi-Juan; Lee, Sang Hong; Liang, Liming; Lin, Dan-Yu; Min, Josine L.; Neale, Benjamin M.; Thorleifsson, Gudmar; Yang, Jian; Albrecht, Eva; Amin, Najaf; Bragg-Gresham, Jennifer L.; Cadby, Gemma; den Heijer, Martin; Eklund, Niina; Fischer, Krista; Goel, Anuj; Hottenga, Jouke-Jan; Huffman, Jennifer E.; Jarick, Ivonne; Johansson, Åsa; Johnson, Toby; Kanoni, Stavroula; Kleber, Marcus E.; König, Inke R.; Kristiansson, Kati; Kutalik, Zoltán; Lamina, Claudia; Lecoeur, Cecile; Li, Guo; Mangino, Massimo; McArdle, Wendy L.; Medina-Gomez, Carolina; Müller-Nurasyid, Martina; Ngwa, Julius S.; Nolte, Ilja M.; Paternoster, Lavinia; Pechlivanis, Sonali; Perola, Markus; Peters, Marjolein J.; Preuss, Michael; Rose, Lynda M.; Shi, Jianxin; Shungin, Dmitry; Smith, Albert Vernon; Strawbridge, Rona J.; Surakka, Ida; Teumer, Alexander; Trip, Mieke D.; Tyrer, Jonathan; Van Vliet-Ostaptchouk, Jana V.; Vandenput, Liesbeth; Waite, Lindsay L.; Zhao, Jing Hua; Absher, Devin; Asselbergs, Folkert W.; Atalay, Mustafa; Attwood, Antony P.; Balmforth, Anthony J.; Basart, Hanneke; Beilby, John; Bonnycastle, Lori L.; Brambilla, Paolo; Bruinenberg, Marcel; Campbell, Harry; Chasman, Daniel I.; Chines, Peter S.; Collins, Francis S.; Connell, John M.; Cookson, William; de Faire, Ulf; de Vegt, Femmie; Dei, Mariano; Dimitriou, Maria; Edkins, Sarah; Estrada, Karol; Evans, David M.; Farrall, Martin; Ferrario, Marco M.; Ferrières, Jean; Franke, Lude; Frau, Francesca; Gejman, Pablo V.; Grallert, Harald; Grönberg, Henrik; Gudnason, Vilmundur; Hall, Alistair S.; Hall, Per; Hartikainen, Anna-Liisa; Hayward, Caroline; Heard-Costa, Nancy L.; Heath, Andrew C.; Hebebrand, Johannes; Homuth, Georg; Hu, Frank B.; Hunt, Sarah E.; Hyppönen, Elina; Iribarren, Carlos; Jacobs, Kevin B.; Jansson, John-Olov; Jula, Antti; Kähönen, Mika; Kathiresan, Sekar; Kee, Frank; Khaw, Kay-Tee; Kivimaki, Mika; Koenig, Wolfgang; Kraja, Aldi T.; Kumari, Meena; Kuulasmaa, Kari; Kuusisto, Johanna; Laitinen, Jaana H.; Lakka, Timo A.; Langenberg, Claudia; Launer, Lenore J.; Lind, Lars; Lindström, Jaana; Liu, Jianjun; Liuzzi, Antonio; Lokki, Marja-Liisa; Lorentzon, Mattias; Madden, Pamela A.; Magnusson, Patrik K.; Manunta, Paolo; Marek, Diana; März, Winfried; Leach, Irene Mateo; McKnight, Barbara; Medland, Sarah E.; Mihailov, Evelin; Milani, Lili; Montgomery, Grant W.; Mooser, Vincent; Mühleisen, Thomas W.; Munroe, Patricia B.; Musk, Arthur W.; Narisu, Narisu; Navis, Gerjan; Nicholson, George; Nohr, Ellen A.; Ong, Ken K.; Oostra, Ben A.; Palmer, Colin N.A.; Palotie, Aarno; Peden, John F.; Pedersen, Nancy; Peters, Annette; Polasek, Ozren; Pouta, Anneli; Pramstaller, Peter P.; Prokopenko, Inga; Pütter, Carolin; Radhakrishnan, Aparna; Raitakari, Olli; Rendon, Augusto; Rivadeneira, Fernando; Rudan, Igor; Saaristo, Timo E.; Sambrook, Jennifer G.; Sanders, Alan R.; Sanna, Serena; Saramies, Jouko; Schipf, Sabine; Schreiber, Stefan; Schunkert, Heribert; Shin, So-Youn; Signorini, Stefano; Sinisalo, Juha; Skrobek, Boris; Soranzo, Nicole; Stančáková, Alena; Stark, Klaus; Stephens, Jonathan C.; Stirrups, Kathleen; Stolk, Ronald P.; Stumvoll, Michael; Swift, Amy J.; Theodoraki, Eirini V.; Thorand, Barbara; Tregouet, David-Alexandre; Tremoli, Elena; Van der Klauw, Melanie M.; van Meurs, Joyce B.J.; Vermeulen, Sita H.; Viikari, Jorma; Virtamo, Jarmo; Vitart, Veronique; Waeber, Gérard; Wang, Zhaoming; Widén, Elisabeth; Wild, Sarah H.; Willemsen, Gonneke; Winkelmann, Bernhard R.; Witteman, Jacqueline C.M.; Wolffenbuttel, Bruce H.R.; Wong, Andrew; Wright, Alan F.

    2013-01-01

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying “causal” rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. PMID:23891470
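
    The key step described above, recovering multivariate statistics from single-variant results plus a correlation matrix, can be sketched as follows. This is a minimal illustration and not the authors' software. It assumes the score statistic and its variance for variant j can be approximated as U_j = beta_j / se_j^2 and V_jj = 1 / se_j^2, that off-diagonal covariances are V_jk = R_jk * sqrt(V_jj * V_kk) for an externally estimated correlation matrix R, and that a simple burden test is wanted. The function name and its arguments are hypothetical.

```python
import math


def burden_test_from_summary(betas, ses, corr, weights=None):
    """Burden-type gene-level test built only from single-variant summaries.

    betas, ses : per-variant effect estimates and standard errors
    corr       : correlation matrix of the single-variant test statistics
    weights    : optional per-variant weights (defaults to all ones)
    Returns (chi-square statistic with 1 df, p-value).
    """
    m = len(betas)
    if weights is None:
        weights = [1.0] * m

    # Assumed approximation: score statistic and its variance per variant.
    u = [b / s ** 2 for b, s in zip(betas, ses)]
    v_diag = [1.0 / s ** 2 for s in ses]

    # Rebuild the full covariance matrix from the correlation matrix.
    v = [[corr[j][k] * math.sqrt(v_diag[j] * v_diag[k]) for k in range(m)]
         for j in range(m)]

    # Burden statistic: (w'U)^2 / (w'Vw), chi-square with 1 df under the null.
    num = sum(w * uj for w, uj in zip(weights, u)) ** 2
    den = sum(weights[j] * v[j][k] * weights[k]
              for j in range(m) for k in range(m))
    stat = num / den

    # Survival function of chi-square(1): P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

    Other gene-level tests (for example variance-component tests) would use the same reconstructed U and V but a different quadratic form; only the burden form is shown here.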

  11. A transcriptomic approach to identify regulatory genes involved in fruit set of wild-type and parthenocarpic tomato genotypes.

    PubMed

    Ruiu, Fabrizio; Picarella, Maurizio Enea; Imanishi, Shunsuke; Mazzucato, Andrea

    2015-10-01

    The tomato parthenocarpic fruit (pat) mutation associates a strong competence for parthenocarpy with homeotic transformation of anthers and aberrancy of ovules. To dissect this complex floral phenotype, genes involved in the pollination-independent fruit set of the pat mutant were investigated by microarray analysis using wild-type and mutant ovaries. Normalized expression data were subjected to one-way ANOVA, and 2499 differentially expressed genes (DEGs) displaying a >1.5 log-fold change in at least one of the pairwise comparisons analyzed were detected. DEGs were categorized into 20 clusters, and the clusters were classified into five groups representing transcripts with similar expression dynamics. The "regulatory function" group (685 DEGs) contained putative negative or positive fruit set regulators, the "pollination-dependent" group (411 DEGs) included genes activated by pollination, and the "fruit growth-related" group (815 DEGs) included genes activated at early fruit growth. The last groups listed genes with different or similar expression patterns at all stages in the two genotypes. qRT-PCR validation of 20 DEGs plus four other selected genes confirmed the high reliability of the microarray expression data; the average correlation coefficient for the 20 DEGs was 0.90. All groups contained relevant transcription factor genes encoding proteins that regulate meristem differentiation and floral organ development, genes involved in the metabolism, transport and response of hormones, and genes involved in cell division and in primary and secondary metabolism. Among pathways related to secondary metabolites, genes related to the synthesis of flavonoids emerged, supporting the recent evidence that these compounds are important at the fruit set phase. Selected genes showing a de-regulated expression pattern in pat were studied in four other parthenocarpic genotypes, either genetically anonymous or carrying lesions in known gene sequences. This comparative approach offered novel insights for improving the present

  12. Repressors Nrg1 and Nrg2 regulate a set of stress-responsive genes in Saccharomyces cerevisiae.

    PubMed

    Vyas, Valmik K; Berkey, Cristin D; Miyao, Takenori; Carlson, Marian

    2005-11-01

    The yeast Saccharomyces cerevisiae responds to environmental stress by rapidly altering the expression of large sets of genes. We report evidence that the transcriptional repressors Nrg1 and Nrg2 (Nrg1/Nrg2), which were previously implicated in glucose repression, regulate a set of stress-responsive genes. Genome-wide expression analysis identified 150 genes that were upregulated in nrg1Δ nrg2Δ double mutant cells, relative to wild-type cells, during growth in glucose. We found that many of these genes are regulated by glucose repression. Stress response elements (STREs) and STRE-like elements are overrepresented in the promoters of these genes, and a search of available expression data sets showed that many are regulated in response to a variety of environmental stress signals. In accord with these findings, mutation of NRG1 and NRG2 enhanced the resistance of cells to salt and oxidative stress and decreased tolerance to freezing. We present evidence that Nrg1/Nrg2 not only contribute to repression of target genes in the absence of stress but also limit induction in response to salt stress. We suggest that Nrg1/Nrg2 fine-tune the regulation of a set of stress-responsive genes. PMID:16278455

  13. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    PubMed Central

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histological subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is indeed a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The few overlaps between the two resulting gene signatures illustrate that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. PMID:27446945

  14. Statistical Analysis of a Large Sample Size Pyroshock Test Data Set Including Post Flight Data Assessment. Revision 1

    NASA Technical Reports Server (NTRS)

    Hughes, William O.; McNelis, Anne M.

    2010-01-01

    The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.

  15. META-GSA: Combining Findings from Gene-Set Analyses across Several Genome-Wide Association Studies

    PubMed Central

    Rosenberger, Albert; Friedrichs, Stefanie; Amos, Christopher I.; Brennan, Paul; Fehringer, Gordon; Heinrich, Joachim; Hung, Rayjean J.; Muley, Thomas; Müller-Nurasyid, Martina; Risch, Angela; Bickeböller, Heike

    2015-01-01

    Introduction: Gene-set analysis (GSA) methods are used as complementary approaches to genome-wide association studies (GWASs). The single marker association estimates of a predefined set of genes are contrasted either with those of all remaining genes or with a null non-associated background. To pool the p-values from several GSAs, it is important to take into account the concordance of the observed patterns resulting from single marker association point estimates across any given gene set. Here we propose META-GSA, an enhanced version of Fisher’s inverse χ2-method that weights each study to account for imperfect correlation between association patterns. Simulation and Power: We investigated the performance of META-GSA by simulating GWASs with 500 cases and 500 controls at 100 diallelic markers in 20 different scenarios, simulating different relative risks between 1 and 1.5 in gene sets of 10 genes. Wilcoxon’s rank sum test was applied as GSA for each study. We found that META-GSA has greater power to discover truly associated gene sets than simple pooling of the p-values, e.g. 59% versus 37%, when the true relative risk for 5 of 10 genes was assumed to be 1.5. Under the null hypothesis of no difference in the true association pattern between the gene set of interest and the set of remaining genes, the results of both approaches are almost uncorrelated. We recommend not relying on p-values alone when combining the results of independent GSAs. Application: We applied META-GSA to pool the results of four case-control GWASs of lung cancer risk (Central European Study and Toronto/Lunenfeld-Tanenbaum Research Institute Study; German Lung Cancer Study and MD Anderson Cancer Center Study), which had already been analyzed separately with four different GSA methods (EASE, SLAT, mSUMSTAT and GenGen). This application revealed the pathway GO0015291 “transmembrane transporter activity” as significantly enriched with associated genes (GSA-method: EASE, p = 0
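
    For orientation, the unweighted baseline that META-GSA extends, Fisher's inverse chi-square method for pooling independent p-values, can be sketched as below. The study-specific weights of META-GSA are not reproduced, because this record does not specify them. The function name is hypothetical, and the closed-form tail probability relies on the degrees of freedom (2k) being even.

```python
import math


def fisher_combine(p_values):
    """Pool k independent p-values with Fisher's inverse chi-square method.

    Returns (X, p) where X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the null hypothesis.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)

    # For even df = 2k the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total
```

    With a single study the pooled p-value equals the input p-value, which is a quick sanity check on the formula.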

  16. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.

    PubMed

    Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy

    2013-08-01

    Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association

  17. Association Signals Unveiled by a Comprehensive Gene Set Enrichment Analysis of Dental Caries Genome-Wide Association Studies

    PubMed Central

    Cuenco, Karen T.; Zeng, Zhen; Feingold, Eleanor; Marazita, Mary L.; Wang, Lily; Zhao, Zhongming

    2013-01-01

    Gene set-based analysis of genome-wide association study (GWAS) data has recently emerged as a useful approach to examine the joint effects of multiple risk loci in complex human diseases or phenotypes. Dental caries is a common, chronic, and complex disease leading to a decrease in quality of life worldwide. In this study, we applied the approaches of gene set enrichment analysis to a major dental caries GWAS dataset, which consists of 537 cases and 605 controls. Using four complementary gene set analysis methods, we analyzed 1331 Gene Ontology (GO) terms collected from the Molecular Signatures Database (MSigDB). Setting false discovery rate (FDR) threshold as 0.05, we identified 13 significantly associated GO terms. Additionally, 17 terms were further included as marginally associated because they were top ranked by each method, although their FDR is higher than 0.05. In total, we identified 30 promising GO terms, including ‘Sphingoid metabolic process,’ ‘Ubiquitin protein ligase activity,’ ‘Regulation of cytokine secretion,’ and ‘Ceramide metabolic process.’ These GO terms encompass broad functions that potentially interact and contribute to the oral immune response related to caries development, which have not been reported in the standard single marker based analysis. Collectively, our gene set enrichment analysis provided complementary insights into the molecular mechanisms and polygenic interactions in dental caries, revealing promising association signals that could not be detected through single marker analysis of GWAS data. PMID:23967329
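
    The FDR threshold of 0.05 mentioned above can be illustrated with a short sketch. The abstract does not name the FDR procedure used, so the Benjamini-Hochberg step-up procedure is assumed here purely for illustration; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which hypotheses are rejected at FDR level alpha (BH step-up).

    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank

    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

    Applied to one p-value per GO term, the returned flags would mark the terms that pass the chosen FDR level; terms that only rank highly without passing would need to be reported separately, as the abstract does for its marginal terms.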

  18. Statistical analysis of the effective factors on the 28 days compressive strength and setting time of the concrete

    PubMed Central

    Abolpour, Bahador; Mehdi Afsahi, Mohammad; Hosseini, Saeed Gharib

    2014-01-01

    In this study, the effects of various factors (weight fraction of the SiO2, Al2O3, Fe2O3, Na2O, K2O, CaO, MgO, Cl, SO3, and the Blaine of the cement particles) on the concrete compressive strength and also initial setting time have been investigated. Compressive strength and setting time tests have been carried out based on DIN standards in this study. Interactions of these factors have been obtained by the use of analysis of variance and regression equations of these factors have been obtained to predict the concrete compressive strength and initial setting time. Also, simple and applicable formulas with less than 6% absolute mean error have been developed using the genetic algorithm to predict these parameters. Finally, the effect of each factor has been investigated when other factors are in their low or high level. PMID:26425360

  19. Statistical analysis of the effective factors on the 28 days compressive strength and setting time of the concrete.

    PubMed

    Abolpour, Bahador; Mehdi Afsahi, Mohammad; Hosseini, Saeed Gharib

    2015-09-01

    In this study, the effects of various factors (weight fraction of the SiO2, Al2O3, Fe2O3, Na2O, K2O, CaO, MgO, Cl, SO3, and the Blaine of the cement particles) on the concrete compressive strength and also initial setting time have been investigated. Compressive strength and setting time tests have been carried out based on DIN standards in this study. Interactions of these factors have been obtained by the use of analysis of variance and regression equations of these factors have been obtained to predict the concrete compressive strength and initial setting time. Also, simple and applicable formulas with less than 6% absolute mean error have been developed using the genetic algorithm to predict these parameters. Finally, the effect of each factor has been investigated when other factors are in their low or high level. PMID:26425360

  20. Evaluation of daily precipitation statistics and monsoon onset/retreat over western Sahel in multiple data sets

    NASA Astrophysics Data System (ADS)

    Diaconescu, Emilia Paula; Gachon, Philippe; Scinocca, John; Laprise, René

    2015-09-01

    The West Africa rainfall regime constitutes a considerable challenge for Regional Climate Models (RCMs) due to the complexity of dynamical and physical processes that characterise the West African Monsoon. In this paper, daily precipitation statistics are evaluated from the contributions to the AFRICA-CORDEX experiment from two ERA-Interim driven Canadian RCMs: CanRCM4, developed at the Canadian Centre for Climate Modelling and Analysis (CCCma) and CRCM5, developed at the University of Québec at Montréal. These modelled precipitation statistics are evaluated against three gridded observed datasets—the Global Precipitation Climatology Project (GPCP), the Tropical Rainfall Measuring Mission (TRMM), and the Africa Rainfall Climatology (ARC2)—and four reanalysis products (ECMWF ERA-Interim, NCEP/DOE Reanalysis II, NASA MERRA and NOAA-CIRES Twentieth Century Reanalysis). The two RCMs share the same dynamics from the Environment Canada GEM forecast model, but have two different physics' packages: CanRCM4 obtains its physics from CCCma's global atmospheric model (CanAM4), while CRCM5 shares a number of its physics modules with the limited-area version of GEM forecast model. The evaluation is focused on various daily precipitation statistics (maximum number of consecutive wet days, number of moderate and very heavy precipitation events, precipitation frequency distribution) and on the monsoon onset and retreat over the Sahel region. We find that the CRCM5 has a good representation of daily precipitation statistics over the southern Sahel, with spatial distributions close to GPCP dataset. Some differences are observed in the northern part of the Sahel, where the model is characterised by a dry bias. CanRCM4 and the ERA-Interim and MERRA reanalysis products overestimate the number of wet days over Sahel with a shift in the frequency distribution toward smaller daily precipitation amounts than in observations. Both RCMs and reanalyses have difficulties in reproducing

  1. Empirical aspects of record linkage across multiple data sets using statistical linkage keys: the experience of the PIAC cohort study

    PubMed Central

    2010-01-01

    Background: In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a common SLK is now used in many collections to facilitate the statistical examination of cross-program use. In 2005, the Pathways in Aged Care (PIAC) cohort study was funded to create a linked aged care database using the common SLK to enable analysis of pathways through aged care services. Linkage using an SLK is commonly deterministic. The purpose of this paper is to describe an extended deterministic record linkage strategy for situations where there is a general person identifier (e.g. an SLK) and several additional variables suitable for data linkage. This approach can allow for variation in client information recorded on different databases. Methods: A stepwise deterministic record linkage algorithm was developed to link datasets using an SLK and several other variables. Three measures of likely match accuracy were used: the discriminating power of match key values, an estimated false match rate, and an estimated step-specific trade-off between true and false matches. The method was validated through examining link properties and clerical review of three samples of links. Results: The deterministic algorithm resulted in up to an 11% increase in links compared with simple deterministic matching using an SLK. The links identified are of high quality: validation samples showed that less than 0.5% of links were false positives, and very few matches were made using non-unique match information (0.01%). There was a high degree of consistency in the characteristics of linked events. Conclusions: The linkage strategy described in this paper has allowed the linking of multiple large aged care service datasets using a statistical linkage key while allowing for variation in its reporting. More widely, our deterministic algorithm
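
    A stepwise deterministic linkage of the general kind described above can be sketched as follows. This is an illustration under stated assumptions, not the PIAC algorithm: records are plain dictionaries, the field names ("slk", "dob", "sex") are hypothetical, each step is a list of fields that must agree exactly, steps run from strictest to most relaxed, and only keys that are unique on both sides are linked.

```python
from collections import defaultdict


def stepwise_link(left, right, steps):
    """Link two record lists by successive exact-match keys.

    steps : list of field-name lists, strictest first.
    Returns a list of (left_index, right_index, step_number) tuples.
    """
    links = []
    linked_left, linked_right = set(), set()

    def build_index(records, used, fields):
        # Group not-yet-linked records by their match-key value.
        table = defaultdict(list)
        for i, rec in enumerate(records):
            if i in used:
                continue
            key = tuple(rec.get(f) for f in fields)
            if None not in key:  # skip records missing any key field
                table[key].append(i)
        return table

    for step_no, fields in enumerate(steps, start=1):
        left_index = build_index(left, linked_left, fields)
        right_index = build_index(right, linked_right, fields)
        for key, left_ids in left_index.items():
            right_ids = right_index.get(key, [])
            # Accept only one-to-one matches at this step.
            if len(left_ids) == 1 and len(right_ids) == 1:
                links.append((left_ids[0], right_ids[0], step_no))
                linked_left.add(left_ids[0])
                linked_right.add(right_ids[0])
    return links
```

    Recording the step number with each link makes it possible to estimate accuracy step by step, in the spirit of the step-specific trade-off mentioned in the abstract.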

  2. Building a statistical emulator for prediction of crop yield response to climate change: a global gridded panel data set approach

    NASA Astrophysics Data System (ADS)

    Mistry, Malcolm; De Cian, Enrica; Wing, Ian Sue

    2015-04-01

    There is widespread concern that trends and variability in weather induced by climate change will detrimentally affect global agricultural productivity and food supplies. Reliable quantification of the risks of negative impacts at regional and global scales is a critical research need, which has so far been met by forcing state-of-the-art global gridded crop models with outputs of global climate model (GCM) simulations in exercises such as the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP)-Fastrack. Notwithstanding such progress, it remains challenging to use these simulation-based projections to assess agricultural risk because their gridded fields of crop yields are fundamentally denominated as discrete combinations of warming scenarios, GCMs and crop models, and not as model-specific or model-averaged yield response functions of meteorological shifts, which may have their own independent probability of occurrence. By contrast, the empirical climate economics literature has adeptly represented agricultural responses to meteorological variables as reduced-form statistical response surfaces which identify the crop productivity impacts of additional exposure to different intervals of temperature and precipitation [cf Schlenker and Roberts, 2009]. This raises several important questions: (1) what do the equivalent reduced-form statistical response surfaces look like for crop model outputs, (2) do they exhibit systematic variation over space (e.g., crop suitability zones) or across crop models with different characteristics, (3) how do they compare to estimates based on historical observations, and (4) what are the implications for the characterization of climate risks? We address these questions by estimating statistical yield response functions for four major crops (maize, rice, wheat and soybeans) over the historical period (1971-2004) as well as future climate change scenarios (2005-2099) using ISIMIP-Fastrack data for five GCMs and seven crop models

  3. Comparison of several Russian populations by vital statistics and frequency of genes causing hereditary pathology

    SciTech Connect

    El'chinova, G.I.; Mamedova, R.A.; Brusintseva, O.V.; Ginter, E.K.

    1994-11-01

    Distances computed from vital statistics using the Euclid formula and thus termed "vital" are proposed for use in population studies. An example of use of these statistics for comparison of four large geographically separated Russian populations is given. 9 refs., 1 tab.
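
    The "vital" distance described above is an ordinary Euclidean distance between vectors of vital statistics. A minimal sketch follows; the function name is hypothetical, and the record does not say which statistics were used or whether they were standardized first, so any scaling is left to the caller.

```python
import math


def vital_distance(stats_a, stats_b):
    """Euclidean distance between two populations' vital-statistics vectors."""
    if len(stats_a) != len(stats_b):
        raise ValueError("both populations need the same set of statistics")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(stats_a, stats_b)))
```

    Because the statistics may be measured on very different scales, standardizing each one across populations before computing distances is a common precaution; that step is an assumption here, not something stated in the record.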

  4. Correlation of a set of gene variants, life events and personality features on adult ADHD severity.

    PubMed

    Müller, Daniel J; Chiesa, Alberto; Mandelli, Laura; De Luca, Vincenzo; De Ronchi, Diana; Jain, Umesh; Serretti, Alessandro; Kennedy, James L

    2010-07-01

    Increasing evidence suggests that symptoms of attention deficit hyperactivity disorder (ADHD) could persist into adult life in a substantial proportion of cases. The aim of the present study was to investigate the impact of (1) adverse events, (2) personality traits and (3) genetic variants chosen on the basis of previous findings and (4) their possible interactions on adult ADHD severity. One hundred and ten individuals diagnosed with adult ADHD were evaluated for occurrence of adverse events in childhood and adulthood, and personality traits by the Temperament and Character Inventory (TCI). Common polymorphisms within a set of nine important candidate genes (SLC6A3, DBH, DRD4, DRD5, HTR2A, CHRNA7, BDNF, PRKG1 and TAAR9) were genotyped for each subject. Life events, personality traits and genetic variations were analyzed in relationship to severity of current symptoms, according to the Brown Attention Deficit Disorder Scale (BADDS). Genetic variations were not significantly associated with severity of ADHD symptoms. Life stressors displayed only a minor effect as compared to personality traits. Indeed, symptoms' severity was significantly correlated with the temperamental trait of Harm avoidance and the character trait of Self directedness. The results of the present work are in line with previous evidence of a significant correlation between some personality traits and adult ADHD. However, several limitations such as the small sample size and the exclusion of patients with other severe comorbid psychiatric disorders could have influenced the significance of present findings. PMID:20006992

  5. Gene-set analysis based on the pharmacological profiles of drugs to identify repurposing opportunities in schizophrenia.

    PubMed

    de Jong, Simone; Vidler, Lewis R; Mokrab, Younes; Collier, David A; Breen, Gerome

    2016-08-01

    Genome-wide association studies (GWAS) have identified thousands of novel genetic associations for complex genetic disorders, leading to the identification of potential pharmacological targets for novel drug development. In schizophrenia, 108 conservatively defined loci that meet genome-wide significance have been identified and hundreds of additional sub-threshold associations harbour information on the genetic aetiology of the disorder. In the present study, we used gene-set analysis based on the known binding targets of chemical compounds to identify the 'drug pathways' most strongly associated with schizophrenia-associated genes, with the aim of identifying potential drug repositioning opportunities and clues for novel treatment paradigms, especially in multi-target drug development. We compiled 9389 gene sets (2496 with unique gene content) and interrogated gene-based p-values from the PGC2-SCZ analysis. Although no single drug exceeded experiment-wide significance (corrected p<0.05), highly ranked gene-sets reaching suggestive significance included the dopamine receptor antagonists metoclopramide and trifluoperazine and the tyrosine kinase inhibitor neratinib. This is a proof of principle analysis showing the potential utility of GWAS data of schizophrenia for the direct identification of candidate drugs and molecules that show polypharmacy. PMID:27302942

  6. Statistical Inference and Reverse Engineering of Gene Regulatory Networks from Observational Expression Data

    PubMed Central

    Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo

    2012-01-01

    In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642

  7. Statistical generation of training sets for measuring NO3(-), NH4(+) and major ions in natural waters using an ion selective electrode array.

    PubMed

    Mueller, Amy V; Hemond, Harold F

    2016-05-18

    Knowledge of ionic concentrations in natural waters is essential to understand watershed processes. Inorganic nitrogen, in the form of nitrate and ammonium ions, is a key nutrient as well as a participant in redox, acid-base, and photochemical processes of natural waters, leading to spatiotemporal patterns of ion concentrations at scales as small as meters or hours. Current options for measurement in situ are costly, relying primarily on instruments adapted from laboratory methods (e.g., colorimetric, UV absorption); free-standing and inexpensive ISE sensors for NO3(-) and NH4(+) could be attractive alternatives if interferences from other constituents were overcome. Multi-sensor arrays, coupled with appropriate non-linear signal processing, offer promise in this capacity but have not yet successfully achieved signal separation for NO3(-) and NH4(+) in situ at naturally occurring levels in unprocessed water samples. A novel signal processor, underpinned by an appropriate sensor array, is proposed that overcomes previous limitations by explicitly integrating basic chemical constraints (e.g., charge balance). This work further presents a rationalized process for the development of such in situ instrumentation for NO3(-) and NH4(+), including a statistical-modeling strategy for instrument design, training/calibration, and validation. Statistical analysis reveals that historical concentrations of major ionic constituents in natural waters across New England strongly covary and are multi-modal. This informs the design of a statistically appropriate training set, suggesting that the strong covariance of constituents across environmental samples can be exploited through appropriate signal processing mechanisms to further improve estimates of minor constituents. Two artificial neural network architectures, one expanded to incorporate knowledge of basic chemical constraints, were tested to process outputs of a multi-sensor array, trained using datasets of varying degrees of

  8. Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems

    PubMed Central

    Angelini, Claudia; Costa, Valerio

    2014-01-01

    The availability of omic data produced from international consortia, as well as from worldwide laboratories, is offering the possibility both to answer long-standing questions in biomedicine/molecular biology and to formulate novel hypotheses to test. However, the impact of such data is not fully exploited due to a limited availability of multi-omic data integration tools and methods. In this paper, we discuss the interplay between gene expression and epigenetic markers/transcription factors. We show how integrating ChIP-seq and RNA-seq data can help to elucidate gene regulatory mechanisms. In particular, we discuss the two following questions: (i) Can transcription factor occupancies or histone modification data predict gene expression? (ii) Can ChIP-seq and RNA-seq data be used to infer gene regulatory networks? We propose potential directions for statistical data integration. We discuss the importance of incorporating underestimated aspects (such as alternative splicing and long-range chromatin interactions). We also highlight the lack of data benchmarks and the need to develop tools for data integration from a statistical viewpoint, designed in the spirit of reproducible research. PMID:25364758

  9. Comparative genomics of lactic acid bacteria reveals a niche-specific gene set

    PubMed Central

    2009-01-01

    Background: The recently sequenced genome of Lactobacillus helveticus DPC4571 [1] revealed a dairy organism with significant homology (75% of genes are homologous) to a probiotic bacterium, Lb. acidophilus NCFM [2]. This led us to hypothesise that a group of genes could be determined which could define an organism's niche. Results: Taking 11 fully sequenced lactic acid bacteria (LAB) as our target (3 dairy LAB, 5 gut LAB and 3 multi-niche LAB), we demonstrated that the presence or absence of certain genes involved in sugar metabolism, the proteolytic system, and restriction modification enzymes was pivotal in suggesting the niche of a strain. We identified 9 niche specific genes, of which 6 are dairy specific and 3 are gut specific. The dairy specific genes identified in Lactobacillus helveticus DPC4571 were lhv_1161 and lhv_1171, encoding components of the proteolytic system, lhv_1031, lhv_1152, lhv_1978 and lhv_0028 encoding restriction endonuclease genes, while bile salt hydrolase genes lba_0892 and lba_1078, and the sugar metabolism gene lba_1689 from Lb. acidophilus NCFM were identified as gut specific genes. Conclusion: Comparative analysis revealed that if an organism had homologs to the dairy specific geneset, it probably came from a dairy environment, whilst if it had homologs to gut specific genes, it was highly likely to be of intestinal origin. We propose that this "barcode" of 9 genes will be a useful initial guide to researchers in the LAB field to indicate an organism's ability to occupy a specific niche. PMID:19265535

  10. Genomic determinants of sporulation in Bacilli and Clostridia: towards the minimal set of sporulation-specific genes

    PubMed Central

    Galperin, Michael Y; Mekhedov, Sergei L; Puigbo, Pere; Smirnov, Sergey; Wolf, Yuri I; Rigden, Daniel J

    2012-01-01

    Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia. PMID:22882546

  11. Genome-wide association data suggest ABCB1 and immune-related gene sets may be involved in adult antisocial behavior.

    PubMed

    Salvatore, J E; Edwards, A C; McClintick, J N; Bigdeli, T B; Adkins, A; Aliev, F; Edenberg, H J; Foroud, T; Hesselbrock, V; Kramer, J; Nurnberger, J I; Schuckit, M; Tischfield, J A; Xuei, X; Dick, D M

    2015-01-01

    Adult antisocial behavior (AAB) is moderately heritable, relatively common and has adverse consequences for individuals and society. We examined the molecular genetic basis of AAB in 1379 participants from a case-control study in which the cases met criteria for alcohol dependence. We also examined whether genes of interest were expressed in human brain. AAB was measured using a count of the number of Antisocial Personality Disorder criteria endorsed under criterion A from the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV). Participants were genotyped on the Illumina Human 1M BeadChip. In total, all single-nucleotide polymorphisms (SNPs) accounted for 25% of the variance in AAB, although this estimate was not significant (P=0.09). Enrichment tests indicated that more significantly associated genes were over-represented in seven gene sets, and most were immune related. Our most highly associated SNP (rs4728702, P=5.77 × 10(-7)) was located in the protein-coding adenosine triphosphate-binding cassette, sub-family B (MDR/TAP), member 1 (ABCB1). In a gene-based test, ABCB1 was genome-wide significant (q=0.03). Expression analyses indicated that ABCB1 was robustly expressed in the brain. ABCB1 has been implicated in substance use, and in post hoc tests we found that variation in ABCB1 was associated with DSM-IV alcohol and cocaine dependence criterion counts. These results suggest that ABCB1 may confer risk across externalizing behaviors, and are consistent with previous suggestions that immune pathways are associated with externalizing behaviors. The results should be tempered by the fact that we did not replicate the associations for ABCB1 or the gene sets in a less-affected independent sample. PMID:25918995

  12. Genome-wide association data suggest ABCB1 and immune-related gene sets may be involved in adult antisocial behavior

    PubMed Central

    Salvatore, J E; Edwards, A C; McClintick, J N; Bigdeli, T B; Adkins, A; Aliev, F; Edenberg, H J; Foroud, T; Hesselbrock, V; Kramer, J; Nurnberger, J I; Schuckit, M; Tischfield, J A; Xuei, X; Dick, D M

    2015-01-01

    Adult antisocial behavior (AAB) is moderately heritable, relatively common and has adverse consequences for individuals and society. We examined the molecular genetic basis of AAB in 1379 participants from a case–control study in which the cases met criteria for alcohol dependence. We also examined whether genes of interest were expressed in human brain. AAB was measured using a count of the number of Antisocial Personality Disorder criteria endorsed under criterion A from the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV). Participants were genotyped on the Illumina Human 1M BeadChip. In total, all single-nucleotide polymorphisms (SNPs) accounted for 25% of the variance in AAB, although this estimate was not significant (P=0.09). Enrichment tests indicated that more significantly associated genes were over-represented in seven gene sets, and most were immune related. Our most highly associated SNP (rs4728702, P=5.77 × 10(-7)) was located in the protein-coding adenosine triphosphate-binding cassette, sub-family B (MDR/TAP), member 1 (ABCB1). In a gene-based test, ABCB1 was genome-wide significant (q=0.03). Expression analyses indicated that ABCB1 was robustly expressed in the brain. ABCB1 has been implicated in substance use, and in post hoc tests we found that variation in ABCB1 was associated with DSM-IV alcohol and cocaine dependence criterion counts. These results suggest that ABCB1 may confer risk across externalizing behaviors, and are consistent with previous suggestions that immune pathways are associated with externalizing behaviors. The results should be tempered by the fact that we did not replicate the associations for ABCB1 or the gene sets in a less-affected independent sample. PMID:25918995

  13. Counteracting H3K4 methylation modulators Set1 and Jhd2 co-regulate chromatin dynamics and gene transcription

    PubMed Central

    Ramakrishnan, Saravanan; Pokhrel, Srijana; Palani, Sowmiya; Pflueger, Christian; Parnell, Timothy J.; Cairns, Bradley R.; Bhaskara, Srividya; Chandrasekharan, Mahesh B.

    2016-01-01

    Histone H3K4 methylation is connected to gene transcription from yeast to humans, but its mechanistic roles in transcription and chromatin dynamics remain poorly understood. We investigated the functions for Set1 and Jhd2, the sole H3K4 methyltransferase and H3K4 demethylase, respectively, in S. cerevisiae. Here, we show that Set1 and Jhd2 predominantly co-regulate genome-wide transcription. We find combined activities of Set1 and Jhd2 via H3K4 methylation contribute to positive or negative transcriptional regulation. Providing mechanistic insights, our data reveal that Set1 and Jhd2 together control nucleosomal turnover and occupancy during transcriptional co-regulation. Moreover, we find a genome-wide co-regulation of chromatin structure by Set1 and Jhd2 at different groups of transcriptionally active or inactive genes and at different regions within yeast genes. Overall, our study puts forth a model wherein combined actions of Set1 and Jhd2 via modulating H3K4 methylation−demethylation together control chromatin dynamics during various facets of transcriptional regulation. PMID:27325136

  14. Counteracting H3K4 methylation modulators Set1 and Jhd2 co-regulate chromatin dynamics and gene transcription.

    PubMed

    Ramakrishnan, Saravanan; Pokhrel, Srijana; Palani, Sowmiya; Pflueger, Christian; Parnell, Timothy J; Cairns, Bradley R; Bhaskara, Srividya; Chandrasekharan, Mahesh B

    2016-01-01

    Histone H3K4 methylation is connected to gene transcription from yeast to humans, but its mechanistic roles in transcription and chromatin dynamics remain poorly understood. We investigated the functions for Set1 and Jhd2, the sole H3K4 methyltransferase and H3K4 demethylase, respectively, in S. cerevisiae. Here, we show that Set1 and Jhd2 predominantly co-regulate genome-wide transcription. We find combined activities of Set1 and Jhd2 via H3K4 methylation contribute to positive or negative transcriptional regulation. Providing mechanistic insights, our data reveal that Set1 and Jhd2 together control nucleosomal turnover and occupancy during transcriptional co-regulation. Moreover, we find a genome-wide co-regulation of chromatin structure by Set1 and Jhd2 at different groups of transcriptionally active or inactive genes and at different regions within yeast genes. Overall, our study puts forth a model wherein combined actions of Set1 and Jhd2 via modulating H3K4 methylation-demethylation together control chromatin dynamics during various facets of transcriptional regulation. PMID:27325136

  15. Statistical analysis of GeneMark performance by cross-validation.

    PubMed

    Kleffe, J; Hermann, K; Borodovsky, M

    1996-03-01

    We have explored the performance of the GeneMark gene identification method using cross-validation over learning samples of E. coli DNA sequences. The computations gave more accurate estimations of the error rates in comparison with previous results when a sample of non-coding regions was derived from GenBank sequences with many true coding regions unannotated. The error rate components have been classified and delineated. It was shown that the method performs differently on class I, II and III genes. The most frequent errors come from misinterpreting the coding potential of the complementary sequence in the same frame. The effects of stop-codons present in alternative frames were also studied to understand better the main factors contributing to GeneMark performance. PMID:16749185

  16. High-resolution statistical mapping reveals gene territories in live yeast.

    PubMed

    Berger, Axel B; Cabal, Ghislain G; Fabre, Emmanuelle; Duong, Tarn; Buc, Henri; Nehrbass, Ulf; Olivo-Marin, Jean-Christophe; Gadal, Olivier; Zimmer, Christophe

    2008-12-01

    The nonrandom positioning of genes inside eukaryotic cell nuclei is implicated in central nuclear functions. However, the spatial organization of the genome remains largely uncharted, owing to limited resolution of optical microscopy, paucity of nuclear landmarks and moderate cell sampling. We developed a computational imaging approach that creates high-resolution probabilistic maps of subnuclear domains occupied by individual loci in budding yeast through automated analysis of thousands of living cells. After validation, we applied the technique to genes involved in galactose metabolism and ribosome biogenesis. We found that genomic loci are confined to 'gene territories' much smaller than the nucleus, which can be remodeled during transcriptional activation, and that the nucleolus is an important landmark for gene positioning. The technique can be used to visualize and quantify territory positions relative to each other and to nuclear landmarks, and should advance studies of nuclear architecture and function. PMID:18978785

  17. Differential profiling of volatile organic compound biomarker signatures utilizing a logical statistical filter-set and novel hybrid evolutionary classifiers

    NASA Astrophysics Data System (ADS)

    Grigsby, Claude C.; Zmuda, Michael A.; Boone, Derek W.; Highlander, Tyler C.; Kramer, Ryan M.; Rizki, Mateen M.

    2012-06-01

    A growing body of discoveries in molecular signatures has revealed that volatile organic compounds (VOCs), the small molecules associated with an individual's odor and breath, can be monitored to reveal the identity and presence of a unique individual, as well as their overall physiological status. Given the analysis requirements for differential VOC profiling via gas chromatography/mass spectrometry, our group has developed a novel informatics platform, Metabolite Differentiation and Discovery Lab (MeDDL). In its current version, MeDDL is a comprehensive tool for time-series spectral registration and alignment, visualization, comparative analysis, and machine learning to facilitate the efficient analysis of multiple, large-scale biomarker discovery studies. The MeDDL toolset can therefore identify a large differential subset of registered peaks, where their corresponding intensities can be used as features for classification. This initial screening of peaks yields result sets that are typically too large for incorporation into a portable, electronic nose based system in addition to including VOCs that are not amenable to classification; consequently, it is also important to identify an optimal subset of these peaks to increase classification accuracy and to decrease the cost of the final system. MeDDL's learning tools include a classifier similar to a K-nearest neighbor classifier used in conjunction with a genetic algorithm (GA) that simultaneously optimizes the classifier and subset of features. The GA uses ROC curves to produce classifiers having maximal area under their ROC curve. Experimental results on over a dozen recognition problems show many examples of classifiers and feature sets that produce perfect ROC curves.

  18. Regions of Unusual Statistical Properties as Tools in the Search for Horizontally Transferred Genes in Escherichia coli

    NASA Astrophysics Data System (ADS)

    Putonti, Catherine; Chumakov, Sergei; Chavez, Arturo; Luo, Yi; Graur, Dan; Fox, George E.; Fofanov, Yuriy

    2006-09-01

    The observed diversity of statistical characteristics along genomic sequences is the result of the influences of a variety of ongoing processes including horizontal gene transfer, gene loss, genome rearrangements, and evolution. The rate at which various processes affect the genome typically varies between different genomic regions. Thus, variations in statistical properties seen in different regions of a genome are often associated with its evolution and functional organization. Analysis of such properties is therefore relevant to many ongoing biomedical research efforts. Similarity Plot or S-plot is a Windows-based application for large-scale comparisons and 2D visualization of similarities between genomic sequences. This application combines two approaches widely used in genomics: window analysis of statistical characteristics along genomes and dot-plot visual representation. S-plot is effective in detecting highly similar regions between two genomes. Within a single genome, S-plot has the ability to identify highly dissimilar regions displaying unusual compositional properties. The application was used to perform a comparative analysis of 50+ microbial genomes as well as many eukaryote genomes including human, rat, mouse, and drosophila. We illustrate the uses of S-Plot in a comparison involving Escherichia coli K12 and E. coli O157:H7.

  19. Nutritional status of breastfed infants in rural Zambia: comparison of the National Center for Health Statistics growth reference versus the WHO 12-month breastfed pooled data set.

    PubMed Central

    Hautvast, J. L.; Pandor, A.; Burema, J.; Tolboom, J. J.; Chishimba, N.; Monnens, L. A.; van Staveren, W. A.

    2000-01-01

    Cross-sectional data for breastfed infants in rural Zambia were used to evaluate the effect of applying two different data sets as a reference, i.e. the WHO 12-month breastfed pooled data set and the National Center for Health Statistics (NCHS) growth reference in terms of prevalence of malnutrition (stunting, underweight, and wasting). A total of 518 infants who were attending mother-and-child health clinics were included. Age, weight and length were recorded. Anthropometric Z-scores were calculated in two ways: by applying the NCHS growth reference and by using the WHO breastfed data set. Anthropometric Z-scores calculated using the breastfed data set were lower during the first 6-7 months of life compared with those calculated by applying the NCHS growth reference. This resulted in a higher proportion of children aged 0-6 months being classified as stunted and underweight using the breastfed data set versus the NCHS growth reference. After the age of 7 months, similar prevalences of stunting or underweight were observed. Relatively few infants were classified as wasted. In order to adequately assess the prevalence of stunting and underweight in breastfed infants, it is recommended that a new growth reference be developed, as has been initiated by WHO. PMID:10885182
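The effect of switching growth references described above can be illustrated with a minimal sketch. It assumes the simplest form of an anthropometric Z-score, (measurement - reference median) / reference SD, and the conventional cutoff of Z < -2 for stunting; the reference medians and SDs below are hypothetical placeholders, not actual NCHS or WHO values, and real references may use more elaborate formulas.

```python
def z_score(value, ref_median, ref_sd):
    """Standard score of a measurement against a growth reference."""
    return (value - ref_median) / ref_sd


def is_stunted(length_for_age_z, cutoff=-2.0):
    """Conventional cutoff: length-for-age more than 2 SD below the median."""
    return length_for_age_z < cutoff


# Hypothetical reference values for one age/sex group
# (illustrative only, NOT real NCHS or WHO numbers).
reference_a = {"median": 63.0, "sd": 2.6}
reference_b = {"median": 64.0, "sd": 2.4}

length_cm = 58.0  # hypothetical infant length

z_a = z_score(length_cm, reference_a["median"], reference_a["sd"])
z_b = z_score(length_cm, reference_b["median"], reference_b["sd"])

print(f"reference A: z = {z_a:.2f}, stunted = {is_stunted(z_a)}")
print(f"reference B: z = {z_b:.2f}, stunted = {is_stunted(z_b)}")
```

With these placeholder numbers the same infant falls on different sides of the cutoff depending on the reference, which is the kind of shift in prevalence the abstract reports for the first months of life.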

  20. Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium.

    PubMed

    Hohman, Timothy J; Bush, William S; Jiang, Lan; Brown-Gentry, Kristin D; Torstenson, Eric S; Dudek, Scott M; Mukherjee, Shubhabrata; Naj, Adam; Kunkle, Brian W; Ritchie, Marylyn D; Martin, Eden R; Schellenberg, Gerard D; Mayeux, Richard; Farrer, Lindsay A; Pericak-Vance, Margaret A; Haines, Jonathan L; Thornton-Wells, Tricia A

    2016-02-01

    Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis. PMID:26827652

  1. Based on the Fuzzy Set-valued Statistics and the Fuzzy Mathematics Theory in Air Traffic Control System Safety Appraisal Application

    NASA Astrophysics Data System (ADS)

    Zhaoning, Zhang; Na, Meng; Peng, Zhou

    The importance of safety evaluation for the air traffic control system is described. First, guided by systems engineering theory and a person-equipment-environment-management model, a safety evaluation indicator system is established for the air traffic management system. Next, the weights of the indicators are calculated using fuzzy set-valued statistics, and a fail-safe analysis is carried out on these weights. Then, based on fuzzy mathematics theory, fuzzy comprehensive judgment is used to evaluate the safety of the air traffic management system. Finally, a worked example confirms the validity and feasibility of the proposed method.
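As a rough illustration of the fuzzy comprehensive judgment step, the sketch below combines indicator weights with a membership matrix using the weighted-average operator, one common choice among several. The four factor names follow the person-equipment-environment-management model mentioned in the abstract; the weights, membership degrees, and grade labels are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical indicator weights (in the paper these would come from
# fuzzy set-valued statistics); they must sum to 1.
factors = ["person", "equipment", "environment", "management"]
weights = np.array([0.4, 0.3, 0.2, 0.1])

# Hypothetical membership matrix R: one row per factor, one column per
# safety grade. Each row gives the degree to which that factor belongs
# to each grade.
grades = ["good", "fair", "poor"]
membership = np.array([
    [0.5, 0.3, 0.2],   # person
    [0.6, 0.3, 0.1],   # equipment
    [0.2, 0.5, 0.3],   # environment
    [0.3, 0.4, 0.3],   # management
])

# Weighted-average composition B = W . R (an assumed operator choice).
evaluation = weights @ membership

# Maximum-membership principle: report the grade with the largest degree.
overall_grade = grades[int(np.argmax(evaluation))]

print(dict(zip(grades, np.round(evaluation, 3))), "->", overall_grade)
```

Other composition operators (for example max-min) can give different rankings, so the operator is itself a modelling decision.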

  2. Genetic Acquisition of NDM Gene Offers Sustainability among Clinical Isolates of Pseudomonas aeruginosa in Clinical Settings

    PubMed Central

    Mishra, Shweta; Upadhyay, Supriya; Sen, Malay Ranjan; Maurya, Anand Prakash; Choudhury, Debarati; Bhattacharjee, Amitabha

    2015-01-01

    New Delhi metallo β-lactamases are one of the most significant emerging resistance determinants towards carbapenem drugs. Their persistence and adaptability often depend on their genetic environment and linkage. This study reports a unique and novel arrangement of the blaNDM-1 gene within clinical isolates of Pseudomonas aeruginosa from a tertiary referral hospital in north India. Three NDM positive clonally unrelated clinical isolates of P. aeruginosa were recovered from hospital patients. Association of integron with blaNDM-1 and presence of gene cassettes were assessed by PCR. Genetic linkage of the NDM gene with ISAba125 was determined, and in negative cases linkage in the upstream region was mapped by inverse PCR. In only one isolate was the NDM gene linked with ISAba125 for mobility; the other two revealed a new genetic arrangement in which the gene was inserted within the DNA-directed RNA polymerase gene of the host genome, as detected by inverse PCR followed by sequencing analysis. The significance of this novel linkage was further analyzed: the promoter site was detected with Softberry BPROM software, and its activity was assessed by cloning followed by semi-quantitative RT-PCR, which indicated a higher expression level of the NDM gene. This study concluded that the unique genetic makeup of the NDM gene with DNA-dependent RNA polymerase favours adaptability to the host in the hospital environment against huge antibiotic pressure. PMID:25635921

  3. Phosphoinositide 3-kinase and Bruton's tyrosine kinase regulate overlapping sets of genes in B lymphocytes

    PubMed Central

    Fruman, David A.; Ferl, Gregory Z.; An, Sam S.; Donahue, Amber C.; Satterthwaite, Anne B.; Witte, Owen N.

    2002-01-01

    Bruton's tyrosine kinase (Btk) acts downstream of phosphoinositide 3-kinase (PI3K) in a pathway required for B cell receptor (BCR)-dependent proliferation. We used DNA microarrays to determine what fraction of genes this pathway influences and to investigate whether PI3K and Btk mediate distinct gene regulation events. As complete loss-of-function mutations in PI3K and Btk alter B cell subpopulations and may cause compensatory changes in gene expression, we used B cells with partial loss of function in either PI3K or Btk. Only about 5% of the BCR-dependent gene expression changes were significantly affected by reduced PI3K or Btk. The results indicate that PI3K and Btk share target genes, and that PI3K influences additional genes independently of Btk. These data are consistent with PI3K acting through Btk and other effectors to regulate expression of a critical subset of BCR target genes that determine effective entry into the cell cycle. PMID:11756681

  4. CURLY LEAF Regulates Gene Sets Coordinating Seed Size and Lipid Biosynthesis.

    PubMed

    Liu, Jun; Deng, Shulin; Wang, Huan; Ye, Jian; Wu, Hui-Wen; Sun, Hai-Xi; Chua, Nam-Hai

    2016-05-01

    CURLY LEAF (CLF), a histone methyltransferase of Polycomb Repressive Complex 2 (PRC2) for trimethylation of histone H3 Lys 27 (H3K27me3), has been thought of as a negative regulator controlling mainly postgermination growth in Arabidopsis (Arabidopsis thaliana). Approximately 14% to 29% of genic regions are decorated by H3K27me3 in the Arabidopsis genome; however, transcriptional repression activities of PRC2 on a majority of these regions remain unclear. Here, by analysis of transcriptome profiles, we found that approximately 11.6% of genes in the Arabidopsis genome were repressed by CLF in various organs. Unexpectedly, approximately 54% of these genes were preferentially repressed in siliques. Further analyses of 118 transcriptome datasets uncovered a group of genes that was preferentially expressed and repressed by CLF in embryos at the mature-green stage. This observation suggests that CLF mediates a large-scale H3K27me3 programming/reprogramming event during embryonic development. Plants of clf-28 produced bigger and heavier seeds with higher oil content, larger oil bodies, and altered long-chain fatty acid composition compared with wild type. Around 46% of CLF-repressed genes were associated with H3K27me3 marks; moreover, we verified histone modification and transcriptional repression by CLF on regulatory genes. Our results suggest that CLF silences specific gene expression modules. Genes operating within a module have various molecular functions, but they cooperate to regulate a similar physiological function during embryo development. PMID:26945048

  5. Genetic acquisition of NDM gene offers sustainability among clinical isolates of Pseudomonas aeruginosa in clinical settings.

    PubMed

    Mishra, Shweta; Upadhyay, Supriya; Sen, Malay Ranjan; Maurya, Anand Prakash; Choudhury, Debarati; Bhattacharjee, Amitabha

    2015-01-01

    New Delhi metallo β-lactamases are one of the most significant emerging resistance determinants towards carbapenem drugs. Their persistence and adaptability often depend on their genetic environment and linkage. This study reports a unique and novel arrangement of the blaNDM-1 gene within clinical isolates of Pseudomonas aeruginosa from a tertiary referral hospital in north India. Three NDM positive clonally unrelated clinical isolates of P. aeruginosa were recovered from hospital patients. Association of integron with blaNDM-1 and presence of gene cassettes were assessed by PCR. Genetic linkage of the NDM gene with ISAba125 was determined, and in negative cases linkage in the upstream region was mapped by inverse PCR. In only one isolate was the NDM gene linked with ISAba125 for mobility; the other two revealed a new genetic arrangement in which the gene was inserted within the DNA-directed RNA polymerase gene of the host genome, as detected by inverse PCR followed by sequencing analysis. The significance of this novel linkage was further analyzed: the promoter site was detected with Softberry BPROM software, and its activity was assessed by cloning followed by semi-quantitative RT-PCR, which indicated a higher expression level of the NDM gene. This study concluded that the unique genetic makeup of the NDM gene with DNA-dependent RNA polymerase favours adaptability to the host in the hospital environment against huge antibiotic pressure. PMID:25635921

  6. CURLY LEAF Regulates Gene Sets Coordinating Seed Size and Lipid Biosynthesis

    PubMed Central

    Wang, Huan; Ye, Jian; Wu, Hui-Wen; Sun, Hai-Xi; Chua, Nam-Hai

    2016-01-01

    CURLY LEAF (CLF), a histone methyltransferase of Polycomb Repressive Complex 2 (PRC2) for trimethylation of histone H3 Lys 27 (H3K27me3), has been thought of as a negative regulator controlling mainly postgermination growth in Arabidopsis (Arabidopsis thaliana). Approximately 14% to 29% of genic regions are decorated by H3K27me3 in the Arabidopsis genome; however, transcriptional repression activities of PRC2 on a majority of these regions remain unclear. Here, by analysis of transcriptome profiles, we found that approximately 11.6% of genes in the Arabidopsis genome were repressed by CLF in various organs. Unexpectedly, approximately 54% of these genes were preferentially repressed in siliques. Further analyses of 118 transcriptome datasets uncovered a group of genes that was preferentially expressed and repressed by CLF in embryos at the mature-green stage. This observation suggests that CLF mediates a large-scale H3K27me3 programming/reprogramming event during embryonic development. Plants of clf-28 produced bigger and heavier seeds with higher oil content, larger oil bodies, and altered long-chain fatty acid composition compared with wild type. Around 46% of CLF-repressed genes were associated with H3K27me3 marks; moreover, we verified histone modification and transcriptional repression by CLF on regulatory genes. Our results suggest that CLF silences specific gene expression modules. Genes operating within a module have various molecular functions, but they cooperate to regulate a similar physiological function during embryo development. PMID:26945048

  7. Ecology of antibiotic resistance genes: characterization of enterococci from houseflies collected in food settings.

    PubMed

    Macovei, Lilia; Zurek, Ludek

    2006-06-01

    In this project, enterococci from the digestive tracts of 260 houseflies (Musca domestica L.) collected from five restaurants were characterized. Houseflies frequently (97% of the flies were positive) carried enterococci (mean, 3.1 x 10(3) CFU/fly). Using multiplex PCR, 205 of 355 randomly selected enterococcal isolates were identified and characterized. The majority of these isolates were Enterococcus faecalis (88.2%); in addition, 6.8% were E. faecium, and 4.9% were E. casseliflavus. E. faecalis isolates were phenotypically resistant to tetracycline (66.3%), erythromycin (23.8%), streptomycin (11.6%), ciprofloxacin (9.9%), and kanamycin (8.3%). Tetracycline resistance in E. faecalis was encoded by tet(M) (65.8%), tet(O) (1.7%), and tet(W) (0.8%). The majority (78.3%) of the erythromycin-resistant E. faecalis isolates carried erm(B). The conjugative transposon Tn916 and members of the Tn916/Tn1545 family were detected in 30.2% and 34.6% of the identified isolates, respectively. E. faecalis carried virulence genes, including a gelatinase gene (gelE; 70.7%), an aggregation substance gene (asa1; 33.2%), an enterococcus surface protein gene (esp; 8.8%), and a cytolysin gene (cylA; 8.8%). Phenotypic assays showed that 91.4% of the isolates with the gelE gene were gelatinolytic and that 46.7% of the isolates with the asa1 gene aggregated. All isolates with the cylA gene were hemolytic on human blood. This study showed that houseflies in food-handling and -serving facilities carry antibiotic-resistant and potentially virulent enterococci that have the capacity for horizontal transfer of antibiotic resistance genes to other bacteria. PMID:16751512

  8. Statistical completion of a partially identified graph with applications for the estimation of gene regulatory networks.

    PubMed

    Yu, Donghyeon; Son, Won; Lim, Johan; Xiao, Guanghua

    2015-10-01

    We study the estimation of a Gaussian graphical model whose dependent structures are partially identified. In a Gaussian graphical model, an off-diagonal zero entry in the concentration matrix (the inverse covariance matrix) implies the conditional independence of two corresponding variables, given all other variables. A number of methods have been proposed to estimate a sparse large-scale Gaussian graphical model or, equivalently, a sparse large-scale concentration matrix. In practice, the graph structure to be estimated is often partially identified by other sources or a pre-screening. In this paper, we propose a simple modification of existing methods to take into account this information in the estimation. We show that the partially identified dependent structure reduces the error in estimating the dependent structure. We apply the proposed method to estimating the gene regulatory network from lung cancer data, where protein-protein interactions are partially identified from the human protein reference database. The application shows that the proposed method identified many important cancer genes as hub genes in the constructed lung cancer network. In addition, we validated the prognostic importance of a newly identified cancer gene, PTPN13, in four independent lung cancer datasets. The results indicate that the proposed method could facilitate studying underlying lung cancer mechanisms and identifying reliable biomarkers for lung cancer prognosis. PMID:25837438
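
    The link between zeros in the concentration matrix and conditional independence can be illustrated with a small sketch. It is not the estimation method of the paper: the 3-variable matrix `K` and the helper names `partial_correlations` and `edges` are hypothetical, chosen only to show how the graph is read off a known concentration matrix.

```python
import math

def partial_correlations(precision):
    """Convert a concentration (inverse covariance) matrix into partial correlations.

    For i != j: rho_ij = -k_ij / sqrt(k_ii * k_jj).
    """
    n = len(precision)
    rho = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
    return rho

def edges(precision, tol=1e-12):
    """List the pairs (i, j), i < j, whose off-diagonal entry is non-zero."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(precision[i][j]) > tol]

# Hypothetical concentration matrix for a chain 0 - 1 - 2:
# the (0, 2) entry is zero, so variables 0 and 2 are conditionally
# independent given variable 1.
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]

if __name__ == "__main__":
    print(edges(K))
    print(partial_correlations(K))
```

    In this toy example the zero (0, 2) entry gives a zero partial correlation and no edge; prior knowledge of some edges, as described in the abstract, would fix part of this zero pattern before estimation.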

  9. Statistical modelling of the formulation variables in non-viral gene delivery systems.

    PubMed

    Birchall, J C; Waterworth, C A; Luscombe, C; Parkins, D A; Gumbleton, M

    2001-06-01

    Traditionally, optimisation of a gene delivery formulation utilises a study design that involves altering only one formulation variable at any one time whilst keeping the other variables constant. As gene delivery formulations become more complex, e.g. to include multiple cellular and sub-cellular targeting elements, there will be an increasing requirement to generate and analyse data more efficiently and allow examination of the interaction between variables. This study aims to demonstrate the utility of multifactorial design, specifically a Central Composite Design, in modelling the responses size, zeta potential and in vitro transfection efficiency of some prototypic non-viral gene delivery vectors, i.e. cationic liposome-pDNA complexes, and extending the application of the design strategy to more complex vectors, i.e. tri-component lipid:polycation:DNA (LPD). The modelled predictions of how the above responses change as a function of formulation show consistency with an extensive literature base of data obtained using more traditional approaches, and highlight the robustness and utility of the Central Composite Design in examining key formulation variables in non-viral gene delivery systems. The approach should be further developed to maximise the predictive impact of data across the full range of pharmaceutical sciences. PMID:11697203
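
    As a rough illustration of what a Central Composite Design looks like in coded units, the sketch below enumerates the factorial corners, the axial ("star") points and the centre runs. The number of factors, the number of centre runs and the rotatable choice of alpha are assumptions made for the example; the abstract does not state the settings used in the study, and the function name is hypothetical.

```python
from itertools import product

def central_composite_design(k, n_center=1, alpha=None):
    """Return the coded design points of a central composite design for k factors.

    The design combines 2**k factorial corners (levels -1/+1), 2*k axial
    points at distance alpha on each axis, and n_center centre points.
    If alpha is not given, the rotatable value (2**k) ** 0.25 is used.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    corners = [[float(v) for v in point] for point in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return corners + axial + center

if __name__ == "__main__":
    # Three hypothetical formulation variables, one centre run.
    for run in central_composite_design(3):
        print(run)
```

    Each row would then be mapped from coded units back to real formulation levels before the responses (size, zeta potential, transfection efficiency) are measured and fitted with a second-order model.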

  10. Identification of candidate genes for prostate cancer-risk SNPs utilizing a normal prostate tissue eQTL data set

    PubMed Central

    Thibodeau, S. N.; French, A. J.; McDonnell, S. K.; Cheville, J.; Middha, S.; Tillmans, L.; Riska, S.; Baheti, S.; Larson, M. C.; Fogarty, Z.; Zhang, Y.; Larson, N.; Nair, A.; O'Brien, D.; Wang, L.; Schaid, D J.

    2015-01-01

    Multiple studies have identified loci associated with the risk of developing prostate cancer but the associated genes are not well studied. Here we create a normal prostate tissue-specific eQTL data set and apply this data set to previously identified prostate cancer (PrCa)-risk SNPs in an effort to identify candidate target genes. The eQTL data set is constructed by the genotyping and RNA sequencing of 471 samples. We focus on 146 PrCa-risk SNPs, including all SNPs in linkage disequilibrium with each risk SNP, resulting in 100 unique risk intervals. We analyse cis-acting associations where the transcript is located within 2 Mb (±1 Mb) of the risk SNP interval. Of all SNP–gene combinations tested, 41.7% of SNPs demonstrate a significant eQTL signal after adjustment for sample histology and 14 expression principal component covariates. Of the 100 PrCa-risk intervals, 51 have a significant eQTL signal and these are associated with 88 genes. This study provides a rich resource to study biological mechanisms underlying genetic risk to PrCa. PMID:26611117
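
    The cis window described above (transcripts within ±1 Mb of a risk SNP interval) can be sketched as a simple interval test. The tuple layout, the function name `is_cis` and the choice to treat any overlap between the transcript and the extended interval as "within" are assumptions for illustration; the abstract does not specify how transcript position was defined.

```python
def is_cis(interval, transcript, flank=1_000_000):
    """Return True if the transcript lies within `flank` bp of the risk interval.

    Both arguments are (chromosome, start, end) tuples with start <= end.
    A transcript qualifies when it is on the same chromosome and overlaps
    the interval extended by `flank` on each side.
    """
    chrom_i, start_i, end_i = interval
    chrom_t, start_t, end_t = transcript
    if chrom_i != chrom_t:
        return False
    return start_t <= end_i + flank and end_t >= start_i - flank

if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the study.
    risk_interval = ("chr8", 128_000_000, 128_050_000)
    print(is_cis(risk_interval, ("chr8", 128_900_000, 128_950_000)))
    print(is_cis(risk_interval, ("chr8", 130_000_000, 130_020_000)))
```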

  11. A Complete Set of Flagellar Genes Acquired by Horizontal Transfer Coexists with the Endogenous Flagellar System in Rhodobacter sphaeroides

    PubMed Central

    Poggio, Sebastian; Abreu-Goodger, Cei; Fabela, Salvador; Osorio, Aurora; Dreyfus, Georges; Vinuesa, Pablo; Camarena, Laura

    2007-01-01

    Bacteria swim in liquid environments by means of a complex rotating structure known as the flagellum. Approximately 40 proteins are required for the assembly and functionality of this structure. Rhodobacter sphaeroides has two flagellar systems. One of these systems has been shown to be functional and is required for the synthesis of the well-characterized single subpolar flagellum, while the other was found only after the genome sequence of this bacterium was completed. In this work we found that the second flagellar system of R. sphaeroides can be expressed and produces a functional flagellum. In many bacteria with two flagellar systems, one is required for swimming, while the other allows movement in denser environments by producing a large number of flagella over the entire cell surface. In contrast, the second flagellar system of R. sphaeroides produces polar flagella that are required for swimming. Expression of the second set of flagellar genes seems to be positively regulated under anaerobic growth conditions. Phylogenic analysis suggests that the flagellar system that was initially characterized was in fact acquired by horizontal transfer from a γ-proteobacterium, while the second flagellar system contains the native genes. Interestingly, other α-proteobacteria closely related to R. sphaeroides have also acquired a set of flagellar genes similar to the set found in R. sphaeroides, suggesting that a common ancestor received this gene cluster. PMID:17293429

  12. Gene set enrichment and topological analyses based on interaction networks in pediatric acute lymphoblastic leukemia

    PubMed Central

    SUI, SHUXIANG; WANG, XIN; ZHENG, HUA; GUO, HUA; CHEN, TONG; JI, DONG-MEI

    2015-01-01

    Pediatric acute lymphoblastic leukemia (ALL) accounts for over one-quarter of all pediatric cancers. Interacting genes and proteins within the larger human gene interaction network of the human genome are rarely investigated by studies investigating pediatric ALL. In the present study, interaction networks were constructed using the empirical Bayesian approach and the Search Tool for the Retrieval of Interacting Genes/proteins database, based on the differentially-expressed (DE) genes in pediatric ALL, which were identified using the RankProd package. Enrichment analysis of the interaction network was performed using the network-based methods EnrichNet and PathExpand, which were compared with the traditional Expression Analysis Systematic Explorer (EASE) method. In total, 398 DE genes were identified in pediatric ALL, and LIF was the most significantly DE gene. The co-expression network consisted of 272 nodes, which indicated genes and proteins, and 602 edges, which indicated the number of interactions adjacent to the node. Comparison between EASE and PathExpand revealed that PathExpand detected more pathways or processes that were closely associated with pediatric ALL compared with the EASE method. There were 294 nodes and 1,588 edges in the protein-protein interaction network, with the processes of hematopoietic cell lineage and porphyrin metabolism demonstrating a close association with pediatric ALL. Network enrichment analysis based on the PathExpand algorithm was revealed to be more powerful for the analysis of interaction networks in pediatric ALL compared with the EASE method. LIF and MLLT11 were identified as the most significantly DE genes in pediatric ALL. The process of hematopoietic cell lineage was the pathway most significantly associated with pediatric ALL. PMID:26788135
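
    Traditional over-representation analysis of a DE gene list against a pathway is commonly based on a hypergeometric (one-sided Fisher) tail probability. The sketch below shows that generic calculation only; it is not the EASE, EnrichNet or PathExpand implementation, and the gene counts in the usage example are hypothetical apart from the 398 DE genes mentioned in the abstract.

```python
from math import comb

def hypergeom_upper_tail(population, in_set, drawn, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, in_set, drawn).

    population: number of background genes
    in_set:     number of background genes annotated to the pathway
    drawn:      size of the DE gene list
    hits:       number of DE genes annotated to the pathway
    """
    total = comb(population, drawn)
    upper = min(in_set, drawn)
    tail = sum(comb(in_set, i) * comb(population - in_set, drawn - i)
               for i in range(hits, upper + 1))
    return tail / total

if __name__ == "__main__":
    # Assumed background of 20,000 genes and a pathway of 100 genes,
    # 10 of which appear among 398 DE genes.
    print(hypergeom_upper_tail(20_000, 100, 398, 10))
```

    Network-based methods such as those compared in the abstract go beyond this count-based test by also using the interaction structure between the DE genes and the pathway members.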

  13. Groundwater denitrification and denitrifier gene abundances at varying hydrogeological settings in Ireland

    NASA Astrophysics Data System (ADS)

    Jahangir, M. M.; Barrett, M.; Johnston, P.; O'Flaherty, V.; Khalil, M. I.; Richards, K.

    2010-12-01

    Biological denitrification is an important mechanism for the reduction of nitrate in terrestrial and aquatic environments and contributes to the global nitrogen balance. This study focuses on the abundance of denitrifier functional genes and dissolved gases (N2O and denitrified N2, called excess N2) in groundwater. Multilevel piezometers (36) were installed to target three groundwater zones: subsoil (5 m bgl, below ground level), bedrock interface (10 m bgl) and bedrock (20 m bgl) at three agricultural sites (Johnstown Castle, Solohead, Oak Park) and in bedrock at a further site (Dairy Gold). Low flow sampling procedures were used to collect groundwater monthly from February 2009 to January 2010. Dissolved N2 and Ar, measured using Membrane Inlet Mass Spectrometry, were used to estimate excess N2. Dissolved N2O was extracted using a helium headspace method. Ten litres of groundwater were sampled from each well in May 2010 and DNA was concentrated by vacuum filtration on 0.2 µm filter paper. Functional gene abundances were quantified using real-time PCR assays targeting the nitrite reductase (nir) and nitrous oxide reductase (nos) genes. Mean water table (WT) depth varied seasonally and was the shallowest in November and the deepest in June across all sites. Groundwater properties varied across sites and depths, with ranges of dissolved oxygen (DO) from 1.0-9.0 mg L^-1, redox potential (Eh) -0.80-191, and aquifer permeability (Ksat) from 0.003-1.04 m d^-1. Dissolved organic C (DOC) decreased with increasing depth, ranging from 1.0-4.0, 0.9-2.4 and 0.8-2.4 in subsoil, interface and bedrock, respectively, and was lowest in Oak Park and highest in Johnstown Castle. Total bacterial abundances were higher in subsoil (2.9 x 10^4 genes L^-1) than in interface (6.7 x 10^4 genes L^-1). The most abundant denitrifying functional genes were nirS, which ranged from 1.4 x 10^3 genes L^-1 in subsoil to 2.0 x 10^4 genes L^-1 in bedrock, followed by nosZ, which varied from 2.56 x 10^2 genes L^-1 in bedrock to 1.9 x

  14. HoxBlinc RNA recruits Set1/MLL complexes to activate Hox gene expression patterns and mesoderm lineage development

    PubMed Central

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Nao; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2015-01-01

    Summary: Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1+ mesoderm and then promotes hematopoietic differentiation through regulating hoxb gene pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated KD or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb gene expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1+ precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1+ precursors and differentiation of Flk1+ cells into hematopoietic lineages. PMID:26725110

  15. A Core Gene Set Describes the Molecular Basis of Mutualism and Antagonism in Epichloë spp.

    PubMed

    Eaton, Carla J; Dupont, Pierre-Yves; Solomon, Peter; Clayton, William; Scott, Barry; Cox, Murray P

    2015-03-01

    Beneficial plant-fungal interactions play an important role in the ability of plants to survive changing environmental conditions. In contrast, phytopathogenic fungi fall at the opposite end of the symbiotic spectrum, causing reduced host growth or even death. In order to exploit beneficial interactions and prevent pathogenic ones, it is essential to understand the molecular differences underlying these alternative states. The association between the endophyte Epichloë festucae and Lolium perenne (perennial ryegrass) is an excellent system for studying these molecular patterns due to the existence of several fungal mutants that have an antagonistic rather than a mutualistic interaction with the host plant. By comparing gene expression in a wild-type beneficial association with three mutant antagonistic associations disrupted in key signaling genes, we identified a core set of 182 genes that show common differential expression patterns between these two states. These gene expression changes are indicative of a nutrient-starvation response, as supported by the upregulation of genes encoding degradative enzymes, transporters, and primary metabolism, and downregulation of genes encoding putative small-secreted proteins and secondary metabolism. These results suggest that disruption of a mutualistic symbiotic interaction may lead to an elevated uptake and degradation of host-derived nutrients and cell-wall components, reminiscent of phytopathogenic interactions. PMID:25496592

  16. Transcriptome analysis of cortical tissue reveals shared sets of downregulated genes in autism and schizophrenia.

    PubMed

    Ellis, S E; Panitch, R; West, A B; Arking, D E

    2016-01-01

    Autism (AUT), schizophrenia (SCZ) and bipolar disorder (BPD) are three highly heritable neuropsychiatric conditions. Clinical similarities and genetic overlap between the three disorders have been reported; however, the causes and the downstream effects of this overlap remain elusive. By analyzing transcriptomic RNA-sequencing data generated from post-mortem cortical brain tissues from AUT, SCZ, BPD and control subjects, we have begun to characterize the extent of gene expression overlap between these disorders. We report that the AUT and SCZ transcriptomes are significantly correlated (P<0.001), whereas the other two cross-disorder comparisons (AUT-BPD and SCZ-BPD) are not. Among AUT and SCZ, we find that the genes differentially expressed across disorders are involved in neurotransmission and synapse regulation. Despite the lack of global transcriptomic overlap across all three disorders, we highlight two genes, IQSEC3 and COPS7A, which are significantly downregulated compared with controls across all three disorders, suggesting either shared etiology or compensatory changes across these neuropsychiatric conditions. Finally, we tested for enrichment of genes differentially expressed across disorders in genetic association signals in AUT, SCZ or BPD, reporting lack of signal in any of the previously published genome-wide association study (GWAS). Together, these studies highlight the importance of examining gene expression from the primary tissue involved in neuropsychiatric conditions-the cortical brain. We identify a shared role for altered neurotransmission and synapse regulation in AUT and SCZ, in addition to two genes that may more generally contribute to neurodevelopmental and neuropsychiatric conditions. PMID:27219343

  17. Using RNAi in C. "elegans" to Demonstrate Gene Knockdown Phenotypes in the Undergraduate Biology Lab Setting

    ERIC Educational Resources Information Center

    Roy, Nicole M.

    2013-01-01

    RNA interference (RNAi) is a powerful technology used to knock down genes in basic research and medicine. In 2006 RNAi technology using "Caenorhabditis elegans" ("C. elegans") was awarded the Nobel Prize in medicine and thus students graduating in the biological sciences should have experience with this technology. However,…

  18. A Statistical Approach Reveals Designs for the Most Robust Stochastic Gene Oscillators

    PubMed Central

    2016-01-01

    The engineering of transcriptional networks presents many challenges due to the inherent uncertainty in the system structure, changing cellular context, and stochasticity in the governing dynamics. One approach to address these problems is to design and build systems that can function across a range of conditions; that is they are robust to uncertainty in their constituent components. Here we examine the parametric robustness landscape of transcriptional oscillators, which underlie many important processes such as circadian rhythms and the cell cycle, plus also serve as a model for the engineering of complex and emergent phenomena. The central questions that we address are: Can we build genetic oscillators that are more robust than those already constructed? Can we make genetic oscillators arbitrarily robust? These questions are technically challenging due to the large model and parameter spaces that must be efficiently explored. Here we use a measure of robustness that coincides with the Bayesian model evidence, combined with an efficient Monte Carlo method to traverse model space and concentrate on regions of high robustness, which enables the accurate evaluation of the relative robustness of gene network models governed by stochastic dynamics. We report the most robust two and three gene oscillator systems, plus examine how the number of interactions, the presence of autoregulation, and degradation of mRNA and protein affects the frequency, amplitude, and robustness of transcriptional oscillators. We also find that there is a limit to parametric robustness, beyond which there is nothing to be gained by adding additional feedback. Importantly, we provide predictions on new oscillator systems that can be constructed to verify the theory and advance design and modeling approaches to systems and synthetic biology. PMID:26835539

  19. A Statistical Approach Reveals Designs for the Most Robust Stochastic Gene Oscillators.

    PubMed

    Woods, Mae L; Leon, Miriam; Perez-Carrasco, Ruben; Barnes, Chris P

    2016-06-17

    The engineering of transcriptional networks presents many challenges due to the inherent uncertainty in the system structure, changing cellular context, and stochasticity in the governing dynamics. One approach to address these problems is to design and build systems that can function across a range of conditions; that is they are robust to uncertainty in their constituent components. Here we examine the parametric robustness landscape of transcriptional oscillators, which underlie many important processes such as circadian rhythms and the cell cycle, plus also serve as a model for the engineering of complex and emergent phenomena. The central questions that we address are: Can we build genetic oscillators that are more robust than those already constructed? Can we make genetic oscillators arbitrarily robust? These questions are technically challenging due to the large model and parameter spaces that must be efficiently explored. Here we use a measure of robustness that coincides with the Bayesian model evidence, combined with an efficient Monte Carlo method to traverse model space and concentrate on regions of high robustness, which enables the accurate evaluation of the relative robustness of gene network models governed by stochastic dynamics. We report the most robust two and three gene oscillator systems, plus examine how the number of interactions, the presence of autoregulation, and degradation of mRNA and protein affects the frequency, amplitude, and robustness of transcriptional oscillators. We also find that there is a limit to parametric robustness, beyond which there is nothing to be gained by adding additional feedback. Importantly, we provide predictions on new oscillator systems that can be constructed to verify the theory and advance design and modeling approaches to systems and synthetic biology. PMID:26835539

  20. Conjugative transposons: an unusual and diverse set of integrated gene transfer elements.

    PubMed Central

    Salyers, A A; Shoemaker, N B; Stevens, A M; Li, L Y

    1995-01-01

    Conjugative transposons are integrated DNA elements that excise themselves to form a covalently closed circular intermediate. This circular intermediate can either reintegrate in the same cell (intracellular transposition) or transfer by conjugation to a recipient and integrate into the recipient's genome (intercellular transposition). Conjugative transposons were first found in gram-positive cocci but are now known to be present in a variety of gram-positive and gram-negative bacteria also. Conjugative transposons have a surprisingly broad host range, and they probably contribute as much as plasmids to the spread of antibiotic resistance genes in some genera of disease-causing bacteria. Resistance genes need not be carried on the conjugative transposon to be transferred. Many conjugative transposons can mobilize coresident plasmids, and the Bacteroides conjugative transposons can even excise and mobilize unlinked integrated elements. The Bacteroides conjugative transposons are also unusual in that their transfer activities are regulated by tetracycline via a complex regulatory network. PMID:8531886

  1. The Smallest Known Genomes of Multicellular and Toxic Cyanobacteria: Comparison, Minimal Gene Sets for Linked Traits and the Evolutionary Implications

    PubMed Central

    Stucken, Karina; John, Uwe; Cembella, Allan; Murillo, Alejandro A.; Soto-Liebe, Katia; Fuentes-Valdés, Juan J.; Friedel, Maik; Plominsky, Alvaro M.; Vásquez, Mónica; Glöckner, Gernot

    2010-01-01

    Cyanobacterial morphology is diverse, ranging from unicellular spheres or rods to multicellular structures such as colonies and filaments. Multicellular species represent an evolutionary strategy to differentiate and compartmentalize certain metabolic functions for reproduction and nitrogen (N2) fixation into specialized cell types (e.g. akinetes, heterocysts and diazocytes). Only a few filamentous, differentiated cyanobacterial species, with genome sizes over 5 Mb, have been sequenced. We sequenced the genomes of two strains of closely related filamentous cyanobacterial species to yield further insights into the molecular basis of the traits of N2 fixation, filament formation and cell differentiation. Cylindrospermopsis raciborskii CS-505 is a cylindrospermopsin-producing strain from Australia, whereas Raphidiopsis brookii D9 from Brazil synthesizes neurotoxins associated with paralytic shellfish poisoning (PSP). Despite their different morphology, toxin composition and disjunct geographical distribution, these strains form a monophyletic group. With genome sizes of approximately 3.9 (CS-505) and 3.2 (D9) Mb, these are the smallest genomes described for free-living filamentous cyanobacteria. We observed remarkable gene order conservation (synteny) between these genomes despite the difference in repetitive element content, which accounts for most of the genome size difference between them. We show here that the strains share a specific set of 2539 genes with >90% average nucleotide identity. The fact that the CS-505 and D9 genomes are small and streamlined compared to those of other filamentous cyanobacterial species and the lack of the ability for heterocyst formation in strain D9 allowed us to define a core set of genes responsible for each trait in filamentous species. We presume that in strain D9 the ability to form proper heterocysts was secondarily lost together with N2 fixation capacity. Further comparisons to all available cyanobacterial genomes covering

  2. Genetic diversity of the conserved motifs of six bacterial leaf blight resistance genes in a set of rice landraces

    PubMed Central

    2014-01-01

    Background: Bacterial leaf blight (BLB) caused by the vascular pathogen Xanthomonas oryzae pv. oryzae (Xoo) is one of the most serious diseases leading to crop failure in rice growing countries. A total of 37 resistance genes against Xoo has been identified in rice. Of these, ten BLB resistance genes have been mapped on rice chromosomes, while 6 have been cloned, sequenced and characterized. Diversity analysis at the resistance gene level of this disease is scanty, and the landraces from West Bengal and North Eastern states of India have received little attention so far. The objective of this study was to assess the genetic diversity at conserved domains of 6 BLB resistance genes in a set of 22 rice accessions including landraces and check genotypes collected from the states of Assam, Nagaland, Mizoram and West Bengal. Results: In this study 34 pairs of primers were designed from conserved domains of 6 BLB resistance genes; Xa1, xa5, Xa21, Xa21(A1), Xa26 and Xa27. The designed primer pairs were used to generate PCR based polymorphic DNA profiles to detect and elucidate the genetic diversity of the six genes in the 22 diverse rice accessions of known disease phenotype. A total of 140 alleles were identified including 41 rare and 26 null alleles. The average polymorphism information content (PIC) value was 0.56/primer pair. The DNA profiles identified each of the rice landraces unequivocally. The amplified polymorphic DNA bands were used to calculate genetic similarity of the rice landraces in all possible pair combinations. The similarity among the rice accessions ranged from 18% to 89% and the dendrogram produced from the similarity values was divided into 2 major clusters. The conserved domains identified within the sequenced rare alleles include Leucine-Rich Repeat, BED-type zinc finger domain, sugar transferase domain and the domain of the carbohydrate esterase 4 superfamily. Conclusions: This study revealed high genetic diversity at conserved domains of six BLB

  3. Gene set enrichment analysis of microarray data from Pimephales promelas (Rafinesque), a non-mammalian model organism

    PubMed Central

    2011-01-01

    Background: Methods for gene-class testing, such as Gene Set Enrichment Analysis (GSEA), incorporate biological knowledge into the analysis and interpretation of microarray data by comparing gene expression patterns to pathways, systems and emergent phenotypes. However, to use GSEA to its full capability with non-mammalian model organisms, a microarray platform must be annotated with human gene symbols. Doing so enables the ability to relate a model organism's gene expression, in response to a given treatment, to potential human health consequences of that treatment. We enhanced the annotation of a microarray platform from a non-mammalian model organism, and then used the GSEA approach in a reanalysis of a study examining the biological significance of acute and chronic methylmercury exposure on liver tissue of fathead minnow (Pimephales promelas). Using GSEA, we tested the hypothesis that fathead livers, in response to methylmercury exposure, would exhibit gene expression patterns similar to diseased human livers. Results: We describe an enhanced annotation of the fathead minnow microarray platform with human gene symbols. This resource is now compatible with the GSEA approach for gene-class testing. We confirmed that GSEA, using this enhanced microarray platform, is able to recover results consistent with a previous analysis of fathead minnow exposure to methylmercury using standard analytical approaches. Using GSEA to compare fathead gene expression profiles to human phenotypes, we also found that fathead methylmercury-treated livers exhibited expression profiles that are homologous to human systems & pathways and results in damage that is similar to those of human liver damage associated with hepatocellular carcinoma and hepatitis B. Conclusions: This study describes a powerful resource for enabling the use of non-mammalian model organisms in the study of human health significance. Results of microarray gene expression studies involving fathead minnow, typically

  4. A statistical model and national data set for partitioning fish-tissue mercury concentration variation between spatiotemporal and sample characteristic effects

    USGS Publications Warehouse

    Wente, Stephen P.

    2004-01-01

    Many Federal, Tribal, State, and local agencies monitor mercury in fish-tissue samples to identify sites with elevated fish-tissue mercury (fish-mercury) concentrations, track changes in fish-mercury concentrations over time, and produce fish-consumption advisories. Interpretation of such monitoring data commonly is impeded by difficulties in separating the effects of sample characteristics (species, tissues sampled, and sizes of fish) from the effects of spatial and temporal trends on fish-mercury concentrations. Without such a separation, variation in fish-mercury concentrations due to differences in the characteristics of samples collected over time or across space can be misattributed to temporal or spatial trends; and/or actual trends in fish-mercury concentration can be misattributed to differences in sample characteristics. This report describes a statistical model that can separate spatiotemporal and sample characteristic effects in fish-mercury concentration data, together with a national data set (31,813 samples) for calibrating that model. This model could be useful for evaluating spatial and temporal trends in fish-mercury concentrations and developing fish-consumption advisories. The observed fish-mercury concentration data and model predictions can be accessed, displayed geospatially, and downloaded via the World Wide Web (http://emmma.usgs.gov). This report and the associated web site may assist in the interpretation of large amounts of data from widespread fish-mercury monitoring efforts.

  5. 16Stimator: statistical estimation of ribosomal gene copy numbers from draft genome assemblies.

    PubMed

    Perisin, Matthew; Vetter, Madlen; Gilbert, Jack A; Bergelson, Joy

    2016-04-01

    The 16S rRNA gene (16S) is an accepted marker of bacterial taxonomic diversity, even though differences in copy number obscure the relationship between amplicon and organismal abundances. Ancestral state reconstruction methods can predict 16S copy numbers through comparisons with closely related reference genomes; however, the database of closed genomes is limited. Here, we extend the reference database of 16S copy numbers to de novo assembled draft genomes by developing 16Stimator, a method to estimate 16S copy numbers when these repetitive regions collapse during assembly. Using a read depth approach, we estimate 16S copy numbers for 12 endophytic isolates from Arabidopsis thaliana and confirm estimates by qPCR. We further apply this approach to draft genomes deposited in NCBI and demonstrate accurate copy number estimation regardless of sequencing platform, with an overall median deviation of 14%. The expanded database of isolates with 16S copy number estimates increases the power of phylogenetic correction methods for determining organismal abundances from 16S amplicon surveys. PMID:26359911
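
    The read depth idea can be sketched as a ratio of coverage over the collapsed 16S region to coverage over regions assumed to be single copy. This is a simplified illustration, not the published 16Stimator procedure: the use of medians, the function name and the depth values are assumptions made for the example.

```python
from statistics import median

def estimate_copy_number(depth_16s, depth_single_copy):
    """Estimate 16S copy number from per-position read depths.

    depth_16s:         read depths across the assembled (collapsed) 16S region
    depth_single_copy: read depths across regions assumed to be single copy
    Returns the ratio of the two median depths.
    """
    baseline = median(depth_single_copy)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    return median(depth_16s) / baseline

if __name__ == "__main__":
    # Hypothetical depths: the 16S region is covered about four times as
    # deeply as single-copy regions, suggesting about four copies.
    print(estimate_copy_number([210, 190, 200], [48, 50, 52, 51, 49]))
```

    When repeated 16S copies collapse into one contig during assembly, reads from every copy pile onto that contig, which is why its depth scales with copy number.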

  6. Root Exudates of Various Host Plants of Rhizobium leguminosarum Contain Different Sets of Inducers of Rhizobium Nodulation Genes.

    PubMed

    Zaat, S A; Wijffelman, C A; Mulders, I H; van Brussel, A A; Lugtenberg, B J

    1988-04-01

    Rhizobium promoters involved in the formation of root nodules on leguminous plants are activated by flavonoids in plant root exudate. A series of Rhizobium strains which all contain the inducible Rhizobium leguminosarum nodA promoter fused to the Escherichia coli lacZ gene, and which differ only in the source of the regulatory nodD gene, were recently used to show that the regulatory nodD gene determines which flavonoids are able to activate the nodA promoter (HP Spaink, CA Wijffelman, E Pees, RJH Okker, BJJ Lugtenberg 1987 Nature 328: 337-340). Since these strains therefore are able to discriminate between various flavonoids, they were used to determine whether or not plants that are nodulated by R. leguminosarum produce different inducers. After chromatographic separation of root exudate constituents from Vicia sativa L. subsp. nigra (L.), V. hirsuta (L.) S.F. Gray, Pisum sativum L. cv Rondo, and Trifolium subterraneum L., the fractions were tested with a set of strains containing a nodD gene of R. leguminosarum, R. trifolii, or Rhizobium meliloti, respectively. It appeared that the source of nodD determined whether, and to what extent, the R. leguminosarum nodA promoter was induced. Lack of induction could not be attributed to the presence of inhibitors. Most of the inducers were able to activate the nodA promoter in the presence of one particular nodD gene only. The inducers that were active in the presence of the R. leguminosarum nodD gene were different in each root exudate. PMID:16666070

  7. Gene set of chemosensory receptors in the polyembryonic endoparasitoid Macrocentrus cingulum

    PubMed Central

    Ahmed, Tofael; Zhang, Tiantao; Wang, Zhenying; He, Kanglai; Bai, Shuxiong

    2016-01-01

    Insects are extremely successful animals whose odor perception is very prominent due to their sophisticated olfactory system. The antennae, the main chemosensory organs, play a critical role in detecting odors in the ambient environment before appropriate behavioral responses are initiated. The antennal chemosensory receptor gene families have been suggested to be involved in the olfactory signal transduction pathway as a sensory neuron response. Macrocentrus cingulum is deployed successfully as a biological control agent for corn pest insects of the lepidopteran genus Ostrinia. In this research, we assembled antennal transcriptomes of M. cingulum using next-generation sequencing to identify the major chemosensory receptor gene families. In total, 112 olfactory receptor candidates (79 odorant receptors, 20 gustatory receptors, and 13 ionotropic receptors) were identified from the male and female antennal transcriptomes. The sequences of all of these transcripts were confirmed by RT-PCR and direct DNA sequencing. Expression profiles of gustatory receptors in olfactory and non-olfactory tissues were measured by RT-qPCR. The sex-specific and sex-biased chemoreceptor expression patterns suggest that these receptors may have important functions in detecting behaviorally relevant odor molecules. These results provide a comprehensive resource for studying semiochemical-driven behaviors at the molecular level in this polyembryonic endoparasitoid. PMID:27090020

  8. Gene set of chemosensory receptors in the polyembryonic endoparasitoid Macrocentrus cingulum.

    PubMed

    Ahmed, Tofael; Zhang, Tiantao; Wang, Zhenying; He, Kanglai; Bai, Shuxiong

    2016-01-01

    Insects are extremely successful animals whose odor perception is very prominent due to their sophisticated olfactory system. The antennae, the main chemosensory organs, play a critical role in detecting odors in the ambient environment before appropriate behavioral responses are initiated. The antennal chemosensory receptor gene families have been suggested to be involved in the olfactory signal transduction pathway as a sensory neuron response. Macrocentrus cingulum is deployed successfully as a biological control agent for corn pest insects of the lepidopteran genus Ostrinia. In this research, we assembled antennal transcriptomes of M. cingulum using next-generation sequencing to identify the major chemosensory receptor gene families. In total, 112 olfactory receptor candidates (79 odorant receptors, 20 gustatory receptors, and 13 ionotropic receptors) were identified from the male and female antennal transcriptomes. The sequences of all of these transcripts were confirmed by RT-PCR and direct DNA sequencing. Expression profiles of gustatory receptors in olfactory and non-olfactory tissues were measured by RT-qPCR. The sex-specific and sex-biased chemoreceptor expression patterns suggest that these receptors may have important functions in detecting behaviorally relevant odor molecules. These results provide a comprehensive resource for studying semiochemical-driven behaviors at the molecular level in this polyembryonic endoparasitoid. PMID:27090020

  9. Extended triplet set C343 of DNA sequences and its application to the p53 gene

    NASA Astrophysics Data System (ADS)

    Yan, Yan-Yan; Zhu, Ping

    2011-01-01

    Recently, much research has indicated that more and more cancers pose a threat to human life. Cancers are caused by oncogenes. Many human oncogenes have been found and most of them are located on chromosomes. The discovery of the oncogene plays a significant role in the treatment of cancer. The p53 tumor suppressor gene has received much attention because it is frequently mutated or deleted in the tumor cells of most people. Thus, the study of oncogenes is significant. In order to establish the Galois field GF(7), the indefinite gene is introduced as D and the oncogene is introduced as O and P. Taking the polynomial coefficients $a_0, a_1, a_2$ in GF(7) and the bijective function $f: \mathrm{GF}(7) \to \{D, A, C, O, G, T, P\}$, where $f(0) = D$, $f(1) = A$, $f(2) = C$, $f(3) = O$, $f(4) = G$, $f(5) = T$, and $f(6) = P$, the bijection $\varphi$ may be written as $\varphi(a_0 + a_1 x + a_2 x^2)$. Based on the algebraic structure, we can not only analyse the DNA sequence of oncogenes, but also predict possible new cancers.
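
    The mapping between triplets and polynomials over GF(7) can be sketched as follows. The symbol table comes from the abstract; the convention that the first letter of a triplet corresponds to the constant coefficient, and the helper names `triplet_to_coeffs`, `coeffs_to_triplet` and `add_triplets`, are assumptions made only for illustration.

```python
from itertools import product

# Symbol table from the abstract: f(0)=D, f(1)=A, f(2)=C, f(3)=O, f(4)=G, f(5)=T, f(6)=P.
SYMBOLS = "DACOGTP"
F = dict(enumerate(SYMBOLS))            # f: GF(7) -> symbols
F_INV = {s: k for k, s in F.items()}    # inverse of f


def triplet_to_coeffs(triplet):
    """Map a triplet such as 'ACG' to coefficients (a0, a1, a2) in GF(7).

    Assumption: the first letter gives a0, the second a1, the third a2.
    """
    if len(triplet) != 3:
        raise ValueError("a triplet must have exactly three symbols")
    return tuple(F_INV[s] for s in triplet)


def coeffs_to_triplet(coeffs):
    """Map coefficients (a0, a1, a2) back to a triplet, reducing modulo 7."""
    return "".join(F[a % 7] for a in coeffs)


def add_triplets(t1, t2):
    """Add two triplets as polynomials over GF(7), coefficient by coefficient."""
    c1, c2 = triplet_to_coeffs(t1), triplet_to_coeffs(t2)
    return coeffs_to_triplet(tuple(a + b for a, b in zip(c1, c2)))


all_triplets = ["".join(t) for t in product(SYMBOLS, repeat=3)]
print(len(all_triplets))            # prints 343, i.e. 7**3
print(triplet_to_coeffs("ACG"))     # prints (1, 2, 4)
print(add_triplets("ACG", "TTP"))   # prints PDO
```

    The count of 343 triplets matches the name of the extended set, since seven symbols in three positions give 7^3 combinations.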

  10. HoxBlinc RNA Recruits Set1/MLL Complexes to Activate Hox Gene Expression Patterns and Mesoderm Lineage Development.

    PubMed

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Naohiro; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2016-01-01

    Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1(+) mesoderm and then promotes hematopoietic differentiation through regulation of hoxb pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated knockdown or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1(+) precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1(+) precursors and differentiation of Flk1(+) cells into hematopoietic lineages. PMID:26725110

  11. The transcriptional response to encystation stimuli in Giardia lamblia is restricted to a small set of genes.

    PubMed

    Morf, Laura; Spycher, Cornelia; Rehrauer, Hubert; Fournier, Catharine Aquino; Morrison, Hilary G; Hehl, Adrian B

    2010-10-01

    The protozoan parasite Giardia lamblia undergoes stage differentiation in the small intestine of the host to an environmentally resistant and infectious cyst. Encystation involves the secretion of an extracellular matrix comprised of cyst wall proteins (CWPs) and a β(1-3)-GalNAc homopolymer. Upon the induction of encystation, genes coding for CWPs are switched on, and mRNAs coding for a Myb transcription factor and enzymes involved in cyst wall glycan synthesis are upregulated. Encystation in vitro is triggered by several protocols, which call for changes in bile concentrations or availability of lipids, and elevated pH. However, the conditions for induction are not standardized and we predicted significant protocol-specific side effects. This makes reliable identification of encystation factors difficult. Here, we exploited the possibility of inducing encystation with two different protocols, which we show to be equally effective, for a comparative mRNA profile analysis. The standard encystation protocol induced a bipartite transcriptional response with surprisingly minor involvement of stress genes. A comparative analysis revealed a core set of only 18 encystation genes and showed that a majority of genes was indeed upregulated as a side effect of inducing conditions. We also established a Myb binding sequence as a signature motif in encystation promoters, suggesting coordinated regulation of these factors. PMID:20693303

  12. The Transcriptional Response to Encystation Stimuli in Giardia lamblia Is Restricted to a Small Set of Genes

    PubMed Central

    Morf, Laura; Spycher, Cornelia; Rehrauer, Hubert; Fournier, Catharine Aquino; Morrison, Hilary G.; Hehl, Adrian B.

    2010-01-01

    The protozoan parasite Giardia lamblia undergoes stage differentiation in the small intestine of the host to an environmentally resistant and infectious cyst. Encystation involves the secretion of an extracellular matrix comprised of cyst wall proteins (CWPs) and a β(1-3)-GalNAc homopolymer. Upon the induction of encystation, genes coding for CWPs are switched on, and mRNAs coding for a Myb transcription factor and enzymes involved in cyst wall glycan synthesis are upregulated. Encystation in vitro is triggered by several protocols, which call for changes in bile concentrations or availability of lipids, and elevated pH. However, the conditions for induction are not standardized and we predicted significant protocol-specific side effects. This makes reliable identification of encystation factors difficult. Here, we exploited the possibility of inducing encystation with two different protocols, which we show to be equally effective, for a comparative mRNA profile analysis. The standard encystation protocol induced a bipartite transcriptional response with surprisingly minor involvement of stress genes. A comparative analysis revealed a core set of only 18 encystation genes and showed that a majority of genes was indeed upregulated as a side effect of inducing conditions. We also established a Myb binding sequence as a signature motif in encystation promoters, suggesting coordinated regulation of these factors. PMID:20693303

  13. An 86-probe-set gene-expression signature predicts survival in cytogenetically normal acute myeloid leukemia

    PubMed Central

    Metzeler, Klaus H.; Hummel, Manuela; Bloomfield, Clara D.; Spiekermann, Karsten; Braess, Jan; Sauerland, Maria-Cristina; Heinecke, Achim; Radmacher, Michael; Marcucci, Guido; Whitman, Susan P.; Maharry, Kati; Paschka, Peter; Larson, Richard A.; Berdel, Wolfgang E.; Büchner, Thomas; Wörmann, Bernhard; Mansmann, Ulrich; Hiddemann, Wolfgang

    2008-01-01

    Patients with cytogenetically normal acute myeloid leukemia (CN-AML) show heterogeneous treatment outcomes. We used gene-expression profiling to develop a gene signature that predicts overall survival (OS) in CN-AML. Based on data from 163 patients treated in the German AMLCG 1999 trial and analyzed on oligonucleotide microarrays, we used supervised principal component analysis to identify 86 probe sets (representing 66 different genes), which correlated with OS, and defined a prognostic score based on this signature. When applied to an independent cohort of 79 CN-AML patients, this continuous score remained a significant predictor for OS (hazard ratio [HR], 1.85; P = .002), event-free survival (HR = 1.73; P = .001), and relapse-free survival (HR = 1.76; P = .025). It kept its prognostic value in multivariate analyses adjusting for age, FLT3 ITD, and NPM1 status. In a validation cohort of 64 CN-AML patients treated on CALGB study 9621, the score also predicted OS (HR = 4.11; P < .001), event-free survival (HR = 2.90; P < .001), and relapse-free survival (HR = 3.14, P < .001) and retained its significance in a multivariate model for OS. In summary, we present a novel gene-expression signature that offers additional prognostic information for patients with CN-AML. PMID:18716133

  14. Expression of a set of synthetic suppressor tRNA(Phe) genes in Saccharomyces cerevisiae.

    PubMed Central

    Masson, J M; Meuris, P; Grunstein, M; Abelson, J; Miller, J H

    1987-01-01

    Synthetic ochre and amber tRNA suppressor genes derived from the yeast tRNA(PheGAA) sequence have been constructed. They were efficiently transcribed in vitro and expressed in vivo via a synthetic expression cassette. tRNA(PheUUA) and tRNA(PheUUA) delta IVS (IVS = intervening sequence) are relatively inefficient ochre suppressors. They are toxic to the cell when expressed on a multicopy plasmid, and they do not suppress at all when present as single copies. The intron does not seem to have any effect on suppression. In contrast, the amber suppressor tRNA(PheCUA) delta IVS is efficient when expressed from a single-copy plasmid, while its efficiency is reduced on a multicopy vector. PMID:3309948

  15. Of Genes and Machines: Application of a Combination of Machine Learning Tools to Astronomy Data Sets

    NASA Astrophysics Data System (ADS)

    Heinis, S.; Kumar, S.; Gezari, S.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Waters, C.

    2016-04-01

    We apply a combination of genetic algorithm (GA) and support vector machine (SVM) machine learning algorithms to solve two important problems faced by the astronomical community: star–galaxy separation and photometric redshift estimation of galaxies in survey catalogs. We use the GA to select the relevant features in the first step, followed by optimization of SVM parameters in the second step to obtain an optimal set of parameters to classify or regress, in the process of which we avoid overfitting. We apply our method to star–galaxy separation in Pan-STARRS1 data. We show that our method correctly classifies 98% of objects down to $i_{\mathrm{P1}} = 24.5$, with a completeness (or true positive rate) of 99% for galaxies and 88% for stars. By combining colors with morphology, our star–galaxy separation method yields better results than the new SExtractor classifier spread_model, in particular at the faint end ($i_{\mathrm{P1}} > 22$). We also use our method to derive photometric redshifts for galaxies in the COSMOS bright multiwavelength data set down to an error in (1+z) of σ = 0.013, which compares well with estimates from spectral energy distribution fitting on the same data (σ = 0.007) while making a significantly smaller number of assumptions.

  16. Of Genes and Machines: Application of a Combination of Machine Learning Tools to Astronomy Data Sets

    NASA Astrophysics Data System (ADS)

    Heinis, S.; Kumar, S.; Gezari, S.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Waters, C.

    2016-04-01

    We apply a combination of genetic algorithm (GA) and support vector machine (SVM) machine learning algorithms to solve two important problems faced by the astronomical community: star-galaxy separation and photometric redshift estimation of galaxies in survey catalogs. We use the GA to select the relevant features in the first step, followed by optimization of SVM parameters in the second step to obtain an optimal set of parameters to classify or regress, in the process of which we avoid overfitting. We apply our method to star-galaxy separation in Pan-STARRS1 data. We show that our method correctly classifies 98% of objects down to $i_{\mathrm{P1}} = 24.5$, with a completeness (or true positive rate) of 99% for galaxies and 88% for stars. By combining colors with morphology, our star-galaxy separation method yields better results than the new SExtractor classifier spread_model, in particular at the faint end ($i_{\mathrm{P1}} > 22$). We also use our method to derive photometric redshifts for galaxies in the COSMOS bright multiwavelength data set down to an error in (1+z) of σ = 0.013, which compares well with estimates from spectral energy distribution fitting on the same data (σ = 0.007) while making a significantly smaller number of assumptions.

  17. Phosphorylation of galectin-3 contributes to malignant transformation of human epithelial cells via modulation of unique sets of genes.

    PubMed

    Mazurek, Nachman; Sun, Yun Jie; Price, Janet E; Ramdas, Latha; Schober, Wendy; Nangia-Makker, Pratima; Byrd, James C; Raz, Avraham; Bresalier, Robert S

    2005-12-01

    Galectin-3 is a multifunctional beta-galactoside-binding protein implicated in apoptosis, malignant transformation, and tumor progression. The mechanisms by which galectin-3 contributes to malignant progression are not fully understood. In this study, we found that the introduction of wild-type galectin-3 into nontumorigenic, galectin-3-null BT549 human breast epithelial cells conferred tumorigenicity and metastatic potential in nude mice, and that galectin-3 expressed by the cells was phosphorylated. In contrast, BT549 cells expressing galectin-3 incapable of being phosphorylated (Ser6→Glu, Ser6→Ala) were nontumorigenic. A microarray analysis of 10,000 human genes, comparing BT549 transfectants expressing wild-type and those expressing phosphomutant galectin-3, identified 188 genes that were differentially expressed (>2.5-fold). Genes affected by introduction of wild-type phosphorylated but not phosphomutant galectin-3 included those involved in oxidative stress, a novel noncaspase lysosomal apoptotic pathway, cell cycle regulation, transcriptional activation, cytoskeleton remodeling, cell adhesion, and tumor invasion. The reliability of the microarray data was validated by real-time reverse transcription-PCR (RT-PCR) and by Western blot analysis, and clinical relevance was evaluated by real-time RT-PCR screening of a panel of matched pairs of breast tumors. Differentially regulated genes in breast cancers that are also predicted to be associated with phospho-galectin-3 in transformed BT549 cells include C-type lectin 2, insulin-like growth factor-binding protein 5, cathepsins L2, and cyclin D1. These data show the functional diversity of galectin-3 and suggest that phosphorylation of the protein is necessary for regulation (directly or indirectly) of unique sets of genes that play a role in malignant transformation. PMID:16322222

  18. ellipsoidFN: a tool for identifying a heterogeneous set of cancer biomarkers based on gene expressions.

    PubMed

    Ren, Xianwen; Wang, Yong; Chen, Luonan; Zhang, Xiang-Sun; Jin, Qi

    2013-02-01

    Computationally identifying effective biomarkers for cancers from gene expression profiles is an important and challenging task. The challenge lies in the complicated pathogenesis of cancers, which often involves the dysfunction of many genes and regulatory interactions. Thus, a sophisticated classification model is urgently needed. In this study, we proposed an efficient approach, called ellipsoidFN (ellipsoid Feature Net), to model the disease complexity by ellipsoids and seek a set of heterogeneous biomarkers. Our approach achieves a non-linear classification scheme for the mixed samples by the ellipsoid concept, and at the same time uses a linear programming framework to efficiently select biomarkers from high-dimensional space. ellipsoidFN reduces the redundancy and improves the complementarity between the identified biomarkers, thus significantly enhancing the distinctiveness between cancers and normal samples, and even between cancer types. Numerical evaluation on real prostate cancer, breast cancer and leukemia gene expression datasets suggested that ellipsoidFN outperforms the state-of-the-art biomarker identification methods, and it can serve as a useful tool for cancer biomarker identification in the future. The Matlab code of ellipsoidFN is freely available from http://doc.aporc.org/wiki/EllipsoidFN. PMID:23262226

  19. Different CHD chromatin remodelers are required for expression of distinct gene sets and specific stages during development of Dictyostelium discoideum

    PubMed Central

    Platt, James L.; Rogers, Benjamin J.; Rogers, Kelley C.; Harwood, Adrian J.; Kimmel, Alan R.

    2013-01-01

    Control of chromatin structure is crucial for multicellular development and regulation of cell differentiation. The CHD (chromodomain-helicase-DNA binding) protein family is one of the major families of ATP-dependent chromatin remodeling factors that regulate nucleosome positioning and access of transcription factors and RNA polymerase to the eukaryotic genome. There are three mammalian CHD subfamilies and their impaired functions are associated with several human diseases. Here, we identify three CHD orthologs (ChdA, ChdB and ChdC) in Dictyostelium discoideum. These CHDs are expressed throughout development, but with unique patterns. Null mutants lacking each CHD have distinct phenotypes that reflect their expression patterns and suggest functional specificity. Accordingly, using genome-wide (RNA-seq) transcriptome profiling for each null strain, we show that the different CHDs regulate distinct gene sets during both growth and development. ChdC is an apparent ortholog of the mammalian Class III CHD group that is associated with the human CHARGE syndrome, and GO analyses of aberrant gene expression in chdC nulls suggest defects in both cell-autonomous and non-autonomous signaling, which have been confirmed through analyses of chdC nulls developed in pure populations or with low levels of wild-type cells. This study provides novel insight into the broad function of CHDs in the regulation of development and disease, through chromatin-mediated changes in directed gene expression. PMID:24301467

  20. Comprehensive screening for a complete set of Japanese-population-specific filaggrin gene mutations.

    PubMed

    Kono, M; Nomura, T; Ohguchi, Y; Mizuno, O; Suzuki, S; Tsujiuchi, H; Hamajima, N; McLean, W H I; Shimizu, H; Akiyama, M

    2014-04-01

    Mutations in FLG coding profilaggrin cause ichthyosis vulgaris and are an important predisposing factor for atopic dermatitis. Until now, most case-control studies and population-based screenings have been performed only for prevalent mutations. In this study, we established a high-throughput FLG mutation detection system by real-time PCR with a set of two double-dye probes and conducted comprehensive screening for almost all of the Japanese-population-specific FLG mutations (ten FLG mutations). The present comprehensive screening for all ten FLG mutations provided a more precise prevalence rate for FLG mutations (11.1%, n = 820), which seemed high compared with data of previous reports based on screening for limited numbers of FLG mutations. Our comprehensive screening suggested that population-specific FLG mutations may be a significant predisposing factor for hay fever (odds ratio = 2.01 [95% CI: 1.027-3.936, P < 0.05]), although the sample sizes of this study were too small for reliable subphenotype analysis on the association between FLG mutations and hay fever in the eczema patients and the noneczema individuals, and it is not clear whether the association between FLG mutations and hay fever is due to the close association between FLG mutations and hay fever patients with eczema. PMID:24467288
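
    The reported odds ratio and confidence interval can be checked for internal consistency with a short calculation. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which the abstract does not state; the function name `wald_summary` is hypothetical. Only the interval bounds 1.027 and 3.936 are taken from the abstract.

```python
import math


def wald_summary(ci_low, ci_high, z_crit=1.959964):
    """Recover the point estimate, standard error, z and two-sided p from a 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE), i.e. a Wald interval
    on the log-odds-ratio scale. This is an assumption, not the authors' stated method.
    """
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    log_or = (log_low + log_high) / 2            # midpoint on the log scale
    se = (log_high - log_low) / (2 * z_crit)     # half-width divided by z
    z = log_or / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(log_or), se, z, p_two_sided


odds_ratio, se, z, p = wald_summary(1.027, 3.936)
print(f"OR={odds_ratio:.2f} SE={se:.3f} z={z:.2f} p={p:.3f}")
```

    Under that assumption the recovered point estimate is close to the reported 2.01 and the two-sided p-value falls just below 0.05, which agrees with the abstract.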

  1. [A fast algorithm to build a supertree with a set of gene trees].

    PubMed

    Gorbunov, K Iu; Liubetskiĭ, V A

    2012-01-01

    Important desired properties of an algorithm to construct a supertree (species tree) by reconciling input trees are its low complexity and applicability to large biological data. In its common formulation the problem is proved to be NP-hard, i.e., in practice it has exponential complexity. We propose a reformulation of the supertree building problem that allows a computationally effective solution. We introduce the biologically natural requirement that the supertree contain no clades incompatible with those present in the input trees. The algorithm was tested with simulated and biological trees and was shown to have nearly quadratic complexity even if horizontal gene transfers (HGTs) are allowed. If HGTs are not assumed, the algorithm is mathematically correct and has a worst-case running time of $n^3 \times |V_0|^3$, where n is the number of input trees and $|V_0|$ is the total number of species. The authors are unaware of analogous solutions in the published literature. The corresponding inferring program, its usage examples and manual are freely available at http://lab6.iitp.ru/en/super3gl. The available program does not implement HGTs. The generalized case is described in the publication "A tree nearest in average to a set of trees" (Information Transmission Problems, 2011). PMID:22642116
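
    The compatibility requirement on clades can be illustrated with a small sketch. It uses the standard definition that two clades are compatible when they are disjoint or one contains the other, and it assumes for simplicity that all trees share one species set; it is not the authors' algorithm, and the names `compatible` and `admissible` and the example species are hypothetical.

```python
def compatible(clade_a, clade_b):
    """Two clades are compatible if they are disjoint or one contains the other."""
    a, b = frozenset(clade_a), frozenset(clade_b)
    return a.isdisjoint(b) or a <= b or b <= a


def admissible(candidate, input_clades):
    """A candidate clade is admissible if it conflicts with no input clade."""
    return all(compatible(candidate, clade) for clade in input_clades)


# Hypothetical clades collected from a set of input trees.
input_clades = [
    {"human", "chimp"},
    {"human", "chimp", "gorilla"},
    {"mouse", "rat"},
]

print(admissible({"human", "chimp", "gorilla", "mouse", "rat"}, input_clades))  # prints True
print(admissible({"chimp", "gorilla"}, input_clades))                           # prints False
```

    The second candidate is rejected because it overlaps the clade {human, chimp} without containing it or being contained in it.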

  2. A statistical evaluation of models for the initial settlement of the american continent emphasizes the importance of gene flow with Asia.

    PubMed

    Ray, N; Wegmann, D; Fagundes, N J R; Wang, S; Ruiz-Linares, A; Excoffier, L

    2010-02-01

    Although there is agreement that the Bering Strait was the entry point for the initial colonization of the American continent, there is considerable uncertainty regarding the timing and pattern of human migration from Asia to America. In order to perform a statistical assessment of the relative probability of alternative migration scenarios and to estimate key demographic parameters associated with them, we used an approximate Bayesian computation framework to analyze a data set of 401 autosomal microsatellite loci typed in 29 Native American populations. A major finding is that a single, discrete wave of colonization is highly inconsistent with observed levels of genetic diversity. A scenario with two discrete migration waves is also not supported by the data. The current genetic diversity of Amerindian populations is best explained by a third model involving recurrent gene flow between Asia and America, after initial colonization. We estimate that this colonization involved about 100 individuals and occurred some 13,000 years ago, in agreement with well-established archeological data. PMID:19805438
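
    Approximate Bayesian computation in its simplest rejection form can be sketched on a toy problem. The example below infers a success probability from a count and is unrelated to the demographic models or microsatellite data of the study; the prior, the tolerance, the summary statistic and the name `abc_rejection` are all illustrative assumptions.

```python
import random


def abc_rejection(observed, n_trials, n_draws, tolerance, rng):
    """Rejection ABC for a binomial success probability.

    Draw p from a uniform prior, simulate n_trials Bernoulli outcomes, and keep p
    when the simulated count is within `tolerance` of the observed count.
    """
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                                   # draw from the prior
        simulated = sum(rng.random() < p for _ in range(n_trials))
        if abs(simulated - observed) <= tolerance:         # compare summary statistics
            accepted.append(p)
    return accepted


rng = random.Random(0)  # fixed seed so the run is repeatable
posterior = abc_rejection(observed=30, n_trials=100, n_draws=5000, tolerance=2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 2))
```

    The accepted draws approximate the posterior distribution of the parameter. Competing models can be compared in the same spirit by examining how often simulations from each model are accepted.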

  3. Isolation and functional analysis of a set of auxin genes with low root-inducing activity from an Agrobacterium tumefaciens biotype III strain.

    PubMed

    Huss, B; Bonnard, G; Otten, L

    1989-03-01

    A new type of root-inducing iaa gene set was cloned from the Ti plasmid of the biotype III Agrobacterium tumefaciens strain Tm-4. These iaa genes are characterized by a very low DNA homology with the well-characterized iaa gene set, iaaM and iaaH, of the "common DNA" region of the biotype I strain Ach5 and by a low root-inducing activity. The biological activities of both iaa gene sets were compared by transferring each into a disarmed Ti vector and by testing the resulting strains on Nicotiana rustica leaf discs, decapitated Datura stramonium stems, tomato plants and Kalanchoë daigremontiana. Tm-4 iaa genes have a reproducibly weaker root-inducing ability on Nicotiana rustica, induce very little tumour growth on decapitated Datura plants or on tomato plants and do not induce roots on Kalanchoë daigremontiana. The Tm-4 iaa region was mapped by λ::Tn5 transposon mutagenesis and tested on Nicotiana rustica. These tests combined with complementation experiments map the iaa genes to a 4.5-kb region. The Tm-4 iaa genes were able to complement the corresponding Ach5 iaa genes on Nicotiana rustica, indicating that the differences between these genes are quantitative rather than qualitative. Complementation experiments on Kalanchoë showed the iaaM gene of Tm-4 to be responsible for the overall weak auxin activity of the intact iaa set. In view of the observed structural and functional differences we propose to call the Tm-4 iaa genes TB-iaaM and TB-iaaH and the Ach5 iaa genes A-iaaM and A-iaaH. PMID:24272862

  4. INDUCTION OF EARLY GROWTH RESPONSE GENE 2 EXPRESSION IN THE FOREBRAIN OF MICE PERFORMING AN ATTENTION-SET-SHIFTING TASK

    PubMed Central

    DeSteno, Deirdre A.; Schmauss, Claudia

    2008-01-01

    Early growth response (egr) genes encode transcription factors that are induced by stimuli that cause synaptic plasticity. Here we show that the expression of one member of this family, egr-2, is induced in the orbital frontal cortex (OFC) and medial prefrontal cortex (mPFC) of mice performing an attention-set-shifting task (ASST). The ASST is a series of two-choice perceptual discriminations between different odors and textures. Within the OFC and mPFC, different subregions exhibited egr-2 induction in response to different test-related features. In the medial OFC and the anterior cingulate subregion of the mPFC, egr-2 induction occurred in response to exposure to the novel odor stimulus. In the ventrolateral OFC and the pre- and infralimbic mPFC, additional egr-2 induction occurred during the associative learning phase of the ASST. In the infralimbic mPFC, further egr-2 induction occurred when mice performed set-shifting and reversal learning phases of the ASST. Mice with enhanced set-shifting performance exhibited decreased egr-2 induction in the mPFC indicating that the magnitude of egr-2 induction correlates with the magnitude of attentional demand. This decrease was largest in the infralimbic mPFC suggesting further that egr-2 induction in this region plays a role in the attentional control during set-shifting. In contrast to egr-2, neither egr-1 nor egr-3 expression was altered in ASST-tested mice, and no egr-2 induction occurred in mice that performed a spatial working memory task. These findings suggest a specific role of egr-2-mediated transcriptional activation in cognitive functions associated with attention. PMID:18280047

  5. A Set of miRNAs, Their Gene and Protein Targets and Stromal Genes Distinguish Early from Late Onset ER Positive Breast Cancer

    PubMed Central

    Bastos, E. P.; Brentani, H.; Pereira, C. A. B.; Polpo, A.; Lima, L.; Puga, R. D.; Pasini, F. S.; Osorio, C. A. B. T.; Roela, R. A.; Achatz, M. I.; Trapé, A. P.; Gonzalez-Angulo, A. M.; Brentani, M. M.

    2016-01-01

    Breast cancer (BC) in young adult patients (YA) has a more aggressive biological behavior and is associated with a worse prognosis than BC arising in middle aged patients (MA). We proposed that differentially expressed miRNAs could regulate genes and proteins underlying aggressive phenotypes of breast tumors in YA patients when compared to those arising in MA patients. Objective: Using integrated expression analyses of miRs, their mRNA and protein targets and stromal gene expression, we aimed to identify differentially expressed profiles between tumors from YA-BC and MA-BC. Methodology and Results: Samples of ER+ invasive ductal breast carcinomas, divided into two groups, YA-BC (35 years or less) or MA-BC (50–65 years), were evaluated. Screening for BRCA1/2 status according to the BOADICEA program indicated low risk of patients being carriers of these mutations. Aggressive characteristics were more evident in YA-BC versus MA-BC. Performing qPCR, we identified with high confidence eight miRs differentially expressed (miR-9, 18b, 33b, 106a, 106b, 210, 518a-3p and miR-372) between YA-BC and MA-BC tumors, which were associated with aggressive clinicopathological characteristics. The expression profiles by microarray identified 602 predicted target genes associated with proliferation, cell cycle and development biological functions. Performing RPPA, 24 target proteins differed between both groups and 21 were interconnected within a network of protein-protein interactions associated with proliferation, development and metabolism pathways overrepresented in YA-BC. Combination of eight mRNA targets or the combination of eight target proteins defined indicators able to classify individual samples into YA-BC or MA-BC groups. Fibroblast-enriched stroma expression profile analysis resulted in 308 stromal genes differentially expressed between YA-BC and MA-BC. Conclusion: We defined a set of differentially expressed miRNAs, their mRNAs and protein targets and stromal

  6. Gene Sets for Utilization of Primary and Secondary Nutrition Supplies in the Distal Gut of Endangered Iberian Lynx

    PubMed Central

    Alcaide, María; Messina, Enzo; Richter, Michael; Bargiela, Rafael; Peplies, Jörg; Huws, Sharon A.; Newbold, Charles J.; Golyshin, Peter N.; Simón, Miguel A.; López, Guillermo; Yakimov, Michail M.; Ferrer, Manuel

    2012-01-01

    Recent studies have indicated the existence of an extensive trans-genomic trans-mural co-metabolism between gut microbes and animal hosts that is diet-, host phylogeny- and provenance-influenced. Here, we analyzed the biodiversity at the level of small subunit rRNA gene sequence and the metabolic composition of 18 Mbp of consensus metagenome sequences and activity characteristics of bacterial intra-cellular extracts, in wild Iberian lynx (Lynx pardinus) fecal samples. Bacterial signatures (14.43% of all of the Firmicutes reads and 6.36% of total reads) related to the uncultured anaerobic commensals Anaeroplasma spp., which are typically found in ovine and bovine rumen, were first identified. The lynx gut was further characterized by an over-representation of ‘presumptive’ aquaporin aqpZ genes and genes encoding ‘active’ lysosomal-like digestive enzymes that are possibly needed to acquire glycerol, sugars and amino acids from glycoproteins, glyco(amino)lipids, glyco(amino)glycans and nucleoside diphosphate sugars. Lynx gut was highly enriched (28% of the total glycosidases) in genes encoding α-amylase and related enzymes, although it exhibited low rate of enzymatic activity indicative of starch degradation. The preponderance of β-xylosidase activity in protein extracts further suggests lynx gut microbes being most active for the metabolism of β-xylose containing plant N-glycans, although β-xylosidases sequences constituted only 1.5% of total glycosidases. These collective and unique bacterial, genetic and enzymatic activity signatures suggest that the wild lynx gut microbiota not only harbors gene sets underpinning sugar uptake from primary animal tissues (with the monotypic dietary profile of the wild lynx consisting of 80–100% wild rabbits) but also for the hydrolysis of prey-derived plant biomass. Although, the present investigation corresponds to a single sample and some of the statements should be considered qualitative, the data most likely

  7. Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and Gene-Set Based Genetic Algorithm

    PubMed Central

    Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui

    2016-01-01

    Phellinus is a kind of fungus and is known as one of the elemental components in drugs to avoid cancers. With the purpose of finding optimized culture conditions for Phellinus production in the laboratory, plenty of experiments focusing on single factor were operated and large scale of experimental data were generated. In this work, we use the data collected from experiments for regression analysis, and then a mathematical model of predicting Phellinus production is achieved. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of parameters involved in culture conditions, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. These optimized values of the parameters have accordance with biological experimental results, which indicate that our method has a good predictability for culture conditions optimization. PMID:27610365
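
    The two-stage idea in this abstract (fit a regression model, then search its inputs with a genetic algorithm) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the quadratic `predicted_yield` surface, the two parameters and their bounds, and the GA settings are hypothetical, and the operators are a generic real-valued genetic algorithm rather than the authors' gene-set based variant.

```python
import random

# Hypothetical search bounds for two culture parameters (not values from the paper).
BOUNDS = {"temperature_c": (20.0, 35.0), "ph": (4.0, 8.0)}


def predicted_yield(params):
    """Toy quadratic response surface standing in for a fitted regression model."""
    t, ph = params["temperature_c"], params["ph"]
    return 10.0 - 0.05 * (t - 28.0) ** 2 - 0.8 * (ph - 6.0) ** 2


def random_individual(rng):
    # Draw each parameter uniformly inside its bounds.
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.2, scale=0.1):
    # Gaussian perturbation of some parameters, clipped back into the bounds.
    out = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            out[k] = min(hi, max(lo, out[k] + rng.gauss(0.0, scale * (hi - lo))))
    return out


def optimize(generations=60, pop_size=30, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best predicted yield seen in each generation
    for _ in range(generations):
        pop.sort(key=predicted_yield, reverse=True)
        history.append(predicted_yield(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection: keep the top half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children  # elitism: the best individuals survive unchanged
    best = max(pop, key=predicted_yield)
    history.append(predicted_yield(best))
    return best, history


best, history = optimize()
```

    Because the elite individuals are copied unchanged into the next generation, the best predicted yield in `history` never decreases. In a real study the toy surface would be replaced by the regression model fitted to the experimental data.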

  8. Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and Gene-Set Based Genetic Algorithm.

    PubMed

    Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui; Zhu, Hu

    2016-01-01

    Phellinus is a kind of fungus and is known as one of the elemental components in drugs to avoid cancers. With the purpose of finding optimized culture conditions for Phellinus production in the laboratory, plenty of experiments focusing on single factor were operated and large scale of experimental data were generated. In this work, we use the data collected from experiments for regression analysis, and then a mathematical model of predicting Phellinus production is achieved. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of parameters involved in culture conditions, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. These optimized values of the parameters have accordance with biological experimental results, which indicate that our method has a good predictability for culture conditions optimization. PMID:27610365

  9. Joint optimization of segmentation and shape prior from level-set-based statistical shape model, and its application to the automated segmentation of abdominal organs.

    PubMed

    Saito, Atsushi; Nawano, Shigeru; Shimizu, Akinobu

    2016-02-01

    The goal of this study is to provide a theoretical framework for accurately optimizing the segmentation energy considering all of the possible shapes generated from the level-set-based statistical shape model (SSM). The proposed algorithm solves the well-known open problem, in which a shape prior may not be optimal in terms of an objective functional that needs to be minimized during segmentation. The algorithm allows the selection of an optimal shape prior from among all possible shapes generated from an SSM by conducting a branch-and-bound search over an eigenshape space. The proposed algorithm does not require predefined shape templates or the construction of a hierarchical clustering tree before graph-cut segmentation. It jointly optimizes an objective functional in terms of both the shape prior and segmentation labeling, and finds an optimal solution by considering all possible shapes generated from an SSM. We apply the proposed algorithm to both pancreas and spleen segmentation using multiphase computed tomography volumes, and we compare the results obtained with those produced by a conventional algorithm employing a branch-and-bound search over a search tree of predefined shapes, which were sampled discretely from an SSM. The proposed algorithm significantly improves the segmentation performance in terms of the Jaccard index and Dice similarity index. In addition, we compare the results with the state-of-the-art multiple abdominal organs segmentation algorithm, and confirmed that the performances of both algorithms are comparable to each other. We discuss the high computational efficiency of the proposed algorithm, which was determined experimentally using a normalized number of traversed nodes in a search tree, and the extensibility of the proposed algorithm to other SSMs or energy functionals. PMID:26716720
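
    The two overlap measures named in this abstract have standard definitions: for a predicted region A and a reference region B, the Jaccard index is |A ∩ B| / |A ∪ B| and the Dice similarity index is 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal illustration on sets of voxel coordinates; the function names, the toy masks, and the convention that two empty masks score 1.0 are assumptions, not details from the paper.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of voxel coordinates."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return len(a & b) / len(union)


def dice_index(a, b):
    """Dice similarity index 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # same assumed convention as above
    return 2 * len(a & b) / total


# Toy segmentation masks as sets of (z, y, x) voxel coordinates.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
reference = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}

j = jaccard_index(predicted, reference)  # 2 shared voxels out of 5 in the union
d = dice_index(predicted, reference)     # 2 * 2 shared voxels over 4 + 3
```

    The two measures are monotonically related by D = 2J / (1 + J), so they rank segmentations identically even though their numerical values differ.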

  10. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum

    PubMed Central

    Holding, Thomas; Recker, Mario

    2015-01-01

    Infection by the human malaria parasite Plasmodium falciparum results in a broad spectrum of clinical outcomes, ranging from severe and potentially life-threatening malaria to asymptomatic carriage. In a process of naturally acquired immunity, individuals living in malaria-endemic regions build up a level of clinical protection, which attenuates infection severity in an exposure-dependent manner. Underlying this shift in the immunoepidemiology as well as the observed range in malaria pathogenesis is the var multigene family and the phenotypic diversity embedded within. The var gene-encoded surface proteins Plasmodium falciparum erythrocyte membrane protein 1 mediate variant-specific binding of infected red blood cells to a diverse set of host receptors that has been linked to specific disease manifestations, including cerebral and pregnancy-associated malaria. Here, we show that cross-reactive immune responses, which minimize the within-host benefit of each additionally expressed gene during infection, can cause selection for maximum phenotypic diversity at the genome level. We further show that differential functional constraints on protein diversification stably maintain uneven ratios between phenotypic groups, in line with empirical observation. Our results thus suggest that the maintenance of phenotypic diversity within P. falciparum is driven by an evolutionary trade-off that optimizes between within-host parasite fitness and between-host selection pressure. PMID:26674193

  11. Altered neuronal gene expression in brain regions differentially affected by Alzheimer’s disease: a reference data set

    PubMed Central

    Liang, Winnie S.; Dunckley, Travis; Beach, Thomas G.; Grover, Andrew; Mastroeni, Diego; Ramsey, Keri; Caselli, Richard J.; Kukull, Walter A.; McKeel, Daniel; Morris, John C.; Hulette, Christine M.; Schmechel, Donald; Reiman, Eric M.; Rogers, Joseph; Stephan, Dietrich A.

    2009-01-01

    Alzheimer’s Disease (AD) is the most widespread form of dementia during the later stages of life. If improved therapeutics are not developed, the prevalence of AD will drastically increase in the coming years as the world’s population ages. By identifying differences in neuronal gene expression profiles between healthy elderly persons and individuals diagnosed with AD, we may be able to better understand the molecular mechanisms that drive AD pathogenesis, including the formation of amyloid plaques and neurofibrillary tangles. In this study, we expression profiled histopathologically normal cortical neurons collected with laser capture microdissection (LCM) from six anatomically and functionally discrete postmortem brain regions in 34 AD-afflicted individuals, using Affymetrix Human Genome U133 Plus 2.0 microarrays. These regions include the entorhinal cortex, hippocampus, middle temporal gyrus, posterior cingulate cortex, superior frontal gyrus, and primary visual cortex. This study is predicated on previous parallel research on the postmortem brains of the same six regions in 14 healthy elderly individuals, for which LCM neurons were similarly processed for expression analysis. We identified significant regional differential expression in AD brains compared with control brains including expression changes of genes previously implicated in AD pathogenesis, particularly with regards to tangle and plaque formation. Pinpointing the expression of factors that may play a role in AD pathogenesis provides a foundation for future identification of new targets for improved AD therapeutics. We provide this carefully phenotyped, laser capture microdissected intraindividual brain region expression data set to the community as a public resource. PMID:18270320

  12. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum.

    PubMed

    Holding, Thomas; Recker, Mario

    2015-12-01

    Infection by the human malaria parasite Plasmodium falciparum results in a broad spectrum of clinical outcomes, ranging from severe and potentially life-threatening malaria to asymptomatic carriage. In a process of naturally acquired immunity, individuals living in malaria-endemic regions build up a level of clinical protection, which attenuates infection severity in an exposure-dependent manner. Underlying this shift in the immunoepidemiology as well as the observed range in malaria pathogenesis is the var multigene family and the phenotypic diversity embedded within. The var gene-encoded surface proteins Plasmodium falciparum erythrocyte membrane protein 1 mediate variant-specific binding of infected red blood cells to a diverse set of host receptors that has been linked to specific disease manifestations, including cerebral and pregnancy-associated malaria. Here, we show that cross-reactive immune responses, which minimize the within-host benefit of each additionally expressed gene during infection, can cause selection for maximum phenotypic diversity at the genome level. We further show that differential functional constraints on protein diversification stably maintain uneven ratios between phenotypic groups, in line with empirical observation. Our results thus suggest that the maintenance of phenotypic diversity within P. falciparum is driven by an evolutionary trade-off that optimizes between within-host parasite fitness and between-host selection pressure. PMID:26674193

  13. Analysis of protein gene products in cells with altered chromosome sets for the purpose of genetic mapping

    SciTech Connect

    Shishkin, S.S.; Zakharov, S.F.; Gromov, P.S.; Shcheglova, M.V.; Kukharenko, V.I.; Shilov, A.G.; Matveeva, N.M.; Zhdanova, N.S.; Efimochkin, A.S.; Krokhina, T.B.

    1994-12-01

    Two-dimensional electrophoresis was used for analyzing proteins in hybrid cells that contained single human chromosomes (chromosome 5, chromosome 21, or chromosomes 5 and 21) against the background of the mouse genome. By comparing the protein patterns of hybrid and parent cells (about 1000 protein fractions for each kind of cell), five fractions among proteins of hybrid cells were supposedly identified as human proteins. The genes of two of them are probably located on chromosome 5, and those of the other three on chromosome 21. Moreover, analysis of proteins in fibroblasts of patients with the cri-du-chat syndrome (5p-) revealed a decrease in the content of two proteins as compared with those in preparations of diploid fibroblasts. This fact was regarded as evidence that two corresponding genes are located on the short arm of chromosome 5. Methodological problems associated with the use of protein pattern analysis in cells with altered chromosome sets for the purposes of genetic mapping are discussed.

  14. An optimized grapevine RNA isolation procedure and statistical determination of reference genes for real-time RT-PCR during berry development

    PubMed Central

    Reid, Karen E; Olsson, Niclas; Schlosser, James; Peng, Fred; Lund, Steven T

    2006-01-01

    Background: Accuracy in quantitative real-time RT-PCR is dependent on high quality RNA, consistent cDNA synthesis, and validated stable reference genes for data normalization. Reference genes used for normalization impact the results generated from expression studies and, hence, should be evaluated prior to use across samples and treatments. Few statistically validated reference genes have been reported in grapevine. Moreover, success in isolating high quality RNA from grapevine tissues is typically limiting due to low pH, and high polyphenolic and polysaccharide contents. Results: We describe optimization of an RNA isolation procedure that compensates for the low pH found in grape berries and improves the ability of the RNA to precipitate. This procedure was tested on pericarp and seed developmental series, as well as steady-state leaf, root, and flower tissues. Additionally, the expression stability of actin, AP47 (clathrin-associated protein), cyclophilin, EF1-α (elongation factor 1-α), GAPDH (glyceraldehyde 3-phosphate dehydrogenase), MDH (malate dehydrogenase), PP2A (protein phosphatase), SAND, TIP41, α-tubulin, β-tubulin, UBC (ubiquitin conjugating enzyme), UBQ-L40 (ubiquitin L40) and UBQ10 (polyubiquitin) was evaluated on Vitis vinifera cv. Cabernet Sauvignon pericarp using three different statistical approaches. Although several of the genes proved to be relatively stable, no single gene outperformed all other genes in each of the three evaluation methods tested. Furthermore, the effect of using one reference gene versus normalizing to the geometric mean of several genes is presented for the expression of an aquaporin and a sucrose transporter over a developmental series. Conclusion: In order to quantify relative transcript abundances accurately using real-time RT-PCR, we recommend that combinations of several genes be used for normalization in grape berry development studies. Our data support GAPDH, actin, EF1-α and SAND as the most relevant reference

  15. Comparative genomic analysis of Brucella abortus vaccine strain 104M reveals a set of candidate genes associated with its virulence attenuation

    PubMed Central

    Yu, Dong; Hui, Yiming; Zai, Xiaodong; Xu, Junjie; Liang, Long; Wang, Bingxiang; Yue, Junjie; Li, Shanhu

    2015-01-01

    The Brucella abortus strain 104M, a spontaneously attenuated strain, has been used as a vaccine strain in humans against brucellosis for 6 decades in China. Despite many studies, the molecular mechanisms that cause the attenuation are still unclear. Here, we determined the whole-genome sequence of 104M and conducted a comprehensive comparative analysis against the whole genome sequences of the virulent strain, A13334, and other reference strains. This analysis revealed a highly similar genome structure between 104M and A13334. The further comparative genomic analysis between 104M and A13334 revealed a set of genes missing in 104M. Some of these genes were identified to be directly or indirectly associated with virulence. Similarly, a set of mutations in the virulence-related genes was also identified, which may be related to virulence alteration. This study provides a set of candidate genes associated with virulence attenuation in B.abortus vaccine strain 104M. PMID:26039674

  16. The Effects of Violation of Data Set Assumptions when Using the Oneway, Fixed Effects Analysis of Variance and the One Concomitant Analysis of Covariance Statistical Procedures.

    ERIC Educational Resources Information Center

    Johnson, Colleen Cook

    The purpose of this study is to help define the precise nature and limits of the tolerable range in which a researcher may be relatively confident about the statistical validity of his or her research findings, focusing specifically on the statistical validity of results when violating the assumptions associated with the one-way, fixed-effects…

  17. Analysis of Five Gene Sets in Chimpanzees Suggests Decoupling between the Action of Selection on Protein-Coding and on Noncoding Elements

    PubMed Central

    Santpere, Gabriel; Carnero-Montoro, Elena; Petit, Natalia; Serra, François; Hvilsom, Christina; Rambla, Jordi; Heredia-Genestar, Jose Maria; Halligan, Daniel L.; Dopazo, Hernan; Navarro, Arcadi; Bosch, Elena

    2015-01-01

    We set out to investigate potential differences and similarities between the selective forces acting upon the coding and noncoding regions of five different sets of genes defined according to functional and evolutionary criteria: 1) two reference gene sets presenting accelerated and slow rates of protein evolution (the Complement and Actin pathways); 2) a set of genes with evidence of accelerated evolution in at least one of their introns; and 3) two gene sets related to neurological function (Parkinson’s and Alzheimer’s diseases). To that effect, we combine human–chimpanzee divergence patterns with polymorphism data obtained from target resequencing 20 central chimpanzees, our closest relatives with largest long-term effective population size. By using the distribution of fitness effect-alpha extension of the McDonald–Kreitman test, we reproduce inferences of rates of evolution previously based only on divergence data on both coding and intronic sequences and also obtain inferences for other classes of genomic elements (untranslated regions, promoters, and conserved noncoding sequences). Our results suggest that 1) the distribution of fitness effect-alpha method successfully helps distinguishing different scenarios of accelerated divergence (adaptation or relaxed selective constraints) and 2) the adaptive history of coding and noncoding sequences within the gene sets analyzed is decoupled. PMID:25977458
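
    As background for the test named in this abstract, the basic McDonald–Kreitman estimate of the proportion of adaptive substitutions is alpha = 1 - (Ds × Pn) / (Dn × Ps), where Dn and Ds are nonsynonymous and synonymous divergence counts and Pn and Ps are the corresponding polymorphism counts. The sketch below implements only this simple estimator; it is not the distribution of fitness effect-alpha extension used in the study, and the function name and the counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Basic McDonald-Kreitman estimate of the fraction of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    # Neutrality index NI = (pn / ps) / (dn / ds); alpha = 1 - NI.
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for one gene set (not data from the paper).
alpha = mk_alpha(dn=40, ds=100, pn=10, ps=50)  # 1 - (100 * 10) / (40 * 50)
```

    A positive alpha suggests an excess of nonsynonymous divergence relative to polymorphism, while a value near zero or below is what relaxed constraint or segregating slightly deleterious variants would tend to produce.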

  18. Analysis of Five Gene Sets in Chimpanzees Suggests Decoupling between the Action of Selection on Protein-Coding and on Noncoding Elements.

    PubMed

    Santpere, Gabriel; Carnero-Montoro, Elena; Petit, Natalia; Serra, François; Hvilsom, Christina; Rambla, Jordi; Heredia-Genestar, Jose Maria; Halligan, Daniel L; Dopazo, Hernan; Navarro, Arcadi; Bosch, Elena

    2015-06-01

    We set out to investigate potential differences and similarities between the selective forces acting upon the coding and noncoding regions of five different sets of genes defined according to functional and evolutionary criteria: 1) two reference gene sets presenting accelerated and slow rates of protein evolution (the Complement and Actin pathways); 2) a set of genes with evidence of accelerated evolution in at least one of their introns; and 3) two gene sets related to neurological function (Parkinson's and Alzheimer's diseases). To that effect, we combine human-chimpanzee divergence patterns with polymorphism data obtained from target resequencing 20 central chimpanzees, our closest relatives with largest long-term effective population size. By using the distribution of fitness effect-alpha extension of the McDonald-Kreitman test, we reproduce inferences of rates of evolution previously based only on divergence data on both coding and intronic sequences and also obtain inferences for other classes of genomic elements (untranslated regions, promoters, and conserved noncoding sequences). Our results suggest that 1) the distribution of fitness effect-alpha method successfully helps distinguishing different scenarios of accelerated divergence (adaptation or relaxed selective constraints) and 2) the adaptive history of coding and noncoding sequences within the gene sets analyzed is decoupled. PMID:25977458

  19. ArraySolver: An Algorithm for Colour-Coded Graphical Display and Wilcoxon Signed-Rank Statistics for Comparing Microarray Gene Expression Data

    PubMed Central

    2004-01-01

    The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-group comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann–Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-group comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform. PMID:18629036
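
    The Wilcoxon signed-rank statistic at the centre of this abstract can be computed in a few lines. The sketch below follows the textbook procedure (drop zero differences, rank the absolute differences with average ranks for ties, sum the ranks by sign); it is not ArraySolver's code, and the function name and expression values are hypothetical.

```python
def signed_rank_statistic(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are discarded and tied absolute differences share
    their average rank, as in the usual textbook procedure.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of differences tied in absolute value.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical paired expression values for five genes in two conditions.
treated = [12, 15, 9, 20, 11]
control = [10, 15, 12, 14, 10]
w_plus, w_minus = signed_rank_statistic(treated, control)
```

    For n nonzero differences the two sums always add up to n(n + 1)/2, which is a quick sanity check. Turning the statistic into a p-value requires the exact null distribution for small n or a normal approximation for large n; a library routine such as `scipy.stats.wilcoxon` handles that step.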

  20. Validation of the Lung Subtyping Panel in Multiple Fresh-Frozen and Formalin-Fixed, Paraffin-Embedded Lung Tumor Gene Expression Data Sets.

    PubMed

    Faruki, Hawazin; Mayhew, Gregory M; Fan, Cheng; Wilkerson, Matthew D; Parker, Scott; Kam-Morgan, Lauren; Eisenberg, Marcia; Horten, Bruce; Hayes, D Neil; Perou, Charles M; Lai-Goldman, Myla

    2016-06-01

    Context: A histologic classification of lung cancer subtypes is essential in guiding therapeutic management. Objective: To complement morphology-based classification of lung tumors, a previously developed lung subtyping panel (LSP) of 57 genes was tested using multiple public fresh-frozen gene-expression data sets and a prospectively collected set of formalin-fixed, paraffin-embedded lung tumor samples. Design: The LSP gene-expression signature was evaluated in multiple lung cancer gene-expression data sets totaling 2177 patients collected from 4 platforms: Illumina RNAseq (San Diego, California), Agilent (Santa Clara, California) and Affymetrix (Santa Clara) microarrays, and quantitative reverse transcription-polymerase chain reaction. Gene centroids were calculated for each of 3 genomic-defined subtypes: adenocarcinoma, squamous cell carcinoma, and neuroendocrine, the latter of which encompassed both small cell carcinoma and carcinoid. Classification by LSP into 3 subtypes was evaluated in both fresh-frozen and formalin-fixed, paraffin-embedded tumor samples, and agreement with the original morphology-based diagnosis was determined. Results: The LSP-based classifications demonstrated overall agreement with the original clinical diagnosis ranging from 78% (251 of 322) to 91% (492 of 538 and 869 of 951) in the fresh-frozen public data sets and 84% (65 of 77) in the formalin-fixed, paraffin-embedded data set. The LSP performance was independent of tissue-preservation method and gene-expression platform. Secondary, blinded pathology review of formalin-fixed, paraffin-embedded samples demonstrated concordance of 82% (63 of 77) with the original morphology diagnosis. Conclusions: The LSP gene-expression signature is a reproducible and objective method for classifying lung tumors and demonstrates good concordance with morphology-based classification across multiple data sets. The LSP panel can supplement morphologic assessment of lung cancers, particularly

  1. Expression profiling of cell cycle genes reveals key facilitators of cell production during carpel development, fruit set, and fruit growth in apple (Malus×domestica Borkh.)

    PubMed Central

    Malladi, Anish; Johnson, Lisa Klima

    2011-01-01

    Cell production is an essential facilitator of fruit growth and development. Cell production during carpel/floral-tube growth, fruit set, and fruit growth, and its regulation by cell cycle genes were investigated in apple (Malus×domestica Borkh.). Cell production was inhibited during late carpel/floral-tube development, resulting in growth arrest before bloom. Fruit set re-activated cell production between 8 d and 11 d after full bloom (DAFB) and triggered fruit growth. The early phase of fruit growth involved rapid cell production followed by exit from cell proliferation at ∼24 DAFB. Seventy-one cell cycle genes were identified, and expression of 59 genes was investigated using quantitative RT-PCR. Changes in expression of 19 genes were consistently associated with transitions in cell production during carpel/floral-tube growth, fruit set, and fruit growth. Fourteen genes, including B-type cyclin-dependent kinases (CDKs) and A2-, B1-, and B2-type cyclins, were positively associated with cell production, suggesting that availability of G2/M phase regulators of the cell cycle is limiting for cell proliferation. Enhanced expression of five genes including that of the putative CDK inhibitors, MdKRP4 and MdKRP5, was associated with reduced cell production. Exit from cell proliferation at G0/G1 during fruit growth was facilitated by multiple mechanisms including down-regulation of putative regulators of G1/S and G2/M phase progression and up-regulation of KRP genes. Interestingly, two CDKA genes and several CDK-activating factors were up-regulated during this period, suggesting functions for these genes in mediating exit from cell proliferation at G0/G1. Together, the data indicate that cell cycle genes are important facilitators of cell production during apple fruit development. PMID:20732881

  2. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

    PubMed Central

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-01-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073

  3. The Effects of Single and Compound Violations of Data Set Assumptions when Using the Oneway, Fixed Effects Analysis of Variance and the One Concomitant Analysis of Covariance Statistical Models.

    ERIC Educational Resources Information Center

    Johnson, Colleen Cook

    This study integrates into one comprehensive Monte Carlo simulation a vast array of previously defined and substantively interrelated research studies of the robustness of analysis of variance (ANOVA) and analysis of covariance (ANCOVA) statistical procedures. Three sets of balanced ANOVA and ANCOVA designs (group sizes of 15, 30, and 45) and one…

  4. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits.

    PubMed

    Bakshi, Andrew; Zhu, Zhihong; Vinkhuyzen, Anna A E; Hill, W David; McRae, Allan F; Visscher, Peter M; Yang, Jian

    2016-01-01

    We propose a method (fastBAT) that performs a fast set-based association analysis for human complex traits using summary-level data from genome-wide association studies (GWAS) and linkage disequilibrium (LD) data from a reference sample with individual-level genotypes. We demonstrate using simulations and analyses of real datasets that fastBAT is more accurate and orders of magnitude faster than the prevailing methods. Using fastBAT, we analyze summary data from the latest meta-analyses of GWAS on 150,064–339,224 individuals for height, body mass index (BMI), and schizophrenia. We identify 6 novel gene loci for height, 2 for BMI, and 3 for schizophrenia at P_fastBAT < 5 × 10^-8. The gain of power is due to multiple small independent association signals at these loci (e.g. the THRB and FOXP1 loci for schizophrenia). The method is general and can be applied to GWAS data for all complex traits and diseases in humans and to such data in other species. PMID:27604177

  5. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits

    PubMed Central

    Bakshi, Andrew; Zhu, Zhihong; Vinkhuyzen, Anna A. E.; Hill, W. David; McRae, Allan F.; Visscher, Peter M.; Yang, Jian

    2016-01-01

    We propose a method (fastBAT) that performs a fast set-based association analysis for human complex traits using summary-level data from genome-wide association studies (GWAS) and linkage disequilibrium (LD) data from a reference sample with individual-level genotypes. We demonstrate using simulations and analyses of real datasets that fastBAT is more accurate and orders of magnitude faster than the prevailing methods. Using fastBAT, we analyze summary data from the latest meta-analyses of GWAS on 150,064–339,224 individuals for height, body mass index (BMI), and schizophrenia. We identify 6 novel gene loci for height, 2 for BMI, and 3 for schizophrenia at P_fastBAT < 5 × 10^-8. The gain of power is due to multiple small independent association signals at these loci (e.g. the THRB and FOXP1 loci for schizophrenia). The method is general and can be applied to GWAS data for all complex traits and diseases in humans and to such data in other species. PMID:27604177

  6. Intron loss in interferon genes follows a distinct set of stages, and may confer an evolutionary advantage.

    PubMed

    Krause, Christopher D

    2016-07-01

    The promoter-intron-exon structures of genes evolve. While the structures of some IFN genes (e.g., piscine and amphibian Type I IFNs, most tetrapod IFN-λ genes) resemble those of other class II cytokines (e.g., interleukins-10, 19, 20, 22, 24, 26), the structures of other IFN genes differ significantly. Although all bony vertebrate IFN-γ genes lack the canonical third intron, and all amniote Type I IFN genes lack introns, only some IFN-λ genes lost their introns. Interestingly, these intronless IFN-λ genes are not preferentially related to one another nor are they clustered with canonical multi-intron IFN-λ genes. Hypothesizing that intronless IFN-λ genes repeatedly and independently evolved and transposed throughout the genome, we sought to understand the genetic processes involved in their intron loss and genomic migration. Utilizing the high conservation of the promoters, the UTRs and the ORFs of the IFN-λ genes, we collected data from two families of intronless IFN-λ genes, and developed a model supported by these data to explain how intronless IFN-λ genes evolved. (1) A cytoplasmic IFN-λ cDNA generated by reverse transcriptional activity enters the nucleus and attempts to recombine with its multi-exon progenitor. (2) Nuclear DNA synthesis at the 5' and 3' ends within recombination intermediates affixes the promoter onto the cDNA and preserves its 3' UTR. (3) Resolution of the recombination complex releases the promoter-associated cDNA. (4) The released intronless gene co-integrates with a highly duplicated sequence undergoing transposition. We propose that this process explains not only the evolution of the gene structure of IFN genes, but also the increased transposition of intronless genes in genomes, and may confer an evolutionary advantage. PMID:27155818

  7. The STATFLUX code: a statistical method for calculation of flow and set of parameters, based on the Multiple-Compartment Biokinetical Model

    NASA Astrophysics Data System (ADS)

    Garcia, F.; Mesa, J.; Arruda-Neto, J. D. T.; Helene, O.; Vanin, V.; Milian, F.; Deppman, A.; Rodrigues, T. E.; Rodriguez, O.

    2007-03-01

    The code STATFLUX, implementing a new and simple statistical procedure for the calculation of transfer coefficients in radionuclide transport to animals and plants, is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Flow parameters were estimated by employing two different least-squares procedures: Derivative and Gauss-Marquardt methods, with the available experimental data of radionuclide concentrations as the input functions of time. The solution of the inverse problem, which relates a given set of flow parameters with the time evolution of concentration functions, is achieved via a Monte Carlo simulation procedure.

    Program summary
    Title of program: STATFLUX
    Catalogue identifier: ADYS_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYS_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Licensing provisions: none
    Computer for which the program is designed and others on which it has been tested: Micro-computer with Intel Pentium III, 3.0 GHz
    Installation: Laboratory of Linear Accelerator, Department of Experimental Physics, University of São Paulo, Brazil
    Operating system: Windows 2000 and Windows XP
    Programming language used: Fortran-77 as implemented in Microsoft Fortran 4.0. NOTE: Microsoft Fortran includes non-standard features which are used in this program. Standard Fortran compilers such as g77, f77, ifort and NAG95 are not able to compile the code, and therefore it has not been possible for the CPC Program Library to test the program.
    Memory required to execute with typical data: 8 Mbytes of RAM memory and 100 MB of hard disk memory
    No. of bits in a word: 16
    No. of lines in distributed program, including test data, etc.: 6912
    No. of bytes in distributed program, including test data, etc.: 229 541
    Distribution format: tar.gz
    Nature of the physical problem: The investigation of transport mechanisms for

  8. SAP domain-dependent Mkl1 signaling stimulates proliferation and cell migration by induction of a distinct gene set indicative of poor prognosis in breast cancer patients

    PubMed Central

    2014-01-01

    Background: The main cause of death of breast cancer patients is not the primary tumor itself but the metastatic disease. Identifying breast cancer-specific signatures for metastasis and learning more about the nature of the genes involved in the metastatic process would 1) improve our understanding of the mechanisms of cancer progression and 2) reveal new therapeutic targets. Previous studies showed that the transcriptional regulator megakaryoblastic leukemia-1 (Mkl1) induces tenascin-C expression in normal and transformed mammary epithelial cells. Tenascin-C is known to be expressed in metastatic niches, is highly induced in cancer stroma and promotes breast cancer metastasis to the lung. Methods: Using HC11 mammary epithelial cells overexpressing different Mkl1 constructs, we devised a subtractive transcript profiling screen to identify the mechanism by which Mkl1 induces a gene set co-regulated with tenascin-C. We performed computational analysis of the Mkl1 target genes and used cell biological experiments to confirm the effect of these gene products on cell behavior. To analyze whether this gene set is prognostic of accelerated cancer progression in human patients, we used the bioinformatics tool GOBO that allowed us to investigate a large breast tumor data set linked to patient data. Results: We discovered a breast cancer-specific set of genes including tenascin-C, which is regulated by Mkl1 in a SAP domain-dependent, serum response factor-independent manner and is strongly implicated in cell proliferation, cell motility and cancer. Downregulation of this set of transcripts by overexpression of Mkl1 lacking the SAP domain inhibited cell growth and cell migration. Many of these genes are direct Mkl1 targets since their promoter-reporter constructs were induced by Mkl1 in a SAP domain-dependent manner. Transcripts most strongly reduced in the absence of the SAP domain were mechanoresponsive. Finally, expression of this gene set is associated with high

  9. Identification of key genes in hepatocellular carcinoma and validation of the candidate gene, cdc25a, using gene set enrichment analysis, meta-analysis and cross-species comparison.

    PubMed

    Lu, Xiaoxu; Sun, Wen; Tang, Yanping; Zhu, Lingqun; Li, Yuan; Ou, Chao; Yang, Chun; Su, Jianjia; Luo, Chengpiao; Hu, Yanling; Cao, Ji

    2016-02-01

    The aim of the present study was to determine key pathways and genes involved in the pathogenesis of hepatocellular carcinoma (HCC) through bioinformatic analyses of HCC microarray data based on cross-species comparison. Microarray data of gene expression in HCC in different species were analyzed using gene set enrichment analysis (GSEA) and meta-analysis. Reverse transcription-quantitative polymerase chain reaction and western blotting were performed to determine the mRNA and protein expression levels of cdc25a, one of the identified candidate genes, in human, rat and tree shrew samples. The cell cycle pathway had the largest overlap between the GSEA and meta-analysis. Meta-analyses showed that 25 genes, including cdc25a, in the cell cycle pathway were differentially expressed. Cdc25a mRNA levels in HCC tissues were higher than those in normal liver tissues in humans, rats and tree shrews, and the expression level of cdc25a in HCC tissues was higher than in corresponding paraneoplastic tissues in humans and rats. In human HCC tissues, the cdc25a mRNA level was significantly correlated with clinical stage, portal vein tumor thrombosis and extrahepatic metastasis. Western blotting showed that cdc25a protein levels were significantly upregulated in HCC tissues in humans, rats and tree shrews. In conclusion, GSEA and meta-analysis can be combined to identify key molecules and pathways involved in HCC. This study demonstrated that the cell cycle pathway and the cdc25a gene may be crucial in the pathogenesis and progression of HCC. PMID:26647881

  10. Setting the Tone: A Discursive Case Study of Problem-Based Inquiry Learning to Start a Graduate Statistics Course for In-Service Teachers

    ERIC Educational Resources Information Center

    Lesser, Lawrence M.; Kephart, Kerrie

    2011-01-01

    The first day of a course has great potential to set the tone for the entire course, planting the seeds for habits of mind and questioning and setting in motion expectations for classroom discourse. Rather than let the first meeting contain little besides going over the syllabus, the instructor (Lesser) decided to use two sustained open-ended…

  11. Different Sets of Post-Embryonic Development Genes Are Conserved or Lost in Two Caryophyllales Species (Reaumuria soongorica and Agriophyllum squarrosum).

    PubMed

    Zhao, Pengshan; Zhang, Jiwei; Zhao, Xin; Chen, Guoxiong; Ma, Xiao-Fei

    2016-01-01

    Reaumuria soongorica and sand rice (Agriophyllum squarrosum) belong to the clade of Caryophyllales and are widely distributed in the desert regions of north China. Both plants have evolved many specific traits and adaptation strategies to cope with recurring environmental threats. However, the genetic basis that underpins their unique traits and adaptation remains unknown. In this study, the transcriptome data of R. soongorica and sand rice were compared with three other species with previously sequenced genomes (Arabidopsis thaliana, Oryza sativa, and Beta vulgaris). Four different gene sets were identified, namely, the genes conserved in both species, those lost in both species, those conserved in R. soongorica only, and those conserved in sand rice only. Gene ontology showed that post-embryonic development genes (PEDGs) were enriched in all gene sets, and different sets of PEDGs were conserved or lost in both the R. soongorica and sand rice genomes. Expression profiles of Arabidopsis orthologs further provided some clues to the function of the species-specific conserved PEDGs. Such orthologs included LEAFY PETIOLE, which could be a candidate gene involved in the development of branch priority in sand rice. PMID:26815143

  12. Different Sets of Post-Embryonic Development Genes Are Conserved or Lost in Two Caryophyllales Species (Reaumuria soongorica and Agriophyllum squarrosum)

    PubMed Central

    Zhao, Pengshan; Zhang, Jiwei; Zhao, Xin; Chen, Guoxiong; Ma, Xiao-Fei

    2016-01-01

    Reaumuria soongorica and sand rice (Agriophyllum squarrosum) belong to the clade of Caryophyllales and are widely distributed in the desert regions of north China. Both plants have evolved many specific traits and adaptation strategies to cope with recurring environmental threats. However, the genetic basis that underpins their unique traits and adaptation remains unknown. In this study, the transcriptome data of R. soongorica and sand rice were compared with three other species with previously sequenced genomes (Arabidopsis thaliana, Oryza sativa, and Beta vulgaris). Four different gene sets were identified, namely, the genes conserved in both species, those lost in both species, those conserved in R. soongorica only, and those conserved in sand rice only. Gene ontology showed that post-embryonic development genes (PEDGs) were enriched in all gene sets, and different sets of PEDGs were conserved or lost in both the R. soongorica and sand rice genomes. Expression profiles of Arabidopsis orthologs further provided some clues to the function of the species-specific conserved PEDGs. Such orthologs included LEAFY PETIOLE, which could be a candidate gene involved in the development of branch priority in sand rice. PMID:26815143

  13. IMGT/HighV-QUEST Statistical Significance of IMGT Clonotype (AA) Diversity per Gene for Standardized Comparisons of Next Generation Sequencing Immunoprofiles of Immunoglobulins and T Cell Receptors.

    PubMed

    Aouinti, Safa; Malouche, Dhafer; Giudicelli, Véronique; Kossida, Sofia; Lefranc, Marie-Paule

    2015-01-01

    The adaptive immune responses of humans and of other jawed vertebrate species (Gnathostomata) are characterized by the B and T cells and their specific antigen receptors, the immunoglobulins (IG) or antibodies and the T cell receptors (TR) (up to 2 × 10^12 different IG and TR per individual). IMGT, the international ImMunoGeneTics information system (http://www.imgt.org), was created in 1989 by Marie-Paule Lefranc (Montpellier University and CNRS) to manage the huge and complex diversity of these antigen receptors. IMGT, built on IMGT-ONTOLOGY concepts of identification (keywords), description (labels), classification (gene and allele nomenclature) and numerotation (IMGT unique numbering), is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. IMGT/HighV-QUEST, the first web portal, and so far the only one, for the next generation sequencing (NGS) analysis of IG and TR, is the paradigm for immune repertoire standardized outputs and immunoprofiles of the adaptive immune responses. It provides the identification of the variable (V), diversity (D) and joining (J) genes and alleles, analysis of the V-(D)-J junction and complementarity determining region 3 (CDR3) and the characterization of the 'IMGT clonotype (AA)' (AA for amino acid) diversity and expression. IMGT/HighV-QUEST compares outputs of different batches, up to one million nucleotide sequences for the statistical module. These high throughput IG and TR repertoire immunoprofiles are of prime importance in vaccination, cancer, infectious diseases, autoimmunity and lymphoproliferative disorders; however, their comparative statistical analysis still remains a challenge. We present a standardized statistical procedure to analyze IMGT/HighV-QUEST outputs for the evaluation of the significance of the IMGT clonotype (AA) diversity differences in proportions, per gene of a given group, between NGS IG and TR repertoire immunoprofiles. The procedure is generic and

  14. A Research Methodology for Future Summative Evaluation Studies: Incorporating the Component of Multiple Sets of Matched Samples into the Statistical Control Modeling

    ERIC Educational Resources Information Center

    Li, Yuan H.; Modarresi, Shahpar; Yang, Yu N.

    2006-01-01

    Summative evaluations have often been undertaken to determine the impact of educational programs on student academic achievement employing a quasi-experimental design. The summative finding is expected to be less misleading if a statistical model is performed on a dataset including a sound matched sample as a control group. This is because an…

  15. Analysis of the seven-member AAD gene set demonstrates that genetic redundancy in yeast may be more apparent than real.

    PubMed Central

    Delneri, D; Gardner, D C; Oliver, S G

    1999-01-01

    Saccharomyces cerevisiae has seven genes encoding proteins with a high degree (>85%) of amino-acid sequence identity to the aryl-alcohol dehydrogenase of the lignin-degrading, filamentous fungus, Phanerochaete chrysosporium. All but one member of this gene set are telomere associated. Moreover, all contain a sequence similar to the DNA-binding site of the Yap1p transcriptional activator either upstream of or within their coding sequences. The expression of the AAD genes was found to be induced by chemicals, such as diamide and diethyl maleic acid ester (DEME), that cause an oxidative shock by inactivating the glutathione (GSH) reservoir of the cells. In contrast, the oxidizing agent hydrogen peroxide has no effect on the expression of these genes. We found that the response to anti-GSH agents was Yap1p dependent. The very high level of nucleotide sequence similarity between the AAD genes makes it difficult to determine if they are all involved in the oxidative-stress response. The use of single and multiple aad deletants demonstrated that only AAD4 (YDL243c) and AAD6 (YFL056/57c) respond to the oxidative stress. Of these two genes, only AAD4 is likely to be functional since the YFL056/57c open reading frame is interrupted by a stop codon. Thus, in terms of the function in response to oxidative stress, the sevenfold redundancy of the AAD gene set is more apparent than real. PMID:10581269

  16. The STATFLUX code: a statistical method for calculation of flow and set of parameters, based on the Multiple-Compartment Biokinetical Model

    NASA Astrophysics Data System (ADS)

    Garcia, F.; Mesa, J.; Arruda-Neto, J. D. T.; Helene, O.; Vanin, V.; Milian, F.; Deppman, A.; Rodrigues, T. E.; Rodriguez, O.

    2007-03-01

    The code STATFLUX, implementing a new and simple statistical procedure for the calculation of transfer coefficients in radionuclide transport to animals and plants, is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Flow parameters were estimated by employing two different least-squares procedures: Derivative and Gauss-Marquardt methods, with the available experimental data of radionuclide concentrations as the input functions of time. The solution of the inverse problem, which relates a given set of flow parameters with the time evolution of concentration functions, is achieved via a Monte Carlo simulation procedure.

    Program summary
    Title of program: STATFLUX
    Catalogue identifier: ADYS_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYS_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Licensing provisions: none
    Computer for which the program is designed and others on which it has been tested: Micro-computer with Intel Pentium III, 3.0 GHz
    Installation: Laboratory of Linear Accelerator, Department of Experimental Physics, University of São Paulo, Brazil
    Operating system: Windows 2000 and Windows XP
    Programming language used: Fortran-77 as implemented in Microsoft Fortran 4.0. NOTE: Microsoft Fortran includes non-standard features which are used in this program. Standard Fortran compilers such as g77, f77, ifort and NAG95 are not able to compile the code, and therefore it has not been possible for the CPC Program Library to test the program.
    Memory required to execute with typical data: 8 Mbytes of RAM memory and 100 MB of hard disk memory
    No. of bits in a word: 16
    No. of lines in distributed program, including test data, etc.: 6912
    No. of bytes in distributed program, including test data, etc.: 229 541
    Distribution format: tar.gz
    Nature of the physical problem: The investigation of transport mechanisms for

  17. A Method for Gene-Based Pathway Analysis Using Genomewide Association Study Summary Statistics Reveals Nine New Type 1 Diabetes Associations

    PubMed Central

    Evangelou, Marina; Smyth, Deborah J; Fortune, Mary D; Burren, Oliver S; Walker, Neil M; Guo, Hui; Onengut-Gumuscu, Suna; Chen, Wei-Min; Concannon, Patrick; Rich, Stephen S; Todd, John A; Wallace, Chris

    2014-01-01

    Pathway analysis can complement point-wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease-associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene-based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene-based pathway analysis of a type 1 diabetes (T1D) meta-analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty-one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P-values were skewed () with 12 of the 22 SNPs showing . Support, including replication evidence, was obtained for nine T1D associated variants in genes ITGB7 (rs11170466, ), NRP1 (rs722988, ), BAD (rs694739, ), CTSB (rs1296023, ), FYN (rs11964650, ), UBE2G1 (rs9906760, ), MAP3K14 (rs17759555, ), ITGB1 (rs1557150, ), and IL7R (rs1445898, ). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available. PMID:25371288
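
    The record above reports pathways enriched at a 5% false discovery rate but does not state which FDR procedure was used. The following is a minimal sketch, assuming the Benjamini-Hochberg step-up procedure (an assumption, not something the abstract confirms). The function name `benjamini_hochberg` and the p-values in `pathway_p` are hypothetical and serve only to illustrate how such a threshold selects pathways.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k whose p-value satisfies p_(k) <= fdr * k / m.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical pathway-level p-values (illustrative only, not from the study).
pathway_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(pathway_p, fdr=0.05)

# Share of tested pathways reported as enriched in the abstract: 31 of 1,583.
share_percent = round(31 / 1583 * 100)
```

    With these made-up inputs only the two smallest p-values pass the step-up thresholds, and the share calculation matches the 2% quoted in the abstract.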

  18. Cloning of complete genome sets of six dsRNA viruses using an improved cloning method for large dsRNA genes.

    PubMed

    Potgieter, A C; Steele, A D; van Dijk, A A

    2002-09-01

    Cloning full-length large (>3 kb) dsRNA genome segments from small amounts of dsRNA has thus far remained problematic. Here, a single-primer amplification sequence-independent dsRNA cloning procedure was perfected for large genes and tailored for routine use to clone complete genome sets or individual genes. Nine complete viral genome sets were amplified by PCR, namely those of two human rotaviruses, two African horsesickness viruses (AHSV), two equine encephalosis viruses (EEV), one bluetongue virus (BTV), one reovirus and bacteriophage Phi12. Of these amplified genomes, six complete genome sets were cloned for viruses with genes ranging in size from 0.8 to 6.8 kb. Rotavirus dsRNA was extracted directly from stool samples. Co-expressed EEV VP3 and VP7 assembled into core-like particles that have typical orbivirus capsomeres. This work presents the first EEV sequence data and establishes that EEV genes have the same conserved termini (5' GUU and UAC 3') and coding assignment as AHSV and BTV. To clone complete genome sets, one-tube reactions were developed for oligo-ligation, cDNA synthesis and PCR amplification. The method is simple and efficient compared to other methods. Complete genomes can be cloned from as little as 1 ng dsRNA and a considerably reduced number of PCR cycles (22-30 cycles compared to 30-35 of other methods). This progress with cloning large dsRNA genes is important for recombinant vaccine development and determination of the role of terminal sequences for replication and gene expression. PMID:12185276

  19. Two new loci and gene sets related to sex determination and cancer progression are associated with susceptibility to testicular germ cell tumor.

    PubMed

    Kristiansen, Wenche; Karlsson, Robert; Rounge, Trine B; Whitington, Thomas; Andreassen, Bettina K; Magnusson, Patrik K; Fosså, Sophie D; Adami, Hans-Olov; Turnbull, Clare; Haugen, Trine B; Grotmol, Tom; Wiklund, Fredrik

    2015-07-15

    Genome-wide association (GWA) studies have reported 19 distinct susceptibility loci for testicular germ cell tumor (TGCT). A GWA study for TGCT was performed by genotyping 610 240 single-nucleotide polymorphisms (SNPs) in 1326 cases and 6687 controls from Sweden and Norway. No novel genome-wide significant associations were observed in this discovery stage. We put forward 27 SNPs from 15 novel regions and 12 SNPs previously reported, for replication in 710 case-parent triads and 289 cases and 290 controls. Predefined biological pathways and processes, in addition to a custom-built sex-determination gene set, were subject to enrichment analyses using Meta-Analysis Gene Set Enrichment of Variant Associations (M) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (I). In the combined meta-analysis, we observed genome-wide significant association for rs7501939 on chromosome 17q12 (OR = 0.78, 95% CI = 0.72-0.84, P = 1.1 × 10^-9) and rs2195987 on chromosome 19p12 (OR = 0.76, 95% CI: 0.69-0.84, P = 3.2 × 10^-8). The marker rs7501939 on chromosome 17q12 is located in an intron of the HNF1B gene, encoding a member of the homeodomain-containing superfamily of transcription factors. The sex-determination gene set (false discovery rate, FDR_M < 0.001, FDR_I < 0.001) and pathways related to NF-κB, glycerophospholipid and ether lipid metabolism, as well as cancer and apoptosis, were associated with TGCT (FDR < 0.1). In addition to revealing two new TGCT susceptibility loci, our results continue to support the notion that genes governing normal germ cell development in utero are implicated in the development of TGCT. PMID:25877299

  20. Development of a set of oligonucleotide primers specific for genes at the Glu-1 complex loci of wheat.

    PubMed

    D'Ovidio, R; Masci, S; Porceddu, E

    1995-07-01

    Specific amplification of the complete coding region of all six high-molecular-weight (HMW) glutenin genes present in hexaploid wheat was obtained by the polymerase chain reaction (PCR). Primers specific for the N-terminal region of the 1Dx gene and for the repetitive domain of the y-type HMW glutenin genes were also developed. Although the primers were constructed on the basis of the nucleotide sequences of HMW glutenin genes present in T. aestivum L. cv 'Cheyenne', they were very efficient in amplifying HMW glutenin genes of diploid and tetraploid wheat species. PCR analysis of HMW glutenin genes of T. urartu Tuman., T. longissimum (Schweinf. & Muschl.) Bowden and T. speltoides (Tausch) Gren. ex Richt, showed a high degree of length polymorphism, whereas a low degree of length variation was found in accessions of T. tauschii (Coss.) Schmal. Furthermore, using primers specific for the repetitive regions of HMW genes, we could demonstrate that the size variation observed was due to a different length of the central repetitive domain. The usefulness of the PCR-based approach to analyze the genetic polymorphism of HMW glutenin genes, to isolate new allelic variants, to estimate their molecular size and to verify the number of cysteine residues is discussed. PMID:24169762

  1. t-Test at the Probe Level: An Alternative Method to Identify Statistically Significant Genes for Microarray Data

    PubMed Central

    Boareto, Marcelo; Caticha, Nestor

    2014-01-01

    Microarray data analysis typically consists in identifying a list of differentially expressed genes (DEG), i.e., the genes that are differentially expressed between two experimental conditions. Variance shrinkage methods have been considered a better choice than the standard t-test for selecting the DEG because they correct the dependence of the error with the expression level. This dependence is mainly caused by errors in background correction, which more severely affects genes with low expression values. Here, we propose a new method for identifying the DEG that overcomes this issue and does not require background correction or variance shrinkage. Unlike current methods, our methodology is easy to understand and implement. It consists of applying the standard t-test directly on the normalized intensity data, which is possible because the probe intensity is proportional to the gene expression level and because the t-test is scale- and location-invariant. This methodology considerably improves the sensitivity and robustness of the list of DEG when compared with the t-test applied to preprocessed data and to the most widely used shrinkage methods, Significance Analysis of Microarrays (SAM) and Linear Models for Microarray Data (LIMMA). Our approach is useful especially when the genes of interest have small differences in expression and therefore get ignored by standard variance shrinkage methods.
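
    The abstract above relies on the t-test being scale- and location-invariant. The sketch below illustrates that property only; it is not the authors' pipeline. The helper names `student_t` and `rescale`, the intensity values, and the choice of a pooled-variance two-sample statistic are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def student_t(x, y):
    """Two-sample Student t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))


def rescale(values, scale, shift):
    """Apply the same positive scale and shift to every value."""
    return [scale * v + shift for v in values]


# Hypothetical probe intensities for one gene under two conditions.
control = [7.1, 6.8, 7.4, 7.0, 6.9]
treated = [7.9, 8.2, 7.7, 8.0, 8.3]

t_raw = student_t(control, treated)
# Multiplying by a positive constant and adding an offset to both groups
# leaves the statistic unchanged, up to floating-point rounding.
t_scaled = student_t(rescale(control, 3.5, 120.0), rescale(treated, 3.5, 120.0))
```

    Because the difference of means and the pooled standard deviation both scale by the same positive factor, and the offset cancels in the numerator, `t_raw` and `t_scaled` agree.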

  2. Risks of spontaneously and IVF-conceived singleton and twin pregnancies differ, requiring reassessment of statistical premises favoring elective single embryo transfer (eSET).

    PubMed

    Gleicher, Norbert; Kushnir, Vitally A; Barad, David H

    2016-01-01

    A published review of the literature by Dutch investigators in 2004 suggested significant outcome differences between spontaneously conceived and in vitro fertilization (IVF)-conceived singleton and twin pregnancies. Here we review whether later studies from 2004 to 2015 confirmed these findings. Though the methodologies of the studies reviewed here varied, and all were retrospective, they overall confirmed the results of the 2004 review and supported significant outcome variances between spontaneously conceived and IVF-conceived pregnancies: IVF singletons demonstrate significantly poorer and IVF twins significantly better perinatal outcomes than spontaneously conceived singletons and twins, with differences stable over time, and with overall obstetrical outcomes significantly improved. Exaggerations of severe IVF twin risks are likely in the 50% range, while exaggerations of milder perinatal risks are approximately in the 25% range. Though elective single embryo transfers (eSET) have been confirmed to reduce pregnancy chances, they are, nevertheless, increasingly utilized. eSET, equally unquestionably, however, reduces twin pregnancies. Because twin pregnancies have been alleged to increase outcome risks in comparison to singleton pregnancies, the findings reported here should affect the ongoing discussion whether increased twin risks are factual. With no risk excess, eSET significantly reduces IVF pregnancy chances without compensatory benefits and, therefore, is not advisable in IVF, unless patients do not wish to conceive twins or have medical contraindications to conceiving twins. PMID:27142226

  3. The statistics of natural ELF/VLF waves derived from a long continuous set of ground-based observations at high latitude

    NASA Astrophysics Data System (ADS)

    Smith, A. J.; Horne, R. B.; Meredith, N. P.

    2010-04-01

    This paper analyses a unique set of continuous high-quality well-calibrated observations of natural ELF/VLF radio waves, in the range 0.3-10 kHz, made at Halley Research Station, Antarctica (76°S, 27°W, L = 4.5) over one and a half solar cycles (1992-2007). Reference is also made to similar but shorter data sets obtained from other Antarctic stations. The observed waves vary over a very wide dynamic range, from the receiver noise level of wrt (at 1 kHz) up to 40-50 dB above it, over a wide range of timescales. However, the long continuous data set allows us to average out the random and aperiodic variations to extract the underlying dependence of the wave characteristics on local time, time of year, solar cycle, etc. Below about 5 kHz the received waves are predominantly whistler-mode waves, notably chorus, which are generated in the magnetosphere and propagate on geomagnetic field-aligned ("ducted") paths to low altitudes. At the top of the frequency range the observed waves are mostly atmospherics from tropical lightning. The spectrum, and dependence on local time and season, are discussed in terms of a source function and a propagation function from the source region through the ionosphere (in the case of the magnetospheric waves) and under the ionosphere. The dependence of the waves on latitude, geomagnetic activity, solar cycle and day of the week are also described.

  4. Tlx1/3 and Ptf1a control the expression of distinct sets of transmitter and peptide receptor genes in the developing dorsal spinal cord.

    PubMed

    Guo, Zhen; Zhao, Congling; Huang, Menggui; Huang, Tianwen; Fan, Mingran; Xie, Zhiqin; Chen, Ying; Zhao, Xiaolin; Xia, Guannan; Geng, Junlan; Cheng, Leping

    2012-06-20

    Establishing the pattern of expression of transmitters and peptides as well as their receptors in different neuronal types is crucial for understanding the circuitry in various regions of the brain. Previous studies have demonstrated that the transmitter and peptide phenotypes in mouse dorsal spinal cord neurons are determined by the transcription factors Tlx1/3 and Ptf1a. Here we show that these transcription factors also determine the expression of two distinct sets of transmitter and peptide receptor genes in this region. We have screened the expression of 78 receptor genes in the spinal dorsal horn by in situ hybridization. We found that receptor genes Gabra1, Gabra5, Gabrb2, Gria3, Grin3a, Grin3b, Galr1, and Npy1r were preferentially expressed in Tlx3-expressing glutamatergic neurons and their derivatives, and deletion of Tlx1 and Tlx3 resulted in the loss of expression of these receptor genes. Furthermore, we obtained genetic evidence that Tlx3 uses distinct pathways to control the expression of receptor genes. We also found that receptor genes Grm3, Grm4, Grm5, Grik1, Grik2, Grik3, and Sstr2 were mainly expressed in Pax2-expressing GABAergic neurons in the spinal dorsal horn, and their expression in this region was abolished or markedly reduced in Ptf1a and Pax2 deletion mutant mice. Together, our studies indicate that Tlx1/3 and Ptf1a, the key transcription factors for fate determination of glutamatergic and GABAergic neurons in the dorsal spinal cord, are also responsible for controlling the expression of two distinct sets of transmitter and peptide receptor genes. PMID:22723691

  5. XRCC5 as a Risk Gene for Alcohol Dependence: Evidence from a Genome-Wide Gene-Set-Based Analysis and Follow-up Studies in Drosophila and Humans

    PubMed Central

    Juraeva, Dilafruz; Treutlein, Jens; Scholz, Henrike; Frank, Josef; Degenhardt, Franziska; Cichon, Sven; Ridinger, Monika; Mattheisen, Manuel; Witt, Stephanie H; Lang, Maren; Sommer, Wolfgang H; Hoffmann, Per; Herms, Stefan; Wodarz, Norbert; Soyka, Michael; Zill, Peter; Maier, Wolfgang; Jünger, Elisabeth; Gaebel, Wolfgang; Dahmen, Norbert; Scherbaum, Norbert; Schmäl, Christine; Steffens, Michael; Lucae, Susanne; Ising, Marcus; Smolka, Michael N; Zimmermann, Ulrich S; Müller-Myhsok, Bertram; Nöthen, Markus M; Mann, Karl; Kiefer, Falk; Spanagel, Rainer; Brors, Benedikt; Rietschel, Marcella

    2015-01-01

    Genetic factors have as large a role as environmental factors in the etiology of alcohol dependence (AD). Although genome-wide association studies (GWAS) enable systematic searches for loci not hitherto implicated in the etiology of AD, many true findings may be missed owing to correction for multiple testing. The aim of the present study was to circumvent this limitation by searching for biological system-level differences, and then following up these findings in humans and animals. Gene-set-based analysis of GWAS data from 1333 cases and 2168 controls identified 19 significantly associated gene-sets, of which 5 could be replicated in an independent sample. Clustered in these gene-sets were novel and previously identified susceptibility genes. The most frequently present gene, i.e., in 6 out of 19 gene-sets, was X-ray repair complementing defective repair in Chinese hamster cells 5 (XRCC5). Previous human and animal studies have implicated XRCC5 in alcohol sensitivity. This phenotype is inversely correlated with the development of AD, presumably as more alcohol is required to achieve the desired effects. In the present study, the functional role of XRCC5 in AD was further validated in animals and humans. Drosophila mutants with reduced function of Ku80 (the homolog of mammalian XRCC5) due to RNAi silencing showed reduced sensitivity to ethanol. In humans with free access to intravenous ethanol self-administration in the laboratory, the maximum achieved blood alcohol concentration was influenced in an allele-dose-dependent manner by genetic variation in XRCC5. In conclusion, our convergent approach identified new candidates and generated independent evidence for the involvement of XRCC5 in alcohol dependence. PMID:25035082

  6. XRCC5 as a risk gene for alcohol dependence: evidence from a genome-wide gene-set-based analysis and follow-up studies in Drosophila and humans.

    PubMed

    Juraeva, Dilafruz; Treutlein, Jens; Scholz, Henrike; Frank, Josef; Degenhardt, Franziska; Cichon, Sven; Ridinger, Monika; Mattheisen, Manuel; Witt, Stephanie H; Lang, Maren; Sommer, Wolfgang H; Hoffmann, Per; Herms, Stefan; Wodarz, Norbert; Soyka, Michael; Zill, Peter; Maier, Wolfgang; Jünger, Elisabeth; Gaebel, Wolfgang; Dahmen, Norbert; Scherbaum, Norbert; Schmäl, Christine; Steffens, Michael; Lucae, Susanne; Ising, Marcus; Smolka, Michael N; Zimmermann, Ulrich S; Müller-Myhsok, Bertram; Nöthen, Markus M; Mann, Karl; Kiefer, Falk; Spanagel, Rainer; Brors, Benedikt; Rietschel, Marcella

    2015-01-01

    Genetic factors have as large a role as environmental factors in the etiology of alcohol dependence (AD). Although genome-wide association studies (GWAS) enable systematic searches for loci not hitherto implicated in the etiology of AD, many true findings may be missed owing to correction for multiple testing. The aim of the present study was to circumvent this limitation by searching for biological system-level differences, and then following up these findings in humans and animals. Gene-set-based analysis of GWAS data from 1333 cases and 2168 controls identified 19 significantly associated gene-sets, of which 5 could be replicated in an independent sample. Clustered in these gene-sets were novel and previously identified susceptibility genes. The most frequently present gene, i.e., in 6 out of 19 gene-sets, was X-ray repair complementing defective repair in Chinese hamster cells 5 (XRCC5). Previous human and animal studies have implicated XRCC5 in alcohol sensitivity. This phenotype is inversely correlated with the development of AD, presumably as more alcohol is required to achieve the desired effects. In the present study, the functional role of XRCC5 in AD was further validated in animals and humans. Drosophila mutants with reduced function of Ku80 (the homolog of mammalian XRCC5) due to RNAi silencing showed reduced sensitivity to ethanol. In humans with free access to intravenous ethanol self-administration in the laboratory, the maximum achieved blood alcohol concentration was influenced in an allele-dose-dependent manner by genetic variation in XRCC5. In conclusion, our convergent approach identified new candidates and generated independent evidence for the involvement of XRCC5 in alcohol dependence. PMID:25035082

  7. Speeding up directed evolution: Combining the advantages of solid-phase combinatorial gene synthesis with statistically guided reduction of screening effort.

    PubMed

    Hoebenreich, Sabrina; Zilly, Felipe E; Acevedo-Rocha, Carlos G; Zilly, Matías; Reetz, Manfred T

    2015-03-20

    Efficient and economic methods in directed evolution at the protein, metabolic, and genome level are needed for biocatalyst development and the success of synthetic biology. In contrast to random strategies, semirational approaches such as saturation mutagenesis explore the sequence space in a focused manner. Although several combinatorial libraries based on saturation mutagenesis have been reported using solid-phase gene synthesis, direct comparison with traditional PCR-based methods is currently lacking. In this work, we compare combinatorial protein libraries created in-house via PCR versus those generated by commercial solid-phase gene synthesis. Using descriptive statistics and probabilistic distributions on amino acid occurrence frequencies, the quality of the libraries was assessed and compared, revealing that the outsourced libraries are characterized by less bias and outliers than the PCR-based ones. Afterward, we screened all libraries following a traditional algorithm for almost complete library coverage and compared this approach with an emergent statistical concept suggesting screening a lower portion of the protein sequence space. Upon analyzing the biocatalytic landscapes and best hits of all combinatorial libraries, we show that the screening effort could have been reduced in all cases by more than 50%, while still finding at least one of the best mutants. PMID:24921161

  8. CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis

    PubMed Central

    Wang, Hong-Qiang; Tsai, Chung-Jui

    2013-01-01

    With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. Software
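    The thresholded test described in this abstract can be illustrated with a minimal sketch. It assumes the textbook Fisher Z test of H0: ρ ≤ τ against H1: ρ > τ with a normal approximation; the function name `fisher_z_pvalue` and its arguments are hypothetical, and this is not claimed to be the CorSig software's exact procedure.

```python
import math


def fisher_z_pvalue(r, n, tau=0.0):
    """One-sided p-value for H0: rho <= tau vs. H1: rho > tau.

    r   : observed Pearson correlation coefficient
    n   : number of paired observations
    tau : user-specified correlation cutoff (tau = 0 gives the usual null)
    """
    if n <= 3:
        raise ValueError("Fisher's Z approximation needs n > 3")
    if not (-1.0 < r < 1.0 and -1.0 < tau < 1.0):
        raise ValueError("r and tau must lie strictly between -1 and 1")
    # Fisher's Z transformation is atanh; the transformed value is
    # approximately normal with standard error 1 / sqrt(n - 3).
    z = (math.atanh(r) - math.atanh(tau)) * math.sqrt(n - 3)
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Illustrative values only: the same r tested against two cutoffs.
    print(fisher_z_pvalue(0.7, 50, tau=0.0))
    print(fisher_z_pvalue(0.7, 50, tau=0.6))
```

    Raising the cutoff τ makes the same observed r less significant, which is the behaviour the abstract contrasts with the simple ρ = 0 null. The resulting p-values could then be passed to any standard multiple-testing correction.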

  9. Niemeyer Virus: A New Mimivirus Group A Isolate Harboring a Set of Duplicated Aminoacyl-tRNA Synthetase Genes

    PubMed Central

    Boratto, Paulo V. M.; Arantes, Thalita S.; Silva, Lorena C. F.; Assis, Felipe L.; Kroon, Erna G.; La Scola, Bernard; Abrahão, Jônatas S.

    2015-01-01

    It is well recognized that gene duplication/acquisition is a key factor for molecular evolution, being directly related to the emergence of new genetic variants. The importance of such phenomena can also be expanded to the viral world, with impacts on viral fitness and environmental adaptations. In this work we describe the isolation and characterization of Niemeyer virus, a new mimivirus isolate obtained from water samples of an urban lake in Brazil. Genomic data showed that Niemeyer harbors duplicated copies of three of its four aminoacyl-tRNA synthetase genes (cysteinyl, methionyl, and tyrosyl RS). Gene expression analysis showed that such duplications allowed significantly increased expression of methionyl and tyrosyl aaRS mRNA by Niemeyer in comparison to APMV. Remarkably, phylogenetic data revealed that Niemeyer duplicated gene pairs are different, each one clustering with a different group of mimivirus strains. Taken together, our results raise new questions about the origins and selective pressures involving events of aaRS gain and loss among mimiviruses. PMID:26635738

  10. Polymorphisms in sodium-dependent vitamin C transporter genes and plasma, aqueous humor and lens nucleus ascorbate concentrations in an ascorbate depleted setting.

    PubMed

    Senthilkumari, Srinivasan; Talwar, Badri; Dharmalingam, Kuppamuthu; Ravindran, Ravilla D; Jayanthi, Ramamurthy; Sundaresan, Periasamy; Saravanan, Charu; Young, Ian S; Dangour, Alan D; Fletcher, Astrid E

    2014-07-01

    We have previously reported low concentrations of plasma ascorbate and low dietary vitamin C intake in the older Indian population and a strong inverse association of these with cataract. Little is known about ascorbate levels in aqueous humor and lens in populations habitually depleted of ascorbate and no studies in any setting have investigated whether genetic polymorphisms influence ascorbate levels in ocular tissues. Our objectives were to investigate relationships between ascorbate concentrations in plasma, aqueous humor and lens and whether these relationships are influenced by Single Nucleotide Polymorphisms (SNPs) in sodium-dependent vitamin C transporter genes (SLC23A1 and SLC23A2). We enrolled sixty patients (equal numbers of men and women, mean age 63 years) undergoing small incision cataract surgery in southern India. We measured ascorbate concentrations in plasma, aqueous humor and lens nucleus using high performance liquid chromatography. SLC23A1 SNPs (rs4257763, rs6596473) and SLC23A2 SNPs (rs1279683 and rs12479919) were genotyped using a TaqMan assay. Patients were interviewed for lifestyle factors which might influence ascorbate. Plasma vitamin C was normalized by a log10 transformation. Statistical analysis used linear regression with the slope of the within-subject associations estimated using beta (β) coefficients. The ascorbate concentrations (μmol/L) were: plasma ascorbate, median and inter-quartile range (IQR), 15.2 (7.8, 34.5), mean (SD) of aqueous humor ascorbate, 1074 (545) and lens nucleus ascorbate, 0.42 (0.16) (μmol/g lens nucleus wet weight). Minimum allele frequencies were: rs1279683 (0.28), rs12479919 (0.30), rs659647 (0.48). Decreasing concentrations of ocular ascorbate from the common to the rare genotype were observed for rs6596473 and rs12479919. The per allele difference in aqueous humor ascorbate for rs6596473 was -217 μmol/L, p < 0.04 and a per allele difference in lens nucleus ascorbate of -0.085 μmol/g, p < 0

  11. In silico analyses identify gene-sets, associated with clinical outcome in ovarian cancer: role of mitotic kinases

    PubMed Central

    Ocaña, Alberto; Pérez-Peña, Javier; Alcaraz-Sanabria, Ana; Sánchez-Corrales, Verónica; Nieto-Jiménez, Cristina; Templeton, Arnoud J.; Seruga, Bostjan; Pandiella, Atanasio; Amir, Eitan

    2016-01-01

    Introduction Accurate assessment of prognosis in early stage ovarian cancer is challenging, resulting in suboptimal selection of patients for adjuvant therapy. The identification of predictive markers for cytotoxic chemotherapy is therefore highly desirable. Protein kinases are important components in oncogenic transformation and those relating to cell cycle and mitosis control may allow for identification of high-risk early stage ovarian tumors. Methods Genes with differential expression in ovarian surface epithelia (OSE) and ovarian cancer epithelial cells (CEPIs) were identified from public datasets and analyzed with dChip software. Progression-free (PFS) and overall survival (OS) associated with these genes in stage I/II and late stage ovarian cancer was explored using the Kaplan Meier Plotter online tool. Results Of 2925 transcripts associated with modified expression in CEPIs compared to OSE, 66 genes coded for upregulated protein kinases. Expression of 9 of these genes (CDC28, CHK1, NIMA, Aurora kinase A, Aurora kinase B, BUB1, BUB1βB, CDKN2A and TTK) was associated with worse PFS (HR:3.40, log rank p<0.001). The combined analyses of CHK1, CDKN2A, AURKA, AURKB, TTK and NEK2 showed the highest magnitude of association with PFS (HR:4.62, log rank p<0.001). Expression of AURKB predicted detrimental OS in stage I/II ovarian cancer better than all other combinations. Conclusion Genes linked to cell cycle control are associated with worse outcome in early stage ovarian cancer. Incorporation of these biomarkers in clinical studies may help in the identification of patients at high risk of relapse for whom optimizing adjuvant therapeutic strategies is needed. PMID:26992217

  12. Effect on in vitro cell response of the statistical insertion of N-(2-hydroxypropyl) methacrylamide on linear pro-dendronic polyamine's gene carriers.

    PubMed

    Redondo, Juan Alfonso; Martínez-Campos, Enrique; Navarro, Rodrigo; Reinecke, Helmut; Elvira, Carlos; López-Lacomba, José Luis; Gallardo, Alberto

    2015-06-01

    Statistical copolymers of N-(2-hydroxypropyl) methacrylamide (HPMA) and the dendronic methacrylic monomer 2-(3-(Bis(2-(diethylamino)ethyl)amino)propanamido)ethyl methacrylate (TEDETAMA, derived from N,N,N',N'-tetraethyldiethylenetriamine, TEDETA), were synthesized through radical copolymerization and evaluated in vitro as non-viral gene carriers. Three copolymers with nominal molar percentages of HPMA of 25%, 50% and 75% were prepared and studied comparatively to the positive controls poly-TEDETAMA and hyperbranched polyethyleneimine (PEI, 25kDa). Their ability to complex DNA at different N/P molar ratios, from 1/1 up to 8/1, was determined through agarose gel electrophoresis and Dynamic Light Scattering. The resulting complexes (polyplexes) were characterized and evaluated in vitro as possible non-viral gene carriers for Swiss-3T3 fibroblasts, using luciferase as reporter gene and a calcein cytocompatibility assay. All the copolymers, except the one with highest HPMA proportion (75 molar %) at the lowest N/P ratio, condensed DNA to a particle size between 100 and 300 nm. The copolymers with 25 and 50 molar % of HPMA displayed higher transfection efficiency and cytocompatibility than the positive controls poly-TEDETAMA and PEI. A higher proportion of HPMA (75 molar %) led to copolymers that displayed very low transfection efficiency, despite their full cytocompatibility even at the highest N/P ratio. These results indicate that the statistical combination of TEDETAMA and HPMA and its fine compositional tuning in the copolymers may fulfill the fine balance of transfection efficiency and cytocompatibility in a superior way to the control poly-TEDETAMA and PEI. PMID:25937440

  13. The Etv1 transcription factor activity-dependently downregulates a set of genes controlling cell growth and differentiation in maturing cerebellar granule cells.

    PubMed

    Okazawa, Makoto; Abe, Haruka; Nakanishi, Shigetada

    2016-05-13

    In the early postnatal period, cerebellar granule cells exhibit an activity-dependent downregulation of a set of immaturation genes involved in cell growth and migration and shift toward the establishment of a mature network. Through the use of a granule cell culture and both pharmacological and RNA interference (siRNA) analyses, the present investigation revealed that the downregulation of these immaturation genes is controlled by strikingly unified signaling mechanisms that operate sequentially through the stimulation of AMPA and NMDA receptors, tetrodotoxin-sensitive Na(+) channels and Ca(2+)/calmodulin-dependent protein kinase II (CaMKII). This signaling cascade induces the Etv1 transcription factor, and knockdown of Etv1 by a siRNA technique prevented this activity-dependent downregulation of immaturation genes. Thus, taking into consideration the mechanism that controls the upregulation of maturation genes involved in synaptic formation, these results indicate that Etv1 orchestrates the activity-dependent regulation of both maturation and immaturation genes in developing granule cells and plays a key role in specifying the identity of mature granule cells in the cerebellum. PMID:27059140

  14. A literature based method for identifying gene-disease connections.

    PubMed

    Adamic, Lada A; Wilkinson, Dennis; Huberman, Bernardo A; Adar, Eytan

    2002-01-01

    We present a statistical method that can swiftly identify, from the literature, sets of genes known to be associated with given diseases. It offers a comprehensive way to treat alias symbols, a statistical method for computing the relevance of the gene to the query, and a novel way to disambiguate gene symbols from other abbreviations. The method is illustrated by finding genes related to breast cancer. PMID:15838128

  15. Fine mapping and candidate gene analysis of hwh1 and hwh2, a set of complementary genes controlling hybrid breakdown in rice.

    PubMed

    Jiang, Wenzhu; Chu, Sang-Ho; Piao, Rihua; Chin, Joong-Hyoun; Jin, Yong-Mei; Lee, Joohyun; Qiao, Yongli; Han, Longzhi; Piao, Zongze; Koh, Hee-Jong

    2008-05-01

    Hybrid breakdown (HB), a phenomenon of reduced viability or fertility accompanied with retarded growth in hybrid progenies, often arises in the offspring of intersubspecific hybrids between indica and japonica in rice. We detected HB plants in F8 recombinant inbred lines derived from the cross between an indica variety, Milyang 23, and a japonica variety, Tong 88-7. HB plants showed retarded growth, with fewer tillers and spikelets. Genetic analysis revealed that HB was controlled by the complementary action of two recessive genes, hwh1 and hwh2, originating from each of both parents, which were fine-mapped on the short arm of chromosome 2 and on the near centromere region of the long arm of chromosome 11, respectively. A comparison of the sequences of candidate genes among both parents and HB plants revealed that hwh1 encoded a putative glucose-methanol-choline oxidoreductase with one amino acid change compared to Hwh1 and that hwh2 probably encoded a putative hexose transporter with a six amino acid insertion compared to Hwh2. Investigation of the distribution of these alleles among 54 japonica and indica cultivars using candidate gene-based markers suggested that the two loci might be involved in developing reproductive barriers between two subspecies. PMID:18335199

  16. Independent evolution of the core and accessory gene sets in the genus Neisseria: insights gained from the genome of Neisseria lactamica isolate 020-06

    PubMed Central

    2010-01-01

    Background The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally. Results Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendents of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent. Conclusion The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed. PMID:21092259

  17. Mutation analysis of the MS4A and TREM gene clusters in a case-control Alzheimer's disease data set.

    PubMed

    Ghani, Mahdi; Sato, Christine; Kakhki, Erfan Ghani; Gibbs, J Raphael; Traynor, Bryan; St George-Hyslop, Peter; Rogaeva, Ekaterina

    2016-06-01

    Genome wide association studies have identified an association between Alzheimer's disease (AD) and common polymorphisms in the MS4A and TREM loci (each containing a cluster of homologous genes), which should be thoroughly investigated for the presence of potentially functional variations. We conducted a mutation analysis by next generation sequencing of 15 genes within the MS4A and TREM gene clusters and catalogued rare coding variants detected in a North American data set of 210 cases and 233 controls. Investigation of the 5 homologous genes in the TREM locus revealed potentially damaging rare variants in TREM2, TREML1, TREML2, and TREML4. In agreement with a previous report, we observed a significant enrichment of TREM2-damaging missense substitutions in cases (N = 9; 4.2%) compared with controls (N = 2; 0.9%; p = 0.010; after Yates' correction p = 0.022). Among known AD-associated TREM2 substitutions, we detected p.R47H, p.D87N, and p.H157Y affecting both TREM2 isoforms (NM_018965 and NM_001271821). In addition, we identified 2 cases with novel TREM2 variants (p.L205P and p.G219C), which mapped only to the isoform NM_001271821 at the C-terminus. Investigation of the MS4A gene cluster revealed that potentially damaging missense substitutions and loss-of-function variants were twice as frequent in controls (N = 19; 8.2%) as in cases (N = 9; 4.3%), generating a nominally significant result (p = 0.047; after Yates' correction p = 0.07). Validation of our observation in large data sets might address the question whether such variants could contribute to the protective effect of the minor alleles of Genome wide association study-significant single nucleotide polymorphisms at the MS4A locus. PMID:27084067
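    The comparison of carrier counts with and without Yates' correction can be sketched as follows. This is a minimal illustration assuming a Pearson chi-square test on a 2x2 table with one degree of freedom; the function name `chi2_2x2` is hypothetical, and the abstract does not state which test or sidedness produced its p-values, so the two-sided values returned here need not match the reported ones.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, two_sided_p). With yates=True the continuity
    correction subtracts N/2 from |ad - bc| before squaring.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square upper tail is
    # erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


if __name__ == "__main__":
    # Counts taken from the abstract: 9 of 210 cases and 2 of 233
    # controls carried TREM2-damaging missense substitutions.
    carriers_cases, total_cases = 9, 210
    carriers_controls, total_controls = 2, 233
    table = (carriers_cases, total_cases - carriers_cases,
             carriers_controls, total_controls - carriers_controls)
    print(chi2_2x2(*table, yates=False))
    print(chi2_2x2(*table, yates=True))
```

    The correction always shrinks the statistic, so the corrected p-value is the larger of the two, in line with the direction of the change reported in the abstract.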

  18. Statistical and Biological Gene-Lifestyle Interactions of MC4R and FTO with Diet and Physical Activity on Obesity: New Effects on Alcohol Consumption

    PubMed Central

    Covas, M. Isabel; Carrasco, Paula; Salas-Salvadó, Jordi; Martínez-González, Miguel Ángel; Arós, Fernando; Lapetra, José; Serra-Majem, Lluís; Lamuela-Raventos, Rosa; Gómez-Gracia, Enrique; Fiol, Miquel; Pintó, Xavier; Ros, Emilio; Martí, Amelia; Coltell, Oscar; Ordovás, Jose M.; Estruch, Ramon

    2012-01-01

    Background Fat mass and obesity (FTO) and melanocortin-4 receptor (MC4R) are relevant genes associated with obesity. This could be through food intake, but results are contradictory. Modulation by diet or other lifestyle factors is also not well understood. Objective To investigate whether MC4R and FTO associations with body-weight are modulated by diet and physical activity (PA), and to study their association with alcohol and food intake. Methods Adherence to Mediterranean diet (AdMedDiet) and physical activity (PA) were assessed by validated questionnaires in 7,052 high cardiovascular risk subjects. MC4R rs17782313 and FTO rs9939609 were determined. Independent and joint associations (aggregate genetic score) as well as statistical and biological gene-lifestyle interactions were analyzed. Results FTO rs9939609 was associated with higher body mass index (BMI), waist circumference (WC) and obesity (P<0.05 for all). A similar but non-significant trend was found for MC4R rs17782313. Their additive effects (aggregate score) were significant and we observed a 7% per-allele increase in the odds of being obese (OR = 1.07; 95%CI 1.01–1.13). We found relevant statistical interactions (P<0.05) with PA: in active individuals, the associations with higher BMI, WC or obesity were not detected. A biological (non-statistical) interaction between AdMedDiet and rs9939609 and the aggregate score was found. Greater AdMedDiet in individuals carrying 3 or 4 risk alleles counterbalanced their genetic predisposition, resulting in a BMI similar (P = 0.502) to that of individuals with no risk alleles and lower AdMedDiet. They also had lower BMI (P = 0.021) than their counterparts with low AdMedDiet. We did not find any consistent association with energy or macronutrients, but found a novel association between these polymorphisms and lower alcohol consumption in variant-allele carriers (B+/−SE: −0.57+/−0.16 g/d per-score-allele; P = 0.001). Conclusion Statistical and biological

  19. Gene Set-Based Functionome Analysis of Pathogenesis in Epithelial Ovarian Serous Carcinoma and the Molecular Features in Different FIGO Stages.

    PubMed

    Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Ming-Jie; Chang, Cheng-Chang; Yen, Ming-Shyen; Chiou, Shih-Hwa

    2016-01-01

    Serous carcinoma (SC) is the most common subtype of epithelial ovarian carcinoma and is divided into four stages by the Federation of Gynecologists and Obstetrics (FIGO) staging system. Currently, the molecular functions and biological processes of SC at different FIGO stages have not been quantified. Here, we conducted a whole-genome integrative analysis to investigate the functions of SC at different stages. The function, as defined by the GO term or canonical pathway gene set, was quantified by measuring the changes in the gene expressional order between cancerous and normal control states. The quantified function, i.e., the gene set regularity (GSR) index, was utilized to investigate the pathogenesis and functional regulation of SC at different FIGO stages. We showed that the informativeness of the GSR indices was sufficient for accurate pattern recognition and classification for machine learning. The function regularity presented by the GSR indices showed stepwise deterioration during SC progression from FIGO stage I to stage IV. The pathogenesis of SC was centered on cell cycle deregulation and accompanied with multiple functional aberrations as well as their interactions. PMID:27275818

  20. Gene Set-Based Functionome Analysis of Pathogenesis in Epithelial Ovarian Serous Carcinoma and the Molecular Features in Different FIGO Stages

    PubMed Central

    Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Ming-Jie; Chang, Cheng-Chang; Yen, Ming-Shyen; Chiou, Shih-Hwa

    2016-01-01

    Serous carcinoma (SC) is the most common subtype of epithelial ovarian carcinoma and is divided into four stages by the Federation of Gynecologists and Obstetrics (FIGO) staging system. Currently, the molecular functions and biological processes of SC at different FIGO stages have not been quantified. Here, we conducted a whole-genome integrative analysis to investigate the functions of SC at different stages. The function, as defined by the GO term or canonical pathway gene set, was quantified by measuring the changes in the gene expressional order between cancerous and normal control states. The quantified function, i.e., the gene set regularity (GSR) index, was utilized to investigate the pathogenesis and functional regulation of SC at different FIGO stages. We showed that the informativeness of the GSR indices was sufficient for accurate pattern recognition and classification for machine learning. The function regularity presented by the GSR indices showed stepwise deterioration during SC progression from FIGO stage I to stage IV. The pathogenesis of SC was centered on cell cycle deregulation and accompanied with multiple functional aberrations as well as their interactions. PMID:27275818

  1. Genetic variations in the CLNK gene and ZNF518B gene are associated with gout in case-control sample sets.

    PubMed

    Jin, Tian-Bo; Ren, Yongchao; Shi, Xugang; Jiri, Mutu; He, Na; Feng, Tian; Yuan, Dongya; Kang, Longli

    2015-07-01

    A genome-wide association study of gout in European populations identified 12 genetic variants strongly associated with risk of gout, but it is unknown whether these variants are also associated with gout risk in Chinese populations. A total of 145 patients with gout and 310 healthy control patients were recruited for a case-control association study. Twelve SNPs in the CLNK and ZNF518B genes were genotyped, and association analysis was performed. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to assess the association. Overall, we found four risk alleles for gout in patients: the allele "G" of rs2041215 and rs1686947 in the CLNK gene by dominant model (OR 1.66; 95% CI 1.04-2.63; p = 0.031) (OR 2.19; 95% CI 1.38-3.46; p = 0.001) and additive model (OR 1.39; 95% CI 1.00-1.93; p = 0.049) (OR 1.67; 95% CI 1.19-2.32; p = 0.003), respectively, and the allele "A" of rs10938799 and rs10016022 in the ZNF518B gene by recessive model (OR 4.66; 95% CI 1.44-15.09; p = 0.008) (OR 4.54; 95% CI 1.23-16.76; p = 0.020). Further haplotype analysis showed that the TCATTCTGA haplotype of CLNK was more frequent among patients with gout (adjusted OR 0.48; 95% CI 0.24-0.95; p = 0.036). Additionally, polymorphisms of rs2041215, rs10938799, and rs17467273 were also correlated with clinical pathological parameters. This study provides evidence for gout susceptibility genes, CLNK and ZNF518B, in a Chinese population, which may have potential as diagnostic and prognostic markers for gout patients. PMID:25591661
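    How an odds ratio and its 95% confidence interval follow from case-control counts can be sketched as below. This is a minimal example assuming an unadjusted Wald interval on the log odds ratio; the abstract does not give genotype counts and reports model-based (and in one case adjusted) estimates, so the function name `odds_ratio_ci` and the counts in the usage lines are hypothetical and chosen only to sum to the 145 cases and 310 controls mentioned above.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval.

    a : cases carrying the risk genotype
    b : cases not carrying it
    c : controls carrying the risk genotype
    d : controls not carrying it
    z : normal quantile (1.96 for a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical dominant-model counts (not from the study):
    # 80 + 65 = 145 cases, 130 + 180 = 310 controls.
    print(odds_ratio_ci(80, 65, 130, 180))
```

    An interval that excludes 1 corresponds to a nominally significant association at the 5% level, which is how the ORs and CIs quoted in the abstract are read.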

  2. Real-time recording of circadian liver gene expression in freely moving mice reveals the phase-setting behavior of hepatocyte clocks.

    PubMed

    Saini, Camille; Liani, André; Curie, Thomas; Gos, Pascal; Kreppel, Florian; Emmenegger, Yann; Bonacina, Luigi; Wolf, Jean-Pierre; Poget, Yves-Alain; Franken, Paul; Schibler, Ueli

    2013-07-01

    The mammalian circadian timing system consists of a master pacemaker in the suprachiasmatic nucleus (SCN) in the hypothalamus, which is thought to set the phase of slave oscillators in virtually all body cells. However, due to the lack of appropriate in vivo recording technologies, it has been difficult to study how the SCN synchronizes oscillators in peripheral tissues. Here we describe the real-time recording of bioluminescence emitted by hepatocytes expressing circadian luciferase reporter genes in freely moving mice. The technology employs a device dubbed RT-Biolumicorder, which consists of a cylindrical cage with reflecting conical walls that channel photons toward a photomultiplier tube. The monitoring of circadian liver gene expression revealed that hepatocyte oscillators of SCN-lesioned mice synchronized more rapidly to feeding cycles than hepatocyte clocks of intact mice. Hence, the SCN uses signaling pathways that counteract those of feeding rhythms when their phase is in conflict with its own phase. PMID:23824542

  3. Real-time recording of circadian liver gene expression in freely moving mice reveals the phase-setting behavior of hepatocyte clocks

    PubMed Central

    Saini, Camille; Liani, André; Curie, Thomas; Gos, Pascal; Kreppel, Florian; Emmenegger, Yann; Bonacina, Luigi; Wolf, Jean-Pierre; Poget, Yves-Alain; Franken, Paul; Schibler, Ueli

    2013-01-01

    The mammalian circadian timing system consists of a master pacemaker in the suprachiasmatic nucleus (SCN) in the hypothalamus, which is thought to set the phase of slave oscillators in virtually all body cells. However, due to the lack of appropriate in vivo recording technologies, it has been difficult to study how the SCN synchronizes oscillators in peripheral tissues. Here we describe the real-time recording of bioluminescence emitted by hepatocytes expressing circadian luciferase reporter genes in freely moving mice. The technology employs a device dubbed RT-Biolumicorder, which consists of a cylindrical cage with reflecting conical walls that channel photons toward a photomultiplier tube. The monitoring of circadian liver gene expression revealed that hepatocyte oscillators of SCN-lesioned mice synchronized more rapidly to feeding cycles than hepatocyte clocks of intact mice. Hence, the SCN uses signaling pathways that counteract those of feeding rhythms when their phase is in conflict with its own phase. PMID:23824542

  4. Analysis of Antibiotic Resistance Genes and its Associated SCCmec Types among Nasal Carriage of Methicillin Resistant Coagulase Negative Staphylococci from Community Settings, Chennai, Southern India

    PubMed Central

    Murugesan, Saravanan; Perumal, Nagaraj; Mahalingam, Surya Prakash; Dilliappan, Selva Kumar

    2015-01-01

    Objective: The study was designed to find the distribution of SCCmec types and the various antibiotic resistance genes amongst MR-CoNS isolates from asymptomatic individuals. Materials and Methods: A total of 145 nasal swabs were collected from asymptomatic healthy individuals from community settings. Identification and speciation of CoNS were done by standard biochemical methods. Screening of methicillin resistance (mecA gene) and detection of various antibiotic resistance genes were done using multiplex PCR method. SCCmec types (I - V) were determined using multiplex PCR. Results: 50 (44.6%) isolates were found to be methicillin resistant both by cefoxitin method and multiplex PCR. S. epidermidis (40%) was the predominant species followed by S. haemolyticus (28%), S. hominis (20%) and S. warneri (12%). Highest resistance was shown for cotrimoxazole (26%), followed by ciprofloxacin (24%), tetracycline (20%), erythromycin (18%), fusidic acid (10%) and mupirocin (6%). Among SCCmec types, 44 isolates showed single type, including type I (30%), type IV (24%), type II (18%), type V (14%) and type III (2%). 6 isolates showed two types, III+IV (n=2), II+V (n=2), IV+V (n=1) and type I+V (n=1). Conclusion: To the best of our knowledge, this is the first study in India to examine the distribution of antibiotic resistance genes and SCCmec types among MR-CoNS from community settings. This study highlights high prevalence of MR-CoNS in community and its role in harbouring genetically diverse SCCmec elements as antibiotic resistance determinant. PMID:26435940

  5. Meta-Analysis and Gene Set Analysis of Archived Microarrays Suggest Implication of the Spliceosome in Metastatic and Hypoxic Phenotypes

    PubMed Central

    Bareke, Eric; Depiereux, Sophie; Michiels, Carine; Depiereux, Eric

    2014-01-01

    We propose to make use of the wealth of underused DNA chip data available in public repositories to study the molecular mechanisms behind the adaptation of cancer cells to hypoxic conditions leading to the metastatic phenotype. We have developed new bioinformatics tools and adapted others to identify with maximum sensitivity those genes which are expressed differentially across several experiments. The comparison of two analytical approaches, based on either Over Representation Analysis or Functional Class Scoring, by a meta-analysis-based approach, led to the retrieval of known information about the biological situation – thus validating the model – but also more importantly to the discovery of the previously unknown implication of the spliceosome, the cellular machinery responsible for mRNA splicing, in the development of metastasis. PMID:24497970

  6. Prevalence of qacA/B Genes and Mupirocin Resistance Among Methicillin-Resistant Staphylococcus aureus (MRSA) Isolates in the Setting of Chlorhexidine Bathing Without Mupirocin.

    PubMed

    Warren, David K; Prager, Martin; Munigala, Satish; Wallace, Meghan A; Kennedy, Colleen R; Bommarito, Kerry M; Mazuski, John E; Burnham, Carey-Ann D

    2016-05-01

    OBJECTIVE: We aimed to determine the frequency of qacA/B chlorhexidine tolerance genes and high-level mupirocin resistance among MRSA isolates before and after the introduction of a chlorhexidine (CHG) daily bathing intervention in a surgical intensive care unit (SICU). DESIGN: Retrospective cohort study (2005-2012). SETTING: A large tertiary-care center. PATIENTS: Patients admitted to SICU who had MRSA surveillance cultures of the anterior nares. METHODS: A random sample of banked MRSA anterior nares isolates recovered during (2005) and after (2006-2012) implementation of a daily CHG bathing protocol was examined for qacA/B genes and high-level mupirocin resistance. Staphylococcal cassette chromosome mec (SCCmec) typing was also performed. RESULTS: Of the 504 randomly selected isolates (63 per year), 36 (7.1%) were qacA/B positive (+) and 35 (6.9%) were mupirocin resistant. Of these, 184 (36.5%) isolates were SCCmec type IV. There was a significant trend for increasing qacA/B (P=.02; highest prevalence, 16.9% in 2009 and 2010) and SCCmec type IV (P<.001; highest prevalence, 52.4% in 2012) during the study period. qacA/B(+) MRSA isolates were more likely to be mupirocin resistant (9 of 36 [25%] qacA/B(+) vs 26 of 468 [5.6%] qacA/B(-); P=.003). CONCLUSIONS: A long-term, daily CHG bathing protocol was associated with a change in the frequency of qacA/B genes in MRSA isolates recovered from the anterior nares over an 8-year period. This change in the frequency of qacA/B genes is most likely due to patients in those years being exposed in prior admissions. Future studies need to further evaluate the implications of universal CHG daily bathing on MRSA qacA/B genes among hospitalized patients. Infect Control Hosp Epidemiol 2016;37:590-597. PMID:26828094
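
    The counts and percentages quoted in the results can be cross-checked with simple arithmetic. The short Python sketch below recomputes them from the figures stated in the abstract. The variable names and the helper `pct` are illustrative only, and nothing here reproduces the authors' significance tests.

```python
# Counts exactly as quoted in the abstract above.
total_isolates = 504
per_year, years = 63, 8
qac_positive = 36
mupirocin_resistant = 35
sccmec_type_iv = 184
resistant_in_qac_positive = 9
resistant_in_qac_negative = 26
qac_negative = total_isolates - qac_positive  # 468, as quoted


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(per_year * years)                              # isolates sampled overall
print(pct(qac_positive, total_isolates))             # share qacA/B positive
print(pct(mupirocin_resistant, total_isolates))      # share mupirocin resistant
print(pct(sccmec_type_iv, total_isolates))           # share SCCmec type IV
print(pct(resistant_in_qac_positive, qac_positive))  # resistant among qacA/B(+)
print(pct(resistant_in_qac_negative, qac_negative))  # resistant among qacA/B(-)
```

    The two mupirocin-resistant subgroups (9 and 26) also sum to the 35 resistant isolates reported overall.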

  7. Role of YY1 in the pathogenesis of prostate cancer and correlation with bioinformatic data sets of gene expression

    PubMed Central

    Kashyap, Vaishali; Bonavida, Benjamin

    2014-01-01

    Current treatments of various cancers include chemotherapy, radiation, surgery, immunotherapy, and combinations. However, there is a need to develop novel diagnostic and therapeutic treatments for unresponsive patients. These may be achieved by the identification of novel diagnostic and prognostic biomarkers which will help in the stratification of patients' initial responses to particular treatments and circumvent resistance, relapses, metastasis, and death. We have been investigating human prostate cancer as a model tumor. We have identified Yin Yang 1 (YY1), a dysregulated transcription factor, whose overexpression correlated with tumor progression as well as with the regulation of drug resistance and the development of EMT. YY1 expression is upregulated in human prostate cancer cell lines and tissues. We postulated that YY1 may be a potential biomarker in prostate cancer for patients' stratification as well as a novel target for therapeutic intervention. We used bioinformatic gene RNA array datasets to examine the expression of YY1 in prostate tumor tissues as compared to normal tissues. Interestingly, variations in the expression levels of YY1 mRNA in prostate cancer were reported by different investigators. This mini review summarizes the current reported studies and bioinformatic analyses on the role of YY1 in the pathogenesis of prostate cancer. PMID:25053986

  8. Cosmic statistics of statistics

    NASA Astrophysics Data System (ADS)

    Szapudi, István; Colombi, Stéphane; Bernardeau, Francis

    1999-12-01

    The errors on statistics measured in finite galaxy catalogues are exhaustively investigated. The theory of errors on factorial moments by Szapudi & Colombi is applied to cumulants via a series expansion method. All results are subsequently extended to the weakly non-linear regime. Together with previous investigations this yields an analytic theory of the errors for moments and connected moments of counts in cells from highly non-linear to weakly non-linear scales. For non-linear functions of unbiased estimators, such as the cumulants, the phenomenon of cosmic bias is identified and computed. Since it is subdued by the cosmic errors in the range of applicability of the theory, correction for it is inconsequential. In addition, the method of Colombi, Szapudi & Szalay concerning sampling effects is generalized, adapting the theory for inhomogeneous galaxy catalogues. While previous work focused on the variance only, the present article calculates the cross-correlations between moments and connected moments as well for a statistically complete description. The final analytic formulae representing the full theory are explicit but somewhat complicated. Therefore we have made available a fortran program capable of calculating the described quantities numerically (for further details e-mail SC at colombi@iap.fr). An important special case is the evaluation of the errors on the two-point correlation function, for which this should be more accurate than any method put forward previously. This tool will be immensely useful in the future for assessing the precision of measurements from existing catalogues, as well as aiding the design of new galaxy surveys. To illustrate the applicability of the results and to explore the numerical aspects of the theory qualitatively and quantitatively, the errors and cross-correlations are predicted under a wide range of assumptions for the future Sloan Digital Sky Survey. 
    The principal result concerning the cumulants ξ, Q3 and Q4 is that

  9. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically
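
    The weighting step described in the abstract (weighting each re-calculated gene tree by the size of its bin) can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' pipeline: the function name `weight_supergene_trees`, the bin identifiers, and the Newick strings are hypothetical, and weighting is realised here simply by repeating each bin's tree once per gene in the bin before the list is handed to a summary method.

```python
from typing import Dict, List


def weight_supergene_trees(bins: Dict[str, List[str]],
                           supergene_trees: Dict[str, str]) -> List[str]:
    """Repeat each bin's re-calculated tree once per gene in that bin.

    bins            maps a bin id to the genes placed in that bin
    supergene_trees maps a bin id to the tree estimated from the bin
    """
    weighted: List[str] = []
    for bin_id, genes in bins.items():
        # A bin of size k contributes k copies of its tree.
        weighted.extend([supergene_trees[bin_id]] * len(genes))
    return weighted


# Hypothetical bins and trees, for illustration only.
example_bins = {"bin1": ["g1", "g2", "g3"], "bin2": ["g4"]}
example_trees = {"bin1": "((A,B),C);", "bin2": "((A,C),B);"}
print(weight_supergene_trees(example_bins, example_trees))
```

    In the unweighted variant each bin would contribute a single tree regardless of how many genes it contains.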

  10. A novel hybrid dimension reduction technique for undersized high dimensional gene expression data sets using information complexity criterion for cancer classification.

    PubMed

    Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan

    2015-01-01

    Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a novel approach using the maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop the celebrated Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate to perform the PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions. PMID:25838836
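
    Two of the criteria named in the abstract have simple closed forms, which the Python sketch below writes out in their standard textbook versions: AIC = -2 log L + 2k and CAIC = -2 log L + k(ln n + 1), where k is the number of free parameters and n the number of observations. The function names and the numbers in the usage lines are hypothetical, the ICOMP criterion is not covered, and the paper's own PPCA-specific formulation may differ.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def caic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Consistent AIC: -2 log L + k (ln n + 1)."""
    return -2.0 * log_likelihood + n_params * (math.log(n_obs) + 1.0)


# Hypothetical fit: maximised log-likelihood -100 with 3 parameters, 50 samples.
print(aic(-100.0, 3))
print(caic(-100.0, 3, 50))
```

    Under either criterion, the candidate model with the smallest value is preferred. CAIC penalises each parameter more heavily than AIC whenever n exceeds e (about 2.72).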

  11. Statistical Inference at Work: Statistical Process Control as an Example

    ERIC Educational Resources Information Center

    Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia

    2008-01-01

    To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…

  12. Chromatin H3K27me3/H3K4me3 histone marks define gene sets in high-grade serous ovarian cancer that distinguish malignant, tumour-sustaining and chemo-resistant ovarian tumour cells.

    PubMed

    Chapman-Rothe, N; Curry, E; Zeller, C; Liber, D; Stronach, E; Gabra, H; Ghaem-Maghami, S; Brown, R

    2013-09-19

    In embryonic stem (ES) cells, bivalent chromatin domains containing H3K4me3 and H3K27me3 marks silence developmental genes, while keeping them poised for activation following differentiation. We have identified gene sets associated with H3K27me3 and H3K4me3 marks at transcription start sites in a high-grade ovarian serous tumour and examined their association with epigenetic silencing and malignant progression. This revealed novel silenced bivalent marked genes, not described previously for ES cells, which are significantly enriched for the PI3K (P<10^-7) and TGF-β signalling pathways (P<10^-5). We matched histone marked gene sets to gene expression sets of eight normal fallopian tubes and 499 high-grade serous malignant ovarian samples. This revealed a significant decrease in gene expression for the H3K27me3 and bivalent gene sets in malignant tissue. We then correlated H3K27me3 and bivalent gene sets to gene expression data of ovarian tumour 'stem cell-like' sustaining cells versus non-sustaining cells. This showed a significantly lower expression for the H3K27me3 and bivalent gene sets in the tumour-sustaining cells. Similarly, comparison of matched chemo-sensitive and chemo-resistant ovarian cell lines showed a significantly lower expression of H3K27me3/bivalent marked genes in the chemo-resistant compared with the chemo-sensitive cell line. Our analysis supports the hypothesis that bivalent marks are associated with epigenetic silencing in ovarian cancer. However it also suggests that additional tumour specific bivalent marks, to those known in ES cells, are present in tumours and may potentially influence the subsequent development of drug resistance and tumour progression. PMID:23128397

  13. Genetic Dissection of the mamAB and mms6 Operons Reveals a Gene Set Essential for Magnetosome Biogenesis in Magnetospirillum gryphiswaldense

    PubMed Central

    Lohße, Anna; Borg, Sarah; Raschdorf, Oliver; Kolinko, Isabel; Tompa, Éva; Pósfai, Mihály; Faivre, Damien; Baumgartner, Jens

    2014-01-01

    Biosynthesis of bacterial magnetosomes, which are intracellular membrane-enclosed, nanosized magnetic crystals, is controlled by a set of >30 specific genes. In Magnetospirillum gryphiswaldense, these are clustered mostly within a large conserved genomic magnetosome island (MAI) comprising the mms6, mamGFDC, mamAB, and mamXY operons. Here, we demonstrate that the five previously uncharacterized genes of the mms6 operon have crucial functions in the regulation of magnetosome biomineralization that partially overlap MamF and other proteins encoded by the adjacent mamGFDC operon. While all other deletions resulted in size reduction, elimination of either mms36 or mms48 caused the synthesis of magnetite crystals larger than those in the wild type (WT). Whereas the mms6 operon encodes accessory factors for crystal maturation, the large mamAB operon contains several essential and nonessential genes involved in various other steps of magnetosome biosynthesis, as shown by single deletions of all mamAB genes. While single deletions of mamL, -P, -Q, -R, -B, -S, -T, and -U showed phenotypes similar to those of their orthologs in a previous study in the related M. magneticum, we found mamI and mamN to be not required for at least rudimentary iron biomineralization in M. gryphiswaldense. Thus, only mamE, -L, -M, -O, -Q, and -B were essential for formation of magnetite, whereas a mamI mutant still biomineralized tiny particles which, however, consisted of the nonmagnetic iron oxide hematite, as shown by high-resolution transmission electron microscopy (HRTEM) and the X-ray absorption near-edge structure (XANES). Based on this and previous studies, we propose an extended model for magnetosome biosynthesis in M. gryphiswaldense. PMID:24816605

  14. Efficient Test and Visualization of Multi-Set Intersections

    PubMed Central

    Wang, Minghui; Zhao, Yongzhong; Zhang, Bin

    2015-01-01

    Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of intersections among three or more sets. Moreover, the state-of-the-art approaches for visualization of multi-set intersections are not scalable. Here, we first developed a theoretical framework for computing the statistical distributions of multi-set intersections based upon combinatorial theory, and then accordingly designed a procedure to efficiently calculate the exact probabilities of multi-set intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an intensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets as well as six disease or trait associated gene sets identified by genome-wide association studies. We expect SuperExactTest developed by this study will have a broad range of applications in scientific data analysis in many disciplines. PMID:26603754
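
    For intuition about the exact intersection probabilities mentioned in the abstract, the two-set case reduces to a hypergeometric tail probability. The Python sketch below covers only that base case. It is not the SuperExactTest package (which is written in R and handles three or more sets), and the function name `overlap_pvalue` and the example sizes are hypothetical.

```python
from math import comb


def overlap_pvalue(population: int, size_a: int, size_b: int, observed: int) -> float:
    """P(overlap >= observed) when set B is drawn at random from the population.

    population  number of items in the background (e.g. all genes)
    size_a      size of the first set
    size_b      size of the second set
    observed    observed number of shared items
    """
    largest_possible = min(size_a, size_b)
    total_draws = comb(population, size_b)
    # Sum the hypergeometric probabilities from the observed overlap upward.
    favourable = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(observed, largest_possible + 1)
    )
    return favourable / total_draws


# Hypothetical example: two sets of 5 items from a background of 10, fully overlapping.
print(overlap_pvalue(10, 5, 5, 5))
```

    A small tail probability indicates that an overlap at least as large as the observed one would be unlikely if the two sets were unrelated.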

  15. Descriptive statistics.

    PubMed

    Shi, Runhua; McLarty, Jerry W

    2009-10-01

    In this article, we introduced basic concepts of statistics, type of distributions, and descriptive statistics. A few examples were also provided. The basic concepts presented herein are only a fraction of the concepts related to descriptive statistics. Also, there are many commonly used distributions not presented herein, such as Poisson distributions for rare events and exponential distributions, F distributions, and logistic distributions. More information can be found in many statistics books and publications. PMID:19891281

  16. The minimal gene set member msrA, encoding peptide methionine sulfoxide reductase, is a virulence determinant of the plant pathogen Erwinia chrysanthemi.

    PubMed

    Hassouni, M E; Chambost, J P; Expert, D; Van Gijsegem, F; Barras, F

    1999-02-01

    Peptide methionine sulfoxide reductase (MsrA), which repairs oxidized proteins, is present in most living organisms, and the cognate structural gene belongs to the so-called minimum gene set [Mushegian, A. R. & Koonin, E. V., (1996) Proc. Natl. Acad. Sci. USA 93, 10268-10273]. In this work, we report that MsrA is required for full virulence of the plant pathogen Erwinia chrysanthemi. The following differences were observed between the wild-type and a MsrA- mutant: (i) the MsrA- mutant was more sensitive to oxidative stress; (ii) the MsrA- mutant was less motile on solid surface; (iii) the MsrA- mutant exhibited reduced virulence on chicory leaves; and (iv) no systemic invasion was observed when the MsrA- mutant was inoculated into whole Saintpaulia ionantha plants. These results suggest that plants respond to virulent pathogens by producing active oxygen species, and that enzymes repairing oxidative damage allow virulent pathogens to survive the host environment, thereby supporting the theory that active oxygen species play a key role in plant defense. PMID:9927663

  17. The Complete Set of Genes Encoding Major Intrinsic Proteins in Arabidopsis Provides a Framework for a New Nomenclature for Major Intrinsic Proteins in Plants

    PubMed Central

    Johanson, Urban; Karlsson, Maria; Johansson, Ingela; Gustavsson, Sofia; Sjövall, Sara; Fraysse, Laure; Weig, Alfons R.; Kjellbom, Per

    2001-01-01

    Major intrinsic proteins (MIPs) facilitate the passive transport of small polar molecules across membranes. MIPs constitute a very old family of proteins and different forms have been found in all kinds of living organisms, including bacteria, fungi, animals, and plants. In the genomic sequence of Arabidopsis, we have identified 35 different MIP-encoding genes. Based on sequence similarity, these 35 proteins are divided into four different subfamilies: plasma membrane intrinsic proteins, tonoplast intrinsic proteins, NOD26-like intrinsic proteins also called NOD26-like MIPs, and the recently discovered small basic intrinsic proteins. In Arabidopsis, there are 13 plasma membrane intrinsic proteins, 10 tonoplast intrinsic proteins, nine NOD26-like intrinsic proteins, and three small basic intrinsic proteins. The gene structure in general is conserved within each subfamily, although there is a tendency to lose introns. Based on phylogenetic comparisons of maize (Zea mays) and Arabidopsis MIPs (AtMIPs), it is argued that the general intron patterns in the subfamilies were formed before the split of monocotyledons and dicotyledons. Although the gene structure is unique for each subfamily, there is a common pattern in how transmembrane helices are encoded on the exons in three of the subfamilies. The nomenclature for plant MIPs varies widely between different species but also between subfamilies in the same species. Based on the phylogeny of all AtMIPs, a new and more consistent nomenclature is proposed. The complete set of AtMIPs, together with the new nomenclature, will facilitate the isolation, classification, and labeling of plant MIPs from other species. PMID:11500536

  18. Fine-Scale Linkage Mapping Reveals a Small Set of Candidate Genes Influencing Honey Bee Grooming Behavior in Response to Varroa Mites

    PubMed Central

    Arechavaleta-Velasco, Miguel E.; Alcala-Escamilla, Karla; Robles-Rios, Carlos; Tsuruda, Jennifer M.; Hunt, Greg J.

    2012-01-01

    Populations of honey bees in North America have been experiencing high annual colony mortality for 15–20 years. Many apicultural researchers believe that introduced parasites called Varroa mites (V. destructor) are the most important factor in colony deaths. One important resistance mechanism that limits mite population growth in colonies is the ability of some lines of honey bees to groom mites from their bodies. To search for genes influencing this trait, we used an Illumina Bead Station genotyping array to determine the genotypes of several hundred worker bees at over a thousand single-nucleotide polymorphisms in a family that was apparently segregating for alleles influencing this behavior. Linkage analyses provided a genetic map with 1,313 markers anchored to genome sequence. Genotypes were analyzed for association with grooming behavior, measured as the time that individual bees took to initiate grooming after mites were placed on their thoraces. Quantitative-trait-locus interval mapping identified a single chromosomal region that was significant at the chromosome-wide level (p<0.05) on chromosome 5 with a LOD score of 2.72. The 95% confidence interval for quantitative trait locus location contained only 27 genes (honey bee official gene annotation set 2) including Atlastin, Ataxin and Neurexin-1 (AmNrx1), which have potential neurodevelopmental and behavioral effects. Atlastin and Ataxin homologs are associated with neurological diseases in humans. AmNrx1 codes for a presynaptic protein with many alternatively spliced isoforms. Neurexin-1 influences the growth, maintenance and maturation of synapses in the brain, as well as the type of receptors most prominent within synapses. Neurexin-1 has also been associated with autism spectrum disorder and schizophrenia in humans, and self-grooming behavior in mice. PMID:23133594

  19. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2008-01-01

    As a branch of knowledge, Statistics is ubiquitous and its applications can be found in (almost) every field of human endeavour. In this article, the authors track down the possible source of the link between the "Siren song" and applications of Statistics. Answers to their previous five questions and five new questions on Statistics are presented.

  20. Statistical Software.

    ERIC Educational Resources Information Center

    Callamaras, Peter

    1983-01-01

    This buyer's guide to seven major types of statistics software packages for microcomputers reviews Edu-Ware Statistics 3.0; Financial Planning; Speed Stat; Statistics with DAISY; Human Systems Dynamics package of Stats Plus, ANOVA II, and REGRESS II; Maxistat; and Moore-Barnes' MBC Test Construction and MBC Correlation. (MBR)

  1. Bayesian Statistics.

    ERIC Educational Resources Information Center

    Meyer, Donald L.

    Bayesian statistical methodology and its possible uses in the behavioral sciences are discussed in relation to the solution of problems in both the use and teaching of fundamental statistical methods, including confidence intervals, significance tests, and sampling. The Bayesian model explains these statistical methods and offers a consistent…

  2. Statistical analysis of the eigenspace components of the two-dimensional, symmetric rank-two strain rate tensor derived from the space geodetic measurements (ITRF92-ITRF2000 data sets) in central Mediterranean and Western Europe

    NASA Astrophysics Data System (ADS)

    Cai, Jianqing; Grafarend, Erik W.

    2007-02-01

    In the deformation analysis with a 2-D (or planar and horizontal), symmetric rank-two deformation tensor in geosciences (geodesy, geophysics and geology), the eigenspace components of these random deformation tensors (principal components, principal directions) are of focal interest. With the new development of space-geodetic techniques, such as GPS, VLBI, SLR and DORIS, the components of deformation measures (such as the stress or strain tensor, etc.) can be estimated from their highly accurate regular measurement of positions and change rates and analysed by means of the proper statistical testing procedures. In this paper we begin with a review of the results of statistical inference of eigenspace components of the 2-D symmetric, rank-two random tensor ('random matrix'), that is, the best linear uniformly unbiased estimation (BLUUE) of the eigenspace elements and the best invariant quadratic uniformly unbiased estimate (BIQUUE) of its variance-covariance matrix. Then the geodynamic setting of the Earth, and especially of the selected investigated region (the central Mediterranean and Western Europe), is discussed. Thirdly, the ITRF sites are selected according to the history and quality of the ITRF realization series, and the related incremental velocities of selected ITRF sites are computed. Fourthly, the methods of derivation for the 2-D geodetic strain rates are introduced in order to obtain these strain rates from the incremental velocities. In the case study, both BLUUE and BIQUUE models as well as related hypothesis tests are applied to the eigenspace components of the 2-D strain rate tensor observations in the area of the central Mediterranean and Western Europe, as derived from the ITRF92 to ITRF2000 sequential station positions and velocities. The interpretation of these results and their comparison with the geodynamic features follow. Furthermore the statistical inference of the eigenspace components provides us with not only the confidence regions of

  3. Set points, settling points and some alternative models: theoretical options to understand how genes and environments combine to regulate body adiposity

    PubMed Central

    Speakman, John R.; Levitsky, David A.; Allison, David B.; Bray, Molly S.; de Castro, John M.; Clegg, Deborah J.; Clapham, John C.; Dulloo, Abdul G.; Gruer, Laurence; Haw, Sally; Hebebrand, Johannes; Hetherington, Marion M.; Higgs, Susanne; Jebb, Susan A.; Loos, Ruth J. F.; Luckman, Simon; Luke, Amy; Mohammed-Ali, Vidya; O’Rahilly, Stephen; Pereira, Mark; Perusse, Louis; Robinson, Tom N.; Rolls, Barbara; Symonds, Michael E.; Westerterp-Plantenga, Margriet S.

    2011-01-01

    The close correspondence between energy intake and expenditure over prolonged time periods, coupled with an apparent protection of the level of body adiposity in the face of perturbations of energy balance, has led to the idea that body fatness is regulated via mechanisms that control intake and energy expenditure. Two models have dominated the discussion of how this regulation might take place. The set point model is rooted in physiology, genetics and molecular biology, and suggests that there is an active feedback mechanism linking adipose tissue (stored energy) to intake and expenditure via a set point, presumably encoded in the brain. This model is consistent with many of the biological aspects of energy balance, but struggles to explain the many significant environmental and social influences on obesity, food intake and physical activity. More importantly, the set point model does not effectively explain the ‘obesity epidemic’ – the large increase in body weight and adiposity of a large proportion of individuals in many countries since the 1980s. An alternative model, called the settling point model, is based on the idea that there is passive feedback between the size of the body stores and aspects of expenditure. This model accommodates many of the social and environmental characteristics of energy balance, but struggles to explain some of the biological and genetic aspects. The shortcomings of these two models reflect their failure to address the gene-by-environment interactions that dominate the regulation of body weight. We discuss two additional models – the general intake model and the dual intervention point model – that address this issue and might offer better ways to understand how body fatness is controlled. PMID:22065844

  4. Rapid Detection and Statistical Differentiation of KPC Gene Variants in Gram-Negative Pathogens by Use of High-Resolution Melting and ScreenClust Analyses

    PubMed Central

    Roth, Amanda L.

    2013-01-01

    In the United States, the production of the Klebsiella pneumoniae carbapenemase (KPC) is an important mechanism of carbapenem resistance in Gram-negative pathogens. Infections with KPC-producing organisms are associated with increased morbidity and mortality; therefore, the rapid detection of KPC-producing pathogens is critical in patient care and infection control. We developed a real-time PCR assay complemented with traditional high-resolution melting (HRM) analysis, as well as statistically based genotyping, using the Rotor-Gene ScreenClust HRM software to both detect the presence of blaKPC and differentiate between KPC-2-like and KPC-3-like alleles. A total of 166 clinical isolates of Enterobacteriaceae, Pseudomonas aeruginosa, and Acinetobacter baumannii with various β-lactamase susceptibility patterns were tested in the validation of this assay; 66 of these organisms were known to produce the KPC β-lactamase. The real-time PCR assay was able to detect the presence of blaKPC in all 66 of these clinical isolates (100% sensitivity and specificity). HRM analysis demonstrated that 26 had KPC-2-like melting peak temperatures, while 40 had KPC-3-like melting peak temperatures. Sequencing of 21 amplified products confirmed the melting peak results, with 9 isolates carrying blaKPC-2 and 12 isolates carrying blaKPC-3. This PCR/HRM assay can identify KPC-producing Gram-negative pathogens in as little as 3 h after isolation of pure colonies and does not require post-PCR sample manipulation for HRM analysis, and ScreenClust analysis easily distinguishes blaKPC-2-like and blaKPC-3-like alleles. Therefore, this assay is a rapid method to identify the presence of blaKPC enzymes in Gram-negative pathogens that can be easily integrated into busy clinical microbiology laboratories. PMID:23077125
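
    The reported figures can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the count of 100 true negatives is an assumption inferred from the 166 isolates, the 66 known KPC producers, and the stated 100% specificity, not a number given in the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions between 0 and 1."""
    sensitivity = tp / (tp + fn)  # share of true producers that were detected
    specificity = tn / (tn + fp)  # share of non-producers that tested negative
    return sensitivity, specificity


# 66 known KPC producers, all detected (from the abstract).
# 100 non-producers, all negative (assumed from 166 - 66 and 100% specificity).
sens, spec = sensitivity_specificity(tp=66, fn=0, tn=100, fp=0)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```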

  5. Statistical databases

    SciTech Connect

    Kogalovskii, M.R.

    1995-03-01

    This paper presents a review of problems related to statistical database systems, which are widespread in various fields of activity. Statistical databases (SDB) are databases whose data are used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether present Database Management Systems (DBMS) satisfy the SDB requirements. Some current research directions in SDB systems are considered.

  6. Morbidity statistics

    PubMed Central

    Smith, Alwyn

    1969-01-01

    This paper is based on an analysis of questionnaires sent to the health ministries of Member States of WHO asking for information about the extent, nature, and scope of morbidity statistical information. It is clear that most countries collect some statistics of morbidity and many countries collect extensive data. However, few countries relate their collection to the needs of health administrators for information, and many countries collect statistics principally for publication in annual volumes which may appear anything up to 3 years after the year to which they refer. The desiderata of morbidity statistics may be summarized as reliability, representativeness, and relevance to current health problems. PMID:5306722

  7. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2008-01-01

    In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…

  8. UpSet: Visualization of Intersecting Sets

    PubMed Central

    Lex, Alexander; Gehlenborg, Nils; Strobelt, Hendrik; Vuillemot, Romain; Pfister, Hanspeter

    2016-01-01

    Understanding relationships between sets is an important analysis task that has received widespread attention in the visualization community. The major challenge in this context is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. In this paper we introduce UpSet, a novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections. UpSet is focused on creating task-driven aggregates, communicating the size and properties of aggregates and intersections, and a duality between the visualization of the elements in a dataset and their set membership. UpSet visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes. Sorting according to various measures enables a task-driven analysis of relevant intersections and aggregates. The elements represented in the sets and their associated attributes are visualized in a separate view. Queries based on containment in specific intersections, aggregates or driven by attribute filters are propagated between both views. We also introduce several advanced visual encodings and interaction methods to overcome the problems of varying scales and to address scalability. UpSet is web-based and open source. We demonstrate its general utility in multiple use cases from various domains. PMID:26356912
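
    The quantity laid out in a matrix of set intersections can be sketched in a few lines. This is not the UpSet implementation; it is a minimal illustration, with a hypothetical function name and toy data, of counting the elements that belong to exactly one combination of sets. With n sets there are up to 2^n - 1 non-empty combinations, which is the combinatorial explosion the abstract refers to.

```python
from collections import Counter


def exclusive_intersections(sets):
    """Map each combination of set names to the number of elements that
    belong to exactly those sets and to no others."""
    membership = {}
    for name, elements in sets.items():
        for element in elements:
            membership.setdefault(element, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Toy data (hypothetical): three small sets of integer element IDs.
example = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6}}
counts = exclusive_intersections(example)

# List combinations from largest to smallest, as a sorted matrix view would.
for combo, size in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(combo), size)
```

    Only combinations that actually contain elements are stored, so the cost grows with the number of elements rather than with 2^n.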

  9. The PR/SET Domain Zinc Finger Protein Prdm4 Regulates Gene Expression in Embryonic Stem Cells but Plays a Nonessential Role in the Developing Mouse Embryo

    PubMed Central

    Bogani, Debora; Morgan, Marc A. J.; Nelson, Andrew C.; Costello, Ita; McGouran, Joanna F.; Kessler, Benedikt M.

    2013-01-01

    Prdm4 is a highly conserved member of the Prdm family of PR/SET domain zinc finger proteins. Many well-studied Prdm family members play critical roles in development and display striking loss-of-function phenotypes. Prdm4 functional contributions have yet to be characterized. Here, we describe its widespread expression in the early embryo and adult tissues. We demonstrate that DNA binding is exclusively mediated by the Prdm4 zinc finger domain, and we characterize its tripartite consensus sequence via SELEX (systematic evolution of ligands by exponential enrichment) and ChIP-seq (chromatin immunoprecipitation-sequencing) experiments. In embryonic stem cells (ESCs), Prdm4 regulates key pluripotency and differentiation pathways. Two independent strategies, namely, targeted deletion of the zinc finger domain and generation of a EUCOMM LacZ reporter allele, resulted in functional null alleles. However, homozygous mutant embryos develop normally and adults are healthy and fertile. Collectively, these results strongly suggest that Prdm4 functions redundantly with other transcriptional partners to cooperatively regulate gene expression in the embryo and adult animal. PMID:23918801

  10. The PR/SET domain zinc finger protein Prdm4 regulates gene expression in embryonic stem cells but plays a nonessential role in the developing mouse embryo.

    PubMed

    Bogani, Debora; Morgan, Marc A J; Nelson, Andrew C; Costello, Ita; McGouran, Joanna F; Kessler, Benedikt M; Robertson, Elizabeth J; Bikoff, Elizabeth K

    2013-10-01

    Prdm4 is a highly conserved member of the Prdm family of PR/SET domain zinc finger proteins. Many well-studied Prdm family members play critical roles in development and display striking loss-of-function phenotypes. Prdm4 functional contributions have yet to be characterized. Here, we describe its widespread expression in the early embryo and adult tissues. We demonstrate that DNA binding is exclusively mediated by the Prdm4 zinc finger domain, and we characterize its tripartite consensus sequence via SELEX (systematic evolution of ligands by exponential enrichment) and ChIP-seq (chromatin immunoprecipitation-sequencing) experiments. In embryonic stem cells (ESCs), Prdm4 regulates key pluripotency and differentiation pathways. Two independent strategies, namely, targeted deletion of the zinc finger domain and generation of a EUCOMM LacZ reporter allele, resulted in functional null alleles. However, homozygous mutant embryos develop normally and adults are healthy and fertile. Collectively, these results strongly suggest that Prdm4 functions redundantly with other transcriptional partners to cooperatively regulate gene expression in the embryo and adult animal. PMID:23918801

  11. The AHL- and BDSF-Dependent Quorum Sensing Systems Control Specific and Overlapping Sets of Genes in Burkholderia cenocepacia H111

    PubMed Central

    Aguilar, Claudio; Carlier, Aurelien L.; Grunau, Alexander; Omasits, Ulrich; Zhang, Lian-Hui; Ahrens, Christian H.; Eberl, Leo

    2012-01-01

    Quorum sensing in Burkholderia cenocepacia H111 involves two signalling systems that depend on different signal molecules, namely N-acyl homoserine lactones (AHLs) and the diffusible signal factor cis-2-dodecenoic acid (BDSF). Previous studies have shown that AHLs and BDSF control similar phenotypic traits, including biofilm formation, proteolytic activity and pathogenicity. In this study we mapped the BDSF stimulon by RNA-Seq and shotgun proteomics analysis. We demonstrate that a set of the identified BDSF-regulated genes or proteins are also controlled by AHLs, suggesting that the two regulons partially overlap. The detailed analysis of two mutually regulated operons, one encoding three lectins and the other one encoding the large surface protein BapA and its type I secretion machinery, revealed that both AHLs and BDSF are required for full expression, suggesting that the two signalling systems operate in parallel. In accordance with this, we show that both AHLs and BDSF are required for biofilm formation and protease production. PMID:23185499

  12. Statistics: A Brief Overview

    PubMed Central

    Winters, Ryan; Winters, Andrew; Amedee, Ronald G.

    2010-01-01

    The Accreditation Council for Graduate Medical Education sets forth a number of required educational topics that must be addressed in residency and fellowship programs. We sought to provide a primer on some of the important basic statistical concepts to consider when examining the medical literature. It is not essential to understand the exact workings and methodology of every statistical test encountered, but it is necessary to understand selected concepts such as parametric and nonparametric tests, correlation, and numerical versus categorical data. This working knowledge will allow you to spot obvious irregularities in statistical analyses that you encounter. PMID:21603381

  13. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  14. SEER Statistics

    Cancer.gov

    The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population.

  15. Cancer Statistics

    MedlinePlus

    ... cancer statistics across the world. U.S. Cancer Mortality Trends The best indicator of progress against cancer is ... the number of cancer survivors has increased. These trends show that progress is being made against the ...

  16. Statistical Physics

    NASA Astrophysics Data System (ADS)

    Hermann, Claudine

    Statistical Physics bridges the properties of a macroscopic system and the microscopic behavior of its constituting particles, otherwise impossible due to the giant magnitude of Avogadro's number. Numerous systems of today's key technologies - such as semiconductors or lasers - are macroscopic quantum objects; only statistical physics allows for understanding their fundamentals. Therefore, this graduate text also focuses on particular applications such as the properties of electrons in solids with applications, and radiation thermodynamics and the greenhouse effect.

  17. The set1Delta mutation unveils a novel signaling pathway relayed by the Rad53-dependent hyperphosphorylation of replication protein A that leads to transcriptional activation of repair genes.

    PubMed

    Schramke, V; Neecke, H; Brevet, V; Corda, Y; Lucchini, G; Longhese, M P; Gilson, E; Géli, V

    2001-07-15

    SET domain proteins are present in chromosomal proteins involved in epigenetic control of transcription. The yeast SET domain protein Set1p regulates chromatin structure, DNA repair, and telomeric functions. We investigated the mechanism by which the absence of Set1p increases DNA repair capacities of checkpoint mutants. We show that deletion of SET1 induces a response relayed by the signaling kinase Rad53p that leads to the MEC1/TEL1-independent hyperphosphorylation of replication protein A middle subunit (Rfa2p). Consequently, the binding of Rfa2p to upstream repressing sequences (URS) of repair genes is decreased, thereby leading to their derepression. Our results correlate the set1Delta-dependent phosphorylation of Rfa2p with the transcriptional induction of repair genes. Moreover, we show that the deletion of the amino-terminal region of Rfa2p suppresses the sensitivity to ultraviolet radiation of a mec3Delta checkpoint mutant, abolishes the URS-mediated repression, and increases the expression of repair genes. This work provides an additional link for the role of Rfa2p in the regulation of the repair capacity of the cell, reveals a role for the phosphorylation of Rfa2p, and unveils unsuspected connections between chromatin, signaling pathways, telomeres, and DNA repair. PMID:11459833

  18. Statistical Physics of Particles

    NASA Astrophysics Data System (ADS)

    Kardar, Mehran

    2006-06-01

    Statistical physics has its origins in attempts to describe the thermal properties of matter in terms of its constituent particles, and has played a fundamental role in the development of quantum mechanics. Based on lectures for a course in statistical mechanics taught by Professor Kardar at Massachusetts Institute of Technology, this textbook introduces the central concepts and tools of statistical physics. It contains a chapter on probability and related issues such as the central limit theorem and information theory, and covers interacting particles, with an extensive description of the van der Waals equation and its derivation by mean field approximation. It also contains an integrated set of problems, with solutions to selected problems at the end of the book. It will be invaluable for graduate and advanced undergraduate courses in statistical physics. A complete set of solutions is available to lecturers on a password protected website at www.cambridge.org/9780521873420. Based on lecture notes from a course on Statistical Mechanics taught by the author at MIT. Contains 89 exercises, with solutions to selected problems. Contains chapters on probability and interacting particles. Ideal for graduate courses in Statistical Mechanics.

  19. Statistical Physics of Fields

    NASA Astrophysics Data System (ADS)

    Kardar, Mehran

    2006-06-01

    While many scientists are familiar with fractals, fewer are familiar with the concepts of scale-invariance and universality which underlie the ubiquity of their shapes. These properties may emerge from the collective behaviour of simple fundamental constituents, and are studied using statistical field theories. Based on lectures for a course in statistical mechanics taught by Professor Kardar at Massachusetts Institute of Technology, this textbook demonstrates how such theories are formulated and studied. Perturbation theory, exact solutions, renormalization groups, and other tools are employed to demonstrate the emergence of scale invariance and universality, and the non-equilibrium dynamics of interfaces and directed paths in random media are discussed. Ideal for advanced graduate courses in statistical physics, it contains an integrated set of problems, with solutions to selected problems at the end of the book. A complete set of solutions is available to lecturers on a password protected website at www.cambridge.org/9780521873413. Based on lecture notes from a course on Statistical Mechanics taught by the author at MIT. Contains 65 exercises, with solutions to selected problems. Features a thorough introduction to the methods of Statistical Field theory. Ideal for graduate courses in Statistical Physics.

  20. Statistical optics

    NASA Astrophysics Data System (ADS)

    Goodman, J. W.

    This book is based on the thesis that some training in the area of statistical optics should be included as a standard part of any advanced optics curriculum. Random variables are discussed, taking into account definitions of probability and random variables, distribution functions and density functions, an extension to two or more random variables, statistical averages, transformations of random variables, sums of real random variables, Gaussian random variables, complex-valued random variables, and random phasor sums. Other subjects examined are related to random processes, some first-order properties of light waves, the coherence of optical waves, some problems involving high-order coherence, effects of partial coherence on imaging systems, imaging in the presence of randomly inhomogeneous media, and fundamental limits in photoelectric detection of light. Attention is given to deterministic versus statistical phenomena and models, the Fourier transform, and the fourth-order moment of the spectrum of a detected speckle image.

  1. Transcriptome analysis of various flower and silique development stages indicates a set of class III peroxidase genes potentially involved in pod shattering in Arabidopsis thaliana

    PubMed Central

    2010-01-01

    Background: Plant class III peroxidases exist as a large multigenic family involved in numerous functions, suggesting a functional specialization of each gene. However, few genes have been linked with a specific function. Consequently total peroxidase activity is still used in numerous studies although its relevance is questionable. Transcriptome analysis seems to be a promising tool to overcome the difficulties associated with the study of this family. Nevertheless available microarrays are not completely reliable for this purpose. We therefore used a macroarray dedicated to the 73 class III peroxidase genes of A. thaliana to identify genes potentially involved in flower and fruit development. Results: The observed increase of total peroxidase activity during development was actually correlated with the induction of only a few class III peroxidase genes, which supports the existence of a functional specialization of these proteins. We identified peroxidase genes that are predominantly expressed in one development stage and are probable components of the complex gene networks involved in the reproductive phase. An attempt has been made to gain insight into plausible functions of these genes by collecting and analyzing the expression data of different studies in plants. Peroxidase activity was additionally observed in situ in the silique dehiscence zone known to be involved in pod shattering. Because treatment with a peroxidase inhibitor delayed pod shattering, we subsequently studied mutants of transcription factors (TF) controlling this mechanism. Three peroxidase genes (AtPrx13, AtPrx30 and AtPrx55) were altered by the TFs involved in pod shatter. Conclusions: Our data illustrated the problems caused by linking only an increase in total peroxidase activity to any specific development stage or function. The activity or involvement of specific class III peroxidase genes needs to be assessed. Several genes identified in our study had not been linked to any particular

  2. Analyses of synovial tissues from arthritic and protected congenic rat strains reveal a new core set of genes associated with disease severity

    PubMed Central

    Brenner, Max; Laragione, Teresina

    2013-01-01

    Little is known about the genes regulating disease severity and joint damage in rheumatoid arthritis (RA). In the present study we analyzed the gene expression characteristics of synovial tissues from four different strains congenic for non-MHC loci that develop mild and nonerosive arthritis compared with severe and erosive DA rats. DA.F344(Cia3d), DA.F344(Cia5a), DA.ACI(Cia10), and DA.ACI(Cia25) rats developed mild arthritis compared with DA. We found 685 genes with significantly different expression between congenics and DA, independent of the specific congenic interval, suggesting that these genes represent a new nongenetic core group of mediators of arthritis severity. This core group includes genes not previously implicated or with unclear role in arthritis severity, such as Tnn, Clec4m, and Spond1 among others, increased in DA. The core genes also included Scd1, Selenbp1, and Slc7a10, increased in congenics. Genes implicated in nuclear receptor activity, xenobiotic and lipid metabolism were also increased in the congenics, correlating with protection. Several disease mediators were among the core genes reduced in congenics, including IL-6, IL-17, and Ccl2. Analyses of upstream regulators (genes, pathways, or chemicals) suggested reduced activation of Stat3 and TLR-related genes and chemicals in congenics. Additionally, cigarette smoking was among the upstream regulators activated in DA, while p53 was an upstream regulator activated in congenics. We observed congenic-specific differential expression and detection in each individual strain. In conclusion, these new nongenetically regulated core genes of disease severity or protection in arthritis should provide new insight into critical pathways and potential new environmental risk factors for arthritis. PMID:24046282

  3. Statistics Revelations

    ERIC Educational Resources Information Center

    Chicot, Katie; Holmes, Hilary

    2012-01-01

    The use, and misuse, of statistics is commonplace, yet in the printed format data representations can be either over simplified, supposedly for impact, or so complex as to lead to boredom, supposedly for completeness and accuracy. In this article the link to the video clip shows how dynamic visual representations can enliven and enhance the…

  4. Statistical Fun

    ERIC Educational Resources Information Center

    Catley, Alan

    2007-01-01

    Following the announcement last year that there will be no more math coursework assessment at General Certificate of Secondary Education (GCSE), teachers will in the future be able to devote more time to preparing learners for formal examinations. One of the key things that the author has learned when teaching statistics is that it makes for far…

  5. The KM-Algorithm Identifies Regulated Genes in Time Series Expression Data

    PubMed Central

    Bremer, Martina; Doerge, R. W.

    2009-01-01

    We present a statistical method to rank observed genes in gene expression time series experiments according to their degree of regulation in a biological process. The ranking may be used to focus on specific genes or to select meaningful subsets of genes from which gene regulatory networks can be built. Our approach is based on a state space model that incorporates hidden regulators of gene expression. Kalman (K) smoothing and maximum (M) likelihood estimation techniques are used to derive optimal estimates of the model parameters upon which a proposed regulation criterion is based. The statistical power of the proposed algorithm is investigated, and a real data set is analyzed for the purpose of identifying regulated genes in time dependent gene expression data. This statistical approach supports the concept that meaningful biological conclusions can be drawn from gene expression time series experiments by focusing on strong regulation rather than large expression values. PMID:19956417

  6. Characteristic features of the nucleotide sequences of yeast mitochondrial ribosomal protein genes as analyzed by computer program GeneMark.

    PubMed

    Isono, K; McIninch, J D; Borodovsky, M

    1994-01-01

    The nucleotide sequence data for yeast mitochondrial ribosomal protein (MRP) genes were analyzed by the computer program GeneMark, which predicts the presence of likely genes in sequence data by calculating statistical biases in the appearance of consecutive nucleotides. The program uses a set of standard sequence data for this calculation. We used this program for the analysis of yeast nucleotide sequence data containing MRP genes, hoping to obtain information as to whether they share features in common that are different from other yeast genes. Sequence data sets for ordinary yeast genes and for 27 known MRP genes were used. The MRP genes were nicely predicted as likely genes regardless of the data sets used, whereas other yeast genes were predicted to be likely genes only when the data set for ordinary yeast genes was used. The assembled sequence data for chromosomes II, III, VIII and XI as well as the segmented data for chromosome V were analyzed in a similar manner. In addition to the known MRP genes, eleven ORFs were predicted to be likely MRP genes. Thus, the method seems very powerful in analyzing genes of heterologous origins. PMID:7719921
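
    The idea of scoring a sequence by the statistics of consecutive nucleotides, learned from a reference data set, can be illustrated with a much simpler model than the one GeneMark uses. The sketch below assumes a homogeneous first-order Markov chain with add-one smoothing; all function names and the toy training sequences are hypothetical, and input is assumed to be uppercase A, C, G, T only.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | previous)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, row in counts.items():
        total = sum(row.values())
        model[prev] = {nxt: c / total for nxt, c in row.items()}
    return model


def log_likelihood(seq, model):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))


def log_odds(seq, model_a, model_b):
    """Positive when seq looks more like training set A than training set B."""
    return log_likelihood(seq, model_a) - log_likelihood(seq, model_b)


# Toy training sets (hypothetical), standing in for two reference data sets.
model_a = train_markov(["ACGACGACG"])
model_b = train_markov(["AAAATTTT"])
print(log_odds("ACGACG", model_a, model_b) > 0)
```

    Training one model per reference set and comparing scores mirrors, in spirit, the comparison between the ordinary-gene and MRP-gene data sets described above.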

  7. Gene prediction and gene classes in Arabidopsis thaliana.

    PubMed

    Mathé, C; Déhais, P; Pavy, N; Rombauts, S; Van Montagu, M; Rouzé, P

    2000-03-31

    Gene prediction methods for eukaryotic genomes are still not fully satisfactory. One way to improve gene prediction accuracy, proven to be relevant for prokaryotes, is to consider more than one model of genes. Thus, we used our classification of Arabidopsis thaliana genes in two classes (CU(1) and CU(2)), previously delineated according to statistical features, in the GeneMark gene identification program. For each gene class, as well as for the two classes combined, a Markov model was developed (respectively, GM-CU(1), GM-CU(2) and GM-all) and then used on a test set of 168 genes to compare their respective efficiency. We concluded from this analysis that GM-CU(1) is more sensitive than GM-CU(2), which seems to be more specific to a gene type. Besides, GM-all does not give better results than GM-CU(1), and combining results from GM-CU(1) and GM-CU(2) greatly improves prediction efficiency in comparison with predictions made with GM-all only. Thus, this work confirms the necessity to consider more than one gene model for gene prediction in eukaryotic genomes, and to look for gene classes in order to build these models. PMID:10751690

  8. Genome-wide transcriptional analysis of grapevine berry ripening reveals a set of genes similarly modulated during three seasons and the occurrence of an oxidative burst at vèraison

    PubMed Central

    Pilati, Stefania; Perazzolli, Michele; Malossini, Andrea; Cestaro, Alessandro; Demattè, Lorenzo; Fontana, Paolo; Dal Ri, Antonio; Viola, Roberto; Velasco, Riccardo; Moser, Claudio

    2007-01-01

    Background: Grapevine (Vitis species) is among the most important fruit crops in terms of cultivated area and economic impact. Despite this relevance, little is known about the transcriptional changes and the regulatory circuits underlying the biochemical and physical changes occurring during berry development. Results: Fruit ripening in the non-climacteric crop species Vitis vinifera L. has been investigated at the transcriptional level by the use of the Affymetrix Vitis GeneChip®, which contains approximately 14,500 unigenes. Gene expression data obtained from berries sampled before and after véraison in three growing years were analyzed to identify genes specifically involved in fruit ripening and to investigate seasonal influences on the process. From these analyses a core set of 1477 genes was found which was similarly modulated in all seasons. We were able to separate ripening specific isoforms within gene families and to identify ripening related genes which appeared strongly regulated also by the seasonal weather conditions. Transcript annotation by Gene Ontology vocabulary revealed five overrepresented functional categories, of which cell wall organization and biogenesis, carbohydrate and secondary metabolisms and stress response were specifically induced during the ripening phase, while photosynthesis was strongly repressed. About 19% of the core gene set was characterized by genes involved in regulatory processes, such as transcription factors and transcripts related to hormonal metabolism and signal transduction. Auxin, ethylene and light emerged as the main stimuli influencing berry development. In addition, an oxidative burst, previously not detected in grapevine, characterized by rapid accumulation of H2O2 starting from véraison and by the modulation of many ROS scavenging enzymes, was observed. Conclusion: The time-course gene expression analysis of grapevine berry development has identified the occurrence of two well distinct phases along the

  9. AGGrEGATOr: A Gene-based GEne-Gene interActTiOn test for case-control association studies.

    PubMed

    Emily, Mathieu

    2016-04-01

    Among the large number of statistical methods that have been proposed to identify gene-gene interactions in case-control genome-wide association studies (GWAS), gene-based methods have recently grown in popularity as they confer advantage in both statistical power and biological interpretation. All of the gene-based methods jointly model the distribution of single nucleotide polymorphism (SNP) sets prior to the statistical test, leading to a limited power to detect sums of SNP-SNP signals. In this paper, we instead propose a gene-based method that first performs SNP-SNP interaction tests before aggregating the obtained p-values into a test at the gene level. Our method called AGGrEGATOr is based on a minP procedure that tests the significance of the minimum of a set of p-values. We use simulations to assess the capacity of AGGrEGATOr to correctly control for type-I error. The benefits of our approach in terms of statistical power and robustness to SNP set characteristics are evaluated in a wide range of disease models by comparing it to previous methods. We also apply our method to detect gene pairs associated with rheumatoid arthritis (RA) on the GSE39428 dataset. We identify 13 potential gene-gene interactions and replicate one gene pair in the Wellcome Trust Case Control Consortium dataset at the level of 5%. We further test 15 gene pairs, previously reported as being statistically associated with RA or Crohn's disease (CD) or coronary artery disease (CAD), for replication in the Wellcome Trust Case Control Consortium dataset. We show that AGGrEGATOr is the only method able to successfully replicate seven gene pairs. PMID:26913459
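
    The minP idea in this abstract (test the minimum of a set of p-values) can be illustrated with a small sketch. This is not the AGGrEGATOr implementation: it assumes, purely for illustration, that the null distribution of the minimum is obtained by permuting case-control labels, and the function name `minp_gene_level` and all numbers are hypothetical.

```python
from typing import Sequence


def minp_gene_level(observed: Sequence[float],
                    permuted: Sequence[Sequence[float]]) -> float:
    """Gene-level p-value for the minimum of a set of SNP-SNP p-values.

    observed: p-values from the real data, one per SNP pair.
    permuted: one row of p-values per label permutation (assumed null draws).
    """
    if not observed or not permuted:
        raise ValueError("need observed p-values and at least one permutation")
    obs_min = min(observed)
    # Count permutations whose smallest p-value is at least as extreme.
    hits = sum(1 for row in permuted if min(row) <= obs_min)
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + hits) / (1 + len(permuted))


# Hypothetical SNP-SNP interaction p-values for one gene pair.
observed = [0.20, 0.003, 0.47, 0.09]
permuted = [
    [0.51, 0.12, 0.88, 0.34],
    [0.02, 0.76, 0.41, 0.65],
    [0.93, 0.27, 0.002, 0.58],
    [0.44, 0.31, 0.19, 0.72],
]
gene_p = minp_gene_level(observed, permuted)
```

    With only four permutations the estimate is very coarse; a real analysis would use many more, or an analytic approximation to the distribution of the minimum.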

  10. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    PubMed Central

    Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

    2009-01-01

    Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an

  11. Statistical Optics

    NASA Astrophysics Data System (ADS)

    Goodman, Joseph W.

    2000-07-01

    The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Robert G. Bartle The Elements of Integration and Lebesgue Measure George E. P. Box & Norman R. Draper Evolutionary Operation: A Statistical Method for Process Improvement George E. P. Box & George C. Tiao Bayesian Inference in Statistical Analysis R. W. Carter Finite Groups of Lie Type: Conjugacy Classes and Complex Characters R. W. Carter Simple Groups of Lie Type William G. Cochran & Gertrude M. Cox Experimental Designs, Second Edition Richard Courant Differential and Integral Calculus, Volume I Richard Courant Differential and Integral Calculus, Volume II Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume I Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume II D. R. Cox Planning of Experiments Harold S. M. Coxeter Introduction to Geometry, Second Edition Charles W. Curtis & Irving Reiner Representation Theory of Finite Groups and Associative Algebras Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II Cuthbert Daniel Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition Bruno de Finetti Theory of Probability, Volume I Bruno de Finetti Theory of Probability, Volume 2 W. Edwards Deming Sample Design in Business Research

  12. Setting Environmental Standards

    ERIC Educational Resources Information Center

    Fishbein, Gershon

    1975-01-01

    Recent court decisions have pointed out the complexities involved in setting environmental standards. Environmental health is composed of multiple causative agents, most of which work over long periods of time. This makes the cause-and-effect relationship between health statistics and environmental contaminant exposures difficult to prove in…

  13. Statistical and biological gene-lifestyle interactions of MC4R and FTO with diet and physical activity on obesity: new effects on alcohol consumption

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Fat mass and obesity (FTO) and melanocortin-4 receptor (MC4R) are relevant genes associated with obesity. This could be through food intake, but results are contradictory. Modulation by diet or other lifestyle factors is also not well understood. To investigate whether MC4R and FTO associations ...

  14. A distinct p53 target gene set predicts for response to the selective p53–HDM2 inhibitor NVP-CGM097

    PubMed Central

    Jeay, Sébastien; Gaulis, Swann; Ferretti, Stéphane; Bitter, Hans; Ito, Moriko; Valat, Thérèse; Murakami, Masato; Ruetz, Stephan; Guthy, Daniel A; Rynn, Caroline; Jensen, Michael R; Wiesmann, Marion; Kallen, Joerg; Furet, Pascal; Gessier, François; Holzer, Philipp; Masuya, Keiichi; Würthner, Jens; Halilovic, Ensar; Hofmann, Francesco; Sellers, William R; Graus Porta, Diana

    2015-01-01

    Biomarkers for patient selection are essential for the successful and rapid development of emerging targeted anti-cancer therapeutics. In this study, we report the discovery of a novel patient selection strategy for the p53–HDM2 inhibitor NVP-CGM097, currently under evaluation in clinical trials. By intersecting high-throughput cell line sensitivity data with genomic data, we have identified a gene expression signature consisting of 13 up-regulated genes that predicts for sensitivity to NVP-CGM097 in both cell lines and in patient-derived tumor xenograft models. Interestingly, these 13 genes are known p53 downstream target genes, suggesting that the identified gene signature reflects the presence of at least a partially activated p53 pathway in NVP-CGM097-sensitive tumors. Together, our findings provide evidence for the use of this newly identified predictive gene signature to refine the selection of patients with wild-type p53 tumors and increase the likelihood of response to treatment with p53–HDM2 inhibitors, such as NVP-CGM097. DOI: http://dx.doi.org/10.7554/eLife.06498.001 PMID:25965177

  15. [Statistical materials].

    PubMed

    1986-01-01

    Official population data for the USSR are presented for 1985 and 1986. Part 1 (pp. 65-72) contains data on capitals of union republics and cities with over one million inhabitants, including population estimates for 1986 and vital statistics for 1985. Part 2 (p. 72) presents population estimates by sex and union republic, 1986. Part 3 (pp. 73-6) presents data on population growth, including birth, death, and natural increase rates, 1984-1985; seasonal distribution of births and deaths; birth order; age-specific birth rates in urban and rural areas and by union republic; marriages; age at marriage; and divorces. PMID:12178831

  16. Representational Versatility in Learning Statistics

    ERIC Educational Resources Information Center

    Graham, Alan T.; Thomas, Michael O. J.

    2005-01-01

    Statistical data can be represented in a number of qualitatively different ways, the choice depending on the following three conditions: the concepts to be investigated; the nature of the data; and the purpose for which they were collected. This paper begins by setting out frameworks that describe the nature of statistical thinking in schools, and…

  17. Text Sets.

    ERIC Educational Resources Information Center

    Giorgis, Cyndi; Johnson, Nancy J.

    2002-01-01

    Presents annotations of approximately 30 titles grouped in text sets. Defines a text set as five to ten books on a particular topic or theme. Discusses books on the following topics: living creatures; pirates; physical appearance; natural disasters; and the Irish potato famine. (SG)

  18. Scan statistic-based analysis of exome sequencing data identifies FAN1 at 15q13.3 as a susceptibility gene for schizophrenia and autism

    PubMed Central

    Ionita-Laza, Iuliana; Xu, Bin; Makarov, Vlad; Buxbaum, Joseph D.; Roos, J. Louw; Gogos, Joseph A.; Karayiorgou, Maria

    2014-01-01

    We used a family-based cluster detection approach designed to localize significant rare disease–risk variants clusters within a region of interest to systematically search for schizophrenia (SCZ) susceptibility genes within 49 genomic loci previously implicated by de novo copy number variants. Using two independent whole-exome sequencing family datasets and a follow-up autism spectrum disorder (ASD) case/control whole-exome sequencing dataset, we identified variants in one gene, Fanconi-associated nuclease 1 (FAN1), as being associated with both SCZ and ASD. FAN1 is located in a region on chromosome 15q13.3 implicated by a recurrent copy number variant, which predisposes to an array of psychiatric and neurodevelopmental phenotypes. In both SCZ and ASD datasets, rare nonsynonymous risk variants cluster significantly in affected individuals within a 20-kb window that spans several key functional domains of the gene. Our finding suggests that FAN1 is a key driver in the 15q13.3 locus for the associated psychiatric and neurodevelopmental phenotypes. FAN1 encodes a DNA repair enzyme, thus implicating abnormalities in DNA repair in the susceptibility to SCZ or ASD. PMID:24344280
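
    The windowing step behind a scan statistic (finding where variants cluster within a fixed-width window, here 20 kb) can be sketched as follows. This shows only the densest-window search; the family-based design and the significance assessment described in the abstract are not reproduced. The function name `densest_window` and the coordinates are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def densest_window(positions: Iterable[int],
                   window: int) -> Tuple[int, Optional[int]]:
    """Return (max count, start position) over half-open windows
    [start, start + window) anchored at observed variant positions."""
    pos = sorted(positions)
    best_count, best_start = 0, None
    left = 0
    for right, p in enumerate(pos):
        # Shrink from the left until the window spans less than `window` bp.
        while p - pos[left] >= window:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_count, best_start = count, pos[left]
    return best_count, best_start


# Hypothetical rare-variant coordinates (bp) within one gene.
variant_positions = [100, 5000, 12000, 19000, 60000, 61000]
count, start = densest_window(variant_positions, window=20_000)
```

    In a case/control setting one would typically compare such window counts between affected and unaffected individuals and calibrate the maximum by permutation, which is left out of this sketch.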

  19. Allele frequency-based and polymorphism-versus-divergence indices of balancing selection in a new filtered set of polymorphic genes in Plasmodium falciparum.

    PubMed

    Ochola, Lynette Isabella; Tetteh, Kevin K A; Stewart, Lindsay B; Riitho, Victor; Marsh, Kevin; Conway, David J

    2010-10-01

    Signatures of balancing selection operating on specific gene loci in endemic pathogens can identify candidate targets of naturally acquired immunity. In malaria parasites, several leading vaccine candidates convincingly show such signatures when subjected to several tests of neutrality, but the discovery of new targets affected by selection to a similar extent has been slow. A small minority of all genes are under such selection, as indicated by a recent study of 26 Plasmodium falciparum merozoite-stage genes that were not previously prioritized as vaccine candidates, of which only one (locus PF10_0348) showed a strong signature. Therefore, to focus discovery efforts on genes that are polymorphic, we scanned all available shotgun genome sequence data from laboratory lines of P. falciparum and chose six loci with more than five single nucleotide polymorphisms per kilobase (including PF10_0348) for in-depth frequency-based analyses in a Kenyan population (allele sample sizes >50 for each locus) and comparison of Hudson-Kreitman-Aguade (HKA) ratios of population diversity (π) to interspecific divergence (K) from the chimpanzee parasite Plasmodium reichenowi. Three of these (the msp3/6-like genes PF10_0348 and PF10_0355 and the surf(4.1) gene PFD1160w) showed exceptionally high positive values of Tajima's D and Fu and Li's F indices and have the highest HKA ratios, indicating that they are under balancing selection and should be prioritized for studies of their protein products as candidate targets of immunity. Combined with earlier results, there is now strong evidence that high HKA ratio (as well as the frequency-independent ratio of Watterson's θ/K) is predictive of high values of Tajima's D. Thus, the former offers value for use in genome-wide screening when numbers of genome sequences within a species are low or in combination with Tajima's D as a 2D test on large population genomic samples. PMID:20457586
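
    Tajima's D, one of the frequency-based indices named above, compares nucleotide diversity π with the number of segregating sites S scaled by a harmonic sum. The sketch below uses the standard published formula; the function name `tajimas_d` and the toy alignments are hypothetical and are not data from the study.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def tajimas_d(seqs: Sequence[str]) -> float:
    """Tajima's D for a set of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero below that)")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: alignment columns showing more than one allele.
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites")

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(s, t))
             for s, t in combinations(seqs, 2)) / n_pairs

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments: intermediate-frequency alleles versus singletons only.
balanced = ["AA", "AA", "TT", "TT"]
singletons = ["AA", "TA", "AT", "AA"]
d_balanced = tajimas_d(balanced)
d_singletons = tajimas_d(singletons)
```

    Alleles held at intermediate frequency inflate π relative to S and push D positive, which is the pattern the abstract associates with balancing selection; an excess of singletons pushes D negative.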

  20. 1979 DOE statistical symposium

    SciTech Connect

    Gardiner, D.A.; Truett, T.

    1980-09-01

    The 1979 DOE Statistical Symposium was the fifth in the series of annual symposia designed to bring together statisticians and other interested parties who are actively engaged in helping to solve the nation's energy problems. The program included presentations of technical papers centered around exploration and disposal of nuclear fuel, general energy-related topics, and health-related issues, and workshops on model evaluation, risk analysis, analysis of large data sets, and resource estimation.

  1. Comparative Functional Genomic Analysis Identifies Distinct and Overlapping Sets of Genes Required for Resistance to Monomethylarsonous Acid (MMAIII) and Arsenite (AsIII) in Yeast

    PubMed Central

    Jo, William J.; Loguinov, Alex; Wintz, Henri; Chang, Michelle; Smith, Allan H.; Kalman, Dave; Zhang, Luoping; Smith, Martyn T.; Vulpe, Chris D.

    2009-01-01

    Arsenic is a human toxin and carcinogen commonly found as a contaminant in drinking water. Arsenite (AsIII) is the most toxic inorganic form, but recent evidence indicates that the metabolite monomethylarsonous acid (MMAIII) is even more toxic. We have used a chemical genomics approach to identify the genes that modulate the cellular toxicity of MMAIII and AsIII in the yeast Saccharomyces cerevisiae. Functional profiling using homozygous deletion mutants provided evidence of the requirement of highly conserved biological processes in the response against both arsenicals including tubulin folding, DNA double-strand break repair, and chromatin modification. At the equitoxic doses of 150 μM MMAIII and 300 μM AsIII, genes related to glutathione metabolism were essential only for resistance to the former, suggesting a higher potency of MMAIII to disrupt glutathione metabolism than AsIII. Treatments with MMAIII induced a significant increase in glutathione levels in the wild-type strain, which correlated to the requirement of genes from the sulfur and methionine metabolic pathways and was consistent with the induction of oxidative stress. Based on the relative sensitivity of deletion strains deficient in GSH metabolism and tubulin folding processes, oxidative stress appeared to be the primary mechanism of MMAIII toxicity, whereas it appeared secondary to tubulin disruption in the case of AsIII. Many of the identified yeast genes have orthologs in humans that could potentially modulate arsenic toxicity in a similar manner as their yeast counterparts. PMID:19635755

  2. Statistical Parsimony Networks and Species Assemblages in Cephalotrichid Nemerteans (Nemertea)

    PubMed Central

    Chen, Haixia; Strand, Malin; Norenburg, Jon L.; Sun, Shichun; Kajihara, Hiroshi; Chernyshev, Alexey V.; Maslakova, Svetlana A.; Sundberg, Per

    2010-01-01

    Background It has been suggested that statistical parsimony network analysis could be used to get an indication of species represented in a set of nucleotide data, and the approach has been used to discuss species boundaries in some taxa. Methodology/Principal Findings Based on 635 base pairs of the mitochondrial protein-coding gene cytochrome c oxidase I (COI), we analyzed 152 nemertean specimens using statistical parsimony network analysis with the connection probability set to 95%. The analysis revealed 15 distinct networks together with seven singletons. Statistical parsimony yielded three networks supporting the species status of Cephalothrix rufifrons, C. major and C. spiralis as they currently have been delineated by morphological characters and geographical location. Many other networks contained haplotypes from nearby geographical locations. Cladistic structure by maximum likelihood analysis overall supported the network analysis, but indicated a false positive result where subnetworks should have been connected into one network/species. This probably is caused by undersampling of the intraspecific haplotype diversity. Conclusions/Significance Statistical parsimony network analysis provides a rapid and useful tool for detecting possible undescribed/cryptic species among cephalotrichid nemerteans based on COI gene. It should be combined with phylogenetic analysis to get indications of false positive results, i.e., subnetworks that would have been connected with more extensive haplotype sampling. PMID:20877627

  3. Statistical Methods in Cosmology

    NASA Astrophysics Data System (ADS)

    Verde, L.

    2010-03-01

    The advent of large data sets in cosmology has meant that in the past 10 or 20 years our knowledge and understanding of the Universe has changed not only quantitatively but also, and most importantly, qualitatively. Cosmologists rely on data where a host of useful information is enclosed, but is encoded in a non-trivial way. The challenges in extracting this information must be overcome to make the most of a large experimental effort. Even after having converged to a standard cosmological model (the LCDM model) we should keep in mind that this model is described by 10 or more physical parameters and if we want to study deviations from it, the number of parameters is even larger. Dealing with such a high dimensional parameter space and finding parameter constraints is a challenge in itself. Cosmologists want to be able to compare and combine different data sets both for testing for possible disagreements (which could indicate new physics) and for improving parameter determinations. Finally, cosmologists in many cases want to find out, before actually doing the experiment, how much one would be able to learn from it. For all these reasons, sophisticated statistical techniques are being employed in cosmology, and it has become crucial to know some statistical background to understand recent literature in the field. I will introduce some statistical tools that any cosmologist should know about in order to be able to understand recently published results from the analysis of cosmological data sets. I will not present a complete and rigorous introduction to statistics as there are several good books which are reported in the references. The reader should refer to those.

  4. Identification of the set of genes, including nonannotated morA, under the direct control of ModE in Escherichia coli.

    PubMed

    Kurata, Tatsuaki; Katayama, Akira; Hiramatsu, Masakazu; Kiguchi, Yuya; Takeuchi, Masamitsu; Watanabe, Tomoyuki; Ogasawara, Hiroshi; Ishihama, Akira; Yamamoto, Kaneyoshi

    2013-10-01

    ModE is the molybdate-sensing transcription regulator that controls the expression of genes related to molybdate homeostasis in Escherichia coli. ModE is activated by binding molybdate and acts as both an activator and a repressor. By genomic systematic evolution of ligands by exponential enrichment (SELEX) screening and promoter reporter assays, we have identified a total of nine operons, including the hitherto identified modA, moaA, dmsA, and napF operons, of which six were activated by ModE and three were repressed. In addition, two promoters were newly identified and direct transcription of novel genes, referred to as morA and morB, located on antisense strands of yghW and torY, respectively. The morA gene encodes a short peptide, MorA, with an unusual initiation codon. Surprisingly, overexpression of the morA 5' untranslated region exhibited an inhibitory influence on colony formation of E. coli K-12. PMID:23913318

  5. Reconsideration of systematic relationships within the order Euplotida (Protista, Ciliophora) using new sequences of the gene coding for small-subunit rRNA and testing the use of combined data sets to construct phylogenies of the Diophrys-complex.

    PubMed

    Yi, Zhenzhen; Song, Weibo; Clamp, John C; Chen, Zigui; Gao, Shan; Zhang, Qianqian

    2009-03-01

    Comprehensive molecular analyses of phylogenetic relationships within euplotid ciliates are relatively rare, and the relationships among some families remain questionable. We performed phylogenetic analyses of the order Euplotida based on new sequences of the gene coding for small-subunit rRNA (SSrRNA) from a variety of taxa across the entire order as well as sequences from some of these taxa of other genes (ITS1-5.8S-ITS2 region and histone H4) that have not been included in previous analyses. Phylogenetic trees based on SSrRNA gene sequences constructed with four different methods had a consistent branching pattern that included the following features: (1) the "typical" euplotids comprised a paraphyletic assemblage composed of two divergent clades (family Uronychiidae and families Euplotidae-Certesiidae-Aspidiscidae-Gastrocirrhidae), (2) in the family Uronychiidae, the genera Uronychia and Paradiophrys formed a clearly outlined, well-supported clade that seemed to be rather divergent from Diophrys and Diophryopsis, suggesting that the Diophrys-complex may have had a longer and more separate evolutionary history than previously supposed, (3) inclusion of 12 new SSrRNA sequences in analyses of Euplotidae revealed two new clades of species within the family and cast additional doubt on the present classification of genera within the family, and (4) the intraspecific divergence among five species of Aspidisca was far greater than those of closely related genera. The ITS1-5.8S-ITS2 coding regions and partial histone H4 genes of six morphospecies in the Diophrys-complex were sequenced along with their SSrRNA genes and used to compare phylogenies constructed from single data sets to those constructed from combined sets. Results indicated that combined analyses could be used to construct more reliable, less ambiguous phylogenies of complex groups like the order Euplotida, because they provide a greater amount and diversity of information. PMID:19121402

  6. Donor TLR9 gene tagSNPs influence susceptibility to aGVHD and CMV reactivation in the allo-HSCT setting without polymorphisms in the TLR4 and NOD2 genes.

    PubMed

    Xiao, H W; Luo, Y; Lai, X Y; Shi, J M; Tan, Y M; He, J S; Xie, W Z; Zheng, W Y; Ye, X J; Yu, X H; Cai, Z; Lin, M F; Huang, H

    2014-02-01

    Owing to the ethnicity of the population, the best-confirmed polymorphisms in the TLR4 (toll-like receptor 4) and NOD2 genes with significant prognostic impact on allogeneic hematopoietic SCT (allo-HSCT) seem to be more applicable to Europeans and are nonpolymorphic in the Asian population. The influence of innate immunity gene polymorphisms on the outcomes of allo-HSCT in those populations has been questioned. We evaluated the influence of 10 candidate single nucleotide polymorphisms (SNPs) in the TLR1, TLR2, TLR3, TLR8 and TLR9 genes on the outcomes of allo-HSCT in a Chinese population including 138 pairs of patients and unrelated donors and a second cohort of 102 pairs of patients and HLA-identical sibling donors. We found that two tagSNPs in the TLR9 gene on the donor side, +1174 A/G (rs352139) and +1635 C/T (rs352140), influenced the risk of acute GVHD (aGVHD) and CMV reactivation. Furthermore, the presence of the susceptible haplotype (A-C) in the donor may be an informative predictor of worse OS at 5 years compared with those with the G-C and G-T haplotypes (58% vs 82.9%, P=0.024). Our data suggested an unrecognized association between donor TLR9 tagSNPs and the risk of HSCT-related complications in a population without polymorphisms in the TLR4 and NOD2 genes. PMID:24121213

  7. Deciphering azole resistance mechanisms with a focus on transcription factor-encoding genes TAC1, MRR1 and UPC2 in a set of fluconazole-resistant clinical isolates of Candida albicans.

    PubMed

    Morio, Florent; Pagniez, Fabrice; Besse, Myriam; Gay-Andrieu, Françoise; Miegeville, Michel; Le Pape, Patrice

    2013-11-01

    Several and often combined mechanisms can lead to acquired azole resistance in Candida albicans and subsequent therapeutic failure. The aim of this study was to provide a complete overview of the molecular basis of azole resistance in a set of six C. albicans clinical isolates recovered from patients who failed azole therapy. For this purpose, expression levels of CDR1, MDR1 and ERG11 were investigated by reverse transcription PCR (RT-PCR) together with amplification and sequencing of the genes encoding their transcription factors TAC1, MRR1 and UPC2. In all, the data underline that azole resistance in this set of clinical isolates results from distinct, often combined, mechanisms, being mostly driven by CDR1 and/or MDR1 active efflux. We show that gain-of-function (GOF) mutations in the transcription-factor-encoding genes TAC1, MRR1 and UPC2 are a common event in azole-resistant C. albicans clinical isolates. In addition, together with the finding that these genes are highly permissive to nucleotide changes, we describe several novel mutations that could act as putative GOF mutations involved in fluconazole resistance. PMID:24051054

  8. Setting Objectives

    ERIC Educational Resources Information Center

    Elkins, Aaron J.

    1977-01-01

    The author questions the extent to which educators have relied on "relevance" and learner participation in objective-setting in the past decade. He describes a useful approach to learner-oriented evaluation in which content relevance was not judged by participants until after they had been exposed to it. (MF)

  9. Compare Gene Profiles

    SciTech Connect

    2014-05-31

    Compare Gene Profiles (CGP) performs pairwise gene content comparisons among a relatively large set of related bacterial genomes. CGP performs pairwise BLAST among gene calls from a set of input genome and associated annotation files, and combines the results to generate lists of common genes, unique genes, homologs, and genes from each genome that differ substantially in length from corresponding genes in the other genomes. CGP is implemented in Python and runs in a Linux environment in serial or parallel mode.
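
    The kind of output described above (lists of common and unique genes across a set of genomes) can be sketched with plain set operations. This is not CGP's code: it assumes the homology step (the pairwise BLAST) has already mapped each gene call to a shared identifier, and the strain names, gene names, and the function `compare_gene_content` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def compare_gene_content(genomes: Dict[str, Set[str]]):
    """Return (core genes, unique genes per genome, shared genes per pair)."""
    if not genomes:
        raise ValueError("need at least one genome")
    # Genes present in every genome.
    core = set.intersection(*genomes.values())
    # Genes found in exactly one genome.
    unique = {}
    for name, genes in genomes.items():
        others = set().union(*(g for other, g in genomes.items() if other != name))
        unique[name] = genes - others
    # Genes shared by each pair of genomes.
    pairwise: Dict[Tuple[str, str], Set[str]] = {
        (a, b): genomes[a] & genomes[b]
        for a, b in combinations(sorted(genomes), 2)
    }
    return core, unique, pairwise


# Hypothetical gene content after homolog assignment.
genomes = {
    "strainA": {"dnaA", "gyrB", "recA", "toxX"},
    "strainB": {"dnaA", "gyrB", "recA"},
    "strainC": {"dnaA", "gyrB", "phaZ"},
}
core, unique, pairwise = compare_gene_content(genomes)
```

    Length differences between corresponding genes, which the record also mentions, would need per-gene coordinates and are outside this sketch.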

  10. Statistical methods in microbiology.

    PubMed Central

    Ilstrup, D M

    1990-01-01

    Statistical methodology is viewed by the average laboratory scientist, or physician, sometimes with fear and trepidation, occasionally with loathing, and seldom with fondness. Statistics may never be loved by the medical community, but it does not have to be hated by them. It is true that statistical science is sometimes highly mathematical, always philosophical, and occasionally obtuse, but for the majority of medical studies it can be made palatable. The goal of this article has been to outline a finite set of methods of analysis that investigators should choose based on the nature of the variable being studied and the design of the experiment. The reader is encouraged to seek the advice of a professional statistician when there is any doubt about the appropriate method of analysis. A statistician can also help the investigator with problems that have nothing to do with statistical tests, such as quality control, choice of response variable and comparison groups, randomization, and blinding of assessment of response variables. PMID:2200604

  11. A Functional Genomic Screen Combined with Time-Lapse Microscopy Uncovers a Novel Set of Genes Involved in Dorsal Closure of Drosophila Embryos

    PubMed Central

    Jankovics, Ferenc; Henn, László; Bujna, Ágnes; Vilmos, Péter; Kiss, Nóra; Erdélyi, Miklós

    2011-01-01

    Morphogenesis, the establishment of the animal body, requires the coordinated rearrangement of cells and tissues regulated by a very strictly-determined genetic program. Dorsal closure of the epithelium in the Drosophila melanogaster embryo is one of the best models for such a complex morphogenetic event. To explore the genetic regulation of dorsal closure, we carried out a large-scale RNA interference-based screen in combination with in vivo time-lapse microscopy and identified several genes essential for the closure or affecting its dynamics. One of the novel dorsal closure genes, the small GTPase activator pebble (pbl), was selected for detailed analysis. We show that pbl regulates actin accumulation and protrusion dynamics in the leading edge of the migrating epithelial cells. In addition, pbl affects dorsal closure dynamics by regulating head involution, a morphogenetic process mechanically coupled with dorsal closure. Finally, we provide evidence that pbl is involved in closure of the adult thorax, suggesting its general requirement in epithelial closure processes. PMID:21799798

  12. Gene Ontology annotation of sequence-specific DNA binding transcription factors: setting the stage for a large-scale curation effort

    PubMed Central

    Tripathi, Sushil; Christie, Karen R.; Balakrishnan, Rama; Huntley, Rachael; Hill, David P.; Thommesen, Liv; Blake, Judith A.; Kuiper, Martin; Lægreid, Astrid

    2013-01-01

    Transcription factors control which information in a genome becomes transcribed to produce RNAs that function in the biological systems of cells and organisms. Reliable and comprehensive information about transcription factors is invaluable for large-scale network-based studies. However, existing transcription factor knowledge bases are still lacking in well-documented functional information. Here, we provide guidelines for a curation strategy, which constitutes a robust framework for using the controlled vocabularies defined by the Gene Ontology Consortium to annotate specific DNA binding transcription factors (DbTFs) based on experimental evidence reported in literature. Our standardized protocol and workflow for annotating specific DNA binding RNA polymerase II transcription factors is designed to document high-quality and decisive evidence from valid experimental methods. Within a collaborative biocuration effort involving the user community, we are now in the process of exhaustively annotating the full repertoire of human, mouse and rat proteins that qualify as DbTFs in as much as they are experimentally documented in the biomedical literature today. The completion of this task will significantly enrich Gene Ontology-based information resources for the research community. Database URL: www.tfcheckpoint.org PMID:23981286

  13. Arabidopsis Flower and Embryo Developmental Genes are Repressed in Seedlings by Different Combinations of Polycomb Group Proteins in Association with Distinct Sets of Cis-regulatory Elements

    PubMed Central

    Liu, Jian; Zhang, Lei; He, Chongsheng; Shen, Wen-Hui; Jin, Hong; Xu, Lin; Zhang, Yijing

    2016-01-01

    Polycomb repressive complexes (PRCs) play crucial roles in transcriptional repression and developmental regulation in both plants and animals. In plants, depletion of different members of PRCs causes both overlapping and unique phenotypic defects. However, the underlying molecular mechanism determining the target specificity and functional diversity is not sufficiently characterized. Here, we quantitatively compared changes of tri-methylation at H3K27 in Arabidopsis mutants deprived of various key PRC components. We show that CURLY LEAF (CLF), a major catalytic subunit of PRC2, coordinates with different members of PRC1 in suppression of distinct plant developmental programs. We found that expression of flower development genes is repressed in seedlings preferentially via non-redundant role of CLF, which specifically associated with LIKE HETEROCHROMATIN PROTEIN1 (LHP1). In contrast, expression of embryo development genes is repressed by PRC1-catalytic core subunits AtBMI1 and AtRING1 in common with PRC2-catalytic enzymes CLF or SWINGER (SWN). This context-dependent role of CLF corresponds well with the change in H3K27me3 profiles, and is remarkably associated with differential co-occupancy of binding motifs of transcription factors (TFs), including MADS box and ABA-related factors. We propose that different combinations of PRC members distinctively regulate different developmental programs, and their target specificity is modulated by specific TFs. PMID:26760036

  14. SETS. Set Equation Transformation System

    SciTech Connect

    Worrel, R.B.

    1992-01-13

    SETS is used for symbolic manipulation of Boolean equations, particularly the reduction of equations by the application of Boolean identities. It is a flexible and efficient tool for performing probabilistic risk analysis (PRA), vital area analysis, and common cause analysis. The equation manipulation capabilities of SETS can also be used to analyze noncoherent fault trees and determine prime implicants of Boolean functions, to verify circuit design implementation, to determine minimum cost fire protection requirements for nuclear reactor plants, to obtain solutions to combinatorial optimization problems with Boolean constraints, and to determine the susceptibility of a facility to unauthorized access through nullification of sensors in its protection system.

  15. Statistical and computational challenges in physical mapping

    SciTech Connect

    Nelson, D.O.; Speed, T.P.

    1994-06-01

    One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like Huntington's disease, cystic fibrosis, and myotonic dystrophy. Instrumental in these efforts has been the construction of so-called "physical maps" of large regions of human chromosomes. Constructing a physical map of a chromosome presents a number of interesting challenges to the computational statistician. In addition to the general ill-posedness of the problem, complications include the size of the data sets, computational complexity, and the pervasiveness of experimental error. The nature of the problem and the presence of many levels of experimental uncertainty make statistical approaches to map construction appealing. Simultaneously, however, the size and combinatorial complexity of the problem make such approaches computationally demanding. In this paper we discuss what physical maps are and describe three different kinds of physical maps, outlining issues which arise in constructing them. In addition, we describe our experience with powerful, interactive statistical computing environments. We found that the ability to create high-level specifications of proposed algorithms which could then be directly executed provided a flexible rapid prototyping facility for developing new statistical models and methods. The ability to check the implementation of an algorithm by comparing its results to that of an executable specification enabled us to rapidly debug both specification and implementation in an environment of changing needs.

  16. Fermions from classical statistics

    SciTech Connect

    Wetterich, C.

    2010-12-15

    We describe fermions in terms of a classical statistical ensemble. The states $\tau$ of this ensemble are characterized by a sequence of values one or zero or a corresponding set of two-level observables. Every classical probability distribution can be associated to a quantum state for fermions. If the time evolution of the classical probabilities $p_\tau$ amounts to a rotation of the wave function $q_\tau(t)=\pm\sqrt{p_\tau(t)}$, we infer the unitary time evolution of a quantum system of fermions according to a Schrödinger equation. We establish how such classical statistical ensembles can be mapped to Grassmann functional integrals. Quantum field theories for fermions arise for a suitable time evolution of classical probabilities for generalized Ising models.
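
    The step from probabilities to a rotating wave function can be written out briefly. The following is only a sketch of the normalization argument implied by the abstract; the rotation matrix $R(t)$ is a symbol introduced here for illustration and does not appear in the record.

```latex
% Normalization of the classical probabilities carries over to the wave function:
\sum_\tau p_\tau(t) = 1
\quad\text{and}\quad
q_\tau(t) = \pm\sqrt{p_\tau(t)}
\;\;\Longrightarrow\;\;
\sum_\tau q_\tau(t)^2 = 1 .

% If the time evolution is a rotation (R(t) is an assumed, illustrative symbol),
q_\tau(t) = \sum_{\rho} R_{\tau\rho}(t)\, q_\rho(0),
\qquad R(t)^{\mathsf T} R(t) = \mathbb{1},

% then the norm is preserved and the probabilities stay normalized:
\sum_\tau p_\tau(t) = \sum_\tau q_\tau(t)^2 = \sum_\rho q_\rho(0)^2 = 1 .
```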

  17. Statistics in disease ecology

    PubMed Central

    Waller, Lance A.

    2008-01-01

    The three papers included in this special issue represent a set of presentations in an invited session on disease ecology at the 2005 Spring Meeting of the Eastern North American Region of the International Biometric Society. The papers each address statistical estimation and inference for particular components of different disease processes and, taken together, illustrate the breadth of statistical issues arising in the study of the ecology and public health impact of disease. As an introduction, we provide a very brief overview of the area of “disease ecology”, a variety of synonyms addressing different aspects of disease ecology, and present a schematic structure illustrating general components of the underlying disease process, data collection issues, and different disciplinary perspectives ranging from microbiology to public health surveillance. PMID:19081740

  18. A decision-theory approach to interpretable set analysis for high-dimensional data.

    PubMed

    Boca, Simina M; Bravo, Héctor Corrada; Caffo, Brian; Leek, Jeffrey T; Parmigiani, Giovanni

    2013-09-01

    A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses. PMID:23909925
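
    One plausible reading of the "atoms" idea can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: it assumes that an atom is the group of variables sharing exactly the same pattern of set memberships, and the function name `build_atoms` and the toy gene names are hypothetical.

```python
from collections import defaultdict


def build_atoms(set_annotations):
    """Partition variables into non-overlapping groups ("atoms").

    Assumption for this sketch: two variables belong to the same atom
    when they are members of exactly the same annotated sets.
    `set_annotations` maps a set name to the collection of its members.
    """
    # Record, for every variable, which annotated sets contain it.
    membership = defaultdict(set)
    for set_name, members in set_annotations.items():
        for variable in members:
            membership[variable].add(set_name)

    # Group variables by their membership pattern; groups cannot overlap
    # because each variable has exactly one pattern.
    atoms = defaultdict(set)
    for variable, set_names in membership.items():
        atoms[frozenset(set_names)].add(variable)
    return dict(atoms)


# Toy example with two overlapping sets (hypothetical gene names).
example = {"A": {"g1", "g2", "g3"}, "B": {"g3", "g4"}}
example_atoms = build_atoms(example)
```

    In the toy example the overlap of A and B becomes its own atom, so every variable is counted once, which is the property that makes per-atom false-discovery bookkeeping unambiguous.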

  19. Identification of a set of genes associated with response to interleukin-2 and interferon-α combination therapy for renal cell carcinoma through genome-wide gene expression profiling

    PubMed Central

    MIZUMORI, OSAMU; ZEMBUTSU, HITOSHI; KATO, YOICHIRO; TSUNODA, TATSUHIKO; MIYA, FUYUKI; MORIZONO, TAKASHI; TSUKAMOTO, TAIJI; FUJIOKA, TOMOAKI; TOMITA, YOSHIHIKO; KITAMURA, TADAICHI; OZONO, SEIICHIRO; MIKI, TSUNEHARU; NAITO, SEIJI; AKAZA, HIDEYUKI; NAKAMURA, YUSUKE

    2010-01-01

    Interleukin (IL)-2 and interferon (IFN)-α combination therapy for metastatic renal cell carcinoma (RCC) improves the prognosis for a subset of patients, while some patients suffer from severe adverse drug reactions with little benefit. To establish a method to predict responses to this combination therapy (approximately 30% response rate), the gene expression profiles of primary RCCs were analyzed using an oligoDNA microarray consisting of 38,500 genes or ESTs, after enrichment of the cancer cell population by laser micro-beam microdissection. The analysis of 10 responders and 18 non-responders identified 24 genes that exhibited significant differential expression between the two groups. In addition, the patients whose tumors did not express HLA-DQA1 or HLA-DQB1 molecules demonstrated poor clinical response. Exclusion of patients with tumors lacking either of these two genes is likely to improve the response rate to IL-2 and IFN-α combination therapy from 30 to 67%, indicating that a simple pretreatment test provides useful information with which to subselect patients with renal cancer in order to improve the efficacy of this treatment and reduce unnecessary medical costs. PMID:22993625

  20. Toolbox Approaches Using Molecular Markers and 16S rRNA Gene Amplicon Data Sets for Identification of Fecal Pollution in Surface Water.

    PubMed

    Ahmed, W; Staley, C; Sadowsky, M J; Gyawali, P; Sidhu, J P S; Palmer, A; Beale, D J; Toze, S

    2015-10-01

    In this study, host-associated molecular markers and bacterial 16S rRNA gene community analysis using high-throughput sequencing were used to identify the sources of fecal pollution in environmental waters in Brisbane, Australia. A total of 92 fecal and composite wastewater samples were collected from different host groups (cat, cattle, dog, horse, human, and kangaroo), and 18 water samples were collected from six sites (BR1 to BR6) along the Brisbane River in Queensland, Australia. Bacterial communities in the fecal, wastewater, and river water samples were sequenced. Water samples were also tested for the presence of bird-associated (GFD), cattle-associated (CowM3), horse-associated, and human-associated (HF183) molecular markers, to provide multiple lines of evidence regarding the possible presence of fecal pollution associated with specific hosts. Among the 18 water samples tested, 83%, 33%, 17%, and 17% were real-time PCR positive for the GFD, HF183, CowM3, and horse markers, respectively. Among the potential sources of fecal pollution in water samples from the river, DNA sequencing tended to show relatively small contributions from wastewater treatment plants (up to 13% of sequence reads). Contributions from other animal sources were rarely detected and were very small (<3% of sequence reads). Source contributions determined via sequence analysis versus detection of molecular markers showed variable agreement. A lack of relationships among fecal indicator bacteria, host-associated molecular markers, and 16S rRNA gene community analysis data was also observed. Nonetheless, we show that bacterial community and host-associated molecular marker analyses can be combined to identify potential sources of fecal pollution in an urban river. 
This study is a proof of concept, and based on the results, we recommend using bacterial community analysis (where possible) along with PCR detection or quantification of host-associated molecular markers to provide information on

  1. Transcription factor CecR (YbiH) regulates a set of genes affecting the sensitivity of Escherichia coli against cefoperazone and chloramphenicol.

    PubMed

    Yamanaka, Yuki; Shimada, Tomohiro; Yamamoto, Kaneyoshi; Ishihama, Akira

    2016-07-01

    Genomic SELEX (systematic evolution of ligands by exponential enrichment) screening was performed for identification of the binding site of YbiH, an as yet uncharacterized TetR-family transcription factor, on the Escherichia coli genome. YbiH was found to be a unique single-target regulator that binds in vitro within the intergenic spacer located between the divergently transcribed ybiH-ybhGFSR and rhlE operons. YbhG is an inner membrane protein and YbhFSR forms a membrane-associated ATP-binding cassette (ABC) transporter while RhlE is a ribosome-associated RNA helicase. Gel shift assay and DNase footprinting analyses indicated one clear binding site of YbiH, including a complete palindromic sequence of AATTAGTT-AACTAATT. An in vivo reporter assay indicated repression of the ybiH operon and activation of the rhlE operon by YbiH. After phenotype microarray screening, YbiH was indicated to confer resistance to chloramphenicol and cefazoline (a first-generation cephalosporin). A systematic survey of the participation of each of the predicted YbiH-regulated genes in the antibiotic sensitivity indicated involvement of the YbhFSR ABC-type transporter in the sensitivity to cefoperazone (a third-generation cephalosporin) and of the membrane protein YbhG in the control of sensitivity to chloramphenicol. Taken together with the growth test in the presence of these two antibiotics and in vitro transcription assay, it was concluded that the hitherto uncharacterized YbiH regulates transcription of both the bidirectional transcription units, the ybiH-ybhGFSR operon and the rhlE gene, which altogether are involved in the control of sensitivity to cefoperazone and chloramphenicol. We thus propose to rename YbiH as CecR (regulator of cefoperazone and chloramphenicol sensitivity). PMID:27112147

  2. Toolbox Approaches Using Molecular Markers and 16S rRNA Gene Amplicon Data Sets for Identification of Fecal Pollution in Surface Water

    PubMed Central

    Staley, C.; Sadowsky, M. J.; Gyawali, P.; Sidhu, J. P. S.; Palmer, A.; Beale, D. J.; Toze, S.

    2015-01-01

    In this study, host-associated molecular markers and bacterial 16S rRNA gene community analysis using high-throughput sequencing were used to identify the sources of fecal pollution in environmental waters in Brisbane, Australia. A total of 92 fecal and composite wastewater samples were collected from different host groups (cat, cattle, dog, horse, human, and kangaroo), and 18 water samples were collected from six sites (BR1 to BR6) along the Brisbane River in Queensland, Australia. Bacterial communities in the fecal, wastewater, and river water samples were sequenced. Water samples were also tested for the presence of bird-associated (GFD), cattle-associated (CowM3), horse-associated, and human-associated (HF183) molecular markers, to provide multiple lines of evidence regarding the possible presence of fecal pollution associated with specific hosts. Among the 18 water samples tested, 83%, 33%, 17%, and 17% were real-time PCR positive for the GFD, HF183, CowM3, and horse markers, respectively. Among the potential sources of fecal pollution in water samples from the river, DNA sequencing tended to show relatively small contributions from wastewater treatment plants (up to 13% of sequence reads). Contributions from other animal sources were rarely detected and were very small (<3% of sequence reads). Source contributions determined via sequence analysis versus detection of molecular markers showed variable agreement. A lack of relationships among fecal indicator bacteria, host-associated molecular markers, and 16S rRNA gene community analysis data was also observed. Nonetheless, we show that bacterial community and host-associated molecular marker analyses can be combined to identify potential sources of fecal pollution in an urban river. 
This study is a proof of concept, and based on the results, we recommend using bacterial community analysis (where possible) along with PCR detection or quantification of host-associated molecular markers to provide information on

  3. Gene-set and multivariate genome-wide association analysis of oppositional defiant behavior subtypes in attention-deficit/hyperactivity disorder.

    PubMed

    Aebi, Marcel; van Donkelaar, Marjolein M J; Poelmans, Geert; Buitelaar, Jan K; Sonuga-Barke, Edmund J S; Stringaris, Argyris; Consortium, Image; Faraone, Stephen V; Franke, Barbara; Steinhausen, Hans-Christoph; van Hulzen, Kimm J E

    2016-07-01

    Oppositional defiant disorder (ODD) is a frequent psychiatric disorder seen in children and adolescents with attention-deficit-hyperactivity disorder (ADHD). ODD is also a common antecedent to both affective disorders and aggressive behaviors. Although the heritability of ODD has been estimated to be around 0.60, there has been little research into the molecular genetics of ODD. The present study examined the association of irritable and defiant/vindictive dimensions and categorical subtypes of ODD (based on latent class analyses) with previously described specific polymorphisms (DRD4 exon3 VNTR, 5-HTTLPR, and seven OXTR SNPs) as well as with dopamine, serotonin, and oxytocin genes and pathways in a clinical sample of children and adolescents with ADHD. In addition, we performed a multivariate genome-wide association study (GWAS) of the aforementioned ODD dimensions and subtypes. Apart from adjusting the analyses for age and sex, we controlled for "parental ability to cope with disruptive behavior." None of the hypothesis-driven analyses revealed a significant association with ODD dimensions and subtypes. Inadequate parenting behavior was significantly associated with all ODD dimensions and subtypes, most strongly with defiant/vindictive behaviors. In addition, the GWAS did not result in genome-wide significant findings but bioinformatics and literature analyses revealed that the proteins encoded by 28 of the 53 top-ranked genes functionally interact in a molecular landscape centered around Beta-catenin signaling and involved in the regulation of neurite outgrowth. Our findings provide new insights into the molecular basis of ODD and inform future genetic studies of oppositional behavior. © 2015 The Authors. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics Published by Wiley Periodicals, Inc. PMID:26184070

  4. Statistical Methods for Evolutionary Trees

    PubMed Central

    Edwards, A. W. F.

    2009-01-01

    In 1963 and 1964, L. L. Cavalli-Sforza and A. W. F. Edwards introduced novel methods for computing evolutionary trees from genetical data, initially for human populations from blood-group gene frequencies. The most important development was their introduction of statistical methods of estimation applied to stochastic models of evolution. PMID:19797062

  5. High-resolution definition of the Vibrio cholerae essential gene set with hidden Markov model–based analyses of transposon-insertion sequencing data

    PubMed Central

    Chao, Michael C.; Pritchard, Justin R.; Zhang, Yanjia J.; Rubin, Eric J.; Livny, Jonathan; Davis, Brigid M.; Waldor, Matthew K.

    2013-01-01

    The coupling of high-density transposon mutagenesis to high-throughput DNA sequencing (transposon-insertion sequencing) enables simultaneous and genome-wide assessment of the contributions of individual loci to bacterial growth and survival. We have refined analysis of transposon-insertion sequencing data by normalizing for the effect of DNA replication on sequencing output and using a hidden Markov model (HMM)-based filter to exploit heretofore unappreciated information inherent in all transposon-insertion sequencing data sets. The HMM can smooth variations in read abundance and thereby reduce the effects of read noise, as well as permit fine scale mapping that is independent of genomic annotation and enable classification of loci into several functional categories (e.g. essential, domain essential or ‘sick’). We generated a high-resolution map of genomic loci (encompassing both intra- and intergenic sequences) that are required or beneficial for in vitro growth of the cholera pathogen, Vibrio cholerae. This work uncovered new metabolic and physiologic requirements for V. cholerae survival, and by combining transposon-insertion sequencing and transcriptomic data sets, we also identified several novel noncoding RNA species that contribute to V. cholerae growth. Our findings suggest that HMM-based approaches will enhance extraction of biological meaning from transposon-insertion sequencing genomic data. PMID:23901011

  6. Cosmetic Plastic Surgery Statistics

    MedlinePlus

    2014 Cosmetic Plastic Surgery Statistics Cosmetic Procedure Trends 2014 Plastic Surgery Statistics Report Please credit the AMERICAN SOCIETY OF PLASTIC SURGEONS when citing statistical data or using ...

  7. Bioinformatic Description of Immunotherapy Targets for Pediatric T-Cell Leukemia and the Impact of Normal Gene Sets Used for Comparison

    PubMed Central

    Orentas, Rimas J.; Nordlund, Jessica; He, Jianbin; Sindiri, Sivasish; Mackall, Crystal; Fry, Terry J.; Khan, Javed

    2014-01-01

    Pediatric lymphoid leukemia has the highest cure rate of all pediatric malignancies, yet due to its prevalence, still accounts for the majority of childhood cancer deaths and requires long-term highly toxic therapy. The ability to target B-cell ALL with immunoglobulin-like binders, whether anti-CD22 antibody or anti-CD19 CAR-Ts, has impacted treatment options for some patients. The development of new ways to target B-cell antigens continues at rapid pace. T-cell ALL accounts for up to 20% of childhood leukemia but has yet to see a set of high-value immunotherapeutic targets identified. To find new targets for T-ALL immunotherapy, we employed a bioinformatic comparison to broad normal tissue arrays, hematopoietic stem cells (HSC), and mature lymphocytes, then filtered the results for transcripts encoding plasma membrane proteins. T-ALL bears a core T-cell signature and transcripts encoding TCR/CD3 components and canonical markers of T-cell development predominate, especially when comparison was made to normal tissue or HSC. However, when comparison to mature lymphocytes was also undertaken, we identified two antigens that may drive, or be associated with leukemogenesis; TALLA-1 and hedgehog interacting protein. In addition, TCR subfamilies, CD1, activation and adhesion markers, membrane-organizing molecules, and receptors linked to metabolism and inflammation were also identified. Of these, only CD52, CD37, and CD98 are currently being targeted clinically. This work provides a set of targets to be considered for future development of immunotherapies for T-ALL. PMID:24959420

  8. Identification of Human Ether-à-go-go Related Gene Modulators by Three Screening Platforms in an Academic Drug-Discovery Setting

    PubMed Central

    Huang, Xi-Ping; Mangano, Thomas; Hufeisen, Sandy; Setola, Vincent

    2010-01-01

    The human Ether-à-go-go related gene (hERG) potassium channel is responsible for the rapid delayed rectifier potassium current that plays a critical role in the repolarization of cardiomyocytes during the cardiac action potential. In humans, inhibition of hERG by drugs can prolong the electrocardiographic QT interval, which, in rare instances, leads to ventricular arrhythmia and sudden cardiac death. As such, several medications that block hERG channels in vitro have been withdrawn from the market due to QT prolongation and arrhythmias. The current FDA guidelines recommend that drug candidates destined for human use be evaluated for potential hERG activity (www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm074963.pdf). Here, we employed automated planar patch clamp (APPC), high-throughput fluorescent Tl+ flux, and moderate-throughput [3H]dofetilide competition binding assays to characterize a panel of 49 drugs for their activities at the hERG channel. Notably, we used the same HEK293-hERG cell line for all assays, facilitating comparisons of hERG potencies across screening platforms. In general, hERG inhibitors were most potent in APPC assays, intermediately potent in [3H]dofetilide binding assays, and least potent in Tl+ flux assays. Binding affinity constants (pKi values) and Tl+ flux potencies (pEC50 values) correlated well with APPC pEC50 values. Further, the inhibitory potencies of many known hERG inhibitors in APPC matched literature values from manual and/or automated patch clamp systems. We also developed a novel fluorescent Tl+ flux assay to measure the effects of drugs that modulate hERG trafficking and surface expression. PMID:21158687

  9. Statistics in fusion experiments

    NASA Astrophysics Data System (ADS)

    McNeill, D. H.

    1997-11-01

    Since the reasons for the variability in data from plasma experiments are often unknown or uncontrollable, statistical methods must be applied. Reliable interpretation and public accountability require full data sets. Two examples of data misrepresentation at PPPL are analyzed: (1) Te > 100 eV on S-1 spheromak (M. Yamada, Nucl. Fusion 25, 1327 (1985); reports to DoE; etc.). The reported high values (statistical artifacts of Thomson scattering measurements) were selected from a mass of data with an average of 40 eV or less. "Correlated" spectroscopic data were meaningless. (2) Extrapolation to Q ≥ 0.5 for DT in TFTR (D. Meade et al., IAEA Baltimore (1990), V. 1, p. 9; H. P. Furth, Statements to U. S. Congress (1989)). The DD yield used there was the highest through 1990 (≥ 50% above average) and the DT to DD power ratio used was about twice any published value. Average DD yields and published yield ratios scale to Q < 0.15 for DT, in accord with the observed performance over the last 3 1/2 years. Press reports of outlier data from TFTR have obscured the fact that the DT behavior follows from trivial scaling of the DD data. Good practice in future fusion research would have confidence intervals and other descriptive statistics accompanying reported numerical values (cf. JAMA).

  10. Clinical Outcome 3 Years After Autologous Chondrocyte Implantation Does Not Correlate With the Expression of a Predefined Gene Marker Set in Chondrocytes Prior to Implantation but Is Associated With Critical Signaling Pathways

    PubMed Central

    Stenberg, Johan; de Windt, Tommy S.; Synnergren, Jane; Hynsjö, Lars; van der Lee, Josefine; Saris, Daniel B.F.; Brittberg, Mats; Peterson, Lars; Lindahl, Anders

    2014-01-01

    Background: There is a need for tools to predict the chondrogenic potency of autologous cells for cartilage repair. Purpose: To evaluate previously proposed chondrogenic biomarkers and to identify new biomarkers in the chondrocyte transcriptome capable of predicting clinical success or failure after autologous chondrocyte implantation. Study Design: Controlled laboratory study and case-control study; Level of evidence, 3. Methods: Five patients with clinical improvement after autologous chondrocyte implantation and 5 patients with graft failures 3 years after implantation were included. Surplus chondrocytes from the transplantation were frozen for each patient. Each chondrocyte sample was subsequently thawed at the same time point and cultured for 1 cell doubling, prior to RNA purification and global microarray analysis. The expression profiles of a set of predefined marker genes (ie, collagen type II α1 [COL2A1], bone morphogenic protein 2 [BMP2], fibroblast growth factor receptor 3 [FGFR3], aggrecan [ACAN], CD44, and activin receptor–like kinase receptor 1 [ACVRL1]) were also evaluated. Results: No significant difference in expression of the predefined marker set was observed between the success and failure groups. Thirty-nine genes were found to be induced, and 38 genes were found to be repressed between the 2 groups prior to autologous chondrocyte implantation, which have implications for cell-regulating pathways (eg, apoptosis, interleukin signaling, and β-catenin regulation). Conclusion: No expressional differences that predict clinical outcome could be found in the present study, which may have implications for quality control assessments of autologous chondrocyte implantation. The subtle difference in gene expression regulation found between the 2 groups may strengthen the basis for further research, aiming at reliable biomarkers and quality control for tissue engineering in cartilage repair. Clinical Relevance: The present study shows the possible

  11. Electrophysiological characterization of a large set of novel variants in the SCN5A-gene: identification of novel LQTS3 and BrS mutations.

    PubMed

    Ortiz-Bonnin, Beatriz; Rinné, Susanne; Moss, Robin; Streit, Anne K; Scharf, Michael; Richter, Katrin; Stöber, Anika; Pfeufer, Arne; Seemann, Gunnar; Kääb, Stefan; Beckmann, Britt-Maria; Decher, Niels

    2016-08-01

    SCN5A encodes for the α-subunit of the cardiac voltage-gated sodium channel Nav1.5. Gain-of-function mutations in SCN5A are related to congenital long QT syndrome (LQTS3) characterized by delayed cardiac repolarization, leading to a prolonged QT interval in the ECG. Loss-of-function mutations in SCN5A are related to Brugada syndrome (BrS), characterized by an ST-segment elevation in the right precordial leads (V1-V3). The aim of this study was the characterization of a large set of novel SCN5A variants found in patients with different cardiac phenotypes, mainly LQTS and BrS. SCN5A variants of 13 families were functionally characterized in Xenopus laevis oocytes using the two-electrode voltage-clamp technique. We found in most of the cases, but not all, that the electrophysiology of the variants correlated with the clinically diagnosed phenotype. A susceptibility to develop LQTS can be suggested in patients carrying the variants S216L, K480N, A572D, F816Y, and G983D. However, taking the phenotype into account, the presence of the variants in genomic data bases, the mutational segregation, combined with our in vitro and in silico experiments, the variants S216L, S262G, K480N, A572D, F816Y, G983D, and T1526P remain as variants of unknown significance. However, the SCN5A variants R568H and A993T can be classified as pathogenic LQTS3 causing mutations, while R222stop and R2012H are novel BrS causing mutations. PMID:27287068

  12. A cautionary note on the rank product statistic.

    PubMed

    Koziol, James A

    2016-06-01

    The rank product method introduced by Breitling R et al. [2004, FEBS Letters 573, 83-92] has rapidly generated popularity in practical settings, in particular, detecting differential expression of genes in microarray experiments. The purpose of this note is to point out a particular property of the rank product method, namely, its differential sensitivity to over- and underexpression. It turns out that overexpression is less likely to be detected than underexpression with the rank product statistic. We have conducted both empirical and exact power studies that demonstrate this phenomenon, and summarize these findings in this note. PMID:27160968
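
    For readers unfamiliar with the statistic, a minimal Python sketch of a rank product computation follows. It assumes the common formulation in which each gene's rank product is the geometric mean of its ranks across replicates, with rank 1 assigned to the largest value; ties are broken arbitrarily by sort order, and the function name `rank_products` and the toy data are hypothetical. It does not reproduce the power studies described in the note.

```python
import math


def rank_products(replicates):
    """Geometric mean of per-replicate ranks for each gene.

    `replicates` is a list of replicates, each a list of per-gene values
    (for example, fold changes) in the same gene order.
    Rank 1 is given to the largest value within a replicate.
    """
    n_genes = len(replicates[0])
    n_reps = len(replicates)
    log_rank_sums = [0.0] * n_genes

    for values in replicates:
        # Gene indices ordered from largest to smallest value.
        order = sorted(range(n_genes), key=lambda g: values[g], reverse=True)
        for rank, gene in enumerate(order, start=1):
            # Summing logs avoids overflow when multiplying many ranks.
            log_rank_sums[gene] += math.log(rank)

    return [math.exp(total / n_reps) for total in log_rank_sums]


# Toy data: 3 genes measured in 2 replicates (hypothetical values).
toy = [[2.0, 0.5, 1.0],
       [3.0, 1.5, 0.2]]
toy_rp = rank_products(toy)
```

    A small rank product indicates a gene that is consistently near the top of the ordering; detecting the opposite direction is usually done by reversing the ranking, which is where an asymmetry between over- and underexpression could enter.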

  13. Biostatistical and medical statistics graduate education

    PubMed Central

    2014-01-01

    The development of graduate education in biostatistics and medical statistics is discussed in the context of training within a medical center setting. The need for medical researchers to employ a wide variety of statistical designs in clinical, genetic, basic science and translational settings justifies the ongoing integration of biostatistical training into medical center educational settings and informs its content. The integration of large data issues is a challenge. PMID:24472088

  14. On the statistics of the "genetic fingerprint".

    PubMed

    Ritter, H

    1991-01-01

    In analogy to the polygene determined morphological features, the DNA-fingerprint is also not suitable for statistical processing. Statements about the individuality are merely speculative. Frequencies of genes cannot be found, since it is impossible to determine which combinations of bands belong to one gene locus. Hence the DNA fingerprint enables the recognition of exclusions from paternity; it does not, however, allow a statistical analysis, no matter which method be employed. PMID:1685896

  15. The Statistical Fermi Paradox

    NASA Astrophysics Data System (ADS)

    Maccone, C.

    This paper provides the statistical generalization of the Fermi paradox. The statistics of habitable planets may be based on a set of ten (and possibly more) astrobiological requirements first pointed out by Stephen H. Dole in his book Habitable planets for man (1964). The statistical generalization of the original and by now too simplistic Dole equation is provided by replacing a product of ten positive numbers by the product of ten positive random variables. This is denoted the SEH, an acronym standing for “Statistical Equation for Habitables”. The proof in this paper is based on the Central Limit Theorem (CLT) of Statistics, stating that the sum of any number of independent random variables, each of which may be ARBITRARILY distributed, approaches a Gaussian (i.e. normal) random variable (Lyapunov form of the CLT). It is then shown that: 1. The new random variable NHab, yielding the number of habitables (i.e. habitable planets) in the Galaxy, follows the log-normal distribution. By construction, the mean value of this log-normal distribution is the total number of habitable planets as given by the statistical Dole equation. 2. The ten (or more) astrobiological factors are now positive random variables. The probability distribution of each random variable may be arbitrary. The CLT in the so-called Lyapunov or Lindeberg forms (neither of which assumes the factors to be identically distributed) allows for that. In other words, the CLT "translates" into the SEH by allowing an arbitrary probability distribution for each factor. This is both astrobiologically realistic and useful for any further investigations. 3. By applying the SEH it is shown that the (average) distance between any two nearby habitable planets in the Galaxy is inversely proportional to the cube root of NHab. This distance is denoted by the new random variable D. The relevant probability density function is derived, which was named the "Maccone distribution" by Paul Davies in
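
    The step from the CLT to a log-normal product can be illustrated numerically: the logarithm of a product of independent positive factors is a sum of independent terms, so it is roughly Gaussian and the product itself is roughly log-normal. The sketch below is only an illustration under stated assumptions. The choice of ten factors drawn uniformly from [0.5, 1.5], the function names, and the use of sample skewness as a rough symmetry check are all hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def simulate_log_product(n_factors=10, n_draws=20000, seed=0):
    """Return log(product) for n_draws products of n_factors positive factors.

    Each factor is drawn uniformly from [0.5, 1.5]; this distribution is an
    arbitrary stand-in for the astrobiological factors.
    """
    rng = random.Random(seed)
    logs = []
    for _ in range(n_draws):
        product = 1.0
        for _ in range(n_factors):
            product *= rng.uniform(0.5, 1.5)
        logs.append(math.log(product))
    return logs


def skewness(xs):
    """Sample skewness: third central moment divided by the cubed std. dev."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - mean) ** 3 for x in xs) / (len(xs) * sd ** 3)


logs = simulate_log_product()
products = [math.exp(v) for v in logs]
# The log of the product should look far more symmetric than the product.
print("skewness of log(product):", round(skewness(logs), 3))
print("skewness of product:     ", round(skewness(products), 3))
```

    With only ten factors the Gaussian approximation for the logarithm is rough, so a small residual skewness is expected.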

  16. Predict! Teaching Statistics Using Informational Statistical Inference

    ERIC Educational Resources Information Center

    Makar, Katie

    2013-01-01

    Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…

  17. Statistics Poker: Reinforcing Basic Statistical Concepts

    ERIC Educational Resources Information Center

    Leech, Nancy L.

    2008-01-01

    Learning basic statistical concepts does not need to be tedious or dry; it can be fun and interesting through cooperative learning in the small-group activity of