Sample records for existing gene models

  1. Analysis of Cytoskeletal and Motility Proteins in the Sea Urchin Genome Assembly

    PubMed Central

    RL, Morris; MP, Hoffman; RA, Obar; SS, McCafferty; IR, Gibbons; AD, Leone; J, Cool; EL, Allgood; AM, Musante; KM, Judkins; BJ, Rossetti; AP, Rawson; DR, Burgess

    2007-01-01

    The sea urchin embryo is a classical model system for studying the role of the cytoskeleton in such events as fertilization, mitosis, cleavage, cell migration and gastrulation. We have conducted an analysis of gene models derived from the Strongylocentrotus purpuratus genome assembly and have gathered strong evidence for the existence of multiple gene families encoding cytoskeletal proteins and their regulators in sea urchin. While many cytoskeletal genes have been cloned from sea urchin with sequences already existing in public databases, genome analysis reveals a significantly higher degree of diversity within certain gene families. Furthermore, genes are described corresponding to homologs of cytoskeletal proteins not previously documented in sea urchins. To illustrate the varying degree of sequence diversity that exists within cytoskeletal gene families, we conducted an analysis of genes encoding actins, specific actin-binding proteins, myosins, tubulins, kinesins, dyneins, specific microtubule-associated proteins, and intermediate filaments. We conducted ontological analysis of select genes to better understand the relatedness of urchin cytoskeletal genes to those of other deuterostomes. We analyzed developmental expression (EST) data to confirm the existence of select gene models and to understand their differential expression during various stages of early development. PMID:17027957

  2. A Protocol for Using Gene Set Enrichment Analysis to Identify the Appropriate Animal Model for Translational Research.

    PubMed

    Weidner, Christopher; Steinfath, Matthias; Wistorf, Elisa; Oelgeschläger, Michael; Schneider, Marlon R; Schönfelder, Gilbert

    2017-08-16

    Recent studies that compared transcriptomic datasets of human diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. A major reason for the discrepancies between different gene expression analyses is the arbitrary filtering of differentially expressed genes. Furthermore, the comparison of single genes between different species and platforms is often limited by technical variance, leading to misinterpretation of the concordance or discordance between data from human and animal models. Thus, standardized approaches for systematic data analysis are needed. To overcome subjective gene filtering and ineffective gene-to-gene comparisons, we recently demonstrated that gene set enrichment analysis (GSEA) has the potential to avoid these problems. Therefore, we developed a standardized protocol for the use of GSEA to distinguish between appropriate and inappropriate animal models for translational research. This protocol is not suitable for predicting a priori how to design new model systems, as it requires existing experimental omics data. However, the protocol describes how to interpret existing data in a standardized manner in order to select the most suitable animal model, thus avoiding unnecessary animal experiments and misleading translational studies.
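
The abstract above does not spell out the enrichment statistic, so the following is only a minimal sketch of the unweighted running-sum score commonly associated with GSEA, not the authors' protocol. The function name `enrichment_score` and the toy gene list are hypothetical, chosen for illustration.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov-like).

    Walk down the ranked list, stepping up at genes in the set and down
    at genes outside it; return the largest deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most up-regulated gene first.
ranked = ["Gfap", "Tnf", "Il10", "Actb", "Myh9", "Tubb"]
top_set = {"Gfap", "Tnf"}      # concentrated at the top of the ranking
bottom_set = {"Myh9", "Tubb"}  # concentrated at the bottom

print(enrichment_score(ranked, top_set))
print(enrichment_score(ranked, bottom_set))
```

A set clustered at the top of the ranking scores near +1 and a set clustered at the bottom scores near -1, which is what allows whole gene sets, rather than individually filtered genes, to be compared between species.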

  3. From Coexpression to Coregulation: An Approach to Inferring Transcriptional Regulation Among Gene Classes from Large-Scale Expression Data

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric; Castano, Rebecca; Mann, Tobias; Wold, Barbara

    2000-01-01

    We provide preliminary evidence that existing algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continuous-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determining transcriptional regulatory relationships such as coregulation.
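
Step (2) above can be illustrated with a small sketch of a continuous-time recurrent network for cluster mean expression levels. The exact equations used by the authors are not given in the abstract; the form below, tau * dv_i/dt = g(sum_j W_ij v_j + h_i) - v_i with a logistic g, is one common choice and is an assumption here. The function names and the forward-Euler integrator are likewise illustrative. The fitting step by simulated annealing is not shown.

```python
import math


def sigmoid(u):
    """Logistic squashing function g(u)."""
    return 1.0 / (1.0 + math.exp(-u))


def simulate_ctrnn(weights, bias, v0, tau=1.0, dt=0.01, steps=1000):
    """Forward-Euler integration of tau * dv_i/dt = g(sum_j W_ij v_j + h_i) - v_i.

    weights[i][j] is the (assumed) influence of cluster j on cluster i.
    Returns the list of state vectors, including the initial state.
    """
    v = list(v0)
    n = len(v)
    trajectory = [list(v)]
    for _ in range(steps):
        drive = [
            sigmoid(sum(weights[i][j] * v[j] for j in range(n)) + bias[i])
            for i in range(n)
        ]
        v = [v[i] + dt * (drive[i] - v[i]) / tau for i in range(n)]
        trajectory.append(list(v))
    return trajectory


# Two hypothetical clusters that repress each other.
W = [[0.0, -2.0], [-2.0, 0.0]]
h = [1.0, 1.0]
trajectory = simulate_ctrnn(W, h, v0=[0.9, 0.1])
print(trajectory[-1])
```

Comparing the connection matrices `W` recovered from several independent fits is what step (4) refers to as looking for commonalities in the circuit parameters.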

  4. Omics analysis of mouse brain models of human diseases.

    PubMed

    Paban, Véronique; Loriod, Béatrice; Villard, Claude; Buee, Luc; Blum, David; Pietropaolo, Susanna; Cho, Yoon H; Gory-Faure, Sylvie; Mansour, Elodie; Gharbi, Ali; Alescio-Lautier, Béatrice

    2017-02-05

    The identification of common gene/protein profiles related to brain alterations, if they exist, may indicate the convergence of the pathogenic mechanisms driving brain disorders. Six genetically engineered mouse lines modelling neurodegenerative diseases and neuropsychiatric disorders were considered. Omics approaches, including transcriptomic and proteomic methods, were used. The gene/protein lists were used for inter-disease comparisons and further functional and network investigations. When the inter-disease comparison was performed using the gene symbol identifiers, the number of genes/proteins involved in multiple diseases decreased rapidly. Thus, no genes/proteins were shared by all 6 mouse models. Only one gene/protein (Gfap) was shared among 4 disorders, providing strong evidence that a common molecular signature does not exist among brain diseases. The inter-disease comparison of functional processes showed the involvement of a few major biological processes indicating that brain diseases of diverse aetiologies might utilize common biological pathways in the nervous system, without necessarily involving similar molecules. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. A Penalized Robust Method for Identifying Gene-Environment Interactions

    PubMed Central

    Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Xie, Yang; Ma, Shuangge

    2015-01-01

    In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model mis-specification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt the rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computation feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications. PMID:24616063

  6. Scoring the correlation of genes by their shared properties using OScal, an improved overlap quantification model.

    PubMed

    Liu, Hui; Liu, Wei; Lin, Ying; Liu, Teng; Ma, Zhaowu; Li, Mo; Zhang, Hong-Mei; Kenneth Wang, Qing; Guo, An-Yuan

    2015-05-27

    Scoring the correlation between two genes by their shared properties is a common, basic task in biological research. A promising way to score this correlation is to quantify the overlap between the two genes' sets of homogeneous properties. However, no model has been established as the proper one. Here we studied the quantification of overlap and proposed a more effective model after theoretically comparing 7 existing models. We defined three characteristic parameters (d, R, r) of an overlap, which highlight essential differences among the 7 models, and grouped the models into two classes. The pros and cons of the two classes were then examined through their solution spaces in the (d, R, r) coordinate system. Finally, we proposed a new model called OScal (Overlap Score calculator), a modification of the Poisson distribution model (one of the 7 models) that avoids its disadvantages. When tested on assessing gene relations using different data, OScal performed better than the existing models. In addition, OScal is a basic mathematical model with very low computational cost and few restrictive conditions, so it can be used in a wide range of research areas to measure the overlap or similarity of two entities.
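
The abstract does not define OScal or the (d, R, r) parameters, so they are not reproduced here. As a hedged illustration of what an overlap model looks like, the sketch below implements two generic scores: the Jaccard index and a Poisson tail probability for the observed overlap. Whether these match any of the 7 models the authors compared is an assumption, and all function names are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def poisson_overlap_pvalue(a, b, universe_size):
    """Chance of an overlap at least this large if the sets were unrelated.

    Assumes the expected overlap of two random sets drawn from a universe
    of `universe_size` properties is |a| * |b| / universe_size.
    """
    lam = len(a) * len(b) / universe_size
    return poisson_tail(len(a & b), lam)


# Hypothetical property sets (for example, annotation terms) of two genes.
gene1 = {"term1", "term2", "term3"}
gene2 = {"term2", "term3", "term4"}
print(jaccard(gene1, gene2))
print(poisson_overlap_pvalue(gene1, gene2, universe_size=100))
```

The two scores behave differently: the Jaccard index ignores the size of the property universe, while the Poisson tail depends on it, which is the kind of distinction a systematic comparison of overlap models has to account for.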

  7. Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.

    PubMed

    Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi

    2015-04-22

    Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene set selection are derived and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.

  8. An Expression of Periodic Phenomena of Fashion on Sexual Selection Model with Conformity Genes and Memes

    NASA Astrophysics Data System (ADS)

    Mutoh, Atsuko; Tokuhara, Shinya; Kanoh, Masayoshi; Oboshi, Tamon; Kato, Shohei; Itoh, Hidenori

    It is generally thought that living things show trends in their preferences. The mechanism by which new trends arise in successive periods is thought to involve conformity. According to social impact theory, a minority always exists within a group. The minority may become the majority through conforming agents: as agents promote conforming behavior, the majority can shift. We previously proposed an evolutionary model with both genes and memes, and elucidated the interaction between genes and memes in sexual selection. In this paper, we propose an agent model for sexual selection that incorporates the concept of conformity. Simulating an environment containing male and female agents with this model, we find that periodic fashion phenomena emerge. We also report the influence of conformity and differentiation on the transition of the agents' preferences.

  9. Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model

    PubMed Central

    Hua, Jianping; Bittner, Michael L.; Dougherty, Edward R.

    2014-01-01

    Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance. PMID:24558298
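
The hybrid data construction described above can be sketched as follows. The abstract says only that a master gene forms the ground truth and that phenotype labels are generated artificially; thresholding the master gene at its median is an assumption made here for illustration, as are the function names and the toy expression values.

```python
import random
import statistics


def make_hybrid_labels(expression, master_gene):
    """Derive artificial phenotype labels from one master gene.

    Assumption: samples whose master-gene expression exceeds the median
    are labelled 1, all others 0.
    """
    values = expression[master_gene]
    cutoff = statistics.median(values)
    return [1 if v > cutoff else 0 for v in values]


def resample(expression, labels, size, rng):
    """Draw a smaller data set (without replacement) from the original samples."""
    idx = rng.sample(range(len(labels)), size)
    sub_expr = {gene: [vals[i] for i in idx] for gene, vals in expression.items()}
    sub_labels = [labels[i] for i in idx]
    return sub_expr, sub_labels


# Hypothetical expression matrix: gene -> one value per sample.
expression = {
    "geneA": [1.0, 5.0, 2.0, 8.0],
    "geneB": [3.0, 3.5, 2.9, 3.1],
}
labels = make_hybrid_labels(expression, master_gene="geneA")
small_expr, small_labels = resample(expression, labels, size=2, rng=random.Random(0))
print(labels, small_labels)
```

Because the labels are derived from a known gene, gene sets containing that gene provide a ground truth against which the ranking produced by each GSA method can be checked, and repeated resampling yields many small-sample data sets from one large one.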

  10. Conceptual Variation or Incoherence? Textbook Discourse on Genes in Six Countries

    NASA Astrophysics Data System (ADS)

    Gericke, Niklas M.; Hagberg, Mariana; dos Santos, Vanessa Carvalho; Joaquim, Leyla Mariane; El-Hani, Charbel N.

    2014-02-01

    The aim of this paper is to investigate in a systematic and comparative way previous results of independent studies on the treatment of genes and gene function in high school textbooks from six different countries. We analyze how the conceptual variation within the scientific domain of Genetics regarding gene function models and gene concepts is transformed via the didactic transposition into school science textbooks. The results indicate that a common textbook discourse on genes and their function exists in textbooks from the different countries. The structure of science as represented by conceptual variation and the use of multiple models was present in all the textbooks. However, the existence of conceptual variation and multiple models is implicit in these textbooks, i.e., neither the phenomenon of conceptual variation and multiple models nor its consequences are addressed explicitly, which ends up introducing conceptual incoherence about the gene concept and its function within the textbooks. We conclude that within the found textbook-discourse ontological aspects of the academic disciplines of genetics and molecular biology were retained, but without their epistemological underpinnings; these are lost in the didactic transposition. These results are of interest since students might have problems reconstructing the correct scientific understanding from the transformed school science knowledge as depicted within the high school textbooks. Implications for textbook writing as well as teaching are discussed in the paper.

  11. Effects of the Family Environment: Gene-Environment Interaction and Passive Gene-Environment Correlation

    ERIC Educational Resources Information Center

    Price, Thomas S.; Jaffee, Sara R.

    2008-01-01

    The classical twin study provides a useful resource for testing hypotheses about how the family environment influences children's development, including how genes can influence sensitivity to environmental effects. However, existing statistical models do not account for the possibility that children can inherit exposure to family environments…

  12. Dog models for blinding inherited retinal dystrophies.

    PubMed

    Petersen-Jones, Simon M; Komáromy, András M

    2015-03-01

    Spontaneous canine models exist for several inherited retinal dystrophies. This review will summarize the models and indicate where they have been used in translational gene therapy trials. The RPE65 gene therapy trials to treat childhood blindness are a good example of how studies in dogs have contributed to therapy development. Outcomes in human clinical trials are compared and contrasted with the result of the preclinical dog trials.

  13. Inherited variation in immune response genes in follicular lymphoma and diffuse large B-cell lymphoma.

    PubMed

    Nielsen, Kaspar Rene; Steffensen, Rudi; Haunstrup, Thure Mors; Bødker, Julie Støve; Dybkær, Karen; Baech, John; Bøgsted, Martin; Johnsen, Hans Erik

    2015-01-01

    Diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL) both depend on immune-mediated survival and proliferation signals from the tumor microenvironment. Inherited genetic variation influences this complex interaction. A total of 89 studies investigating immune-response genes in DLBCL and FL were critically reviewed. A relatively consistent association exists between variation in the tumor necrosis factor alpha (TNFA) and interleukin-10 loci and DLBCL risk; for DLBCL outcome, an association with the TNFA locus exists. Variations at chromosome 6p31-32 were associated with FL risk. Importantly, individual risk alleles have been shown to interact with each other. We suggest that the pathogenetic impact of polymorphic genes should include gene-gene interaction analysis and should be validated in preclinical model systems of normal B lymphopoiesis and B-cell malignancies. In the future, large cohort studies of interactions and genome-wide association studies are needed to extend the present findings and explore new risk alleles to be studied in preclinical models.

  14. Improved annotation with de novo transcriptome assembly in four social amoeba species.

    PubMed

    Singh, Reema; Lawal, Hajara M; Schilde, Christina; Glöckner, Gernot; Barton, Geoffrey J; Schaap, Pauline; Cole, Christian

    2017-01-31

    Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvements on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes were also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy of follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.

  15. Receptor Signaling Directs Global Recruitment of Pre-existing Transcription Factors to Inducible Elements.

    PubMed

    Cockerill, Peter N

    2016-12-01

    Gene expression programs are largely regulated by the tissue-specific expression of lineage-defining transcription factors or by the inducible expression of transcription factors in response to specific stimuli. Here I will review our own work over the last 20 years to show how specific activation signals also lead to the wide-spread re-distribution of pre-existing constitutive transcription factors to sites undergoing chromatin reorganization. I will summarize studies showing that activation of kinase signaling pathways creates open chromatin regions that recruit pre-existing factors which were previously unable to bind to closed chromatin. As models I will draw upon genes activated or primed by receptor signaling in memory T cells, and genes activated by cytokine receptor mutations in acute myeloid leukemia. I also summarize a hit-and-run model of stable epigenetic reprograming in memory T cells, mediated by transient Activator Protein 1 (AP-1) binding, which enables the accelerated activation of inducible enhancers.

  16. Dog Models for Blinding Inherited Retinal Dystrophies

    PubMed Central

    Komáromy, András M.

    2015-01-01

    Spontaneous canine models exist for several inherited retinal dystrophies. This review will summarize the models and indicate where they have been used in translational gene therapy trials. The RPE65 gene therapy trials to treat childhood blindness are a good example of how studies in dogs have contributed to therapy development. Outcomes in human clinical trials are compared and contrasted with the result of the preclinical dog trials. PMID:25671556

  17. A powerful score-based test statistic for detecting gene-gene co-association.

    PubMed

    Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun

    2016-01-29

    The genetic variants identified by genome-wide association studies (GWAS) can only account for a small proportion of the total heritability of complex diseases. The existence of gene-gene joint effects, which comprise the main effects and their co-association, is one of the possible explanations for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, due not only to the traditional interaction under a nearly independent condition but also to the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and specific disease-associated loci will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS performs better than other existing methods, including the single SNP-based and principal component analysis (PCA)-based logistic regression models, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM) and the delta-square (δ²) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.

  18. A new computational strategy for predicting essential genes.

    PubMed

    Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng

    2013-12-21

    Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.

  19. ACTG: novel peptide mapping onto gene models.

    PubMed

    Choi, Seunghyuk; Kim, Hyunwoo; Paek, Eunok

    2017-04-15

    In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful, because it allows us to understand the origins of the gene products. Existing software tools either take the genomic position of a peptide start site as an input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In the case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these existing tools cannot be directly applied unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences, while allowing certain genomic variations, requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to the genome, assuming all possible single exon skipping, junction variation allowing three edit distances from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during the mapping phase if a user provides the VCF (variant call format) file as an input. Available at http://prix.hanyang.ac.kr/ACTG/search.jsp. Contact: eunokpaek@hanyang.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  20. A systems biology approach to investigate the antimicrobial activity of oleuropein.

    PubMed

    Li, Xianhua; Liu, Yanhong; Jia, Qian; LaMacchia, Virginia; O'Donoghue, Kathryn; Huang, Zuyi

    2016-12-01

    Oleuropein and its hydrolysis products are olive phenolic compounds that have antimicrobial effects on a variety of pathogens, with the potential to be utilized in food and pharmaceutical products. While the existing research is mainly focused on individual genes or enzymes that are regulated by oleuropein for antimicrobial activities, little work has been done to integrate intracellular genes, enzymes and metabolic reactions for a systematic investigation of antimicrobial mechanism of oleuropein. In this study, the first genome-scale modeling method was developed to predict the system-level changes of intracellular metabolism triggered by oleuropein in Staphylococcus aureus, a common food-borne pathogen. To simulate the antimicrobial effect, an existing S. aureus genome-scale metabolic model was extended by adding the missing nitric oxide reactions, and exchange rates of potassium, phosphate and glutamate were adjusted in the model as suggested by previous research to mimic the stress imposed by oleuropein on S. aureus. The developed modeling approach was able to match S. aureus growth rates with experimental data for five oleuropein concentrations. The reactions with large flux change were identified and the enzymes of fifteen of these reactions were validated by existing research for their important roles in oleuropein metabolism. When compared with experimental data, the up/down gene regulations of 80% of these enzymes were correctly predicted by our modeling approach. This study indicates that the genome-scale modeling approach provides a promising avenue for revealing the intracellular metabolism of oleuropein antimicrobial properties.

  1. Combinatorial explosion in model gene networks

    NASA Astrophysics Data System (ADS)

    Edwards, R.; Glass, L.

    2000-09-01

    The explosive growth in knowledge of the genome of humans and other organisms leaves open the question of how the functioning of genes in interacting networks is coordinated for orderly activity. One approach to this problem is to study mathematical properties of abstract network models that capture the logical structures of gene networks. The principal issue is to understand how particular patterns of activity can result from particular network structures, and what types of behavior are possible. We study idealized models in which the logical structure of the network is explicitly represented by Boolean functions that can be represented by directed graphs on n-cubes, but which are continuous in time and described by differential equations, rather than being updated synchronously via a discrete clock. The equations are piecewise linear, which allows significant analysis and facilitates rapid integration along trajectories. We first give a combinatorial solution to the question of how many distinct logical structures exist for n-dimensional networks, showing that the number increases very rapidly with n. We then outline analytic methods that can be used to establish the existence, stability and periods of periodic orbits corresponding to particular cycles on the n-cube. We use these methods to confirm the existence of limit cycles discovered in a sample of a million randomly generated structures of networks of 4 genes. Even with only 4 genes, at least several hundred different patterns of stable periodic behavior are possible, many of them surprisingly complex. We discuss ways of further classifying these periodic behaviors, showing that small mutations (reversal of one or a few edges on the n-cube) need not destroy the stability of a limit cycle. Although these networks are very simple as models of gene networks, their mathematical transparency reveals relationships between structure and behavior, they suggest that the possibilities for orderly dynamics in such networks are extremely rich and they offer novel ways to think about how mutations can alter dynamics.
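
The remark that piecewise-linear equations facilitate rapid integration along trajectories can be made concrete with a small sketch. The abstract does not give the equations, so the form below is an assumption: dx_i/dt = -x_i + f_i(X), where X_j is 1 when x_j is above the threshold 0 and each f_i is +1 or -1. The three-gene ring of repressors, the function names and the initial condition are all hypothetical, and the sketch assumes no gene regulates itself.

```python
import math


def focal_point(state):
    """Ring of three repressors: gene i is driven on (+1) only when gene i-1 is off."""
    n = len(state)
    return [1.0 if state[(i - 1) % n] == 0 else -1.0 for i in range(n)]


def next_switch(x, state):
    """Advance exactly to the next threshold crossing.

    Within one orthant each variable relaxes as
    x_i(t) = f_i + (x_i(0) - f_i) * exp(-t),
    so the crossing time of x_i = 0 has the closed form log(1 - x_i / f_i).
    """
    f = focal_point(state)
    times = {}
    for i, (xi, fi, si) in enumerate(zip(x, f, state)):
        heading_on = fi > 0
        if heading_on != bool(si):  # the target lies across the threshold
            times[i] = math.log(1.0 - xi / fi)
    if not times:
        return None  # no crossing: the trajectory converges to the focal point
    i_cross = min(times, key=times.get)
    t = times[i_cross]
    decay = math.exp(-t)
    new_x = [fi + (xi - fi) * decay for xi, fi in zip(x, f)]
    new_x[i_cross] = 0.0  # the crossing variable sits exactly on the threshold
    new_state = list(state)
    new_state[i_cross] = 1 - state[i_cross]
    return t, new_x, new_state


x, state = [0.5, -0.3, -0.8], [1, 0, 0]
visited = [tuple(state)]
switch_times = []
for _ in range(6):
    t, x, state = next_switch(x, state)
    switch_times.append(t)
    visited.append(tuple(state))
print(visited)
```

No numerical time stepping is needed: each call jumps from one threshold crossing to the next using the closed-form solution. For this ring the Boolean state passes through six orthants and returns to its starting orthant, which corresponds to a cycle on the 3-cube of the kind the abstract analyzes.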

  2. Combinatorial explosion in model gene networks.

    PubMed

    Edwards, R.; Glass, L.

    2000-09-01

    The explosive growth in knowledge of the genome of humans and other organisms leaves open the question of how the functioning of genes in interacting networks is coordinated for orderly activity. One approach to this problem is to study mathematical properties of abstract network models that capture the logical structures of gene networks. The principal issue is to understand how particular patterns of activity can result from particular network structures, and what types of behavior are possible. We study idealized models in which the logical structure of the network is explicitly represented by Boolean functions that can be represented by directed graphs on n-cubes, but which are continuous in time and described by differential equations, rather than being updated synchronously via a discrete clock. The equations are piecewise linear, which allows significant analysis and facilitates rapid integration along trajectories. We first give a combinatorial solution to the question of how many distinct logical structures exist for n-dimensional networks, showing that the number increases very rapidly with n. We then outline analytic methods that can be used to establish the existence, stability and periods of periodic orbits corresponding to particular cycles on the n-cube. We use these methods to confirm the existence of limit cycles discovered in a sample of a million randomly generated structures of networks of 4 genes. Even with only 4 genes, at least several hundred different patterns of stable periodic behavior are possible, many of them surprisingly complex. We discuss ways of further classifying these periodic behaviors, showing that small mutations (reversal of one or a few edges on the n-cube) need not destroy the stability of a limit cycle. Although these networks are very simple as models of gene networks, their mathematical transparency reveals relationships between structure and behavior, they suggest that the possibilities for orderly dynamics in such networks are extremely rich, and they offer novel ways to think about how mutations can alter dynamics. (c) 2000 American Institute of Physics.
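
    The abstract's remark that piecewise-linear equations facilitate rapid integration can be illustrated with a minimal sketch. It assumes a common normalization that the abstract does not spell out: unit decay rates, thresholds at 0, and focal points of +1 or -1. The 3-gene repression ring in focal_point is a hypothetical example, not a network from the paper. Within one orthant the solution is x(t) = f + (x0 - f) * exp(-t), so the next threshold crossing can be computed in closed form rather than by numerical stepping.

```python
import math

def boolean_state(x):
    """Map continuous levels to Boolean states: 1 above the threshold 0, else 0."""
    return tuple(1 if xi > 0 else 0 for xi in x)

def focal_point(bits):
    """Hypothetical 3-gene ring: each gene is repressed by its predecessor.
    A gene that is switched on has focal point +1, otherwise -1 (assumed scaling)."""
    n = len(bits)
    return tuple(-1.0 if bits[(i - 1) % n] else 1.0 for i in range(n))

def crossing_time(x0, f):
    """Time at which x(t) = f + (x0 - f) * exp(-t) reaches 0; inf if it never does."""
    if x0 == 0 or (x0 > 0) == (f > 0):
        return math.inf
    return math.log((f - x0) / f)

def advance_to_switch(x):
    """Integrate exactly to the next threshold crossing. Returns (dt, new_x)."""
    f = focal_point(boolean_state(x))
    times = [crossing_time(xi, fi) for xi, fi in zip(x, f)]
    dt = min(times)
    if math.isinf(dt):
        return dt, f  # no further switching; the trajectory converges to f
    decay = math.exp(-dt)
    new_x = [fi + (xi - fi) * decay for xi, fi in zip(x, f)]
    k = times.index(dt)
    # Nudge the switching gene just past the threshold so its Boolean state flips.
    new_x[k] = math.copysign(1e-12, f[k])
    return dt, tuple(new_x)

dt, x_next = advance_to_switch((1.0, 0.5, -0.5))
```

    Repeated calls to advance_to_switch trace the sequence of orthants visited, which corresponds to a path along edges of the n-cube.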

  3. Modeling Gene-Wise Dependencies Improves the Identification of Drug Response Biomarkers in Cancer Studies

    Cancer.gov

    Recent advances in biomedical and sequencing technologies have revealed the genomic landscape of common forms of human cancer in unprecedented detail. Of the genes that drive tumorigenesis when altered, for most cancers it is believed that there exist a small number of “mountains” (genes altered at high frequencies across the population), and a much larger number of “hills” (much less frequently altered genes).

  4. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

    PubMed

    Mørk, Søren; Holmes, Ian

    2012-03-01

    Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures is the best performing in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.

  5. Investigation of Seasonal and Latitudinal Effects on the Expression of Clock Genes in Drosophila

    NASA Astrophysics Data System (ADS)

    Hosseini, Seyede Sanaz; Nazarimehr, Fahimeh; Jafari, Sajad

    The primary goal of this work is to develop a dynamical model capturing the influence of seasonal and latitudinal variations on the expression of Drosophila clock genes. To this end, we study a specific dynamical system with strange attractors that exhibits changes in Drosophila activity over a range of latitudes and across different seasons. Bifurcations of this system are analyzed to examine the effect of season and latitude on the behavior of clock genes. Existing experimental data collected from the activity of Drosophila melanogaster corroborate the dynamical model.

  6. On the Complexity of Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.

    PubMed

    Kordi, Misagh; Bansal, Mukul S

    2017-01-01

    Duplication-Transfer-Loss (DTL) reconciliation has emerged as a powerful technique for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation takes as input a gene family phylogeny and the corresponding species phylogeny, and reconciles the two by postulating speciation, gene duplication, horizontal gene transfer, and gene loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. However, gene trees are frequently non-binary. With such non-binary gene trees, the reconciliation problem seeks to find a binary resolution of the gene tree that minimizes the reconciliation cost. Given the prevalence of non-binary gene trees, many efficient algorithms have been developed for this problem in the context of the simpler Duplication-Loss (DL) reconciliation model. Yet, no efficient algorithms exist for DTL reconciliation with non-binary gene trees and the complexity of the problem remains unknown. In this work, we resolve this open question by showing that the problem is, in fact, NP-hard. Our reduction applies to both the dated and undated formulations of DTL reconciliation. By resolving this long-standing open problem, this work will spur the development of both exact and heuristic algorithms for this important problem.

  7. Derivation of large-scale cellular regulatory networks from biological time series data.

    PubMed

    de Bivort, Benjamin L

    2010-01-01

    Pharmacological agents and other perturbants of cellular homeostasis appear to nearly universally affect the activity of many genes, proteins, and signaling pathways. While this is due in part to nonspecificity of action of the drug or cellular stress, the large-scale self-regulatory behavior of the cell may also be responsible, as this typically means that when a cell switches states, dozens or hundreds of genes will respond in concert. If many genes act collectively in the cell during state transitions, rather than every gene acting independently, models of the cell can be created that are comprehensive of the action of all genes, using existing data, provided that the functional units in the model are collections of genes. Techniques to develop these large-scale cellular-level models are provided in detail, along with methods of analyzing them, and a brief summary of major conclusions about large-scale cellular networks to date.

  8. Duchenne Muscular Dystrophy Gene Therapy in the Canine Model

    PubMed Central

    2015-01-01

    Duchenne muscular dystrophy (DMD) is an X-linked lethal muscle disease caused by dystrophin deficiency. Gene therapy has significantly improved the outcome of dystrophin-deficient mice. Yet, clinical translation has not resulted in the expected benefits in human patients. This translational gap is largely because of the insufficient modeling of DMD in mice. Specifically, mice lacking dystrophin show minimal dystrophic symptoms, and they do not respond to the gene therapy vector in the same way as human patients do. Further, a mouse is hundreds of times smaller than a boy, making it impossible to scale up gene therapy in a mouse model. None of these limitations exist in the canine DMD (cDMD) model. For this reason, cDMD dogs have been considered a highly valuable platform to test experimental DMD gene therapy. Over the last three decades, a variety of gene therapy approaches have been evaluated in cDMD dogs using a number of nonviral and viral vectors. These studies have provided critical insight for the development of an effective gene therapy protocol in human patients. This review discusses the history, current status, and future directions of DMD gene therapy in the canine model. PMID:25710459

  9. A comparative analysis of biclustering algorithms for gene expression data

    PubMed Central

    Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V.

    2013-01-01

    The need to analyze high-dimension biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters. PMID:22772837

  10. A kernel regression approach to gene-gene interaction detection for case-control studies.

    PubMed

    Larson, Nicholas B; Schaid, Daniel J

    2013-11-01

    Gene-gene interactions are increasingly being addressed as a potentially important contributor to the variability of complex traits. Consequently, attention has moved beyond single-locus analysis of association to more complex genetic models. Although several single-marker approaches toward interaction analysis have been developed, such methods suffer from very high testing dimensionality and do not take advantage of existing information, notably the definition of genes as functional units. Here, we propose a comprehensive family of gene-level score tests for identifying genetic elements of disease risk, in particular pairwise gene-gene interactions. Using kernel machine methods, we devise score-based variance component tests under a generalized linear mixed model framework. We conducted simulations based upon coalescent genetic models to evaluate the performance of our approach under a variety of disease models. These simulations indicate that our methods are generally higher powered than alternative gene-level approaches and at worst competitive with exhaustive SNP-level (where SNP is single-nucleotide polymorphism) analyses. Furthermore, we observe that simulated epistatic effects resulted in significant marginal testing results for the involved genes regardless of whether or not true main effects were present. We detail the benefits of our methods and discuss potential genome-wide analysis strategies for gene-gene interaction analysis in a case-control study design. © 2013 WILEY PERIODICALS, INC.

  11. The Inference of Gene Trees with Species Trees

    PubMed Central

    Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

    2015-01-01

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970

  12. A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks

    PubMed Central

    Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng

    2009-01-01

    Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885

  13. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

    PubMed

    Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

    2017-10-03

    Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.

  14. A system-level model for the microbial regulatory genome.

    PubMed

    Brooks, Aaron N; Reiss, David J; Allard, Antoine; Wu, Wei-Ju; Salvanha, Diego M; Plaisier, Christopher L; Chandrasekaran, Sriram; Pan, Min; Kaur, Amardeep; Baliga, Nitin S

    2014-07-15

    Microbes can tailor transcriptional responses to diverse environmental challenges despite having streamlined genomes and a limited number of regulators. Here, we present data-driven models that capture the dynamic interplay of the environment and genome-encoded regulatory programs of two types of prokaryotes: Escherichia coli (a bacterium) and Halobacterium salinarum (an archaeon). The models reveal how the genome-wide distributions of cis-acting gene regulatory elements and the conditional influences of transcription factors at each of those elements encode programs for eliciting a wide array of environment-specific responses. We demonstrate how these programs partition transcriptional regulation of genes within regulons and operons to re-organize gene-gene functional associations in each environment. The models capture fitness-relevant co-regulation by different transcriptional control mechanisms acting across the entire genome, to define a generalized, system-level organizing principle for prokaryotic gene regulatory networks that goes well beyond existing paradigms of gene regulation. An online resource (http://egrin2.systemsbiology.net) has been developed to facilitate multiscale exploration of conditional gene regulation in the two prokaryotes. © 2014 The Authors. Published under the terms of the CC BY 4.0 license.

  15. Improved animal models for testing gene therapy for atherosclerosis.

    PubMed

    Du, Liang; Zhang, Jingwan; De Meyer, Guido R Y; Flynn, Rowan; Dichek, David A

    2014-04-01

    Gene therapy delivered to the blood vessel wall could augment current therapies for atherosclerosis, including systemic drug therapy and stenting. However, identification of clinically useful vectors and effective therapeutic transgenes remains at the preclinical stage. Identification of effective vectors and transgenes would be accelerated by availability of animal models that allow practical and expeditious testing of vessel-wall-directed gene therapy. Such models would include humanlike lesions that develop rapidly in vessels that are amenable to efficient gene delivery. Moreover, because human atherosclerosis develops in normal vessels, gene therapy that prevents atherosclerosis is most logically tested in relatively normal arteries. Similarly, gene therapy that causes atherosclerosis regression requires gene delivery to an existing lesion. Here we report development of three new rabbit models for testing vessel-wall-directed gene therapy that either prevents or reverses atherosclerosis. Carotid artery intimal lesions in these new models develop within 2-7 months after initiation of a high-fat diet and are 20-80 times larger than lesions in a model we described previously. Individual models allow generation of lesions that are relatively rich in either macrophages or smooth muscle cells, permitting testing of gene therapy strategies targeted at either cell type. Two of the models include gene delivery to essentially normal arteries and will be useful for identifying strategies that prevent lesion development. The third model generates lesions rapidly in vector-naïve animals and can be used for testing gene therapy that promotes lesion regression. These models are optimized for testing helper-dependent adenovirus (HDAd)-mediated gene therapy; however, they could be easily adapted for testing of other vectors or of different types of molecular therapies, delivered directly to the blood vessel wall. Our data also support the promise of HDAd to deliver long-term therapy from vascular endothelium without accelerating atherosclerotic disease.

  16. Integrating mitosis, toxicity, and transgene expression in a telecommunications packet-switched network model of lipoplex-mediated gene delivery.

    PubMed

    Martin, Timothy M; Wysocki, Beata J; Beyersdorf, Jared P; Wysocki, Tadeusz A; Pannier, Angela K

    2014-08-01

    Gene delivery systems transport exogenous genetic information to cells or biological systems with the potential to directly alter endogenous gene expression and behavior with applications in functional genomics, tissue engineering, medical devices, and gene therapy. Nonviral systems offer advantages over viral systems because of their low immunogenicity, inexpensive synthesis, and easy modification but suffer from lower transfection levels. The representation of gene transfer using models offers perspective and interpretation of complex cellular mechanisms, including nonviral gene delivery where exact mechanisms are unknown. Here, we introduce a novel telecommunications model of the nonviral gene delivery process in which the delivery of the gene to a cell is synonymous with delivery of a packet of information to a destination computer within a packet-switched computer network. Such a model uses nodes and layers to simplify the complexity of modeling the transfection process and to overcome several challenges of existing models. These challenges include a limited scope and limited time frame, which often does not incorporate biological effects known to affect transfection. The telecommunication model was constructed in MATLAB to model lipoplex delivery of the gene encoding the green fluorescent protein to HeLa cells. Mitosis and toxicity events were included in the model resulting in simulation outputs of nuclear internalization and transfection efficiency that correlated with experimental data. A priori predictions based on model sensitivity analysis suggest that increasing endosomal escape and decreasing lysosomal degradation, protein degradation, and GFP-induced toxicity can improve transfection efficiency by three-fold. Application of the telecommunications model to nonviral gene delivery offers insight into the development of new gene delivery systems with therapeutically relevant transfection levels.

  17. Gene regulatory network identification from the yeast cell cycle based on a neuro-fuzzy system.

    PubMed

    Wang, B H; Lim, J W; Lim, J S

    2016-08-30

    Many studies exist for reconstructing gene regulatory networks (GRNs). In this paper, we propose a method based on an advanced neuro-fuzzy system for gene regulatory network reconstruction from microarray time-series data. This approach uses a neural network with a weighted fuzzy function to model the relationships between genes. Fuzzy rules, which determine the regulators of genes, are greatly simplified through this method. Additionally, a regulator selection procedure is proposed, which extracts the exact dynamic relationship between genes, using the information obtained from the weighted fuzzy function. Time-series related features are extracted from the original data to employ the characteristics of temporal data that are useful for accurate GRN reconstruction. The microarray dataset of the yeast cell cycle was used for our study. We measured the mean squared prediction error for the efficiency of the proposed approach and evaluated the accuracy in terms of precision, sensitivity, and F-score. The proposed method outperformed the other existing approaches.
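
    The accuracy measures named in this abstract (precision, sensitivity, and F-score) have standard definitions, sketched below for a reconstructed network scored against a reference set of regulatory edges. The gene names and edge sets are hypothetical, and the abstract does not say exactly how the authors counted edges, so this is only an illustration of the usual formulas.

```python
def edge_metrics(predicted, truth):
    """Precision, sensitivity (recall) and F-score for predicted regulatory edges.

    Edges are (regulator, target) pairs; direction matters in this sketch.
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # edges predicted and present in the reference
    fp = len(predicted - truth)   # edges predicted but absent from the reference
    fn = len(truth - predicted)   # reference edges that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    if precision + sensitivity == 0:
        return precision, sensitivity, 0.0
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f_score

# Hypothetical reference and predicted edge lists.
truth = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
predicted = [("A", "B"), ("B", "C"), ("A", "C")]
precision, sensitivity, f_score = edge_metrics(predicted, truth)
```

    Here two of three predicted edges are correct and two of four reference edges are recovered, giving precision 2/3, sensitivity 1/2 and F-score 4/7.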

  18. Incorporating prior information into differential network analysis using non-paranormal graphical models.

    PubMed

    Zhang, Xiao-Fei; Ou-Yang, Le; Yan, Hong

    2017-08-15

    Understanding how gene regulatory networks change under different cellular states is important for revealing insights into network dynamics. Gaussian graphical models, which assume that the data follow a joint normal distribution, have been used recently to infer differential networks. However, the distributions of the omics data are non-normal in general. Furthermore, although much biological knowledge (or prior information) has been accumulated, most existing methods ignore the valuable prior information. Therefore, new statistical methods are needed to relax the normality assumption and make full use of prior information. We propose a new differential network analysis method to address the above challenges. Instead of using Gaussian graphical models, we employ a non-paranormal graphical model that can relax the normality assumption. We develop a principled model to take into account the following prior information: (i) a differential edge is less likely to exist between two genes that do not participate together in the same pathway; (ii) changes in the networks are driven by certain regulator genes that are perturbed across different cellular states and (iii) the differential networks estimated from multi-view gene expression data likely share common structures. Simulation studies demonstrate that our method outperforms other graphical model-based algorithms. We apply our method to identify the differential networks between platinum-sensitive and platinum-resistant ovarian tumors, and the differential networks between the proneural and mesenchymal subtypes of glioblastoma. Hub nodes in the estimated differential networks rediscover known cancer-related regulator genes and contain interesting predictions. The source code is at https://github.com/Zhangxf-ccnu/pDNA. Contact: szuouyl@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.

  19. Confident difference criterion: a new Bayesian differentially expressed gene selection algorithm with applications.

    PubMed

    Yu, Fang; Chen, Ming-Hui; Kuo, Lynn; Talbott, Heather; Davis, John S

    2015-08-07

    Recently, the Bayesian method has become more popular for analyzing high dimensional gene expression data as it allows us to borrow information across different genes and provides powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene selection algorithms for general Bayesian models and name these new methods the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method based on the means and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783-802) is established under the normal-normal-model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to those of several existing methods via several simulations. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. A real dataset is used to further demonstrate the proposed methodology. In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. The confident difference criterion method proposed in this paper provides a new efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.
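
    The two-step decision rule described in this abstract (estimate a posterior probability of a difference, then declare a gene DE when that probability is large) can be sketched generically from posterior draws. This is not the authors' exact criterion: the threshold on the standardized difference, the probability cutoff, and the draw values below are all assumptions made for illustration.

```python
def posterior_prob_difference(draws_mu1, draws_mu2, draws_sigma, threshold):
    """Fraction of posterior draws whose standardized mean difference
    |mu1 - mu2| / sigma exceeds the (assumed) threshold."""
    hits = sum(
        1
        for m1, m2, s in zip(draws_mu1, draws_mu2, draws_sigma)
        if abs(m1 - m2) / s > threshold
    )
    return hits / len(draws_mu1)

def is_differentially_expressed(prob, cutoff=0.95):
    """Declare a gene DE when the posterior probability is large (assumed cutoff)."""
    return prob >= cutoff

# Hypothetical posterior draws for one gene under two conditions.
mu1 = [2.0, 2.1, 1.9, 2.2]
mu2 = [0.0, 0.1, 0.0, 1.9]
sigma = [1.0, 1.0, 1.0, 1.0]
prob = posterior_prob_difference(mu1, mu2, sigma, threshold=1.0)
```

    In practice the draws would come from an MCMC fit of the chosen Bayesian model, and the variance-based variant mentioned in the abstract would add a second comparison on the variances.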

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Young June; Ahn, Kwang Sung; Kim, Minjeong

    Highlights: • ATM gene-targeted pigs were produced by somatic cell nuclear transfer. • A novel large animal model for ataxia telangiectasia was developed. • The new model may provide an alternative to the mouse model. Abstract: Ataxia telangiectasia (A-T) is a recessive autosomal disorder associated with pleiotropic phenotypes, including progressive cerebellar degeneration, gonad atrophy, and growth retardation. Even though A-T is known to be caused by mutations in the Ataxia telangiectasia mutated (ATM) gene, the correlation between abnormal cellular physiology caused by ATM mutations and the multiple symptoms of A-T disease has not been clearly determined. None of the existing ATM mouse models properly reflects the extent to which neurological degeneration occurs in humans. In an attempt to provide a large animal model for A-T, we produced gene-targeted pigs with mutations in the ATM gene by somatic cell nuclear transfer. The disrupted allele in the ATM gene of cloned piglets was confirmed via PCR and Southern blot analysis. The ATM gene-targeted pigs generated in the present study may provide an alternative to the current mouse model for the study of mechanisms underlying A-T disorder and for the development of new therapies.

  1. Lentivirus-mediated platelet gene therapy of murine hemophilia A with pre-existing anti-FVIII immunity

    PubMed Central

    Kuether, E. L.; Schroeder, J. A.; Fahs, S. A.; Cooley, B. C.; Chen, Y.; Montgomery, R. R.; Wilcox, D. A.; Shi, Q.

    2012-01-01

    Background: The development of inhibitory antibodies, referred to as inhibitors, against exogenous FVIII in a significant subset of patients with hemophilia A remains a persistent challenge to the efficacy of protein replacement therapy. Our previous studies using the transgenic approach provided proof-of-principle that platelet-specific expression could be successful for treating hemophilia A in the presence of inhibitory antibodies. Objective: To investigate a clinically translatable approach for platelet gene therapy of hemophilia A with pre-existing inhibitors. Methods: Platelet-FVIII expression in pre-immunized FVIII-null mice was introduced by transplantation of lentivirus-transduced bone marrow or enriched hematopoietic stem cells. FVIII expression was determined by a chromogenic assay. The transgene copy number per cell was quantitated by real-time PCR. Inhibitor titer was measured by Bethesda assay. Phenotypic correction was assessed by the tail clipping assay and an electrolytic-induced venous injury model. Integration sites were analyzed by LAM-PCR. Results: Therapeutic levels of platelet-FVIII expression were sustained long-term without evoking an anti-FVIII memory response in the transduced pre-immunized recipients. The tail clip survival test and the electrolytic injury model confirmed that hemostasis was improved in the treated animals. Sequential bone marrow transplants showed sustained platelet-FVIII expression resulting in phenotypic correction in pre-immunized secondary and tertiary recipients. Conclusions: Lentivirus-mediated platelet-specific gene transfer improves hemostasis in hemophilia A mice with pre-existing inhibitors, indicating that this approach may be a promising strategy for gene therapy of hemophilia A even in the high-risk setting of pre-existing inhibitory antibodies. PMID:22632092

  2. Global transcriptome analysis reveals extensive gene remodeling, alternative splicing and differential transcription profiles in non-seed vascular plant Selaginella moellendorffii.

    PubMed

    Zhu, Yan; Chen, Longxian; Zhang, Chengjun; Hao, Pei; Jing, Xinyun; Li, Xuan

    2017-01-25

    Selaginella moellendorffii, a lycophyte, is a model plant to study the early evolution and development of vascular plants. As the first and only sequenced lycophyte to date, the genome of S. moellendorffii revealed many conserved genes and pathways, as well as specialized genes different from flowering plants. Despite the progress made, little is known about long noncoding RNAs (lncRNA) and the alternative splicing (AS) of coding genes in S. moellendorffii. Its coding gene models have not been fully validated with transcriptome data. Furthermore, it remains important to understand whether the regulatory mechanisms similar to flowering plants are used, and how they operate in a non-seed primitive vascular plant. RNA-sequencing (RNA-seq) was performed for three S. moellendorffii tissues, root, stem, and leaf, by constructing strand-specific RNA-seq libraries from RNA purified using RiboMinus isolation protocol. A total of 176 million reads (44 Gbp) were obtained from three tissue types, and were mapped to S. moellendorffii genome. By comparing with 22,285 existing gene models of S. moellendorffii, we identified 7930 high-confidence novel coding genes (a 35.6% increase), and for the first time reported 4422 lncRNAs in a lycophyte. Further, we refined 2461 (11.0%) of existing gene models, and identified 11,030 AS events (for 5957 coding genes) revealed for the first time for lycophytes. Tissue-specific gene expression with functional implication was analyzed, and 1031, 554, and 269 coding genes, and 174, 39, and 17 lncRNAs were identified in root, stem, and leaf tissues, respectively. The expression of critical genes for vascular development stages, i.e. formation of provascular cells, xylem specification and differentiation, and phloem specification and differentiation, was compared in S. moellendorffii tissues, indicating a less complex regulatory mechanism in lycophytes than in flowering plants. 
    The results were further strengthened by the evolutionary trend of seven transcription factor families related to vascular development, which was observed among four representative species of seed and non-seed vascular plants, and nonvascular land and aquatic plants. The deep RNA-seq study of S. moellendorffii discovered extensive new gene content, including novel coding genes, lncRNAs, AS events, and refined gene models. Compared to flowering vascular plants, S. moellendorffii displayed less complexity in gene structure, alternative splicing, and regulatory elements of vascular development. The study offered important insight into the evolution of vascular plants, and the regulatory mechanism of vascular development in a non-seed plant.

  3. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.

    PubMed

    Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory

    2017-12-01

    Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.

  4. Modeling gene regulatory network motifs using statecharts

    PubMed Central

    2012-01-01

    Background Gene regulatory networks are widely used by biologists to describe the interactions among genes, proteins and other components at the intra-cellular level. Recently, a great effort has been devoted to give gene regulatory networks a formal semantics based on existing computational frameworks. For this purpose, we consider Statecharts, which are a modular, hierarchical and executable formal model widely used to represent software systems. We use Statecharts for modeling small and recurring patterns of interactions in gene regulatory networks, called motifs. Results We present an improved method for modeling gene regulatory network motifs using Statecharts and we describe the successful modeling of several motifs, including those which could not be modeled or whose models could not be distinguished using the method of a previous proposal. We model motifs in an easy and intuitive way by taking advantage of the visual features of Statecharts. Our modeling approach is able to simulate some interesting temporal properties of gene regulatory network motifs: the delay in the activation and the deactivation of the "output" gene in the coherent type-1 feedforward loop, the pulse in the incoherent type-1 feedforward loop, the bistability nature of double positive and double negative feedback loops, the oscillatory behavior of the negative feedback loop, and the "lock-in" effect of positive autoregulation. Conclusions We present a Statecharts-based approach for the modeling of gene regulatory network motifs in biological systems. The basic motifs used to build more complex networks (that is, simple regulation, reciprocal regulation, feedback loop, feedforward loop, and autoregulation) can be faithfully described and their temporal dynamics can be analyzed. PMID:22536967
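
    The activation delay of the coherent type-1 feedforward loop mentioned in this abstract can be illustrated with a minimal Boolean, discrete-time sketch. This is not the Statecharts model of the paper: the function name `simulate_c1_ffl`, the `y_delay` parameter, and the AND-gate update rule are assumptions chosen for illustration, and only the delay on activation is shown.

```python
def simulate_c1_ffl(x_input, y_delay=2):
    """Toy coherent type-1 feedforward loop with AND logic (illustrative only).

    X activates Y once X has been on for more than `y_delay` consecutive
    steps; Z is on only while both X and Y are on. Y switches off as soon
    as X does, which is a simplification of real Y decay.
    """
    x_on_steps = 0
    trace = []
    for x in x_input:
        # Count how long X has been continuously on; reset when X is off.
        x_on_steps = x_on_steps + 1 if x else 0
        y = x_on_steps > y_delay
        z = bool(x) and y
        trace.append((int(bool(x)), int(y), int(z)))
    return trace


if __name__ == "__main__":
    for step, (x, y, z) in enumerate(simulate_c1_ffl([0, 1, 1, 1, 1, 1, 0, 0])):
        print(step, x, y, z)
```

    In this sketch Z switches on two steps after X does, but switches off in the same step as X, which is the sign-sensitive delay usually associated with the AND-gated form of this motif.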

  5. Protein and gene model inference based on statistical modeling in k-partite graphs.

    PubMed

    Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter

    2010-07-06

    One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.

  6. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    PubMed

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A search made with heuristic for standard biological process measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor higher rate of expression levels between genes, a hierarchical clustering model was proposed, where the biological association between genes is measured simultaneously using proximity measure of improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify association between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.
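
    The general idea behind this record, hierarchical clustering with a Pearson-correlation proximity and average linkage, can be sketched as follows. This is plain textbook agglomerative clustering, not the PCPHC or GL-PCPHC method itself, whose "improved" correlation and Seed Augment step are not specified in the abstract. The gene names and expression values are made up for illustration.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering using distance 1 - r and average linkage."""
    names = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles: two rising genes, two falling genes.
EXAMPLE = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 5.0, 2.0],
}

if __name__ == "__main__":
    print(average_linkage(EXAMPLE, 2))
```

    With these made-up profiles the two rising genes end up in one cluster and the two falling genes in the other, because the distance 1 - r is small for co-varying profiles and close to 2 for anti-correlated ones.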

  7. An Improved Pearson's Correlation Proximity-Based Hierarchical Clustering for Mining Biological Association between Genes

    PubMed Central

    Booma, P. M.; Prabhakaran, S.; Dhanalakshmi, R.

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A search made with heuristic for standard biological process measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor higher rate of expression levels between genes, a hierarchical clustering model was proposed, where the biological association between genes is measured simultaneously using proximity measure of improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify association between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality. PMID:25136661

  8. Synchronous versus asynchronous modeling of gene regulatory networks.

    PubMed

    Garg, Abhishek; Di Cara, Alessandro; Xenarios, Ioannis; Mendoza, Luis; De Micheli, Giovanni

    2008-09-01

    In silico modeling of gene regulatory networks has gained some momentum recently due to increased interest in analyzing the dynamics of biological systems. This has been further facilitated by the increasing availability of experimental data on gene-gene, protein-protein and gene-protein interactions. The two dynamical properties that are often experimentally testable are perturbations and stable steady states. Although a lot of work has been done on the identification of steady states, not much work has been reported on in silico modeling of cellular differentiation processes. In this manuscript, we provide algorithms based on reduced ordered binary decision diagrams (ROBDDs) for Boolean modeling of gene regulatory networks. Algorithms for synchronous and asynchronous transition models have been proposed and their corresponding computational properties have been analyzed. These algorithms allow users to compute cyclic attractors of large networks that are currently not feasible using existing software. Hereby we provide a framework to analyze the effect of multiple gene perturbation protocols, and their effect on cell differentiation processes. These algorithms were validated on the T-helper model showing the correct steady state identification and Th1-Th2 cellular differentiation process. The software binaries for Windows and Linux platforms can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
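
    The difference between synchronous and asynchronous Boolean updating described in this abstract can be seen on a very small example. The sketch below enumerates states by brute force rather than with ROBDDs, so it is only practical for tiny networks; the two-gene mutual-repression network and all function names are assumptions made for illustration and are not taken from GenYsis.

```python
from itertools import product

# Hypothetical toy network: two genes that repress each other.
RULES = (
    lambda s: int(not s[1]),  # gene 0 is on unless gene 1 is on
    lambda s: int(not s[0]),  # gene 1 is on unless gene 0 is on
)


def sync_step(state):
    """Update every gene at once from the current state."""
    return tuple(rule(state) for rule in RULES)


def async_successors(state):
    """States reachable by updating exactly one gene."""
    successors = set()
    for i, rule in enumerate(RULES):
        nxt = list(state)
        nxt[i] = rule(state)
        successors.add(tuple(nxt))
    return successors


def sync_attractors():
    """All attractors (fixed points and cycles) under synchronous update."""
    attractors = set()
    for start in product((0, 1), repeat=len(RULES)):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = sync_step(state)
        cycle = seen[seen.index(state):]
        k = cycle.index(min(cycle))  # rotate to a canonical starting state
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


def async_fixed_points():
    """States that no single-gene update can change."""
    return {
        s for s in product((0, 1), repeat=len(RULES))
        if async_successors(s) == {s}
    }


if __name__ == "__main__":
    print(sorted(sync_attractors()))
    print(sorted(async_fixed_points()))
```

    For this toy network the synchronous scheme has the two fixed points (0, 1) and (1, 0) plus a two-state cycle between (0, 0) and (1, 1). Under asynchronous updating the same two fixed points remain, but the cycle is no longer an attractor, since a single-gene update from (0, 0) or (1, 1) leads directly into one of the fixed points.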

  9. Clustering of time-course gene expression profiles using normal mixture models with autoregressive random effects

    PubMed Central

    2012-01-01

    Background Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found not to be sufficiently adequate to model the complexity of the time-course data, partly due to their ignoring the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. Results We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models to provide more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of coregulated genes are obtained, which are supported by gene-function annotation databases. Conclusions Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an autocorrelation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enables improved modelling and consequent clustering of time-course data. PMID:23151154

  10. GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

    PubMed

    Schulz, Tizian; Stoye, Jens; Doerr, Daniel

    2018-05-08

    Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.

  11. Modeling stochasticity and robustness in gene regulatory networks.

    PubMed

    Garg, Abhishek; Mohanram, Kartik; Di Cara, Alessandro; De Micheli, Giovanni; Xenarios, Ioannis

    2009-06-15

    Understanding gene regulation in biological processes and modeling the robustness of underlying regulatory networks is an important problem that is currently being addressed by computational systems biologists. Lately, there has been a renewed interest in Boolean modeling techniques for gene regulatory networks (GRNs). However, due to their deterministic nature, it is often difficult to identify whether these modeling approaches are robust to the addition of stochastic noise that is widespread in gene regulatory processes. Stochasticity in Boolean models of GRNs has been addressed relatively sparingly in the past, mainly by flipping the expression of genes between different expression levels with a predefined probability. This stochasticity in nodes (SIN) model leads to overrepresentation of noise in GRNs and hence non-correspondence with biological observations. In this article, we introduce the stochasticity in functions (SIF) model for simulating stochasticity in Boolean models of GRNs. By providing biological motivation behind the use of the SIF model and applying it to the T-helper and T-cell activation networks, we show that the SIF model provides more biologically robust results than the existing SIN model of stochasticity in GRNs. Algorithms are made available under our Boolean modeling toolbox, GenYsis. The software binaries can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
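
    The stochasticity in nodes (SIN) idea summarized in this abstract, flipping each gene's state with a predefined probability, can be sketched as below. The SIF model is not shown because the abstract does not define it. The toy two-gene network, the function name `sin_step`, and the parameter `flip_prob` are assumptions made for illustration and do not come from GenYsis.

```python
import random

# Hypothetical toy network: two genes that repress each other.
RULES = (
    lambda s: int(not s[1]),  # gene 0 is on unless gene 1 is on
    lambda s: int(not s[0]),  # gene 1 is on unless gene 0 is on
)


def sin_step(state, rules, flip_prob, rng):
    """One synchronous Boolean update followed by SIN-style node noise.

    After the deterministic update, each node is flipped independently
    with probability `flip_prob`.
    """
    updated = [rule(state) for rule in rules]
    return tuple(v ^ 1 if rng.random() < flip_prob else v for v in updated)


def simulate(start, rules, flip_prob, steps, seed=0):
    """Return the visited states, including the start state."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    trajectory = [start]
    for _ in range(steps):
        trajectory.append(sin_step(trajectory[-1], rules, flip_prob, rng))
    return trajectory


if __name__ == "__main__":
    for state in simulate((0, 1), RULES, flip_prob=0.1, steps=10):
        print(state)
```

    With `flip_prob=0.0` the trajectory stays at the fixed point (0, 1); raising the probability makes the state leave the fixed point more often, which is the kind of node-level noise the abstract argues is overrepresented in the SIN model.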

  12. Gene expression models for prediction of longitudinal dispersion coefficient in streams

    NASA Astrophysics Data System (ADS)

    Sattar, Ahmed M. A.; Gharabaghi, Bahram

    2015-05-01

    Longitudinal dispersion is the key hydrologic process that governs transport of pollutants in natural streams. It is critical for spill action centers to be able to predict the pollutant travel time and break-through curves accurately following accidental spills in urban streams. This study presents a novel gene expression model for longitudinal dispersion developed using 150 published data sets of geometric and hydraulic parameters in natural streams in the United States, Canada, Europe, and New Zealand. The training and testing of the model were accomplished using randomly-selected 67% (100 data sets) and 33% (50 data sets) of the data sets, respectively. Gene expression programming (GEP) is used to develop empirical relations between the longitudinal dispersion coefficient and various control variables, including the Froude number which reflects the effect of reach slope, aspect ratio, and the bed material roughness on the dispersion coefficient. Two GEP models have been developed, and the prediction uncertainties of the developed GEP models are quantified and compared with those of existing models, showing improved prediction accuracy in favor of GEP models. Finally, a parametric analysis is performed for further verification of the developed GEP models. The main reason for the higher accuracy of the GEP models compared to the existing regression models is that exponents of the key variables (aspect ratio and bed material roughness) are not constants but a function of the Froude number. The proposed relations are both simple and accurate and can be effectively used to predict the longitudinal dispersion coefficients in natural streams.

  13. Gene set analysis using variance component tests.

    PubMed

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.

  14. Computational Identification of Novel Genes: Current and Future Perspectives.

    PubMed

    Klasberg, Steffen; Bitard-Feildel, Tristan; Mallet, Ludovic

    2016-01-01

    While it has long been thought that all genomic novelties are derived from the existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, i.e., out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or leveraging biological evidence from experiments. Identification of novel genes remains, however, a challenging task. With the constant software and technology updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with perspective approaches for further studies.

  15. Genetic Basis of Body Color and Spotting Pattern in Redheaded Pine Sawfly Larvae (Neodiprion lecontei).

    PubMed

    Linnen, Catherine R; O'Quin, Claire T; Shackleford, Taylor; Sears, Connor R; Lindstedt, Carita

    2018-05-01

    Pigmentation has emerged as a premier model for understanding the genetic basis of phenotypic evolution, and a growing catalog of color loci is starting to reveal biases in the mutations, genes, and genetic architectures underlying color variation in the wild. However, existing studies have sampled a limited subset of taxa, color traits, and developmental stages. To expand the existing sample of color loci, we performed QTL mapping analyses on two types of larval pigmentation traits that vary among populations of the redheaded pine sawfly (Neodiprion lecontei): carotenoid-based yellow body color and melanin-based spotting pattern. For both traits, our QTL models explained a substantial proportion of phenotypic variation and suggested a genetic architecture that is neither monogenic nor highly polygenic. Additionally, we used our linkage map to anchor the current N. lecontei genome assembly. With these data, we identified promising candidate genes underlying (1) a loss of yellow pigmentation in populations in the mid-Atlantic/northeastern United States [C locus-associated membrane protein homologous to a mammalian HDL receptor-2 gene (Cameo2) and lipid transfer particle apolipoproteins II and I gene (apoLTP-II/I)], and (2) a pronounced reduction in black spotting in Great Lakes populations [members of the yellow gene family, tyrosine hydroxylase gene (pale), and dopamine N-acetyltransferase gene (Dat)]. Several of these genes also contribute to color variation in other wild and domesticated taxa. Overall, our findings are consistent with the hypothesis that predictable genes of large effect contribute to color evolution in nature.

  16. Evolutionary dynamics of olfactory and other chemosensory receptor genes in vertebrates

    PubMed Central

    Niimura, Yoshihito

    2007-01-01

    The numbers of functional olfactory receptor (OR) genes in humans and mice are about 400 and 1,000 respectively. In both humans and mice, these genes exist as genomic clusters and are scattered over almost all chromosomes. The difference in the number of genes between the two species is apparently caused by massive inactivation of OR genes in the human lineage and a substantial increase of OR genes in the mouse lineage after the human–mouse divergence. Compared with mammals, fishes have a much smaller number of OR genes. However, the OR gene family in fishes is much more divergent than that in mammals. Fishes have many different groups of genes that are absent in mammals, suggesting that the mammalian OR gene family is characterized by the loss of many group genes that existed in the ancestor of vertebrates and the subsequent expansion of specific groups of genes. Therefore, this gene family apparently changed dynamically depending on the evolutionary lineage and evolved under the birth-and-death model of evolution. Study of the evolutionary changes of two gene families for vomeronasal receptors and two gene families for taste receptors, which are structurally similar, but remotely related to OR genes, showed that some of the gene families evolved in the same fashion as the OR gene family. It appears that the number and types of genes in chemosensory receptor gene families have evolved in response to environmental needs, but they are also affected by fortuitous factors. PMID:16607462

  17. MicroRNA-integrated and network-embedded gene selection with diffusion distance.

    PubMed

    Huang, Di; Zhou, Xiaobo; Lyon, Christopher J; Hsueh, Willa A; Wong, Stephen T C

    2010-10-29

    Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measures only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (critical in certain cases) but ignored-in-almost-all-existing-studies limitation, we design a new strategy to integrate together microarray data with the information of microRNA, the major post-transcriptional regulatory factor. We also handle the challenges led by gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely due to signal transduction and that two genes may be functionally connected through multi paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly-connected, according to multi paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.

  18. GeneCount: genome-wide calculation of absolute tumor DNA copy numbers from array comparative genomic hybridization data

    PubMed Central

    Lyng, Heidi; Lando, Malin; Brøvig, Runar S; Svendsrud, Debbie H; Johansen, Morten; Galteland, Eivind; Brustugun, Odd T; Meza-Zepeda, Leonardo A; Myklebost, Ola; Kristensen, Gunnar B; Hovig, Eivind; Stokke, Trond

    2008-01-01

    Absolute tumor DNA copy numbers can currently be achieved only on a single gene basis by using fluorescence in situ hybridization (FISH). We present GeneCount, a method for genome-wide calculation of absolute copy numbers from clinical array comparative genomic hybridization data. The tumor cell fraction is reliably estimated in the model. Data consistent with FISH results are achieved. We demonstrate significant improvements over existing methods for exploring gene dosages and intratumor copy number heterogeneity in cancers. PMID:18500990

  19. Construction and analysis of gene-gene dynamics influence networks based on a Boolean model.

    PubMed

    Mazaya, Maulida; Trinh, Hung-Cuong; Kwon, Yung-Keun

    2017-12-21

    Identification of novel gene-gene relations is a crucial issue to understand system-level biological phenomena. To this end, many methods based on a correlation analysis of gene expressions or structural analysis of molecular interaction networks have been proposed. They have a limitation in identifying more complicated gene-gene dynamical relations, though. To overcome this limitation, we proposed a measure to quantify a gene-gene dynamical influence (GDI) using a Boolean network model and constructed a GDI network to indicate existence of a dynamical influence for every ordered pair of genes. It represents how much a state trajectory of a target gene is changed by a knockout mutation subject to a source gene in a gene-gene molecular interaction (GMI) network. Through a topological comparison between GDI and GMI networks, we observed that the former network is denser than the latter network, which implies that there exist many gene pairs of dynamically influencing but molecularly non-interacting relations. In addition, a larger number of hub genes were generated in the GDI network. On the other hand, there was a correlation between these networks such that the degree value of a node was positively correlated to each other. We further investigated the relationships of the GDI value with structural properties and found that there are negative and positive correlations with the length of a shortest path and the number of paths, respectively. In addition, a GDI network could predict a set of genes whose steady-state expression is affected in E. coli gene-knockout experiments. More interestingly, we found that the drug-targets with side-effects have a larger number of outgoing links than the other genes in the GDI network, which implies that they are more likely to influence the dynamics of other genes. 
    Finally, we found biological evidence showing that the gene pairs which are not molecularly interacting but dynamically influential can be considered for novel gene-gene relationships. Taken together, construction and analysis of the GDI network can be a useful approach to identify novel gene-gene relationships in terms of the dynamical influence.

  20. Analysis of functional importance of binding sites in the Drosophila gap gene network model.

    PubMed

    Kozlov, Konstantin; Gursky, Vitaly V; Kulakovskiy, Ivan V; Dymova, Arina; Samsonova, Maria

    2015-01-01

    The statistical thermodynamics based approach provides a promising framework for construction of the genotype-phenotype map in many biological systems. Among important aspects of a good model connecting the DNA sequence information with that of a molecular phenotype (gene expression) is the selection of regulatory interactions and relevant transcription factor binding sites. As the model may predict different levels of the functional importance of specific binding sites in different genomic and regulatory contexts, it is essential to formulate and study such models under different modeling assumptions. We elaborate a two-layer model for the Drosophila gap gene network and include in the model a combined set of transcription factor binding sites and concentration dependent regulatory interaction between gap genes hunchback and Kruppel. We show that the new variants of the model are more consistent in terms of gene expression predictions for various genetic constructs in comparison to previous work. We quantify the functional importance of binding sites by calculating their impact on gene expression in the model and calculate how these impacts correlate across all sites under different modeling assumptions. The assumption about the dual interaction between hb and Kr leads to the most consistent modeling results, but, on the other hand, may obscure the existence of indirect interactions between binding sites in regulatory regions of distinct genes. The analysis confirms the previously formulated regulation concept of many weak binding sites working in concert. The model predicts a more or less uniform distribution of functionally important binding sites over the sets of experimentally characterized regulatory modules and other open chromatin domains.

  1. Reframed Genome-Scale Metabolic Model to Facilitate Genetic Design and Integration with Expression Data.

    PubMed

    Gu, Deqing; Jian, Xingxing; Zhang, Cheng; Hua, Qiang

    2017-01-01

    Genome-scale metabolic network models (GEMs) have played important roles in the design of genetically engineered strains and helped biologists to decipher metabolism. However, due to the complex gene-reaction relationships that exist in model systems, most algorithms have limited capabilities with respect to directly predicting accurate genetic design for metabolic engineering. In particular, methods that predict reaction knockout strategies leading to overproduction are often impractical in terms of gene manipulations. Recently, we proposed a method named logical transformation of model (LTM) to simplify the gene-reaction associations by introducing intermediate pseudo reactions, which makes it possible to generate genetic design. Here, we propose an alternative method to relieve researchers from deciphering complex gene-reactions by adding pseudo gene controlling reactions. In comparison to LTM, this new method introduces fewer pseudo reactions and generates a much smaller model system named as gModel. We showed that gModel allows two seldom reported applications: identification of minimal genomes and design of minimal cell factories within a modified OptKnock framework. In addition, gModel could be used to integrate expression data directly and improve the performance of the E-Fmin method for predicting fluxes. In conclusion, the model transformation procedure will facilitate genetic research based on GEMs, extending their applications.
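
    The gene-reaction relationships mentioned above are commonly written as Boolean rules over genes. The following is a minimal sketch of evaluating such a rule under a set of gene knockouts. It is not the gModel or LTM transformation itself; the nested-tuple rule encoding, the function name reaction_active, and the gene names g1, g2, g3 are hypothetical and chosen only for illustration.

```python
def reaction_active(rule, knocked_out):
    """Evaluate a Boolean gene-reaction rule under a set of knocked-out genes.

    `rule` is either a gene name (str) or a tuple ("and" | "or", operand, ...).
    This encoding is an illustrative assumption, not a format from the paper.
    """
    if isinstance(rule, str):
        # A single gene supports the reaction only if it has not been knocked out.
        return rule not in knocked_out
    op, *operands = rule
    results = [reaction_active(operand, knocked_out) for operand in operands]
    if op == "and":   # protein complex: every subunit is required
        return all(results)
    if op == "or":    # isozymes: any one alternative is sufficient
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical rule: (g1 AND g2) OR g3
example_rule = ("or", ("and", "g1", "g2"), "g3")
print(reaction_active(example_rule, {"g1"}))        # g3 still carries the reaction
print(reaction_active(example_rule, {"g1", "g3"}))  # both routes are broken
```

    Nested rules like this are what make a reaction knockout hard to translate into concrete gene manipulations, which is the difficulty the abstract describes.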

  2. Analysis of gene network robustness based on saturated fixed point attractors

    PubMed Central

    2014-01-01

    The analysis of gene network robustness to noise and mutation is important for fundamental and practical reasons. Robustness refers to the stability of the equilibrium expression state of a gene network to variations of the initial expression state and network topology. Numerical simulation of these variations is commonly used for the assessment of robustness. Since there exist a great number of possible gene network topologies and initial states, even millions of simulations may still be too few to give reliable results. When the initial and equilibrium expression states are restricted to being saturated (i.e., their elements can only take values 1 or −1 corresponding to maximum activation and maximum repression of genes), an analytical gene network robustness assessment is possible. We present this analytical treatment based on determination of the saturated fixed point attractors for sigmoidal function models. The analysis can determine (a) for a given network, which and how many saturated equilibrium states exist and which and how many saturated initial states converge to each of these saturated equilibrium states and (b) for a given saturated equilibrium state or a given pair of saturated equilibrium and initial states, which and how many gene networks, referred to as viable, share this saturated equilibrium state or the pair of saturated equilibrium and initial states. We also show that the viable networks sharing a given saturated equilibrium state must follow certain patterns. These capabilities of the analytical treatment make it possible to properly define and accurately determine robustness to noise and mutation for gene networks. Previous network research conclusions drawn from performing millions of simulations follow directly from the results of our analytical treatment. Furthermore, the analytical results provide criteria for the identification of model validity and suggest modified models of gene network dynamics. The yeast cell-cycle network is used as an illustration of the practical application of this analytical treatment. PMID:24650364
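
    To make the notion of a saturated fixed point concrete, here is a minimal brute-force sketch. It assumes the update rule s <- sign(W s), i.e. the steep limit of a sigmoidal model, which is an assumption and not necessarily the exact formulation of the paper. The function name and the 3-gene interaction matrix are hypothetical.

```python
from itertools import product

import numpy as np


def saturated_fixed_points(W):
    """Return every state s in {-1, +1}^n that satisfies sign(W @ s) == s.

    W[i, j] is the regulatory effect of gene j on gene i (assumed convention).
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    fixed = []
    for bits in product((-1, 1), repeat=n):
        s = np.array(bits, dtype=float)
        # s is fixed when each gene's net input has the same sign as its state.
        if np.all(s * (W @ s) > 0):
            fixed.append(bits)
    return fixed


# Hypothetical network: gene 0 self-activates, activates gene 1,
# and gene 1 represses gene 2.
W_example = [[1, 0, 0],
             [1, 0, 0],
             [0, -1, 0]]
print(saturated_fixed_points(W_example))
```

    Enumeration costs 2^n evaluations, so it only works for small networks; the abstract's point is that an analytical treatment avoids this kind of exhaustive search.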

  3. Analyzing the relationship between sequence divergence and nodal support using Bayesian phylogenetic analyses.

    PubMed

    Makowsky, Robert; Cox, Christian L; Roelke, Corey; Chippindale, Paul T

    2010-11-01

    Determining the appropriate gene for phylogeny reconstruction can be a difficult process. Rapidly evolving genes tend to resolve recent relationships, but suffer from alignment issues and increased homoplasy among distantly related species. Conversely, slowly evolving genes generally perform best for deeper relationships, but lack sufficient variation to resolve recent relationships. We determine the relationship between sequence divergence and Bayesian phylogenetic reconstruction ability using both natural and simulated datasets. The natural data are based on 28 well-supported relationships within the subphylum Vertebrata. Sequences of 12 genes were acquired and Bayesian analyses were used to determine phylogenetic support for correct relationships. Simulated datasets were designed to determine whether an optimal range of sequence divergence exists across extreme phylogenetic conditions. Across all genes we found that an optimal range of divergence for resolving the correct relationships does exist, although this level of divergence expectedly depends on the distance metric. Simulated datasets show that an optimal range of sequence divergence exists across diverse topologies and models of evolution. We determine that a simple to measure property of genetic sequences (genetic distance) is related to phylogenic reconstruction ability in Bayesian analyses. This information should be useful for selecting the most informative gene to resolve any relationships, especially those that are difficult to resolve, as well as minimizing both cost and confounding information during project design. Copyright © 2010. Published by Elsevier Inc.
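
    As an illustration of a simple-to-measure genetic distance, the sketch below computes the uncorrected p-distance (the proportion of differing sites) between two aligned sequences. The abstract notes that the optimal divergence depends on the distance metric and does not say which metrics were used, so the choice of p-distance, the gap handling, and the function name are assumptions.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


print(p_distance("ACGTACGT", "ACGTTCGA"))  # 2 of 8 sites differ
```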

  4. Modeling genome-wide dynamic regulatory network in mouse lungs with influenza infection using high-dimensional ordinary differential equations.

    PubMed

    Wu, Shuang; Liu, Zhi-Ping; Qiu, Xing; Wu, Hulin

    2014-01-01

    The immune response to viral infection is regulated by an intricate network of many genes and their products. The reverse engineering of gene regulatory networks (GRNs) using mathematical models from time course gene expression data collected after influenza infection is key to our understanding of the mechanisms involved in controlling influenza infection within a host. A five-step pipeline: detection of temporally differentially expressed genes, clustering genes into co-expressed modules, identification of network structure, parameter estimate refinement, and functional enrichment analysis, is developed for reconstructing high-dimensional dynamic GRNs from genome-wide time course gene expression data. Applying the pipeline to the time course gene expression data from influenza-infected mouse lungs, we have identified 20 distinct temporal expression patterns in the differentially expressed genes and constructed a module-based dynamic network using a linear ODE model. Both intra-module and inter-module annotations and regulatory relationships of our inferred network show some interesting findings and are highly consistent with existing knowledge about the immune response in mice after influenza infection. The proposed method is a computationally efficient, data-driven pipeline bridging experimental data, mathematical modeling, and statistical analysis. The application to the influenza infection data elucidates the potentials of our pipeline in providing valuable insights into systematic modeling of complicated biological processes.
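
    A linear ODE model of module dynamics has the form dx/dt = A x, where x holds the mean expression of each module and A holds the regulatory coefficients. The sketch below integrates such a system with a fixed-step fourth-order Runge-Kutta scheme. The two-module matrix and the function name are hypothetical, and the estimation of A from data (the core of the pipeline) is not shown.

```python
import numpy as np


def simulate_linear_ode(A, x0, t_end, n_steps):
    """Integrate dx/dt = A @ x from t = 0 to t_end with classical RK4.

    Returns an array of shape (n_steps + 1, n_modules).
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    dt = t_end / n_steps
    trajectory = [x.copy()]
    for _ in range(n_steps):
        k1 = A @ x
        k2 = A @ (x + 0.5 * dt * k1)
        k3 = A @ (x + 0.5 * dt * k2)
        k4 = A @ (x + dt * k3)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        trajectory.append(x.copy())
    return np.array(trajectory)


# Hypothetical example: module 0 decays; module 1 is activated by module 0 and decays.
A_example = [[-0.5, 0.0],
             [1.0, -1.0]]
traj = simulate_linear_ode(A_example, x0=[1.0, 0.0], t_end=2.0, n_steps=200)
print(traj[-1])
```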

  5. Stochastic models for inferring genetic regulation from microarray gene expression data.

    PubMed

    Tian, Tianhai

    2010-03-01

    Microarray expression profiles are inherently noisy and many different sources of variation exist in microarray experiments. It is still a significant challenge to develop stochastic models to realize noise in microarray expression profiles, which has profound influence on the reverse engineering of genetic regulation. Using the target genes of the tumour suppressor gene p53 as the test problem, we developed stochastic differential equation models and established the relationship between the noise strength of stochastic models and parameters of an error model for describing the distribution of the microarray measurements. Numerical results indicate that the simulated variance from stochastic models with a stochastic degradation process can be represented by a monomial in terms of the hybridization intensity and the order of the monomial depends on the type of stochastic process. The developed stochastic models with multiple stochastic processes generated simulations whose variance is consistent with the prediction of the error model. This work also established a general method to develop stochastic models from experimental information. 2009 Elsevier Ireland Ltd. All rights reserved.
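
    A stochastic differential equation with a stochastic degradation process can be simulated with the Euler-Maruyama scheme. The sketch below assumes the form dx = (s - d x) dt + sigma x dW, in which the noise enters through the degradation term. This specific equation, the parameter names, and the values used are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def simulate_sde(synthesis, degradation, sigma, x0, t_end, n_steps, seed=0):
    """Euler-Maruyama simulation of dx = (s - d*x) dt + sigma*x dW (assumed form)."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))        # Brownian increment over one step
        drift = synthesis - degradation * x[i]   # deterministic part
        diffusion = sigma * x[i]                 # noise scales with expression level
        x[i + 1] = x[i] + drift * dt + diffusion * dW
    return x


# With sigma = 0 the path relaxes to the deterministic steady state s / d.
deterministic = simulate_sde(2.0, 1.0, 0.0, x0=0.0, t_end=20.0, n_steps=2000)
noisy = simulate_sde(2.0, 1.0, 0.2, x0=0.0, t_end=20.0, n_steps=2000, seed=1)
print(deterministic[-1], noisy[-1])
```

    Repeating the noisy simulation over many seeds and computing the variance at each expression level is one way to compare simulated variance with an error model of the measurements.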

  6. Disease modeling in genetic kidney diseases: zebrafish.

    PubMed

    Schenk, Heiko; Müller-Deile, Janina; Kinast, Mark; Schiffer, Mario

    2017-07-01

    Growing numbers of translational genomics studies are based on the highly efficient and versatile zebrafish (Danio rerio) vertebrate model. The increasing types of zebrafish models have improved our understanding of inherited kidney diseases, since they not only display pathophysiological changes but also give us the opportunity to develop and test novel treatment options in a high-throughput manner. New paradigms in inherited kidney diseases have been developed on the basis of the distinct genome conservation of approximately 70 % between zebrafish and humans in terms of existing gene orthologs. Several options are available to determine the functional role of a specific gene or gene sets. Permanent genome editing can be induced via complete gene knockout by using the CRISPR/Cas-system, among others, or via transient modification by using various morpholino techniques. Cross-species rescues succeeding knockdown techniques are employed to determine the functional significance of a target gene or a specific mutation. This article summarizes the current techniques and discusses their perspectives.

  7. Stacking transgenes in forest trees.

    PubMed

    Halpin, Claire; Boerjan, Wout

    2003-08-01

    Huge potential exists for improving plant raw materials and foodstuffs via metabolic engineering. To date, progress has mostly been limited to modulating the expression of single genes of well-studied pathways, such as the lignin biosynthetic pathway, in model species. However, a recent report illustrates a new level of sophistication in metabolic engineering by overexpressing one lignin enzyme while simultaneously suppressing the expression of another lignin gene in a tree, aspen. This novel approach to multi-gene manipulation has succeeded in concurrently improving several wood-quality traits.

  8. A Genome-Wide Analysis Reveals No Nuclear Dobzhansky-Muller Pairs of Determinants of Speciation between S. cerevisiae and S. paradoxus, but Suggests More Complex Incompatibilities

    PubMed Central

    Kao, Katy C.; Schwartz, Katja; Sherlock, Gavin

    2010-01-01

    The Dobzhansky-Muller (D-M) model of speciation by genic incompatibility is widely accepted as the primary cause of interspecific postzygotic isolation. Since the introduction of this model, there have been theoretical and experimental data supporting the existence of such incompatibilities. However, speciation genes have been largely elusive, with only a handful of candidate genes identified in a few organisms. The Saccharomyces sensu stricto yeasts, which have small genomes and can mate interspecifically to produce sterile hybrids, are thus an ideal model for studying postzygotic isolation. Among them, only a single D-M pair, comprising a mitochondrially targeted product of a nuclear gene and a mitochondrially encoded locus, has been found. Thus far, no D-M pair of nuclear genes has been identified between any sensu stricto yeasts. We report here the first detailed genome-wide analysis of rare meiotic products from an otherwise sterile hybrid and show that no classic D-M pairs of speciation genes exist between the nuclear genomes of the closely related yeasts S. cerevisiae and S. paradoxus. Instead, our analyses suggest that more complex interactions, likely involving multiple loci having weak effects, may be responsible for their post-zygotic separation. The lack of a nuclear encoded classic D-M pair between these two yeasts, yet the existence of multiple loci that may each exert a small effect through complex interactions suggests that initial speciation events might not always be mediated by D-M pairs. An alternative explanation may be that the accumulation of polymorphisms leads to gamete inviability due to the activities of anti-recombination mechanisms and/or incompatibilities between the species' transcriptional and metabolic networks, with no single pair at least initially being responsible for the incompatibility. After such a speciation event, it is possible that one or more D-M pairs might subsequently arise following isolation. PMID:20686707

  9. Towards the theory of pollinator-mediated gene flow.

    PubMed Central

    Cresswell, James E

    2003-01-01

    I present a new exposition of a model of gene flow by animal-mediated pollination between a source population and a sink population. The model's parameters describe two elements: (i) the expected portion of the source's paternity that extends to the sink population; and (ii) the dilution of this portion by within-sink pollinations. The model is termed the portion-dilution model (PDM). The PDM is a parametric restatement of the conventional view of animal-mediated pollination. In principle, it can be applied to plant species in general. I formulate a theoretical value of the portion parameter that maximizes gene flow and prescribe this as a benchmark against which to judge the performance of real systems. Existing foraging theory can be used in solving part of the PDM, but a theory for source-to-sink transitions by pollinators is currently elusive. PMID:12831465

  10. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach.

    PubMed

    Abduallah, Yasser; Turki, Turki; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel; Wang, Jason T L

    2017-01-01

    Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.
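
    The abstract does not name the information-theoretic measure used. A common choice for scoring a candidate regulatory link is the mutual information between two expression profiles, sketched below on a single machine with equal-width binning. The use of mutual information, the binning scheme, and the function name are assumptions; the MapReduce distribution over gene pairs is not shown.

```python
import numpy as np


def mutual_information(x, y, n_bins=3):
    """Estimate mutual information (in bits) between two expression profiles.

    Values are discretized into `n_bins` equal-width bins per gene.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (n_bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, n_bins)
    independent = p_x @ p_y                 # joint distribution under independence
    nonzero = p_xy > 0                      # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[nonzero] * np.log2(p_xy[nonzero] / independent[nonzero])))


# Identical profiles share maximal information; these two toy profiles share none.
print(mutual_information([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], n_bins=2))
```

    In a MapReduce setting, one plausible layout is for each map task to score a subset of gene pairs with a function like this and for the reduce step to collect the scores that pass a threshold.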

  11. Selection and evaluation of reference genes for expression studies with quantitative PCR in the model fungus Neurospora crassa under different environmental conditions in continuous culture.

    PubMed

    Cusick, Kathleen D; Fitzgerald, Lisa A; Pirlo, Russell K; Cockrell, Allison L; Petersen, Emily R; Biffinger, Justin C

    2014-01-01

    Neurospora crassa has served as a model organism for studying circadian pathways and more recently has gained attention in the biofuel industry due to its enhanced capacity for cellulase production. However, in order to optimize N. crassa for biotechnological applications, metabolic pathways during growth under different environmental conditions must be addressed. Reverse-transcription quantitative PCR (RT-qPCR) is a technique that provides a high-throughput platform from which to measure the expression of a large set of genes over time. The selection of a suitable reference gene is critical for gene expression studies using relative quantification, as this strategy is based on normalization of target gene expression to a reference gene whose expression is stable under the experimental conditions. This study evaluated twelve candidate reference genes for use with N. crassa when grown in continuous culture bioreactors under different light and temperature conditions. Based on combined stability values from NormFinder and BestKeeper software packages, the following are the most appropriate reference genes under conditions of: (1) light/dark cycling: btl, asl, and vma1; (2) all-dark growth: btl, tbp, vma1, and vma2; (3) temperature flux: btl, vma1, act, and asl; (4) all conditions combined: vma1, vma2, tbp, and btl. Since N. crassa exists as different cell types (uni- or multi-nucleated), expression changes in a subset of the candidate genes were further assessed using absolute quantification. A strong negative correlation was found to exist between ratio and threshold cycle (CT) values, demonstrating that CT changes serve as a reliable reflection of transcript, and not gene copy number, fluctuations. The results of this study identified genes that are appropriate for use as reference genes in RT-qPCR studies with N. crassa and demonstrated that even with the presence of different cell types, relative quantification is an acceptable method for measuring gene expression changes during growth in bioreactors.
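
    Relative quantification normalizes the target gene's threshold cycle to that of a stable reference gene. A minimal sketch follows, using the widely used 2^(-ddCT) calculation. The abstract does not state which relative quantification formula the study applied, so this method, the assumption of perfect (doubling) amplification efficiency, and the CT values are illustrative only.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCT) calculation.

    Assumes both amplicons double every cycle (100% efficiency).
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize treated sample
    delta_control = ct_target_control - ct_ref_control   # normalize control sample
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical CT values: the target crosses threshold 2 cycles earlier
# (relative to the reference gene) in the treated sample than in the control.
print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

    The result is only meaningful if the reference gene's CT is stable across conditions, which is why the study screens candidate reference genes for each growth regime.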

  12. Effect of various classes of pesticides on expression of stress genes in transgenic C. elegans model of Parkinson's disease.

    PubMed

    Jadiya, Pooja; Mir, Snober S; Nazir, Aamir

    2012-12-01

    Neurodegenerative diseases are known to be associated with genetic and environmental factors. The multifactorial Parkinson's disease (PD) is triggered and/or further worsened by exposure to certain pesticides. Existing literature suggests a link between pesticide exposure and increased incidence of PD. We carried out the present study to look into the stress gene expression pattern of transgenic Caenorhabditis elegans (C. elegans) model of PD after exposure to pesticides from different classes. Expression level of sod-1, sod-2, sod-3, hsp-70, hsp-60, and hsp-16.2 stress responsive genes was determined using qPCR. Our findings demonstrate that the expression of stress related genes does not follow a generalized pattern to different toxicants; rather each pesticide class has a specific expression signature.

  13. PCV: An Alignment Free Method for Finding Homologous Nucleotide Sequences and its Application in Phylogenetic Study.

    PubMed

    Kumar, Rajnish; Mishra, Bharat Kumar; Lahiri, Tapobrata; Kumar, Gautam; Kumar, Nilesh; Gupta, Rahul; Pal, Manoj Kumar

    2017-06-01

    Online retrieval of the homologous nucleotide sequences through existing alignment techniques is a common practice against the given database of sequences. The salient point of these techniques is their dependence on local alignment techniques and scoring matrices, the reliability of which is limited by computational complexity and accuracy. Toward this direction, this work offers a novel way for numerical representation of genes which can further help in dividing the data space into smaller partitions, helping formation of a search tree. In this context, this paper introduces a 36-dimensional Periodicity Count Value (PCV) which is representative of a particular nucleotide sequence and created through adaptation from the concept of the stochastic model of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320). The PCV construct uses information on physicochemical properties of nucleotides and their positional distribution pattern within a gene. It is observed that PCV representation of a gene reduces computational cost in the calculation of distances between a pair of genes while being consistent with the existing methods. The validity of the PCV-based method was further tested through its use in molecular phylogeny constructs in comparison with that using existing sequence alignment methods.

  14. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    PubMed

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
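
    One way to picture a hierarchical Gaussian process for replicated time series is through its covariance: every observation shares a gene-level function, and observations from the same replicate additionally share a replicate-level deviation. The sketch below builds that covariance matrix with squared-exponential kernels. The additive form, the kernel choice, and all hyperparameter values are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def rbf(t1, t2, variance, lengthscale):
    """Squared-exponential kernel between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def hierarchical_cov(times, replicate_ids,
                     gene_var=1.0, gene_ls=2.0, rep_var=0.2, rep_ls=1.0):
    """Covariance of observations under an assumed two-level hierarchical GP.

    cov = k_gene(t, t') + [same replicate] * k_rep(t, t')
    Time points may differ between replicates (irregular sampling).
    """
    t = np.asarray(times, dtype=float)
    r = np.asarray(replicate_ids)
    same_replicate = (r[:, None] == r[None, :]).astype(float)
    return rbf(t, t, gene_var, gene_ls) + same_replicate * rbf(t, t, rep_var, rep_ls)


# Hypothetical data: replicate 0 sampled at t = 0 and 1, replicate 1 at t = 0.5.
K = hierarchical_cov([0.0, 1.0, 0.5], [0, 0, 1])
print(K.round(3))
```

    Because the matrix is indexed by observation rather than by a fixed time grid, replicates need not share sampling times, which matches the advantage the abstract highlights.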

  15. Investor Outlook: Solving Gene Therapy Pricing…with a Cures Voucher?

    PubMed

    Schimmer, Joshua; Breazzano, Steven

    2016-12-01

    Gene therapy reimbursement continues to be an intense topic of discussion in the field given the unique and durable benefits from a single administration and generally small patient populations against a reimbursement framework that is not optimized for such "cures" or long-lived benefits. As more gene therapy programs enter the market and late-stage development, it is increasingly important for the field to define a reimbursement model that works for all stakeholders in order to encourage the next wave of innovation. To add to the discussion around new payment models and potential solutions, we propose a flexible voucher system that takes advantage of existing infrastructure, precedent, and regulatory frameworks.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Yitan; Xu, Yanxun; Helseth, Donald L.

    Background: Genetic interactions play a critical role in cancer development. Existing knowledge about cancer genetic interactions is incomplete, especially lacking evidence derived from large-scale cancer genomics data. The Cancer Genome Atlas (TCGA) produces multimodal measurements across genomics and features of thousands of tumors, which provide an unprecedented opportunity to investigate the interplays of genes in cancer. Methods: We introduce Zodiac, a computational tool and resource to integrate existing knowledge about cancer genetic interactions with new information contained in TCGA data. It is an evolution of existing knowledge by treating it as a prior graph, integrating it with a likelihood model derived by Bayesian graphical model based on TCGA data, and producing a posterior graph as updated and data-enhanced knowledge. In short, Zodiac realizes “Prior interaction map + TCGA data → Posterior interaction map.” Results: Zodiac provides molecular interactions for about 200 million pairs of genes. All the results are generated from a big-data analysis and organized into a comprehensive database allowing customized search. In addition, Zodiac provides data processing and analysis tools that allow users to customize the prior networks and update the genetic pathways of their interest. Zodiac is publicly available at www.compgenome.org/ZODIAC. Conclusions: Zodiac recapitulates and extends existing knowledge of molecular interactions in cancer. It can be used to explore novel gene-gene interactions, transcriptional regulation, and other types of molecular interplays in cancer.

  17. Can Thrifty Gene(s) or Predictive Fetal Programming for Thriftiness Lead to Obesity?

    PubMed Central

    Baig, Ulfat; Belsare, Prajakta; Watve, Milind; Jog, Maithili

    2011-01-01

    Obesity and related disorders are thought to have their roots in metabolic “thriftiness” that evolved to combat periodic starvation. The association of low birth weight with obesity in later life caused a shift in the concept from thrifty gene to thrifty phenotype or anticipatory fetal programming. The assumption of thriftiness is implicit in obesity research. We examine here, with the help of a mathematical model, the conditions for evolution of thrifty genes or fetal programming for thriftiness. The model suggests that a thrifty gene cannot exist in a stable polymorphic state in a population. The conditions for evolution of thrifty fetal programming are restricted if the correlation between intrauterine and lifetime conditions is poor. Such a correlation is not observed in natural courses of famine. If there is fetal programming for thriftiness, it could have evolved in anticipation of social factors affecting nutrition that can result in a positive correlation. PMID:21773010

  18. Bayesian median regression for temporal gene expression data

    NASA Astrophysics Data System (ADS)

    Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.

    2007-09-01

    Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T²-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.

  19. Differential gene transcription across the life cycle in Daphnia magna using a new all genome custom-made microarray.

    PubMed

    Campos, Bruno; Fletcher, Danielle; Piña, Benjamín; Tauler, Romà; Barata, Carlos

    2018-05-18

    Unravelling the link between genes and environment across the life cycle is a challenging goal that requires model organisms with well-characterized life-cycles, ecological interactions in nature, tractability in the laboratory, and available genomic tools. Very few well-studied invertebrate model species meet these requirements, the waterflea Daphnia magna being one of them. Here we report a full genome transcription profiling of D. magna during its life-cycle. The study was performed using a new microarray platform designed from the complete set of gene models representing the whole transcribed genome of D. magna. Up to 93% of the existing 41,317 D. magna gene models showed differential transcription patterns across the developmental stages of D. magna, 59% of which were functionally annotated. Embryos showed the highest number of unique transcribed genes, mainly related to DNA, RNA, and ribosome biogenesis, likely related to cellular proliferation and morphogenesis of the several body organs. Adult females showed an enrichment of transcripts for genes involved in reproductive processes. These female-specific transcripts were essentially absent in males, whose transcriptome was enriched in genes specific to male sexual differentiation, like doublesex. Our results define major characteristics of transcriptional programs involved in the life-cycle, differentiate males and females, and show that large scale gene-transcription data collected in whole animals can be used to identify genes involved in specific biological and biochemical processes.

  20. The transcriptional landscape of age in human peripheral blood

    PubMed Central

    Peters, Marjolein J.; Joehanes, Roby; Pilling, Luke C.; Schurmann, Claudia; Conneely, Karen N.; Powell, Joseph; Reinmaa, Eva; Sutphin, George L.; Zhernakova, Alexandra; Schramm, Katharina; Wilson, Yana A.; Kobes, Sayuko; Tukiainen, Taru; Nalls, Michael A.; Hernandez, Dena G.; Cookson, Mark R.; Gibbs, Raphael J.; Hardy, John; Ramasamy, Adaikalavan; Zonderman, Alan B.; Dillman, Allissa; Traynor, Bryan; Smith, Colin; Longo, Dan L.; Trabzuni, Daniah; Troncoso, Juan; van der Brug, Marcel; Weale, Michael E.; O'Brien, Richard; Johnson, Robert; Walker, Robert; Zielke, Ronald H.; Arepalli, Sampath; Ryten, Mina; Singleton, Andrew B.; Ramos, Yolande F.; Göring, Harald H. H.; Fornage, Myriam; Liu, Yongmei; Gharib, Sina A.; Stranger, Barbara E.; De Jager, Philip L.; Aviv, Abraham; Levy, Daniel; Murabito, Joanne M.; Munson, Peter J.; Huan, Tianxiao; Hofman, Albert; Uitterlinden, André G.; Rivadeneira, Fernando; van Rooij, Jeroen; Stolk, Lisette; Broer, Linda; Verbiest, Michael M. P. J.; Jhamai, Mila; Arp, Pascal; Metspalu, Andres; Tserel, Liina; Milani, Lili; Samani, Nilesh J.; Peterson, Pärt; Kasela, Silva; Codd, Veryan; Peters, Annette; Ward-Caviness, Cavin K.; Herder, Christian; Waldenberger, Melanie; Roden, Michael; Singmann, Paula; Zeilinger, Sonja; Illig, Thomas; Homuth, Georg; Grabe, Hans-Jörgen; Völzke, Henry; Steil, Leif; Kocher, Thomas; Murray, Anna; Melzer, David; Yaghootkar, Hanieh; Bandinelli, Stefania; Moses, Eric K.; Kent, Jack W.; Curran, Joanne E.; Johnson, Matthew P.; Williams-Blangero, Sarah; Westra, Harm-Jan; McRae, Allan F.; Smith, Jennifer A.; Kardia, Sharon L. R.; Hovatta, Iiris; Perola, Markus; Ripatti, Samuli; Salomaa, Veikko; Henders, Anjali K.; Martin, Nicholas G.; Smith, Alicia K.; Mehta, Divya; Binder, Elisabeth B.; Nylocks, K Maria; Kennedy, Elizabeth M.; Klengel, Torsten; Ding, Jingzhong; Suchy-Dicey, Astrid M.; Enquobahrie, Daniel A.; Brody, Jennifer; Rotter, Jerome I.; Chen, Yii-Der I.; Houwing-Duistermaat, Jeanine; Kloppenburg, Margreet; Slagboom, P. Eline; Helmer, Quinta; den Hollander, Wouter; Bean, Shannon; Raj, Towfique; Bakhshi, Noman; Wang, Qiao Ping; Oyston, Lisa J.; Psaty, Bruce M.; Tracy, Russell P.; Montgomery, Grant W.; Turner, Stephen T.; Blangero, John; Meulenbelt, Ingrid; Ressler, Kerry J.; Yang, Jian; Franke, Lude; Kettunen, Johannes; Visscher, Peter M.; Neely, G. Gregory; Korstanje, Ron; Hanson, Robert L.; Prokisch, Holger; Ferrucci, Luigi; Esko, Tonu; Teumer, Alexander; van Meurs, Joyce B. J.; Johnson, Andrew D.

    2015-01-01

    Disease incidences increase with age, but the molecular characteristics of ageing that lead to increased disease susceptibility remain inadequately understood. Here we perform a whole-blood gene expression meta-analysis in 14,983 individuals of European ancestry (including replication) and identify 1,497 genes that are differentially expressed with chronological age. The age-associated genes do not harbor more age-associated CpG-methylation sites than other genes, but are instead enriched for the presence of potentially functional CpG-methylation sites in enhancer and insulator regions that associate with both chronological age and gene expression levels. We further used the gene expression profiles to calculate the ‘transcriptomic age’ of an individual, and show that differences between transcriptomic age and chronological age are associated with biological features linked to ageing, such as blood pressure, cholesterol levels, fasting glucose, and body mass index. The transcriptomic prediction model adds biological relevance and complements existing epigenetic prediction models, and can be used by others to calculate transcriptomic age in external cohorts. PMID:26490707

  1. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers.

    PubMed

    Campbell, Kieran R; Yau, Christopher

    2017-03-15

    Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.

  2. Unique core genomes of the bacterial family vibrionaceae: insights into niche adaptation and speciation.

    PubMed

    Kahlke, Tim; Goesmann, Alexander; Hjerde, Erik; Willassen, Nils Peder; Haugen, Peik

    2012-05-10

    The criteria for defining bacterial species and even the concept of bacterial species itself are under debate, and the discussion is apparently intensifying as more genome sequence data become available. However, it is still unclear how the new advances in genomics should be used most efficiently to address this question. In this study we identify genes that are common to any group of genomes in our dataset, to determine whether genes specific to a particular taxon exist and to investigate their potential role in adaptation of bacteria to their specific niche. These genes were named unique core genes. Additionally, we investigate the existence and importance of unique core genes that are found in isolates of phylogenetically non-coherent groups. These groups of isolates, which share a genetic feature without sharing a closest common ancestor, are termed genophyletic groups. The bacterial family Vibrionaceae was used as the model, and we compiled and compared genome sequences of 64 different isolates. Using the software orthoMCL we determined clusters of homologous genes among the investigated genome sequences. We used multilocus sequence analysis to build a host phylogeny and mapped the numbers of unique core genes of all distinct groups of isolates onto the tree. The results show that unique core genes are more likely to be found in monophyletic groups of isolates. Genophyletic groups of isolates, in contrast, are less common, especially for large groups of isolates. The subsequent annotation of unique core genes that are present in genophyletic groups indicates a high degree of horizontally transferred genes. Finally, the annotation of the unique core genes of Vibrio cholerae revealed genes involved in aerotaxis and biosynthesis of the iron-chelator vibriobactin. The presented work indicates that genes specific for any taxon inside the bacterial family Vibrionaceae exist. These unique core genes encode conserved metabolic functions that can shed light on the adaptation of a species to its ecological niche. Additionally, our study suggests that unique core genes can be used to aid classification of bacteria and contribute to a bacterial species definition on a genomic level. Furthermore, these genes may be of importance in clinical diagnostics and drug development.
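The definition of a unique core gene given in this abstract (present in every genome of a group and in no genome outside it) can be illustrated with plain set operations. The following is a minimal sketch, assuming the homolog clusters are already available as a mapping from cluster name to the set of isolates carrying it; the cluster and isolate names are hypothetical and are not taken from the study or from orthoMCL output.

```python
from typing import Dict, Set


def unique_core_genes(presence: Dict[str, Set[str]], group: Set[str]) -> Set[str]:
    """Return clusters found in every genome of `group` and in no other genome."""
    result = set()
    for cluster, genomes in presence.items():
        in_all_members = group <= genomes       # core: every group member has it
        absent_elsewhere = not (genomes - group)  # unique: nobody outside the group has it
        if in_all_members and absent_elsewhere:
            result.add(cluster)
    return result


# Hypothetical presence/absence data: cluster -> isolates that carry it.
presence = {
    "cluster_A": {"iso1", "iso2", "iso3", "iso4"},  # shared by all isolates
    "cluster_B": {"iso1", "iso2"},                  # unique core of {iso1, iso2}
    "cluster_C": {"iso1"},                          # only in part of the group
    "cluster_D": {"iso1", "iso2", "iso3"},          # also present outside the group
}

print(unique_core_genes(presence, {"iso1", "iso2"}))
```

Whether a group with a non-empty result is monophyletic or genophyletic would still have to be decided against a separate phylogeny, which this sketch does not model.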

  3. Promoter architecture dictates cell-to-cell variability in gene expression.

    PubMed

    Jones, Daniel L; Brewster, Robert C; Phillips, Rob

    2014-12-19

    Variability in gene expression among genetically identical cells has emerged as a central preoccupation in the study of gene regulation; however, a divide exists between the predictions of molecular models of prokaryotic transcriptional regulation and genome-wide experimental studies suggesting that this variability is indifferent to the underlying regulatory architecture. We constructed a set of promoters in Escherichia coli in which promoter strength, transcription factor binding strength, and transcription factor copy numbers are systematically varied, and used messenger RNA (mRNA) fluorescence in situ hybridization to observe how these changes affected variability in gene expression. Our parameter-free models predicted the observed variability; hence, the molecular details of transcription dictate variability in mRNA expression, and transcriptional noise is specifically tunable and thus represents an evolutionarily accessible phenotypic parameter. Copyright © 2014, American Association for the Advancement of Science.

  4. Conditional entropy in variation-adjusted windows detects selection signatures associated with expression quantitative trait loci (eQTLs)

    PubMed Central

    2015-01-01

    Background Over the past 50,000 years, shifts in human-environmental or human-human interactions shaped genetic differences within and among human populations, including variants under positive selection. Shaped by environmental factors, such variants influence the genetics of modern health, disease, and treatment outcome. Because evolutionary processes tend to act on gene regulation, we test whether regulatory variants are under positive selection. We introduce a new approach to enhance detection of genetic markers undergoing positive selection, using conditional entropy to capture recent local selection signals. Results We use conditional logistic regression to compare our Adjusted Haplotype Conditional Entropy (H|H) measure of positive selection to existing positive selection measures. H|H and existing measures were applied to published regulatory variants acting in cis (cis-eQTLs), with conditional logistic regression testing whether regulatory variants undergo stronger positive selection than the surrounding gene. These cis-eQTLs were drawn from six independent studies of genotype and RNA expression. The conditional logistic regression shows that, overall, H|H is substantially more powerful than existing positive-selection methods in identifying cis-eQTLs against other single nucleotide polymorphisms (SNPs) in the same genes. When broken down by Gene Ontology, H|H predictions are particularly strong in some biological process categories, where regulatory variants are under strong positive selection compared to the bulk of the gene, distinct from those GO categories under overall positive selection. However, cis-eQTLs in a second group of genes lack positive selection signatures detectable by H|H, consistent with ancient short haplotypes compared to the surrounding gene (for example, in innate immunity GO:0042742); under such other modes of selection, H|H would not be expected to be a strong predictor. These conditional logistic regression models are adjusted for minor allele frequency (MAF); otherwise, ascertainment bias is a huge factor in all eQTL data sets. Relationships between Gene Ontology categories, positive selection and eQTL specificity were replicated with H|H in a single larger data set. Our measure, Adjusted Haplotype Conditional Entropy (H|H), was essential in generating all of the results above because it: 1) is a stronger overall predictor for eQTLs than comparable existing approaches, and 2) shows low sequential auto-correlation, overcoming problems with convergence of these conditional regression statistical models. Conclusions Our new method, H|H, provides a consistently more robust signal associated with cis-eQTLs compared to existing methods. We interpret this to indicate that some cis-eQTLs are under positive selection compared to their surrounding genes. Conditional entropy indicative of a selective sweep is an especially strong predictor of eQTLs for genes in several biological processes of medical interest. Where conditional entropy is a weak or negative predictor of eQTLs, such as innate immune genes, this would be consistent with balancing selection acting on such eQTLs over long time periods. Different measures of selection may be needed for variant prioritization under other modes of evolutionary selection. PMID:26111110

  5. Identifying metabolic enzymes with multiple types of association evidence

    PubMed Central

    Kharchenko, Peter; Chen, Lifeng; Freund, Yoav; Vitkup, Dennis; Church, George M

    2006-01-01

    Background Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. Results We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of the cases. Conclusion We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities. PMID:16571130

  6. The Joint Effects of Background Selection and Genetic Recombination on Local Gene Genealogies

    PubMed Central

    Zeng, Kai; Charlesworth, Brian

    2011-01-01

    Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data. PMID:21705759

  7. The joint effects of background selection and genetic recombination on local gene genealogies.

    PubMed

    Zeng, Kai; Charlesworth, Brian

    2011-09-01

    Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data.

  8. Genotet: An Interactive Web-based Visual Exploration Framework to Support Validation of Gene Regulatory Networks.

    PubMed

    Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T

    2014-12-01

    Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and among the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies as well as advances in computational biology have led to multiple large regulatory network models (directed networks), each with a large corpus of supporting data and gene annotation. There are multiple possible biological motivations for exploring large regulatory network models, including: validating TF-target gene relationships, figuring out co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacterium (a small genome with high data-completeness).

  9. Gene regulation and noise reduction by coupling of stochastic processes

    NASA Astrophysics Data System (ADS)

    Ramos, Alexandre F.; Hornos, José Eduardo M.; Reinitz, John

    2015-02-01

    Here we characterize the low-noise regime of a stochastic model for a negative self-regulating binary gene. The model has two stochastic variables, the protein number and the state of the gene. Each state of the gene behaves as a protein source governed by a Poisson process. The coupling between the two gene states depends on protein number. This fact has a very important implication: There exist protein production regimes characterized by sub-Poissonian noise because of negative covariance between the two stochastic variables of the model. Hence the protein numbers obey a probability distribution that has a peak that is sharper than those of the two coupled Poisson processes that are combined to produce it. Biochemically, the noise reduction in protein number occurs when the switching of the genetic state is more rapid than protein synthesis or degradation. We consider the chemical reaction rates necessary for Poisson and sub-Poisson processes in prokaryotes and eucaryotes. Our results suggest that the coupling of multiple stochastic processes in a negative covariance regime might be a widespread mechanism for noise reduction.
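The sub-Poissonian regime described in this abstract can be explored numerically with a stochastic simulation. The sketch below is one possible reading of the model, not the authors' implementation: it assumes the gene switches from its active to its repressed state at a rate proportional to the protein number, switches back at a constant rate, and that the repressed state produces no protein. All rate constants are illustrative assumptions chosen so that switching is much faster than synthesis and degradation. It uses a Gillespie-type simulation and reports the time-averaged mean and Fano factor (variance divided by mean), which equals 1 for a Poisson distribution.

```python
import random


def simulate_binary_gene(k_on=50.0, k_off=0.0, rho=1.0, f=200.0, h=10.0,
                         t_end=500.0, burn_in=20.0, seed=0):
    """Gillespie simulation of a self-repressing two-state gene.

    k_on, k_off : protein synthesis rates in the active / repressed state
    rho         : per-molecule protein degradation rate
    f           : repressed -> active switching rate
    h           : active -> repressed switching rate per protein molecule
    All values are illustrative assumptions, not parameters from the paper.
    Returns (mean, fano) of the protein number, time-averaged after burn_in.
    """
    rng = random.Random(seed)
    t, n, gene_on = 0.0, 0, True
    weight = sum_n = sum_n2 = 0.0
    while t < t_end:
        production = k_on if gene_on else k_off
        degradation = rho * n
        switching = h * n if gene_on else f
        total = production + degradation + switching
        dt = rng.expovariate(total)
        # The state (n, gene_on) holds during [t, t + dt); accumulate the part
        # of that interval that lies inside the measurement window.
        lo, hi = max(t, burn_in), min(t + dt, t_end)
        if hi > lo:
            weight += hi - lo
            sum_n += n * (hi - lo)
            sum_n2 += n * n * (hi - lo)
        t += dt
        # Choose which reaction fires, with probability proportional to its rate.
        u = rng.random() * total
        if u < production:
            n += 1
        elif u < production + degradation:
            n -= 1
        else:
            gene_on = not gene_on
    mean = sum_n / weight
    variance = sum_n2 / weight - mean * mean
    return mean, variance / mean


if __name__ == "__main__":
    mean, fano = simulate_binary_gene()
    print(f"mean protein number: {mean:.2f}, Fano factor: {fano:.2f}")
```

Lowering `f` and `h` together, so that switching becomes slow relative to `rho`, is a simple way to see how the Fano factor changes when the fast-switching condition no longer holds.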

  10. Gene regulation and noise reduction by coupling of stochastic processes

    PubMed Central

    Hornos, José Eduardo M.; Reinitz, John

    2015-01-01

    Here we characterize the low-noise regime of a stochastic model for a negative self-regulating binary gene. The model has two stochastic variables, the protein number and the state of the gene. Each state of the gene behaves as a protein source governed by a Poisson process. The coupling between the two gene states depends on protein number. This fact has a very important implication: there exist protein production regimes characterized by sub-Poissonian noise because of negative covariance between the two stochastic variables of the model. Hence the protein numbers obey a probability distribution that has a peak that is sharper than those of the two coupled Poisson processes that are combined to produce it. Biochemically, the noise reduction in protein number occurs when the switching of genetic state is more rapid than protein synthesis or degradation. We consider the chemical reaction rates necessary for Poisson and sub-Poisson processes in prokaryotes and eucaryotes. Our results suggest that the coupling of multiple stochastic processes in a negative covariance regime might be a widespread mechanism for noise reduction. PMID:25768447

  11. Gene regulation and noise reduction by coupling of stochastic processes.

    PubMed

    Ramos, Alexandre F; Hornos, José Eduardo M; Reinitz, John

    2015-02-01

    Here we characterize the low-noise regime of a stochastic model for a negative self-regulating binary gene. The model has two stochastic variables, the protein number and the state of the gene. Each state of the gene behaves as a protein source governed by a Poisson process. The coupling between the two gene states depends on protein number. This fact has a very important implication: There exist protein production regimes characterized by sub-Poissonian noise because of negative covariance between the two stochastic variables of the model. Hence the protein numbers obey a probability distribution that has a peak that is sharper than those of the two coupled Poisson processes that are combined to produce it. Biochemically, the noise reduction in protein number occurs when the switching of the genetic state is more rapid than protein synthesis or degradation. We consider the chemical reaction rates necessary for Poisson and sub-Poisson processes in prokaryotes and eucaryotes. Our results suggest that the coupling of multiple stochastic processes in a negative covariance regime might be a widespread mechanism for noise reduction.

  12. Barriers to Gene Flow in the Marine Environment: Insights from Two Common Intertidal Limpet Species of the Atlantic and Mediterranean

    PubMed Central

    Sá-Pinto, Alexandra; Branco, Madalena S.; Alexandrino, Paulo B.; Fontaine, Michaël C.; Baird, Stuart J. E.

    2012-01-01

    Knowledge of the scale of dispersal and the mechanisms governing gene flow in marine environments remains fragmentary despite being essential for understanding the evolution of marine biota and for designing management plans. We use the limpets Patella ulyssiponensis and Patella rustica as models for identifying factors affecting gene flow in marine organisms across the North-East Atlantic and the Mediterranean Sea. A set of allozyme loci and a fragment of the mitochondrial gene cytochrome C oxidase subunit I were screened for genetic variation through starch gel electrophoresis and DNA sequencing, respectively. An approach combining clustering algorithms with clinal analyses was used to test for the existence of barriers to gene flow and estimate their geographic location and abruptness. Sharp breaks in the genetic composition of individuals were observed in the transitions between the Atlantic and the Mediterranean and across southern Italian shores. An additional break within the Atlantic cluster separates samples from the Alboran Sea and Atlantic African shores from those of the Iberian Atlantic shores. The geographic congruence of the genetic breaks detected in these two limpet species strongly supports the existence of transpecific barriers to gene flow in the Mediterranean Sea and Northeastern Atlantic. This leads to testable hypotheses regarding factors restricting gene flow across the study area. PMID:23239977

  13. chromoWIZ: a web tool to query and visualize chromosome-anchored genes from cereal and model genomes.

    PubMed

    Nussbaumer, Thomas; Kugler, Karl G; Schweiger, Wolfgang; Bader, Kai C; Gundlach, Heidrun; Spannagl, Manuel; Poursarebani, Naser; Pfeifer, Matthias; Mayer, Klaus F X

    2014-12-10

    Over the last few years, reference genome sequences of several economically and scientifically important cereals and model plants have become available. Despite the agricultural significance of these crops, only a small number of tools exist that allow users to inspect and visualize the genomic position of genes of interest in an interactive manner. We present chromoWIZ, a web tool that allows visualizing the genomic positions of relevant genes and comparing these data between different plant genomes. Genes can be queried using gene identifiers, functional annotations, or sequence homology in four grass species (Triticum aestivum, Hordeum vulgare, Brachypodium distachyon, Oryza sativa). The distribution of the anchored genes is visualized along the chromosomes by using heat maps. Custom gene expression measurements, differential expression information, and gene-to-group mappings can be uploaded and can be used for further filtering. This tool is mainly designed for breeders and plant researchers who are interested in the location and the distribution of candidate genes as well as in the syntenic relationships between different grass species. chromoWIZ is freely available and accessible online at http://mips.helmholtz-muenchen.de/plant/chromoWIZ/index.jsp.

  14. On meme--gene coevolution.

    PubMed

    Bull, L; Holland, O; Blackmore, S

    2000-01-01

    In this article we examine the effects of the emergence of a new replicator, memes, on the evolution of a pre-existing replicator, genes. Using a version of the NKCS model we examine the effects of increasing the rate of meme evolution in relation to the rate of gene evolution, for various degrees of interdependence between the two replicators. That is, the effects of memes' (suggested) more rapid rate of evolution in comparison to that of genes are investigated using a tunable model of coevolution. It is found that, for almost any degree of interdependence between the two replicators, as the rate of meme evolution increases, a phase transition-like dynamic occurs under which memes have a significantly detrimental effect on the evolution of genes, quickly resulting in the cessation of effective gene evolution. Conversely, the memes experience a sharp increase in benefit from increasing their rate of evolution. We then examine the effects of enabling genes to reduce the percentage of gene-detrimental evolutionary steps taken by memes. Here a critical region emerges as the comparative rate of meme evolution increases, such that if genes cannot effectively select memes a high percentage of the time, they suffer from meme evolution as if they had almost no selective capability.

  15. Application of network methods for understanding evolutionary dynamics in discrete habitats.

    PubMed

    Greenbaum, Gili; Fefferman, Nina H

    2017-06-01

    In populations occupying discrete habitat patches, gene flow between habitat patches may form an intricate population structure. In such structures, the evolutionary dynamics resulting from interaction of gene-flow patterns with other evolutionary forces may be exceedingly complex. Several models describing gene flow between discrete habitat patches have been presented in the population-genetics literature; however, these models have usually addressed relatively simple settings of habitable patches and have stopped short of providing general methodologies for addressing nontrivial gene-flow patterns. In the last decades, network theory - a branch of discrete mathematics concerned with complex interactions between discrete elements - has been applied to address several problems in population genetics by modelling gene flow between habitat patches using networks. Here, we present the idea and concepts of modelling complex gene flows in discrete habitats using networks. Our goal is to raise awareness of existing network theory applications in molecular ecology studies, as well as to outline the current and potential contribution of network methods to the understanding of evolutionary dynamics in discrete habitats. We review the main branches of network theory that have been, or that we believe potentially could be, applied to population genetics and molecular ecology research. We address applications to theoretical modelling and to empirical population-genetic studies, and we highlight future directions for extending the integration of network science with molecular ecology. © 2017 John Wiley & Sons Ltd.

  16. Genomics Analogy Model for Educators (GAME): Fuzzy DNA Model to Enable the Learning of Gene Sequencing by Visually-Impaired and Blind Students

    ERIC Educational Resources Information Center

    Butler, Charles; Bello, Julia; York, Alan; Orvis, Kathryn; Pittendrigh, Barry R.

    2008-01-01

    Much of the general population is aware of terms such as biotechnology, genetic engineering, and genomics. However, there is a lack of understanding concerning these fields among many secondary school students. Few teaching models exist to explain concepts behind genomics and even fewer are available for teaching the visually impaired and blind.…

  17. Development and Validation of the PREMM5 Model for Comprehensive Risk Assessment of Lynch Syndrome.

    PubMed

    Kastrinos, Fay; Uno, Hajime; Ukaegbu, Chinedu; Alvero, Carmelita; McFarland, Ashley; Yurgelun, Matthew B; Kulke, Matthew H; Schrag, Deborah; Meyerhardt, Jeffrey A; Fuchs, Charles S; Mayer, Robert J; Ng, Kimmie; Steyerberg, Ewout W; Syngal, Sapna

    2017-07-01

    Purpose Current Lynch syndrome (LS) prediction models quantify the risk to an individual of carrying a pathogenic germline mutation in three mismatch repair (MMR) genes: MLH1, MSH2, and MSH6. We developed a new prediction model, PREMM5, that incorporates the genes PMS2 and EPCAM to provide comprehensive LS risk assessment. Patients and Methods PREMM5 was developed to predict the likelihood of a mutation in any of the LS genes by using polytomous logistic regression analysis of clinical and germline data from 18,734 individuals who were tested for all five genes. Predictors of mutation status included sex, age at genetic testing, and proband and family cancer histories. Discrimination was evaluated by the area under the receiver operating characteristic curve (AUC), and clinical impact was determined by decision curve analysis; comparisons were made to the existing PREMM1,2,6 model. External validation of PREMM5 was performed in a clinic-based cohort of 1,058 patients with colorectal cancer. Results Pathogenic mutations were detected in 1,000 (5%) of 18,734 patients in the development cohort; mutations included MLH1 (n = 306), MSH2 (n = 354), MSH6 (n = 177), PMS2 (n = 141), and EPCAM (n = 22). PREMM5 distinguished carriers from noncarriers with an AUC of 0.81 (95% CI, 0.79 to 0.82), and performance was similar in the validation cohort (AUC, 0.83; 95% CI, 0.75 to 0.92). Prediction was more difficult for PMS2 mutations (AUC, 0.64; 95% CI, 0.60 to 0.68) than for other genes. Performance characteristics of PREMM5 exceeded those of PREMM1,2,6. Decision curve analysis supported germline LS testing for PREMM5 scores ≥ 2.5%. Conclusion PREMM5 provides comprehensive risk estimation of all five LS genes and supports LS genetic testing for individuals with scores ≥ 2.5%. At this threshold, PREMM5 provides performance that is superior to the existing PREMM1,2,6 model in the identification of carriers of LS, including those with weaker phenotypes and individuals unaffected by cancer.
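The discrimination measure reported in this abstract, the area under the receiver operating characteristic curve (AUC), has a simple pairwise interpretation: it is the probability that a randomly chosen carrier receives a higher predicted score than a randomly chosen noncarrier, with ties counted as one half. The sketch below computes it that way. The scores are invented for illustration and have no connection to PREMM5 output or to the study cohorts.

```python
from typing import Sequence


def auc(carrier_scores: Sequence[float], noncarrier_scores: Sequence[float]) -> float:
    """AUC as the fraction of carrier/noncarrier pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in carrier_scores:
        for n in noncarrier_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(carrier_scores) * len(noncarrier_scores))


# Hypothetical predicted mutation probabilities (not real model output).
carriers = [0.9, 0.8, 0.4]
noncarriers = [0.7, 0.3, 0.2, 0.1]

# 11 of the 12 carrier/noncarrier pairs are ordered correctly.
print(auc(carriers, noncarriers))
```

The double loop is quadratic in cohort size; for cohorts as large as the one described here, a rank-based computation would be the more practical choice.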

  18. Development and Validation of the PREMM5 Model for Comprehensive Risk Assessment of Lynch Syndrome

    PubMed Central

    Uno, Hajime; Ukaegbu, Chinedu; Alvero, Carmelita; McFarland, Ashley; Yurgelun, Matthew B.; Kulke, Matthew H.; Schrag, Deborah; Meyerhardt, Jeffrey A.; Fuchs, Charles S.; Mayer, Robert J.; Ng, Kimmie; Steyerberg, Ewout W.; Syngal, Sapna

    2017-01-01

    Purpose Current Lynch syndrome (LS) prediction models quantify the risk to an individual of carrying a pathogenic germline mutation in three mismatch repair (MMR) genes: MLH1, MSH2, and MSH6. We developed a new prediction model, PREMM5, that incorporates the genes PMS2 and EPCAM to provide comprehensive LS risk assessment. Patients and Methods PREMM5 was developed to predict the likelihood of a mutation in any of the LS genes by using polytomous logistic regression analysis of clinical and germline data from 18,734 individuals who were tested for all five genes. Predictors of mutation status included sex, age at genetic testing, and proband and family cancer histories. Discrimination was evaluated by the area under the receiver operating characteristic curve (AUC), and clinical impact was determined by decision curve analysis; comparisons were made to the existing PREMM1,2,6 model. External validation of PREMM5 was performed in a clinic-based cohort of 1,058 patients with colorectal cancer. Results Pathogenic mutations were detected in 1,000 (5%) of 18,734 patients in the development cohort; mutations included MLH1 (n = 306), MSH2 (n = 354), MSH6 (n = 177), PMS2 (n = 141), and EPCAM (n = 22). PREMM5 distinguished carriers from noncarriers with an AUC of 0.81 (95% CI, 0.79 to 0.82), and performance was similar in the validation cohort (AUC, 0.83; 95% CI, 0.75 to 0.92). Prediction was more difficult for PMS2 mutations (AUC, 0.64; 95% CI, 0.60 to 0.68) than for other genes. Performance characteristics of PREMM5 exceeded those of PREMM1,2,6. Decision curve analysis supported germline LS testing for PREMM5 scores ≥ 2.5%. Conclusion PREMM5 provides comprehensive risk estimation of all five LS genes and supports LS genetic testing for individuals with scores ≥ 2.5%. 
    At this threshold, PREMM5 provides performance that is superior to the existing PREMM1,2,6 model in the identification of carriers of LS, including those with weaker phenotypes and individuals unaffected by cancer. PMID:28489507
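    The discrimination measure reported above (AUC) and the score threshold for referral can be illustrated with a minimal sketch. The function names and all scores below are hypothetical and are not taken from the study; the sketch only shows how an AUC is computed as the probability that a randomly chosen carrier receives a higher predicted score than a randomly chosen noncarrier, and how a 2.5% cutoff would be applied.

```python
def auc(carrier_scores, noncarrier_scores):
    """Probability that a random carrier outscores a random noncarrier (ties count half)."""
    wins = 0.0
    for c in carrier_scores:
        for n in noncarrier_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(carrier_scores) * len(noncarrier_scores))


def refer_for_testing(scores, threshold=0.025):
    """Indices of individuals whose predicted probability meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical predicted probabilities, not data from the study.
carriers = [0.30, 0.08, 0.02]
noncarriers = [0.05, 0.01, 0.01, 0.004]

print(auc(carriers, noncarriers))
print(refer_for_testing(carriers + noncarriers))
```

    The pairwise form is quadratic in cohort size; for a cohort of the size described above a rank-based computation would be the more practical choice.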

  19. Theory of microbial genome evolution

    NASA Astrophysics Data System (ADS)

    Koonin, Eugene

    Bacteria and archaea have small genomes tightly packed with protein-coding genes. This compactness is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. By fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. Thus, the number of genes in prokaryotic genomes seems to reflect the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias. New genes acquired by microbial genomes, on average, appear to be adaptive. Evolution of bacterial and archaeal genomes involves extensive horizontal gene transfer and gene loss. Many microbes have open pangenomes, where each newly sequenced genome contains more than 10% 'ORFans', genes without detectable homologues in other species. A simple, steady-state evolutionary model reveals two sharply distinct classes of microbial genes, one of which (ORFans) is characterized by effectively instantaneous gene replacement, whereas the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of at least a billion distinct genes in the prokaryotic genomic universe.

  20. Incorporating networks in a probabilistic graphical model to find drivers for complex human diseases.

    PubMed

    Mezlini, Aziz M; Goldenberg, Anna

    2017-10-01

    Discovering genetic mechanisms driving complex diseases is a hard problem. Existing methods often lack power to identify the set of responsible genes. Protein-protein interaction networks have been shown to boost power when detecting gene-disease associations. We introduce a Bayesian framework, Conflux, to find disease-associated genes from exome sequencing data using networks as a prior. There are two main advantages to using networks within a probabilistic graphical model. First, networks are noisy and incomplete, a substantial impediment to gene discovery. Incorporating networks into the structure of a probabilistic model for gene inference has less impact on the solution than relying on the noisy network structure directly. Second, using a Bayesian framework we can keep track of the uncertainty of each gene being associated with the phenotype rather than returning a fixed list of genes. We first show that using networks clearly improves gene detection compared to individual gene testing. We then show consistently improved performance of Conflux compared to the state-of-the-art diffusion network-based method Hotnet2 and a variety of other network and variant aggregation methods, using randomly generated and literature-reported gene sets. We test Hotnet2 and Conflux on several network configurations to reveal biases and patterns of false positives and false negatives in each case. Our experiments show that our novel Bayesian framework Conflux incorporates many of the advantages of the current state-of-the-art methods, while offering more flexibility and improved power in many gene-disease association scenarios.

  1. Dynamical behaviour of a discrete selection-migration model with arbitrary dominance

    Treesearch

    James F. Selgrade; Jordan West Bostic; James H. Roberds

    2009-01-01

    To study the effects of immigration of genes (possibly transgenic) into a natural population, a one-island selection-migration model with density-dependent regulation is used to track allele frequency and population size. The existence and uniqueness of a polymorphic genetic equilibrium is proved under a general assumption about dominance in fitnesses. Also, conditions...

  2. No control genes required: Bayesian analysis of qRT-PCR data.

    PubMed

    Matz, Mikhail V; Wright, Rachel M; Scott, James G

    2013-01-01

    Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the "classic" analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.

  3. Multiclass classification of microarray data samples with a reduced number of genes

    PubMed Central

    2011-01-01

    Background Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522

  4. Ab initio gene identification in metagenomic sequences

    PubMed Central

    Zhu, Wenhan; Lomsadze, Alexandre; Borodovsky, Mark

    2010-01-01

    We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that as a result of application of the new method, several thousands of new genes could be added to existing annotations of several human and mouse gut metagenomes. PMID:20403810

  5. Latent variable models for gene-environment interactions in longitudinal studies with multiple correlated exposures.

    PubMed

    Tao, Yebin; Sánchez, Brisa N; Mukherjee, Bhramar

    2015-03-30

    Many existing cohort studies designed to investigate health effects of environmental exposures also collect data on genetic markers. The Early Life Exposures in Mexico to Environmental Toxicants project, for instance, has been genotyping single nucleotide polymorphisms on candidate genes involved in mental and nutrient metabolism and also in potentially shared metabolic pathways with the environmental exposures. Given the longitudinal nature of these cohort studies, rich exposure and outcome data are available to address novel questions regarding gene-environment interaction (G × E). Latent variable (LV) models have been effectively used for dimension reduction, helping with multiple testing and multicollinearity issues in the presence of correlated multivariate exposures and outcomes. In this paper, we first propose a modeling strategy, based on LV models, to examine the association between repeated outcome measures (e.g., child weight) and a set of correlated exposure biomarkers (e.g., prenatal lead exposure). We then construct novel tests for G × E effects within the LV framework to examine effect modification of outcome-exposure association by genetic factors (e.g., the hemochromatosis gene). We consider two scenarios: one allowing dependence of the LV models on genes and the other assuming independence between the LV models and genes. We combine the two sets of estimates by shrinkage estimation to trade off bias and efficiency in a data-adaptive way. Using simulations, we evaluate the properties of the shrinkage estimates, and in particular, we demonstrate the need for this data-adaptive shrinkage given repeated outcome measures, exposure measures possibly repeated and time-varying gene-environment association. Copyright © 2014 John Wiley & Sons, Ltd.

  6. Kinetics of conjugative gene transfer on surfaces in granular porous media

    NASA Astrophysics Data System (ADS)

    Massoudieh, A.; Crain, C.; Lambertini, E.; Nelson, K. E.; Barkouki, T.; L'Amoreaux, P.; Loge, F. J.; Ginn, T. R.

    2010-03-01

    The transfer of genetic material among bacteria in the environment can occur both in the planktonic and attached state. Given the propensity of organisms to exist in sessile microbial communities in oligotrophic subsurface conditions, and that such conditions typify the subsurface, this study focuses on exploratory modeling of horizontal gene transfer among surface-associated Escherichia coli in the subsurface. The mathematics so far used to describe the kinetics of conjugation in biofilms are developed largely from experimental observations of planktonic gene transfer, and are absent of lags or plasmid stability that appear experimentally. We develop a model and experimental system to quantify bacterial filtration and gene transfer in the attached state, on granular porous media. We include attachment kinetics described in Nelson et al. (2007) using the filtration theory approach of Nelson and Ginn (2001, 2005) with motility of E. coli described according to Biondi et al. (1998).

  7. Kinetics of conjugative gene transfer on surfaces in granular porous media

    NASA Astrophysics Data System (ADS)

    Ginn, T.; Massoudieh, A.; Nelson, K.; Mathew, A.; Lambertini, E.

    2005-12-01

    The transfer of genetic material among bacteria in the environment can occur both in the planktonic and attached state. Given the propensity of organisms to exist in sessile microbial communities in oligotrophic conditions, and that such conditions typify the subsurface, this study focuses on exploratory modeling of horizontal gene transfer among surface-associated E. coli in the subsurface. The mathematics so far used to describe the kinetics of conjugation in biofilms are developed largely from experimental observations of planktonic gene transfer, and are absent of lags or plasmid stability that appear experimentally. We develop a model for bacterial filtration and gene transfer in the attached state, for the early stages of biofilm formation using a recently revised filtration theory approach (Nelson and Ginn, 2005) with motility of E. coli described as a continuous time random walk according to data from microflow chamber experiments (Biondi et al., 2002).

  8. Current CRISPR gene drive systems are likely to be highly invasive in wild populations.

    PubMed

    Noble, Charleston; Adlam, Ben; Church, George M; Esvelt, Kevin M; Nowak, Martin A

    2018-06-19

    Recent reports have suggested that self-propagating CRISPR-based gene drive systems are unlikely to efficiently invade wild populations due to drive-resistant alleles that prevent cutting. Here we develop mathematical models based on existing empirical data to explicitly test this assumption for population alteration drives. Our models show that although resistance prevents spread to fixation in large populations, even the least effective drive systems reported to date are likely to be highly invasive. Releasing a small number of organisms will often cause invasion of the local population, followed by invasion of additional populations connected by very low rates of gene flow. Hence, initiating contained field trials as tentatively endorsed by the National Academies report on gene drive could potentially result in unintended spread to additional populations. Our mathematical results suggest that self-propagating gene drive is best suited to applications such as malaria prevention that seek to affect all wild populations of the target species. © 2018, Noble et al.

  9. Monitoring transcription initiation activities in rat and dog.

    PubMed

    Lizio, Marina; Mukarram, Abdul Kadir; Ohno, Mizuho; Watanabe, Shoko; Itoh, Masayoshi; Hasegawa, Akira; Lassmann, Timo; Severin, Jessica; Harshbarger, Jayson; Abugessaisa, Imad; Kasukawa, Takeya; Hon, Chung Chau; Carninci, Piero; Hayashizaki, Yoshihide; Forrest, Alistair R R; Kawaji, Hideya

    2017-11-28

    The promoter landscape of several non-human model organisms is far from complete. As a part of FANTOM5 data collection, we generated 13 profiles of transcription initiation activities in dog and rat aortic smooth muscle cells, mesenchymal stem cells and hepatocytes by employing CAGE (Cap Analysis of Gene Expression) technology combined with single molecule sequencing. Our analyses show that the CAGE profiles recapitulate known transcription start sites (TSSs) consistently, in addition to uncovering novel TSSs. Our dataset can thus be used with high confidence to support gene annotation in dog and rat species. We identified 28,497 and 23,147 CAGE peaks, or promoter regions, for rat and dog respectively, and associated them with known genes. This approach could be seen as a standard method for improvement of existing gene models, as well as discovery of novel genes. Given that the FANTOM5 data collection includes dog and rat matched cell types in human and mouse as well, this data would also be useful for cross-species studies.

  10. The transcription factor titration effect dictates level of gene expression.

    PubMed

    Brewster, Robert C; Weinert, Franz M; Garcia, Hernan G; Song, Dan; Rydenfelt, Mattias; Phillips, Rob

    2014-03-13

    Models of transcription are often built around a picture of RNA polymerase and transcription factors (TFs) acting on a single copy of a promoter. However, most TFs are shared between multiple genes with varying binding affinities. Beyond that, genes often exist at high copy number, in multiple identical copies on the chromosome or on plasmids or viral vectors with copy numbers in the hundreds. Using a thermodynamic model, we characterize the interplay between TF copy number and the demand for that TF. We demonstrate the parameter-free predictive power of this model as a function of the copy number of the TF and the number and affinities of the available specific binding sites; such predictive control is important for the understanding of transcription and the desire to quantitatively design the output of genetic circuits. Finally, we use these experiments to dynamically measure plasmid copy number through the cell cycle. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Determining Semantically Related Significant Genes.

    PubMed

    Taha, Kamal

    2014-01-01

    GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene set enrichment. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses a microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.

  12. Rapid Generation of Human Genetic Loss-of-Function iPSC Lines by Simultaneous Reprogramming and Gene Editing.

    PubMed

    Tidball, Andrew M; Dang, Louis T; Glenn, Trevor W; Kilbane, Emma G; Klarr, Daniel J; Margolis, Joshua L; Uhler, Michael D; Parent, Jack M

    2017-09-12

    Specifically ablating genes in human induced pluripotent stem cells (iPSCs) allows for studies of gene function as well as disease mechanisms in disorders caused by loss-of-function (LOF) mutations. While techniques exist for engineering such lines, we have developed and rigorously validated a method of simultaneous iPSC reprogramming while generating CRISPR/Cas9-dependent insertions/deletions (indels). This approach allows for the efficient and rapid formation of genetic LOF human disease cell models with isogenic controls. The rate of mutagenized lines was strikingly consistent across experiments targeting four different human epileptic encephalopathy genes and a metabolic enzyme-encoding gene, and was more efficient and consistent than using CRISPR gene editing of established iPSC lines. The ability of our streamlined method to reproducibly generate heterozygous and homozygous LOF iPSC lines with passage-matched isogenic controls in a single step provides for the rapid development of LOF disease models with ideal control lines, even in the absence of patient tissue. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  13. Pluripotency gene network dynamics: System views from parametric analysis.

    PubMed

    Akberdin, Ilya R; Omelyanchuk, Nadezda A; Fadeev, Stanislav I; Leskova, Natalya E; Oschepkova, Evgeniya A; Kazantsev, Fedor V; Matushkin, Yury G; Afonnikov, Dmitry A; Kolchanov, Nikolay A

    2018-01-01

    Multiple experimental data have demonstrated that the core gene network orchestrating self-renewal and differentiation of mouse embryonic stem cells involves activity of the Oct4, Sox2 and Nanog genes by means of a number of positive feedback loops among them. However, recent studies indicated that the architecture of the core gene network should also incorporate negative Nanog autoregulation and might not include positive feedbacks from Nanog to Oct4 and Sox2. Thorough parametric analysis of the mathematical model based on this revisited core regulatory circuit showed that the model dynamics change substantially depending on the strength of Oct4 and Sox2 activation and the molecular complexity of Nanog autorepression. The analysis showed the existence of four dynamical domains with different numbers of stable and unstable steady states. We hypothesize that these domains can constitute the checkpoints in a developmental progression from naïve to primed pluripotency and vice versa. During this transition, parametric conditions exist that generate an oscillatory behavior of the system, explaining heterogeneity in expression of pluripotent and differentiation factors in serum ESC cultures. Finally, simulations showed that addition of positive feedbacks from Nanog to Oct4 and Sox2 leads mainly to an increase of the parametric space for the naïve ESC state, in which pluripotency factors are strongly expressed while differentiation ones are repressed.

  14. Coexistence of Y, W, and Z sex chromosomes in Xenopus tropicalis

    PubMed Central

    Roco, Álvaro S.; Olmstead, Allen W.; Degitz, Sigmund J.; Amano, Tosikazu; Zimmerman, Lyle B.; Bullejos, Mónica

    2015-01-01

    Homomorphic sex chromosomes and rapid turnover of sex-determining genes can complicate establishing the sex chromosome system operating in a given species. This difficulty exists in Xenopus tropicalis, an anuran quickly becoming a relevant model for genetic, genomic, biochemical, and ecotoxicological research. Despite the recent interest attracted by this species, little is known about its sex chromosome system. Direct evidence that females are the heterogametic sex, as in the related species Xenopus laevis, has yet to be presented. Furthermore, X. laevis’ sex-determining gene, DM-W, does not exist in X. tropicalis, and the sex chromosomes in the two species are not homologous. Here we identify X. tropicalis’ sex chromosome system by integrating data from (i) breeding sex-reversed individuals, (ii) gynogenesis, (iii) triploids, and (iv) crosses among several strains. Our results indicate that at least three different types of sex chromosomes exist: Y, W, and Z, observed in YZ, YW, and ZZ males and in ZW and WW females. Because some combinations of parental sex chromosomes produce unisex offspring and other distorted sex ratios, understanding the sex-determination systems in X. tropicalis is critical for developing this flexible animal model for genetics and ecotoxicology. PMID:26216983

  15. Dynamical Analysis of Density-dependent Selection in a Discrete one-island Migration Model

    Treesearch

    James H. Roberds; James F. Selgrade

    2000-01-01

    A system of non-linear difference equations is used to model the effects of density-dependent selection and migration in a population characterized by two alleles at a single gene locus. Results for the existence and stability of polymorphic equilibria are established. Properties for a genetically important class of equilibria associated with complete dominance in...

  16. Applications of statistical physics and information theory to the analysis of DNA sequences

    NASA Astrophysics Data System (ADS)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
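    The mutual information function mentioned above can be sketched as follows. This is an illustrative reading, not the thesis code: `mutual_information` estimates, in bits, the mutual information between nucleotides separated by k positions, and `toy_coding_sequence` is a hypothetical stand-in for coding DNA (the first codon position is always `A`, the other two are uniformly random), used only to show how codon-position bias produces a period-three signal.

```python
import math
import random
from collections import Counter


def mutual_information(seq, k):
    """Mutual information (bits) between nucleotides separated by k positions."""
    pairs = Counter(zip(seq, seq[k:]))
    total = sum(pairs.values())
    first = Counter()   # marginal counts of the leading nucleotide
    second = Counter()  # marginal counts of the trailing nucleotide
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n
    mi = 0.0
    for (a, b), n in pairs.items():
        p_ab = n / total
        # p_ab / (p_a * p_b), written in terms of raw counts
        mi += p_ab * math.log2(p_ab * total * total / (first[a] * second[b]))
    return mi


def toy_coding_sequence(n_codons, seed=0):
    """Hypothetical 'coding' sequence with a strong bias at codon position 1."""
    rng = random.Random(seed)
    return "".join(
        "A" + rng.choice("ACGT") + rng.choice("ACGT") for _ in range(n_codons)
    )


seq = toy_coding_sequence(1000)
for k in range(1, 7):
    print(k, round(mutual_information(seq, k), 3))
```

    In this toy sequence the values at k = 3 and k = 6 stand out above their neighbours, which is the kind of period-three oscillation the abstract describes; real genomic sequences show a much weaker bias than this deliberately exaggerated example.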

  17. Identification of Novel Tissue-Specific Genes by Analysis of Microarray Databases: A Human and Mouse Model

    PubMed Central

    Suh, Yeunsu; Davis, Michael E.; Lee, Kichoon

    2013-01-01

    Understanding the tissue-specific pattern of gene expression is critical in elucidating the molecular mechanisms of tissue development, gene function, and transcriptional regulations of biological processes. Although tissue-specific gene expression information is available in several databases, follow-up strategies to integrate and use these data are limited. The objective of the current study was to identify and evaluate novel tissue-specific genes in human and mouse tissues by performing comparative microarray database analysis and semi-quantitative PCR analysis. We developed a powerful approach to predict tissue-specific genes by analyzing existing microarray data from the NCBI's Gene Expression Omnibus (GEO) public repository. We investigated and confirmed tissue-specific gene expression in the human and mouse kidney, liver, lung, heart, muscle, and adipose tissue. Applying our novel comparative microarray approach, we confirmed 10 kidney, 11 liver, 11 lung, 11 heart, 8 muscle, and 8 adipose specific genes. The accuracy of this approach was further verified by employing semi-quantitative PCR and by searching for gene function information in existing publications. Three novel tissue-specific genes were discovered by this approach including AMDHD1 (amidohydrolase domain containing 1) in the liver, PRUNE2 (prune homolog 2) in the heart, and ACVR1C (activin A receptor, type IC) in adipose tissue. We further confirmed the tissue-specific expression of these 3 novel genes by real-time PCR. Among them, ACVR1C is adipose tissue-specific and adipocyte-specific in adipose tissue, and can be used as an adipocyte developmental marker. From GEO profiles, we predicted the processes in which AMDHD1 and PRUNE2 may participate. Our approach provides a novel way to identify new sets of tissue-specific genes and to predict functions in which they may be involved. PMID:23741331

  18. Upon Accounting for the Impact of Isoenzyme Loss, Gene Deletion Costs Anticorrelate with Their Evolutionary Rates.

    PubMed

    Jacobs, Christopher; Lambourne, Luke; Xia, Yu; Segrè, Daniel

    2017-01-01

    System-level metabolic network models enable the computation of growth and metabolic phenotypes from an organism's genome. In particular, flux balance approaches have been used to estimate the contribution of individual metabolic genes to organismal fitness, offering the opportunity to test whether such contributions carry information about the evolutionary pressure on the corresponding genes. Previous failure to identify the expected negative correlation between such computed gene-loss cost and sequence-derived evolutionary rates in Saccharomyces cerevisiae has been ascribed to a real biological gap between a gene's fitness contribution to an organism "here and now" and the same gene's historical importance as evidenced by its accumulated mutations over millions of years of evolution. Here we show that this negative correlation does exist, and can be exposed by revisiting a broadly employed assumption of flux balance models. In particular, we introduce a new metric that we call "function-loss cost", which estimates the cost of a gene loss event as the total potential functional impairment caused by that loss. This new metric displays significant negative correlation with evolutionary rate, across several thousand minimal environments. We demonstrate that the improvement gained using function-loss cost over gene-loss cost is explained by replacing the base assumption that isoenzymes provide unlimited capacity for backup with the assumption that isoenzymes are completely non-redundant. We further show that this change of the assumption regarding isoenzymes increases the recall of epistatic interactions predicted by the flux balance model at the cost of a reduction in the precision of the predictions. In addition to suggesting that the gene-to-reaction mapping in genome-scale flux balance models should be used with caution, our analysis provides new evidence that evolutionary gene importance captures much more than strict essentiality.

  19. A statistical approach to identify, monitor, and manage incomplete curated data sets.

    PubMed

    Howe, Douglas G

    2018-04-02

    Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete and how incomplete they are remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation and for avoiding flawed hypothesis generation, and can justify further exploration of published literature for additional relevant data. Computational methods to assess data set completeness are needed. One such method is presented here. In this work, a multivariate linear regression model was used to identify genes in the Zebrafish Information Network (ZFIN) Database having incomplete curated gene expression data sets. Starting with 36,655 gene records from ZFIN, data aggregation, cleansing, and filtering reduced the set to 9870 gene records suitable for training and testing the model to predict the number of expression experiments per gene. Feature engineering and selection identified the following predictive variables: the number of journal publications; the number of journal publications already attributed for gene expression annotation; the percent of journal publications already attributed for expression data; the gene symbol; and the number of transgenic constructs associated with each gene. Twenty-five percent of the gene records (2483 genes) were used to train the model. The remaining 7387 genes were used to test the model. One hundred and twenty-two and 165 of the 7387 tested genes were identified as missing expression annotations based on their residuals being outside the model lower or upper 95% confidence interval respectively. The model had precision of 0.97 and recall of 0.71 at the negative 95% confidence interval and precision of 0.76 and recall of 0.73 at the positive 95% confidence interval. 
This method can be used to identify data sets that are incompletely curated, as demonstrated using the gene expression data set from ZFIN. This information can help both database resources and data consumers gauge when it may be useful to look further for published data to augment the existing expertly curated information.
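
The residual-based flagging described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single predictor (publication count) instead of the five features listed, and it uses a cutoff of 1.96 residual standard deviations as a stand-in for the 95% interval. The names `fit_line` and `flag_outliers` and the toy data are hypothetical.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def flag_outliers(xs, ys, z=1.96):
    """Return (low, high): indices whose residual is below -z*sd or above +z*sd.

    Assumption: z standard deviations of the residuals approximates the
    95% interval mentioned in the abstract.
    """
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = stdev(residuals)
    low = [i for i, r in enumerate(residuals) if r < -z * sd]
    high = [i for i, r in enumerate(residuals) if r > z * sd]
    return low, high

# Toy data: 20 hypothetical genes, two expression experiments per publication,
# except one gene (index 9) that has ten publications but no curated experiments.
publications = list(range(1, 21))
experiments = [2 * p for p in publications]
experiments[9] = 0

low, high = flag_outliers(publications, experiments)
print(low, high)
```

A record in `low` has fewer curated experiments than the fit predicts, which is the pattern the abstract treats as a sign of missing annotations.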

  20. Inference of Gene Regulatory Networks Incorporating Multi-Source Biological Knowledge via a State Space Model with L1 Regularization

    PubMed Central

    Hasegawa, Takanori; Yamaguchi, Rui; Nagasaki, Masao; Miyano, Satoru; Imoto, Seiya

    2014-01-01

    Comprehensive understanding of gene regulatory networks (GRNs) is a major challenge in the field of systems biology. Currently, there are two main approaches in GRN analysis using time-course observation data, namely an ordinary differential equation (ODE)-based approach and a statistical model-based approach. The ODE-based approach can generate complex dynamics of GRNs according to biologically validated nonlinear models. However, it cannot be applied to ten or more genes to simultaneously estimate system dynamics and regulatory relationships due to the computational difficulties. The statistical model-based approach uses highly abstract models to simply describe biological systems and to infer relationships among several hundreds of genes from the data. However, the high abstraction generates false regulations that are not permitted biologically. Thus, when dealing with several tens of genes of which the relationships are partially known, a method that can infer regulatory relationships based on a model with low abstraction and that can emulate the dynamics of ODE-based models while incorporating prior knowledge is urgently required. To accomplish this, we propose a method for inference of GRNs using a state space representation of a vector auto-regressive (VAR) model with L1 regularization. This method can estimate the dynamic behavior of genes based on linear time-series modeling constructed from an ODE-based model and can infer the regulatory structure among several tens of genes maximizing prediction ability for the observational data. Furthermore, the method is capable of incorporating various types of existing biological knowledge, e.g., drug kinetics and literature-recorded pathways. The effectiveness of the proposed method is shown through a comparison of simulation studies with several previous methods. 
For an application example, we evaluated mRNA expression profiles over time upon corticosteroid stimulation in rats, thus incorporating corticosteroid kinetics/dynamics, literature-recorded pathways and transcription factor (TF) information. PMID:25162401
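
As a rough illustration of L1-regularized VAR estimation, the sketch below fits a first-order VAR model, x_t = A x_{t-1} + noise, by running a lasso coordinate descent for each target gene. This is a simplification chosen for illustration: it omits the state space representation, the ODE-derived structure, and the prior-knowledge terms described in the abstract. All function names are hypothetical.

```python
def soft_threshold(value, lam):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X w||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[t][j] ** 2 for t in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero regressor carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[t][j] * (y[t] - sum(X[t][k] * w[k] for k in range(p) if k != j))
                for t in range(n)
            ) / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

def fit_var1_l1(series, lam):
    """Estimate a sparse VAR(1) matrix A, one lasso problem per target gene.

    series: list of time points, each a list of expression values per gene.
    Row i of the result holds the estimated regulators of gene i.
    """
    X = series[:-1]                      # expression at time t-1
    n_genes = len(series[0])
    return [
        lasso_cd(X, [row[i] for row in series[1:]], lam)  # expression at time t
        for i in range(n_genes)
    ]
```

Larger values of `lam` drive more entries of the estimated matrix to exactly zero, which is how the L1 penalty yields a sparse regulatory structure.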

  1. Inferring Gene Regulatory Networks by Singular Value Decomposition and Gravitation Field Algorithm

    PubMed Central

    Zheng, Ming; Wu, Jia-nan; Huang, Yan-xin; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

    2012-01-01

    Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenging computational problem in systems biology. However, every existing algorithm for inferring GRNs from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of previous algorithms are not high enough. In this work, we propose a novel algorithm for inferring GRNs from gene expression data based on a differential equation model. The algorithm combines two methods. Before reconstructing GRNs, singular value decomposition is used to decompose the gene expression data, determine the algorithm's solution space, and obtain all candidate solutions of GRNs. Within this generated family of candidate solutions, a modified gravitation field algorithm is used to infer GRNs by optimizing the criteria of the differential equation model and searching for the best network structure. The proposed algorithm is validated on both a simulated scale-free network and a real benchmark gene regulatory network from a networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and their results were compared with those of the proposed algorithm. Genetic algorithm and simulated annealing were also used to evaluate the gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which significantly outperforms previous algorithms. PMID:23226565

  2. A mesh generation and machine learning framework for Drosophila gene expression pattern image analysis

    PubMed Central

    2013-01-01

    Background Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods enhanced by a quantitative analysis based on computational tools we present in this paper would provide promising ways for addressing key scientific questions. Results We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/. Conclusions Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods. PMID:24373308

  3. Leveraging multiple gene networks to prioritize GWAS candidate genes via network representation learning.

    PubMed

    Wu, Mengmeng; Zeng, Wanwen; Liu, Wenqiang; Lv, Hairong; Chen, Ting; Jiang, Rui

    2018-06-03

    Genome-wide association studies (GWAS) have successfully discovered a number of disease-associated genetic variants in the past decade, providing an unprecedented opportunity for deciphering the genetic basis of human inherited diseases. However, it is still a challenging task to extract biological knowledge from the GWAS data, due to such issues as missing heritability and weak interpretability. Indeed, the fact that the majority of discovered loci fall into noncoding regions without clear links to genes has prevented the characterization of their functions and calls for a sophisticated approach to bridge genetic and genomic studies. To address this problem, network-based prioritization of candidate genes, which performs integrated analysis of gene networks with GWAS data, has emerged as a promising direction and attracted much attention. However, most existing methods overlook the sparse and noisy properties of gene networks and thus may lead to suboptimal performance. Motivated by this understanding, we proposed a novel method called REGENT for integrating multiple gene networks with GWAS data to prioritize candidate genes for complex diseases. We leveraged a technique called network representation learning to embed a gene network into a compact and robust feature space, and then designed a hierarchical statistical model to integrate features of multiple gene networks with GWAS data for the effective inference of genes associated with a disease of interest. We applied our method to six complex diseases and demonstrated the superior performance of REGENT over existing approaches in recovering known disease-associated genes. We further conducted a pathway analysis and demonstrated the ability of REGENT to discover disease-associated pathways. We expect to see applications of our method to a broad spectrum of diseases for post-GWAS analysis. REGENT is freely available at https://github.com/wmmthu/REGENT. Copyright © 2018 Elsevier Inc. All rights reserved.

  4. Response of PAH-degrading genes to PAH bioavailability in the overlying water, suspended sediment, and deposited sediment of the Yangtze River.

    PubMed

    Xia, Xinghui; Xia, Na; Lai, Yunjia; Dong, Jianwei; Zhao, Pujun; Zhu, Baotong; Li, Zhihuang; Ye, Wan; Yuan, Yue; Huang, Junxiong

    2015-06-01

    The degrading genes of hydrophobic organic compounds (HOCs) serve as indicators of in situ HOC degradation potential, and the existing forms and bioavailability of HOCs might influence the distribution of HOC-degrading genes in natural waters. However, little research has been conducted to study the relationship between them. In the present study, nahAc and nidA genes, which act as biomarkers for naphthalene- and pyrene-degrading bacteria, were selected as model genotypes to investigate the response of polycyclic aromatic hydrocarbon (PAH)-degrading genes to PAH bioavailability in the overlying water, suspended sediment (SPS), and deposited sediment of the Yangtze River. The freely dissolved concentration, typically used to reflect HOC bioavailability, and total dissolved, as well as sorbed concentrations of PAHs were determined. Phylogenetic analysis showed that all the PAH-ring hydroxylating dioxygenase gene sequences of Gram-negative bacteria (PAH-RHD[GN]) were closely related to nahAc, nagAc, nidA, and uncultured PAH-RHD genes. The PAH-RHD[GN] gene diversity as well as nahAc and nidA gene copy numbers decreased in the following order: deposited sediment>SPS>overlying water. The nahAc and nidA gene abundance was not significantly correlated with environmental parameters but was significantly correlated with the bioavailable existing forms of naphthalene and pyrene in the three phases. The nahAc gene copy numbers in the overlying water and deposited sediment were positively correlated with freely dissolved naphthalene concentrations in the overlying and pore water phases, respectively, and so were nidA gene copy numbers. This study suggests that the distribution and abundance of HOC-degrading bacterial population depend on the HOC bioavailability in aquatic environments. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Function-driven discovery of disease genes in zebrafish using an integrated genomics big data resource.

    PubMed

    Shim, Hongseok; Kim, Ji Hyun; Kim, Chan Yeong; Hwang, Sohyun; Kim, Hyojin; Yang, Sunmo; Lee, Ji Eun; Lee, Insuk

    2016-11-16

    Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Systems oncology: towards patient-specific treatment regimes informed by multiscale mathematical modelling.

    PubMed

    Powathil, Gibin G; Swat, Maciej; Chaplain, Mark A J

    2015-02-01

    The multiscale complexity of cancer as a disease necessitates a corresponding multiscale modelling approach to produce truly predictive mathematical models capable of improving existing treatment protocols. To capture all the dynamics of solid tumour growth and its progression, mathematical modellers need to couple biological processes occurring at various spatial and temporal scales (from genes to tissues). Because the effectiveness of cancer therapy is considerably affected by intracellular and extracellular heterogeneities as well as by the dynamical changes in the tissue microenvironment, any modelling attempt to optimise existing protocols must consider these factors, ultimately leading to improved multimodal treatment regimes. By improving existing and building new mathematical models of cancer, modellers can play an important role in preventing the use of potentially sub-optimal treatment combinations. In this paper, we analyse a multiscale computational mathematical model for cancer growth and spread, incorporating the multiple effects of radiation therapy and chemotherapy in the patient survival probability, and implement the model using two different cell-based modelling techniques. We show that the insights provided by such multiscale modelling approaches can ultimately help in designing optimal patient-specific multi-modality treatment protocols that may increase patients' quality of life. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. Fast forward to new genes in mammalian reproduction.

    PubMed

    Furnes, Bjarte; Schimenti, John

    2007-01-01

    The study of reproductive genetics in mammals has lagged behind that of simpler and more tractable model organisms, such as D. melanogaster, C. elegans and various yeast models. Although much valuable information has been generated using these organisms, they do not model the genetic and biological complexity of mammalian reproduction. Thus, the majority of genes required for gametogenesis in mammals remain unidentified. To expand on the existing knowledge of mammalian reproductive genetics, we have carried out forward genetic screens in mice to identify infertility mutants and the underlying mutant genes. Two different approaches were used: mutagenesis of the germline in whole mice, and mutagenesis of embryonic stem cells. This was followed by two- or three-generation breeding schemes to identify pedigrees segregating infertility mutations, which were then phenotypically characterized, genetically mapped, and in some cases, positionally cloned. This whole-genome approach has generated a wide collection of mutants with defects ranging from problems with germ cell development to abnormal sperm morphology. These models have allowed us to study the genetics, as well as the physiology, of reproduction in mammals. This review focuses on describing some of the genes identified in these screens and the ongoing effort to characterize additional mutants.

  8. Fast forward to new genes in mammalian reproduction

    PubMed Central

    Furnes, Bjarte; Schimenti, John

    2007-01-01

    The study of reproductive genetics in mammals has lagged behind that of simpler and more tractable model organisms, such as D. melanogaster, C. elegans and various yeast models. Although much valuable information has been generated using these organisms, they do not model the genetic and biological complexity of mammalian reproduction. Thus, the majority of genes required for gametogenesis in mammals remain unidentified. To expand on the existing knowledge of mammalian reproductive genetics, we have carried out forward genetic screens in mice to identify infertility mutants and the underlying mutant genes. Two different approaches were used: mutagenesis of the germline in whole mice, and mutagenesis of embryonic stem cells. This was followed by two- or three-generation breeding schemes to identify pedigrees segregating infertility mutations, which were then phenotypically characterized, genetically mapped, and in some cases, positionally cloned. This whole-genome approach has generated a wide collection of mutants with defects ranging from problems with germ cell development to abnormal sperm morphology. These models have allowed us to study the genetics, as well as the physiology, of reproduction in mammals. This review focuses on describing some of the genes identified in these screens and the ongoing effort to characterize additional mutants. PMID:16973708

  9. Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering

    PubMed Central

    Sun, Peng; Speicher, Nora K.; Röttger, Richard; Guo, Jiong; Baumbach, Jan

    2014-01-01

    Abstract The explosion of the biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as ‘simultaneous clustering’ or ‘co-clustering’, has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic: ‘Bi-Force’. It is based on the weighted bicluster editing model, to perform biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol in a recent review paper from Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279–292.) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de. PMID:24682815

  10. A robust prognostic signature for hormone-positive node-negative breast cancer.

    PubMed

    Griffith, Obi L; Pepin, François; Enache, Oana M; Heiser, Laura M; Collisson, Eric A; Spellman, Paul T; Gray, Joe W

    2013-01-01

    Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.

  11. A robust prognostic signature for hormone-positive node-negative breast cancer

    PubMed Central

    2013-01-01

    Background Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). Methods We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Results Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
Conclusions RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment. PMID:24112773

  12. Reverse-engineering of gene networks for regulating early blood development from single-cell measurements.

    PubMed

    Wei, Jiangyong; Hu, Xiaohua; Zou, Xiufen; Tian, Tianhai

    2017-12-28

    Recent advances in omics technologies have raised great opportunities to study large-scale regulatory networks inside the cell. In addition, single-cell experiments have measured the gene and protein activities in a large number of cells under the same experimental conditions. However, a significant challenge in computational biology and bioinformatics is how to derive quantitative information from the single-cell observations and how to develop sophisticated mathematical models to describe the dynamic properties of regulatory networks using the derived quantitative information. This work designs an integrated approach to reverse-engineer gene networks for regulating early blood development based on single-cell experimental observations. The wanderlust algorithm is initially used to develop the pseudo-trajectory for the activities of a number of genes. Since the gene expression data in the developed pseudo-trajectory show large fluctuations, we then use Gaussian process regression methods to smooth the gene expression data in order to obtain pseudo-trajectories with much smaller fluctuations. The proposed integrated framework consists of both bioinformatics algorithms to reconstruct the regulatory network and mathematical models using differential equations to describe the dynamics of gene expression. The developed approach is applied to study the network regulating early blood cell development. A graphic model is constructed for a regulatory network with forty genes and a dynamic model using differential equations is developed for a network of nine genes. Numerical results suggest that the proposed model is able to match experimental data very well. We also examine the networks with more regulatory relations and numerical results show that more regulations may exist. We test the possibility of auto-regulation but numerical simulations do not support the positive auto-regulation. 
In addition, robustness is used as an important additional criterion to select candidate networks. The research results in this work show that the developed approach is an efficient and effective method to reverse-engineer gene networks using single-cell experimental observations.

  13. Modularity of Plant Metabolic Gene Clusters: A Trio of Linked Genes That Are Collectively Required for Acylation of Triterpenes in Oat

    PubMed Central

    Mugford, Sam T.; Louveau, Thomas; Melton, Rachel; Qi, Xiaoquan; Bakht, Saleha; Hill, Lionel; Tsurushima, Tetsu; Honkanen, Suvi; Rosser, Susan J.; Lomonossoff, George P.; Osbourn, Anne

    2013-01-01

    Operon-like gene clusters are an emerging phenomenon in the field of plant natural products. The genes encoding some of the best-characterized plant secondary metabolite biosynthetic pathways are scattered across plant genomes. However, an increasing number of gene clusters encoding the synthesis of diverse natural products have recently been reported in plant genomes. These clusters have arisen through the neo-functionalization and relocation of existing genes within the genome, and not by horizontal gene transfer from microbes. The reasons for clustering are not yet clear, although this form of gene organization is likely to facilitate co-inheritance and co-regulation. Oats (Avena spp) synthesize antimicrobial triterpenoids (avenacins) that provide protection against disease. The synthesis of these compounds is encoded by a gene cluster. Here we show that a module of three adjacent genes within the wider biosynthetic gene cluster is required for avenacin acylation. Through the characterization of these genes and their encoded proteins we present a model of the subcellular organization of triterpenoid biosynthesis. PMID:23532069

  14. Simulation Modeling to Compare High-Throughput, Low-Iteration Optimization Strategies for Metabolic Engineering

    PubMed Central

    Heinsch, Stephen C.; Das, Siba R.; Smanski, Michael J.

    2018-01-01

    Increasing the final titer of a multi-gene metabolic pathway can be viewed as a multivariate optimization problem. While numerous multivariate optimization algorithms exist, few are specifically designed to accommodate the constraints posed by genetic engineering workflows. We present a strategy for optimizing expression levels across an arbitrary number of genes that requires few design-build-test iterations. We compare the performance of several optimization algorithms on a series of simulated expression landscapes. We show that optimal experimental design parameters depend on the degree of landscape ruggedness. This work provides a theoretical framework for designing and executing numerical optimization on multi-gene systems. PMID:29535690

  15. Function and evolution of sex determination mechanisms, genes and pathways in insects

    PubMed Central

    Gempe, Tanja; Beye, Martin

    2011-01-01

    Animals have evolved a bewildering diversity of mechanisms to determine the two sexes. Studies of sex determination genes – their history and function – in non-model insects and Drosophila have allowed us to begin to understand the generation of sex determination diversity. One common theme from these studies is that evolved mechanisms produce activities in either males or females to control a shared gene switch that regulates sexual development. Only a few small-scale changes in existing and duplicated genes are sufficient to generate large differences in sex determination systems. This review summarises recent findings in insects, surveys evidence of how and why sex determination mechanisms can change rapidly and suggests fruitful areas of future research. PMID:21110346

  16. Bayesian estimation of differential transcript usage from RNA-seq data.

    PubMed

    Papastamoulis, Panagiotis; Rattray, Magnus

    2017-11-27

    Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend to the DTU context the use of cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance to DRIMSeq in terms of precision/recall but offer better calibration of the false discovery rate.
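
To make the idea of estimating a Bayes factor with Laplace's approximation concrete, here is a toy sketch. It is not BayesDRIMSeq: instead of a Dirichlet-multinomial DTU model it assumes a single binomial proportion with a uniform prior, tested against a fixed null value. The function names and the choice of model are illustrative assumptions only.

```python
import math

def log_binom_pmf(k, n, theta):
    """Log of the Binomial(n, theta) probability of observing k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(theta) + (n - k) * math.log(1.0 - theta))

def laplace_log_marginal(k, n):
    """Laplace approximation to log of the integral of Binomial(k | n, theta)
    over theta in (0, 1) under a uniform prior. Requires 0 < k < n."""
    theta_hat = k / n                                  # mode of the integrand
    curvature = n / (theta_hat * (1.0 - theta_hat))    # minus the second derivative of the log-likelihood at the mode
    # Gaussian integral around the mode: value at the mode times sqrt(2*pi / curvature).
    return log_binom_pmf(k, n, theta_hat) + 0.5 * math.log(2.0 * math.pi / curvature)

def log_bayes_factor(k, n, theta_null=0.5):
    """Log Bayes factor of the free-proportion model against theta = theta_null."""
    return laplace_log_marginal(k, n) - log_binom_pmf(k, n, theta_null)
```

For this toy model the exact marginal likelihood is 1/(n + 1), so the quality of the approximation can be checked directly; with 30 successes in 100 trials the approximation is close to 1/101.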

  17. Vector platforms for gene therapy of inherited retinopathies

    PubMed Central

    Trapani, Ivana; Puppo, Agostina; Auricchio, Alberto

    2014-01-01

    Inherited retinopathies (IR) are common untreatable blinding conditions. Most of them are inherited as monogenic disorders, due to mutations in genes expressed in retinal photoreceptors (PR) and in retinal pigment epithelium (RPE). The retina’s compatibility with gene transfer has made transduction of different retinal cell layers in small and large animal models via viral and non-viral vectors possible. The ongoing identification of novel viruses as well as modifications of existing ones based either on rational design or directed evolution have generated vector variants with improved transduction properties. Dozens of promising proofs of concept have been obtained in IR animal models with both viral and non-viral vectors, and some of them have been relayed to clinical trials. To date, recombinant vectors based on the adeno-associated virus (AAV) represent the most promising tool for retinal gene therapy, given their ability to efficiently deliver therapeutic genes to both PR and RPE and their excellent safety and efficacy profiles in humans. However, AAVs’ limited cargo capacity has prevented application of the viral vector to treatments requiring transfer of genes with a coding sequence larger than 5 kb. Vectors with larger capacity, i.e. nanoparticles, adenoviral and lentiviral vectors are being exploited for gene transfer to the retina in animal models and, more recently, in humans. This review focuses on the available platforms for retinal gene therapy to fight inherited blindness, highlights their main strengths and examines the efforts to overcome some of their limitations. PMID:25124745

  18. Functional genetics for all: engineered nucleases, CRISPR and the gene editing revolution.

    PubMed

    Gilles, Anna F; Averof, Michalis

    2014-01-01

    Developmental biology, as all experimental science, is empowered by technological advances. The availability of genetic tools in some species - designated as model organisms - has driven their use as major platforms for understanding development, physiology and behavior. Extending these tools to a wider range of species determines whether (and how) we can experimentally approach developmental diversity and evolution. During the last two decades, comparative developmental biology (evo-devo) was marked by the introduction of gene knockdown and deep sequencing technologies that are applicable to a wide range of species. These approaches allowed us to test the developmental role of specific genes in diverse species, to study biological processes that are not accessible in established models and, in some cases, to conduct genome-wide screens that overcome the limitations of the candidate gene approach. The recent discovery of CRISPR/Cas as a means of precise alterations into the genome promises to revolutionize developmental genetics. In this review we describe the development of gene editing tools, from zinc-finger nucleases to TALENs and CRISPR, and examine their application in gene targeting, their limitations and the opportunities they present for evo-devo. We outline their use in gene knock-out and knock-in approaches, and in manipulating gene functions by directing molecular effectors to specific sites in the genome. The ease-of-use and efficiency of CRISPR in diverse species provide an opportunity to close the technology gap that exists between established model organisms and emerging genetically-tractable species.

  19. No Control Genes Required: Bayesian Analysis of qRT-PCR Data

    PubMed Central

    Matz, Mikhail V.; Wright, Rachel M.; Scott, James G.

    2013-01-01

    Background: Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. Results: In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the “classic” analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Conclusions: Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R. PMID:23977043
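
    The abstract's first step, representing raw qPCR data as molecule counts, can be illustrated with a minimal sketch. The conversion below is one common form, count = E^(Cq1 - Cq), and is an assumption here rather than something stated in the abstract; the function name, the default efficiency E = 2.0 and the single-molecule threshold cycle Cq1 = 37 are hypothetical illustration values.

```python
def cq_to_count(cq, efficiency=2.0, cq1=37.0):
    """Approximate starting molecule count from a threshold cycle (Cq).

    Assumed conversion: count = efficiency ** (cq1 - cq), where `cq1` is
    the Cq expected for a single template molecule. Both defaults are
    illustrative assumptions, not values taken from the abstract.
    """
    return round(efficiency ** (cq1 - cq))


# A target that crosses threshold 10 cycles before the single-molecule
# Cq corresponds to roughly 2**10 starting molecules.
print(cq_to_count(27.0))
```

    Counts obtained this way are non-negative integers, which is what makes a Poisson-based error model applicable to low-abundance targets.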

  20. Isolation of Novel CreERT2-Driver Lines in Zebrafish Using an Unbiased Gene Trap Approach

    PubMed Central

    Jungke, Peggy; Hammer, Juliane; Hans, Stefan; Brand, Michael

    2015-01-01

    Gene manipulation using the Cre/loxP-recombinase system has been successfully employed in zebrafish to study gene functions and lineage relationships. Recently, gene trapping approaches have been applied to produce large collections of transgenic fish expressing conditional alleles in various tissues. However, the limited number of available cell- and tissue-specific Cre/CreERT2-driver lines still constrains widespread application in this model organism. To enlarge the pool of existing CreERT2-driver lines, we performed a genome-wide gene trap screen using a Tol2-based mCherry-T2a-CreERT2 (mCT2aC) gene trap vector. This cassette consists of a splice acceptor and a mCherry-tagged variant of CreERT2 which enables simultaneous labeling of the trapping event, as well as CreERT2 expression from the endogenous promoter. Using this strategy, we generated 27 novel functional CreERT2-driver lines expressing in a cell- and tissue-specific manner during development and adulthood. This study summarizes the analysis of the generated CreERT2-driver lines with respect to functionality, expression, integration, as well as associated phenotypes. Our results significantly enlarge the existing pool of CreERT2-driver lines in zebrafish and combined with Cre–dependent effector lines, the new CreERT2-driver lines will be important tools to manipulate the zebrafish genome. PMID:26083735

  1. Identification of Differentially Expressed Thyroid Hormone Responsive Genes from the Brain of the Mexican Axolotl (Ambystoma mexicanum)

    PubMed Central

    Huggins, P; Johnson, CK; Schoergendorfer, A; Putta, S; Bathke, AC; Stromberg, AJ; Voss, SR

    2011-01-01

    The Mexican axolotl (Ambystoma mexicanum) presents an excellent model to investigate mechanisms of brain development that are conserved among vertebrates. In particular, metamorphic changes of the brain can be induced in free-living aquatic juveniles and adults by simply adding thyroid hormone (T4) to rearing water. Whole brains were sampled from juvenile A. mexicanum that were exposed to 0, 8, and 18 days of 50 nM T4, and these were used to isolate RNA and make normalized cDNA libraries for 454 DNA sequencing. A total of 1,875,732 high quality cDNA reads were assembled with existing ESTs to obtain 5,884 new contigs for human RefSeq protein models, and to develop a custom Affymetrix gene expression array (Amby_002) with approximately 20,000 probe sets. The Amby_002 array was used to identify 303 transcripts that differed statistically (p < 0.05, fold change > 1.5) as a function of days of T4 treatment. Further statistical analyses showed that Amby_002 performed concordantly in comparison to an existing, small format expression array. This study introduces a new A. mexicanum microarray resource for the community and the first lists of T4-responsive genes from the brain of a salamander amphibian. PMID:21457787
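
    The selection rule quoted above (p < 0.05, fold change > 1.5) can be sketched as a simple filter. The record layout and probe-set names are hypothetical, and treating up- and down-regulation symmetrically on the log scale is an assumption; the abstract does not say how decreases were handled.

```python
from math import log2


def responsive(records, p_max=0.05, fc_min=1.5):
    """Return names of probe sets passing both the p-value and fold-change cutoffs.

    `records` is an iterable of (name, p_value, fold_change) tuples, where
    fold_change is a treated/control ratio (hypothetical layout).
    """
    kept = []
    for name, p_value, fold in records:
        # Compare on the log2 scale so a 1.5x increase and a 1.5x decrease
        # are treated the same way (an assumption of this sketch).
        if p_value < p_max and abs(log2(fold)) > log2(fc_min):
            kept.append(name)
    return kept


example = [("ps_a", 0.01, 2.0), ("ps_b", 0.01, 1.2),
           ("ps_c", 0.20, 3.0), ("ps_d", 0.03, 0.5)]
print(responsive(example))
```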

  2. Identification of differentially expressed thyroid hormone responsive genes from the brain of the Mexican Axolotl (Ambystoma mexicanum).

    PubMed

    Huggins, P; Johnson, C K; Schoergendorfer, A; Putta, S; Bathke, A C; Stromberg, A J; Voss, S R

    2012-01-01

    The Mexican axolotl (Ambystoma mexicanum) presents an excellent model to investigate mechanisms of brain development that are conserved among vertebrates. In particular, metamorphic changes of the brain can be induced in free-living aquatic juveniles and adults by simply adding thyroid hormone (T4) to rearing water. Whole brains were sampled from juvenile A. mexicanum that were exposed to 0, 8, and 18 days of 50 nM T4, and these were used to isolate RNA and make normalized cDNA libraries for 454 DNA sequencing. A total of 1,875,732 high quality cDNA reads were assembled with existing ESTs to obtain 5884 new contigs for human RefSeq protein models, and to develop a custom Affymetrix gene expression array (Amby_002) with approximately 20,000 probe sets. The Amby_002 array was used to identify 303 transcripts that differed statistically (p<0.05, fold change >1.5) as a function of days of T4 treatment. Further statistical analyses showed that Amby_002 performed concordantly in comparison to an existing, small format expression array. This study introduces a new A. mexicanum microarray resource for the community and the first lists of T4-responsive genes from the brain of a salamander amphibian.

  3. GIANT API: an application programming interface for functional genomics

    PubMed Central

    Roberts, Andrew M.; Wong, Aaron K.; Fisk, Ian; Troyanskaya, Olga G.

    2016-01-01

    GIANT API provides biomedical researchers programmatic access to tissue-specific and global networks in humans and model organisms, and associated tools, which includes functional re-prioritization of existing genome-wide association study (GWAS) data. Using tissue-specific interaction networks, researchers are able to predict relationships between genes specific to a tissue or cell lineage, identify the changing roles of genes across tissues and uncover disease-gene associations. Additionally, GIANT API enables computational tools like NetWAS, which leverages tissue-specific networks for re-prioritization of GWAS results. The web services covered by the API include 144 tissue-specific functional gene networks in human, global functional networks for human and six common model organisms and the NetWAS method. GIANT API conforms to the REST architecture, which makes it stateless, cacheable and highly scalable. It can be used by a diverse range of clients including web browsers, command terminals, programming languages and standalone apps for data analysis and visualization. The API is freely available for use at http://giant-api.princeton.edu. PMID:27098035

  4. Analysis of difference of association between polymorphisms in the XRCC5, RPA3 and RTEL1 genes and glioma, astrocytoma and glioblastoma

    PubMed Central

    Jin, Tianbo; Wang, Yuan; Li, Gang; Du, Shuli; Yang, Hua; Geng, Tingting; Hou, Peng; Gong, Yongkuan

    2015-01-01

    Background: Gliomas are the most common aggressive brain tumors and have many complex pathological types. Previous reports have discovered that genetic mutations are associated with the risk of glioma. However, it is unclear whether the same genetic mutations differ in their association with glioma and its two pathological types in the Han Chinese population. Materials and methods: We evaluated 20 SNPs of 703 glioma cases (338 astrocytoma cases, 122 glioblastoma cases) and 635 controls in a Han Chinese population using χ2 test and genetic model analysis. Results: In three case-control studies, we found that rs9288516 in the XRCC5 gene showed a decreased risk of glioma (OR, 0.85; 95% CI, 0.73-0.99; P = 0.042) and glioblastoma (OR, 0.70; 95% CI, 0.52-0.92; P = 0.001) in the allele model. We identified that rs414805 in the RPA3 gene showed an increased risk of glioblastoma in the allele model (OR, 1.38; 95% CI, 1.00-1.89; P = 0.047) and the dominant model (OR, 1.57; 95% CI, 1.05-2.35; P = 0.027), respectively. Meanwhile, rs2297440 in the RTEL1 gene showed an increased risk of glioma (OR, 1.30; 95% CI, 1.10-1.54; P = 0.002) and astrocytoma (OR, 1.26; 95% CI, 1.02-1.54; P = 0.029) in the allele model. In addition, we also observed a haplotype of “GCT” in the RTEL1 gene with an increased risk of astrocytoma (P = 0.005). Conclusions: Polymorphisms in the XRCC5, RPA3 and RTEL1 genes, taken together with previous research, are associated with glioma development. However, mutations in those genes may play different roles in glioma, astrocytoma and glioblastoma, respectively. PMID:26328260
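
    The odds ratios and 95% confidence intervals reported above are the kind of quantity obtained from a 2x2 table of allele counts. A minimal sketch follows, using the standard log-odds (Woolf) interval; the function name and the counts in the usage line are hypothetical and are not the study's data, and the study itself may have used a different interval method.

```python
from math import exp, log, sqrt


def odds_ratio_ci(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Odds ratio and approximate 95% CI from allele counts in cases and controls.

    Uses the Woolf (log-odds) standard error; all four counts must be > 0.
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se_log_or = sqrt(1 / case_minor + 1 / case_major
                     + 1 / ctrl_minor + 1 / ctrl_major)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
print(odds_ratio_ci(300, 1100, 330, 940))
```

    An interval that excludes 1.0 corresponds to a nominally significant association at the 5% level under this approximation.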

  5. Analysis of difference of association between polymorphisms in the XRCC5, RPA3 and RTEL1 genes and glioma, astrocytoma and glioblastoma.

    PubMed

    Jin, Tianbo; Wang, Yuan; Li, Gang; Du, Shuli; Yang, Hua; Geng, Tingting; Hou, Peng; Gong, Yongkuan

    2015-01-01

    Gliomas are the most common aggressive brain tumors and have many complex pathological types. Previous reports have discovered that genetic mutations are associated with the risk of glioma. However, it is unclear whether the same genetic mutations differ in their association with glioma and its two pathological types in the Han Chinese population. We evaluated 20 SNPs of 703 glioma cases (338 astrocytoma cases, 122 glioblastoma cases) and 635 controls in a Han Chinese population using χ(2) test and genetic model analysis. In three case-control studies, we found that rs9288516 in the XRCC5 gene showed a decreased risk of glioma (OR, 0.85; 95% CI, 0.73-0.99; P = 0.042) and glioblastoma (OR, 0.70; 95% CI, 0.52-0.92; P = 0.001) in the allele model. We identified that rs414805 in the RPA3 gene showed an increased risk of glioblastoma in the allele model (OR, 1.38; 95% CI, 1.00-1.89; P = 0.047) and the dominant model (OR, 1.57; 95% CI, 1.05-2.35; P = 0.027), respectively. Meanwhile, rs2297440 in the RTEL1 gene showed an increased risk of glioma (OR, 1.30; 95% CI, 1.10-1.54; P = 0.002) and astrocytoma (OR, 1.26; 95% CI, 1.02-1.54; P = 0.029) in the allele model. In addition, we also observed a haplotype of "GCT" in the RTEL1 gene with an increased risk of astrocytoma (P = 0.005). Polymorphisms in the XRCC5, RPA3 and RTEL1 genes, taken together with previous research, are associated with glioma development. However, mutations in those genes may play different roles in glioma, astrocytoma and glioblastoma, respectively.

  6. Gene Expression Profiles of Human Dendritic Cells Interacting with Aspergillus fumigatus in a Bilayer Model of the Alveolar Epithelium/Endothelium Interface

    PubMed Central

    Morton, Charles Oliver; Fliesser, Mirjam; Dittrich, Marcus; Mueller, Tobias; Bauer, Ruth; Kneitz, Susanne; Hope, William; Rogers, Thomas Richard; Einsele, Hermann; Loeffler, Juergen

    2014-01-01

    The initial stages of the interaction between the host and Aspergillus fumigatus at the alveolar surface of the human lung are critical in the establishment of aspergillosis. Using an in vitro bilayer model of the alveolus, including both the epithelium (human lung adenocarcinoma epithelial cell line, A549) and endothelium (human pulmonary artery endothelial cells, HPAEC) on transwell membranes, it was possible to closely replicate the in vivo conditions. Two distinct sub-groups of dendritic cells (DC), monocyte-derived DC (moDC) and myeloid DC (mDC), were included in the model to examine immune responses to fungal infection at the alveolar surface. RNA in high quantity and quality was extracted from the cell layers on the transwell membrane to allow gene expression analysis using tailored custom-made microarrays, containing probes for 117 immune-relevant genes. These microarray data indicated minimal induction of immune gene expression in A549 alveolar epithelial cells in response to germ tubes of A. fumigatus. In contrast, the addition of DC to the system greatly increased the number of differentially expressed immune genes. moDC exhibited increased expression of genes including CLEC7A, CD209 and CCL18 in the absence of A. fumigatus compared to mDC. In the presence of A. fumigatus, both DC subgroups exhibited up-regulation of genes identified in previous studies as being associated with the exposure of DC to A. fumigatus and exhibiting chemotactic properties for neutrophils, including CXCL2, CXCL5, CCL20, and IL1B. This model closely approximated the human alveolus, allowing for an analysis of the host-pathogen interface that complements existing animal models of IA. PMID:24870357

  7. Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.

    PubMed

    Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E

    2018-04-25

    Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.

  8. Towards a whole-cell modeling approach for synthetic biology

    NASA Astrophysics Data System (ADS)

    Purcell, Oliver; Jain, Bonny; Karr, Jonathan R.; Covert, Markus W.; Lu, Timothy K.

    2013-06-01

    Despite rapid advances over the last decade, synthetic biology lacks the predictive tools needed to enable rational design. Unlike established engineering disciplines, the engineering of synthetic gene circuits still relies heavily on experimental trial-and-error, a time-consuming and inefficient process that slows down the biological design cycle. This reliance on experimental tuning is because current modeling approaches are unable to make reliable predictions about the in vivo behavior of synthetic circuits. A major reason for this lack of predictability is that current models view circuits in isolation, ignoring the vast number of complex cellular processes that impinge on the dynamics of the synthetic circuit and vice versa. To address this problem, we present a modeling approach for the design of synthetic circuits in the context of cellular networks. Using the recently published whole-cell model of Mycoplasma genitalium, we examined the effect of adding genes into the host genome. We also investigated how codon usage correlates with gene expression and find agreement with existing experimental results. Finally, we successfully implemented a synthetic Goodwin oscillator in the whole-cell model. We provide an updated software framework for the whole-cell model that lays the foundation for the integration of whole-cell models with synthetic gene circuit models. This software framework is made freely available to the community to enable future extensions. We envision that this approach will be critical to transforming the field of synthetic biology into a rational and predictive engineering discipline.

  9. Draft genome assembly of the Bengalese finch, Lonchura striata domestica, a model for motor skill variability and learning

    PubMed Central

    Mets, David G; Brainard, Michael S

    2018-01-01

    Background: Vocal learning in songbirds has emerged as a powerful model for sensorimotor learning. Neurobehavioral studies of Bengalese finch (Lonchura striata domestica) song, naturally more variable and plastic than songs of other finch species, have demonstrated the importance of behavioral variability for initial learning, maintenance, and plasticity of vocalizations. However, the molecular and genetic underpinnings of this variability and the learning it supports are poorly understood. Findings: To establish a platform for the molecular analysis of behavioral variability and plasticity, we generated an initial draft assembly of the Bengalese finch genome from a single male animal to 151× coverage and an N50 of 3.0 MB. Furthermore, we developed an initial set of gene models using RNA-seq data from 8 samples that comprise liver, muscle, cerebellum, brainstem/midbrain, and forebrain tissue from juvenile and adult Bengalese finches of both sexes. Conclusions: We provide a draft Bengalese finch genome and gene annotation to facilitate the study of the molecular-genetic influences on behavioral variability and the process of vocal learning. These data will directly support many avenues for the identification of genes involved in learning, including differential expression analysis, comparative genomic analysis (through comparison to existing avian genome assemblies), and derivation of genetic maps for linkage analysis. Bengalese finch gene models and sequences will be essential for subsequent manipulation (molecular or genetic) of genes and gene products, enabling novel mechanistic investigations into the role of variability in learned behavior. PMID:29618046
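
    The N50 statistic quoted for the assembly is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function name and the toy lengths are illustrative and are not taken from the Bengalese finch assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")


# Toy example: total length 20, so the 6 and 5 together pass the halfway mark.
print(n50([2, 3, 4, 5, 6]))
```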

  10. Draft genome assembly of the Bengalese finch, Lonchura striata domestica, a model for motor skill variability and learning.

    PubMed

    Colquitt, Bradley M; Mets, David G; Brainard, Michael S

    2018-03-01

    Vocal learning in songbirds has emerged as a powerful model for sensorimotor learning. Neurobehavioral studies of Bengalese finch (Lonchura striata domestica) song, naturally more variable and plastic than songs of other finch species, have demonstrated the importance of behavioral variability for initial learning, maintenance, and plasticity of vocalizations. However, the molecular and genetic underpinnings of this variability and the learning it supports are poorly understood. To establish a platform for the molecular analysis of behavioral variability and plasticity, we generated an initial draft assembly of the Bengalese finch genome from a single male animal to 151× coverage and an N50 of 3.0 MB. Furthermore, we developed an initial set of gene models using RNA-seq data from 8 samples that comprise liver, muscle, cerebellum, brainstem/midbrain, and forebrain tissue from juvenile and adult Bengalese finches of both sexes. We provide a draft Bengalese finch genome and gene annotation to facilitate the study of the molecular-genetic influences on behavioral variability and the process of vocal learning. These data will directly support many avenues for the identification of genes involved in learning, including differential expression analysis, comparative genomic analysis (through comparison to existing avian genome assemblies), and derivation of genetic maps for linkage analysis. Bengalese finch gene models and sequences will be essential for subsequent manipulation (molecular or genetic) of genes and gene products, enabling novel mechanistic investigations into the role of variability in learned behavior.

  11. Noise in gene expression is coupled to growth rate.

    PubMed

    Keren, Leeat; van Dijk, David; Weingarten-Gabbay, Shira; Davidi, Dan; Jona, Ghil; Weinberger, Adina; Milo, Ron; Segal, Eran

    2015-12-01

    Genetically identical cells exposed to the same environment display variability in gene expression (noise), with important consequences for the fidelity of cellular regulation and biological function. Although population average gene expression is tightly coupled to growth rate, the effects of changes in environmental conditions on expression variability are not known. Here, we measure the single-cell expression distributions of approximately 900 Saccharomyces cerevisiae promoters across four environmental conditions using flow cytometry, and find that gene expression noise is tightly coupled to the environment and is generally higher at lower growth rates. Nutrient-poor conditions, which support lower growth rates, display elevated levels of noise for most promoters, regardless of their specific expression values. We present a simple model of noise in expression that results from having an asynchronous population, with cells at different cell-cycle stages, and with different partitioning of the cells between the stages at different growth rates. This model predicts non-monotonic global changes in noise at different growth rates as well as overall higher variability in expression for cell-cycle-regulated genes in all conditions. The consistency between this model and our data, as well as with noise measurements of cells growing in a chemostat at well-defined growth rates, suggests that cell-cycle heterogeneity is a major contributor to gene expression noise. Finally, we identify gene and promoter features that play a role in gene expression noise across conditions. Our results show the existence of growth-related global changes in gene expression noise and suggest their potential phenotypic implications.

  12. Noise in gene expression is coupled to growth rate

    PubMed Central

    Keren, Leeat; van Dijk, David; Weingarten-Gabbay, Shira; Davidi, Dan; Jona, Ghil; Weinberger, Adina; Milo, Ron; Segal, Eran

    2015-01-01

    Genetically identical cells exposed to the same environment display variability in gene expression (noise), with important consequences for the fidelity of cellular regulation and biological function. Although population average gene expression is tightly coupled to growth rate, the effects of changes in environmental conditions on expression variability are not known. Here, we measure the single-cell expression distributions of approximately 900 Saccharomyces cerevisiae promoters across four environmental conditions using flow cytometry, and find that gene expression noise is tightly coupled to the environment and is generally higher at lower growth rates. Nutrient-poor conditions, which support lower growth rates, display elevated levels of noise for most promoters, regardless of their specific expression values. We present a simple model of noise in expression that results from having an asynchronous population, with cells at different cell-cycle stages, and with different partitioning of the cells between the stages at different growth rates. This model predicts non-monotonic global changes in noise at different growth rates as well as overall higher variability in expression for cell-cycle–regulated genes in all conditions. The consistency between this model and our data, as well as with noise measurements of cells growing in a chemostat at well-defined growth rates, suggests that cell-cycle heterogeneity is a major contributor to gene expression noise. Finally, we identify gene and promoter features that play a role in gene expression noise across conditions. Our results show the existence of growth-related global changes in gene expression noise and suggest their potential phenotypic implications. PMID:26355006
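
    The idea that an asynchronous population produces noise can be illustrated with a deliberately simplified two-stage toy calculation. This is not the authors' model: it assumes only two cell-cycle stages, no variability within a stage, and noise measured as the squared coefficient of variation (variance divided by mean squared). The function and parameter names are hypothetical.

```python
def mixture_noise(fraction_stage1, mean_stage1, mean_stage2):
    """Squared coefficient of variation of a two-stage population mixture.

    Each cell expresses exactly its stage mean, so all variability comes
    from how the population is partitioned between the two stages.
    """
    f = fraction_stage1
    mean = f * mean_stage1 + (1 - f) * mean_stage2
    variance = f * (1 - f) * (mean_stage1 - mean_stage2) ** 2
    return variance / mean ** 2


# Noise vanishes when all cells are in one stage and peaks in between,
# so it changes non-monotonically as the partitioning shifts.
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f, round(mixture_noise(f, 1.0, 2.0), 4))
```

    Because growth rate changes how cells are partitioned between stages, even this toy version shows noise shifting with growth conditions without any change to the promoter itself.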

  13. GeneImp: Fast Imputation to Large Reference Panels Using Genotype Likelihoods from Ultralow Coverage Sequencing

    PubMed Central

    Spiliopoulou, Athina; Colombo, Marco; Orchard, Peter; Agakov, Felix; McKeigue, Paul

    2017-01-01

    We address the task of genotype imputation to a dense reference panel given genotype likelihoods computed from ultralow coverage sequencing as inputs. In this setting, the data have a high-level of missingness or uncertainty, and are thus more amenable to a probabilistic representation. Most existing imputation algorithms are not well suited for this situation, as they rely on prephasing for computational efficiency, and, without definite genotype calls, the prephasing task becomes computationally expensive. We describe GeneImp, a program for genotype imputation that does not require prephasing and is computationally tractable for whole-genome imputation. GeneImp does not explicitly model recombination, instead it capitalizes on the existence of large reference panels—comprising thousands of reference haplotypes—and assumes that the reference haplotypes can adequately represent the target haplotypes over short regions unaltered. We validate GeneImp based on data from ultralow coverage sequencing (0.5×), and compare its performance to the most recent version of BEAGLE that can perform this task. We show that GeneImp achieves imputation quality very close to that of BEAGLE, using one to two orders of magnitude less time, without an increase in memory complexity. Therefore, GeneImp is the first practical choice for whole-genome imputation to a dense reference panel when prephasing cannot be applied, for instance, in datasets produced via ultralow coverage sequencing. A related future application for GeneImp is whole-genome imputation based on the off-target reads from deep whole-exome sequencing. PMID:28348060

  14. Limit cycles in piecewise-affine gene network models with multiple interaction loops

    NASA Astrophysics Data System (ADS)

    Farcot, Etienne; Gouzé, Jean-Luc

    2010-01-01

    In this article, we consider piecewise affine differential equations modelling gene networks. We work with arbitrary decay rates, and under a local hypothesis expressed as an alignment condition of successive focal points. The interaction graph of the system may be rather complex (multiple intricate loops of any sign, multiple thresholds, etc.). Our main result is an alternative theorem showing that if a sequence of region is periodically visited by trajectories, then under our hypotheses, there exists either a unique stable periodic solution, or the origin attracts all trajectories in this sequence of regions. This result extends greatly our previous work on a single negative feedback loop. We give several examples and simulations illustrating different cases.
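
    A piecewise-affine gene network model replaces smooth regulation with step functions, so that each variable obeys a linear equation inside every threshold-delimited region. The sketch below integrates the simplest case, a two-gene negative feedback loop, with a forward Euler scheme. The thresholds, rates, step size and initial state are arbitrary illustration values, and the code does not reproduce the paper's theorem or its examples.

```python
def simulate(theta=0.5, k=1.0, gamma=1.0, dt=0.01, steps=2000, x=0.2, y=0.8):
    """Euler integration of a two-gene piecewise-affine negative feedback loop.

    dx/dt = k * [y < theta] - gamma * x   (y represses x)
    dy/dt = k * [x > theta] - gamma * y   (x activates y)
    Returns the list of (x, y) states visited.
    """
    trajectory = []
    for _ in range(steps):
        # Production terms are step functions of the current state.
        production_x = k if y < theta else 0.0
        production_y = k if x > theta else 0.0
        x += dt * (production_x - gamma * x)
        y += dt * (production_y - gamma * y)
        trajectory.append((x, y))
    return trajectory


states = simulate()
print(states[-1])
```

    Within each region the dynamics relax toward a focal point (0 or k/gamma in each coordinate), which is why the state stays inside the box [0, k/gamma] x [0, k/gamma].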

  15. Predictive minimum description length principle approach to inferring gene regulatory networks.

    PubMed

    Chaitankar, Vijender; Zhang, Chaoyang; Ghosh, Preetam; Gong, Ping; Perkins, Edward J; Deng, Youping

    2011-01-01

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold that defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we propose a new inference algorithm that incorporates mutual information (MI), conditional mutual information (CMI), and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm is evaluated using both synthetic time series data sets and a biological time series data set (Saccharomyces cerevisiae). The results show that the proposed algorithm produced fewer false edges and significantly improved the precision when compared to existing MDL algorithm.
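
    The mutual information (MI) that such algorithms threshold can be sketched for two discretized expression profiles. This shows only the plain MI estimate from empirical frequencies; it does not implement CMI or the PMDL threshold selection described in the abstract, and the function name and toy profiles are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


gene_a = [0, 0, 1, 1]   # toy binarized expression over four time points
gene_b = [0, 1, 0, 1]
print(mutual_information(gene_a, gene_a))  # identical profiles
print(mutual_information(gene_a, gene_b))  # statistically independent profiles
```

    An edge between two genes would be kept only when their MI exceeds the chosen threshold, which is the quantity the PMDL step is meant to select automatically.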

  16. Flower development and sex specification in wild grapevine.

    PubMed

    Ramos, Miguel Jesus Nunes; Coito, João Lucas; Silva, Helena Gomes; Cunha, Jorge; Costa, Maria Manuela Ribeiro; Rocheta, Margarida

    2014-12-12

    Wild plants of Vitis closely related to the cultivated grapevine (V. v. vinifera) are believed to have been first domesticated 10,000 years BC around the Caspian Sea. V. v. vinifera is hermaphrodite whereas V. v. sylvestris is a dioecious species. Male flowers show a reduced pistil without style or stigma and female flowers present reflexed stamens with infertile pollen. V. vinifera produces perfect flowers with all functional structures. The mechanism for flower sex determination and specification in grapevine is still unknown. To understand which genes are involved during the establishment of male, female and complete flowers, we analysed and compared the transcription profiles of four developmental stages of the three genders. We showed that sex determination is a late event during flower development and that the expression of genes from the ABCDE model is not directly correlated with the establishment of sexual dimorphism. We propose a temporal comprehensive model in which two mutations in two linked genes could be players in sex determination and indirectly establish the Vitis domestication process. Additionally, we also found clusters of genes differentially expressed between genders and between developmental stages that suggest a role in sex differentiation. Also, the detection of differentially transcribed regions that extended existing gene models (intergenic regions) between sexes suggests that they may account for some of the variation between the subspecies. There is no evidence of differences of expression levels in genes from the ABCDE model that could explain the shift from hermaphroditism to dioecy. We propose that sex specification occurs after floral organ identity has been established and therefore, sex determination genes might be having an effect downstream of the ABCDE model genes. For the first time a full transcriptomic analysis was performed in different flower developmental stages in the same individual. Our experimental approach enabled us to create a comprehensive catalogue of transcribed genes across developmental stages and genders that will contribute to future work in sex determination in seed plants.

  17. Long non-coding RNA expression patterns in lung tissues of chronic cigarette smoke induced COPD mouse model.

    PubMed

    Zhang, Haiyun; Sun, Dejun; Li, Defu; Zheng, Zeguang; Xu, Jingyi; Liang, Xue; Zhang, Chenting; Wang, Sheng; Wang, Jian; Lu, Wenju

    2018-05-15

    Long non-coding RNAs (lncRNAs) have critical regulatory roles in protein-coding gene expression. Aberrant expression profiles of lncRNAs have been observed in various human diseases. In this study, we investigated transcriptome profiles in lung tissues of a chronic cigarette smoke (CS)-induced COPD mouse model. We found that 109 lncRNAs and 260 mRNAs were significantly differentially expressed in lungs of the chronic CS-induced COPD mouse model compared with control animals. GO and KEGG analyses indicated that protein-coding genes associated with the differentially expressed lncRNAs were mainly involved in the protein processing in endoplasmic reticulum pathway and the taurine and hypotaurine metabolism pathway. The combination of high-throughput data analysis and the results of qRT-PCR validation in lungs of the chronic CS-induced COPD mouse model, 16HBE cells with CSE treatment and PBMC from patients with COPD revealed that NR_102714 and its associated protein-coding gene UCHL1 might be involved in the development of COPD in both mouse and human. In conclusion, our study demonstrated that aberrant expression profiles of lncRNAs and mRNAs existed in lungs of the chronic CS-induced COPD mouse model. From an animal model perspective, these results might provide further clues to investigate biological functions of lncRNAs and their potential target protein-coding genes in the pathogenesis of COPD.

  18. RNA interference can be used to disrupt gene function in tardigrades

    PubMed Central

    Tenlen, Jennifer R.; McCaskill, Shaina; Goldstein, Bob

    2012-01-01

    How morphological diversity arises is a key question in evolutionary developmental biology. As a long-term approach to address this question, we are developing the water bear Hypsibius dujardini (Phylum Tardigrada) as a model system. We expect that using a close relative of two well-studied models, Drosophila (Phylum Arthropoda) and Caenorhabditis elegans (Phylum Nematoda), will facilitate identifying genetic pathways relevant to understanding the evolution of development. Tardigrades are also valuable research subjects for investigating how organisms and biological materials can survive extreme conditions. Methods to disrupt gene activity are essential to each of these efforts, but no such method yet exists for the Phylum Tardigrada. We developed a protocol to disrupt tardigrade gene functions by double-stranded RNA-mediated RNA interference (RNAi). We show that targeting tardigrade homologs of essential developmental genes by RNAi produced embryonic lethality, whereas targeting green fluorescent protein did not. Disruption of gene functions appears to be relatively specific by two criteria: targeting distinct genes resulted in distinct phenotypes that were consistent with predicted gene functions, and by RT-PCR, RNAi reduced the level of a target mRNA and not a control mRNA. These studies represent the first evidence that gene functions can be disrupted by RNAi in the phylum Tardigrada. Our results form a platform for dissecting tardigrade gene functions for understanding the evolution of developmental mechanisms and survival in extreme environments. PMID:23187800

  19. RNA interference can be used to disrupt gene function in tardigrades.

    PubMed

    Tenlen, Jennifer R; McCaskill, Shaina; Goldstein, Bob

    2013-05-01

    How morphological diversity arises is a key question in evolutionary developmental biology. As a long-term approach to address this question, we are developing the water bear Hypsibius dujardini (Phylum Tardigrada) as a model system. We expect that using a close relative of two well-studied models, Drosophila (Phylum Arthropoda) and Caenorhabditis elegans (Phylum Nematoda), will facilitate identifying genetic pathways relevant to understanding the evolution of development. Tardigrades are also valuable research subjects for investigating how organisms and biological materials can survive extreme conditions. Methods to disrupt gene activity are essential to each of these efforts, but no such method yet exists for the Phylum Tardigrada. We developed a protocol to disrupt tardigrade gene functions by double-stranded RNA-mediated RNA interference (RNAi). We showed that targeting tardigrade homologs of essential developmental genes by RNAi produced embryonic lethality, whereas targeting green fluorescent protein did not. Disruption of gene functions appears to be relatively specific by two criteria: targeting distinct genes resulted in distinct phenotypes that were consistent with predicted gene functions and by RT-PCR, RNAi reduced the level of a target mRNA and not a control mRNA. These studies represent the first evidence that gene functions can be disrupted by RNAi in the phylum Tardigrada. Our results form a platform for dissecting tardigrade gene functions for understanding the evolution of developmental mechanisms and survival in extreme environments.

  20. A quantitative validated model reveals two phases of transcriptional regulation for the gap gene giant in Drosophila.

    PubMed

    Hoermann, Astrid; Cicin-Sain, Damjan; Jaeger, Johannes

    2016-03-15

    Understanding eukaryotic transcriptional regulation and its role in development and pattern formation is one of the big challenges in biology today. Most attempts at tackling this problem either focus on the molecular details of transcription factor binding, or aim at genome-wide prediction of expression patterns from sequence through bioinformatics and mathematical modelling. Here we bridge the gap between these two complementary approaches by providing an integrative model of cis-regulatory elements governing the expression of the gap gene giant (gt) in the blastoderm embryo of Drosophila melanogaster. We use a reverse-engineering method, where mathematical models are fit to quantitative spatio-temporal reporter gene expression data to infer the regulatory mechanisms underlying gt expression in its anterior and posterior domains. These models are validated through prediction of gene expression in mutant backgrounds. A detailed analysis of our data and models reveals that gt is regulated by domain-specific CREs at early stages, while a late element drives expression in both the anterior and the posterior domains. Initial gt expression depends exclusively on inputs from maternal factors. Later, gap gene cross-repression and gt auto-activation become increasingly important. We show that auto-regulation creates a positive feedback, which mediates the transition from early to late stages of regulation. We confirm the existence and role of gt auto-activation through targeted mutagenesis of Gt transcription factor binding sites. In summary, our analysis provides a comprehensive picture of spatio-temporal gene regulation by different interacting enhancer elements for an important developmental regulator. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  1. Theory of prokaryotic genome evolution.

    PubMed

    Sela, Itamar; Wolf, Yuri I; Koonin, Eugene V

    2016-10-11

    Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.

  2. Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

    PubMed

    Held, Elizabeth; Cape, Joshua; Tintle, Nathan

    2016-01-01

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.

  3. Refined mapping of autoimmune disease associated genetic variants with gene expression suggests an important role for non-coding RNAs.

    PubMed

    Ricaño-Ponce, Isis; Zhernakova, Daria V; Deelen, Patrick; Luo, Oscar; Li, Xingwang; Isaacs, Aaron; Karjalainen, Juha; Di Tommaso, Jennifer; Borek, Zuzanna Agnieszka; Zorro, Maria M; Gutierrez-Achury, Javier; Uitterlinden, Andre G; Hofman, Albert; van Meurs, Joyce; Netea, Mihai G; Jonkers, Iris H; Withoff, Sebo; van Duijn, Cornelia M; Li, Yang; Ruan, Yijun; Franke, Lude; Wijmenga, Cisca; Kumar, Vinod

    2016-04-01

    Genome-wide association and fine-mapping studies in 14 autoimmune diseases (AID) have implicated more than 250 loci in one or more of these diseases. As more than 90% of AID-associated SNPs are intergenic or intronic, pinpointing the causal genes is challenging. We performed a systematic analysis to link 460 SNPs that are associated with 14 AID to causal genes using transcriptomic data from 629 blood samples. We were able to link 71 (39%) of the AID-SNPs to two or more nearby genes, providing evidence that for part of the AID loci multiple causal genes exist. While 54 of the AID loci are shared by one or more AID, 17% of them do not share candidate causal genes. In addition to finding novel genes such as ULK3, we also implicate novel disease mechanisms and pathways like autophagy in celiac disease pathogenesis. Furthermore, 42 of the AID SNPs specifically affected the expression of 53 non-coding RNA genes. To further understand how the non-coding genome contributes to AID, the SNPs were linked to functional regulatory elements, which suggest a model where AID genes are regulated by a network of chromatin looping/non-coding RNA interactions. The looping model also explains how a causal candidate gene is not necessarily the gene closest to the AID SNP, which was the case in nearly 50% of cases. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  4. Identification of Genes Uniquely Expressed in the Germ-Line Tissues of the Jewel Wasp Nasonia vitripennis

    PubMed Central

    Ferree, Patrick M.; Fang, Christopher; Mastrodimos, Mariah; Hay, Bruce A.; Amrhein, Henry; Akbari, Omar S.

    2015-01-01

    The jewel wasp Nasonia vitripennis is a rising model organism for the study of haplo-diploid reproduction characteristic of hymenopteran insects, which include all wasps, bees, and ants. We performed transcriptional profiling of the ovary, the female soma, and the male soma of N. vitripennis to complement a previously existing transcriptome of the wasp testis. These data were deposited into an open-access genome browser for visualization of transcripts relative to their gene models. We used these data to identify the assemblies of genes uniquely expressed in the germ-line tissues. We found that 156 protein-coding genes are expressed exclusively in the wasp testis compared with only 22 in the ovary. Of the testis-specific genes, eight are candidates for male-specific DNA packaging proteins known as protamines. We found very similar expression patterns of centrosome associated genes in the testis and ovary, arguing that de novo centrosome formation, a key process for development of unfertilized eggs into males, likely does not rely on large-scale transcriptional differences between these tissues. In contrast, a number of meiosis-related genes show a bias toward testis-specific expression, despite the lack of true meiosis in N. vitripennis males. These patterns may reflect an unexpected complexity of male gamete production in the haploid males of this organism. Broadly, these data add to the growing number of genomic and genetic tools available in N. vitripennis for addressing important biological questions in this rising insect model organism. PMID:26464360

  5. Single-cell and coupled GRN models of cell patterning in the Arabidopsis thaliana root stem cell niche

    PubMed Central

    2010-01-01

    Background: Recent experimental work has uncovered some of the genetic components required to maintain the Arabidopsis thaliana root stem cell niche (SCN) and its structure. Two main pathways are involved. One pathway depends on the genes SHORTROOT and SCARECROW and the other depends on the PLETHORA genes, which have been proposed to constitute the auxin readouts. Recent evidence suggests that a regulatory circuit, composed of WOX5 and CLE40, also contributes to the SCN maintenance. Yet, we still do not understand how the niche is dynamically maintained and patterned or if the uncovered molecular components are sufficient to recover the observed gene expression configurations that characterize the cell types within the root SCN. Mathematical and computational tools have proven useful in understanding the dynamics of cell differentiation. Hence, to further explore root SCN patterning, we integrated available experimental data into dynamic Gene Regulatory Network (GRN) models and addressed if these are sufficient to attain observed gene expression configurations in the root SCN in a robust and autonomous manner. Results: We found that an SCN GRN model based only on experimental data did not reproduce the configurations observed within the root SCN. We developed several alternative GRN models that recover these expected stable gene configurations. Such models incorporate a few additional components and interactions in addition to those that have been uncovered. The recovered configurations are stable to perturbations, and the models are able to recover the observed gene expression profiles of almost all the mutants described so far. However, the robustness of the postulated GRNs is not as high as that of other previously studied networks. Conclusions: These models are the first published approximations for a dynamic mechanism of the A. thaliana root SCN cellular patterning. Our model is useful to formally show that the data now available are not sufficient to fully reproduce root SCN organization and genetic profiles. We then highlight some experimental holes that remain to be studied and postulate some novel gene interactions. Finally, we suggest the existence of a generic dynamical motif that can be involved in both plant and animal SCN maintenance. PMID:20920363

  6. Computational discovery and in vivo validation of hnf4 as a regulatory gene in planarian regeneration.

    PubMed

    Lobo, Daniel; Morokuma, Junji; Levin, Michael

    2016-09-01

    Automated computational methods can infer dynamic regulatory network models directly from temporal and spatial experimental data, such as genetic perturbations and their resultant morphologies. Recently, a computational method was able to reverse-engineer the first mechanistic model of planarian regeneration that can recapitulate the main anterior-posterior patterning experiments published in the literature. Validating this comprehensive regulatory model via novel experiments that had not yet been performed would add to our understanding of the remarkable regeneration capacity of planarian worms and demonstrate the power of this automated methodology. Using the Michigan Molecular Interactions and STRING databases and the MoCha software tool, we characterized as hnf4 an unknown regulatory gene predicted to exist by the reverse-engineered dynamic model of planarian regeneration. Then, we used the dynamic model to predict the morphological outcomes under different single and multiple knock-downs (RNA interference) of hnf4 and its predicted gene pathway interactors β-catenin and hh. Interestingly, the model predicted that RNAi of hnf4 would rescue the abnormal regenerated phenotype (tailless) of RNAi of hh in amputated trunk fragments. Finally, we validated these predictions in vivo by performing the same surgical and genetic experiments with planarian worms, obtaining the same phenotypic outcomes predicted by the reverse-engineered model. These results suggest that hnf4 is a regulatory gene in planarian regeneration, validate the computational predictions of the reverse-engineered dynamic model, and demonstrate the automated methodology for the discovery of novel genes, pathways and experimental phenotypes. © The Author 2016. Published by Oxford University Press. All rights reserved.

  7. Partial least squares based identification of Duchenne muscular dystrophy specific genes.

    PubMed

    An, Hui-bo; Zheng, Hua-cheng; Zhang, Li; Ma, Lin; Liu, Zheng-yan

    2013-11-01

    Large-scale parallel gene expression analysis has provided a greater ease for investigating the underlying mechanisms of Duchenne muscular dystrophy (DMD). Previous studies typically implemented variance/regression analysis, which would be fundamentally flawed when unaccounted sources of variability in the arrays existed. Here we aim to identify genes that contribute to the pathology of DMD using partial least squares (PLS) based analysis. We carried out PLS-based analysis with two datasets downloaded from the Gene Expression Omnibus (GEO) database to identify genes contributing to the pathology of DMD. Except for the genes related to inflammation, muscle regeneration and extracellular matrix (ECM) modeling, we found some genes with high fold change, which have not been identified by previous studies, such as SRPX, GPNMB, SAT1, and LYZ. In addition, downregulation of the fatty acid metabolism pathway was found, which may be related to the progressive muscle wasting process. Our results provide a better understanding for the downstream mechanisms of DMD.

  8. A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities.

    PubMed

    Soneson, Charlotte; Fontes, Magnus

    2012-01-01

    Analysis of multivariate data sets from, for example, microarray studies frequently results in lists of genes which are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper, we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward comparison between any 2 lists. It can also be used to generate new more stable gene rankings incorporating more information from the experimental data. Using 2 microarray data sets, we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance of the rankings.

  9. Upon accounting for the impact of isoenzyme loss, gene deletion costs anticorrelate with their evolutionary rates

    DOE PAGES

    Jacobs, Christopher; Lambourne, Luke; Xia, Yu; ...

    2017-01-20

    Here, system-level metabolic network models enable the computation of growth and metabolic phenotypes from an organism's genome. In particular, flux balance approaches have been used to estimate the contribution of individual metabolic genes to organismal fitness, offering the opportunity to test whether such contributions carry information about the evolutionary pressure on the corresponding genes. Previous failure to identify the expected negative correlation between such computed gene-loss cost and sequence-derived evolutionary rates in Saccharomyces cerevisiae has been ascribed to a real biological gap between a gene's fitness contribution to an organism "here and now" and the same gene's historical importance as evidenced by its accumulated mutations over millions of years of evolution. Here we show that this negative correlation does exist, and can be exposed by revisiting a broadly employed assumption of flux balance models. In particular, we introduce a new metric that we call "function-loss cost", which estimates the cost of a gene loss event as the total potential functional impairment caused by that loss. This new metric displays significant negative correlation with evolutionary rate, across several thousand minimal environments. We demonstrate that the improvement gained using function-loss cost over gene-loss cost is explained by replacing the base assumption that isoenzymes provide unlimited capacity for backup with the assumption that isoenzymes are completely non-redundant. We further show that this change of the assumption regarding isoenzymes increases the recall of epistatic interactions predicted by the flux balance model at the cost of a reduction in the precision of the predictions. In addition to suggesting that the gene-to-reaction mapping in genome-scale flux balance models should be used with caution, our analysis provides new evidence that evolutionary gene importance captures much more than strict essentiality.

  10. Upon accounting for the impact of isoenzyme loss, gene deletion costs anticorrelate with their evolutionary rates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacobs, Christopher; Lambourne, Luke; Xia, Yu

    Here, system-level metabolic network models enable the computation of growth and metabolic phenotypes from an organism's genome. In particular, flux balance approaches have been used to estimate the contribution of individual metabolic genes to organismal fitness, offering the opportunity to test whether such contributions carry information about the evolutionary pressure on the corresponding genes. Previous failure to identify the expected negative correlation between such computed gene-loss cost and sequence-derived evolutionary rates in Saccharomyces cerevisiae has been ascribed to a real biological gap between a gene's fitness contribution to an organism "here and now" and the same gene's historical importance as evidenced by its accumulated mutations over millions of years of evolution. Here we show that this negative correlation does exist, and can be exposed by revisiting a broadly employed assumption of flux balance models. In particular, we introduce a new metric that we call "function-loss cost", which estimates the cost of a gene loss event as the total potential functional impairment caused by that loss. This new metric displays significant negative correlation with evolutionary rate, across several thousand minimal environments. We demonstrate that the improvement gained using function-loss cost over gene-loss cost is explained by replacing the base assumption that isoenzymes provide unlimited capacity for backup with the assumption that isoenzymes are completely non-redundant. We further show that this change of the assumption regarding isoenzymes increases the recall of epistatic interactions predicted by the flux balance model at the cost of a reduction in the precision of the predictions. In addition to suggesting that the gene-to-reaction mapping in genome-scale flux balance models should be used with caution, our analysis provides new evidence that evolutionary gene importance captures much more than strict essentiality.

  11. Gene-environment studies: any advantage over environmental studies?

    PubMed

    Bermejo, Justo Lorenzo; Hemminki, Kari

    2007-07-01

    Gene-environment studies have been motivated by the likely existence of prevalent low-risk genes that interact with common environmental exposures. The present study assessed the statistical advantage of the simultaneous consideration of genes and environment to investigate the effect of environmental risk factors on disease. In particular, we contemplated the possibility that several genes modulate the environmental effect. Environmental exposures, genotypes and phenotypes were simulated according to a wide range of parameter settings. Different models of gene-gene-environment interaction were considered. For each parameter combination, we estimated the probability of detecting the main environmental effect, the power to identify the gene-environment interaction and the frequency of environmentally affected individuals at which environmental and gene-environment studies show the same statistical power. The proportion of cases in the population attributable to the modeled risk factors was also calculated. Our data indicate that environmental exposures with weak effects may account for a significant proportion of the population prevalence of the disease. A general result was that, if the environmental effect was restricted to rare genotypes, the power to detect the gene-environment interaction was higher than the power to identify the main environmental effect. In other words, when few individuals contribute to the overall environmental effect, individual contributions are large and result in easily identifiable gene-environment interactions. Moreover, when multiple genes interacted with the environment, the statistical benefit of gene-environment studies was limited to those studies that included major contributors to the gene-environment interaction. The advantage of gene-environment over plain environmental studies also depends on the inheritance mode of the involved genes, on the study design and, to some extent, on the disease prevalence.

  12. The chicken frizzle feather is due to an α-keratin (KRT75) mutation that causes a defective rachis

    USDA-ARS's Scientific Manuscript database

    Feathers have complex forms and are an excellent model to study the development and evolution of morphologies. Existing chicken feather mutants are especially useful for identifying genetic determinants of feather formation. The present study focused on the gene, F, underlying the frizzle feather tr...

  13. Obesity and Breast Cancer

    DTIC Science & Technology

    2005-07-01

    serum INS, IGF-I and binding proteins, triglycerides, HDL-cholesterol, total and free steroids, sex hormone binding globulin, adiponectin, leptin, and...collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources...Bioinformatics, Biostatistics, Computer Science, Digital Mammography, Magnetic Resonance Imaging, Tissue Arrays, Gene Polymorphisms, Animal Models, Clinical

  14. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network.

    PubMed

    Qin, Tingting; Matmati, Nabil; Tsoi, Lam C; Mohanty, Bidyut K; Gao, Nan; Tang, Jijun; Lawson, Andrew B; Hannun, Yusuf A; Zheng, W Jim

    2014-10-01

    To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints--a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms' corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Differential Evolution of Antiretroviral Restriction Factors in Pteropid Bats as Revealed by APOBEC3 Gene Complexity

    PubMed Central

    Hayward, Joshua A; Tachedjian, Mary; Cui, Jie; Cheng, Adam Z; Johnson, Adam; Baker, Michelle L; Harris, Reuben S; Wang, Lin-Fa

    2018-01-01

    Bats have attracted attention in recent years as important reservoirs of viruses deadly to humans and other mammals. These infections are typically nonpathogenic in bats raising questions about innate immune differences that might exist between bats and other mammals. The APOBEC3 gene family encodes antiviral DNA cytosine deaminases with important roles in the suppression of diverse viruses and genomic parasites. Here, we characterize pteropid APOBEC3 genes and show that species within the genus Pteropus possess the largest and most diverse array of APOBEC3 genes identified in any mammal reported to date. Several bat APOBEC3 proteins are antiviral as demonstrated by restriction of retroviral infectivity using HIV-1 as a model, and recombinant A3Z1 subtypes possess strong DNA deaminase activity. These genes represent the first group of antiviral restriction factors identified in bats with extensive diversification relative to homologues in other mammals. PMID:29617834

  16. Differential Evolution of Antiretroviral Restriction Factors in Pteropid Bats as Revealed by APOBEC3 Gene Complexity.

    PubMed

    Hayward, Joshua A; Tachedjian, Mary; Cui, Jie; Cheng, Adam Z; Johnson, Adam; Baker, Michelle L; Harris, Reuben S; Wang, Lin-Fa; Tachedjian, Gilda

    2018-07-01

    Bats have attracted attention in recent years as important reservoirs of viruses deadly to humans and other mammals. These infections are typically nonpathogenic in bats raising questions about innate immune differences that might exist between bats and other mammals. The APOBEC3 gene family encodes antiviral DNA cytosine deaminases with important roles in the suppression of diverse viruses and genomic parasites. Here, we characterize pteropid APOBEC3 genes and show that species within the genus Pteropus possess the largest and most diverse array of APOBEC3 genes identified in any mammal reported to date. Several bat APOBEC3 proteins are antiviral as demonstrated by restriction of retroviral infectivity using HIV-1 as a model, and recombinant A3Z1 subtypes possess strong DNA deaminase activity. These genes represent the first group of antiviral restriction factors identified in bats with extensive diversification relative to homologues in other mammals.

  17. The emergence of overlapping scale-free genetic architecture in digital organisms.

    PubMed

    Gerlee, P; Lundh, T

    2008-01-01

    We have studied the evolution of genetic architecture in digital organisms and found that the gene overlap follows a scale-free distribution, which is commonly found in metabolic networks of many organisms. Our results show that the slope of the scale-free distribution depends on the mutation rate and that the gene development is driven by expansion of already existing genes, which is in direct correspondence to the preferential growth algorithm that gives rise to scale-free networks. To further validate our results we have constructed a simple model of gene development, which recapitulates the results from the evolutionary process and shows that the mutation rate affects the tendency of genes to cluster. In addition we could relate the slope of the scale-free distribution to the genetic complexity of the organisms and show that a high mutation rate gives rise to a more complex genetic architecture.

  18. An integrated and comparative approach towards identification, characterization and functional annotation of candidate genes for drought tolerance in sorghum (Sorghum bicolor (L.) Moench).

    PubMed

    Woldesemayat, Adugna Abdi; Van Heusden, Peter; Ndimba, Bongani K; Christoffels, Alan

    2017-12-22

    Drought is the most disastrous abiotic stress that severely affects agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits requires identification and in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tolerance have generally been limited to the traditional candidate gene approach, which targets only a single gene in a pathway that is related to a trait. In this study, we used sorghum, one of the model crops that is well adapted to arid regions, to mine genes and define determinants for drought tolerance using drought expression libraries and RNA-seq data. We provide an integrated and comparative in silico candidate gene identification, characterization and annotation approach, with an emphasis on genes playing a prominent role in conferring drought tolerance in sorghum. A total of 470 non-redundant functionally annotated drought responsive genes (DRGs) were identified using experimental data from drought responses by employing pairwise sequence similarity searches, pathway and InterPro-domain analysis, expression profiling and orthology relation. Comparison of the genomic locations between these genes and sorghum quantitative trait loci (QTLs) showed that 40% of these genes were co-localized with QTLs known for drought tolerance. The genome reannotation conducted using the Program to Assemble Spliced Alignments (PASA) resulted in 9.6% of existing single gene models being updated. In addition, 210 putative novel genes were identified using AUGUSTUS- and PASA-based analysis of the expression dataset. Among these, 50% were single exonic, 69.5% were drought responsive and 5.7% were complete gene structure models. Analysis of biochemical metabolism revealed 14 metabolic pathways that are related to drought tolerance and also had a strong biological network among the categories of genes involved. Identification of these pathways signifies the interplay of biochemical reactions that make up the metabolic network, constituting a fundamental interface for the sorghum defence mechanism against drought stress. This study suggests untapped natural variability in sorghum that could be used for developing drought tolerance. The data presented here may be regarded as an initial reference point in functional and comparative genomics in the Gramineae family.

  19. From data to function: functional modeling of poultry genomics data.

    PubMed

    McCarthy, F M; Lyons, E

    2013-09-01

    One of the challenges of functional genomics is to create a better understanding of the biological system being studied so that the data produced are leveraged to provide gains for agriculture, human health, and the environment. Functional modeling enables researchers to make sense of these data as it reframes a long list of genes or gene products (mRNA, ncRNA, and proteins) by grouping based upon function, be it individual molecular functions or interactions between these molecules or broader biological processes, including metabolic and signaling pathways. However, poultry researchers have been hampered by a lack of functional annotation data, tools, and training to use these data and tools. Moreover, this lack is becoming more critical as new sequencing technologies enable us to generate data not only for an increasingly diverse range of species but also individual genomes and populations of individuals. We discuss the impact of these new sequencing technologies on poultry research, with a specific focus on what functional modeling resources are available for poultry researchers. We also describe key strategies for researchers who wish to functionally model their own data, providing background information about functional modeling approaches, the data and tools to support these approaches, and the strengths and limitations of each. Specifically, we describe methods for functional analysis using Gene Ontology (GO) functional summaries, functional enrichment analysis, and pathways and network modeling. As annotation efforts begin to provide the fundamental data that underpin poultry functional modeling (such as improved gene identification, standardized gene nomenclature, temporal and spatial expression data and gene product function), tool developers are incorporating these data into new and existing tools that are used for functional modeling, and cyberinfrastructure is being developed to provide the necessary extendibility and scalability for storing and analyzing these data. This process will support the efforts of poultry researchers to make sense of their functional genomics data sets, and we provide here a starting point for researchers who wish to take advantage of these tools.

  20. Non-parallel coevolution of sender and receiver in the acoustic communication system of treefrogs.

    PubMed

    Schul, Johannes; Bush, Sarah L

    2002-09-07

    Advertisement calls of closely related species often differ in quantitative features such as the repetition rate of signal units. These differences are important in species recognition. Current models of signal-receiver coevolution predict two possible patterns in the evolution of the mechanism used by receivers to recognize the call: (i) classical sexual selection models (Fisher process, good genes/indirect benefits, direct benefits models) predict that close relatives use qualitatively similar signal recognition mechanisms tuned to different values of a call parameter; and (ii) receiver bias models (hidden preference, pre-existing bias models) predict that if different signal recognition mechanisms are used by sibling species, evidence of an ancestral mechanism will persist in the derived species, and evidence of a pre-existing bias will be detectable in the ancestral species. We describe qualitatively different call recognition mechanisms in sibling species of treefrogs. Whereas Hyla chrysoscelis uses pulse rate to recognize male calls, Hyla versicolor uses absolute measurements of pulse duration and interval duration. We found no evidence of either hidden preferences or pre-existing biases. The results are compared with similar data from katydids (Tettigonia sp.). In both taxa, the data are not adequately explained by current models of signal-receiver coevolution.

  1. Prior knowledge driven Granger causality analysis on gene regulatory network discovery

    DOE PAGES

    Yao, Shun; Yoo, Shinjae; Yu, Dantong

    2015-08-28

    Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, the number of available time points (T) usually is much smaller than the number of target genes (n) in biological datasets. The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n>>T. In this study, we proposed a new method, viz., CGC-2SPR (CGC using two-step prior Ridge regularization) to resolve the problem by incorporating prior biological knowledge about a target gene data set. In our simulation experiments, the proposed new methodology CGC-2SPR showed significant performance improvement in terms of accuracy over other widely used GC modeling (PGC, Ridge and Lasso) and MI-based (MRNET and ARACNE) methods. In addition, we applied CGC-2SPR to a real biological dataset, i.e., the yeast metabolic cycle, and discovered more true positive edges with CGC-2SPR than with the other existing methods. In our research, we noticed a “1+1>2” effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on causality networks, we made a functional prediction that the Abm1 gene (its functions previously were unknown) might be related to the yeast’s responses to different levels of glucose. In conclusion, our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction in system biology. Furthermore, we proposed a method of Monte Carlo significance estimation (MCSE) to calculate the edge significances which provide statistical meanings to the discovered causality networks. All of our data and source codes will be available under the link https://bitbucket.org/dtyu/granger-causality/wiki/Home.

  2. Resolving the homology—function relationship through comparative genomics of membrane-trafficking machinery and parasite cell biology

    PubMed Central

    Klinger, Christen M.; Ramirez-Macias, Inmaculada; Herman, Emily K.; Turkewitz, Aaron P.; Field, Mark C.; Dacks, Joel B.

    2016-01-01

    With advances in DNA sequencing technology, it is increasingly common and tractable to informatically look for genes of interest in the genomic databases of parasitic organisms and infer cellular states. Assignment of a putative gene function based on homology to functionally characterized genes in other organisms, though powerful, relies on the implicit assumption of functional homology, i.e. that orthology indicates conserved function. Eukaryotes reveal a dazzling array of cellular features and structural organization, suggesting a concomitant diversity in their underlying molecular machinery. Significantly, examples of novel functions for pre-existing or new paralogues are not uncommon. Do these examples undermine the basic assumption of functional homology, especially in parasitic protists, which are often highly derived? Here we examine the extent to which functional homology exists between organisms spanning the eukaryotic lineage. By comparing membrane trafficking proteins between parasitic protists and traditional model organisms, where direct functional evidence is available, we find that function is indeed largely conserved between orthologues, albeit with significant adaptation arising from the unique biological features within each lineage. PMID:27444378

  3. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network

    PubMed Central

    Qin, Tingting; Matmati, Nabil; Tsoi, Lam C.; Mohanty, Bidyut K.; Gao, Nan; Tang, Jijun; Lawson, Andrew B.; Hannun, Yusuf A.; Zheng, W. Jim

    2014-01-01

    To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes’ Ontology Fingerprints—a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms’ corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general. PMID:25063300

  4. The Prediction of Key Cytoskeleton Components Involved in Glomerular Diseases Based on a Protein-Protein Interaction Network.

    PubMed

    Ding, Fangrui; Tan, Aidi; Ju, Wenjun; Li, Xuejuan; Li, Shao; Ding, Jie

    2016-01-01

    Maintenance of the physiological morphologies of different types of cells and tissues is essential for the normal functioning of each system in the human body. Dynamic variations in cell and tissue morphologies depend on accurate adjustments of the cytoskeletal system. The cytoskeletal system in the glomerulus plays a key role in the normal process of kidney filtration. To enhance the understanding of the possible roles of the cytoskeleton in glomerular diseases, we constructed the Glomerular Cytoskeleton Network (GCNet), which shows the protein-protein interaction network in the glomerulus, and identified several possible key cytoskeletal components involved in glomerular diseases. In this study, genes/proteins annotated to the cytoskeleton were detected by Gene Ontology analysis, and glomerulus-enriched genes were selected from nine available glomerular expression datasets. Then, the GCNet was generated by combining these two sets of information. To predict the possible key cytoskeleton components in glomerular diseases, we then examined the common regulation of the genes in GCNet in the context of five glomerular diseases based on their transcriptomic data. As a result, twenty-one cytoskeleton components were highlighted as potential candidates because they were consistently down- or up-regulated in all five glomerular diseases. These candidates were then examined in relation to known glomerular diseases and genes to determine their possible functions and interactions. In addition, the mRNA levels of these candidates were validated in a puromycin aminonucleoside (PAN)-induced rat nephropathy model and were also matched with existing Diabetic Nephropathy (DN) transcriptomic data. As a result, 15 of the 21 candidates in the PAN-induced nephropathy model were consistent with our prediction, and 12 of the 21 candidates matched differentially expressed genes in the DN transcriptomic data. By providing a novel interaction network and prediction, GCNet contributes to improving the understanding of normal glomerular function and will be useful for detecting target cytoskeleton molecules of interest that may be involved in glomerular diseases in future studies.

  5. The Prediction of Key Cytoskeleton Components Involved in Glomerular Diseases Based on a Protein-Protein Interaction Network

    PubMed Central

    Ding, Fangrui; Tan, Aidi; Ju, Wenjun; Li, Xuejuan; Li, Shao; Ding, Jie

    2016-01-01

    Maintenance of the physiological morphologies of different types of cells and tissues is essential for the normal functioning of each system in the human body. Dynamic variations in cell and tissue morphologies depend on accurate adjustments of the cytoskeletal system. The cytoskeletal system in the glomerulus plays a key role in the normal process of kidney filtration. To enhance the understanding of the possible roles of the cytoskeleton in glomerular diseases, we constructed the Glomerular Cytoskeleton Network (GCNet), which shows the protein-protein interaction network in the glomerulus, and identified several possible key cytoskeletal components involved in glomerular diseases. In this study, genes/proteins annotated to the cytoskeleton were detected by Gene Ontology analysis, and glomerulus-enriched genes were selected from nine available glomerular expression datasets. Then, the GCNet was generated by combining these two sets of information. To predict the possible key cytoskeleton components in glomerular diseases, we then examined the common regulation of the genes in GCNet in the context of five glomerular diseases based on their transcriptomic data. As a result, twenty-one cytoskeleton components were highlighted as potential candidates because they were consistently down- or up-regulated in all five glomerular diseases. These candidates were then examined in relation to known glomerular diseases and genes to determine their possible functions and interactions. In addition, the mRNA levels of these candidates were validated in a puromycin aminonucleoside (PAN)-induced rat nephropathy model and were also matched with existing Diabetic Nephropathy (DN) transcriptomic data. As a result, 15 of the 21 candidates in the PAN-induced nephropathy model were consistent with our prediction, and 12 of the 21 candidates matched differentially expressed genes in the DN transcriptomic data. By providing a novel interaction network and prediction, GCNet contributes to improving the understanding of normal glomerular function and will be useful for detecting target cytoskeleton molecules of interest that may be involved in glomerular diseases in future studies. PMID:27227331

  6. A robust two-way semi-linear model for normalization of cDNA microarray data

    PubMed Central

    Wang, Deli; Huang, Jian; Xie, Hehuang; Manzella, Liliana; Soares, Marcelo Bento

    2005-01-01

    Background: Normalization is a basic step in microarray data analysis. A proper normalization procedure ensures that the intensity ratios provide meaningful measures of relative expression values. Methods: We propose a robust semiparametric method in a two-way semi-linear model (TW-SLM) for normalization of cDNA microarray data. This method does not make the usual assumptions underlying some of the existing methods. For example, it does not assume that: (i) the percentage of differentially expressed genes is small; or (ii) the numbers of up- and down-regulated genes are about the same, as required in the LOWESS normalization method. We conduct simulation studies to evaluate the proposed method and use a real data set from a specially designed microarray experiment to compare the performance of the proposed method with that of the LOWESS normalization approach. Results: The simulation results show that the proposed method performs better than the LOWESS normalization method in terms of mean square errors for estimated gene effects. The results of analysis of the real data set also show that the proposed method yields more consistent results between the direct and the indirect comparisons and also can detect more differentially expressed genes than the LOWESS method. Conclusions: Our simulation studies and the real data example indicate that the proposed robust TW-SLM method works at least as well as the LOWESS method and works better when the underlying assumptions for the LOWESS method are not satisfied. Therefore, it is a powerful alternative to the existing normalization methods. PMID:15663789

  7. Outgroup, alignment and modelling improvements indicate that two TNFSF13-like genes existed in the vertebrate ancestor.

    PubMed

    Redmond, Anthony K; Pettinello, Rita; Dooley, Helen

    2017-03-01

    The molecular machinery required for lymphocyte development and differentiation appears to have emerged concomitantly with distinct B- and T-like lymphocyte subsets in the ancestor of all vertebrates. The TNF superfamily (TNFSF) members BAFF (TNFSF13B/BLyS) and APRIL (TNFSF13) are key regulators of B cell development, survival, and activation in mammals, but the temporal emergence of these molecules, and their precise relationship to the newly identified TNFSF gene BALM (BAFF and APRIL-like molecule), have not yet been elucidated. Here, to resolve the early evolutionary history of this family, we improved outgroup sampling and alignment quality, and applied better fitting substitution models compared to past studies. Our analyses reveal that BALM is a definitive TNFSF13 family member, which split from BAFF in the gnathostome (jawed vertebrate) ancestor. Most importantly, however, we show that both the APRIL and BAFF lineages existed in the ancestors of all extant vertebrates. This implies that APRIL has been lost, or is yet to be found, in cyclostomes (jawless vertebrates). Our results suggest that lineage-specific gene duplication and loss events have caused lymphocyte regulation, despite shared origins, to become secondarily distinct between gnathostomes and cyclostomes. Finally, the structure of lamprey BAFF-like, and its phylogenetic placement as sister to BAFF and BALM, but not the more slowly evolving APRIL, indicates that the primordial lymphocyte regulator was more APRIL-like than BAFF-like.

  8. PhotoMorphs™: A Novel Light-Activated Reagent for Controlling Gene Expression in Zebrafish

    PubMed Central

    Tomasini, Amber J.; Schuler, Aaron D.; Zebala, John A.; Mayer, Alan N.

    2009-01-01

    Manipulating gene expression in zebrafish is critical for exploiting the full potential of this vertebrate model organism. Morpholino oligos are the most commonly employed antisense technology for knocking down gene expression. However, morpholinos suffer from a lack of control over the timing and location of knockdown. In this report, we describe a novel light-activatable knockdown reagent called PhotoMorph™. PhotoMorphs can be generated from existing morpholinos by hybridization with a complementary caging strand containing a photocleavable linkage. The caging strand neutralizes the morpholino activity until irradiation of the PhotoMorph with UV light releases the morpholino. We generated PhotoMorphs to target genes encoding enhanced green fluorescent protein (EGFP), No tail, and E-cadherin to illustrate the utility of this approach. Temporal control of gene expression with PhotoMorphs permitted us to circumvent the early lethal phenotype of E-cadherin knockdown. A splice-blocking PhotoMorph directed to the rheb gene showed light-dependent gene knockdown up to 72 hpf. PhotoMorphs thus offer a new class of laboratory reagents suitable for the spatiotemporal control of gene expression in the zebrafish. PMID:19644983

  9. Afrobatrachian mitochondrial genomes: genome reorganization, gene rearrangement mechanisms, and evolutionary trends of duplicated and rearranged genes

    PubMed Central

    2013-01-01

    Background: Mitochondrial genomic (mitogenomic) reorganizations are rarely found in closely-related animals, yet drastic reorganizations have been found in the Ranoides frogs. The phylogenetic relationships of the three major ranoid taxa (Natatanura, Microhylidae, and Afrobatrachia) have been problematic, and mitogenomic information for afrobatrachians has not been available. Several molecular models for mitochondrial (mt) gene rearrangements have been proposed, but observational evidence has been insufficient to evaluate them. Furthermore, evolutionary trends in rearranged mt genes have not been well understood. To gain molecular and phylogenetic insights into these issues, we analyzed the mt genomes of four afrobatrachian species (Breviceps adspersus, Hemisus marmoratus, Hyperolius marmoratus, and Trichobatrachus robustus) and performed molecular phylogenetic analyses. Furthermore, we searched for two evolutionary patterns expected in the rearranged mt genes of ranoids. Results: Extensively reorganized mt genomes having many duplicated and rearranged genes were found in three of the four afrobatrachians analyzed. In fact, Breviceps has the largest known mt genome among vertebrates. Although the kinds of duplicated and rearranged genes differed among these species, a remarkable gene rearrangement pattern of non-tandemly copied genes situated within tandemly-copied regions was commonly found. Furthermore, the existence of concerted evolution was observed between non-neighboring copies of triplicated 12S and 16S ribosomal RNA regions. Conclusions: Phylogenetic analyses based on mitogenomic data support a close relationship between Afrobatrachia and Microhylidae, with their estimated divergence 100 million years ago consistent with present-day endemism of afrobatrachians on the African continent. The afrobatrachian mt data supported the first tandem and second non-tandem duplication model for mt gene rearrangements and the recombination-based model for concerted evolution of duplicated mt regions. We also showed that specific nucleotide substitution and compositional patterns expected in duplicated and rearranged mt genes did not occur, suggesting no disadvantage in employing these genes for phylogenetic inference. PMID:24053406

  10. Pluripotency, Differentiation, and Reprogramming: A Gene Expression Dynamics Model with Epigenetic Feedback Regulation

    PubMed Central

    Miyamoto, Tadashi; Furusawa, Chikara; Kaneko, Kunihiko

    2015-01-01

    Embryonic stem cells exhibit pluripotency: they can differentiate into all types of somatic cells. Pluripotent genes such as Oct4 and Nanog are activated in the pluripotent state, and their expression decreases during cell differentiation. Inversely, expression of differentiation genes such as Gata6 and Gata4 is promoted during differentiation. The gene regulatory network controlling the expression of these genes has been described, and slower-scale epigenetic modifications have been uncovered. Although the differentiation of pluripotent stem cells is normally irreversible, reprogramming of cells can be experimentally manipulated to regain pluripotency via overexpression of certain genes. Despite these experimental advances, the dynamics and mechanisms of differentiation and reprogramming are not yet fully understood. Based on recent experimental findings, we constructed a simple gene regulatory network including pluripotent and differentiation genes, and we demonstrated the existence of pluripotent and differentiated states from the resultant dynamical-systems model. Two differentiation mechanisms, interaction-induced switching from an expression oscillatory state and noise-assisted transition between bistable stationary states, were tested in the model. The former was found to be relevant to the differentiation process. We also introduced variables representing epigenetic modifications, which controlled the threshold for gene expression. By assuming positive feedback between expression levels and the epigenetic variables, we observed differentiation in expression dynamics. Additionally, with numerical reprogramming experiments for differentiated cells, we showed that pluripotency was recovered in cells by imposing overexpression of two pluripotent genes and external factors to control expression of differentiation genes. Interestingly, these factors were consistent with the four Yamanaka factors, Oct4, Sox2, Klf4, and Myc, which were necessary for the establishment of induced pluripotent stem cells. These results, based on a gene regulatory network and expression dynamics, contribute to our wider understanding of pluripotency, differentiation, and reprogramming of cells, and they provide a fresh viewpoint on robustness and control during development. PMID:26308610

  11. Impact of Cigarette Smoke on the Human and Mouse Lungs: A Gene-Expression Comparison Study

    PubMed Central

    Morissette, Mathieu C.; Lamontagne, Maxime; Bérubé, Jean-Christophe; Gaschler, Gordon; Williams, Andrew; Yauk, Carole; Couture, Christian; Laviolette, Michel; Hogg, James C.; Timens, Wim; Halappanavar, Sabina; Stampfli, Martin R.; Bossé, Yohan

    2014-01-01

    Cigarette smoke is well known for its adverse effects on human health, especially on the lungs. Basic research is essential to identify the mechanisms involved in the development of cigarette smoke-related diseases, but translation of new findings from pre-clinical models to the clinic remains difficult. In the present study, we aimed at comparing the gene expression signature between the lungs of human smokers and mice exposed to cigarette smoke to identify the similarities and differences. Using human and mouse whole-genome gene expression arrays, changes in gene expression, signaling pathways and biological functions were assessed. We found that genes significantly modulated by cigarette smoke in humans were enriched for genes modulated by cigarette smoke in mice, suggesting a similar response of both species. Sixteen smoking-induced genes were in common between humans and mice including six newly reported to be modulated by cigarette smoke. In addition, we identified a new conserved pulmonary response to cigarette smoke in the induction of phospholipid metabolism/degradation pathways. Finally, the majority of biological functions modulated by cigarette smoke in humans were also affected in mice. Altogether, the present study provides information on similarities and differences in lung gene expression response to cigarette smoke that exist between human and mouse. Our results foster the idea that animal models should be used to study the involvement of pathways rather than single genes in human diseases. PMID:24663285

  12. A regulation probability model-based meta-analysis of multiple transcriptomics data sets for cancer biomarker identification.

    PubMed

    Xie, Xin-Ping; Xie, Yu-Feng; Wang, Hong-Qiang

    2017-08-23

    Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question of such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully for this goal. This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a united gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their probabilities of occurrence in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, realizing accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis. Data heterogeneity largely influences the performance of meta-analysis for DEG identification. Different existing meta-analysis methods were revealed to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can be a standalone tool due to its united framework and controllable way to deal with study heterogeneity.

  13. Pan- and core- network analysis of co-expression genes in a model plant

    DOE PAGES

    He, Fei; Maslov, Sergei

    2016-12-16

    Genome-wide gene expression experiments have been performed using the model plant Arabidopsis during the last decade. Some studies involved construction of coexpression networks, a popular technique used to identify groups of co-regulated genes, to infer unknown gene functions. One approach is to construct a single coexpression network by combining multiple expression datasets generated in different labs. We advocate a complementary approach in which we construct a large collection of 134 coexpression networks based on expression datasets reported in individual publications. To this end we reanalyzed public expression data. To describe this collection of networks we introduced concepts of ‘pan-network’ and ‘core-network’ representing union and intersection between a sizeable fraction of individual networks, respectively. Here, we showed that these two types of networks are different both in terms of their topology and biological function of interacting genes. For example, the modules of the pan-network are enriched in regulatory and signaling functions, while the modules of the core-network tend to include components of large macromolecular complexes such as ribosomes and photosynthetic machinery. Our analysis is aimed to help the plant research community to better explore the information contained within the existing vast collection of gene expression data in Arabidopsis.
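
    The union/intersection idea behind the pan- and core-network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each network is taken to be a list of undirected gene pairs, and the thresholds `pan_min` and `core_frac` are hypothetical parameters, not values from the paper.

```python
from collections import Counter

def pan_and_core(networks, pan_min=1, core_frac=0.5):
    """Split edges into a pan-network and a core-network.

    networks  : list of edge collections; each edge is a (gene_a, gene_b) pair
    pan_min   : an edge joins the pan-network if it occurs in at least this
                many networks (hypothetical threshold)
    core_frac : an edge joins the core-network if it occurs in at least this
                fraction of all networks (hypothetical threshold)
    """
    counts = Counter()
    for edges in networks:
        # Sort each pair so (a, b) and (b, a) are counted as the same edge,
        # and use a set so an edge counts once per network.
        counts.update({tuple(sorted(edge)) for edge in edges})
    core_min = core_frac * len(networks)
    pan = {edge for edge, c in counts.items() if c >= pan_min}
    core = {edge for edge, c in counts.items() if c >= core_min}
    return pan, core

# Toy input: three made-up networks over genes A..E.
toy_networks = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("A", "B"), ("B", "C"), ("D", "E")],
]
pan, core = pan_and_core(toy_networks, pan_min=1, core_frac=1.0)
```

    With `core_frac=1.0` the core-network is the strict intersection of all networks; lowering it gives the "sizeable fraction" variant described in the abstract.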

  14. Sex chromosome loss and the pseudoautosomal region genes in hematological malignancies

    PubMed Central

    Weng, Stephanie; Stoner, Samuel A.; Zhang, Dong-Er

    2016-01-01

    Cytogenetic aberrations, such as chromosomal translocations, aneuploidy, and amplifications, are frequently detected in hematological malignancies. For many of the common autosomal aberrations, the mechanisms underlying their roles in cancer development have been well-characterized. In contrast, although loss of a sex chromosome is observed in a broad range of hematological malignancies, how it cooperates in disease development is less understood. Nevertheless, it has been postulated that tumor suppressor genes reside on the sex chromosomes. Although the X and Y sex chromosomes are highly divergent, the pseudoautosomal regions are homologous between both chromosomes. Here, we review what is currently known about the pseudoautosomal region genes in the hematological system. Additionally, we discuss implications for haploinsufficiency of critical pseudoautosomal region sex chromosome genes, driven by sex chromosome loss, in promoting hematological malignancies. Because mechanistic studies on disease development rely heavily on murine models, we also discuss the challenges and caveats of existing models, and propose alternatives for examining the involvement of pseudoautosomal region genes and loss of a sex chromosome in vivo. With the widespread detection of loss of a sex chromosome in different hematological malignancies, the elucidation of the role of pseudoautosomal region genes in the development and progression of these diseases would be invaluable to the field. PMID:27655702

  15. Genetic Influences on Peer and Family Relationships Across Adolescent Development: Introduction to the Special Issue.

    PubMed

    Mullineaux, Paula Y; DiLalla, Lisabeth Fisher

    2015-07-01

    Nearly all aspects of human development are influenced by genetic and environmental factors, which conjointly shape development through several gene-environment interplay mechanisms. More recently, researchers have begun to examine the influence of genetic factors on peer and family relationships across the pre-adolescent and adolescent time periods. This article introduces the special issue by providing a critical overview of behavior genetic methodology and existing research demonstrating gene-environment processes operating on the link between peer and family relationships and adolescent adjustment. The overview is followed by a summary of new research studies, which use genetically informed samples to examine how peer and family environment work together with genetic factors to influence behavioral outcomes across adolescence. The studies in this special issue provide further evidence of gene-environment interplay through innovative behavior genetic methodological approaches across international samples. Results from the quantitative models indicate environmental moderation of genetic risk for coercive adolescent-parent relationships and deviant peer affiliation. The molecular genetics studies provide support for a gene-environment interaction differential susceptibility model for dopamine regulation genes across positive and negative peer and family environments. Overall, the findings from the studies in this special issue demonstrate the importance of considering how genes and environments work in concert to shape developmental outcomes during adolescence.

  16. Pan- and core- network analysis of co-expression genes in a model plant

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Fei; Maslov, Sergei

    Genome-wide gene expression experiments have been performed using the model plant Arabidopsis during the last decade. Some studies involved construction of coexpression networks, a popular technique used to identify groups of co-regulated genes, to infer unknown gene functions. One approach is to construct a single coexpression network by combining multiple expression datasets generated in different labs. We advocate a complementary approach in which we construct a large collection of 134 coexpression networks based on expression datasets reported in individual publications. To this end we reanalyzed public expression data. To describe this collection of networks we introduced concepts of ‘pan-network’ and ‘core-network’ representing union and intersection between a sizeable fraction of individual networks, respectively. Here, we showed that these two types of networks are different both in terms of their topology and biological function of interacting genes. For example, the modules of the pan-network are enriched in regulatory and signaling functions, while the modules of the core-network tend to include components of large macromolecular complexes such as ribosomes and photosynthetic machinery. Our analysis is aimed to help the plant research community to better explore the information contained within the existing vast collection of gene expression data in Arabidopsis.

  17. Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering.

    PubMed

    Sun, Peng; Speicher, Nora K; Röttger, Richard; Guo, Jiong; Baumbach, Jan

    2014-05-01

    The explosion of biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic: 'Bi-Force'. It is based on the weighted bicluster editing model, to perform biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol in a recent review paper from Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279-292.) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. GIANT API: an application programming interface for functional genomics.

    PubMed

    Roberts, Andrew M; Wong, Aaron K; Fisk, Ian; Troyanskaya, Olga G

    2016-07-08

    GIANT API provides biomedical researchers programmatic access to tissue-specific and global networks in humans and model organisms, and associated tools, which includes functional re-prioritization of existing genome-wide association study (GWAS) data. Using tissue-specific interaction networks, researchers are able to predict relationships between genes specific to a tissue or cell lineage, identify the changing roles of genes across tissues and uncover disease-gene associations. Additionally, GIANT API enables computational tools like NetWAS, which leverages tissue-specific networks for re-prioritization of GWAS results. The web services covered by the API include 144 tissue-specific functional gene networks in human, global functional networks for human and six common model organisms and the NetWAS method. GIANT API conforms to the REST architecture, which makes it stateless, cacheable and highly scalable. It can be used by a diverse range of clients including web browsers, command terminals, programming languages and standalone apps for data analysis and visualization. The API is freely available for use at http://giant-api.princeton.edu. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. yStreX: yeast stress expression database

    PubMed Central

    Wanichthanarak, Kwanjeera; Nookaew, Intawat; Petranovic, Dina

    2014-01-01

    Over the past decade genome-wide expression analyses have been often used to study how expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular Eukaryal model, yeast Saccharomyces cerevisiae. However, the lack of a unifying or integrated bioinformatics platform that would permit efficient and rapid use of all these existing data remains an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses; and the results are also deposited in this database. Lastly, we constructed a user-friendly Web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.com PMID:25024351

  20. HomoTarget: a new algorithm for prediction of microRNA targets in Homo sapiens.

    PubMed

    Ahmadi, Hamed; Ahmadi, Ali; Azimzadeh-Jamalkandi, Sadegh; Shoorehdeli, Mahdi Aliyari; Salehzadeh-Yazdi, Ali; Bidkhori, Gholamreza; Masoudi-Nejad, Ali

    2013-02-01

    MiRNAs play an essential role in the networks of gene regulation by inhibiting the translation of target mRNAs. Several computational approaches have been proposed for the prediction of miRNA target-genes. Reports reveal a large fraction of under-predicted or falsely predicted target genes. Thus, there is an imperative need to develop a computational method by which the target mRNAs of existing miRNAs can be correctly identified. In this study, a combined pattern recognition neural network (PRNN) and principal component analysis (PCA) architecture is proposed in order to model the complicated relationship between miRNAs and their target mRNAs in humans. The results of several types of intelligent classifiers and our proposed model were compared, showing that our algorithm outperformed them with higher sensitivity and specificity. Using the recent release of the mirBase database to find potential targets of miRNAs, this model incorporated twelve structural, thermodynamic and positional features of miRNA:mRNA binding sites to select target candidates. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies.

    PubMed

    Geeleher, Paul; Zhang, Zhenyu; Wang, Fan; Gruener, Robert F; Nath, Aritro; Morrison, Gladys; Bhutra, Steven; Grossman, Robert L; Huang, R Stephanie

    2017-10-01

    Obtaining accurate drug response data in large cohorts of cancer patients is very challenging; thus, most cancer pharmacogenomics discovery is conducted in preclinical studies, typically using cell lines and mouse models. However, these platforms suffer from serious limitations, including small sample sizes. Here, we have developed a novel computational method that allows us to impute drug response in very large clinical cancer genomics data sets, such as The Cancer Genome Atlas (TCGA). The approach works by creating statistical models relating gene expression to drug response in large panels of cancer cell lines and applying these models to tumor gene expression data in the clinical data sets (e.g., TCGA). This yields an imputed drug response for every drug in each patient. These imputed drug response data are then associated with somatic genetic variants measured in the clinical cohort, such as copy number changes or mutations in protein coding genes. These analyses recapitulated drug associations for known clinically actionable somatic genetic alterations and identified new predictive biomarkers for existing drugs. © 2017 Geeleher et al.; Published by Cold Spring Harbor Laboratory Press.
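
    The "fit on cell lines, apply to tumors" workflow described above can be sketched as follows. The abstract does not name the model family, so ridge regression is an assumption made purely for illustration; all data are synthetic, and `fit_ridge`, `predict`, and `lam` are hypothetical names.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                      # centre features so the intercept is separate
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    """Apply a fitted linear model to new expression profiles."""
    return np.asarray(X, dtype=float) @ w + b

# Synthetic stand-ins: 50 "cell lines" and 8 "tumors", 5 genes each.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
cell_expr = rng.normal(size=(50, 5))
cell_resp = cell_expr @ true_w + 0.1 * rng.normal(size=50)

w, b = fit_ridge(cell_expr, cell_resp, lam=0.1)   # train on cell lines
tumor_expr = rng.normal(size=(8, 5))
imputed = predict(tumor_expr, w, b)               # impute response in tumors
```

    In a real analysis the two expression matrices would first need to be put on a comparable scale; that step is omitted here.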

  2. DM-BLD: differential methylation detection using a hierarchical Bayesian model exploiting local dependency.

    PubMed

    Wang, Xiao; Gu, Jinghua; Hilakivi-Clarke, Leena; Clarke, Robert; Xuan, Jianhua

    2017-01-15

    The advent of high-throughput DNA methylation profiling techniques has enabled the possibility of accurate identification of differentially methylated genes for cancer research. The large number of measured loci facilitates whole genome methylation study, yet poses great challenges for differential methylation detection due to the high variability in tumor samples. We have developed a novel probabilistic approach, Differential Methylation detection using a hierarchical Bayesian model exploiting Local Dependency (DM-BLD), to detect differentially methylated genes based on a Bayesian framework. The DM-BLD approach features a joint model to capture both the local dependency of measured loci and the dependency of methylation change in samples. Specifically, the local dependency is modeled by Leroux conditional autoregressive structure; the dependency of methylation changes is modeled by a discrete Markov random field. A hierarchical Bayesian model is developed to fully take into account the local dependency for differential analysis, in which differential states are embedded as hidden variables. Simulation studies demonstrate that DM-BLD outperforms existing methods for differential methylation detection, particularly when the methylation change is moderate and the variability of methylation in samples is high. DM-BLD has been applied to breast cancer data to identify important methylated genes (such as polycomb target genes and genes involved in transcription factor activity) associated with breast cancer recurrence. A Matlab package of DM-BLD is available at http://www.cbil.ece.vt.edu/software.htm. Contact: Xuan@vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Origins of magic: review of genetic and epigenetic effects.

    PubMed

    Ramagopalan, Sreeram V; Knight, Marian; Ebers, George C; Knight, Julian C

    2007-12-22

    To assess the evidence for a genetic basis to magic. Literature review. Harry Potter novels of J K Rowling. Muggles, witches, wizards, and squibs. Limited. Family and twin studies, magical ability, and specific magical skills. Magic shows strong evidence of heritability, with familial aggregation and concordance in twins. Evidence suggests magical ability to be a quantitative trait. Specific magical skills, notably being able to speak to snakes, predict the future, and change hair colour, all seem heritable. A multilocus model with a dominant gene for magic might exist, controlled epistatically by one or more loci, possibly recessive in nature. Magical enhancers regulating gene expression may be involved, combined with mutations at specific genes implicated in speech and hair colour such as FOXP2 and MCR1.

  4. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach.

    PubMed

    Peng, Jiajie; Zhang, Xuanshuo; Hui, Weiwei; Lu, Junya; Li, Qianqian; Liu, Shuhui; Shang, Xuequn

    2018-03-19

    Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations. We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to reduce noise. Based on the EC number (Enzyme Commission)-based groups of yeast and Arabidopsis, evaluation tests show that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved after effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose gene annotations in GO are far from complete.
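
    For readers unfamiliar with random walk with restart (RWR), the generic iteration is sketched below. This is a textbook formulation, not the NETSIM2 implementation; the adjacency matrix, the `restart` probability, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that, at each step,
    either follows an edge or jumps back to the seed node.

    adj     : square symmetric adjacency matrix (list of lists or ndarray)
    seed    : index of the start node
    restart : probability of jumping back to the seed at each step
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # avoid division by zero for isolated nodes
    W = adj / col_sums                   # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:   # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p

# Toy co-functional network: a simple path 0 - 1 - 2 - 3.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seed=0)
```

    Because the walker repeatedly returns to the seed, nodes that are topologically close to it receive higher scores, which is how global network structure enters a similarity measure.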

  5. Unbiased Quantitative Models of Protein Translation Derived from Ribosome Profiling Data

    PubMed Central

    Gritsenko, Alexey A.; Hulsman, Marc; Reinders, Marcel J. T.; de Ridder, Dick

    2015-01-01

    Translation of RNA to protein is a core process for any living organism. While for some steps of this process the effect on protein production is understood, a holistic understanding of translation still remains elusive. In silico modelling is a promising approach for elucidating the process of protein synthesis. Although a number of computational models of the process have been proposed, their application is limited by the assumptions they make. Ribosome profiling (RP), a relatively new sequencing-based technique capable of recording snapshots of the locations of actively translating ribosomes, is a promising source of information for deriving unbiased data-driven translation models. However, quantitative analysis of RP data is challenging due to high measurement variance and the inability to discriminate between the number of ribosomes measured on a gene and their speed of translation. We propose a solution in the form of a novel multi-scale interpretation of RP data that allows for deriving models with translation dynamics extracted from the snapshots. We demonstrate the usefulness of this approach by simultaneously determining for the first time per-codon translation elongation and per-gene translation initiation rates of Saccharomyces cerevisiae from RP data for two versions of the Totally Asymmetric Exclusion Process (TASEP) model of translation. We do this in an unbiased fashion, by fitting the models using only RP data with a novel optimization scheme based on Monte Carlo simulation to keep the problem tractable. The fitted models match the data significantly better than existing models and their predictions show better agreement with several independent protein abundance datasets than existing models. Results additionally indicate that the tRNA pool adaptation hypothesis is incomplete, with evidence suggesting that tRNA post-transcriptional modifications and codon context may play a role in determining codon elongation rates. PMID:26275099
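
    To make the TASEP picture concrete, here is a minimal Monte Carlo sketch with random-sequential updates. It is a strong simplification and not the authors' fitting scheme: each ribosome is assumed to occupy a single codon, and `hop_probs` and `init_prob` are hypothetical per-update probabilities standing in for elongation and initiation rates.

```python
import random

def simulate_tasep(hop_probs, init_prob, steps=20_000, seed=0):
    """Simulate ribosomes moving left to right along a lattice of codons.

    hop_probs : hop_probs[i] is the chance that a ribosome at codon i advances
                when selected (for the last codon: terminates and leaves)
    init_prob : chance that a new ribosome binds codon 0 when initiation is
                selected and codon 0 is free
    Returns (per-codon occupancy fraction, completed proteins per update).
    """
    rng = random.Random(seed)
    n = len(hop_probs)
    lattice = [0] * n            # 1 = codon occupied by a ribosome
    occupied_time = [0] * n
    finished = 0
    for _ in range(steps):
        site = rng.randrange(-1, n)          # -1 stands for an initiation attempt
        if site == -1:
            if lattice[0] == 0 and rng.random() < init_prob:
                lattice[0] = 1
        elif lattice[site] == 1 and rng.random() < hop_probs[site]:
            if site == n - 1:                # termination at the last codon
                lattice[site] = 0
                finished += 1
            elif lattice[site + 1] == 0:     # exclusion: move only if next codon is free
                lattice[site], lattice[site + 1] = 0, 1
        for j in range(n):
            occupied_time[j] += lattice[j]
    density = [t / steps for t in occupied_time]
    return density, finished / steps

# Toy transcript of 10 codons with one slow codon in the middle.
hop = [0.9] * 10
hop[5] = 0.1
density, flux = simulate_tasep(hop, init_prob=0.5)
```

    The `density` vector plays the role of a ribosome profiling snapshot averaged over time; fitting would mean adjusting `hop_probs` and `init_prob` until simulated densities match measured ones.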

  6. Unbiased Quantitative Models of Protein Translation Derived from Ribosome Profiling Data.

    PubMed

    Gritsenko, Alexey A; Hulsman, Marc; Reinders, Marcel J T; de Ridder, Dick

    2015-08-01

    Translation of RNA to protein is a core process for any living organism. While for some steps of this process the effect on protein production is understood, a holistic understanding of translation still remains elusive. In silico modelling is a promising approach for elucidating the process of protein synthesis. Although a number of computational models of the process have been proposed, their application is limited by the assumptions they make. Ribosome profiling (RP), a relatively new sequencing-based technique capable of recording snapshots of the locations of actively translating ribosomes, is a promising source of information for deriving unbiased data-driven translation models. However, quantitative analysis of RP data is challenging due to high measurement variance and the inability to discriminate between the number of ribosomes measured on a gene and their speed of translation. We propose a solution in the form of a novel multi-scale interpretation of RP data that allows for deriving models with translation dynamics extracted from the snapshots. We demonstrate the usefulness of this approach by simultaneously determining for the first time per-codon translation elongation and per-gene translation initiation rates of Saccharomyces cerevisiae from RP data for two versions of the Totally Asymmetric Exclusion Process (TASEP) model of translation. We do this in an unbiased fashion, by fitting the models using only RP data with a novel optimization scheme based on Monte Carlo simulation to keep the problem tractable. The fitted models match the data significantly better than existing models and their predictions show better agreement with several independent protein abundance datasets than existing models. Results additionally indicate that the tRNA pool adaptation hypothesis is incomplete, with evidence suggesting that tRNA post-transcriptional modifications and codon context may play a role in determining codon elongation rates.

  7. Inference of quantitative models of bacterial promoters from time-series reporter gene data.

    PubMed

    Stefan, Diana; Pinel, Corinne; Pinhal, Stéphane; Cinquemani, Eugenio; Geiselmann, Johannes; de Jong, Hidde

    2015-01-01

    The inference of regulatory interactions and quantitative models of gene regulation from time-series transcriptomics data has been extensively studied and applied to a range of problems in drug discovery, cancer research, and biotechnology. The application of existing methods is commonly based on implicit assumptions on the biological processes under study. First, the measurements of mRNA abundance obtained in transcriptomics experiments are taken to be representative of protein concentrations. Second, the observed changes in gene expression are assumed to be solely due to transcription factors and other specific regulators, while changes in the activity of the gene expression machinery and other global physiological effects are neglected. While convenient in practice, these assumptions are often not valid and bias the reverse engineering process. Here we systematically investigate, using a combination of models and experiments, the importance of this bias and possible corrections. We measure in real time and in vivo the activity of genes involved in the FliA-FlgM module of the E. coli motility network. From these data, we estimate protein concentrations and global physiological effects by means of kinetic models of gene expression. Our results indicate that correcting for the bias of commonly-made assumptions improves the quality of the models inferred from the data. Moreover, we show by simulation that these improvements are expected to be even stronger for systems in which protein concentrations have longer half-lives and the activity of the gene expression machinery varies more strongly across conditions than in the FliA-FlgM module. The approach proposed in this study is broadly applicable when using time-series transcriptome data to learn about the structure and dynamics of regulatory networks. 
    In the case of the FliA-FlgM module, our results demonstrate the importance of global physiological effects and the active regulation of FliA and FlgM half-lives for the dynamics of FliA-dependent promoters.

  8. Association between polymorphisms of estrogen receptor 2 and benign prostatic hyperplasia

    PubMed Central

    KIM, SU KANG; CHUNG, JOO-HO; PARK, HYUN CHUL; KIM, JUN HO; ANN, JAE HONG; PARK, HUN KUK; LEE, SANG HYUP; YOO, KOO HAN; LEE, BYUNG-CHEOL; KIM, YOUNG OCK

    2015-01-01

    Estrogens and estrogen receptors (ESRs) have been implicated in the stimulation of aberrant prostate growth and the development of prostate diseases. The aim of the present study was to investigate four single nucleotide polymorphisms (SNPs) of the ESR2 gene in order to examine whether ESR2 is a susceptibility gene for benign prostatic hyperplasia (BPH). In order to evaluate whether an association exists between ESR2 and BPH risk, four polymorphisms [rs4986938 (intron), rs17766755 (intron), rs12435857 (intron) and rs1256049 (Val328Val)] of the ESR2 gene were genotyped by direct sequencing. A total of 94 patients with BPH and 79 control subjects were examined. SNPStats and Haploview version 4.2 were used for the genetic analysis. Multiple logistic regression models (codominant1, codominant2, dominant, recessive and log-additive) were produced in order to obtain the odds ratio, 95% confidence interval and P-value. Three SNPs (rs4986938, rs17766755 and rs12435857) showed significant associations with BPH (rs4986938, P=0.015 in log-additive model; rs17766755, P=0.033 in codominant1 model, P=0.019 in dominant model and P=0.020 in log-additive model; rs12435857, P=0.023 in dominant model and P=0.011 in log-additive model). The minor alleles of these SNPs increased the risk of BPH, and the AAC haplotype showed significant association with BPH (χ2=6.34, P=0.0118). These data suggest that the ESR2 gene may be associated with susceptibility to BPH. PMID:26640585
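
    As a reminder of what an odds ratio with a 95% confidence interval expresses, the unadjusted case for a single dominant-model contrast (carriers versus non-carriers) can be computed from a 2x2 table. This sketch uses the standard Wald interval on the log odds ratio rather than the covariate-adjusted logistic regression used in the study, and the counts below are invented for illustration only.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a : cases carrying the minor allele      b : cases not carrying it
    c : controls carrying the minor allele   d : controls not carrying it
    z : normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, NOT taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
```

    An interval that excludes 1 corresponds to a nominally significant association at the 5% level.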

  9. Association between polymorphisms of estrogen receptor 2 and benign prostatic hyperplasia.

    PubMed

    Kim, Su Kang; Chung, Joo-Ho; Park, Hyun Chul; Kim, Jun Ho; Ann, Jae Hong; Park, Hun Kuk; Lee, Sang Hyup; Yoo, Koo Han; Lee, Byung-Cheol; Kim, Young Ock

    2015-11-01

    Estrogens and estrogen receptors (ESRs) have been implicated in the stimulation of aberrant prostate growth and the development of prostate diseases. The aim of the present study was to investigate four single nucleotide polymorphisms (SNPs) of the ESR2 gene in order to examine whether ESR2 is a susceptibility gene for benign prostatic hyperplasia (BPH). In order to evaluate whether an association exists between ESR2 and BPH risk, four polymorphisms [rs4986938 (intron), rs17766755 (intron), rs12435857 (intron) and rs1256049 (Val328Val)] of the ESR2 gene were genotyped by direct sequencing. A total of 94 patients with BPH and 79 control subjects were examined. SNPStats and Haploview version 4.2 were used for the genetic analysis. Multiple logistic regression models (codominant1, codominant2, dominant, recessive and log-additive) were produced in order to obtain the odds ratio, 95% confidence interval and P-value. Three SNPs (rs4986938, rs17766755 and rs12435857) showed significant associations with BPH (rs4986938, P=0.015 in log-additive model; rs17766755, P=0.033 in codominant1 model, P=0.019 in dominant model and P=0.020 in log-additive model; rs12435857, P=0.023 in dominant model and P=0.011 in log-additive model). The minor alleles of these SNPs increased the risk of BPH, and the AAC haplotype showed significant association with BPH (χ2=6.34, P=0.0118). These data suggest that the ESR2 gene may be associated with susceptibility to BPH.

  10. An integrative approach for measuring semantic similarities using gene ontology.

    PubMed

    Peng, Jiajie; Li, Hongxiang; Jiang, Qinghua; Wang, Yadong; Chen, Jin

    2014-01-01

    Gene Ontology (GO) provides rich information and a convenient way to study gene functional similarity, which has been successfully used in various applications. However, the existing GO-based similarity measurements are limited because each measure considers only a subset of GO information. An appropriate integration of the existing measures that takes more of the information in GO into account is needed. We propose a novel integrative measure called InteGO2 to automatically select appropriate seed measures and then to integrate them using a metaheuristic search method. The experimental results show that InteGO2 significantly improves the performance of gene similarity in human, Arabidopsis and yeast on both molecular function and biological process GO categories. InteGO2 computes gene-to-gene similarities more accurately than the existing measures tested and has high robustness. The supplementary document and software are available at http://mlg.hit.edu.cn:8082/.

  11. Iterative local Gaussian clustering for expressed genes identification linked to malignancy of human colorectal carcinoma

    PubMed Central

    Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri

    2007-01-01

    Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC) was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three gene clusters, two large and one small, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis. PMID:18305825

  12. Iterative local Gaussian clustering for expressed genes identification linked to malignancy of human colorectal carcinoma.

    PubMed

    Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri

    2007-12-30

    Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC) was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three gene clusters, two large and one small, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis.

  13. Comparative Life Cycle Transcriptomics Revises Leishmania mexicana Genome Annotation and Links a Chromosome Duplication with Parasitism of Vertebrates

    PubMed Central

    Fiebig, Michael; Kelly, Steven; Gluenz, Eva

    2015-01-01

    Leishmania spp. are protozoan parasites that have two principal life cycle stages: the motile promastigote forms that live in the alimentary tract of the sandfly and the amastigote forms, which are adapted to survive and replicate in the harsh conditions of the phagolysosome of mammalian macrophages. Here, we used Illumina sequencing of poly-A selected RNA to characterise and compare the transcriptomes of L. mexicana promastigotes, axenic amastigotes and intracellular amastigotes. These data allowed the production of the first transcriptome evidence-based annotation of gene models for this species, including genome-wide mapping of trans-splice sites and poly-A addition sites. The revised genome annotation encompassed 9,169 protein-coding genes including 936 novel genes as well as modifications to previously existing gene models. Comparative analysis of gene expression across promastigote and amastigote forms revealed that 3,832 genes are differentially expressed between promastigotes and intracellular amastigotes. A large proportion of genes that were downregulated during differentiation to amastigotes were associated with the function of the motile flagellum. In contrast, those genes that were upregulated included cell surface proteins, transporters, peptidases and many uncharacterized genes, including 293 of the 936 novel genes. Genome-wide distribution analysis of the differentially expressed genes revealed that the tetraploid chromosome 30 is highly enriched for genes that were upregulated in amastigotes, providing the first evidence of a link between this whole chromosome duplication event and adaptation to the vertebrate host in this group. Peptide evidence for 42 proteins encoded by novel transcripts supports the idea of an as yet uncharacterised set of small proteins in Leishmania spp. with possible implications for host-pathogen interactions. PMID:26452044

  14. GeneNetFinder2: Improved Inference of Dynamic Gene Regulatory Relations with Multiple Regulators.

    PubMed

    Han, Kyungsook; Lee, Jeonghoon

    2016-01-01

    A gene involved in complex regulatory interactions may have multiple regulators since gene expression in such interactions is often controlled by more than one gene. Another thing that makes gene regulatory interactions complicated is that regulatory interactions are not static, but change over time during the cell cycle. Most research so far has focused on identifying gene regulatory relations between individual genes in a particular stage of the cell cycle. In this study we developed a method for identifying dynamic gene regulations of several types from the time-series gene expression data. The method can find gene regulations with multiple regulators that work in combination or individually as well as those with single regulators. The method has been implemented as the second version of GeneNetFinder (hereafter called GeneNetFinder2) and tested on several gene expression datasets. Experimental results with gene expression data revealed the existence of genes that are not regulated by individual genes but rather by a combination of several genes. Such gene regulatory relations cannot be found by conventional methods. Our method finds such regulatory relations as well as those with multiple, independent regulators or single regulators, and represents gene regulatory relations as a dynamic network in which different gene regulatory relations are shown in different stages of the cell cycle. GeneNetFinder2 is available at http://bclab.inha.ac.kr/GeneNetFinder and will be useful for modeling dynamic gene regulations with multiple regulators.

  15. An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    PubMed Central

    Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055

  16. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.

    PubMed

    Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.

  17. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D.

    PubMed

    Matsuzaki, Motomichi; Misumi, Osami; Shin-I, Tadasu; Maruyama, Shinichiro; Takahara, Manabu; Miyagishima, Shin-Ya; Mori, Toshiyuki; Nishida, Keiji; Yagisawa, Fumi; Nishida, Keishin; Yoshida, Yamato; Nishimura, Yoshiki; Nakao, Shunsuke; Kobayashi, Tamaki; Momoyama, Yu; Higashiyama, Tetsuya; Minoda, Ayumi; Sano, Masako; Nomoto, Hisayo; Oishi, Kazuko; Hayashi, Hiroko; Ohta, Fumiko; Nishizaka, Satoko; Haga, Shinobu; Miura, Sachiko; Morishita, Tomomi; Kabeya, Yukihiro; Terasawa, Kimihiro; Suzuki, Yutaka; Ishii, Yasuyuki; Asakawa, Shuichi; Takano, Hiroyoshi; Ohta, Niji; Kuroiwa, Haruko; Tanaka, Kan; Shimizu, Nobuyoshi; Sugano, Sumio; Sato, Naoki; Nozaki, Hisayoshi; Ogasawara, Naotake; Kohara, Yuji; Kuroiwa, Tsuneyoshi

    2004-04-08

    Small, compact genomes of ultrasmall unicellular algae provide information on the basic and essential genes that support the lives of photosynthetic eukaryotes, including higher plants. Here we report the 16,520,305-base-pair sequence of the 20 chromosomes of the unicellular red alga Cyanidioschyzon merolae 10D as the first complete algal genome. We identified 5,331 genes in total, of which at least 86.3% were expressed. Unique characteristics of this genomic structure include: a lack of introns in all but 26 genes; only three copies of ribosomal DNA units that maintain the nucleolus; and two dynamin genes that are involved only in the division of mitochondria and plastids. The conserved mosaic origin of Calvin cycle enzymes in this red alga and in green plants supports the hypothesis of the existence of single primary plastid endosymbiosis. The lack of a myosin gene, in addition to the unexpressed actin gene, suggests a simpler system of cytokinesis. These results indicate that the C. merolae genome provides a model system with a simple gene composition for studying the origin, evolution and fundamental mechanisms of eukaryotic cells.

  18. Statistical algorithms improve accuracy of gene fusion detection

    PubMed Central

    Hsieh, Gillian; Bierman, Rob; Szabo, Linda; Lee, Alex Gia; Freeman, Donald E.; Watson, Nathaniel; Sweet-Cordero, E. Alejandro

    2017-01-01

    Gene fusions are known to play critical roles in tumor pathogenesis. Yet, sensitive and specific algorithms to detect gene fusions in cancer do not currently exist. In this paper, we present a new statistical algorithm, MACHETE (Mismatched Alignment CHimEra Tracking Engine), which achieves highly sensitive and specific detection of gene fusions from RNA-Seq data, including the highest Positive Predictive Value (PPV) compared to the current state-of-the-art, as assessed in simulated data. We show that the best performing published algorithms either find large numbers of fusions in negative control data or suffer from low sensitivity detecting known driving fusions in gold standard settings, such as EWSR1-FLI1. As proof of principle that MACHETE discovers novel gene fusions with high accuracy in vivo, we mined public data to discover and subsequently PCR validate novel gene fusions missed by other algorithms in the ovarian cancer cell line OVCAR3. These results highlight the gains in accuracy achieved by introducing statistical models into fusion detection, and pave the way for unbiased discovery of potentially driving and druggable gene fusions in primary tumors. PMID:28541529
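
    For readers unfamiliar with the metric, the Positive Predictive Value mentioned above is the fraction of reported fusions that are real, PPV = TP / (TP + FP). The short sketch below computes it; the function name and the example counts are hypothetical and are not results from the paper.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of positive calls that are correct: TP / (TP + FP)."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("PPV is undefined when no positive calls were made")
    return true_positives / called


# Made-up counts: 9 real fusions among 10 reported calls.
print(positive_predictive_value(9, 1))
```

    A detector that reports many fusions in negative control data has a large FP term, which is why such behaviour lowers PPV even when sensitivity is high.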

  19. An Adaptive Genetic Association Test Using Double Kernel Machines.

    PubMed

    Zhan, Xiang; Epstein, Michael P; Ghosh, Debashis

    2015-10-01

    Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study.

  20. Tempo and Mode of Gene Duplication in Mammalian Ribosomal Protein Evolution

    PubMed Central

    Gajdosik, Matthew D.; Simon, Amanda; Nelson, Craig E.

    2014-01-01

    Gene duplication has been widely recognized as a major driver of evolutionary change and organismal complexity through the generation of multi-gene families. Therefore, understanding the forces that govern the evolution of gene families through the retention or loss of duplicated genes is fundamentally important in our efforts to study genome evolution. Previous work from our lab has shown that ribosomal protein (RP) genes constitute one of the largest classes of conserved duplicated genes in mammals. This result was surprising due to the fact that ribosomal protein genes evolve slowly and transcript levels are very tightly regulated. In our present study, we identified and characterized all RP duplicates in eight mammalian genomes in order to investigate the tempo and mode of ribosomal protein family evolution. We show that a sizable number of duplicates are transcriptionally active and are very highly conserved. Furthermore, we conclude that existing gene duplication models do not readily account for the preservation of a very large number of intact retroduplicated ribosomal protein (RT-RP) genes observed in mammalian genomes. We suggest that selection against dominant-negative mutations may underlie the unexpected retention and conservation of duplicated RP genes, and may shape the fate of newly duplicated genes, regardless of duplication mechanism. PMID:25369106

  1. Excess congenital non-synonymous variation in leukemia-associated genes in MLL− infant leukemia: a Children's Oncology Group report

    PubMed Central

    Valentine, M C; Linabery, A M; Chasnoff, S; Hughes, A E O; Mallaney, C; Sanchez, N; Giacalone, J; Heerema, N A; Hilden, J M; Spector, L G; Ross, J A; Druley, T E

    2014-01-01

    Infant leukemia (IL) is a rare sporadic cancer with a grim prognosis. Although most cases are accompanied by MLL rearrangements and harbor very few somatic mutations, less is known about the genetics of the cases without MLL translocations. We performed the largest exome-sequencing study to date on matched non-cancer DNA from pairs of mothers and IL patients to characterize congenital variation that may contribute to early leukemogenesis. Using the COSMIC database to define acute leukemia-associated candidate genes, we find a significant enrichment of rare, potentially functional congenital variation in IL patients compared with randomly selected genes within the same patients and unaffected pediatric controls. IL acute myeloid leukemia (AML) patients had more overall variation than IL acute lymphocytic leukemia (ALL) patients, but less of that variation was inherited from mothers. Of our candidate genes, we found that MLL3 was a compound heterozygote in every infant who developed AML and 50% of infants who developed ALL. These data suggest a model by which known genetic mechanisms for leukemogenesis could be disrupted without an abundance of somatic mutation or chromosomal rearrangements. This model would be consistent with existing models for the establishment of leukemia clones in utero and the high rate of IL concordance in monozygotic twins. PMID:24301523

  2. Origin and Functional Prediction of Pollen Allergens in Plants

    PubMed Central

    Chen, Miaolin; Xu, Jie; Ren, Kang; Searle, Iain

    2016-01-01

    Pollen allergies have long been a major pandemic health problem for humans. However, the evolutionary events and biological function of pollen allergens in plants remain largely unknown. Here, we report the genome-wide prediction of pollen allergens and their biological function in the dicotyledonous model plant Arabidopsis (Arabidopsis thaliana) and the monocotyledonous model plant rice (Oryza sativa). In total, 145 and 107 pollen allergens were predicted from rice and Arabidopsis, respectively. These pollen allergens are putatively involved in stress responses and metabolic processes such as cell wall metabolism during pollen development. Interestingly, these putative pollen allergen genes were derived from large gene families and became diversified during evolution. Sequence analysis across 25 plant species from green alga to angiosperms suggests that about 40% of putative pollen allergenic proteins existed in both lower and higher plants, while other allergens emerged during evolution. Although a high proportion of gene duplication has been observed among allergen-coding genes, our data show that these genes might have undergone purifying selection during evolution. We also observed that epitopes of an allergen might have a biological function, as revealed by comprehensive analysis of two known allergens, expansin and profilin. This implies a crucial role of conserved amino acid residues in both in planta biological function and allergenicity. Finally, a model explaining how pollen allergens were generated and maintained in plants is proposed. Prediction and systematic analysis of pollen allergens in model plants suggest that pollen allergens evolved through gene duplication followed by functional specification. This study provides insight into the phylogenetic and evolutionary scenario of pollen allergens that will be helpful to future characterization and epitope screening of pollen allergens. PMID:27436829

  3. Origin and Functional Prediction of Pollen Allergens in Plants.

    PubMed

    Chen, Miaolin; Xu, Jie; Devis, Deborah; Shi, Jianxin; Ren, Kang; Searle, Iain; Zhang, Dabing

    2016-09-01

    Pollen allergies have long been a major pandemic health problem for humans. However, the evolutionary events and biological function of pollen allergens in plants remain largely unknown. Here, we report the genome-wide prediction of pollen allergens and their biological function in the dicotyledonous model plant Arabidopsis (Arabidopsis thaliana) and the monocotyledonous model plant rice (Oryza sativa). In total, 145 and 107 pollen allergens were predicted from rice and Arabidopsis, respectively. These pollen allergens are putatively involved in stress responses and metabolic processes such as cell wall metabolism during pollen development. Interestingly, these putative pollen allergen genes were derived from large gene families and became diversified during evolution. Sequence analysis across 25 plant species from green alga to angiosperms suggests that about 40% of putative pollen allergenic proteins existed in both lower and higher plants, while other allergens emerged during evolution. Although a high proportion of gene duplication has been observed among allergen-coding genes, our data show that these genes might have undergone purifying selection during evolution. We also observed that epitopes of an allergen might have a biological function, as revealed by comprehensive analysis of two known allergens, expansin and profilin. This implies a crucial role of conserved amino acid residues in both in planta biological function and allergenicity. Finally, a model explaining how pollen allergens were generated and maintained in plants is proposed. Prediction and systematic analysis of pollen allergens in model plants suggest that pollen allergens evolved through gene duplication followed by functional specification. This study provides insight into the phylogenetic and evolutionary scenario of pollen allergens that will be helpful to future characterization and epitope screening of pollen allergens. © 2016 American Society of Plant Biologists. All rights reserved.

  4. Does Marriage Moderate Genetic Effects on Delinquency and Violence?

    PubMed Central

    Li, Yi; Liu, Hexuan; Guo, Guang

    2015-01-01

    Using data from the National Longitudinal Study of Adolescent to Adult Health (N = 1,254), the authors investigated whether marriage can foster desistance from delinquency and violence by moderating genetic effects. In contrast to existing gene–environment research that typically focuses on one or a few genetic polymorphisms, they extended a recently developed mixed linear model to consider the collective influence of 580 single nucleotide polymorphisms in 64 genes related to aggression and risky behavior. The mixed linear model estimates the proportion of variance in the phenotype that is explained by the single nucleotide polymorphisms. The authors found that the proportion of variance in delinquency/violence explained was smaller among married individuals than unmarried individuals. Because selection, confounding, and heterogeneity may bias the estimate of the Gene × Marriage interaction, they conducted a series of analyses to address these issues. The findings suggest that the Gene × Marriage interaction results were not seriously affected by these issues. PMID:26549892

  5. The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

    PubMed Central

    Yu, Yun; Degnan, James H.; Nakhleh, Luay

    2012-01-01

    Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa. PMID:22536161
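
    As background for the abstract above, the simplest case of a gene tree probability under the multispecies coalescent is a three-taxon species tree ((A,B),C) with no hybridization. If the internal branch has length t in coalescent units, the gene tree matching the species tree has probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). The sketch below evaluates this textbook special case only; it is not the network method presented in the paper, and the function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(internal_branch_length: float) -> dict:
    """Gene tree topology probabilities for species tree ((A,B),C).

    internal_branch_length is in coalescent units; no hybridization is modelled.
    """
    if internal_branch_length < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the internal
    # branch, split equally among the three possible topologies.
    discordant = math.exp(-internal_branch_length) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # topology matching the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


for t in (0.0, 1.0, 5.0):
    print(t, three_taxon_gene_tree_probs(t))
```

    With t = 0 all three topologies are equally likely, and as t grows the concordant topology dominates. On a tree the two discordant topologies are always equally probable, so an excess of one of them over the other is the kind of signal that hybridization-aware methods try to explain.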

  6. Transcriptional interference networks coordinate the expression of functionally related genes clustered in the same genomic loci

    PubMed Central

    Boldogköi, Zsolt

    2012-01-01

    The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too. PMID:22783276

  7. Transcriptional interference networks coordinate the expression of functionally related genes clustered in the same genomic loci.

    PubMed

    Boldogköi, Zsolt

    2012-01-01

    The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too.

  8. Comparative and Evolutionary Analysis of the HES/HEY Gene Family Reveal Exon/Intron Loss and Teleost Specific Duplication Events

    PubMed Central

    Ma, Zhaowu; Zhou, Yang; Abbood, Nibras Najm; Liu, Jianfeng; Su, Li; Jia, Haibo; Guo, An-Yuan

    2012-01-01

    Background: HES/HEY genes encode a family of basic helix-loop-helix (bHLH) transcription factors with both bHLH and Orange domain. HES/HEY proteins are direct targets of the Notch signaling pathway and play an essential role in developmental decisions, such as the developments of nervous system, somitogenesis, blood vessel and heart. Despite their important functions, the origin and evolution of this HES/HEY gene family has yet to be elucidated. Methods and Findings: In this study, we identified genes of the HES/HEY family in representative species and performed evolutionary analysis to elucidate their origin and evolutionary process. Our results showed that the HES/HEY genes only existed in metazoans and may originate from the common ancestor of metazoans. We identified HES/HEY genes in more than 10 species representing the main lineages. Combining the bHLH and Orange domain sequences, we constructed the phylogenetic trees by different methods (Bayesian, ML, NJ and ME) and classified the HES/HEY gene family into four groups. Our results indicated that this gene family had undergone three expansions, which were along with the origins of Eumetazoa, vertebrate, and teleost. Gene structure analysis revealed that the HES/HEY genes were involved in exon and/or intron loss in different species lineages. Genes of this family were duplicated in bony fishes and doubled than other vertebrates. Furthermore, we studied the teleost-specific duplications in zebrafish and investigated the expression pattern of duplicated genes in different tissues by RT-PCR. Finally, we proposed a model to show the evolution of this gene family with processes of expansion, exon/intron loss, and motif loss. Conclusions: Our study revealed the evolution of HES/HEY gene family, the expression and function divergence of duplicated genes, which also provide clues for the research of Notch function in development. This study shows a model of gene family analysis with gene structure evolution and duplication. PMID:22808219

  9. Comparative and evolutionary analysis of the HES/HEY gene family reveal exon/intron loss and teleost specific duplication events.

    PubMed

    Zhou, Mi; Yan, Jun; Ma, Zhaowu; Zhou, Yang; Abbood, Nibras Najm; Liu, Jianfeng; Su, Li; Jia, Haibo; Guo, An-Yuan

    2012-01-01

    HES/HEY genes encode a family of basic helix-loop-helix (bHLH) transcription factors with both bHLH and Orange domain. HES/HEY proteins are direct targets of the Notch signaling pathway and play an essential role in developmental decisions, such as the developments of nervous system, somitogenesis, blood vessel and heart. Despite their important functions, the origin and evolution of this HES/HEY gene family has yet to be elucidated. In this study, we identified genes of the HES/HEY family in representative species and performed evolutionary analysis to elucidate their origin and evolutionary process. Our results showed that the HES/HEY genes only existed in metazoans and may originate from the common ancestor of metazoans. We identified HES/HEY genes in more than 10 species representing the main lineages. Combining the bHLH and Orange domain sequences, we constructed the phylogenetic trees by different methods (Bayesian, ML, NJ and ME) and classified the HES/HEY gene family into four groups. Our results indicated that this gene family had undergone three expansions, which were along with the origins of Eumetazoa, vertebrate, and teleost. Gene structure analysis revealed that the HES/HEY genes were involved in exon and/or intron loss in different species lineages. Genes of this family were duplicated in bony fishes and doubled than other vertebrates. Furthermore, we studied the teleost-specific duplications in zebrafish and investigated the expression pattern of duplicated genes in different tissues by RT-PCR. Finally, we proposed a model to show the evolution of this gene family with processes of expansion, exon/intron loss, and motif loss. Our study revealed the evolution of HES/HEY gene family, the expression and function divergence of duplicated genes, which also provide clues for the research of Notch function in development. This study shows a model of gene family analysis with gene structure evolution and duplication.

  10. Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information.

    PubMed

    Fan, Yue; Wang, Xiao; Peng, Qinke

    2017-01-01

    Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.

  11. Protists and the Wild, Wild West of Gene Expression: New Frontiers, Lawlessness, and Misfits.

    PubMed

    Smith, David Roy; Keeling, Patrick J

    2016-09-08

    The DNA double helix has been called one of life's most elegant structures, largely because of its universality, simplicity, and symmetry. The expression of information encoded within DNA, however, can be far from simple or symmetric and is sometimes surprisingly variable, convoluted, and wantonly inefficient. Although exceptions to the rules exist in certain model systems, the true extent to which life has stretched the limits of gene expression is made clear by nonmodel systems, particularly protists (microbial eukaryotes). The nuclear and organelle genomes of protists are subject to the most tangled forms of gene expression yet identified. The complicated and extravagant picture of the underlying genetics of eukaryotic microbial life changes how we think about the flow of genetic information and the evolutionary processes shaping it. Here, we discuss the origins, diversity, and growing interest in noncanonical protist gene expression and its relationship to genomic architecture.

  12. A splice site mutation in a gene encoding for PDK4, a mitochondrial protein, is associated with the development of dilated cardiomyopathy in the Doberman pinscher

    USDA-ARS's Scientific Manuscript database

    Familial dilated cardiomyopathy is a primary myocardial disease that can result in the development of congestive heart failure and sudden cardiac death. Spontaneous animal models of familial dilated cardiomyopathy exist and the Doberman pinscher dog is one of the most commonly reported canine breeds...

  13. Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.

    PubMed

    Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K

    2011-01-01

    Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.

  14. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    PubMed Central

    de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710

  15. MAGMA: generalized gene-set analysis of GWAS data.

    PubMed

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.

  16. Contrasting Features of Urea Cycle Disorders in Human Patients and Knockout Mouse Models

    PubMed Central

    Deignan, Joshua L.; Cederbaum, Stephen D.; Grody, Wayne W.

    2009-01-01

    The urea cycle exists for the removal of excess nitrogen from the body. Six separate enzymes comprise the urea cycle, and a deficiency in any one of them causes a urea cycle disorder (UCD) in humans. Arginase is the only urea cycle enzyme with an alternate isoform, though no known human disorder currently exists due to a deficiency in the second isoform. While all of the UCDs usually present with hyperammonemia in the first few days to months of life, most disorders are distinguished by a characteristic profile of plasma amino acid alterations that can be utilized for diagnosis. While enzyme assay is possible, an analysis of the underlying mutation is preferable for an accurate diagnosis. Mouse models for each of the urea cycle disorders exist (with the exception of NAGS deficiency), and for almost all of them, their clinical and biochemical phenotypes rather closely resemble the phenotypes seen in human patients. Consequently, all of the current mouse models are highly useful for future research into novel pharmacological and dietary treatments and gene therapy protocols for the management of urea cycle disorders. PMID:17933574

  17. Contrasting features of urea cycle disorders in human patients and knockout mouse models.

    PubMed

    Deignan, Joshua L; Cederbaum, Stephen D; Grody, Wayne W

    2008-01-01

    The urea cycle exists for the removal of excess nitrogen from the body. Six separate enzymes comprise the urea cycle, and a deficiency in any one of them causes a urea cycle disorder (UCD) in humans. Arginase is the only urea cycle enzyme with an alternate isoform, though no known human disorder currently exists due to a deficiency in the second isoform. While all of the UCDs usually present with hyperammonemia in the first few days to months of life, most disorders are distinguished by a characteristic profile of plasma amino acid alterations that can be utilized for diagnosis. While enzyme assay is possible, an analysis of the underlying mutation is preferable for an accurate diagnosis. Mouse models for each of the urea cycle disorders exist (with the exception of NAGS deficiency), and for almost all of them, their clinical and biochemical phenotypes rather closely resemble the phenotypes seen in human patients. Consequently, all of the current mouse models are highly useful for future research into novel pharmacological and dietary treatments and gene therapy protocols for the management of urea cycle disorders.

  18. Insights into social insects from the genome of the honeybee Apis mellifera

    PubMed Central

    2007-01-01

    Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and insights into whether Africanized bees spread throughout the New World via hybridization or displacement. PMID:17073008

  19. Epigenetic modulators, modifiers and mediators in cancer aetiology and progression

    PubMed Central

    Feinberg, Andrew P.; Koldobskiy, Michael A.; Göndör, Anita

    2016-01-01

    This year is the tenth anniversary of the publication in this journal of a model suggesting the existence of ‘tumour progenitor genes’. These genes are epigenetically disrupted at the earliest stages of malignancies, even before mutations, and thus cause altered differentiation throughout tumour evolution. The past decade of discovery in cancer epigenetics has revealed a number of similarities between cancer genes and stem cell reprogramming genes, widespread mutations in epigenetic regulators, and the part played by chromatin structure in cellular plasticity in both development and cancer. In the light of these discoveries, we suggest here a framework for cancer epigenetics involving three types of genes: ‘epigenetic mediators’, corresponding to the tumour progenitor genes suggested earlier; ‘epigenetic modifiers’ of the mediators, which are frequently mutated in cancer; and ‘epigenetic modulators’ upstream of the modifiers, which are responsive to changes in the cellular environment and often linked to the nuclear architecture. We suggest that this classification is helpful in framing new diagnostic and therapeutic approaches to cancer. PMID:26972587

  20. An integrative approach to ortholog prediction for disease-focused and other functional studies.

    PubMed

    Hu, Yanhui; Flockhart, Ian; Vinayagam, Arunachalam; Bergwitz, Clemens; Berger, Bonnie; Perrimon, Norbert; Mohr, Stephanie E

    2011-08-31

    Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
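
    The idea of integrating several prediction tools and then either casting a wide net or keeping only the highest-confidence pairs can be illustrated with a simple vote count. This is a minimal sketch under stated assumptions: the abstract does not describe how DIOPT actually scores predictions, and the tool names, gene identifiers and the functions `count_tool_support` and `filter_by_support` below are hypothetical.

```python
from collections import Counter


def count_tool_support(predictions):
    """Count how many tools predict each ortholog pair.

    predictions maps a tool name to an iterable of (gene_a, gene_b) pairs.
    """
    support = Counter()
    for pairs in predictions.values():
        # set() ensures each tool votes at most once per pair
        support.update(set(pairs))
    return support


def filter_by_support(support, min_tools):
    """Keep pairs predicted by at least min_tools tools, in sorted order."""
    return sorted(pair for pair, votes in support.items() if votes >= min_tools)


# Hypothetical toy input: three tools, made-up gene identifiers.
toy_predictions = {
    "tool_a": [("HUMAN_G1", "FLY_g1"), ("HUMAN_G2", "FLY_g2")],
    "tool_b": [("HUMAN_G1", "FLY_g1"), ("HUMAN_G2", "FLY_g3")],
    "tool_c": [("HUMAN_G1", "FLY_g1"), ("HUMAN_G2", "FLY_g2")],
}
support = count_tool_support(toy_predictions)
wide_net = filter_by_support(support, min_tools=1)         # every predicted pair
high_confidence = filter_by_support(support, min_tools=3)  # unanimous pairs only
```

    Lowering `min_tools` raises sensitivity at the cost of specificity, which mirrors the trade-off the abstract reports for the integrated tool.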

  1. Penalized differential pathway analysis of integrative oncogenomics studies.

    PubMed

    van Wieringen, Wessel N; van de Wiel, Mark A

    2014-04-01

    Through integration of genomic data from multiple sources, we may obtain a more accurate and complete picture of the molecular mechanisms underlying tumorigenesis. We discuss the integration of DNA copy number and mRNA gene expression data from an observational integrative genomics study involving cancer patients. The two molecular levels involved are linked through the central dogma of molecular biology. DNA copy number aberrations abound in the cancer cell. Here we investigate how these aberrations affect gene expression levels within a pathway using observational integrative genomics data of cancer patients. In particular, we aim to identify differential edges between regulatory networks of two groups involving these molecular levels. Motivated by the rate equations, the regulatory mechanism between DNA copy number aberrations and gene expression levels within a pathway is modeled by a simultaneous-equations model, for the one- and two-group case. The latter facilitates the identification of differential interactions between the two groups. Model parameters are estimated by penalized least squares using the lasso (L1) penalty to obtain a sparse pathway topology. Simulations show that the inclusion of DNA copy number data benefits the discovery of gene-gene interactions. In addition, the simulations reveal that cis-effects tend to be over-estimated in a univariate (single gene) analysis. In the application to real data from integrative oncogenomic studies we show that inclusion of prior information on the regulatory network architecture benefits the reproducibility of all edges. Furthermore, analyses of the TP53 and TGFb signaling pathways between ER+ and ER- samples from an integrative genomics breast cancer study identify reproducible differential regulatory patterns that corroborate with existing literature.
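
    The sparsity that the lasso (L1) penalty produces can be seen in a generic single-equation example. The sketch below is not the authors' simultaneous-equations estimator; it is an ordinary lasso regression solved by coordinate descent, assuming the objective (1/2)·||y − Xβ||² + λ·||β||₁. The function names and the toy data are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data with two orthogonal predictors: a strong effect and a weak one.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2.0, 0.1, 2.0, 0.1]
unpenalised = lasso_coordinate_descent(X, y, lam=0.0)
penalised = lasso_coordinate_descent(X, y, lam=1.0)
```

    With `lam=1.0` the weak coefficient is set exactly to zero while the strong one is only shrunk, which is the mechanism by which an L1 penalty yields a sparse set of edges.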

  2. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
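
    The indexed sparsity reduction step, which replaces a sparse mutation vector with the indexes of its non-zero elements, can be sketched as follows. The abstract gives only that one-line description, so the 1-based indexing, the fixed output width, the zero padding and the function name `indexed_sparsity_reduction` are assumptions made for illustration.

```python
def indexed_sparsity_reduction(mutation_vector, width, pad=0):
    """Return a fixed-width list of 1-based indexes of the non-zero entries.

    Assumptions (not from the abstract): indexes are 1-based so that the
    padding value 0 is unambiguous, and longer index lists are truncated.
    """
    indexes = [i + 1 for i, value in enumerate(mutation_vector) if value != 0]
    indexes = indexes[:width]
    return indexes + [pad] * (width - len(indexes))


# Hypothetical sample: 1 marks a mutated gene, 0 an unmutated one.
sample = [0, 1, 0, 0, 1, 1, 0]
dense = indexed_sparsity_reduction(sample, width=4)
```

    Here a seven-element vector that is mostly zeros becomes the four-element list `[2, 5, 6, 0]`, so a downstream classifier sees a short dense input instead of a long sparse one.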

  3. Gene Set-Based Integrative Analysis Revealing Two Distinct Functional Regulation Patterns in Four Common Subtypes of Epithelial Ovarian Cancer

    PubMed Central

    Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Yi-Ping; Chuang, Jen-Hua; Yang, Ming-Jie; Yen, Ming-Shyen; Chiou, Shih-Hwa; Chang, Cheng-Chang

    2016-01-01

    Clear cell (CCC), endometrioid (EC), mucinous (MC) and high-grade serous carcinoma (SC) are the four most common subtypes of epithelial ovarian carcinoma (EOC). The widely accepted dualistic model of ovarian carcinogenesis divided EOCs into type I and II categories based on the molecular features. However, this hypothesis has not been experimentally demonstrated. We carried out a gene set-based analysis by integrating the microarray gene expression profiles downloaded from the publicly available databases. These quantified biological functions of EOCs were defined by 1454 Gene Ontology (GO) term and 674 Reactome pathway gene sets. The pathogenesis of the four EOC subtypes was investigated by hierarchical clustering and exploratory factor analysis. The patterns of functional regulation among the four subtypes containing 1316 cases could be accurately classified by machine learning. The results revealed that the ERBB and PI3K-related pathways played important roles in the carcinogenesis of CCC, EC and MC; while deregulation of cell cycle was more predominant in SC. The study revealed that two different functional regulation patterns exist among the four EOC subtypes, which were compatible with the type I and II classifications proposed by the dualistic model of ovarian carcinogenesis. PMID:27527159

  4. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets

    PubMed Central

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas

    2018-01-01

    ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. PMID:29149270

  5. Spatial gradients of protein-level time delays set the pace of the traveling segmentation clock waves

    PubMed Central

    Ay, Ahmet; Holland, Jack; Sperlea, Adriana; Devakanmalai, Gnanapackiam Sheela; Knierer, Stephan; Sangervasi, Sebastian; Stevenson, Angel; Özbudak, Ertuğrul M.

    2014-01-01

    The vertebrate segmentation clock is a gene expression oscillator controlling rhythmic segmentation of the vertebral column during embryonic development. The period of oscillations becomes longer as cells are displaced along the posterior to anterior axis, which results in traveling waves of clock gene expression sweeping in the unsegmented tissue. Although various hypotheses necessitating the inclusion of additional regulatory genes into the core clock network at different spatial locations have been proposed, the mechanism underlying traveling waves has remained elusive. Here, we combined molecular-level computational modeling and quantitative experimentation to solve this puzzle. Our model predicts the existence of an increasing gradient of gene expression time delays along the posterior to anterior direction to recapitulate spatiotemporal profiles of the traveling segmentation clock waves in different genetic backgrounds in zebrafish. We validated this prediction by measuring an increased time delay of oscillatory Her1 protein production along the unsegmented tissue. Our results refuted the need for spatial expansion of the core feedback loop to explain the occurrence of traveling waves. Spatial regulation of gene expression time delays is a novel way of creating dynamic patterns; this is the first report demonstrating such a control mechanism in any tissue and future investigations will explore the presence of analogous examples in other biological systems. PMID:25336742

  6. Transcriptomic Analysis and the Expression of Disease-Resistant Genes in Oryza meyeriana under Native Condition

    PubMed Central

    He, Bin; Tao, Xiang; Gu, Yinghong; Wei, Changhe; Cheng, Xiaojie; Xiao, Suqin; Cheng, Zaiquan; Zhang, Yizheng

    2015-01-01

    Oryza meyeriana (O. meyeriana), with a GG genome type (2n = 24), accumulated plentiful excellent characteristics with respect to resistance to many diseases such as rice shade and blast, even immunity to bacterial blight. It is very important to know if the diseases-resistant genes exist and express in this wild rice under native conditions. However, limited genomic or transcriptomic data of O. meyeriana are currently available. In this study, we present the first comprehensive characterization of the O. meyeriana transcriptome using RNA-seq and obtained 185,323 contigs with an average length of 1,692 bp and an N50 of 2,391 bp. Through differential expression analysis, it was found that there were most tissue-specifically expressed genes in roots, and next to stems and leaves. By similarity search against protein databases, 146,450 had at least a significant alignment to existed gene models. Comparison with the Oryza sativa (japonica-type Nipponbare and indica-type 93–11) genomes revealed that 13% of the O. meyeriana contigs had not been detected in O. sativa. Many diseases-resistant genes, such as bacterial blight resistant, blast resistant, rust resistant, fusarium resistant, cyst nematode resistant and downy mildew gene, were mined from the transcriptomic database. There are two kinds of rice bacterial blight-resistant genes (Xa1 and Xa26) differentially or specifically expressed in O. meyeriana. The 4 Xa1 contigs were all only expressed in root, while three of Xa26 contigs have the highest expression level in leaves, two of Xa26 contigs have the highest expression profile in stems and one of Xa26 contigs was expressed dominantly in roots. The transcriptomic database of O. meyeriana has been constructed and many diseases-resistant genes were found to express under native condition, which provides a foundation for future discovery of a number of novel genes and provides a basis for studying the molecular mechanisms associated with disease resistance in O. meyeriana. PMID:26640944
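
    The assembly statistics quoted above (average contig length and N50) follow standard definitions, and a short sketch may help readers compare them. N50 is the contig length L such that contigs of length at least L together contain at least half of all assembled bases. The function name `n50` and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (hypothetical, not from the study).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
toy_mean = sum(toy_lengths) / len(toy_lengths)
```

    For the toy lengths the N50 (400 bp) exceeds the mean (300 bp). The same ordering appears in the reported values (N50 of 2,391 bp against an average of 1,692 bp), since N50 weights long contigs more heavily than the mean does.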

  7. Comparison and evaluation of gene therapy and epigenetic approaches for wound healing.

    PubMed

    Cutroneo, K R; Chiu, J F

    2000-01-01

    During the past decade considerable evidence has mounted concerning the importance of growth factors in the wound healing process both for cell replication and for stimulating reparative cells to synthesize and secrete extracellular matrix components. During normal wound healing the growth factor concentration has to be maintained at a certain level. If the growth factor concentration is too low, normal healing fails to occur. Whereas if the growth factor concentration is too high due to either over-expression of the growth factor or too much growth factor being applied to the wound, aberrant wound healing will occur. One approach for controlling the amount of growth factor at the wound site during normal healing is through gene therapy and the titration of gene dosage. However if a narrow window exists between the beneficial therapeutic effect and toxic effects with increasing gene dosage, an agent may be necessary to give in combination with gene therapy to regulate the over-expression of growth factor. In addition to genetic approaches to regulate wound healing, epigenetic approaches also exist. Antisense oligodeoxynucleotides have been shown to regulate wound repair in certain model systems and to determine the protein(s) necessary for normal wound healing. A novel approach to regulate the activity of collagen genes, thereby affecting fibrosis, is to use a sense oligodeoxynucleotide having the same sequence of the cis element which regulates the promoter activity of a particular collagen gene. This exogenous oligodeoxynucleotide will compete with the cis element in the collagen gene for the trans-acting factor which regulates promoter activity. These epigenetic approaches afford the opportunity to regulate over-expression of growth factor and therefore preclude the potential toxic effects of gene therapy. Both genetic and epigenetic approaches for regulating the wound healing process, either normal or aberrant wound healing, have certain advantages and disadvantages which are discussed in the present article.

  8. A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    PubMed Central

    Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

    2006-01-01

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943

  9. Extensive cross-regulation of post-transcriptional regulatory networks in Drosophila

    DOE PAGES

    Stoiber, Marcus H.; Olson, Sara; May, Gemma E.; ...

    2015-08-20

    In eukaryotic cells, RNAs exist as ribonucleoprotein particles (RNPs). Despite the importance of these complexes in many biological processes, including splicing, polyadenylation, stability, transportation, localization, and translation, their compositions are largely unknown. We affinity-purified 20 distinct RNA-binding proteins (RBPs) from cultured Drosophila melanogaster cells under native conditions and identified both the RNA and protein compositions of these RNP complexes. We identified “high occupancy target” (HOT) RNAs that interact with the majority of the RBPs we surveyed. HOT RNAs encode components of the nonsense-mediated decay and splicing machinery, as well as RNA-binding and translation initiation proteins. The RNP complexes contain proteins and mRNAs involved in RNA binding and post-transcriptional regulation. Genes with the capacity to produce hundreds of mRNA isoforms, ultracomplex genes, interact extensively with heterogeneous nuclear ribonuclear proteins (hnRNPs). Our data are consistent with a model in which subsets of RNPs include mRNA and protein products from the same gene, indicating the widespread existence of auto-regulatory RNPs. Lastly, from the simultaneous acquisition and integrative analysis of protein and RNA constituents of RNPs, we identify extensive cross-regulatory and hierarchical interactions in post-transcriptional control.

  10. Extensive cross-regulation of post-transcriptional regulatory networks in Drosophila

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stoiber, Marcus H.; Olson, Sara; May, Gemma E.

    In eukaryotic cells, RNAs exist as ribonucleoprotein particles (RNPs). Despite the importance of these complexes in many biological processes, including splicing, polyadenylation, stability, transportation, localization, and translation, their compositions are largely unknown. We affinity-purified 20 distinct RNA-binding proteins (RBPs) from cultured Drosophila melanogaster cells under native conditions and identified both the RNA and protein compositions of these RNP complexes. We identified “high occupancy target” (HOT) RNAs that interact with the majority of the RBPs we surveyed. HOT RNAs encode components of the nonsense-mediated decay and splicing machinery, as well as RNA-binding and translation initiation proteins. The RNP complexes contain proteins and mRNAs involved in RNA binding and post-transcriptional regulation. Genes with the capacity to produce hundreds of mRNA isoforms, ultracomplex genes, interact extensively with heterogeneous nuclear ribonuclear proteins (hnRNPs). Our data are consistent with a model in which subsets of RNPs include mRNA and protein products from the same gene, indicating the widespread existence of auto-regulatory RNPs. Lastly, from the simultaneous acquisition and integrative analysis of protein and RNA constituents of RNPs, we identify extensive cross-regulatory and hierarchical interactions in post-transcriptional control.

  11. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants

    PubMed Central

    Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.

    2016-01-01

    Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286

  12. A natural allele of Nxf1/TAP suppresses retrovirus insertional mutations

    PubMed Central

    Floyd, Jennifer A.; Gold, David A.; Concepcion, Dorothy; Poon, Tiffany H.; Wang, Xiaobo; Keithley, Elizabeth; Chen, Dan; Ward, Erica J.; Chinn, Steven B.; Friedman, Rick A.; Yu, Hon-Tsen; Moriwaki, Kazuo; Shiroishi, Toshihiko; Hamilton, Bruce A.

    2009-01-01

    Endogenous retroviruses have shaped the evolution of mammalian genomes. Host genes that control the effects of retrovirus insertions are therefore of great interest. The Modifier-of-vibrator-1 locus controls level of correctly processed mRNA from genes mutated by endogenous retrovirus insertions into introns, including the pitpnvb tremor mutation and the Eya1BOR model of human branchiootorenal syndrome. Positional complementation cloning identifies Mvb1 as the nuclear export factor Nxf1, providing an unexpected link between mRNA export receptor and pre-mRNA processing. Population structure of the suppressing allele in wild M. m. castaneus suggests selective advantage. A congenic Mvb1CAST allele is a useful tool for modifying gene expression from existing mutations and could be used to manipulate engineered mutations containing retroviral elements. PMID:14517553

  13. Redefining C and D in the petunia ABC.

    PubMed

    Heijmans, Klaas; Ament, Kai; Rijpkema, Anneke S; Zethof, Jan; Wolters-Arts, Mieke; Gerats, Tom; Vandenbussche, Michiel

    2012-06-01

    According to the ABC(DE) model for flower development, C-genes are required for stamen and carpel development and floral determinacy, and D-genes were proposed to play a unique role in ovule development. Both C- and D-genes belong to the AGAMOUS (AG) subfamily of MADS box transcription factors. We show that the petunia (Petunia hybrida) C-clade genes PETUNIA MADS BOX GENE3 and FLORAL BINDING PROTEIN6 (FBP6) largely overlap in function, both in floral organ identity specification and floral determinacy, unlike the pronounced subfunctionalization observed in Arabidopsis thaliana and snapdragon (Antirrhinum majus). Some specialization has also evolved, since FBP6 plays a unique role in the development of the style and stigma. Furthermore, we show that the D-genes FBP7 and FBP11 are not essential to confer ovule identity. Instead, this function is redundantly shared among all AG members. In turn, the D-genes also participate in floral determinacy. Gain-of-function analyses suggest the presence of a posttranscriptional C-repression mechanism in petunia, most likely not existing in Arabidopsis. Finally, we show that expression maintenance of the paleoAPETALA3-type B-gene TOMATO MADS BOX GENE6 depends on the activity of C-genes. Taken together, this demonstrates considerable variation in the molecular control of floral development between eudicot species.

  14. Redefining C and D in the Petunia ABC

    PubMed Central

    Heijmans, Klaas; Ament, Kai; Rijpkema, Anneke S.; Zethof, Jan; Wolters-Arts, Mieke; Gerats, Tom; Vandenbussche, Michiel

    2012-01-01

    According to the ABC(DE) model for flower development, C-genes are required for stamen and carpel development and floral determinacy, and D-genes were proposed to play a unique role in ovule development. Both C- and D-genes belong to the AGAMOUS (AG) subfamily of MADS box transcription factors. We show that the petunia (Petunia hybrida) C-clade genes PETUNIA MADS BOX GENE3 and FLORAL BINDING PROTEIN6 (FBP6) largely overlap in function, both in floral organ identity specification and floral determinacy, unlike the pronounced subfunctionalization observed in Arabidopsis thaliana and snapdragon (Antirrhinum majus). Some specialization has also evolved, since FBP6 plays a unique role in the development of the style and stigma. Furthermore, we show that the D-genes FBP7 and FBP11 are not essential to confer ovule identity. Instead, this function is redundantly shared among all AG members. In turn, the D-genes also participate in floral determinacy. Gain-of-function analyses suggest the presence of a posttranscriptional C-repression mechanism in petunia, most likely not existing in Arabidopsis. Finally, we show that expression maintenance of the paleoAPETALA3-type B-gene TOMATO MADS BOX GENE6 depends on the activity of C-genes. Taken together, this demonstrates considerable variation in the molecular control of floral development between eudicot species. PMID:22706285

  15. Meiosis genes in Daphnia pulex and the role of parthenogenesis in genome evolution.

    PubMed

    Schurko, Andrew M; Logsdon, John M; Eads, Brian D

    2009-04-21

    Thousands of parthenogenetic animal species have been described and cytogenetic manifestations of this reproductive mode are well known. However, little is understood about the molecular determinants of parthenogenesis. The Daphnia pulex genome must contain the molecular machinery for different reproductive modes: sexual (both male and female meiosis) and parthenogenetic (which is either cyclical or obligate). This feature makes D. pulex an ideal model to investigate the genetic basis of parthenogenesis and its consequences for gene and genome evolution. Here we describe the inventory of meiotic genes and their expression patterns during meiotic and parthenogenetic reproduction to help address whether parthenogenesis uses existing meiotic and mitotic machinery, or whether novel processes may be involved. We report an inventory of 130 homologs representing over 40 genes encoding proteins with diverse roles in meiotic processes in the genome of D. pulex. Many genes involved in cell cycle regulation and sister chromatid cohesion are characterized by expansions in copy number. In contrast, most genes involved in DNA replication and homologous recombination are present as single copies. Notably, RECQ2 (which suppresses homologous recombination) is present in multiple copies while DMC1 is the only gene in our inventory that is absent in the Daphnia genome. Expression patterns for 44 gene copies were similar during meiosis versus parthenogenesis, although several genes displayed marked differences in expression level in germline and somatic tissues. We propose that expansions in meiotic gene families in D. pulex may be associated with parthenogenesis. Taking into account our findings, we provide a mechanistic model of parthenogenesis, highlighting steps that must differ from meiosis including sister chromatid cohesion and kinetochore attachment.

  16. Meiosis genes in Daphnia pulex and the role of parthenogenesis in genome evolution

    PubMed Central

    Schurko, Andrew M; Logsdon, John M; Eads, Brian D

    2009-01-01

    Background Thousands of parthenogenetic animal species have been described and cytogenetic manifestations of this reproductive mode are well known. However, little is understood about the molecular determinants of parthenogenesis. The Daphnia pulex genome must contain the molecular machinery for different reproductive modes: sexual (both male and female meiosis) and parthenogenetic (which is either cyclical or obligate). This feature makes D. pulex an ideal model to investigate the genetic basis of parthenogenesis and its consequences for gene and genome evolution. Here we describe the inventory of meiotic genes and their expression patterns during meiotic and parthenogenetic reproduction to help address whether parthenogenesis uses existing meiotic and mitotic machinery, or whether novel processes may be involved. Results We report an inventory of 130 homologs representing over 40 genes encoding proteins with diverse roles in meiotic processes in the genome of D. pulex. Many genes involved in cell cycle regulation and sister chromatid cohesion are characterized by expansions in copy number. In contrast, most genes involved in DNA replication and homologous recombination are present as single copies. Notably, RECQ2 (which suppresses homologous recombination) is present in multiple copies while DMC1 is the only gene in our inventory that is absent in the Daphnia genome. Expression patterns for 44 gene copies were similar during meiosis versus parthenogenesis, although several genes displayed marked differences in expression level in germline and somatic tissues. Conclusion We propose that expansions in meiotic gene families in D. pulex may be associated with parthenogenesis. Taking into account our findings, we provide a mechanistic model of parthenogenesis, highlighting steps that must differ from meiosis including sister chromatid cohesion and kinetochore attachment. PMID:19383157

  17. Mitochondria, oligodendrocytes and inflammation in bipolar disorder: evidence from transcriptome studies points to intriguing parallels with multiple sclerosis

    PubMed Central

    Konradi, Christine; Sillivan, Stephanie E.; Clay, Hayley B.

    2011-01-01

    Gene expression studies of bipolar disorder (BPD) have shown changes in transcriptome profiles in multiple brain regions. Here we summarize the most consistent findings in the scientific literature, and compare them to data from schizophrenia (SZ) and major depressive disorder (MDD). The transcriptome profiles of all three disorders overlap, making the existence of a BPD-specific profile unlikely. Three groups of functionally related genes are consistently expressed at altered levels in BPD, SZ and MDD. Genes involved in energy metabolism and mitochondrial function are downregulated, genes involved in immune response and inflammation are upregulated, and genes expressed in oligodendrocytes are downregulated. Experimental paradigms for multiple sclerosis demonstrate a tight link between energy metabolism, inflammation and demyelination. These studies also show variabilities in the extent of oligodendrocyte stress, which can vary from a downregulation of oligodendrocyte genes, such as observed in psychiatric disorders, to cell death and brain lesions seen in multiple sclerosis. We conclude that experimental models of multiple sclerosis could be of interest for the research of BPD, SZ and MDD. PMID:21310238

  18. A serine proteinase homologue, SPH-3, plays a central role in insect immunity.

    PubMed

    Felföldi, Gabriella; Eleftherianos, Ioannis; Ffrench-Constant, Richard H; Venekei, István

    2011-04-15

    Numerous vertebrate and invertebrate genes encode serine proteinase homologues (SPHs) similar to members of the serine proteinase family, but lacking one or more residues of the catalytic triad. These SPH proteins are thought to play a role in immunity, but their precise functions are poorly understood. In this study, we show that SPH-3 (an insect non-clip domain-containing SPH) is of central importance in the immune response of a model lepidopteran, Manduca sexta. We examine M. sexta infection with a virulent, insect-specific, Gram-negative bacterium Photorhabdus luminescens. RNA interference suppression of bacteria-induced SPH-3 synthesis severely compromises the insect's ability to defend itself against infection by preventing the transcription of multiple antimicrobial effector genes, but, surprisingly, not the transcription of immune recognition genes. Upregulation of the gene encoding prophenoloxidase and the activity of the phenoloxidase enzyme are among the antimicrobial responses that are severely attenuated on SPH-3 knockdown. These findings suggest the existence of two largely independent signaling pathways controlling immune recognition by the fat body, one governing effector gene transcription, and the other regulating genes encoding pattern recognition proteins.

  19. Data identification for improving gene network inference using computational algebra.

    PubMed

    Dimitrova, Elena; Stigler, Brandilyn

    2014-11-01

    Identification of models of gene regulatory networks is sensitive to the amount of data used as input. Considering the substantial costs in conducting experiments, it is of value to have an estimate of the amount of data required to infer the network structure. To minimize wasted resources, it is also beneficial to know which data are necessary to identify the network. Knowledge of the data and knowledge of the terms in polynomial models are often required a priori in model identification. In applications, it is unlikely that the structure of a polynomial model will be known, which may force data sets to be unnecessarily large in order to identify a model. Furthermore, none of the known results provides any strategy for constructing data sets to uniquely identify a model. We provide a specialization of an existing criterion for deciding when a set of data points identifies a minimal polynomial model when its monomial terms have been specified. Then, we relax the requirement of the knowledge of the monomials and present results for model identification given only the data. Finally, we present a method for constructing data sets that identify minimal polynomial models.

  20. Quantifying temporal isolation: a modelling approach assessing the effect of flowering time differences on crop-to-weed pollen flow in sunflower

    PubMed Central

    Roumet, Marie; Cayre, Adeline; Latreille, Muriel; Muller, Marie-Hélène

    2015-01-01

    Flowering time divergence can be a crucial component of reproductive isolation between sympatric populations, but few studies have quantified its actual contribution to the reduction of gene flow. In this study, we aimed at estimating pollen-mediated gene flow between cultivated sunflower and a weedy conspecific sunflower population growing in the same field and at quantifying how it is affected by the weeds' flowering time. For that purpose, we extended an existing mating model by including a temporal distance (i.e. flowering time difference between potential parents) effect on mating probabilities. Using phenological and genotypic data gathered on the crop and on a sample of the weedy population and its offspring, we estimated an average hybridization rate of approximately 10%. This rate varied strongly from 30% on average for weeds flowering at the crop flowering peak to 0% when the crop finished flowering and was affected by the local density of weeds. Our result also suggested the occurrence of other factors limiting crop-to-weed gene flow. This level of gene flow and its dependence on flowering time might influence the evolutionary fate of weedy sunflower populations sympatric to their crop relative. PMID:25667603

  1. Population-based case-control study of DRD2 gene polymorphisms and alcoholism.

    PubMed

    Bhaskar, L V K S; Thangaraj, K; Non, A L; Singh, Lalji; Rao, V R

    2010-10-01

    Several independent lines of evidence for genetic contributions to vulnerability to alcoholism exist. Dopamine is thought to play a major role in the mechanism of reward and reinforcement in response to alcohol. D2 dopamine receptor (DRD2) gene has been among the stronger candidate genes implicated in alcoholism. In this study, alcohol use was assessed in 196 randomly selected Kota individuals of Nilgiri Hills, South India. Six DRD2 SNPs were assessed in 81 individuals with alcoholism and 151 controls to evaluate the association between single nucleotide polymorphisms (SNPs) and alcoholism. Of the three models (dominant, recessive, and additive) tested for association between alcoholism and DRD2 SNPs, only the additive model shows association for three loci (rs1116313, TaqID, and rs2734835). Of six studied polymorphisms, five are in strong linkage disequilibrium forming one single haplotype block. Though the global haplotype analysis with these five SNPs was not significant, haplotype analysis using all six SNPs yielded a global P value of .033, even after adjusting for age. These findings support the importance of dopamine receptor gene polymorphisms in alcoholism. Further studies to replicate these findings in different populations are needed to confirm these results.

  2. Routes to DNA accessibility: alternative pathways for nucleosome unwinding.

    PubMed

    Schlingman, Daniel J; Mack, Andrew H; Kamenetska, Masha; Mochrie, Simon G J; Regan, Lynne

    2014-07-15

    The dynamic packaging of DNA into chromatin is a key determinant of eukaryotic gene regulation and epigenetic inheritance. Nucleosomes are the basic unit of chromatin, and therefore the accessible states of the nucleosome must be the starting point for mechanistic models regarding these essential processes. Although the existence of different unwound nucleosome states has been hypothesized, there have been few studies of these states. The consequences of multiple states are far reaching. These states will behave differently in all aspects, including their interactions with chromatin remodelers, histone variant exchange, and kinetic properties. Here, we demonstrate the existence of two distinct states of the unwound nucleosome, which are accessible at physiological forces and ionic strengths. Using optical tweezers, we measure the rates of unwinding and rewinding for these two states and show that the rewinding rates from each state are different. In addition, we show that the probability of unwinding into each state is dependent on the applied force and ionic strength. Our results demonstrate not only that multiple unwound states exist but that their accessibility can be differentially perturbed, suggesting possible roles for these states in gene regulation. For example, different histone variants or modifications may facilitate or suppress access to DNA by promoting unwinding into one state or the other. We anticipate that the two unwound states reported here will be the basis for future models of eukaryotic transcriptional control. Copyright © 2014 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  3. Structures and Boolean Dynamics in Gene Regulatory Networks

    NASA Astrophysics Data System (ADS)

    Szedlak, Anthony

    This dissertation discusses the topological and dynamical properties of GRNs in cancer, and is divided into four main chapters. First, the basic tools of modern complex network theory are introduced. These traditional tools as well as those developed by myself (set efficiency, interset efficiency, and nested communities) are crucial for understanding the intricate topological properties of GRNs, and later chapters recall these concepts. Second, the biology of gene regulation is discussed, and a method for disease-specific GRN reconstruction developed by our collaboration is presented. This complements the traditional exhaustive experimental approach of building GRNs edge-by-edge by quickly inferring the existence of as of yet undiscovered edges using correlations across sets of gene expression data. This method also provides insight into the distribution of common mutations across GRNs. Third, I demonstrate that the structures present in these reconstructed networks are strongly related to the evolutionary histories of their constituent genes. Investigation of how the forces of evolution shaped the topology of GRNs in multicellular organisms by growing outward from a core of ancient, conserved genes can shed light upon the ''reverse evolution'' of normal cells into unicellular-like cancer states. Next, I simulate the dynamics of the GRNs of cancer cells using the Hopfield model, an infinite range spin-glass model designed with the ability to encode Boolean data as attractor states. This attractor-driven approach facilitates the integration of gene expression data into predictive mathematical models. Perturbations representing therapeutic interventions are applied to sets of genes, and the resulting deviations from their attractor states are recorded, suggesting new potential drug targets for experimentation. Finally, I extend the Hopfield model to modular networks, cyclic attractors, and complex attractors, and apply these concepts to simulations of the cell cycle process. Further development of these and other theoretical and computational tools is necessary to analyze the deluge of experimental data produced by modern and future biological high throughput methods. (Abstract shortened by ProQuest.).

  4. Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

    PubMed

    Schuemie, Martijn J; Mons, Barend; Weeber, Marc; Kors, Jan A

    2007-06-01

    Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appear to have a detrimental effect on precision.

  5. From animal models to human disease: a genetic approach for personalized medicine in ALS.

    PubMed

    Picher-Martel, Vincent; Valdmanis, Paul N; Gould, Peter V; Julien, Jean-Pierre; Dupré, Nicolas

    2016-07-11

    Amyotrophic Lateral Sclerosis (ALS) is the most frequent motor neuron disease in adults. Classical ALS is characterized by the death of upper and lower motor neurons leading to progressive paralysis. Approximately 10 % of ALS patients have familial form of the disease. Numerous different gene mutations have been found in familial cases of ALS, such as mutations in superoxide dismutase 1 (SOD1), TAR DNA-binding protein 43 (TDP-43), fused in sarcoma (FUS), C9ORF72, ubiquilin-2 (UBQLN2), optineurin (OPTN) and others. Multiple animal models were generated to mimic the disease and to test future treatments. However, no animal model fully replicates the spectrum of phenotypes in the human disease and it is difficult to assess how a therapeutic effect in disease models can predict efficacy in humans. Importantly, the genetic and phenotypic heterogeneity of ALS leads to a variety of responses to similar treatment regimens. From this has emerged the concept of personalized medicine (PM), which is a medical scheme that combines study of genetic, environmental and clinical diagnostic testing, including biomarkers, to individualized patient care. In this perspective, we used subgroups of specific ALS-linked gene mutations to go through existing animal models and to provide a comprehensive profile of the differences and similarities between animal models of disease and human disease. Finally, we reviewed application of biomarkers and gene therapies relevant in personalized medicine approach. For instance, this includes viral delivering of antisense oligonucleotide and small interfering RNA in SOD1, TDP-43 and C9orf72 mice models. Promising gene therapies raised possibilities for treating differently the major mutations in familial ALS cases.

  6. A stochastic evolution model for residue Insertion-Deletion Independent from Substitution.

    PubMed

    Lèbre, Sophie; Michel, Christian J

    2010-12-01

    We develop here a new class of stochastic models of gene evolution based on residue Insertion-Deletion Independent from Substitution (IDIS). Indeed, in contrast to all existing evolution models, insertions and deletions are modeled here by a concept in population dynamics. Therefore, they are not only independent from each other, but also independent from the substitution process. After a separate stochastic analysis of the substitution and the insertion-deletion processes, we obtain a matrix differential equation combining these two processes defining the IDIS model. By deriving a general solution, we give an analytical expression of the residue occurrence probability at evolution time t as a function of a substitution rate matrix, an insertion rate vector, a deletion rate and an initial residue probability vector. Various mathematical properties of the IDIS model in relation with time t are derived: time scale, time step, time inversion and sequence length. Particular expressions of the nucleotide occurrence probability at time t are given for classical substitution rate matrices in various biological contexts: equal insertion rate, insertion-deletion only and substitution only. All these expressions can be directly used for biological evolutionary applications. The IDIS model shows a strongly different stochastic behavior from the classical substitution only model when compared on a gene dataset. Indeed, by considering three processes of residue insertion, deletion and substitution independently from each other, it allows a more realistic representation of gene evolution and opens new directions and applications in this research field. Copyright © 2010 Elsevier Ltd. All rights reserved.

  7. Analysis of the hierarchy of quorum-sensing regulation in Pseudomonas aeruginosa.

    PubMed

    Wagner, Victoria E; Li, Luen-Luen; Isabella, Vincent M; Iglewski, Barbara H

    2007-01-01

    Quorum-sensing in Pseudomonas aeruginosa is known to regulate several aspects of pathogenesis, including virulence factor production, biofilm development, and antimicrobial resistance. Recent high-throughput analysis has revealed the existence of several layers of regulation within the QS-circuit. To address this complexity, mutations in genes encoding known or putative transcriptional regulators that were also identified as being regulated by the las and/or rhl QS systems were screened for their contribution in mediating several phenotypes, for example motility, secreted virulence products, and pathogenic capacity in a lettuce leaf model. These studies have further elucidated the potential contribution to virulence of these genes within the QS regulon.

  8. Meta-analysis of gene expression patterns in animal models of prenatal alcohol exposure suggests role for protein synthesis inhibition and chromatin remodeling

    PubMed Central

    Rogic, Sanja; Wong, Albertina; Pavlidis, Paul

    2017-01-01

    Background Prenatal alcohol exposure (PAE) can result in an array of morphological, behavioural and neurobiological deficits that can range in their severity. Despite extensive research in the field and significant progress, especially in understanding the range of possible malformations and neurobehavioral abnormalities, the molecular mechanisms of alcohol responses in development are still not well understood. There have been multiple transcriptomic studies looking at the changes in gene expression after PAE in animal models; however, there is limited apparent consensus among the reported findings. In an effort to address this issue, we performed a comprehensive re-analysis and meta-analysis of all suitable, publicly available expression data sets. Methods We assembled ten microarray data sets of gene expression after PAE in mouse and rat models consisting of samples from a total of 63 ethanol-exposed and 80 control animals. We re-analyzed each data set for differential expression and then used the results to perform meta-analyses considering all data sets together or grouping them by time or duration of exposure (pre- and post-natal, acute and chronic, respectively). We performed network and Gene Ontology enrichment analysis to further characterize the identified signatures. Results For each sub-analysis we identified signatures of differentially expressed genes that show support from multiple studies. Overall, the changes in gene expression were more extensive after acute ethanol treatment during prenatal development than in other models. Considering the analysis of all the data together, we identified a robust core signature of 104 genes down-regulated after PAE, with no up-regulated genes. Functional analysis reveals over-representation of genes involved in protein synthesis, mRNA splicing and chromatin organization. Conclusions Our meta-analysis shows that existing studies, despite superficial dissimilarity in findings, share features that allow us to identify a common core signature set of transcriptome changes in PAE. This is an important step toward identifying the biological processes that underlie the etiology of FASD. PMID:26996386

  9. Gene doping.

    PubMed

    Azzazy, Hassan M E

    2010-01-01

    Gene doping abuses the legitimate approach of gene therapy. While gene therapy aims to correct genetic disorders by introducing a foreign gene to replace an existing faulty one or by manipulating existing gene(s) to achieve a therapeutic benefit, gene doping employs the same concepts to bestow performance advantages on athletes over their competitors. Recent developments in genetic engineering have contributed significantly to the progress of gene therapy research and currently numerous clinical trials are underway. Some athletes and their staff are probably watching this progress closely. Any gene that plays a role in muscle development, oxygen delivery to tissues, neuromuscular coordination, or even pain control is considered a candidate for gene dopers. Unfortunately, detecting gene doping is technically very difficult because the transgenic proteins expressed by the introduced genes are similar to their endogenous counterparts. Researchers today are racing the clock because assuring the continued integrity of sports competition depends on their ability to develop effective detection strategies in preparation for the 2012 Olympics, which may mark the appearance of genetically modified athletes.

  10. Genomic survey and expression analysis of DNA repair genes in the genus Leptospira.

    PubMed

    Martins-Pinheiro, Marinalva; Schons-Fonseca, Luciane; da Silva, Josefa B; Domingos, Renan H; Momo, Leonardo Hiroyuki Santos; Simões, Ana Carolina Quirino; Ho, Paulo Lee; da Costa, Renata M A

    2016-04-01

    Leptospirosis is an emerging zoonosis with important economic and public health consequences and is caused by pathogenic leptospires. The genus Leptospira belongs to the order Spirochaetales and comprises saprophytic (L. biflexa), pathogenic (L. interrogans) and host-dependent (L. borgpetersenii) members. Here, we present an in silico search for DNA repair pathways in Leptospira spp. The relevance of such DNA repair pathways was assessed by measuring the mRNA levels of some genes during infection in an animal model and after exposure to spleen cells. The search was performed by comparing available Leptospira spp. genomes in public databases with known DNA repair-related genes. Leptospires exhibit some distinct and unexpected characteristics, for instance the existence of a redundant mechanism for repairing a chemically diverse spectrum of alkylated nucleobases, a new mutS-like gene and a new shorter version of uvrD. Leptospira spp. share some characteristics with Gram-positive bacteria, such as the presence of PcrA, two RecQ paralogs and two SSB proteins; the latter is considered a feature shared by naturally competent bacteria. We did not find a significant reduction in the number of DNA repair-related genes in either pathogenic or host-dependent species. Pathogenic leptospires were enriched for genes dedicated to base excision repair and non-homologous end joining. Their evolutionary history reveals a remarkable importance of lateral gene transfer events for the evolution of the genus. Up-regulation of specific DNA repair genes, including components of the SOS regulon, during infection in an animal model validates the critical role of DNA repair mechanisms in the complex host/pathogen interplay.

  11. Games among relatives revisited.

    PubMed

    Allen, Benjamin; Nowak, Martin A

    2015-08-07

    We present a simple model for the evolution of social behavior in family-structured, finite sized populations. Interactions are represented as evolutionary games describing frequency-dependent selection. Individuals interact more frequently with siblings than with members of the general population, as quantified by an assortment parameter r, which can be interpreted as "relatedness". Other models, mostly of spatially structured populations, have shown that assortment can promote the evolution of cooperation by facilitating interaction between cooperators, but this effect depends on the details of the evolutionary process. For our model, we find that sibling assortment promotes cooperation in stringent social dilemmas such as the Prisoner's Dilemma, but not necessarily in other situations. These results are obtained through straightforward calculations of changes in gene frequency. We also analyze our model using inclusive fitness. We find that the quantity of inclusive fitness does not exist for general games. For special games, where inclusive fitness exists, it provides less information than the straightforward analysis. Copyright © 2015 Elsevier Ltd. All rights reserved.
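
    The role of the assortment parameter r can be illustrated with a short calculation. The following is a minimal sketch under simplifying assumptions that are not taken from the abstract: an infinite population, a donation-game form of the Prisoner's Dilemma with hypothetical benefit b and cost c, and assortment modeled as the probability r that an individual's partner has its own strategy. It is not the family-structured, finite-population model of the paper.

```python
def expected_payoffs(x, r, b, c):
    """Expected payoffs of cooperators and defectors in a donation game.

    x: frequency of cooperators in the population (0..1)
    r: assumed probability that the partner shares one's own strategy
    b: benefit received from a cooperating partner (hypothetical value)
    c: cost paid by a cooperator (hypothetical value)
    """
    # With probability r the partner is of one's own type; otherwise the
    # partner is drawn at random from the population.
    p_coop_partner_given_c = r + (1 - r) * x
    p_coop_partner_given_d = (1 - r) * x
    payoff_c = b * p_coop_partner_given_c - c
    payoff_d = b * p_coop_partner_given_d
    return payoff_c, payoff_d


def cooperation_favored(x, r, b, c):
    """True when cooperators earn more than defectors on average."""
    payoff_c, payoff_d = expected_payoffs(x, r, b, c)
    return payoff_c > payoff_d


if __name__ == "__main__":
    for r in (0.2, 0.5):
        print(r, cooperation_favored(x=0.3, r=r, b=3.0, c=1.0))
```

    In this simplified setting the payoff difference reduces to r*b - c, independent of x, so higher assortment favors cooperation. The abstract notes that this need not hold for other games or other evolutionary processes.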

  12. Multivariate Cholesky models of human female fertility patterns in the NLSY.

    PubMed

    Rodgers, Joseph Lee; Bard, David E; Miller, Warren B

    2007-03-01

    Substantial evidence now exists that variables measuring or correlated with human fertility outcomes have a heritable component. In this study, we define a series of age-sequenced fertility variables, and fit multivariate models to account for underlying shared genetic and environmental sources of variance. We make predictions based on a theory developed by Udry [(1996) Biosocial models of low-fertility societies. In: Casterline, JB, Lee RD, Foote KA (eds) Fertility in the United States: new patterns, new theories. The Population Council, New York] suggesting that biological/genetic motivations can be more easily realized and measured in settings in which fertility choices are available. Udry's theory, along with principles from molecular genetics and certain tenets of life history theory, allows us to make specific predictions about biometrical patterns across age. Consistent with predictions, our results suggest that there are different sources of genetic influence on fertility variance at early compared to later ages, but that there is only one source of shared environmental influence that occurs at early ages. These patterns are suggestive of the types of gene-gene and gene-environment interactions for which we must account to better understand individual differences in fertility outcomes.

  13. Finding approximate gene clusters with Gecko 3.

    PubMed

    Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian

    2016-11-16

    Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software for finding gene clusters in hundreds of bacterial genomes, that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks.

    PubMed

    Ghanat Bari, Mehrab; Ung, Choong Yong; Zhang, Cheng; Zhu, Shizhen; Li, Hu

    2017-08-01

    Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10^8 Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes that coordinate functionality in cancer networks, and it will illuminate our understanding of how genes that are modulated in a tissue-specific network contribute to tumorigenesis and therapy development.

  15. A cross-species analysis method to analyze animal models' similarity to human's disease state

    PubMed Central

    2012-01-01

    Background Animal models are indispensable tools for studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human diseases and to assess drug molecules' efficiency in drug research. We therefore aimed to show that it is feasible and useful to compare gene expression profiles across species in studies of pathology, toxicology, drug repositioning, and drug action mechanisms. Results We developed a cross-species analysis method to analyze animal models' similarity to human diseases and their effectiveness in drug research by utilizing existing animal gene expression data in public databases, and mined information useful for drug research, such as potential drug candidates, possible drug repositioning, side effects and pharmacological analysis. New animal models could be evaluated by our method before they are used in drug discovery. We applied the method to several cases of known animal model expression profiles and obtained information useful for drug research. We found that trichostatin A and some other HDACs could have very similar responses across cell lines and species at the gene expression level. The mouse hypoxia model could accurately mimic human hypoxia, while the mouse diabetes drug model might have some limitations. The transgenic mouse model of Alzheimer's disease was a useful model, and we analyzed in depth the biological mechanisms of some drugs in this case. In addition, all the cases could provide some ideas for drug discovery and drug repositioning. Conclusions We developed a new cross-species gene expression module comparison method that uses animal models' expression data to analyse the effectiveness of animal models in drug research. Moreover, through data integration, our method could be applied to drug research, providing potential drug candidates, possible drug repositioning, side effects and information about pharmacology. PMID:23282076

  16. A cross-species analysis method to analyze animal models' similarity to human's disease state.

    PubMed

    Yu, Shuhao; Zheng, Lulu; Li, Yun; Li, Chunyan; Ma, Chenchen; Li, Yixue; Li, Xuan; Hao, Pei

    2012-01-01

    Animal models are indispensable tools for studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human diseases and to assess drug molecules' efficiency in drug research. We therefore aimed to show that it is feasible and useful to compare gene expression profiles across species in studies of pathology, toxicology, drug repositioning, and drug action mechanisms. We developed a cross-species analysis method to analyze animal models' similarity to human diseases and their effectiveness in drug research by utilizing existing animal gene expression data in public databases, and mined information useful for drug research, such as potential drug candidates, possible drug repositioning, side effects and pharmacological analysis. New animal models could be evaluated by our method before they are used in drug discovery. We applied the method to several cases of known animal model expression profiles and obtained information useful for drug research. We found that trichostatin A and some other HDACs could have very similar responses across cell lines and species at the gene expression level. The mouse hypoxia model could accurately mimic human hypoxia, while the mouse diabetes drug model might have some limitations. The transgenic mouse model of Alzheimer's disease was a useful model, and we analyzed in depth the biological mechanisms of some drugs in this case. In addition, all the cases could provide some ideas for drug discovery and drug repositioning. We developed a new cross-species gene expression module comparison method that uses animal models' expression data to analyse the effectiveness of animal models in drug research. Moreover, through data integration, our method could be applied to drug research, providing potential drug candidates, possible drug repositioning, side effects and information about pharmacology.

  17. Inference on the Strength of Balancing Selection for Epistatically Interacting Loci

    PubMed Central

    Buzbas, Erkan Ozge; Joyce, Paul; Rosenberg, Noah A.

    2011-01-01

    Existing inference methods for estimating the strength of balancing selection in multi-locus genotypes rely on the assumption that there are no epistatic interactions between loci. Complex systems in which balancing selection is prevalent, such as sets of human immune system genes, are known to contain components that interact epistatically. Therefore, current methods may not produce reliable inference on the strength of selection at these loci. In this paper, we address this problem by presenting statistical methods that can account for epistatic interactions in making inference about balancing selection. A theoretical result due to Fearnhead (2006) is used to build a multi-locus Wright-Fisher model of balancing selection, allowing for epistatic interactions among loci. Antagonistic and synergistic types of interactions are examined. The joint posterior distribution of the selection and mutation parameters is sampled by Markov chain Monte Carlo methods, and the plausibility of models is assessed via Bayes factors. As a component of the inference process, an algorithm to generate multi-locus allele frequencies under balancing selection models with epistasis is also presented. Recent evidence on interactions among a set of human immune system genes is introduced as a motivating biological system for the epistatic model, and data on these genes are used to demonstrate the methods. PMID:21277883
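
    As background for the Wright-Fisher model mentioned above, the following is a minimal sketch of balancing selection at a single locus. It assumes two alleles, symmetric heterozygote advantage with a hypothetical selection coefficient s, and binomial sampling of gametes. It does not implement the multi-locus epistatic model, the result of Fearnhead (2006), or the MCMC inference described in the abstract.

```python
import random


def next_allele_freq(p, s):
    """Deterministic selection step under symmetric heterozygote advantage.

    Heterozygotes have fitness 1 and both homozygotes have fitness 1 - s
    (s is an assumed, illustrative selection coefficient).
    """
    q = 1.0 - p
    w_hom, w_het = 1.0 - s, 1.0
    mean_w = p * p * w_hom + 2.0 * p * q * w_het + q * q * w_hom
    # Frequency of allele A among gametes after selection.
    return (p * p * w_hom + p * q * w_het) / mean_w


def wright_fisher(p0, s, n_diploid, generations, seed=0):
    """Simulate allele-frequency drift with selection in a finite population."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p_sel = next_allele_freq(p, s)
        # Binomial sampling of 2N gametes for the next generation.
        count = sum(rng.random() < p_sel for _ in range(2 * n_diploid))
        p = count / (2 * n_diploid)
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    traj = wright_fisher(p0=0.1, s=0.1, n_diploid=200, generations=100)
    print(traj[0], traj[-1])
```

    In the deterministic step the allele frequency is pushed toward 0.5, which is the defining behavior of balancing selection in this symmetric toy case; the sampling step adds genetic drift.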

  18. Integrative biology approach identifies cytokine targeting strategies for psoriasis.

    PubMed

    Perera, Gayathri K; Ainali, Chrysanthi; Semenova, Ekaterina; Hundhausen, Christian; Barinaga, Guillermo; Kassen, Deepika; Williams, Andrew E; Mirza, Muddassar M; Balazs, Mercedesz; Wang, Xiaoting; Rodriguez, Robert Sanchez; Alendar, Andrej; Barker, Jonathan; Tsoka, Sophia; Ouyang, Wenjun; Nestle, Frank O

    2014-02-12

    Cytokines are critical checkpoints of inflammation. The treatment of human autoimmune disease has been revolutionized by targeting inflammatory cytokines as key drivers of disease pathogenesis. Despite this, there exist numerous pitfalls when translating preclinical data into the clinic. We developed an integrative biology approach combining human disease transcriptome data sets with clinically relevant in vivo models in an attempt to bridge this translational gap. We chose interleukin-22 (IL-22) as a model cytokine because of its potentially important proinflammatory role in epithelial tissues. Injection of IL-22 into normal human skin grafts produced marked inflammatory skin changes resembling human psoriasis. Injection of anti-IL-22 monoclonal antibody in a human xenotransplant model of psoriasis, developed specifically to test potential therapeutic candidates, efficiently blocked skin inflammation. Bioinformatic analysis integrating both the IL-22 and anti-IL-22 cytokine transcriptomes and mapping them onto a psoriasis disease gene coexpression network identified key cytokine-dependent hub genes. Using knockout mice and small-molecule blockade, we show that one of these hub genes, the so far unexplored serine/threonine kinase PIM1, is a critical checkpoint for human skin inflammation and potential future therapeutic target in psoriasis. Using in silico integration of human data sets and biological models, we were able to identify a new target in the treatment of psoriasis.

  19. Novel harmonic regularization approach for variable selection in Cox's proportional hazards model.

    PubMed

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on artificial datasets and four real microarray gene expression datasets, such as the diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods.

  20. Improved kinetic model of Escherichia coli central carbon metabolism in batch and continuous cultures.

    PubMed

    Kurata, Hiroyuki; Sugimoto, Yurie

    2018-02-01

    Many kinetic models of Escherichia coli central metabolism have been built, but few have accurately reproduced the dynamic behaviors of the wild type and multiple genetic mutants. In 2016, our latest kinetic model overcame problems of existing models in reproducing the cell growth and glucose uptake of the wild type, ΔpykA:pykF and Δpgi in a batch culture, but it overestimated the glucose uptake and cell growth rates of Δppc and hardly captured the typical characteristics of the glyoxylate and TCA cycle fluxes for Δpgi and Δppc. Such discrepancies between the simulated and experimental data suggested biological complexity. In this study, we overcame these problems by assuming critical mechanisms regarding the OAA-regulated isocitrate dehydrogenase activity, aceBAK gene regulation and growth suppression. The present model accurately predicts the extracellular and intracellular dynamics of the wild type and many gene knockout mutants in batch and continuous cultures. It is now the most accurate, detailed kinetic model of E. coli central carbon metabolism and will contribute to advances in mathematical modeling of cell factories. Copyright © 2017 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  1. Novel strategies to mine alcoholism-related haplotypes and genes by combining existing knowledge framework.

    PubMed

    Zhang, RuiJie; Li, Xia; Jiang, YongShuai; Liu, GuiYou; Li, ChuanXing; Zhang, Fan; Xiao, Yun; Gong, BinSheng

    2009-02-01

    High-throughput single nucleotide polymorphism detection technology and existing knowledge provide strong support for mining disease-related haplotypes and genes. In this study, first, we apply four haplotype identification methods (Confidence Intervals, Four Gamete Tests, Solid Spine of LD and a fusing method of haplotype blocks) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select the alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and gene databases to locate the blocks and identify the candidate genes. Finally, we annotate gene function using the KEGG, Biocarta, and GO databases. We find 159 haplotype blocks on chromosomes 1 to 22 that are most likely related to alcoholism, including 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We obtain 121 alcoholism-related genes and verify their reliability by biological functional annotation. In short, by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework, we can not only handle the SNP data easily but also locate the disease-related genes precisely.

  2. Skin Tumors Rb(eing) Uncovered

    PubMed Central

    Costa, Clotilde; Paramio, Jesús M.; Santos, Mirentxu

    2013-01-01

    The Rb1 gene was the first bona fide tumor suppressor identified and cloned more than 25 years ago. Since then, a plethora of studies have revealed the functions of pRb and the existence of a sophisticated and strictly regulated pathway that modulates such functional roles. An emerging paradox affecting Rb1 in cancer connects the relatively low number of mutations affecting Rb1 gene in specific human tumors, compared with the widely functional inactivation of pRb in most, if not in all, human cancers. The existence of a retinoblastoma family of proteins pRb, p107, and p130 and their potential unique and overlapping functions as master regulators of cell cycle progression and transcriptional modulation by similar processes, may provide potential clues to explain such conundrum. Here, we will review the development of different genetically engineered mouse models, in particular those affecting stratified epithelia, and how they have offered new avenues to understand the roles of the Rb family members and their targets in the context of tumor development and progression. PMID:24381932

  3. 3D cultured immortalized human hepatocytes useful to develop drugs for blood-borne HCV

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aly, Hussein Hassan; Shimotohno, Kunitada; Hijikata, Makoto

    2009-02-06

    Due to the high polymorphism of natural hepatitis C virus (HCV) variants, existing recombinant HCV replication models have failed to be effective in developing effective anti-HCV agents. In the current study, we describe an in vitro system that supports the infection and replication of natural HCV from patient blood using an immortalized primary human hepatocyte cell line cultured in a three-dimensional (3D) culture system. Comparison of the gene expression profile of cells cultured in the 3D system to those cultured in the existing 2D system demonstrated an up-regulation of several genes activated by peroxisome proliferator-activated receptor alpha (PPARα) signaling. Furthermore, using PPARα agonists and antagonists, we also analyzed the effect of PPARα signaling on the modulation of HCV replication using this system. The 3D in vitro system described in this study provides significant insight into the search for novel anti-HCV strategies that are specific to various strains of HCV.

  4. Constructing biological pathway models with hybrid functional Petri nets.

    PubMed

    Doi, Atsushi; Fujita, Sachie; Matsuno, Hiroshi; Nagasaki, Masao; Miyano, Satoru

    2004-01-01

    In many research projects on modeling and analyzing biological pathways, the Petri net has been recognized as a promising method for representing biological pathways. Since the pioneering works by Reddy et al., 1993, and Hofestädt, 1994, which modeled metabolic pathways with traditional Petri nets, several enhanced Petri nets such as the colored Petri net, stochastic Petri net, and hybrid Petri net have been used for modeling biological phenomena. Recently, Matsuno et al., 2003b, introduced the hybrid functional Petri net (HFPN) in order to give a more intuitive and natural modeling method for biological pathways than these existing Petri nets. Although that paper demonstrates the effectiveness of the HFPN with two examples, the gene regulation mechanism for circadian rhythms and the apoptosis signaling pathway, there has been no detailed explanation of how the HFPN was constructed for these examples. The purpose of this paper is to describe, step by step, a method for constructing biological pathways with the HFPN. The method is demonstrated on the well-known glycolytic pathway controlled by the lac operon gene regulatory mechanism.
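
    For readers unfamiliar with Petri nets, the basic firing rule that these extensions build on can be sketched in a few lines. This is a minimal sketch of a traditional discrete Petri net only; the HFPN additionally has continuous places and function-valued arcs, which are not shown. The place names and the single "hexokinase" transition are a hypothetical toy example, not the glycolysis model of the paper.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= weight
               for place, weight in transition["inputs"].items())


def fire(marking, transition):
    """Return the new marking after firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    # Consume tokens from input places.
    for place, weight in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - weight
    # Produce tokens in output places.
    for place, weight in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking


# Hypothetical toy transition: glucose + ATP -> G6P + ADP.
hexokinase = {
    "inputs": {"glucose": 1, "ATP": 1},
    "outputs": {"G6P": 1, "ADP": 1},
}

if __name__ == "__main__":
    marking = {"glucose": 2, "ATP": 1}
    marking = fire(marking, hexokinase)
    print(marking)
    print(enabled(marking, hexokinase))
```

    Places hold tokens (here, molecule counts) and transitions move them. After one firing in this toy marking the ATP place is empty, so the transition is no longer enabled.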

  5. Constructing biological pathway models with hybrid functional petri nets.

    PubMed

    Doi, Atsushi; Fujita, Sachie; Matsuno, Hiroshi; Nagasaki, Masao; Miyano, Satoru

    2011-01-01

    In many research projects on modeling and analyzing biological pathways, the Petri net has been recognized as a promising method for representing biological pathways. Since the pioneering works by Reddy et al., 1993, and Hofestädt, 1994, which modeled metabolic pathways with traditional Petri nets, several enhanced Petri nets such as the colored Petri net, stochastic Petri net, and hybrid Petri net have been used for modeling biological phenomena. Recently, Matsuno et al., 2003b, introduced the hybrid functional Petri net (HFPN) in order to give a more intuitive and natural modeling method for biological pathways than these existing Petri nets. Although that paper demonstrates the effectiveness of the HFPN with two examples, the gene regulation mechanism for circadian rhythms and the apoptosis signaling pathway, there has been no detailed explanation of how the HFPN was constructed for these examples. The purpose of this paper is to describe, step by step, a method for constructing biological pathways with the HFPN. The method is demonstrated on the well-known glycolytic pathway controlled by the lac operon gene regulatory mechanism.

  6. The vertebrate phylotypic stage and an early bilaterian-related stage in mouse embryogenesis defined by genomic information.

    PubMed

    Irie, Naoki; Sehara-Fujisawa, Atsuko

    2007-01-12

    Embryos of taxonomically different vertebrates are thought to pass through a stage in which they resemble one another morphologically. This "vertebrate phylotypic stage" may represent the basic vertebrate body plan that was established in the common ancestor of vertebrates. However, much controversy remains about when the phylotypic stage appears, and whether it even exists. To overcome the limitations of studies based on morphological comparison, we explored a comprehensive quantitative method for defining the constrained stage using expressed sequence tag (EST) data, gene ontologies (GO), and available genomes of various animals. If strong developmental constraints occur during the phylotypic stage of vertebrate embryos, then genes conserved among vertebrates would be highly expressed at this stage. We established a novel method for evaluating the ancestral nature of mouse embryonic stages that does not depend on comparative morphology. The numerical "ancestor index" revealed that the mouse indeed has a highly conserved embryonic period at embryonic day 8.0-8.5, the time of appearance of the pharyngeal arch and somites. During this period, the mouse prominently expresses GO-determined developmental genes shared among vertebrates. Similar analyses revealed the existence of a bilaterian-related period, during which GO-determined developmental genes shared among bilaterians are markedly expressed at the cleavage-to-gastrulation period. The genes associated with the phylotypic stage identified by our method are essential in embryogenesis. Our results demonstrate that the mid-embryonic stage of the mouse is indeed highly constrained, supporting the existence of the phylotypic stage. Furthermore, this candidate stage is preceded by a putative bilaterian ancestor-related period. These results not only support the developmental hourglass model, but also highlight the hierarchical aspect of embryogenesis proposed by von Baer. Identification of conserved stages and tissues by this method in various animals would be a powerful tool to examine the phylotypic stage hypothesis, and to understand which kinds of developmental events and gene sets are evolutionarily constrained and how they limit the possible variations of animal basic body plans.

  7. Integron associated mobile genes: Just a collection of plug in apps or essential components of cell network hardware?

    PubMed

    Labbate, Maurizio; Boucher, Yan; Luu, Ivan; Chowdhury, Piklu Roy; Stokes, H W

    2012-01-01

    Lateral gene transfer (LGT) affects the evolution of prokaryotes in both the short and long term. The short-term impacts of mobilized genes are a concern to humans since LGT explains the global rise of multidrug-resistant pathogens seen in the past 70 years. However, LGT has been a feature of prokaryotes from the earliest days of their existence, and the concept of a bifurcating tree of life is not entirely applicable to prokaryotes since most genes in extant prokaryotic genomes have probably been acquired from other lineages. Successful transfer and maintenance of a gene in a new host is understandable if it acts independently of cell networks and confers an advantage. Antibiotic resistance provides an example of this, whereby a gene can be advantageous in virtually any cell across broad species backgrounds. In a longer evolutionary context, however, laterally transferred genes can be assimilated into even essential cell networks. How this happens is not well understood, and we discuss recent work that identifies a mobile gene, unique to a cell lineage, which is detrimental to the cell when lost. We also present some additional data and believe our emerging model will be helpful in understanding how mobile genes integrate into cell networks.

  8. Validation of the β-amy1 transcription profiling assay and selection of reference genes suited for a RT-qPCR assay in developing barley caryopsis.

    PubMed

    Ovesná, Jaroslava; Kučera, Ladislav; Vaculová, Kateřina; Štrymplová, Kamila; Svobodová, Ilona; Milella, Luigi

    2012-01-01

    Reverse transcription coupled with real-time quantitative PCR (RT-qPCR) is a frequently used method for gene expression profiling. Reference genes (RGs) are commonly employed to normalize gene expression data. Limited information exists on gene expression and profiling in the developing barley caryopsis. Expression stability was assessed by measuring the cycle threshold (Ct) range and applying both the GeNorm (pair-wise comparison of geometric means) and Normfinder (model-based approach) principles for the calculation. Here, we have identified a set of four RGs suitable for studying gene expression in the developing barley caryopsis. These encode the proteins GAPDH, HSP90, HSP70 and ubiquitin. We found a correlation between the frequency of occurrence of a transcript in silico and its suitability as an RG. This set of RGs was tested by comparing the normalized level of β-amylase (β-amy1) transcript with directly measured quantities of the BMY1 gene product in the developing barley caryopsis. This panel of genes could be used for other gene expression studies, as well as to optimize β-amy1 analysis for study of the impact of β-amy1 expression upon barley end-use quality.
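
    The pair-wise stability idea mentioned in this abstract can be illustrated with a minimal sketch of a geNorm-style M value: for each candidate reference gene, the average standard deviation of its log2 expression ratios against every other candidate (lower M suggests more stable expression). The function name `genorm_m`, the gene labels, and the input numbers are hypothetical, and the sketch assumes relative quantities on a linear scale rather than raw Ct values; it is not the authors' pipeline.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """Return a geNorm-style stability value M for each gene.

    expr maps gene name -> list of relative expression quantities
    (linear scale, all positive, same sample order for every gene).
    Needs at least two genes and at least two samples.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Pairwise variation: SD of the log2 ratio across samples.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        # M is the mean pairwise variation against all other candidates.
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Illustrative data only: refA and refB change in lockstep, "noisy" does not.
example = {
    "refA": [1.0, 2.0, 4.0, 8.0],
    "refB": [2.0, 4.0, 8.0, 16.0],
    "noisy": [1.0, 1.0, 1.0, 1.0],
}
for gene, m in sorted(genorm_m(example).items(), key=lambda item: item[1]):
    print(gene, round(m, 3))
```

    In practice Ct values would first be converted to relative quantities (for example with an efficiency-corrected transform), which this sketch leaves out.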

  9. Tools for visually exploring biological networks.

    PubMed

    Suderman, Matthew; Hallett, Michael

    2007-10-15

    Many tools exist for visually exploring biological networks including well-known examples such as Cytoscape, VisANT, Pathway Studio and Patika. These systems play a key role in the development of integrative biology, systems biology and integrative bioinformatics. The trend in the development of these tools is to go beyond 'static' representations of cellular state, towards a more dynamic model of cellular processes through the incorporation of gene expression data, subcellular localization information and time-dependent behavior. We provide a comprehensive review of the relative advantages and disadvantages of existing systems with two goals in mind: to aid researchers in efficiently identifying the appropriate existing tools for data visualization; to describe the necessary and realistic goals for the next generation of visualization tools. In view of the first goal, we provide in the Supplementary Material a systematic comparison of more than 35 existing tools in terms of over 25 different features. Supplementary data are available at Bioinformatics online.

  10. Rce1, a novel transcriptional repressor, regulates cellulase gene expression by antagonizing the transactivator Xyr1 in Trichoderma reesei.

    PubMed

    Cao, Yanli; Zheng, Fanglin; Wang, Lei; Zhao, Guolei; Chen, Guanjun; Zhang, Weixin; Liu, Weifeng

    2017-07-01

    Cellulase gene expression in the model cellulolytic fungus Trichoderma reesei is supposed to be controlled by an intricate regulatory network involving multiple transcription factors. Here, we identified a novel transcriptional repressor of cellulase gene expression, Rce1. Disruption of the rce1 gene not only facilitated the induced expression of cellulase genes but also led to a significant delay in terminating the induction process. However, Rce1 did not participate in Cre1-mediated catabolite repression. Electrophoretic mobility shift (EMSA) and DNase I footprinting assays in combination with chromatin immunoprecipitation (ChIP) demonstrated that Rce1 could bind directly to a cbh1 (cellobiohydrolase 1-encoding) gene promoter region containing a cluster of Xyr1 binding sites. Furthermore, competitive binding assays revealed that Rce1 antagonized Xyr1 from binding to the cbh1 promoter. These results indicate that intricate interactions exist between a variety of transcription factors to ensure tight and energy-efficient regulation of cellulase gene expression in T. reesei. This study also provides important clues regarding increased cellulase production in T. reesei. © 2017 John Wiley & Sons Ltd.

  11. An Adaptive Genetic Association Test Using Double Kernel Machines

    PubMed Central

    Zhan, Xiang; Epstein, Michael P.; Ghosh, Debashis

    2014-01-01

    Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study. PMID:26640602

  12. VTCdb: a gene co-expression database for the crop species Vitis vinifera (grapevine).

    PubMed

    Wong, Darren C J; Sweetman, Crystal; Drew, Damian P; Ford, Christopher M

    2013-12-16

    Gene expression datasets in model plants such as Arabidopsis have contributed to our understanding of gene function and how a single underlying biological process can be governed by a diverse network of genes. The accumulation of publicly available microarray data encompassing a wide range of biological and environmental conditions has enabled the development of additional capabilities including gene co-expression analysis (GCA). GCA is based on the understanding that genes encoding proteins involved in similar and/or related biological processes may exhibit comparable expression patterns over a range of experimental conditions, developmental stages and tissues. We present an open access database for the investigation of gene co-expression networks within the cultivated grapevine, Vitis vinifera. The new gene co-expression database, VTCdb (http://vtcdb.adelaide.edu.au/Home.aspx), offers an online platform for transcriptional regulatory inference in the cultivated grapevine. Using condition-independent and condition-dependent approaches, grapevine co-expression networks were constructed using the latest publicly available microarray datasets from diverse experimental series, utilising the Affymetrix Vitis vinifera GeneChip (16 K) and the NimbleGen Grape Whole-genome microarray chip (29 K), thus making it possible to profile approximately 29,000 genes (95% of the predicted grapevine transcriptome). Applications available with the online platform include the use of gene names, probesets, modules or biological processes to query the co-expression networks, with the option to choose between Affymetrix or Nimblegen datasets and between multiple co-expression measures. Alternatively, the user can browse existing network modules using interactive network visualisation and analysis via CytoscapeWeb. 
To demonstrate the utility of the database, we present examples from three fundamental biological processes (berry development, photosynthesis and flavonoid biosynthesis) whereby the recovered sub-networks reconfirm established plant gene functions and also identify novel associations. Together, we present valuable insights into grapevine transcriptional regulation by developing network models applicable to researchers in their prioritisation of gene candidates, for on-going study of biological processes related to grapevine development, metabolism and stress responses.
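
The co-expression principle described in this abstract (genes with comparable expression patterns across conditions are linked) can be sketched as a thresholded correlation network. This is only an illustration: Pearson correlation is one of several possible co-expression measures, and the function names, the threshold of 0.9, and the toy profiles are assumptions rather than anything taken from VTCdb.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy expression profiles over five hypothetical conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
for g1, g2, r in coexpression_edges(profiles):
    print(g1, g2, round(r, 3))
```

A real analysis would use many more conditions and usually a rank-based or otherwise robust measure; the hard threshold is kept here only to make the network construction explicit.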

  13. Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms.

    PubMed

    Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H

    2014-11-19

    Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). 
Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.

  14. Frozen gene pools - A future for species otherwise destined for extinction

    USGS Publications Warehouse

    Gee, G.F.

    1986-01-01

    Conclusion: Semen banks and ova and embryo banks can be practical methods to maintain gene pools. Gene pool preservation is desperately needed today due to the rapid decline in number of species and their habitat, a matter that is of concern to biologists, economists, and politicians worldwide. Techniques are available for the cryopreservation of semen from many animals (and embryos from a few mammals) and adaptations of these techniques to other animals should be possible. A frozen gene pool in conjunction with existing programs makes it possible to preserve gene pools at less cost or in some cases where no other alternative to extinction existed.

  15. Systems analysis of cis-regulatory motifs in C4 photosynthesis genes using maize and rice leaf transcriptomic data during a process of de-etiolation

    PubMed Central

    Xu, Jiajia; Bräutigam, Andrea; Weber, Andreas P. M.; Zhu, Xin-Guang

    2016-01-01

    Identification of potential cis-regulatory motifs controlling the development of C4 photosynthesis is a major focus of current research. In this study, we used time-series RNA-seq data collected from etiolated maize and rice leaf tissues sampled during a de-etiolation process to systematically characterize the expression patterns of C4-related genes and to further identify potential cis elements in five different genomic regions (i.e. promoter, 5′UTR, 3′UTR, intron, and coding sequence) of C4 orthologous genes. The results demonstrate that although most of the C4 genes show similar expression patterns, a number of them, including chloroplast dicarboxylate transporter 1, aspartate aminotransferase, and triose phosphate transporter, show shifted expression patterns compared with their C3 counterparts. A number of conserved short DNA motifs between maize C4 genes and their rice orthologous genes were identified not only in the promoter, 5′UTR, 3′UTR, and coding sequences, but also in the introns of core C4 genes. We also identified cis-regulatory motifs that exist in maize C4 genes and also in genes showing similar expression patterns as maize C4 genes but that do not exist in rice C3 orthologs, suggesting a possible recruitment of pre-existing cis-elements from genes unrelated to C4 photosynthesis into C4 photosynthesis genes during C4 evolution. PMID:27436282

  16. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

    PubMed

    Zhu, Huaiqiu; Hu, Gang-Qing; Yang, Yi-Fan; Wang, Jin; She, Zhen-Su

    2007-03-16

    Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. This paper describes a new prokaryotic genefinding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  17. Annotating novel genes by integrating synthetic lethals and genomic information

    PubMed Central

    Schöner, Daniel; Kalisch, Markus; Leisner, Christian; Meier, Lukas; Sohrmann, Marc; Faty, Mahamadou; Barral, Yves; Peter, Matthias; Gruissem, Wilhelm; Bühlmann, Peter

    2008-01-01

    Background Large scale screening for synthetic lethality serves as a common tool in yeast genetics to systematically search for genes that play a role in specific biological processes. Often the amounts of data resulting from a single large scale screen far exceed the capacities of experimental characterization of every identified target. Thus, there is need for computational tools that select promising candidate genes in order to reduce the number of follow-up experiments to a manageable size. Results We analyze synthetic lethality data for arp1 and jnm1, two spindle migration genes, in order to identify novel members in this process. To this end, we use an unsupervised statistical method that integrates additional information from biological data sources, such as gene expression, phenotypic profiling, RNA degradation and sequence similarity. Different from existing methods that require large amounts of synthetic lethal data, our method merely relies on synthetic lethality information from two single screens. Using a Multivariate Gaussian Mixture Model, we determine the best subset of features that assign the target genes to two groups. The approach identifies a small group of genes as candidates involved in spindle migration. Experimental testing confirms the majority of our candidates and we present she1 (YBL031W) as a novel gene involved in spindle migration. We applied the statistical methodology also to TOR2 signaling as another example. Conclusion We demonstrate the general use of Multivariate Gaussian Mixture Modeling for selecting candidate genes for experimental characterization from synthetic lethality data sets. For the given example, integration of different data sources contributes to the identification of genetic interaction partners of arp1 and jnm1 that play a role in the same biological process. PMID:18194531

  18. Transgenic Mouse Models of Childhood Onset Psychiatric Disorders

    PubMed Central

    Robertson, Holly R.; Feng, Guoping

    2011-01-01

    Childhood onset psychiatric disorders, such as Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorder (ASD), Mood Disorders, Obsessive Compulsive Spectrum Disorders (OCSD), and Schizophrenia (SZ), affect many school age children, leading to a lower quality of life, including difficulties in school and personal relationships that persist into adulthood. Currently, the causes of these psychiatric disorders are poorly understood, resulting in difficulty diagnosing affected children and insufficient treatment options. Family and twin studies implicate a genetic contribution for ADHD, ASD, Mood Disorders, OCSD, and SZ. Identification of candidate genes and chromosomal regions associated with a particular disorder provides targets for directed research, and understanding how these genes influence the disease state will provide valuable insights for improving the diagnosis and treatment of children with psychiatric disorders. Animal models are one important approach in the study of human diseases, allowing for the use of a variety of experimental approaches to dissect the contribution of a specific chromosomal or genetic abnormality in human disorders. While it is impossible to model an entire psychiatric disorder in a single animal model, these models can be extremely valuable in dissecting out the specific role of a gene, pathway, neuron subtype, or brain region in a particular abnormal behavior. In this review we discuss existing transgenic mouse models for childhood onset psychiatric disorders. We compare the strengths and weaknesses of various transgenic animal models proposed for each of the common childhood onset psychiatric disorders, and discuss future directions for the study of these disorders using cutting-edge genetic tools. PMID:21309772

  19. Topographical mapping of α- and β-keratins on developing chicken skin integuments: Functional interaction and evolutionary perspectives

    PubMed Central

    Wu, Ping; Ng, Chen Siang; Yan, Jie; Lai, Yung-Chih; Chen, Chih-Kuan; Lai, Yu-Ting; Wu, Siao-Man; Chen, Jiun-Jie; Luo, Weiqi; Widelitz, Randall B.; Li, Wen-Hsiung; Chuong, Cheng-Ming

    2015-01-01

    Avian integumentary organs include feathers, scales, claws, and beaks. They cover the body surface and serve various functions that help birds adapt to diverse environments. These keratinized structures are mainly composed of corneous materials made of α-keratins, which exist in all vertebrates, and β-keratins, which only exist in birds and reptiles. Here, members of the keratin gene families were used to study how gene family evolution contributes to novelty and adaptation, focusing on tissue morphogenesis. Using chicken as a model, we applied RNA-seq and in situ hybridization to map α- and β-keratin genes in various skin appendages at embryonic developmental stages. The data demonstrate that temporal and spatial α- and β-keratin expression is involved in establishing the diversity of skin appendage phenotypes. Embryonic feathers express a higher proportion of β-keratin genes than other skin regions. In feather filament morphogenesis, β-keratins show intricate complexity in diverse substructures of feather branches. To explore functional interactions, we used a retrovirus transgenic system to ectopically express mutant α- or antisense β-keratin forms. α- and β-keratins show mutual dependence, and mutations in either keratin type result in disrupted keratin networks and failure to form proper feather branches. Our data suggest that combinations of α- and β-keratin genes contribute to the morphological and structural diversity of different avian skin appendages, with feather-β-keratins conferring more possible composites in building intrafeather architecture complexity, setting up a platform of morphological evolution of functional forms in feathers. PMID:26598683

  20. A latent variable approach to study gene-environment interactions in the presence of multiple correlated exposures.

    PubMed

    Sánchez, Brisa N; Kang, Shan; Mukherjee, Bhramar

    2012-06-01

    Many existing cohort studies initially designed to investigate disease risk as a function of environmental exposures have collected genomic data in recent years with the objective of testing for gene-environment interaction (G × E) effects. In environmental epidemiology, interest in G × E arises primarily after a significant effect of the environmental exposure has been documented. Cohort studies often collect rich exposure data; as a result, assessing G × E effects in the presence of multiple exposure markers further increases the burden of multiple testing, an issue already present in both genetic and environmental health studies. Latent variable (LV) models have been used in environmental epidemiology to reduce dimensionality of the exposure data, gain power by reducing multiplicity issues via condensing exposure data, and avoid collinearity problems due to the presence of multiple correlated exposures. We extend the LV framework to characterize gene-environment interaction in the presence of multiple correlated exposures and genotype categories. Further, similar to what has been done in case-control G × E studies, we use the assumption of gene-environment (G-E) independence to boost the power of tests for interaction. The consequences of making this assumption, and the issue of how to explicitly model G-E association, have not been previously investigated in LV models. We postulate a hierarchy of assumptions about the LV model regarding the different forms of G-E dependence and show that making such assumptions may influence inferential results on the G, E, and G × E parameters. We implement a class of shrinkage estimators to data-adaptively trade off between the most restrictive and most flexible forms of the G-E dependence assumption and note that such a class of compromise estimators can serve as a benchmark of model adequacy in LV models. 
We demonstrate the methods with an example from the Early Life Exposures in Mexico City to Neuro-Toxicants Study of lead exposure, iron metabolism genes, and birth weight. © 2011, The International Biometric Society.

  1. Incorporating time-delays in S-System model for reverse engineering genetic networks.

    PubMed

    Chowdhury, Ahsan Raja; Chetty, Madhu; Vinh, Nguyen Xuan

    2013-06-18

    In any gene regulatory network (GRN), the complex interactions occurring amongst transcription factors and target genes can be either instantaneous or time-delayed. However, many existing modeling approaches currently applied for inferring GRNs are unable to represent both these interactions simultaneously. As a result, these approaches cannot detect important interactions of the other type. The S-System model, a differential equation-based approach which has been increasingly applied for modeling GRNs, also suffers from this limitation. In fact, all existing S-System-based modeling approaches have been designed to capture only instantaneous interactions, and are unable to infer time-delayed interactions. In this paper, we propose a novel Time-Delayed S-System (TDSS) model which uses a set of delay differential equations to represent the system dynamics. The ability to incorporate time-delay parameters in the proposed S-System model enables simultaneous modeling of both instantaneous and time-delayed interactions. Furthermore, the delay parameters are not limited to just positive integer values (corresponding to time stamps in the data), but can also take fractional values. Moreover, we also propose a new criterion for model evaluation exploiting the sparse and scale-free nature of GRNs to effectively narrow down the search space, which not only reduces the computation time significantly but also improves model accuracy. The evaluation criterion systematically adapts the max-min in-degrees and also systematically balances the effect of network accuracy and complexity during optimization. The four well-known performance measures applied to the experimental studies on synthetic networks with various time-delayed regulations clearly demonstrate that the proposed method can capture both instantaneous and delayed interactions correctly with high precision. 
The experiments carried out on two well-known real-life networks, namely IRMA and SOS DNA repair network in Escherichia coli show a significant improvement compared with other state-of-the-art approaches for GRN modeling.
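
To make the idea of an S-System with time delays concrete, the sketch below integrates one plausible delayed form, dX_i/dt = alpha_i * prod_j X_j(t - tau_ij)^g_ij - beta_i * prod_j X_j(t - tau_ij)^h_ij, with a forward Euler scheme. The exact parameterization used in the paper is not given in the abstract, so this form, the function name `simulate_tdss`, the constant pre-history, and the linear interpolation used for fractional delays are all assumptions made for illustration.

```python
def simulate_tdss(x0, alpha, beta, g, h, tau, dt=0.01, steps=1000):
    """Euler integration of an assumed time-delayed S-System.

    x0: initial concentrations (also used as the constant history for t <= 0)
    alpha, beta: production and degradation rate constants per gene
    g, h: kinetic-order matrices (g[i][j] is the effect of gene j on gene i)
    tau: delay matrix in time units; zero means an instantaneous interaction
    Returns the list of states, one per time step (including the initial one).
    """
    n = len(x0)
    history = [list(x0)]  # history[k] is the state at time k * dt

    def delayed(j, k, delay):
        # Value of X_j at time k*dt - delay, linearly interpolated so that
        # delays need not be integer multiples of the step size.
        pos = k - delay / dt
        if pos <= 0:
            return history[0][j]
        lo = int(pos)
        hi = min(lo + 1, k)
        frac = pos - lo
        return (1 - frac) * history[lo][j] + frac * history[hi][j]

    for k in range(steps):
        current = history[k]
        nxt = []
        for i in range(n):
            production, degradation = alpha[i], beta[i]
            for j in range(n):
                xd = delayed(j, k, tau[i][j])
                production *= xd ** g[i][j]
                degradation *= xd ** h[i][j]
            # Clamp to a small positive value so power laws stay defined.
            nxt.append(max(current[i] + dt * (production - degradation), 1e-12))
        history.append(nxt)
    return history


# One gene with constant production and delayed first-order degradation.
trajectory = simulate_tdss(
    x0=[0.5], alpha=[2.0], beta=[1.0],
    g=[[0.0]], h=[[1.0]], tau=[[0.355]], steps=2000,
)
print(trajectory[-1])
```

Setting an entry of `tau` to zero recovers an instantaneous interaction, so both interaction types coexist in a single parameter set; inferring the parameters from data, which is the subject of the paper, is not attempted here.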

  2. Natural killer cell receptor genes in the family Equidae: not only Ly49.

    PubMed

    Futas, Jan; Horin, Petr

    2013-01-01

    Natural killer (NK) cells have important functions in immunity. NK recognition in mammals can be mediated through killer cell immunoglobulin-like receptors (KIR) and/or killer cell lectin-like Ly49 receptors. Genes encoding highly variable NK cell receptors (NKR) represent rapidly evolving genomic regions. No single conservative model of NKR genes was observed in mammals. Single-copy low polymorphic NKR genes present in one mammalian species may expand into highly polymorphic multigene families in other species. In contrast to other non-rodent mammals, multiple Ly49-like genes appear to exist in the horse, while no functional KIR genes were observed in this species. In this study, Ly49 and KIR were sought and their evolution was characterized in the entire family Equidae. Genomic sequences retrieved showed the presence of at least five highly conserved polymorphic Ly49 genes in horses, asses and zebras. These findings confirmed that the expansion of Ly49 occurred in the entire family. Several KIR-like sequences were also identified in the genome of Equids. Besides a previously identified non-functional KIR-Immunoglobulin-like transcript fusion gene (KIR-ILTA) and two putative pseudogenes, a KIR3DL-like sequence was analyzed. In contrast to previous observations made in the horse, the KIR3DL sequence, genomic organization and mRNA expression suggest that all Equids might produce a functional KIR receptor protein molecule with a single non-mutated immune tyrosine-based inhibition motif (ITIM) domain. No evidence for positive selection in the KIR3DL gene was found. Phylogenetic analysis including rhinoceros and tapir genomic DNA and deduced amino acid KIR-related sequences showed differences between families and even between species within the order Perissodactyla. 
The results suggest that the order Perissodactyla and its family Equidae with expanded Ly49 genes and with a potentially functional KIR gene may represent an interesting model for evolutionary biology of NKR genes.

  3. Natural Killer Cell Receptor Genes in the Family Equidae: Not only Ly49

    PubMed Central

    Futas, Jan; Horin, Petr

    2013-01-01

    Natural killer (NK) cells have important functions in immunity. NK recognition in mammals can be mediated through killer cell immunoglobulin-like receptors (KIR) and/or killer cell lectin-like Ly49 receptors. Genes encoding highly variable NK cell receptors (NKR) represent rapidly evolving genomic regions. No single conservative model of NKR genes was observed in mammals. Single-copy low polymorphic NKR genes present in one mammalian species may expand into highly polymorphic multigene families in other species. In contrast to other non-rodent mammals, multiple Ly49-like genes appear to exist in the horse, while no functional KIR genes were observed in this species. In this study, Ly49 and KIR were sought and their evolution was characterized in the entire family Equidae. Genomic sequences retrieved showed the presence of at least five highly conserved polymorphic Ly49 genes in horses, asses and zebras. These findings confirmed that the expansion of Ly49 occurred in the entire family. Several KIR-like sequences were also identified in the genome of Equids. Besides a previously identified non-functional KIR-Immunoglobulin-like transcript fusion gene (KIR-ILTA) and two putative pseudogenes, a KIR3DL-like sequence was analyzed. In contrast to previous observations made in the horse, the KIR3DL sequence, genomic organization and mRNA expression suggest that all Equids might produce a functional KIR receptor protein molecule with a single non-mutated immune tyrosine-based inhibition motif (ITIM) domain. No evidence for positive selection in the KIR3DL gene was found. Phylogenetic analysis including rhinoceros and tapir genomic DNA and deduced amino acid KIR-related sequences showed differences between families and even between species within the order Perissodactyla. The results suggest that the order Perissodactyla and its family Equidae with expanded Ly49 genes and with a potentially functional KIR gene may represent an interesting model for evolutionary biology of NKR genes. PMID:23724088

  4. A post-gene silencing bioinformatics protocol for plant-defence gene validation and underlying process identification: case study of the Arabidopsis thaliana NPR1.

    PubMed

    Yocgo, Rosita E; Geza, Ephifania; Chimusa, Emile R; Mazandu, Gaston K

    2017-11-23

    Advances in forward and reverse genetic techniques have enabled the discovery and identification of several plant defence genes based on quantifiable disease phenotypes in mutant populations. Existing models for testing the effect of gene inactivation or genes causing these phenotypes do not take into account eventual uncertainty of these datasets and potential noise inherent in the biological experiment used, which may mask downstream analysis and limit the use of these datasets. Moreover, elucidating biological mechanisms driving the induced disease resistance and influencing these observable disease phenotypes has never been systematically tackled, eliciting the need for an efficient model to characterize completely the gene target under consideration. We developed a post-gene silencing bioinformatics (post-GSB) protocol which accounts for potential biases related to the disease phenotype datasets in assessing the contribution of the gene target to the plant defence response. The post-GSB protocol uses Gene Ontology semantic similarity and pathway dataset to generate enriched process regulatory network based on the functional degeneracy of the plant proteome to help understand the induced plant defence response. We applied this protocol to investigate the effect of the NPR1 gene silencing to changes in Arabidopsis thaliana plants following Pseudomonas syringae pathovar tomato strain DC3000 infection. Results indicated that the presence of a functionally active NPR1 reduced the plant's susceptibility to the infection, with about 99% of variability in Pseudomonas spore growth between npr1 mutant and wild-type samples. Moreover, the post-GSB protocol has revealed the coordinate action of target-associated genes and pathways through an enriched process regulatory network, summarizing the potential target-based induced disease resistance mechanism. This protocol can improve the characterization of the gene target and, potentially, elucidate induced defence response by more effectively utilizing available phenotype information and plant proteome functional knowledge.

  5. Visualization of RNA structure models within the Integrative Genomics Viewer.

    PubMed

    Busan, Steven; Weeks, Kevin M

    2017-07-01

    Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  6. Prediction and Validation of Disease Genes Using HeteSim Scores.

    PubMed

    Zeng, Xiangxiang; Liao, Yuanlu; Liu, Yuansheng; Zou, Quan

    2017-01-01

    Deciphering gene-disease associations is an important goal in biomedical research. In this paper, we use a novel relevance measure, called HeteSim, to prioritize candidate disease genes. Two methods based on heterogeneous networks constructed using protein-protein interaction, gene-phenotype associations, and phenotype-phenotype similarity are presented. In HeteSim_MultiPath (HSMP), HeteSim scores of different paths are combined with a constant that dampens the contributions of longer paths. In HeteSim_SVM (HSSVM), HeteSim scores are combined with a machine learning method. The 3-fold experiments show that our non-machine learning method HSMP performs better than the existing non-machine learning methods, while our machine learning method HSSVM obtains accuracy similar to that of the best existing machine learning method, CATAPULT. From the analysis of the top 10 predicted genes for different diseases, we found that HSSVM avoids the disadvantage of the existing machine learning based methods, which always predict similar genes for different diseases. The data sets and Matlab code for the two methods are freely available for download at http://lab.malab.cn/data/HeteSim/index.jsp.
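
    The path-combination idea described for HSMP can be illustrated with a minimal sketch. The abstract only says that a constant dampens longer paths; the exponential form beta ** length, the default value of beta, and the function name below are assumptions made for illustration, not details taken from the paper.

```python
def combine_path_scores(path_scores, beta=0.5):
    """Combine per-path relevance scores, damping longer paths.

    path_scores: iterable of (path_length, score) pairs, one per path type.
    beta: hypothetical damping constant in (0, 1); not specified in the abstract.
    """
    # Each score is weighted by beta raised to the path length, so a
    # length-2 path counts less than a length-1 path with the same score.
    return sum((beta ** length) * score for length, score in path_scores)


# Example with two made-up path scores for one gene-disease pair.
combined = combine_path_scores([(1, 0.8), (2, 0.4)], beta=0.5)
```

    With beta = 0.5 the length-1 path contributes 0.4 and the length-2 path 0.1, so shorter paths dominate the combined score.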

  7. Molecular genetic approaches to the study of cellular senescence.

    PubMed

    Goletz, T J; Smith, J R; Pereira-Smith, O M

    1994-01-01

    Cellular senescence is an inability of cells to synthesize DNA and divide, which results in a terminal loss of proliferation despite the maintenance of basic metabolic processes. Senescence has been proposed as a model for the study of aging at the cellular level, and the basis for this model system and its features have been summarized. Although strong experimental evidence exists to support the hypothesis that cellular senescence is a dominant active process, the mechanisms responsible for this phenomenon remain a mystery. Investigators have taken several approaches to gain a better understanding of senescence. Several groups have documented the differences between young and senescent cells, and others have identified changes that occur during the course of a cell's in vitro life span. Using molecular and biochemical approaches, important changes in gene expression and function of cell-cycle-associated products have been identified. The active production of an inhibitor of DNA synthesis has been demonstrated. This may represent the final step in a cascade of events governing senescence. The study of immortal cells which have escaped senescence has also provided useful information, particularly with regard to the genes governing the senescence program. These studies have identified four complementation groups for indefinite division, which suggests that there are at least four genes or gene pathways in the senescence program. Through the use of microcell-mediated chromosome transfer, chromosomes encoding senescence genes have been identified; efforts to clone these genes are ongoing.(ABSTRACT TRUNCATED AT 250 WORDS)

  8. Controlled insertional mutagenesis using a LINE-1 (ORFeus) gene-trap mouse model.

    PubMed

    O'Donnell, Kathryn A; An, Wenfeng; Schrum, Christina T; Wheelan, Sarah J; Boeke, Jef D

    2013-07-16

    A codon-optimized mouse LINE-1 element, ORFeus, exhibits dramatically higher retrotransposition frequencies compared with its native long interspersed element 1 counterpart. To establish a retrotransposon-mediated mouse model with regulatable and potent mutagenic capabilities, we generated a tetracycline (tet)-regulated ORFeus element harboring a gene-trap cassette. Here, we show that mice expressing tet-ORFeus broadly exhibit robust retrotransposition in somatic tissues when treated with doxycycline. Consistent with a significant mutagenic burden, we observed a reduced number of double transgenic animals when treated with high-level doxycycline during embryogenesis. Transgene induction in skin resulted in a white spotting phenotype due to somatic ORFeus-mediated mutations that likely disrupt melanocyte development. The data suggest a high level of transposition in melanocyte precursors and consequent mutation of genes important for melanoblast proliferation, differentiation, or migration. These findings reveal the utility of a retrotransposon-based mutagenesis system as an alternative to existing DNA transposon systems. Moreover, breeding these mice to different tet-transactivator/reversible tet-transactivator lines supports broad functionality of tet-ORFeus because of the potential for dose-dependent, tissue-specific, and temporal-specific mutagenesis.

  9. Mapping annotations with textual evidence using an scLDA model.

    PubMed

    Jin, Bo; Chen, Vicky; Chen, Lujia; Lu, Xinghua

    2011-01-01

    Most of the knowledge regarding genes and proteins is stored in biomedical literature as free text. Extracting information from complex biomedical texts demands techniques capable of inferring biological concepts from local text regions and mapping them to controlled vocabularies. To this end, we present a sentence-based correspondence latent Dirichlet allocation (scLDA) model which, when trained with a corpus of PubMed documents with known GO annotations, performs the following tasks: 1) learning major biological concepts from the corpus, 2) inferring the biological concepts existing within text regions (sentences), and 3) identifying the text regions in a document that provides evidence for the observed annotations. When applied to new gene-related documents, a trained scLDA model is capable of predicting GO annotations and identifying text regions as textual evidence supporting the predicted annotations. This study uses GO annotation data as a testbed; the approach can be generalized to other annotated data, such as MeSH and MEDLINE documents.

  10. Evaluating High-Throughput Ab Initio Gene Finders to Discover Proteins Encoded in Eukaryotic Pathogen Genomes Missed by Laboratory Techniques

    PubMed Central

    Goodswen, Stephen J.; Kennedy, Paul J.; Ellis, John T.

    2012-01-01

    Next generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen’s genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that have presently missed capture by laboratory techniques. An aim here is to identify possible patterns of prediction inaccuracies for gene finders as a whole irrespective of the target pathogen. All currently available ab initio gene finders are considered in the evaluation but only four fulfil high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen and consequently the evaluation results are inextricably linked to the availability and quality of the data. The pathogen, Toxoplasma gondii, is used to illustrate the evaluation methods. The results support current opinion that predicted exons by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results reveal some patterns of inaccuracy that are common to all gene finders and these inaccuracies may provide a focus area for future gene finder developers. PMID:23226328

  11. Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana.

    PubMed

    Zhang, Weixiong; Ruan, Jianhua; Ho, Tuan-Hua David; You, Youngsook; Yu, Taotao; Quatrano, Ralph S

    2005-07-15

    A fundamental problem of computational genomics is identifying the genes that respond to certain endogenous cues and environmental stimuli. This problem can be referred to as targeted gene finding. Since gene regulation is mainly determined by the binding of transcription factors and cis-regulatory DNA sequences, most existing gene annotation methods, which exploit the conservation of open reading frames, are not effective in finding target genes. A viable approach to targeted gene finding is to exploit the cis-regulatory elements that are known to be responsible for the transcription of target genes. Given such cis-elements, putative target genes whose promoters contain the elements can be identified. As a case study, we apply the above approach to predict the genes in the model plant Arabidopsis thaliana that are inducible by a phytohormone, abscisic acid (ABA), and by abiotic stresses such as drought, cold and salinity. We first construct and analyze two ABA-specific cis-elements, the ABA-responsive element (ABRE) and its coupling element (CE), in A. thaliana, based on their conservation in rice and other cereal plants. We then use the ABRE-CE module to identify putative ABA-responsive genes in A. thaliana. Based on RT-PCR verification and results from the literature, this method has an accuracy rate of 67.5% for the top 40 predictions. The cis-element based targeted gene finding approach is expected to be widely applicable since a large number of cis-elements in many species are available.
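
    The promoter-scanning step described above can be sketched as follows. This is only an illustration of the general idea: the two regular expressions are placeholder patterns standing in for the ABRE and CE models, and the 50 bp spacing limit, the forward-strand-only scan, and the function names are all assumptions rather than the authors' method.

```python
import re

# Placeholder motif patterns (assumptions, not the models built in the paper).
ABRE = re.compile(r"ACGTG[GT]C")
CE = re.compile(r"CGCGTG")


def has_module(promoter, max_gap=50):
    """Return True if an ABRE-like hit and a CE-like hit start within max_gap bp."""
    abre_starts = [m.start() for m in ABRE.finditer(promoter)]
    ce_starts = [m.start() for m in CE.finditer(promoter)]
    return any(abs(a - c) <= max_gap for a in abre_starts for c in ce_starts)


def candidate_genes(promoters, max_gap=50):
    """Return the sorted gene IDs whose promoter contains the two-element module.

    promoters: dict mapping a gene ID to its upstream sequence (forward strand).
    """
    return sorted(
        gene for gene, seq in promoters.items() if has_module(seq.upper(), max_gap)
    )


# Toy promoters (made-up sequences, not Arabidopsis data).
toy = {
    "geneA": "TTTACGTGGCAAAACGCGTGTT",               # both elements, close together
    "geneB": "TTTACGTGGCAAAA",                       # ABRE-like hit only
    "geneC": "ACGTGTC" + "A" * 100 + "CGCGTG",       # both elements, too far apart
}
hits = candidate_genes(toy)
```

    In this toy input only geneA satisfies the spacing constraint. A real analysis would also scan the reverse strand and rank hits, which this sketch omits.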

  12. A Morpholino-based screen to identify novel genes involved in craniofacial morphogenesis

    PubMed Central

    Melvin, Vida Senkus; Feng, Weiguo; Hernandez-Lagunas, Laura; Artinger, Kristin Bruk; Williams, Trevor

    2014-01-01

    BACKGROUND The regulatory mechanisms underpinning facial development are conserved between diverse species. Therefore, results from model systems provide insight into the genetic causes of human craniofacial defects. Previously, we generated a comprehensive dataset examining gene expression during development and fusion of the mouse facial prominences. Here, we used this resource to identify genes that have dynamic expression patterns in the facial prominences, but for which only limited information exists concerning developmental function. RESULTS This set of ~80 genes was used for a high throughput functional analysis in the zebrafish system using Morpholino gene knockdown technology. This screen revealed three classes of cranial cartilage phenotypes depending upon whether knockdown of the gene affected the neurocranium, viscerocranium, or both. The targeted genes that produced consistent phenotypes encoded proteins linked to transcription (meis1, meis2a, tshz2, vgll4l), signaling (pkdcc, vlk, macc1, wu:fb16h09), and extracellular matrix function (smoc2). The majority of these phenotypes were not altered by reduction of p53 levels, demonstrating that both p53 dependent and independent mechanisms were involved in the craniofacial abnormalities. CONCLUSIONS This Morpholino-based screen highlights new genes involved in development of the zebrafish craniofacial skeleton with wider relevance to formation of the face in other species, particularly mouse and human. PMID:23559552

  13. Distinct RNAi Pathways in the Regulation of Physiology and Development in the Fungus Mucor circinelloides.

    PubMed

    Ruiz-Vázquez, Rosa M; Nicolás, Francisco E; Torres-Martínez, Santiago; Garre, Victoriano

    2015-01-01

    The basal fungus Mucor circinelloides has become, in recent years, a valuable model to study RNA-mediated gene silencing or RNA interference (RNAi). Serendipitously discovered in the late 1900s, the gene silencing in M. circinelloides is a landscape of consensus and dissents. Although similar to other classical fungal models in the basic design of the essential machinery that is responsible for silencing of gene expression, the existence of small RNA molecules of different sizes generated during this process and the presence of a mechanism that amplifies the silencing signal, give it a unique identity. In addition, M. circinelloides combines the components of RNAi machinery to carry out functions that not only limit themselves to the defense against foreign genetic material, but it uses some of these elements to regulate the expression of its own genes. Thus, different combinations of RNAi elements produce distinct classes of endogenous small RNAs (esRNAs) that regulate different physiological and developmental processes in response to environmental signals. The recent discovery of a new RNAi pathway involved in the specific degradation of endogenous mRNAs, using a novel RNase protein, adds one more element to the exciting puzzle of the gene silencing in M. circinelloides, in addition to providing hints about the evolutionary origin of the RNAi mechanism. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. The aquatic animals' transcriptome resource for comparative functional analysis.

    PubMed

    Chou, Chih-Hung; Huang, Hsi-Yuan; Huang, Wei-Chih; Hsu, Sheng-Da; Hsiao, Chung-Der; Liu, Chia-Yu; Chen, Yu-Hung; Liu, Yu-Chen; Huang, Wei-Yun; Lee, Meng-Lin; Chen, Yi-Chang; Huang, Hsien-Da

    2018-05-09

    Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, there is currently no comprehensive resource for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. This aquatic animal transcriptome map database, dbATM, provides de novo transcriptome assembly, gene annotation and comparative analysis for more than twenty aquatic organisms without a draft genome. To improve the assembly quality, three computational tools (Trinity, Oases and SOAPdenovo-Trans) were employed to enhance individual transcriptome assembly, and CAP3 and CD-HIT-EST software were then used to merge these three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology and KEGG pathways. Furthermore, all aquatic animal genes are essential for comparative genomics tasks such as constructing homologous gene groups and BLAST databases and phylogenetic analysis. In conclusion, we establish a resource for non-model aquatic animals, which are of great economic and ecological importance, and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is now publicly accessible at http://dbATM.mbc.nctu.edu.tw/.

  15. Variants in the SMARCA4 gene was associated with coronary heart disease susceptibility in Chinese han population.

    PubMed

    Guo, Xuan; Wang, Xiaohong; Wang, Yuan; Zhang, Chunyan; Quan, Xiaohui; Zhang, Yan; Jia, Shan; Ma, Weidong; Fan, Yajie; Wang, Congxia

    2017-01-31

    Coronary heart disease (CHD) is the leading cause of death worldwide. Many single-nucleotide polymorphisms (SNPs) have been found to be related to the risk of CHD in previous studies. This study investigated whether polymorphism of the SMARCA4 gene is associated with CHD. Genotypes at five CHD-relevant SNPs were determined in 456 cases of incident CHD and 685 unaffected controls in a Chinese Han population and analyzed using the χ² test, genetic model analysis and haplotype analysis. Differences in continuous variables among subjects with the three genotypes of the related genes were assessed using ANOVA. We identified two susceptibility SNPs in the SMARCA4 gene that were potentially associated with a decreased risk of CHD: rs11879293 (OR, 0.74; 95% CI, 0.59-0.96; P = 0.012) and rs12232780 (OR, 0.70; 95% CI, 0.54-0.90; P = 0.005) were associated with a decreased risk of CHD under the log-additive model adjusted for gender and age. We also found significant differences in glucose concentrations among the genotypes of rs11879293 and rs1122608, and differences in serum LDL-C and HDL-C among the three genotypes of rs12232780. This study provides evidence that polymorphism of the SMARCA4 gene is associated with CHD development in the Chinese Han population.
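
    For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the sketch below computes a crude (unadjusted) odds ratio from a 2x2 table using the standard log-scale (Woolf) interval. It is not the study's analysis, which used a log-additive model adjusted for gender and age; the function name and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a, b: cases with / without the allele of interest.
    c, d: controls with / without the allele of interest.
    z: normal quantile (1.96 gives an approximate 95% interval).
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to exercise the function.
or_value, ci_low, ci_high = odds_ratio_ci(10, 20, 20, 10)
```

    An odds ratio below 1 whose interval excludes 1, as reported for the two SNPs above, is what is meant by an association with decreased risk.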

  16. The impact of rare variation on gene expression across tissues.

    PubMed

    Li, Xin; Kim, Yungil; Tsang, Emily K; Davis, Joe R; Damani, Farhan N; Chiang, Colby; Hess, Gaelen T; Zappala, Zachary; Strober, Benjamin J; Scott, Alexandra J; Li, Amy; Ganna, Andrea; Bassik, Michael C; Merker, Jason D; Hall, Ira M; Battle, Alexis; Montgomery, Stephen B

    2017-10-11

    Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
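
    The notion of a multi-tissue expression outlier can be illustrated with a small sketch. The approach below (per-tissue Z-scores summarized by the median across tissues, with a cutoff of 2) is an assumption made for illustration and is not stated in the abstract; the function name and toy data are likewise hypothetical.

```python
from statistics import mean, median, pstdev


def multi_tissue_outliers(expr, threshold=2.0):
    """Flag individuals with extreme expression of one gene across tissues.

    expr: {tissue: {individual: expression value}} for a single gene.
    threshold: hypothetical cutoff on the absolute median Z-score.
    Returns {individual: median Z-score} for flagged individuals; a negative
    value indicates underexpression, a positive value overexpression.
    """
    z_scores = {}
    for values in expr.values():
        xs = list(values.values())
        mu, sd = mean(xs), pstdev(xs)
        if sd == 0:
            continue  # no variation in this tissue, so no Z-scores
        for individual, x in values.items():
            z_scores.setdefault(individual, []).append((x - mu) / sd)
    summary = {ind: median(zs) for ind, zs in z_scores.items()}
    return {ind: z for ind, z in summary.items() if abs(z) >= threshold}


# Toy data: ten individuals, two tissues, one individual with high expression.
tissue = {f"ind{i}": 0.0 for i in range(9)}
tissue["ind9"] = 10.0
outliers = multi_tissue_outliers({"liver": dict(tissue), "lung": dict(tissue)})
```

    In the toy data only ind9 is flagged, with a median Z-score of 3.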

  17. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets.

    PubMed

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick

    2018-01-04

    ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Diurnal oscillations of soybean circadian clock and drought responsive genes.

    PubMed

    Marcolino-Gomes, Juliana; Rodrigues, Fabiana Aparecida; Fuganti-Pagliarini, Renata; Bendix, Claire; Nakayama, Thiago Jonas; Celaya, Brandon; Molinari, Hugo Bruno Correa; de Oliveira, Maria Cristina Neves; Harmon, Frank G; Nepomuceno, Alexandre

    2014-01-01

    Rhythms produced by the endogenous circadian clock play a critical role in allowing plants to respond and adapt to the environment. While there is a well-established regulatory link between the circadian clock and responses to abiotic stress in model plants, little is known of the circadian system in crop species like soybean. This study examines how drought impacts diurnal oscillation of both drought responsive and circadian clock genes in soybean. Drought stress induced marked changes in gene expression of several circadian clock-like components, such as LCL1-, GmELF4- and PRR-like genes, which had reduced expression in stressed plants. The same conditions produced a phase advance of expression for the GmTOC1-like, GmLUX-like and GmPRR7-like genes. Similarly, the rhythmic expression pattern of the soybean drought-responsive genes DREB-, bZIP-, GOLS-, RAB18- and Remorin-like changed significantly after plant exposure to drought. In silico analysis of promoter regions of these genes revealed the presence of cis-elements associated both with stress and circadian clock regulation. Furthermore, some soybean genes with upstream ABRE elements were responsive to abscisic acid treatment. Our results indicate that some connection between the drought response and the circadian clock may exist in soybean since (i) drought stress affects gene expression of circadian clock components and (ii) several stress responsive genes display diurnal oscillation in soybeans.

  19. Diurnal Oscillations of Soybean Circadian Clock and Drought Responsive Genes

    PubMed Central

    Marcolino-Gomes, Juliana; Rodrigues, Fabiana Aparecida; Fuganti-Pagliarini, Renata; Bendix, Claire; Nakayama, Thiago Jonas; Celaya, Brandon; Molinari, Hugo Bruno Correa; de Oliveira, Maria Cristina Neves; Harmon, Frank G.; Nepomuceno, Alexandre

    2014-01-01

    Rhythms produced by the endogenous circadian clock play a critical role in allowing plants to respond and adapt to the environment. While there is a well-established regulatory link between the circadian clock and responses to abiotic stress in model plants, little is known of the circadian system in crop species like soybean. This study examines how drought impacts diurnal oscillation of both drought responsive and circadian clock genes in soybean. Drought stress induced marked changes in gene expression of several circadian clock-like components, such as LCL1-, GmELF4- and PRR-like genes, which had reduced expression in stressed plants. The same conditions produced a phase advance of expression for the GmTOC1-like, GmLUX-like and GmPRR7-like genes. Similarly, the rhythmic expression pattern of the soybean drought-responsive genes DREB-, bZIP-, GOLS-, RAB18- and Remorin-like changed significantly after plant exposure to drought. In silico analysis of promoter regions of these genes revealed the presence of cis-elements associated both with stress and circadian clock regulation. Furthermore, some soybean genes with upstream ABRE elements were responsive to abscisic acid treatment. Our results indicate that some connection between the drought response and the circadian clock may exist in soybean since (i) drought stress affects gene expression of circadian clock components and (ii) several stress responsive genes display diurnal oscillation in soybeans. PMID:24475115

  20. The US business cycle: power law scaling for interacting units with complex internal structure

    NASA Astrophysics Data System (ADS)

    Ormerod, Paul

    2002-11-01

    In the social sciences, there is increasing evidence of the existence of power law distributions. The distribution of recessions in capitalist economies has recently been shown to follow such a distribution. The preferred explanation for this is self-organised criticality. Gene Stanley and colleagues propose an alternative, namely that power law scaling can arise from the interplay between random multiplicative growth and the complex structure of the units composing the system. This paper offers a parsimonious model of the US business cycle based on similar principles. The business cycle, along with long-term growth, is one of the two features which distinguish capitalism from all previously existing societies. Yet, economics lacks a satisfactory theory of the cycle. The source of cycles is posited in economic theory to be a series of random shocks which are external to the system. In this model, the cycle is an internal feature of the system, arising from the level of industrial concentration of the agents and the interactions between them. The model, in contrast to existing economic theories of the cycle, accounts for the key features of output growth in the US business cycle in the 20th century.

  1. Estimating true evolutionary distances under the DCJ model.

    PubMed

    Lin, Yu; Moret, Bernard M E

    2008-07-01

    Modern techniques can yield the ordering and strandedness of genes on each chromosome of a genome; such data already exists for hundreds of organisms. The evolutionary mechanisms through which the set of the genes of an organism is altered and reordered are of great interest to systematists, evolutionary biologists, comparative genomicists and biomedical researchers. Perhaps the most basic concept in this area is that of evolutionary distance between two genomes: under a given model of genomic evolution, how many events most likely took place to account for the difference between the two genomes? We present a method to estimate the true evolutionary distance between two genomes under the 'double-cut-and-join' (DCJ) model of genome rearrangement, a model under which a single multichromosomal operation accounts for all genomic rearrangement events: inversion, transposition, translocation, block interchange and chromosomal fusion and fission. Our method relies on a simple structural characterization of a genome pair and is both analytically and computationally tractable. We provide analytical results to describe the asymptotic behavior of genomes under the DCJ model, as well as experimental results on a wide variety of genome structures to exemplify the very high accuracy (and low variance) of our estimator. Our results provide a tool for accurate phylogenetic reconstruction from multichromosomal gene rearrangement data as well as a theoretical basis for refinements of the DCJ model to account for biological constraints. All of our software is available in source form under GPL at http://lcbb.epfl.ch.
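
    As background for the DCJ model, the sketch below computes the standard minimum (edit) DCJ distance between two genomes made only of circular chromosomes with identical gene content, using the known formula N - C (number of genes minus number of cycles in the adjacency graph). It does not implement the true-distance estimator described in the abstract; the function names and the genome encoding (lists of signed integers) are assumptions made for illustration.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbouring extremity.

    genome: list of circular chromosomes, each a list of signed gene IDs.
    Extremities are (gene, "t") for the tail and (gene, "h") for the head.
    """
    partner = {}
    for chrom in genome:
        n = len(chrom)
        for i in range(n):
            g, nxt = chrom[i], chrom[(i + 1) % n]
            # End of g that is left last when reading in the given direction.
            right = (abs(g), "h") if g > 0 else (abs(g), "t")
            # End of the next gene that is reached first.
            left = (abs(nxt), "t") if nxt > 0 else (abs(nxt), "h")
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance(genome_a, genome_b):
    """Minimum DCJ distance N - C for circular genomes on the same gene set."""
    pa, pb = adjacencies(genome_a), adjacencies(genome_b)
    n_genes = len(pa) // 2
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return n_genes - cycles


d_same = dcj_distance([[1, 2, 3]], [[1, 2, 3]])
d_inversion = dcj_distance([[1, 2, 3]], [[1, -2, 3]])
d_fission = dcj_distance([[1, 2]], [[1], [2]])
```

    Identical genomes give distance 0, while a single inversion or a single fission each gives distance 1. Linear chromosomes would need the extra path term of the full formula, which this sketch leaves out.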

  2. Discovering mutated driver genes through a robust and sparse co-regularized matrix factorization framework with prior information from mRNA expression patterns and interaction network.

    PubMed

    Xi, Jianing; Wang, Minghui; Li, Ao

    2018-06-05

    Discovery of mutated driver genes is one of the primary objectives in studying tumorigenesis. To discover driver genes that are mutated at relatively low frequency from somatic mutation data, many existing methods incorporate an interaction network as prior information. However, these existing network-based methods do not exploit prior information from mRNA expression patterns, which has also been shown to be highly informative about cancer progression. To incorporate prior information from both the interaction network and mRNA expression, we propose a robust and sparse co-regularized nonnegative matrix factorization to discover driver genes from mutation data. Furthermore, our framework applies Frobenius norm regularization to mitigate overfitting. A sparsity-inducing penalty is employed to obtain sparse scores in gene representations, from which the top-scoring genes are selected as driver candidates. Evaluation experiments with known benchmark genes indicate that the performance of our method benefits from the two types of prior information. Our method also outperforms the existing network-based methods and detects some driver genes that are not predicted by the competing methods. In summary, our proposed method can improve the performance of driver gene discovery by effectively incorporating prior information from interaction network and mRNA expression patterns into a robust and sparse co-regularized matrix factorization framework.

  3. FGWAS: Functional genome wide association analysis.

    PubMed

    Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-10-01

    Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Examination of AVPR1a as an autism susceptibility gene.

    PubMed

    Wassink, T H; Piven, J; Vieland, V J; Pietila, J; Goedken, R J; Folstein, S E; Sheffield, V C

    2004-10-01

    Impaired reciprocal social interaction is one of the core features of autism. While its determinants are complex, one biomolecular pathway that clearly influences social behavior is the arginine-vasopressin (AVP) system. The behavioral effects of AVP are mediated through the AVP receptor 1a (AVPR1a), making the AVPR1a gene a reasonable candidate for autism susceptibility. We tested the gene's contribution to autism by screening its exons in 125 independent autistic probands and genotyping two promoter polymorphisms in 65 autism affected sibling pair (ASP) families. While we found no nonconservative coding sequence changes, we did identify evidence of linkage and of linkage disequilibrium. These results were most pronounced in a subset of the ASP families with relatively less severe impairment of language. Thus, though we did not demonstrate a disease-causing variant in the coding sequence, numerous nontraditional disease-causing genetic abnormalities are known to exist that would escape detection by traditional gene screening methods. Given the emerging biological, animal model, and now genetic data, AVPR1a and genes in the AVP system remain strong candidates for involvement in autism susceptibility and deserve continued scrutiny.

  5. HTS-Net: An integrated regulome-interactome approach for establishing network regulation models in high-throughput screenings

    PubMed Central

    Rioualen, Claire; Da Costa, Quentin; Chetrit, Bernard; Charafe-Jauffret, Emmanuelle; Ginestier, Christophe

    2017-01-01

    High-throughput RNAi screenings (HTS) allow quantifying the impact of the deletion of each gene in any particular function, from virus-host interactions to cell differentiation. However, there has been less development of functional analysis tools dedicated to RNAi analyses. HTS-Net, a network-based analysis program, was developed to identify gene regulatory modules impacted in high-throughput screenings, by integrating transcription factors-target genes interaction data (regulome) and protein-protein interaction networks (interactome) on top of screening z-scores. HTS-Net produces exhaustive HTML reports for results navigation and exploration. HTS-Net is a new pipeline for RNA interference screening analyses that performs better than simple gene ranking by z-score, by re-prioritizing genes and placing them back in their biological context, as shown by the three studies that we reanalyzed. Formatted input data for the three studied datasets, source code and web site for testing the system are available from the companion web site at http://htsnet.marseille.inserm.fr/. We also compared our program with existing algorithms (CARD and hotnet2). PMID:28949986

  6. Diversity and complexity in chromatin recognition by TFII-I transcription factors in pluripotent embryonic stem cells and embryonic tissues.

    PubMed

    Makeyev, Aleksandr V; Enkhmandakh, Badam; Hong, Seung-Hyun; Joshi, Pujan; Shin, Dong-Guk; Bayarsaihan, Dashzeveg

    2012-01-01

    GTF2I and GTF2IRD1 encode a family of closely related transcription factors TFII-I and BEN critical in embryonic development. Both genes are deleted in Williams-Beuren syndrome, a complex genetic disorder associated with neurocognitive, craniofacial, dental and skeletal abnormalities. Although genome-wide promoter analysis has revealed the existence of multiple TFII-I binding sites in embryonic stem cells (ESCs), there was no correlation between TFII-I occupancy and gene expression. Surprisingly, TFII-I recognizes the promoter sequences enriched for H3K4me3/K27me3 bivalent domain, an epigenetic signature of developmentally important genes. Moreover, we discovered significant differences in the association between TFII-I and BEN with the cis-regulatory elements in ESCs and embryonic craniofacial tissues. Our data indicate that in embryonic tissues BEN, but not the highly homologous TFII-I, is primarily recruited to target gene promoters. We propose a "feed-forward model" of gene regulation to explain the specificity of promoter recognition by TFII-I factors in eukaryotic cells.

  7. Network Analysis of Rodent Transcriptomes in Spaceflight

    NASA Technical Reports Server (NTRS)

    Ramachandran, Maya; Fogle, Homer; Costes, Sylvain

    2017-01-01

    Network analysis methods leverage prior knowledge of cellular systems and the statistical and conceptual relationships between analyte measurements to determine gene connectivity. Correlation and conditional metrics are used to infer a network topology and provide a systems-level context for cellular responses. Integration across multiple experimental conditions and omics domains can reveal the regulatory mechanisms that underlie gene expression. GeneLab has assembled rich multi-omic (transcriptomics, proteomics, epigenomics, and epitranscriptomics) datasets for multiple murine tissues from the Rodent Research 1 (RR-1) experiment. RR-1 assesses the impact of 37 days of spaceflight on gene expression across a variety of tissue types, such as adrenal glands, quadriceps, gastrocnemius, tibialis anterior, extensor digitorum longus, soleus, eye, and kidney. Network analysis is particularly useful for RR-1 -omics datasets because it reinforces subtle relationships that may be overlooked in isolated analyses and subdues confounding factors. Our objective is to use network analysis to determine potential target nodes for therapeutic intervention and identify similarities with existing disease models. Multiple network algorithms are used for a higher confidence consensus.

  8. Systems analysis of cis-regulatory motifs in C4 photosynthesis genes using maize and rice leaf transcriptomic data during a process of de-etiolation.

    PubMed

    Xu, Jiajia; Bräutigam, Andrea; Weber, Andreas P M; Zhu, Xin-Guang

    2016-09-01

    Identification of potential cis-regulatory motifs controlling the development of C4 photosynthesis is a major focus of current research. In this study, we used time-series RNA-seq data collected from etiolated maize and rice leaf tissues sampled during a de-etiolation process to systematically characterize the expression patterns of C4-related genes and to further identify potential cis elements in five different genomic regions (i.e. promoter, 5'UTR, 3'UTR, intron, and coding sequence) of C4 orthologous genes. The results demonstrate that although most of the C4 genes show similar expression patterns, a number of them, including chloroplast dicarboxylate transporter 1, aspartate aminotransferase, and triose phosphate transporter, show shifted expression patterns compared with their C3 counterparts. A number of conserved short DNA motifs between maize C4 genes and their rice orthologous genes were identified not only in the promoter, 5'UTR, 3'UTR, and coding sequences, but also in the introns of core C4 genes. We also identified cis-regulatory motifs that exist in maize C4 genes and also in genes showing similar expression patterns as maize C4 genes but that do not exist in rice C3 orthologs, suggesting a possible recruitment of pre-existing cis-elements from genes unrelated to C4 photosynthesis into C4 photosynthesis genes during C4 evolution. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  9. Heterogeneous Stock Rat: A Unique Animal Model for Mapping Genes Influencing Bone Fragility

    PubMed Central

    Alam, Imranul; Koller, Daniel L.; Sun, Qiwei; Roeder, Ryan K.; Cañete, Toni; Blázquez, Gloria; López-Aumatell, Regina; Martínez-Membrives, Esther; Vicens-Costa, Elia; Mont, Carme; Díaz, Sira; Tobeña, Adolf; Fernández-Teruel, Alberto; Whitley, Adam; Strid, Pernilla; Diez, Margarita; Johannesson, Martina; Flint, Jonathan; Econs, Michael J.; Turner, Charles H.; Foroud, Tatiana

    2011-01-01

    Previously, we demonstrated that skeletal mass, structure and biomechanical properties vary considerably among 11 different inbred rat strains. Subsequently, we performed quantitative trait loci (QTL) analysis in 4 inbred rat strains (F344, LEW, COP and DA) for different bone phenotypes and identified several candidate genes influencing various bone traits. The standard approach to narrowing QTL intervals down to a few candidate genes typically employs the generation of congenic lines, which is time consuming and often not successful. A potential alternative approach is to use a highly genetically informative animal model resource capable of delivering very high-resolution gene mapping such as Heterogeneous stock (HS) rat. HS rat was derived from eight inbred progenitors: ACI/N, BN/SsN, BUF/N, F344/N, M520/N, MR/N, WKY/N and WN/N. The genetic recombination pattern generated across 50 generations in these rats has been shown to deliver ultra-high even gene-level resolution for complex genetic studies. The purpose of this study is to investigate the usefulness of the HS rat model for fine mapping and identification of genes underlying bone fragility phenotypes. We compared bone geometry, density and strength phenotypes at multiple skeletal sites in HS rats with those obtained from 5 of the 8 progenitor inbred strains. In addition, we estimated the heritability for different bone phenotypes in these rats and employed principal component analysis to explore relationships among bone phenotypes in the HS rats. Our study demonstrates that significant variability exists for different skeletal phenotypes in HS rats compared with their inbred progenitors. In addition, we estimated high heritability for several bone phenotypes and biologically interpretable factors explaining significant overall variability, suggesting that the HS rat model could be a unique genetic resource for rapid and efficient discovery of the genetic determinants of bone fragility. PMID:21334473

  10. Heterogeneous stock rat: a unique animal model for mapping genes influencing bone fragility.

    PubMed

    Alam, Imranul; Koller, Daniel L; Sun, Qiwei; Roeder, Ryan K; Cañete, Toni; Blázquez, Gloria; López-Aumatell, Regina; Martínez-Membrives, Esther; Vicens-Costa, Elia; Mont, Carme; Díaz, Sira; Tobeña, Adolf; Fernández-Teruel, Alberto; Whitley, Adam; Strid, Pernilla; Diez, Margarita; Johannesson, Martina; Flint, Jonathan; Econs, Michael J; Turner, Charles H; Foroud, Tatiana

    2011-05-01

    Previously, we demonstrated that skeletal mass, structure and biomechanical properties vary considerably among 11 different inbred rat strains. Subsequently, we performed quantitative trait loci (QTL) analysis in four inbred rat strains (F344, LEW, COP and DA) for different bone phenotypes and identified several candidate genes influencing various bone traits. The standard approach to narrowing QTL intervals down to a few candidate genes typically employs the generation of congenic lines, which is time consuming and often not successful. A potential alternative approach is to use a highly genetically informative animal model resource capable of delivering very high resolution gene mapping such as Heterogeneous stock (HS) rat. HS rat was derived from eight inbred progenitors: ACI/N, BN/SsN, BUF/N, F344/N, M520/N, MR/N, WKY/N and WN/N. The genetic recombination pattern generated across 50 generations in these rats has been shown to deliver ultra-high even gene-level resolution for complex genetic studies. The purpose of this study is to investigate the usefulness of the HS rat model for fine mapping and identification of genes underlying bone fragility phenotypes. We compared bone geometry, density and strength phenotypes at multiple skeletal sites in HS rats with those obtained from five of the eight progenitor inbred strains. In addition, we estimated the heritability for different bone phenotypes in these rats and employed principal component analysis to explore relationships among bone phenotypes in the HS rats. Our study demonstrates that significant variability exists for different skeletal phenotypes in HS rats compared with their inbred progenitors. In addition, we estimated high heritability for several bone phenotypes and biologically interpretable factors explaining significant overall variability, suggesting that the HS rat model could be a unique genetic resource for rapid and efficient discovery of the genetic determinants of bone fragility. 
    Copyright © 2010 Elsevier Inc. All rights reserved.

  11. NCKX3 was compensated by calcium transporting genes and bone resorption in a NCKX3 KO mouse model.

    PubMed

    Yang, Hyun; Ahn, Changhwan; Shin, Eun-Kyeong; Lee, Ji-Sun; An, Beum-Soo; Jeung, Eui-Bae

    2017-10-15

    Gene knockout is the most powerful tool for determination of gene function or permanent modification of the phenotypic characteristics of an animal. Existing methods for gene disruption are limited by their efficiency, time required for completion and potential for confounding off-target effects. In this study, a rapid single-step approach to knockout of a targeted gene in mice using zinc-finger nucleases (ZFNs) was demonstrated for generation of mutant (knockout; KO) alleles. Specifically, ZFNs to target the sodium/calcium/potassium exchanger 3 (NCKX3) gene in C57BL/6J mice were designed using the concept of this approach. NCKX3 KO mice were generated, and the phenotypic characterization and molecular regulation of active calcium transporting genes were assessed when mice were fed different calcium diets during growth. General phenotypes such as body weight and plasma ion levels showed no distinct abnormalities. Thus, the potassium/sodium/calcium exchanger of NCKX3 KO mice proceeded normally in this study. As a result, the compensatory molecular regulation of this mechanism was elucidated. Renal TRPV5 mRNA of NCKX3 KO mice increased in both male and female mice. Expression of TRPV6 mRNA was only down-regulated in the duodenum of male KO mice. Renal and duodenal expression of PTHR and VDR were not changed; however, GR mRNA expression was increased in the kidney of NCKX3 KO mice. Depletion of the NCKX3 gene in a KO mouse model showed loss of bone mineral contents and increased plasma parathyroid hormone, suggesting that NCKX3 may play a role in regulating calcium homeostasis. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Network Reconstruction From High-Dimensional Ordinary Differential Equations.

    PubMed

    Chen, Shizhe; Shojaie, Ali; Witten, Daniela M

    2017-01-01

    We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system nonparametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches. Supplementary materials for this article are available online.

  13. Novel Harmonic Regularization Approach for Variable Selection in Cox's Proportional Hazards Model

    PubMed Central

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression, and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on artificial datasets and four real microarray gene expression datasets, such as the diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods. PMID:25506389

  14. Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference

    PubMed Central

    Campbell, Kieran R.

    2016-01-01

    Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a ‘pseudotime’ where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference. PMID:27870852

  15. A new model for approximating RNA folding trajectories and population kinetics

    NASA Astrophysics Data System (ADS)

    Kirkpatrick, Bonnie; Hajiaghayi, Monir; Condon, Anne

    2013-01-01

    RNA participates both in functional aspects of the cell and in gene regulation. The interactions of these molecules are mediated by their secondary structure which can be viewed as a planar circle graph with arcs for all the chemical bonds between pairs of bases in the RNA sequence. The problem of predicting RNA secondary structure, specifically the chemically most probable structure, has many useful and efficient algorithms. This leaves RNA folding, the problem of predicting the dynamic behavior of RNA structure over time, as the main open problem. RNA folding is important for functional understanding because some RNA molecules change secondary structure in response to interactions with the environment. The full RNA folding model on at most O(3^n) secondary structures is the gold standard. We present a new subset approximation model for the full model, give methods to analyze its accuracy and discuss the relative merits of our model as compared with a pre-existing subset approximation. The main advantage of our model is that it generates Monte Carlo folding pathways with the same probabilities with which they are generated under the full model. The pre-existing subset approximation does not have this property.

  16. An overview of bioinformatics methods for modeling biological pathways in yeast

    PubMed Central

    Hou, Jie; Acharya, Lipi; Zhu, Dongxiao

    2016-01-01

    The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein–protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. PMID:26476430

  17. BoolNet--an R package for generation, reconstruction and analysis of Boolean networks.

    PubMed

    Müssel, Christoph; Hopfensitz, Martin; Kestler, Hans A

    2010-05-15

    As the study of information processing in living cells moves from individual pathways to complex regulatory networks, mathematical models and simulation become indispensable tools for analyzing the complex behavior of such networks and can provide deep insights into the functioning of cells. The dynamics of gene expression, for example, can be modeled with Boolean networks (BNs). These are mathematical models of low complexity, but have the advantage of being able to capture essential properties of gene-regulatory networks. However, current implementations of BNs only focus on different sub-aspects of this model and do not allow for a seamless integration into existing preprocessing pipelines. BoolNet efficiently integrates methods for synchronous, asynchronous and probabilistic BNs. This includes reconstructing networks from time series, generating random networks, robustness analysis via perturbation, Markov chain simulations, and identification and visualization of attractors. The package BoolNet is freely available from the R project at http://cran.r-project.org/ or http://www.informatik.uni-ulm.de/ni/mitarbeiter/HKestler/boolnet/ under Artistic License 2.0. Contact: hans.kestler@uni-ulm.de. Supplementary data are available at Bioinformatics online.
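
    To make the notions of synchronous updating and attractors concrete, here is a minimal Python sketch. It does not use or reproduce the BoolNet R package. The three-gene network, its rules, and all function names are hypothetical, chosen only so that the state space (8 states) can be enumerated exhaustively.

```python
from itertools import product

# Hypothetical three-gene network: each rule maps the current state
# to the next value of one gene.
RULES = {
    "A": lambda s: s["C"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: not s["B"],
}
GENES = sorted(RULES)  # fixed gene order: A, B, C


def step(state):
    """Synchronous update: all genes are updated at once from the same state."""
    current = dict(zip(GENES, state))
    return tuple(bool(RULES[g](current)) for g in GENES)


def attractor_from(state):
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen = {}
    path = []
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = step(state)
    cycle = path[seen[state]:]
    # Rotate so the smallest state comes first, so equal cycles compare equal.
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])


def attractors():
    """Enumerate all start states and collect the distinct attractors."""
    starts = product([False, True], repeat=len(GENES))
    return {attractor_from(s) for s in starts}


for cycle in sorted(attractors(), key=len):
    print(len(cycle), cycle)
```

    Exhaustive enumeration is only feasible for small networks, since the number of states doubles with every gene. For this toy network it finds one fixed point and one cycle of length two.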

  18. A computational approach to identify cellular heterogeneity and tissue-specific gene regulatory networks.

    PubMed

    Jambusaria, Ankit; Klomp, Jeff; Hong, Zhigang; Rafii, Shahin; Dai, Yang; Malik, Asrar B; Rehman, Jalees

    2018-06-07

    The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues. Using three pathway analysis techniques, namely gene set enrichment analysis (GSEA), parametric analysis of gene set enrichment (PGSEA), and our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness that was comparable to what was seen using existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database. HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes.

  19. Elucidating Cannabinoid Biology in Zebrafish (Danio rerio)

    PubMed Central

    Krug, Randall G.; Clark, Karl J.

    2015-01-01

    The number of annual cannabinoid users exceeds 100,000,000 globally and an estimated 9% of these individuals will suffer from dependency. Although exogenous cannabinoids, like those contained in marijuana, are known to exert their effects by disrupting the endocannabinoid system, a dearth of knowledge exists about the potential toxicological consequences on public health. Conversely, the endocannabinoid system represents a promising therapeutic target for a plethora of disorders because it functions to endogenously regulate a vast repertoire of physiological functions. Accordingly, the rapidly expanding field of cannabinoid biology has sought to leverage model organisms in order to provide both toxicological and therapeutic insights about altered endocannabinoid signaling. The primary goal of this manuscript is to review the existing field of cannabinoid research in the genetically tractable zebrafish model, focusing on the cannabinoid receptor genes, cnr1 and cnr2, and the genes that produce enzymes for synthesis and degradation of the cognate ligands anandamide and 2-arachidonylglycerol. Consideration is also given to research that has studied the effects of exposure to exogenous phytocannabinoids and synthetic cannabinoids that are known to interact with cannabinoid receptors. These results are considered in the context of either endocannabinoid gene expression or endocannabinoid gene function, and are integrated with findings from rodent studies. This provides the framework for a discussion of how zebrafish may be leveraged in the future to provide novel toxicological and therapeutic insights in the field of cannabinoid biology, which has become increasingly significant given recent trends in cannabis legislation. PMID:26192460

  20. Choroideremia research: Report and perspectives on the second international scientific symposium for choroideremia.

    PubMed

    Chan, Stephanie C; Bubela, Tania; Dimopoulos, Ioannis S; Freund, Paul R; Varkouhi, Amir K; MacDonald, Ian M

    2016-09-01

    To discuss progress in research on choroideremia (CHM) and related retinopathies with special emphasis on gene therapy approaches. Biomedical and clinical researchers from across the world as well as representatives of the social science research community were convened to the 2nd International Scientific Symposium for Choroideremia in Denver, Colorado in June 2014 to enhance our understanding of CHM and accelerate the translation of research to clinical application for the benefit of those affected by CHM. Pre-clinical research using cell and animal models continues to further our understanding in the pathogenesis of CHM as well as to demonstrate proof-of-concept for gene transfer strategies. With the advent of modern imaging technology, better outcome measures are being defined for upcoming clinical trials. Results from the first gene therapy trial in CHM show promise, with sustained visual improvement over 6 months post-treatment. Current and next-generation gene transfer approaches may make targeted vector delivery possible in the future for CHM and other inherited retinal diseases. While no accepted therapies exist for CHM, promising approaches using viral-vectored gene therapy and cell therapies are entering clinical trials for eye diseases, with gene therapy trials underway for CHM.

  1. Immunologic and gene expression profiles of spontaneous canine oligodendrogliomas.

    PubMed

    Filley, Anna; Henriquez, Mario; Bhowmik, Tanmoy; Tewari, Brij Nath; Rao, Xi; Wan, Jun; Miller, Margaret A; Liu, Yunlong; Bentley, R Timothy; Dey, Mahua

    2018-05-01

    Malignant glioma (MG), the most common primary brain tumor in adults, is extremely aggressive and uniformly fatal. Several treatment strategies have shown significant preclinical promise in murine models of glioma; however, none have produced meaningful clinical responses in human patients. We hypothesize that introduction of an additional preclinical animal model better approximating the complexity of human MG, particularly in interactions with host immune responses, will bridge the existing gap between these two stages of testing. Here, we characterize the immunologic landscape and gene expression profiles of spontaneous canine glioma and evaluate its potential for serving as such a translational model. RNA in situ hybridization, flow cytometry, and RNA sequencing were used to evaluate immune cell presence and gene expression in healthy and glioma-bearing canines. Similar to human MGs, canine gliomas demonstrated increased intratumoral immune cell infiltration (CD4+, CD8+ and CD4+Foxp3+ T cells). The peripheral blood of glioma-bearing dogs also contained a relatively greater proportion of CD4+Foxp3+ regulatory T cells and plasmacytoid dendritic cells. Tumors were strongly positive for PD-L1 expression and glioma-bearing animals also possessed a greater proportion of immune cells expressing the immune checkpoint receptors CTLA-4 and PD-1. Analysis of differentially expressed genes in our canine populations revealed several genetic changes paralleling those known to occur in human disease. Naturally occurring canine glioma has many characteristics closely resembling human disease, particularly with respect to genetic dysregulation and host immune responses to tumors, supporting its use as a translational model in the preclinical testing of prospective anti-glioma therapies proven successful in murine studies.

  2. Determination of nonlinear genetic architecture using compressed sensing.

    PubMed

    Ho, Chiu Man; Hsu, Stephen D H

    2015-01-01

    One of the fundamental problems of modern genomics is to extract the genetic architecture of a complex trait from a data set of individual genotypes and trait values. Establishing this important connection between genotype and phenotype is complicated by the large number of candidate genes, the potentially large number of causal loci, and the likely presence of some nonlinear interactions between different genes. Compressed Sensing methods obtain solutions to under-constrained systems of linear equations. These methods can be applied to the problem of determining the best model relating genotype to phenotype, and generally deliver better performance than simply regressing the phenotype against each genetic variant, one at a time. We introduce a Compressed Sensing method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. Our method uses L1-penalized regression applied to nonlinear functions of the sensing matrix. The computational and data resource requirements for our method are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using simulated human genomes and the small amount of currently available real data. A phase transition (i.e., dramatic and qualitative change) in the behavior of the algorithm indicates when sufficient data is available for its successful application. 
Our results indicate that predictive models for many complex traits, including a variety of human disease susceptibilities (e.g., with additive heritability h² ∼ 0.5), can be extracted from data sets comprised of n* ∼ 100s individuals, where s is the number of distinct causal variants influencing the trait. For example, given a trait controlled by ∼10k loci, roughly a million individuals would be sufficient for application of the method.
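    The core idea in this abstract, L1-penalized regression applied to nonlinear functions of the genotype matrix, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' code: genotypes are coded as ±1 rather than as allele counts, the only nonlinear terms are pairwise products, the trait is noiseless, and the penalty `lam=0.02` and the effect sizes are arbitrary. The solver is a plain coordinate-descent Lasso.

```python
import itertools
import random


def build_features(genotypes):
    """Return (names, rows): one linear term per locus plus all pairwise products."""
    p = len(genotypes[0])
    pairs = list(itertools.combinations(range(p), 2))
    names = [f"g{i}" for i in range(p)] + [f"g{i}*g{j}" for i, j in pairs]
    rows = [list(g) + [g[i] * g[j] for i, j in pairs] for g in genotypes]
    return names, rows


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(rows, y, lam, sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, m = len(rows), len(rows[0])
    cols = [[row[j] for row in rows] for j in range(m)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * m
    resid = list(y)  # residual y - X b, kept up to date after every update
    for _ in range(sweeps):
        for j in range(m):
            col = cols[j]
            # Correlation of column j with the residual that excludes column j.
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


# Simulated data (hypothetical): 400 individuals, 6 loci, 21 candidate terms.
rng = random.Random(0)
n, p = 400, 6
genotypes = [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
names, rows = build_features(genotypes)

# Sparse nonlinear architecture: two additive effects and one gene-gene interaction.
true_effects = {"g0": 1.0, "g3": -0.8, "g1*g5": 1.5}
y = [sum(w * row[names.index(k)] for k, w in true_effects.items()) for row in rows]

beta = lasso_cd(rows, y, lam=0.02)
ranked = sorted(zip(names, beta), key=lambda kv: -abs(kv[1]))
top3 = {name for name, _ in ranked[:3]}
print(ranked[:5])
```

    In this toy setting the three largest coefficients should correspond to the three simulated causal terms, including the interaction. The abstract's rule of thumb, n* ∼ 100s, is what links sample size to sparsity: with s ≈ 10,000 causal variants it gives 100 × 10,000 = 1,000,000 individuals.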

  3. Antibody profiling using a recombinant protein-based multiplex ELISA array accelerates recombinant vaccine development: Case study on red sea bream iridovirus as a reverse vaccinology model.

    PubMed

    Matsuyama, Tomomasa; Sano, Natsumi; Takano, Tomokazu; Sakai, Takamitsu; Yasuike, Motoshige; Fujiwara, Atushi; Kawato, Yasuhiko; Kurita, Jun; Yoshida, Kazunori; Shimada, Yukinori; Nakayasu, Chihaya

    2018-05-03

    Predicting antigens that would be protective is crucial for the development of recombinant vaccines using genome-based vaccine development, also known as reverse vaccinology. High-throughput antigen screening is effective for identifying vaccine target genes, particularly for pathogens for which minimal antigenicity data exist. Using red sea bream iridovirus (RSIV) as a research model, we developed an enzyme-linked immunosorbent assay (ELISA)-based array of 72 RSIV-derived recombinant antigens to profile antiviral antibody responses in convalescent Japanese amberjack (Seriola quinqueradiata). Two and three genes for which the products were unrecognized and recognized, respectively, by antibodies in convalescent serum were selected for recombinant vaccine preparation, and the protective effect was examined in infection tests using Japanese amberjack and greater amberjack (S. dumerili). No protection was provided by vaccines prepared from gene products unrecognized by convalescent serum antibodies. By contrast, two vaccines prepared from gene products recognized by serum antibodies induced protective immunity in both fish species. These results indicate that ELISA array screening is effective for identifying antigens that induce protective immune responses. As this method does not require culturing of pathogens, it is also suitable for identifying protective antigens to un-culturable etiologic agents. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters.

    PubMed

    Gagniuc, Paul; Ionescu-Tirgoviste, Constantin

    2012-09-28

    The main function of gene promoters appears to be the integration of different gene products in their biological pathways in order to maintain homeostasis. Generally, promoters have been classified into two major classes, namely TATA and CpG. Nevertheless, many genes using the same combinatorial formation of transcription factors have different gene expression patterns. Accordingly, we asked some fundamental questions: Why do certain genes have an overall predisposition for higher gene expression levels than others? What causes such a predisposition? Is there a structural relationship of these sequences in different tissues? Is there a strong phylogenetic relationship between promoters of closely related species? In order to gain valuable insights into different promoter regions, we obtained a series of image-based patterns which allowed us to identify 10 generic classes of promoters. A comprehensive analysis was undertaken for promoter sequences from Arabidopsis thaliana, Drosophila melanogaster, Homo sapiens and Oryza sativa, and a more extensive analysis of tissue-specific promoters in humans. We observed a clear preference for these species to use certain classes of promoters for specific biological processes. Moreover, in humans, we found that different tissues use distinct classes of promoters, reflecting an emerging promoter network. Depending on the tissue type, comparisons made between these classes of promoters reveal a complementarity between their patterns whereas some other classes of promoters have been observed to occur in competition. Furthermore, we also noticed the existence of some transitional states between these classes of promoters that may explain certain evolutionary mechanisms, which suggest a possible predisposition for specific levels of gene expression and perhaps for a different number of factors responsible for triggering gene expression. 
Our conclusions are based on comprehensive data from three different databases and a new computer model whose core uses the Kappa index of coincidence. To fully understand the connections between gene promoters and gene expression, we analyzed thousands of promoter sequences using our Kappa Index of Coincidence method and a specialized Optical Character Recognition (OCR) neural network. Under our criteria, 10 classes of promoters were detected. In addition, the existence of "transitional" promoters suggests that there is an evolutionary weighted continuum between classes, depending perhaps upon changes in their gene products.
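    For orientation, the classical kappa test measures the percentage of aligned positions at which two sequences carry the same symbol. The sketch below implements that definition, plus a self-comparison that averages the score over all shifts of a sequence against itself. The averaging scheme and both function names are assumptions for illustration; the abstract does not specify the authors' exact formulation, windowing, or how the values feed the image-based patterns.

```python
def kappa(a, b):
    """Percentage of aligned positions where sequences a and b coincide."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def self_kappa(seq):
    """Mean kappa of a sequence against every shifted copy of itself.

    Assumed formulation: for each shift u, seq[u:] is compared with seq[:-u].
    """
    if len(seq) < 2:
        raise ValueError("sequence must have at least two symbols")
    values = [kappa(seq[u:], seq[:-u]) for u in range(1, len(seq))]
    return sum(values) / len(values)


print(kappa("ACGT", "ACGA"))          # three of four positions coincide
print(self_kappa("TATAAAAGGC"))       # hypothetical promoter fragment
```

    Under this definition a homopolymer scores 100 and a sequence with no repeated symbol scores 0, so the value tracks how repetitive a stretch of promoter sequence is.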

  5. A mathematical model of breast cancer development, local treatment and recurrence.

    PubMed

    Enderling, Heiko; Chaplain, Mark A J; Anderson, Alexander R A; Vaidya, Jayant S

    2007-05-21

    Cancer development is a stepwise process through which normal somatic cells acquire mutations which enable them to escape their normal function in the tissue and become self-sufficient in survival. The number of mutations depends on the patient's age, genetic susceptibility and on the exposure of the patient to carcinogens throughout their life. It is believed that in every malignancy 4-6 crucial similar mutations have to occur on cancer-related genes. These genes are classified as oncogenes and tumour suppressor genes (TSGs) which gain or lose their function respectively, after they have received one mutative hit or both of their alleles have been knocked out. With the acquisition of each of the necessary mutations the transformed cell gains a selective advantage over normal cells, and the mutation will spread throughout the tissue via clonal expansion. We present a simplified model of this mutation and expansion process, in which we assume that the loss of two TSGs is sufficient to give rise to a cancer. Our mathematical model of the stepwise development of breast cancer verifies the idea that the normal mutation rate in genes is only sufficient to give rise to a tumour within a clinically observable time if a high number of breast stem cells and TSGs exist or genetic instability is involved as a driving force of the mutation pathway. Furthermore, our model shows that if a mutation occurred in stem cells pre-puberty, and formed a field of cells with this mutation through clonal formation of the breast, it is most likely that a tumour will arise from within this area. We then apply different treatment strategies, namely surgery and adjuvant external beam radiotherapy and targeted intraoperative radiotherapy (TARGIT) and use the model to identify different sources of local recurrence and analyse their prevention.

  6. TEMPORAL GENE INDUCTION PATTERNS IN SHEEPSHEAD MINNOWS EXPOSED TO 17-ESTRADIOL

    EPA Science Inventory

    Gene arrays provide a powerful method to examine changes in gene expression in fish due to chemical exposures in the environment. In this study, we expanded an existing gene array for sheepshead minnows (Cyprinodon variegatus) (SHM) and used it to examine temporal changes in gene...

  7. Recommended nomenclature for five mammalian carboxylesterase gene families: human, mouse, and rat genes and proteins.

    PubMed

    Holmes, Roger S; Wright, Matthew W; Laulederkind, Stanley J F; Cox, Laura A; Hosokawa, Masakiyo; Imai, Teruko; Ishibashi, Shun; Lehner, Richard; Miyazaki, Masao; Perkins, Everett J; Potter, Phillip M; Redinbo, Matthew R; Robert, Jacques; Satoh, Tetsuo; Yamashita, Tetsuro; Yan, Bingfan; Yokoi, Tsuyoshi; Zechner, Rudolf; Maltais, Lois J

    2010-10-01

    Mammalian carboxylesterase (CES or Ces) genes encode enzymes that participate in xenobiotic, drug, and lipid metabolism in the body and are members of at least five gene families. Tandem duplications have added more genes for some families, particularly for mouse and rat genomes, which has caused confusion in naming rodent Ces genes. This article describes a new nomenclature system for human, mouse, and rat carboxylesterase genes that identifies homolog gene families and allocates a unique name for each gene. The guidelines of human, mouse, and rat gene nomenclature committees were followed and "CES" (human) and "Ces" (mouse and rat) root symbols were used followed by the family number (e.g., human CES1). Where multiple genes were identified for a family or where a clash occurred with an existing gene name, a letter was added (e.g., human CES4A; mouse and rat Ces1a) that reflected gene relatedness among rodent species (e.g., mouse and rat Ces1a). Pseudogenes were named by adding "P" and a number to the human gene name (e.g., human CES1P1) or by using a new letter followed by ps for mouse and rat Ces pseudogenes (e.g., Ces2d-ps). Gene transcript isoforms were named by adding the GenBank accession ID to the gene symbol (e.g., human CES1_AB119995 or mouse Ces1e_BC019208). This nomenclature improves our understanding of human, mouse, and rat CES/Ces gene families and facilitates research into the structure, function, and evolution of these gene families. It also serves as a model for naming CES genes from other mammalian species.

  8. Understanding genetic regulatory networks

    NASA Astrophysics Data System (ADS)

    Kauffman, Stuart

    2003-04-01

    Random Boolean networks (RBN) were introduced about 35 years ago as first crude models of genetic regulatory networks. RBNs are comprised of N on-off genes, connected by a randomly assigned regulatory wiring diagram where each gene has K inputs, and each gene is controlled by a randomly assigned Boolean function. This procedure samples at random from the ensemble of all possible NK Boolean networks. The central ideas are to study the typical, or generic properties of this ensemble, and see 1) whether characteristic differences appear as K and biases in Boolean functions are introduced, and 2) whether a subclass of this ensemble has properties matching real cells. Such networks behave in an ordered or a chaotic regime, with a phase transition, "the edge of chaos" between the two regimes. Networks with continuous variables exhibit the same two regimes. Substantial evidence suggests that real cells are in the ordered regime. A key concept is that of an attractor. This is a reentrant trajectory of states of the network, called a state cycle. The central biological interpretation is that cell types are attractors. A number of properties differentiate the ordered and chaotic regimes. These include the size and number of attractors, the existence in the ordered regime of a percolating "sea" of genes frozen in the on or off state, with a remainder of isolated twinkling islands of genes, a power law distribution of avalanches of gene activity changes following perturbation to a single gene in the ordered regime versus a similar power law distribution plus a spike of enormous avalanches of gene changes in the chaotic regime, and the existence of branching pathways of "differentiation" between attractors induced by perturbations in the ordered regime. Noise is a serious issue, since noise disrupts attractors. But numerical evidence suggests that attractors can be made very stable to noise, and meanwhile, metaplasias may be a biological manifestation of noise. 
As we learn more about the wiring diagram and constraints on rules controlling real genes, we can build refined ensembles reflecting these properties, study the generic properties of the refined ensembles, and hope to gain insight into the dynamics of real cells.
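    The construction described in this abstract (N on-off genes, K randomly wired inputs per gene, one random Boolean function per gene, and attractors as state cycles) can be sketched in a few lines. The choices below are assumptions for illustration: synchronous updating, unbiased truth tables, inputs drawn without replacement, and the particular values N = 12 and K = 2.

```python
import random


def make_network(n, k, rng):
    """Random NK network: k input genes and a random truth table for each gene."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously update every gene from the current states of its inputs."""
    nxt = []
    for gene_inputs, table in zip(inputs, tables):
        index = 0
        for src in gene_inputs:
            index = (index << 1) | state[src]  # pack input bits into a table index
        nxt.append(table[index])
    return tuple(nxt)


def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient length, cycle length, cycle state)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    transient = seen[state]
    return transient, t - transient, state


rng = random.Random(1)
n_genes, k_inputs = 12, 2
inputs, tables = make_network(n_genes, k_inputs, rng)
start = tuple(rng.randint(0, 1) for _ in range(n_genes))
transient, cycle_length, cycle_state = find_attractor(start, inputs, tables)
print(transient, cycle_length)
```

    Because the state space is finite (2^N states) and the update is deterministic, every trajectory must eventually revisit a state, which is why the search always terminates on a state cycle. Repeating the search from many random start states gives a rough count of distinct attractors for one sampled network.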

  9. Transposases are the most abundant, most ubiquitous genes in nature.

    PubMed

    Aziz, Ramy K; Breitbart, Mya; Edwards, Robert A

    2010-07-01

    Genes, like organisms, struggle for existence, and the most successful genes persist and widely disseminate in nature. The unbiased determination of the most successful genes requires access to sequence data from a wide range of phylogenetic taxa and ecosystems, which has finally become achievable thanks to the deluge of genomic and metagenomic sequences. Here, we analyzed 10 million protein-encoding genes and gene tags in sequenced bacterial, archaeal, eukaryotic and viral genomes and metagenomes, and our analysis demonstrates that genes encoding transposases are the most prevalent genes in nature. The finding that these genes, classically considered as selfish genes, outnumber essential or housekeeping genes suggests that they offer selective advantage to the genomes and ecosystems they inhabit, a hypothesis in agreement with an emerging body of literature. Their mobile nature not only promotes dissemination of transposable elements within and between genomes but also leads to mutations and rearrangements that can accelerate biological diversification and--consequently--evolution. By securing their own replication and dissemination, transposases guarantee to thrive so long as nucleic acid-based life forms exist.

  10. Extending gene ontology with gene association networks.

    PubMed

    Peng, Jiajie; Wang, Tao; Wang, Jixuan; Wang, Yadong; Chen, Jin

    2016-04-15

    Gene ontology (GO) is a widely used resource to describe the attributes of gene products. However, automatic GO maintenance remains difficult because of the complex logical reasoning and the need for biological knowledge that are not explicitly represented in the GO. Existing studies either construct the whole GO based on network data or only infer the relations between existing GO terms. None is designed to add new terms automatically to the existing GO. We proposed a new algorithm, 'GOExtender', to efficiently identify all the connected gene pairs labeled by the same parent GO terms. GOExtender is used to predict new GO terms with biological network data, and connect them to the existing GO. Evaluation tests on biological process and cellular component categories of different GO releases showed that GOExtender can extend new GO terms automatically based on the biological network. Furthermore, we applied GOExtender to the recent release of GO and discovered new GO terms with strong support from literature. Software and supplementary document are available at www.msu.edu/%7Ejinchen/GOExtender. Contact: jinchen@msu.edu or ydwang@hit.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. Gene refashioning through innovative shifting of reading frames in mosses.

    PubMed

    Guan, Yanlong; Liu, Li; Wang, Qia; Zhao, Jinjie; Li, Ping; Hu, Jinyong; Yang, Zefeng; Running, Mark P; Sun, Hang; Huang, Jinling

    2018-04-19

    Early-diverging land plants such as mosses are known for their outstanding abilities to grow in various terrestrial habitats, incorporating tremendous structural and physiological innovations, as well as many lineage-specific genes. How these genes and functional innovations evolved remains unclear. In this study, we show that a dual-coding gene YAN/AltYAN in the moss Physcomitrella patens evolved from a pre-existing hemerythrin gene. Experimental evidence indicates that YAN/AltYAN is involved in fatty acid and lipid metabolism, as well as oil body and wax formation. Strikingly, both the recently evolved dual-coding YAN/AltYAN and the pre-existing hemerythrin gene might have similar physiological effects on oil body biogenesis and dehydration resistance. These findings bear important implications in understanding the mechanisms of gene origination and the strategies of plants to fine-tune their adaptation to various habitats.

  12. Revisiting GMOs: Are There Differences in European Consumers’ Acceptance and Valuation for Cisgenically vs Transgenically Bred Rice?

    PubMed Central

    Delwaide, Anne-Cécile; Nalley, Lawton L.; Dixon, Bruce L.; Danforth, Diana M.; Nayga, Rodolfo M.; Van Loo, Ellen J.; Verbeke, Wim

    2015-01-01

    Both cisgenesis and transgenesis are plant breeding techniques that can be used to introduce new genes into plant genomes. However, transgenesis uses gene(s) from a non-plant organism or from a donor plant that is sexually incompatible with the recipient plant while cisgenesis involves the introduction of gene(s) from a crossable—sexually compatible—plant. Traditional breeding techniques could possibly achieve the same results as those from cisgenesis, but would require a much larger timeframe. Cisgenesis allows plant breeders to enhance an existing cultivar more quickly and with little to no genetic drag. The current regulation in the European Union (EU) on genetically modified organisms (GMOs) treats cisgenic plants the same as transgenic plants and both are mandatorily labeled as GMOs. This study estimates European consumers’ willingness-to-pay (WTP) for rice labeled as GM, cisgenic, with environmental benefits (which cisgenesis could provide), or any combination of these three attributes. Data were collected from 3,002 participants through an online survey administered in Belgium, France, the Netherlands, Spain and the United Kingdom in 2013. Censored regression models were used to model consumers’ WTP in each country. Model estimates highlight significant differences in WTP across countries. In all five countries, consumers are willing-to-pay a premium to avoid purchasing rice labeled as GM. In all countries except Spain, consumers have a significantly higher WTP to avoid consuming rice labeled as GM compared to rice labeled as cisgenic, suggesting that inserting genes from the plant’s own gene pool is more acceptable to consumers. Additionally, French consumers are willing-to-pay a premium for rice labeled as having environmental benefits compared to conventional rice. 
These findings suggest that not all GMOs are the same in consumers’ eyes and thus, from a consumer preference perspective, the differences between transgenic and cisgenic products are recommended to be reflected in GMO labeling and trade policies. PMID:25973946

  13. The anabolic/androgenic steroid nandrolone exacerbates gene expression modifications induced by mutant SOD1 in muscles of mice models of amyotrophic lateral sclerosis

    PubMed Central

    Galbiati, Mariarita; Onesto, Elisa; Zito, Arianna; Crippa, Valeria; Rusmini, Paola; Mariotti, Raffaella; Bentivoglio, Marina; Bendotti, Caterina; Poletti, Angelo

    2012-01-01

    Anabolic/androgenic steroids (AAS) are drugs that enhance muscle mass, and are often illegally utilized in athletes to improve their performances. Recent data suggest that the increased risk for amyotrophic lateral sclerosis (ALS) in male soccer and football players could be linked to AAS abuse. ALS is a motor neuron disease mainly occurring in sporadic (sALS) forms, but some familial forms (fALS) exist and have been linked to mutations in different genes. Some of these, in their wild type (wt) form, have been proposed as risk factors for sALS, i.e. superoxide dismutase 1 (SOD1) gene, whose mutations are causative of about 20% of fALS. Notably, SOD1 toxicity might occur both in motor neurons and in muscle cells. Using gastrocnemius muscles of mice overexpressing human mutant SOD1 (mutSOD1) at different disease stages, we found that the expression of a selected set of genes associated to muscle atrophy, MyoD, myogenin, atrogin-1, and transforming growth factor (TGF)β1, is up-regulated already at the presymptomatic stage. Atrogin-1 gene expression was increased also in mice overexpressing human wtSOD1. Similar alterations were found in axotomized mouse muscles and in cultured ALS myoblast models. In these ALS models, we then evaluated the pharmacological effects of the synthetic AAS nandrolone on the expression of the genes modified in ALS muscle. Nandrolone administration had no effects on MyoD, myogenin, and atrogin-1 expression, but it significantly increased TGFβ1 expression at disease onset. Altogether, these data suggest that, in fALS, muscle gene expression is altered at early stages, and AAS may exacerbate some of the alterations induced by SOD1 possibly acting as a contributing factor also in sALS. PMID:22178654

  14. The anabolic/androgenic steroid nandrolone exacerbates gene expression modifications induced by mutant SOD1 in muscles of mice models of amyotrophic lateral sclerosis.

    PubMed

    Galbiati, Mariarita; Onesto, Elisa; Zito, Arianna; Crippa, Valeria; Rusmini, Paola; Mariotti, Raffaella; Bentivoglio, Marina; Bendotti, Caterina; Poletti, Angelo

    2012-02-01

    Anabolic/androgenic steroids (AAS) are drugs that enhance muscle mass, and are often illegally utilized in athletes to improve their performances. Recent data suggest that the increased risk for amyotrophic lateral sclerosis (ALS) in male soccer and football players could be linked to AAS abuse. ALS is a motor neuron disease mainly occurring in sporadic (sALS) forms, but some familial forms (fALS) exist and have been linked to mutations in different genes. Some of these, in their wild type (wt) form, have been proposed as risk factors for sALS, i.e. superoxide dismutase 1 (SOD1) gene, whose mutations are causative of about 20% of fALS. Notably, SOD1 toxicity might occur both in motor neurons and in muscle cells. Using gastrocnemius muscles of mice overexpressing human mutant SOD1 (mutSOD1) at different disease stages, we found that the expression of a selected set of genes associated to muscle atrophy, MyoD, myogenin, atrogin-1, and transforming growth factor (TGF)β1, is up-regulated already at the presymptomatic stage. Atrogin-1 gene expression was increased also in mice overexpressing human wtSOD1. Similar alterations were found in axotomized mouse muscles and in cultured ALS myoblast models. In these ALS models, we then evaluated the pharmacological effects of the synthetic AAS nandrolone on the expression of the genes modified in ALS muscle. Nandrolone administration had no effects on MyoD, myogenin, and atrogin-1 expression, but it significantly increased TGFβ1 expression at disease onset. Altogether, these data suggest that, in fALS, muscle gene expression is altered at early stages, and AAS may exacerbate some of the alterations induced by SOD1 possibly acting as a contributing factor also in sALS. Copyright © 2011 Elsevier Ltd. All rights reserved.

  15. Revisiting GMOs: Are There Differences in European Consumers' Acceptance and Valuation for Cisgenically vs Transgenically Bred Rice?

    PubMed

    Delwaide, Anne-Cécile; Nalley, Lawton L; Dixon, Bruce L; Danforth, Diana M; Nayga, Rodolfo M; Van Loo, Ellen J; Verbeke, Wim

    2015-01-01

    Both cisgenesis and transgenesis are plant breeding techniques that can be used to introduce new genes into plant genomes. However, transgenesis uses gene(s) from a non-plant organism or from a donor plant that is sexually incompatible with the recipient plant while cisgenesis involves the introduction of gene(s) from a crossable--sexually compatible--plant. Traditional breeding techniques could possibly achieve the same results as those from cisgenesis, but would require a much larger timeframe. Cisgenesis allows plant breeders to enhance an existing cultivar more quickly and with little to no genetic drag. The current regulation in the European Union (EU) on genetically modified organisms (GMOs) treats cisgenic plants the same as transgenic plants and both are mandatorily labeled as GMOs. This study estimates European consumers' willingness-to-pay (WTP) for rice labeled as GM, cisgenic, with environmental benefits (which cisgenesis could provide), or any combination of these three attributes. Data were collected from 3,002 participants through an online survey administered in Belgium, France, the Netherlands, Spain and the United Kingdom in 2013. Censored regression models were used to model consumers' WTP in each country. Model estimates highlight significant differences in WTP across countries. In all five countries, consumers are willing-to-pay a premium to avoid purchasing rice labeled as GM. In all countries except Spain, consumers have a significantly higher WTP to avoid consuming rice labeled as GM compared to rice labeled as cisgenic, suggesting that inserting genes from the plant's own gene pool is more acceptable to consumers. Additionally, French consumers are willing-to-pay a premium for rice labeled as having environmental benefits compared to conventional rice. 
These findings suggest that not all GMOs are the same in consumers' eyes and thus, from a consumer preference perspective, the differences between transgenic and cisgenic products are recommended to be reflected in GMO labeling and trade policies.

  16. Tightly Regulated Expression of Autographa californica Multicapsid Nucleopolyhedrovirus Immediate Early Genes Emerges from Their Interactions and Possible Collective Behaviors

    PubMed Central

    Taka, Hitomi; Asano, Shin-ichiro; Matsuura, Yoshiharu; Bando, Hisanori

    2015-01-01

    To infect their hosts, DNA viruses must successfully initiate the expression of viral genes that control subsequent viral gene expression and manipulate the host environment. Viral genes that are immediately expressed upon infection play critical roles in the early infection process. In this study, we investigated the expression and regulation of five canonical regulatory immediate-early (IE) genes of Autographa californica multicapsid nucleopolyhedrovirus: ie0, ie1, ie2, me53, and pe38. A systematic transient gene-expression analysis revealed that these IE genes are generally transactivators, suggesting the existence of a highly interactive regulatory network. A genetic analysis using gene knockout viruses demonstrated that the expression of these IE genes was tolerant to the single deletions of activator IE genes in the early stage of infection. A network graph analysis on the regulatory relationships observed in the transient expression analysis suggested that the robustness of IE gene expression is due to the organization of the IE gene regulatory network and how each IE gene is activated. However, some regulatory relationships detected by the genetic analysis were contradictory to those observed in the transient expression analysis, especially for IE0-mediated regulation. Statistical modeling, combined with genetic analysis using knockout alleles for ie0 and ie1, showed that the repressor function of ie0 was due to the interaction between ie0 and ie1, not ie0 itself. Taken together, these systematic approaches provided insight into the topology and nature of the IE gene regulatory network. PMID:25816136

  17. Proceedings of the International Summit on Human Gene Editing: a global discussion-Washington, D.C., December 1-3, 2015.

    PubMed

    LaBarbera, Andrew R

    2016-09-01

    The US Academies of Sciences and Medicine, the Royal Society, and the Chinese Academy of Sciences convened a summit of experts in biology, medicine, law, ethics, sociology, and journalism, in December 2015 to review the state of the art in gene editing technology and discuss the medical and social ramifications of the technologies. The summit concluded with the following consensus recommendations: (1) intensive basic and preclinical research in animal and human models should proceed with appropriate legal and ethical oversight; (2) clinical applications in somatic cells must be rigorously evaluated within existing and evolving regulatory frameworks for gene therapy; (3) it would be irresponsible to proceed with any clinical use of germline editing until relevant safety and efficacy issues have been resolved and there is broad societal consensus about such a use; and (4) the international community should strive to establish generally acceptable uses of human germline editing.

  18. Stem cell and genetic therapies for the fetus.

    PubMed

    Roybal, Jessica L; Santore, Matthew T; Flake, Alan W

    2010-02-01

    Advances in prenatal diagnosis have led to the prenatal management of a variety of congenital diseases. Although prenatal stem cell and gene therapy await clinical application, they offer tremendous potential for the treatment of many genetic disorders. Normal developmental events in the fetus offer unique biologic advantages for the engraftment of hematopoietic stem cells and efficient gene transfer that are not present after birth. Although barriers to hematopoietic stem cell engraftment exist, progress has been made and preclinical studies are now underway for strategies based on prenatal tolerance induction to facilitate postnatal cellular transplantation. Similarly, in-utero gene therapy shows experimental promise for a host of diseases and proof-in-principle has been demonstrated in murine models, but ethical and safety issues still need to be addressed. Here we review the current status and future potential of prenatal cellular and genetic therapy. Copyright 2009 Elsevier Ltd. All rights reserved.

  19. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

    PubMed

    Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar

    2016-04-01

    Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. These datasets often contain a large number of expression measurements, but only a fraction of them correspond to genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
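    A MapReduce-style K-nearest-neighbor classifier can be sketched as follows: each mapper finds the k nearest labeled samples within its own data partition, and the reducer merges those local candidates into the global k nearest before taking a majority vote. This is an in-process illustration only, not the paper's Hadoop implementation; the partitioning, the squared Euclidean distance, the function names, and the toy two-gene expression profiles with "ALL"/"AML" labels are all assumptions.

```python
import heapq
from collections import Counter
from functools import reduce


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(partition, query, k):
    """Mapper: emit the k nearest (distance, label) pairs found in one partition."""
    scored = ((sq_dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_candidates(left, right, k):
    """Reducer: merge two candidate lists and keep the k nearest overall."""
    return heapq.nsmallest(k, left + right)


def mr_knn_classify(partitions, query, k):
    """Classify query by majority vote among its k nearest neighbours."""
    mapped = [map_partition(p, query, k) for p in partitions]
    nearest = reduce(lambda a, b: reduce_candidates(a, b, k), mapped)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data split across three partitions (one per mapper).
partitions = [
    [((1.0, 1.2), "ALL"), ((0.9, 1.0), "ALL")],
    [((3.0, 3.1), "AML"), ((3.2, 2.9), "AML")],
    [((1.1, 0.8), "ALL"), ((2.9, 3.3), "AML")],
]

print(mr_knn_classify(partitions, (1.0, 1.0), k=3))
print(mr_knn_classify(partitions, (3.0, 3.0), k=3))
```

    The merge is exact rather than approximate: any sample among the global k nearest is necessarily among the k nearest of its own partition, so keeping k candidates per mapper loses nothing while limiting the data sent to the reducer.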

  20. Cell-bound lipases from Burkholderia sp. ZYB002: gene sequence analysis, expression, enzymatic characterization, and 3D structural model.

    PubMed

    Shu, Zhengyu; Lin, Hong; Shi, Shaolei; Mu, Xiangduo; Liu, Yanru; Huang, Jianzhong

    2016-05-03

    The whole-cell lipase from Burkholderia cepacia has been used as a biocatalyst in organic synthesis. However, there is no report in the literature on the components or the gene sequence of the cell-bound lipase from this species. Qualitative analysis of the cell-bound lipase would help to illuminate the regulation mechanism of gene expression and further improve the yield of the cell-bound lipase by gene engineering. Three predicted cell-bound lipases, lipA, lipC21 and lipC24, from Burkholderia sp. ZYB002 were cloned and expressed in E. coli. Both LipA and LipC24 displayed lipase activity. LipC24 was a novel mesophilic enzyme and displayed a preference for medium-chain-length acyl groups (C10-C14). The 3D structural model of LipC24 revealed an open Y-type active site. LipA displayed 96 % amino acid sequence identity with the known extracellular lipase. lipA-inactivation and lipC24-inactivation decreased the total cell-bound lipase activity of Burkholderia sp. ZYB002 by 42 % and 14 %, respectively. The cell-bound lipase activity from Burkholderia sp. ZYB002 originated from a multi-enzyme mixture with LipA as the main component. LipC24 was a novel lipase whose enzymatic characteristics and structural model differed from those of LipA. Besides LipA and LipC24, other types of cell-bound lipases (or esterases) should exist.

  1. The effects of upaB deletion and the double/triple deletion of upaB, aatA, and aatB genes on pathogenicity of avian pathogenic Escherichia coli.

    PubMed

    Zhu-Ge, Xiang-Kai; Pan, Zi-Hao; Tang, Fang; Mao, Xiang; Hu, Lin; Wang, Shao-Hui; Xu, Bin; Lu, Cheng-Ping; Fan, Hong-Jie; Dai, Jian-Jun

    2015-12-01

    Autotransporters (ATs) are associated with the pathogenesis of Avian Pathogenic Escherichia coli (APEC). The molecular characterization of APEC ATs can provide insights into their relevance to APEC pathogenesis. Here, we characterized a conventional autotransporter, UpaB, in the APEC DE205B genome. The upaB gene was present in 41.9 % of 236 APEC isolates and was predominantly associated with ECOR B2 and D. Our studies showed that UpaB mediates DE205B adhesion to DF-1 cells, and enhances autoaggregation and biofilm formation of fimbria-negative E. coli AAEC189 (MG1655Δfim) in vitro. Deletion of upaB in DE205B attenuates virulence in the duck model and early colonization in the duck lungs during APEC systemic infection. Furthermore, double and triple deletion of the upaB, aatA, and aatB genes cumulatively attenuated DE205B adhesion to DF-1 cells, accompanied by decreased 50 % lethal dose (LD50) in the duck model and early colonization in the duck lungs. However, DE205BΔupaB/ΔaatA/ΔaatB might "compensate" for the influence of gene deletion by upregulating the expression of the fimbrial adhesin genes yqiL and yadN and the vacuolating autotransporter vat during early colonization of APEC. Finally, we demonstrated that vaccination with recombinant UpaB, AatA, and AatB proteins conferred protection against colisepticemia caused by DE205B infection in the duck model.

  2. Probing quantum frustrated systems via factorization of the ground state.

    PubMed

    Giampaolo, Salvatore M; Adesso, Gerardo; Illuminati, Fabrizio

    2010-05-21

    The existence of definite orders in frustrated quantum systems is related rigorously to the occurrence of fully factorized ground states below a threshold value of the frustration. Ground-state separability thus provides a natural measure of frustration: strongly frustrated systems are those that cannot accommodate for classical-like solutions. The exact form of the factorized ground states and the critical frustration are determined for various classes of nonexactly solvable spin models with different spatial ranges of the interactions. For weak frustration, the existence of disentangling transitions determines the range of applicability of mean-field descriptions in biological and physical problems such as stochastic gene expression and the stability of long-period modulated structures.

  3. Bmi1 represses Ink4a/Arf and Hox genes to regulate stem cells in the rodent incisor

    PubMed Central

    Biehs, Brian; Hu, Jimmy Kuang-Hsien; Strauli, Nicolas B.; Sangiorgi, Eugenio; Jung, Heekyung; Heber, Ralf-Peter; Ho, Sunita; Goodwin, Alice F.; Dasen, Jeremy S.; Capecchi, Mario R.; Klein, Ophir D.

    2013-01-01

    The polycomb group gene Bmi1 is required for maintenance of adult stem cells in many organs [1, 2]. Inactivation of Bmi1 leads to impaired stem cell self-renewal due to deregulated gene expression. One critical target of BMI1 is Ink4a/Arf, which encodes the cell cycle inhibitors p16ink4a and p19Arf [3]. However, deletion of Ink4a/Arf only partially rescues Bmi1 null phenotypes [4], indicating that other important targets of BMI1 exist. Here, using the continuously-growing mouse incisor as a model system, we report that Bmi1 is expressed by incisor stem cells and that deletion of Bmi1 resulted in fewer stem cells, perturbed gene expression, and defective enamel production. Transcriptional profiling revealed that Hox expression is normally repressed by BMI1 in the adult, and functional assays demonstrated that BMI1-mediated repression of Hox genes preserves the undifferentiated state of stem cells. As Hox gene upregulation has also been reported in other systems when Bmi1 is inactivated [1, 2, 5–7], our findings point to a general mechanism whereby BMI1-mediated repression of Hox genes is required for the maintenance of adult stem cells and for prevention of inappropriate differentiation. PMID:23728424

  4. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  5. Early Identification of Molecular Predictors of Heterotopic Ossification Following Extremity Blast Injury with a Biomarker Assay

    DTIC Science & Technology

    2018-03-01

    biomarkers were identified by correlation between animals exhibiting radiographic evidence of HO. 15. SUBJECT TERMS Heterotopic ossification, blast...the animal model that predict the occurrence of HO in our experimental animals and determine if a correlation exists to similarly predict the...impact on other disciplines? Up-regulation of genes in the Sprague-Dawley rat contributing to fibrosis and inflammation have been correlated with the

  6. Structural features based genome-wide characterization and prediction of nucleosome organization

    PubMed Central

    2012-01-01

    Background Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. 
    Conclusions Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online. PMID:22449207

  7. Integrated Enrichment Analysis of Variants and Pathways in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease

    PubMed Central

    Carbonetto, Peter; Stephens, Matthew

    2013-01-01

    Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and “Measles” pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
    Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study. PMID:24098138

  8. Assessment of potential environmental risks of transgene flow in smallholder farming systems in Asia: Brassica napus as a case study in Korea.

    PubMed

    Zhang, Chuan-Jie; Yook, Min-Jung; Park, Hae-Rim; Lim, Soo-Hyun; Kim, Jin-Won; Nah, Gyoungju; Song, Hae-Ryong; Jo, Beom-Ho; Roh, Kyung Hee; Park, Suhyoung; Kim, Do-Soon

    2018-06-02

    The cultivation of genetically modified (GM) crops has raised many questions regarding their environmental risks, particularly about their ecological impact on non-target organisms, such as closely related species. Although evaluations of transgene flow from GM crops to their conventional counterparts have been conducted under large-scale farming systems worldwide, in particular in North America and Australia, few studies have been conducted under smallholder farming systems in Asia, where diverse crops co-exist. A two-year field study was conducted to assess the potential environmental risks of gene flow from glufosinate-ammonium resistant (GR) Brassica napus to its conventional relatives, B. napus, B. juncea, and Raphanus sativus, under simulated smallholder field conditions in Korea. Herbicide resistance and simple sequence repeat (SSR) markers were used to identify the hybrids. Hybridization frequency of B. napus × GR B. napus was 2.33% at a 2 m distance, which decreased to 0.007% at 75 m. For B. juncea, it was 0.076% at 2 m and decreased to 0.025% at 16 m. No gene flow was observed to R. sativus. The log-logistic model described hybridization frequency with increasing distance from GR B. napus to B. napus and B. juncea and predicted that the effective isolation distances for 0.01% gene flow from GR B. napus to B. napus and B. juncea were 122.5 and 23.7 m, respectively. Results suggest that long-distance gene flow from GR B. napus to B. napus and B. juncea is unlikely, but gene flow can potentially occur between adjacent fields where smallholder farming systems exist. Copyright © 2018. Published by Elsevier B.V.
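
    How a log-logistic distance-decay curve yields an isolation distance can be illustrated as follows. The abstract does not give the parametrisation or the fitted coefficients, so this sketch assumes a common three-parameter form, y = upper / (1 + (x / e50)^slope), and the numbers used in the example call are hypothetical placeholders, not the study's estimates.

```python
def hybridization_frequency(distance_m, upper, e50, slope):
    """Log-logistic decay of hybridization frequency (%) with distance (m).

    upper: frequency at distance zero; e50: distance at which the frequency
    has dropped to upper / 2; slope: steepness of the decline.
    """
    return upper / (1.0 + (distance_m / e50) ** slope)


def isolation_distance(target_frequency, upper, e50, slope):
    """Invert the curve: distance at which the frequency falls to the target."""
    if not 0.0 < target_frequency < upper:
        raise ValueError("target_frequency must lie strictly between 0 and upper")
    return e50 * (upper / target_frequency - 1.0) ** (1.0 / slope)


# Hypothetical parameters, chosen only to exercise the functions.
PARAMS = {"upper": 2.5, "e50": 3.0, "slope": 2.0}
print(isolation_distance(0.01, **PARAMS))
```

    The inverse is obtained by solving the curve for x, so an isolation distance for any threshold (0.01% in the abstract) follows directly once the three parameters have been fitted to field data.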

  9. Incidence of genome structure, DNA asymmetry, and cell physiology on T-DNA integration in chromosomes of the phytopathogenic fungus Leptosphaeria maculans.

    PubMed

    Bourras, Salim; Meyer, Michel; Grandaubert, Jonathan; Lapalu, Nicolas; Fudal, Isabelle; Linglin, Juliette; Ollivier, Benedicte; Blaise, Françoise; Balesdent, Marie-Hélène; Rouxel, Thierry

    2012-08-01

    The ever-increasing generation of sequence data is accompanied by unsatisfactory functional annotation, and complex genomes, such as those of plants and filamentous fungi, show a large number of genes with no predicted or known function. For functional annotation of unknown or hypothetical genes, the production of collections of mutants using Agrobacterium tumefaciens-mediated transformation (ATMT) associated with genotyping and phenotyping has gained wide acceptance. ATMT is also widely used to identify pathogenicity determinants in pathogenic fungi. A systematic analysis of T-DNA borders was performed in an ATMT-mutagenized collection of the phytopathogenic fungus Leptosphaeria maculans to evaluate the features of T-DNA integration in its particular transposable element-rich compartmentalized genome. A total of 318 T-DNA tags were recovered and analyzed for biases in chromosome and genic compartments, existence of CG/AT skews at the insertion site, and occurrence of microhomologies between the T-DNA left border (LB) and the target sequence. Functional annotation of targeted genes was done using the Gene Ontology annotation. The T-DNA integration mainly targeted gene-rich, transcriptionally active regions, and it favored biological processes consistent with the physiological status of a germinating spore. T-DNA integration was strongly biased toward regulatory regions, and mainly promoters. Consistent with the T-DNA intranuclear-targeting model, the density of T-DNA insertion correlated with CG skew near the transcription initiation site. The existence of microhomologies between promoter sequences and the T-DNA LB flanking sequence was also consistent with T-DNA integration to host DNA mediated by homologous recombination based on the microhomology-mediated end-joining pathway.

  10. Extension of the lod score: the mod score.

    PubMed

    Clerget-Darpoux, F

    2001-01-01

    In 1955 Morton proposed the lod score method both for testing linkage between loci and for estimating the recombination fraction between them. If a disease is controlled by a gene at one of these loci, the lod score computation requires the prior specification of an underlying model that assigns the probabilities of genotypes from the observed phenotypes. To address the case of linkage studies for diseases with unknown mode of inheritance, we suggested (Clerget-Darpoux et al., 1986) extending the lod score function to a so-called mod score function. In this function, the variables are both the recombination fraction and the disease model parameters. Maximizing the mod score function over all these parameters amounts to maximizing the probability of marker data conditional on the disease status. Under the absence of linkage, the mod score conforms to a chi-square distribution, with extra degrees of freedom in comparison to the lod score function (MacLean et al., 1993). The mod score is asymptotically maximum for the true disease model (Clerget-Darpoux and Bonaïti-Pellié, 1992; Hodge and Elston, 1994). Consequently, the power to detect linkage through mod score will be highest when the space of models where the maximization is performed includes the true model. On the other hand, one must avoid overparametrization of the model space. For example, when the approach is applied to affected sibpairs, only two constrained disease model parameters should be used (Knapp et al., 1994) for the mod score maximization. It is also important to emphasize the existence of a strong correlation between the disease gene location and the disease model. Consequently, there is poor resolution of the location of the susceptibility locus when the disease model at this locus is unknown. Of course, this is true regardless of the statistics used. The mod score may also be applied in a candidate gene strategy to model the potential effect of this gene in the disease. 
    Since, however, it ignores the information provided both by disease segregation and by linkage disequilibrium between the marker alleles and the functional disease alleles, its power of discrimination between genetic models is weak. The MASC method (Clerget-Darpoux et al., 1988) has been designed to address more efficiently the objectives of a candidate gene approach.
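
    For readers unfamiliar with the lod score that the mod score extends, a minimal sketch of the simplest textbook case may help: phase-known, fully informative meioses, where the lod score is log10 of the likelihood at recombination fraction theta divided by the likelihood at theta = 0.5. The function names and the grid search are assumptions made for illustration; the mod score itself, which additionally maximizes over disease model parameters, is not implemented here.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """Lod score for phase-known meioses: log10 of L(theta) / L(0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** non_recombinants
    null_likelihood = 0.5 ** meioses
    if likelihood == 0.0:
        return float("-inf")  # theta = 0 is impossible once a recombinant is seen
    return log10(likelihood / null_likelihood)


def max_lod(recombinants, meioses, steps=500):
    """Grid search over theta in [0, 0.5]; returns (best theta, maximum lod)."""
    grid = (i / (2 * steps) for i in range(steps + 1))
    return max(((t, lod_score(recombinants, meioses, t)) for t in grid),
               key=lambda pair: pair[1])


print(max_lod(recombinants=2, meioses=10))
```

    Replacing the one-dimensional grid over theta with a joint search over theta and the disease model parameters (penetrances, allele frequency) is the step that turns this lod maximization into a mod score, at the cost of the extra degrees of freedom noted in the abstract.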

  11. A novel gene network inference algorithm using predictive minimum description length approach.

    PubMed

    Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang

    2010-05-28

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as a control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and the predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced fewer false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. 
    The evaluation results obtained from both synthetic and actual biological data sets show that the PMDL principle is effective in determining the MI threshold and the developed algorithm improves precision of gene regulatory network inference. Based on the sensitivity analysis of all tested cases, an optimal CMI threshold value has been identified. Finally it was observed that the performance of the algorithms saturates at a certain threshold of data size.
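
    The role of the MI threshold discussed above can be made concrete with a small sketch. It computes mutual information between discretized expression profiles and keeps gene pairs whose MI reaches a threshold. The threshold is passed in by hand here; the PMDL-based threshold selection and the CMI pruning step of the published algorithm are not reproduced, and the gene names and binary profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (count / n) * log2((count / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), count in pxy.items()
    )


def infer_edges(profiles, threshold):
    """Return the gene pairs whose mutual information is at least the threshold."""
    return sorted(
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if mutual_information(profiles[g1], profiles[g2]) >= threshold
    )


# Hypothetical discretized time series (0 = low expression, 1 = high expression).
PROFILES = {
    "geneA": [0, 0, 1, 1, 0, 0, 1, 1],
    "geneB": [1, 1, 0, 0, 1, 1, 0, 0],  # perfectly anti-correlated with geneA
    "geneC": [0, 1, 0, 1, 0, 1, 0, 1],  # statistically independent of geneA
}

print(infer_edges(PROFILES, threshold=0.5))
```

    Because every downstream edge depends on this single cut-off, a threshold that is too low floods the network with false edges and one that is too high removes true ones, which is the sensitivity the PMDL approach is meant to address.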

  12. Linear score tests for variance components in linear mixed models and applications to genetic association studies.

    PubMed

    Qu, Long; Guennel, Tobias; Marshall, Scott L

    2013-12-01

    Following the rapid development of genome-scale genotyping technologies, genetic association mapping has become a popular tool to detect genomic regions responsible for certain (disease) phenotypes, especially in early-phase pharmacogenomic studies with limited sample size. In response to such applications, a good association test needs to be (1) applicable to a wide range of possible genetic models, including, but not limited to, the presence of gene-by-environment or gene-by-gene interactions and non-linearity of a group of marker effects, (2) accurate in small samples, fast to compute on the genomic scale, and amenable to large scale multiple testing corrections, and (3) reasonably powerful to locate causal genomic regions. The kernel machine method represented in linear mixed models provides a viable solution by transforming the problem into testing the nullity of variance components. In this study, we consider score-based tests by choosing a statistic linear in the score function. When the model under the null hypothesis has only one error variance parameter, our test is exact in finite samples. When the null model has more than one variance parameter, we develop a new moment-based approximation that performs well in simulations. Through simulations and analysis of real data, we demonstrate that the new test possesses most of the aforementioned characteristics, especially when compared to existing quadratic score tests or restricted likelihood ratio tests. © 2013, The International Biometric Society.

  13. Identification of learning and memory genes in canine; promoter investigation and determining the selective pressure.

    PubMed

    Seifi Moroudi, Reihane; Masoudi, Ali Akbar; Vaez Torshizi, Rasoul; Zandi, Mohammad

    2014-12-01

    One of the important behaviors of dogs is trainability, which is affected by learning and memory genes. Genes of this kind have not yet been identified in dogs. In the current research, these genes were found in animal models by mining biological data and the scientific literature. The proteins of these genes were obtained from the UniProt database for dogs and humans. Not all homologous proteins perform similar functions, so these proteins were compared in terms of protein families, domains, biological processes, molecular functions, and cellular location of metabolic pathways in the Interpro, KEGG, Quick Go and Psort databases. The results showed that some of these proteins have the same function in the rat or mouse, dog, and human. It is anticipated that the proteins of these genes may be involved in learning and memory in dogs. Then, the expression pattern of the recognized genes was investigated in the dog hippocampus using the existing information in the GEO profile. The results showed that the BDNF, TAC1 and CCK genes are expressed in the dog hippocampus; therefore, these genes could be strong candidates associated with learning and memory in dogs. Subsequently, due to the importance of the promoter regions in gene function, this region was investigated in the above genes. Analysis of the promoter indicated that the HNF-4 site of the BDNF gene and the transcription start site of the CCK gene are exposed to methylation. Phylogenetic analysis of protein sequences of these genes showed high similarity in each of these three genes among the studied species. The dN/dS ratio for the BDNF, TAC1 and CCK genes indicates purifying selection during the evolution of the genes.

  14. A literature search tool for intelligent extraction of disease-associated genes.

    PubMed

    Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P

    2014-01-01

    To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.

  15. GC²NMF: A Novel Matrix Factorization Framework for Gene-Phenotype Association Prediction.

    PubMed

    Zhang, Yaogong; Liu, Jiahui; Liu, Xiaohu; Hong, Yuxiang; Fan, Xin; Huang, Yalou; Wang, Yuan; Xie, Maoqiang

    2018-04-24

    Gene-phenotype association prediction can be applied to reveal the inherited basis of human diseases and facilitate drug development. Gene-phenotype associations are related to complex biological processes and influenced by various factors, such as the relationships between phenotypes and those among genes. However, owing to the sparseness of curated gene-phenotype associations and the lack of integrated analysis of the joint effect of multiple factors, existing applications are limited in prediction accuracy and in the detection of potential gene-phenotype associations. In this paper, we propose a novel method that exploits a weighted graph constraint learned from hierarchical structures of phenotype data and group prior information among genes while inheriting the advantages of Non-negative Matrix Factorization (NMF), called Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC²NMF). Specifically, first we introduce the depth of parent-child relationships between two adjacent phenotypes in hierarchical phenotypic data as a weighted graph constraint for a better phenotype understanding. Second, we utilize intra-group correlation among genes in a gene group as a group constraint for gene understanding. Such information provides us with the intuition that genes in a group probably result in similar phenotypes. The model not only allows us to achieve a high-grade prediction performance, but also helps us to learn interpretable representations of genes and phenotypes simultaneously to facilitate future biological analysis. Experimental results on biological gene-phenotype association datasets of mouse and human demonstrate that GC²NMF can obtain superior prediction accuracy and good understandability for biological explanation over other state-of-the-art methods.

  16. Genome-Wide Transcriptome Analyses of Silicon Metabolism in Phaeodactylum tricornutum Reveal the Multilevel Regulation of Silicic Acid Transporters

    PubMed Central

    Sapriel, Guillaume; Quinet, Michelle; Heijde, Marc; Jourdren, Laurent; Tanty, Véronique; Luo, Guangzuo; Le Crom, Stéphane; Lopez, Pascal Jean

    2009-01-01

    Background Diatoms are largely responsible for production of biogenic silica in the global ocean. However, in surface seawater, Si(OH)4 can be a major limiting factor for diatom productivity. Analyzing at the global scale the gene networks involved in Si transport and metabolism is critical in order to elucidate Si biomineralization, and to understand diatoms' contribution to biogeochemical cycles. Methodology/Principal Findings Using whole genome expression analyses we evaluated the transcriptional response to Si availability for the model species Phaeodactylum tricornutum. Among the differentially regulated genes we found genes involved in glutamine-nitrogen pathways, encoding putative extracellular matrix components, or involved in iron regulation. Some of these compounds may be good candidates for intracellular intermediates involved in silicic acid storage and/or intracellular transport, which are very important processes that remain mysterious in diatoms. Expression analyses and localization studies gave the first picture of the spatial distribution of a silicic acid transporter in a diatom model species, and support the existence of transcriptional and post-transcriptional regulations. Conclusions/Significance Our global analyses revealed that about one fourth of the differentially expressed genes are organized in clusters, underlying a possible evolution of the P. tricornutum genome, and perhaps those of other pennate diatoms, toward a better optimization of its response to variable environmental stimuli. High fitness and adaptation of diatoms to various Si levels in marine environments might arise in part by global regulations from gene (expression level) to genomic (organization in clusters, dosage compensation by gene duplication), and by post-transcriptional regulation and spatial distribution of SIT proteins. PMID:19829693

  17. Epistasis Analysis for Estrogen Metabolic and Signaling Pathway Genes on Young Ischemic Stroke Patients

    PubMed Central

    Hsieh, Yi-Chen; Jeng, Jiann-Shing; Lin, Huey-Juan; Hu, Chaur-Jong; Yu, Chia-Chen; Lien, Li-Ming; Peng, Giia-Sheun; Chen, Chin-I; Tang, Sung-Chun; Chi, Nai-Fang; Tseng, Hung-Pin; Chern, Chang-Ming; Hsieh, Fang-I; Bai, Chyi-Huey; Chen, Yi-Rhu; Chiou, Hung-Yi; Jeng, Jiann-Shing; Tang, Sung-Chun; Yeh, Shin-Joe; Tsai, Li-Kai; Kong, Shin; Lien, Li-Ming; Chiu, Hou-Chang; Chen, Wei-Hung; Bai, Chyi-Huey; Huang, Tzu-Hsuan; Chi-Ieong, Lau; Wu, Ya-Ying; Yuan, Rey-Yue; Hu, Chaur-Jong; Sheu, Jau-Jiuan; Yu, Jia-Ming; Ho, Chun-Sum; Chen, Chin-I; Sung, Jia-Ying; Weng, Hsing-Yu; Han, Yu-Hsuan; Huang, Chun-Ping; Chung, Wen-Ting; Ke, Der-Shin; Lin, Huey-Juan; Chang, Chia-Yu; Yeh, Poh-Shiow; Lin, Kao-Chang; Cheng, Tain-Junn; Chou, Chih-Ho; Yang, Chun-Ming; Peng, Giia-Sheun; Lin, Jiann-Chyun; Hsu, Yaw-Don; Denq, Jong-Chyou; Lee, Jiunn-Tay; Hsu, Chang-Hung; Lin, Chun-Chieh; Yen, Che-Hung; Cheng, Chun-An; Sung, Yueh-Feng; Chen, Yuan-Liang; Lien, Ming-Tung; Chou, Chung-Hsing; Liu, Chia-Chen; Yang, Fu-Chi; Wu, Yi-Chung; Tso, An-Chen; Lai, Yu-Hua; Chiang, Chun-I; Tsai, Chia-Kuang; Liu, Meng-Ta; Lin, Ying-Che; Hsu, Yu-Chuan; Chen, Chih-Hung; Sung, Pi-Shan; Chern, Chang-Ming; Hu, Han-Hwa; Wong, Wen-Jang; Luk, Yun-On; Hsu, Li-Chi; Chung, Chih-Ping; Tseng, Hung-Pin; Liu, Chin-Hsiung; Lin, Chun-Liang; Lin, Hung-Chih; Hu, Chaur-Jong

    2012-01-01

    Background Endogenous estrogens play an important role in the overall cardiocirculatory system. However, there are no studies exploring the hormone metabolism and signaling pathway genes together on ischemic stroke, including sulfotransferase family 1E (SULT1E1), catechol-O-methyl-transferase (COMT), and estrogen receptor α (ESR1). Methods A case-control study was conducted on 305 young ischemic stroke subjects aged ≦ 50 years and 309 age-matched healthy controls. SULT1E1 -64G/A, COMT Val158Met, ESR1 c.454−397 T/C and c.454−351 A/G genes were genotyped and compared between cases and controls to identify single nucleotide polymorphisms associated with ischemic stroke susceptibility. Gene-gene interaction effects were analyzed using entropy-based multifactor dimensionality reduction (MDR), classification and regression tree (CART), and traditional multiple regression models. Results COMT Val158Met polymorphism showed a significant association with susceptibility of young ischemic stroke among females. There was a two-way interaction between SULT1E1 -64G/A and COMT Val158Met in both MDR and CART analysis. The logistic regression model also showed there was a significant interaction effect between SULT1E1 -64G/A and COMT Val158Met on ischemic stroke of the young (P for interaction = 0.0171). We further found that lower estradiol level could increase the risk of young ischemic stroke for those who carry either SULT1E1 or COMT risk genotypes, showing a significant interaction effect (P for interaction = 0.0174). Conclusions Our findings support that a significant epistasis effect exists among estrogen metabolic and signaling pathway genes and gene-environment interactions on young ischemic stroke subjects. PMID:23112845

  18. Evaluation and integration of functional annotation pipelines for newly sequenced organisms: the potato genome as a test case.

    PubMed

    Amar, David; Frades, Itziar; Danek, Agnieszka; Goldberg, Tatyana; Sharma, Sanjeev K; Hedley, Pete E; Proux-Wera, Estelle; Andreasson, Erik; Shamir, Ron; Tzfadia, Oren; Alexandersson, Erik

    2014-12-05

    For most organisms, even if their genome sequence is available, little functional information about individual genes or proteins exists. Several annotation pipelines have been developed for functional analysis based on sequence, 'omics', and literature data. However, researchers encounter little guidance on how well they perform. Here, we used the recently sequenced potato genome as a case study. The potato genome was selected since its genome is newly sequenced and it is a non-model plant even if there is relatively ample information on individual potato genes, and multiple gene expression profiles are available. We show that the automatic gene annotations of potato have low accuracy when compared to a "gold standard" based on experimentally validated potato genes. Furthermore, we evaluate six state-of-the-art annotation pipelines and show that their predictions are markedly dissimilar (Jaccard similarity coefficient of 0.27 between pipelines on average). To overcome this discrepancy, we introduce a simple GO structure-based algorithm that reconciles the predictions of the different pipelines. We show that the integrated annotation covers more genes, increases by over 50% the number of highly co-expressed GO processes, and obtains much higher agreement with the gold standard. We find that different annotation pipelines produce different results, and show how to integrate them into a unified annotation that is of higher quality than each single pipeline. We offer an improved functional annotation of both PGSC and ITAG potato gene models, as well as tools that can be applied to additional pipelines and improve annotation in other organisms. This will greatly aid future functional analysis of '-omics' datasets from potato and other organisms with newly sequenced genomes. The new potato annotations are available with this paper.
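
    The Jaccard similarity coefficient quoted above (0.27 on average between pipelines) can be illustrated with a minimal sketch. The pipeline names and the (gene, GO term) pairs below are hypothetical toy data, not the study's annotations, and representing an annotation as a set of such pairs is an assumption made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here: two empty annotation sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(annotations):
    """Average Jaccard similarity over all unordered pairs of pipelines."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


# Hypothetical toy annotations: each pipeline yields (gene, GO term) pairs.
pipelines = {
    "pipeline_a": {("gene1", "GO:0006412"), ("gene2", "GO:0008152")},
    "pipeline_b": {("gene1", "GO:0006412"), ("gene3", "GO:0006950")},
    "pipeline_c": {
        ("gene1", "GO:0006412"),
        ("gene2", "GO:0008152"),
        ("gene3", "GO:0006950"),
    },
}

mean_similarity = mean_pairwise_jaccard(pipelines)
print(round(mean_similarity, 3))
```

    A value near 1 would mean the pipelines largely agree; a value such as 0.27 means that, on average, only about a quarter of the combined annotations of two pipelines are shared.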

  19. Identification of causal genes for complex traits

    PubMed Central

    Hormozdiari, Farhad; Kichaev, Gleb; Yang, Wen-Yun; Pasaniuc, Bogdan; Eskin, Eleazar

    2015-01-01

    Motivation: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider ‘causal variants’ as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. Results: In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also have an average of 10% higher recall rate compared with the existing approaches. We validate our method using a real mouse high-density lipoprotein data (HDL) and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. Availability and implementation: Software is freely available for download at genetics.cs.ucla.edu/caviar. Contact: eeskin@cs.ucla.edu PMID:26072484

  20. Identification of causal genes for complex traits.

    PubMed

    Hormozdiari, Farhad; Kichaev, Gleb; Yang, Wen-Yun; Pasaniuc, Bogdan; Eskin, Eleazar

    2015-06-15

    Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider 'causal variants' as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also have an average of 10% higher recall rate compared with the existing approaches. We validate our method using a real mouse high-density lipoprotein data (HDL) and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. Software is freely available for download at genetics.cs.ucla.edu/caviar. © The Author 2015. Published by Oxford University Press.
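
    The idea of reporting a minimally sized gene set that captures the causal genes with probability ρ can be sketched as follows. This is a deliberately simplified illustration, not the CAVIAR-Gene algorithm: it assumes a single causal gene per locus and that a posterior probability for each gene is already available, and the gene names and posterior values are hypothetical.

```python
def minimal_gene_set(posteriors, rho=0.95):
    """Greedily pick the fewest genes whose posterior mass reaches rho.

    posteriors: dict mapping gene name -> posterior probability of being
    the (single) causal gene; values are assumed to sum to 1.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected, total = [], 0.0
    for gene, prob in ranked:
        if total >= rho:
            break
        selected.append(gene)
        total += prob
    return selected, total


# Hypothetical posteriors for five genes at one associated locus.
toy_posteriors = {
    "gene_a": 0.62,
    "gene_b": 0.20,
    "gene_c": 0.10,
    "gene_d": 0.05,
    "gene_e": 0.03,
}

genes, mass = minimal_gene_set(toy_posteriors, rho=0.9)
print(genes, round(mass, 2))
```

    Under the single-causal-gene assumption, taking genes in decreasing order of posterior gives the smallest set that reaches ρ. With several causal variants per locus the set probability is no longer a simple sum, which is why this sketch should not be read as the published method.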

  1. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

    PubMed

    Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing

    2018-02-01

    Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg.

  2. EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

    PubMed

    Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J

    2015-09-03

    RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
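
    The first step described above, grouping reads according to the set of transcripts to which they map, can be sketched in a few lines. This covers only the grouping, not the segmentation or the maximum likelihood estimation, and it is not the EMSAR implementation; the input format (pairs of read and transcript identifiers) and all identifiers are hypothetical.

```python
from collections import Counter


def group_reads_by_transcript_set(alignments):
    """Count reads per distinct set of transcripts they align to.

    alignments: iterable of (read_id, transcript_id) pairs; a multi-mapped
    read appears once per transcript it aligns to.
    """
    per_read = {}
    for read_id, transcript_id in alignments:
        per_read.setdefault(read_id, set()).add(transcript_id)
    # frozenset makes each transcript set hashable so it can be a Counter key.
    return Counter(frozenset(txs) for txs in per_read.values())


# Hypothetical alignments: r1 and r2 are ambiguous between two isoforms.
toy_alignments = [
    ("r1", "tx1"), ("r1", "tx2"),
    ("r2", "tx2"), ("r2", "tx1"),
    ("r3", "tx1"),
    ("r4", "tx3"),
]

groups = group_reads_by_transcript_set(toy_alignments)
for transcripts, n_reads in groups.items():
    print(sorted(transcripts), n_reads)
```

    Keeping ambiguous reads in their own group, rather than discarding them, is what allows a downstream model to use nearly all mapped reads.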

  3. The RNAi machinery controls distinct responses to environmental signals in the basal fungus Mucor circinelloides.

    PubMed

    Nicolás, Francisco E; Vila, Ana; Moxon, Simon; Cascales, María D; Torres-Martínez, Santiago; Ruiz-Vázquez, Rosa M; Garre, Victoriano

    2015-03-25

    RNA interference (RNAi) is a conserved mechanism of genome defence that can also have a role in the regulation of endogenous functions through endogenous small RNAs (esRNAs). In fungi, knowledge of the functions regulated by esRNAs has been hampered by lack of clear phenotypes in most mutants affected in the RNAi machinery. Mutants of Mucor circinelloides affected in RNAi genes show defects in physiological and developmental processes, thus making Mucor an outstanding fungal model for studying endogenous functions regulated by RNAi. Some classes of Mucor esRNAs map to exons (ex-siRNAs) and regulate expression of the genes from which they derive. To have a broad picture of genes regulated by the silencing machinery during vegetative growth, we have sequenced and compared the mRNA profiles of mutants in the main RNAi genes by using RNA-seq. In addition, we have achieved a more complete phenotypic characterization of silencing mutants. Deletion of any main RNAi gene provoked a deep impact in mRNA accumulation at exponential and stationary growth. Genes showing increased mRNA levels, as expected for direct ex-siRNAs targets, but also genes with decreased expression were detected, suggesting that, most probably, the initial ex-siRNA targets regulate the expression of other genes, which can be up- or down-regulated. Expression of 50% of the genes was dependent on more than one RNAi gene in agreement with the existence of several classes of ex-siRNAs produced by different combinations of RNAi proteins. These combinations of proteins have also been involved in the regulation of different cellular processes. Besides genes regulated by the canonical RNAi pathway, this analysis identified processes, such as growth at low pH and sexual interaction that are regulated by a dicer-independent non-canonical RNAi pathway. This work shows that the RNAi pathways play a relevant role in the regulation of a significant number of endogenous genes in M. circinelloides during exponential and stationary growth phases and opens up an important avenue for in-depth study of genes involved in the regulation of physiological and developmental processes in this fungal model.

  4. Lateral Gene Transfer from the Dead

    PubMed Central

    Szöllősi, Gergely J.; Tannier, Eric; Lartillot, Nicolas; Daubin, Vincent

    2013-01-01

    In phylogenetic studies, the evolution of molecular sequences is assumed to have taken place along the phylogeny traced by the ancestors of extant species. In the presence of lateral gene transfer, however, this may not be the case, because the species lineage from which a gene was transferred may have gone extinct or not have been sampled. Because it is not feasible to specify or reconstruct the complete phylogeny of all species, we must describe the evolution of genes outside the represented phylogeny by modeling the speciation dynamics that gave rise to the complete phylogeny. We demonstrate that if the number of sampled species is small compared with the total number of existing species, the overwhelming majority of gene transfers involve speciation to and evolution along extinct or unsampled lineages. We show that the evolution of genes along extinct or unsampled lineages can to good approximation be treated as those of independently evolving lineages described by a few global parameters. Using this result, we derive an algorithm to calculate the probability of a gene tree and recover the maximum-likelihood reconciliation given the phylogeny of the sampled species. Examining 473 near-universal gene families from 36 cyanobacteria, we find that nearly a third of transfer events (28%) appear to have topological signatures of evolution along extinct species, but only approximately 6% of transfers trace their ancestry to before the common ancestor of the sampled cyanobacteria. [Gene tree reconciliation; lateral gene transfer; macroevolution; phylogeny.] PMID:23355531

  5. Genome-wide gene expression and RNA half-life measurements allow predictions of regulation and metabolic behavior in Methanosarcina acetivorans

    DOE PAGES

    Peterson, Joseph R.; Thor, ShengShee; Kohler, Lars; ...

    2016-11-16

    Here, while a few studies on the variations in mRNA expression and half-lives measured under different growth conditions have been used to predict patterns of regulation in bacterial organisms, the extent to which this information can also play a role in defining metabolic phenotypes has yet to be examined systematically. Here we present the first comprehensive study for a model methanogen. As a result, we use expression and half-life data for the methanogen Methanosarcina acetivorans growing on fast- and slow-growth substrates to examine the regulation of its genes. Unlike Escherichia coli where only small shifts in half-lives were observed, we found that most mRNA have significantly longer half-lives for slow growth on acetate compared to fast growth on methanol or trimethylamine. Interestingly, half-life shifts are not uniform across functional classes of enzymes, suggesting the existence of a selective stabilization mechanism for mRNAs. Using the transcriptomics data we determined whether transcription or degradation rate controls the change in transcript abundance. Degradation was found to control abundance for about half of the metabolic genes underscoring its role in regulating metabolism. Genes involved in half of the metabolic reactions were found to be differentially expressed among the substrates suggesting the existence of drastically different metabolic phenotypes that extend beyond just the methanogenesis pathways. By integrating expression data with an updated metabolic model of the organism (iST807) significant differences in pathway flux and production of metabolites were predicted for the three growth substrates. In conclusion, this study provides the first global picture of differential expression and half-lives for a class II methanogen, as well as provides the first evidence in a single organism that drastic genome-wide shifts in RNA half-lives can be modulated by growth substrate. We determined which genes in each metabolic pathway control the flux and classified them as regulated by transcription (e.g. transcription factor) or degradation (e.g. post-transcriptional modification). We found that more than half of genes in metabolism were controlled by degradation. Our results suggest that M. acetivorans employs extensive post-transcriptional regulation to optimize key metabolic steps, and more generally that degradation could play a much greater role in optimizing an organism’s metabolism than previously thought.

  6. Genome-wide gene expression and RNA half-life measurements allow predictions of regulation and metabolic behavior in Methanosarcina acetivorans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, Joseph R.; Thor, ShengShee; Kohler, Lars

    Here, while a few studies on the variations in mRNA expression and half-lives measured under different growth conditions have been used to predict patterns of regulation in bacterial organisms, the extent to which this information can also play a role in defining metabolic phenotypes has yet to be examined systematically. Here we present the first comprehensive study for a model methanogen. As a result, we use expression and half-life data for the methanogen Methanosarcina acetivorans growing on fast- and slow-growth substrates to examine the regulation of its genes. Unlike Escherichia coli where only small shifts in half-lives were observed, we found that most mRNA have significantly longer half-lives for slow growth on acetate compared to fast growth on methanol or trimethylamine. Interestingly, half-life shifts are not uniform across functional classes of enzymes, suggesting the existence of a selective stabilization mechanism for mRNAs. Using the transcriptomics data we determined whether transcription or degradation rate controls the change in transcript abundance. Degradation was found to control abundance for about half of the metabolic genes underscoring its role in regulating metabolism. Genes involved in half of the metabolic reactions were found to be differentially expressed among the substrates suggesting the existence of drastically different metabolic phenotypes that extend beyond just the methanogenesis pathways. By integrating expression data with an updated metabolic model of the organism (iST807) significant differences in pathway flux and production of metabolites were predicted for the three growth substrates. In conclusion, this study provides the first global picture of differential expression and half-lives for a class II methanogen, as well as provides the first evidence in a single organism that drastic genome-wide shifts in RNA half-lives can be modulated by growth substrate. We determined which genes in each metabolic pathway control the flux and classified them as regulated by transcription (e.g. transcription factor) or degradation (e.g. post-transcriptional modification). We found that more than half of genes in metabolism were controlled by degradation. Our results suggest that M. acetivorans employs extensive post-transcriptional regulation to optimize key metabolic steps, and more generally that degradation could play a much greater role in optimizing an organism’s metabolism than previously thought.

  7. Gene function prediction based on the Gene Ontology hierarchical structure.

    PubMed

    Cheng, Liangxi; Lin, Hongfei; Hu, Yuncui; Wang, Jian; Yang, Zhihao

    2014-01-01

    The information of the Gene Ontology annotation is helpful in the explanation of life science phenomena, and can provide great support for the research of the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7% recall: 48.9%). The experimental results demonstrate that when the size of training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to the set of texts in an ontology structure or with a hierarchical relationship.
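
    The reported F-value is consistent with the harmonic mean of the stated precision and recall, which the short check below illustrates. That the authors used the balanced F1 measure (beta = 1) is an assumption; the abstract does not state the weighting.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as quoted in the abstract.
f1 = f_measure(0.527, 0.489)
print(f"{f1:.1%}")  # prints 50.7%
```

    Because the harmonic mean is pulled toward the smaller of the two values, the F-value sits slightly below the arithmetic mean of 50.8%.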

  8. A Novel Paramyxovirus?

    PubMed Central

    García-Sastre, Adolfo; Palese, Peter

    2005-01-01

    In public databases, we identified sequences reported as human genes expressed in kidney mesangial cells. The similarity of these genes to paramyxovirus matrix, fusion, and phosphoprotein genes suggests that they are derived from a novel paramyxovirus. These genes are sufficiently unique to suggest the existence of a novel paramyxovirus genus. PMID:15705331

  9. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.

    PubMed

    Powell, Bradford C; Hutchison, Clyde A

    2006-01-19

    Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.
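
    The initial screening step, collecting every open reading frame of at least 30 codons, can be sketched as below. The abstract does not define how ORFs were delimited, so this sketch assumes an ORF runs from an ATG to the next in-frame stop codon, counts codons without the stop, and scans only the three forward-strand frames; the function name and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for forward-strand ORFs of >= min_codons.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and the codon count excludes the stop codon itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume looking for the next ATG
    return orfs


# Hypothetical sequence: ATG plus 29 alanine codons, then a stop (30 codons).
toy_seq = "ATG" + "GCT" * 29 + "TAA"
print(find_orfs(toy_seq))
```

    To cover both strands, the same function could be applied to the reverse complement of the sequence; the resulting peptide translations would then be the input to a clustering step such as COG formation.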

  10. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs

    PubMed Central

    Powell, Bradford C; Hutchison, Clyde A

    2006-01-01

    Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288

  11. Discovering relationships between nuclear receptor signaling pathways, genes, and tissues in Transcriptomine.

    PubMed

    Becnel, Lauren B; Ochsner, Scott A; Darlington, Yolanda F; McOwiti, Apollo; Kankanamge, Wasula H; Dehart, Michael; Naumov, Alexey; McKenna, Neil J

    2017-04-25

    We previously developed a web tool, Transcriptomine, to explore expression profiling data sets involving small-molecule or genetic manipulations of nuclear receptor signaling pathways. We describe advances in biocuration, query interface design, and data visualization that enhance the discovery of uncharacterized biology in these pathways using this tool. Transcriptomine currently contains about 45 million data points encompassing more than 2000 experiments in a reference library of nearly 550 data sets retrieved from public archives and systematically curated. To make the underlying data points more accessible to bench biologists, we classified experimental small molecules and gene manipulations into signaling pathways and experimental tissues and cell lines into physiological systems and organs. Incorporation of these mappings into Transcriptomine enables the user to readily evaluate tissue-specific regulation of gene expression by nuclear receptor signaling pathways. Data points from animal and cell model experiments and from clinical data sets elucidate the roles of nuclear receptor pathways in gene expression events accompanying various normal and pathological cellular processes. In addition, data sets targeting non-nuclear receptor signaling pathways highlight transcriptional cross-talk between nuclear receptors and other signaling pathways. We demonstrate with specific examples how data points that exist in isolation in individual data sets validate each other when connected and made accessible to the user in a single interface. In summary, Transcriptomine allows bench biologists to routinely develop research hypotheses, validate experimental data, or model relationships between signaling pathways, genes, and tissues. Copyright © 2017, American Association for the Advancement of Science.

  12. Constraints on signaling network logic reveal functional subgraphs on Multiple Myeloma OMIC data.

    PubMed

    Miannay, Bertrand; Minvielle, Stéphane; Magrangeas, Florence; Guziolowski, Carito

    2018-03-21

    The integration of gene expression profiles (GEPs) and large-scale biological networks derived from pathways databases is a subject which is being widely explored. Existing methods are based on network distance measures among significantly measured species. Only a small number of them include the directionality and underlying logic existing in biological networks. In this study we approach the GEP-networks integration problem by considering the network logic, however our approach does not require a prior species selection according to their gene expression level. We start by modeling the biological network representing its underlying logic using Logic Programming. This model points to reachable network discrete states that maximize a notion of harmony between the molecular species active or inactive possible states and the directionality of the pathways reactions according to their activator or inhibitor control role. Only then, we confront these network states with the GEP. From this confrontation independent graph components are derived, each of them related to a fixed and optimal assignment of active or inactive states. These components allow us to decompose a large-scale network into subgraphs and their molecular species state assignments have different degrees of similarity when compared to the same GEP. We apply our method to study the set of possible states derived from a subgraph from the NCI-PID Pathway Interaction Database. This graph links Multiple Myeloma (MM) genes to known receptors for this blood cancer. We discover that the NCI-PID MM graph had 15 independent components, and when confronted to 611 MM GEPs, we find 1 component as being more specific to represent the difference between cancer and healthy profiles.

  13. SVGenes: a library for rendering genomic features in scalable vector graphic format.

    PubMed

    Etherington, Graham J; MacLean, Daniel

    2013-08-01

    Drawing genomic features in attractive and informative ways is a key task in visualization of genomics data. Scalable Vector Graphics (SVG) format is a modern and flexible open standard that provides advanced features including modular graphic design, advanced web interactivity and animation within a suitable client. SVGs do not suffer from loss of image quality on re-scaling and provide the ability to edit individual elements of a graphic at the whole-object level, independent of the whole image. These features make SVG a potentially useful format for the preparation of publication quality figures including genomic objects such as genes or sequencing coverage and for web applications that require rich user-interaction with the graphical elements. SVGenes is a Ruby-language library that uses SVG primitives to render typical genomic glyphs through a simple and flexible Ruby interface. The library implements a simple Page object that spaces and contains horizontal Track objects that in turn style, colour and position features within them. Tracks are the level at which visual information is supplied, providing the full styling capability of the SVG standard. Genomic entities like genes, transcripts and histograms are modelled in Glyph objects that are attached to a track and take advantage of SVG primitives to render the genomic features in a track as any of a selection of defined glyphs. The feature model within SVGenes is simple but flexible and not dependent on particular existing gene feature formats, meaning graphics for any existing datasets can easily be created without need for conversion. The library is provided as a Ruby Gem from https://rubygems.org/gems/bio-svgenes under the MIT license, and open source code is available at https://github.com/danmaclean/bioruby-svgenes also under the MIT License. Contact: dan.maclean@tsl.ac.uk.

  14. Predicting multi-level drug response with gene expression profile in multiple myeloma using hierarchical ordinal regression.

    PubMed

    Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo

    2018-05-10

    Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response into two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop an ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has an important impact on cancer studies.
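
    The ordinal logistic model with a Cauchy prior described above can be illustrated with a minimal sketch. It assumes the common cumulative-logit parameterization, P(y <= k | x) = sigmoid(c_k - x.beta), which the abstract does not spell out; the function names, the prior scale, and the example numbers are all hypothetical, and the authors' quasi-Newton fitting procedure is not reproduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(x, beta, cutpoints):
    """Probabilities of each ordered response level under a cumulative-logit model.

    Assumes P(y <= k | x) = sigmoid(cutpoints[k] - x.beta) with increasing
    cutpoints; K - 1 cutpoints give K response levels.
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

def log_cauchy_prior(beta, scale=1.0):
    """Log density of independent zero-centred Cauchy priors on the coefficients."""
    return sum(-math.log(math.pi * scale * (1.0 + (b / scale) ** 2)) for b in beta)

def log_posterior(X, y, beta, cutpoints, scale=1.0):
    """Unnormalized log posterior: ordinal log-likelihood plus Cauchy log prior."""
    loglik = sum(math.log(category_probs(x, beta, cutpoints)[yi])
                 for x, yi in zip(X, y))
    return loglik + log_cauchy_prior(beta, scale)

# Hypothetical example: two expression features, five response levels (0..4).
probs = category_probs([0.5, -1.2], [0.8, 0.3], [-2.0, -0.5, 0.5, 2.0])
```

    A numerical optimizer could maximize log_posterior over beta and the cutpoints; the heavy tails of the Cauchy prior shrink most coefficients toward zero while leaving large effects comparatively unpenalized.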

  15. Hox cluster polarity in early transcriptional availability: a high order regulatory level of clustered Hox genes in the mouse.

    PubMed

    Roelen, Bernard A J; de Graaff, Wim; Forlani, Sylvie; Deschamps, Jacqueline

    2002-11-01

    The molecular mechanism underlying the 3' to 5' polarity of induction of mouse Hox genes is still elusive. While relief from a cluster-encompassing repression was shown to lead to all Hoxd genes being expressed like the 3'most of them, Hoxd1 (Kondo and Duboule, 1999), the molecular basis of initial activation of this 3'most gene is not understood yet. We show that, already before primitive streak formation, prior to initial expression of the first Hox gene, a dramatic transcriptional stimulation of the 3'most genes, Hoxb1 and Hoxb2, is observed upon a short pulse of exogenous retinoic acid (RA), whereas this is not the case for more 5', cluster-internal, RA-responsive Hoxb genes. In contrast, the RA-responding Hoxb1lacZ transgene that faithfully mimics the endogenous gene (Marshall et al., 1994) did not exhibit the sensitivity of Hoxb1 to precocious activation. We conclude that polarity in initial activation of Hoxb genes reflects a greater availability of 3'Hox genes for transcription, suggesting a pre-existing (susceptibility to) opening of the chromatin structure at the 3' extremity of the cluster. We discuss the data in the context of prevailing models involving differential chromatin opening in the directionality of clustered Hox gene transcription, and regarding the importance of the cluster context for correct timing of initial Hox gene expression. Interestingly, Cdx1 manifested the same early transcriptional availability as Hoxb1. Copyright 2002 Elsevier Science Ireland Ltd.

  16. Supervised group Lasso with applications to microarray data analysis

    PubMed Central

    Ma, Shuangge; Song, Xiao; Huang, Jian

    2007-01-01

    Background: A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results: We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion: We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436

  17. DEsingle for detecting three types of differential expression in single-cell RNA-seq data.

    PubMed

    Miao, Zhun; Deng, Ke; Wang, Xiaowo; Zhang, Xuegong

    2018-04-24

    The excessive amount of zeros in single-cell RNA-seq data includes "real" zeros due to the on-off nature of gene transcription in single cells and "dropout" zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor's consideration now. Contact: zhangxg@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.
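
    As a rough illustration of the zero-inflated negative binomial (ZINB) model named above, the following sketch evaluates the ZINB probability mass function and the share of observed zeros attributable to the excess-zero component. It is not the DEsingle API (DEsingle is an R package); the function names, the mean/size parameterization, and the example values are assumptions made for illustration, and the sketch takes no position on which component the authors label "real" versus "dropout".

```python
import math

def nb_pmf(k, mu, size):
    """Negative binomial pmf parameterized by mean mu (> 0) and size (> 0)."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)

def zinb_pmf(k, theta, mu, size):
    """ZINB pmf: with probability theta emit an excess zero, else draw from NB."""
    count_part = (1.0 - theta) * nb_pmf(k, mu, size)
    return theta + count_part if k == 0 else count_part

def excess_zero_fraction(theta, mu, size):
    """Fraction of observed zeros that come from the excess-zero component."""
    return theta / zinb_pmf(0, theta, mu, size)

# Hypothetical gene: 30% excess zeros, NB mean 2 and size 1.
p_zero = zinb_pmf(0, theta=0.3, mu=2.0, size=1.0)
share = excess_zero_fraction(theta=0.3, mu=2.0, size=1.0)
```

    Fitting theta, mu and size per gene and per group (for example by maximum likelihood) is what would allow differences in zero proportion to be separated from differences in expression abundance.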

  18. An extended set of yeast-based functional assays accurately identifies human disease mutations

    PubMed Central

    Sun, Song; Yang, Fan; Tan, Guihong; Costanzo, Michael; Oughtred, Rose; Hirschman, Jodi; Theesfeld, Chandra L.; Bansal, Pritpal; Sahni, Nidhi; Yi, Song; Yu, Analyn; Tyagi, Tanya; Tie, Cathy; Hill, David E.; Vidal, Marc; Andrews, Brenda J.; Boone, Charles; Dolinski, Kara; Roth, Frederick P.

    2016-01-01

    We can now routinely identify coding variants within individual human genomes. A pressing challenge is to determine which variants disrupt the function of disease-associated genes. Both experimental and computational methods exist to predict pathogenicity of human genetic variation. However, a systematic performance comparison between them has been lacking. Therefore, we developed and exploited a panel of 26 yeast-based functional complementation assays to measure the impact of 179 variants (101 disease- and 78 non-disease-associated variants) from 22 human disease genes. Using the resulting reference standard, we show that experimental functional assays in a 1-billion-year diverged model organism can identify pathogenic alleles with significantly higher precision and specificity than current computational methods. PMID:26975778

  19. Peak flood estimation using gene expression programming

    NASA Astrophysics Data System (ADS)

    Zorn, Conrad R.; Shamseldin, Asaad Y.

    2015-12-01

    As a case study for the Auckland Region of New Zealand, this paper investigates the potential use of gene-expression programming (GEP) in predicting specific return period events in comparison to the established and widely used Regional Flood Estimation (RFE) method. Initially calibrated to 14 gauged sites, the GEP derived model was further validated to 10 and 100 year flood events with relative errors of 29% and 18%, respectively. This is compared to the RFE method providing 48% and 44% errors for the same flood events. While the effectiveness of GEP in predicting specific return period events is made apparent, it is argued that the derived equations should be used in conjunction with those existing methodologies rather than as a replacement.

  20. Zebrafish models for translational neuroscience research: from tank to bedside

    PubMed Central

    Stewart, Adam Michael; Braubach, Oliver; Spitsbergen, Jan; Gerlai, Robert; Kalueff, Allan V.

    2014-01-01

    The zebrafish (Danio rerio) is emerging as a new important species for studying mechanisms of brain function and dysfunction. Focusing on selected central nervous system (CNS) disorders (brain cancer, epilepsy, and anxiety) and using them as examples, we discuss the value of zebrafish models in translational neuroscience. We further evaluate the contribution of zebrafish to neuroimaging, circuit level, and drug discovery research. Outlining the role of zebrafish in modeling a wide range of human brain disorders, we also summarize recent applications and existing challenges in this field. Finally, we emphasize the potential of zebrafish models in behavioral phenomics and high-throughput genetic/small molecule screening, which is critical for CNS drug discovery and identifying novel candidate genes. PMID:24726051

  1. Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization.

    PubMed

    Zhao, Zhi-Qin; Han, Guo-Sheng; Yu, Zu-Guo; Li, Jinyan

    2015-08-01

    Random walk on heterogeneous networks is a recently emerging approach to effective disease gene prioritization. Laplacian normalization is a technique capable of normalizing the weight of edges in a network. We use this technique to normalize the gene matrix and the phenotype matrix before the construction of the heterogeneous network, and also use this idea to define the transition matrices of the heterogeneous network. Our method has remarkably better performance than the existing methods for recovering known gene-phenotype relationships. The Shannon information entropy of the distribution of the transition probabilities in our networks is found to be smaller than in the networks constructed by the existing methods, implying that a higher number of top-ranked genes can be verified as disease genes. In fact, the most probable gene-phenotype relationships ranked within top 3 or top 5 in our gene lists can be confirmed by the OMIM database for many cases. Our algorithms have shown remarkably superior performance over the state-of-the-art algorithms for recovering gene-phenotype relationships. All Matlab code is available upon email request. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Prior knowledge based mining functional modules from Yeast PPI networks with gene ontology

    PubMed Central

    2010-01-01

    Background: In the literature, there are fruitful algorithmic approaches for identifying functional modules in protein-protein interaction (PPI) networks. Because of the accumulation of large-scale interaction data on multiple organisms and the interaction data not recorded in the existing PPI databases, it is still urgent to design novel computational techniques that can correctly and scalably analyze interaction data sets. Indeed there are a number of large scale biological data sets providing indirect evidence for protein-protein interaction relationships. Results: The main aim of this paper is to present a prior knowledge based mining strategy to identify functional modules from PPI networks with the aid of Gene Ontology. A higher similarity value in Gene Ontology means that two gene products are more functionally related to each other, so it is better to group such gene products into one functional module. We study (i) how to encode the functional pairs into the existing PPI networks; and (ii) how to use these functional pairs as pairwise constraints to supervise the existing functional module identification algorithms. A topology-based modularity metric and complex annotation in MIPS are used to evaluate the functional modules identified by these two approaches. Conclusions: The experimental results on Yeast PPI networks and GO have shown that the prior knowledge based learning methods perform better than the existing algorithms. PMID:21172053

  3. Interactions among Genes Regulating Ovule Development in Arabidopsis Thaliana

    PubMed Central

    Baker, S. C.; Robinson-Beers, K.; Villanueva, J. M.; Gaiser, J. C.; Gasser, C. S.

    1997-01-01

    The INNER NO OUTER (INO) and AINTEGUMENTA (ANT) genes are essential for ovule integument development in Arabidopsis thaliana. Ovules of ino mutants initiate two integument primordia, but the outer integument primordium forms on the opposite side of the ovule from the normal location and undergoes no further development. The inner integument appears to develop normally, resulting in erect, unitegmic ovules that resemble those of gymnosperms. ino plants are partially fertile and produce seeds with altered surface topography, demonstrating a lineage dependence in development of the testa. ant mutations affect initiation of both integuments. The strongest of five new ant alleles we have isolated produces ovules that lack integuments and fail to complete megasporogenesis. ant mutations also affect flower development, resulting in narrow petals and the absence of one or both lateral stamens. Characterization of double mutants between ant, ino and other mutations affecting ovule development has enabled the construction of a model for genetic control of ovule development. This model proposes parallel independent regulatory pathways for a number of aspects of this process, a dependence on the presence of an inner integument for development of the embryo sac, and the existence of additional genes regulating ovule development. PMID:9093862

  4. GAPP: A Proteogenomic Software for Genome Annotation and Global Profiling of Post-translational Modifications in Prokaryotes.

    PubMed

    Zhang, Jia; Yang, Ming-Kun; Zeng, Honghui; Ge, Feng

    2016-11-01

    Although the number of sequenced prokaryotic genomes is growing rapidly, experimentally verified annotation of prokaryotic genome remains patchy and challenging. To facilitate genome annotation efforts for prokaryotes, we developed an open source software called GAPP for genome annotation and global profiling of post-translational modifications (PTMs) in prokaryotes. With a single command, it provides a standard workflow to validate and refine predicted genetic models and discover diverse PTM events. We demonstrated the utility of GAPP using proteomic data from Helicobacter pylori, one of the major human pathogens that is responsible for many gastric diseases. Our results confirmed 84.9% of the existing predicted H. pylori proteins, identified 20 novel protein coding genes, and corrected four existing gene models with regard to translation initiation sites. In particular, GAPP revealed a large repertoire of PTMs using the same proteomic data and provided a rich resource that can be used to examine the functions of reversible modifications in this human pathogen. This software is a powerful tool for genome annotation and global discovery of PTMs and is applicable to any sequenced prokaryotic organism; we expect that it will become an integral part of ongoing genome annotation efforts for prokaryotes. GAPP is freely available at https://sourceforge.net/projects/gappproteogenomic/. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  5. Gene Fusion: A Genome Wide Survey

    NASA Technical Reports Server (NTRS)

    Liang, Ping; Riley, Monica

    2001-01-01

    It is well known that organisms form larger and complex multimodular (composite or chimeric) and mostly multi-functional proteins through gene fusion of two or more individual genes which have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and of cellular functions, and thus may give the host organism advantages in adaptation to environments. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in context with cellular functions, to trace back the evolutionary processes from the ancient smaller and uni-functional proteins to the present-day larger and complex multi-functional proteins, and to estimate the minimal number of ancestor proteins that existed in the last common ancestor for all life on earth. Although many multimodular proteins are known experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when large numbers of completed genome sequences became available. In addition, technical difficulties for such analysis also exist due to the complexity of this biological and evolutionary process. We report from this study a new strategy to computationally identify multimodular proteins using completed genome sequences and the results surveyed from 22 organisms, with the data from over 40 organisms to be presented during the meeting. Additional information is contained in the original extended abstract.

  6. Retroviruses Hijack Chromatin Loops to Drive Oncogene Expression and Highlight the Chromatin Architecture around Proto-Oncogenic Loci

    PubMed Central

    Pattison, Jillian M.; Wright, Jason B.; Cole, Michael D.

    2015-01-01

    The majority of the genome consists of intergenic and non-coding DNA sequences shown to play a major role in different gene regulatory networks. However, the specific potency of these distal elements as well as how these regions exert function across large genomic distances remains unclear. To address these unresolved issues, we closely examined the chromatin architecture around proto-oncogenic loci in the mouse and human genomes to demonstrate a functional role for chromatin looping in distal gene regulation. Using cell culture models, we show that tumorigenic retroviral integration sites within the mouse genome occur near existing large chromatin loops and that this chromatin architecture is maintained within the human genome as well. Significantly, as mutagenesis screens are not feasible in humans, we demonstrate a way to leverage existing screens in mice to identify disease relevant human enhancers and expose novel disease mechanisms. For instance, we characterize the epigenetic landscape upstream of the human Cyclin D1 locus to find multiple distal interactions that contribute to the complex cis-regulation of this cell cycle gene. Furthermore, we characterize a novel distal interaction upstream of the Cyclin D1 gene which provides mechanistic evidence for the abundant overexpression of Cyclin D1 occurring in multiple myeloma cells harboring a pathogenic translocation event. Through use of mapped retroviral integrations and translocation breakpoints, our studies highlight the importance of chromatin looping in oncogene expression, elucidate the epigenetic mechanisms crucial for distal cis-regulation, and in one particular instance, explain how a translocation event drives tumorigenesis through upregulation of a proto-oncogene. PMID:25799187

  7. Pyviko: an automated Python tool to design gene knockouts in complex viruses with overlapping genes.

    PubMed

    Taylor, Louis J; Strebel, Klaus

    2017-01-07

    Gene knockouts are a common tool used to study gene function in various organisms. However, designing gene knockouts is complicated in viruses, which frequently contain sequences that code for multiple overlapping genes. Designing mutants that can be traced by the creation of new or elimination of existing restriction sites further compounds the difficulty in experimental design of knockouts of overlapping genes. While software is available to rapidly identify restriction sites in a given nucleotide sequence, no existing software addresses experimental design of mutations involving multiple overlapping amino acid sequences in generating gene knockouts. Pyviko performed well on a test set of over 240,000 gene pairs collected from viral genomes deposited in the National Center for Biotechnology Information Nucleotide database, identifying a point mutation which added a premature stop codon within the first 20 codons of the target gene in 93.2% of all tested gene-overprinted gene pairs. This shows that Pyviko can be used successfully in a wide variety of contexts to facilitate the molecular cloning and study of viral overprinted genes. Pyviko is an extensible and intuitive Python tool for designing knockouts of overlapping genes. Freely available as both a Python package and a web-based interface ( http://louiejtaylor.github.io/pyViKO/ ), Pyviko simplifies the experimental design of gene knockouts in complex viruses with overlapping genes.
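
    The design problem described above (introducing a premature stop codon in a target gene without changing the protein encoded by an overlapping reading frame) can be sketched as a brute-force search over single-nucleotide substitutions. This is not Pyviko's API; the function names and the toy sequence are hypothetical, both genes are assumed to lie on the forward strand, the standard genetic code is assumed, and restriction-site tracing is left out.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(seq, start):
    """Translate complete codons of seq beginning at index start."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(start, len(seq) - 2, 3))

def find_knockouts(seq, target_start, overlap_start, max_codons=20):
    """List (position, old_base, new_base) substitutions that add a stop codon
    within the first max_codons codons of the target frame while leaving the
    overlapping frame's translation unchanged."""
    target_protein = translate(seq, target_start)
    overlap_protein = translate(seq, overlap_start)
    hits = []
    last = min(len(seq), target_start + 3 * max_codons)
    for pos in range(target_start + 3, last):  # leave the start codon intact
        codon_index = (pos - target_start) // 3
        if codon_index >= len(target_protein):
            continue  # position falls in a trailing partial codon
        if target_protein[codon_index] == "*":
            continue  # already a stop codon
        for base in BASES:
            if base == seq[pos]:
                continue
            mutant = seq[:pos] + base + seq[pos + 1:]
            if translate(mutant, target_start)[codon_index] != "*":
                continue
            if translate(mutant, overlap_start) == overlap_protein:
                hits.append((pos, seq[pos], base))
    return hits

# Toy example: target gene in frame 0, overlapping gene in frame 2.
knockouts = find_knockouts("ATGTTACGCAAA", target_start=0, overlap_start=2)
```

    In the toy sequence the second target codon TTA can become TAA or TGA through a change at index 4, which is the third position of the overlapping codon GTT, so valine is preserved in the other frame.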

  8. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

    PubMed Central

    Wernisch, Lorenz

    2017-01-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190

  9. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

    PubMed

    Gabasova, Evelina; Reid, John; Wernisch, Lorenz

    2017-10-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.

  10. Genetic diversity, morphological uniformity and polyketide production in dinoflagellates (Amphidinium, Dinoflagellata).

    PubMed

    Murray, Shauna A; Garby, Tamsyn; Hoppenrath, Mona; Neilan, Brett A

    2012-01-01

    Dinoflagellates are an intriguing group of eukaryotes, showing many unusual morphological and genetic features. Some groups of dinoflagellates are morphologically highly uniform, despite indications of genetic diversity. The species Amphidinium carterae is abundant and cosmopolitan in marine environments, grows easily in culture, and has therefore been used as a 'model' dinoflagellate in research into dinoflagellate genetics, polyketide production and photosynthesis. We have investigated the diversity of 'cryptic' species of Amphidinium that are morphologically similar to A. carterae, including the very similar species Amphidinium massartii, based on light and electron microscopy, two nuclear gene regions (LSU rDNA and ITS rDNA) and one mitochondrial gene region (cytochrome b). We found that six genetically distinct cryptic species (clades) exist within the species A. massartii and four within A. carterae, and that these clades differ from one another in molecular sequences at levels comparable to other dinoflagellate species, genera or even families. Using primers based on an alignment of alveolate ketosynthase sequences, we isolated partial ketosynthase genes from several Amphidinium species. We compared these genes to known dinoflagellate ketosynthase genes and investigated the evolution and diversity of the strains of Amphidinium that produce them.

  11. Culture adaptation of malaria parasites selects for convergent loss-of-function mutants.

    PubMed

    Claessens, Antoine; Affara, Muna; Assefa, Samuel A; Kwiatkowski, Dominic P; Conway, David J

    2017-01-24

    Cultured human pathogens may differ significantly from source populations. To investigate the genetic basis of laboratory adaptation in malaria parasites, clinical Plasmodium falciparum isolates were sampled from patients and cultured in vitro for up to three months. Genome sequence analysis was performed on multiple culture time point samples from six monoclonal isolates, and single nucleotide polymorphism (SNP) variants emerging over time were detected. Out of a total of five positively selected SNPs, four represented nonsense mutations resulting in stop codons, three of these in a single ApiAP2 transcription factor gene, and one in SRPK1. To survey further for nonsense mutants associated with culture, genome sequences of eleven long-term laboratory-adapted parasite strains were examined, revealing four independently acquired nonsense mutations in two other ApiAP2 genes, and five in Epac. No mutants of these genes exist in a large database of parasite sequences from uncultured clinical samples. This implicates putative master regulator genes in which multiple independent stop codon mutations have convergently led to culture adaptation, affecting most laboratory lines of P. falciparum. Understanding the adaptive processes should guide development of experimental models, which could include targeted gene disruption to adapt fastidious malaria parasite species to culture.

  12. Peptidoglycan recognition protein genes and their roles in the innate immune pathways of the red flour beetle, Tribolium castaneum.

    PubMed

    Koyama, Hiroaki; Kato, Daiki; Minakuchi, Chieka; Tanaka, Toshiharu; Yokoi, Kakeru; Miura, Ken

    2015-11-01

    We have previously demonstrated that functional Toll and IMD innate immune pathways indeed exist in the model beetle, Tribolium castaneum, while the beetle's pathways have broader specificity in terms of microbial activation than those of Drosophila. To elucidate the molecular basis of this broad microbial activation, we here focused on potential upstream sensors of the T. castaneum innate immune pathways, peptidoglycan recognition proteins (PGRPs). Our phenotype analyses utilizing RNA interference-based comprehensive gene knockdown followed by bacterial challenge suggested: PGRP-LA functions as a pivotal sensor of the IMD pathway for both Gram-negative and Gram-positive bacteria; PGRP-LC acts as an IMD pathway-associated sensor mainly for Gram-negative bacteria; PGRP-LE also has some roles in Gram-negative bacterial recognition of the IMD pathway. On the other hand, we did not obtain clear phenotype changes by gene knockdown of short-type PGRP genes, probably because of the highly inducible nature of these genes. Our results may collectively account, at least in part, for the promiscuous bacterial activation of the T. castaneum innate immune pathways. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Comparative Phylogenomics Uncovers the Impact of Symbiotic Associations on Host Genome Evolution

    PubMed Central

    Delaux, Pierre-Marc; Varala, Kranthi; Edger, Patrick P.; Coruzzi, Gloria M.; Pires, J. Chris; Ané, Jean-Michel

    2014-01-01

    Mutualistic symbioses between eukaryotes and beneficial microorganisms of their microbiome play an essential role in nutrition, protection against disease, and development of the host. However, the impact of beneficial symbionts on the evolution of host genomes remains poorly characterized. Here we used the independent loss of the most widespread plant–microbe symbiosis, arbuscular mycorrhization (AM), as a model to address this question. Using a large phenotypic approach and phylogenetic analyses, we present evidence that loss of AM symbiosis correlates with the loss of many symbiotic genes in the Arabidopsis lineage (Brassicales). Then, by analyzing the genome and/or transcriptomes of nine other phylogenetically divergent non-host plants, we show that this correlation occurred in a convergent manner in four additional plant lineages, demonstrating the existence of an evolutionary pattern specific to symbiotic genes. Finally, we use a global comparative phylogenomic approach to track this evolutionary pattern among land plants. Based on this approach, we identify a set of 174 highly conserved genes and demonstrate enrichment in symbiosis-related genes. Our findings are consistent with the hypothesis that beneficial symbionts maintain purifying selection on host gene networks during the evolution of entire lineages. PMID:25032823

  14. Genome-Wide RNAi Screen Identifies Broadly-Acting Host Factors That Inhibit Arbovirus Infection

    PubMed Central

    Yasunaga, Ari; Hanna, Sheri L.; Li, Jianqing; Cho, Hyelim; Rose, Patrick P.; Spiridigliozzi, Anna; Gold, Beth; Diamond, Michael S.; Cherry, Sara

    2014-01-01

    Vector-borne viruses are an important class of emerging and re-emerging pathogens; thus, an improved understanding of the cellular factors that modulate infection in their respective vertebrate and insect hosts may aid control efforts. In particular, cell-intrinsic antiviral pathways restrict vector-borne viruses including the type I interferon response in vertebrates and the RNA interference (RNAi) pathway in insects. However, it is likely that additional cell-intrinsic mechanisms exist to limit these viruses. Since insects rely on innate immune mechanisms to inhibit virus infections, we used Drosophila as a model insect to identify cellular factors that restrict West Nile virus (WNV), a flavivirus with a broad and expanding geographical host range. Our genome-wide RNAi screen identified 50 genes that inhibited WNV infection. Further screening revealed that 17 of these genes were antiviral against additional flaviviruses, and seven of these were antiviral against other vector-borne viruses, expanding our knowledge of invertebrate cell-intrinsic immunity. Investigation of two newly identified factors that restrict diverse viruses, dXPO1 and dRUVBL1, in the Tip60 complex, demonstrated they contributed to antiviral defense at the organismal level in adult flies, in mosquito cells, and in mammalian cells. These data suggest the existence of broadly acting and functionally conserved antiviral genes and pathways that restrict virus infections in evolutionarily divergent hosts. PMID:24550726

  15. Shrinkage regression-based methods for microarray missing value imputation.

    PubMed

    Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng

    2013-01-01

    Missing values commonly occur in microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
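
    The steps named in this abstract (select a similar gene by Pearson correlation, fit by least squares, shrink the coefficient, predict the missing value) can be illustrated with a small Python sketch. It is not the authors' method: it uses a single predictor gene, and because the abstract does not state the shrinkage estimator, a fixed factor `shrink` stands in for it. The function names and the toy data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def impute(target, candidates, missing_idx, shrink=1.0):
    """Estimate target[missing_idx] from the most correlated candidate gene.

    `shrink` is an ASSUMED fixed shrinkage factor in [0, 1]; shrink=1.0
    gives ordinary least-squares regression imputation.
    """
    # Use only the samples where the target gene is observed.
    obs = [i for i in range(len(target)) if i != missing_idx]
    y = [target[i] for i in obs]
    # Select the candidate gene with the largest absolute Pearson correlation.
    best = max(candidates, key=lambda g: abs(pearson([g[i] for i in obs], y)))
    x = [best[i] for i in obs]
    mx, my = mean(x), mean(y)
    # Least-squares slope of y on x, then shrink it toward zero.
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    slope *= shrink
    return my + slope * (best[missing_idx] - mx)


# Toy expression profiles (hypothetical): the target gene lacks sample 3.
target = [1, 2, 3, None, 5]
candidates = [[2, 4, 6, 8, 10], [5, 1, 4, 2, 3]]
estimate = impute(target, candidates, missing_idx=3)
```

    With `shrink` below 1 the estimate is pulled toward the mean of the observed target values, which is the general effect a shrinkage estimator is meant to have.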

  16. Behavioral Teratogenesis in Drosophila melanogaster.

    PubMed

    Mishra, Monalisa; Barik, Bedanta Kumar

    2018-01-01

    Developmental biology is a fascinating branch of science that helps us understand the mechanisms of development, and its findings are used in various therapeutic approaches. Drosophila melanogaster has served as a model for finding the key molecules that initiate and regulate development. Various genes, transcription factors, and signaling pathways involved in development have been identified in Drosophila. Many toxic compounds that can affect development have also been recognized using the Drosophila model; such compounds are termed teratogens. Many teratogens identified using Drosophila may also act as teratogens in humans, since 75% conservation exists between the disease genes of Drosophila and humans. Certain teratogens do not cause developmental defects when exposure occurs during pregnancy; however, behavioral defects appear later in development. Such compounds are termed behavioral teratogens. It is therefore worthwhile to identify potential behavioral teratogens using the Drosophila model. Drosophila behavior is well studied at various developmental stages. This chapter describes various methods that can be employed to test behavioral teratogenesis in Drosophila.

  17. Mayr, Dobzhansky, and Bush and the complexities of sympatric speciation in Rhagoletis

    PubMed Central

    Feder, Jeffrey L.; Xie, Xianfa; Rull, Juan; Velez, Sebastian; Forbes, Andrew; Leung, Brian; Dambroski, Hattie; Filchak, Kenneth E.; Aluja, Martin

    2005-01-01

    The Rhagoletis pomonella sibling species complex is a model for sympatric speciation by means of host plant shifting. However, genetic variation aiding the sympatric radiation of the group in the United States may have geographic roots. Inversions on chromosomes 1-3 affecting diapause traits adapting flies to differences in host fruiting phenology appear to exist in the United States because of a series of secondary introgression events from Mexico. Here, we investigate whether these inverted regions of the genome may have subsequently evolved to become more recalcitrant to introgression relative to collinear regions, consistent with new models for chromosomal speciation. As predicted by the models, gene trees for six nuclear loci mapping to chromosomes other than 1-3 tended to have shallower node depths separating Mexican and U.S. haplotypes relative to an outgroup sequence than nine genes residing on chromosomes 1-3. We discuss the implications of secondary contact and differential introgression with respect to sympatric host race formation and speciation in Rhagoletis, reconciling some of the seemingly dichotomous views of Mayr, Dobzhansky, and Bush concerning modes of divergence. PMID:15851672

  18. Hybrid models for chemical reaction networks: Multiscale theory and application to gene regulatory systems.

    PubMed

    Winkelmann, Stefanie; Schütte, Christof

    2017-09-21

    Well-mixed stochastic chemical kinetics are properly modeled by the chemical master equation (CME) and associated Markov jump processes in molecule number space. If the reactants are present in large amounts, however, corresponding simulations of the stochastic dynamics become computationally expensive and model reductions are demanded. The classical model reduction approach uniformly rescales the overall dynamics to obtain deterministic systems characterized by ordinary differential equations, the well-known mass action reaction rate equations. For systems with multiple scales, there exist hybrid approaches that keep parts of the system discrete while another part is approximated either using Langevin dynamics or deterministically. This paper aims at giving a coherent overview of the different hybrid approaches, focusing on their basic concepts and the relation between them. We derive a novel general description of such hybrid models that allows expressing various forms by one type of equation. We also check in how far the approaches apply to model extensions of the CME for dynamics which do not comply with the central well-mixed condition and require some spatial resolution. A simple but meaningful gene expression system with negative self-regulation is analysed to illustrate the different approximation qualities of some of the hybrid approaches discussed. Especially, we reveal the cause of error in the case of small volume approximations.
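
    The Markov jump process behind the CME can be simulated exactly with Gillespie's direct method, which is the expensive baseline that the hybrid approaches aim to reduce. The Python sketch below is a minimal illustration, assuming a one-species reduction of a negatively self-regulated gene with production propensity k / (1 + x/K) and degradation propensity gamma * x. These rate forms, the function name, and all parameter values are assumptions for illustration and are not taken from the paper.

```python
import random


def gillespie_self_repression(k=10.0, K=5.0, gamma=1.0, x0=0, t_end=50.0, seed=0):
    """Simulate protein copy number x(t) for an assumed self-repressing gene.

    Reactions (hypothetical rate laws):
      production:   x -> x + 1  with propensity k / (1 + x / K)
      degradation:  x -> x - 1  with propensity gamma * x
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k / (1.0 + x / K)
        a_deg = gamma * x
        a_total = a_prod + a_deg  # always > 0 because a_prod > 0
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t >= t_end:
            break
        # Choose which reaction fires, with probability proportional to propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


times, counts = gillespie_self_repression(seed=1)
```

    Every reaction event is simulated one at a time, so the cost grows with the total propensity; that scaling is why large copy numbers motivate the deterministic and Langevin approximations discussed in the abstract.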

  19. Distinct promoter activation mechanisms modulate noise-driven HIV gene expression

    NASA Astrophysics Data System (ADS)

    Chavali, Arvind K.; Wong, Victor C.; Miller-Jensen, Kathryn

    2015-12-01

    Latent human immunodeficiency virus (HIV) infections occur when the virus occupies a transcriptionally silent but reversible state, presenting a major obstacle to cure. There is experimental evidence that random fluctuations in gene expression, when coupled to the strong positive feedback encoded by the HIV genetic circuit, act as a ‘molecular switch’ controlling cell fate, i.e., viral replication versus latency. Here, we implemented a stochastic computational modeling approach to explore how different promoter activation mechanisms in the presence of positive feedback would affect noise-driven activation from latency. We modeled the HIV promoter as existing in one, two, or three states that are representative of increasingly complex mechanisms of promoter repression underlying latency. We demonstrate that two-state and three-state models are associated with greater variability in noisy activation behaviors, and we find that Fano factor (defined as variance over mean) proves to be a useful noise metric to compare variability across model structures and parameter values. Finally, we show how three-state promoter models can be used to qualitatively describe complex reactivation phenotypes in response to therapeutic perturbations that we observe experimentally. Ultimately, our analysis suggests that multi-state models more accurately reflect observed heterogeneous reactivation and may be better suited to evaluate how noise affects viral clearance.
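
    The Fano factor used in this abstract is defined there as variance over mean. A short Python sketch of that definition follows; the function name and the sample counts are hypothetical and serve only to show the calculation.

```python
from statistics import mean, pvariance


def fano_factor(counts):
    """Return variance / mean for a sample of molecule counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("Fano factor is undefined when the mean is zero")
    # Population variance is used here; a sample variance would also be reasonable.
    return pvariance(counts) / m


# Hypothetical per-cell transcript counts.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
ff = fano_factor(counts)  # mean 5, variance 4, so the ratio is 0.8
```

    A Poisson distribution has a Fano factor of 1, so values above 1 indicate more variability than simple Poissonian production and decay would give.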

  20. A Mixture Modeling Framework for Differential Analysis of High-Throughput Data

    PubMed Central

    Taslim, Cenny; Lin, Shili

    2014-01-01

    The inventions of microarray and next generation sequencing technologies have revolutionized research in genomics; these platforms have led to massive amounts of data in gene expression, methylation, and protein-DNA interactions. A common theme among a number of biological problems using high-throughput technologies is differential analysis. Despite the common theme, different data types have their own unique features, creating a “moving target” scenario. As such, methods specifically designed for one data type may not lead to satisfactory results when applied to another data type. To meet this challenge so that not only currently existing data types but also data from future problems, platforms, or experiments can be analyzed, we propose a mixture modeling framework that is flexible enough to automatically adapt to any moving target. More specifically, the approach considers several classes of mixture models and essentially provides a model-based procedure whose model is adaptive to the particular data being analyzed. We demonstrate the utility of the methodology by applying it to three types of real data: gene expression, methylation, and ChIP-seq. We also carried out simulations to gauge the performance and showed that the approach can be more efficient than any individual model without inflating type I error. PMID:25057284

  1. Hybrid models for chemical reaction networks: Multiscale theory and application to gene regulatory systems

    NASA Astrophysics Data System (ADS)

    Winkelmann, Stefanie; Schütte, Christof

    2017-09-01

    Well-mixed stochastic chemical kinetics are properly modeled by the chemical master equation (CME) and associated Markov jump processes in molecule number space. If the reactants are present in large amounts, however, corresponding simulations of the stochastic dynamics become computationally expensive and model reductions are demanded. The classical model reduction approach uniformly rescales the overall dynamics to obtain deterministic systems characterized by ordinary differential equations, the well-known mass action reaction rate equations. For systems with multiple scales, there exist hybrid approaches that keep parts of the system discrete while another part is approximated either using Langevin dynamics or deterministically. This paper aims at giving a coherent overview of the different hybrid approaches, focusing on their basic concepts and the relation between them. We derive a novel general description of such hybrid models that allows expressing various forms by one type of equation. We also check in how far the approaches apply to model extensions of the CME for dynamics which do not comply with the central well-mixed condition and require some spatial resolution. A simple but meaningful gene expression system with negative self-regulation is analysed to illustrate the different approximation qualities of some of the hybrid approaches discussed. Especially, we reveal the cause of error in the case of small volume approximations.

  2. [Intercellular communication-based robust circadian oscillation of the suprachiasmatic nucleus in the brain: mechanisms beyond intracellular clock machinery].

    PubMed

    Doi, Masao

    2013-12-01

    Recent advances in circadian biology strongly suggest that there are still genes involved in the generation and maintenance of biological rhythms that remain to be identified. It has been generally appreciated that circadian rhythms are generated intracellularly through transcription/translation-based autoregulatory feedback circuits of the clock genes. However, the existence of new intracellular clock machinery that cannot be explained by existing clock genes has recently been reported. This clock manifests as oxidation-reduction cycles of peroxiredoxin proteins, implying that as-yet-undiscovered clock genes may exist within cells to regulate redox cycling. Moreover, great strides have also been made in understanding the cell-cell communication-based robust circadian oscillations of the suprachiasmatic nucleus (SCN), the central pacemaker in the brain. Thousands of neurons that constitute the SCN maintain a high degree of synchrony in a way that allows the SCN neurons to create coherent signals as a whole. Inactivation of the genes involved in the cell-cell synchronization of the SCN, which include the genes encoding VIP, VPAC2, and RGS16, leads to altered circadian rhythms in behavior and physiologies. The purpose of this review is to provide an overview of recent advances in the circadian biology, with a special emphasis on the importance of cell-cell interactions within the SCN.

  3. New steroid 5alpha-reductase type I (SRD5A1) homologous sequences on human chromosomes 6 and 8.

    PubMed

    Eminović, I; Liović, M; Prezelj, J; Kocijancic, A; Rozman, D; Komel, R

    2001-01-01

    To date, two genes encoding 5alpha-reductase isoenzymes are known (type I, type II), and one type I pseudogene. The divergent localization of these genes and the still not fully understood function of the encoded enzymes, as well as the perplexing results we obtained after sequencing PCR-amplified SRD5A1 gene fragments (out of genomic DNA), made us assume that, in addition to the known SRD5A1 gene, one or more different human 5alpha-reductase type I coding genes may exist. Our research provides the first evidence for the existence of two new SRD5A1-related, previously unidentified sequences in the human genome. These sequences, which were localized to chromosomes 6 and 8, are highly homologous (> 99%) to SRD5A1, and also do not contain any deletions or insertions that are otherwise a characteristic of the SRD5AP1 pseudogene. Our results imply that these sequences may be either coding parts of yet unknown, active SRD5A1 genes, and/or of previously unidentified pseudogenes. These findings additionally support data of Chen et al. who confirmed the existence of various SRD5A1 proteins in cultured human skin cells.

  4. An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

    PubMed

    Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam

    2012-01-01

    Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than performing the two tasks independently. Yet, for some microarray datasets, both classification accuracy and stability of gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own. We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were proven to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) based on the same synthetic datasets. MFMW-outlier gave better average precision and recall values on three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets.

  5. Modeling phenotypic metabolic adaptations of Mycobacterium tuberculosis H37Rv under hypoxia.

    PubMed

    Fang, Xin; Wallqvist, Anders; Reifman, Jaques

    2012-01-01

    The ability to adapt to different conditions is key for Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), to successfully infect human hosts. Adaptations allow the organism to evade the host immune responses during acute infections and persist for an extended period of time during the latent infectious stage. In latently infected individuals, estimated to include one-third of the human population, the organism exists in a variety of metabolic states, which impedes the development of a simple strategy for controlling or eradicating this disease. Direct knowledge of the metabolic states of M. tuberculosis in patients would aid in the management of the disease as well as in forming the basis for developing new drugs and designing more efficacious drug cocktails. Here, we propose an in silico approach to create state-specific models based on readily available gene expression data. The coupling of differential gene expression data with a metabolic network model allowed us to characterize the metabolic adaptations of M. tuberculosis H37Rv to hypoxia. Given the microarray data for the alterations in gene expression, our model predicted reduced oxygen uptake, ATP production changes, and a global change from an oxidative to a reductive tricarboxylic acid (TCA) program. Alterations in the biomass composition indicated an increase in the cell wall metabolites required for cell-wall growth, as well as heightened accumulation of triacylglycerol in preparation for a low-nutrient, low metabolic activity life style. In contrast, the gene expression program in the deletion mutant of dosR, which encodes the immediate hypoxic response regulator, failed to adapt to low-oxygen stress. Our predictions were compatible with recent experimental observations of M. tuberculosis activity under hypoxic and anaerobic conditions. Importantly, alterations in the flow and accumulation of a particular metabolite were not necessarily directly linked to differential gene expression of the enzymes catalyzing the related metabolic reactions.

  6. Convergent occurrence of the developmental hourglass in plant and animal embryogenesis?

    PubMed

    Cridge, Andrew G; Dearden, Peter K; Brownfield, Lynette R

    2016-04-01

    The remarkable similarity of animal embryos at particular stages of development led to the proposal of a developmental hourglass. In this model, early events in development are less conserved across species but lead to a highly conserved 'phylotypic period'. Beyond this stage, the model suggests that development once again becomes less conserved, leading to the diversity of forms. Recent comparative studies of gene expression in animal groups have provided strong support for the hourglass model. How and why might such an hourglass pattern be generated? More importantly, how might early acting events in development evolve while still maintaining a later conserved stage? The discovery that an hourglass pattern may also exist in the embryogenesis of plants provides comparative data that may help us explain this phenomenon. Whether the developmental hourglass occurs in plants, and what this means for our understanding of embryogenesis in plants and animals, is discussed. Models by which conserved early-acting genes might change their functional role in the evolution of gene networks, how networks buffer these changes, and how that might constrain, or confer diversity on, the body plan are also discussed. Evidence of a morphological and molecular hourglass in plant and animal embryogenesis suggests convergent evolution. This convergence is likely due to developmental constraints imposed upon embryogenesis by the need to produce a viable embryo with an established body plan, controlled by the architecture of the underlying gene regulatory networks. As the body plan is largely laid down during the middle phases of embryo development in plants and animals, it is perhaps not surprising that this stage represents the narrow waist of the hourglass, where the gene regulatory networks are the oldest and most robust and integrated, limiting species diversity and constraining morphological space. © The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company.

  7. GENE EXPRESSION NETWORKS

    EPA Science Inventory

    "Gene expression network" is the term used to describe the interplay, simple or complex, between two or more gene products in performing a specific cellular function. Although the delineation of such networks is complicated by the existence of multiple and subtle types of intera...

  8. Selective Gene Transfection of Individual Cells In Vitro with Plasmonic Nanobubbles

    PubMed Central

    Lukianova-Hleb, Ekaterina; Samaniego, Adam P.; Wen, Jianguo; Metelitsa, Leonid; Chang, Chung-Che; Lapotko, Dmitri

    2011-01-01

    Gene delivery and transfection of eukaryotic cells is widely used for research and for developing gene cell therapy. However, the existing methods lack selectivity, efficacy and safety when heterogeneous cell systems must be treated. We report a new method that employs plasmonic nanobubbles (PNBs) for delivery and transfection. A PNB is a novel, tunable cellular agent with a dual mechanical and optical action due to the formation of the vapor nanobubble around a transiently heated gold nanoparticle upon its exposure to a laser pulse. PNBs enabled the mechanical injection of the extracellular cDNA plasmid into the cytoplasm of individual target living cells, cultured leukemia cells and human CD34+CD117+ stem cells and expression of a green fluorescent protein (GFP) in those cells. PNB generation and lifetime correlated with the expression of green fluorescent protein in PNB-treated cells. Optical scattering by PNBs additionally provided the detection of the target cells and the guidance of cDNA injection at single cell level. In both cell models PNBs demonstrated a gene transfection effect in a single pulse treatment with high selectivity, efficacy and safety. Thus, PNBs provided targeted gene delivery at the single cell level in a single pulse procedure that can be used for safe and effective gene therapy. PMID:21315120

  9. Selective gene transfection of individual cells in vitro with plasmonic nanobubbles.

    PubMed

    Lukianova-Hleb, Ekaterina Y; Samaniego, Adam P; Wen, Jianguo; Metelitsa, Leonid S; Chang, Chung-Che; Lapotko, Dmitri O

    2011-06-10

    Gene delivery and transfection of eukaryotic cells are widely used for research and for developing gene cell therapy. However, the existing methods lack selectivity, efficacy and safety when heterogeneous cell systems must be treated. We report a new method that employs plasmonic nanobubbles (PNBs) for delivery and transfection. A PNB is a novel, tunable cellular agent with a dual mechanical and optical action due to the formation of the vapor nanobubble around a transiently heated gold nanoparticle upon its exposure to a laser pulse. PNBs enabled the mechanical injection of the extracellular cDNA plasmid into the cytoplasm of individual target living cells, cultured leukemia cells and human CD34+ CD117+ stem cells and expression of a green fluorescent protein (GFP) in those cells. PNB generation and lifetime correlated with the expression of green fluorescent protein in PNB-treated cells. Optical scattering by PNBs additionally provided the detection of the target cells and the guidance of cDNA injection at single cell level. In both cell models PNBs demonstrated a gene transfection effect in a single pulse treatment with high selectivity, efficacy and safety. Thus, PNBs provided targeted gene delivery at the single cell level in a single pulse procedure that can be used for safe and effective gene therapy. Copyright © 2011 Elsevier B.V. All rights reserved.

  10. Isolation, X location and activity of the marsupial homologue of SLC16A2, an XIST-flanking gene in eutherian mammals

    PubMed Central

    Wakefield, Matthew J.; Walcher, Cristina; Disteche, Christine M.; Whitehead, Siobhan; Ross, Mark; Marshall Graves, Jennifer A.

    2010-01-01

    X chromosome inactivation (XCI) achieves dosage compensation between males and females for most X-linked genes in eutherian mammals. It is a whole-chromosome effect under the control of the XIST locus, although some genes escape inactivation. Marsupial XCI differs from the eutherian process, implying fundamental changes in the XCI mechanism during the evolution of the two lineages. There is no direct evidence for the existence of a marsupial XIST homologue. XCI has been studied for only a handful of genes in any marsupial, and none in the model kangaroo Macropus eugenii (the tammar wallaby). We have therefore studied the sequence, location and activity of a gene SLC16A2 (solute carrier, family 16, class A, member 2) that flanks XIST on the human and mouse X chromosomes. A BAC clone containing the marsupial SLC16A2 was mapped to the end of the long arm of the tammar X chromosome and used in RNA FISH experiments to determine whether one or both loci are transcribed in female cells. In male and female cells, only a single signal was found, indicating that the marsupial SLC16A2 gene is silenced on the inactivated X. PMID:16235118

  11. Transcriptomic analysis of rice aleurone cells identified a novel abscisic acid response element.

    PubMed

    Watanabe, Kenneth A; Homayouni, Arielle; Gu, Lingkun; Huang, Kuan-Ying; Ho, Tuan-Hua David; Shen, Qingxi J

    2017-09-01

    Seeds serve as a great model to study plant responses to drought stress, which is largely mediated by abscisic acid (ABA). The ABA responsive element (ABRE) is a key cis-regulatory element in ABA signalling. However, its consensus sequence (ACGTG(G/T)C) is present in the promoters of only about 40% of ABA-induced genes in rice aleurone cells, suggesting other ABREs may exist. To identify novel ABREs, RNA sequencing was performed on aleurone cells of rice seeds treated with 20 μM ABA. Gibbs sampling was used to identify enriched elements, and particle bombardment-mediated transient expression studies were performed to verify the function. Gene ontology analysis was performed to predict the roles of genes containing the novel ABREs. This study revealed 2443 ABA-inducible genes and a novel ABRE, designated as ABREN, which was experimentally verified to mediate ABA signalling in rice aleurone cells. Many of the ABREN-containing genes are predicted to be involved in stress responses and transcription. Analysis of other species suggests that the ABREN may be monocot specific. This study also revealed interesting expression patterns of genes involved in ABA metabolism and signalling. Collectively, this study advanced our understanding of diverse cis-regulatory sequences and the transcriptomes underlying ABA responses in rice aleurone cells. © 2017 John Wiley & Sons Ltd.
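
    The ABRE consensus quoted in this abstract, ACGTG(G/T)C, maps directly onto a regular expression, which gives a simple way to check whether a promoter sequence carries the element. The Python sketch below is illustrative only: the function name and the example sequence are hypothetical, and only the given strand is scanned (scanning the reverse complement would need an extra step).

```python
import re

# ACGTG(G/T)C written as a regex; the lookahead also reports overlapping hits.
ABRE = re.compile(r"(?=(ACGTG[GT]C))")


def find_abre(seq):
    """Return (0-based position, matched motif) for each ABRE hit in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in ABRE.finditer(seq)]


# Hypothetical promoter fragment containing both variants of the consensus.
hits = find_abre("ttACGTGGCaaACGTGTCgg")
# hits == [(2, "ACGTGGC"), (11, "ACGTGTC")]
```

    Counting promoters with at least one hit across a gene set is the kind of tally that underlies a figure such as the "about 40%" reported above, although the study's actual pipeline is not described in the abstract.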

  12. Implications of genome wide association studies for addiction: are our a priori assumptions all wrong?

    PubMed

    Hall, F Scott; Drgonova, Jana; Jain, Siddharth; Uhl, George R

    2013-12-01

    Substantial genetic contributions to addiction vulnerability are supported by data from twin studies, linkage studies, candidate gene association studies and, more recently, Genome Wide Association Studies (GWAS). Parallel to this work, animal studies have attempted to identify the genes that may contribute to responses to addictive drugs and addiction liability, initially focusing upon genes for the targets of the major drugs of abuse. These studies identified genes/proteins that affect responses to drugs of abuse; however, this does not necessarily mean that variation in these genes contributes to the genetic component of addiction liability. One of the major problems with initial linkage and candidate gene studies was an a priori focus on the genes thought to be involved in addiction based upon the known contributions of those proteins to drug actions, making the identification of novel genes unlikely. The GWAS approach is systematic and agnostic to such a priori assumptions. From the numerous GWAS now completed several conclusions may be drawn: (1) addiction is highly polygenic; each allelic variant contributing in a small, additive fashion to addiction vulnerability; (2) unexpected, compared to our a priori assumptions, classes of genes are most important in explaining addiction vulnerability; (3) although substantial genetic heterogeneity exists, there is substantial convergence of GWAS signals on particular genes. This review traces the history of this research; from initial transgenic mouse models based upon candidate gene and linkage studies, through the progression of GWAS for addiction and nicotine cessation, to the current human and transgenic mouse studies post-GWAS. © 2013.

  13. Extensive Gene Remodeling in the Viral World: New Evidence for Nongradual Evolution in the Mobilome Network

    PubMed Central

    Jachiet, Pierre-Alain; Colson, Philippe; Lopez, Philippe; Bapteste, Eric

    2014-01-01

    Complex nongradual evolutionary processes such as gene remodeling are difficult to model, to visualize, and to investigate systematically. Despite these challenges, the creation of composite (or mosaic) genes by combination of genetic segments from unrelated gene families was established as an important adaptive phenomenon in eukaryotic genomes. In contrast, almost no general studies have been conducted to quantify composite genes in viruses. Although viral genome mosaicism has been well-described, the extent of gene mosaicism and its rules of emergence remain largely unexplored. Applying methods from graph theory to inclusive similarity networks, and using data from more than 3,000 complete viral genomes, we provide the first demonstration that composite genes in viruses are 1) functionally biased, 2) involved in key aspects of the arms race between cells and viruses, and 3) can be classified into two distinct types of composite genes in all viral classes. Beyond the quantification of the widespread recombination of genes among different viruses of the same class, we also report a striking sharing of genetic information between viruses of different classes and with different nucleic acid types. This latter discovery provides novel evidence for the existence of a large and complex mobilome network, which appears partly bound by the sharing of genetic information and by the formation of composite genes between mobile entities with different genetic material. Considering that there are around 10^31 viruses on the planet, gene remodeling appears as a hugely significant way of generating and moving novel sequences between different kinds of organisms on Earth. PMID:25104113

  14. Dinucleotide controlled null models for comparative RNA gene prediction.

    PubMed

    Gesell, Tanja; Washietl, Stefan

    2008-05-27

    Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. 
SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
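
    To make the notion of dinucleotide content concrete, the sketch below counts overlapping dinucleotides in a single sequence and applies a plain mononucleotide shuffle, which keeps base composition but in general not dinucleotide frequencies. That mismatch is the bias a dinucleotide-preserving null model is meant to control. This is not the SISSIz algorithm, which simulates whole alignments under a phylogenetic model; the function names and the example sequence are illustrative assumptions.

```python
import random
from collections import Counter

def dinucleotide_freqs(seq: str) -> dict[str, float]:
    """Relative frequencies of overlapping dinucleotides in seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return {k: v / total for k, v in Counter(pairs).items()}

def mono_shuffle(seq: str, seed: int = 0) -> str:
    """Shuffle single bases: base counts are preserved, dinucleotide counts usually are not."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)

seq = "ACGCGCGCGTATATATAT"  # hypothetical sequence, not real data
shuffled = mono_shuffle(seq)
print(dinucleotide_freqs(seq))
print(dinucleotide_freqs(shuffled))
```

    Comparing the two printed dictionaries shows how a naive shuffle can distort stacking-relevant statistics even though the base counts are unchanged.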

  15. Knowledge management for systems biology a general and visually driven framework applied to translational medicine.

    PubMed

    Maier, Dieter; Kalus, Wenzel; Wolff, Martin; Kalko, Susana G; Roca, Josep; Marin de Mas, Igor; Turan, Nil; Cascante, Marta; Falciani, Francesco; Hernandez, Miguel; Villà-Freixa, Jordi; Losko, Sascha

    2011-03-05

    To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, patterns and rules which allow predictive hypotheses to be defined. Life science has become a data rich science with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory. To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain specific knowledge representation models based on specific objects and their relations supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. We generate the first semantically integrated COPD specific public knowledge base and find that for the integration of clinical and experimental data with pre-existing knowledge the configuration based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. The knowledgebase enables the retrieval of sub-networks including protein-protein interaction, pathway, gene-disease and gene-compound data which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser based interface which is currently under development.

  16. Knowledge management for systems biology a general and visually driven framework applied to translational medicine

    PubMed Central

    2011-01-01

    Background: To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, patterns and rules which allow predictive hypotheses to be defined. Life science has become a data rich science with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory. Results: To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain specific knowledge representation models based on specific objects and their relations supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. Conclusions: We generate the first semantically integrated COPD specific public knowledge base and find that for the integration of clinical and experimental data with pre-existing knowledge the configuration based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. The knowledgebase enables the retrieval of sub-networks including protein-protein interaction, pathway, gene-disease and gene-compound data which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser based interface which is currently under development. PMID:21375767

  17. Molecular cloning and expression analysis of sea bass (Dicentrarchus labrax L.) tumor necrosis factor-alpha (TNF-alpha).

    PubMed

    Nascimento, Diana S; Pereira, Pedro J B; Reis, Marta I R; do Vale, Ana; Zou, Jun; Silva, Manuel T; Secombes, Christopher J; dos Santos, Nuno M S

    2007-09-01

    In the search for pro-inflammatory genes in sea bass a TNF-alpha gene was cloned and sequenced. The sea bass TNF-alpha (sbTNF-alpha) putative protein conserves the TNF-alpha family signature, as well as the two cysteines usually involved in the formation of a disulfide bond. The mouse TNF-alpha Thr-Leu cleavage sequence and a potential transmembrane domain were also found, suggesting that sbTNF-alpha exists as two forms: an approximately 28 kDa membrane-bound form and an approximately 18.4 kDa soluble protein. The single copy sbTNF-alpha gene contains a four exon-three intron structure similar to other known TNF-alpha genes. Homology modeling of sbTNF-alpha is compatible with the trimeric quaternary architecture of its mammalian counterparts. SbTNF-alpha is constitutively expressed in several unstimulated tissues, and was not up-regulated in the spleen and head-kidney in response to UV-killed Photobacterium damselae subsp. piscicida. However, an increase of sbTNF-alpha expression was detected in the head-kidney during an experimental infection using the same pathogen.

  18. High-Throughput Screening Using iPSC-Derived Neuronal Progenitors to Identify Compounds Counteracting Epigenetic Gene Silencing in Fragile X Syndrome.

    PubMed

    Kaufmann, Markus; Schuffenhauer, Ansgar; Fruh, Isabelle; Klein, Jessica; Thiemeyer, Anke; Rigo, Pierre; Gomez-Mancilla, Baltazar; Heidinger-Millot, Valerie; Bouwmeester, Tewis; Schopfer, Ulrich; Mueller, Matthias; Fodor, Barna D; Cobos-Correa, Amanda

    2015-10-01

    Fragile X syndrome (FXS) is the most common form of inherited mental retardation, and it is caused in most cases by epigenetic silencing of the Fmr1 gene. Today, no specific therapy exists for FXS, and current treatments are only directed to improve behavioral symptoms. Neuronal progenitors derived from FXS patient induced pluripotent stem cells (iPSCs) represent a unique model to study the disease and develop assays for large-scale drug discovery screens since they conserve the Fmr1 gene silenced within the disease context. We have established a high-content imaging assay to run a large-scale phenotypic screen aimed to identify compounds that reactivate the silenced Fmr1 gene. A set of 50,000 compounds was tested, including modulators of several epigenetic targets. We describe an integrated drug discovery model comprising iPSC generation, culture scale-up, and quality control and screening with a very sensitive high-content imaging assay assisted by single-cell image analysis and multiparametric data analysis based on machine learning algorithms. The screening identified several compounds that induced a weak expression of fragile X mental retardation protein (FMRP) and thus sets the basis for further large-scale screens to find candidate drugs or targets tackling the underlying mechanism of FXS with potential for therapeutic intervention. © 2015 Society for Laboratory Automation and Screening.

  19. Population-genetic models of sex-limited genomic imprinting.

    PubMed

    Kelly, S Thomas; Spencer, Hamish G

    2017-06-01

    Genomic imprinting is a form of epigenetic modification involving parent-of-origin-dependent gene expression, usually the inactivation of one gene copy in some tissues, at least, for some part of the diploid life cycle. Occurring at a number of loci in mammals and flowering plants, this mode of non-Mendelian expression can be viewed more generally as parentally-specific differential gene expression. The effects of natural selection on genetic variation at imprinted loci have previously been examined in several population-genetic models. Here we expand the existing one-locus, two-allele population-genetic models of viability selection with genomic imprinting to include sex-limited imprinting, i.e., imprinted expression occurring only in one sex, and differential viability between the sexes. We first consider models of complete inactivation of either parental allele and these models are subsequently generalized to incorporate differential expression. Stable polymorphic equilibrium was possible without heterozygote advantage as observed in some prior models of imprinting in both sexes. In contrast to these latter models, in the sex-limited case it was critical whether the paternally inherited or maternally inherited allele was inactivated. The parental origin of inactivated alleles had a different impact on how the population responded to the different selection pressures between the sexes. Under the same fitness parameters, imprinting in the other sex altered the number of possible equilibrium states and their stability. When the parental origin of imprinted alleles and the sex in which they are inactive differ, an allele cannot be inactivated in consecutive generations. The system dynamics became more complex with more equilibrium points emerging. Our results show that selection can interact with epigenetic factors to maintain genetic variation in previously unanticipated ways. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. SENCA: A Multilayered Codon Model to Study the Origins and Dynamics of Codon Usage

    PubMed Central

    Pouyet, Fanny; Bailly-Bechet, Marc; Mouchiroud, Dominique; Guéguen, Laurent

    2016-01-01

    Gene sequences are the target of evolution operating at different levels, including the nucleotide, codon, and amino acid levels. Disentangling the impact of those different levels on gene sequences requires developing a probabilistic model with three layers. Here we present SENCA (site evolution of nucleotides, codons, and amino acids), a codon substitution model that separately describes 1) nucleotide processes which apply on all sites of a sequence such as the mutational bias, 2) preferences between synonymous codons, and 3) preferences among amino acids. We argue that most synonymous substitutions are not neutral and that SENCA provides more accurate estimates of selection compared with more classical codon sequence models. We study the forces that drive the genomic content evolution, intraspecifically in the core genome of 21 prokaryotes and interspecifically for five Enterobacteria. We retrieve the existence of a universal mutational bias toward AT, and that taking into account selection on synonymous codon usage has consequences on the measurement of selection on nonsynonymous substitutions. We also confirm that codon usage bias is mostly driven by selection on preferred codons. We propose new summary statistics to measure the relative importance of the different evolutionary processes acting on sequences. PMID:27401173

  1. Fishing for causes and cures of motor neuron disorders

    PubMed Central

    Patten, Shunmoogum A.; Armstrong, Gary A. B.; Lissouba, Alexandra; Kabashi, Edor; Parker, J. Alex; Drapeau, Pierre

    2014-01-01

    Motor neuron disorders (MNDs) are a clinically heterogeneous group of neurological diseases characterized by progressive degeneration of motor neurons, and share some common pathological pathways. Despite remarkable advances in our understanding of these diseases, no curative treatment for MNDs exists. To better understand the pathogenesis of MNDs and to help develop new treatments, the establishment of animal models that can be studied efficiently and thoroughly is paramount. The zebrafish (Danio rerio) is increasingly becoming a valuable model for studying human diseases and in screening for potential therapeutics. In this Review, we highlight recent progress in using zebrafish to study the pathology of the most common MNDs: spinal muscular atrophy (SMA), amyotrophic lateral sclerosis (ALS) and hereditary spastic paraplegia (HSP). These studies indicate the power of zebrafish as a model to study the consequences of disease-related genes, because zebrafish homologues of human genes have conserved functions with respect to the aetiology of MNDs. Zebrafish also complement other animal models for the study of pathological mechanisms of MNDs and are particularly advantageous for the screening of compounds with therapeutic potential. We present an overview of their potential usefulness in MND drug discovery, which is just beginning and holds much promise for future therapeutic development. PMID:24973750

  2. An overview of bioinformatics methods for modeling biological pathways in yeast.

    PubMed

    Hou, Jie; Acharya, Lipi; Zhu, Dongxiao; Cheng, Jianlin

    2016-03-01

    The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein-protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  3. Verification of Gyrokinetic codes: theoretical background and applications

    NASA Astrophysics Data System (ADS)

    Tronko, Natalia

    2016-10-01

    In fusion plasmas the strong magnetic field allows the fast gyro motion to be systematically removed from the description of the dynamics, resulting in a considerable model simplification and gain of computational time. Nowadays, gyrokinetic (GK) codes play a major role in the understanding of the development and the saturation of turbulence and in the prediction of the consequent transport. We present a new and generic theoretical framework and specific numerical applications to test the validity and the domain of applicability of existing GK codes. For a sound verification process, the underlying theoretical GK model and the numerical scheme must be considered at the same time, which makes this approach pioneering. At the analytical level, the main novelty consists in using advanced mathematical tools such as the variational formulation of dynamics to systematize the basic equations of GK codes and assess the limits of their applicability. Indirect verification of the numerical scheme is proposed via a benchmark process. In this work, specific examples of code verification are presented for two GK codes: the multi-species electromagnetic ORB5 (PIC), and the radially global version of GENE (Eulerian). The proposed methodology can be applied to any existing GK code. We establish a hierarchy of reduced GK Vlasov-Maxwell equations using the generic variational formulation. Then, we derive and include the models implemented in ORB5 and GENE inside this hierarchy. At the computational level, detailed verification of CYCLONE-based global electromagnetic test cases is considered, including a parametric β-scan covering the transition from ITG to KBM and the spectral properties at the nominal β value.

  4. Modular modelling with Physiome standards

    PubMed Central

    Nickerson, David P.; Nielsen, Poul M. F.; Hunter, Peter J.

    2016-01-01

    Key points: The complexity of computational models is increasing, supported by research in modelling tools and frameworks. But relatively little thought has gone into design principles for complex models. We propose a set of design principles for complex model construction with the Physiome standard modelling protocol CellML. By following the principles, models are generated that are extensible and are themselves suitable for reuse in larger models of increasing complexity. We illustrate these principles with examples including an architectural prototype linking, for the first time, electrophysiology, thermodynamically compliant metabolism, signal transduction, gene regulation and synthetic biology. The design principles complement other Physiome research projects, facilitating the application of virtual experiment protocols and model analysis techniques to assist the modelling community in creating libraries of composable, characterised and simulatable quantitative descriptions of physiology. Abstract: The ability to produce and customise complex computational models has great potential to have a positive impact on human health. As the field develops towards whole‐cell models and linking such models in multi‐scale frameworks to encompass tissue, organ, or organism levels, reuse of previous modelling efforts will become increasingly necessary. Any modelling group wishing to reuse existing computational models as modules for their own work faces many challenges in the context of construction, storage, retrieval, documentation and analysis of such modules. Physiome standards, frameworks and tools seek to address several of these challenges, especially for models expressed in the modular protocol CellML. Aside from providing a general ability to produce modules, there has been relatively little research work on architectural principles of CellML models that will enable reuse at larger scales.
To complement and support the existing tools and frameworks, we develop a set of principles to address this consideration. The principles are illustrated with examples that couple electrophysiology, signalling, metabolism, gene regulation and synthetic biology, together forming an architectural prototype for whole‐cell modelling (including human intervention) in CellML. Such models illustrate how testable units of quantitative biophysical simulation can be constructed. Finally, future relationships between modular models so constructed and Physiome frameworks and tools are discussed, with particular reference to how such frameworks and tools can in turn be extended to complement and gain more benefit from the results of applying the principles. PMID:27353233

  5. Identification and distribution of the NBS-LRR gene family in the cassava genome

    USDA-ARS's Scientific Manuscript database

    Plant resistance genes (R genes) exist in large families and usually contain both a nucleotide-binding site domain and a leucine-rich repeat domain, denoted NBS-LRR. The genome sequence of cassava (Manihot esculenta) is a valuable resource for analyzing the genomic organization of resistance genes i...

  6. Testing cross-phenotype effects of rare variants in longitudinal studies of complex traits.

    PubMed

    Rudra, Pratyaydipta; Broadaway, K Alaine; Ware, Erin B; Jhun, Min A; Bielak, Lawrence F; Zhao, Wei; Smith, Jennifer A; Peyser, Patricia A; Kardia, Sharon L R; Epstein, Michael P; Ghosh, Debashis

    2018-06-01

    Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next-generation sequencing technology, there has been substantial interest in identifying rare variants in genes that possess cross-phenotype effects. In the presence of such effects, modeling both the phenotypes and rare variants collectively using multivariate models can achieve higher statistical power compared to univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time and using such longitudinal data can further increase the power to detect genetic associations. Although rare-variant approaches exist for testing cross-phenotype effects at a single time point, there is no analogous method for performing such analyses using longitudinal outcomes. In order to fill this important gap, we propose an extension of Gene Association with Multiple Traits (GAMuT) test, a method for cross-phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome-wide scale due to the use of a closed-form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power over competing approaches and also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy. © 2018 WILEY PERIODICALS, INC.

  7. Evolution of the Bipolar Mating System of the Mushroom Coprinellus disseminatus From Its Tetrapolar Ancestors Involves Loss of Mating-Type-Specific Pheromone Receptor Function

    PubMed Central

    James, Timothy Y.; Srivilai, Prayook; Kües, Ursula; Vilgalys, Rytas

    2006-01-01

    Mating incompatibility in mushroom fungi is controlled by the mating-type loci. In tetrapolar species, two unlinked mating-type loci exist (A and B), whereas in bipolar species there is only one locus. The A and B mating-type loci encode homeodomain transcription factors and pheromones and pheromone receptors, respectively. Most mushroom species have a tetrapolar mating system, but numerous transitions to bipolar mating systems have occurred. Here we determined the genes controlling mating type in the bipolar mushroom Coprinellus disseminatus. Through positional cloning and degenerate PCR, we sequenced both the transcription factor and pheromone receptor mating-type gene homologs from C. disseminatus. Only the transcription factor genes segregate with mating type, discounting the hypothesis of genetic linkage between the A and B mating-type loci as the causal origin of bipolar mating behavior. The mating-type locus of C. disseminatus is similar to the A mating-type locus of the model species Coprinopsis cinerea and encodes two tightly linked pairs of homeodomain transcription factor genes. When transformed into C. cinerea, the C. disseminatus A and B homologs elicited sexual reactions like native mating-type genes. Although mating type in C. disseminatus is controlled by only the transcription factor genes, cellular functions appear to be conserved for both groups of genes. PMID:16461425

  8. Construction of regulatory networks using expression time-series data of a genotyped population.

    PubMed

    Yeung, Ka Yee; Dombek, Kenneth M; Lo, Kenneth; Mittler, John E; Zhu, Jun; Schadt, Eric E; Bumgarner, Roger E; Raftery, Adrian E

    2011-11-29

    The inference of regulatory and biochemical networks from large-scale genomics data is a basic problem in molecular biology. The goal is to generate testable hypotheses of gene-to-gene influences and subsequently to design bench experiments to confirm these network predictions. Coexpression of genes in large-scale gene-expression data implies coregulation and potential gene-gene interactions, but provides little information about the direction of influences. Here, we use both time-series data and genetics data to infer directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events and genetics data allow us to map DNA variations to variations at the RNA level. We generate microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. We develop a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We show that our inferred network recovers existing and novel regulatory relationships. Following network construction, we generate independent microarray data on selected deletion mutants to prospectively test network predictions. We demonstrate the potential of our network to discover de novo transcription-factor binding sites. Applying our construction method to previously published data demonstrates that our method is competitive with leading network construction algorithms in the literature.

  9. Effects of FVIII immunity on hepatocyte and hematopoietic stem cell–directed gene therapy of murine hemophilia A

    PubMed Central

    Lytle, Allison M; Brown, Harrison C; Paik, Na Yoon; Knight, Kristopher A; Wright, J Fraser; Spencer, H Trent; Doering, Christopher B

    2016-01-01

    Immune responses to coagulation factors VIII (FVIII) and IX (FIX) represent primary obstacles to hemophilia treatment. Previously, we showed that hematopoietic stem cell (HSC) retroviral gene therapy induces immune nonresponsiveness to FVIII in both naive and preimmunized murine hemophilia A settings. Liver-directed adeno-associated viral (AAV)-FIX vector gene transfer achieved similar results in preclinical hemophilia B models. However, as clinical immune responses to FVIII and FIX differ, we investigated the ability of liver-directed AAV-FVIII gene therapy to affect FVIII immunity in hemophilia A mice. Both FVIII naive and preimmunized mice were administered recombinant AAV8 encoding a liver-directed bioengineered FVIII expression cassette. Naive animals receiving high or mid-doses subsequently achieved near normal FVIII activity levels. However, challenge with adjuvant-free recombinant FVIII induced loss of FVIII activity and anti-FVIII antibodies in mid-dose, but not high-dose AAV or HSC lentiviral (LV) vector gene therapy cohorts. Furthermore, unlike what was shown previously for FIX gene transfer, AAV-FVIII administration to hemophilia A inhibitor mice conferred no effect on anti-FVIII antibody or inhibitory titers. These data suggest that functional differences exist in the immune modulation achieved to FVIII or FIX in hemophilia mice by gene therapy approaches incorporating liver-directed AAV vectors or HSC-directed LV. PMID:26909355

  10. Expression studies of the PIS-regulated genes suggest different mechanisms of sex determination within mammals.

    PubMed

    Pannetier, M; Servel, N; Cocquet, J; Besnard, N; Cotinot, C; Pailhoux, E

    2003-01-01

    In mammals, the Y-located SRY gene is known to induce testis formation from the indifferent gonad. A related gene, SOX9, also plays a critical role in testis differentiation in mammals, in birds and reptiles. It is now assumed that SRY acts upstream of SOX9 in the sex determination cascade, but the regulatory link which should exist between these two genes remains unknown. Studies on XX sex reversal in polled goats (PIS mutation: Polled Intersex Syndrome) have led to the discovery of a female-specific locus crucial for ovarian differentiation. This genomic region is composed of at least two genes, FOXL2 and PISRT1, which share a common transcriptional regulatory region, PIS. In this review, we present the expression pattern of these PIS-regulated genes in mice. The FOXL2 expression profile of mice is similar to that described in goats in accordance with a conserved role of this ovarian differentiating gene in mammals. On the contrary, the PISRT1 expression profile is different between mice and goats, suggesting different mechanisms of the primary switch in the testis determination process within mammals. A model based on two different modes of SOX9 regulation in mice and other mammals is proposed in order to integrate our results into the current scheme of gonad differentiation. Copyright 2003 S. Karger AG, Basel

  11. Identification and analysis of Eimeria nieschulzi gametocyte genes reveal splicing events of gam genes and conserved motifs in the wall-forming proteins within the genus Eimeria (Coccidia, Apicomplexa)

    PubMed Central

    Wiedmer, Stefanie; Erdbeer, Alexander; Volke, Beate; Randel, Stephanie; Kapplusch, Franz; Hanig, Sacha; Kurth, Michael

    2017-01-01

    The genus Eimeria (Apicomplexa, Coccidia) provides a wide range of different species with different hosts to study common and variable features within the genus and its species. A common characteristic of all known Eimeria species is the oocyst, the infectious stage where its life cycle starts and ends. In our study, we utilized Eimeria nieschulzi as a model organism. This rat-specific parasite has complex oocyst morphology and can be transfected and even cultivated in vitro up to the oocyst stage. We wanted to elucidate how the known oocyst wall-forming proteins are preserved in this rodent Eimeria species compared to other Eimeria. In newly obtained genomics data, we were able to identify different gametocyte genes that are orthologous to already known gam genes involved in the oocyst wall formation of avian Eimeria species. These genes appeared putatively as single exon genes, but cDNA analysis showed alternative splicing events in the transcripts. The analysis of the translated sequence revealed different conserved motifs but also dissimilar regions in GAM proteins, as well as polymorphic regions. The occurrence of an underrepresented gam56 gene version suggests the existence of a second distinct E. nieschulzi genotype within the E. nieschulzi Landers isolate that we maintain. PMID:29210668

  12. Two co-existing germline mutations P53 V157D and PMS2 R20Q promote tumorigenesis in a familial cancer syndrome.

    PubMed

    Wang, Zuoyun; Sun, Yihua; Gao, Bin; Lu, Yi; Fang, Rong; Gao, Yijun; Xiao, Tian; Liu, Xin-Yuan; Pao, William; Zhao, Yun; Chen, Haiquan; Ji, Hongbin

    2014-01-01

    Germline mutations are responsible for familial cancer syndromes which account for approximately 5-10% of all types of cancers. These mutations mainly occur at tumor suppressor genes or genome stability genes, such as DNA repair genes. Here we have identified a cancer predisposition family, in which eight members were afflicted with a wide spectrum of cancer including one diagnosed with lung cancer at 22 years old. Sequencing analysis of tumor samples as well as histologically normal specimens identified two germline mutations co-existing in the familial cancer syndrome, the mutation of tumor suppressor gene P53 V157D and mismatch repair gene PMS2 R20Q. We further demonstrate that P53 V157D and/or PMS2 R20Q mutant promotes lung cancer cell proliferation. These two mutants are capable of promoting colony formation in soft agar as well as tumor formation in a transgenic Drosophila system. Collectively, these data have uncovered the important role of co-existing germline P53 and PMS2 mutations in the familial cancer syndrome development. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  13. Two co-existing germline mutations P53 V157D and PMS2 R20Q promote tumorigenesis in a familial cancer syndrome

    PubMed Central

    Wang, Zuoyun; Sun, Yihua; Gao, Bin; Lu, Yi; Fang, Rong; Gao, Yijun; Xiao, Tian; Liu, Xin-Yuan; Pao, William; Zhao, Yun; Chen, Haiquan; Ji, Hongbin

    2014-01-01

    Germline mutations are responsible for familial cancer syndromes which account for approximately 5–10% of all types of cancers. These mutations mainly occur at tumor suppressor genes or genome stability genes, such as DNA repair genes. Here we have identified a cancer predisposition family, in which eight members were afflicted with a wide spectrum of cancer including one diagnosed with lung cancer at 22 years old. Sequencing analysis of tumor samples as well as histologically normal specimens identified two germline mutations co-existing in the familial cancer syndrome, the mutation of tumor suppressor gene P53 V157D and mismatch repair gene PMS2 R20Q. We further demonstrate that P53 V157D and/or PMS2 R20Q mutant promotes lung cancer cell proliferation. These two mutants are capable of promoting colony formation in soft agar as well as tumor formation in a transgenic Drosophila system. Collectively, these data have uncovered the important role of co-existing germline P53 and PMS2 mutations in the familial cancer syndrome development. PMID:23981578

  14. Short and long-term genome stability analysis of prokaryotic genomes.

    PubMed

    Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

    2013-05-08

    Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often enables the characterization of pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but they alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes with diverging gene order with respect to their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were able to explore genome organization stability at different time-scales and to find significant differences for pathogen and non-pathogen species. The output of our framework also allows the identification of conserved gene clusters and/or partial occurrences thereof, making it possible to explore how gene clusters assembled during evolution.
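
The distinction drawn above between rearrangements (which change the backbone gene order) and insertions or deletions (which leave it unchanged) can be illustrated with a small sketch. This is not the stability estimator used in the paper; it is a simple stand-in that assumes linear chromosomes, unique gene identifiers and no orientation information, and all function names are hypothetical.

```python
def backbone(order, shared):
    """Keep only the genes shared by both genomes, preserving their order."""
    return [gene for gene in order if gene in shared]


def adjacencies(order):
    """Unordered pairs of neighbouring genes along a linear chromosome."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def backbone_adjacency_conservation(order_a, order_b):
    """Fraction of backbone adjacencies of genome A that also occur in genome B.

    Assumption: gene identifiers are unique within each genome.
    """
    shared = set(order_a) & set(order_b)
    adj_a = adjacencies(backbone(order_a, shared))
    adj_b = adjacencies(backbone(order_b, shared))
    if not adj_a:
        # Fewer than two shared genes: nothing can be disrupted.
        return 1.0
    return len(adj_a & adj_b) / len(adj_a)
```

Under these assumptions an inserted gene leaves the score at 1.0, because it is dropped before adjacencies are compared, whereas swapping two shared genes lowers it.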

  15. An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree

    PubMed Central

    2016-01-01

    Motivation: A gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, where they enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all the cases. Results: In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population trees (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with many more haplotypes than a previous approach. Availability: The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http://www.engr.uconn.edu/ywu/STELLS.html. Contact: ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307621
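
To make the notion of a gene tree probability concrete, the sketch below covers only the smallest non-trivial case and is not the CompactCH or STELLS algorithm. It uses the standard closed-form result for a species tree ((A,B),C) with one sampled lineage per population and an internal branch of length t in coalescent units. The function name and the dictionary keys are hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t: internal branch length in coalescent units (assumed >= 0),
    with a single sampled lineage per population.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages do not coalesce on the internal branch.
    no_coalescence = math.exp(-t)
    # In that case all three lineages enter the root population, where each
    # of the three possible topologies is equally likely.
    discordant = no_coalescence / 3.0
    concordant = 1.0 - 2.0 * discordant
    return {
        "((A,B),C)": concordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }
```

With t = 0 the three topologies are equally likely, and as t grows the topology matching the species tree dominates. With more lineages and populations the number of coalescent histories to sum over grows quickly, which is the cost the abstract describes.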

  16. Evolution of disease response genes in loblolly pine: insights from candidate genes.

    PubMed

    Ersoz, Elhan S; Wright, Mark H; González-Martínez, Santiago C; Langley, Charles H; Neale, David B

    2010-12-06

    Host-pathogen interactions that may lead to a competitive co-evolution of virulence and resistance mechanisms present an attractive system to study molecular evolution because strong, recent (or even current) selective pressure is expected at many genomic loci. However, it is unclear whether these selective forces would act to preserve existing diversity, promote novel diversity, or reduce linked neutral diversity during rapid fixation of advantageous alleles. In plants, the lack of adaptive immunity places a larger burden on genetic diversity to ensure survival of plant populations. This burden is even greater if the generation time of the plant is much longer than the generation time of the pathogen. Here, we present nucleotide polymorphism and substitution data for 41 candidate genes from the long-lived forest tree loblolly pine, selected primarily for their prospective influences on host-pathogen interactions. This dataset is analyzed together with 15 drought-tolerance and 13 wood-quality genes from previous studies. A wide range of neutrality tests were performed and tested against expectations from realistic demographic models. Collectively, our analyses found that axr (auxin response factor), caf1 (chromatin assembly factor) and gatabp1 (gata binding protein 1) candidate genes carry patterns consistent with directional selection and erd3 (early response to drought 3) displays patterns suggestive of a selective sweep, both of which are consistent with the arms-race model of disease response evolution. Furthermore, we have identified patterns consistent with diversifying selection at erf1-like (ethylene responsive factor 1), ccoaoemt (caffeoyl-CoA-O-methyltransferase), cyp450-like (cytochrome p450-like) and pr4.3 (pathogen response 4.3), expected under the trench-warfare evolution model. Finally, a drought-tolerance candidate related to the plant cell wall, lp5, displayed patterns consistent with balancing selection. In conclusion, both arms-race and trench-warfare models seem compatible with patterns of polymorphism found in different disease-response candidate genes, indicating a mixed strategy of disease tolerance evolution for loblolly pine, a major tree crop in the southeastern United States.

  17. Power of data mining methods to detect genetic associations and interactions.

    PubMed

    Molinaro, Annette M; Carriero, Nicholas; Bjornson, Robert; Hartge, Patricia; Rothman, Nathaniel; Chatterjee, Nilanjan

    2011-01-01

    Genetic association studies, thus far, have focused on the analysis of individual main effects of SNP markers. Nonetheless, there is a clear need for modeling epistasis or gene-gene interactions to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. An understanding of the power of such methods for the discovery of genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR). We use the algorithm-specific variable importance measures (VIMs) as statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma. The power of RF is highest in all simulation models, that of MCLR is similar to RF in half, and that of MDR is consistently the lowest. Our study indicates that the power of RF VIMs is most reliable. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM. Copyright © 2011 S. Karger AG, Basel.
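
The permutation-based resampling step described above can be sketched as follows. The importance statistic here is a deliberately simple stand-in (a difference in mean phenotype between carriers and non-carriers), not a variable importance measure from RF, MCLR or MDR. All names and the add-one correction are assumptions made for illustration.

```python
import random


def importance(genotypes, phenotypes):
    """Stand-in importance statistic: absolute difference in mean phenotype
    between carriers (genotype > 0) and non-carriers (genotype == 0)."""
    carriers = [p for g, p in zip(genotypes, phenotypes) if g > 0]
    others = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    if not carriers or not others:
        return 0.0
    return abs(sum(carriers) / len(carriers) - sum(others) / len(others))


def permutation_p_value(genotypes, phenotypes, n_perm=1000, seed=0):
    """Estimate a p value by shuffling phenotypes to build the null distribution."""
    rng = random.Random(seed)
    observed = importance(genotypes, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling breaks any genotype-phenotype association.
        rng.shuffle(shuffled)
        if importance(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Any other statistic, including an algorithm-specific importance measure, could be substituted for `importance` without changing the resampling loop.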

  18. Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter.

    PubMed

    Roux-Rouquie, M; Marilley, M

    2000-09-15

    We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X. laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed.

  19. Dissecting maize diversity in lowland South America: genetic structure and geographic distribution models.

    PubMed

    Bracco, Mariana; Cascales, Jimena; Hernández, Julián Cámara; Poggio, Lidia; Gottlieb, Alexandra M; Lia, Verónica V

    2016-08-26

    Maize landraces from South America have traditionally been assigned to two main categories: Andean and Tropical Lowland germplasm. However, the genetic structure and affiliations of the lowland gene pools have been difficult to assess due to limited sampling and the lack of comparative analysis. Here, we examined SSR and Adh2 sequence variation in a diverse sample of maize landraces from lowland middle South America, and performed a comprehensive integrative analysis of population structure and diversity including already published data of archaeological and extant specimens from the Americas. Geographic distribution models were used to explore the relationship between environmental factors and the observed genetic structure. Bayesian and multivariate analyses of population structure showed the existence of two previously overlooked lowland gene pools associated with Guaraní indigenous communities of middle South America. The singularity of this germplasm was also evidenced by the frequency distribution of microsatellite repeat motifs of the Adh2 locus and the distinct spatial pattern inferred from geographic distribution models. Our results challenge the prevailing view that lowland middle South America is just a contact zone between Andean and Tropical Lowland germplasm and highlight the occurrence of a unique, locally adapted gene pool. This information is relevant for the conservation and utilization of maize genetic resources, as well as for a better understanding of environment-genotype associations.

  20. SSER: Species specific essential reactions database.

    PubMed

    Labena, Abraham A; Ye, Yuan-Nong; Dong, Chuan; Zhang, Fa-Z; Guo, Feng-Biao

    2017-04-19

    Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding genes. Existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions. However, none of these databases focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms which are identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, number of the essential reactions associated with their respective enzyme-encoding genes and shared essential reactions across organisms are the main contents of the database. SSER would be a prime source to obtain essential reactions data and related gene and metabolite information and it can significantly facilitate the reconstruction and analysis of metabolic network models and drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser .

  1. Archaeal “Dark Matter” and the Origin of Eukaryotes

    PubMed Central

    Williams, Tom A.; Embley, T. Martin

    2014-01-01

    Current hypotheses about the history of cellular life are mainly based on analyses of cultivated organisms, but these represent only a small fraction of extant biodiversity. The sequencing of new environmental lineages therefore provides an opportunity to test, revise, or reject existing ideas about the tree of life and the origin of eukaryotes. According to the textbook three domains hypothesis, the eukaryotes emerge as the sister group to a monophyletic Archaea. However, recent analyses incorporating better phylogenetic models and an improved sampling of the archaeal domain have generally supported the competing eocyte hypothesis, in which core genes of eukaryotic cells originated from within the Archaea, with important implications for eukaryogenesis. Given this trend, it was surprising that a recent analysis incorporating new genomes from uncultivated Archaea recovered a strongly supported three domains tree. Here, we show that this result was due in part to the use of a poorly fitting phylogenetic model and also to the inclusion by an automated pipeline of genes of putative bacterial origin rather than nucleocytosolic versions for some of the eukaryotes analyzed. When these issues were resolved, analyses including the new archaeal lineages placed core eukaryotic genes within the Archaea. These results are consistent with a number of recent studies in which improved archaeal sampling and better phylogenetic models agree in supporting the eocyte tree over the three domains hypothesis. PMID:24532674

  2. Archaeal "dark matter" and the origin of eukaryotes.

    PubMed

    Williams, Tom A; Embley, T Martin

    2014-03-01

    Current hypotheses about the history of cellular life are mainly based on analyses of cultivated organisms, but these represent only a small fraction of extant biodiversity. The sequencing of new environmental lineages therefore provides an opportunity to test, revise, or reject existing ideas about the tree of life and the origin of eukaryotes. According to the textbook three domains hypothesis, the eukaryotes emerge as the sister group to a monophyletic Archaea. However, recent analyses incorporating better phylogenetic models and an improved sampling of the archaeal domain have generally supported the competing eocyte hypothesis, in which core genes of eukaryotic cells originated from within the Archaea, with important implications for eukaryogenesis. Given this trend, it was surprising that a recent analysis incorporating new genomes from uncultivated Archaea recovered a strongly supported three domains tree. Here, we show that this result was due in part to the use of a poorly fitting phylogenetic model and also to the inclusion by an automated pipeline of genes of putative bacterial origin rather than nucleocytosolic versions for some of the eukaryotes analyzed. When these issues were resolved, analyses including the new archaeal lineages placed core eukaryotic genes within the Archaea. These results are consistent with a number of recent studies in which improved archaeal sampling and better phylogenetic models agree in supporting the eocyte tree over the three domains hypothesis.

  3. Multi-scale chromatin state annotation using a hierarchical hidden Markov model

    NASA Astrophysics Data System (ADS)

    Marco, Eugenio; Meuleman, Wouter; Huang, Jialiang; Glass, Kimberly; Pinello, Luca; Wang, Jianrong; Kellis, Manolis; Yuan, Guo-Cheng

    2017-04-01

    Chromatin-state analysis is widely applied in the studies of development and diseases. However, existing methods operate at a single length scale, and therefore cannot distinguish large domains from isolated elements of the same type. To overcome this limitation, we present a hierarchical hidden Markov model, diHMM, to systematically annotate chromatin states at multiple length scales. We apply diHMM to analyse a public ChIP-seq data set. diHMM not only accurately captures nucleosome-level information, but identifies domain-level states that vary in nucleosome-level state composition, spatial distribution and functionality. The domain-level states recapitulate known patterns such as super-enhancers, bivalent promoters and Polycomb repressed regions, and identify additional patterns whose biological functions are not yet characterized. By integrating chromatin-state information with gene expression and Hi-C data, we identify context-dependent functions of nucleosome-level states. Thus, diHMM provides a powerful tool for investigating the role of higher-order chromatin structure in gene regulation.

  4. De novo Genome Assembly of the Fungal Plant Pathogen Pyrenophora semeniperda

    PubMed Central

    Soliai, Marcus M.; Meyer, Susan E.; Udall, Joshua A.; Elzinga, David E.; Hermansen, Russell A.; Bodily, Paul M.; Hart, Aaron A.; Coleman, Craig E.

    2014-01-01

    Pyrenophora semeniperda (anamorph Drechslera campulata) is a necrotrophic fungal seed pathogen that has a wide host range within the Poaceae. One of its hosts is cheatgrass (Bromus tectorum), a species exotic to the United States that has invaded natural ecosystems of the Intermountain West. As a natural pathogen of cheatgrass, P. semeniperda has potential as a biocontrol agent due to its effectiveness at killing seeds within the seed bank; however, few genetic resources exist for the fungus. Here, the genome of a P. semeniperda isolate, assembled from 454 pyrosequencing reads, is presented. The total assembly is 32.5 Mb and includes 11,453 gene models encoding putative proteins larger than 24 amino acids. The models represent a variety of putative genes that are involved in pathogenic pathways typically found in necrotrophic fungi. In addition, extensive rearrangements, including inter- and intrachromosomal rearrangements, were found when the P. semeniperda genome was compared to P. tritici-repentis, a related fungal species. PMID:24475219

  5. Cross-platform normalization of microarray and RNA-seq data for machine learning applications

    PubMed Central

    Thompson, Jeffrey A.; Tan, Jie

    2016-01-01

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019
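
Two of the baseline transformations named above, the log2 transformation and quantile normalization, can be sketched in a few lines. This is not TDM and not the authors' R package. The function names, the pseudocount of 1 and the simple handling of ties (by sort order) are all assumptions made for illustration.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """Log2-transform expression values; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def quantile_normalize(samples):
    """Give every sample the same empirical distribution.

    samples: list of equal-length lists, one per sample, genes in the same order.
    Ties are broken by position, which is a simplification.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest expression in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized
```

After quantile normalization each sample keeps its own gene ranking but takes its values from the shared reference distribution, which is what makes samples from different platforms comparable in this scheme.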

  6. Molecular Cooperativity Governs Diverse and Monoallelic Olfactory Receptor Expression

    NASA Astrophysics Data System (ADS)

    Xing, Jianhua; Tian, Xiaojun; Zhang, Hang; Sannerud, Jens

    Multiple-objective optimization is common in biological systems. In the mammalian olfactory system, each sensory neuron stochastically expresses only one out of up to thousands of olfactory receptor (OR) gene alleles; at organism level the types of expressed ORs need to be maximized. The molecular mechanism of this Nobel-Prize winning puzzle remains unresolved after decades of extensive studies. Existing models focus only on monoallele activation, and cannot explain recent observations in mutants, especially the reduced global diversity of expressed ORs in G9a/GLP knockouts. In this work we integrated existing information on OR expression, and proposed an evolutionarily optimized three-layer regulation mechanism, which includes zonal segregation, epigenetic and enhancer competition coupled to a negative feedback loop. This model not only recapitulates monoallelic OR expression, but also elucidates how the olfactory system maximizes and maintains the diversity of OR expression. The model is validated by several experimental results, and particularly underscores cooperativity and synergy as a general design principle of multi-objective optimization in biology. The work is supported by the NIGMS/DMS Mathematical Biology program.

  7. Ancient human miRNAs are more likely to have broad functions and disease associations than young miRNAs.

    PubMed

    Patel, Vir D; Capra, John A

    2017-08-31

    microRNAs (miRNAs) are essential to the regulation of gene expression in eukaryotes, and improper expression of miRNAs contributes to hundreds of diseases. Despite the essential functions of miRNAs, the evolutionary dynamics of how they are integrated into existing gene regulatory and functional networks is not well understood. Knowledge of the origin and evolutionary history of a gene has proven informative about its functions and disease associations; we hypothesize that incorporating the evolutionary origins of miRNAs into analyses will help resolve differences in their functional dynamics and how they influence disease. We computed the phylogenetic age of miRNAs across 146 species and quantified the relationship between human miRNA age and several functional attributes. Older miRNAs are significantly more likely to be associated with disease than younger miRNAs, and the number of associated diseases increases with age. As has been observed for genes, the miRNAs associated with different diseases have different age profiles. For example, human miRNAs implicated in cancer are enriched for origins near the dawn of animal multicellularity. Consistent with the increasing contribution of miRNAs to disease with age, older miRNAs target more genes than younger miRNAs, and older miRNAs are expressed in significantly more tissues. Furthermore, miRNAs of all ages exhibit a strong preference to target older genes; 93% of validated miRNA gene targets were in existence at the origin of the targeting miRNA. Finally, we find that human miRNAs in evolutionarily related families are more similar in their targets and expression profiles than unrelated miRNAs. Considering the evolutionary origin and history of a miRNA provides useful context for the analysis of its function. Consistent with recent work in Drosophila, our results support a model in which miRNAs increase their expression and functional regulatory interactions over evolutionary time, and thus older miRNAs have increased potential to cause disease. We anticipate that these patterns hold across mammalian species; however, comprehensively evaluating them will require refining miRNA annotations across species and collecting functional data in non-human systems.

  8. Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.

    PubMed

    Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda

    2016-04-26

    Breast cancer is a serious disease which affects many women and may lead to death. It has received considerable attention from the research community. Thus, biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework which investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We utilized PubMed for the testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country to find out how the discoveries varied over time and how overlapping or diverse the discoveries and the interests of various research groups in different countries are. Interesting trends have been identified and discussed, e.g., different genes are highlighted in relationship to different countries though the various genes were found to share functionality. Some text analysis based results have been validated against results from other tools that predict gene-gene relations and gene functions.

  9. Isolation and characterization of multiple F-box genes linked to the S9- and S10-RNase in apple (Malus × domestica Borkh.).

    PubMed

    Okada, Kazuma; Moriya, Shigeki; Haji, Takashi; Abe, Kazuyuki

    2013-06-01

    Using 11 consensus primer pairs designed from S-linked F-box genes of apple and Japanese pear, 10 new F-box genes (MdFBX21 to 30) were isolated from the apple cultivar 'Spartan' (S(9)S(10)). MdFBX21 to 23 and MdFBX24 to 30 were completely linked to the S(9)-RNase and S(10)-RNase, respectively, and showed pollen-specific expression and S-haplotype-specific polymorphisms. Therefore, these 10 F-box genes are good candidates for the pollen determinant of self-incompatibility in apple. Phylogenetic analysis and comparison of deduced amino acid sequences of MdFBX21 to 30 with those of 25 S-linked F-box genes previously isolated from apple showed that a deduced amino acid identity of greater than 88.0% can be used as the tentative criterion to classify F-box genes into one type. Using this criterion, 31 of 35 F-box genes of apple were classified into 11 types (SFBB1-11). All types included F-box genes derived from S(3)- and S(9)-haplotypes, and seven types included F-box genes derived from S(3)-, S(9)-, and S(10)-haplotypes. Moreover, comparison of nucleotide sequences of S-RNases and multiple F-box genes among S(3)-, S(9)-, and S(10)-haplotypes suggested that F-box genes within each type showed high nucleotide identity regardless of the identity of the S-RNase. The large number of F-box genes as candidates for the pollen determinant and the high degree of conservation within each type are consistent with the collaborative non-self-recognition model reported for Petunia. These findings support the view that the collaborative non-self-recognition system also exists in apple.

  10. Diversity in Expression of Phosphorus (P) Responsive Genes in Cucumis melo L

    PubMed Central

    Fita, Ana; Bowen, Helen C.; Hayden, Rory M.; Nuez, Fernando; Picó, Belén; Hammond, John P.

    2012-01-01

    Background: Phosphorus (P) is a major limiting nutrient for plant growth in many soils. Studies in model species have identified genes involved in plant adaptations to low soil P availability. However, little information is available on the genetic bases of these adaptations in vegetable crops. In this respect, sequence data for melon now makes it possible to identify melon orthologues of candidate P responsive genes, and the expression of these genes can be used to explain the diversity in the root system adaptation to low P availability, recently observed in this species. Methodology and Findings: Transcriptional responses to P starvation were studied in nine diverse melon accessions by comparing the expression of eight candidate genes (Cm-PAP10.1, Cm-PAP10.2, Cm-RNS1, Cm-PPCK1, Cm-transferase, Cm-SQD1, Cm-DGD1 and Cm-SPX2) under P replete and P starved conditions. Differences among melon accessions were observed in response to P starvation, including differences in plant morphology, P uptake, P use efficiency (PUE) and gene expression. All studied genes were upregulated under P starvation conditions. Differences in the expression of genes involved in P mobilization and remobilization (Cm-PAP10.1, Cm-PAP10.2 and Cm-RNS1) under P starvation conditions explained part of the differences in P uptake and PUE among melon accessions. The levels of expression of the other studied genes were diverse among melon accessions, but contributed less to the phenotypical response of the accessions. Conclusions: This is the first time that these genes have been described in the context of P starvation responses in melon. There exists significant diversity in gene expression levels and P use efficiency among melon accessions as well as significant correlations between gene expression levels and phenotypical measurements. PMID:22536378

  11. Paternal poly (ADP-ribose) metabolism modulates retention of inheritable sperm histones and early embryonic gene expression.

    PubMed

    Ihara, Motomasa; Meyer-Ficca, Mirella L; Leu, N Adrian; Rao, Shilpa; Li, Fan; Gregory, Brian D; Zalenskaya, Irina A; Schultz, Richard M; Meyer, Ralph G

    2014-05-01

    To achieve the extreme nuclear condensation necessary for sperm function, most histones are replaced with protamines during spermiogenesis in mammals. Mature sperm retain only a small fraction of nucleosomes, which are, in part, enriched on gene regulatory sequences, and recent findings suggest that these retained histones provide epigenetic information that regulates expression of a subset of genes involved in embryo development after fertilization. We addressed this tantalizing hypothesis by analyzing two mouse models exhibiting abnormal histone positioning in mature sperm due to impaired poly(ADP-ribose) (PAR) metabolism during spermiogenesis and identified altered sperm histone retention in specific gene loci genome-wide using MNase digestion-based enrichment of mononucleosomal DNA. We then set out to determine the extent to which expression of these genes was altered in embryos generated with these sperm. For control sperm, most genes showed some degree of histone association, unexpectedly suggesting that histone retention in sperm genes is not an all-or-none phenomenon and that a small number of histones may remain associated with genes throughout the genome. The amount of retained histones, however, was altered in many loci when PAR metabolism was impaired. To ascertain whether sperm histone association and embryonic gene expression are linked, the transcriptome of individual 2-cell embryos derived from such sperm was determined using microarrays and RNA sequencing. Strikingly, a moderate but statistically significant portion of the genes that were differentially expressed in these embryos also showed different histone retention in the corresponding gene loci in sperm of their fathers. These findings provide new evidence for the existence of a linkage between sperm histone retention and gene expression in the embryo.

  12. Paternal Poly (ADP-ribose) Metabolism Modulates Retention of Inheritable Sperm Histones and Early Embryonic Gene Expression

    PubMed Central

    Leu, N. Adrian; Rao, Shilpa; Li, Fan; Gregory, Brian D.; Zalenskaya, Irina A.; Schultz, Richard M.; Meyer, Ralph G.

    2014-01-01

    To achieve the extreme nuclear condensation necessary for sperm function, most histones are replaced with protamines during spermiogenesis in mammals. Mature sperm retain only a small fraction of nucleosomes, which are, in part, enriched on gene regulatory sequences, and recent findings suggest that these retained histones provide epigenetic information that regulates expression of a subset of genes involved in embryo development after fertilization. We addressed this tantalizing hypothesis by analyzing two mouse models exhibiting abnormal histone positioning in mature sperm due to impaired poly(ADP-ribose) (PAR) metabolism during spermiogenesis and identified altered sperm histone retention in specific gene loci genome-wide using MNase digestion-based enrichment of mononucleosomal DNA. We then set out to determine the extent to which expression of these genes was altered in embryos generated with these sperm. For control sperm, most genes showed some degree of histone association, unexpectedly suggesting that histone retention in sperm genes is not an all-or-none phenomenon and that a small number of histones may remain associated with genes throughout the genome. The amount of retained histones, however, was altered in many loci when PAR metabolism was impaired. To ascertain whether sperm histone association and embryonic gene expression are linked, the transcriptome of individual 2-cell embryos derived from such sperm was determined using microarrays and RNA sequencing. Strikingly, a moderate but statistically significant portion of the genes that were differentially expressed in these embryos also showed different histone retention in the corresponding gene loci in sperm of their fathers. These findings provide new evidence for the existence of a linkage between sperm histone retention and gene expression in the embryo. PMID:24810616

  13. DGCA: A comprehensive R package for Differential Gene Correlation Analysis.

    PubMed

    McKenzie, Andrew T; Katsyv, Igor; Song, Won-Min; Wang, Minghui; Zhang, Bin

    2016-11-15

    Dissecting the regulatory relationships between genes is a critical step towards building accurate predictive models of biological systems. A powerful approach towards this end is to systematically study the differences in correlation between gene pairs in more than one distinct condition. In this study we develop an R package, DGCA (for Differential Gene Correlation Analysis), which offers a suite of tools for computing and analyzing differential correlations between gene pairs across multiple conditions. To minimize parametric assumptions, DGCA computes empirical p-values via permutation testing. To understand differential correlations at a systems level, DGCA performs higher-order analyses such as measuring the average difference in correlation and multiscale clustering analysis of differential correlation networks. Through a simulation study, we show that the straightforward z-score based method that DGCA employs significantly outperforms the existing alternative methods for calculating differential correlation. Application of DGCA to the TCGA RNA-seq data in breast cancer not only identifies key changes in the regulatory relationships between TP53 and PTEN and their target genes in the presence of inactivating mutations, but also reveals an immune-related differential correlation module that is specific to triple negative breast cancer (TNBC). DGCA is an R package for systematically assessing the difference in gene-gene regulatory relationships under different conditions. This user-friendly, effective, and comprehensive software tool will greatly facilitate the application of differential correlation analysis in many biological studies and thus will help identification of novel signaling pathways, biomarkers, and targets in complex biological systems and diseases.
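
    The z-score and permutation ideas mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the common Fisher z-transform form of a differential correlation test; the abstract does not give DGCA's exact formula, and DGCA itself is an R package, so the function names below (pearson, fisher_z, diff_corr_z, permutation_pvalue) are hypothetical and are not part of DGCA's API.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher z-transform; r is clipped slightly so atanh stays finite."""
    r = max(min(r, 0.999999), -0.999999)
    return math.atanh(r)


def diff_corr_z(x1, y1, x2, y2):
    """z-score for the difference in gene-pair correlation between two conditions.

    Assumption: standard error sqrt(1/(n1-3) + 1/(n2-3)), the textbook
    form for comparing two Fisher-transformed correlations.
    """
    z1 = fisher_z(pearson(x1, y1))
    z2 = fisher_z(pearson(x2, y2))
    se = math.sqrt(1.0 / (len(x1) - 3) + 1.0 / (len(x2) - 3))
    return (z1 - z2) / se


def permutation_pvalue(x1, y1, x2, y2, n_perm=1000, seed=0):
    """Empirical two-sided p-value obtained by shuffling samples between conditions."""
    rng = random.Random(seed)
    observed = abs(diff_corr_z(x1, y1, x2, y2))
    pairs = list(zip(x1, y1)) + list(zip(x2, y2))
    n1 = len(x1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pairs)
        a, b = pairs[:n1], pairs[n1:]
        z = diff_corr_z([p[0] for p in a], [p[1] for p in a],
                        [p[0] for p in b], [p[1] for p in b])
        if abs(z) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

    Here x1, y1 would hold the expression values of two genes across the samples of one condition, and x2, y2 the same two genes in the other condition. A large positive or negative z suggests that the correlation between the gene pair differs between conditions.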

  14. GIS assessment of the risk of gene flow from Brassica napus to its wild relatives in China.

    PubMed

    Dong, Jing-Jing; Zhang, Ming-Gang; Wei, Wei; Ma, Ke-Ping; Wang, Ying-Hao

    2018-06-16

    Risk of gene flow from canola (Brassica napus) to species of wild relatives was used as an example to evaluate the risk of gene flow of transgenic crops. B. juncea and B. rapa were the most common weedy Brassica species in China, which were both sexually compatible with canola. Data on canola cultivation in China were collected and analyzed using geographic information system (GIS), and the distribution of its wild relatives was predicted by MaxEnt species distribution model. Based on biological and phenological evidence, our results showed that gene flow risk exists in most parts of the country, especially in places with higher richness of wild Brassica species. However, risk in dominant canola cultivation regions is relatively low owing to the reduced distribution density of wild species in these regions. Three regions of higher risk of gene flow had been identified. Risk of gene flow is relatively high in certain areas. China has been assumed to be the original center of B. juncea and B. rapa, and gene flow may lead to negative effects on the conservation of biodiversity of local species. Strategies had been proposed to reduce the possibility of gene flow either by monitoring introgression from crops to wild relatives in the areas of high adoption of the crop or by taking measures to limit the releasing of new crops or varieties in the areas with abundant wild relatives.

  15. sigReannot: an oligo-set re-annotation pipeline based on similarities with the Ensembl transcripts and Unigene clusters.

    PubMed

    Casel, Pierrot; Moreews, François; Lagarrigue, Sandrine; Klopp, Christophe

    2009-07-16

    Microarrays are a powerful technology that enables monitoring of tens of thousands of genes in a single experiment. Most microarrays now use oligo-sets. The design of the oligo-nucleotides is time consuming and error prone. Genome wide microarray oligo-sets are designed using as large a set of transcripts as possible in order to monitor as many genes as possible. Depending on the genome sequencing state and on the assembly state, the knowledge of the existing transcripts can be very different. This knowledge evolves with the different genome builds and gene builds. Once the design is done, the microarrays are often used for several years. The biologists working in EADGENE expressed the need for up-to-date annotation files for the oligo-sets they share, including information about the orthologous genes of model species, the Gene Ontology, the corresponding pathways and the chromosomal location. The results of SigReannot on a chicken micro-array used in the EADGENE project compared to the initial annotations show that 23% of the oligo-nucleotide gene annotations were not confirmed, 2% were modified and 1% were added. The interest of this up-to-date annotation procedure is demonstrated through the analysis of real data previously published. SigReannot uses the oligo-nucleotide design procedure criteria to validate the probe-gene link and the Ensembl transcripts as reference for annotation. It therefore produces a high quality annotation based on reference gene sets.

  16. Population genetic testing for cancer susceptibility: founder mutations to genomes.

    PubMed

    Foulkes, William D; Knoppers, Bartha Maria; Turnbull, Clare

    2016-01-01

    The current standard model for identifying carriers of high-risk mutations in cancer-susceptibility genes (CSGs) generally involves a process that is not amenable to population-based testing: access to genetic tests is typically regulated by health-care providers on the basis of a labour-intensive assessment of an individual's personal and family history of cancer, with face-to-face genetic counselling performed before mutation testing. Several studies have shown that application of these selection criteria results in a substantial proportion of mutation carriers being missed. Population-based genetic testing has been proposed as an alternative approach to determining cancer susceptibility, and aims for a more-comprehensive detection of mutation carriers. Herein, we review the existing data on population-based genetic testing, and consider some of the barriers, pitfalls, and challenges related to the possible expansion of this approach. We consider mechanisms by which population-based genetic testing for cancer susceptibility could be delivered, and suggest how such genetic testing might be integrated into existing and emerging health-care structures. The existing models of genetic testing (including issues relating to informed consent) will very likely require considerable alteration if the potential benefits of population-based genetic testing are to be fully realized.

  17. Spermatogenesis Drives Rapid Gene Creation and Masculinization of the X Chromosome in Stalk-Eyed Flies (Diopsidae).

    PubMed

    Baker, Richard H; Narechania, Apurva; DeSalle, Rob; Johns, Philip M; Reinhardt, Josephine A; Wilkinson, Gerald S

    2016-03-26

    Throughout their evolutionary history, genomes acquire new genetic material that facilitates phenotypic innovation and diversification. Developmental processes associated with reproduction are particularly likely to involve novel genes. Abundant gene creation impacts the evolution of chromosomal gene content and general regulatory mechanisms such as dosage compensation. Numerous studies in model organisms have found complex and, at times, contradictory relationships among these genomic attributes, highlighting the need to examine these patterns in other systems characterized by abundant sexual selection. Therefore, we examined the association among novel gene creation, tissue-specific gene expression, and chromosomal gene content within stalk-eyed flies. Flies in this family are characterized by strong sexual selection and the presence of a newly evolved X chromosome. We generated RNA-seq transcriptome data from the testes for three species within the family and from seven additional tissues in the highly dimorphic species, Teleopsis dalmanni. Analysis of dipteran gene orthology reveals dramatic testes-specific gene creation in stalk-eyed flies, involving numerous gene families that are highly conserved in other insect groups. Identification of X-linked genes for the three species indicates that the X chromosome arose prior to the diversification of the family. The most striking feature of this X chromosome is that it is highly masculinized, containing nearly twice as many testes-specific genes as expected based on its size. All the major processes that may drive differential sex chromosome gene content (creation of genes with male-specific expression, development of male-specific expression from pre-existing genes, and movement of genes with male-specific expression) are elevated on the X chromosome of T. dalmanni. This masculinization occurs despite evidence that testes expressed genes do not achieve the same levels of gene expression on the X chromosome as they do on the autosomes. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions

    PubMed Central

    Tataru, Paula; Sand, Andreas; Hobolth, Asger; Mailund, Thomas; Pedersen, Christian N. S.

    2013-01-01

    Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model. PMID:24833225
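
    The "typical procedure" described in this abstract (decode the hidden sequence, then count pattern occurrences in the annotation) can be sketched in Python as follows. This is only the baseline that the abstract says can be biased, not the authors' new distribution-based algorithms. The two-state toy model (N for non-coding, C for coding), its probabilities, and the names viterbi and count_pattern are assumptions made for illustration.

```python
import math
import re

# Hypothetical toy HMM: "N" = non-coding, "C" = coding.
STATES = ["N", "C"]
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.8, "C": 0.2}, "C": {"N": 0.2, "C": 0.8}}
EMIT = {"N": {"a": 0.9, "g": 0.1}, "C": {"a": 0.1, "g": 0.9}}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for obs (log-space Viterbi)."""
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            prev, best = max(
                ((p, score[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            score[t][s] = best + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return "".join(reversed(path))


def count_pattern(path, pattern):
    """Count non-overlapping matches of a regular expression in the decoded path."""
    return len(re.findall(pattern, path))


decoded = viterbi("aaagggaaaggg", STATES, START, TRANS, EMIT)
gene_count = count_pattern(decoded, r"C+")  # each run of C counts as one "gene"
```

    Because the count is taken from a single decoded path, it ignores the uncertainty over all other hidden sequences, which is the motivation the abstract gives for computing the full distribution of the number of occurrences instead.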

  19. A bright future for bioluminescent imaging in viral research

    PubMed Central

    Coleman, Stewart M; McGregor, Alistair

    2015-01-01

    Summary: Bioluminescence imaging (BLI) has emerged as a powerful tool in the study of animal models of viral disease. BLI enables real-time in vivo study of viral infection, host immune response and the efficacy of intervention strategies. A substrate-dependent, light-emitting luciferase enzyme, when incorporated into a virus as a reporter gene, enables detection of bioluminescence from infected cells using sensitive charge-coupled device (CCD) camera systems. Advantages of BLI include low background, real-time tracking of infection in the same animal and reduction in the requirement for larger animal numbers. Transgenic luciferase-tagged mice enable the use of pre-existing nontagged viruses in BLI studies. Continued development in luciferase reporter genes, substrates, transgenic animals and imaging systems will greatly enhance future BLI strategies in viral research. PMID:26413138

  20. Genetic analysis of the cytoplasmic dynein subunit families.

    PubMed

    Pfister, K Kevin; Shah, Paresh R; Hummerich, Holger; Russ, Andreas; Cotton, James; Annuar, Azlina Ahmad; King, Stephen M; Fisher, Elizabeth M C

    2006-01-01

    Cytoplasmic dyneins, the principal microtubule minus-end-directed motor proteins of the cell, are involved in many essential cellular processes. The major form of this enzyme is a complex of at least six protein subunits, and in mammals all but one of the subunits are encoded by at least two genes. Here we review current knowledge concerning the subunits, their interactions, and their functional roles as derived from biochemical and genetic analyses. We also carried out extensive database searches to look for new genes and to clarify anomalies in the databases. Our analysis documents evolutionary relationships among the dynein subunits of mammals and other model organisms, and sheds new light on the role of this diverse group of proteins, highlighting the existence of two cytoplasmic dynein complexes with distinct cellular roles.

  1. Genetic Analysis of the Cytoplasmic Dynein Subunit Families

    PubMed Central

    Pfister, K. Kevin; Shah, Paresh R; Hummerich, Holger; Russ, Andreas; Cotton, James; Annuar, Azlina Ahmad; King, Stephen M; Fisher, Elizabeth M. C

    2006-01-01

    Cytoplasmic dyneins, the principal microtubule minus-end-directed motor proteins of the cell, are involved in many essential cellular processes. The major form of this enzyme is a complex of at least six protein subunits, and in mammals all but one of the subunits are encoded by at least two genes. Here we review current knowledge concerning the subunits, their interactions, and their functional roles as derived from biochemical and genetic analyses. We also carried out extensive database searches to look for new genes and to clarify anomalies in the databases. Our analysis documents evolutionary relationships among the dynein subunits of mammals and other model organisms, and sheds new light on the role of this diverse group of proteins, highlighting the existence of two cytoplasmic dynein complexes with distinct cellular roles. PMID:16440056

  2. Stent-mediated gene and drug delivery for cardiovascular disease and cancer: A brief insight.

    PubMed

    Krishnagopal, Akshaya; Reddy, Aakash; Sen, Dwaipayan

    2017-05-01

    This review concisely recapitulates the different existing modes of stent-mediated gene/drug delivery, their considerable advancement in clinical trials and a rationale for other merging new technologies such as nanotechnology and microRNA-based therapeutics, in addition to addressing the limitations in each of these perpetual stent platforms. Over the past decade, stent-mediated gene/drug delivery has materialized as a hopeful alternative for cardiovascular disease and cancer in contrast to routine conventional treatment modalities. Regardless of the phenomenal recent developments achieved by coronary interventions and cancer therapies that employ gene and drug-eluting stents, practical hurdles still remain a challenge. The present review highlights the limitations that each of the existing stent-based gene/drug delivery system encompasses and therefore provides a vision for the future with respect to discovering an ideal stent therapeutic platform that would circumvent all the practical hurdles witnessed with the existing technology. Further study of the improvisation of next-generation drug-eluting stents has helped to overcome the issue of restenosis to some extent. However, current stent formulations fall short of the anticipated clinically meaningful outcomes and there is an explicit need for more randomized trials aiming to further evaluate stent platforms in favour of enhanced safety and clinical value. Gene-eluting stents may hold promise in contributing new ideas for stent-based prevention of in-stent restenosis through genetic interventions by capitalizing on a wide variety of molecular targets. Therefore, the central consideration directs us toward finding an ideal stent therapeutic platform that would tackle all of the gaps in the existing technology. Copyright © 2017 John Wiley & Sons, Ltd.

  3. The theoretical cognitive process of visualization for science education.

    PubMed

    Mnguni, Lindelani E

    2014-01-01

    The use of visual models such as pictures, diagrams and animations in science education is increasing. This is because of the complex nature associated with the concepts in the field. Students, especially entrant students, often report misconceptions and learning difficulties associated with various concepts especially those that exist at a microscopic level, such as DNA, the gene and meiosis as well as those that exist in relatively large time scales such as evolution. However the role of visual literacy in the construction of knowledge in science education has not been investigated much. This article explores the theoretical process of visualization answering the question "how can visual literacy be understood based on the theoretical cognitive process of visualization in order to inform the understanding, teaching and studying of visual literacy in science education?" Based on various theories on cognitive processes during learning for science and general education the author argues that the theoretical process of visualization consists of three stages, namely, Internalization of Visual Models, Conceptualization of Visual Models and Externalization of Visual Models. The application of this theoretical cognitive process of visualization and the stages of visualization in science education are discussed.

  4. Genetically engineered cardiac pacemaker: Stem cells transfected with HCN2 gene and myocytes—A model

    NASA Astrophysics Data System (ADS)

    Kanani, S.; Pumir, A.; Krinsky, V.

    2008-01-01

    One of the successfully tested methods to design genetically engineered cardiac pacemaker cells consists in transfecting a human mesenchymal stem cell (hMSC) with a HCN2 gene and connecting it to a myocyte. We develop and study a mathematical model describing a myocyte connected to a hMSC transfected with a HCN2 gene. The cardiac action potential is described both with the simple Beeler-Reuter model and with the more elaborate dynamic Luo-Rudy model. The HCN2 channel is described by fitting electrophysiological records, in the spirit of Hodgkin-Huxley. The model shows that oscillations can occur in a myocyte-stem cell pair, which has not yet been observed in experiments. The model predicted that: (1) HCN pacemaker channels can induce oscillations only if the number of expressed I channels is low enough. At too high an expression level of I channels, oscillations cannot be induced, no matter how many pacemaker channels are expressed. (2) At low expression levels of I channels, a large domain of values in the parameter space (n, N) exists, where oscillations should be observed. We denote N the number of expressed pacemaker channels in the stem cell, and n the number of gap junction channels coupling the stem cell and the myocyte. (3) The expression levels of I channels observed in ventricular myocytes, in both the Beeler-Reuter and the dynamic Luo-Rudy models, are too high for oscillations to be observed. With expression levels below ~1/4 of the original value, oscillations can be observed. The main consequence of this work is that in order to obtain oscillations in an experiment with a myocyte-stem cell pair, increasing the values of n, N is unlikely to be helpful, unless the expression level of I has been reduced enough. The model also allows us to explore levels of gene expression not yet achieved in experiments, and could be useful to plan new experiments, aimed at improving the robustness of the oscillations.

  5. Genome-scale metabolic analysis of Clostridium thermocellum for bioethanol production

    PubMed Central

    2010-01-01

    Background: Microorganisms possess diverse metabolic capabilities that can potentially be leveraged for efficient production of biofuels. Clostridium thermocellum (ATCC 27405) is a thermophilic anaerobe that is both cellulolytic and ethanologenic, meaning that it can directly use the plant sugar, cellulose, and biochemically convert it to ethanol. A major challenge in using microorganisms for chemical production is the need to modify the organism to increase production efficiency. The process of properly engineering an organism is typically arduous. Results: Here we present a genome-scale model of C. thermocellum metabolism, iSR432, for the purpose of establishing a computational tool to study the metabolic network of C. thermocellum and facilitate efforts to engineer C. thermocellum for biofuel production. The model consists of 577 reactions involving 525 intracellular metabolites, 432 genes, and a proteomic-based representation of a cellulosome. The process of constructing this metabolic model led to suggested annotation refinements for 27 genes and identification of areas of metabolism requiring further study. The accuracy of the iSR432 model was tested using experimental growth and by-product secretion data for growth on cellobiose and fructose. Analysis using this model captures the relationship between the reduction-oxidation state of the cell and ethanol secretion and allowed for prediction of gene deletions and environmental conditions that would increase ethanol production. Conclusions: By incorporating genomic sequence data, network topology, and experimental measurements of enzyme activities and metabolite fluxes, we have generated a model that is reasonably accurate at predicting the cellular phenotype of C. thermocellum and establish a strong foundation for rational strain design. In addition, we are able to draw some important conclusions regarding the underlying metabolic mechanisms for observed behaviors of C. thermocellum and highlight remaining gaps in the existing genome annotations. PMID:20307315

  6. Structural and Functional Characterization of a Caenorhabditis elegans Genetic Interaction Network within Pathways

    PubMed Central

    Boucher, Benjamin; Lee, Anna Y.; Hallett, Michael; Jenna, Sarah

    2016-01-01

    A genetic interaction (GI) is defined when the mutation of one gene modifies the phenotypic expression associated with the mutation of a second gene. Genome-wide efforts to map GIs in yeast revealed structural and functional properties of a GI network. This provided insights into the mechanisms underlying the robustness of yeast to genetic and environmental insults, and also into the link existing between genotype and phenotype. While a significant conservation of GIs and GI network structure has been reported between distant yeast species, such a conservation is not clear between unicellular and multicellular organisms. Structural and functional characterization of a GI network in these latter organisms is consequently of high interest. In this study, we present an in-depth characterization of ~1.5K GIs in the nematode Caenorhabditis elegans. We identify and characterize six distinct classes of GIs by examining a wide-range of structural and functional properties of genes and network, including co-expression, phenotypical manifestations, relationship with protein-protein interaction dense subnetworks (PDS) and pathways, molecular and biological functions, gene essentiality and pleiotropy. Our study shows that GI classes link genes within pathways and display distinctive properties, specifically towards PDS. It suggests a model in which pathways are composed of PDS-centric and PDS-independent GIs coordinating molecular machines through two specific classes of GIs involving pleiotropic and non-pleiotropic connectors. Our study provides the first in-depth characterization of a GI network within pathways of a multicellular organism. It also suggests a model to understand better how GIs control system robustness and evolution. PMID:26871911

  7. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    PubMed Central

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background: Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results: A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms: Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable in a high-performance, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion: The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730
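
    The counts quoted in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative, and the only inputs are figures taken from the abstract above.

```python
# Consistency check of the EST counts quoted in the abstract.
# Every number below is copied from the abstract; the names are illustrative.

def accession_range_size(first: int, last: int) -> int:
    """Number of accessions in an inclusive numeric range."""
    return last - first + 1

good_ests = 3144          # good quality ESTs after cleaning
ests_in_contigs = 1361    # ESTs assembled into contigs
contigs = 448
singletons = 1783
unique_sequences = 2231
similar_to_known = 743    # unique sequences similar to GenBank nr entries

# GenBank dbEST accession ranges EH669363-EH672369 and EL515444-EL515580
accessions = (accession_range_size(669363, 672369)
              + accession_range_size(515444, 515580))

# Every good EST is either part of a contig or a singleton.
ests_accounted_for = ests_in_contigs + singletons

# Unique sequences are the contigs plus the singletons.
unique_from_clusters = contigs + singletons

# Share of unique sequences with high similarity to existing genes.
percent_similar = round(100 * similar_to_known / unique_sequences)

print(accessions, ests_accounted_for, unique_from_clusters, percent_similar)
```

    Under this reading, the accession ranges, the contig and singleton split, and the quoted percentage all agree with the totals given in the abstract.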

  8. Candidate genes, pathways and mechanisms for alcoholism: an expanded convergent functional genomics approach.

    PubMed

    Rodd, Z A; Bertsch, B A; Strother, W N; Le-Niculescu, H; Balaraman, Y; Hayden, E; Jerome, R E; Lumeng, L; Nurnberger, J I; Edenberg, H J; McBride, W J; Niculescu, A B

    2007-08-01

    We describe a comprehensive translational approach for identifying candidate genes for alcoholism. The approach relies on the cross-matching of animal model brain gene expression data with human genetic linkage data, as well as human tissue data and biological roles data, an approach termed convergent functional genomics. An analysis of three animal model paradigms, based on inbred alcohol-preferring (iP) and alcohol-non-preferring (iNP) rats, and their response to treatments with alcohol, was used. A comprehensive analysis of microarray gene expression data from five key brain regions (frontal cortex, amygdala, caudate-putamen, nucleus accumbens and hippocampus) was carried out. The Bayesian-like integration of multiple independent lines of evidence, each by itself lacking sufficient discriminatory power, led to the identification of high probability candidate genes, pathways and mechanisms for alcoholism. These data reveal that alcohol has pleiotropic effects on multiple systems, which may explain the diverse neuropsychiatric and medical pathology in alcoholism. Some of the pathways identified suggest avenues for pharmacotherapy of alcoholism with existing agents, such as angiotensin-converting enzyme (ACE) inhibitors. Experiments we carried out in alcohol-preferring rats with an ACE inhibitor show a marked modulation of alcohol intake. Other pathways are new potential targets for drug development. The emergent overall picture is that physical and physiological robustness may permit alcohol-preferring individuals to withstand the aversive effects of alcohol. In conjunction with a higher reactivity to its rewarding effects, they may be able to ingest enough of this nonspecific drug for a strong hedonic and addictive effect to occur.

  9. Accurate and sensitive quantification of protein-DNA binding affinity.

    PubMed

    Rastogi, Chaitanya; Rube, H Tomas; Kribelbauer, Judith F; Crocker, Justin; Loker, Ryan E; Martini, Gabriella D; Laptenko, Oleg; Freed-Pastor, William A; Prives, Carol; Stern, David L; Mann, Richard S; Bussemaker, Harmen J

    2018-04-17

    Transcription factors (TFs) control gene expression by binding to genomic DNA in a sequence-specific manner. Mutations in TF binding sites are increasingly found to be associated with human disease, yet we currently lack robust methods to predict these sites. Here, we developed a versatile maximum likelihood framework named No Read Left Behind (NRLB) that infers a biophysical model of protein-DNA recognition across the full affinity range from a library of in vitro selected DNA binding sites. NRLB predicts human Max homodimer binding in near-perfect agreement with existing low-throughput measurements. It can capture the specificity of the p53 tetramer and distinguish multiple binding modes within a single sample. Additionally, we confirm that newly identified low-affinity enhancer binding sites are functional in vivo, and that their contribution to gene expression matches their predicted affinity. Our results establish a powerful paradigm for identifying protein binding sites and interpreting gene regulatory sequences in eukaryotic genomes. Copyright © 2018 the Author(s). Published by PNAS.

  10. Accurate and sensitive quantification of protein-DNA binding affinity

    PubMed Central

    Rastogi, Chaitanya; Rube, H. Tomas; Kribelbauer, Judith F.; Crocker, Justin; Loker, Ryan E.; Martini, Gabriella D.; Laptenko, Oleg; Freed-Pastor, William A.; Prives, Carol; Stern, David L.; Mann, Richard S.; Bussemaker, Harmen J.

    2018-01-01

    Transcription factors (TFs) control gene expression by binding to genomic DNA in a sequence-specific manner. Mutations in TF binding sites are increasingly found to be associated with human disease, yet we currently lack robust methods to predict these sites. Here, we developed a versatile maximum likelihood framework named No Read Left Behind (NRLB) that infers a biophysical model of protein-DNA recognition across the full affinity range from a library of in vitro selected DNA binding sites. NRLB predicts human Max homodimer binding in near-perfect agreement with existing low-throughput measurements. It can capture the specificity of the p53 tetramer and distinguish multiple binding modes within a single sample. Additionally, we confirm that newly identified low-affinity enhancer binding sites are functional in vivo, and that their contribution to gene expression matches their predicted affinity. Our results establish a powerful paradigm for identifying protein binding sites and interpreting gene regulatory sequences in eukaryotic genomes. PMID:29610332

  11. Membrane-bound SIV envelope trimers are immunogenic in ferrets after intranasal vaccination with a replication-competent canine distemper virus vector.

    PubMed

    Zhang, Xinsheng; Wallace, Olivia; Wright, Kevin J; Backer, Martin; Coleman, John W; Koehnke, Rebecca; Frenk, Esther; Domi, Arban; Chiuchiolo, Maria J; DeStefano, Joanne; Narpala, Sandeep; Powell, Rebecca; Morrow, Gavin; Boggiano, Cesar; Zamb, Timothy J; Richter King, C; Parks, Christopher L

    2013-11-01

    We are investigating canine distemper virus (CDV) as a vaccine vector for the delivery of HIV envelope (Env) that closely resembles the native trimeric spike. We selected CDV because it will promote vaccine delivery to lymphoid tissues, and because human exposure is infrequent, reducing potential effects of pre-existing immunity. Using SIV Env as a model, we tested a number of vector and gene insert designs. Vectors containing a gene inserted between the CDV H and L genes, which encoded Env lacking most of its cytoplasmic tail, propagated efficiently in Vero cells, expressed the immunogen on the cell surface, and incorporated the SIV glycoprotein into progeny virus particles. When ferrets were vaccinated intranasally, there were no signs of distress, vector replication was observed in the gut-associated lymphoid tissues, and the animals produced anti-SIV Env antibodies. These data show that live CDV-SIV Env vectors can safely induce anti-Env immune responses following intranasal vaccination. © 2013 Elsevier Inc. All rights reserved.

  12. Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures.

    PubMed

    Wu, Cen; Jiang, Yu; Ren, Jie; Cui, Yuehua; Ma, Shuangge

    2018-02-10

    Identification of gene-environment (G × E) interactions associated with disease phenotypes has posed a great challenge in high-throughput cancer studies. The existing marginal identification methods have suffered from not being able to accommodate the joint effects of a large number of genetic variants, while some of the joint-effect methods have been limited by failing to respect the "main effects, interactions" hierarchy, by ignoring data contamination, and by using inefficient selection techniques under complex structural sparsity. In this article, we develop an effective penalization approach to identify important G × E interactions and main effects, which can account for the hierarchical structures of the 2 types of effects. Possible data contamination is accommodated by adopting the least absolute deviation loss function. The advantage of the proposed approach over the alternatives is convincingly demonstrated in both simulation and a case study on lung cancer prognosis with gene expression measurements and clinical covariates under the accelerated failure time model. Copyright © 2017 John Wiley & Sons, Ltd.
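
    The robustness that this abstract attributes to the least absolute deviation (LAD) loss can be illustrated in the simplest possible setting, estimating a single location parameter. The sketch below is not the authors' penalized method; the data values and function names are hypothetical, chosen only to show that one contaminated observation moves the squared-error minimizer (the mean) far more than the LAD minimizer (the median).

```python
from statistics import mean, median

def lad_loss(center, values):
    """Least absolute deviation loss: sum of absolute residuals."""
    return sum(abs(v - center) for v in values)

def squared_loss(center, values):
    """Least squares loss: sum of squared residuals."""
    return sum((v - center) ** 2 for v in values)

# Hypothetical sample in which the last measurement is contaminated.
contaminated = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate estimates on a grid from 0.0 to 100.0 in steps of 0.5.
candidates = [i / 2 for i in range(0, 201)]

# Pick the candidate that minimizes each loss.
lad_fit = min(candidates, key=lambda c: lad_loss(c, contaminated))
ls_fit = min(candidates, key=lambda c: squared_loss(c, contaminated))

print("LAD estimate:", lad_fit, "median:", median(contaminated))
print("least squares estimate:", ls_fit, "mean:", mean(contaminated))
```

    The LAD estimate stays at the bulk of the data, while the least squares estimate is pulled toward the outlier. This is the general property that motivates using LAD under possible data contamination.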

  13. Cross-talk between Msx/Dlx homeobox genes and vitamin D during tooth mineralization.

    PubMed

    Lézot, F; Descroix, V; Mesbah, M; Hotton, D; Blin, C; Papagerakis, P; Mauro, N; Kato, S; MacDougall, M; Sharpe, P; Berdal, A

    2002-01-01

    Rickets is associated with site-specific disorders of enamel and dentin formation, which may reflect the impact of vitamin D on a morphogenetic pathway. This study is devoted to potential cross-talk between vitamin D and Msx/Dlx transcription factors. We raised the question of a potential link between tooth defects seen in mice with rickets and Msx2 gene misexpression, using mutant mice lacking the nuclear vitamin D receptor as an animal model. Our data showed a modulation of Msx2 expression. In order to search for a functional impact of this Msx2 misexpression secondary to rickets, we focused our attention on osteocalcin as a target gene for both vitamin D and Msx2. Combining Msx2 overexpression and vitamin D addition in vitro, we showed an inhibitory effect on osteocalcin expression in immortalized MO6-G3 odontoblasts. Finally, in the same cells, such combinations appeared to modulate VDR expression, outlining the existence of complex cross-regulations between vitamin D and Msx/Dlx pathways.

  14. Gain-of-function mutagenesis approaches in rice for functional genomics and improvement of crop productivity.

    PubMed

    Moin, Mazahar; Bakshi, Achala; Saha, Anusree; Dutta, Mouboni; Kirti, P B

    2017-07-01

    The epitome of any genome research is to identify all the existing genes in a genome and investigate their roles. Various techniques have been applied to unveil the functions either by silencing or over-expressing the genes by targeted expression or random mutagenesis. Rice is the most appropriate model crop for generating a mutant resource for functional genomic studies because of the availability of high-quality genome sequence and relatively smaller genome size. Rice has syntenic relationships with members of other cereals. Hence, characterization of functionally unknown genes in rice will possibly provide key genetic insights and can lead to comparative genomics involving other cereals. The current review attempts to discuss the available gain-of-function mutagenesis techniques for functional genomics, emphasizing the contemporary approach, activation tagging and alterations to this method for the enhancement of yield and productivity of rice. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  15. Association of ACE gene A2350G and I/D polymorphisms with essential hypertension in the northernmost province of China.

    PubMed

    Sun, Feifei; He, Ning; Zhang, Keyong; Wu, Nan; Zhao, Jingbo; Qiu, Changchun

    2018-01-01

    The angiotensin-converting enzyme (ACE) gene, a strong candidate gene for essential hypertension (EH), has been extensively studied. In this study, we carried out a population-based case-control study to explore whether the ACE gene I/D and A2350G polymorphisms could be considered risk factors for EH. A total of 2040 subjects were recruited from the Chinese Han population, of whom 1010 were cases and 1030 were normotensive individuals. The ACE gene A2350G and I/D polymorphisms were amplified by polymerase chain reaction (PCR), and the A2350G polymorphism was detected after restriction enzyme digestion with BstUI. In addition, we randomly selected 10% of the samples for sequencing to verify the accuracy of the results. The genotype and allele frequency distributions of I/D and A2350G differed significantly between the EH and control groups. After grouping by sex or age, the differences for both polymorphisms remained statistically significant. For A2350G, we found significant differences between the two groups under both the dominant and the recessive model. For the ACE I/D polymorphism, we observed a marked difference between the two groups under the dominant model, whereas under the recessive model the difference was marginally significant. Among the four haplotypes composed of ACE gene A2350G and I/D, haplotype G-D differed significantly between the two groups and appeared to be a risk factor for the development of EH (P < 0.001; OR = 1.639, 95% CI = 1.435-1.872), while the other haplotypes were protective factors that decreased the susceptibility to EH (P < 0.05). The ACE gene A2350G and I/D polymorphisms were associated with an increased risk of EH in individuals from the northernmost province of China, and individuals carrying the D allele and the G allele had a higher risk of EH (OR = 1.443, 95% CI = 1.273-1.636 and OR = 1.481, 95% CI = 1.303-1.684).
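
    For a Wald-type confidence interval computed on the log-odds scale, the odds ratio is the geometric mean of the interval bounds. Assuming the intervals in this abstract were computed that way (the abstract does not say), the reported values can be checked for internal consistency. The helper names below are illustrative, and the standard error is only what the interval width would imply under that assumption.

```python
import math

def implied_odds_ratio(lower: float, upper: float) -> float:
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)

def implied_log_se(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a 95% Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# (reported OR, lower bound, upper bound), copied from the abstract.
reported = [
    (1.639, 1.435, 1.872),  # haplotype G-D
    (1.443, 1.273, 1.636),  # first allele-level estimate
    (1.481, 1.303, 1.684),  # second allele-level estimate
]

for odds_ratio, lower, upper in reported:
    implied = implied_odds_ratio(lower, upper)
    se = implied_log_se(lower, upper)
    print(f"reported OR {odds_ratio:.3f}, implied OR {implied:.3f}, "
          f"implied SE of log(OR) {se:.3f}")
```

    If an implied value differed from the reported odds ratio by more than rounding error, that would suggest either a transcription error or an interval built by a different method.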

  16. In Search of 'Birth Month Genes': Using Existing Data Repositories to Locate Genes Underlying Birth Month-Disease Relationships.

    PubMed

    Boland, Mary Regina; Tatonetti, Nicholas P

    2016-01-01

    Prenatal and perinatal exposures vary seasonally (e.g., sunlight, allergens) and many diseases are linked with variance in exposure. Epidemiologists often measure these changes using birth month as a proxy for seasonal variance. Likewise, Genome-Wide Association Studies have associated or implicated these same diseases with many genes. Both disparate data types (epidemiological and genetic) can provide key insights into the underlying disease biology. We developed an algorithm that links 1) epidemiological data from birth month studies with 2) genetic data from published gene-disease association studies. Our framework uses existing data repositories - PubMed, DisGeNET and Gene Ontology - to produce a bipartite network that connects enriched seasonally varying biofactors with birth month dependent diseases (BMDDs) through their overlapping developmental gene sets. As a proof-of-concept, we investigate 7 known BMDDs and highlight three important biological networks revealed by our algorithm and explore some interesting genetic mechanisms potentially responsible for the seasonal contribution to BMDDs.

  17. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.

    PubMed

    Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman

    2013-01-01

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively.

  18. Gene therapy for haemophilia: prospects and challenges to prevent or reverse inhibitor formation.

    PubMed

    Scott, David W; Lozier, Jay N

    2012-02-01

    Monogenic hereditary diseases, such as haemophilia A and B, are ideal targets for gene therapeutic approaches. While these diseases can be treated with protein therapeutics, such as factor VIII (FVIII) or IX (FIX), the notion that permanent transfer of the genes encoding these factors can cure haemophilia is very attractive. An underlying problem with a gene therapy approach, however, is the patient's immune response to the therapeutic protein (as well as to the transmission vector), leading to the formation of inhibitory antibodies. Even more daunting is reversing an existing immune response in patients with pre-existing inhibitors. In this review, we will describe the laboratory and clinical progress, and the challenges met thus far, in achieving the goal of gene therapy efficacy, with a focus on the goal of tolerance induction. Published 2011. This article is a US Government work and is in the public domain in the USA.

  19. A New Algorithm for Identifying Cis-Regulatory Modules Based on Hidden Markov Model

    PubMed Central

    2017-01-01

    The discovery of cis-regulatory modules (CRMs) is the key to understanding mechanisms of transcription regulation. Since CRMs have specific regulatory structures that are the basis for the regulation of gene expression, how to model the regulatory structure of CRMs has a considerable impact on the performance of CRM identification. The paper proposes a CRM discovery algorithm called ComSPS. ComSPS builds a regulatory structure model of CRMs based on HMM by exploring the rules of CRM transcriptional grammar that governs the internal motif site arrangement of CRMs. We test ComSPS on three benchmark datasets and compare it with five existing methods. Experimental results show that ComSPS performs better than them. PMID:28497059

  20. Personalized Cancer Medicine: An Organoid Approach.

    PubMed

    Aboulkheyr Es, Hamidreza; Montazeri, Leila; Aref, Amir Reza; Vosough, Massoud; Baharvand, Hossein

    2018-04-01

    Personalized cancer therapy applies specific treatments to each patient. Using personalized tumor models with similar characteristics to the original tumors may result in more accurate predictions of drug responses in patients. Tumor organoid models have several advantages over pre-existing models, including conserving the molecular and cellular composition of the original tumor. These advantages highlight the tremendous potential of tumor organoids in personalized cancer therapy, particularly preclinical drug screening and predicting patient responses to selected treatment regimens. Here, we highlight the advantages, challenges, and translational potential of tumor organoids in personalized cancer therapy and focus on gene-drug associations, drug response prediction, and treatment selection. Finally, we discuss how microfluidic technology can contribute to immunotherapy drug screening in tumor organoids. Copyright © 2017 Elsevier Ltd. All rights reserved.
