Sample records for gene functions based

  1. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach.

    PubMed

    Peng, Jiajie; Zhang, Xuanshuo; Hui, Weiwei; Lu, Junya; Li, Qianqian; Liu, Shuhui; Shang, Xuequn

    2018-03-19

    Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations. We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to decrease noise. Based on the EC number (Enzyme Commission)-based groups of yeast and Arabidopsis, evaluation tests show that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved after effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose gene annotations in GO are far from complete.
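
    The random walk with restart (RWR) idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the NETSIM2 implementation: the toy network, the gene labels, the restart probability of 0.3 and the function name are all assumptions chosen for illustration, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that
    jumps back to `seed` with probability `restart` at every step."""
    nodes = sorted(adjacency)
    # Normalise outgoing edge weights so each node's transitions sum to 1.
    transition = {}
    for u in nodes:
        total = sum(adjacency[u].values())
        transition[u] = {v: w / total for v, w in adjacency[u].items()}
    # Start with all probability mass on the seed gene.
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for u in nodes:
            for v, prob in transition[u].items():
                nxt[v] += (1.0 - restart) * p[u] * prob
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:  # stop once the distribution no longer changes
            break
    return p


# Hypothetical co-functional network (symmetric, unit edge weights).
toy_network = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0},
    "C": {"A": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(toy_network, seed="A")
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

    The resulting scores reflect global network proximity to the seed rather than direct neighbourhood only: gene D receives a non-zero score although it is not directly linked to A.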

  2. GO-based functional dissimilarity of gene sets.

    PubMed

    Díaz-Díaz, Norberto; Aguilar-Ruiz, Jesús S

    2011-09-01

    The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, an identification of the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes, and a quantification of functional coherence can help to clarify the role of a group of genes working together. To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Results show that GFD performs robustly when applied to gene sets of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD.

  3. Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.

    PubMed

    Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian

    2018-02-23

    Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. In these experiments, HPHash again significantly improves the prediction performance. The code of HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.

  4. Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network

    PubMed Central

    Hwang, Sohyun; Rhee, Seung Y; Marcotte, Edward M; Lee, Insuk

    2012-01-01

    AraNet is a functional gene network for the reference plant Arabidopsis and has been constructed in order to identify new genes associated with plant traits. It is highly predictive for diverse biological pathways and can be used to prioritize genes for functional screens. Moreover, AraNet provides a web-based tool with which plant biologists can efficiently discover novel functions of Arabidopsis genes (http://www.functionalnet.org/aranet/). This protocol explains how to conduct network-based prediction of gene functions using AraNet and how to interpret the prediction results. Functional discovery in plant biology is facilitated by combining candidate prioritization by AraNet with focused experimental tests. PMID:21886106

  5. Transposon based functional characterization of soybean genes

    USDA-ARS's Scientific Manuscript database

    Type II transposable elements that use a cut-and-paste mechanism for jumping from one genomic region to another are ideal for tagging and cloning genes. Precise excision from an insertion site in a mutant gene leads to regaining the wild-type function. Thus, function of a gene can be established based o...

  6. An improved method for functional similarity analysis of genes based on Gene Ontology.

    PubMed

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-12-23

    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate the gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure the gene functional similarity. First, WIS computes the IC of a term by employing its depth, the number of its ancestors, and the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS takes fully into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .
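
    To make the notion of an "IC overlap ratio of term sets" more concrete, the sketch below uses a deliberately simplified stand-in, not the WIS method itself. The information content (IC) of a term is estimated here from annotation frequency as -log(p), whereas WIS derives it from depth, ancestors and descendants in the GO graph. The term labels, counts and function names are hypothetical.

```python
from math import log


def information_content(term_counts, total):
    """Frequency-based IC: rarely annotated terms are more specific.
    (Assumption: WIS uses a topology-based IC instead.)"""
    return {term: -log(count / total) for term, count in term_counts.items()}


def ic_overlap_ratio(terms_a, terms_b, ic):
    """IC of the shared terms divided by IC of all terms of both genes."""
    shared = terms_a & terms_b
    union = terms_a | terms_b
    denominator = sum(ic[t] for t in union)
    if denominator == 0:
        return 0.0  # only uninformative (root-like) terms present
    return sum(ic[t] for t in shared) / denominator


# Hypothetical annotation counts out of 100 annotated genes.
counts = {"T_root": 100, "T_x": 10, "T_y": 10, "T_z": 1}
ic = information_content(counts, total=100)

gene1_terms = {"T_root", "T_x", "T_z"}
gene2_terms = {"T_root", "T_x", "T_y"}
print(round(ic_overlap_ratio(gene1_terms, gene2_terms, ic), 3))
```

    In this toy case the two genes share one informative term, and the ratio is lowered by the highly specific term that only the first gene carries.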

  7. Gene function prediction with gene interaction networks: a context graph kernel approach.

    PubMed

    Li, Xin; Chen, Hsinchun; Li, Jiexun; Zhang, Zhu

    2010-01-01

    Predicting gene functions is a challenge for biologists in the postgenomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels.

  8. Exploring the role of peptides in polymer-based gene delivery.

    PubMed

    Sun, Yanping; Yang, Zhen; Wang, Chunxi; Yang, Tianzhi; Cai, Cuifang; Zhao, Xiaoyun; Yang, Li; Ding, Pingtian

    2017-09-15

    Polymers are widely studied as non-viral gene vectors because of their strong DNA binding ability, capacity to carry large payload, flexibility of chemical modifications, low immunogenicity, and facile processes for manufacturing. However, high cytotoxicity and low transfection efficiency substantially restrict their application in clinical trials. Incorporating functional peptides is a promising approach to address these issues. Peptides demonstrate various functions in polymer-based gene delivery systems, such as targeting to specific cells, breaching membrane barriers, facilitating DNA condensation and release, and lowering cytotoxicity. In this review, we systematically summarize the role of peptides in polymer-based gene delivery, and elaborate how to rationally design polymer-peptide based gene delivery vectors. Polymers are widely studied as non-viral gene vectors, but suffer from high cytotoxicity and low transfection efficiency. Incorporating short, bioactive peptides into polymer-based gene delivery systems can address this issue. Peptides demonstrate various functions in polymer-based gene delivery systems, such as targeting to specific cells, breaching membrane barriers, facilitating DNA condensation and release, and lowering cytotoxicity. In this review, we highlight the peptides' roles in polymer-based gene delivery, and elaborate how to utilize various functional peptides to enhance the transfection efficiency of polymers. The optimized peptide-polymer vectors should be able to alter their structures and functions according to biological microenvironments and utilize inherent intracellular pathways of cells, and consequently overcome the barriers during gene delivery to enhance transfection efficiency. Copyright © 2017 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.

  9. InteGO2: A web tool for measuring and visualizing gene semantic similarities using Gene Ontology

    DOE PAGES

    Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; ...

    2016-08-31

    Here, the Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, it is difficult for result interpretation. As a result, we present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. In conclusion, InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface.

  10. InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology.

    PubMed

    Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; Juan, Liran; Jiang, Qinghua; Wang, Yadong; Chen, Jin

    2016-08-31

    The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, it is difficult for result interpretation. We present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/ .

  11. InteGO2: A web tool for measuring and visualizing gene semantic similarities using Gene Ontology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang

    Here, the Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, it is difficult for result interpretation. As a result, we present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. In conclusion, InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Jing; Ma, Zihao; Carr, Steven A.

    Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies.
    Molecular & Cellular Proteomics 16: 10.1074/mcp.M116.060301, 121–134, 2017.
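
    As a rough illustration of how a coexpression network of the kind described in this record can be built, the sketch below links two genes when the Pearson correlation of their expression profiles exceeds a cutoff. The profiles, gene labels, the 0.8 threshold and the function names are assumptions for illustration; the study's actual network construction is not specified in the abstract, and profiles are assumed to have non-zero variance.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.8):
    """Return {(gene1, gene2): r} for gene pairs with r >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges[(g1, g2)] = r
    return edges


# Hypothetical expression values across four samples.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(coexpression_edges(toy_profiles))
```

    Running the same construction on mRNA and on protein abundance profiles for matched samples would give two networks whose edges can then be compared.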

  13. MorphDB: Prioritizing Genes for Specialized Metabolism Pathways and Gene Ontology Categories in Plants.

    PubMed

    Zwaenepoel, Arthur; Diels, Tim; Amar, David; Van Parys, Thomas; Shamir, Ron; Van de Peer, Yves; Tzfadia, Oren

    2018-01-01

    Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.

  14. Systems biology definition of the core proteome of metabolism and expression is consistent with high-throughput data.

    PubMed

    Yang, Laurence; Tan, Justin; O'Brien, Edward J; Monk, Jonathan M; Kim, Donghyuk; Li, Howard J; Charusanti, Pep; Ebrahim, Ali; Lloyd, Colton J; Yurkovich, James T; Du, Bin; Dräger, Andreas; Thomas, Alex; Sun, Yuekai; Saunders, Michael A; Palsson, Bernhard O

    2015-08-25

    Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5'UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.

  15. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
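
    The baseline that LEGO is compared against, over-representation analysis with a one-sided Fisher's exact test, can be sketched as a hypergeometric tail probability. This sketch treats every gene equally and does not include LEGO's network-based gene weights; the gene labels and the function name are hypothetical.

```python
from math import comb


def ora_p_value(study_genes, gene_set, background):
    """One-sided Fisher's exact (hypergeometric) p-value for seeing at
    least the observed overlap between a study list and a gene set."""
    background = set(background)
    study = set(study_genes) & background
    members = set(gene_set) & background
    N = len(background)       # genes in the background
    K = len(members)          # background genes in the gene set
    n = len(study)            # genes in the study list
    k = len(study & members)  # observed overlap
    # Sum the hypergeometric probabilities of overlaps k, k+1, ...
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical example: 10 background genes, a 4-gene pathway,
# and a 3-gene study list that falls entirely inside the pathway.
background = [f"g{i}" for i in range(10)]
pathway = ["g0", "g1", "g2", "g3"]
hits = ["g0", "g1", "g2"]
print(ora_p_value(hits, pathway, background))
```

    Because the test only counts overlapping genes, two pathways with the same overlap size receive the same p-value regardless of how the genes interact, which is the limitation the abstract points to.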

  16. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  17. Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function

    PubMed Central

    Tian, Weidong; Zhang, Lan V; Taşan, Murat; Gibbons, Francis D; King, Oliver D; Park, Julie; Wunderlich, Zeba; Cherry, J Michael; Roth, Frederick P

    2008-01-01

    Background: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships. Results: We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships. Conclusion: Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions. PMID:18613951
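
    The "guilt-by-association" idea described here, transferring function between linked genes, can be illustrated with a simple weighted neighbour vote. This is a generic sketch and not the Funckenstein method; the network, edge weights, gene labels and function name are hypothetical.

```python
def neighbor_vote(network, annotated, gene):
    """Score `gene` for a function as the fraction of its total edge
    weight that connects to neighbours already annotated with it."""
    neighbours = network.get(gene, {})
    total_weight = sum(neighbours.values())
    if total_weight == 0:
        return 0.0  # isolated gene: no evidence either way
    supporting = sum(w for n, w in neighbours.items() if n in annotated)
    return supporting / total_weight


# Hypothetical functional linkage graph around an uncharacterised gene.
linkage = {
    "query": {"geneA": 1.0, "geneB": 1.0, "geneC": 2.0},
}
has_function = {"geneA", "geneC"}  # genes known to carry the function
print(neighbor_vote(linkage, has_function, "query"))
```

    A guilt-by-profiling classifier would instead score the gene from its own characteristics; the abstract reports that combining the two kinds of evidence improved on either alone.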

  18. Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

    PubMed Central

    2012-01-01

    Background The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261

  19. NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology.

    PubMed

    Wei, Qing; Khan, Ishita K; Ding, Ziyun; Yerneni, Satwica; Kihara, Daisuke

    2017-03-20

    The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With the well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo .

  20. Functional annotation of the vlinc class of non-coding RNAs using systems biology approach

    PubMed Central

    Laurent, Georges St.; Vyatkin, Yuri; Antonets, Denis; Ri, Maxim; Qi, Yao; Saik, Olga; Shtokalo, Dmitry; de Hoon, Michiel J.L.; Kawaji, Hideya; Itoh, Masayoshi; Lassmann, Timo; Arner, Erik; Forrest, Alistair R.R.; Nicolas, Estelle; McCaffrey, Timothy A.; Carninci, Piero; Hayashizaki, Yoshihide; Wahlestedt, Claes; Kapranov, Philipp

    2016-01-01

    Determining the functionality of the non-coding transcripts encoded by the human genome is a coveted goal of modern genomics research. While the field has commonly relied on the classical methods of forward genetics, integration of different genomics datasets in a global Systems Biology fashion presents a more productive avenue for achieving this very complex aim. Here we report the application of a Systems Biology-based approach to dissect the functionality of a newly identified vast class of very long intergenic non-coding (vlinc) RNAs. Using the highly quantitative FANTOM5 CAGE dataset, we show that these RNAs could be grouped into 1542 novel human genes based on analysis of insulators that we show here indeed function as genomic barrier elements. We show that vlincRNA genes likely function in cis to activate nearby genes. This effect, while most pronounced in closely spaced vlincRNA–gene pairs, can be detected over relatively large genomic distances. Furthermore, we identified 101 vlincRNA genes likely involved in early embryogenesis based on patterns of their expression and regulation. We also found another 109 such genes potentially involved in cellular functions that also occur at early stages of development, such as proliferation, migration and apoptosis. Overall, we show that Systems Biology-based methods have great promise for functional annotation of non-coding RNAs. PMID:27001520

  1. Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

    PubMed Central

    Aubry, Marc; Monnier, Annabelle; Chicault, Celine; de Tayrac, Marie; Galibert, Marie-Dominique; Burgun, Anita; Mosser, Jean

    2006-01-01

    Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO terms networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions. PMID:16674810

  2. Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling–A Feasibility Study

    PubMed Central

    Weißenborn, Sandra; Walther, Dirk

    2017-01-01

    Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. 
This feasibility study highlights the potential and challenges associated with phylogenetic profiling methods for the detection of functional relationships between genes as well as the need to enlarge the set of plant genes with proven secondary metabolism involvement as well as the limitations of distinct pathways as abstractions of relationships between genes. PMID:29163570
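
    The profile comparison described above can be illustrated with a small sketch. It assumes binary presence/absence profiles and uses Jaccard similarity as the comparison measure; the abstract does not state which similarity measure was used, so the measure, the function names, and the toy gene families below are all hypothetical.

```python
def jaccard_profile_similarity(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sequences of 0/1).

    Assumption: both profiles list the same genomes in the same order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes in the same order")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two families absent from every genome carry no signal; score them 0.
    return both / either if either else 0.0


def rank_partners(query, profiles):
    """Rank all other gene families by profile similarity to `query`."""
    scores = [
        (other, jaccard_profile_similarity(profiles[query], profile))
        for other, profile in profiles.items()
        if other != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: keys are gene families, columns are genomes.
toy_profiles = {
    "famA": [1, 1, 0, 1, 0],
    "famB": [1, 1, 0, 1, 0],
    "famC": [0, 1, 1, 0, 1],
}
print(rank_partners("famA", toy_profiles))
```

    Families that co-occur across the same genomes rank first, which is the kind of signal the study evaluates as evidence of shared pathway membership.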

  3. Gene context analysis in the Integrated Microbial Genomes (IMG) data management system.

    PubMed

    Mavromatis, Konstantinos; Chu, Ken; Ivanova, Natalia; Hooper, Sean D; Markowitz, Victor M; Kyrpides, Nikos C

    2009-11-24

    Computational methods for determining the function of genes in newly sequenced genomes have been traditionally based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often have similar gene context and relies on the identification of such events across a phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov.

  4. Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences

    PubMed Central

    Huynen, Martijn; Snel, Berend; Lathe, Warren; Bork, Peer

    2000-01-01

    Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes. PMID:10958638

  5. A transversal approach to predict gene product networks from ontology-based similarity

    PubMed Central

    Chabalier, Julie; Mosser, Jean; Burgun, Anita

    2007-01-01

    Background Interpretation of transcriptomic data is usually made through a "standard" approach which consists in clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to highlight functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and expression data. Results The transversal approach presented in this paper consists in computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks that proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity. Conclusion Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity and expression data. PMID:17605807
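
    A minimal sketch of comparing annotation vectors in a vector space model is given below. The abstract does not specify the weighting scheme, so an inverse-document-frequency weight is assumed here purely for illustration; the function names and the toy term labels are hypothetical.

```python
import math
from collections import Counter


def idf_weights(annotations):
    """Weight each term by log(N / number of gene products annotated with it).

    Assumption: IDF stands in for the paper's unspecified weighting scheme.
    `annotations` maps a gene product to the set of terms that annotate it.
    """
    n = len(annotations)
    doc_freq = Counter(t for terms in annotations.values() for t in terms)
    return {term: math.log(n / count) for term, count in doc_freq.items()}


def cosine_similarity(gene_a, gene_b, annotations, weights):
    """Cosine of the angle between two weighted annotation vectors."""
    terms_a, terms_b = annotations[gene_a], annotations[gene_b]
    dot = sum(weights[t] ** 2 for t in terms_a & terms_b)
    norm_a = math.sqrt(sum(weights[t] ** 2 for t in terms_a))
    norm_b = math.sqrt(sum(weights[t] ** 2 for t in terms_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a vector with only zero-weight terms has no direction
    return dot / (norm_a * norm_b)


# Hypothetical toy annotations (term labels are placeholders, not real GO IDs).
toy_annotations = {
    "gp1": {"term_a", "term_b"},
    "gp2": {"term_a", "term_b"},
    "gp3": {"term_c"},
}
toy_weights = idf_weights(toy_annotations)
print(cosine_similarity("gp1", "gp2", toy_annotations, toy_weights))
print(cosine_similarity("gp1", "gp3", toy_annotations, toy_weights))
```

    Computing this score for every pair of gene products yields the similarity matrix that the abstract describes as the basis for the functional networks.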

  6. Partitioning of functional gene expression data using principal points.

    PubMed

    Kim, Jaehee; Kim, Haseong

    2017-10-12

    DNA microarrays offer motivation and hope for the simultaneous study of variations in multiple genes. Gene expression is a temporal process that allows variations in expression levels with a characterized gene function over a period of time. Temporal gene expression curves can be treated as functional data since they are considered as independent realizations of a stochastic process. This process requires appropriate models to identify patterns of gene functions. The partitioning of the functional data can find homogeneous subgroups of entities for the massive genes within the inherent biological networks. Therefore, it can be a useful technique for the analysis of time-course gene expression data. We propose a new self-consistent partitioning method of functional coefficients for individual expression profiles based on the orthonormal basis system. A principal points based functional partitioning method is proposed for time-course gene expression data. The method explores the relationship between genes using Legendre coefficients as principal points to extract the features of gene functions. Our proposed method provides high connectivity after clustering for simulated data and finds significant subsets of genes with increased connectivity. Our approach has the comparative advantages that fewer coefficients are used from the functional data and that the principal points used for partitioning are self-consistent. As real data applications, we are able to find partitioned genes through the gene expressions found in budding yeast data and Escherichia coli data. The proposed method benefits from the use of principal points, dimension reduction, and the choice of an orthogonal basis system, and provides appropriately connected genes in the resulting subsets. We illustrate our method by applying it to sets of cell-cycle-regulated time-course yeast genes and E. coli genes.
The proposed method is able to identify highly connected genes and to explore the complex dynamics of biological systems in functional genomics.

  7. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

    PubMed Central

    Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

    2013-01-01

    The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284

  8. Functional annotation of the vlinc class of non-coding RNAs using systems biology approach.

    PubMed

    St Laurent, Georges; Vyatkin, Yuri; Antonets, Denis; Ri, Maxim; Qi, Yao; Saik, Olga; Shtokalo, Dmitry; de Hoon, Michiel J L; Kawaji, Hideya; Itoh, Masayoshi; Lassmann, Timo; Arner, Erik; Forrest, Alistair R R; Nicolas, Estelle; McCaffrey, Timothy A; Carninci, Piero; Hayashizaki, Yoshihide; Wahlestedt, Claes; Kapranov, Philipp

    2016-04-20

    Determining the functionality of the non-coding transcripts encoded by the human genome is a coveted goal of modern genomics research. Although this work has commonly relied on the classical methods of forward genetics, integration of different genomics datasets in a global Systems Biology fashion presents a more productive avenue of achieving this very complex aim. Here we report application of a Systems Biology-based approach to dissect functionality of a newly identified vast class of very long intergenic non-coding (vlinc) RNAs. Using the highly quantitative FANTOM5 CAGE dataset, we show that these RNAs could be grouped into 1542 novel human genes based on analysis of insulators that we show here indeed function as genomic barrier elements. We show that vlinc RNA genes likely function in cis to activate nearby genes. This effect, while most pronounced in closely spaced vlinc RNA-gene pairs, can be detected over relatively large genomic distances. Furthermore, we identified 101 vlinc RNA genes likely involved in early embryogenesis based on patterns of their expression and regulation. We also found another 109 such genes potentially involved in cellular functions also happening at early stages of development such as proliferation, migration and apoptosis. Overall, we show that Systems Biology-based methods have great promise for functional annotation of non-coding RNAs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Statistical indicators of collective behavior and functional clusters in gene networks of yeast

    NASA Astrophysics Data System (ADS)

    Živković, J.; Tadić, B.; Wick, N.; Thurner, S.

    2006-03-01

    We analyze gene expression time-series data of yeast (S. cerevisiae) measured along two full cell-cycles. We quantify these data by using q-exponentials, gene expression ranking and a temporal mean-variance analysis. We construct gene interaction networks based on correlation coefficients and study the formation of the corresponding giant components and minimum spanning trees. By coloring genes according to their cell function we find functional clusters in the correlation networks and functional branches in the associated trees. Our results suggest that a percolation point of functional clusters can be identified on these gene expression correlation networks.
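
    The network construction described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it links two genes when the absolute Pearson correlation of their expression series reaches a fixed threshold (an assumed rule) and then extracts the largest connected component. All names and the toy expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def correlation_network(profiles, threshold):
    """Adjacency sets linking genes whose |correlation| >= threshold."""
    adjacency = {gene: set() for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            adjacency[g1].add(g2)
            adjacency[g2].add(g1)
    return adjacency


def largest_component(adjacency):
    """Largest connected component, found by depth-first search."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy time series for four genes.
toy_series = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "d": [1, -1, 1, -1],
}
print(sorted(largest_component(correlation_network(toy_series, 0.9))))
```

    Repeating the last step while lowering the threshold shows how the largest component grows, which is one way to look for the percolation-like transition mentioned in the abstract.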

  10. Functional clustering of time series gene expression data by Granger causality

    PubMed Central

    2012-01-01

    Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425

  11. Prior knowledge based mining functional modules from Yeast PPI networks with gene ontology

    PubMed Central

    2010-01-01

    Background In the literature, there are many fruitful algorithmic approaches for identifying functional modules in protein-protein interaction (PPI) networks. Because of the accumulation of large-scale interaction data on multiple organisms and the interaction data not yet recorded in existing PPI databases, there is still an urgent need to design novel computational techniques that can analyze interaction data sets correctly and scalably. Indeed there are a number of large scale biological data sets providing indirect evidence for protein-protein interaction relationships. Results The main aim of this paper is to present a prior knowledge based mining strategy to identify functional modules from PPI networks with the aid of Gene Ontology. A higher similarity value in Gene Ontology means that two gene products are more functionally related to each other, so it is better to group such gene products into one functional module. We study (i) encoding the functional pairs into the existing PPI networks; and (ii) using these functional pairs as pairwise constraints to supervise the existing functional module identification algorithms. A topology-based modularity metric and complex annotation in MIPS are used to evaluate the functional modules identified by these two approaches. Conclusions The experimental results on Yeast PPI networks and GO have shown that the prior knowledge based learning methods perform better than the existing algorithms. PMID:21172053

  12. The Orphan Disease Networks

    PubMed Central

    Zhang, Minlu; Zhu, Cheng; Jacomy, Alexis; Lu, Long J.; Jegga, Anil G.

    2011-01-01

    The low prevalence rate of orphan diseases (OD) requires special combined efforts to improve diagnosis, prevention, and discovery of novel therapeutic strategies. To identify and investigate relationships based on shared genes or shared functional features, we have conducted a bioinformatic-based global analysis of all orphan diseases with known disease-causing mutant genes. Starting with a bipartite network of known OD and OD-causing mutant genes and using the human protein interactome, we first construct and topologically analyze three networks: the orphan disease network, the orphan disease-causing mutant gene network, and the orphan disease-causing mutant gene interactome. Our results demonstrate that in contrast to the common disease-causing mutant genes that are predominantly nonessential, a majority of orphan disease-causing mutant genes are essential. In confirmation of this finding, we found that OD-causing mutant genes are topologically important in the protein interactome and are ubiquitously expressed. Additionally, functional enrichment analysis of those genes in which mutations cause ODs shows that a majority result in premature death or are lethal in the orthologous mouse gene knockout models. To address the limitations of traditional gene-based disease networks, we also construct and analyze OD networks on the basis of shared enriched features (biological processes, cellular components, pathways, phenotypes, and literature citations). Analyzing these functionally-linked OD networks, we identified several additional OD-OD relations that are both phenotypically similar and phenotypically diverse. Surprisingly, we observed that the wiring of the gene-based and other feature-based OD networks are largely different; this suggests that the relationship between ODs cannot be fully captured by the gene-based network alone. PMID:21664998

  13. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    PubMed

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects of acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism.
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
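
    The idea of scanning neighboring genes for co-expression can be sketched as below. This is a simplification under stated assumptions: it uses a fixed correlation cutoff in place of the FDR-based statistical test described in the abstract, and it only compares each gene with its immediate predecessor along the chromosome. The function names, the cutoff, and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def neighbor_clusters(ordered_genes, profiles, threshold=0.9, min_size=3):
    """Group runs of chromosomally adjacent genes whose expression correlates.

    Assumption: a fixed `threshold` replaces the FDR-controlled comparison
    against randomly selected gene pairs used in the original study.
    """
    clusters, current = [], []
    for gene in ordered_genes:
        if current and pearson(profiles[current[-1]], profiles[gene]) >= threshold:
            current.append(gene)  # extend the current run of co-expressed neighbors
        else:
            if len(current) >= min_size:
                clusters.append(current)
            current = [gene]  # start a new run at this gene
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical toy data: genes listed in chromosomal order.
toy_order = ["g1", "g2", "g3", "g4", "g5"]
toy_expression = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 3, 4, 5],
    "g3": [1, 2, 3, 5],
    "g4": [4, 3, 2, 1],
    "g5": [5, 4, 3, 1],
}
print(neighbor_clusters(toy_order, toy_expression))
```

    The minimum size of three mirrors the smallest cluster size reported in the abstract; a faithful reproduction would replace the cutoff with a significance test against random gene pairs.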

  14. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks

    DOE PAGES

    Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; ...

    2015-02-14

    Background: Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. Results: We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Conclusions: Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited.

  15. SGFSC: speeding the gene functional similarity calculation based on hash tables.

    PubMed

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-11-04

    In recent years, many measures of gene functional similarity have been proposed and widely used in all kinds of essential research. These methods are mainly divided into two categories: pairwise approaches and group-wise approaches. However, a common problem with these methods is their time consumption, especially when measuring the gene functional similarities of a large number of gene pairs. The problem of computational efficiency for pairwise approaches is even more prominent because they are dependent on the combination of semantic similarity. Therefore, the efficient measurement of gene functional similarity remains a challenging problem. To speed current gene functional similarity calculation methods, a novel two-step computing strategy is proposed: (1) establish a hash table for each method to store essential information obtained from the Gene Ontology (GO) graph and (2) measure gene functional similarity based on the corresponding hash table. There is no need to traverse the GO graph repeatedly for each method with the help of the hash table. The analysis of time complexity shows that the computational efficiency of these methods is significantly improved. We also implement a novel Speeding Gene Functional Similarity Calculation tool, namely SGFSC, which is bundled with seven typical measures using our proposed strategy. Further experiments show the great advantage of SGFSC in measuring gene functional similarity on the whole genomic scale. The proposed strategy is successful in speeding current gene functional similarity calculation methods. SGFSC is an efficient tool that is freely available at http://nclab.hit.edu.cn/SGFSC . The source code of SGFSC can be downloaded from http://pan.baidu.com/s/1dFFmvpZ .
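
    The two-step strategy described above (precompute a lookup table from the GO graph, then answer similarity queries from the table) can be sketched as follows. This is not SGFSC's code: the table stores ancestor sets, and the similarity is a simple Jaccard overlap of ancestors chosen only for illustration, not one of the seven measures bundled with the tool. The toy graph and all names are hypothetical.

```python
def build_ancestor_table(parents):
    """Step 1: traverse the term graph once and cache each term's ancestors.

    `parents` maps a term to the list of its direct parents in a DAG.
    The returned dict (a hash table) maps each term to a frozenset that
    contains the term itself and all of its ancestors.
    """
    table = {}

    def ancestors(term):
        if term not in table:
            result = {term}
            for parent in parents.get(term, ()):
                result |= ancestors(parent)
            table[term] = frozenset(result)
        return table[term]

    for term in parents:
        ancestors(term)
    return table


def term_similarity(term_a, term_b, table):
    """Step 2: answer a query from the table without touching the graph.

    Assumption: Jaccard overlap of ancestor sets is an illustrative stand-in.
    """
    set_a, set_b = table[term_a], table[term_b]
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical toy DAG: "c" has two parents, "a" and "b".
toy_parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
toy_table = build_ancestor_table(toy_parents)
print(term_similarity("a", "b", toy_table))
print(term_similarity("c", "a", toy_table))
```

    Because each ancestor set is computed once and then reused, repeated queries over many gene pairs avoid the repeated graph traversal that the abstract identifies as the bottleneck.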

  16. Identifying metabolic enzymes with multiple types of association evidence

    PubMed Central

    Kharchenko, Peter; Chen, Lifeng; Freund, Yoav; Vitkup, Dennis; Church, George M

    2006-01-01

    Background Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which can not be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. Results We present a novel method for identifying genes encoding for a specific metabolic function based on a local structure of metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as a top candidate within 43% of the cases. Conclusion We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities. PMID:16571130

  17. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    PubMed Central

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is a key to understand many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach in large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729

  18. Evidence-Based Annotation of Gene Function in Shewanella oneidensis MR-1 Using Genome-Wide Fitness Profiling across 121 Conditions

    PubMed Central

    Deutschbauer, Adam; Price, Morgan N.; Wetmore, Kelly M.; Shao, Wenjun; Baumohl, Jason K.; Xu, Zhuchen; Nguyen, Michelle; Tamse, Raquel; Davis, Ronald W.; Arkin, Adam P.

    2011-01-01

    Most genes in bacteria are experimentally uncharacterized and cannot be annotated with a specific function. Given the great diversity of bacteria and the ease of genome sequencing, high-throughput approaches to identify gene function experimentally are needed. Here, we use pools of tagged transposon mutants in the metal-reducing bacterium Shewanella oneidensis MR-1 to probe the mutant fitness of 3,355 genes in 121 diverse conditions including different growth substrates, alternative electron acceptors, stresses, and motility. We find that 2,350 genes have a pattern of fitness that is significantly different from random and 1,230 of these genes (37% of our total assayed genes) have enough signal to show strong biological correlations. We find that genes in all functional categories have phenotypes, including hundreds of hypotheticals, and that potentially redundant genes (over 50% amino acid identity to another gene in the genome) are also likely to have distinct phenotypes. Using fitness patterns, we were able to propose specific molecular functions for 40 genes or operons that lacked specific annotations or had incomplete annotations. In one example, we demonstrate that the previously hypothetical gene SO_3749 encodes a functional acetylornithine deacetylase, thus filling a missing step in S. oneidensis metabolism. Additionally, we demonstrate that the orphan histidine kinase SO_2742 and orphan response regulator SO_2648 form a signal transduction pathway that activates expression of acetyl-CoA synthase and is required for S. oneidensis to grow on acetate as a carbon source. Lastly, we demonstrate that gene expression and mutant fitness are poorly correlated and that mutant fitness generates more confident predictions of gene function than does gene expression. The approach described here can be applied generally to create large-scale gene-phenotype maps for evidence-based annotation of gene function in prokaryotes. PMID:22125499

  19. Framework for reanalysis of publicly available Affymetrix® GeneChip® data sets based on functional regions of interest.

    PubMed

    Saka, Ernur; Harrison, Benjamin J; West, Kirk; Petruska, Jeffrey C; Rouchka, Eric C

    2017-12-06

    Since the introduction of microarrays in 1995, researchers world-wide have used both commercial and custom-designed microarrays for understanding differential expression of transcribed genes. Public databases such as ArrayExpress and the Gene Expression Omnibus (GEO) have made millions of samples readily available. One main drawback to microarray data analysis involves the selection of probes to represent a specific transcript of interest, particularly in light of the fact that transcript-specific knowledge (notably alternative splicing) is dynamic in nature. We therefore developed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on functional regions of interest. This framework addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome knowledge and grouping probes into gene, transcript and region-based (UTR, individual exon, CDS) probe sets. Updated gene and transcript probe sets provide more specific analysis results based on current genomic and transcriptomic knowledge. The framework selects unique probes, aligns them to gene annotations and generates a custom Chip Description File (CDF). The analysis reveals only 87% of the Affymetrix® GeneChip® HG-U133 Plus 2 probes uniquely align to the current hg38 human assembly without mismatches. We also tested new mappings on the publicly available data series using rat and human data from GSE48611 and GSE72551 obtained from GEO, and illustrate that functional grouping allows for the subtle detection of regions of interest likely to have phenotypical consequences. Through reanalysis of the publicly available data series GSE48611 and GSE72551, we profiled the contribution of UTR and CDS regions to the gene expression levels globally. 
    The comparison between region-based and gene-based results indicated that the expressed genes detected by the gene-based and region-based CDFs are highly consistent, and that region-based results allow the detection of changes in transcript formation.

  20. In search of functional association from time-series microarray data based on the change trend and level of gene expression

    PubMed Central

    He, Feng; Zeng, An-Ping

    2006-01-01

    Background The increasing availability of time-series expression data opens up new possibilities to study functional linkages of genes. Present methods used to infer functional linkages between genes from expression data are mainly based on a point-to-point comparison. Change trends between consecutive time points in time-series data have been so far not well explored. Results In this work we present a new method based on extracting main features of the change trend and level of gene expression between consecutive time points. The method, termed as trend correlation (TC), includes two major steps: 1, calculating a maximal local alignment of change trend score by dynamic programming and a change trend correlation coefficient between the maximal matched change levels of each gene pair; 2, inferring relationships of gene pairs based on two statistical extraction procedures. The new method considers time shifts and inverted relationships in a similar way as the local clustering (LC) method but the latter is merely based on a point-to-point comparison. The TC method is demonstrated with data from yeast cell cycle and compared with the LC method and the widely used Pearson correlation coefficient (PCC) based clustering method. The biological significance of the gene pairs is examined with several large-scale yeast databases. Although the TC method predicts an overall lower number of gene pairs than the other two methods at a same p-value threshold, the additional number of gene pairs inferred by the TC method is considerable: e.g. 20.5% compared with the LC method and 49.6% with the PCC method for a p-value threshold of 2.7E-3. Moreover, the percentage of the inferred gene pairs consistent with databases by our method is generally higher than the LC method and similar to the PCC method. 
    A significant number of the gene pairs inferred only by the TC method are process-identity or function-similarity pairs or have well-documented biological interactions, including 443 known protein interactions and some known cell cycle related regulatory interactions. It should be emphasized that the overlap of gene pairs detected by the three methods is normally not very high, indicating a necessity of combining the different methods in search of functional association of genes from time-series data. For a p-value threshold of 1E-5 the percentage of process-identity and function-similarity gene pairs among the shared part of the three methods reaches 60.2% and 55.6% respectively, building a good basis for further experimental and functional study. Furthermore, the combined use of methods is important to infer more complete regulatory circuits and networks as exemplified in this study. Conclusion The TC method can significantly augment the current major methods to infer functional linkages and biological networks and is well suited to exploring temporal relationships of gene expression in time-series data. PMID:16478547
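
    The change-trend idea in the abstract above can be illustrated with a deliberately simplified sketch. It assumes trends are reduced to up/down/flat symbols and that the "local alignment" is gap-free (the longest shared run of identical trends, which tolerates a time shift). The function names, the tolerance parameter and the toy series are hypothetical, and the published TC scoring scheme and its statistical extraction steps are not reproduced here.

```python
def change_trends(series, tol=0.0):
    """Map consecutive differences to +1 (up), -1 (down) or 0 (flat)."""
    trends = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        trends.append(0 if abs(delta) <= tol else (1 if delta > 0 else -1))
    return trends


def best_local_match(a, b):
    """Longest run of identical trends, found by dynamic programming.

    Returns (length, start index in a, start index in b).
    """
    best = (0, 0, 0)
    prev_row = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                row[j] = prev_row[j - 1] + 1
                if row[j] > best[0]:
                    best = (row[j], i - row[j], j - row[j])
        prev_row = row
    return best


# Toy data: gene_y follows the same shape as gene_x, one time step later.
gene_x = [1.0, 2.0, 3.0, 2.5, 2.0, 3.5]
gene_y = [0.5, 0.4, 1.4, 2.4, 1.9, 1.4]
length, start_x, start_y = best_local_match(
    change_trends(gene_x), change_trends(gene_y)
)
print(length, start_x, start_y)
```

    An inverted relationship could be probed in the same sketch by negating one trend list before matching.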

  1. RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes.

    PubMed

    Ono, Hiromasa; Ogasawara, Osamu; Okubo, Kosaku; Bono, Hidemasa

    2017-08-29

    Gene expression data are exponentially accumulating; thus, the functional annotation of such sequence data from metadata is urgently required. However, life scientists have difficulty utilizing the available data due to its sheer magnitude and complicated access. We have developed a web tool for browsing reference gene expression pattern of mammalian tissues and cell lines measured using different methods, which should facilitate the reuse of the precious data archived in several public databases. The web tool is called Reference Expression dataset (RefEx), and RefEx allows users to search by the gene name, various types of IDs, chromosomal regions in genetic maps, gene family based on InterPro, gene expression patterns, or biological categories based on Gene Ontology. RefEx also provides information about genes with tissue-specific expression, and the relative gene expression values are shown as choropleth maps on 3D human body images from BodyParts3D. Combined with the newly incorporated Functional Annotation of Mammals (FANTOM) dataset, RefEx provides insight regarding the functional interpretation of unfamiliar genes. RefEx is publicly available at http://refex.dbcls.jp/.

  2. RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes

    PubMed Central

    Ono, Hiromasa; Ogasawara, Osamu; Okubo, Kosaku; Bono, Hidemasa

    2017-01-01

    Gene expression data are exponentially accumulating; thus, the functional annotation of such sequence data from metadata is urgently required. However, life scientists have difficulty utilizing the available data due to its sheer magnitude and complicated access. We have developed a web tool for browsing reference gene expression pattern of mammalian tissues and cell lines measured using different methods, which should facilitate the reuse of the precious data archived in several public databases. The web tool is called Reference Expression dataset (RefEx), and RefEx allows users to search by the gene name, various types of IDs, chromosomal regions in genetic maps, gene family based on InterPro, gene expression patterns, or biological categories based on Gene Ontology. RefEx also provides information about genes with tissue-specific expression, and the relative gene expression values are shown as choropleth maps on 3D human body images from BodyParts3D. Combined with the newly incorporated Functional Annotation of Mammals (FANTOM) dataset, RefEx provides insight regarding the functional interpretation of unfamiliar genes. RefEx is publicly available at http://refex.dbcls.jp/. PMID:28850115

  3. Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold.

    PubMed

    Zitnik, Marinka; Zupan, Blaž

    2014-01-01

    The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker's yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps.

  4. GeneSCF: a real-time based functional enrichment tool with support for multiple organisms.

    PubMed

    Subhash, Santhilal; Kanduri, Chandrasekhar

    2016-09-13

    High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis by using updated information directly from source databases such as KEGG, Reactome or Gene Ontology. In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations), that can predict the functionally relevant biological information for a set of genes in a real-time updated manner. It is designed to handle information from more than 4000 organisms from freely available prominent functional databases like KEGG, Reactome and Gene Ontology. We successfully employed our tool on two published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need to install additional dependencies. GeneSCF is more reliable compared to other enrichment tools because of its ability to use reference functional databases in real-time to perform enrichment analysis. It is an easy-to-integrate tool with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms thereby saving time for the users. Since the tool is designed to be ready-to-use, there is no need for any complex compilation and installation procedures.
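
    The abstract above does not say which statistic GeneSCF applies, so the following is only a generic sketch of how functional enrichment is commonly scored: a one-sided hypergeometric test on the overlap between a gene list and the genes annotated with one term. The function name and the toy counts are hypothetical and are not taken from the tool.

```python
from math import comb


def hypergeom_enrichment_p(hits, list_size, term_size, background_size):
    """Upper-tail probability P(X >= hits).

    X is the number of term-annotated genes seen when list_size genes are
    drawn without replacement from background_size genes, of which
    term_size carry the annotation.
    """
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers (assumed): 20 background genes, 5 annotated with the term,
# a 4-gene input list in which 3 genes carry the term.
p_value = hypergeom_enrichment_p(hits=3, list_size=4, term_size=5, background_size=20)
print(round(p_value, 4))
```

    In practice the p-values for all tested terms would also need a multiple-testing correction, which this sketch omits.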

  5. How the serotonin story is being rewritten by new gene-based discoveries principally related to SLC6A4, the serotonin transporter gene, which functions to influence all cellular serotonin systems.

    PubMed

    Murphy, Dennis L; Fox, Meredith A; Timpano, Kiara R; Moya, Pablo R; Ren-Patterson, Renee; Andrews, Anne M; Holmes, Andrew; Lesch, Klaus-Peter; Wendland, Jens R

    2008-11-01

    Discovered and crystallized over sixty years ago, serotonin's important functions in the brain and body were identified over the ensuing years by neurochemical, physiological and pharmacological investigations. This 2008 M. Rapport Memorial Serotonin Review focuses on some of the most recent discoveries involving serotonin that are based on genetic methodologies. These include examples of the consequences that result from direct serotonergic gene manipulation (gene deletion or overexpression) in mice and other species; an evaluation of some phenotypes related to functional human serotonergic gene variants, particularly in SLC6A4, the serotonin transporter gene; and finally, a consideration of the pharmacogenomics of serotonergic drugs with respect to both their therapeutic actions and side effects. The serotonin transporter (SERT) has been the most comprehensively studied of the serotonin system molecular components, and will be the primary focus of this review. We provide in-depth examples of gene-based discoveries primarily related to SLC6A4 that have clarified serotonin's many important homeostatic functions in humans, non-human primates, mice and other species.

  6. Functional Potential of Soil Microbial Communities in the Maize Rhizosphere

    PubMed Central

    Xiong, Jingbo; Li, Jiabao; He, Zhili; Zhou, Jizhong; Yannarell, Anthony C.; Mackie, Roderick I.

    2014-01-01

    Microbial communities in the rhizosphere make significant contributions to crop health and nutrient cycling. However, their ability to perform important biogeochemical processes remains uncharacterized. Here, we identified important functional genes that characterize the rhizosphere microbial community to understand metabolic capabilities in the maize rhizosphere using the GeoChip-based functional gene array method. Significant differences in functional gene structure were apparent between rhizosphere and bulk soil microbial communities. Approximately half of the detected gene families were significantly (p<0.05) increased in the rhizosphere. Based on the detected gyrB genes, Gammaproteobacteria, Betaproteobacteria, Firmicutes, Bacteroidetes and Cyanobacteria were most enriched in the rhizosphere compared to those in the bulk soil. The rhizosphere niche also supported greater functional diversity in catabolic pathways. The maize rhizosphere had significantly enriched genes involved in carbon fixation and degradation (especially for hemicelluloses, aromatics and lignin), nitrogen fixation, ammonification, denitrification, polyphosphate biosynthesis and degradation, sulfur reduction and oxidation. This research demonstrates that the maize rhizosphere is a hotspot of genes, mostly originating from dominant soil microbial groups such as Proteobacteria, providing functional capacity for the transformation of labile and recalcitrant organic C, N, P and S compounds. PMID:25383887

  7. Weighted functional linear regression models for gene-based association analysis.

    PubMed

    Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I

    2018-01-01

    Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10⁻⁶), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
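
    The phrase "weights defined by allele frequencies via the beta distribution" can be made concrete with a small sketch. It assumes the weight of a variant is the beta density evaluated at its minor allele frequency (MAF), and it uses shape parameters a = 1 and b = 25, a choice often seen in kernel-based tests but not stated in this abstract. The function names are hypothetical and this is not the FREGAT implementation.

```python
from math import exp, lgamma, log


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    # log of 1 / B(a, b), computed with log-gamma for numerical stability.
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


def variant_weights(mafs, a=1.0, b=25.0):
    """One weight per variant; with a=1, b=25 rarer variants get larger weights."""
    return [beta_pdf(maf, a, b) for maf in mafs]


# Toy minor allele frequencies (assumed values).
weights = variant_weights([0.01, 0.05, 0.20])
print([round(w, 3) for w in weights])
```

    With these assumed parameters the weights decrease monotonically as the MAF grows, so rare variants dominate the weighted model.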

  8. Identification of hub subnetwork based on topological features of genes in breast cancer

    PubMed Central

    ZHUANG, DA-YONG; JIANG, LI; HE, QING-QING; ZHOU, PENG; YUE, TAO

    2015-01-01

    The aim of this study was to provide functional insight into the identification of hub subnetworks by aggregating the behavior of genes connected in a protein-protein interaction (PPI) network. We applied a protein network-based approach to identify subnetworks which may provide new insight into the functions of pathways involved in breast cancer rather than individual genes. Five groups of breast cancer data were downloaded and analyzed from the Gene Expression Omnibus (GEO) database of high-throughput gene expression data to identify gene signatures using the genome-wide global significance (GWGS) method. A PPI network was constructed using Cytoscape and clusters that focused on highly connected nodes were obtained using the molecular complex detection (MCODE) clustering algorithm. Pathway analysis was performed to assess the functional relevance of selected gene signatures based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Topological centrality was used to characterize the biological importance of gene signatures, pathways and clusters. The results revealed that cluster 1 and the cell cycle and oocyte meiosis pathways were significant subnetworks in the analysis of degree and other centralities, and that most hub nodes were located in them. The most important hub nodes, those with top-ranked centrality, were also similar to the genes common to the intersections of these three subnetworks; this common set was viewed as a hub subnetwork that is more reproducible than individual critical genes selected without network information. This hub subnetwork was attributed to a single biological process that is essential to cell growth and death. This increased the accuracy of identifying gene interactions that took place within the same functional process and was potentially useful for the development of biomarkers and networks for breast cancer. PMID:25573623
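
    Degree centrality, the first topological measure named in the abstract above, is simple enough to sketch directly. The example assumes an undirected PPI network given as an edge list, and the protein labels P1 to P4 are hypothetical. The study itself used Cytoscape and MCODE, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes each node is directly connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Toy interaction list (assumed): P1 interacts with every other protein.
ppi_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
centrality = degree_centrality(ppi_edges)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])
```

    Other centralities, such as betweenness or closeness, would rank nodes differently and need shortest-path computations that are left out here.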

  9. Long-term balanced fertilization increases the soil microbial functional diversity in a phosphorus-limited paddy soil.

    PubMed

    Su, Jian-Qiang; Ding, Long-Jun; Xue, Kai; Yao, Huai-Ying; Quensen, John; Bai, Shi-Jie; Wei, Wen-Xue; Wu, Jin-Shui; Zhou, Jizhong; Tiedje, James M; Zhu, Yong-Guan

    2015-01-01

    The influence of long-term chemical fertilization on soil microbial communities has been one of the frontier topics of agricultural and environmental sciences and is critical for linking soil microbial flora with soil functions. In this study, 16S rRNA gene pyrosequencing and a functional gene array, geochip 4.0, were used to investigate the shifts in microbial composition and functional gene structure in paddy soils with different fertilization treatments over a 22-year period. These included a control without fertilizers; chemical nitrogen fertilizer (N); N and phosphate (NP); N and potassium (NK); and N, P and K (NPK). Based on 16S rRNA gene data, both species evenness and key genera were affected by P fertilization. Functional gene array-based analysis revealed that long-term fertilization significantly changed the overall microbial functional structures. Chemical fertilization significantly increased the diversity and abundance of most genes involved in C, N, P and S cycling, especially for the treatments NK and NPK. Significant correlations were found among functional gene structure and abundance, related soil enzymatic activities and rice yield, suggesting that a fertilizer-induced shift in the microbial community may accelerate the nutrient turnover in soil, which in turn influenced rice growth. The effect of N fertilization on soil microbial functional genes was mitigated by the addition of P fertilizer in this P-limited paddy soil, suggesting that balanced chemical fertilization is beneficial to the soil microbial community and its functions. © 2014 John Wiley & Sons Ltd.

  10. Development of genome-based anti-virulence therapeutics to control HLB

    USDA-ARS's Scientific Manuscript database

    An orthologous gene replacement technique has been developed to confirm functions of key virulence genes in 'Candidatus Liberibacter asiaticus'. These results facilitate the development of antivirulence drugs that specifically target functional domains of virulence gene products to disarm pathogenicit...

  11. Gene by Environment Interaction and Resilience: Effects of Child Maltreatment and Serotonin, Corticotropin Releasing Hormone, Dopamine, and Oxytocin Genes

    PubMed Central

    Cicchetti, Dante; Rogosch, Fred A.

    2013-01-01

    In this investigation, gene-environment interaction effects in predicting resilience in adaptive functioning among maltreated and nonmaltreated low-income children (N = 595) were examined. A multi-component index of resilient functioning was derived and levels of resilient functioning were identified. Variants in four genes, 5-HTTLPR, CRHR1, DRD4 -521C/T, and OXTR, were investigated. In a series of ANCOVAs, child maltreatment demonstrated a strong negative main effect on children’s resilient functioning, whereas no main effects for any of the genotypes of the respective genes were found. However, gene-environment interactions involving genotypes of each of the respective genes and maltreatment status were obtained. For each respective gene, among children with a specific genotype, the relative advantage in resilient functioning of nonmaltreated compared to maltreated children was stronger than was the case for nonmaltreated and maltreated children with other genotypes of the respective gene. Across the four genes, a composite of the genotypes that more strongly differentiated resilient functioning between nonmaltreated and maltreated children provided further evidence of genetic variations influencing resilient functioning in nonmaltreated children, whereas genetic variation had a negligible effect on promoting resilience among maltreated children. Additional effects were observed for children based on the number of subtypes of maltreatment children experienced, as well as for abuse and neglect subgroups. Finally, maltreated and nonmaltreated children with high levels of resilience differed in their average number of differentiating genotypes. These results suggest that differential resilient outcomes are based on the interaction between genes and developmental experiences. PMID:22559122

  12. A genome scale metabolic network for rice and accompanying analysis of tryptophan, auxin and serotonin biosynthesis regulation under biotic stress

    USDA-ARS's Scientific Manuscript database

    Functional annotations of large plant genome projects mostly provide information on gene function and gene families based on the presence of protein domains and gene homology, but not necessarily in association with gene expression or metabolic and regulatory networks. These additional annotations a...

  13. Computational gene network study on antibiotic resistance genes of Acinetobacter baumannii.

    PubMed

    Anitha, P; Anbarasu, Anand; Ramaiah, Sudha

    2014-05-01

    Multi Drug Resistance (MDR) in Acinetobacter baumannii is one of the major threats for emerging nosocomial infections in hospital environment. Multidrug-resistance in A. baumannii may be due to the implementation of multi-combination resistance mechanisms such as β-lactamase synthesis, Penicillin-Binding Proteins (PBPs) changes, alteration in porin proteins and in efflux pumps against various existing classes of antibiotics. Multiple antibiotic resistance genes are involved in MDR. These resistance genes are transferred through plasmids, which are responsible for the dissemination of antibiotic resistance among Acinetobacter spp. In addition, these resistance genes may also have a tendency to interact with each other or with their gene products. Therefore, it becomes necessary to understand the impact of these interactions in antibiotic resistance mechanism. Hence, our study focuses on protein and gene network analysis on various resistance genes, to elucidate the role of the interacting proteins and to study their functional contribution towards antibiotic resistance. From the search tool for the retrieval of interacting gene/protein (STRING), a total of 168 functional partners for 15 resistance genes were extracted based on the confidence scoring system. The network study was then followed up with functional clustering of associated partners using molecular complex detection (MCODE). Later, we selected eight efficient clusters based on score. Interestingly, the associated protein we identified from the network possessed greater functional similarity with known resistance genes. This network-based approach on resistance genes of A. baumannii could help in identifying new genes/proteins and provide clues on their association in antibiotic resistance. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Disentangling the multigenic and pleiotropic nature of molecular function

    PubMed Central

    2015-01-01

    Background Biological processes at the molecular level are usually represented by molecular interaction networks. Function is organised and modularity identified based on network topology, however, this approach often fails to account for the dynamic and multifunctional nature of molecular components. For example, a molecule engaging in spatially or temporally independent functions may be inappropriately clustered into a single functional module. To capture biologically meaningful sets of interacting molecules, we use experimentally defined pathways as spatial/temporal units of molecular activity. Results We defined functional profiles of Saccharomyces cerevisiae based on a minimal set of Gene Ontology terms sufficient to represent each pathway's genes. The Gene Ontology terms were used to annotate 271 pathways, accounting for pathway multi-functionality and gene pleiotropy. Pathways were then arranged into a network, linked by shared functionality. Of the genes in our data set, 44% appeared in multiple pathways performing a diverse set of functions. Linking pathways by overlapping functionality revealed a modular network with energy metabolism forming a sparse centre, surrounded by several denser clusters comprised of regulatory and metabolic pathways. Signalling pathways formed a relatively discrete cluster connected to the centre of the network. Genetic interactions were enriched within the clusters of pathways by a factor of 5.5, confirming the organisation of our pathway network is biologically significant. Conclusions Our representation of molecular function according to pathway relationships enables analysis of gene/protein activity in the context of specific functional roles, as an alternative to typical molecule-centric graph-based methods. The pathway network demonstrates the cooperation of multiple pathways to perform biological processes and organises pathways into functionally related clusters with interdependent outcomes. PMID:26678917

  15. A Modified ABCDE Model of Flowering in Orchids Based on Gene Expression Profiling Studies of the Moth Orchid Phalaenopsis aphrodite

    PubMed Central

    Lee, Ann-Ying; Chen, Chun-Yi; Chang, Yao-Chien Alex; Chao, Ya-Ting; Shih, Ming-Che

    2013-01-01

    Previously we developed genomic resources for orchids, including transcriptomic analyses using next-generation sequencing techniques and construction of a web-based orchid genomic database. Here, we report a modified molecular model of flower development in the Orchidaceae based on functional analysis of gene expression profiles in Phalaenopsis aphrodite (a moth orchid) that revealed novel roles for the transcription factors involved in floral organ pattern formation. Phalaenopsis orchid floral organ-specific genes were identified by microarray analysis. Several critical transcription factors, including AP3, PI, AP1 and AGL6, displayed distinct spatial distribution patterns. Phylogenetic analysis of orchid MADS box genes was conducted to infer the evolutionary relationship among floral organ-specific genes. The results suggest that duplication of MADS box genes in orchids may have resulted in their gaining novel functions during evolution. Based on these analyses, a modified model of orchid flowering was proposed. Comparison of the expression profiles of flowers of a peloric mutant and wild-type Phalaenopsis orchid further identified genes associated with lip morphology and peloric effects. Large scale investigation of gene expression profiles revealed that homeotic genes from the ABCDE model of flower development classes A and B in the Phalaenopsis orchid have novel functions due to evolutionary diversification, and display differential expression patterns. PMID:24265826

  16. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells.

    PubMed

    Zhou, Yuexin; Zhu, Shiyou; Cai, Changzu; Yuan, Pengfei; Li, Chunmei; Huang, Yanyi; Wei, Wensheng

    2014-05-22

    Targeted genome editing technologies are powerful tools for studying biology and disease, and have a broad range of research applications. In contrast to the rapid development of toolkits to manipulate individual genes, large-scale screening methods based on the complete loss of gene expression are only now beginning to be developed. Here we report the development of a focused CRISPR/Cas-based (clustered regularly interspaced short palindromic repeats/CRISPR-associated) lentiviral library in human cells and a method of gene identification based on functional screening and high-throughput sequencing analysis. Using knockout library screens, we successfully identified the host genes essential for the intoxication of cells by anthrax and diphtheria toxins, which were confirmed by functional validation. The broad application of this powerful genetic screening strategy will not only facilitate the rapid identification of genes important for bacterial toxicity but will also enable the discovery of genes that participate in other biological processes.

  17. PanFP: Pangenome-based functional profiles for microbial communities

    DOE PAGES

    Jun, Se-Ran; Hauser, Loren John; Schadt, Christopher Warren; ...

    2015-09-26

    For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. As a result, we present a computational method called pangenome-based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU's taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome's functional profile with the OTU's abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8-0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. However, our method is unique in that any OTU building method can be used, as opposed to being limited to closed reference OTU picking strategies against specific reference sequence databases. In conclusion, we developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost-effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub.

  18. PanFP: pangenome-based functional profiles for microbial communities.

    PubMed

    Jun, Se-Ran; Robeson, Michael S; Hauser, Loren J; Schadt, Christopher W; Gorin, Andrey A

    2015-09-26

    For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. We present a computational method called pangenome-based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU's taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome's functional profile with the OTU's abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8-0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. However, our method is unique in that any OTU building method can be used, as opposed to being limited to closed-reference OTU picking strategies against specific reference sequence databases. We developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost-effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub ( https://github.com/srjun/PanFP ).
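
    The abundance-weighting step described in this abstract can be illustrated with a minimal sketch. This is not the PanFP implementation: the function name, the dictionary layout, and the OTU and function identifiers below are hypothetical, and the sketch assumes each OTU already has a pangenome functional profile expressed as per-function frequencies.

```python
def sample_functional_profile(otu_abundance, pangenome_profiles):
    """Combine per-OTU pangenome profiles into one sample-level profile.

    otu_abundance: {otu_id: count observed in the sample} (hypothetical layout)
    pangenome_profiles: {otu_id: {function_id: frequency in the OTU's pangenome}}
    Returns {function_id: relative abundance}, normalized to sum to 1.
    """
    totals = {}
    for otu, count in otu_abundance.items():
        # Weight each function frequency by the OTU's abundance in the sample.
        for function_id, freq in pangenome_profiles.get(otu, {}).items():
            totals[function_id] = totals.get(function_id, 0.0) + count * freq
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {function_id: 0.0 for function_id in totals}
    return {function_id: value / grand_total for function_id, value in totals.items()}


# Illustrative (made-up) inputs: two OTUs and three function identifiers.
otu_abundance = {"OTU_1": 30, "OTU_2": 10}
pangenome_profiles = {
    "OTU_1": {"F1": 1.0, "F2": 0.5},
    "OTU_2": {"F2": 1.0, "F3": 2.0},
}
profile = sample_functional_profile(otu_abundance, pangenome_profiles)
```

    A profile inferred this way could then be compared with a shotgun metagenomic profile of the same sample using a rank correlation such as Spearman's, as the abstract reports.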

  19. GO(vis), a gene ontology visualization tool based on multi-dimensional values.

    PubMed

    Ning, Zi; Jiang, Zhenran

    2010-05-01

    Most gene product similarity measurements concentrate on the information content of Gene Ontology (GO) terms or use a path-based similarity between GO terms, which may ignore other important information contained in the structure of the ontology. In our study, we integrate different GO similarity measure approaches to analyze the functional relationship of genes and gene products with a new triangle-based visualization tool called GO(Vis). The purpose of this tool is to demonstrate the effect of three important information factors when measuring the similarity between gene products. One advantage of this tool is that its important ratio can be adjusted to meet different measuring requirements according to the biological knowledge of each factor. The experimental results demonstrate that GO(Vis) can display diagrams of the functional relationship for gene products effectively.

  20. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

    PubMed Central

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

    2012-01-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606

  1. Genome Editing in the Cricket, Gryllus bimaculatus.

    PubMed

    Watanabe, Takahito; Noji, Sumihare; Mito, Taro

    2017-01-01

    Hemimetabolous, or incompletely metamorphosing, insects are phylogenetically basal and include many beneficial and deleterious species. The cricket, Gryllus bimaculatus, is an emerging model for hemimetabolous insects, based on the success of RNA interference (RNAi)-based gene-functional analyses and transgenic technology. Taking advantage of genome editing technologies in this species would greatly promote functional genomics studies. Genome editing has proven to be an effective method for site-specific genome manipulation in various species. Here, we describe a protocol for genome editing including gene knockout and gene knockin in G. bimaculatus for functional genomics studies.

  2. Analysis of mammalian gene function through broad based phenotypic screens across a consortium of mouse clinics

    PubMed Central

    Adams, David J; Adams, Niels C; Adler, Thure; Aguilar-Pimentel, Antonio; Ali-Hadji, Dalila; Amann, Gregory; André, Philippe; Atkins, Sarah; Auburtin, Aurelie; Ayadi, Abdel; Becker, Julien; Becker, Lore; Bedu, Elodie; Bekeredjian, Raffi; Birling, Marie-Christine; Blake, Andrew; Bottomley, Joanna; Bowl, Mike; Brault, Véronique; Busch, Dirk H; Bussell, James N; Calzada-Wack, Julia; Cater, Heather; Champy, Marie-France; Charles, Philippe; Chevalier, Claire; Chiani, Francesco; Codner, Gemma F; Combe, Roy; Cox, Roger; Dalloneau, Emilie; Dierich, André; Di Fenza, Armida; Doe, Brendan; Duchon, Arnaud; Eickelberg, Oliver; Esapa, Chris T; El Fertak, Lahcen; Feigel, Tanja; Emelyanova, Irina; Estabel, Jeanne; Favor, Jack; Flenniken, Ann; Gambadoro, Alessia; Garrett, Lilian; Gates, Hilary; Gerdin, Anna-Karin; Gkoutos, George; Greenaway, Simon; Glasl, Lisa; Goetz, Patrice; Da Cruz, Isabelle Goncalves; Götz, Alexander; Graw, Jochen; Guimond, Alain; Hans, Wolfgang; Hicks, Geoff; Hölter, Sabine M; Höfler, Heinz; Hancock, John M; Hoehndorf, Robert; Hough, Tertius; Houghton, Richard; Hurt, Anja; Ivandic, Boris; Jacobs, Hughes; Jacquot, Sylvie; Jones, Nora; Karp, Natasha A; Katus, Hugo A; Kitchen, Sharon; Klein-Rodewald, Tanja; Klingenspor, Martin; Klopstock, Thomas; Lalanne, Valerie; Leblanc, Sophie; Lengger, Christoph; le Marchand, Elise; Ludwig, Tonia; Lux, Aline; McKerlie, Colin; Maier, Holger; Mandel, Jean-Louis; Marschall, Susan; Mark, Manuel; Melvin, David G; Meziane, Hamid; Micklich, Kateryna; Mittelhauser, Christophe; Monassier, Laurent; Moulaert, David; Muller, Stéphanie; Naton, Beatrix; Neff, Frauke; Nolan, Patrick M; Nutter, Lauryl MJ; Ollert, Markus; Pavlovic, Guillaume; Pellegata, Natalia S; Peter, Emilie; Petit-Demoulière, Benoit; Pickard, Amanda; Podrini, Christine; Potter, Paul; Pouilly, Laurent; Puk, Oliver; Richardson, David; Rousseau, Stephane; Quintanilla-Fend, Leticia; Quwailid, Mohamed M; Racz, Ildiko; Rathkolb, Birgit; Riet, Fabrice; Rossant, Janet; Roux, 
Michel; Rozman, Jan; Ryder, Ed; Salisbury, Jennifer; Santos, Luis; Schäble, Karl-Heinz; Schiller, Evelyn; Schrewe, Anja; Schulz, Holger; Steinkamp, Ralf; Simon, Michelle; Stewart, Michelle; Stöger, Claudia; Stöger, Tobias; Sun, Minxuan; Sunter, David; Teboul, Lydia; Tilly, Isabelle; Tocchini-Valentini, Glauco P; Tost, Monica; Treise, Irina; Vasseur, Laurent; Velot, Emilie; Vogt-Weisenhorn, Daniela; Wagner, Christelle; Walling, Alison; Weber, Bruno; Wendling, Olivia; Westerberg, Henrik; Willershäuser, Monja; Wolf, Eckhard; Wolter, Anne; Wood, Joe; Wurst, Wolfgang; Yildirim, Ali Önder; Zeh, Ramona; Zimmer, Andreas; Zimprich, Annemarie

    2015-01-01

    The function of the majority of genes in the mouse and human genomes remains unknown. The mouse ES cell knockout resource provides a basis for characterisation of relationships between gene and phenotype. The EUMODIC consortium developed and validated robust methodologies for broad-based phenotyping of knockouts through a pipeline comprising 20 disease-orientated platforms. We developed novel statistical methods for pipeline design and data analysis aimed at detecting reproducible phenotypes with high power. We acquired phenotype data from 449 mutant alleles, representing 320 unique genes, of which half had no prior functional annotation. We captured data from over 27,000 mice finding that 83% of the mutant lines are phenodeviant, with 65% demonstrating pleiotropy. Surprisingly, we found significant differences in phenotype annotation according to zygosity. Novel phenotypes were uncovered for many genes with unknown function providing a powerful basis for hypothesis generation and further investigation in diverse systems. PMID:26214591

  3. Refinement of light-responsive transcript lists using rice oligonucleotide arrays: evaluation of gene-redundancy.

    PubMed

    Jung, Ki-Hong; Dardick, Christopher; Bartley, Laura E; Cao, Peijian; Phetsom, Jirapa; Canlas, Patrick; Seo, Young-Su; Shultz, Michael; Ouyang, Shu; Yuan, Qiaoping; Frank, Bryan C; Ly, Eugene; Zheng, Li; Jia, Yi; Hsia, An-Ping; An, Kyungsook; Chou, Hui-Hsien; Rocke, David; Lee, Geun Cheol; Schnable, Patrick S; An, Gynheung; Buell, C Robin; Ronald, Pamela C

    2008-10-06

    Studies of gene function are often hampered by gene-redundancy, especially in organisms with large genomes such as rice (Oryza sativa). We present an approach for using transcriptomics data to focus functional studies and address redundancy. To this end, we have constructed and validated an inexpensive and publicly available rice oligonucleotide near-whole genome array, called the rice NSF45K array. We generated expression profiles for light- vs. dark-grown rice leaf tissue and validated the biological significance of the data by analyzing sources of variation and confirming expression trends with reverse transcription polymerase chain reaction. We examined trends in the data by evaluating enrichment of gene ontology terms at multiple false discovery rate thresholds. To compare data generated with the NSF45K array with published results, we developed publicly available, web-based tools (www.ricearray.org). The Oligo and EST Anatomy Viewer enables visualization of EST-based expression profiling data for all genes on the array. The Rice Multi-platform Microarray Search Tool facilitates comparison of gene expression profiles across multiple rice microarray platforms. Finally, we incorporated gene expression and biochemical pathway data to reduce the number of candidate gene products putatively participating in the eight steps of the photorespiration pathway from 52 to 10, based on expression levels of putatively functionally redundant genes. We confirmed the efficacy of this method to cope with redundancy by correctly predicting participation in photorespiration of a gene with five paralogs. Applying these methods will accelerate rice functional genomics.

  4. Discovering transnosological molecular basis of human brain diseases using biclustering analysis of integrated gene expression data.

    PubMed

    Cha, Kihoon; Hwang, Taeho; Oh, Kimin; Yi, Gwan-Su

    2015-01-01

    It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases mostly failed when typical methods depending on differentially expressed genes were used. In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We could find 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation.

  5. Discovering transnosological molecular basis of human brain diseases using biclustering analysis of integrated gene expression data

    PubMed Central

    2015-01-01

    Background: It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases mostly failed when typical methods depending on differentially expressed genes were used. Results: In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We could find 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. Conclusions: This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation. PMID:26043779

  6. On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report

    PubMed Central

    Thomas, Paul D.; Wood, Valerie; Mungall, Christopher J.; Lewis, Suzanna E.; Blake, Judith A.

    2012-01-01

    A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the “functional similarity” between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the “ortholog conjecture” (or, more properly, the “ortholog functional conservation hypothesis”). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. 
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an “open world assumption” (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis. PMID:22359495

  7. The promises and pitfalls of RNA-interference-based therapeutics

    PubMed Central

    Castanotto, Daniela; Rossi, John J.

    2009-01-01

    The discovery that gene expression can be controlled by the Watson–Crick base-pairing of small RNAs with messenger RNAs containing complementary sequence — a process known as RNA interference — has markedly advanced our understanding of eukaryotic gene regulation and function. The ability of short RNA sequences to modulate gene expression has provided a powerful tool with which to study gene function and is set to revolutionize the treatment of disease. Remarkably, despite being just one decade from its discovery, the phenomenon is already being used therapeutically in human clinical trials, and biotechnology companies that focus on RNA-interference-based therapeutics are already publicly traded. PMID:19158789

  8. Differential network analysis reveals the genome-wide landscape of estrogen receptor modulation in hormonal cancers

    PubMed Central

    Hsiao, Tzu-Hung; Chiu, Yu-Chiao; Hsu, Pei-Yin; Lu, Tzu-Pin; Lai, Liang-Chuan; Tsai, Mong-Hsun; Huang, Tim H.-M.; Chuang, Eric Y.; Chen, Yidong

    2016-01-01

    Several mutual information (MI)-based algorithms have been developed to identify dynamic gene-gene and function-function interactions governed by key modulators (genes, proteins, etc.). Due to intensive computation, however, these methods rely heavily on prior knowledge and are limited in genome-wide analysis. We present the modulated gene/gene set interaction (MAGIC) analysis to systematically identify genome-wide modulation of interaction networks. Based on a novel statistical test employing conjugate Fisher transformations of correlation coefficients, MAGIC features fast computation and adaptation to variations of clinical cohorts. In simulated datasets MAGIC achieved greatly improved computation efficiency and overall superior performance compared with the MI-based method. We applied MAGIC to construct the estrogen receptor (ER) modulated gene and gene set (representing biological function) interaction networks in breast cancer. Several novel interaction hubs and functional interactions were discovered. ER+ dependent interaction between TGFβ and NFκB was further shown to be associated with patient survival. The findings were verified in independent datasets. Using MAGIC, we also assessed the essential roles of ER modulation in another hormonal cancer, ovarian cancer. Overall, MAGIC is a systematic framework for comprehensively identifying and constructing the modulated interaction networks in a whole-genome landscape. MATLAB implementation of MAGIC is available for academic use at https://github.com/chiuyc/MAGIC. PMID:26972162
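
    As background for the Fisher-transformation test mentioned in this abstract, the sketch below shows the textbook Fisher z-test for whether a gene-gene correlation differs between two sample groups (for example, modulator-high versus modulator-low samples). It is an assumption-laden illustration only: it is not MAGIC's conjugate statistic, and the function name and example values are hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test for a difference between two Pearson correlations.

    r1, r2: correlations of the same gene pair in two independent groups.
    n1, n2: group sizes (each must exceed 3).
    Returns (z statistic, two-sided p-value under a normal approximation).
    """
    # Fisher transformation: atanh(r) is approximately normal with variance 1/(n - 3).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up example: strong correlation in one group, weak in the other.
z_stat, p_val = fisher_z_difference(0.8, 100, 0.1, 100)
```

    A small p-value here would flag the gene pair as a candidate modulated interaction; a genome-wide screen would additionally need multiple-testing correction.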

  9. Tobacco rattle virus (TRV) based silencing of cotton enoyl-CoA reductase (ECR) gene and the role of very long chain fatty acids in normal leaf development and resistance to wilt disease

    USDA-ARS's Scientific Manuscript database

    A Tobacco rattle virus (TRV) based virus-induced gene silencing (VIGS) assay was employed as a reverse genetic approach to study gene function in cotton (Gossypium hirsutum). This approach was used to investigate the function of Enoyl-CoA reductase (GhECR) in pathogen defense. Amino acid sequence al...

  10. Matrix Factorization-Based Data Fusion for Gene Function Prediction in Baker’s Yeast and Slime Mold

    PubMed Central

    Žitnik, Marinka; Zupan, Blaž

    2014-01-01

    The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker’s yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps. PMID:24297565

  11. Mitochondrial Gene Therapy: Advances in Mitochondrial Gene Cloning, Plasmid Production, and Nanosystems Targeted to Mitochondria.

    PubMed

    Coutinho, Eduarda; Batista, Cátia; Sousa, Fani; Queiroz, João; Costa, Diana

    2017-03-06

    Mitochondrial gene therapy seems to be a valuable and promising strategy to treat mitochondrial disorders. The use of a therapeutic vector based on mitochondrial DNA, along with its affinity to the site of mitochondria, can be considered a powerful tool in the reestablishment of normal mitochondrial function. In line with this and for the first time, we successfully cloned the mitochondrial gene ND1 that was stably maintained in multicopy pCAG-GFP plasmid, which is used to transform E. coli. This mitochondrial-gene-based plasmid was encapsulated into nanoparticles. Furthermore, the functionalization of nanoparticles with polymers, such as cellulose or gelatin, enhances their overall properties and performance for gene therapy. The fluorescence arising from rhodamine nanoparticles in mitochondria and a fluorescence microscopy study show pCAG-GFP-ND1-based nanoparticles' cell internalization and mitochondria targeting. The quantification of GFP expression strongly supports this finding. This work highlights the viability of gene therapy based on mitochondrial DNA instigating further in vitro research and clinical translation.

  12. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods.

    PubMed

    Valentini, Giorgio; Paccanaro, Alberto; Caniza, Horacio; Romero, Alfonso E; Re, Matteo

    2014-06-01

    In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. Network integration is necessary to boost the performances of gene prioritization methods. 
Moreover the methods based on kernelized score functions can further enhance disease gene ranking results, by adopting both local and global learning strategies, able to exploit the overall topology of the network. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
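
    The random walk with restart (RWR) algorithm named in this abstract can be sketched as follows. This is a generic illustration rather than the authors' code: the toy network, the gene labels, and the restart probability of 0.3 are assumptions, and the sketch assumes an unweighted, undirected network in which every gene has at least one neighbor.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every gene by its steady-state RWR probability from the seed genes.

    adjacency: {gene: [neighbor genes]}; every neighbor must also be a key.
    seeds: set of known disease genes where the walk (re)starts.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed gene.
        updated = {n: restart * start[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbor.
        for gene in nodes:
            share = (1.0 - restart) * scores[gene] / len(adjacency[gene])
            for neighbor in adjacency[gene]:
                updated[neighbor] += share
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Toy path network A - B - C - D with A as the only known disease gene.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
rwr_scores = random_walk_with_restart(network, seeds={"A"})
# Rank the unannotated genes (non-seeds) by decreasing score.
candidates = sorted((g for g in network if g != "A"), key=lambda g: -rwr_scores[g])
```

    In this toy network the ranking simply follows distance from the seed; on integrated functional networks the scores also reflect how many paths connect a candidate to the seed set.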

  13. Microarray-based analysis of survival of soil microbial community during ozonation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Jian; Van Nostrand, Joy D.; He, Zhili

    A 15 h ozonation was performed on bioremediated soil to remove recalcitrant residual oil. To monitor the survival of indigenous microorganisms in the soil during in-situ chemical oxidation (ISCO), culturing and a functional gene array, GeoChip, were used to examine the functional genes and structure of the microbial community during ozonation (0 h, 2 h, 4 h, 6 h, 10 h and 15 h). Breakthrough ozonation decreased the population of cultivable heterotrophic bacteria by about 3 orders of magnitude. The total functional gene abundance and diversity decreased during ozonation, as the number of functional genes was reduced by 48% after 15 h. However, functional genes were evenly distributed during ozonation as judged by the Shannon-Weaver Evenness index. A sharp decrease in gene number was observed in the first 6 h of ozonation followed by a slower decrease in the next 9 h, which was consistent with microbial populations measured by a culture-based method. Functional genes involved in carbon, nitrogen, phosphorus and sulfur cycling, metal resistance and organic remediation were detected in all samples. Though the pattern of gene categories detected was similar for all time points, hierarchical clustering of all functional genes and major functional categories showed a time-serial pattern. Bacteria, archaea and fungi decreased by 96.1%, 95.1% and 91.3%, respectively, after 15 h ozonation. Delta proteobacteria, which were reduced by 94.3%, showed the highest resistance to ozonation while Actinobacteria, reduced by 96.3%, showed the lowest resistance. Microorganisms similar to Rhodothermus, Obesumbacterium, Staphylothermus, Gluconobacter, and Enterococcus were dominant at all time points. Functional genes related to petroleum degradation decreased 1-2 orders of magnitude. Most of the key functional genes were still detected after ozonation, allowing a rapid recovery of the microbial community after ozonation. While ozone had a large impact on the indigenous soil microorganisms, a fraction of the key functional gene-containing microorganisms survived during ozonation and kept the community functional.
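
    The evenness index cited in this abstract is commonly computed as the Shannon diversity divided by its maximum possible value, ln(S), for S detected categories. The sketch below assumes that definition; the function name and the signal values are made up, and returning 0.0 when fewer than two categories are detected is a convention chosen here for illustration.

```python
import math


def shannon_evenness(abundances):
    """Shannon evenness J = H / ln(S) over the categories with non-zero abundance."""
    detected = [a for a in abundances if a > 0]
    richness = len(detected)
    if richness < 2:
        return 0.0  # evenness is undefined for a single category; 0.0 is a convention
    total = sum(detected)
    # Shannon diversity H = -sum(p_i * ln(p_i)) over relative abundances p_i.
    diversity = -sum((a / total) * math.log(a / total) for a in detected)
    return diversity / math.log(richness)


# Made-up gene signal intensities before and after a treatment.
before = shannon_evenness([10, 10, 10, 10])  # four genes, equal signal
after = shannon_evenness([5, 5])             # fewer genes, still equal signal
skewed = shannon_evenness([97, 1, 1, 1])     # one gene dominates
```

    The first two cases both give an evenness of 1, which shows how gene number can fall while evenness stays high, the pattern this abstract describes.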

  14. Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

    PubMed Central

    Menon, Rajasree; Wen, Yuchen; Omenn, Gilbert S.; Kretzler, Matthias; Guan, Yuanfang

    2013-01-01

    Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions. PMID:24244129

  15. Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera) Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

    PubMed Central

    2010-01-01

    Background Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS) were predicted by in silico analysis of the grapevine (Vitis vinifera) genome assembly [1]. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results We present findings from the analysis of the up-dated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. 
We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information about gene structure and phylogeny for the entire currently known VvTPS gene family. PMID:20964856

  16. Gene Function Hypotheses for the Campylobacter jejuni Glycome Generated by a Logic-Based Approach

    PubMed Central

    Sternberg, Michael J.E.; Tamaddoni-Nezhad, Alireza; Lesk, Victor I.; Kay, Emily; Hitchen, Paul G.; Cootes, Adrian; van Alphen, Lieke B.; Lamoureux, Marc P.; Jarrell, Harold C.; Rawlings, Christopher J.; Soo, Evelyn C.; Szymanski, Christine M.; Dell, Anne; Wren, Brendan W.; Muggleton, Stephen H.

    2013-01-01

    Increasingly, experimental data on biological systems are obtained from several sources and computational approaches are required to integrate this information and derive models for the function of the system. Here, we demonstrate the power of a logic-based machine learning approach to propose hypotheses for gene function integrating information from two diverse experimental approaches. Specifically, we use inductive logic programming that automatically proposes hypotheses explaining the empirical data with respect to logically encoded background knowledge. We study the capsular polysaccharide biosynthetic pathway of the major human gastrointestinal pathogen Campylobacter jejuni. We consider several key steps in the formation of capsular polysaccharide consisting of 15 genes of which 8 have assigned function, and we explore the extent to which functions can be hypothesised for the remaining 7. Two sources of experimental data provide the information for learning—the results of knockout experiments on the genes involved in capsule formation and the absence/presence of capsule genes in a multitude of strains of different serotypes. The machine learning uses the pathway structure as background knowledge. We propose assignments of specific genes to five previously unassigned reaction steps. For four of these steps, there was an unambiguous optimal assignment of gene to reaction, and to the fifth, there were three candidate genes. Several of these assignments were consistent with additional experimental results. We therefore show that the logic-based methodology provides a robust strategy to integrate results from different experimental approaches and propose hypotheses for the behaviour of a biological system. PMID:23103756

  17. Gene function hypotheses for the Campylobacter jejuni glycome generated by a logic-based approach.

    PubMed

    Sternberg, Michael J E; Tamaddoni-Nezhad, Alireza; Lesk, Victor I; Kay, Emily; Hitchen, Paul G; Cootes, Adrian; van Alphen, Lieke B; Lamoureux, Marc P; Jarrell, Harold C; Rawlings, Christopher J; Soo, Evelyn C; Szymanski, Christine M; Dell, Anne; Wren, Brendan W; Muggleton, Stephen H

    2013-01-09

    Increasingly, experimental data on biological systems are obtained from several sources and computational approaches are required to integrate this information and derive models for the function of the system. Here, we demonstrate the power of a logic-based machine learning approach to propose hypotheses for gene function integrating information from two diverse experimental approaches. Specifically, we use inductive logic programming that automatically proposes hypotheses explaining the empirical data with respect to logically encoded background knowledge. We study the capsular polysaccharide biosynthetic pathway of the major human gastrointestinal pathogen Campylobacter jejuni. We consider several key steps in the formation of capsular polysaccharide consisting of 15 genes of which 8 have assigned function, and we explore the extent to which functions can be hypothesised for the remaining 7. Two sources of experimental data provide the information for learning: the results of knockout experiments on the genes involved in capsule formation and the absence/presence of capsule genes in a multitude of strains of different serotypes. The machine learning uses the pathway structure as background knowledge. We propose assignments of specific genes to five previously unassigned reaction steps. For four of these steps, there was an unambiguous optimal assignment of gene to reaction, and to the fifth, there were three candidate genes. Several of these assignments were consistent with additional experimental results. We therefore show that the logic-based methodology provides a robust strategy to integrate results from different experimental approaches and propose hypotheses for the behaviour of a biological system. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. A study of structural properties of gene network graphs for mathematical modeling of integrated mosaic gene networks.

    PubMed

    Petrovskaya, Olga V; Petrovskiy, Evgeny D; Lavrik, Inna N; Ivanisenko, Vladimir A

    2017-04-01

    Gene network modeling is one of the widely used approaches in systems biology. It allows for the study of complex genetic systems function, including so-called mosaic gene networks, which consist of functionally interacting subnetworks. We conducted a study of a mosaic gene networks modeling method based on integration of models of gene subnetworks by linear control functionals. An automatic modeling of 10,000 synthetic mosaic gene regulatory networks was carried out using computer experiments on gene knockdowns/knockouts. Structural analysis of graphs of generated mosaic gene regulatory networks has revealed that the most important factor for building accurate integrated mathematical models, among those analyzed in the study, is data on expression of genes corresponding to the vertices with high properties of centrality.

  19. Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

    PubMed

    Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray

    2004-01-01

    One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
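
    The TFIDF weighting compared in this record can be illustrated with a minimal sketch. It assumes the textbook formula (relative term frequency times the log of inverse document frequency) and treats each gene's pooled keyword list as one document; the abstract does not state the exact variant used, and the gene names and keywords below are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(keywords_by_gene):
    """Return {gene: {keyword: weight}} using a plain TF-IDF formula.

    keywords_by_gene maps a gene name to the list of keyword tokens pooled
    from its abstracts; each gene counts as one "document".
    """
    n_genes = len(keywords_by_gene)
    # Document frequency: in how many genes' keyword lists a term appears.
    doc_freq = Counter()
    for tokens in keywords_by_gene.values():
        doc_freq.update(set(tokens))

    weights = {}
    for gene, tokens in keywords_by_gene.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        weights[gene] = {
            term: (count / total) * math.log(n_genes / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical toy input, not data from the study.
toy = {
    "geneA": ["kinase", "receptor", "kinase"],
    "geneB": ["receptor", "channel"],
    "geneC": ["channel", "channel", "receptor"],
}
w = tfidf_weights(toy)
```

    In this toy input "receptor" occurs for every gene, so its weight is zero everywhere, while "kinase" is specific to geneA and receives the largest weight there. The resulting per-gene weight dictionaries are the kind of feature vectors a clustering step could consume.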

  20. Transcriptome-wide analysis of WRKY transcription factors in wheat and their leaf rust responsive expression profiling.

    PubMed

    Satapathy, Lopamudra; Singh, Dharmendra; Ranjan, Prashant; Kumar, Dhananjay; Kumar, Manish; Prabhu, Kumble Vinod; Mukhopadhyay, Kunal

    2014-12-01

    WRKY, a plant-specific transcription factor family, has important roles in pathogen defense, abiotic cues and phytohormone signaling, yet little is known about their roles and molecular mechanism of function in response to rust diseases in wheat. We identified 100 TaWRKY sequences using wheat Expressed Sequence Tag database of which 22 WRKY sequences were novel. Identified proteins were characterized based on their zinc finger motifs and phylogenetic analysis clustered them into six clades consisting of class IIc and class III WRKY proteins. Functional annotation revealed major functions in metabolic and cellular processes in control plants; whereas response to stimuli, signaling and defense in pathogen inoculated plants, their major molecular function being binding to DNA. Tag-based expression analysis of the identified genes revealed differential expression between mock and Puccinia triticina inoculated wheat near isogenic lines. Gene expression was also performed with six rust-related microarray experiments at Gene Expression Omnibus database. TaWRKY10, 15, 17 and 56 were common in both tag-based and microarray-based differential expression analysis and could be representing rust specific WRKY genes. The obtained results will bestow insight into the functional characterization of WRKY transcription factors responsive to leaf rust pathogenesis that can be used as candidate genes in molecular breeding programs to improve biotic stress tolerance in wheat.

  1. Dramatic Increases of Soil Microbial Functional Gene Diversity at the Treeline Ecotone of Changbai Mountain.

    PubMed

    Shen, Congcong; Shi, Yu; Ni, Yingying; Deng, Ye; Van Nostrand, Joy D; He, Zhili; Zhou, Jizhong; Chu, Haiyan

    2016-01-01

    The elevational and latitudinal diversity patterns of microbial taxa have attracted great attention in the past decade. Recently, the distribution of functional attributes has been in the spotlight. Here, we report a study profiling soil microbial communities along an elevation gradient (500-2200 m) on Changbai Mountain. Using a comprehensive functional gene microarray (GeoChip 5.0), we found that microbial functional gene richness exhibited a dramatic increase at the treeline ecotone, but the bacterial taxonomic and phylogenetic diversity based on 16S rRNA gene sequencing did not exhibit such a similar trend. However, the β-diversity (compositional dissimilarity among sites) pattern for both bacterial taxa and functional genes was similar, showing significant elevational distance-decay patterns which presented increased dissimilarity with elevation. The bacterial taxonomic diversity/structure was strongly influenced by soil pH, while the functional gene diversity/structure was significantly correlated with soil dissolved organic carbon (DOC). This finding highlights that soil DOC may be a good predictor in determining the elevational distribution of microbial functional genes. The finding of significant shifts in functional gene diversity at the treeline ecotone could also provide valuable information for predicting the responses of microbial functions to climate change.

  2. Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival.

    PubMed

    Suo, Chen; Hrydziuszko, Olga; Lee, Donghwan; Pramana, Setia; Saputra, Dhany; Joshi, Himanshu; Calza, Stefano; Pawitan, Yudi

    2015-08-15

    Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is increasingly common to have multiple omics data measured from each individual. Furthermore, there are rich functional data such as predicted impact of mutations on protein coding and gene/protein networks. However, integration of the complex information across the different omics and functional data is still challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures. An analysis pipeline is built for integrating genomic and transcriptomic alterations from whole-exome and RNA sequence data and functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of mutated potential driver genes found within and across patients. A driver-gene score (DGscore) is developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently mutated, with high or moderate mutational impact at protein level, exhibiting an extreme expression and functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline is applied to 60 matched tumor and normal samples of the same patient from The Cancer Genome Atlas breast-cancer project. In clinical validation, patients with high DGscores have worse survival than those with low scores (P = 0.001). Furthermore, the DGscore outperforms the established expression-based signatures MammaPrint and PAM50 in predicting patient survival. In conclusion, integration of mutation, expression and functional data allows identification of clinically relevant potential driver genes in cancer. The documented pipeline including annotated sample scripts can be found in http://fafner.meb.ki.se/biostatwiki/driver-genes/. 
Contact: yudi.pawitan@ki.se. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  3. A machine-learned computational functional genomics-based approach to drug classification.

    PubMed

    Lötsch, Jörn; Ultsch, Alfred

    2016-12-01

    The public accessibility of "big data" about the molecular targets of drugs and the biological functions of genes allows novel data science-based approaches to pharmacology that link drugs directly with their effects on pathophysiologic processes. This provides a phenotypic path to drug discovery and repurposing. This paper compares the performance of a functional genomics-based criterion to the traditional drug target-based classification. Knowledge discovery in the DrugBank and Gene Ontology databases allowed the construction of a "drug target versus biological process" matrix as a combination of "drug versus genes" and "genes versus biological processes" matrices. As a canonical example, such matrices were constructed for classical analgesic drugs. These matrices were projected onto a toroid grid of 50 × 82 artificial neurons using a self-organizing map (SOM). The distance and cluster structure of the high-dimensional feature space of the matrices was visualized on top of this SOM using a U-matrix. The cluster structure emerging on the U-matrix provided a correct classification of the analgesics into two main classes of opioid and non-opioid analgesics. The classification was flawless with both the functional genomics and the traditional target-based criterion. The functional genomics approach inherently included the drugs' modulatory effects on biological processes. The main pharmacological actions known from pharmacological science were captured, e.g., actions on lipid signaling for non-opioid analgesics that comprised many NSAIDs and actions on neuronal signal transmission for opioid analgesics. Using machine-learned techniques for computational drug classification in a comparative assessment, a functional genomics-based criterion was found to be similarly suitable for drug classification as the traditional target-based criterion. 
This supports a utility of functional genomics-based approaches to computational system pharmacology for drug discovery and repurposing.
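
    One plausible reading of the "combination" of the two matrices in this record is an ordinary matrix product, which counts for each drug how many of its target genes are annotated to each biological process. The abstract does not confirm that this is the authors' operation, so the sketch below is an assumption, and the small matrices are hypothetical.

```python
def combine(drug_gene, gene_process):
    """Multiply a drug-by-gene matrix with a gene-by-process matrix.

    Both inputs are lists of rows. Entry [d][p] of the result sums, over all
    genes g, drug_gene[d][g] * gene_process[g][p].
    """
    n_processes = len(gene_process[0])
    return [
        [
            sum(row[g] * gene_process[g][p] for g in range(len(row)))
            for p in range(n_processes)
        ]
        for row in drug_gene
    ]


# Hypothetical example: 2 drugs, 3 genes, 2 biological processes.
drug_gene = [
    [1, 1, 0],  # drug 0 targets gene 0 and gene 1
    [0, 0, 1],  # drug 1 targets gene 2
]
gene_process = [
    [1, 0],  # gene 0 is annotated to process 0
    [1, 1],  # gene 1 is annotated to both processes
    [0, 1],  # gene 2 is annotated to process 1
]
drug_process = combine(drug_gene, gene_process)
```

    Each row of drug_process is then a feature vector for one drug, the kind of input a self-organizing map could be trained on.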

  4. funRiceGenes dataset for comprehensive understanding and application of rice functional genes.

    PubMed

    Yao, Wen; Li, Guangwei; Yu, Yiming; Ouyang, Yidan

    2018-01-01

    As a main staple food, rice is also a model plant for functional genomic studies of monocots. Decoding of every DNA element of the rice genome is essential for genetic improvement to address increasing food demands. The past 15 years have witnessed extraordinary advances in rice functional genomics. Systematic characterization and proper deposition of every rice gene are vital for both functional studies and crop genetic improvement. We built a comprehensive and accurate dataset of ∼2800 functionally characterized rice genes and ∼5000 members of different gene families by integrating data from available databases and reviewing every publication on rice functional genomic studies. The dataset accounts for 19.2% of the 39 045 annotated protein-coding rice genes, which provides the most exhaustive archive for investigating the functions of rice genes. We also constructed 214 gene interaction networks based on 1841 connections between 1310 genes. The largest network with 762 genes indicated that pleiotropic genes linked different biological pathways. Increasing degree of conservation of the flowering pathway was observed among more closely related plants, implying substantial value of rice genes for future dissection of flowering regulation in other crops. All data are deposited in the funRiceGenes database (https://funricegenes.github.io/). Functionality for advanced search and continuous updating of the database are provided by a Shiny application (http://funricegenes.ncpgr.cn/). The funRiceGenes dataset would enable further exploring of the crosslink between gene functions and natural variations in rice, which can also facilitate breeding design to improve target agronomic traits of rice. © The Authors 2017. Published by Oxford University Press.

  5. A novel bioinformatics pipeline to discover genes related to arbuscular mycorrhizal symbiosis based on their evolutionary conservation pattern among higher plants.

    PubMed

    Favre, Patrick; Bapaume, Laure; Bossolini, Eligio; Delorenzi, Mauro; Falquet, Laurent; Reinhardt, Didier

    2014-12-03

    Genes involved in arbuscular mycorrhizal (AM) symbiosis have been identified primarily by mutant screens, followed by identification of the mutated genes (forward genetics). In addition, a number of AM-related genes has been identified by their AM-related expression patterns, and their function has subsequently been elucidated by knock-down or knock-out approaches (reverse genetics). However, genes that are members of functionally redundant gene families, or genes that have a vital function and therefore result in lethal mutant phenotypes, are difficult to identify. If such genes are constitutively expressed and therefore escape differential expression analyses, they remain elusive. The goal of this study was to systematically search for AM-related genes with a bioinformatics strategy that is insensitive to these problems. The central element of our approach is based on the fact that many AM-related genes are conserved only among AM-competent species. Our approach involves genome-wide comparisons at the proteome level of AM-competent host species with non-mycorrhizal species. Using a clustering method we first established orthologous/paralogous relationships and subsequently identified protein clusters that contain members only of the AM-competent species. Proteins of these clusters were then analyzed in an extended set of 16 plant species and ranked based on their relatedness among AM-competent monocot and dicot species, relative to non-mycorrhizal species. In addition, we combined the information on the protein-coding sequence with gene expression data and with promoter analysis. As a result we present a list of yet uncharacterized proteins that show a strongly AM-related pattern of sequence conservation, indicating that the respective genes may have been under selection for a function in AM. 
Among the top candidates are three genes that encode a small family of similar receptor-like kinases that are related to the S-locus receptor kinases involved in sporophytic self-incompatibility. We present a new systematic strategy of gene discovery based on conservation of the protein-coding sequence that complements classical forward and reverse genetics. This strategy can be applied to diverse other biological phenomena if species with established genome sequences fall into distinguished groups that differ in a defined functional trait of interest.

  6. Functionally Relevant Microsatellite Markers From Chickpea Transcription Factor Genes for Efficient Genotyping Applications and Trait Association Mapping

    PubMed Central

    Kujur, Alice; Bajaj, Deepak; Saxena, Maneesha S.; Tripathi, Shailesh; Upadhyaya, Hari D.; Gowda, C.L.L.; Singh, Sube; Jain, Mukesh; Tyagi, Akhilesh K.; Parida, Swarup K.

    2013-01-01

    We developed 1108 transcription factor gene-derived microsatellite (TFGMS) and 161 transcription factor functional domain-associated microsatellite (TFFDMS) markers from 707 TFs of chickpea. The robust amplification efficiency (96.5%) and high intra-specific polymorphic potential (34%) detected by markers suggest their immense utilities in efficient large-scale genotyping applications, including construction of both physical and functional transcript maps and understanding population structure. Candidate gene-based association analysis revealed strong genetic association of TFFDMS markers with three major seed and pod traits. Further, TFGMS markers in the 5′ untranslated regions of TF genes showing differential expression during seed development had higher trait association potential. The significance of TFFDMS markers was demonstrated by correlating their allelic variation with amino acid sequence expansion/contraction in the functional domain and alteration of secondary protein structure encoded by genes. The seed weight-associated markers were validated through traditional bi-parental genetic mapping. The determination of gene-specific linkage disequilibrium (LD) patterns in desi and kabuli based on single nucleotide polymorphism-microsatellite marker haplotypes revealed extended LD decay, enhanced LD resolution and trait association potential of genes. The evolutionary history of a strong seed-size/weight-associated TF based on natural variation and haplotype sharing among desi, kabuli and wild unravelled useful information having implication for seed-size trait evolution during chickpea domestication. PMID:23633531

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jun, Se-Ran; Hauser, Loren John; Schadt, Christopher Warren

    For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. As a result, we present a computational method called pangenome based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU's taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome's functional profile with the OTU's abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8 to 0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. But our method is unique in that any OTU building method can be used, as opposed to being limited to closed reference OTU picking strategies against specific reference sequence databases. In conclusion, we developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost-effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub.
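
    The abundance weighting and the Spearman validation described in this record can be sketched as follows. This is not the PanFP source code: the function names, the OTU labels, the function identifiers, and the numbers are all hypothetical, and the sample profile is assumed to be a simple abundance-weighted sum of per-OTU pangenome profiles.

```python
import math


def sample_profile(otu_abundance, pangenome_profiles):
    """Sum each OTU's pangenome functional profile, weighted by abundance."""
    profile = {}
    for otu, abundance in otu_abundance.items():
        for function, freq in pangenome_profiles.get(otu, {}).items():
            profile[function] = profile.get(function, 0.0) + abundance * freq
    return profile


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(sxx * syy)


# Hypothetical inputs: OTU counts in one sample and per-OTU profiles.
otu_abundance = {"OTU1": 10, "OTU2": 5}
pangenome_profiles = {
    "OTU1": {"F1": 1.0, "F2": 0.5},
    "OTU2": {"F2": 1.0, "F3": 0.2},
}
predicted = sample_profile(otu_abundance, pangenome_profiles)

# Hypothetical "measured" metagenome profile for the same sample.
observed = {"F1": 12.0, "F2": 9.0, "F3": 2.0}
functions = sorted(predicted)
rho = spearman([predicted[f] for f in functions],
               [observed[f] for f in functions])
```

    Comparing rank order rather than raw values makes the check insensitive to differences in scale between an inferred profile and a sequenced one, which is presumably why a rank correlation suits this kind of validation.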

  8. Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics.

    PubMed

    de Angelis, Martin Hrabě; Nicholson, George; Selloum, Mohammed; White, Jacqui; Morgan, Hugh; Ramirez-Solis, Ramiro; Sorg, Tania; Wells, Sara; Fuchs, Helmut; Fray, Martin; Adams, David J; Adams, Niels C; Adler, Thure; Aguilar-Pimentel, Antonio; Ali-Hadji, Dalila; Amann, Gregory; André, Philippe; Atkins, Sarah; Auburtin, Aurelie; Ayadi, Abdel; Becker, Julien; Becker, Lore; Bedu, Elodie; Bekeredjian, Raffi; Birling, Marie-Christine; Blake, Andrew; Bottomley, Joanna; Bowl, Mike; Brault, Véronique; Busch, Dirk H; Bussell, James N; Calzada-Wack, Julia; Cater, Heather; Champy, Marie-France; Charles, Philippe; Chevalier, Claire; Chiani, Francesco; Codner, Gemma F; Combe, Roy; Cox, Roger; Dalloneau, Emilie; Dierich, André; Di Fenza, Armida; Doe, Brendan; Duchon, Arnaud; Eickelberg, Oliver; Esapa, Chris T; El Fertak, Lahcen; Feigel, Tanja; Emelyanova, Irina; Estabel, Jeanne; Favor, Jack; Flenniken, Ann; Gambadoro, Alessia; Garrett, Lilian; Gates, Hilary; Gerdin, Anna-Karin; Gkoutos, George; Greenaway, Simon; Glasl, Lisa; Goetz, Patrice; Da Cruz, Isabelle Goncalves; Götz, Alexander; Graw, Jochen; Guimond, Alain; Hans, Wolfgang; Hicks, Geoff; Hölter, Sabine M; Höfler, Heinz; Hancock, John M; Hoehndorf, Robert; Hough, Tertius; Houghton, Richard; Hurt, Anja; Ivandic, Boris; Jacobs, Hughes; Jacquot, Sylvie; Jones, Nora; Karp, Natasha A; Katus, Hugo A; Kitchen, Sharon; Klein-Rodewald, Tanja; Klingenspor, Martin; Klopstock, Thomas; Lalanne, Valerie; Leblanc, Sophie; Lengger, Christoph; le Marchand, Elise; Ludwig, Tonia; Lux, Aline; McKerlie, Colin; Maier, Holger; Mandel, Jean-Louis; Marschall, Susan; Mark, Manuel; Melvin, David G; Meziane, Hamid; Micklich, Kateryna; Mittelhauser, Christophe; Monassier, Laurent; Moulaert, David; Muller, Stéphanie; Naton, Beatrix; Neff, Frauke; Nolan, Patrick M; Nutter, Lauryl Mj; Ollert, Markus; Pavlovic, Guillaume; Pellegata, Natalia S; Peter, Emilie; Petit-Demoulière, Benoit; Pickard, Amanda; Podrini, Christine; Potter, Paul; Pouilly, Laurent; 
Puk, Oliver; Richardson, David; Rousseau, Stephane; Quintanilla-Fend, Leticia; Quwailid, Mohamed M; Racz, Ildiko; Rathkolb, Birgit; Riet, Fabrice; Rossant, Janet; Roux, Michel; Rozman, Jan; Ryder, Ed; Salisbury, Jennifer; Santos, Luis; Schäble, Karl-Heinz; Schiller, Evelyn; Schrewe, Anja; Schulz, Holger; Steinkamp, Ralf; Simon, Michelle; Stewart, Michelle; Stöger, Claudia; Stöger, Tobias; Sun, Minxuan; Sunter, David; Teboul, Lydia; Tilly, Isabelle; Tocchini-Valentini, Glauco P; Tost, Monica; Treise, Irina; Vasseur, Laurent; Velot, Emilie; Vogt-Weisenhorn, Daniela; Wagner, Christelle; Walling, Alison; Weber, Bruno; Wendling, Olivia; Westerberg, Henrik; Willershäuser, Monja; Wolf, Eckhard; Wolter, Anne; Wood, Joe; Wurst, Wolfgang; Yildirim, Ali Önder; Zeh, Ramona; Zimmer, Andreas; Zimprich, Annemarie; Holmes, Chris; Steel, Karen P; Herault, Yann; Gailus-Durner, Valérie; Mallon, Ann-Marie; Brown, Steve Dm

    2015-09-01

    The function of the majority of genes in the mouse and human genomes remains unknown. The mouse embryonic stem cell knockout resource provides a basis for the characterization of relationships between genes and phenotypes. The EUMODIC consortium developed and validated robust methodologies for the broad-based phenotyping of knockouts through a pipeline comprising 20 disease-oriented platforms. We developed new statistical methods for pipeline design and data analysis aimed at detecting reproducible phenotypes with high power. We acquired phenotype data from 449 mutant alleles, representing 320 unique genes, of which half had no previous functional annotation. We captured data from over 27,000 mice, finding that 83% of the mutant lines are phenodeviant, with 65% demonstrating pleiotropy. Surprisingly, we found significant differences in phenotype annotation according to zygosity. New phenotypes were uncovered for many genes with previously unknown function, providing a powerful basis for hypothesis generation and further investigation in diverse systems.

  9. Pairwise gene GO-based measures for biclustering of high-dimensional expression data.

    PubMed

    Nepomuceno, Juan A; Troncoso, Alicia; Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesús S

    2018-01-01

    Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of functionally coherent groups of genes. On the other hand, a distance among genes can be defined according to their information stored in Gene Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each pair of genes which establishes their functional similarity. A scatter search-based algorithm that optimizes a merit function that integrates GO information is studied in this paper. This merit function uses a term that addresses the information through a GO measure. The effect of two possible different gene pairwise GO measures on the performance of the algorithm is analyzed. Firstly, three well known yeast datasets with approximately one thousand genes are studied. Secondly, a group of human datasets related to clinical data of cancer is also explored by the algorithm. Most of these data are high-dimensional datasets composed of a huge number of genes. The resultant biclusters reveal groups of genes linked by the same functionality when the search procedure is driven by one of the proposed GO measures. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer disease perspective. It can be concluded that the integration of biological information improves the performance of the biclustering process. The two different GO measures studied show an improvement in the results obtained for the yeast dataset. However, if datasets are composed of a huge number of genes, only one of them really improves the algorithm performance. This second case constitutes a clear option to explore interesting datasets from a clinical point of view.

  10. Semantics based approach for analyzing disease-target associations.

    PubMed

    Kaalia, Rama; Ghosh, Indira

    2016-08-01

    A complex disease is caused by heterogeneous biological interactions between genes and their products along with the influence of environmental factors. There have been many attempts for understanding the cause of these diseases using experimental, statistical and computational methods. In the present work the objective is to address the challenge of representation and integration of information from heterogeneous biomedical aspects of a complex disease using semantics based approach. Semantic web technology is used to design Disease Association Ontology (DAO-db) for representation and integration of disease associated information with diabetes as the case study. The functional associations of disease genes are integrated using RDF graphs of DAO-db. Three semantic web based scoring algorithms (PageRank, HITS (Hyperlink Induced Topic Search) and HITS with semantic weights) are used to score the gene nodes on the basis of their functional interactions in the graph. Disease Association Ontology for Diabetes (DAO-db) provides a standard ontology-driven platform for describing genes, proteins, pathways involved in diabetes and for integrating functional associations from various interaction levels (gene-disease, gene-pathway, gene-function, gene-cellular component and protein-protein interactions). An automatic instance loader module is also developed in present work that helps in adding instances to DAO-db on a large scale. Our ontology provides a framework for querying and analyzing the disease associated information in the form of RDF graphs. The above developed methodology is used to predict novel potential targets involved in diabetes disease from the long list of loose (statistically associated) gene-disease associations. Copyright © 2016 Elsevier Inc. All rights reserved.
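
    As a rough illustration of the node scoring described in the abstract above, the following sketch ranks gene nodes in a small directed graph with a plain power-iteration PageRank. It is not the authors' implementation: the gene names, the edge list, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration, and the semantic weighting used with HITS in the paper is not modelled.

```python
def pagerank(edges, damping=0.85, iters=100):
    """Score nodes of a directed graph by power-iteration PageRank.

    edges   -- iterable of (source, target) pairs
    damping -- probability of following a link (assumed value, not from the paper)
    iters   -- fixed number of iterations (assumed; no convergence check)
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Hypothetical functional associations between placeholder gene nodes.
toy_edges = [("GENE_A", "GENE_C"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_A")]
scores = pagerank(toy_edges)
ranked = sorted(scores, key=scores.get, reverse=True)
```

    In this toy graph GENE_C receives links from both other nodes and therefore ranks first; on a real RDF graph the edge list would come from the stored gene-level associations.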

  11. A human functional protein interaction network and its application to cancer data analysis

    PubMed Central

    2010-01-01

    Background: One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system. Results: We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interaction, Gene Ontology (GO) annotations and text-mined protein interactions, which cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto the network, we found that the majority of GBM candidate genes form a cluster and are closer than expected by chance, and the majority of GBM samples have sequence-altered genes in two network modules, one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers. Conclusions: We have built a highly reliable functional interaction network upon expert-curated pathways and applied this network to the analysis of two genome-wide GBM and several other cancer data sets. The network patterns revealed from our results suggest common mechanisms in the cancer biology. Our system should provide a foundation for a network or pathway-based analysis platform for cancer and other diseases. PMID:20482850

  12. Essential RNA-Based Technologies and Their Applications in Plant Functional Genomics.

    PubMed

    Teotia, Sachin; Singh, Deepali; Tang, Xiaoqing; Tang, Guiliang

    2016-02-01

    Genome sequencing has not only extended our understanding of the blueprints of many plant species but has also revealed the secrets of coding and non-coding genes. We present here a brief introduction to and personal account of key RNA-based technologies, as well as their development and applications for functional genomics of plant coding and non-coding genes, with a focus on short tandem target mimics (STTMs), artificial microRNAs (amiRNAs), and CRISPR/Cas9. In addition, their use in multiplex technologies for the functional dissection of gene networks is discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Virus-induced gene silencing offers a functional genomics platform for studying plant cell wall formation.

    PubMed

    Zhu, Xiaohong; Pattathil, Sivakumar; Mazumder, Koushik; Brehm, Amanda; Hahn, Michael G; Dinesh-Kumar, S P; Joshi, Chandrashekhar P

    2010-09-01

    Virus-induced gene silencing (VIGS) is a powerful genetic tool for rapid assessment of plant gene functions in the post-genomic era. Here, we successfully implemented a Tobacco Rattle Virus (TRV)-based VIGS system to study functions of genes involved in either primary or secondary cell wall formation in Nicotiana benthamiana plants. A 3-week post-VIGS time frame is sufficient to observe phenotypic alterations in the anatomical structure of stems and chemical composition of the primary and secondary cell walls. We used cell wall glycan-directed monoclonal antibodies to demonstrate that alteration of cell wall polymer synthesis during the secondary growth phase of VIGS plants has profound effects on the extractability of components from woody stem cell walls. Therefore, TRV-based VIGS together with cell wall component profiling methods provide a high-throughput gene discovery platform for studying plant cell wall formation from a bioenergy perspective.

  14. Identifying gnostic predictors of the vaccine response.

    PubMed

    Haining, W Nicholas; Pulendran, Bali

    2012-06-01

    Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis of their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of vaccine prediction. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. Identifying gnostic predictors of the vaccine response

    PubMed Central

    Haining, W. Nicholas; Pulendran, Bali

    2012-01-01

    Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis for their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of outcome classification. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response. PMID:22633886

  16. Lymphocyte signaling: beyond knockouts.

    PubMed

    Saveliev, Alexander; Tybulewicz, Victor L J

    2009-04-01

    The analysis of lymphocyte signaling was greatly enhanced by the advent of gene targeting, which allows the selective inactivation of a single gene. Although this gene 'knockout' approach is often informative, in many cases, the phenotype resulting from gene ablation might not provide a complete picture of the function of the corresponding protein. If a protein has multiple functions within a single or several signaling pathways, or stabilizes other proteins in a complex, the phenotypic consequences of a gene knockout may manifest as a combination of several different perturbations. In these cases, gene targeting to 'knock in' subtle point mutations might provide more accurate insight into protein function. However, to be informative, such mutations must be carefully based on structural and biophysical data.

  17. Identification and VIGS-based characterization of Bx1 ortholog in rye (Secale cereale L.)

    PubMed Central

    Groszyk, Jolanta; Kowalczyk, Mariusz; Yanushevska, Yuliya; Stochmal, Anna; Rakoczy-Trojanowska, Monika

    2017-01-01

    The first step of the benzoxazinoid (BX) synthesis pathway is catalyzed by an enzyme with indole-3-glycerol phosphate lyase activity encoded by 3 genes, Bx1, TSA and Igl. A gene highly homologous to maize and wheat Bx1 has been identified in rye. The goal of the study was to analyze the gene and to experimentally verify its role in the rye BX biosynthesis pathway as a rye ortholog of the Bx1 gene. Expression of the gene showed peak values 3 days after imbibition (dai) and at 21 dai it was undetectable. Changes of the BX content in leaves were highly correlated with the expression pattern until 21 dai. In plants older than 21 dai despite the undetectable expression of the analyzed gene there was still low accumulation of BXs. Function of the gene was verified by correlating its native expression and virus-induced silencing with BX accumulation. Barley stripe mosaic virus (BSMV)-based vectors were used to induce transcriptional (TGS) and posttranscriptional (PTGS) silencing of the analyzed gene. Both strategies (PTGS and TGS) significantly reduced the transcript level of the analyzed gene, and this was highly correlated with lowered BX content. Inoculation with virus-based vectors specifically induced expression of the analyzed gene, indicating up-regulation by biotic stressors. This is the first report of using the BSMV-based system for functional analysis of rye gene. The findings prove that the analyzed gene is a rye ortholog of the Bx1 gene. Its expression is developmentally regulated and is strongly induced by biotic stress. Stable accumulation of BXs in plants older than 21 dai associated with undetectable expression of ScBx1 indicates that the function of the ScBx1 in the BX biosynthesis is redundant with another gene. We anticipate that the unknown gene is a putative ortholog of the Igl, which still remains to be identified in rye. PMID:28234909

  18. Identification and VIGS-based characterization of Bx1 ortholog in rye (Secale cereale L.).

    PubMed

    Groszyk, Jolanta; Kowalczyk, Mariusz; Yanushevska, Yuliya; Stochmal, Anna; Rakoczy-Trojanowska, Monika; Orczyk, Waclaw

    2017-01-01

    The first step of the benzoxazinoid (BX) synthesis pathway is catalyzed by an enzyme with indole-3-glycerol phosphate lyase activity encoded by 3 genes, Bx1, TSA and Igl. A gene highly homologous to maize and wheat Bx1 has been identified in rye. The goal of the study was to analyze the gene and to experimentally verify its role in the rye BX biosynthesis pathway as a rye ortholog of the Bx1 gene. Expression of the gene showed peak values 3 days after imbibition (dai) and at 21 dai it was undetectable. Changes of the BX content in leaves were highly correlated with the expression pattern until 21 dai. In plants older than 21 dai despite the undetectable expression of the analyzed gene there was still low accumulation of BXs. Function of the gene was verified by correlating its native expression and virus-induced silencing with BX accumulation. Barley stripe mosaic virus (BSMV)-based vectors were used to induce transcriptional (TGS) and posttranscriptional (PTGS) silencing of the analyzed gene. Both strategies (PTGS and TGS) significantly reduced the transcript level of the analyzed gene, and this was highly correlated with lowered BX content. Inoculation with virus-based vectors specifically induced expression of the analyzed gene, indicating up-regulation by biotic stressors. This is the first report of using the BSMV-based system for functional analysis of rye gene. The findings prove that the analyzed gene is a rye ortholog of the Bx1 gene. Its expression is developmentally regulated and is strongly induced by biotic stress. Stable accumulation of BXs in plants older than 21 dai associated with undetectable expression of ScBx1 indicates that the function of the ScBx1 in the BX biosynthesis is redundant with another gene. We anticipate that the unknown gene is a putative ortholog of the Igl, which still remains to be identified in rye.

  19. Comparative genome analysis of PHB gene family reveals deep evolutionary origins and diverse gene function.

    PubMed

    Di, Chao; Xu, Wenying; Su, Zhen; Yuan, Joshua S

    2010-10-07

    PHB (Prohibitin) gene family is involved in a variety of functions important for different biological processes. PHB genes are ubiquitously present in divergent species from prokaryotes to eukaryotes. Human PHB genes have been found to be associated with various diseases. Recent studies by our group and others have shown diverse function of PHB genes in plants for development, senescence, defence, and others. Despite the importance of the PHB gene family, no comprehensive gene family analysis has been carried out to evaluate the relatedness of PHB genes across different species. In order to better guide the gene function analysis and understand the evolution of the PHB gene family, we therefore carried out the comparative genome analysis of the PHB genes across different kingdoms. The relatedness, motif distribution, and intron/exon distribution all indicated that PHB genes are a relatively conserved gene family. The PHB genes can be classified into 5 classes and each class has a very deep evolutionary origin. The PHB genes within the class maintained the same motif patterns during the evolution. With Arabidopsis as the model species, we found that PHB gene intron/exon structure and domains are also conserved during the evolution. Despite being a conserved gene family, various gene duplication events led to the expansion of the PHB genes. Both segmental and tandem gene duplication were involved in Arabidopsis PHB gene family expansion. However, segmental duplication is predominant in Arabidopsis. Moreover, most of the duplicated genes experienced neofunctionalization. The results highlighted that PHB genes might be involved in important functions so that the duplicated genes are under the evolutionary pressure to derive new function. PHB gene family is a conserved gene family and accounts for diverse but important biological functions based on the similar molecular mechanisms. The highly diverse biological function indicated that more research needs to be carried out to dissect the PHB gene function. The conserved gene evolution indicated that the study in the model species can be translated to human and mammalian studies.

  20. Escherichia coli K-12 and B contain functional bacteriophage P2 ogr genes.

    PubMed Central

    Slettan, A; Gebhardt, K; Kristiansen, E; Birkeland, N K; Lindqvist, B H

    1992-01-01

    The bacteriophage P2 ogr gene encodes an essential 72-amino-acid protein which acts as a positive regulator of P2 late transcription. A P2 ogr deletion phage, which depends on the supply of Ogr protein in trans for lytic growth on Escherichia coli C, has previously been constructed. E. coli B and K-12 were found to support the growth of the ogr-defective P2 phage because of the presence of functional ogr genes located in cryptic P2-like prophages in these strains. The cryptic ogr genes were cloned and sequenced. Compared with the P2 wild-type ogr gene, the ogr genes in the B and K-12 strains are conserved, containing mostly silent base substitutions. One of the base substitutions in the K-12 ogr gene results in replacement of an alanine with valine at position 57 in the Ogr protein but does not seem to affect the function of Ogr as a transcriptional activator. The cryptic ogr genes are constitutively transcribed, apparently at a higher level than the wild-type ogr gene in a P2 lysogen. PMID:1597424

  1. Dramatic Increases of Soil Microbial Functional Gene Diversity at the Treeline Ecotone of Changbai Mountain

    PubMed Central

    Shen, Congcong; Shi, Yu; Ni, Yingying; Deng, Ye; Van Nostrand, Joy D.; He, Zhili; Zhou, Jizhong; Chu, Haiyan

    2016-01-01

    The elevational and latitudinal diversity patterns of microbial taxa have attracted great attention in the past decade. Recently, the distribution of functional attributes has been in the spotlight. Here, we report a study profiling soil microbial communities along an elevation gradient (500–2200 m) on Changbai Mountain. Using a comprehensive functional gene microarray (GeoChip 5.0), we found that microbial functional gene richness exhibited a dramatic increase at the treeline ecotone, but the bacterial taxonomic and phylogenetic diversity based on 16S rRNA gene sequencing did not exhibit such a similar trend. However, the β-diversity (compositional dissimilarity among sites) pattern for both bacterial taxa and functional genes was similar, showing significant elevational distance-decay patterns which presented increased dissimilarity with elevation. The bacterial taxonomic diversity/structure was strongly influenced by soil pH, while the functional gene diversity/structure was significantly correlated with soil dissolved organic carbon (DOC). This finding highlights that soil DOC may be a good predictor in determining the elevational distribution of microbial functional genes. The finding of significant shifts in functional gene diversity at the treeline ecotone could also provide valuable information for predicting the responses of microbial functions to climate change. PMID:27524983

  2. GeoChip-Based Analysis of the Functional Gene Diversity and Metabolic Potential of Microbial Communities in Acid Mine Drainage

    PubMed Central

    Xie, Jianping; He, Zhili; Liu, Xinxing; Liu, Xueduan; Van Nostrand, Joy D.; Deng, Ye; Wu, Liyou; Zhou, Jizhong; Qiu, Guanzhou

    2011-01-01

    Acid mine drainage (AMD) is an extreme environment, usually with low pH and high concentrations of metals. Although the phylogenetic diversity of AMD microbial communities has been examined extensively, little is known about their functional gene diversity and metabolic potential. In this study, a comprehensive functional gene array (GeoChip 2.0) was used to analyze the functional diversity, composition, structure, and metabolic potential of AMD microbial communities from three copper mines in China. GeoChip data indicated that these microbial communities were functionally diverse as measured by the number of genes detected, gene overlapping, unique genes, and various diversity indices. Almost all key functional gene categories targeted by GeoChip 2.0 were detected in the AMD microbial communities, including carbon fixation, carbon degradation, methane generation, nitrogen fixation, nitrification, denitrification, ammonification, nitrogen reduction, sulfur metabolism, metal resistance, and organic contaminant degradation, which suggested that the functional gene diversity was higher than was previously thought. Mantel test results indicated that AMD microbial communities are shaped largely by surrounding environmental factors (e.g., S, Mg, and Cu). Functional genes (e.g., narG and norB) and several key functional processes (e.g., methane generation, ammonification, denitrification, sulfite reduction, and organic contaminant degradation) were significantly (P < 0.10) correlated with environmental variables. This study presents an overview of functional gene diversity and the structure of AMD microbial communities and also provides insights into our understanding of metabolic potential in AMD ecosystems. PMID:21097602

  3. New Dimensions in Microbial Ecology-Functional Genes in Studies to Unravel the Biodiversity and Role of Functional Microbial Groups in the Environment.

    PubMed

    Imhoff, Johannes F

    2016-05-24

    During the past decades, tremendous advances have been made in the possibilities to study the diversity of microbial communities in the environment. The development of methods to study these communities on the basis of 16S rRNA gene sequences analysis was a first step into the molecular analysis of environmental communities and the study of biodiversity in natural habitats. A new dimension in this field was reached with the introduction of functional genes of ecological importance and the establishment of genetic tools to study the diversity of functional microbial groups and their responses to environmental factors. Functional gene approaches are excellent tools to study the diversity of a particular function and to demonstrate changes in the composition of prokaryote communities contributing to this function. The phylogeny of many functional genes largely correlates with that of the 16S rRNA gene, and microbial species may be identified on the basis of functional gene sequences. Functional genes are perfectly suited to link culture-based microbiological work with environmental molecular genetic studies. In this review, the development of functional gene studies in environmental microbiology is highlighted with examples of genes relevant for important ecophysiological functions. Examples are presented for bacterial photosynthesis and two types of anoxygenic phototrophic bacteria, with genes of the Fenna-Matthews-Olson-protein (fmoA) as target for the green sulfur bacteria and of two reaction center proteins (pufLM) for the phototrophic purple bacteria, with genes of adenosine-5'-phosphosulfate (APS) reductase (aprA), sulfate thioesterase (soxB) and dissimilatory sulfite reductase (dsrAB) for sulfur oxidizing and sulfate reducing bacteria, with genes of ammonia monooxygenase (amoA) for nitrifying/ammonia-oxidizing bacteria, with genes of particulate nitrate reductase and nitrite reductases (narH/G, nirS, nirK) for denitrifying bacteria and with genes of methane monooxygenase (pmoA) for methane oxidizing bacteria.

  4. Genome Editing of Monkey.

    PubMed

    Liu, Zhen; Cai, Yijun; Sun, Qiang

    2017-01-01

    Gene-modified monkey models would be particularly valuable in biomedical and neuroscience research. Virus-based transgenic and programmable nucleases-based site-specific gene editing methods (TALEN, CRISPR-cas9) enable the generation of gene-modified monkeys with gain or loss of function of specific genes. Here, we describe the generation of transgenic and knock-out (KO) monkeys with high efficiency by lentivirus and programmable nucleases.

  5. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.

    PubMed

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.

  6. Functional Characteristics of the Flying Squirrel's Cecal Microbiota under a Leaf-Based Diet, Based on Multiple Meta-Omic Profiling

    PubMed Central

    Lu, Hsiao-Pei; Liu, Po-Yu; Wang, Yu-bin; Hsieh, Ji-Fan; Ho, Han-Chen; Huang, Shiao-Wei; Lin, Chung-Yen; Hsieh, Chih-hao; Yu, Hon-Tsen

    2018-01-01

    Mammalian herbivores rely on microbial activities in an expanded gut chamber to convert plant biomass into absorbable nutrients. Distinct from ruminants, small herbivores typically have a simple stomach but an enlarged cecum to harbor symbiotic microbes; however, knowledge of this specialized gut structure and characteristics of its microbial contents is limited. Here, we used leaf-eating flying squirrels as a model to explore functional characteristics of the cecal microbiota adapted to a high-fiber, toxin-rich diet. Specifically, environmental conditions across gut regions were evaluated by measuring mass, pH, feed particle size, and metabolomes. Then, parallel metagenomes and metatranscriptomes were used to detect microbial functions corresponding to the cecal environment. Based on metabolomic profiles, >600 phytochemical compounds were detected, although many were present only in the foregut and probably degraded or transformed by gut microbes in the hindgut. Based on metagenomic (DNA) and metatranscriptomic (RNA) profiles, taxonomic compositions of the cecal microbiota were dominated by bacteria of the Firmicutes taxa; they contained major gene functions related to degradation and fermentation of leaf-derived compounds. Based on functional compositions, genes related to multidrug exporters were rich in microbial genomes, whereas genes involved in nutrient importers were rich in microbial transcriptomes. In addition, genes encoding chemotaxis-associated components and glycoside hydrolases specific for plant beta-glycosidic linkages were abundant in both DNA and RNA. This exploratory study provides findings which may help to form molecular-based hypotheses regarding functional contributions of symbiotic gut microbiota in small herbivores with folivorous dietary habits. PMID:29354108

  7. Targeted capture and resequencing of 1040 genes reveal environmentally driven functional variation in grey wolves.

    PubMed

    Schweizer, Rena M; Robinson, Jacqueline; Harrigan, Ryan; Silva, Pedro; Galverni, Marco; Musiani, Marco; Green, Richard E; Novembre, John; Wayne, Robert K

    2016-01-01

    In an era of ever-increasing amounts of whole-genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphisms (SNPs) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct grey wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1-kb nongenic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to noncandidate regions and identified potentially functional variants related to local adaptation. We found 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in sweed and bayenv analyses, respectively. This result verifies the use of genomewide SNP surveys to tag genes that contain functional variants between populations. We highlight nonsynonymous variants in APOB, LIPG and USH2A that occur in functional domains of these proteins, and that demonstrate high correlation with precipitation seasonality and vegetation. We find Arctic and High Arctic wolf ecotypes have higher numbers of genes under selection, which highlight their conservation value and heightened threat due to climate change. This study demonstrates that combining genomewide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations. © 2015 John Wiley & Sons Ltd.

  8. Functional Interaction Network Construction and Analysis for Disease Discovery.

    PubMed

    Wu, Guanming; Haw, Robin

    2017-01-01

    Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, therefore providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data via using network modules and increasing the statistical power of the analysis. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60% of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.

  9. Transposons As Tools for Functional Genomics in Vertebrate Models.

    PubMed

    Kawakami, Koichi; Largaespada, David A; Ivics, Zoltán

    2017-11-01

    Genetic tools and mutagenesis strategies based on transposable elements are currently under development with a vision to link primary DNA sequence information to gene functions in vertebrate models. By virtue of their inherent capacity to insert into DNA, transposons can be developed into powerful tools for chromosomal manipulations. Transposon-based forward mutagenesis screens have numerous advantages including high throughput, easy identification of mutated alleles, and providing insight into genetic networks and pathways based on phenotypes. For example, the Sleeping Beauty transposon has become highly instrumental to induce tumors in experimental animals in a tissue-specific manner with the aim of uncovering the genetic basis of diverse cancers. Here, we describe a battery of mutagenic cassettes that can be applied in conjunction with transposon vectors to mutagenize genes, and highlight versatile experimental strategies for the generation of engineered chromosomes for loss-of-function as well as gain-of-function mutagenesis for functional gene annotation in vertebrate models, including zebrafish, mice, and rats. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Modular organization of the white spruce (Picea glauca) transcriptome reveals functional organization and evolutionary signatures.

    PubMed

    Raherison, Elie S M; Giguère, Isabelle; Caron, Sébastien; Lamara, Mebarek; MacKay, John J

    2015-07-01

    Transcript profiling has shown the molecular bases of several biological processes in plants but few studies have developed an understanding of overall transcriptome variation. We investigated transcriptome structure in white spruce (Picea glauca), aiming to delineate its modular organization and associated functional and evolutionary attributes. Microarray analyses were used to: identify and functionally characterize groups of co-expressed genes; investigate expressional and functional diversity of vascular tissue preferential genes which were conserved among Picea species, and identify expression networks underlying wood formation. We classified 22 857 genes as variable (79%; 22 coexpression groups) or invariant (21%) by profiling across several vegetative tissues. Modular organization and complex transcriptome restructuring among vascular tissue preferential genes was revealed by their assignment to coexpression groups with partially overlapping profiles and partially distinct functions. Integrated analyses of tissue-based and temporally variable profiles identified secondary xylem gene networks, showed their remodelling over a growing season and identified PgNAC-7 (no apical meristem (NAM), Arabidopsis transcription activation factor (ATAF) and cup-shaped cotyledon (CUC) transcription factor 007 in Picea glauca) as a major hub gene specific to earlywood formation. Reference profiling identified comprehensive, statistically robust coexpressed groups, revealing that modular organization underpins the evolutionary conservation of the transcriptome structure. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  11. Patterns of population differentiation of candidate genes for cardiovascular disease.

    PubMed

    Kullo, Iftikhar J; Ding, Keyue

    2007-07-12

    The basis for ethnic differences in cardiovascular disease (CVD) susceptibility is not fully understood. We investigated patterns of population differentiation (FST) of a set of genes in etiologic pathways of CVD among 3 ethnic groups: Yoruba in Nigeria (YRI), Utah residents with European ancestry (CEU), and Han Chinese (CHB) + Japanese (JPT). We identified 37 pathways implicated in CVD based on the PANTHER classification and 416 genes in these pathways were further studied; these genes belonged to 6 biological processes (apoptosis, blood circulation and gas exchange, blood clotting, homeostasis, immune response, and lipoprotein metabolism). Genotype data were obtained from the HapMap database. We calculated FST for 15,559 common SNPs (minor allele frequency ≥ 0.10 in at least one population) in genes that co-segregated among the populations, as well as an average-weighted FST for each gene. SNPs were classified as putatively functional (non-synonymous and untranslated regions) or non-functional (intronic and synonymous sites). Mean FST values for common putatively functional variants were significantly higher than FST values for non-functional variants. A significant variation in FST was also seen based on biological processes; the processes of 'apoptosis' and 'lipoprotein metabolism' showed an excess of genes with high FST. Thus, putative functional SNPs in genes in etiologic pathways for CVD show greater population differentiation than non-functional SNPs and a significant variance of FST values was noted among pairwise population comparisons for different biological processes. These results suggest a possible basis for varying susceptibility to CVD among ethnic groups.
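
    As a rough illustration of the FST statistic used above, the sketch below computes Wright's fixation index for a single biallelic SNP from per-population allele frequencies. The abstract does not state which estimator the authors used, so this heterozygosity-based form, the equal weighting of populations, and the function name are assumptions.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("need at least one population frequency")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity in the pooled (total) population.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    # Mean expected heterozygosity within the subpopulations.
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total
```

    Identical frequencies in all populations give 0, and alternate fixation (frequencies 0 and 1) gives 1; a per-gene summary could then average such values over the SNPs in that gene.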

  12. Coexpression network based on natural variation in human gene expression reveals gene interactions and functions

    PubMed Central

    Nayak, Renuka R.; Kearns, Michael; Spielman, Richard S.; Cheung, Vivian G.

    2009-01-01

    Genes interact in networks to orchestrate cellular processes. Analysis of these networks provides insights into gene interactions and functions. Here, we took advantage of normal variation in human gene expression to infer gene networks, which we constructed using correlations in expression levels of more than 8.5 million gene pairs in immortalized B cells from three independent samples. The resulting networks allowed us to identify biological processes and gene functions. Among the biological pathways, we found processes such as translation and glycolysis that co-occur in the same subnetworks. We predicted the functions of poorly characterized genes, including CHCHD2 and TMEM111, and provided experimental evidence that TMEM111 is part of the endoplasmic reticulum-associated secretory pathway. We also found that IFIH1, a susceptibility gene of type 1 diabetes, interacts with YES1, which plays a role in glucose transport. Furthermore, genes that predispose to the same diseases are clustered nonrandomly in the coexpression network, suggesting that networks can provide candidate genes that influence disease susceptibility. Therefore, our analysis of gene coexpression networks offers information on the role of human genes in normal and disease processes. PMID:19797678
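
    The idea of inferring a network from correlations in expression levels can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the Pearson coefficient, the absolute-value cutoff of 0.8, and the gene names in the usage note are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene_a, gene_b, r) for every pair with |r| >= threshold.

    expression: dict mapping gene name -> list of expression values,
    one value per sample. The threshold is a hypothetical cutoff.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges
```

    For example, `coexpression_edges({"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]})` keeps only the A-B pair, whose profiles are perfectly correlated. Evaluating every pair scales quadratically with the number of genes, which is how a few thousand genes lead to millions of gene pairs.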

  13. A Rapid CRISPR/Cas-based Mutagenesis Assay in Zebrafish for Identification of Genes Involved in Thyroid Morphogenesis and Function.

    PubMed

    Trubiroha, A; Gillotay, P; Giusti, N; Gacquer, D; Libert, F; Lefort, A; Haerlingen, B; De Deken, X; Opitz, R; Costagliola, S

    2018-04-04

    The foregut endoderm gives rise to several organs including liver, pancreas, lung and thyroid with important roles in human physiology. Understanding which genes and signalling pathways regulate their development is crucial for understanding developmental disorders as well as diseases in adulthood. We exploited unique advantages of the zebrafish model to develop a rapid and scalable CRISPR/Cas-based mutagenesis strategy aiming at the identification of genes involved in morphogenesis and function of the thyroid. Core elements of the mutagenesis assay comprise bi-allelic gene invalidation in somatic mutants, a non-invasive monitoring of thyroid development in live transgenic fish, complementary analyses of thyroid function in fixed specimens and quantitative analyses of mutagenesis efficiency by Illumina sequencing of individual fish. We successfully validated our mutagenesis-phenotyping strategy in experiments targeting genes with known functions in early thyroid morphogenesis (pax2a, nkx2.4b) and thyroid functional differentiation (duox, duoxa, tshr). We also demonstrate that duox and duoxa crispants phenocopy thyroid phenotypes previously observed in human patients with bi-allelic DUOX2 and DUOXA2 mutations. The proposed combination of efficient mutagenesis protocols, rapid non-invasive phenotyping and sensitive genotyping holds great potential to systematically characterize the function of larger candidate gene panels during thyroid development and is applicable to other organs and tissues.

  14. Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses

    PubMed Central

    Bhat, Prajwal; Yang, Haixuan; Bögre, László; Devoto, Alessandra; Paccanaro, Alberto

    2012-01-01

    The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition specific. In these cases a smaller set of condition related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathways. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments. [A MATLAB implementation of the algorithm and all the data used in this paper can be downloaded from the paper website: http://www.paccanarolab.org/papers/CorrGene/]. PMID:22879875

  15. Advanced drug and gene delivery systems based on functional biodegradable polycarbonates and copolymers.

    PubMed

    Chen, Wei; Meng, Fenghua; Cheng, Ru; Deng, Chao; Feijen, Jan; Zhong, Zhiyuan

    2014-09-28

    Biodegradable polymeric nanocarriers are one of the most promising systems for targeted and controlled drug and gene delivery. They have shown several unique advantages such as excellent biocompatibility, prolonged circulation time, passive tumor targeting via the enhanced permeability and retention (EPR) effect, and degradation in vivo into nontoxic products after completing their tasks. The current biodegradable drug and gene delivery systems exhibit, however, typically low in vivo therapeutic efficacy, due to issues of low loading capacity, inadequate in vivo stability, premature cargo release, poor uptake by target cells, and slow release of therapeutics inside tumor cells. To overcome these problems, a variety of advanced drug and gene delivery systems has recently been designed and developed based on functional biodegradable polycarbonates and copolymers. Notably, polycarbonates and copolymers with diverse functionalities such as hydroxyl, carboxyl, amine, alkene, alkyne, halogen, azido, acryloyl, vinyl sulfone, pyridyldisulfide, and saccharide, could be readily obtained by controlled ring-opening polymerization. In this paper, we give an overview on design concepts and recent developments of functional polycarbonate-based nanocarriers including stimuli-sensitive, photo-crosslinkable, or active targeting polymeric micelles, polymersomes and polyplexes for enhanced drug and gene delivery in vitro and in vivo. These multifunctional biodegradable nanosystems might be eventually developed for safe and efficient cancer chemotherapy and gene therapy. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Discovering the Deregulated Molecular Functions Involved in Malignant Transformation of Endometriosis to Endometriosis-Associated Ovarian Carcinoma Using a Data-Driven, Function-Based Analysis

    PubMed Central

    Chang, Chia-Ming; Yang, Yi-Ping; Chuang, Jen-Hua; Chuang, Chi-Mu; Lin, Tzu-Wei; Wang, Peng-Hui; Yu, Mu-Hsien

    2017-01-01

    The clinical characteristics of clear cell carcinoma (CCC) and endometrioid carcinoma (EC) are concomitant with endometriosis (ES), which leads to the postulation of malignant transformation of ES to endometriosis-associated ovarian carcinoma (EAOC). Different deregulated functional areas have been proposed to account for the pathogenesis of EAOC transformation, but there is still a lack of a data-driven analysis that uses the accumulated experimental data in publicly available databases to incorporate the deregulated functions involved in the malignant transformation of EAOC. We used the microarray gene expression datasets of ES, CCC and EC downloaded from the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) database. Then, we investigated the pathogenesis of EAOC by a data-driven, function-based analytic model with the quantified molecular functions defined by 1454 Gene Ontology (GO) term gene sets. This model converts the gene expression profiles to the functionome consisting of 1454 quantified GO functions, and then, the key functions involved in the malignant transformation of EAOC can be extracted by a series of filters. Our results demonstrate that the deregulated oxidoreductase activity, metabolism, hormone activity, inflammatory response, innate immune response and cell-cell signaling play the key roles in the malignant transformation of EAOC. These results provide evidence supporting the specific molecular pathways involved in the malignant transformation of EAOC. PMID:29113136

  17. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.

    PubMed

    Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T

    2017-10-01

    Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

  18. An integrative approach to inferring biologically meaningful gene modules.

    PubMed

    Cho, Ji-Hoon; Wang, Kai; Galas, David J

    2011-07-26

    The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method, the Semantic Similarity-Integrated approach for Modularization (SSIM), which directly incorporates gene ontology (GO) annotation in the construction of gene modules in order to gain better functional association. SSIM integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.

  19. fabp4 is central to eight obesity associated genes: a functional gene network-based polymorphic study.

    PubMed

    Bag, Susmita; Ramaiah, Sudha; Anbarasu, Anand

    2015-01-07

    Network studies of genes and proteins offer functional insight into the complexity of a gene or protein and its interacting partners. The gene fatty acid-binding protein 4 (fabp4) is found to be highly expressed in adipose tissue, and is one of the most abundant proteins in mature adipocytes. Our investigations on functional modules of fabp4 provide useful information on the functional genes interacting with fabp4, their biochemical properties and their regulatory functions. The present study shows that there is a set of eight candidate genes, acp1, ext2, insr, lipe, ostf1, sncg, usp15, and vim, that are strongly and functionally linked with fabp4. Gene ontological analysis of network modules of fabp4 provides an explicit idea of the functional aspect of fabp4 and its interacting nodes. The hierarchical mapping on gene ontology indicates gene-specific processes and functions as well as their compartmentalization in tissues. fabp4 and its interacting genes are involved in lipid metabolic activity and are integrated in multi-cellular processes of tissues and organs. They also have important protein/enzyme binding activity. Our study elucidated disease-associated nsSNP prediction for fabp4 and it is interesting to note that there are four rsIDs (rs1051231, rs3204631, rs140925685 and rs141169989) with disease allelic variation (T104P, T126P, G27D and G90V respectively). On the whole, our gene network analysis presents a clear insight into the interactions and functions associated with the fabp4 gene network. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. A plasmid collection for PCR-based gene targeting in the filamentous ascomycete Ashbya gossypii.

    PubMed

    Kaufmann, Andreas

    2009-08-01

    PCR-based gene targeting with heterologous markers is an efficient method to delete genes, generate gene fusions, and modulate gene expression. For the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, several plasmid collections are available covering a wide range of tags and markers. For several reasons, many of these cassettes cannot be used in the filamentous ascomycete Ashbya gossypii. This article describes the construction of 93 heterologous modules for C- and N-terminal tagging and promoter replacements in A. gossypii. The performance of 12 different fluorescent tags was evaluated by monitoring their brightness, detectability, and photostability when fused to the myosin light-chain protein Mlc2. Furthermore, the thiamine-repressible S. cerevisiae THI13 promoter was established to regulate gene expression in A. gossypii. This collection will help accelerate analysis of gene function in A. gossypii and in other ascomycetes where S. cerevisiae promoter elements are functional.

  1. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

    PubMed Central

    Raethong, Nachon; Wong-ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction. PMID:27274991

  2. Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study.

    PubMed

    Raethong, Nachon; Wong-Ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

    2016-01-01

    Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H(+)-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction.

  3. CRISPR/Cas9-mediated efficient genome editing via blastospore-based transformation in entomopathogenic fungus Beauveria bassiana.

    PubMed

    Chen, Jingjing; Lai, Yiling; Wang, Lili; Zhai, Suzhen; Zou, Gen; Zhou, Zhihua; Cui, Chunlai; Wang, Sibao

    2017-04-03

    Beauveria bassiana is an environmentally friendly alternative to chemical insecticides against various agricultural insect pests and vectors of human diseases. However, its application has been limited due to slow kill and sensitivity to abiotic stresses. Understanding of the molecular pathogenesis and physiological characteristics would facilitate improvement of the fungal performance. Loss-of-function mutagenesis is the most powerful tool to characterize gene functions, but it is hampered by the low rate of homologous recombination and the limited availability of selectable markers. Here, by combining the use of uridine auxotrophy as recipient and donor DNAs harboring auxotrophic complementation gene ura5 as a selectable marker with the blastospore-based transformation system, we established a highly efficient, low false-positive background and cost-effective CRISPR/Cas9-mediated gene editing system in B. bassiana. This system has been demonstrated as a simple and powerful tool for targeted gene knock-out and/or knock-in in B. bassiana in a single gene disruption. We further demonstrated that our system allows simultaneous disruption of multiple genes via homology-directed repair in a single transformation. This technology will allow us to study functionally redundant genes and holds significant potential to greatly accelerate functional genomics studies of B. bassiana.

  4. CRISPR/Cas9-mediated efficient genome editing via blastospore-based transformation in entomopathogenic fungus Beauveria bassiana

    PubMed Central

    Chen, Jingjing; Lai, Yiling; Wang, Lili; Zhai, Suzhen; Zou, Gen; Zhou, Zhihua; Cui, Chunlai; Wang, Sibao

    2017-01-01

    Beauveria bassiana is an environmentally friendly alternative to chemical insecticides against various agricultural insect pests and vectors of human diseases. However, its application has been limited due to slow kill and sensitivity to abiotic stresses. Understanding of the molecular pathogenesis and physiological characteristics would facilitate improvement of the fungal performance. Loss-of-function mutagenesis is the most powerful tool to characterize gene functions, but it is hampered by the low rate of homologous recombination and the limited availability of selectable markers. Here, by combining the use of uridine auxotrophy as recipient and donor DNAs harboring auxotrophic complementation gene ura5 as a selectable marker with the blastospore-based transformation system, we established a highly efficient, low false-positive background and cost-effective CRISPR/Cas9-mediated gene editing system in B. bassiana. This system has been demonstrated as a simple and powerful tool for targeted gene knock-out and/or knock-in in B. bassiana in a single gene disruption. We further demonstrated that our system allows simultaneous disruption of multiple genes via homology-directed repair in a single transformation. This technology will allow us to study functionally redundant genes and holds significant potential to greatly accelerate functional genomics studies of B. bassiana. PMID:28368054

  5. Fractal Clustering and Knowledge-driven Validation Assessment for Gene Expression Profiling.

    PubMed

    Wang, Lu-Yong; Balasubramanian, Ammaiappan; Chakraborty, Amit; Comaniciu, Dorin

    2005-01-01

    DNA microarray experiments generate a substantial amount of information about global gene expression. Gene expression profiles can be represented as points in multi-dimensional space. Identifying relevant groups of genes is essential in biomedical research, and clustering helps to recognize patterns in gene expression profiles. A number of clustering techniques have been introduced. However, these traditional methods mainly rely on shape-based assumptions or some distance metric to cluster the points in multi-dimensional linear Euclidean space. Their results show poor consistency with the functional annotation of genes in previous validation studies. From a different perspective, we propose a fractal clustering method that clusters genes using the intrinsic (fractal) dimension from modern geometry. This method clusters points in such a way that points in the same cluster are more self-affine among themselves than to the points in other clusters. We assess this method using annotation-based validation assessment for gene clusters. The assessment shows that this method is superior to traditional methods in identifying functionally related gene groups.

  6. Lymphocyte signaling: beyond knockouts

    PubMed Central

    Saveliev, Alexander; Tybulewicz, Victor L. J.

    2016-01-01

    The analysis of lymphocyte signaling was greatly enhanced by the advent of gene targeting, which allows the selective inactivation of a single gene. Whereas this gene ‘knockout’ approach is often informative, in many cases the phenotype resulting from gene ablation might not provide a complete picture of the function of the corresponding protein. If a protein has multiple functions within a single or several signaling pathways, or stabilizes other proteins in a complex, the phenotypic consequences of a gene knockout may manifest as a combination of several different perturbations. In these cases, gene targeting to ‘knockin’ subtle point mutations might provide more accurate insight into protein function. However, to be informative, such mutations must be carefully designed based on structural and biophysical data. PMID:19295633

  7. A partial structural and functional rescue of a retinitis pigmentosa model with compacted DNA nanoparticles.

    PubMed

    Cai, Xue; Nash, Zack; Conley, Shannon M; Fliesler, Steven J; Cooper, Mark J; Naash, Muna I

    2009-01-01

    Previously we have shown that compacted DNA nanoparticles can drive high levels of transgene expression after subretinal injection in the mouse eye. Here we delivered compacted DNA nanoparticles containing a therapeutic gene to the retinas of a mouse model of retinitis pigmentosa. Nanoparticles containing the wild-type retinal degeneration slow (Rds) gene were injected into the subretinal space of rds(+/-) mice on postnatal day 5. Gene expression was sustained for up to four months at levels up to four times higher than in controls injected with saline or naked DNA. The nanoparticles were taken up into virtually all photoreceptors and mediated significant structural and biochemical rescue of the disease without histological or functional evidence of toxicity. Electroretinogram recordings showed that nanoparticle-mediated gene transfer restored cone function to a near-normal level in contrast to transfer of naked plasmid DNA. Rod function was also improved. These findings demonstrate that compacted DNA nanoparticles represent a viable option for development of gene-based interventions for ocular diseases and obviate major barriers commonly encountered with non-viral based therapies.

  8. Genome-Wide siRNA-Based Functional Genomics of Pigmentation Identifies Novel Genes and Pathways That Impact Melanogenesis in Human Cells

    PubMed Central

    Bodemann, Brian; Petersen, Sean; Aruri, Jayavani; Koshy, Shiney; Richardson, Zachary; Le, Lu Q.; Krasieva, Tatiana; Roth, Michael G.; Farmer, Pat; White, Michael A.

    2008-01-01

    Melanin protects the skin and eyes from the harmful effects of UV irradiation, protects neural cells from toxic insults, and is required for sound conduction in the inner ear. Aberrant regulation of melanogenesis underlies skin disorders (melasma and vitiligo), neurologic disorders (Parkinson's disease), auditory disorders (Waardenburg's syndrome), and ophthalmologic disorders (age related macular degeneration). Much of the core synthetic machinery driving melanin production has been identified; however, the spectrum of gene products participating in melanogenesis in different physiological niches is poorly understood. Functional genomics based on RNA-mediated interference (RNAi) provides the opportunity to derive unbiased comprehensive collections of pharmaceutically tractable single gene targets supporting melanin production. In this study, we have combined a high-throughput, cell-based, one-well/one-gene screening platform with a genome-wide arrayed synthetic library of chemically synthesized, small interfering RNAs to identify novel biological pathways that govern melanin biogenesis in human melanocytes. Ninety-two novel genes that support pigment production were identified with a low false discovery rate. Secondary validation and preliminary mechanistic studies identified a large panel of targets that converge on tyrosinase expression and stability. Small molecule inhibition of a family of gene products in this class was sufficient to impair chronic tyrosinase expression in pigmented melanoma cells and UV-induced tyrosinase expression in primary melanocytes. Isolation of molecular machinery known to support autophagosome biosynthesis from this screen, together with in vitro and in vivo validation, exposed a close functional relationship between melanogenesis and autophagy. In summary, these studies illustrate the power of RNAi-based functional genomics to identify novel genes, pathways, and pharmacologic agents that impact a biological phenotype and operate outside of preconceived mechanistic relationships. PMID:19057677

  9. A postprocessing method in the HMC framework for predicting gene function based on biological instrumental data

    NASA Astrophysics Data System (ADS)

    Feng, Shou; Fu, Ping; Zheng, Wenbin

    2018-03-01

    Predicting gene function based on biological instrumental data is a complicated and challenging hierarchical multi-label classification (HMC) problem. When using local approach methods to solve this problem, a preliminary results processing method is usually needed. This paper proposed a novel preliminary results processing method called the nodes interaction method. The nodes interaction method revises the preliminary results and guarantees that the predictions are consistent with the hierarchy constraint. This method exploits the label dependency and considers the hierarchical interaction between nodes when making decisions based on the Bayesian network in its first phase. In the second phase, this method further adjusts the results according to the hierarchy constraint. Implementing the nodes interaction method in the HMC framework also enhances the HMC performance for solving the gene function prediction problem based on the Gene Ontology (GO), the hierarchy of which is a directed acyclic graph that is more difficult to tackle. The experimental results validate the promising performance of the proposed method compared to state-of-the-art methods on eight benchmark yeast data sets annotated by the GO.
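The hierarchy constraint described above (a term should not score higher than any of its ancestors in the GO graph) can be illustrated with a minimal sketch. This is not the nodes interaction method from the paper; it is a generic post-processing step, assumed here for illustration, that caps each term's score by the lowest score among its parents in a directed acyclic graph. The function name, term names, and scores are hypothetical.

```python
from graphlib import TopologicalSorter


def enforce_hierarchy(scores, parents):
    """Cap each term's score by the lowest score among its parents.

    scores  -- dict: term -> preliminary score in [0, 1]
    parents -- dict: term -> list of parent terms (a DAG; roots map to [])
    """
    # TopologicalSorter expects node -> predecessors, so every parent
    # is visited before any of its children.
    order = TopologicalSorter(parents).static_order()
    fixed = {}
    for term in order:
        cap = min((fixed[p] for p in parents.get(term, [])), default=1.0)
        fixed[term] = min(scores[term], cap)
    return fixed


# Hypothetical toy hierarchy: "c" has two parents, as GO terms may.
toy_scores = {"root": 0.9, "a": 0.4, "b": 0.8, "c": 0.7}
toy_parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
consistent = enforce_hierarchy(toy_scores, toy_parents)
```

In this toy case the score of "c" is lowered to match its weakest parent, so thresholding the corrected scores at any cut-off yields a prediction set that is closed under the ancestor relation.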

  10. Functional Enzyme-Based Approach for Linking Microbial Community Functions with Biogeochemical Process Kinetics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Minjing; Qian, Wei-jun; Gao, Yuqian

    The kinetics of biogeochemical processes in natural and engineered environmental systems are typically described using Monod-type or modified Monod-type models. These models rely on biomass as a surrogate for the functional enzymes in a microbial community that catalyze biogeochemical reactions. A major challenge in applying such models is the difficulty of quantitatively measuring functional biomass for constraining and validating the models. On the other hand, omics-based approaches have been increasingly used to characterize microbial community structure, functions, and metabolites. Here we proposed an enzyme-based model that can incorporate omics data to link microbial community functions with biogeochemical process kinetics. The model treats enzymes as time-variable catalysts for biogeochemical reactions and applies a biogeochemical reaction network to incorporate intermediate metabolites. The sequences of genes and proteins from metagenomes, as well as those from the UniProt database, were used for targeted enzyme quantification and to provide insights into the dynamic linkage among functional genes, enzymes, and metabolites that needs to be incorporated in the model. The application of the model was demonstrated using denitrification as an example by comparing model-simulated with measured functional enzymes, genes, denitrification substrates and intermediates.
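To make the contrast concrete, the sketch below places a standard Monod-type rate law next to an enzyme-based analogue. The abstract does not give the paper's actual rate expressions, so the second function is an assumed Michaelis-Menten-style form in which a measured enzyme concentration replaces biomass; all function names, parameter names, and values are hypothetical.

```python
def monod_rate(mu_max, biomass, substrate, k_s):
    """Monod-type rate: biomass stands in for catalytic capacity."""
    return mu_max * biomass * substrate / (k_s + substrate)


def enzyme_rate(k_cat, enzyme, substrate, k_m):
    """Assumed enzyme-based analogue: the enzyme concentration, which may
    change over time, replaces biomass as the catalyst term."""
    return k_cat * enzyme * substrate / (k_m + substrate)


# At substrate == k_s the Monod rate is half of mu_max * biomass.
half_saturated = monod_rate(mu_max=2.0, biomass=1.0, substrate=5.0, k_s=5.0)

# A time series of enzyme concentrations gives a time-variable rate.
enzyme_series = [0.1, 0.5, 1.0]
rates = [enzyme_rate(10.0, e, substrate=3.0, k_m=1.0) for e in enzyme_series]
```

The design point is only that the catalyst term becomes a measurable, time-dependent quantity rather than a lumped biomass estimate.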

  11. Genic insights from integrated human proteomics in GeneCards.

    PubMed

    Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publicly available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information.
The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL:http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.

  12. Gene Set−Based Integrative Analysis Revealing Two Distinct Functional Regulation Patterns in Four Common Subtypes of Epithelial Ovarian Cancer

    PubMed Central

    Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Yi-Ping; Chuang, Jen-Hua; Yang, Ming-Jie; Yen, Ming-Shyen; Chiou, Shih-Hwa; Chang, Cheng-Chang

    2016-01-01

    Clear cell (CCC), endometrioid (EC), mucinous (MC) and high-grade serous carcinoma (SC) are the four most common subtypes of epithelial ovarian carcinoma (EOC). The widely accepted dualistic model of ovarian carcinogenesis divided EOCs into type I and II categories based on the molecular features. However, this hypothesis has not been experimentally demonstrated. We carried out a gene set-based analysis by integrating the microarray gene expression profiles downloaded from the publicly available databases. These quantified biological functions of EOCs were defined by 1454 Gene Ontology (GO) term and 674 Reactome pathway gene sets. The pathogenesis of the four EOC subtypes was investigated by hierarchical clustering and exploratory factor analysis. The patterns of functional regulation among the four subtypes containing 1316 cases could be accurately classified by machine learning. The results revealed that the ERBB and PI3K-related pathways played important roles in the carcinogenesis of CCC, EC and MC; while deregulation of cell cycle was more predominant in SC. The study revealed that two different functional regulation patterns exist among the four EOC subtypes, which were compatible with the type I and II classifications proposed by the dualistic model of ovarian carcinogenesis. PMID:27527159

  13. Evaluation and integration of functional annotation pipelines for newly sequenced organisms: the potato genome as a test case.

    PubMed

    Amar, David; Frades, Itziar; Danek, Agnieszka; Goldberg, Tatyana; Sharma, Sanjeev K; Hedley, Pete E; Proux-Wera, Estelle; Andreasson, Erik; Shamir, Ron; Tzfadia, Oren; Alexandersson, Erik

    2014-12-05

    For most organisms, even if their genome sequence is available, little functional information about individual genes or proteins exists. Several annotation pipelines have been developed for functional analysis based on sequence, 'omics', and literature data. However, researchers encounter little guidance on how well they perform. Here, we used the recently sequenced potato genome as a case study. The potato genome was selected since its genome is newly sequenced and it is a non-model plant even if there is relatively ample information on individual potato genes, and multiple gene expression profiles are available. We show that the automatic gene annotations of potato have low accuracy when compared to a "gold standard" based on experimentally validated potato genes. Furthermore, we evaluate six state-of-the-art annotation pipelines and show that their predictions are markedly dissimilar (Jaccard similarity coefficient of 0.27 between pipelines on average). To overcome this discrepancy, we introduce a simple GO structure-based algorithm that reconciles the predictions of the different pipelines. We show that the integrated annotation covers more genes, increases by over 50% the number of highly co-expressed GO processes, and obtains much higher agreement with the gold standard. We find that different annotation pipelines produce different results, and show how to integrate them into a unified annotation that is of higher quality than each single pipeline. We offer an improved functional annotation of both PGSC and ITAG potato gene models, as well as tools that can be applied to additional pipelines and improve annotation in other organisms. This will greatly aid future functional analysis of '-omics' datasets from potato and other organisms with newly sequenced genomes. The new potato annotations are available with this paper.
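The Jaccard similarity coefficient quoted above is the size of the intersection of two annotation sets divided by the size of their union. The following minimal sketch shows how an average pairwise value across pipelines could be computed; the pipeline names and (gene, GO term) pairs are hypothetical, and treating two empty sets as identical is a convention assumed here rather than something stated in the abstract.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    # Assumed convention: two empty annotation sets count as identical.
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(predictions):
    """Average Jaccard similarity over all pairs of pipelines.

    predictions -- dict: pipeline name -> set of (gene, GO term) pairs
    """
    pairs = list(combinations(predictions.values(), 2))
    if not pairs:
        raise ValueError("need at least two pipelines to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical annotations from three pipelines.
toy_predictions = {
    "pipeline_A": {("g1", "GO:1"), ("g2", "GO:2"), ("g3", "GO:3")},
    "pipeline_B": {("g2", "GO:2"), ("g3", "GO:3"), ("g4", "GO:4")},
    "pipeline_C": {("g3", "GO:3"), ("g4", "GO:4"), ("g5", "GO:5")},
}
agreement = mean_pairwise_jaccard(toy_predictions)
```

A low average, such as the 0.27 reported in the abstract, means that most annotations are produced by only a subset of the pipelines, which is the motivation for reconciling them.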

  14. Genetics pathway-based imaging approaches in Chinese Han population with Alzheimer's disease risk.

    PubMed

    Bai, Feng; Liao, Wei; Yue, Chunxian; Pu, Mengjia; Shi, Yongmei; Yu, Hui; Yuan, Yonggui; Geng, Leiyu; Zhang, Zhijun

    2016-01-01

    The tau hypothesis has been raised with regard to the pathophysiology of Alzheimer's disease (AD). Mild cognitive impairment (MCI) is associated with a high risk for developing AD. However, no study has directly examined the brain topological alterations based on combined effects of tau protein pathway genes in an MCI population. Forty-three patients with MCI and 30 healthy controls from a Chinese Han population underwent resting-state functional magnetic resonance imaging (fMRI), and a tau protein pathway-based imaging approach (7 candidate genes: 17 SNPs) was used to investigate changes in the topological organisation of brain activation associated with MCI. Impaired regional activation is related to tau protein pathway genes (5/7 candidate genes) in patients with MCI and likely in topologically convergent and divergent functional alteration patterns associated with genes, and combined effects of tau protein pathway genes disrupt the topological architecture of cortico-cerebellar loops. The associations between the loops and behaviours further suggest that tau protein pathway genes do play a significant role in non-episodic memory impairment. Tau pathway-based imaging approaches might strengthen the credibility of imaging genetic associations and generate pathway frameworks that might provide powerful new insights into the neural mechanisms that underlie MCI.

  15. Multiconstrained gene clustering based on generalized projections

    PubMed Central

    2010-01-01

    Background: Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints for an optimal clustering solution still remains an unsolved problem. Results: We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions: The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for the soft clustering solutions. PMID:20356386
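The core idea of projection onto convex sets can be shown with a toy sketch: a candidate solution is projected onto each constraint set in turn until it settles in their intersection. This is plain alternating projection on two simple sets chosen for illustration (a sum-to-one hyperplane and a box), not the paper's generalized projector or its expression, GO, and network constraint sets; all names and values are hypothetical.

```python
def project_sum_to_one(x):
    """Euclidean projection onto the hyperplane {x : sum(x) == 1}."""
    shift = (1.0 - sum(x)) / len(x)
    return [v + shift for v in x]


def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (clip each entry)."""
    return [min(max(v, lo), hi) for v in x]


def pocs(x, projections, iterations=200):
    """Cycle through the projections a fixed number of times."""
    for _ in range(iterations):
        for project in projections:
            x = project(x)
    return x


# Hypothetical soft cluster-membership vector for one gene that
# initially violates both constraints.
membership = pocs([2.0, -1.0, 0.5], [project_sum_to_one, project_box])
```

Because both sets are closed and convex and their intersection is non-empty, the iterates approach a vector whose entries lie in [0, 1] and sum to one, which is the kind of consistent solution the abstract refers to.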

  16. A GWAS meta-analysis from 5 population-based cohorts implicates ion channel genes in the pathogenesis of irritable bowel syndrome.

    PubMed

    Bonfiglio, F; Henström, M; Nag, A; Hadizadeh, F; Zheng, T; Cenit, M C; Tigchelaar, E; Williams, F; Reznichenko, A; Ek, W E; Rivera, N V; Homuth, G; Aghdassi, A A; Kacprowski, T; Männikkö, M; Karhunen, V; Bujanda, L; Rafter, J; Wijmenga, C; Ronkainen, J; Hysi, P; Zhernakova, A; D'Amato, M

    2018-04-19

    Irritable bowel syndrome (IBS) shows genetic predisposition, however, large-scale, powered gene mapping studies are lacking. We sought to exploit existing genetic (genotype) and epidemiological (questionnaire) data from a series of population-based cohorts for IBS genome-wide association studies (GWAS) and their meta-analysis. Based on questionnaire data compatible with Rome III Criteria, we identified a total of 1335 IBS cases and 9768 asymptomatic individuals from 5 independent European genotyped cohorts. Individual GWAS were carried out with sex-adjusted logistic regression under an additive model, followed by meta-analysis using the inverse variance method. Functional annotation of significant results was obtained via a computational pipeline exploiting ontology and interaction networks, and tissue-specific and gene set enrichment analyses. Suggestive GWAS signals (P ≤ 5.0 × 10⁻⁶) were detected for 7 genomic regions, harboring 64 gene candidates to affect IBS risk via functional or expression changes. Functional annotation of this gene set convincingly (best FDR-corrected P = 3.1 × 10⁻¹⁰) highlighted regulation of ion channel activity as the most plausible pathway affecting IBS risk. Our results confirm the feasibility of population-based studies for gene-discovery efforts in IBS, identify risk genes and loci to be prioritized in independent follow-ups, and pinpoint ion channels as important players and potential therapeutic targets warranting further investigation. © 2018 John Wiley & Sons Ltd.
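The inverse variance method mentioned above combines per-cohort effect estimates by weighting each one by the reciprocal of its squared standard error. The sketch below assumes a fixed-effect model, which the abstract does not specify, and uses made-up log odds ratios and standard errors; the function name is hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance meta-analysis of one variant.

    betas           -- per-cohort effect estimates (e.g. log odds ratios)
    standard_errors -- matching per-cohort standard errors
    Returns (pooled beta, pooled standard error, z-score).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Hypothetical estimates for one SNP in five cohorts.
pooled_beta, pooled_se, z = inverse_variance_meta(
    [0.21, 0.15, 0.30, 0.18, 0.25],
    [0.10, 0.12, 0.20, 0.09, 0.15],
)
```

Cohorts with smaller standard errors dominate the pooled estimate, and the pooled standard error is never larger than the smallest per-cohort standard error.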

  17. Directed evolution induces tributyrin hydrolysis in a virulence factor of Xylella fastidiosa using a duplicated gene as a template.

    PubMed

    Gouran, Hossein; Chakraborty, Sandeep; Rao, Basuthkar J; Asgeirsson, Bjarni; Dandekar, Abhaya

    2014-01-01

    Duplication of genes is one of the preferred ways for natural selection to add advantageous functionality to the genome without having to reinvent the wheel with respect to catalytic efficiency and protein stability. The duplicated secretory virulence factors of Xylella fastidiosa (LesA, LesB and LesC), implicated in Pierce's disease of grape and citrus variegated chlorosis of citrus species, epitomize the positive selection pressures exerted on advantageous genes in such pathogens. A deeper insight into the evolution of these lipases/esterases is essential to develop resistance mechanisms in transgenic plants. Directed evolution, an attempt to accelerate the evolutionary steps in the laboratory, is inherently simple when targeted for loss of function. A bigger challenge is to specify mutations that endow a new function, such as a lost functionality in a duplicated gene. Previously, we have proposed a method for enumerating candidates for mutations intended to transfer the functionality of one protein into another related protein based on the spatial and electrostatic properties of the active site residues (DECAAF). In the current work, we present in vivo validation of DECAAF by inducing tributyrin hydrolysis in LesB based on the active site similarity to LesA. The structures of these proteins have been modeled using RaptorX based on the closely related LipA protein from Xanthomonas oryzae. These mutations replicate the spatial and electrostatic conformation of LesA in the modeled structure of the mutant LesB as well, providing in silico validation before proceeding to the laborious in vivo work. Such focused mutations allow one to dissect the relevance of the duplicated genes in finer detail as compared to gene knockouts, since they do not interfere with other moonlighting functions, protein expression levels or protein-protein interaction.

  18. Directed evolution induces tributyrin hydrolysis in a virulence factor of Xylella fastidiosa using a duplicated gene as a template

    PubMed Central

    Rao, Basuthkar J.; Asgeirsson, Bjarni; Dandekar, Abhaya

    2014-01-01

    Duplication of genes is one of the preferred ways for natural selection to add advantageous functionality to the genome without having to reinvent the wheel with respect to catalytic efficiency and protein stability. The duplicated secretory virulence factors of Xylella fastidiosa (LesA, LesB and LesC), implicated in Pierce's disease of grape and citrus variegated chlorosis of citrus species, epitomize the positive selection pressures exerted on advantageous genes in such pathogens. A deeper insight into the evolution of these lipases/esterases is essential to develop resistance mechanisms in transgenic plants. Directed evolution, an attempt to accelerate the evolutionary steps in the laboratory, is inherently simple when targeted for loss of function. A bigger challenge is to specify mutations that endow a new function, such as a lost functionality in a duplicated gene. Previously, we have proposed a method for enumerating candidates for mutations intended to transfer the functionality of one protein into another related protein based on the spatial and electrostatic properties of the active site residues (DECAAF). In the current work, we present in vivo validation of DECAAF by inducing tributyrin hydrolysis in LesB based on the active site similarity to LesA. The structures of these proteins have been modeled using RaptorX based on the closely related LipA protein from Xanthomonas oryzae. These mutations replicate the spatial and electrostatic conformation of LesA in the modeled structure of the mutant LesB as well, providing in silico validation before proceeding to the laborious in vivo work. Such focused mutations allow one to dissect the relevance of the duplicated genes in finer detail as compared to gene knockouts, since they do not interfere with other moonlighting functions, protein expression levels or protein-protein interaction. PMID:25717364

  19. Integrated pathway-based transcription regulation network mining and visualization based on gene expression profiles.

    PubMed

    Kibinge, Nelson; Ono, Naoaki; Horie, Masafumi; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Saito, Akira; Kanaya, Shigehiko

    2016-06-01

    Conventionally, workflows examining transcription regulation networks from gene expression data involve distinct analytical steps. There is a need for pipelines that unify data mining and inference deduction into a single framework to enhance interpretation and hypothesis generation. We propose a workflow that merges network construction with gene expression data mining, focusing on regulation processes in the context of transcription factor-driven gene regulation. The pipeline implements pathway-based modularization of expression profiles into functional units to improve biological interpretation. The integrated workflow was implemented as a web application (TransReguloNet) with functions that enable pathway visualization and comparison of transcription factor activity between sample conditions defined in the experimental design. The pipeline merges differential expression, network construction, pathway-based abstraction, clustering and visualization. The framework was applied in analysis of actual expression datasets related to lung, breast and prostate cancer. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Function-driven discovery of disease genes in zebrafish using an integrated genomics big data resource.

    PubMed

    Shim, Hongseok; Kim, Ji Hyun; Kim, Chan Yeong; Hwang, Sohyun; Kim, Hyojin; Yang, Sunmo; Lee, Ji Eun; Lee, Insuk

    2016-11-16

    Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Plant functional genomics

    NASA Astrophysics Data System (ADS)

    Holtorf, Hauke; Guitton, Marie-Christine; Reski, Ralf

    2002-04-01

    Functional genome analysis of plants has entered the high-throughput stage. The complete genome information from key species such as Arabidopsis thaliana and rice is now available and will further boost the application of a range of new technologies to functional plant gene analysis. To broadly assign functions to unknown genes, different fast and multiparallel approaches are currently used and developed. These new technologies are based on known methods but are adapted and improved to accommodate for comprehensive, large-scale gene analysis, i.e. such techniques are novel in the sense that their design allows researchers to analyse many genes at the same time and at an unprecedented pace. Such methods allow analysis of the different constituents of the cell that help to deduce gene function, namely the transcripts, proteins and metabolites. Similarly the phenotypic variations of entire mutant collections can now be analysed in a much faster and more efficient way than before. The different methodologies have developed to form their own fields within the functional genomics technological platform and are termed transcriptomics, proteomics, metabolomics and phenomics. Gene function, however, cannot solely be inferred by using only one such approach. Rather, it is only by bringing together all the information collected by different functional genomic tools that one will be able to unequivocally assign functions to unknown plant genes. This review focuses on current technical developments and their impact on the field of plant functional genomics. The lower plant Physcomitrella is introduced as a new model system for gene function analysis, owing to its high rate of homologous recombination.

  2. Investigating a multigene prognostic assay based on significant pathways for Luminal A breast cancer through gene expression profile analysis.

    PubMed

    Gao, Haiyan; Yang, Mei; Zhang, Xiaolan

    2018-04-01

    The present study aimed to investigate potential recurrence-risk biomarkers based on significant pathways for Luminal A breast cancer through gene expression profile analysis. Initially, the gene expression profiles of Luminal A breast cancer patients were downloaded from The Cancer Genome Atlas database. The differentially expressed genes (DEGs) were identified using the limma package and hierarchical clustering analysis was conducted for the DEGs. In addition, the functional pathways were screened using Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses and rank ratio calculation. The multigene prognostic assay was developed based on the statistically significant pathways, and its prognostic function was tested using a training set and verified using the gene expression data and survival data of Luminal A breast cancer patients downloaded from the Gene Expression Omnibus. A total of 300 DEGs were identified between good and poor outcome groups, including 176 upregulated genes and 124 downregulated genes. The DEGs may be used to effectively distinguish Luminal A samples with different prognoses, as verified by hierarchical clustering analysis. There were 9 pathways screened as significant pathways and a total of 18 DEGs involved in these 9 pathways were identified as prognostic biomarkers. According to the survival analysis and receiver operating characteristic curve, the obtained 18-gene prognostic assay exhibited good prognostic function with high sensitivity and specificity for both the training and test samples. In conclusion, the 18-gene prognostic assay including the key genes, transcription factor 7-like 2, anterior parietal cortex and lymphocyte enhancer factor-1 may provide a new method for predicting outcomes and may be conducive to the promotion of precision medicine for Luminal A breast cancer.

  3. Analysis of Aspergillus nidulans metabolism at the genome-scale

    PubMed Central

    David, Helga; Özçelik, İlknur Ş; Hofmann, Gerald; Nielsen, Jens

    2008-01-01

    Background Aspergillus nidulans is a member of a diverse group of filamentous fungi, sharing many of the properties of its close relatives with significance in the fields of medicine, agriculture and industry. Furthermore, A. nidulans has been a classical model organism for studies of development biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which less than 10% were assigned a function. Results In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without a gene associated, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates, in an objective and systematic manner. The functional assignments served as a basis to develop a mathematical model, linking 666 genes (both previously and newly annotated) to metabolic roles. 
The model was used to simulate metabolic behavior and additionally to integrate, analyze and interpret large-scale gene expression data concerning a study on glucose repression, thereby providing a means of upgrading the information content of experimental data and getting further insight into this phenomenon in A. nidulans. Conclusion We demonstrate how pathway modeling of A. nidulans can be used as an approach to improve the functional annotation of the genome of this organism. Furthermore we show how the metabolic model establishes functional links between genes, enabling the upgrade of the information content of transcriptome data. PMID:18405346

  4. Integration of biological data by kernels on graph nodes allows prediction of new genes involved in mitotic chromosome condensation

    PubMed Central

    Hériché, Jean-Karim; Lees, Jon G.; Morilla, Ian; Walter, Thomas; Petrova, Boryana; Roberti, M. Julia; Hossain, M. Julius; Adler, Priit; Fernández, José M.; Krallinger, Martin; Haering, Christian H.; Vilo, Jaak; Valencia, Alfonso; Ranea, Juan A.; Orengo, Christine; Ellenberg, Jan

    2014-01-01

    The advent of genome-wide RNA interference (RNAi)–based screens puts us in the position to identify genes for all functions human cells carry out. However, for many functions, assay complexity and cost make genome-scale knockdown experiments impossible. Methods to predict genes required for cell functions are therefore needed to focus RNAi screens from the whole genome on the most likely candidates. Although different bioinformatics tools for gene function prediction exist, they lack experimental validation and are therefore rarely used by experimentalists. To address this, we developed an effective computational gene selection strategy that represents public data about genes as graphs and then analyzes these graphs using kernels on graph nodes to predict functional relationships. To demonstrate its performance, we predicted human genes required for a poorly understood cellular function—mitotic chromosome condensation—and experimentally validated the top 100 candidates with a focused RNAi screen by automated microscopy. Quantitative analysis of the images demonstrated that the candidates were indeed strongly enriched in condensation genes, including the discovery of several new factors. By combining bioinformatics prediction with experimental validation, our study shows that kernels on graph nodes are powerful tools to integrate public biological data and predict genes involved in cellular functions of interest. PMID:24943848

  5. Functional characterisation of metal(loid) processes in planta through the integration of synchrotron techniques and plant molecular biology

    PubMed Central

    Donner, Erica; Punshon, Tracy; Guerinot, Mary Lou; Lombi, Enzo

    2013-01-01

    Functional characterisation of the genes regulating metal(loid) homeostasis in plants is a major focus of crop biofortification, phytoremediation, and food security research. This paper focuses on the potential for advancing plant metal(loid) research by combining molecular biology and synchrotron-based techniques. Recent advances in x-ray focussing optics and fluorescence detection have greatly improved the potential of synchrotron techniques for plant science research, allowing metal(loids) to be imaged in vivo in hydrated plant tissues at sub-micron resolution. Laterally resolved metal(loid) speciation can also be determined. By using molecular techniques to probe the location of gene expression and protein localisation and combining it with this synchrotron-derived data, functional information can be effectively and efficiently assigned to specific genes. This paper provides a review of the state of the art in this field, and provides examples as to how synchrotron-based methods can be combined with molecular techniques to facilitate functional characterisation of genes in planta. PMID:22200921

  6. An Evolution-Based Screen for Genetic Differentiation between Anopheles Sister Taxa Enriches for Detection of Functional Immune Factors

    PubMed Central

    Takashima, Eizo; Williams, Marni; Eiglmeier, Karin; Pain, Adrien; Guelbeogo, Wamdaogo M.; Gneme, Awa; Brito-Fravallo, Emma; Holm, Inge; Lavazec, Catherine; Sagnon, N’Fale; Baxter, Richard H.; Riehle, Michelle M.; Vernick, Kenneth D.

    2015-01-01

    Nucleotide variation patterns across species are shaped by the processes of natural selection, including exposure to environmental pathogens. We examined patterns of genetic variation in two sister species, Anopheles gambiae and Anopheles coluzzii, both efficient natural vectors of human malaria in West Africa. We used the differentiation signature displayed by a known coordinate selective sweep of immune genes APL1 and TEP1 in A. coluzzii to design a population genetic screen trained on the sweep, classified a panel of 26 potential immune genes for concordance with the signature, and functionally tested their immune phenotypes. The screen results were strongly predictive for genes with protective immune phenotypes: genes meeting the screen criteria were significantly more likely to display a functional phenotype against malaria infection than genes not meeting the criteria (p = 0.0005). Thus, an evolution-based screen can efficiently prioritize candidate genes for labor-intensive downstream functional testing, and safely allow the elimination of genes not meeting the screen criteria. The suite of immune genes with characteristics similar to the APL1-TEP1 selective sweep appears to be more widespread in the A. coluzzii genome than previously recognized. The immune gene differentiation may be a consequence of adaptation of A. coluzzii to new pathogens encountered in its niche expansion during the separation from A. gambiae, although the role, if any, of natural selection by Plasmodium is unknown. Application of the screen allowed identification of new functional immune factors, and assignment of new functions to known factors. We describe biochemical binding interactions between immune proteins that underlie functional activity for malaria infection, which highlights the interplay between pathogen specificity and the structure of immune complexes. We also find that most malaria-protective immune factors display phenotypes for either human or rodent malaria, with broad specificity a rarity. PMID:26633695

  7. GenCLiP 2.0: a web server for functional clustering of genes and construction of molecular networks based on free terms.

    PubMed

    Wang, Jia-Hong; Zhao, Ling-Feng; Lin, Pei; Su, Xiao-Rong; Chen, Shi-Jun; Huang, Li-Qiang; Wang, Hua-Feng; Zhang, Hai; Hu, Zhen-Fu; Yao, Kai-Tai; Huang, Zhong-Xi

    2014-09-01

    Identifying biological functions and molecular networks in a gene list and how the genes may relate to various topics is of considerable value to biomedical researchers. Here, we present a web-based text-mining server, GenCLiP 2.0, which can analyze human genes with enriched keywords and molecular interactions. Compared with other similar tools, GenCLiP 2.0 offers two unique features: (i) analysis of gene functions with free terms (i.e. any terms in the literature) generated by literature mining or provided by the user and (ii) accurate identification and integration of comprehensive molecular interactions from Medline abstracts, to construct molecular networks and subnetworks related to the free terms. Availability: http://ci.smu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  8. Bioinformatics for spermatogenesis: annotation of male reproduction based on proteomics

    PubMed Central

    Zhou, Tao; Zhou, Zuo-Min; Guo, Xue-Jiang

    2013-01-01

    Proteomics strategies have been widely used in the field of male reproduction, both in basic and clinical research. Bioinformatics methods are indispensable in proteomics-based studies and are used for data presentation, database construction and functional annotation. In the present review, we focus on the functional annotation of gene lists obtained through qualitative or quantitative methods, summarizing the common and male reproduction specialized proteomics databases. We introduce several integrated tools used to find the hidden biological significance from the data obtained. We further describe in detail the information on male reproduction derived from Gene Ontology analyses, pathway analyses and biomedical analyses. We provide an overview of bioinformatics annotations in spermatogenesis, from gene function to biological function and from biological function to clinical application. On the basis of recently published proteomics studies and associated data, we show that bioinformatics methods help us to discover drug targets for sperm motility and to scan for cancer-testis genes. In addition, we summarize the online resources relevant to male reproduction research for the exploration of the regulation of spermatogenesis. PMID:23852026

  9. Integrated analysis of microRNA and gene expression profiles reveals a functional regulatory module associated with liver fibrosis.

    PubMed

    Chen, Wei; Zhao, Wenshan; Yang, Aiting; Xu, Anjian; Wang, Huan; Cong, Min; Liu, Tianhui; Wang, Ping; You, Hong

    2017-12-15

    Liver fibrosis, characterized by the excessive accumulation of extracellular matrix (ECM) proteins, represents the final common pathway of chronic liver inflammation. Increasing evidence indicates that microRNA (miRNA) dysregulation has important implications in the different stages of liver fibrosis. However, the details of miRNA-gene regulation in this disease remain unclear. Publicly available Gene Expression Omnibus (GEO) datasets from patients with cirrhosis were extracted for integrated analysis. Differentially expressed miRNAs (DEMs) and genes (DEGs) were identified using the GEO2R web tool. Putative target genes of the DEMs were predicted using the intersection of five major algorithms: DIANA-microT, TargetScan, miRanda, PICTAR5 and miRWalk. A functional miRNA-gene regulatory network (FMGRN) was constructed based on the computational target predictions at the sequence level and the inverse expression relationships between DEMs and DEGs. The DAVID web server was used to perform KEGG pathway enrichment analysis. A functional miRNA-gene regulatory module was generated based on the biological interpretation. Internal connections among genes in the liver fibrosis-related module were determined using the String database. MiRNA-gene regulatory modules related to liver fibrosis were experimentally verified in LX-2 cells stimulated with recombinant human TGFβ1 and treated with specific miRNA inhibitors. In total, we identified 85 dysregulated miRNAs and 923 dysregulated genes in liver cirrhosis biopsy samples compared to their normal controls. All evident miRNA-gene pairs were identified and assembled into the FMGRN, which consisted of 990 regulations between 51 miRNAs and 275 genes, forming two large sub-networks that were defined as the down-network and the up-network, respectively. KEGG pathway enrichment analysis revealed that the up-network was prominently involved in several KEGG pathways, of which "Focal adhesion", "PI3K-Akt signaling pathway" and "ECM-receptor interaction" were significantly enriched (adjusted p<0.001). Genes enriched in these pathways, together with their regulatory miRNAs, formed a functional miRNA-gene regulatory module that contains 7 miRNAs, 22 genes and 42 miRNA-gene connections. Gene interaction analysis based on the String database revealed that 8 out of 22 genes were highly clustered. Finally, we experimentally confirmed a functional regulatory module containing 5 miRNAs (miR-130b-3p, miR-148a-3p, miR-345-5p, miR-378a-3p, and miR-422a) and 6 genes (COL6A1, COL6A2, COL6A3, PIK3R3, COL1A1, CCND2) associated with liver fibrosis. Our integrated analysis of miRNA and gene expression profiles highlighted a functional miRNA-gene regulatory module associated with liver fibrosis, which, to some extent, may provide important clues to better understand the underlying pathogenesis of liver fibrosis. Copyright © 2017. Published by Elsevier B.V.
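
    The two filtering steps described in this abstract (keeping only targets predicted by every algorithm, then keeping only miRNA-gene pairs with inverse expression changes) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the tool labels, the miRNA and gene identifiers, and the log fold-change values are all made up for the example, and "inverse" is assumed here to mean opposite signs of log fold change.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (miRNA, gene)


def consensus_targets(predictions: Dict[str, Set[Pair]]) -> Set[Pair]:
    """Return the (miRNA, gene) pairs predicted by every tool."""
    tool_sets = list(predictions.values())
    if not tool_sets:
        return set()
    return set.intersection(*tool_sets)


def inverse_pairs(pairs: Set[Pair],
                  mirna_logfc: Dict[str, float],
                  gene_logfc: Dict[str, float]) -> List[Pair]:
    """Keep pairs whose miRNA and gene change in opposite directions."""
    kept = []
    for mirna, gene in sorted(pairs):
        m = mirna_logfc.get(mirna)
        g = gene_logfc.get(gene)
        if m is None or g is None:
            continue  # one side is not differentially expressed
        if m * g < 0:  # opposite signs (assumed meaning of "inverse")
            kept.append((mirna, gene))
    return kept


# Hypothetical toy input: three tools, placeholder identifiers.
toy_predictions = {
    "tool1": {("miR-A", "G1"), ("miR-A", "G2"), ("miR-B", "G3")},
    "tool2": {("miR-A", "G1"), ("miR-B", "G3"), ("miR-B", "G4")},
    "tool3": {("miR-A", "G1"), ("miR-B", "G3")},
}
toy_consensus = consensus_targets(toy_predictions)
toy_network = inverse_pairs(
    toy_consensus,
    mirna_logfc={"miR-A": -1.5, "miR-B": 2.0},
    gene_logfc={"G1": 2.1, "G3": 0.8},
)
print(toy_network)
```

    Requiring agreement across all tools trades sensitivity for specificity; a looser rule (for example, a majority vote) would be an equally plausible design, and the abstract does not say how ties or missing predictions were handled.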

  10. An inducible tool for random mutagenesis in Aspergillus niger based on the transposon Vader.

    PubMed

    Paun, Linda; Nitsche, Benjamin; Homan, Tim; Ram, Arthur F; Kempken, Frank

    2016-07-01

    The ascomycete Aspergillus niger is widely used in biotechnology, for instance in producing most of the world's citric acid. It is also known as a major food and feed contaminant. While generation of gene knockouts for functional genomics has become feasible in ku70 mutants, analyzing gene functions or metabolic pathways remains a laborious task. An unbiased transposon-based mutagenesis approach may aid this process of analyzing gene functions by providing mutant libraries in a short time. The Vader transposon is a non-autonomous DNA transposon, which is activated by the homologous tan1 transposase. However, in the most commonly used lab strain of A. niger (N400 strain and derivatives), we found that the transposase, encoded by the tan1 gene, is mutated and inactive. To establish a Vader transposon-based mutagenesis system in the N400 background, we expressed the functional transposase of A. niger strain CBS 513.88 under the control of an inducible promoter based on the Tet-on system, which is activated in the presence of the antibiotic doxycycline (DOX). Increasing amounts of doxycycline led to higher Vader excision frequencies, whereas little to no Vader activity was observed without the addition of doxycycline. Hence, this system appears to be suitable for producing stable mutants in the A. niger N400 background.

  11. Functional Abstraction as a Method to Discover Knowledge in Gene Ontologies

    PubMed Central

    Ultsch, Alfred; Lötsch, Jörn

    2014-01-01

    Computational analyses of functions of gene sets obtained in microarray analyses or by topical database searches are increasingly important in biology. To understand their functions, the sets are usually mapped to Gene Ontology knowledge bases by means of over-representation analysis (ORA). Its result represents the specific knowledge of the functionality of the gene set. However, the specific ontology typically consists of many terms and relationships, hindering the understanding of the ‘main story’. We developed a methodology to identify a comprehensibly small number of GO terms as “headlines” of the specific ontology that allow all central aspects of the roles of the involved genes to be understood. The Functional Abstraction method finds a set of headlines that is specific enough to cover all details of a specific ontology and is abstract enough for human comprehension. This method outperforms the classical approaches to ORA abstraction and, by focusing on information rather than decorrelation of GO terms, directly targets human comprehension. Functional abstraction provides, with a maximum of certainty, information value, coverage and conciseness, a representation of the biological functions in which a gene set plays a role. This is the necessary means to interpret complex Gene Ontology results, thus strengthening the role of functional genomics in biomarker and drug discovery. PMID:24587272
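
    Over-representation analysis, the step this abstract builds on, is commonly computed as a one-sided hypergeometric test for each GO term. The sketch below shows that standard calculation only; it does not implement the Functional Abstraction method, and the function and parameter names are chosen for this example.

```python
from math import comb


def ora_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of genes in the background
    annotated:  background genes annotated with the GO term
    sample:     size of the gene set under study
    hits:       genes in the set annotated with the term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie within the population")
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(max(hits, 0), upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 annotated, a set of 3 with 2 hits.
print(ora_p_value(10, 4, 3, 2))
```

    In practice one p-value is computed per term and then corrected for multiple testing; the correction method is left out because the abstract does not name one.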

  12. Using molecular functional networks to manifest connections between obesity and obesity-related diseases

    PubMed Central

    Yang, Jialiang; Qiu, Jing; Wang, Kejing; Zhu, Lijuan; Fan, Jingjing; Zheng, Deyin; Meng, Xiaodi; Yang, Jiasheng; Peng, Lihong; Fu, Yu; Zhang, Dahan; Peng, Shouneng; Huang, Haiyun; Zhang, Yi

    2017-01-01

    Obesity is a primary risk factor for many diseases such as certain cancers. In this study, we have developed three algorithms, including a random-walk based method OBNet, a shortest-path based method OBsp and a direct-overlap method OBoverlap, to reveal obesity-disease connections at protein-interaction subnetworks corresponding to thousands of biological functions and pathways. Through literature mining, we also curated an obesity-associated disease list, against which we compared the methods. OBNet outperforms the other two methods. OBNet can predict whether a disease is obesity-related based on its associated genes. Meanwhile, OBNet identifies extensive connections between obesity genes and genes associated with a few diseases at various functional modules and pathways. Using breast cancer and Type 2 diabetes as two examples, OBNet identifies meaningful genes that may play key roles in connecting obesity and the two diseases. For example, TGFB1 and VEGFA are inferred to be the top two key genes mediating the obesity-breast cancer connection in modules associated with brain development. Finally, the top modules identified by OBNet in breast cancer significantly overlap with modules identified from the TCGA breast cancer gene expression study, revealing the power of OBNet in identifying biological processes involved in the disease. PMID:29156709
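
    The abstract describes OBNet only as "random-walk based". One common variant of that idea is a random walk with restart from a set of seed genes, sketched below on a toy undirected graph. Whether OBNet uses restarts, and with what parameters, is not stated in the abstract, so the function, the restart probability and the three-node graph are assumptions made for illustration.

```python
from typing import Dict, Set


def random_walk_with_restart(graph: Dict[str, Set[str]],
                             seeds: Set[str],
                             restart: float = 0.5,
                             tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    Assumes an undirected graph in which every node has at least one neighbour.
    """
    if not seeds or not seeds <= set(graph):
        raise ValueError("seeds must be a non-empty subset of the graph nodes")
    if any(not nbrs for nbrs in graph.values()):
        raise ValueError("every node needs at least one neighbour")
    nodes = sorted(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Spread the non-restarting mass evenly over the neighbours.
            share = (1.0 - restart) * p[n] / len(graph[n])
            for m in graph[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C with the walker seeded at A.
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
scores = random_walk_with_restart(toy_graph, seeds={"A"})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

    Seeding the walk with obesity genes and reading off the scores of disease genes would give a proximity measure in the spirit of the abstract; a shortest-path or direct-overlap score could be substituted with the same inputs.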

  13. A network-based method for the identification of putative genes related to infertility.

    PubMed

    Wang, ShaoPeng; Huang, GuoHua; Hu, Qinghua; Zou, Quan

    2016-11-01

    Infertility has become one of the major health problems worldwide, with its incidence having risen markedly in recent decades. There is an urgent need to investigate the pathological mechanisms behind infertility and to design effective treatments. However, this is made difficult by the fact that various biological factors have been identified to be related to infertility, including genetic factors. A network-based method was established to identify new genes potentially related to infertility. A network constructed using human protein-protein interactions based on previously validated infertility-related genes enabled the identification of some novel candidate genes. These genes were then filtered by a permutation test and their functional and structural associations with infertility-related genes. Our method identified 23 novel genes, which have strong functional and structural associations with previously validated infertility-related genes. Substantial evidence indicates that the identified genes are strongly related to dysfunction of the four main biological processes of fertility: reproductive development and physiology, gametogenesis, meiosis and recombination, and hormone regulation. The newly discovered genes may provide new directions for investigating infertility. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Patterns of population differentiation of candidate genes for cardiovascular disease

    PubMed Central

    Kullo, Iftikhar J; Ding, Keyue

    2007-01-01

    Background: The basis for ethnic differences in cardiovascular disease (CVD) susceptibility is not fully understood. We investigated patterns of population differentiation (FST) of a set of genes in etiologic pathways of CVD among 3 ethnic groups: Yoruba in Nigeria (YRI), Utah residents with European ancestry (CEU), and Han Chinese (CHB) + Japanese (JPT). We identified 37 pathways implicated in CVD based on the PANTHER classification and 416 genes in these pathways were further studied; these genes belonged to 6 biological processes (apoptosis, blood circulation and gas exchange, blood clotting, homeostasis, immune response, and lipoprotein metabolism). Genotype data were obtained from the HapMap database. Results: We calculated FST for 15,559 common SNPs (minor allele frequency ≥ 0.10 in at least one population) in genes that co-segregated among the populations, as well as an average-weighted FST for each gene. SNPs were classified as putatively functional (non-synonymous and untranslated regions) or non-functional (intronic and synonymous sites). Mean FST values for common putatively functional variants were significantly higher than FST values for non-functional variants. A significant variation in FST was also seen based on biological processes; the processes of 'apoptosis' and 'lipoprotein metabolism' showed an excess of genes with high FST. Thus, putative functional SNPs in genes in etiologic pathways for CVD show greater population differentiation than non-functional SNPs and a significant variance of FST values was noted among pairwise population comparisons for different biological processes. Conclusion: These results suggest a possible basis for varying susceptibility to CVD among ethnic groups. PMID:17626638
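
    As an illustration of the statistic used in this record, the sketch below computes Wright's FST for a single biallelic SNP from per-population allele frequencies, as (HT - HS) / HT with expected heterozygosity 2p(1 - p). It assumes equally weighted populations and ignores sample-size corrections; the abstract does not say which estimator the authors used, so this should be read as the textbook definition rather than their method. The allele frequencies are toy values.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Wright's FST for one biallelic SNP.

    freqs holds the frequency of the same allele in each population;
    populations are weighted equally (an assumption of this sketch).
    """
    if len(freqs) < 2:
        raise ValueError("need allele frequencies from at least two populations")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # HT: pooled heterozygosity
    if h_total == 0.0:
        return 0.0                                  # monomorphic everywhere
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)  # HS
    return (h_total - h_within) / h_total


# Toy frequencies for one SNP in two populations.
print(fst_biallelic([0.2, 0.8]))
```

    A per-gene summary could then average the per-SNP values, but the weighting behind the "average-weighted FST" mentioned in the abstract is not specified, so it is not reproduced here.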

  15. [Gene deletion and functional analysis of the heptyl glycosyltransferase (waaF) gene in Vibrio parahemolyticus O-antigen cluster].

    PubMed

    Zhao, Feng; Meng, Songsong; Zhou, Deqing

    2016-02-04

    This study aimed to construct a heptyl glycosyltransferase II (waaF) gene deletion mutant of Vibrio parahaemolyticus and to explore the function of the waaF gene in Vibrio parahaemolyticus. The waaF deletion mutant was constructed from clinical isolates by chitin-based transformation, and its growth rate, morphology and serotype were then characterized. Complementation strains carrying waaF from different sources (O3, O5 and O10) were constructed by conjugative transfer from E. coli S17λpir into Vibrio parahaemolyticus, and the function of the waaF gene was further verified by serotyping. The waaF deletion mutant was successfully constructed and grew normally. The growth rate and morphology of the mutant were similar to those of the wild-type (WT) strains, but the mutant did not agglutinate with O antisera. The strains complemented with waaF from O3 and O5 agglutinated with O antisera, whereas the strain complemented with waaF from O10 did not. The waaF gene is related to O-antigen synthesis and is the key gene of the O-antigen synthesis pathway in Vibrio parahaemolyticus. The waaF genes from different sources do not function identically.

  16. Identifying osteosarcoma metastasis associated genes by weighted gene co-expression network analysis (WGCNA).

    PubMed

    Tian, Honglai; Guan, Donghui; Li, Jianmin

    2018-06-01

    Osteosarcoma (OS), the most common malignant bone tumor, poses a serious health threat to children and adolescents. OS is usually associated with early metastasis and a high death rate. This study aimed to better understand the mechanism of OS metastasis. From the Gene Expression Omnibus (GEO) database, we downloaded 4 expression profile data sets associated with OS metastasis and selected differentially expressed genes. Weighted gene co-expression network analysis (WGCNA) allowed us to identify the module most strongly correlated with OS metastasis. Gene Ontology functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to annotate the selected OS metastasis-associated genes. We selected 897 genes differentially expressed between the OS metastasis and OS non-metastasis groups. Based on these genes, WGCNA identified 142 genes in the module most strongly correlated with OS metastasis. Gene Ontology functional and KEGG pathway enrichment analyses showed that the OS metastasis-associated genes were significantly involved in pathways related to insulin-like growth factor binding. Our research identified several molecules that potentially participate in the metastasis process and factors that may act as biomarkers. This study may help to further explore the mechanism of OS metastasis and to discover more therapeutic targets.

  17. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background: Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results: We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user-friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion: GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175

  18. An integrative approach to inferring biologically meaningful gene modules

    PubMed Central

    2011-01-01

    Background: The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results: We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM), that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions: The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level. PMID:21791051

  19. Canonical Genetic Signatures of the Adult Human Brain

    PubMed Central

    Hawrylycz, Michael; Miller, Jeremy A.; Menon, Vilas; Feng, David; Dolbeare, Tim; Guillozet-Bongaarts, Angela L.; Jegga, Anil G.; Aronow, Bruce J.; Lee, Chang-Kyu; Bernard, Amy; Glasser, Matthew F.; Dierker, Donna L.; Menche, Jörge; Szafer, Aaron; Collman, Forrest; Grange, Pascal; Berman, Kenneth A.; Mihalas, Stefan; Yao, Zizhen; Stewart, Lance; Barabási, Albert-László; Schulkin, Jay; Phillips, John; Ng, Lydia; Dang, Chinh; Haynor, David R.; Jones, Allan; Van Essen, David C.; Koch, Christof; Lein, Ed

    2015-01-01

    The structure and function of the human brain are highly stereotyped, implying a conserved molecular program responsible for its development, cellular structure, and function. We applied a correlation-based metric of “differential stability” (DS) to assess reproducibility of gene expression patterning across 132 structures in six individual brains, revealing meso-scale genetic organization. The highest DS genes are highly biologically relevant, with enrichment for brain-related biological annotations, disease associations, drug targets, and literature citations. Using high DS genes we identified 32 anatomically diverse and reproducible gene expression signatures, which represent distinct cell types, intracellular components, and/or associations with neurodevelopmental and neurodegenerative disorders. Genes in neuron-associated compared to non-neuronal networks showed higher preservation between human and mouse; however, many diversely-patterned genes displayed dramatic shifts in regulation between species. Finally, highly consistent transcriptional architecture in neocortex is correlated with resting state functional connectivity, suggesting a link between conserved gene expression and functionally relevant circuitry. PMID:26571460
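
    The abstract describes differential stability (DS) only as a correlation-based measure of how reproducibly a gene is patterned across brain structures in different individuals. One plausible reading, sketched below, is the mean pairwise Pearson correlation of a gene's expression profile between brains. The exact definition is an assumption of this example, as are the function names and the toy expression values.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(vx * vy)


def differential_stability(profiles: Sequence[Sequence[float]]) -> float:
    """Mean Pearson correlation over all pairs of brains for one gene.

    Each profile lists the gene's expression across the same ordered
    set of brain structures in one individual.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need profiles from at least two brains")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy gene measured in three structures of three hypothetical brains.
print(differential_stability([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]))
```

    Ranking genes by such a score would single out those with the most reproducible spatial patterning, which is the role high-DS genes play in the abstract.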

  20. MADGiC: a model-based approach for identifying driver genes in cancer

    PubMed Central

    Korthauer, Keegan D.; Kendziorski, Christina

    2015-01-01

    Motivation: Identifying and prioritizing somatic mutations is an important and challenging area of cancer research that can provide new insights into gene function as well as new targets for drug development. Most methods for prioritizing mutations rely primarily on frequency-based criteria, where a gene is identified as having a driver mutation if it is altered in significantly more samples than expected according to a background model. Although useful, frequency-based methods are limited in that all mutations are treated equally. It is well known, however, that some mutations have no functional consequence, while others may have a major deleterious impact. The spatial pattern of mutations within a gene provides further insight into their functional consequence. Properly accounting for these factors improves both the power and accuracy of inference. Also important is an accurate background model. Results: Here, we develop a Model-based Approach for identifying Driver Genes in Cancer (termed MADGiC) that incorporates both frequency and functional impact criteria and accommodates a number of factors to improve the background model. Simulation studies demonstrate advantages of the approach, including a substantial increase in power over competing methods. Further advantages are illustrated in an analysis of ovarian and lung cancer data from The Cancer Genome Atlas (TCGA) project. Availability and implementation: R code to implement this method is available at http://www.biostat.wisc.edu/ kendzior/MADGiC/. Contact: kendzior@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573922
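
    The frequency-based criterion that this abstract contrasts with MADGiC can be illustrated with a simple binomial tail test: given a background probability that a gene is mutated in any one sample, how surprising is the observed number of mutated samples? The sketch below shows only that baseline idea, not MADGiC itself; the function name, the single per-gene background probability, and the toy numbers are assumptions of the example.

```python
from math import comb


def binomial_tail(n_samples: int, n_mutated: int, background_prob: float) -> float:
    """P(X >= n_mutated) for X ~ Binomial(n_samples, background_prob)."""
    if not 0.0 <= background_prob <= 1.0:
        raise ValueError("background_prob must lie in [0, 1]")
    if not 0 <= n_mutated <= n_samples:
        raise ValueError("n_mutated must lie between 0 and n_samples")
    q = 1.0 - background_prob
    return sum(
        comb(n_samples, k) * background_prob ** k * q ** (n_samples - k)
        for k in range(n_mutated, n_samples + 1)
    )


# Toy case: gene mutated in 2 of 3 samples, background probability 0.5.
print(binomial_tail(3, 2, 0.5))
```

    As the abstract notes, a test of this kind treats every mutation equally, which is the limitation that functional-impact and spatial-pattern information is meant to address.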

  1. Decoding the genome beyond sequencing: the new phase of genomic research.

    PubMed

    Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J

    2011-10-01

    While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems: genes code for parts such as proteins and RNA, whereas the genome codes for the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory, which calls for a departure from gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Genome-wide characterization of GRAS family genes in Medicago truncatula reveals their evolutionary dynamics and functional diversification

    PubMed Central

    Zhang, Hailing; Cao, Yingping; Shang, Chen; Li, Jikai; Wang, Jianli; Wu, Zhenying; Ma, Lichao; Qi, Tianxiong; Fu, Chunxiang; Hu, Baozhong

    2017-01-01

    The GRAS gene family is a large plant-specific family of transcription factors that are involved in diverse processes during plant development. Medicago truncatula is an ideal model plant for genetic research in legumes, and specifically for studying nodulation, which is crucial for nitrogen fixation. In this study, 59 MtGRAS genes were identified and classified into eight distinct subgroups based on phylogenetic relationships. Motifs located in the C-termini were conserved across the subgroups, while motifs in the N-termini were subfamily specific. Gene duplication was the main evolutionary force for MtGRAS expansion, especially proliferation of the LISCL subgroup. Seventeen duplicated genes showed strong effects of purifying selection and diverse expression patterns, highlighting their functional importance and diversification after duplication. Thirty MtGRAS genes, including NSP1 and NSP2, were preferentially expressed in nodules, indicating possible roles in the process of nodulation. A transcriptome study, combined with gene expression analysis under different stress conditions, suggested potential functions of MtGRAS genes in various biological pathways and stress responses. Taken together, these comprehensive analyses provide basic information for understanding the potential functions of GRAS genes, and will facilitate further discovery of MtGRAS gene functions. PMID:28945786

  3. 75 FR 66381 - Cellular, Tissue and Gene Therapies Advisory Committee; Notice of Meeting

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-28

    ...] Cellular, Tissue and Gene Therapies Advisory Committee; Notice of Meeting AGENCY: Food and Drug...: Cellular, Tissue and Gene Therapies Advisory Committee. General Function of the Committee: To provide... Competent Retrovirus (RCR)/Lentivirus (RCL) in Retroviral and Lentiviral Vector Based Gene Therapy Products...

  4. Incorporating interaction networks into the determination of functionally related hit genes in genomic experiments with Markov random fields

    PubMed Central

    Robinson, Sean; Nevalainen, Jaakko; Pinna, Guillaume; Campalans, Anna; Radicella, J. Pablo; Guyon, Laurent

    2017-01-01

    Motivation: Incorporating gene interaction data into the identification of ‘hit’ genes in genomic experiments is a well-established approach leveraging the ‘guilt by association’ assumption to obtain a network-based hit list of functionally related genes. We aim to develop a method to allow for multivariate gene scores and multiple hit labels in order to extend the analysis of genomic screening data within such an approach. Results: We propose a Markov random field-based method to achieve our aim and show that the particular advantages of our method compared with those currently used lead to new insights in previously analysed data as well as for our own motivating data. Our method additionally achieves the best performance in an independent simulation experiment. The real data applications we consider comprise a survival analysis and differential expression experiment and a cell-based RNA interference functional screen. Availability and implementation: We provide all of the data and code related to the results in the paper. Contact: sean.j.robinson@utu.fi or laurent.guyon@cea.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28881978

  5. Gene Networks and Functional Features of Gravitropic response in Rice Shoot Bases

    NASA Astrophysics Data System (ADS)

    Hu, Liwei; Zang, Aiping; Ai, Qianru; Chen, Haiying; Li, Lin; Li, Rui; Su, Feng; Chen, Xijiang; Rong, Hui; Dou, Xianying; Reinhold-Hurek, Barbara; Li, Qi; Cai, Weiming

    To delineate key genes and the corresponding physiological functions as well as the coordination of genes involved in the gravitropism of rice shoot bases, we used whole-genome microarray analysis of upper and lower parts of rice shoot bases at 0.5 h and 6 h after gravistimulation. Bioinformatic analysis was applied, including GO analysis, expression tendency and network analysis. In the lower shoot bases, auxin-mediated signaling pathway and glutathione transferase activity, with the biggest enrichment, were activated at 0.5 h, while cytokinin stimulus and photosynthesis were activated at 6 h. Meanwhile, several processes were suppressed in the lower shoot bases, including xyloglucan:xyloglucosyl transferase activity, glucan metabolic processes, and ATPase activity at 0.5 h, and tRNA isopentenyltransferase activity, chitinase activity, etc. at 6 h. The gene expression profile responding to gravistimulation suggested that the asymmetric activation of several phytohormone signaling pathways, including auxin, gibberellin, cytokinin, brassinolide and ethylene, and cytokinin-related genes were involved in the differential growth between the upper and lower parts of rice shoot bases, as were cell wall-related genes. Topological analysis of the coexpression networks revealed the core status of AY177699.1 (apetala3-like protein) and AK105103.1 at 0.5 h, and of AK062612.1 (ethylene response factor) and AK099932.1 (lectin-like receptor kinase 72) at 6 h. All the core factors have the function "response to endogenous stimulus". Additionally, AK108057.1 (similar to germin-like protein precursor) was discovered as the most important core gene in the upper shoot bases at 6 h after gravistimulation, while AK067424.1 (cellulose synthase-like protein), AK120101.1 (Zinc finger, B-box domain containing protein) and CR278698 (ATPase associated with various cellular activities cellulose synthase-like protein) contribute equally to gravitropic response in the lower shoot bases.

  6. Insertional engineering of chromosomes with Sleeping Beauty transposition: an overview.

    PubMed

    Grabundzija, Ivana; Izsvák, Zsuzsanna; Ivics, Zoltán

    2011-01-01

    Novel genetic tools and mutagenesis strategies based on the Sleeping Beauty (SB) transposable element are currently under development with a vision to link primary DNA sequence information to gene functions in vertebrate models. By virtue of its inherent capacity to insert into DNA, the SB transposon can be developed into powerful tools for chromosomal manipulations. Mutagenesis screens based on SB have numerous advantages including high throughput and easy identification of mutated alleles. Forward genetic approaches based on insertional mutagenesis by engineered SB transposons have the advantage of providing insight into genetic networks and pathways based on phenotype. Indeed, the SB transposon has become a highly instrumental tool to induce tumors in experimental animals in a tissue-specific manner with the aim of uncovering the genetic basis of diverse cancers. Here, we describe a battery of mutagenic cassettes that can be applied in conjunction with SB transposon vectors to mutagenize genes, and highlight versatile experimental strategies for the generation of engineered chromosomes for loss-of-function as well as gain-of-function mutagenesis for functional gene annotation in vertebrate models.

  7. Differentially Expressed Genes in Resistant and Susceptible Common Bean (Phaseolus vulgaris L.) Genotypes in Response to Fusarium oxysporum f. sp. phaseoli

    PubMed Central

    Xue, Renfeng; Wu, Jing; Zhu, Zhendong; Wang, Lanfen; Wang, Xiaoming; Wang, Shumin; Blair, Matthew W.

    2015-01-01

    Fusarium wilt of common bean (Phaseolus vulgaris L.), caused by Fusarium oxysporum Schlechtend.:Fr. f.sp. phaseoli (Fop), is one of the most important diseases of common beans worldwide. Few natural sources of resistance to Fop exist and provide only moderate or partial levels of protection. Despite the economic importance of the disease across multiple crops, only a few Fop-induced genes have been analyzed in legumes. Therefore, our goal was to identify transcriptionally regulated genes during an incompatible interaction between common bean and the Fop pathogen using the cDNA amplified fragment length polymorphism (cDNA-AFLP) technique. We generated a total of 8,730 transcript-derived fragments (TDFs) with 768 primer pairs based on the comparison of a moderately resistant and a susceptible genotype. In total, 423 TDFs (4.9%) displayed altered expression patterns after inoculation with Fop inoculum. We obtained full amplicon sequences for 122 selected TDFs, of which 98 were identified as annotated known genes in different functional categories based on their putative functions, 10 were predicted but non-annotated genes and 14 were not homologous to any known genes. The 98 TDFs encoding genes of known putative function were classified as related to metabolism (22), signal transduction (21), protein synthesis and processing (20), development and cytoskeletal organization (12), transport of proteins (7), gene expression and RNA metabolism (4), redox reactions (4), defense and stress responses (3), energy metabolism (3), and hormone responses (2). Based on the analyses of homology, 19 TDFs from different functional categories were chosen for expression analysis using quantitative RT-PCR. The genes found to be important here were implicated at various steps of pathogen infection and will allow a better understanding of the mechanisms of defense and resistance to Fop and similar pathogens. The differential response genes discovered here could also be used as molecular markers in association mapping or QTL analysis. PMID:26030070
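
    The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a reader's consistency check using the numbers quoted above; the variable names are illustrative. The computed share of altered TDFs comes out at about 4.85%, close to the reported 4.9%.

```python
# Reader's consistency check of the counts quoted in the abstract above.
# Variable names are illustrative; the numbers come from the abstract text.
total_tdfs = 8730      # transcript-derived fragments generated
altered_tdfs = 423     # TDFs with altered expression after inoculation

# Breakdown of the 122 sequenced TDFs.
sequenced = {
    "annotated known genes": 98,
    "predicted but non-annotated": 10,
    "no homology to known genes": 14,
}

# Functional categories of the 98 annotated TDFs.
categories = {
    "metabolism": 22,
    "signal transduction": 21,
    "protein synthesis and processing": 20,
    "development and cytoskeletal organization": 12,
    "transport of proteins": 7,
    "gene expression and RNA metabolism": 4,
    "redox reactions": 4,
    "defense and stress responses": 3,
    "energy metabolism": 3,
    "hormone responses": 2,
}

altered_fraction = altered_tdfs / total_tdfs
sequenced_total = sum(sequenced.values())
category_total = sum(categories.values())

print(f"altered fraction: {altered_fraction:.2%}")
print(f"sequenced TDFs:   {sequenced_total}")
print(f"categorised TDFs: {category_total}")
```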

  8. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    PubMed

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Such evaluation is critical, from identifying the nearest well-annotated homologue of a protein of interest, to predicting where misannotation has occurred, to knowing how confident you can be in the annotations assigned to those proteins. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene products, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  9. Computational functional genomics-based approaches in analgesic drug discovery and repurposing.

    PubMed

    Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn

    2018-06-01

    Persistent pain is a major healthcare problem affecting a fifth of adults worldwide with still limited treatment options. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far been performed using knowledge discovery in gene function and drug target-related databases; next-generation sequencing; and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field including a demonstration of the workflow using a novel R library 'dbtORA'.

  10. A Morpholino-based screen to identify novel genes involved in craniofacial morphogenesis

    PubMed Central

    Melvin, Vida Senkus; Feng, Weiguo; Hernandez-Lagunas, Laura; Artinger, Kristin Bruk; Williams, Trevor

    2014-01-01

    BACKGROUND The regulatory mechanisms underpinning facial development are conserved between diverse species. Therefore, results from model systems provide insight into the genetic causes of human craniofacial defects. Previously, we generated a comprehensive dataset examining gene expression during development and fusion of the mouse facial prominences. Here, we used this resource to identify genes that have dynamic expression patterns in the facial prominences, but for which only limited information exists concerning developmental function. RESULTS This set of ~80 genes was used for a high throughput functional analysis in the zebrafish system using Morpholino gene knockdown technology. This screen revealed three classes of cranial cartilage phenotypes depending upon whether knockdown of the gene affected the neurocranium, viscerocranium, or both. The targeted genes that produced consistent phenotypes encoded proteins linked to transcription (meis1, meis2a, tshz2, vgll4l), signaling (pkdcc, vlk, macc1, wu:fb16h09), and extracellular matrix function (smoc2). The majority of these phenotypes were not altered by reduction of p53 levels, demonstrating that both p53 dependent and independent mechanisms were involved in the craniofacial abnormalities. CONCLUSIONS This Morpholino-based screen highlights new genes involved in development of the zebrafish craniofacial skeleton with wider relevance to formation of the face in other species, particularly mouse and human. PMID:23559552

  11. Diversity and interactions of microbial functional genes under differing environmental conditions: insights from a membrane bioreactor and an oxidation ditch.

    PubMed

    Xia, Yu; Hu, Man; Wen, Xianghua; Wang, Xiaohui; Yang, Yunfeng; Zhou, Jizhong

    2016-01-08

    The effect of environmental conditions on the diversity and interactions of microbial communities has caused tremendous interest in microbial ecology. Here, we found that with identical influents but differing operational parameters (mainly mixed liquor suspended solid (MLSS) concentrations, solid retention time (SRT) and dissolved oxygen (DO) concentrations), two full-scale municipal wastewater treatment systems applying oxidation ditch (OD) and membrane bioreactor (MBR) processes harbored a majority of shared genes (87.2%) but had different overall functional gene structures as revealed by two datasets of 12-day time-series generated by a functional gene array-GeoChip 4.2. Association networks of core carbon, nitrogen and phosphorus cycling genes in each system based on random matrix theory (RMT) showed different topological properties and the MBR nodes showed an indication of higher connectivity. MLSS and DO were shown to be effective in shaping functional gene structures of the systems by statistical analyses. Higher MLSS concentrations resulting in decreased resource availability of the MBR system were thought to promote positive interactions of important functional genes. Together, these findings show the differences of functional potentials of some bioprocesses caused by differing environmental conditions and suggest that higher stress of resource limitation increased positive gene interactions in the MBR system.

  12. Diversity and interactions of microbial functional genes under differing environmental conditions: insights from a membrane bioreactor and an oxidation ditch

    NASA Astrophysics Data System (ADS)

    Xia, Yu; Hu, Man; Wen, Xianghua; Wang, Xiaohui; Yang, Yunfeng; Zhou, Jizhong

    2016-01-01

    The effect of environmental conditions on the diversity and interactions of microbial communities has caused tremendous interest in microbial ecology. Here, we found that with identical influents but differing operational parameters (mainly mixed liquor suspended solid (MLSS) concentrations, solid retention time (SRT) and dissolved oxygen (DO) concentrations), two full-scale municipal wastewater treatment systems applying oxidation ditch (OD) and membrane bioreactor (MBR) processes harbored a majority of shared genes (87.2%) but had different overall functional gene structures as revealed by two datasets of 12-day time-series generated by a functional gene array-GeoChip 4.2. Association networks of core carbon, nitrogen and phosphorus cycling genes in each system based on random matrix theory (RMT) showed different topological properties and the MBR nodes showed an indication of higher connectivity. MLSS and DO were shown to be effective in shaping functional gene structures of the systems by statistical analyses. Higher MLSS concentrations resulting in decreased resource availability of the MBR system were thought to promote positive interactions of important functional genes. Together, these findings show the differences of functional potentials of some bioprocesses caused by differing environmental conditions and suggest that higher stress of resource limitation increased positive gene interactions in the MBR system.

  13. Diversity and interactions of microbial functional genes under differing environmental conditions: insights from a membrane bioreactor and an oxidation ditch

    PubMed Central

    Xia, Yu; Hu, Man; Wen, Xianghua; Wang, Xiaohui; Yang, Yunfeng; Zhou, Jizhong

    2016-01-01

    The effect of environmental conditions on the diversity and interactions of microbial communities has caused tremendous interest in microbial ecology. Here, we found that with identical influents but differing operational parameters (mainly mixed liquor suspended solid (MLSS) concentrations, solid retention time (SRT) and dissolved oxygen (DO) concentrations), two full-scale municipal wastewater treatment systems applying oxidation ditch (OD) and membrane bioreactor (MBR) processes harbored a majority of shared genes (87.2%) but had different overall functional gene structures as revealed by two datasets of 12-day time-series generated by a functional gene array-GeoChip 4.2. Association networks of core carbon, nitrogen and phosphorus cycling genes in each system based on random matrix theory (RMT) showed different topological properties and the MBR nodes showed an indication of higher connectivity. MLSS and DO were shown to be effective in shaping functional gene structures of the systems by statistical analyses. Higher MLSS concentrations resulting in decreased resource availability of the MBR system were thought to promote positive interactions of important functional genes. Together, these findings show the differences of functional potentials of some bioprocesses caused by differing environmental conditions and suggest that higher stress of resource limitation increased positive gene interactions in the MBR system. PMID:26743465

  14. BeeSpace Navigator: exploratory analysis of gene function using semantic indexing of biological literature.

    PubMed

    Sen Sarma, Moushumi; Arcoleo, David; Khetani, Radhika S; Chee, Brant; Ling, Xu; He, Xin; Jiang, Jing; Mei, Qiaozhu; Zhai, ChengXiang; Schatz, Bruce

    2011-07-01

    With the rapid decrease in cost of genome sequencing, the classification of gene function is becoming a primary problem. Such classification has been performed by human curators who read biological literature to extract evidence. BeeSpace Navigator is a prototype software for exploratory analysis of gene function using biological literature. The software supports an automatic analogue of the curator process to extract functions, with a simple interface intended for all biologists. Since extraction is done on selected collections that are semantically indexed into conceptual spaces, the curation can be task specific. Biological literature containing references to gene lists from expression experiments can be analyzed to extract concepts that are computational equivalents of a classification such as Gene Ontology, yielding discriminating concepts that differentiate gene mentions from other mentions. The functions of individual genes can be summarized from sentences in biological literature, to produce results resembling a model organism database entry that is automatically computed. Statistical frequency analysis based on literature phrase extraction generates offline semantic indexes to support these gene function services. The website with BeeSpace Navigator is free and open to all; there is no login requirement at www.beespace.illinois.edu for version 4. Materials from the 2010 BeeSpace Software Training Workshop are available at www.beespace.illinois.edu/bstwmaterials.php.

  15. Rapid deletion plasmid construction methods for protoplast and Agrobacterium based fungal transformation systems

    USDA-ARS's Scientific Manuscript database

    Increasing availability of genomic data and sophistication of analytical methodology in fungi has elevated the need for functional genomics tools in these organisms. Gene deletion is a critical tool for functional analysis. The targeted deletion of genes requires both a suitable method for the trans...

  16. Genome-Wide Identification and Structural Analysis of bZIP Transcription Factor Genes in Brassica napus.

    PubMed

    Zhou, Yan; Xu, Daixiang; Jia, Ledong; Huang, Xiaohu; Ma, Guoqiang; Wang, Shuxian; Zhu, Meichen; Zhang, Aoxiang; Guan, Mingwei; Lu, Kun; Xu, Xinfu; Wang, Rui; Li, Jiana; Qu, Cunmin

    2017-10-24

    The basic region/leucine zipper motif (bZIP) transcription factor family is one of the largest families of transcriptional regulators in plants. bZIP genes have been systematically characterized in some plants, but not in rapeseed (Brassica napus). In this study, we identified 247 BnbZIP genes in the rapeseed genome, which we classified into 10 subfamilies based on phylogenetic analysis of their deduced protein sequences. The BnbZIP genes were grouped into functional clades with Arabidopsis genes with similar putative functions, indicating functional conservation. Genome mapping analysis revealed that the BnbZIPs are distributed unevenly across all 19 chromosomes, and that some of these genes arose through whole-genome duplication and dispersed duplication events. All expression profiles of 247 bZIP genes were extracted from RNA-sequencing data obtained from 17 different B. napus ZS11 tissues with 42 various developmental stages. These genes exhibited different expression patterns in various tissues, revealing that these genes are differentially regulated. Our results provide a valuable foundation for functional dissection of the different BnbZIP homologs in B. napus and its parental lines and for molecular breeding studies of bZIP genes in B. napus.

  17. Genome-Wide Identification and Structural Analysis of bZIP Transcription Factor Genes in Brassica napus

    PubMed Central

    Zhou, Yan; Xu, Daixiang; Jia, Ledong; Huang, Xiaohu; Ma, Guoqiang; Wang, Shuxian; Zhu, Meichen; Zhang, Aoxiang; Guan, Mingwei; Xu, Xinfu; Wang, Rui; Li, Jiana

    2017-01-01

    The basic region/leucine zipper motif (bZIP) transcription factor family is one of the largest families of transcriptional regulators in plants. bZIP genes have been systematically characterized in some plants, but not in rapeseed (Brassica napus). In this study, we identified 247 BnbZIP genes in the rapeseed genome, which we classified into 10 subfamilies based on phylogenetic analysis of their deduced protein sequences. The BnbZIP genes were grouped into functional clades with Arabidopsis genes with similar putative functions, indicating functional conservation. Genome mapping analysis revealed that the BnbZIPs are distributed unevenly across all 19 chromosomes, and that some of these genes arose through whole-genome duplication and dispersed duplication events. All expression profiles of 247 bZIP genes were extracted from RNA-sequencing data obtained from 17 different B. napus ZS11 tissues with 42 various developmental stages. These genes exhibited different expression patterns in various tissues, revealing that these genes are differentially regulated. Our results provide a valuable foundation for functional dissection of the different BnbZIP homologs in B. napus and its parental lines and for molecular breeding studies of bZIP genes in B. napus. PMID:29064393

  18. Down-Regulation of Gene Expression by RNA-Induced Gene Silencing

    NASA Astrophysics Data System (ADS)

    Travella, Silvia; Keller, Beat

    Down-regulation of endogenous genes via post-transcriptional gene silencing (PTGS) is a key to the characterization of gene function in plants. Many RNA-based silencing mechanisms such as post-transcriptional gene silencing, co-suppression, quelling, and RNA interference (RNAi) have been discovered among species of different kingdoms (plants, fungi, and animals). One of the most interesting discoveries was RNAi, a sequence-specific gene-silencing mechanism initiated by the introduction of double-stranded RNA (dsRNA), homologous in sequence to the silenced gene, which triggers degradation of mRNA. Infection of plants with modified viruses can also induce RNA silencing and is referred to as virus-induced gene silencing (VIGS). In contrast to insertional mutagenesis, these emerging new reverse genetic approaches represent a powerful tool for exploring gene function and for manipulating gene expression experimentally in cereal species such as barley and wheat. We examined how RNAi and VIGS have been used to assess gene function in barley and wheat, including molecular mechanisms involved in the process and available methodological elements, such as vectors, inoculation procedures, and analysis of silenced phenotypes.

  19. Long-Term Oil Contamination Alters the Molecular Ecological Networks of Soil Microbial Functional Genes

    PubMed Central

    Liang, Yuting; Zhao, Huihui; Deng, Ye; Zhou, Jizhong; Li, Guanghe; Sun, Bo

    2016-01-01

    With knowledge on microbial composition and diversity, investigation of within-community interactions is a further step to elucidate microbial ecological functions, such as the biodegradation of hazardous contaminants. In this work, microbial functional molecular ecological networks were studied in both contaminated and uncontaminated soils to determine the possible influences of oil contamination on microbial interactions and potential functions. Soil samples were obtained from an oil-exploring site located in South China, and the microbial functional genes were analyzed with GeoChip, a high-throughput functional microarray. By building random networks based on null model, we demonstrated that overall network structures and properties were significantly different between contaminated and uncontaminated soils (P < 0.001). Network connectivity, module numbers, and modularity were all reduced with contamination. Moreover, the topological roles of the genes (module hub and connectors) were altered with oil contamination. Subnetworks of genes involved in alkane and polycyclic aromatic hydrocarbon degradation were also constructed. Negative co-occurrence patterns prevailed among functional genes, thereby indicating probable competition relationships. The potential “keystone” genes, defined as either “hubs” or genes with highest connectivities in the network, were further identified. The network constructed in this study predicted the potential effects of anthropogenic contamination on microbial community co-occurrence interactions. PMID:26870020

  20. Synergistic effect of amino acids modified on dendrimer surface in gene delivery.

    PubMed

    Wang, Fei; Wang, Yitong; Wang, Hui; Shao, Naimin; Chen, Yuanyuan; Cheng, Yiyun

    2014-11-01

    Design of an efficient gene vector based on dendrimer remains a great challenge due to the presence of multiple barriers in gene delivery. Single-functionalization on dendrimer cannot overcome all the barriers. In this study, we synthesized a list of single-, dual- and triple-functionalized dendrimers with arginine, phenylalanine and histidine for gene delivery using a one-pot approach. The three amino acids play different roles in gene delivery: arginine is essential in formation of stable complexes, phenylalanine improves cellular uptake efficacy, and histidine increases pH-buffering capacity and minimizes cytotoxicity of the cationic dendrimer. A combination of these amino acids on dendrimer generates a synergistic effect in gene delivery. The dual- and triple-functionalized dendrimers show minimal cytotoxicity on the transfected NIH 3T3 cells. Using this combination strategy, we can obtain triple-functionalized dendrimers with comparable transfection efficacy to several commercial transfection reagents. Such a combination strategy should be applicable to the design of efficient and biocompatible gene vectors for gene delivery. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Detection of single-copy functional genes in prokaryotic cells by two-pass TSA-FISH with polynucleotide probes.

    PubMed

    Kawakami, Shuji; Hasegawa, Takuya; Imachi, Hiroyuki; Yamaguchi, Takashi; Harada, Hideki; Ohashi, Akiyoshi; Kubota, Kengo

    2012-02-01

    In situ detection of functional genes with single-cell resolution is currently of interest to microbiologists. Here, we developed a two-pass tyramide signal amplification (TSA)-fluorescence in situ hybridization (FISH) protocol with PCR-derived polynucleotide probes for the detection of single-copy genes in prokaryotic cells. The mcrA gene and the apsA gene in methanogens and sulfate-reducing bacteria, respectively, were targeted. The protocol showed bright fluorescence with a good signal-to-noise ratio and achieved a high efficiency of detection (>98%). The discrimination threshold was approximately 82-89% sequence identity. Microorganisms possessing the mcrA or apsA gene in anaerobic sludge samples were successfully detected by two-pass TSA-FISH with polynucleotide probes. The developed protocol is useful for identifying single microbial cells based on functional gene sequences. Copyright © 2011 Elsevier B.V. All rights reserved.
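
    To make the notion of a sequence-identity threshold concrete, the following minimal sketch computes percent identity between two pre-aligned, equal-length sequences. It is not the authors' protocol: the function name and the toy sequences are hypothetical, and alignment itself is assumed to have been done elsewhere.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Assumes `a` and `b` are already aligned and of equal length; gap
    characters ("-") never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical 20-base probe and a target differing at two positions.
probe = "ATGGCTAAGCTTGACCGTAA"
target = "ATGGCTAAGCATGACCGTCA"
identity = percent_identity(probe, target)
print(f"{identity:.1f}% identity")
```

    With a measure like this, a target could be compared against the roughly 82-89% discrimination range quoted in the abstract; how identity was actually measured in the study is not stated here.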

  2. Foxtail Mosaic Virus-Induced Gene Silencing in Monocot Plants

    PubMed Central

    Liu, Na; Xie, Ke; Jia, Qi; Zhao, Jinping; Chen, Tianyuan; Li, Huangai; Wei, Xiang; Diao, Xianmin; Hong, Yiguo

    2016-01-01

    Virus-induced gene silencing (VIGS) is a powerful technique to study gene function in plants. However, very few VIGS vectors are available for monocot plants. Here we report that Foxtail mosaic virus (FoMV) can be engineered as an effective VIGS system to induce efficient silencing of endogenous genes in monocot plants including barley (Hordeum vulgare L.), wheat (Triticum aestivum) and foxtail millet (Setaria italica). This is evidenced by FoMV-based silencing of phytoene desaturase (PDS) and magnesium chelatase in barley, of PDS and Cloroplastos alterados1 in foxtail millet and wheat, and of an additional gene IspH in foxtail millet. Silencing of these genes resulted in photobleached or chlorosis phenotypes in barley, wheat, and foxtail millet. Furthermore, our FoMV-based gene silencing is the first VIGS system reported for foxtail millet, an important C4 model plant. It may provide an efficient toolbox for high-throughput functional genomics in economically important monocot crops. PMID:27225900

  3. Foxtail Mosaic Virus-Induced Gene Silencing in Monocot Plants.

    PubMed

    Liu, Na; Xie, Ke; Jia, Qi; Zhao, Jinping; Chen, Tianyuan; Li, Huangai; Wei, Xiang; Diao, Xianmin; Hong, Yiguo; Liu, Yule

    2016-07-01

    Virus-induced gene silencing (VIGS) is a powerful technique to study gene function in plants. However, very few VIGS vectors are available for monocot plants. Here we report that Foxtail mosaic virus (FoMV) can be engineered as an effective VIGS system to induce efficient silencing of endogenous genes in monocot plants including barley (Hordeum vulgare L.), wheat (Triticum aestivum) and foxtail millet (Setaria italica). This is evidenced by FoMV-based silencing of phytoene desaturase (PDS) and magnesium chelatase in barley, of PDS and Cloroplastos alterados1 in foxtail millet and wheat, and of an additional gene IspH in foxtail millet. Silencing of these genes resulted in photobleached or chlorosis phenotypes in barley, wheat, and foxtail millet. Furthermore, our FoMV-based gene silencing is the first VIGS system reported for foxtail millet, an important C4 model plant. It may provide an efficient toolbox for high-throughput functional genomics in economically important monocot crops. © 2016 American Society of Plant Biologists. All Rights Reserved.

  4. OncoBinder facilitates interpretation of proteomic interaction data by capturing coactivation pairs in cancer.

    PubMed

    Van Coillie, Samya; Liang, Lunxi; Zhang, Yao; Wang, Huanbin; Fang, Jing-Yuan; Xu, Jie

    2016-04-05

    High-throughput methods such as co-immunoprecipitation-mass spectrometry (coIP-MS) and yeast two-hybrid (Y2H) have suggested a broad range of unannotated protein-protein interactions (PPIs), and interpretation of these PPIs remains a challenging task. The advancements in cancer genomic research allow for the inference of "coactivation pairs" in cancer, which may facilitate the identification of PPIs involved in cancer. Here we present OncoBinder as a tool for the assessment of proteomic interaction data based on the functional synergy of oncoproteins in cancer. This decision tree-based method combines gene mutation, copy number and mRNA expression information to infer the functional status of protein-coding genes. We applied OncoBinder to evaluate the potential binders of EGFR and ERK2 proteins based on the gastric cancer dataset of The Cancer Genome Atlas (TCGA). As a result, OncoBinder identified high confidence interactions (annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) or validated by low-throughput assays) more efficiently than a co-expression-based method. Taken together, our results suggest that evaluation of gene functional synergy in cancer may facilitate the interpretation of proteomic interaction data. The OncoBinder toolbox for Matlab is freely accessible online.

  5. RYBP stimulates PRC1 to shape chromatin-based communication between Polycomb repressive complexes

    PubMed Central

    Rose, Nathan R; King, Hamish W; Blackledge, Neil P; Fursova, Nadezda A; Ember, Katherine JI; Fischer, Roman; Kessler, Benedikt M; Klose, Robert J

    2016-01-01

    Polycomb group (PcG) proteins function as chromatin-based transcriptional repressors that are essential for normal gene regulation during development. However, how these systems function to achieve transcriptional regulation remains very poorly understood. Here, we discover that the histone H2AK119 E3 ubiquitin ligase activity of Polycomb repressive complex 1 (PRC1) is defined by the composition of its catalytic subunits and is highly regulated by RYBP/YAF2-dependent stimulation. In mouse embryonic stem cells, RYBP plays a central role in shaping H2AK119 mono-ubiquitylation at PcG targets and underpins an activity-based communication between PRC1 and Polycomb repressive complex 2 (PRC2) which is required for normal histone H3 lysine 27 trimethylation (H3K27me3). Without normal histone modification-dependent communication between PRC1 and PRC2, repressive Polycomb chromatin domains can erode, rendering target genes susceptible to inappropriate gene expression signals. This suggests that activity-based communication and histone modification-dependent thresholds create a localized form of epigenetic memory required for normal PcG chromatin domain function in gene regulation. DOI: http://dx.doi.org/10.7554/eLife.18591.001 PMID:27705745

  6. snpGeneSets: An R Package for Genome-Wide Study Annotation

    PubMed Central

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-01-01

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048

  7. Identification of potential crucial genes associated with steroid-induced necrosis of femoral head based on gene expression profile.

    PubMed

    Lin, Zhe; Lin, Yongsheng

    2017-09-05

    The aim of this study was to explore potential crucial genes associated with steroid-induced necrosis of the femoral head (SINFH) and to provide valid biological information for further investigation of SINFH. The gene expression profile GSE26316, generated from 3 SINFH rat samples and 3 normal rat samples, was downloaded from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were identified using the LIMMA package. After functional enrichment analyses of DEGs, protein-protein interaction (PPI) network and sub-PPI network analyses were conducted based on the STRING database and Cytoscape. In total, 59 up-regulated DEGs and 156 down-regulated DEGs were identified. The up-regulated DEGs were mainly involved in immunity-related functions (e.g. Fcer1A and Il7R), and the down-regulated DEGs were mainly enriched in muscle system process (e.g. Tnni2, Mylpf and Myl1). The PPI network of DEGs consisted of 123 nodes and 300 interactions. Tnni2, Mylpf, and Myl1 were the top 3 genes based on both subgraph centrality and degree centrality evaluation. These three genes interacted with each other in the network. Furthermore, the significant network module was composed of 22 down-regulated genes (e.g. Tnni2, Mylpf and Myl1). These genes were mainly enriched in functions such as muscle system process. The DEGs related to the regulation of immune system process (e.g. Fcer1A and Il7R) and the DEGs correlated with muscle system process (e.g. Tnni2, Mylpf and Myl1) may be closely associated with the progression of SINFH, although this still needs to be confirmed by experiments. Copyright © 2017 Elsevier B.V. All rights reserved.
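
    The degree-centrality ranking mentioned in this abstract can be illustrated with a minimal sketch. It assumes an undirected, unweighted network and the common normalization degree / (n - 1); the function name is hypothetical, and the toy edge list (including the placeholder node "GeneX") is invented for illustration and is not the 123-node network from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected simple graph.

    `edges` is an iterable of (node_a, node_b) pairs. Self-loops are
    ignored and duplicate edges are counted once.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Toy example (assumption): the three genes named in the abstract interact
# with each other; "GeneX" is a made-up extra node.
toy_edges = [
    ("Tnni2", "Mylpf"),
    ("Tnni2", "Myl1"),
    ("Mylpf", "Myl1"),
    ("Myl1", "GeneX"),
]
centrality = degree_centrality(toy_edges)
# Rank by centrality (descending), breaking ties alphabetically.
ranked = sorted(centrality, key=lambda g: (-centrality[g], g))
```

    In practice a graph library would normally be used for this; the hand-rolled version only makes the normalization explicit.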

  8. Use of transcriptome sequencing to understand the pistillate flowering in hickory (Carya cathayensis Sarg.).

    PubMed

    Huang, You-Jun; Liu, Li-Li; Huang, Jian-Qin; Wang, Zheng-Jia; Chen, Fang-Fang; Zhang, Qi-Xiang; Zheng, Bing-Song; Chen, Ming

    2013-10-10

    Unlike herbaceous plants, woody plants undergo a long vegetative stage before achieving floral transition. They then become seasonal plants, flowering annually. In this study, a preliminary model of gene regulation for seasonal pistillate flowering in hickory (Carya cathayensis) was proposed. The genome-wide dynamic transcriptome was characterized via a joint approach of RNA sequencing and microarray analysis. Differential transcript abundance analysis uncovered the dynamic transcript abundance patterns of flowering-correlated genes and their major functions based on Gene Ontology (GO) analysis. To explore the pistillate flowering mechanism in hickory, a comprehensive flowering gene regulatory network based on Arabidopsis thaliana was constructed with additional literature mining. A total of 114 putative flowering or floral genes, including 31 with differential transcript abundance, were identified in hickory. The locations, functions and dynamic transcript abundances were analyzed in the gene regulatory networks. A genome-wide co-expression network for the putative flowering or floral genes shows three flowering regulatory modules corresponding to response to light abiotic stimulus, cold stress, and reproductive development process, respectively. In total, 27 potential flowering or floral genes were recruited, which helps to better understand the hickory-specific seasonal flowering mechanism. The flowering event of the pistillate flower bud in hickory is triggered synchronously by several pathways, including the photoperiod, autonomous, vernalization, gibberellin, and sucrose pathways. In total, 27 potential flowering or floral genes were recruited from the genome-wide co-expression network function module analysis. Moreover, the analysis provides a potential FLC-like gene-based vernalization pathway and an 'AC' model for pistillate flower development in hickory. This work provides an available framework for pistillate flower development in hickory, which is significant for insight into the regulation of flowering and floral development in woody plants.

  9. Use of transcriptome sequencing to understand the pistillate flowering in hickory (Carya cathayensis Sarg.)

    PubMed Central

    2013-01-01

    Background: Unlike herbaceous plants, woody plants undergo a long vegetative stage before achieving floral transition. They then become seasonal plants, flowering annually. In this study, a preliminary model of gene regulation for seasonal pistillate flowering in hickory (Carya cathayensis) was proposed. The genome-wide dynamic transcriptome was characterized via a joint approach of RNA sequencing and microarray analysis. Results: Differential transcript abundance analysis uncovered the dynamic transcript abundance patterns of flowering-correlated genes and their major functions based on Gene Ontology (GO) analysis. To explore the pistillate flowering mechanism in hickory, a comprehensive flowering gene regulatory network based on Arabidopsis thaliana was constructed with additional literature mining. A total of 114 putative flowering or floral genes, including 31 with differential transcript abundance, were identified in hickory. The locations, functions and dynamic transcript abundances were analyzed in the gene regulatory networks. A genome-wide co-expression network for the putative flowering or floral genes shows three flowering regulatory modules corresponding to response to light abiotic stimulus, cold stress, and reproductive development process, respectively. In total, 27 potential flowering or floral genes were recruited, which helps to better understand the hickory-specific seasonal flowering mechanism. Conclusions: The flowering event of the pistillate flower bud in hickory is triggered synchronously by several pathways, including the photoperiod, autonomous, vernalization, gibberellin, and sucrose pathways. In total, 27 potential flowering or floral genes were recruited from the genome-wide co-expression network function module analysis. Moreover, the analysis provides a potential FLC-like gene-based vernalization pathway and an 'AC' model for pistillate flower development in hickory. This work provides an available framework for pistillate flower development in hickory, which is significant for insight into the regulation of flowering and floral development in woody plants. PMID:24106755

  10. DOSim: an R package for similarity between diseases based on Disease Ontology.

    PubMed

    Li, Jiang; Gong, Binsheng; Chen, Xi; Liu, Tao; Wu, Chao; Zhang, Fan; Li, Chunquan; Li, Xiang; Rao, Shaoqi; Li, Xia

    2011-06-29

    The construction of the Disease Ontology (DO) has helped promote the investigation of diseases and disease risk factors. DO enables researchers to analyse disease similarity by adopting semantic similarity measures, and has expanded our understanding of the relationships between different diseases and our ability to classify them. Simultaneously, similarities between genes can also be analysed by their associations with similar diseases. As a result, disease heterogeneity is better understood and insights into the molecular pathogenesis of similar diseases have been gained. However, bioinformatics tools that provide easy and straightforward ways to use DO to study disease and gene similarity simultaneously are required. We have developed an R-based software package (DOSim) to compute the similarity between diseases and to measure the similarity between human genes in terms of diseases. DOSim incorporates a DO-based enrichment analysis function that can be used to explore the disease features of an independent gene set. A multilayered enrichment analysis (GO and KEGG annotation) function that helps users explore the biological meaning implied in a newly detected gene module is also part of the DOSim package. We used the disease similarity application to demonstrate the relationship between 128 different DO cancer terms. The hierarchical clustering of these 128 different cancers showed modular characteristics. In another case study, we used the gene similarity application on 361 obesity-related genes. The results revealed the complex pathogenesis of obesity. In addition, the gene module detection and gene module multilayered annotation functions in DOSim, when applied to these 361 obesity-related genes, helped extend our understanding of the complex pathogenesis of obesity risk phenotypes and the heterogeneity of obesity-related diseases. DOSim can be used to detect disease-driven gene modules, and to annotate the modules for functions and pathways. The DOSim package can also be used to visualise DO structure. DOSim can reflect the modular characteristics of disease-related genes and promote our understanding of the complex pathogenesis of diseases. DOSim is available on the Comprehensive R Archive Network (CRAN) or http://bioinfo.hrbmu.edu.cn/dosim.

  11. Identification and analysis of unitary loss of long-established protein-coding genes in Poaceae shows evidences for biased gene loss and putatively functional transcription of relics.

    PubMed

    Zhao, Yi; Tang, Liang; Li, Zhe; Jin, Jinpu; Luo, Jingchu; Gao, Ge

    2015-04-18

    Long-established protein-coding genes may lose their coding potential during evolution ("unitary gene loss"). Members of the Poaceae family are a major food source and represent an ideal model clade for plant evolution research. However, the global pattern of unitary gene loss in Poaceae genomes, as well as the evolutionary fate of lost genes, is still under-investigated and remains largely elusive. Using a locally developed pipeline, we identified 129 unitary gene loss events for long-established protein-coding genes from four representative species of Poaceae, i.e. brachypodium, rice, sorghum and maize. Functional annotation suggested that the genes lost in all or most Poaceae species are enriched for genes involved in development and response to endogenous stimulus. We also found that 44 mutated genomic loci of lost genes, which we refer to as relics, were still actively transcribed, of which 84% (37 of 44) showed significantly differential expression across different tissues. More interestingly, we found a total of five expressed relics that may function as competitive endogenous RNAs in the brachypodium, rice and sorghum genomes. Based on comparative genomics and transcriptome data, we compiled the first comprehensive catalogue of unitary gene loss events in Poaceae species, characterized a statistically significant functional preference for these lost genes, and showed the potential of relics to function as competitive endogenous RNAs in Poaceae genomes.

  12. NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis.

    PubMed

    Sun, Duanchen; Liu, Yinliang; Zhang, Xiang-Sun; Wu, Ling-Yun

    2017-09-21

    High-throughput experimental techniques have been dramatically improved and widely applied in the past decades. However, biological interpretation of high-throughput experimental results, such as differentially expressed gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in functional enrichment studies. The GO terms identified via current functional enrichment analysis tools often contain direct parent or descendant terms in the GO hierarchical structure. Highly redundant terms make it difficult for users to analyze the underlying biological processes. In this paper, a novel network-based probabilistic generative model, NetGen, was proposed to perform functional enrichment analysis. An additional protein-protein interaction (PPI) network was explicitly used to assist the identification of significantly enriched GO terms. NetGen achieved superior performance to the existing methods in the simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, several GO terms which were not directly linked with the active gene list for each disease were identified. These terms were found to be closely related to the corresponding diseases when checked against the curated literature. NetGen has been implemented in the R package CopTea, publicly available at GitHub ( http://github.com/wulingyun/CopTea/ ). Our procedure leads to a more reasonable and interpretable result of the functional enrichment analysis. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods, and can help to explore the underlying pathogenesis of complex diseases.
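
    For context, the "individual term-based" enrichment analysis that this abstract contrasts NetGen with is commonly implemented as a one-sided hypergeometric test applied to each GO term separately. The sketch below shows that conventional baseline only; it is not NetGen's generative model, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: number of genes in the background
    K: background genes annotated with the term
    n: size of the gene list under study
    k: genes in the list annotated with the term
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total


# Tiny worked case: 10 background genes, 5 annotated, and a list of
# 5 genes that are all annotated. Only one of C(10, 5) = 252 equally
# likely lists has this property.
p_all_hits = hypergeom_enrichment_p(N=10, K=5, n=5, k=5)
```

    Because each term is tested on its own, a parent term and its descendants can all come out significant for the same genes, which is the redundancy problem the abstract describes.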

  13. A dual selection based, targeted gene replacement tool for Magnaporthe grisea and Fusarium oxysporum.

    PubMed

    Khang, Chang Hyun; Park, Sook-Young; Lee, Yong-Hwan; Kang, Seogchan

    2005-06-01

    Rapid progress in fungal genome sequencing presents many new opportunities for functional genomic analysis of fungal biology through the systematic mutagenesis of the genes identified through sequencing. However, the lack of efficient tools for targeted gene replacement is a limiting factor for fungal functional genomics, as it often necessitates the screening of a large number of transformants to identify the desired mutant. We developed an efficient method of gene replacement and evaluated factors affecting the efficiency of this method using two plant pathogenic fungi, Magnaporthe grisea and Fusarium oxysporum. This method is based on Agrobacterium tumefaciens-mediated transformation with a mutant allele of the target gene flanked by the herpes simplex virus thymidine kinase (HSVtk) gene as a conditional negative selection marker against ectopic transformants. The HSVtk gene product converts 5-fluoro-2'-deoxyuridine to a compound toxic to diverse fungi. Because ectopic transformants express HSVtk, while gene replacement mutants lack HSVtk, growing transformants on a medium amended with 5-fluoro-2'-deoxyuridine facilitates the identification of targeted mutants by counter-selecting against ectopic transformants. In addition to M. grisea and F. oxysporum, the method and associated vectors are likely to be applicable to manipulating genes in a broad spectrum of fungi, thus potentially serving as an efficient, universal functional genomic tool for harnessing the growing body of fungal genome sequence data to study fungal biology.

  14. Integrative analysis for identification of shared markers from various functional cells/tissues for rheumatoid arthritis.

    PubMed

    Xia, Wei; Wu, Jian; Deng, Fei-Yan; Wu, Long-Fei; Zhang, Yong-Hong; Guo, Yu-Fan; Lei, Shu-Feng

    2017-02-01

    Rheumatoid arthritis (RA) is a systemic autoimmune disease. So far, it is unclear whether common RA-related genes are shared across different tissues/cells. In this study, we conducted an integrative analysis of multiple datasets to identify potential shared genes that are significant in multiple tissues/cells for RA. Seven microarray gene expression datasets representing various RA-related tissues/cells were downloaded from the Gene Expression Omnibus (GEO). Statistical analyses, testing both marginal and joint effects, were conducted to identify significant genes shared in various samples. Follow-up analyses comprised functional annotation clustering analysis, protein-protein interaction (PPI) analysis, gene-based association analysis, and ELISA validation analysis in in-house samples. We identified 18 shared significant genes, which were mainly involved in the immune response and chemokine signaling pathway. Among the 18 genes, eight genes (PPBP, PF4, HLA-F, S100A8, RNASEH2A, P2RY6, JAG2, and PCBP1) interact with known RA genes. Two genes (HLA-F and PCBP1) are significant in gene-based association analysis (P = 1.03E-31, P = 1.30E-2, respectively). Additionally, PCBP1 also showed differential protein expression levels in in-house case-control plasma samples (P = 2.60E-2). This study represents the first effort to identify shared RA markers from different functional cells or tissues. The results suggest that one of the shared genes, PCBP1, is a promising biomarker for RA.

  15. An evidence-based knowledgebase of metastasis suppressors to identify key pathways relevant to cancer metastasis

    PubMed Central

    Zhao, Min; Li, Zhe; Qu, Hong

    2015-01-01

    Metastasis suppressor genes (MS genes) are genes that play important roles in inhibiting the process of cancer metastasis without preventing growth of the primary tumor. Identification of these genes and understanding their functions are critical for investigation of cancer metastasis. Recent studies on cancer metastasis have identified many new susceptibility MS genes. However, a comprehensive illustration of the diverse cellular processes regulated by metastasis suppressors during the metastasis cascade is lacking. Thus, the relationship between MS genes and cancer risk is still unclear. To unveil the cellular complexity of MS genes, we have constructed MSGene (http://MSGene.bioinfo-minzhao.org/), the first literature-based gene resource for exploring human MS genes. In total, we manually curated 194 experimentally verified MS genes and mapped them to 1448 homologous genes from 17 model species. Follow-up functional analyses associated the 194 human MS genes with epithelium/tissue morphogenesis and epithelial cell proliferation. In addition, pathway analysis highlights the prominent role of MS genes in activation of platelets and the coagulation system in the tumor metastatic cascade. Moreover, the global mutation pattern of MS genes across multiple cancers may reveal common cancer metastasis mechanisms. All these results illustrate the importance of MSGene to our understanding of cell development and cancer metastasis. PMID:26486520

  16. GOMA: functional enrichment analysis tool based on GO modules

    PubMed Central

    Huang, Qiang; Wu, Ling-Yun; Wang, Yong; Zhang, Xiang-Sun

    2013-01-01

    Analyzing the function of gene sets is a critical step in interpreting the results of high-throughput experiments in systems biology. A variety of enrichment analysis tools have been developed in recent years, but most output a long list of significantly enriched terms that are often redundant, making it difficult to extract the most meaningful functions. In this paper, we present GOMA, a novel enrichment analysis method based on the new concept of enriched functional Gene Ontology (GO) modules. With this method, we systematically revealed functional GO modules, i.e., groups of functionally similar GO terms, via an optimization model and then ranked them by enrichment scores. Our new method simplifies enrichment analysis results by reducing redundancy, thereby preventing inconsistent enrichment results among functionally similar terms and providing more biologically meaningful results. PMID:23237213

  17. Genome-wide analysis of the Dof transcription factor gene family reveals soybean-specific duplicable and functional characteristics.

    PubMed

    Guo, Yong; Qiu, Li-Juan

    2013-01-01

    The Dof domain protein family is a classic plant-specific zinc-finger transcription factor family involved in a variety of biological processes. There is great diversity in the number of Dof genes in different plants. However, there are only very limited reports on the characterization of Dof transcription factors in soybean (Glycine max). In the present study, 78 putative Dof genes were identified from the whole-genome sequence of soybean. The predicted GmDof genes were non-randomly distributed within and across 19 out of 20 chromosomes and 97.4% (38 pairs) were preferentially retained duplicate paralogous genes located in duplicated regions of the genome. Soybean-specific segmental duplications contributed significantly to the expansion of the soybean Dof gene family. These Dof proteins were phylogenetically clustered into nine distinct subgroups among which the gene structure and motif compositions were considerably conserved. Comparative phylogenetic analysis of these Dof proteins revealed four major groups, similar to those reported for Arabidopsis and rice. Most of the GmDofs showed specific expression patterns based on RNA-seq data analyses. The expression patterns of some duplicate genes were partially redundant while others showed functional diversity, suggesting the occurrence of sub-functionalization during subsequent evolution. Comprehensive expression profile analysis also provided insights into the soybean-specific functional divergence among members of the Dof gene family. Cis-regulatory element analysis of these GmDof genes suggested diverse functions associated with different processes. Taken together, our results provide useful information for the functional characterization of soybean Dof genes by combining phylogenetic analysis with global gene-expression profiling.

  18. SNPs in stress-responsive rice genes: validation, genotyping, functional relevance and population structure

    PubMed Central

    2012-01-01

    Background: Single nucleotide polymorphism (SNP) validation and large-scale genotyping are required to maximize the use of DNA sequence variation and determine the functional relevance of candidate genes for complex stress tolerance traits through genetic association in rice. We used the bead array platform-based Illumina GoldenGate assay to validate and genotype SNPs in a select set of stress-responsive genes to understand their functional relevance and study the population structure in rice. Results: Of the 384 putative SNPs assayed, we successfully validated and genotyped 362 (94.3%). Of these, 325 (84.6%) showed polymorphism among the 91 rice genotypes examined. Physical distribution, degree of allele sharing, admixtures and introgression, and amino acid replacement of SNPs in 263 abiotic and 62 biotic stress-responsive genes provided clues for identification and targeted mapping of trait-associated genomic regions. We assessed the functional and adaptive significance of validated SNPs in a set of contrasting drought tolerant upland and sensitive lowland rice genotypes by correlating their allelic variation with amino acid sequence alterations in catalytic domains and three-dimensional secondary protein structure encoded by stress-responsive genes. We found a strong genetic association among SNPs in the nine stress-responsive genes with upland and lowland ecological adaptation. Higher nucleotide diversity was observed in indica accessions compared with other rice sub-populations based on different population genetic parameters. The inferred ancestry of 16% among rice genotypes was derived from admixed populations with the maximum between upland aus and wild Oryza species. Conclusions: SNPs validated in biotic and abiotic stress-responsive rice genes can be used in association analyses to identify candidate genes and develop functional markers for stress tolerance in rice. PMID:22921105

  19. Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high-throughput data

    PubMed Central

    2013-01-01

    Background: High-throughput (HT) technologies provide a huge amount of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e. a gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, the gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on a mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such a mapping is typically used in assessing the coverage of the gene signature by biological concepts, an assessment that is either score-based or requires tunable parameters as well, limiting its power. Results: We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices that are easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. In contrast to the standard approach, where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold-dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method. Conclusions: We showed that KDVS not only enables the accurate selection of known biological functionalities, but also the identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, the KDVS approach can be considered a viable alternative to enrichment-based approaches. PMID:23302187

  20. GTA: a game theoretic approach to identifying cancer subnetwork markers.

    PubMed

    Farahmand, S; Goliaei, S; Ansari-Pour, N; Razaghi-Moghadam, Z

    2016-03-01

    The identification of genetic markers (e.g. genes, pathways and subnetworks) for cancer has been one of the most challenging research areas in recent years. A subset of these studies attempts to analyze genome-wide expression profiles to identify markers with high reliability and reusability across independent whole-transcriptome microarray datasets. To this end, the functional relationships of genes are integrated with their expression data. However, for a more accurate representation of the functional relationships among genes, utilization of the protein-protein interaction network (PPIN) seems to be necessary. Herein, a novel game theoretic approach (GTA) is proposed for the identification of cancer subnetwork markers by integrating genome-wide expression profiles and PPIN. The GTA method was applied to three distinct whole-transcriptome breast cancer datasets to identify the subnetwork markers associated with metastasis. To evaluate the performance of our approach, the identified subnetwork markers were compared with gene-based, pathway-based and network-based markers. We show that GTA is not only capable of identifying robust metastatic markers, but also provides a higher classification performance. In addition, based on these GTA-based subnetworks, we identified a new bona fide candidate gene for breast cancer susceptibility.

  1. Genomic connectivity networks based on the BrainSpan atlas of the developing human brain

    NASA Astrophysics Data System (ADS)

    Mahfouz, Ahmed; Ziats, Mark N.; Rennert, Owen M.; Lelieveldt, Boudewijn P. F.; Reinders, Marcel J. T.

    2014-03-01

    The human brain comprises systems of networks that span the molecular, cellular, anatomic and functional levels. Molecular studies of the developing brain have focused on elucidating networks among gene products that may drive cellular brain development by functioning together in biological pathways. On the other hand, studies of the brain connectome attempt to determine how anatomically distinct brain regions are connected to each other, either anatomically (diffusion tensor imaging) or functionally (functional MRI and EEG), and how they change over development. A global examination of the relationship between gene expression and connectivity in the developing human brain is necessary to understand how the genetic signature of different brain regions instructs connections to other regions. Furthermore, analyzing the development of connectivity networks based on the spatio-temporal dynamics of gene expression provides a new insight into the effect of neurodevelopmental disease genes on brain networks. In this work, we construct connectivity networks between brain regions based on the similarity of their gene expression signature, termed "Genomic Connectivity Networks" (GCNs). Genomic connectivity networks were constructed using data from the BrainSpan Transcriptional Atlas of the Developing Human Brain. Our goal was to understand how the genetic signatures of anatomically distinct brain regions relate to each other across development. We assessed the neurodevelopmental changes in connectivity patterns of brain regions when networks were constructed with genes implicated in the neurodevelopmental disorder autism (autism spectrum disorder; ASD). Using graph theory metrics to characterize the GCNs, we show that ASD-GCNs are relatively less connected later in development with the cerebellum showing a very distinct expression of ASD-associated genes compared to other brain regions.
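
    The construction described here, connecting brain regions by the similarity of their gene expression signatures, can be sketched in a few lines. The abstract does not state which similarity measure or cutoff was used, so Pearson correlation and the 0.8 threshold below are assumptions, as are the function names and the toy expression values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def connectivity_edges(profiles, threshold):
    """Link two regions when their expression profiles correlate >= threshold.

    `profiles` maps a region name to its expression values over a fixed,
    shared gene list. Returns {(region_1, region_2): correlation}.
    """
    edges = {}
    for r1, r2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[r1], profiles[r2])
        if r >= threshold:
            edges[(r1, r2)] = r
    return edges


# Toy data (assumption): three regions measured over four genes.
toy_profiles = {
    "region_A": [1.0, 2.0, 3.0, 4.0],
    "region_B": [2.0, 4.0, 6.0, 8.0],
    "region_C": [4.0, 3.0, 2.0, 1.0],
}
edges = connectivity_edges(toy_profiles, threshold=0.8)
```

    Restricting the gene list to a disease-associated subset before building the network would mirror, in spirit, the ASD-specific networks mentioned in the abstract.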

  2. Versatile types of polysaccharide-based supramolecular polycation/pDNA nanoplexes for gene delivery

    NASA Astrophysics Data System (ADS)

    Hu, Yang; Zhao, Nana; Yu, Bingran; Liu, Fusheng; Xu, Fu-Jian

    2014-06-01

    Different polysaccharide-based supramolecular polycations were readily synthesized by assembling multiple β-cyclodextrin-cored star polycations with an adamantane-functionalized dextran via host-guest interaction in the absence or presence of bioreducible linkages. Compared with nanoplexes of the starting star polycation and pDNA, the supramolecular polycation/pDNA nanoplexes exhibited similarly low cytotoxicity, improved cellular internalization and significantly higher gene transfection efficiencies. The incorporation of disulfide linkages imparted the supramolecular polycation/pDNA nanoplexes with the advantage of intracellular bioreducibility, resulting in better gene delivery properties. In addition, the antitumor properties of supramolecular polycation/pDNA nanoplexes were also investigated using a suicide gene therapy system. The present study demonstrates that the proper assembly of cyclodextrin-cored polycations with adamantane-functionalized polysaccharides is an effective strategy for the production of new nanoplex delivery systems. Electronic supplementary information (ESI) available: 1H NMR assay and synthetic route of Dex-Ad and Dex-SS-Ad. See DOI: 10.1039/c4nr01590h

  3. Proteins of Unknown Biochemical Function: A Persistent Problem and a Roadmap to Help Overcome It.

    PubMed

    Niehaus, Thomas D; Thamm, Antje M K; de Crécy-Lagard, Valérie; Hanson, Andrew D

    2015-11-01

    The number of sequenced genomes is rapidly increasing, but functional annotation of the genes in these genomes lags far behind. Even in Arabidopsis (Arabidopsis thaliana), only approximately 40% of enzyme- and transporter-encoding genes have credible functional annotations, and this number is even lower in nonmodel plants. Functional characterization of unknown genes is a challenge, but various databases (e.g. for protein localization and coexpression) can be mined to provide clues. If homologous microbial genes exist (and about one-half of the genes encoding unknown enzymes and transporters in Arabidopsis have microbial homologs), cross-kingdom comparative genomics can powerfully complement plant-based data. Multiple lines of evidence can strengthen predictions and warrant experimental characterization. In some cases, relatively quick tests in genetically tractable microbes can determine whether a prediction merits biochemical validation, which is costly and demands specialized skills. © 2015 American Society of Plant Biologists. All Rights Reserved.

  4. In silico experiment system for testing hypothesis on gene functions using three condition specific biological networks.

    PubMed

    Lee, Chai-Jin; Kang, Dongwon; Lee, Sangseon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun

    2018-05-25

    Determining the functions of a gene requires time-consuming, expensive biological experiments. Scientists can speed up this experimental process if literature information and biological networks are adequately provided. In this paper, we present a web-based information system that performs in silico experiments to computationally test hypotheses on the function of a gene. A hypothesis that is specified in English by the user is converted to genes using a literature and knowledge mining system called BEST. Condition-specific TF, miRNA and PPI (protein-protein interaction) networks are automatically generated by projecting gene and miRNA expression data to template networks. An in silico experiment then tests how well the target genes are connected from the knockout gene through the condition-specific networks. The test result visualizes paths from the knockout gene to the target genes in the three networks. Statistical and information-theoretic scores are provided on the resulting web page to help scientists either accept or reject the hypothesis being tested. Our web-based system was extensively tested using three knockout data sets: E2f1, Lrrk2, and Dicer1. We were able to reproduce gene functions reported in the original research papers. In addition, we comprehensively tested all disease names in MalaCards as hypotheses to show the effectiveness of our system. Our in silico experiment system can be very useful in suggesting biological mechanisms which can be further tested in vivo or in vitro. http://biohealth.snu.ac.kr/software/insilico/. Copyright © 2018 Elsevier Inc. All rights reserved.
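
    The connectivity test in this abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a condition-specific network is stored as a dict of directed edges and that connectivity is judged by breadth-first shortest paths. The names shortest_paths_from and connectivity_report and the toy network are illustrative only; this is not the published system's code, and its statistical and information-theoretic scores are not reproduced here.

```python
from collections import deque


def shortest_paths_from(network, source):
    """Breadth-first search: one shortest path to every node reachable from source."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour not in paths:  # first visit is a shortest path in BFS
                paths[neighbour] = paths[node] + [neighbour]
                queue.append(neighbour)
    return paths


def connectivity_report(network, knockout, targets):
    """Return the paths to reachable targets and the fraction of targets reached."""
    paths = shortest_paths_from(network, knockout)
    reached = {t: paths[t] for t in targets if t in paths}
    fraction = len(reached) / len(targets) if targets else 0.0
    return reached, fraction


# Hypothetical toy network: "KO" is the knockout gene, "t1".."t3" are targets.
toy_network = {
    "KO": ["tf1", "mir1"],
    "tf1": ["t1"],
    "mir1": ["t2"],
    "t2": ["t1"],
    "x": ["t3"],  # t3 is only reachable from x, not from KO
}
reached, fraction = connectivity_report(toy_network, "KO", ["t1", "t2", "t3"])
for target, path in reached.items():
    print(target, " -> ".join(path))
print(f"fraction of targets reached: {fraction:.2f}")
```

    In this toy case two of the three targets are reachable from the knockout gene; a real analysis would repeat the search in each of the three networks and attach a significance score.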

  5. Functional classification of rice flanking sequence tagged genes using MapMan terms and global understanding on metabolic and regulatory pathways affected by dxr mutant having defects in light response.

    PubMed

    Chandran, Anil Kumar Nalini; Lee, Gang-Seob; Yoo, Yo-Han; Yoon, Ung-Han; Ahn, Byung-Ohg; Yun, Doh-Won; Kim, Jin-Hyun; Choi, Hong-Kyu; An, GynHeung; Kim, Tae-Ho; Jung, Ki-Hong

    2016-12-01

    Rice is one of the most important food crops for humans. To improve the agronomical traits of rice, the functions of more than 1,000 rice genes have been recently characterized and summarized. The completed, map-based sequence of the rice genome has significantly accelerated the functional characterization of rice genes, but progress remains limited in assigning functions to all predicted non-transposable element (non-TE) genes, estimated to number 37,000-41,000. The International Rice Functional Genomics Consortium (IRFGC) has generated a huge number of gene-indexed mutants by using mutagens such as T-DNA, Tos17 and Ds/dSpm. These mutants have been identified by 246,566 flanking sequence tags (FSTs) and cover 65 % (25,275 of 38,869) of the non-TE genes in rice, while the mutation ratio of TE genes is 25.7 %. In addition, almost 80 % of highly expressed non-TE genes have insertion mutations, indicating that highly expressed genes in rice chromosomes are more likely to have mutations by mutagens such as T-DNA, Ds, dSpm and Tos17. The functions of around 2.5 % of rice genes have been characterized, and studies have mainly focused on transcriptional and post-transcriptional regulation. Slow progress in characterizing the function of rice genes is mainly due to a lack of clues to guide functional studies or functional redundancy. These limitations can be partially solved by a well-categorized functional classification of FST genes. To create this classification, we used the diverse overviews installed in the MapMan toolkit. Gene Ontology (GO) assignment to FST genes supplemented the limitation of MapMan overviews. The functions of 863 of 1,022 known genes can be evaluated by current FST lines, indicating that FST genes are useful resources for functional genomic studies. We assigned 16,169 out of 29,624 FST genes to 34 MapMan classes, including three major categories: DNA, RNA and protein. To demonstrate the MapMan application on FST genes, transcriptome analysis was done on a rice mutant of the 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) gene with an FST. Mapping of 756 down-regulated genes in dxr mutants and their annotation in terms of various MapMan overviews revealed candidate genes downstream of DXR-mediating light signaling pathway in diverse functional classes such as the methyl-D-erythritol 4-phosphate (MEP) pathway overview, photosynthesis, secondary metabolism and regulatory overview. This report provides a useful guide for systematic phenomics and further applications to enhance the key agronomic traits of rice.

  6. Functional Gene Differences in Soil Microbial Communities from Conventional, Low-Input, and Organic Farmlands

    PubMed Central

    Xue, Kai; Wu, Liyou; Deng, Ye; He, Zhili; Van Nostrand, Joy; Robertson, Philip G.; Schmidt, Thomas M.

    2013-01-01

    Various agriculture management practices may have distinct influences on soil microbial communities and their ecological functions. In this study, we utilized GeoChip, a high-throughput microarray-based technique containing approximately 28,000 probes for genes involved in nitrogen (N)/carbon (C)/sulfur (S)/phosphorus (P) cycles and other processes, to evaluate the potential functions of soil microbial communities under conventional (CT), low-input (LI), and organic (ORG) management systems at an agricultural research site in Michigan. Compared to CT, a high diversity of functional genes was observed in LI. The functional gene diversity in ORG did not differ significantly from that of either CT or LI. Abundances of genes encoding enzymes involved in C/N/P/S cycles were generally lower in CT than in LI or ORG, with the exceptions of genes in pathways for lignin degradation, methane generation/oxidation, and assimilatory N reduction, which all remained unchanged. Canonical correlation analysis showed that selected soil (bulk density, pH, cation exchange capacity, total C, C/N ratio, NO3−, NH4+, available phosphorus content, and available potassium content) and crop (seed and whole biomass) variables could explain 69.5% of the variation of soil microbial community composition. Also, significant correlations were observed between NO3− concentration and denitrification genes, NH4+ concentration and ammonification genes, and N2O flux and denitrification genes, indicating a close linkage between soil N availability or process and associated functional genes. PMID:23241975

  7. Genome-wide high-throughput SNP discovery and genotyping for understanding natural (functional) allelic diversity and domestication patterns in wild chickpea

    PubMed Central

    Bajaj, Deepak; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

    2015-01-01

    We identified 82489 high-quality genome-wide SNPs from 93 wild and cultivated Cicer accessions through integrated reference genome- and de novo-based GBS assays. High intra- and inter-specific polymorphic potential (66–85%) and broader natural allelic diversity (6–64%) detected by genome-wide SNPs among accessions signify their efficacy for monitoring introgression and transferring target trait-regulating genomic (gene) regions/allelic variants from wild to cultivated Cicer gene pools for genetic improvement. The population-specific assignment of wild Cicer accessions pertaining to the primary gene pool is more influenced by geographical origin/phenotypic characteristics than species/gene-pools of origination. The functional significance of allelic variants (non-synonymous and regulatory SNPs) scanned from transcription factors and stress-responsive genes in differentiating wild accessions (with potential known sources of yield-contributing and stress tolerance traits) from cultivated desi and kabuli accessions, fine-mapping/map-based cloning of QTLs and determination of LD patterns across wild and cultivated gene-pools are suitably elucidated. The correlation between phenotypic (agromorphological traits) and molecular diversity-based admixed domestication patterns within six structured populations of wild and cultivated accessions via genome-wide SNPs was apparent. This suggests utility of whole genome SNPs as a potential resource for identifying naturally selected trait-regulating genomic targets/functional allelic variants adaptive to diverse agroclimatic regions for genetic enhancement of cultivated gene-pools. PMID:26208313

  8. Genome-Wide Screening and Characterization of the Dof Gene Family in Physic Nut (Jatropha curcas L.).

    PubMed

    Wang, Peipei; Li, Jing; Gao, Xiaoyang; Zhang, Di; Li, Anlin; Liu, Changning

    2018-05-29

    Physic nut (Jatropha curcas L.) is a species of flowering plant with great potential for biofuel production and as an emerging model organism for functional genomic analysis, particularly in the Euphorbiaceae family. DNA binding with one finger (Dof) transcription factors play critical roles in numerous biological processes in plants. Nevertheless, the knowledge about members, and the evolutionary and functional characteristics of the Dof gene family in physic nut is insufficient. Therefore, we performed a genome-wide screening and characterization of the Dof gene family within the physic nut draft genome. In total, 24 JcDof genes (encoding 33 JcDof proteins) were identified. All the JcDof genes were divided into three major groups based on phylogenetic inference, which was further validated by the subsequent gene structure and motif analysis. Genome comparison revealed that segmental duplication may have played crucial roles in the expansion of the JcDof gene family, and gene expansion was mainly subjected to positive selection. The expression profile demonstrated the broad involvement of JcDof genes in response to various abiotic stresses, hormonal treatments and functional divergence. This study provides valuable information for better understanding the evolution of JcDof genes, and lays a foundation for future functional exploration of JcDof genes.

  9. A Review of Gene Knockout Strategies for Microbial Cells.

    PubMed

    Tang, Phooi Wah; Chua, Pooi San; Chong, Shiue Kee; Mohamad, Mohd Saberi; Choon, Yee Wen; Deris, Safaai; Omatu, Sigeru; Corchado, Juan Manuel; Chan, Weng Howe; Rahim, Raha Abdul

    2015-01-01

    Predicting the effects of genetic modification is difficult due to the complexity of metabolic networks. Various gene knockout strategies have been utilised to deactivate specific genes in order to determine the effects of these genes on the function of microbes. Deactivation of genes can lead to deletion of certain proteins and functions. Through these strategies, the associated function of a deleted gene can be identified from the metabolic networks. The main aim of this paper is to review the available techniques in gene knockout strategies for microbial cells. The review covers their methodology and recent applications in microbial cells. In addition, the advantages and disadvantages of the techniques are compared and discussed, and the related patents are listed. Traditionally, gene knockout is done through wet lab (in vivo) techniques, which are conducted through laboratory experiments. However, these techniques are costly and time consuming. Hence, various dry lab (in silico) techniques, which are conducted using computational approaches, have been developed to surmount these problems. The development of numerous techniques for gene knockout in microbial cells has brought many advancements in the study of gene functions. Based on the literature, we found that the gene knockout strategies currently used are sensibly implemented with regard to their benefits.

  10. What's that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins

    PubMed Central

    Hutchins, James R. A.

    2014-01-01

    The genomic era has enabled research projects that use approaches including genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrometry–based proteomics to discover genes and proteins involved in biological processes. Such methods generate data sets of gene, transcript, or protein hits that researchers wish to explore to understand their properties and functions and thus their possible roles in biological systems of interest. Recent years have seen a profusion of Internet-based resources to aid this process. This review takes the viewpoint of the curious biologist wishing to explore the properties of protein-coding genes and their products, identified using genome-based technologies. Ten key questions are asked about each hit, addressing functions, phenotypes, expression, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set–wide information retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Results obtained using some of these are discussed in more depth using the p53 tumor suppressor as an example. This flexible and universally applicable approach for characterizing experimental hits helps researchers to maximize the potential of their projects for biological discovery. PMID:24723265

  11. Effect of bioaugmentation and biostimulation on sulfate-reducing column startup captured by functional gene profiling.

    PubMed

    Pereyra, Luciana P; Hiibel, Sage R; Perrault, Elizabeth M; Reardon, Kenneth F; Pruden, Amy

    2012-10-01

    Sulfate-reducing permeable reactive zones (SR-PRZs) depend upon a complex microbial community to utilize a lignocellulosic substrate and produce sulfides, which remediate mine drainage by binding heavy metals. To gain insight into the impact of the microbial community composition on the startup time and pseudo-steady-state performance, functional genes corresponding to cellulose-degrading (CD), fermentative, sulfate-reducing, and methanogenic microorganisms were characterized in columns simulating SR-PRZs using quantitative polymerase chain reaction (qPCR) and denaturing gradient gel electrophoresis (DGGE). Duplicate columns were bioaugmented with sulfate-reducing or CD bacteria or biostimulated with ethanol or carboxymethyl cellulose and compared with baseline dairy manure inoculum and uninoculated controls. Sulfate removal began after ~15 days for all columns and pseudo-steady state was achieved by Day 30. Despite similar performance, DGGE profiles of 16S rRNA gene and functional genes at pseudo-steady state were distinct among the column treatments, suggesting the potential to control ultimate microbial community composition via bioaugmentation and biostimulation. qPCR revealed enrichment of functional genes in all columns between the initial and pseudo-steady-state time points. This is the first functional gene-based study of CD, fermentative and sulfate-reducing bacteria and methanogenic archaea in a lignocellulose-based environment and provides new qualitative and quantitative insight into startup of a complex microbial system. © 2012 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  12. VarWalker: Personalized Mutation Network Analysis of Putative Cancer Genes from Next-Generation Sequencing Data

    PubMed Central

    Jia, Peilin; Zhao, Zhongming

    2014-01-01

    A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data. PMID:24516372
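
    The Random Walk with Restart step named in this abstract can be sketched in a few lines of Python. This is a generic textbook formulation, p <- (1 - r) W p + r p0, under assumed inputs (an undirected graph as an adjacency dict, a restart probability of 0.5); it is not VarWalker's implementation, and the function name, parameters and toy graph are hypothetical.

```python
def random_walk_with_restart(graph, seeds, restart=0.5, tol=1e-12, max_iter=10_000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol.

    graph: dict mapping each node to a list of neighbours (each undirected edge
           listed in both directions).
    seeds: set of nodes the walker restarts from (e.g. mutated genes).
    """
    nodes = list(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Isolated node: keep its walking mass in place.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy interaction network: a simple path A - B - C - D.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_graph, seeds={"A"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

    Nodes closer to the seed receive higher steady-state scores, which is the property that lets such a walk rank the close interactors of mutated genes.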

  13. VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

    PubMed

    Jia, Peilin; Zhao, Zhongming

    2014-02-01

    A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.

  14. Reconstructing the Evolutionary History of Paralogous APETALA1/FRUITFULL-Like Genes in Grasses (Poaceae)

    PubMed Central

    Preston, Jill C.; Kellogg, Elizabeth A.

    2006-01-01

    Gene duplication is an important mechanism for the generation of evolutionary novelty. Paralogous genes that are not silenced may evolve new functions (neofunctionalization) that will alter the developmental outcome of preexisting genetic pathways, partition ancestral functions (subfunctionalization) into divergent developmental modules, or function redundantly. Functional divergence can occur by changes in the spatio-temporal patterns of gene expression and/or by changes in the activities of their protein products. We reconstructed the evolutionary history of two paralogous monocot MADS-box transcription factors, FUL1 and FUL2, and determined the evolution of sequence and gene expression in grass AP1/FUL-like genes. Monocot AP1/FUL-like genes duplicated at the base of Poaceae and codon substitutions occurred under relaxed selection mostly along the branch leading to FUL2. Following the duplication, FUL1 was apparently lost from early diverging taxa, a pattern consistent with major changes in grass floral morphology. Overlapping gene expression patterns in leaves and spikelets indicate that FUL1 and FUL2 probably share some redundant functions, but that FUL2 may have become temporally restricted under partial subfunctionalization to particular stages of floret development. These data have allowed us to reconstruct the history of AP1/FUL-like genes in Poaceae and to hypothesize a role for this gene duplication in the evolution of the grass spikelet. PMID:16816429

  15. A data science approach to candidate gene selection of pain regarded as a process of learning and neural plasticity.

    PubMed

    Ultsch, Alfred; Kringel, Dario; Kalso, Eija; Mogil, Jeffrey S; Lötsch, Jörn

    2016-12-01

    The increasing availability of "big data" enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 535 genes identified empirically as relevant to pain with the knowledge about the functions of thousands of genes. Starting from an accepted description of chronic pain as displaying systemic features described by the terms "learning" and "neuronal plasticity," a functional genomics analysis proposed that among the functions of the 535 "pain genes," the biological processes "learning or memory" (P = 8.6 × 10) and "nervous system development" (P = 2.4 × 10) are statistically significantly overrepresented as compared with the annotations to these processes expected by chance. After establishing that the hypothesized biological processes were among important functional genomics features of pain, a subset of n = 34 pain genes were found to be annotated with both Gene Ontology terms. Published empirical evidence supporting their involvement in chronic pain was identified for almost all these genes, including 1 gene identified in March 2016 as being involved in pain. By contrast, such evidence was virtually absent in a randomly selected set of 34 other human genes. Hence, the present computational functional genomics-based method can be used for candidate gene selection, providing an alternative to established methods.
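
    The abstract does not state which test produced its P values. One common choice for judging overrepresentation of an annotation term in a gene set is the one-sided hypergeometric test, sketched below in Python. The function name and the toy counts are hypothetical and are not taken from the study.

```python
from math import comb


def overrepresentation_p(population, annotated, sample, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background set
    annotated:  background genes carrying the term
    sample:     size of the gene set being tested
    hits:       genes in the set that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers only: 20 background genes, 5 annotated, a set of 4 with 3 hits.
p_value = overrepresentation_p(population=20, annotated=5, sample=4, hits=3)
print(f"P = {p_value:.4f}")
```

    A small P value means the term appears in the gene set more often than random sampling from the background would be expected to produce.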

  16. PPDB - A tool for investigation of plants physiology based on gene ontology.

    PubMed

    Sharma, Ajay Shiv; Gupta, Hari Om; Prasad, Rajendra

    2014-09-02

    Representing the way forward, from functional genomics and its ontology to functional understanding and physiological model, in a computationally tractable fashion is one of the ongoing challenges faced by computational biology. To tackle the standpoint, we herein feature the applications of contemporary database management to the development of PPDB, a searching and browsing tool for the Plants Physiology Database that is based upon the mining of a large amount of gene ontology data currently available. The working principles and search options associated with the PPDB are publicly available and freely accessible on-line ( http://www.iitr.ernet.in/ajayshiv/ ) through a user friendly environment generated by means of Drupal-6.24. By knowing that genes are expressed in temporally and spatially characteristic patterns and that their functionally distinct products often reside in specific cellular compartments and may be part of one or more multi-component complexes, this sort of work is intended to be relevant for investigating the functional relationships of gene products at a system level and, thus, helps us approach to the full physiology.

  17. PPDB: A Tool for Investigation of Plants Physiology Based on Gene Ontology.

    PubMed

    Sharma, Ajay Shiv; Gupta, Hari Om; Prasad, Rajendra

    2015-09-01

    Representing the way forward, from functional genomics and its ontology to functional understanding and physiological model, in a computationally tractable fashion is one of the ongoing challenges faced by computational biology. To tackle the standpoint, we herein feature the applications of contemporary database management to the development of PPDB, a searching and browsing tool for the Plants Physiology Database that is based upon the mining of a large amount of gene ontology data currently available. The working principles and search options associated with the PPDB are publicly available and freely accessible online ( http://www.iitr.ac.in/ajayshiv/ ) through a user-friendly environment generated by means of Drupal-6.24. By knowing that genes are expressed in temporally and spatially characteristic patterns and that their functionally distinct products often reside in specific cellular compartments and may be part of one or more multicomponent complexes, this sort of work is intended to be relevant for investigating the functional relationships of gene products at a system level and, thus, helps us approach to the full physiology.

  18. TRV Based Virus Induced Gene Silencing in Gladiolus (Gladiolus grandiflorus L.), A Monocotyledonous Ornamental Plant

    USDA-ARS's Scientific Manuscript database

    Virus-induced gene silencing (VIGS) has not yet successfully been used as a tool for gene functional analysis in non-grass monocotyledonous geophytes. We therefore tested VIGS in gladiolus (Gladiolus grandiflora L) using a Tobacco Rattle Virus (TRV) vector containing a fragment of the gladiolus gene...

  19. Assembly of a biocompatible triazole-linked gene by one-pot click-DNA ligation

    NASA Astrophysics Data System (ADS)

    Kukwikila, Mikiembo; Gale, Nittaya; El-Sagheer, Afaf H.; Brown, Tom; Tavassoli, Ali

    2017-11-01

    The chemical synthesis of oligonucleotides and their enzyme-mediated assembly into genes and genomes has significantly advanced multiple scientific disciplines. However, these approaches are not without their shortcomings; enzymatic amplification and ligation of oligonucleotides into genes and genomes makes automation challenging, and site-specific incorporation of epigenetic information and/or modified bases into large constructs is not feasible. Here we present a fully chemical one-pot method for the assembly of oligonucleotides into a gene by click-DNA ligation. We synthesize the 335 base-pair gene that encodes the green fluorescent protein iLOV from ten functionalized oligonucleotides that contain 5ʹ-azide and 3ʹ-alkyne units. The resulting click-linked iLOV gene contains eight triazoles at the sites of chemical ligation, and yet is fully biocompatible; it is replicated by DNA polymerases in vitro and encodes a functional iLOV protein in Escherichia coli. We demonstrate the power and potential of our one-pot gene-assembly method by preparing an epigenetically modified variant of the iLOV gene.

  20. Comprehensive analysis of coding-lncRNA gene co-expression network uncovers conserved functional lncRNAs in zebrafish.

    PubMed

    Chen, Wen; Zhang, Xuan; Li, Jing; Huang, Shulan; Xiang, Shuanglin; Hu, Xiang; Liu, Changning

    2018-05-09

    Zebrafish is a fully developed model system for studying developmental processes and human disease. Recent deep sequencing studies have discovered a large number of long non-coding RNAs (lncRNAs) in zebrafish. However, only a few of them have been functionally characterized. Therefore, how to take advantage of the mature zebrafish system to deeply investigate the function and conservation of lncRNAs is an intriguing question. We systematically collected and analyzed a series of zebrafish RNA-seq data, then combined them with resources from known databases and the literature. As a result, we obtained by far the most complete dataset of zebrafish lncRNAs, containing 13,604 lncRNA genes (21,128 transcripts) in total. Based on that, a co-expression network of zebrafish coding and lncRNA genes was constructed and analyzed, and used to predict the Gene Ontology (GO) and KEGG annotations of lncRNAs. Meanwhile, we performed a conservation analysis of zebrafish lncRNAs, identifying 1828 conserved zebrafish lncRNA genes (1890 transcripts) that have putative mammalian orthologs. We also found that zebrafish lncRNAs play important roles in regulating the development and function of the nervous system; these conserved lncRNAs show significant sequence and functional conservation with their mammalian counterparts. By integrative data analysis and construction of a coding-lncRNA gene co-expression network, we obtained the most comprehensive dataset of zebrafish lncRNAs to date, as well as their systematic annotations and comprehensive analyses of function and conservation. Our study provides a reliable zebrafish-based platform to deeply explore lncRNA function and mechanism, as well as the lncRNA commonality between zebrafish and human.

  1. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    PubMed

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
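
    The partial-correlation step described in this abstract can be illustrated with a small sketch. It assumes the standard recursive formula for partial correlation and uses toy numbers; the variable names (`x`, `y`, `z`) are hypothetical stand-ins for a gene property, a phylogenetic signal measure, and a control such as alignment length. It is not the authors' analysis code.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y after controlling for every list in `controls`.

    Uses the recursive formula: remove one control at a time, each time
    working with correlations that are already adjusted for the rest.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data (hypothetical): x and y are related only through z.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]
y = [2, 0, -2, 0]
raw = pearson(x, y)                # correlation before adjustment
adjusted = partial_corr(x, y, [z])  # correlation with z held fixed
```

    In this toy case the raw correlation is positive while the adjusted one vanishes, which is the kind of weakening the abstract reports once shared drivers are controlled. Passing two control lists, for example `[rate, length]`, controls for both simultaneously.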

  2. Prospecting Metagenomic Enzyme Subfamily Genes for DNA Family Shuffling by a Novel PCR-based Approach

    PubMed Central

    Wang, Qiuyan; Wu, Huili; Wang, Anming; Du, Pengfei; Pei, Xiaolin; Li, Haifeng; Yin, Xiaopu; Huang, Lifeng; Xiong, Xiaolong

    2010-01-01

    DNA family shuffling is a powerful method for enzyme engineering, which utilizes recombination of naturally occurring functional diversity to accelerate laboratory-directed evolution. However, the use of this technique has been hindered by the scarcity of family genes with the required level of sequence identity in the genome database. We describe here a strategy for collecting metagenomic homologous genes for DNA shuffling from environmental samples by truncated metagenomic gene-specific PCR (TMGS-PCR). Using identified metagenomic gene-specific primers, twenty-three 921-bp truncated lipase gene fragments, which shared 64–99% identity with each other and formed a distinct subfamily of lipases, were retrieved from 60 metagenomic samples. These lipase genes were shuffled, and selected active clones were characterized. The chimeric clones show extensive functional and genetic diversity, as demonstrated by functional characterization and sequence analysis. Our results indicate that homologous sequences of genes captured by TMGS-PCR can be used as suitable genetic material for DNA family shuffling with broad applications in enzyme engineering. PMID:20962349

  3. Virus-induced gene silencing and transient gene expression in soybean using Bean pod mottle virus infectious clones

    USDA-ARS's Scientific Manuscript database

    Virus-induced gene silencing (VIGS) is a powerful and rapid approach for determining the functions of plant genes. The basis of VIGS is that a viral genome is engineered so that it can carry fragments of plant genes, typically in the 200-300 base pair size range. The recombinant viruses are used to ...

  4. Fuzzy measures on the Gene Ontology for gene product similarity.

    PubMed

    Popescu, Mihail; Keller, James M; Mitchell, Joyce A

    2006-01-01

    One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.
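
    For orientation, the two traditional baselines named at the end of this abstract (pairwise average and pairwise maximum) can be sketched as follows. The term identifiers and the term-to-term similarity table are hypothetical toy values, and the sketch does not implement the FMS or Choquet measures themselves.

```python
from itertools import product

# Hypothetical term-to-term similarities; unlisted pairs default to 0.0.
TOY_TERM_SIM = {("T1", "T3"): 0.5}

def term_sim(a, b):
    """Toy similarity: 1.0 for identical terms, else a table lookup."""
    if a == b:
        return 1.0
    return TOY_TERM_SIM.get((a, b), TOY_TERM_SIM.get((b, a), 0.0))

def pairwise_scores(terms_a, terms_b):
    """Similarity of every term of one gene product against every term of the other."""
    return [term_sim(a, b) for a, b in product(terms_a, terms_b)]

def pairwise_average(terms_a, terms_b):
    scores = pairwise_scores(terms_a, terms_b)
    return sum(scores) / len(scores)

def pairwise_maximum(terms_a, terms_b):
    return max(pairwise_scores(terms_a, terms_b))

product_a = ["T1", "T2"]   # annotation set of the first gene product
product_b = ["T2", "T3"]   # annotation set of the second gene product
avg_sim = pairwise_average(product_a, product_b)
max_sim = pairwise_maximum(product_a, product_b)
```

    Both baselines reduce the full score matrix to a single number, which is the limitation the abstract contrasts with measures that consider both complete annotation sets in context.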

  5. In silico mining and PCR-based approaches to transcription factor discovery in non-model plants: gene discovery of the WRKY transcription factors in conifers.

    PubMed

    Liu, Jun-Jun; Xiang, Yu

    2011-01-01

    WRKY transcription factors are key regulators of numerous biological processes in plant growth and development, as well as plant responses to abiotic and biotic stresses. Research on biological functions of plant WRKY genes has focused in the past on model plant species or species with largely characterized transcriptomes. However, a variety of non-model plants, such as forest conifers, are essential as feed, biofuel, and wood or for sustainable ecosystems. Identification of WRKY genes in these non-model plants is equally important for understanding the evolutionary and function-adaptive processes of this transcription factor family. Because of limited genomic information, the rarity of regulatory gene mRNAs in transcriptomes, and the sequence divergence to model organism genes, identification of transcription factors in non-model plants using methods similar to those generally used for model plants is difficult. This chapter describes a gene family discovery strategy for identification of WRKY transcription factors in conifers by a combination of in silico-based prediction and PCR-based experimental approaches. Compared to traditional cDNA library screening or EST sequencing at transcriptome scales, this integrated gene discovery strategy provides fast, simple, reliable, and specific methods to unveil the WRKY gene family at both genome and transcriptome levels in non-model plants.

  6. Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction.

    PubMed

    Stojanova, Daniela; Ceci, Michelangelo; Malerba, Donato; Dzeroski, Saso

    2013-09-26

    Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
Our newly developed method for HMC takes into account network information in the learning phase: when used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks: best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
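
    The notion of network autocorrelation used in this abstract, that interacting genes tend to share annotations, can be illustrated with a rough sketch. It compares the mean Jaccard similarity of label sets across linked gene pairs with the mean across all pairs. The gene names, labels, and edges are hypothetical, and this is not the autocorrelation statistic or the tree-learning procedure used by NHMC.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two label sets, 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_similarity(pairs, labels):
    """Average label-set similarity over the given gene pairs."""
    pairs = list(pairs)
    return sum(jaccard(labels[u], labels[v]) for u, v in pairs) / len(pairs)

# Hypothetical functional labels and interaction edges.
toy_labels = {"g1": {"A"}, "g2": {"A"}, "g3": {"B"}, "g4": {"B"}}
toy_edges = [("g1", "g2"), ("g3", "g4")]

linked = mean_similarity(toy_edges, toy_labels)
background = mean_similarity(combinations(sorted(toy_labels), 2), toy_labels)
```

    When `linked` is clearly larger than `background`, annotations are autocorrelated over the network and the i.i.d. assumption mentioned in the abstract is doubtful for such data.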

  7. Identification of susceptible genes for complex chronic diseases based on disease risk functional SNPs and interaction networks.

    PubMed

    Li, Wan; Zhu, Lina; Huang, Hao; He, Yuehan; Lv, Junjie; Li, Weimin; Chen, Lina; He, Weiming

    2017-10-01

    Complex chronic diseases are caused by the effects of genetic and environmental factors. Single nucleotide polymorphisms (SNPs), a common type of genetic variation, play vital roles in disease. We hypothesized that disease risk functional SNPs in coding regions and protein interaction network modules were more likely to contribute to the identification of disease susceptible genes for complex chronic diseases. This could help to further reveal the pathogenesis of complex chronic diseases. Disease risk SNPs were first recognized from public SNP data for coronary heart disease (CHD), hypertension (HT) and type 2 diabetes (T2D). SNPs in coding regions that were classified into nonsense and missense by integrating several SNP functional annotation databases were treated as functional SNPs. Then, regions significantly associated with each disease were screened using random permutations for disease risk functional SNPs. Corresponding to these regions, 155, 169 and 173 potential disease susceptible genes were identified for CHD, HT and T2D, respectively. A disease-related gene product interaction network in environmental context was constructed for interacting gene products of both disease genes and potential disease susceptible genes for these diseases. After functional enrichment analysis for disease associated modules, 5 CHD susceptible genes, 7 HT susceptible genes and 3 T2D susceptible genes were finally identified, some of which had pleiotropic effects. Most of these genes were verified to be related to these diseases in literature. This was similar for disease genes identified from another method proposed by Lee et al. from a different aspect. This research could provide novel perspectives for diagnosis and treatment of complex chronic diseases and susceptible genes identification for other diseases. Copyright © 2017 Elsevier Inc. All rights reserved.
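
    The abstract does not give details of its random-permutation screen, so the following is only a generic sketch of a permutation test for one region. It assumes each SNP carries two flags, whether it is a disease risk functional SNP and whether it lies in the region, and it shuffles the risk flags to build a null distribution. The function name and parameters are hypothetical.

```python
import random

def permutation_pvalue(is_risk, in_region, n_perm=999, seed=0):
    """Empirical p-value for the count of risk SNPs falling inside a region.

    is_risk, in_region: equal-length lists of 0/1 flags, one entry per SNP.
    The risk flags are shuffled n_perm times; the p-value is the share of
    shuffles with at least as many in-region risk SNPs as observed.
    """
    rng = random.Random(seed)
    observed = sum(r and g for r, g in zip(is_risk, in_region))
    labels = list(is_risk)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        permuted = sum(r and g for r, g in zip(labels, in_region))
        if permuted >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Toy example: 100 SNPs, 5 risk SNPs, all of them inside a 5-SNP region.
risk_flags = [1] * 5 + [0] * 95
region_flags = [1] * 5 + [0] * 95
p_enriched = permutation_pvalue(risk_flags, region_flags)
```

    A region would be retained when its p-value falls below a chosen threshold; with many regions tested, that threshold would normally be corrected for multiple testing.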

  8. Postnatal Cardiac Gene Editing Using CRISPR/Cas9 With AAV9-Mediated Delivery of Short Guide RNAs Results in Mosaic Gene Disruption.

    PubMed

    Johansen, Anne Katrine; Molenaar, Bas; Versteeg, Danielle; Leitoguinho, Ana Rita; Demkes, Charlotte; Spanjaard, Bastiaan; de Ruiter, Hesther; Akbari Moqadam, Farhad; Kooijman, Lieneke; Zentilin, Lorena; Giacca, Mauro; van Rooij, Eva

    2017-10-27

    CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9)-based DNA editing has rapidly evolved as an attractive tool to modify the genome. Although CRISPR/Cas9 has been extensively used to manipulate the germline in zygotes, its application in postnatal gene editing remains incompletely characterized. We aimed to evaluate the feasibility of CRISPR/Cas9-based cardiac genome editing in vivo in postnatal mice. We generated cardiomyocyte-specific Cas9 mice and demonstrated that Cas9 expression does not affect cardiac function or gene expression. As a proof-of-concept, we delivered short guide RNAs targeting 3 genes critical for cardiac physiology, Myh6, Sav1, and Tbx20, using a cardiotropic adeno-associated viral vector 9. Despite a similar degree of DNA disruption and subsequent mRNA downregulation, only disruption of Myh6 was sufficient to induce a cardiac phenotype, irrespective of short guide RNA exposure or the level of Cas9 expression. DNA sequencing analysis revealed target-dependent mutations that were highly reproducible across mice, resulting in differential rates of in- and out-of-frame mutations. Finally, we applied a dual short guide RNA approach to effectively delete an important coding region of Sav1, which increased the editing efficiency. Our results indicate that the effect of postnatal CRISPR/Cas9-based cardiac gene editing using adeno-associated virus serotype 9 to deliver a single short guide RNA is target dependent. We demonstrate a mosaic pattern of gene disruption, which hinders the application of the technology to study gene function. Further studies are required to expand the versatility of CRISPR/Cas9 as a robust tool to study novel cardiac gene functions in vivo. © 2017 American Heart Association, Inc.

  9. A post-gene silencing bioinformatics protocol for plant-defence gene validation and underlying process identification: case study of the Arabidopsis thaliana NPR1.

    PubMed

    Yocgo, Rosita E; Geza, Ephifania; Chimusa, Emile R; Mazandu, Gaston K

    2017-11-23

    Advances in forward and reverse genetic techniques have enabled the discovery and identification of several plant defence genes based on quantifiable disease phenotypes in mutant populations. Existing models for testing the effect of gene inactivation or genes causing these phenotypes do not take into account eventual uncertainty of these datasets and potential noise inherent in the biological experiment used, which may mask downstream analysis and limit the use of these datasets. Moreover, elucidating biological mechanisms driving the induced disease resistance and influencing these observable disease phenotypes has never been systematically tackled, eliciting the need for an efficient model to characterize completely the gene target under consideration. We developed a post-gene silencing bioinformatics (post-GSB) protocol which accounts for potential biases related to the disease phenotype datasets in assessing the contribution of the gene target to the plant defence response. The post-GSB protocol uses Gene Ontology semantic similarity and pathway dataset to generate enriched process regulatory network based on the functional degeneracy of the plant proteome to help understand the induced plant defence response. We applied this protocol to investigate the effect of the NPR1 gene silencing to changes in Arabidopsis thaliana plants following Pseudomonas syringae pathovar tomato strain DC3000 infection. Results indicated that the presence of a functionally active NPR1 reduced the plant's susceptibility to the infection, with about 99% of variability in Pseudomonas spore growth between npr1 mutant and wild-type samples. Moreover, the post-GSB protocol has revealed the coordinate action of target-associated genes and pathways through an enriched process regulatory network, summarizing the potential target-based induced disease resistance mechanism. 
This protocol can improve the characterization of the gene target and, potentially, elucidate induced defence response by more effectively utilizing available phenotype information and plant proteome functional knowledge.

  10. Functional networks inference from rule-based machine learning models.

    PubMed

    Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume

    2016-01-01

    Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables that are often different from, or complementary to, those found by similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes that genes used together within a rule-based machine learning model to classify the samples might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in the context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process.
The implementation of our network inference protocol is available at: http://ico2s.org/software/funel.html.
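
    The core assumption stated in this abstract, that genes used together in a classification rule may be functionally related, can be sketched as a simple co-occurrence count. This is an illustrative simplification, not the FuNeL implementation linked above; the rule contents and gene names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(rules):
    """Build weighted edges from rules, each given as a collection of gene names.

    Two genes are linked whenever they appear in the same rule; the edge
    weight counts how many rules contain both genes.
    """
    edges = Counter()
    for genes in rules:
        # Sort so each undirected edge has one canonical (a, b) key.
        for a, b in combinations(sorted(set(genes)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical rule set: each rule lists the genes it tests.
toy_rules = [
    {"geneA", "geneB"},
    {"geneA", "geneB", "geneC"},
    {"geneC", "geneD"},
]
network = cooccurrence_network(toy_rules)
```

    Unlike a co-expression network, the edges here depend on which genes a classifier found jointly useful, not on similar expression levels, which is the complementarity the abstract emphasises.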

  11. Functional Gene Diversity and Metabolic Potential of the Microbial Community in an Estuary-Shelf Environment

    PubMed Central

    Wang, Yu; Zhang, Rui; He, Zhili; Van Nostrand, Joy D.; Zheng, Qiang; Zhou, Jizhong; Jiao, Nianzhi

    2017-01-01

    Microbes play crucial roles in various biogeochemical processes in the ocean, including carbon (C), nitrogen (N), and phosphorus (P) cycling. Functional gene diversity and the structure of the microbial community determines its metabolic potential and therefore its ecological function in the marine ecosystem. However, little is known about the functional gene composition and metabolic potential of bacterioplankton in estuary areas. The East China Sea (ECS) is a dynamic marginal ecosystem in the western Pacific Ocean that is mainly affected by input from the Changjiang River and the Kuroshio Current. Here, using a high-throughput functional gene microarray (GeoChip), we analyzed the functional gene diversity, composition, structure, and metabolic potential of microbial assemblages in different ECS water masses. Four water masses determined by temperature and salinity relationship showed different patterns of functional gene diversity and composition. Generally, functional gene diversity [Shannon–Weaner’s H and reciprocal of Simpson’s 1/(1-D)] in the surface water masses was higher than that in the bottom water masses. The different presence and proportion of functional genes involved in C, N, and P cycling among the bacteria of the different water masses showed different metabolic preferences of the microbial populations in the ECS. Genes involved in starch metabolism (amyA and nplT) showed higher proportion in microbial communities of the surface water masses than of the bottom water masses. In contrast, a higher proportion of genes involved in chitin degradation was observed in microorganisms of the bottom water masses. Moreover, we found a higher proportion of nitrogen fixation (nifH), transformation of hydroxylamine to nitrite (hao) and ammonification (gdh) genes in the microbial communities of the bottom water masses compared with those of the surface water masses. 
The spatial variation of microbial functional genes was significantly correlated with salinity, temperature, and chlorophyll based on canonical correspondence analysis, suggesting a significant influence of hydrologic conditions on water microbial communities. Our data provide new insights into better understanding of the functional potential of microbial communities in the complex estuarine-coastal environmental gradient of the ECS. PMID:28680420
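
    The two diversity indices named in this abstract can be computed from per-gene signal or count values as sketched below. The sketch assumes that D in "1/(1-D)" denotes the Gini-Simpson index (one minus the sum of squared proportions), so that the reciprocal index equals one over the sum of squared proportions; that reading is an assumption, and the input values are toy numbers.

```python
from math import log

def shannon_h(counts):
    """Shannon index H = -sum(p * ln p) over categories with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def inverse_simpson(counts):
    """Reciprocal Simpson index, 1 / sum(p^2).

    Equal to 1/(1-D) if D is taken to be the Gini-Simpson index (assumption).
    """
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts)

# Toy abundances for two hypothetical water masses.
surface = [10, 10, 10, 10]   # evenly spread across four genes
bottom = [37, 1, 1, 1]       # dominated by a single gene
surface_h, bottom_h = shannon_h(surface), shannon_h(bottom)
surface_inv, bottom_inv = inverse_simpson(surface), inverse_simpson(bottom)
```

    Both indices are higher for the evenly spread sample, matching the direction of the comparison the abstract draws between surface and bottom water masses.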

  12. Developing a Bacteroides System for Function-Based Screening of DNA from the Human Gut Microbiome.

    PubMed

    Lam, Kathy N; Martens, Eric C; Charles, Trevor C

    2018-01-01

    Functional metagenomics is a powerful method that allows the isolation of genes whose role may not have been predicted from DNA sequence. In this approach, first, environmental DNA is cloned to generate metagenomic libraries that are maintained in Escherichia coli, and second, the cloned DNA is screened for activities of interest. Typically, functional screens are carried out using E. coli as a surrogate host, although there likely exist barriers to gene expression, such as lack of recognition of native promoters. Here, we describe efforts to develop Bacteroides thetaiotaomicron as a surrogate host for screening metagenomic DNA from the human gut. We construct a B. thetaiotaomicron-compatible fosmid cloning vector, generate a fosmid clone library using DNA from the human gut, and show successful functional complementation of a B. thetaiotaomicron glycan utilization mutant. Though we were unable to retrieve the physical fosmid after complementation, we used genome sequencing to identify the complementing genes derived from the human gut microbiome. Our results demonstrate that the use of B. thetaiotaomicron to express metagenomic DNA is promising, but they also exemplify the challenges that can be encountered in the development of new surrogate hosts for functional screening. IMPORTANCE Human gut microbiome research has been supported by advances in DNA sequencing that make it possible to obtain gigabases of sequence data from metagenomes but is limited by a lack of knowledge of gene function that leads to incomplete annotation of these data sets. There is a need for the development of methods that can provide experimental data regarding microbial gene function. Functional metagenomics is one such method, but functional screens are often carried out using hosts that may not be able to express the bulk of the environmental DNA being screened. 
We expand the range of current screening hosts and demonstrate that human gut-derived metagenomic libraries can be introduced into the gut microbe Bacteroides thetaiotaomicron to identify genes based on activity screening. Our results support the continuing development of genetically tractable systems to obtain information about gene function.

  13. Nmf9 Encodes a Highly Conserved Protein Important to Neurological Function in Mice and Flies.

    PubMed

    Zhang, Shuxiao; Ross, Kevin D; Seidner, Glen A; Gorman, Michael R; Poon, Tiffany H; Wang, Xiaobo; Keithley, Elizabeth M; Lee, Patricia N; Martindale, Mark Q; Joiner, William J; Hamilton, Bruce A

    2015-07-01

    Many protein-coding genes identified by genome sequencing remain without functional annotation or biological context. Here we define a novel protein-coding gene, Nmf9, based on a forward genetic screen for neurological function. ENU-induced and genome-edited null mutations in mice produce deficits in vestibular function, fear learning and circadian behavior, which correlated with Nmf9 expression in inner ear, amygdala, and suprachiasmatic nuclei. Homologous genes from unicellular organisms and invertebrate animals predict interactions with small GTPases, but the corresponding domains are absent in mammalian Nmf9. Intriguingly, homozygotes for null mutations in the Drosophila homolog, CG45058, show profound locomotor defects and premature death, while heterozygotes show striking effects on sleep and activity phenotypes. These results link a novel gene orthology group to discrete neurological functions, and show conserved requirement across wide phylogenetic distance and domain level structural changes.

  14. The cld mutation: narrowing the critical chromosomal region and selecting candidate genes.

    PubMed

    Péterfy, Miklós; Mao, Hui Z; Doolittle, Mark H

    2006-10-01

    Combined lipase deficiency (cld) is a recessive, lethal mutation specific to the tw73 haplotype on mouse Chromosome 17. While the cld mutation results in lipase proteins that are inactive, aggregated, and retained in the endoplasmic reticulum (ER), it maps separately from the lipase structural genes. We have narrowed the gene critical region by about 50% using the tw18 haplotype for deletion mapping and a recombinant chromosome used originally to map cld with respect to the phenotypic marker tf. The region now extends from 22 to 25.6 Mbp on the wild-type chromosome, currently containing 149 genes and 50 expressed sequence tags (ESTs). To identify the affected gene, we have selected candidates based on their known role in associated biological processes, cellular components, and molecular functions that best fit with the predicted function of the cld gene. A secondary approach was based on differences in mRNA levels between mutant (cld/cld) and unaffected (+/cld) cells. Using both approaches, we have identified seven functional candidates with an ER localization and/or an involvement in protein maturation and folding that could explain the lipase deficiency, and six expression candidates that exhibit large differences in mRNA levels between mutant and unaffected cells. Significantly, two genes were found to be candidates with regard to both function and expression, thus emerging as the strongest candidates for cld. We discuss the implications of our mapping results and our selection of candidates with respect to other genes, deletions, and mutations occurring in the cld critical region.

  15. Transcriptome Sequencing Revealed Significant Alteration of Cortical Promoter Usage and Splicing in Schizophrenia

    PubMed Central

    Wu, Jing Qin; Wang, Xi; Beveridge, Natalie J.; Tooney, Paul A.; Scott, Rodney J.; Carr, Vaughan J.; Cairns, Murray J.

    2012-01-01

    Background: While hybridization based analysis of the cortical transcriptome has provided important insight into the neuropathology of schizophrenia, it represents a restricted view of disease-associated gene activity based on predetermined probes. By contrast, sequencing technology can provide un-biased analysis of transcription at nucleotide resolution. Here we use this approach to investigate schizophrenia-associated cortical gene expression. Methodology/Principal Findings: The data was generated from 76 bp reads of RNA-Seq, aligned to the reference genome and assembled into transcripts for quantification of exons, splice variants and alternative promoters in postmortem superior temporal gyrus (STG/BA22) from 9 male subjects with schizophrenia and 9 matched non-psychiatric controls. Differentially expressed genes were then subjected to further sequence and functional group analysis. The output, amounting to more than 38 Gb of sequence, revealed significant alteration of gene expression including many previously shown to be associated with schizophrenia. Gene ontology enrichment analysis followed by functional map construction identified three functional clusters highly relevant to schizophrenia including neurotransmission related functions, synaptic vesicle trafficking, and neural development. Significantly, more than 2000 genes displayed schizophrenia-associated alternative promoter usage and more than 1000 genes showed differential splicing (FDR<0.05). Both types of transcriptional isoforms were exemplified by reads aligned to the neurodevelopmentally significant doublecortin-like kinase 1 (DCLK1) gene. Conclusions: This study provided the first deep and un-biased analysis of schizophrenia-associated transcriptional diversity within the STG, and revealed variants with important implications for the complex pathophysiology of schizophrenia. PMID:22558445

  16. An integrative approach for measuring semantic similarities using gene ontology.

    PubMed

    Peng, Jiajie; Li, Hongxiang; Jiang, Qinghua; Wang, Yadong; Chen, Jin

    2014-01-01

    Gene Ontology (GO) provides rich information and a convenient way to study gene functional similarity, which has been successfully used in various applications. However, the existing GO-based similarity measurements are limited because each measure considers only a subset of GO information. An appropriate integration of the existing measures that takes more of the information in GO into account is needed. We propose a novel integrative measure called InteGO2 to automatically select appropriate seed measures and then to integrate them using a metaheuristic search method. The experiment results show that InteGO2 significantly improves the performance of gene similarity in human, Arabidopsis and yeast on both molecular function and biological process GO categories. InteGO2 computes gene-to-gene similarities more accurately than tested existing measures and has high robustness. The supplementary document and software are available at http://mlg.hit.edu.cn:8082/.

  17. Genome-wide analysis of the GRAS gene family in physic nut (Jatropha curcas L.).

    PubMed

    Wu, Z Y; Wu, P Z; Chen, Y P; Li, M R; Wu, G J; Jiang, H W

    2015-12-29

    GRAS proteins play vital roles in plant growth and development. Physic nut (Jatropha curcas L.) was found to have a total of 48 GRAS family members (JcGRAS), 15 more than those found in Arabidopsis. The JcGRAS genes were divided into 12 subfamilies or 15 ancient monophyletic lineages based on the phylogenetic analysis of GRAS proteins from both flowering and lower plants. The functions of GRAS genes in 9 subfamilies have been reported previously for several plants, while the genes in the remaining 3 subfamilies were of unknown function; we named the latter families U1 to U3. No member of U3 subfamily is present in Arabidopsis and Poaceae species according to public genome sequence data. In comparison with the number of GRAS genes in Arabidopsis, more were detected in physic nut, resulting from the retention of many ancient GRAS subfamilies and the formation of tandem repeats during evolution. No evidence of recent duplication among JcGRAS genes was observed in physic nut. Based on digital gene expression data, 21 of the 48 genes exhibited differential expression in four tissues analyzed. Two members of subfamily U3 were expressed only in buds and flowers, implying that they may play specific roles. Our results provide valuable resources for future studies on the functions of GRAS proteins in physic nut.

  18. Gene regulatory network identification from the yeast cell cycle based on a neuro-fuzzy system.

    PubMed

    Wang, B H; Lim, J W; Lim, J S

    2016-08-30

    Many studies exist for reconstructing gene regulatory networks (GRNs). In this paper, we propose a method based on an advanced neuro-fuzzy system for gene regulatory network reconstruction from microarray time-series data. This approach uses a neural network with a weighted fuzzy function to model the relationships between genes. Fuzzy rules, which determine the regulators of genes, are greatly simplified through this method. Additionally, a regulator selection procedure is proposed, which extracts the exact dynamic relationship between genes, using the information obtained from the weighted fuzzy function. Time-series related features are extracted from the original data to employ the characteristics of temporal data that are useful for accurate GRN reconstruction. The microarray dataset of the yeast cell cycle was used for our study. We measured the mean squared prediction error for the efficiency of the proposed approach and evaluated the accuracy in terms of precision, sensitivity, and F-score. The proposed method outperformed the other existing approaches.
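
    The evaluation measures listed in this abstract can be sketched for a reconstructed network as follows. The sketch assumes predicted and reference regulatory links are represented as sets of (regulator, target) pairs; the function names and the toy edges are hypothetical and the neuro-fuzzy model itself is not reproduced.

```python
def edge_metrics(predicted, reference):
    """Precision, sensitivity (recall) and F-score for predicted regulatory edges."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    sensitivity = true_pos / len(reference) if reference else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_score

def mean_squared_error(observed, predicted):
    """Mean squared prediction error between two equal-length series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# Toy edges, written as (regulator, target) pairs.
toy_predicted = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
toy_reference = [("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
precision, sensitivity, f_score = edge_metrics(toy_predicted, toy_reference)
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

    The edge metrics score the recovered network structure, while the mean squared error scores how well the fitted model predicts expression values, so the two address different aspects of the evaluation.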

  19. Translatomics combined with transcriptomics and proteomics reveals novel functional, recently evolved orphan genes in Escherichia coli O157:H7 (EHEC).

    PubMed

    Neuhaus, Klaus; Landstorfer, Richard; Fellner, Lea; Simon, Svenja; Schafferhans, Andrea; Goldberg, Tatyana; Marx, Harald; Ozoline, Olga N; Rost, Burkhard; Kuster, Bernhard; Keim, Daniel A; Scherer, Siegfried

    2016-02-24

    Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other Enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions, suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.

  20. Microbial community structure in fermentation process of Shaoxing rice wine by Illumina-based metagenomic sequencing.

    PubMed

    Xie, Guangfa; Wang, Lan; Gao, Qikang; Yu, Wenjing; Hong, Xutao; Zhao, Lingyun; Zou, Huijun

    2013-09-01

    To understand the role of the community structure of microbes in the environment in the fermentation of Shaoxing rice wine, samples collected from a wine factory were subjected to Illumina-based metagenomic sequencing. De novo assembly of the sequencing reads allowed the characterisation of more than 23 thousand microbial genes derived from 1.7 and 1.88 Gbp of sequences from two samples fermented for 5 and 30 days respectively. The microbial community structure at different fermentation times of Shaoxing rice wine was revealed, showing the different roles of the microbiota in the fermentation process of Shaoxing rice wine. The gene function of both samples was also studied in the COG database, with most genes belonging to category S (function unknown), category E (amino acid transport and metabolism) and unclassified group. The results show that both the microbial community structure and gene function composition change greatly at different time points of Shaoxing rice wine fermentation. © 2013 Society of Chemical Industry.

  1. Eleven loci with new reproducible genetic associations with allergic disease risk.

    PubMed

    Ferreira, Manuel A R; Vonk, Judith M; Baurecht, Hansjörg; Marenholz, Ingo; Tian, Chao; Hoffman, Joshua D; Helmer, Quinta; Tillander, Annika; Ullemar, Vilhelmina; Lu, Yi; Rüschendorf, Franz; Hinds, David A; Hübner, Norbert; Weidinger, Stephan; Magnusson, Patrik K E; Jorgenson, Eric; Lee, Young-Ae; Boomsma, Dorret I; Karlsson, Robert; Almqvist, Catarina; Koppelman, Gerard H; Paternoster, Lavinia

    2018-04-19

    A recent genome-wide association study (GWAS) identified 99 loci that contain genetic risk variants shared between asthma, hay fever, and eczema. Many more risk loci shared between these common allergic diseases remain to be discovered, which could point to new therapeutic opportunities. We sought to identify novel risk loci shared between asthma, hay fever, and eczema by applying a gene-based test of association to results from a published GWAS that included data from 360,838 subjects. We used approximate conditional analysis to adjust the results from the published GWAS for the effects of the top risk variants identified in that study. We then analyzed the adjusted GWAS results with the EUGENE gene-based approach, which combines evidence for association with disease risk across regulatory variants identified in different tissues. Novel gene-based associations were followed up in an independent sample of 233,898 subjects from the UK Biobank study. Of the 19,432 genes tested, 30 had a significant gene-based association at a Bonferroni-corrected P value of 2.5 × 10^-6. Of these, 20 were also significantly associated (P < .05/30 = .0016) with disease risk in the replication sample, including 19 that were located in 11 loci not reported to contain allergy risk variants in previous GWASs. Among these were 9 genes with a known function that is directly relevant to allergic disease: FOSL2, VPRBP, IPCEF1, PRR5L, NCF4, APOBR, IL27, ATXN2L, and LAT. For 4 genes (eg, ATXN2L), a genetically determined decrease in gene expression was associated with decreased allergy risk, and therefore drugs that inhibit gene expression or function are predicted to ameliorate disease symptoms. The opposite directional effect was observed for 14 genes, including IL27, a cytokine known to suppress TH2 responses. Using a gene-based approach, we identified 11 risk loci for allergic disease that were not reported in previous GWASs. 
Functional studies that investigate the contribution of the 19 associated genes to the pathophysiology of allergic disease and assess their therapeutic potential are warranted. Copyright © 2018 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
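
    The two significance thresholds quoted in this abstract follow from a Bonferroni correction, which divides the family-wise error rate by the number of tests. The sketch below reproduces that arithmetic. The function name is hypothetical, and the family-wise alpha of 0.05 at the discovery stage is an assumption (the abstract states .05 only for the replication stage). The computed values are roughly 2.57 × 10^-6 and 0.00167, which the abstract reports as 2.5 × 10^-6 and .0016.

```python
# Bonferroni correction: per-test threshold = family-wise alpha / number of tests.
# The function name is hypothetical; alpha = 0.05 for discovery is an assumption.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold for n_tests comparisons."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05

discovery = bonferroni_threshold(ALPHA, 19_432)  # 19,432 genes tested
replication = bonferroni_threshold(ALPHA, 30)    # 30 genes followed up

print(f"discovery threshold:   {discovery:.3e}")
print(f"replication threshold: {replication:.5f}")
```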

  2. Planting increases the abundance and structure complexity of soil core functional genes relevant to carbon and nitrogen cycling

    PubMed Central

    Wang, Feng; Liang, Yuting; Jiang, Yuji; Yang, Yunfeng; Xue, Kai; Xiong, Jinbo; Zhou, Jizhong; Sun, Bo

    2015-01-01

    Plants have an important impact on soil microbial communities and their functions. However, how plants determine the microbial composition and network interactions is still poorly understood. During a four-year field experiment, we investigated the functional gene composition of three types of soils (Phaeozem, Cambisols and Acrisol) under maize planting and bare fallow regimes located in cold temperate, warm temperate and subtropical regions, respectively. The core genes were identified using high-throughput functional gene microarray (GeoChip 3.0), and functional molecular ecological networks (fMENs) were subsequently developed with the random matrix theory (RMT)-based conceptual framework. Our results demonstrated that planting significantly (P < 0.05) increased the gene alpha-diversity in terms of richness and Shannon – Simpson’s indexes for all three types of soils and 83.5% of microbial alpha-diversity can be explained by the plant factor. Moreover, planting had significant impacts on the microbial community structure and the network interactions of the microbial communities. The calculated network complexity was higher under maize planting than under bare fallow regimes. The increase of the functional genes led to an increase in both soil respiration and nitrification potential with maize planting, indicating that changes in the soil microbial communities and network interactions influenced ecological functioning. PMID:26396042
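
    The alpha-diversity measures named in this abstract (richness, Shannon and Simpson indexes) can be computed from a vector of per-gene abundances. The sketch below is a minimal illustration, not the authors' GeoChip pipeline: the function names and the example intensities are hypothetical, and the Simpson index is taken in its Gini-Simpson form (1 minus the sum of squared proportions), which is an assumption about the variant used.

```python
import math

# Hypothetical helpers; abundances stand in for per-gene signal intensities.

def richness(abundances):
    """Number of genes detected (nonzero abundance)."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over detected genes."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def simpson_index(abundances):
    """Gini-Simpson index 1 - sum(p_i^2); the variant is an assumption."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return 1.0 - sum((a / total) ** 2 for a in abundances)


# Made-up example: a more even, richer profile scores higher on all three.
bare_fallow = [90, 5, 5, 0]
maize = [25, 25, 25, 25]

for name, sample in (("bare fallow", bare_fallow), ("maize", maize)):
    print(name, richness(sample), round(shannon_index(sample), 3),
          round(simpson_index(sample), 3))
```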

  3. Identification of candidate MLO powdery mildew susceptibility genes in cultivated Solanaceae and functional characterization of tobacco NtMLO1.

    PubMed

    Appiano, Michela; Pavan, Stefano; Catalano, Domenico; Zheng, Zheng; Bracuto, Valentina; Lotti, Concetta; Visser, Richard G F; Ricciardi, Luigi; Bai, Yuling

    2015-10-01

    Specific homologs of the plant Mildew Locus O (MLO) gene family act as susceptibility factors towards the powdery mildew (PM) fungal disease, causing significant economic losses in agricultural settings. Thus, in order to obtain PM resistant phenotypes, a general breeding strategy has been proposed, based on the selective inactivation of MLO susceptibility genes across cultivated species. In this study, PCR-based methodologies were used in order to isolate MLO genes from cultivated solanaceous crops that are hosts for PM fungi, namely eggplant, potato and tobacco, which were named SmMLO1, StMLO1 and NtMLO1, respectively. Based on phylogenetic analysis and sequence alignment, these genes were predicted to be orthologs of tomato SlMLO1 and pepper CaMLO2, previously shown to be required for PM pathogenesis. Full-length sequence of the tobacco homolog NtMLO1 was used for a heterologous transgenic complementation assay, resulting in its characterization as a PM susceptibility gene. The same assay showed that a single nucleotide change in a mutated NtMLO1 allele leads to complete gene loss-of-function. Results here presented, also including a complete overview of the tobacco and potato MLO gene families, are valuable to study MLO gene evolution in Solanaceae and for molecular breeding approaches aimed at introducing PM resistance using strategies of reverse genetics.

  4. Methodology for the inference of gene function from phenotype data.

    PubMed

    Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A

    2014-12-12

    Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. 
As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.

  5. Identification of functional differences in metabolic networks using comparative genomics and constraint-based models.

    PubMed

    Hamilton, Joshua J; Reed, Jennifer L

    2012-01-01

    Genome-scale network reconstructions are useful tools for understanding cellular metabolism, and comparisons of such reconstructions can provide insight into metabolic differences between organisms. Recent efforts toward comparing genome-scale models have focused primarily on aligning metabolic networks at the reaction level and then looking at differences and similarities in reaction and gene content. However, these reaction comparison approaches are time-consuming and do not identify the effect network differences have on the functional states of the network. We have developed a bilevel mixed-integer programming approach, CONGA, to identify functional differences between metabolic networks by comparing network reconstructions aligned at the gene level. We first identify orthologous genes across two reconstructions and then use CONGA to identify conditions under which differences in gene content give rise to differences in metabolic capabilities. By seeking genes whose deletion in one or both models disproportionately changes flux through a selected reaction (e.g., growth or by-product secretion) in one model over another, we are able to identify structural metabolic network differences enabling unique metabolic capabilities. Using CONGA, we explore functional differences between two metabolic reconstructions of Escherichia coli and identify a set of reactions responsible for chemical production differences between the two models. We also use this approach to aid in the development of a genome-scale model of Synechococcus sp. PCC 7002. Finally, we propose potential antimicrobial targets in Mycobacterium tuberculosis and Staphylococcus aureus based on differences in their metabolic capabilities. Through these examples, we demonstrate that a gene-centric approach to comparing metabolic networks allows for a rapid comparison of metabolic models at a functional level. 
Using CONGA, we can identify differences in reaction and gene content which give rise to different functional predictions. Because CONGA provides a general framework, it can be applied to find functional differences across models and biological systems beyond those presented here.

  6. Identification of Functional Differences in Metabolic Networks Using Comparative Genomics and Constraint-Based Models

    PubMed Central

    Hamilton, Joshua J.; Reed, Jennifer L.

    2012-01-01

    Genome-scale network reconstructions are useful tools for understanding cellular metabolism, and comparisons of such reconstructions can provide insight into metabolic differences between organisms. Recent efforts toward comparing genome-scale models have focused primarily on aligning metabolic networks at the reaction level and then looking at differences and similarities in reaction and gene content. However, these reaction comparison approaches are time-consuming and do not identify the effect network differences have on the functional states of the network. We have developed a bilevel mixed-integer programming approach, CONGA, to identify functional differences between metabolic networks by comparing network reconstructions aligned at the gene level. We first identify orthologous genes across two reconstructions and then use CONGA to identify conditions under which differences in gene content give rise to differences in metabolic capabilities. By seeking genes whose deletion in one or both models disproportionately changes flux through a selected reaction (e.g., growth or by-product secretion) in one model over another, we are able to identify structural metabolic network differences enabling unique metabolic capabilities. Using CONGA, we explore functional differences between two metabolic reconstructions of Escherichia coli and identify a set of reactions responsible for chemical production differences between the two models. We also use this approach to aid in the development of a genome-scale model of Synechococcus sp. PCC 7002. Finally, we propose potential antimicrobial targets in Mycobacterium tuberculosis and Staphylococcus aureus based on differences in their metabolic capabilities. Through these examples, we demonstrate that a gene-centric approach to comparing metabolic networks allows for a rapid comparison of metabolic models at a functional level. 
Using CONGA, we can identify differences in reaction and gene content which give rise to different functional predictions. Because CONGA provides a general framework, it can be applied to find functional differences across models and biological systems beyond those presented here. PMID:22666308

  7. Annotation of gene function in citrus using gene expression information and co-expression networks

    PubMed Central

    2014-01-01

    Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. 
Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provides opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for the gene co-expression analysis in citrus. PMID:25023870
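
    The "guilt-by-association" principle and the guide-gene approach described above can be illustrated with a toy co-expression network. This is a minimal sketch, not the NICCE implementation: it assumes Pearson correlation as the co-expression measure and a fixed cut-off of 0.9, and the gene names, profiles and function names are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.9):
    """Link every gene pair whose correlation reaches the (assumed) cut-off."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


def coexpressed_with(guide, edges):
    """Guide-gene query: neighbours of a gene of known function."""
    return sorted(b if a == guide else a for a, b, _ in edges if guide in (a, b))


# Made-up expression values across five conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
}
edges = coexpression_edges(profiles)
print(coexpressed_with("geneA", edges))
```

    Genes returned for a guide gene of known function become candidates for a related function; real analyses would add clustering and enrichment testing on top of this step.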

  8. A method to identify differential expression profiles of time-course gene data with Fourier transformation.

    PubMed

    Kim, Jaehee; Ogden, Robert Todd; Kim, Haseong

    2013-10-18

    Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. 
The proposed method is general and can potentially be used to identify genes that share the same patterns or biological processes, and to help meet the present and forthcoming challenges of data analysis in functional genomics.
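
    Step (i) of the methodology, representing a profile in the Fourier domain, can be sketched with a plain discrete Fourier transform. The code below is an illustration under stated assumptions, not the authors' procedure: it assumes evenly spaced time points, uses hypothetical function names, and replaces the FDR-controlled screening test with a simple energy score over the non-constant coefficients.

```python
import cmath
import math


def fourier_coefficients(profile, n_harmonics):
    """Normalised DFT coefficients c_0..c_K of an evenly sampled profile."""
    n = len(profile)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(profile)) / n
        for k in range(n_harmonics + 1)
    ]


def periodic_energy(profile, n_harmonics=2):
    """Sum of squared magnitudes of the non-constant coefficients.

    A flat (non-differential) profile scores ~0; this score is a stand-in
    for the statistical screening test described in the abstract.
    """
    coeffs = fourier_coefficients(profile, n_harmonics)
    return sum(abs(c) ** 2 for c in coeffs[1:])


# Made-up profiles sampled at 8 evenly spaced time points.
flat = [5.0] * 8
cyclic = [5.0 + 2.0 * math.sin(2 * math.pi * t / 8) for t in range(8)]

print(periodic_energy(flat), periodic_energy(cyclic))
```

    Genes whose score stays near zero would be screened out, and the retained coefficient vectors could then be passed to a clustering step.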

  9. Reconstruction of a Functional Human Gene Network, with an Application for Prioritizing Positional Candidate Genes

    PubMed Central

    Franke, Lude; Bakel, Harm van; Fokkens, Like; de Jong, Edwin D.; Egmont-Petersen, Michael; Wijmenga, Cisca

    2006-01-01

    Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray coexpressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. 
This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown. PMID:16685651

  10. Solexa-Sequencing Based Transcriptome Study of Plaice Skin Phenotype in Rex Rabbits (Oryctolagus cuniculus)

    PubMed Central

    Pan, Lei; Liu, Yan; Wei, Qiang; Xiao, Chenwen; Ji, Quanan; Bao, Guolian; Wu, Xinsheng

    2015-01-01

    Background Fur is an important genetically-determined characteristic of domestic rabbits; rabbit furs are of great economic value. We used the Solexa sequencing technology to assess gene expression in skin tissues from full-sib Rex rabbits of different phenotypes in order to explore the molecular mechanisms associated with fur determination. Methodology/Principal Findings Transcriptome analysis included de novo assembly, gene function identification, and gene function classification and enrichment. We obtained 74,032,912 and 71,126,891 short reads of 100 nt, which were assembled into 377,618 unique sequences by Trinity strategy (N50=680 nt). Based on BLAST results with known proteins, 50,228 sequences were identified at a cut-off E-value ≥ 10^-5. Using Blast to Gene Ontology (GO), Clusters of Orthologous Groups (KOG) and Kyoto Encyclopedia of Genes and Genomes (KEGG), we obtained several genes with important protein functions. A total of 308 differentially expressed genes were obtained by transcriptome analysis of plaice and un-plaice phenotype animals; 209 additional differentially expressed genes were not found in any database. These genes included 49 that were only expressed in plaice skin rabbits. The novel genes may play important roles during skin growth and development. In addition, 99 known differentially expressed genes were assigned to PI3K-Akt signaling, focal adhesion, and ECM-receptor interaction, among others. Growth factors play a role in skin growth and development by regulating these signaling pathways. We confirmed the altered expression levels of seven target genes by qRT-PCR. We also chose a key gene for SNP analysis to identify differences between plaice and un-plaice phenotype rabbits. Conclusions/Significance The rabbit transcriptome profiling data provide new insights in understanding the molecular mechanisms underlying rabbit skin growth and development. PMID:25955442
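
    The N50 statistic quoted for the assembly (680 nt) is the contig length at which the longest contigs, taken in descending order, first cover at least half of the assembled bases. The sketch below shows that standard definition; the function name and the example lengths are hypothetical and unrelated to the rabbit data.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths (nt): total 1500, so half is 750;
# 500 + 400 = 900 is the first running total to reach it.
print(n50([100, 200, 300, 400, 500]))  # prints 400
```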

  11. A novel gene expression-based prognostic scoring system to predict survival in gastric cancer

    DOE PAGES

    Wang, Pin; Wang, Yunshan; Hang, Bo; ...

    2016-07-11

    Analysis of gene expression patterns in gastric cancer (GC) can help to identify a comprehensive panel of gene biomarkers for predicting clinical outcomes and to discover potential new therapeutic targets. Here, a multi-step bioinformatics analytic approach was developed to establish a novel prognostic scoring system for GC. We first identified 276 genes that were robustly differentially expressed between normal and GC tissues, of which, 249 were found to be significantly associated with overall survival (OS) by univariate Cox regression analysis. The biological functions of 249 genes are related to cell cycle, RNA/ncRNA process, acetylation and extracellular matrix organization. A network was generated for view of the gene expression architecture of 249 genes in 265 GCs. Finally, we applied a canonical discriminant analysis approach to identify a 53-gene signature and a prognostic scoring system was established based on a canonical discriminant function of 53 genes. The prognostic scores strongly predicted patients with GC to have either a poor or good OS. Our study raises the prospect that the practicality of GC patient prognosis can be assessed by this prognostic scoring system.

  12. Analyses of the NAC transcription factor gene family in Gossypium raimondii Ulbr.: chromosomal location, structure, phylogeny, and expression patterns.

    PubMed

    Shang, Haihong; Li, Wei; Zou, Changsong; Yuan, Youlu

    2013-07-01

    NAC domain proteins are plant-specific transcription factors known to play diverse roles in various plant developmental processes. In the present study, we performed the first comprehensive study of the NAC gene family in Gossypium raimondii Ulbr., incorporating phylogenetic, chromosomal location, gene structure, conserved motif, and expression profiling analyses. We identified 145 NAC transcription factor (NAC-TF) genes that were phylogenetically clustered into 18 distinct subfamilies. Of these, 127 NAC-TF genes were distributed across the 13 chromosomes, 80 (55%) were preferentially retained duplicates located in both duplicated regions and six were located in triplicated chromosomal regions. The majority of NAC-TF genes showed temporal-, spatial-, and tissue-specific expression patterns based on transcriptomic and qRT-PCR analyses. However, the expression patterns of several duplicate genes were partially redundant, suggesting the occurrence of sub-functionalization during their evolution. Based on their genomic organization, we concluded that genomic duplications contributed significantly to the expansion of the NAC-TF gene family in G. raimondii. Comprehensive analysis of their expression profiles could provide novel insights into the functional divergence among members of the NAC gene family in G. raimondii. © 2013 Institute of Botany, Chinese Academy of Sciences.

  13. A novel gene expression-based prognostic scoring system to predict survival in gastric cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Pin; Wang, Yunshan; Hang, Bo

    Analysis of gene expression patterns in gastric cancer (GC) can help to identify a comprehensive panel of gene biomarkers for predicting clinical outcomes and to discover potential new therapeutic targets. Here, a multi-step bioinformatics analytic approach was developed to establish a novel prognostic scoring system for GC. We first identified 276 genes that were robustly differentially expressed between normal and GC tissues, of which, 249 were found to be significantly associated with overall survival (OS) by univariate Cox regression analysis. The biological functions of 249 genes are related to cell cycle, RNA/ncRNA process, acetylation and extracellular matrix organization. A network was generated for view of the gene expression architecture of 249 genes in 265 GCs. Finally, we applied a canonical discriminant analysis approach to identify a 53-gene signature and a prognostic scoring system was established based on a canonical discriminant function of 53 genes. The prognostic scores strongly predicted patients with GC to have either a poor or good OS. Our study raises the prospect that the practicality of GC patient prognosis can be assessed by this prognostic scoring system.

  14. Gene Knockout by Targeted Mutagenesis in a Hemimetabolous Insect, the Two-Spotted Cricket Gryllus bimaculatus, using TALENs.

    PubMed

    Watanabe, Takahito; Noji, Sumihare; Mito, Taro

    2016-01-01

    Hemimetabolous, or incompletely metamorphosing, insects are phylogenetically basal. These insects include many deleterious species. The cricket, Gryllus bimaculatus, is an emerging model for hemimetabolous insects, based on the success of RNA interference (RNAi)-based gene-functional analyses and transgenic technology. Taking advantage of genome-editing technologies in this species would greatly promote functional genomics studies. Genome editing using transcription activator-like effector nucleases (TALENs) has proven to be an effective method for site-specific genome manipulation in various species. TALENs are artificial nucleases that are capable of inducing DNA double-strand breaks into specified target sequences. Here, we describe a protocol for TALEN-based gene knockout in G. bimaculatus, including a mutant selection scheme via mutation detection assays, for generating homozygous knockout organisms.

  15. Co-expression network with protein-protein interaction and transcription regulation in malaria parasite Plasmodium falciparum.

    PubMed

    Yu, Fu-Dong; Yang, Shao-You; Li, Yuan-Yuan; Hu, Wei

    2013-04-10

    Malaria continues to be one of the most severe global infectious diseases and a major threat to human health and economic development. Network-based biological analysis is a promising approach to uncover key genes and biological processes that cannot be recognized from individual gene-based signatures. We integrated gene co-expression profiles with protein-protein interaction and transcriptional regulation information to construct a comprehensive gene co-expression network of Plasmodium falciparum. Based on this network, we used the ICE (Iterative Clique Enumeration) algorithm to identify 10 core modules that were essential for malaria parasite development in intraerythrocytic developmental cycle (IDC) stages. In each module, all genes were highly correlated, probably due to co-regulation or formation of a protein complex. Some of these genes were found to be differentially co-expressed among three close-by IDC stages. The gene prpf8 (PFD0265w), encoding pre-mRNA processing splicing factor 8, was identified as a differentially co-expressed gene (DCG) among IDC stages, although its function has seldom been reported in previous research. Integrating species-specific gene prediction and differential co-expression gene detection, we found that some modules, such as module 10, could perform species-specific functions because some of their genes were species-specific. Furthermore, in order to reveal the underlying mechanisms of erythrocyte invasion by P. falciparum, the Steiner Tree algorithm was employed to identify the invasion subnetwork from our gene co-expression network. The subnetwork-based analysis indicated that some important Plasmodium parasite-specific genes could cooperate with each other and be co-regulated during the parasite invasion process, including a head-to-head gene pair of PfRH2a (PF13_0198) and PfRH2b (MAL13P1.176). 
This study based on a gene co-expression network could shed new light on the mechanisms of pathogenesis, virulence and P. falciparum development. Crown Copyright © 2012. Published by Elsevier B.V. All rights reserved.

  16. AgBase: supporting functional modeling in agricultural organisms

    PubMed Central

    McCarthy, Fiona M.; Gresham, Cathy R.; Buza, Teresia J.; Chouvarine, Philippe; Pillai, Lakshmi R.; Kumar, Ranjit; Ozkan, Seval; Wang, Hui; Manda, Prashanti; Arick, Tony; Bridges, Susan M.; Burgess, Shane C.

    2011-01-01

    AgBase (http://www.agbase.msstate.edu/) provides resources to facilitate modeling of functional genomics data and structural and functional annotation of agriculturally important animal, plant, microbe and parasite genomes. The website is redesigned to improve accessibility and ease of use, including improved search capabilities. Expanded capabilities include new dedicated pages for horse, cat, dog, cotton, rice and soybean. We currently provide 590 240 Gene Ontology (GO) annotations to 105 454 gene products in 64 different species, including GO annotations linked to transcripts represented on agricultural microarrays. For many of these arrays, this provides the only functional annotation available. GO annotations are available for download and we provide comprehensive, species-specific GO annotation files for 18 different organisms. The tools available at AgBase have been expanded and several existing tools improved based upon user feedback. One of seven new tools available at AgBase, GOModeler, supports hypothesis testing from functional genomics data. We host several associated databases and provide genome browsers for three agricultural pathogens. Moreover, we provide comprehensive training resources (including worked examples and tutorials) via links to Educational Resources at the AgBase website. PMID:21075795

  17. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.

    PubMed

    Zheng, Qi; Wang, Xiu-Jie

    2008-07-01

    Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although many software tools offer GO-related analysis functions, new tools are still needed to meet the requirements of data generated by newly developed technologies or of advanced analysis purposes. Here, we present the Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), and therefore provides a better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis of data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) one unique feature of GOEAST is that it allows cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/

  18. Molecular and functional characterization of novel fructosyltransferases and invertases from Agave tequilana.

    PubMed

    Cortés-Romero, Celso; Martínez-Hernández, Aída; Mellado-Mojica, Erika; López, Mercedes G; Simpson, June

    2012-01-01

    Fructans are the main storage polysaccharides found in Agave species. The synthesis of these complex carbohydrates relies on the activities of specific fructosyltransferase enzymes closely related to the hydrolytic invertases. Analysis of Agave tequilana transcriptome data led to the identification of ESTs encoding putative fructosyltransferases and invertases. Based on sequence alignments and structure/function relationships, two different genes were predicted to encode 1-SST and 6G-FFT type fructosyltransferases; in addition, 4 genes encoding putative cell wall invertases and 4 genes encoding putative vacuolar invertases were identified. Probable functions for each gene were assigned based on conserved amino acid sequences and confirmed for 2 fructosyltransferases and one invertase by analyzing the enzymatic activity of recombinant Agave proteins expressed and purified from Pichia pastoris. The genome organization of the fructosyltransferase/invertase genes, for which the corresponding cDNA contained the complete open reading frame, was found to be well conserved since all genes were shown to carry a 9 bp mini-exon and all showed a similar structure of 8 exons/7 introns with the exception of a cell wall invertase gene which has 7 exons and 6 introns. Fructosyltransferase genes were strongly expressed in the storage organs of the plants, especially in vegetative stages of development and to lower levels in photosynthetic tissues, in contrast to the invertase genes where higher levels of expression were observed in leaf tissues and in mature plants.

  19. Molecular and Functional Characterization of Novel Fructosyltransferases and Invertases from Agave tequilana

    PubMed Central

    Cortés-Romero, Celso; Martínez-Hernández, Aída; Mellado-Mojica, Erika; López, Mercedes G.; Simpson, June

    2012-01-01

    Fructans are the main storage polysaccharides found in Agave species. The synthesis of these complex carbohydrates relies on the activities of specific fructosyltransferase enzymes closely related to the hydrolytic invertases. Analysis of Agave tequilana transcriptome data led to the identification of ESTs encoding putative fructosyltransferases and invertases. Based on sequence alignments and structure/function relationships, two different genes were predicted to encode 1-SST and 6G-FFT type fructosyltransferases; in addition, 4 genes encoding putative cell wall invertases and 4 genes encoding putative vacuolar invertases were identified. Probable functions for each gene were assigned based on conserved amino acid sequences and confirmed for 2 fructosyltransferases and one invertase by analyzing the enzymatic activity of recombinant Agave proteins expressed and purified from Pichia pastoris. The genome organization of the fructosyltransferase/invertase genes, for which the corresponding cDNA contained the complete open reading frame, was found to be well conserved since all genes were shown to carry a 9 bp mini-exon and all showed a similar structure of 8 exons/7 introns with the exception of a cell wall invertase gene which has 7 exons and 6 introns. Fructosyltransferase genes were strongly expressed in the storage organs of the plants, especially in vegetative stages of development and to lower levels in photosynthetic tissues, in contrast to the invertase genes where higher levels of expression were observed in leaf tissues and in mature plants. PMID:22558253

  20. A high-throughput virus-induced gene silencing protocol identifies genes involved in multi-stress tolerance

    PubMed Central

    2013-01-01

    Background: Understanding the function of a particular gene under various stresses is important for engineering plants for broad-spectrum stress tolerance. Although virus-induced gene silencing (VIGS) has been used to characterize genes involved in abiotic stress tolerance, currently available gene silencing and stress imposition methodology at the whole plant level is not suitable for high-throughput functional analyses of genes. This demands a robust and reliable methodology for characterizing genes involved in abiotic and multi-stress tolerance. Results: Our methodology employs VIGS-based gene silencing in leaf disks combined with simple stress imposition and effect quantification methodologies for easier and faster characterization of genes involved in abiotic and multi-stress tolerance. By subjecting leaf disks from gene-silenced plants to various abiotic stresses and inoculating silenced plants with various pathogens, we show the involvement of several genes in multi-stress tolerance. In addition, we demonstrate that VIGS can be used to characterize genes involved in thermotolerance. Our results also showed the functional relevance of NtEDS1 in abiotic stress, NbRBX1 and NbCTR1 in oxidative stress; NtRAR1 and NtNPR1 in salinity stress; NbSOS1 and NbHSP101 in biotic stress; and NtEDS1, NbETR1, NbWRKY2 and NbMYC2 in thermotolerance. Conclusions: In addition to widening the application of VIGS, we developed a robust, easy and high-throughput methodology for functional characterization of genes involved in multi-stress tolerance. PMID:24289810

  1. Ortholog-based screening and identification of genes related to intracellular survival.

    PubMed

    Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin

    2018-04-20

    Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. A total of 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role in intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of the characteristics of intracellular pathogens and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.

  2. bc-GenExMiner 3.0: new mining module computes breast cancer gene expression correlation analyses.

    PubMed

    Jézéquel, Pascal; Frénel, Jean-Sébastien; Campion, Loïc; Guérin-Charbonnel, Catherine; Gouraud, Wilfried; Ricolleau, Gabriel; Campone, Mario

    2013-01-01

    We recently developed a user-friendly web-based application called bc-GenExMiner (http://bcgenex.centregauducheau.fr), which offered the possibility to evaluate prognostic informativity of genes in breast cancer by means of a 'prognostic module'. In this study, we develop a new module called 'correlation module', which includes three kinds of gene expression correlation analyses. The first one computes correlation coefficient between 2 or more (up to 10) chosen genes. The second one produces two lists of genes that are most correlated (positively and negatively) to a 'tested' gene. A gene ontology (GO) mining function is also proposed to explore GO 'biological process', 'molecular function' and 'cellular component' terms enrichment for the output lists of most correlated genes. The third one explores gene expression correlation between the 15 telomeric and 15 centromeric genes surrounding a 'tested' gene. These correlation analyses can be performed in different groups of patients: all patients (without any subtyping), in molecular subtypes (basal-like, HER2+, luminal A and luminal B) and according to oestrogen receptor status. Validation tests based on published data showed that these automatized analyses lead to results consistent with studies' conclusions. In brief, this new module has been developed to help basic researchers explore molecular mechanisms of breast cancer. DATABASE URL: http://bcgenex.centregauducheau.fr
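
    The second analysis described above (listing the genes most positively and most negatively correlated with a 'tested' gene) can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient bc-GenExMiner uses, so Pearson correlation is an assumption here; the function names, gene names and expression values are hypothetical toy data and are not taken from the application.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def most_correlated(expression, tested_gene, top_n=2):
    """Return two lists of (gene, r): top positive and top negative partners."""
    ref = expression[tested_gene]
    scores = [(gene, pearson(ref, profile))
              for gene, profile in expression.items() if gene != tested_gene]
    # Strongest positive correlations first, then strongest negative ones.
    positive = sorted((s for s in scores if s[1] > 0), key=lambda s: -s[1])[:top_n]
    negative = sorted((s for s in scores if s[1] < 0), key=lambda s: s[1])[:top_n]
    return positive, negative


# Hypothetical toy data: gene -> expression values across five patients.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 5.0, 4.0],
}
positive, negative = most_correlated(expression, "GENE_A")
```

    Restricting `expression` to the patients of one subgroup (for example one molecular subtype) before calling `most_correlated` would mimic the per-group analyses mentioned in the abstract.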

  3. NEAT: an efficient network enrichment analysis test.

    PubMed

    Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C

    2016-09-05

    Network enrichment analysis is a powerful method that integrates gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, can be computationally slow and are based on normality assumptions. We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as that of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN ( https://cran.r-project.org/package=neat ).
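
    To make the hypergeometric null distribution concrete, the sketch below computes an upper-tail p-value in plain Python. It is not the code of the R package neat; the function name is hypothetical, and reading the arguments as link counts (N links in the network, K of them reaching gene set B, n leaving gene set A, k observed from A to B) is an assumption made for illustration only.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # Support of X is max(0, n - (N - K)) .. min(n, K).
    lo = max(k, 0, n - (N - K))
    hi = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1)) / total


# Assumed toy numbers: 10 links in total, 4 reach set B, 3 leave set A,
# and 2 of those 3 are observed to reach set B.
p_value = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
```

    A small p-value would suggest more links between the two sets than expected by chance; a lower-tail sum over the same terms would test for depletion instead.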

  4. A subregion-based burden test for simultaneous identification of susceptibility loci and subregions within.

    PubMed

    Zhu, Bin; Mirabello, Lisa; Chatterjee, Nilanjan

    2018-06-22

    In rare variant association studies, aggregating rare and/or low frequency variants may increase statistical power for detection of the underlying susceptibility gene or region. However, it is unclear which variants, or class of them, in a gene contribute most to the association. We proposed a subregion-based burden test (REBET) to simultaneously select susceptibility genes and identify important underlying subregions. The subregions are predefined by shared common biologic characteristics, such as the protein domain or functional impact. Based on a subset-based approach considering local correlations between combinations of test statistics of subregions, REBET is able to properly control the type I error rate while adjusting for multiple comparisons in a computationally efficient manner. Simulation studies show that REBET can achieve power competitive with alternative methods when rare variants cluster within subregions. In two case studies, REBET is able to identify known disease susceptibility genes and, more importantly, pinpoint the previously unreported most susceptible subregions, which represent protein domains essential for gene function. R package REBET is available at https://dceg.cancer.gov/tools/analysis/rebet. Published 2018. This article is a U.S. Government work and is in the public domain in the USA.

  5. Maize GO annotation—methods, evaluation, and review (maize-GAMER)

    USDA-ARS's Scientific Manuscript database

    We created a new high-coverage, robust, and reproducible functional annotation of maize protein-coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein-coding genes, respectively, this stu...

  6. Gene Network Construction from Microarray Data Identifies a Key Network Module and Several Candidate Hub Genes in Age-Associated Spatial Learning Impairment

    PubMed Central

    Uddin, Raihan; Singh, Shiva M.

    2017-01-01

    As humans age, many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in “learning and memory” related functions and pathways. Subsequent differential network analysis of this “learning and memory” module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known functions of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. Taken together, they provide new insight into, and generate new hypotheses about, the molecular mechanisms responsible for age associated learning impairment, including spatial learning. PMID:29066959

  7. Gene Network Construction from Microarray Data Identifies a Key Network Module and Several Candidate Hub Genes in Age-Associated Spatial Learning Impairment.

    PubMed

    Uddin, Raihan; Singh, Shiva M

    2017-01-01

    As humans age, many suffer from a decrease in normal brain functions including spatial learning impairments. This study aimed to better understand the molecular mechanisms in age-associated spatial learning impairment (ASLI). We used a mathematical modeling approach implemented in Weighted Gene Co-expression Network Analysis (WGCNA) to create and compare gene network models of young (learning unimpaired) and aged (predominantly learning impaired) brains from a set of exploratory datasets in rats in the context of ASLI. The major goal was to overcome some of the limitations previously observed in the traditional meta- and pathway analysis using these data, and identify novel ASLI related genes and their networks based on co-expression relationship of genes. This analysis identified a set of network modules in the young, each of which is highly enriched with genes functioning in broad but distinct GO functional categories or biological pathways. Interestingly, the analysis pointed to a single module that was highly enriched with genes functioning in "learning and memory" related functions and pathways. Subsequent differential network analysis of this "learning and memory" module in the aged (predominantly learning impaired) rats compared to the young learning unimpaired rats allowed us to identify a set of novel ASLI candidate hub genes. Some of these genes show significant repeatability in networks generated from independent young and aged validation datasets. These hub genes are highly co-expressed with other genes in the network, which not only show differential expression but also differential co-expression and differential connectivity across age and learning impairment. The known functions of these hub genes indicate that they play key roles in critical pathways, including kinase and phosphatase signaling, in functions related to various ion channels, and in maintaining neuronal integrity relating to synaptic plasticity and memory formation. Taken together, they provide new insight into, and generate new hypotheses about, the molecular mechanisms responsible for age associated learning impairment, including spatial learning.

  8. Precise integration of inducible transcriptional elements (PrIITE) enables absolute control of gene expression.

    PubMed

    Pinto, Rita; Hansen, Lars; Hintze, John; Almeida, Raquel; Larsen, Sylvester; Coskun, Mehmet; Davidsen, Johanne; Mitchelmore, Cathy; David, Leonor; Troelsen, Jesper Thorvald; Bennett, Eric Paul

    2017-07-27

    Tetracycline-based inducible systems provide powerful methods for functional studies where gene expression can be controlled. However, the lack of tight control of the inducible system, leading to leakiness and adverse effects caused by undesirable tetracycline dosage requirements, has proven to be a limitation. Here, we report that the combined use of genome editing tools and last generation Tet-On systems can resolve these issues. Our principle is based on precise integration of inducible transcriptional elements (coined PrIITE) targeted to: (i) exons of an endogenous gene of interest (GOI) and (ii) a safe harbor locus. Using PrIITE cells harboring a GFP reporter or CDX2 transcription factor, we demonstrate discrete inducibility of gene expression with complete abrogation of leakiness. CDX2 PrIITE cells generated by this approach uncovered novel CDX2 downstream effector genes. Our results provide a strategy for characterization of dose-dependent effector functions of essential genes that require absence of endogenous gene expression. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Utilizing virus-induced gene silencing for the functional characterization of maize genes during infection with the fungal pathogen Ustilago maydis.

    PubMed

    van der Linde, Karina; Doehlemann, Gunther

    2013-01-01

    While in dicotyledonous plants virus-induced gene silencing (VIGS) is well established to study plant-pathogen interaction, in monocots only a few examples of efficient VIGS have been reported so far. One of the available systems is based on the brome mosaic virus (BMV) which allows gene silencing in different cereals including barley (Hordeum vulgare), wheat (Triticum aestivum), and maize (Zea mays). Infection of maize plants by the corn smut fungus Ustilago maydis leads to the formation of large tumors on stem, leaves, and inflorescences. During this biotrophic interaction, plant defense responses are actively suppressed by the pathogen, and previous transcriptome analyses of infected maize plants showed comprehensive and stage-specific changes in host gene expression during disease progression. To identify maize genes that are functionally involved in the interaction with U. maydis, we adapted a VIGS system based on the Brome mosaic virus (BMV) to maize at conditions that allow successful U. maydis infection of BMV pre-infected maize plants. This setup enables quantification of VIGS and its impact on U. maydis infection using a quantitative real-time PCR (q(RT)-PCR)-based readout.

  10. A guide to best practices for Gene Ontology (GO) manual annotation

    PubMed Central

    Balakrishnan, Rama; Harris, Midori A.; Huntley, Rachael; Van Auken, Kimberly; Cherry, J. Michael

    2013-01-01

    The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is in the creation of gene product annotations, evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374 000 species including all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators reviewing the literature or by examination of biological data (1.1 million annotations covering 2226 species) and those generated computationally via automated methods. As manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical to provide accurate consistent manual annotations. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotation. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope this guide will encourage research communities to annotate gene products of their interest to enhance the corpus of GO annotations available to all. Database URL: http://www.geneontology.org PMID:23842463

  11. A Systematic Investigation into Aging Related Genes in Brain and Their Relationship with Alzheimer's Disease.

    PubMed

    Meng, Guofeng; Zhong, Xiaoyan; Mei, Hongkang

    2016-01-01

    Aging, as a complex biological process, is accompanied by the accumulation of functional losses at different levels, which makes age the biggest risk factor for many neurological diseases. Even after decades of investigation, the process of aging is still far from being fully understood, especially at a systematic level. In this study, we identified aging related genes in brain by collecting the ones with sustained and consistent gene expression or DNA methylation changes in the aging process. Functional analysis of these genes with Gene Ontology suggested transcriptional regulators to be the most affected genes in the aging process. Transcription regulation analysis found some transcription factors, especially Specificity Protein 1 (SP1), to play important roles in regulating aging related gene expression. Module-based functional analysis indicated these genes to be associated with many well-known aging related pathways, supporting the validity of our approach to select aging related genes. Finally, we investigated the roles of aging related genes in Alzheimer's Disease (AD). We found that aging and AD related genes both involved some common pathways, which provided a possible explanation of why aging makes the brain more vulnerable to Alzheimer's Disease.

  12. Prioritizing chronic obstructive pulmonary disease (COPD) candidate genes in COPD-related networks

    PubMed Central

    Zhang, Yihua; Li, Wan; Feng, Yuyan; Guo, Shanshan; Zhao, Xilei; Wang, Yahui; He, Yuehan; He, Weiming; Chen, Lina

    2017-01-01

    Chronic obstructive pulmonary disease (COPD) is a multi-factor disease, which could be caused by many factors, including disturbances of metabolism and protein-protein interactions (PPIs). In this paper, a weighted COPD-related metabolic network and a weighted COPD-related PPI network were constructed based on COPD disease genes and functional information. Candidate genes in each of these weighted COPD-related networks were prioritized using a gene prioritization method. Literature review and functional enrichment analysis of the top 100 genes in these two networks suggested a correlation between COPD and these genes. The performance of our gene prioritization method was superior to that of ToppGene and ToppNet for genes from the COPD-related metabolic network or the COPD-related PPI network, as assessed using leave-one-out cross-validation, literature validation and functional enrichment analysis. The top-ranked genes prioritized from COPD-related metabolic and PPI networks could promote a better understanding of the molecular mechanism of this disease from different perspectives. The top 100 genes in the COPD-related metabolic network or the COPD-related PPI network might be potential markers for the diagnosis and treatment of COPD. PMID:29262568

  13. Prioritizing chronic obstructive pulmonary disease (COPD) candidate genes in COPD-related networks.

    PubMed

    Zhang, Yihua; Li, Wan; Feng, Yuyan; Guo, Shanshan; Zhao, Xilei; Wang, Yahui; He, Yuehan; He, Weiming; Chen, Lina

    2017-11-28

    Chronic obstructive pulmonary disease (COPD) is a multi-factor disease, which could be caused by many factors, including disturbances of metabolism and protein-protein interactions (PPIs). In this paper, a weighted COPD-related metabolic network and a weighted COPD-related PPI network were constructed based on COPD disease genes and functional information. Candidate genes in each of these weighted COPD-related networks were prioritized using a gene prioritization method. Literature review and functional enrichment analysis of the top 100 genes in these two networks suggested a correlation between COPD and these genes. The performance of our gene prioritization method was superior to that of ToppGene and ToppNet for genes from the COPD-related metabolic network or the COPD-related PPI network, as assessed using leave-one-out cross-validation, literature validation and functional enrichment analysis. The top-ranked genes prioritized from COPD-related metabolic and PPI networks could promote a better understanding of the molecular mechanism of this disease from different perspectives. The top 100 genes in the COPD-related metabolic network or the COPD-related PPI network might be potential markers for the diagnosis and treatment of COPD.

  14. Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules

    PubMed Central

    Bersanelli, Matteo; Mosca, Ettore; Remondini, Daniel; Castellani, Gastone; Milanesi, Luciano

    2016-01-01

    A relation exists between network proximity of molecular entities in interaction networks, functional similarity and association with diseases. The identification of network regions associated with biological functions and pathologies is a major goal in systems biology. We describe a network diffusion-based pipeline for the interpretation of different types of omics in the context of molecular interaction networks. We introduce the network smoothing index, a network-based quantity that jointly quantifies the amount of omics information in genes and in their network neighbourhood, using network diffusion to define network proximity. The approach is applicable to both descriptive and inferential statistics calculated on omics data. We also show that network resampling, applied to gene lists ranked by quantities derived from the network smoothing index, indicates the presence of significantly connected genes. As a proof of principle, we identified gene modules enriched in somatic mutations and transcriptional variations observed in samples of prostate adenocarcinoma (PRAD). In line with the local hypothesis, network smoothing index and network resampling underlined the existence of a connected component of genes harbouring molecular alterations in PRAD. PMID:27731320
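
    The idea of using network diffusion to define proximity can be sketched as follows. This is a generic diffusion-with-restart iteration, not the authors' pipeline: the update rule x <- alpha * W x + (1 - alpha) * x0, the degree normalisation of W, the value of alpha and the toy graph are all assumptions chosen for illustration, and the network smoothing index itself is not computed here.

```python
def diffuse(adjacency, x0, alpha=0.7, tol=1e-9, max_iter=1000):
    """Iterate x <- alpha * W x + (1 - alpha) * x0 until the scores settle.

    adjacency: dict mapping each node to the set of its neighbours
               (undirected, so every edge is listed in both directions).
    x0:        dict with an initial score (e.g. an omics statistic) per node.
    """
    nodes = list(adjacency)
    degree = {u: len(adjacency[u]) for u in nodes}
    x = dict(x0)
    for _ in range(max_iter):
        new = {}
        for u in nodes:
            # Each neighbour v passes on an equal share of its current score.
            incoming = sum(x[v] / degree[v] for v in adjacency[u])
            new[u] = alpha * incoming + (1 - alpha) * x0[u]
        delta = max(abs(new[u] - x[u]) for u in nodes)
        x = new
        if delta < tol:
            break
    return x


# Hypothetical toy network: a path A - B - C - D with all signal on gene A.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
initial = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
scores = diffuse(graph, initial)
```

    In this toy case the diffused score decreases with distance from the altered gene, which is the sense in which diffusion turns network topology into a proximity measure.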

  15. Application of CRISPR/Cas9 Gene Editing System on MDV-1 Genome for the Study of Gene Function.

    PubMed

    Zhang, Yaoyao; Tang, Na; Sadigh, Yashar; Baigent, Susan; Shen, Zhiqiang; Nair, Venugopal; Yao, Yongxiu

    2018-05-24

    Marek's disease virus (MDV) is a member of alphaherpesviruses associated with Marek's disease, a highly contagious neoplastic disease in chickens. Complete sequencing of the viral genome and recombineering techniques using infectious bacterial artificial chromosome (BAC) clones of Marek's disease virus genome have identified major genes that are associated with pathogenicity. Recent advances in CRISPR/Cas9-based gene editing have given opportunities for precise editing of the viral genome for identifying pathogenic determinants. Here we describe the application of CRISPR/Cas9 gene editing approaches to delete the Meq and pp38 genes from the CVI988 vaccine strain of MDV. This powerful technology will speed up the MDV gene function studies significantly, leading to a better understanding of the molecular mechanisms of MDV pathogenesis.

  16. Biolistics-based gene silencing in plants using a modified particle inflow gun.

    PubMed

    Davies, Kevin M; Deroles, Simon C; Boase, Murray R; Hunter, Don A; Schwinn, Kathy E

    2013-01-01

    RNA interference (RNAi) is one of the most commonly used techniques for examining the function of genes of interest. In this chapter we present two examples of RNAi that use the particle inflow gun for delivery of the DNA constructs. In one example transient RNAi is used to show the function of an anthocyanin regulatory gene in flower petals. In the second example stably transformed cell cultures are produced with an RNAi construct that results in a change in the anthocyanin hydroxylation pattern.

  17. Prioritization of Disease Susceptibility Genes Using LSM/SVD.

    PubMed

    Gong, Lejun; Yang, Ronggen; Yan, Qin; Sun, Xiao

    2013-12-01

    Understanding the role of genetics in diseases is one of the most important tasks in the postgenome era. It is generally too expensive and time consuming to perform experimental validation for all candidate genes related to disease. Computational methods play important roles for prioritizing these candidates. Herein, we propose an approach to prioritize disease genes using latent semantic mapping based on singular value decomposition. Our hypothesis is that functionally similar genes are likely to cause similar diseases. The functional similarity between known disease susceptibility genes and unknown genes is measured to predict new disease susceptibility genes. Taking autism as an example, analysis of the top ten prioritized genes suggests that they might be autism susceptibility genes, which also indicates that our approach could discover new disease susceptibility genes. The novel approach of disease gene prioritization could discover new disease susceptibility genes, and latent disease-gene relations. The prioritized results could also support the interpretive diversity and experimental views as computational evidence for disease researchers.
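
    The idea of comparing genes in a latent space obtained from a singular value decomposition can be illustrated with a small sketch. It assumes a binary gene-by-term annotation matrix, a rank-2 truncation, and cosine similarity as the ranking score; the gene labels, the matrix and all function names are hypothetical, and the truncated SVD is obtained here by plain power iteration on the Gram matrix rather than by whatever routine the authors used.

```python
import math


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for r in range(k):
        v = [1.0 + 0.1 * (i + r) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def latent_embedding(matrix, k):
    """Rows of U_k * Sigma_k, computed from the eigenpairs of A A^T."""
    n = len(matrix)
    gram = [[sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
            for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs] for i in range(n)]


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


# Hypothetical annotation matrix: rows are genes, columns are five functional terms.
genes = ["KNOWN", "CAND_A", "CAND_B"]
annotations = [
    [1, 1, 0, 0, 0],  # known disease gene
    [1, 1, 1, 0, 0],  # candidate sharing two terms with the known gene
    [0, 0, 0, 1, 1],  # candidate sharing none
]
embedding = latent_embedding(annotations, k=2)
scores = {g: cosine(embedding[0], embedding[i]) for i, g in enumerate(genes) if i > 0}
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

    Candidates are then prioritized by their similarity to the known disease gene; in this toy matrix the candidate that shares annotation terms with it ranks first.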

  18. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

    PubMed

    Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

    2015-02-10

    Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.

  19. Searching new signals for production traits through gene-based association analysis in three Italian cattle breeds.

    PubMed

    Capomaccio, Stefano; Milanesi, Marco; Bomba, Lorenzo; Cappelli, Katia; Nicolazzi, Ezequiel L; Williams, John L; Ajmone-Marsan, Paolo; Stefanon, Bruno

    2015-08-01

    Genome-wide association studies (GWAS) have been widely applied to disentangle the genetic basis of complex traits. In cattle breeds, classical GWAS approaches with medium-density marker panels are far from conclusive, especially for complex traits. This is due to the intrinsic limitations of GWAS and the assumptions that are made to step from the association signals to the functional variations. Here, we applied a gene-based strategy to prioritize genotype-phenotype associations found for milk production and quality traits with classical approaches in three Italian dairy cattle breeds with different sample sizes (Italian Brown n = 745; Italian Holstein n = 2058; Italian Simmental n = 477). Although classical regression on single markers revealed only a single genome-wide significant genotype-phenotype association, for Italian Holstein, the gene-based approach identified specific genes in each breed that are associated with milk physiology and mammary gland development. As no standard method has yet been established to step from variation to functional units (i.e., genes), the strategy proposed here may contribute to revealing new genes that play significant roles in complex traits, such as those investigated here, amplifying low association signals using a gene-centric approach.

  20. A function-based screen for seeking RubisCO active clones from metagenomes: novel enzymes influencing RubisCO activity.

    PubMed

    Böhnke, Stefanie; Perner, Mirjam

    2015-03-01

    Ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO) is a key enzyme of the Calvin cycle, which is responsible for most of Earth's primary production. Although research on RubisCO genes and enzymes in plants, cyanobacteria and bacteria has been ongoing for years, still little is understood about its regulation and activation in bacteria. Even more so, hardly any information exists about the function of metagenomic RubisCOs and the role of the enzymes encoded on the flanking DNA owing to the lack of available function-based screens for seeking active RubisCOs from the environment. Here we present the first solely activity-based approach for identifying RubisCO active fosmid clones from a metagenomic library. We constructed a metagenomic library from hydrothermal vent fluids and screened 1056 fosmid clones. Twelve clones exhibited RubisCO activity and the metagenomic fragments resembled genes from Thiomicrospira crunogena. One of these clones was further analyzed. It contained a 35.2 kb metagenomic insert carrying the RubisCO gene cluster and flanking DNA regions. Knockouts of twelve genes and two intergenic regions on this metagenomic fragment demonstrated that the RubisCO activity was significantly impaired and was attributed to deletions in genes encoding putative transcriptional regulators and those believed to be vital for RubisCO activation. Our new technique revealed a novel link between a poorly characterized gene and RubisCO activity. This screen opens the door to directly investigating RubisCO genes and respective enzymes from environmental samples.

  1. Broad distribution spectrum from Gaussian to power law appears in stochastic variations in RNA-seq data.

    PubMed

    Awazu, Akinori; Tanabe, Takahiro; Kamitani, Mari; Tezuka, Ayumi; Nagano, Atsushi J

    2018-05-29

    Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution, although the physiological basis of this assumption remains unclear. In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21-27 replicates), and the characteristics of gene-dependent empirical probability density function (ePDF) profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, various types of ePDF of gene expression levels were obtained that were classified as Gaussian, power law-like containing a long tail, or intermediate. These ePDF profiles were well fitted with a Gauss-power mixing distribution function derived from a simple model of a stochastic transcriptional network containing a feedback loop. The fitting function suggested that gene expression levels with long-tailed ePDFs would be strongly influenced by feedback regulation. Furthermore, the features of gene expression levels are correlated with their functions, with the levels of essential genes tending to follow a Gaussian-like ePDF while those of genes encoding nucleic acid-binding proteins and transcription factors exhibit long-tailed ePDF.

  2. Transient, Inducible, Placenta-Specific Gene Expression in Mice

    PubMed Central

    Fan, Xiujun; Petitt, Matthew; Gamboa, Matthew; Huang, Mei; Dhal, Sabita; Druzin, Maurice L.; Wu, Joseph C.

    2012-01-01

    Molecular understanding of placental functions and pregnancy disorders is limited by the absence of methods for placenta-specific gene manipulation. Although persistent placenta-specific gene expression has been achieved by lentivirus-based gene delivery methods, developmentally and physiologically important placental genes have highly stage-specific functions, requiring controllable, transient expression systems for functional analysis. Here, we describe an inducible, placenta-specific gene expression system that enables high-level, transient transgene expression and monitoring of gene expression by live bioluminescence imaging in mouse placenta at different stages of pregnancy. We used the third generation tetracycline-responsive transactivator protein Tet-On 3G, with 10- to 100-fold increased sensitivity to doxycycline (Dox) compared with previous versions, enabling unusually sensitive on-off control of gene expression in vivo. Transgenic mice expressing Tet-On 3G were created using a new integrase-based, site-specific approach, yielding high-level transgene expression driven by a ubiquitous promoter. Blastocysts from these mice were transduced with the Tet-On 3G-response element promoter-driving firefly luciferase using lentivirus-mediated placenta-specific gene delivery and transferred into wild-type pseudopregnant recipients for placenta-specific, Dox-inducible gene expression. Systemic Dox administration at various time points during pregnancy led to transient, placenta-specific firefly luciferase expression as early as d 5 of pregnancy in a Dox dose-dependent manner. This system enables, for the first time, reliable pregnancy stage-specific induction of gene expression in the placenta and live monitoring of gene expression during pregnancy. It will be widely applicable to studies of both placental development and pregnancy, and the site-specific Tet-On 3G mouse will be valuable for studies in a broad range of tissues. PMID:23011919

  3. Peptides, polypeptides and peptide-polymer hybrids as nucleic acid carriers.

    PubMed

    Ahmed, Marya

    2017-10-24

    Cell penetrating peptides (CPPs), and protein transduction domains (PTDs) of viruses and other natural proteins serve as a template for the development of efficient peptide based gene delivery vectors. PTDs are sequences of acidic or basic amphipathic amino acids, with superior membrane trespassing efficacies. Gene delivery vectors derived from these natural, cationic and cationic amphipathic peptides, however, offer little flexibility in tailoring the physicochemical properties of single chain peptide based systems. Owing to significant advances in the field of peptide chemistry, synthetic mimics of natural peptides are often prepared and have been evaluated for their gene expression, as a function of amino acid functionalities, architecture and net cationic content of peptide chains. Moreover, chimeric single polypeptide chains are prepared by a combination of multiple small natural or synthetic peptides, which imparts distinct physiological properties to peptide based gene delivery therapeutics. In order to obtain multivalency and improve the gene delivery efficacies of low molecular weight cationic peptides, bioactive peptides are often incorporated into a polymeric architecture to obtain novel 'polymer-peptide hybrids' with improved gene delivery efficacies. Peptide modified polymers prepared by physical or chemical modifications exhibit enhanced endosomal escape, stimuli responsive degradation and targeting efficacies, as a function of physicochemical and biological activities of peptides attached onto a polymeric scaffold. The focus of this review is to provide comprehensive and step-wise progress in major natural and synthetic peptides, chimeric polypeptides, and peptide-polymer hybrids for nucleic acid delivery applications.

  4. A P-Norm Robust Feature Extraction Method for Identifying Differentially Expressed Genes

    PubMed Central

    Liu, Jian; Liu, Jin-Xing; Gao, Ying-Lian; Kong, Xiang-Zhen; Wang, Xue-Song; Wang, Dong

    2015-01-01

    In current molecular biology, it becomes more and more important to identify differentially expressed genes closely correlated with a key biological process from gene expression data. In this paper, based on the Schatten p-norm and Lp-norm, a novel p-norm robust feature extraction method is proposed to identify the differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix and the Lp-norm is taken as the error function to improve the robustness to outliers in the gene expression data. The results on simulation data show that our method can obtain higher identification accuracies than the competitive methods. Numerous experiments on real gene expression data sets demonstrate that our method can identify more differentially expressed genes than the others. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data. PMID:26201006
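
    For reference, the two norms named in this abstract have the following standard definitions for a matrix X with singular values sigma_i and a matrix E with entries e_ij. The last line is only a generic form that such a low-rank-plus-robust-error objective could take; the data matrix D and the trade-off parameter lambda are assumed symbols, and the exact objective used by the authors is not given in this record.

```latex
% Schatten p-norm of X (p = 1 gives the nuclear norm) and entrywise Lp-norm of E.
% For 0 < p < 1 both are quasi-norms rather than norms.
\[
\|X\|_{S_p} = \Big( \sum_{i=1}^{\min(m,n)} \sigma_i(X)^{p} \Big)^{1/p},
\qquad
\|E\|_{p} = \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} |e_{ij}|^{p} \Big)^{1/p}
\]
% Generic (assumed) form: low-rank regularizer plus robust error term.
\[
\min_{X} \; \|X\|_{S_p}^{p} + \lambda \, \|D - X\|_{p}^{p}, \qquad \lambda > 0
\]
```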

  5. A P-Norm Robust Feature Extraction Method for Identifying Differentially Expressed Genes.

    PubMed

    Liu, Jian; Liu, Jin-Xing; Gao, Ying-Lian; Kong, Xiang-Zhen; Wang, Xue-Song; Wang, Dong

    2015-01-01

    In current molecular biology, it becomes more and more important to identify differentially expressed genes closely correlated with a key biological process from gene expression data. In this paper, based on the Schatten p-norm and Lp-norm, a novel p-norm robust feature extraction method is proposed to identify the differentially expressed genes. In our method, the Schatten p-norm is used as the regularization function to obtain a low-rank matrix and the Lp-norm is taken as the error function to improve the robustness to outliers in the gene expression data. The results on simulation data show that our method can obtain higher identification accuracies than the competitive methods. Numerous experiments on real gene expression data sets demonstrate that our method can identify more differentially expressed genes than the others. Moreover, we confirmed that the identified genes are closely correlated with the corresponding gene expression data.

  6. PGMapper: a web-based tool linking phenotype to genes.

    PubMed

    Xiong, Qing; Qiu, Yuhui; Gu, Weikuan

    2008-04-01

    With the availability of whole genome sequence in many species, linkage analysis, positional cloning and microarray are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, in these methods, causative genes underlying a quantitative trait locus, or a disease, are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time consuming and needs to retrieve and integrate the information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. Available online at http://www.genediscovery.org/pgmapper/index.jsp.

  7. A system for dosage-based functional genomics in poplar

    Treesearch

    Isabelle M. Henry; Matthew S. Zinkgraf; Andrew T. Groover; Luca Comai

    2015-01-01

    Altering gene dosage through variation in gene copy number is a powerful approach to addressing questions regarding gene regulation, quantitative trait loci, and heterosis, but one that is not easily applied to sexually transmitted species. Elite poplar (Populus spp) varieties are created through interspecific hybridization, followed by...

  8. Genome-wide identification, phylogeny and expression analyses of SCARECROW-LIKE(SCL) genes in millet (Setaria italica).

    PubMed

    Liu, Hongyun; Qin, Jiajia; Fan, Hui; Cheng, Jinjin; Li, Lin; Liu, Zheng

    2017-07-01

    As a member of the GRAS gene family, SCARECROW-LIKE (SCL) genes encode transcriptional regulators that are involved in plant information transmission and signal transduction. In this study, 44 SCL genes including two SCARECROW genes in millet were identified to be distributed on eight chromosomes, except chromosome 6. All the millet genes contain motifs 6-8, indicating that these motifs are conserved during the evolution. SCL genes of millet were divided into eight groups based on the phylogenetic relationship and classification of Arabidopsis SCL genes. Several putative millet orthologous genes in Arabidopsis, maize and rice were identified. High throughput RNA sequencing revealed that the expressions of millet SCL genes in root, stem, leaf, spica, and along leaf gradient varied greatly. Analyses combining the gene expression patterns, gene structures, motif compositions, promoter cis-elements identification, alternative splicing of transcripts and phylogenetic relationship of SCL genes indicate that these genes may play diverse functions. Functionally characterized SCL genes in maize, rice and Arabidopsis would provide us some clues for future characterization of their homologues in millet. To the best of our knowledge, this is the first study of millet SCL genes at the genome wide level. Our work provides a useful platform for functional analysis of SCL genes in millet, a model crop for C4 photosynthesis and bioenergy studies.

  9. Contrasting microbial functional genes in two distinct saline-alkali and slightly acidic oil-contaminated sites.

    PubMed

    Liang, Yuting; Zhao, Huihui; Zhang, Xu; Zhou, Jizhong; Li, Guanghe

    2014-07-15

    To compare the functional gene structure and diversity of microbial communities in saline-alkali and slightly acidic oil-contaminated sites, 40 soil samples were collected from two typical oil exploration sites in North and South China and analyzed with a comprehensive functional gene array (GeoChip 3.0). The overall microbial pattern was significantly different between the two sites, and a more divergent pattern was observed in slightly acidic soils. Response ratio was calculated to compare the microbial functional genes involved in organic contaminant degradation and carbon, nitrogen, phosphorus, and sulfur cycling. The results indicated a significantly low abundance of most genes involved in organic contaminant degradation and in the cycling of nitrogen and phosphorus in saline-alkali soils. By contrast, most carbon degradation genes and all carbon fixation genes had similar abundance at both sites. Based on the relationship between the environmental variables and microbial functional structure, pH was the major factor influencing the microbial distribution pattern in the two sites. This study demonstrated that microbial functional diversity and heterogeneity in oil-contaminated environments can vary significantly in relation to local environmental conditions. The limitation of nitrogen and phosphorus and the low degradation capacity of organic contaminant should be carefully considered, particularly in most oil-exploration sites with saline-alkali soils.

  10. An in silico pipeline to filter the Toxoplasma gondii proteome for proteins that could traffic to the host cell nucleus and influence host cell epigenetic regulation.

    PubMed

    Syn, Genevieve; Blackwell, Jenefer M; Jamieson, Sarra E; Francis, Richard W

    2018-01-01

    Toxoplasma gondii uses epigenetic mechanisms to regulate both endogenous and host cell gene expression. To identify genes with putative epigenetic functions, we developed an in silico pipeline to interrogate the T. gondii proteome of 8313 proteins. Step 1 employs PredictNLS and NucPred to identify genes predicted to target eukaryotic nuclei. Step 2 uses GOLink to identify proteins of epigenetic function based on Gene Ontology terms. This resulted in 611 putative nuclear localised proteins with predicted epigenetic functions. Step 3 filtered for secretory proteins using SignalP, SecretomeP, and experimental data. This identified 57 of the 611 putative epigenetic proteins as likely to be secreted. The pipeline is freely available online, uses open access tools and software with user-friendly Perl scripts to automate and manage the results, and is readily adaptable to undertake any such in silico search for genes contributing to particular functions.
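
    The three filtering steps in this abstract amount to successive narrowing of a protein list. The sketch below shows that shape only: it assumes the predictions have already been computed and stored as boolean flags, it does not call PredictNLS, NucPred, GOLink, SignalP or SecretomeP, and the field names, the toy records and the choice to accept a nuclear prediction from either tool are all assumptions.

```python
def run_pipeline(proteins):
    """Apply the three filters in order and return the surviving list after each step."""
    # Step 1: predicted to target the nucleus (assumption: either predictor suffices).
    step1 = [p for p in proteins if p["nls_predicted"] or p["nucpred_predicted"]]
    # Step 2: at least one Gene Ontology term flagged as epigenetic.
    step2 = [p for p in step1 if p["epigenetic_go_terms"]]
    # Step 3: predicted or experimentally observed to be secreted.
    step3 = [p for p in step2 if p["secreted"]]
    return step1, step2, step3


# Hypothetical records standing in for precomputed tool output.
toy_proteome = [
    {"id": "P1", "nls_predicted": True, "nucpred_predicted": False,
     "epigenetic_go_terms": ["histone acetylation"], "secreted": True},
    {"id": "P2", "nls_predicted": False, "nucpred_predicted": True,
     "epigenetic_go_terms": ["chromatin remodeling"], "secreted": False},
    {"id": "P3", "nls_predicted": True, "nucpred_predicted": True,
     "epigenetic_go_terms": [], "secreted": True},
    {"id": "P4", "nls_predicted": False, "nucpred_predicted": False,
     "epigenetic_go_terms": ["histone methylation"], "secreted": True},
]
nuclear, epigenetic, secreted = run_pipeline(toy_proteome)
```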

  11. AGORA : Organellar genome annotation from the amino acid and nucleotide references.

    PubMed

    Jung, Jaehee; Kim, Jong Im; Jeong, Young-Sik; Yi, Gangman

    2018-03-29

    Next-generation sequencing (NGS) technologies have led to the accumulation of high-throughput sequence data from various organisms in biology. To apply gene annotation of organellar genomes for various organisms, more optimized tools for functional gene annotation are required. Almost all gene annotation tools are mainly focused on the chloroplast genome of land plants or the mitochondrial genome of animals. We have developed a web application, AGORA, for the fast, user-friendly, and improved annotation of organellar genomes. AGORA annotates genes based on a BLAST-based homology search and clustering with selected reference sequences from the NCBI database or user-defined uploaded data. AGORA can annotate the functional genes in almost all mitochondrion and plastid genomes of eukaryotes. The gene annotation of a genome with an exon-intron structure within a gene or inverted repeat region is also available. It provides information of start and end positions of each gene, BLAST results compared with the reference sequence, and visualization of gene map by OGDRAW. Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/gene_project/AGORA/. The main module of the tool is implemented in Python and PHP, and the web page is built with HTML and CSS to support all browsers.

  12. The Biogeographic Pattern of Microbial Functional Genes along an Altitudinal Gradient of the Tibetan Pasture

    PubMed Central

    Qi, Qi; Zhao, Mengxin; Wang, Shiping; Ma, Xingyu; Wang, Yuxuan; Gao, Ying; Lin, Qiaoyan; Li, Xiangzhen; Gu, Baohua; Li, Guoxue; Zhou, Jizhong; Yang, Yunfeng

    2017-01-01

    As the highest place in the world, the Tibetan plateau is a fragile ecosystem. Given the importance of microbial communities in driving soil nutrient cycling, it is of interest to document the microbial biogeographic pattern here. We adopted a microarray-based tool named GeoChip 4.0 to investigate grassland microbial functional genes along an elevation gradient from 3200 to 3800 m above sea level open to free grazing by local herdsmen and wild animals. Interestingly, microbial functional diversities increase with elevation, as do the relative abundances of genes associated with carbon degradation, nitrogen cycling, methane production, cold shock and oxygen limitation. The range of Shannon diversities (10.27–10.58) showed considerably smaller variation than what was previously observed at ungrazed sites nearby (9.95–10.65), suggesting the important role of livestock grazing on microbial diversities. Closer examination showed that the dissimilarity of microbial community at our study sites increased with elevations, revealing an elevation-decay relationship of microbial functional genes. Both microbial functional diversity and the number of unique genes increased with elevations. Furthermore, we detected a tight linkage of greenhouse gas (CO2) and relative abundances of carbon cycling genes. Our biogeographic study provides insights on microbial functional diversity and soil biogeochemical cycling in Tibetan pastures. PMID:28659870
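
    The Shannon diversity values quoted in this abstract follow the usual definition H = -sum(p_i * ln p_i) over relative abundances p_i. A minimal sketch, assuming the natural logarithm and raw signal intensities as input (the record does not state either), with a hypothetical function name and toy numbers:

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln p) over the non-zero relative abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must contain at least one positive value")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


even_community = shannon_diversity([5, 5, 5, 5])      # four equally abundant genes
skewed_community = shannon_diversity([17, 1, 1, 1])   # one dominant gene
```

    With equal abundances H reaches its maximum, the logarithm of the number of detected genes, and it drops as a few genes come to dominate.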

  13. 2A self-cleaving peptide-based multi-gene expression system in the silkworm Bombyx mori

    PubMed Central

    Wang, Yuancheng; Wang, Feng; Wang, Riyuan; Zhao, Ping; Xia, Qingyou

    2015-01-01

    Fundamental and applied studies of silkworms have entered the functional genomics era. Here, we report a multi-gene expression system (MGES) based on 2A self-cleaving peptide (2A), which regulates the simultaneous expression and cleavage of multiple gene targets in the silk gland of transgenic silkworms. First, a glycine-serine-glycine spacer (GSG) was found to significantly improve the cleavage efficiency of 2A. Then, the cleavage efficiency of six types of 2As with GSG was analyzed. The shortest porcine teschovirus-1 2A (P2A-GSG) exhibited the highest cleavage efficiency in all insect cell lines that we tested. Next, P2A-GSG successfully cleaved the artificial human serum albumin (66 kDa) linked with human acidic fibroblast growth factor (20.2 kDa) fusion genes and vitellogenin receptor fragment (196 kDa) of silkworm linked with EGFP fusion genes. Importantly, vitellogenin receptor protein was secreted to the outside of cells. Furthermore, P2A-GSG successfully mediated the simultaneous expression and cleavage of a DsRed and EGFP fusion gene in silk glands and caused secretion into the cocoon of transgenic silkworms using our sericin1 expression system. We predicted that the MGES would be an efficient tool for gene function research and innovative research on various functional silk materials in medicine, cosmetics, and other biomedical areas. PMID:26537835

  14. Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites.

    PubMed

    Wang, Guohua; Wang, Fang; Huang, Qian; Li, Yu; Liu, Yunlong; Wang, Yadong

    2015-01-01

    Transcription factors are proteins that bind to DNA sequences to regulate gene transcription. The transcription factor binding sites are short DNA sequences (5-20 bp long) specifically bound by one or more transcription factors. The identification of transcription factor binding sites and prediction of their function continue to be challenging problems in computational biology. In this study, by integrating the DNase I hypersensitive sites with known position weight matrices in the TRANSFAC database, the transcription factor binding sites in gene regulatory region are identified. Based on the global gene expression patterns in cervical cancer HeLaS3 cell and HelaS3-ifnα4h cell (interferon treatment on HeLaS3 cell for 4 hours), we present a model-based computational approach to predict a set of transcription factors that potentially cause such differential gene expression. Significantly, 6 out of 10 predicted functional factors, including IRF, IRF-2, IRF-9, IRF-1 and IRF-3, ICSBP, belong to interferon regulatory factor family and upregulate the gene expression levels responding to the interferon treatment. Another factor, ISGF-3, is also a transcriptional activator induced by interferon alpha. Using different selection criteria for transcription factor binding sites, the predictions of our model remain consistent. Our model demonstrated the potential to computationally identify the functional transcription factors in gene regulation.
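
    Scanning a sequence with a position weight matrix, as mentioned in this abstract, can be illustrated with a log-odds score per window. This is a minimal sketch: the count matrix is a made-up four-position motif rather than a TRANSFAC entry, and the pseudocount, the uniform background of 0.25 and the function names are assumptions. It only handles the bases A, C, G and T.

```python
import math


def pwm_log_odds(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[k][sequence[start + k]] for k in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical motif with consensus GAAA, built from eight aligned sites.
counts = [{"G": 8}, {"A": 8}, {"A": 8}, {"A": 8}]
pwm = pwm_log_odds(counts)
hits = scan("TTGAAACC", pwm)
best = max(hits, key=lambda h: h[1])
```

    In an approach like the one described, such a scan would presumably be restricted to DNase I hypersensitive regions; that restriction is not shown here.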

  15. PLAU inferred from a correlation network is critical for suppressor function of regulatory T cells

    PubMed Central

    He, Feng; Chen, Hairong; Probst-Kepper, Michael; Geffers, Robert; Eifes, Serge; del Sol, Antonio; Schughart, Klaus; Zeng, An-Ping; Balling, Rudi

    2012-01-01

    Human FOXP3+CD25+CD4+ regulatory T cells (Tregs) are essential to the maintenance of immune homeostasis. Several genes are known to be important for murine Tregs, but for human Tregs the genes and underlying molecular networks controlling the suppressor function still largely remain unclear. Here, we describe a strategy to identify the key genes directly from an undirected correlation network which we reconstruct from a very high time-resolution (HTR) transcriptome during the activation of human Tregs/CD4+ T-effector cells. We show that a predicted top-ranked new key gene PLAU (the plasminogen activator urokinase) is important for the suppressor function of both human and murine Tregs. Further analysis unveils that PLAU is particularly important for memory Tregs and that PLAU mediates Treg suppressor function via STAT5 and ERK signaling pathways. Our study demonstrates the potential for identifying novel key genes for complex dynamic biological processes using a network strategy based on HTR data, and reveals a critical role for PLAU in Treg suppressor function. PMID:23169000
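
    An undirected correlation network of the kind mentioned in this abstract can be built by connecting genes whose expression profiles are strongly correlated over time. The sketch below assumes Pearson correlation, an absolute threshold of 0.85 and ranking by node degree; the record does not state how edges were defined or how key genes were ranked, and the profiles and gene labels are toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_by_degree(profiles, threshold=0.85):
    """Connect gene pairs with |r| >= threshold and rank genes by number of edges."""
    genes = sorted(profiles)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(profiles[g], profiles[h])) >= threshold:
                degree[g] += 1
                degree[h] += 1
    # Highest degree first; ties broken alphabetically.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical time-course profiles (five time points per gene).
profiles = {
    "H": [1, 2, 3, 4, 5],
    "A": [1, 2, 3, 5, 4],
    "B": [2, 1, 3, 4, 5],
    "C": [3, 1, 4, 1, 5],
}
ranked = rank_by_degree(profiles)
```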

  16. CRISPR-STOP: gene silencing through base-editing-induced nonsense mutations.

    PubMed

    Kuscu, Cem; Parlak, Mahmut; Tufan, Turan; Yang, Jiekun; Szlachta, Karol; Wei, Xiaolong; Mammadov, Rashad; Adli, Mazhar

    2017-07-01

    CRISPR-Cas9-induced DNA damage may have deleterious effects at high-copy-number genomic regions. Here, we use CRISPR base editors to knock out genes by changing single nucleotides to create stop codons. We show that the CRISPR-STOP method is an efficient and less deleterious alternative to wild-type Cas9 for gene-knockout studies. Early stop codons can be introduced in ∼17,000 human genes. CRISPR-STOP-mediated targeted screening demonstrates comparable efficiency to WT Cas9, which indicates the suitability of our approach for genome-wide functional screenings.
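
    Creating a stop codon by a single-nucleotide change can be illustrated by listing the codons in a coding sequence that are one C-to-T edit away from a stop. The sketch assumes a cytidine base editor, so CAA, CAG and CGA can be edited on the coding strand and TGG on the opposite strand; it ignores PAM placement and the editing window, and the function name and example sequence are hypothetical.

```python
# Codons that a C-to-T edit can turn into a stop codon (assumed editor chemistry).
CONVERTIBLE = {
    "CAA": ("TAA",),
    "CAG": ("TAG",),
    "CGA": ("TGA",),
    # Editing the opposite strand reads as G-to-A on the coding strand.
    "TGG": ("TAG", "TGA", "TAA"),
}


def stop_candidates(cds):
    """Return (codon_index, codon, possible_stops) for each in-frame editable codon."""
    cds = cds.upper()
    hits = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in CONVERTIBLE:
            hits.append((i // 3, codon, CONVERTIBLE[codon]))
    return hits


# Toy open reading frame: ATG CAG TGG AAA TAA
candidates = stop_candidates("ATGCAGTGGAAATAA")
```

    A real design tool would additionally check that each candidate base falls inside the editor's activity window relative to a usable PAM.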

  17. Fusing literature and full network data improves disease similarity computation.

    PubMed

    Li, Ping; Nie, Yaling; Yu, Jingkai

    2016-08-30

    Identifying relatedness among diseases could help deepen understanding for the underlying pathogenic mechanisms of diseases, and facilitate drug repositioning projects. A number of methods for computing disease similarity had been developed; however, none of them were designed to utilize information of the entire protein interaction network, using instead only those interactions involving disease causing genes. Most of previously published methods required gene-disease association data, unfortunately, many diseases still have very few or no associated genes, which impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating medical literature and protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature. Among function-based methods, NetSim achieved the best performance. Its average AUC (area under the receiver operating characteristic curve) reached 95.2 %. MedSim, whose performance was even comparable to some function-based methods, acquired the highest average AUC in all semantic-based methods. Integration of MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4 %. We further studied the effectiveness of different data sources. It was found that quality of protein interaction data was more important than its volume. On the contrary, higher volume of gene-disease association data was more beneficial, even with a lower reliability. Utilizing higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5 % and 96.7 %, respectively. Integrating biomedical literature and protein interaction network can be an effective way to compute disease similarity. 
When sufficient disease-related gene data are lacking, literature-based methods such as MedSim can be a great addition to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/.
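
    The abstract reports performance as an average AUC. As a minimal sketch of what that number measures, the snippet below computes AUC as the fraction of (related, unrelated) disease pairs in which the related pair receives the higher similarity score. The function name and the toy scores are illustrative assumptions and do not come from the study.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example outscores a negative one;
    ties count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical similarity scores for known-related and unrelated disease pairs.
related = [0.9, 0.8, 0.4]
unrelated = [0.5, 0.3]
area = auc(related, unrelated)
```

    An AUC of 1.0 would mean every related pair outranks every unrelated pair, and 0.5 corresponds to random ranking.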

  18. Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data.

    PubMed

    Yu, Ke; Gong, Binsheng; Lee, Mikyung; Liu, Zhichao; Xu, Joshua; Perkins, Roger; Tong, Weida

    2014-09-15

    Toxicogenomics (TGx) endeavors to elucidate the underlying molecular mechanisms through exploring gene expression profiles in response to toxic substances. RNA-Seq is increasingly regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from the complex TGx data. Considering read counts as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to a word-by-document matrix used in text mining. Topic modeling, which aims to discover the latent structures in text corpora, would therefore be helpful for exploring RNA-Seq based TGx data. In this study, topic modeling was applied to a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated by compounds with the same modes of action (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted by gene enrichment analysis with MoAs and then confirmed by compound and pathway associations mined from the literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding an accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discovering functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
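
    The word-by-document analogy described above can be sketched as follows. This is only an illustration of turning read counts into bag-of-words "documents"; the gene names, counts, and the optional scaling step are hypothetical, and the abstract does not say how the authors preprocessed counts before fitting LDA.

```python
from collections import Counter


def counts_to_document(gene_counts, scale=1):
    """Turn one sample's read counts into a bag-of-words 'document'.

    Each gene ID plays the role of a word and is repeated once per
    (scaled) read, mirroring the word-by-document analogy.
    """
    tokens = []
    for gene, count in sorted(gene_counts.items()):
        tokens.extend([gene] * (count // scale))
    return tokens


# Hypothetical toy samples; gene names and counts are illustrative only.
samples = {
    "sample_A": {"gene1": 3, "gene2": 2, "gene3": 1},
    "sample_B": {"gene1": 0, "gene2": 1, "gene3": 4},
}

documents = {name: counts_to_document(c) for name, c in samples.items()}
vocabulary = sorted({g for c in samples.values() for g in c})

# Document-term matrix: one row per sample, one column per gene.
doc_term = [[Counter(documents[s])[g] for g in vocabulary] for s in samples]
```

    A matrix like `doc_term` is the usual input to an LDA implementation; the topic-fitting step itself is left out here because it depends on the library chosen.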

  19. Age distribution patterns of human gene families: divergent for Gene Ontology categories and concordant between different subcellular localizations.

    PubMed

    Liu, Gangbiao; Zou, Yangyun; Cheng, Qiqun; Zeng, Yanwu; Gu, Xun; Su, Zhixi

    2014-04-01

    The age distribution of gene duplication events within the human genome exhibits two waves of duplications along with an ancient component. However, because of functional constraint differences, genes in different functional categories might show dissimilar retention patterns after duplication. It is known that genes in some functional categories are highly duplicated in the early stage of vertebrate evolution. However, the correlations of the age distribution pattern of gene duplication between the different functional categories are still unknown. To investigate this issue, we developed a robust pipeline to date the gene duplication events in the human genome. We successfully estimated about three-quarters of the duplication events within the human genome, along with the age distribution pattern in each Gene Ontology (GO) slim category. We found that some GO slim categories show different distribution patterns when compared to the whole genome. Further hierarchical clustering of the GO slim functional categories enabled grouping into two main clusters. We found that human genes located in the duplicated copy number variant regions, whose duplicate genes have not been fixed in the human population, were mainly enriched in the groups with a high proportion of recently duplicated genes. Moreover, we used a phylogenetic tree-based method to date the age of duplications in three signaling-related gene superfamilies: transcription factors, protein kinases and G-protein coupled receptors. These superfamilies were expressed in different subcellular localizations. They showed a similar age distribution as the signaling-related GO slim categories. We also compared the differences between the age distributions of gene duplications in multiple subcellular localizations. We found that the distribution patterns of the major subcellular localizations were similar to that of the whole genome. 
This study revealed the whole picture of the evolution patterns of gene functional categories in the human genome.

  20. dbCPG: A web resource for cancer predisposition genes.

    PubMed

    Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng

    2016-06-21

    Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown type genes), 637 rat, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the CPG data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes.

  1. Diametrical clustering for identifying anti-correlated gene clusters.

    PubMed

    Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

    2003-09-01

    Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, with each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
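
    The two alternating steps named in the abstract can be sketched roughly as below. This is an assumption-laden illustration, not the authors' implementation: the unit-length normalization, the squared-dot-product assignment rule, the power-iteration routine, the fixed number of rounds, and the toy profiles are all choices made here for the example.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dominant_direction(vectors, start, iters=100):
    """Power iteration on sum_i x_i x_i^T, approximating the dominant singular vector."""
    v = normalize(start)
    for _ in range(iters):
        w = [0.0] * len(v)
        for x in vectors:
            c = dot(x, v)
            for j, xj in enumerate(x):
                w[j] += c * xj
        if not any(w):  # start vector orthogonal to every member; keep it
            break
        v = normalize(w)
    return v


def diametrical_clustering(profiles, init_prototypes, rounds=20):
    data = [normalize(p) for p in profiles]
    protos = [normalize(p) for p in init_prototypes]
    labels = []
    for _ in range(rounds):
        # (i) re-partition: the squared dot product ignores the sign,
        # so correlated and anti-correlated genes land in the same cluster
        labels = [
            max(range(len(protos)), key=lambda k: dot(x, protos[k]) ** 2)
            for x in data
        ]
        # (ii) recompute each prototype from its current members
        for k in range(len(protos)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                protos[k] = dominant_direction(members, protos[k])
    return labels, protos


# Hypothetical mean-centered expression profiles (four genes, four conditions).
profiles = [[1, -1, 0, 0], [-1, 1, 0, 0], [0, 0, 1, -1], [0, 0, -2, 2]]
labels, protos = diametrical_clustering(
    profiles, init_prototypes=[[1, -0.5, 0.1, 0], [0, 0.1, 1, -0.5]]
)
```

    In this toy input the first two profiles are exact mirror images of each other, as are the last two, so each mirrored pair shares a cluster even though the genes within a pair are perfectly anti-correlated.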

  2. The structure of a gene co-expression network reveals biological functions underlying eQTLs.

    PubMed

    Villa-Vialaneix, Nathalie; Liaubet, Laurence; Laurent, Thibault; Cherel, Pierre; Gamot, Adrien; SanCristobal, Magali

    2013-01-01

    What are the commonalities between genes whose expression level is partially controlled by eQTL, especially with regard to biological functions? Moreover, how are these genes related to a phenotype of interest? These issues are particularly difficult to address when the genome annotation is incomplete, as is the case for mammalian species. Moreover, the direct link between gene expression and a phenotype of interest may be weak, and thus difficult to handle. In this framework, the use of a co-expression network has proven useful: it is a robust approach for modeling a complex system of genetic regulations and for inferring knowledge about as yet unknown genes. In this article, a case study was conducted with a mammalian species. It showed that the use of a co-expression network based on partial correlation, combined with a relevant clustering of nodes, leads to an enrichment of biological functions of around 83%. Moreover, the use of a spatial statistics approach allowed us to superimpose additional information related to a phenotype; this led to highlighting specific genes or gene clusters that are related to the network structure and the phenotype. Three main results are worth noting: first, key genes were highlighted as a potential focus for forthcoming biological experiments; second, a set of biological functions, which support a list of genes under partial eQTL control, was set up by an overview of the global structure of the gene expression network; third, pH was found to be correlated with gene clusters, and then with related biological functions, as a result of a spatial analysis of the network topology.
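
    To illustrate why a network built on partial correlation differs from one built on plain correlation, here is a minimal sketch of first-order partial correlation between two genes given a single third gene. The study's own estimator for many genes is not described in the abstract and is likely different; the function names and toy expression values below are assumptions for illustration.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical expression values: genes x and y both follow regulator z,
# plus independent noise, so their link is entirely indirect.
z = [1, 2, 3, 4, 5]
x = [2, 0, 5, 2, 6]
y = [0, 4, 3, 2, 6]

marginal = pearson(x, y)
direct = partial_correlation(x, y, z)
```

    Here `marginal` is clearly positive while `direct` is essentially zero, so an edge between x and y would appear in a plain correlation network but not in a partial-correlation network.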

  3. Water-soluble polymers bearing phosphorylcholine group and other zwitterionic groups for carrying DNA derivatives.

    PubMed

    Lin, Xiaojie; Ishihara, Kazuhiko

    2014-01-01

    Water-soluble polymers with equal positive and negative charges in the same monomer unit, such as the phosphorylcholine group and other zwitterionic groups, exhibit promising potential in gene delivery with appreciable transfection efficiency, compared with the traditional poly(ethylene glycol)-based polycation-gene complexes. These zwitterionic polymers with various architectural structures and properties have been synthesized by various polymerization methods, such as conventional radical polymerization, atom-transfer radical polymerization, reversible addition-fragmentation chain-transfer polymerization, and nitroxide-mediated radical polymerization. These techniques have been used to efficiently facilitate gene therapy by fabrication of non-viral vectors with high cytocompatibility, large gene-carrying capacity, effective cell-membrane permeability, and in vivo gene-loading/releasing functionality. Zwitterionic polymer-based gene delivery vector systems can be categorized into soluble-polymer/gene mixing, molecular self-assembly, and polymer-gene conjugation systems. This review describes the preparation and characterization of various zwitterionic polymer-based gene delivery vectors, specifically water-soluble phospholipid polymers for carrying gene derivatives.

  4. Biochemical characterization of microbial type terpene synthases in two closely related species of hornworts, Anthoceros punctatus and Anthoceros agrestis.

    PubMed

    Xiong, Wangdan; Fu, Jianyu; Köllner, Tobias G; Chen, Xinlu; Jia, Qidong; Guo, Haobo; Qian, Ping; Guo, Hong; Wu, Guojiang; Chen, Feng

    2018-05-01

    Microbial terpene synthase-like (MTPSL) genes are a type of terpene synthase genes only recently identified in plants. In contrast to typical plant terpene synthase genes, which are ubiquitous in land plants, MTPSL genes appear to occur only in nonseed plants. Our knowledge of catalytic functions of MTPSLs is very limited. Here we report biochemical characterization of the enzymes encoded by MTPSL genes from two closely related species of hornworts, Anthoceros punctatus and Anthoceros agrestis. Seven full-length MTPSL genes were identified in A. punctatus (ApMTPSL1-7) based on the analysis of its genome sequence. Using homology-based cloning, the apparent orthologs for six of the ApMTPSL genes, except ApMTPSL2, were cloned from A. agrestis. They were designated AaMTPSL1, 3-7. The coding sequences for each of the 13 Anthoceros MTPSL genes were cloned into a protein expression vector. Escherichia coli-expressed recombinant MTPSLs from hornworts were assayed for terpene synthase activities. Six ApMTPSLs and five AaMTPSLs, except for ApMTPSL5 and AaMTPSL5, showed catalytic activities with one or more isoprenyl diphosphate substrates. All functional MTPSLs exhibited sesquiterpene synthase activities. In contrast, only ApMTPSL7 and AaMTPSL7 showed monoterpene synthase activity and only ApMTPSL2, ApMTPSL6 and AaMTPSL6 showed diterpene synthase activity. Most MTPSLs from Anthoceros contain uncanonical aspartate-rich motif in the form of either 'DDxxxD' or 'DDxxx'. Homology-based structural modeling analysis of ApMTPSL1 and ApMTPSL7, which contain 'DDxxxD' and 'DDxxx' motif, respectively, showed that 'DDxxxD' and 'DDxxx' motifs are localized in the similar positions as the canonical 'DDxxD' motif in known terpene synthases. To further understand the role of individual aspartate residues in the motifs, ApMTPSL1 and ApMTPSL7 were selected as two representatives for site-directed mutagenesis studies. 
No activity was detected when any of the conserved aspartic acid residues was mutated to alanine. This study provides new information about the catalytic functions of MTPSLs and the functionality of their uncanonical aspartate-rich motifs, and builds a knowledge base for studying the biological importance of MTPSL genes and their terpene products in nonseed plants. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. An extended set of yeast-based functional assays accurately identifies human disease mutations

    PubMed Central

    Sun, Song; Yang, Fan; Tan, Guihong; Costanzo, Michael; Oughtred, Rose; Hirschman, Jodi; Theesfeld, Chandra L.; Bansal, Pritpal; Sahni, Nidhi; Yi, Song; Yu, Analyn; Tyagi, Tanya; Tie, Cathy; Hill, David E.; Vidal, Marc; Andrews, Brenda J.; Boone, Charles; Dolinski, Kara; Roth, Frederick P.

    2016-01-01

    We can now routinely identify coding variants within individual human genomes. A pressing challenge is to determine which variants disrupt the function of disease-associated genes. Both experimental and computational methods exist to predict pathogenicity of human genetic variation. However, a systematic performance comparison between them has been lacking. Therefore, we developed and exploited a panel of 26 yeast-based functional complementation assays to measure the impact of 179 variants (101 disease- and 78 non-disease-associated variants) from 22 human disease genes. Using the resulting reference standard, we show that experimental functional assays in a 1-billion-year diverged model organism can identify pathogenic alleles with significantly higher precision and specificity than current computational methods. PMID:26975778

  6. Characterization and detection of a widely distributed gene cluster that predicts anaerobic choline utilization by human gut bacteria.

    PubMed

    Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P

    2015-04-14

    Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. 
While bacterial genes responsible for choline fermentation (the cut gene cluster) have been recently identified, there has been no characterization of these genes in human gut isolates and microbial communities. In this work, we use multiple approaches to demonstrate that the pathway encoded by the cut genes is present and functional in a diverse range of human gut bacteria and is also widespread in stool metagenomes. We also developed a PCR-based strategy to detect a key functional gene (cutC) involved in this pathway and applied it to characterize newly isolated choline-utilizing strains. Both our analyses of the cut gene cluster and this molecular tool will aid efforts to further understand the role of choline metabolism in the human gut microbiota and its link to disease. Copyright © 2015 Martínez-del Campo et al.

  7. Quantifying the Effect of DNA Packaging on Gene Expression Level

    NASA Astrophysics Data System (ADS)

    Kim, Harold

    2010-10-01

    Gene expression, the process by which the genetic code comes alive in the form of proteins, is one of the most important biological processes in living cells, and begins when transcription factors bind to specific DNA sequences in the promoter region upstream of a gene. The relationship between gene expression output and transcription factor input which is termed the gene regulation function is specific to each promoter, and predicting this gene regulation function from the locations of transcription factor binding sites is one of the challenges in biology. In eukaryotic organisms (for example, animals, plants, fungi etc), DNA is highly compacted into nucleosomes, 147-bp segments of DNA tightly wrapped around histone protein core, and therefore, the accessibility of transcription factor binding sites depends on their locations with respect to nucleosomes - sites inside nucleosomes are less accessible than those outside nucleosomes. To understand how transcription factor binding sites contribute to gene expression in a quantitative manner, we obtain gene regulation functions of promoters with various configurations of transcription factor binding sites by using fluorescent protein reporters to measure transcription factor input and gene expression output in single yeast cells. In this talk, I will show that the affinity of a transcription factor binding site inside and outside the nucleosome controls different aspects of the gene regulation function, and explain this finding based on a mass-action kinetic model that includes competition between nucleosomes and transcription factors.

  8. Ecological transcriptomics of lake-type and riverine sockeye salmon (Oncorhynchus nerka)

    PubMed Central

    2011-01-01

    Background: There are a growing number of genomes sequenced with tentative functions assigned to a large proportion of the individual genes. Model organisms in laboratory settings form the basis for the assignment of gene function, and the ecological context of gene function is lacking. This work addresses this shortcoming by investigating expressed genes of sockeye salmon (Oncorhynchus nerka) muscle tissue. We compared morphology and gene expression in natural juvenile sockeye populations related to river and lake habitats. Based on previously documented divergent morphology, feeding strategy, and predation in association with these distinct environments, we expect that burst swimming is favored in riverine population and continuous swimming is favored in lake-type population. In turn we predict that morphology and expressed genes promote burst swimming in riverine sockeye and continuous swimming in lake-type sockeye. Results: We found the riverine sockeye population had deep, robust bodies and lake-type had shallow, streamlined bodies. Gene expression patterns were measured using a 16K microarray, discovering 141 genes with significant differential expression. Overall, the identity and function of these genes was consistent with our hypothesis. In addition, Gene Ontology (GO) enrichment analyses with a larger set of differentially expressed genes found the "biosynthesis" category enriched for the riverine population and the "metabolism" category enriched for the lake-type population. Conclusions: This study provides a framework for understanding sockeye life history from a transcriptomic perspective and a starting point for more extensive, targeted studies determining the ecological context of genes. PMID:22136247

  9. Ecological transcriptomics of lake-type and riverine sockeye salmon (Oncorhynchus nerka).

    PubMed

    Pavey, Scott A; Sutherland, Ben J G; Leong, Jong; Robb, Adrienne; von Schalburg, Kris; Hamon, Troy R; Koop, Ben F; Nielsen, Jennifer L

    2011-12-02

    There are a growing number of genomes sequenced with tentative functions assigned to a large proportion of the individual genes. Model organisms in laboratory settings form the basis for the assignment of gene function, and the ecological context of gene function is lacking. This work addresses this shortcoming by investigating expressed genes of sockeye salmon (Oncorhynchus nerka) muscle tissue. We compared morphology and gene expression in natural juvenile sockeye populations related to river and lake habitats. Based on previously documented divergent morphology, feeding strategy, and predation in association with these distinct environments, we expect that burst swimming is favored in riverine population and continuous swimming is favored in lake-type population. In turn we predict that morphology and expressed genes promote burst swimming in riverine sockeye and continuous swimming in lake-type sockeye. We found the riverine sockeye population had deep, robust bodies and lake-type had shallow, streamlined bodies. Gene expression patterns were measured using a 16 k microarray, discovering 141 genes with significant differential expression. Overall, the identity and function of these genes was consistent with our hypothesis. In addition, Gene Ontology (GO) enrichment analyses with a larger set of differentially expressed genes found the "biosynthesis" category enriched for the riverine population and the "metabolism" category enriched for the lake-type population. This study provides a framework for understanding sockeye life history from a transcriptomic perspective and a starting point for more extensive, targeted studies determining the ecological context of genes.

  10. Prediction and Testing of Biological Networks Underlying Intestinal Cancer

    PubMed Central

    Mariadason, John M.; Wang, Donghai; Augenlicht, Leonard H.; Chance, Mark R.

    2010-01-01

    Colorectal cancer progresses through an accumulation of somatic mutations, some of which reside in so-called “driver” genes that provide a growth advantage to the tumor. To identify points of intersection between driver gene pathways, we implemented a network analysis framework using protein interactions to predict likely connections – both precedented and novel – between key driver genes in cancer. We applied the framework to find significant connections between two genes, Apc and Cdkn1a (p21), known to be synergistic in tumorigenesis in mouse models. We then assessed the functional coherence of the resulting Apc-Cdkn1a network by engineering in vivo single node perturbations of the network: mouse models mutated individually at Apc (Apc1638N+/−) or Cdkn1a (Cdkn1a−/−), followed by measurements of protein and gene expression changes in intestinal epithelial tissue. We hypothesized that if the predicted network is biologically coherent (functional), then the predicted nodes should associate more specifically with dysregulated genes and proteins than stochastically selected genes and proteins. The predicted Apc-Cdkn1a network was significantly perturbed at the mRNA-level by both single gene knockouts, and the predictions were also strongly supported based on physical proximity and mRNA coexpression of proteomic targets. These results support the functional coherence of the proposed Apc-Cdkn1a network and also demonstrate how network-based predictions can be statistically tested using high-throughput biological data. PMID:20824133

  11. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

    PubMed

    Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun

    2014-01-01

    The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
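
    The abstract describes prediction by regularization on a graph that contains both labeled and unlabeled nodes. The sketch below shows one generic form of graph-Laplacian regularization, written in Python for brevity rather than the authors' C++. The objective, the Jacobi-style update, the parameter `lam`, and the toy graph are all assumptions made for illustration; the paper's actual formulation is not given in the abstract.

```python
def propagate_labels(weights, y, lam=1.0, iters=200):
    """Graph-regularized label scores.

    Approximately minimises
        sum_i (f_i - y_i)^2 + lam * sum_{i<j} w_ij * (f_i - f_j)^2
    where y_i is +1 or -1 for labeled nodes and 0 for unlabeled ones.
    Setting the gradient to zero gives the fixed-point update used below.
    """
    n = len(y)
    degree = [sum(weights[i]) for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [
            (y[i] + lam * sum(weights[i][j] * f[j] for j in range(n)))
            / (1.0 + lam * degree[i])
            for i in range(n)
        ]
    return f


# Hypothetical 5-node graph: a chain 0-1-2, an edge 3-4,
# and a weak link (weight 0.1) between nodes 2 and 3.
weights = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
# Node 0 is labeled +1, node 4 is labeled -1, the rest are unlabeled.
y = [1.0, 0.0, 0.0, 0.0, -1.0]

scores = propagate_labels(weights, y)
predicted = [1 if s > 0 else -1 for s in scores]
```

    Unlabeled nodes take the sign of the labeled node they are most strongly connected to, which is the basic intuition behind using unlabeled samples in this kind of learning.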

  12. Gene knockout by targeted mutagenesis in a hemimetabolous insect, the two-spotted cricket Gryllus bimaculatus, using TALENs.

    PubMed

    Watanabe, Takahito; Noji, Sumihare; Mito, Taro

    2014-08-15

    Hemimetabolous, or incompletely metamorphosing, insects are phylogenetically basal. These insects include many deleterious species. The cricket, Gryllus bimaculatus, is an emerging model for hemimetabolous insects, based on the success of RNA interference (RNAi)-based gene-functional analyses and transgenic technology. Taking advantage of genome-editing technologies in this species would greatly promote functional genomics studies. Genome editing using transcription activator-like effector nucleases (TALENs) has proven to be an effective method for site-specific genome manipulation in various species. TALENs are artificial nucleases that are capable of inducing DNA double-strand breaks into specified target sequences. Here, we describe a protocol for TALEN-based gene knockout in G. bimaculatus, including a mutant selection scheme via mutation detection assays, for generating homozygous knockout organisms. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. Distinct Molecular Signature of Murine Fetal Liver and Adult Hematopoietic Stem Cells Identify Novel Regulators of Hematopoietic Stem Cell Function

    PubMed Central

    Manesia, Javed K.; Franch, Monica; Tabas-Madrid, Daniel; Nogales-Cadenas, Ruben; Vanwelden, Thomas; Van Den Bosch, Elisa; Xu, Zhuofei; Pascual-Montano, Alberto; Khurana, Satish; Verfaillie, Catherine M.

    2018-01-01

    During ontogeny, fetal liver (FL) acts as a major site for hematopoietic stem cell (HSC) maturation and expansion, whereas HSCs in the adult bone marrow (ABM) are largely quiescent. HSCs in the FL possess faster repopulation capacity as compared with ABM HSCs. However, the molecular mechanism regulating the greater self-renewal potential of FL HSCs has not yet been extensively assessed. Recently, we published RNA sequencing-based gene expression analysis on FL HSCs from 14.5-day mouse embryo (E14.5) in comparison to the ABM HSCs. We reanalyzed these data to identify key transcriptional regulators that play important roles in the expansion of HSCs during development. The comparison of FL E14.5 with ABM HSCs identified more than 1,400 differentially expressed genes. More than 200 genes were shortlisted based on the gene ontology (GO) annotation term “transcription.” By morpholino-based knockdown studies in zebrafish, we assessed the function of 18 of these regulators, previously not associated with HSC proliferation. Our studies identified a previously unknown role for tdg, uhrf1, uchl5, and ncoa1 in the emergence of definitive hematopoiesis in zebrafish. In conclusion, we demonstrate that identifying genes involved in transcriptional regulation that are differentially expressed between expanding FL HSCs and quiescent ABM HSCs uncovers novel regulators of HSC function. PMID:27958775

  14. GeoChip-based analysis of microbial functional gene diversity in a landfill leachate-contaminated aquifer

    USGS Publications Warehouse

    Lu, Zhenmei; He, Zhili; Parisi, Victoria A.; Kang, Sanghoon; Deng, Ye; Van Nostrand, Joy D.; Masoner, Jason R.; Cozzarelli, Isabelle M.; Suflita, Joseph M.; Zhou, Jizhong

    2012-01-01

    The functional gene diversity and structure of microbial communities in a shallow landfill leachate-contaminated aquifer were assessed using a comprehensive functional gene array (GeoChip 3.0). Water samples were obtained from eight wells at the same aquifer depth immediately below a municipal landfill or along the predominant downgradient groundwater flowpath. Functional gene richness and diversity immediately below the landfill and the closest well were considerably lower than those in downgradient wells. Mantel tests and canonical correspondence analysis (CCA) suggested that various geochemical parameters had a significant impact on the subsurface microbial community structure. That is, leachate from the unlined landfill impacted the diversity, composition, structure, and functional potential of groundwater microbial communities as a function of groundwater pH, and concentrations of sulfate, ammonia, and dissolved organic carbon (DOC). Historical geochemical records indicate that all sampled wells chronically received leachate, and the increase in microbial diversity as a function of distance from the landfill is consistent with mitigation of the impact of leachate on the groundwater system by natural attenuation mechanisms.

  15. Analysis of the functional gene structure and metabolic potential of microbial community in high arsenic groundwater.

    PubMed

    Li, Ping; Jiang, Zhou; Wang, Yanhong; Deng, Ye; Van Nostrand, Joy D; Yuan, Tong; Liu, Han; Wei, Dazhun; Zhou, Jizhong

    2017-10-15

    Microbial functional potential in high arsenic (As) groundwater ecosystems remains largely unknown. In this study, the microbial community functional composition of nineteen groundwater samples was investigated using a functional gene array (GeoChip 5.0). Samples were divided into low and high As groups based on the clustering analysis of geochemical parameters and microbial functional structures. The results showed that As related genes (arsC, arrA), sulfate related genes (dsrA and dsrB), nitrogen cycling related genes (ureC, amoA, and hzo) and methanogen genes (mcrA, hdrB) in groundwater samples were correlated with As, SO₄²⁻, NH₄⁺ or CH₄ concentrations, respectively. Canonical correspondence analysis (CCA) results indicated that some geochemical parameters including As, total organic content, SO₄²⁻, NH₄⁺, oxidation-reduction potential (ORP) and pH were important factors shaping the functional microbial community structures. Alkaline and reducing conditions with relatively low SO₄²⁻, ORP, and high NH₄⁺, as well as SO₄²⁻ and Fe reduction and ammonification involved in microbially-mediated geochemical processes could be associated with As enrichment in groundwater. This study provides an overall picture of functional microbial communities in high As groundwater aquifers, and also provides insights into the critical role of microorganisms in As biogeochemical cycling. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Functional relevance for type 1 diabetes mellitus-associated genetic variants by using integrative analyses.

    PubMed

    Qiu, Ying-Hua; Deng, Fei-Yan; Tang, Zai-Xiang; Jiang, Zhen-Huan; Lei, Shu-Feng

    2015-10-01

    Type 1 diabetes mellitus (type 1 DM) is an autoimmune disease. Although genome-wide association studies (GWAS) and meta-analyses have successfully identified numerous type 1 DM-associated susceptibility loci, the underlying mechanisms for these susceptibility loci are currently largely unclear. Based on publicly available datasets, we performed integrative analyses (i.e., integrated gene relationships among implicated loci, differential gene expression analysis, functional prediction and functional annotation clustering analysis) and combined them with expression quantitative trait loci (eQTL) results to further explore the functional mechanisms underlying the associations between genetic variants and type 1 DM. Among a total of 183 type 1 DM-associated SNPs, eQTL analysis showed that 17 SNPs had cis-regulated eQTL effects on 9 genes. All 9 eQTL genes were enriched in immune-related pathways or Gene Ontology (GO) terms. Functional prediction analysis identified 5 SNPs located in transcription factor (TF) binding sites. Of the 9 eQTL genes, 6 (TAP2, HLA-DOB, HLA-DQB1, HLA-DQA1, HLA-DRB5 and CTSH) were differentially expressed in type 1 DM-related cells. In particular, rs3825932 in CTSH has integrative functional evidence supporting the association with type 1 DM. These findings indicated that integrative analyses can yield important functional information to link genetic variants and type 1 DM. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.

  17. Robust one-Tube Ω-PCR Strategy Accelerates Precise Sequence Modification of Plasmids for Functional Genomics

    PubMed Central

    Chen, Letian; Wang, Fengpin; Wang, Xiaoyu; Liu, Yao-Guang

    2013-01-01

    Functional genomics requires vector construction for protein expression and functional characterization of target genes; therefore, a simple, flexible and low-cost molecular manipulation strategy will be highly advantageous for genomics approaches. Here, we describe an Ω-PCR strategy that enables multiple types of sequence modification, including precise insertion, deletion and substitution, in any position of a circular plasmid. Ω-PCR is based on an overlap extension site-directed mutagenesis technique, and is named for its characteristic Ω-shaped secondary structure during PCR. Ω-PCR can be performed either in two steps, or in one tube in combination with exonuclease I treatment. These strategies have wide applications for protein engineering, gene function analysis and in vitro gene splicing. PMID:23335613

  18. Discovering potential driver genes through an integrated model of somatic mutation profiles and gene functional information.

    PubMed

    Xi, Jianing; Wang, Minghui; Li, Ao

    2017-09-26

    The accumulating availability of next-generation sequencing data offers an opportunity to pinpoint driver genes that are causally implicated in oncogenesis through computational models. Despite previous efforts made regarding this challenging problem, there is still room for improvement in the driver gene identification accuracy. In this paper, we propose a novel integrated approach called IntDriver for prioritizing driver genes. Based on a matrix factorization framework, IntDriver can effectively incorporate functional information from both the interaction network and Gene Ontology similarity, and detect driver genes mutated in different sets of patients at the same time. When evaluated through known benchmarking driver genes, the top ranked genes of our result show highly significant enrichment for the known genes. Meanwhile, IntDriver also detects some known driver genes that are not found by the other competing approaches. When measured by precision, recall and F1 score, the performances of our approach are comparable or increased in comparison to the competing approaches.
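
    The abstract above evaluates ranked driver-gene predictions by precision, recall and F1 score against a benchmark list. A minimal sketch of how those three measures are computed for a set of predictions follows; the function name and the toy gene lists are illustrative assumptions and are not taken from the paper or from IntDriver.

```python
def precision_recall_f1(predicted, known):
    """Return (precision, recall, F1) of predicted genes against a benchmark."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: "GENE_X" is a placeholder for a false-positive call.
    top_ranked = ["TP53", "KRAS", "GENE_X", "PIK3CA"]
    benchmark = ["TP53", "KRAS", "PIK3CA", "PTEN", "BRAF"]
    p, r, f1 = precision_recall_f1(top_ranked, benchmark)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

    Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is often reported alongside the two individual measures.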

  19. A statistical framework for biomedical literature mining.

    PubMed

    Chung, Dongjun; Lawson, Andrew; Zheng, W Jim

    2017-09-30

    In systems biology, it is of great interest to identify new genes that were not previously reported to be associated with biological pathways related to various functions and diseases. Identification of these new pathway-modulating genes not only promotes understanding of pathway regulation mechanisms but also allows identification of novel targets for therapeutics. Recently, biomedical literature has been considered as a valuable resource to investigate pathway-modulating genes. While the majority of currently available approaches are based on the co-occurrence of genes within an abstract, it has been reported that these approaches show only sub-optimal performance because 70% of abstracts contain information only for a single gene. To overcome this limitation, we propose a novel statistical framework based on the concept of ontology fingerprint that uses gene ontology to extract information from large biomedical literature data. The proposed framework simultaneously identifies pathway-modulating genes and facilitates interpreting functions of these new genes. We also propose a computationally efficient posterior inference procedure based on Metropolis-Hastings within Gibbs sampler for parameter updates and the poor man's reversible jump Markov chain Monte Carlo approach for model selection. We evaluate the proposed statistical framework with simulation studies, experimental validation, and an application to studies of pathway-modulating genes in yeast. The R implementation of the proposed model is currently available at https://dongjunchung.github.io/bayesGO/. Copyright © 2017 John Wiley & Sons, Ltd.

  20. Identification and characterization of the grape WRKY family.

    PubMed

    Zhang, Ying; Feng, Jian Can

    2014-01-01

    WRKY transcription factors have functions in plant growth and development and in response to biotic and abiotic stresses. Many studies have focused on functional identification of WRKY transcription factors, but little is known about the molecular phylogeny or global expression patterns of the complete WRKY family. In this study, we identified 80 WRKY proteins encoded in the grape genome. Based on the structural features of these proteins, the grape WRKY genes were classified into three groups (groups 1-3). Analysis of WRKY gene expression profiles indicated that 28 WRKY genes were differentially expressed in response to biotic stress caused by grape whiterot and/or salicylic acid (SA). Of these, 16 WRKY genes were upregulated by both whiterot pathogenic bacteria and SA. The results indicated that 16 WRKY proteins participated in the SA-dependent defense signal pathway. This study provides a basis for cloning genes with specific functions from grape.

  1. Evolutionary Characteristics of Missing Proteins: Insights into the Evolution of Human Chromosomes Related to Missing-Protein-Encoding Genes.

    PubMed

    Xu, Aishi; Li, Guang; Yang, Dong; Wu, Songfeng; Ouyang, Hongsheng; Xu, Ping; He, Fuchu

    2015-12-04

    Although the "missing protein" is a temporary concept in C-HPP, the biological information underlying their "missing" status could be an important clue in evolutionary studies. Here we classified missing-protein-encoding genes into two groups, the genes encoding PE2 proteins (with transcript evidence) and the genes encoding PE3/4 proteins (with no transcript evidence). These missing-protein-encoding genes are distributed unevenly among different chromosomes, chromosomal regions, or gene clusters. In terms of evolutionary features, PE3/4 genes tend to be young, spreading in nonhomologous chromosomal regions and evolving at higher rates. Interestingly, there is a higher proportion of singletons in PE3/4 genes than in all genes (background) and OTCSGs (organ, tissue, cell type-specific genes). More importantly, most of the paralogous PE3/4 genes belong to the newly duplicated members of the paralogous gene groups, which mainly contribute to special biological functions, such as "smell perception". These functions are heavily restricted to specific cell types, tissues, or developmental stages, acting as the new functional requirements that facilitated the emergence of the missing-protein-encoding genes during evolution. In addition, the criteria for proteins with extremely special physicochemical properties were first set up based on the properties of PE2 proteins, and the evolutionary characteristics of those proteins were explored. Overall, the evolutionary analyses of missing-protein-encoding genes are expected to be highly instructive for proteomics and functional studies in the future.

  2. Development of resources for the analysis of gene function in Pucciniomycotina red yeasts.

    PubMed

    Ianiri, Giuseppe; Wright, Sandra A I; Castoria, Raffaello; Idnurm, Alexander

    2011-07-01

    The Pucciniomycotina is an important subphylum of basidiomycete fungi but with limited tools to analyze gene functions. Transformation protocols were established for a Sporobolomyces species (strain IAM 13481), the first Pucciniomycotina species with a completed draft genome sequence, to enable assessment of gene function through phenotypic characterization of mutant strains. Transformation markers were the URA3 and URA5 genes that enable selection and counter-selection based on uracil auxotrophy and resistance to 5-fluoroorotic acid. The wild type copies of these genes were cloned into plasmids that were used for transformation of Sporobolomyces sp. by both biolistic and Agrobacterium-mediated approaches. These resources have been deposited and are available from the Fungal Genetics Stock Center. To show that these techniques could be used to elucidate gene functions, the LEU1 gene was targeted for specific homologous replacement, which also demonstrated that this gene is required for the biosynthesis of leucine in basidiomycete fungi. T-DNA insertional mutants were isolated and further characterized, revealing insertions in genes that encode the homologs of Chs7, Erg3, Kre6, Kex1, Pik1, Sad1, Ssu1 and Tlg1. Phenotypic analysis of these mutants reveals both conserved and divergent functions compared with other fungi. Some of these strains exhibit reduced resistance to detergents, the antifungal agent fluconazole or sodium sulfite, or lower recovery from heat stress. While there are current experimental limitations for Sporobolomyces sp. such as the lack of Mendelian genetics for conventional mating, these findings demonstrate the facile nature of at least one Pucciniomycotina species for genetic manipulation and the potential to develop these organisms into new models for understanding gene function and evolution in the fungi. Copyright © 2011 Elsevier Inc. All rights reserved.

  3. Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

    PubMed

    Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

    2016-01-01

    We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.

  4. rSNPBase 3.0: an updated database of SNP-related regulatory elements, element-gene pairs and SNP-based gene regulatory networks.

    PubMed

    Guo, Liyuan; Wang, Jing

    2018-01-04

    Here, we present the updated rSNPBase 3.0 database (http://rsnp3.psych.ac.cn), which provides human SNP-related regulatory elements, element-gene pairs and SNP-based regulatory networks. This database is the updated version of the SNP regulatory annotation databases rSNPBase and rVarBase. In comparison to the last two versions, there are both structural and data adjustments in rSNPBase 3.0: (i) The most significant new feature is the expansion of analysis scope from SNP-related regulatory elements to include regulatory element-target gene pairs (E-G pairs), so that it can provide SNP-based gene regulatory networks. (ii) Web function was modified according to data content and a new network search module is provided in rSNPBase 3.0 in addition to the previous regulatory SNP (rSNP) search module. The two search modules support data query for detailed information (related-elements, element-gene pairs, and other extended annotations) on specific SNPs and SNP-related graphic networks constructed by interacting transcription factors (TFs), miRNAs and genes. (iii) The type of regulatory elements was modified and enriched. To the best of our knowledge, the updated rSNPBase 3.0 is the first data tool that supports SNP functional analysis from a regulatory network perspective; it will provide both a comprehensive understanding and concrete guidance for SNP-related regulatory studies. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. rSNPBase 3.0: an updated database of SNP-related regulatory elements, element-gene pairs and SNP-based gene regulatory networks

    PubMed Central

    2018-01-01

    Here, we present the updated rSNPBase 3.0 database (http://rsnp3.psych.ac.cn), which provides human SNP-related regulatory elements, element-gene pairs and SNP-based regulatory networks. This database is the updated version of the SNP regulatory annotation databases rSNPBase and rVarBase. In comparison to the last two versions, there are both structural and data adjustments in rSNPBase 3.0: (i) The most significant new feature is the expansion of analysis scope from SNP-related regulatory elements to include regulatory element–target gene pairs (E–G pairs), so that it can provide SNP-based gene regulatory networks. (ii) Web function was modified according to data content and a new network search module is provided in rSNPBase 3.0 in addition to the previous regulatory SNP (rSNP) search module. The two search modules support data query for detailed information (related-elements, element-gene pairs, and other extended annotations) on specific SNPs and SNP-related graphic networks constructed by interacting transcription factors (TFs), miRNAs and genes. (iii) The type of regulatory elements was modified and enriched. To the best of our knowledge, the updated rSNPBase 3.0 is the first data tool that supports SNP functional analysis from a regulatory network perspective; it will provide both a comprehensive understanding and concrete guidance for SNP-related regulatory studies. PMID:29140525

  6. pTRA - A reporter system for monitoring the intracellular dynamics of gene expression.

    PubMed

    Wagner, Sabine G; Ziegler, Martin; Löwe, Hannes; Kremling, Andreas; Pflüger-Grau, Katharina

    2018-01-01

    The presence of standardised tools and methods to measure and represent accurately biological parts and functions is a prerequisite for successful metabolic engineering and crucial to understand and predict the behaviour of synthetic genetic circuits. Many synthetic gene networks are based on transcriptional circuits, thus information on transcriptional and translational activity is important for understanding and fine-tuning the synthetic function. To this end, we have developed a toolkit to analyse systematically the transcriptional and translational activity of a specific synthetic part in vivo. It is based on the plasmid pTRA and allows the assignment of specific transcriptional and translational outputs to the gene(s) of interest (GOI) and to compare different genetic setups. By this, the optimal combination of transcriptional strength and translational activity can be identified. The design is tested in a case study using the gene encoding the fluorescent mCherry protein as GOI. We show the intracellular dynamics of mRNA and protein formation and discuss the potential and shortcomings of the pTRA plasmid.

  7. Leveraging blood serotonin as an endophenotype to identify de novo and rare variants involved in autism.

    PubMed

    Chen, Rui; Davis, Lea K; Guter, Stephen; Wei, Qiang; Jacob, Suma; Potter, Melissa H; Cox, Nancy J; Cook, Edwin H; Sutcliffe, James S; Li, Bingshan

    2017-01-01

    Autism spectrum disorder (ASD) is one of the most highly heritable neuropsychiatric disorders, but underlying molecular mechanisms are still unresolved due to extreme locus heterogeneity. Leveraging meaningful endophenotypes or biomarkers may be an effective strategy to reduce heterogeneity to identify novel ASD genes. Numerous lines of evidence suggest a link between hyperserotonemia, i.e., elevated serotonin (5-hydroxytryptamine or 5-HT) in whole blood, and ASD. However, the genetic determinants of blood 5-HT level and their relationship to ASD are largely unknown. In this study, pursuing the hypothesis that de novo variants (DNVs) and rare risk alleles acting in a recessive mode may play an important role in predisposition of hyperserotonemia in people with ASD, we carried out whole exome sequencing (WES) in 116 ASD parent-proband trios with most (107) probands having 5-HT measurements. Combined with published ASD DNVs, we identified USP15 as having recurrent de novo loss of function mutations and discovered evidence supporting two other known genes with recurrent DNVs (FOXP1 and KDM5B). Genes harboring functional DNVs significantly overlap with functional/disease gene sets known to be involved in ASD etiology, including FMRP targets and synaptic formation and transcriptional regulation genes. We grouped the probands into High-5HT and Normal-5HT groups based on normalized serotonin levels, and used network-based gene set enrichment analysis (NGSEA) to identify novel hyperserotonemia-related ASD genes based on LoF and missense DNVs. We found enrichment in the High-5HT group for a gene network module (DAWN-1) previously implicated in ASD, and this points to the TGF-β pathway and cell junction processes. Through analysis of rare recessively acting variants (RAVs), we also found that rare compound heterozygotes (CHs) in the High-5HT group were enriched for loci in an ASD-associated gene set. Finally, we carried out rare variant group-wise transmission disequilibrium tests (gTDT) and observed significant association of rare variants in genes encoding a subset of the serotonin pathway with ASD. Our study identified USP15 as a novel gene implicated in ASD based on recurrent DNVs. It also demonstrates the potential value of 5-HT as an effective endophenotype for gene discovery in ASD, and the effectiveness of this strategy needs to be further explored in studies of larger sample sizes.

  8. The NtAMI1 gene functions in cell division of tobacco BY-2 cells in the presence of indole-3-acetamide.

    PubMed

    Nemoto, Keiichirou; Hara, Masamitsu; Suzuki, Masashi; Seki, Hikaru; Muranaka, Toshiya; Mano, Yoshihiro

    2009-01-22

    Tobacco (Nicotiana tabacum) Bright Yellow-2 (BY-2) cells can be grown in medium containing indole-3-acetamide (IAM). Based on this finding, the NtAMI1 gene, whose product is functionally equivalent to the AtAMI1 gene of Arabidopsis thaliana and the aux2 gene of Agrobacterium rhizogenes, was isolated from BY-2 cells. Overexpression of the NtAMI1 gene allowed BY-2 cells to proliferate at lower concentrations of IAM, whereas suppression of the NtAMI1 gene by RNA interference (RNAi) caused severe growth inhibition in the medium containing IAM. These results suggest that IAM is incorporated into plant cells and converted to the auxin, indole-3-acetic acid, by NtAMI1.

  9. The genetics of colony form and function in Caribbean Acropora corals.

    PubMed

    Hemond, Elizabeth M; Kaluziak, Stefan T; Vollmer, Steven V

    2014-12-17

    Colonial reef-building corals have evolved a broad spectrum of colony morphologies based on coordinated asexual reproduction of polyps on a secreted calcium carbonate skeleton. Though cnidarians have been shown to possess and use similar developmental genes to bilaterians during larval development and polyp formation, little is known about genetic regulation of colony morphology in hard corals. We used RNA-seq to evaluate transcriptomic differences between functionally distinct regions of the coral (apical branch tips and branch bases) in two species of Caribbean Acropora, the staghorn coral, A. cervicornis, and the elkhorn coral, A. palmata. Transcriptome-wide gene profiles differed significantly between different parts of the coral colony as well as between species. Genes showing differential expression between branch tips and bases were involved in developmental signaling pathways, such as Wnt, Notch, and BMP, as well as pH regulation, ion transport, extracellular matrix production and other processes. Differences both within colonies and between species identify a relatively small number of genes that may contribute to the distinct "staghorn" versus "elkhorn" morphologies of these two sister species. The large number of differentially expressed genes supports a strong division of labor between coral branch tips and branch bases. Genes involved in growth of mature Acropora colonies include the classical signaling pathways associated with development of cnidarian larvae and polyps as well as morphological determination in higher metazoans.

  10. Oxidized Guanine Base Lesions Function in 8-Oxoguanine DNA Glycosylase-1-mediated Epigenetic Regulation of Nuclear Factor κB-driven Gene Expression

    PubMed Central

    Pan, Lang; Hao, Wenjing; Ba, Xueqing

    2016-01-01

    A large percentage of redox-responsive gene promoters contain evolutionarily conserved guanine-rich clusters; guanines are the bases most susceptible to oxidative modification(s). Consequently, 7,8-dihydro-8-oxoguanine (8-oxoG) is one of the most abundant base lesions in promoters and is primarily repaired via the 8-oxoguanine DNA glycosylase-1 (OGG1)-initiated base excision repair pathway. In view of a prompt cellular response to oxidative challenge, we hypothesized that the 8-oxoG lesion and the cognate repair protein OGG1 are utilized in transcriptional gene activation. Here, we document TNFα-induced enrichment of both 8-oxoG and OGG1 in promoters of pro-inflammatory genes, which precedes interaction of NF-κB with its DNA-binding motif. OGG1 bound to 8-oxoG upstream from the NF-κB motif increased its DNA occupancy by promoting an on-rate of both homodimeric and heterodimeric forms of NF-κB. OGG1 depletion decreased both NF-κB binding and gene expression, whereas Nei-like glycosylase-1 and -2 had a marginal effect. These results are the first to document a novel paradigm wherein the DNA repair protein OGG1 bound to its substrate is coupled to DNA occupancy of NF-κB and functions in epigenetic regulation of gene expression. PMID:27756845

  11. Binding and condensation of plasmid DNA onto functionalized carbon nanotubes: toward the construction of nanotube-based gene delivery vectors.

    PubMed

    Singh, Ravi; Pantarotto, Davide; McCarthy, David; Chaloin, Olivier; Hoebeke, Johan; Partidos, Charalambos D; Briand, Jean-Paul; Prato, Maurizio; Bianco, Alberto; Kostarelos, Kostas

    2005-03-30

    Carbon nanotubes (CNTs) constitute a class of nanomaterials that possess characteristics suitable for a variety of possible applications. Their compatibility with aqueous environments has been made possible by the chemical functionalization of their surface, allowing for exploration of their interactions with biological components including mammalian cells. Functionalized CNTs (f-CNTs) are being intensively explored in advanced biotechnological applications ranging from molecular biosensors to cellular growth substrates. We have been exploring the potential of f-CNTs as delivery vehicles of biologically active molecules in view of possible biomedical applications, including vaccination and gene delivery. Recently we reported the capability of ammonium-functionalized single-walled CNTs to penetrate human and murine cells and facilitate the delivery of plasmid DNA leading to expression of marker genes. To optimize f-CNTs as gene delivery vehicles, it is essential to characterize their interactions with DNA. In the present report, we study the interactions of three types of f-CNTs, ammonium-functionalized single-walled and multiwalled carbon nanotubes (SWNT-NH3+; MWNT-NH3+), and lysine-functionalized single-walled carbon nanotubes (SWNT-Lys-NH3+), with plasmid DNA. Nanotube-DNA complexes were analyzed by scanning electron microscopy, surface plasmon resonance, PicoGreen dye exclusion, and agarose gel shift assay. The results indicate that all three types of cationic carbon nanotubes are able to condense DNA to varying degrees, indicating that both nanotube surface area and charge density are critical parameters that determine the interaction and electrostatic complex formation between f-CNTs and DNA. All three different f-CNT types in this study exhibited upregulation of marker gene expression over naked DNA using a mammalian (human) cell line. Differences in the levels of gene expression were correlated with the structural and biophysical data obtained for the f-CNT:DNA complexes to suggest that large surface area leading to very efficient DNA condensation is not necessary for effective gene transfer. However, it will require further investigation to determine whether the degree of binding and tight association between DNA and nanotubes is a desirable trait to increase gene expression efficiency in vitro or in vivo. This study constitutes the first thorough investigation into the physicochemical interactions between cationic functionalized carbon nanotubes and DNA toward construction of carbon nanotube-based gene transfer vector systems.

  12. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
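
    The abstract above assigns an empirical p-value to a classifier's observed error rate by resampling. The sketch below illustrates that general idea under stated assumptions: class labels are permuted to build the null distribution (the abstract does not specify the resampling scheme), and a leave-one-out nearest-neighbour error stands in for the random forest OOB error so that the example needs no external library. All names and data are hypothetical.

```python
import random


def loo_nearest_neighbour_error(samples, labels):
    """Leave-one-out 1-NN error rate (stand-in for a random forest OOB error)."""
    errors = 0
    for i, x in enumerate(samples):
        # Nearest other sample by squared Euclidean distance.
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, samples[j])),
        )
        errors += labels[nearest] != labels[i]
    return errors / len(samples)


def empirical_p_value(samples, labels, error_fn, permutations=199, seed=0):
    """Return (observed error, permutation p-value) for one gene set."""
    rng = random.Random(seed)
    observed = error_fn(samples, labels)
    shuffled = list(labels)
    as_good = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break the link between expression and class
        if error_fn(samples, shuffled) <= observed:
            as_good += 1
    # The +1 terms count the observed labelling as one permutation.
    return observed, (as_good + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical expression of a two-gene set in 12 samples, two classes.
    samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4),
               (0.4, 0.1), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3),
               (4.9, 4.7), (5.3, 5.0)]
    labels = [0] * 6 + [1] * 6
    error, p_value = empirical_p_value(samples, labels,
                                       loo_nearest_neighbour_error)
    print(f"observed error = {error:.2f}, empirical p = {p_value:.3f}")
```

    Any function that maps samples and labels to an error rate can be passed as `error_fn`, so an actual OOB error from a random forest implementation could be substituted for the stand-in.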

  13. Interactions between genetic background, insulin resistance and β-cell function.

    PubMed

    Kahn, S E; Suvag, S; Wright, L A; Utzschneider, K M

    2012-10-01

    An interaction between genes and the environment is a critical component underlying the pathogenesis of the hyperglycaemia of type 2 diabetes. The development of more sophisticated techniques for studying gene variants and for analysing genetic data has led to the discovery of some 40 genes associated with type 2 diabetes. Most of these genes are related to changes in β-cell function, with a few associated with decreased insulin sensitivity and obesity. Interestingly, using quantitative traits based on continuous measures rather than dichotomous ones, it has become evident that not all genes associated with changes in fasting or post-prandial glucose are also associated with a diagnosis of type 2 diabetes. Identification of these gene variants has provided novel insights into the physiology and pathophysiology of the β-cell, including the identification of molecules involved in β-cell function that were not previously recognized as playing a role in this critical cell. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.

  14. Virus induced gene silencing (VIGS) for functional analysis of wheat genes involved in Zymoseptoria tritici susceptibility and resistance.

    PubMed

    Lee, Wing-Sham; Rudd, Jason J; Kanyuka, Kostya

    2015-06-01

    Virus-induced gene silencing (VIGS) has emerged as a powerful reverse genetic technology in plants supplementary to stable transgenic RNAi and, in certain species, as a viable alternative approach for gene functional analysis. The RNA virus Barley stripe mosaic virus (BSMV) was developed as a VIGS vector in the early 2000s and since then it has been used to study the function of wheat genes. Several variants of BSMV vectors are available, with some requiring in vitro transcription of infectious viral RNA, while others rely on in planta production of viral RNA from DNA-based vectors delivered to plant cells either by particle bombardment or Agrobacterium tumefaciens. We adapted the latest generation of binary BSMV VIGS vectors for the identification and study of wheat genes of interest involved in interactions with Zymoseptoria tritici and here present detailed and the most up-to-date protocols. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  15. Polymorphisms in the type I deiodinase gene and frontal function in recurrent depressive disorder.

    PubMed

    Gałecka, Elżbieta; Talarowska, Monika; Orzechowska, Agata; Górski, Paweł; Szemraj, Janusz

    2016-09-01

    Significant impairment of some psychological functions, including cognitive functioning, has been characteristically found in depressed patients. Memory disturbances may be related to the levels of thyroid hormones (TH) that are under the influence of different mechanisms and molecules, including deiodinase type 1 (D1) - an important determinant of circulating triiodothyronine (T3). We investigated the relationship between two functionally known polymorphisms within the DIO1 gene, i.e. DIO1a-C/T and DIO1b-A/G, and cognitive functioning in patients diagnosed with recurrent depressive disorder (rDD). In the planned analysis we mainly concentrated on the frontal function: working memory, executive functions and verbal fluency. Genetic variants were genotyped in 128 patients using a method based on polymerase chain reaction (PCR). Cognitive functions were assessed by the Trail Making Test, the Stroop Test and the Verbal Fluency Test (VFT). No significant associations were found between DIO1 polymorphisms and cognitive functioning in rDD. Only the CT and TT genotypes of the DIO1a variant were significantly related to verbal fluency. There were no significant differences between the distribution of the genotypes and demographic/medical variables. Based on the study, the examined polymorphisms are not an important risk or protective factor for cognitive impairment in depressive patients. Functional variants within the DIO1 gene that affect triiodothyronine (T3) levels seem not to be associated with cognitive functions. Nevertheless, considering the fact that the DIO1 gene is related to the course and management of depression, further studies on a larger sample size might be suggested. Copyright © 2016 Medical University of Bialystok. Published by Elsevier Urban & Partner Sp. z o.o. All rights reserved.

  16. Diversity and functions of bacterial community in drinking water biofilms revealed by high-throughput sequencing

    PubMed Central

    Chao, Yuanqing; Mao, Yanping; Wang, Zhiping; Zhang, Tong

    2015-01-01

    The development of biofilms in drinking water (DW) systems may cause various problems to water quality. To investigate the community structure of biofilms on different pipe materials and the global/specific metabolic functions of DW biofilms, PCR-based 454 pyrosequencing data for 16S rRNA genes and Illumina metagenomic data were generated and analysed. Considerable differences in bacterial diversity and taxonomic structure were identified between biofilms formed on stainless steel and biofilms formed on plastics, indicating that the metallic materials facilitate the formation of higher diversity biofilms. Moreover, variations in several dominant genera were observed during biofilm formation. Based on PCA analysis, the global functions in the DW biofilms were similar to other DW metagenomes. Beyond the global functions, the occurrences and abundances of specific protective genes involved in the glutathione metabolism, the SoxRS system, the OxyR system, RpoS regulated genes, and the production/degradation of extracellular polymeric substances were also evaluated. A near-complete and low-contamination draft genome was constructed from the metagenome of the DW biofilm, based on the coverage and tetranucleotide frequencies, and identified as a Bradyrhizobiaceae-like bacterium according to a phylogenetic analysis. Our findings provide new insight into DW biofilms, especially in terms of their metabolic functions. PMID:26067561

  17. Diversity and functions of bacterial community in drinking water biofilms revealed by high-throughput sequencing

    NASA Astrophysics Data System (ADS)

    Chao, Yuanqing; Mao, Yanping; Wang, Zhiping; Zhang, Tong

    2015-06-01

    The development of biofilms in drinking water (DW) systems may cause various problems to water quality. To investigate the community structure of biofilms on different pipe materials and the global/specific metabolic functions of DW biofilms, PCR-based 454 pyrosequencing data for 16S rRNA genes and Illumina metagenomic data were generated and analysed. Considerable differences in bacterial diversity and taxonomic structure were identified between biofilms formed on stainless steel and biofilms formed on plastics, indicating that the metallic materials facilitate the formation of higher diversity biofilms. Moreover, variations in several dominant genera were observed during biofilm formation. Based on PCA analysis, the global functions in the DW biofilms were similar to other DW metagenomes. Beyond the global functions, the occurrences and abundances of specific protective genes involved in the glutathione metabolism, the SoxRS system, the OxyR system, RpoS regulated genes, and the production/degradation of extracellular polymeric substances were also evaluated. A near-complete and low-contamination draft genome was constructed from the metagenome of the DW biofilm, based on the coverage and tetranucleotide frequencies, and identified as a Bradyrhizobiaceae-like bacterium according to a phylogenetic analysis. Our findings provide new insight into DW biofilms, especially in terms of their metabolic functions.

  18. Diversity and functions of bacterial community in drinking water biofilms revealed by high-throughput sequencing.

    PubMed

    Chao, Yuanqing; Mao, Yanping; Wang, Zhiping; Zhang, Tong

    2015-06-12

    The development of biofilms in drinking water (DW) systems may cause various problems to water quality. To investigate the community structure of biofilms on different pipe materials and the global/specific metabolic functions of DW biofilms, PCR-based 454 pyrosequencing data for 16S rRNA genes and Illumina metagenomic data were generated and analysed. Considerable differences in bacterial diversity and taxonomic structure were identified between biofilms formed on stainless steel and biofilms formed on plastics, indicating that the metallic materials facilitate the formation of higher diversity biofilms. Moreover, variations in several dominant genera were observed during biofilm formation. Based on PCA analysis, the global functions in the DW biofilms were similar to other DW metagenomes. Beyond the global functions, the occurrences and abundances of specific protective genes involved in the glutathione metabolism, the SoxRS system, the OxyR system, RpoS regulated genes, and the production/degradation of extracellular polymeric substances were also evaluated. A near-complete and low-contamination draft genome was constructed from the metagenome of the DW biofilm, based on the coverage and tetranucleotide frequencies, and identified as a Bradyrhizobiaceae-like bacterium according to a phylogenetic analysis. Our findings provide new insight into DW biofilms, especially in terms of their metabolic functions.

  19. Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns.

    PubMed

    Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie

    2011-09-12

    Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
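
    As a rough illustration of counting shared motifs between a pair of promoters, the sketch below treats a motif as a fixed-length word and calls two words similar when they differ in at most a given number of positions. The abstract only says that SSM are defined by length and similarity, so the Hamming-distance criterion, the function names, the default parameters and the toy sequences are assumptions, not the published algorithm.

```python
def kmers(seq, k):
    """All overlapping words of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def count_shared_motifs(seq_a, seq_b, k=6, max_mismatches=1):
    """Count distinct k-mers of seq_a that have a similar k-mer in seq_b.

    Assumption: 'similar' means Hamming distance <= max_mismatches.
    """
    distinct_a = set(kmers(seq_a, k))
    distinct_b = set(kmers(seq_b, k))
    return sum(
        1
        for a in distinct_a
        if any(hamming(a, b) <= max_mismatches for b in distinct_b)
    )


# Hypothetical conserved promoter fragments for two genes.
promoter_a = "ACGTACGT"
promoter_b = "TTACGTAA"
exact = count_shared_motifs(promoter_a, promoter_b, k=4, max_mismatches=0)
relaxed = count_shared_motifs(promoter_a, promoter_b, k=4, max_mismatches=1)
print(exact, relaxed)  # 3 4
```

    Deciding whether a count is "exceptional" would additionally need a background distribution over gene pairs, which this sketch does not attempt.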

  20. Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns

    PubMed Central

    2011-01-01

    Background: Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results: Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions: Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886

  1. NABIC marker database: A molecular markers information network of agricultural crops.

    PubMed

    Kim, Chang-Kug; Seol, Young-Joo; Lee, Dong-Jun; Jeong, In-Seon; Yoon, Ung-Han; Lee, Gang-Seob; Hahn, Jang-Ho; Park, Dong-Suk

    2013-01-01

    In 2013, the National Agricultural Biotechnology Information Center (NABIC) reconstructed a molecular marker database for useful genetic resources. The web-based marker database consists of three major functional categories: map viewer, RSN marker and gene annotation. It provides 7250 marker locations, 3301 RSN marker properties, and 3280 molecular marker annotation entries for agricultural plants. Each molecular marker entry provides information such as marker name, expressed sequence tag number, gene definition and general marker information. This updated marker-based database provides useful information through a user-friendly web interface that assists in tracing any new structures of the chromosomes and gene positional functions using specific molecular markers. The database is available for free at http://nabic.rda.go.kr/gere/rice/molecularMarkers/

  2. SoFoCles: feature filtering for microarray classification based on gene ontology.

    PubMed

    Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

    2010-02-01

    Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.

  3. CRISPR/Cas9: An inexpensive, efficient loss of function tool to screen human disease genes in Xenopus.

    PubMed

    Bhattacharya, Dipankan; Marfo, Chris A; Li, Davis; Lane, Maura; Khokha, Mustafa K

    2015-12-15

    Congenital malformations are the major cause of infant mortality in the US and Europe. Due to rapid advances in human genomics, we can now efficiently identify sequence variants that may cause disease in these patients. However, establishing disease causality remains a challenge. Additionally, in the case of congenital heart disease, many of the identified candidate genes are either novel to embryonic development or have no known function. Therefore, there is a pressing need to develop inexpensive and efficient technologies to screen these candidate genes for disease phenocopy in model systems and to perform functional studies to uncover their role in development. For this purpose, we sought to test F0 CRISPR based gene editing as a loss of function strategy for disease phenocopy in the frog model organism, Xenopus tropicalis. We demonstrate that the CRISPR/Cas9 system can efficiently modify both alleles in the F0 generation within a few hours post fertilization, recapitulating even early disease phenotypes that are highly similar to knockdowns from morpholino oligos (MOs) in nearly all cases tested. We find that injecting Cas9 protein is dramatically more efficacious and less toxic than cas9 mRNA. We conclude that CRISPR based F0 gene modification in X. tropicalis is efficient and cost effective and readily recapitulates disease and MO phenotypes. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. Microarray-based determination of anti-inflammatory genes targeted by 6-(methylsulfinyl)hexyl isothiocyanate in macrophages.

    PubMed

    Chen, Jihua; Uto, Takuhiro; Tanigawa, Shunsuke; Yamada-Kato, Tomeo; Fujii, Makoto; Hou, De-Xing

    2010-01-01

    6-(Methylsulfinyl)hexyl isothiocyanate (6-MSITC) is a bioactive ingredient of wasabi [Wasabia japonica (Miq.) Matsumura], which is a popular pungent spice of Japan. To evaluate the anti-inflammatory function and underlying genes targeted by 6-MSITC, gene expression profiling through DNA microarray was performed in mouse macrophages. Among 22,050 oligonucleotides, the expression levels of 406 genes were increased by ≥3-fold in lipopolysaccharide (LPS)-activated RAW264 cells, 238 gene signals of which were attenuated by 6-MSITC (≥2-fold). Expression levels of 717 genes were decreased by ≥3-fold in LPS-activated cells, of which 336 gene signals were restored by 6-MSITC (≥2-fold). Utilizing group analysis, 206 genes affected by 6-MSITC with a ≥2-fold change were classified into 35 categories relating to biological processes (81), molecular functions (108) and signaling pathways (17). The genes were further categorized as 'defense, inflammatory response, cytokine activities and receptor activities' and some were confirmed by real-time polymerase chain reaction. Ingenuity pathway analysis further revealed that wasabi 6-MSITC regulated the relevant networks of chemokines, interleukins and interferons to exert its anti-inflammatory function.
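
    The two-step fold-change filter described here (induced at least 3-fold by LPS, then attenuated at least 2-fold by 6-MSITC) can be sketched as follows. The gene names and intensity values are invented, and the use of plain ratios of unlogged intensities is an assumption; the abstract does not say how the fold changes were computed.

```python
def attenuated_genes(control, lps, treated, induce_fold=3.0, attenuate_fold=2.0):
    """Genes induced by LPS and brought back down by the treatment.

    control, lps and treated map a gene name to an expression intensity
    (assumed positive and on a linear scale).
    """
    selected = []
    for gene in control:
        induced = lps[gene] / control[gene] >= induce_fold
        attenuated = lps[gene] / treated[gene] >= attenuate_fold
        if induced and attenuated:
            selected.append(gene)
    return selected


# Hypothetical intensities for three genes under the three conditions.
control = {"geneA": 1.0, "geneB": 1.0, "geneC": 2.0}
lps = {"geneA": 4.0, "geneB": 3.5, "geneC": 3.0}
treated = {"geneA": 1.5, "geneB": 3.0, "geneC": 1.0}

hits = attenuated_genes(control, lps, treated)
print(hits)  # ['geneA']
```

    The mirror-image case in the abstract (genes decreased by LPS and restored by 6-MSITC) would follow by inverting both ratios.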

  5. Microarray-based determination of anti-inflammatory genes targeted by 6-(methylsulfinyl)hexyl isothiocyanate in macrophages

    PubMed Central

    Chen, Jihua; Uto, Takuhiro; Tanigawa, Shunsuke; Yamada-Kato, Tomeo; Fujii, Makoto; Hou, De-Xing

    2010-01-01

    6-(Methylsulfinyl)hexyl isothiocyanate (6-MSITC) is a bioactive ingredient of wasabi [Wasabia japonica (Miq.) Matsumura], which is a popular pungent spice of Japan. To evaluate the anti-inflammatory function and underlying genes targeted by 6-MSITC, gene expression profiling through DNA microarray was performed in mouse macrophages. Among 22,050 oligonucleotides, the expression levels of 406 genes were increased by ≥3-fold in lipopolysaccharide (LPS)-activated RAW264 cells, 238 gene signals of which were attenuated by 6-MSITC (≥2-fold). Expression levels of 717 genes were decreased by ≥3-fold in LPS-activated cells, of which 336 gene signals were restored by 6-MSITC (≥2-fold). Utilizing group analysis, 206 genes affected by 6-MSITC with a ≥2-fold change were classified into 35 categories relating to biological processes (81), molecular functions (108) and signaling pathways (17). The genes were further categorized as ‘defense, inflammatory response, cytokine activities and receptor activities’ and some were confirmed by real-time polymerase chain reaction. Ingenuity pathway analysis further revealed that wasabi 6-MSITC regulated the relevant networks of chemokines, interleukins and interferons to exert its anti-inflammatory function. PMID:23136589

  6. High-throughput transcriptome sequencing and preliminary functional analysis in four Neotropical tree species.

    PubMed

    Brousseau, Louise; Tinaut, Alexandra; Duret, Caroline; Lang, Tiange; Garnier-Gere, Pauline; Scotti, Ivan

    2014-03-27

    The Amazonian rainforest is predicted to suffer from ongoing environmental changes. Despite the need to evaluate the impact of such changes on tree genetic diversity, we almost entirely lack genomic resources. In this study, we analysed the transcriptome of four tropical tree species (Carapa guianensis, Eperua falcata, Symphonia globulifera and Virola michelii) with contrasting ecological features, belonging to four widespread botanical families (respectively Meliaceae, Fabaceae, Clusiaceae and Myristicaceae). We sequenced cDNA libraries from three organs (leaves, stems, and roots) using 454 pyrosequencing. We have developed an R and bioperl-based bioinformatic procedure for de novo assembly, gene functional annotation and marker discovery. Mismatch identification takes into account single-base quality values as well as the likelihood of false variants as a function of contig depth and number of sequenced chromosomes. Between 17103 (for Symphonia globulifera) and 23390 (for Eperua falcata) contigs were assembled. Organs varied in the numbers of unigenes they apparently express, with higher numbers in roots. Patterns of gene expression were similar across species, with metabolism of aromatic compounds standing out as an overrepresented gene function. Transcripts corresponding to several gene functions were found to be over- or underrepresented in each organ. We identified between 4434 (for Symphonia globulifera) and 9076 (for Virola surinamensis) well-supported mismatches. The resulting overall mismatch density ranged from 0.89 (S. globulifera) to 1.05 (V. surinamensis) mismatches/100 bp in variation-containing contigs. The relative representation of gene functions in the four transcriptomes suggests that secondary metabolism may be particularly important in tropical trees.
The differential representation of transcripts among tissues suggests differential gene expression, which opens the way to functional studies in these non-model, ecologically important species. We found substantial amounts of mismatches in the four species. These newly identified putative variants are a first step towards acquiring much needed genomic resources for tropical tree species.

  7. A sight on protein-based nanoparticles as drug/gene delivery systems.

    PubMed

    Salatin, Sara; Jelvehgari, Mitra; Maleki-Dizaj, Solmaz; Adibkia, Khosro

    2015-01-01

    Polymeric nanomaterials have been extensively applied for the preparation of targeted and controlled release drug/gene delivery systems. However, problems involved in the formulation of synthetic polymers, such as the use of toxic solvents and surfactants, have limited their desirable applications. In this regard, natural biomolecules including proteins and polysaccharides are suitable alternatives due to their safety. According to the literature, protein-based nanoparticles possess many advantages for drug and gene delivery such as biocompatibility, biodegradability and the ability to be functionalized with targeting ligands. This review provides a general overview of the application of biodegradable protein-based nanoparticles in drug/gene delivery based on their origins. Their unique physicochemical properties that help them to be formulated as pharmaceutical carriers are also discussed.

  8. Pathway Distiller - multisource biological pathway consolidation

    PubMed Central

    2012-01-01

    Background: One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods: After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity to find static pathway clusters independent of any given experiment. Results: We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases.
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions: By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have given users the ability to extract functional explanations of their genome wide experiments. PMID:23134636
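
    To make the idea of experiment-independent pathway clusters concrete, the sketch below merges pathways whose gene sets overlap strongly. It is only loosely modelled on the de novo Consolidation method: the abstract mentions "several measurements of pathway similarity" without naming them, so the Jaccard index, the 0.5 threshold, the single-linkage merging and the pathway names are all assumptions.

```python
def jaccard(a, b):
    """Jaccard index of two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consolidate(pathways, threshold=0.5):
    """Group pathways into clusters by single-linkage on Jaccard similarity.

    pathways maps a pathway name to a set of gene identifiers.
    Returns a sorted list of clusters, each a sorted list of names.
    """
    names = sorted(pathways)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(pathways[a], pathways[b]) >= threshold:
                parent[find(b)] = find(a)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Hypothetical pathways drawn from two databases.
pathways = {
    "dbA:apoptosis": {"G1", "G2", "G3"},
    "dbB:programmed_cell_death": {"G1", "G2", "G3", "G4"},
    "dbA:glycolysis": {"G7", "G8"},
}
concepts = consolidate(pathways)
print(concepts)
```

    Here the two cell-death pathways share three of four genes (Jaccard 0.75) and collapse into one concept, while the glycolysis pathway stays on its own.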

  9. Pathway Distiller - multisource biological pathway consolidation.

    PubMed

    Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong

    2012-01-01

    One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have given users the ability to extract functional explanations of their genome wide experiments.

  10. Functional genomics platform for pooled screening and mammalian genetic interaction maps

    PubMed Central

    Kampmann, Martin; Bassik, Michael C.; Weissman, Jonathan S.

    2014-01-01

    Systematic genetic interaction maps in microorganisms are powerful tools for identifying functional relationships between genes and defining the function of uncharacterized genes. We have recently implemented this strategy in mammalian cells as a two-stage approach. First, genes of interest are robustly identified in a pooled genome-wide screen using complex shRNA libraries. Second, phenotypes for all pairwise combinations of hit genes are measured in a double-shRNA screen and used to construct a genetic interaction map. Our protocol allows for rapid pooled screening under various conditions without a requirement for robotics, in contrast to arrayed approaches. Each stage of the protocol can be implemented in ~2 weeks, with additional time for analysis and generation of reagents. We discuss considerations for screen design, and present complete experimental procedures as well as a full computational analysis suite for identification of hits in pooled screens and generation of genetic interaction maps. While the protocols outlined here were developed for our original shRNA-based approach, they can be applied more generally, including to CRISPR-based approaches. PMID:24992097
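
    The second stage of the protocol, turning pairwise double-knockdown phenotypes into an interaction map, can be sketched with a simple scoring rule. The abstract does not state how interactions are scored, so the additive expectation used below (observed double phenotype minus the sum of the two single phenotypes) is purely an assumption, as are the gene names and phenotype values.

```python
from itertools import combinations


def interaction_map(single, double):
    """Genetic interaction score for every pair of hit genes.

    single maps a gene to its single-knockdown phenotype; double maps a
    sorted gene pair to the double-knockdown phenotype.
    Assumption: score = observed double - (single_a + single_b).
    """
    scores = {}
    for a, b in combinations(sorted(single), 2):
        expected = single[a] + single[b]
        scores[(a, b)] = double[(a, b)] - expected
    return scores


# Hypothetical phenotypes (e.g. log-scale growth effects) for three hit genes.
single = {"gene1": -1.0, "gene2": -0.5, "gene3": 0.25}
double = {
    ("gene1", "gene2"): -1.5,
    ("gene1", "gene3"): -0.25,
    ("gene2", "gene3"): -0.25,
}
gi = interaction_map(single, double)
print(gi)
```

    A score near zero means the pair behaves as expected from the single phenotypes; scores away from zero mark candidate buffering or synergistic interactions.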

  11. GeneRIF indexing: sentence selection based on machine learning.

    PubMed

    Jimeno-Yepes, Antonio J; Sticco, J Caitlin; Mork, James G; Aronson, Alan R

    2013-05-31

    A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function. We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.

  12. DArT Markers Effectively Target Gene Space in the Rye Genome

    PubMed Central

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity hamper considerably the genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes. PMID:27833625

  13. DArT Markers Effectively Target Gene Space in the Rye Genome.

    PubMed

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity considerably hamper genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops, and repetitive sequences account for over 90% of its length. Diversity Arrays Technology (DArT) is a high-throughput genotyping method in which preferential sampling of gene-rich regions is achieved through the use of methylation-sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and, following a redundancy analysis, assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total, 515 DArT sequences could be incorporated into publicly available rye genome zippers, providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2GO pipeline, we attributed putative gene functions to 1,101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high-density consensus maps of rye. Furthermore, we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes, we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the gene ontology- and microcolinearity-based identification of candidate genes.

  14. Design and construction of a first-generation high-throughput integrated robotic molecular biology platform for bioenergy applications

    USDA-ARS's Scientific Manuscript database

    The molecular biological techniques for plasmid-based assembly and cloning of gene open reading frames are essential for elucidating the function of the proteins encoded by the genes. These techniques involve the production of full-length cDNA libraries as a source of plasmid-based clones to expres...

  15. Systematic discovery of novel ciliary genes through functional genomics in the zebrafish

    PubMed Central

    Choksi, Semil P.; Babu, Deepak; Lau, Doreen; Yu, Xianwen; Roy, Sudipto

    2014-01-01

    Cilia are microtubule-based hair-like organelles that play many important roles in development and physiology, and are implicated in a rapidly expanding spectrum of human diseases, collectively termed ciliopathies. Primary ciliary dyskinesia (PCD), one of the most prevalent of ciliopathies, arises from abnormalities in the differentiation or motility of the motile cilia. Despite their biomedical importance, a methodical functional screen for ciliary genes has not been carried out in any vertebrate at the organismal level. We sought to systematically discover novel motile cilia genes by identifying the genes induced by Foxj1, a winged-helix transcription factor that has an evolutionarily conserved role as the master regulator of motile cilia biogenesis. Unexpectedly, we find that the majority of the Foxj1-induced genes have not been associated with cilia before. To characterize these novel putative ciliary genes, we subjected 50 randomly selected candidates to a systematic functional phenotypic screen in zebrafish embryos. Remarkably, we find that over 60% are required for ciliary differentiation or function, whereas 30% of the proteins encoded by these genes localize to motile cilia. We also show that these genes regulate the proper differentiation and beating of motile cilia. This collection of Foxj1-induced genes will be invaluable for furthering our understanding of ciliary biology, and in the identification of new mutations underlying ciliary disorders in humans. PMID:25139857

  16. Genomewide Analysis of Aryl Hydrocarbon Receptor Binding Targets Reveals an Extensive Array of Gene Clusters that Control Morphogenetic and Developmental Programs

    PubMed Central

    Sartor, Maureen A.; Schnekenburger, Michael; Marlowe, Jennifer L.; Reichard, John F.; Wang, Ying; Fan, Yunxia; Ma, Ci; Karyala, Saikumar; Halbleib, Danielle; Liu, Xiangdong; Medvedovic, Mario; Puga, Alvaro

    2009-01-01

    Background: The vertebrate aryl hydrocarbon receptor (AHR) is a ligand-activated transcription factor that regulates cellular responses to environmental polycyclic and halogenated compounds. The naive receptor is believed to reside in an inactive cytosolic complex that translocates to the nucleus and induces transcription of xenobiotic detoxification genes after activation by ligand. Objectives: We conducted an integrative genomewide analysis of AHR gene targets in mouse hepatoma cells and determined whether AHR regulatory functions may take place in the absence of an exogenous ligand. Methods: The network of AHR-binding targets in the mouse genome was mapped through a multipronged approach involving chromatin immunoprecipitation/chip and global gene expression signatures. The findings were integrated into a prior functional knowledge base from Gene Ontology, interaction networks, Kyoto Encyclopedia of Genes and Genomes pathways, sequence motif analysis, and literature molecular concepts. Results: We found the naive receptor in unstimulated cells bound to an extensive array of gene clusters with functions in regulation of gene expression, differentiation, and pattern specification, connecting multiple morphogenetic and developmental programs. Activation by the ligand displaced the receptor from some of these targets toward sites in the promoters of xenobiotic metabolism genes. Conclusions: The vertebrate AHR appears to possess unsuspected regulatory functions that may be potential targets of environmental injury. PMID:19654925

  17. A machine-learned analysis of human gene polymorphisms modulating persisting pain points at major roles of neuroimmune processes.

    PubMed

    Kringel, Dario; Lippmann, Catharina; Parnham, Michael J; Kalso, Eija; Ultsch, Alfred; Lötsch, Jörn

    2018-06-19

    Human genetic research has implicated functional variants of more than one hundred genes in the modulation of persisting pain. Artificial intelligence and machine learning techniques may combine this knowledge with results of genetic research gathered in any context, which permits the identification of the key biological processes involved in chronic sensitization to pain. Based on published evidence, a set of 110 genes carrying variants reported to be associated with modulation of the clinical phenotype of persisting pain in eight different clinical settings was submitted to unsupervised machine-learning aimed at functional clustering. Subsequently, a mathematically supported subset of genes, comprising those most consistently involved in persisting pain, was analyzed by means of computational functional genomics in the Gene Ontology knowledgebase. Clustering of genes with evidence for a modulation of persisting pain elucidated a functionally heterogeneous set. The situation cleared when the focus was narrowed to a genetic modulation consistently observed throughout several clinical settings. On this basis, two groups of biological processes, the immune system and nitric oxide signaling, emerged as major players in sensitization to persisting pain, which is biologically highly plausible and in agreement with other lines of pain research. The present computational functional genomics-based approach provided a computational systems-biology perspective on chronic sensitization to pain. Human genetic control of persisting pain points to the immune system as a source of potential future targets for drugs directed against persisting pain. Contemporary machine-learned methods provide innovative approaches to knowledge discovery from previous evidence.

  18. Expression-based clustering of CAZyme-encoding genes of Aspergillus niger.

    PubMed

    Gruben, Birgit S; Mäkelä, Miia R; Kowalczyk, Joanna E; Zhou, Miaomiao; Benoit-Gelber, Isabelle; De Vries, Ronald P

    2017-11-23

    The Aspergillus niger genome contains a large repertoire of genes encoding carbohydrate active enzymes (CAZymes) that are targeted to plant polysaccharide degradation, enabling A. niger to grow on a wide range of plant biomass substrates. Which genes need to be activated in certain environmental conditions depends on the composition of the available substrate. Previous studies have demonstrated the involvement of a number of transcriptional regulators in plant biomass degradation and have identified sets of target genes for each regulator. In this study, a broad transcriptional analysis was performed of the A. niger genes encoding (putative) plant polysaccharide degrading enzymes. Microarray data focusing on the initial response of A. niger to the presence of plant biomass-related carbon sources were analyzed for the wild-type strain N402, grown on a large range of carbon sources, and for the regulatory mutant strains ΔxlnR, ΔaraR, ΔamyR, ΔrhaR and ΔgalX, grown on their specific inducing compounds. The cluster analysis of the expression data revealed several groups of co-regulated genes that go beyond the traditionally described co-regulated gene sets. Additional putative target genes of the selected regulators were identified, based on their expression profile. Notably, in several cases the expression profile calls into question the function assignment of uncharacterized genes that was based on homology searches, highlighting the need for more extensive biochemical studies into the substrate specificity of enzymes encoded by these non-characterized genes. The data also revealed sets of genes that were upregulated in the regulatory mutants, suggesting interaction between the regulatory systems and therefore an even more complex overall regulatory network than has been reported so far. Expression profiling on a large number of substrates provides better insight into the complex regulatory systems that drive the conversion of plant biomass by fungi. In addition, the data provide additional evidence in favor of and against the similarity-based functions assigned to uncharacterized genes.

  19. GeoChip-based analysis of metabolic diversity of microbial communities at the Juan de Fuca Ridge hydrothermal vent

    PubMed Central

    Wang, Fengping; Zhou, Huaiyang; Meng, Jun; Peng, Xiaotong; Jiang, Lijing; Sun, Ping; Zhang, Chuanlun; Van Nostrand, Joy D.; Deng, Ye; He, Zhili; Wu, Liyou; Zhou, Jizhong; Xiao, Xiang

    2009-01-01

    Deep-sea hydrothermal vents are one of the most unique and fascinating ecosystems on Earth. Although phylogenetic diversity of vent communities has been extensively examined, their physiological diversity is poorly understood. In this study, a GeoChip-based, high-throughput metagenomics technology revealed dramatic differences in microbial metabolic functions in a newly grown protochimney (inner section, Proto-I; outer section, Proto-O) and the outer section of a mature chimney (4143-1) at the Juan de Fuca Ridge. Very limited numbers of functional genes were detected in Proto-I (113 genes), whereas much higher numbers of genes were detected in Proto-O (504 genes) and 4143-1 (5,414 genes). Microbial functional genes/populations in Proto-O and Proto-I were substantially different (around 1% common genes), suggesting a rapid change in the microbial community composition during the growth of the chimney. Previously retrieved cbbL and cbbM genes involved in the Calvin Benson Bassham (CBB) cycle from deep-sea hydrothermal vents were predominant in Proto-O and 4143-1, whereas photosynthetic green-like cbbL genes were the major components in Proto-I. In addition, genes involved in methanogenesis, aerobic and anaerobic methane oxidation (e.g., ANME1 and ANME2), nitrification, denitrification, sulfate reduction, degradation of complex carbon substrates, and metal resistance were also detected. Clone libraries supported the GeoChip results but were less effective than the microarray in delineating microbial populations of low biomass. Overall, these results suggest that the hydrothermal microbial communities are metabolically and physiologically highly diverse, and the communities appear to be undergoing rapid dynamic succession and adaptation in response to the steep temperature and chemical gradients across the chimney. PMID:19273854

  20. Whole-exome sequencing in obsessive-compulsive disorder identifies rare mutations in immunological and neurodevelopmental pathways

    PubMed Central

    Cappi, C; Brentani, H; Lima, L; Sanders, S J; Zai, G; Diniz, B J; Reis, V N S; Hounie, A G; Conceição do Rosário, M; Mariani, D; Requena, G L; Puga, R; Souza-Duran, F L; Shavitt, R G; Pauls, D L; Miguel, E C; Fernandez, T V

    2016-01-01

    Studies of rare genetic variation have identified molecular pathways conferring risk for developmental neuropsychiatric disorders. To date, no published whole-exome sequencing studies have been reported in obsessive-compulsive disorder (OCD). We sequenced all the genome coding regions in 20 sporadic OCD cases and their unaffected parents to identify rare de novo (DN) single-nucleotide variants (SNVs). The primary aim of this pilot study was to determine whether DN variation contributes to OCD risk. To this aim, we evaluated whether there is an elevated rate of DN mutations in OCD, which would justify this approach toward gene discovery in larger studies of the disorder. Furthermore, to explore functional molecular correlations among genes with nonsynonymous DN SNVs in OCD probands, a protein–protein interaction (PPI) network was generated based on databases of direct molecular interactions. We applied Degree-Aware Disease Gene Prioritization (DADA) to rank the PPI network genes based on their relatedness to a set of OCD candidate genes from two OCD genome-wide association studies (Stewart et al., 2013; Mattheisen et al., 2014). In addition, we performed a pathway analysis with genes from the PPI network. The rate of DN SNVs in OCD was 2.51 × 10⁻⁸ per base per generation, significantly higher than a previous estimated rate in unaffected subjects using the same sequencing platform and analytic pipeline. Several genes harboring DN SNVs in OCD were highly interconnected in the PPI network and ranked high in the DADA analysis. Nearly all the DN SNVs in this study are in genes expressed in the human brain, and a pathway analysis revealed enrichment in immunological and central nervous system functioning and development. The results of this pilot study indicate that further investigation of DN variation in larger OCD cohorts is warranted to identify specific risk genes and to confirm our preliminary finding with regard to PPI network enrichment for particular biological pathways and functions. PMID:27023170
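
    The abstract reports a de novo rate "per base per generation". As a minimal sketch of how such a rate is commonly computed from trio data, the function below divides the number of de novo SNVs by the number of haploid bases surveyed. The function name, the factor-of-two convention, and every input value are assumptions for illustration only; they are not taken from the study.

```python
def de_novo_rate(n_variants, n_trios, callable_bases):
    """De novo SNVs per base per generation.

    Each child carries two haploid copies of the callable region,
    hence the factor of 2 in the denominator (an assumed convention).
    """
    return n_variants / (2 * n_trios * callable_bases)


# Hypothetical inputs, NOT the study's counts:
# 12 de novo SNVs across 10 trios with 30 Mb of callable sequence each.
rate = de_novo_rate(n_variants=12, n_trios=10, callable_bases=30_000_000)
print(f"{rate:.2e} per base per generation")
```

    With these made-up inputs the rate is 2 × 10⁻⁸; comparing such a rate between cases and unaffected subjects would additionally need a statistical test, which this sketch omits.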

  1. Comprehensive analysis of pathway or functionally related gene expression in the National Cancer Institute's anticancer screen.

    PubMed

    Huang, Ruili; Wallqvist, Anders; Covell, David G

    2006-03-01

    We have analyzed the level of gene coregulation, using gene expression patterns measured across the National Cancer Institute's 60 tumor cell panels (NCI(60)), in the context of predefined pathways or functional categories annotated by KEGG (Kyoto Encyclopedia of Genes and Genomes), BioCarta, and GO (Gene Ontology). Statistical methods were used to evaluate the level of gene expression coherence (coordinated expression) by comparing intra- and interpathway gene-gene correlations. Our results show that gene expression in pathways, or groups of functionally related genes, has a significantly higher level of coherence than that of a randomly selected set of genes. Transcriptional-level gene regulation appears to be on a "need to be" basis, such that pathways comprising genes encoding closely interacting proteins and pathways responsible for vital cellular processes or processes that are related to growth or proliferation, specifically in cancer cells, such as those engaged in genetic information processing, cell cycle, energy metabolism, and nucleotide metabolism, tend to be more modular (lower degree of gene sharing) and to have genes significantly more coherently expressed than most signaling and regular metabolic pathways. Hierarchical clustering of pathways based on their differential gene expression in the NCI(60) further revealed interesting interpathway communications or interactions indicative of a higher level of pathway regulation. The knowledge of the nature of gene expression regulation and biological pathways can be applied to understanding the mechanism by which small drug molecules interfere with biological systems.
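
    The coherence comparison described above can be illustrated with a small permutation test: compute the mean pairwise Pearson correlation within a gene set and compare it with random gene sets of the same size. This is only a sketch on synthetic data; the function names and the permutation scheme are assumptions, not the statistical procedure used by the authors.

```python
import numpy as np


def mean_pairwise_correlation(expr, genes):
    """Mean Pearson correlation over all distinct gene pairs in `genes`.

    expr:  2-D array, rows = genes, columns = cell lines.
    genes: row indices of the gene set.
    """
    corr = np.corrcoef(expr[genes])           # gene-by-gene correlation matrix
    upper = np.triu_indices(len(genes), k=1)  # distinct pairs only
    return float(corr[upper].mean())


def coherence_p_value(expr, pathway, n_random=1000, seed=0):
    """Empirical p-value: share of random same-size gene sets whose mean
    pairwise correlation is at least that of the pathway."""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_correlation(expr, pathway)
    hits = 0
    for _ in range(n_random):
        random_set = rng.choice(expr.shape[0], size=len(pathway), replace=False)
        if mean_pairwise_correlation(expr, random_set) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


# Synthetic data: 200 genes x 60 cell lines; genes 0-9 share a common signal
# and play the role of a hypothetical "pathway".
rng = np.random.default_rng(42)
expr = rng.normal(size=(200, 60))
expr[:10] += 2.0 * rng.normal(size=60)
observed, p = coherence_p_value(expr, list(range(10)))
print(f"mean intra-pathway correlation = {observed:.2f}, empirical p = {p:.4f}")
```

    Adding one to both the hit count and the number of permutations keeps the empirical p-value strictly above zero, a common convention for permutation tests.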

  2. Gene Ontology-Based Analysis of Zebrafish Omics Data Using the Web Tool Comparative Gene Ontology.

    PubMed

    Ebrahimie, Esmaeil; Fruzangohar, Mario; Moussavi Nik, Seyyed Hani; Newman, Morgan

    2017-10-01

    Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 (www.comparativego.com). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.

  3. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tschaplinski, Timothy J; Tsai, Chung-Jui; Harding, Scott A

    Salicin-based phenolic glycosides, hydroxycinnamate derivatives and flavonoid-derived condensed tannins comprise up to one-third of Populus leaf dry mass. Genes regulating the abundance and chemical diversity of these substances have not been comprehensively analysed in tree species exhibiting this metabolically demanding level of phenolic metabolism. Here, shikimate-phenylpropanoid pathway genes thought to give rise to these phenolic products were annotated from the Populus genome, their expression assessed by semiquantitative or quantitative reverse transcription polymerase chain reaction (PCR), and metabolic evidence for function presented. Unlike Arabidopsis, Populus leaves accumulate an array of hydroxycinnamoyl-quinate esters, which is consistent with broadened function of the expanded hydroxycinnamoyl-CoA transferase gene family. Greater flavonoid pathway diversity is also represented, and flavonoid gene families are larger. Consistent with expanded pathway function, most of these genes were upregulated during wound-stimulated condensed tannin synthesis in leaves. The suite of Populus genes regulating phenylpropanoid product accumulation should have important application in managing phenolic carbon pools in relation to climate change and global carbon cycling.

  4. A method to identify differential expression profiles of time-course gene data with Fourier transformation

    PubMed Central

    2013-01-01

    Background: Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. Results: This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Conclusions: Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. The proposed method is general and can potentially be used to identify genes which have the same patterns or biological processes, and to help face the present and forthcoming challenges of data analysis in functional genomics. PMID:24134721
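
    The first two elements of the methodology (a Fourier-basis representation followed by coefficient-based screening) can be sketched as below. This is a simplified illustration on synthetic profiles: it fits the coefficients by ordinary least squares and screens with a fixed, hypothetical threshold, whereas the abstract describes a procedure that models autocorrelation and controls the FDR. The function names, period, sampling times and threshold are all assumptions.

```python
import numpy as np


def fourier_design(times, period, n_harmonics):
    """Design matrix: a constant column plus cosine/sine pairs per harmonic."""
    cols = [np.ones_like(times)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * times / period
        cols.append(np.cos(angle))
        cols.append(np.sin(angle))
    return np.column_stack(cols)


def fourier_coefficients(profiles, times, period, n_harmonics=2):
    """Least-squares Fourier coefficients, one row per gene profile."""
    design = fourier_design(times, period, n_harmonics)
    coef, *_ = np.linalg.lstsq(design, profiles.T, rcond=None)
    return coef.T  # shape: (n_genes, 1 + 2 * n_harmonics)


def oscillation_energy(coef):
    """Sum of squared non-constant coefficients; near zero for flat profiles."""
    return np.sum(coef[:, 1:] ** 2, axis=1)


# Synthetic data: 18 time points 7 minutes apart, assumed period of 63 minutes.
times = np.arange(18) * 7.0
rng = np.random.default_rng(0)
flat = 0.1 * rng.normal(size=(5, times.size))
periodic = np.sin(2 * np.pi * times / 63.0) + 0.1 * rng.normal(size=(5, times.size))
profiles = np.vstack([flat, periodic])

energy = oscillation_energy(fourier_coefficients(profiles, times, period=63.0))
keep = energy > 0.25  # hypothetical cutoff standing in for the FDR-controlled screen
print(keep)
```

    The retained coefficient rows could then be passed to a clustering routine, which corresponds to the third element of the methodology and is not shown here.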

  5. Integrated pathway-based approach identifies association between genomic regions at CTCF and CACNB2 and schizophrenia.

    PubMed

    Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt

    2014-06-01

    In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.

  6. Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

    PubMed

    Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

    2013-12-01

    Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.

  7. Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

    PubMed Central

    Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

    2013-01-01

    Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704

  8. Electrochemical biosensor based on functional composite nanofibers for detection of K-ras gene via multiple signal amplification strategy.

    PubMed

    Wang, Xiaoying; Shu, Guofang; Gao, Chanchan; Yang, Yu; Xu, Qian; Tang, Meng

    2014-12-01

    An electrochemical biosensor based on functional composite nanofibers for hybridization detection of specific K-ras gene that is highly associated with colorectal cancer via multiple signal amplification strategy has been developed. Carboxylated multiwalled carbon nanotube (MWCNT)-doped nylon 6 (PA6) composite nanofibers (MWCNTs-PA6) were prepared using electrospinning and served as the nanosized backbone for thionine (TH) electropolymerization. The functional composite nanofibers [MWCNTs-PA6-PTH, where PTH is poly(thionine)] used as supporting scaffolds for single-stranded DNA1 (ssDNA1) immobilization can dramatically increase the amount of DNA attachment and the hybridization sensitivity. Through the hybridization reaction, a sandwich format of ssDNA1/K-ras gene/gold nanoparticle-labeled ssDNA2 (AuNPs-ssDNA2) was fabricated, and the AuNPs offered excellent electrochemical signal transduction. The signal amplification was further implemented by forming network-like thiocyanuric acid/gold nanoparticles (TA/AuNPs). A significant sensitivity enhancement was obtained; the detection limit was down to 30 fM, and the discriminations were up to 54.3 and 51.9% between the K-ras gene and the one-base mismatched sequences including G/C and A/T mismatched bases, respectively. The amenability of this method to the analyses of K-ras gene from the SW480 colorectal cancer cell lysates was demonstrated. The results are basically consistent with those of the K-ras Kit (HRM: high-resolution melt). The method holds promise for the diagnosis and management of cancer. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Functional Genomics Using the Saccharomyces cerevisiae Yeast Deletion Collections.

    PubMed

    Nislow, Corey; Wong, Lai Hong; Lee, Amy Huei-Yi; Giaever, Guri

    2016-09-01

    Constructed by a consortium of 16 laboratories, the Saccharomyces genome-wide deletion collections have, for the past decade, provided a powerful, rapid, and inexpensive approach for functional profiling of the yeast genome. Loss-of-function deletion mutants were systematically created using a polymerase chain reaction (PCR)-based gene deletion strategy to generate a start-to-stop codon replacement of each open reading frame by homologous recombination. Each strain carries two molecular barcodes that serve as unique strain identifiers, enabling their growth to be analyzed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays or through the use of next-generation sequencing technologies. Functional profiling of the deletion collections, using either strain-by-strain or parallel assays, provides an unbiased approach to systematically survey the yeast genome. The Saccharomyces yeast deletion collections have proved immensely powerful in contributing to the understanding of gene function, including functional relationships between genes and genetic pathways in response to diverse genetic and environmental perturbations. © 2016 Cold Spring Harbor Laboratory Press.
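
    The parallel fitness assay described above can be illustrated with a short sketch that scores each deletion strain by the log2 ratio of its normalized barcode abundance under treatment versus control. The strain names, read counts, pseudocount and the cutoff of -1 are hypothetical, and the sketch uses a single count per strain even though each strain carries two barcodes; it is not the scoring pipeline used by the authors.

```python
import math


def barcode_fitness(control_counts, treatment_counts, pseudocount=1.0):
    """Log2 ratio of normalized barcode abundance, treatment vs control.

    Negative scores suggest the deletion strain is depleted under treatment.
    """
    control_total = sum(control_counts.values())
    treatment_total = sum(treatment_counts.values())
    scores = {}
    for strain, control in control_counts.items():
        treated = treatment_counts.get(strain, 0)
        control_freq = (control + pseudocount) / control_total
        treated_freq = (treated + pseudocount) / treatment_total
        scores[strain] = math.log2(treated_freq / control_freq)
    return scores


# Hypothetical sequencing read counts per strain barcode.
control = {"yfg1-del": 1000, "yfg2-del": 1000, "yfg3-del": 1000}
treatment = {"yfg1-del": 990, "yfg2-del": 120, "yfg3-del": 1890}

scores = barcode_fitness(control, treatment)
sensitive = sorted(s for s, v in scores.items() if v < -1.0)
print(sensitive)
```

    In this toy example only yfg2-del falls below the cutoff. The pseudocount avoids taking the logarithm of zero when a barcode drops out entirely under treatment.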

  10. The transcription factor p53: Not a repressor, solely an activator

    PubMed Central

    Fischer, Martin; Steiner, Lydia; Engeland, Kurt

    2014-01-01

    The predominant function of the tumor suppressor p53 is transcriptional regulation. It is generally accepted that p53-dependent transcriptional activation occurs by binding to a specific recognition site in promoters of target genes. Additionally, several models for p53-dependent transcriptional repression have been postulated. Here, we evaluate these models based on a computational meta-analysis of genome-wide data. Surprisingly, several major models of p53-dependent gene regulation are implausible. Meta-analysis of large-scale data is unable to confirm reports on directly repressed p53 target genes and falsifies models of direct repression. This notion is supported by experimental re-analysis of representative genes reported as directly repressed by p53. Therefore, p53 is not a direct repressor of transcription, but solely activates its target genes. Moreover, models based on interference of p53 with activating transcription factors as well as models based on the function of ncRNAs are also not supported by the meta-analysis. As an alternative to models of direct repression, the meta-analysis leads to the conclusion that p53 represses transcription indirectly by activation of the p53-p21-DREAM/RB pathway. PMID:25486564

  11. Shortening tobacco life cycle accelerates functional gene identification in genomic research.

    PubMed

    Ning, G; Xiao, X; Lv, H; Li, X; Zuo, Y; Bao, M

    2012-11-01

    Definitive allocation of function requires the introduction of genetic mutations and analysis of their phenotypic consequences. Novel, rapid and convenient techniques or materials are very important and useful to accelerate gene identification in functional genomics research. Here, over-expression of PmFT (Prunus mume), a novel FT orthologue, and PtFT (Populus tremula) led to shortening of the tobacco life cycle. A series of novel short life cycle stable tobacco lines (30-50 days) were developed through repeated self-crossing selection breeding. Based on the second transformation via a gusA reporter gene, the promoter from BpFULL1 in silver birch (Betula pendula) and the gene (CPC) from Arabidopsis thaliana were effectively tested using short life cycle tobacco lines. Comparative analysis among wild type, short life cycle tobacco and the Arabidopsis transformation system verified that shortening the life cycle of the host plant material is an option for accelerating functional gene studies, at least in these short life cycle tobacco lines. The results verified that the novel short life cycle transgenic tobacco lines not only combine the advantages of economic nursery requirements and a simple transformation system, but also provide a robust, effective and stable host system to accelerate gene analysis. Thus, shortening the tobacco life cycle is a feasible strategy to accelerate heterologous or homologous functional gene identification in genomic research. © 2012 German Botanical Society and The Royal Botanical Society of the Netherlands.

  12. DiRE: identifying distant regulatory elements of co-expressed genes

    PubMed Central

    Gotea, Valer; Ovcharenko, Ivan

    2008-01-01

    Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org. PMID:18487623

  13. Gene-by-environment effect of house dust mite on purinergic receptor P2Y12 (P2RY12) and lung function in children with asthma.

    PubMed

    Bunyavanich, S; Boyce, J A; Raby, B A; Weiss, S T

    2012-02-01

    Distinct receptors likely exist for leukotriene (LT)E(4), a potent mediator of airway inflammation. Purinergic receptor P2Y12 is needed for LTE(4)-induced airway inflammation, and P2Y12 antagonism attenuates house dust mite-induced pulmonary eosinophilia in mice. Although experimental data support a role for P2Y12 in airway inflammation, its role in human asthma has never been studied. We aimed to test for association between variants in the P2Y12 gene (P2RY12) and lung function in human subjects with asthma, and to examine gene-by-environment interaction with house dust mite exposure. Nineteen single nucleotide polymorphisms (SNPs) in P2RY12 were genotyped in 422 children with asthma and their parents (n = 1266). Using family based methods, we tested for associations between these SNPs and five lung function measures. We performed haplotype association analyses and tested for gene-by-environment interactions using house dust mite exposure. We used the false discovery rate to account for multiple comparisons. Five SNPs in P2RY12 were associated with multiple lung function measures (P-values 0.006–0.025). Haplotypes in P2RY12 were also associated with lung function (P-values 0.0055–0.046). House dust mite exposure modulated associations between P2RY12 and lung function, with minor allele homozygotes exposed to house dust mite demonstrating worse lung function than those unexposed (significant interaction P-values 0.0028–0.040). The P2RY12 variants were associated with lung function in a large family-based asthma cohort. House dust mite exposure caused significant gene-by-environment effects. Our findings add the first human evidence to experimental data supporting a role for P2Y12 in lung function. P2Y12 could represent a novel target for asthma treatment.
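
    The abstract above says a false discovery rate was used to account for multiple comparisons but does not name the procedure. As an illustration only, the sketch below assumes the common Benjamini-Hochberg step-up procedure; the function name, the alpha level and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up control of the false discovery
    rate at level `alpha`. The study itself does not state which FDR
    procedure it used.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis whose rank is at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, not data from the study.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

    With these made-up inputs only the two smallest p-values fall under their rank-scaled thresholds, so only those two tests are declared significant.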

  14. Exploration of the Anti-Inflammatory Drug Space Through Network Pharmacology: Applications for Drug Repurposing

    PubMed Central

    de Anda-Jáuregui, Guillermo; Guo, Kai; McGregor, Brett A.; Hur, Junguk

    2018-01-01

    The quintessential biological response to disease is inflammation. It is a driver and an important element in a wide range of pathological states. Pharmacological management of inflammation is therefore central in the clinical setting. Anti-inflammatory drugs modulate specific molecules involved in the inflammatory response; these drugs are traditionally classified as steroidal and non-steroidal drugs. However, the effects of these drugs are rarely limited to their canonical targets, affecting other molecules and altering biological functions with system-wide effects that can lead to the emergence of secondary therapeutic applications or adverse drug reactions (ADRs). In this study, relationships among anti-inflammatory drugs, functional pathways, and ADRs were explored through network models. We integrated structural drug information, experimental anti-inflammatory drug perturbation gene expression profiles obtained from the Connectivity Map and Library of Integrated Network-Based Cellular Signatures, functional pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases, as well as adverse reaction information from the U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). The network models comprise nodes representing anti-inflammatory drugs, functional pathways, and adverse effects. We identified structural and gene perturbation similarities linking anti-inflammatory drugs. Functional pathways were connected to drugs by implementing Gene Set Enrichment Analysis (GSEA). Drugs and adverse effects were connected based on the proportional reporting ratio (PRR) of an adverse effect in response to a given drug. Through these network models, relationships among anti-inflammatory drugs, their functional effects at the pathway level, and their adverse effects were explored. These networks comprise 70 different anti-inflammatory drugs, 462 functional pathways, and 1,175 ADRs. 
    Network-based properties, such as degree, clustering coefficient, and node strength, were used to identify new therapeutic applications within and beyond the anti-inflammatory context, as well as ADR risk for these drugs, helping to select better repurposing candidates. Based on these parameters, we identified naproxen, meloxicam, etodolac, tenoxicam, flufenamic acid, fenoprofen, and nabumetone as candidates for drug repurposing with lower ADR risk. This network-based analysis pipeline provides a novel way to explore the effects of drugs in a therapeutic space. PMID:29545755
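
    The abstract connects drugs to adverse effects through the proportional reporting ratio (PRR). A minimal sketch of the usual 2x2-table definition follows; the function name and the counts are hypothetical, and the study's actual thresholds and FAERS preprocessing are not given in the abstract.

```python
def proportional_reporting_ratio(a, b, c, d):
    """Compute the PRR from a 2x2 table of spontaneous reports.

    a: reports mentioning the drug of interest AND the adverse event
    b: reports mentioning the drug of interest and any other event
    c: reports mentioning any other drug AND the adverse event
    d: reports mentioning any other drug and any other event

    PRR = [a / (a + b)] / [c / (c + d)]
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each row of the 2x2 table needs at least one report")
    if c == 0:
        raise ValueError("PRR is undefined when no other drug reports the event")
    drug_rate = a / (a + b)          # share of the drug's reports with the event
    background_rate = c / (c + d)    # same share among all other drugs
    return drug_rate / background_rate


# Hypothetical counts, not FAERS data.
example_prr = proportional_reporting_ratio(a=20, b=180, c=100, d=9700)
```

    A PRR well above 1 indicates that the event is reported disproportionately often for the drug relative to the rest of the database; it is a signal of reporting disproportionality, not a measure of causal risk.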

  15. Exploration of the Anti-Inflammatory Drug Space Through Network Pharmacology: Applications for Drug Repurposing.

    PubMed

    de Anda-Jáuregui, Guillermo; Guo, Kai; McGregor, Brett A; Hur, Junguk

    2018-01-01

    The quintessential biological response to disease is inflammation. It is a driver and an important element in a wide range of pathological states. Pharmacological management of inflammation is therefore central in the clinical setting. Anti-inflammatory drugs modulate specific molecules involved in the inflammatory response; these drugs are traditionally classified as steroidal and non-steroidal drugs. However, the effects of these drugs are rarely limited to their canonical targets, affecting other molecules and altering biological functions with system-wide effects that can lead to the emergence of secondary therapeutic applications or adverse drug reactions (ADRs). In this study, relationships among anti-inflammatory drugs, functional pathways, and ADRs were explored through network models. We integrated structural drug information, experimental anti-inflammatory drug perturbation gene expression profiles obtained from the Connectivity Map and Library of Integrated Network-Based Cellular Signatures, functional pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases, as well as adverse reaction information from the U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). The network models comprise nodes representing anti-inflammatory drugs, functional pathways, and adverse effects. We identified structural and gene perturbation similarities linking anti-inflammatory drugs. Functional pathways were connected to drugs by implementing Gene Set Enrichment Analysis (GSEA). Drugs and adverse effects were connected based on the proportional reporting ratio (PRR) of an adverse effect in response to a given drug. Through these network models, relationships among anti-inflammatory drugs, their functional effects at the pathway level, and their adverse effects were explored. These networks comprise 70 different anti-inflammatory drugs, 462 functional pathways, and 1,175 ADRs. 
    Network-based properties, such as degree, clustering coefficient, and node strength, were used to identify new therapeutic applications within and beyond the anti-inflammatory context, as well as ADR risk for these drugs, helping to select better repurposing candidates. Based on these parameters, we identified naproxen, meloxicam, etodolac, tenoxicam, flufenamic acid, fenoprofen, and nabumetone as candidates for drug repurposing with lower ADR risk. This network-based analysis pipeline provides a novel way to explore the effects of drugs in a therapeutic space.

  16. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fröhlich, Eleonore, E-mail: eleonore.froehlich@medunigraz.at; Meindl, Claudia; Wagner, Karin

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and for short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify other or additional targets than commonly used cell-based assays for NP action. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. - Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions regulated by microarray were confirmed by cell-based assay.

  17. Polycation-based gene therapy: current knowledge and new perspectives.

    PubMed

    Tiera, Marcio J; Shi, Qin; Winnik, Françoise M; Fernandes, Julio C

    2011-08-01

    At present, the insufficient efficiency of gene transfection is a major drawback of non-viral gene therapy. The 2 main types of delivery systems deployed in gene therapy are based on viral or non-viral gene carriers. Several non-viral modalities can transfer foreign genetic material into the human body. To do so, polycation-based gene delivery methods must achieve sufficient efficiency in the transportation of therapeutic genes across various extracellular and intracellular barriers. These barriers include interactions with blood components, vascular endothelial cells and uptake by the reticuloendothelial system. Furthermore, the degradation of therapeutic DNA by serum nucleases is a potential obstacle for functional delivery to target cells. Cationic polymers constitute one of the most promising alternatives to the use of viral vectors for gene therapy. A better understanding of the mechanisms by which DNA can escape from endosomes and traffic to enter the nucleus has triggered new strategies of synthesis and has revitalized research into new polycation-based systems. The objective of this review is to address the state of the art in gene therapy with synthetic and natural polycations and the latest advances to improve gene transfer efficiency in cells.

  18. Lynx web services for annotations and systems analysis of multi-gene disorders.

    PubMed

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

    2014-07-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. The structural and functional connectivity of the grassland plant Lychnis flos-cuculi

    PubMed Central

    Aavik, T; Holderegger, R; Bolliger, J

    2014-01-01

    Understanding the relationship between structural and functional connectivity is essential for successful restoration and conservation management, particularly in intensely managed agricultural landscapes. We evaluated the relationship between structural and functional connectivity of the wetland plant Lychnis flos-cuculi in a fragmented agricultural landscape using landscape genetic and network approaches. First, we studied the effect of structural connectivity, such as geographic distance and various landscape elements (forest, agricultural land, settlements and ditch verges), on gene flow among populations as a measurement of functional connectivity. Second, we examined the effect of structural graph-theoretic connectivity measures on gene flow among populations and on genetic diversity within populations of L. flos-cuculi. Among landscape elements, forests hindered gene flow in L. flos-cuculi, whereas gene flow was independent of geographic distance. Among the structural graph-theoretic connectivity variables, only intrapopulation connectivity, which was based on population size, had a significant positive effect on gene flow, that is, more gene flow took place among larger populations. Unexpectedly, interpopulation connectivity of populations, which takes into account the spatial location and distance among populations, did not influence gene flow in L. flos-cuculi. However, higher observed heterozygosity and lower inbreeding was observed in populations characterised by higher structural interpopulation connectivity. This finding shows that a spatially coherent network of populations is significant for maintaining the genetic diversity of populations. Nevertheless, lack of significant relationships between gene flow and most of the structural connectivity measures suggests that structural connectivity does not necessarily correspond to functional connectivity. PMID:24253937

  20. Linking Yeast Gcn5p Catalytic Function and Gene Regulation Using a Quantitative, Graded Dominant Mutant Approach

    PubMed Central

    Lanza, Amanda M.; Blazeck, John J.; Crook, Nathan C.; Alper, Hal S.

    2012-01-01

    Establishing causative links between protein functional domains and global gene regulation is critical for advancements in genetics, biotechnology, disease treatment, and systems biology. This task is challenging for multifunctional proteins when relying on traditional approaches such as gene deletions since they remove all domains simultaneously. Here, we describe a novel approach to extract quantitative, causative links by modulating the expression of a dominant mutant allele to create a function-specific competitive inhibition. Using the yeast histone acetyltransferase Gcn5p as a case study, we demonstrate the utility of this approach and (1) find evidence that Gcn5p is more involved in cell-wide gene repression, instead of the accepted gene activation associated with HATs, (2) identify previously unknown gene targets and interactions for Gcn5p-based acetylation, (3) quantify the strength of some Gcn5p-DNA associations, (4) demonstrate that this approach can be used to correctly identify canonical chromatin modifications, (5) establish the role of acetyltransferase activity on synthetic lethal interactions, and (6) identify new functional classes of genes regulated by Gcn5p acetyltransferase activity—all six of these major conclusions were unattainable by using standard gene knockout studies alone. We recommend that a graded dominant mutant approach be utilized in conjunction with a traditional knockout to study multifunctional proteins and generate higher-resolution data that more accurately probes protein domain function and influence. PMID:22558379

  1. The Biogeographic Pattern of Microbial Functional Genes along an Altitudinal Gradient of the Tibetan Pasture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qi, Qi; Zhao, Mengxin; Wang, Shiping

    As the highest place in the world, the Tibetan plateau is a fragile ecosystem. Given the importance of microbial communities in driving soil nutrient cycling, it is of interest to document the microbial biogeographic pattern here. We adopted a microarray-based tool named GeoChip 4.0 to investigate grassland microbial functional genes along an elevation gradient from 3200 to 3800 m above sea level open to free grazing by local herdsmen and wild animals. Interestingly, microbial functional diversities increase with elevation, as do the relative abundances of genes associated with carbon degradation, nitrogen cycling, methane production, cold shock and oxygen limitation. The range of Shannon diversities (10.27–10.58) showed considerably smaller variation than what was previously observed at ungrazed sites nearby (9.95–10.65), suggesting the important role of livestock grazing on microbial diversities. Closer examination showed that the dissimilarity of microbial community at our study sites increased with elevation, revealing an elevation-decay relationship of microbial functional genes. Both microbial functional diversity and the number of unique genes increased with elevation. Furthermore, we detected a tight linkage of greenhouse gas (CO2) and relative abundances of carbon cycling genes. Our biogeographic study provides insights on microbial functional diversity and soil biogeochemical cycling in Tibetan pastures.
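
    The abstract reports Shannon diversities of functional genes. The sketch below shows the standard Shannon index, H = -sum(p_i * ln p_i), under the assumption of a natural logarithm and of abundances given as non-negative signal intensities or counts; the function name and the toy inputs are hypothetical, and the abstract does not describe how GeoChip signals were normalized.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over non-zero abundances.

    `abundances` holds one non-negative value per functional gene
    (for example probe signal intensities). Natural log is an assumption.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for value in abundances:
        if value > 0:  # zero-abundance genes contribute nothing
            p = value / total
            h -= p * math.log(p)
    return h


# Toy inputs, not GeoChip data.
even_sample = shannon_diversity([1, 1, 1, 1])      # four equally abundant genes
skewed_sample = shannon_diversity([97, 1, 1, 1])   # one dominant gene
```

    For a perfectly even sample of N genes the index equals ln(N), so under this definition higher values reflect both more detected genes and a more even distribution of signal among them.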

  2. Literature and patent analysis of the cloning and identification of human functional genes in China.

    PubMed

    Xia, Yan; Tang, LiSha; Yao, Lei; Wan, Bo; Yang, XianMei; Yu, Long

    2012-03-01

    The Human Genome Project was launched at the end of the 1980s. Since then, the cloning and identification of functional genes has been a major focus of research across the world. In China too, the potentially profound impact of such studies on the life sciences and on human health was realized, and relevant studies were initiated in the 1990s. To advance China's involvement in the Human Genome Project, in the mid-1990s, Committee of Experts in Biology from National High Technology Research and Development Program of China (863 Program) proposed the "two 1%" goal. This goal envisaged China contributing 1% of the total sequencing work, and cloning and identifying 1% of the total human functional genes. Over the past 20 years, tremendous achievement has been accomplished by Chinese scientists. It is well known that scientists in China finished the 1% of sequencing work of the Human Genome Project, whereas there is no comprehensive report about "whether China had finished cloning and identifying 1% of human functional genes". In the present study, the GenBank database at the National Center for Biotechnology Information, the PubMed search tool, and the patent database of the State Intellectual Property Office, China, were used to retrieve entries based on two screening standards: (i) Were the newly cloned and identified genes first reported by Chinese scientists? (ii) Were the Chinese scientists awarded the gene sequence patent? Entries were retrieved from the databases up to the cut-off date of 30 June 2011 and the obtained data were analyzed further. The results showed that 589 new human functional genes were first reported by Chinese scientists and 159 gene sequences were patented (http://gene.fudan.sh.cn/introduction/database/chinagene/chinagene.html). This study systematically summarizes China's contributions to human functional genomics research and answers the question "has China finished cloning and identifying 1% of human functional genes?" in the affirmative.

  3. FOAM (Functional Ontology Assignments for Metagenomes): A Hidden Markov Model (HMM) database with environmental focus

    DOE PAGES

    Prestat, Emmanuel; David, Maude M.; Hultman, Jenni; ...

    2014-09-26

    A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classifying gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/.

  4. RAS oncogene-mediated deregulation of the transcriptome: from molecular signature to function.

    PubMed

    Schäfer, Reinhold; Sers, Christine

    2011-01-01

    Transcriptome analysis of cancer cells has developed into a standard procedure to elucidate multiple features of the malignant process and to link gene expression to clinical properties. Gene expression profiling based on microarrays provides essentially correlative information and needs to be transferred to the functional level in order to understand the activity and contribution of individual genes or sets of genes as elements of the gene signature. To date, there exist significant gaps in the functional understanding of gene expression profiles. Moreover, the processes that drive the profound transcriptional alterations that characterize cancer cells remain mainly elusive. We have used pathway-restricted gene expression profiles derived from RAS oncogene-transformed cells and from RAS-expressing cancer cells to identify regulators downstream of the MAPK pathway. We describe the role of epigenetic regulation exemplified by the control of several immune genes in generic cell lines and colorectal cancer cells, particularly the functional interaction between signaling and DNA methylation. Moreover, we assess the role of the architectural transcription factor high mobility AT-hook 2 (HMGA2) as a regulator of the RAS-responsive transcriptome in ovarian epithelial cells. Finally, we describe an integrated approach combining pathway interference in colorectal cancer cells, gene expression profiling and computational analysis of regulatory elements of deregulated target genes. This strategy resulted in the identification of Y-box binding protein 1 (YBX1) as a regulator of MAPK-dependent proliferation and gene expression. The implications for a therapeutic application of HMGA2 gene silencing and the role of YBX1 as a prognostic factor are discussed.

  5. Identification of possible genetic polymorphisms involved in cancer cachexia: a systematic review.

    PubMed

    Tan, Benjamin H L; Ross, James A; Kaasa, Stein; Skorpen, Frank; Fearon, Kenneth C H

    2011-04-01

    Cancer cachexia is a polygenic and complex syndrome. Genetic variations in regulation of the inflammatory response, muscle and fat metabolic pathways, and pathways in appetite regulation are likely to contribute to the susceptibility or resistance to developing cancer cachexia. A systematic search of Medline and EmBase databases, covering 1986-2008 was performed for potential candidate genes/genetic polymorphisms relating to cancer cachexia. Related genes were then identified using pathway functional analysis software. All candidate genes were reviewed for functional polymorphisms or clinically significant polymorphisms associated with cachexia using the OMIM and GeneRIF databases. Genes with variants which had functional or clinical associations with cachexia and replicated in at least one study were entered into pathway analysis software to reveal possible network associations between genes. A total of 184 polymorphisms with functional or clinical relevance to cancer cachexia were identified in 92 candidate genes. Of these, 42 polymorphisms (in 33 genes) were replicated in more than one study with 13 polymorphisms found to influence two or more hallmarks of cachexia (i.e. inflammation, loss of fat mass and/or lean mass and reduced survival). Thirty-three genes were found to be significantly interconnected in two major networks with four genes (ADIPOQ, IL6, NFKB1 and TLR4) interlinking both networks. Selection of candidate genes and polymorphisms is a key element of multigene study design. The present study provides an initial framework to select genes/polymorphisms for further study in cancer cachexia, and to develop their potential as susceptibility biomarkers of developing cachexia.

  6. A high resolution atlas of gene expression in the domestic sheep (Ovis aries)

    PubMed Central

    Farquhar, Iseabail L.; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G.; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C. Bruce; Freeman, Tom C.; Archibald, Alan L.; Hume, David A.

    2017-01-01

    Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of ‘guilt by association’ was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages. PMID:28915238

  7. A high resolution atlas of gene expression in the domestic sheep (Ovis aries).

    PubMed

    Clark, Emily L; Bush, Stephen J; McCulloch, Mary E B; Farquhar, Iseabail L; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G; Wu, Chunlei; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C Bruce; Freeman, Tom C; Summers, Kim M; Archibald, Alan L; Hume, David A

    2017-09-01

    Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of 'guilt by association' was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages.

  8. Cationic liposome/DNA complexes: from structure to interactions with cellular membranes.

    PubMed

    Caracciolo, Giulio; Amenitsch, Heinz

    2012-10-01

    Gene-based therapeutic approaches are based upon the concept that, if a disease is caused by a mutation in a gene, then adding back the wild-type gene should restore regular function and attenuate the disease phenotype. To deliver the gene of interest, both viral and nonviral vectors are used. Viruses are efficient, but their application is impeded by detrimental side-effects. Among nonviral vectors, cationic liposomes are the most promising candidates for gene delivery. They form stable complexes with polyanionic DNA (lipoplexes). Despite several advantages over viral vectors, the transfection efficiency (TE) of lipoplexes is too low compared with those of engineered viral vectors. This is due to lack of knowledge about the interactions between complexes and cellular components. Rational design of efficient lipoplexes therefore requires deeper comprehension of the interactions between the vector and the DNA as well as the cellular pathways and mechanisms involved. The importance of the lipoplex structure in biological function is revealed in the application of synchrotron small-angle X-ray scattering in combination with functional TE measurements. According to current understanding, the structure of lipoplexes can change upon interaction with cellular membranes and such changes affect the delivery efficiency. Recently, a correlation between the mechanism of gene release from complexes, the structure, and the physical and chemical parameters of the complexes has been established. Studies aimed at correlating structure and activity of lipoplexes are reviewed herein. This is a fundamental step towards rational design of highly efficient lipid gene vectors.

  9. Functional Analyses of NSF1 in Wine Yeast Using Interconnected Correlation Clustering and Molecular Analyses

    PubMed Central

    Bessonov, Kyrylo; Walkey, Christopher J.; Shelp, Barry J.; van Vuuren, Hennie J. J.; Chiu, David; van der Merwe, George

    2013-01-01

    Analyzing time-course expression data captured in microarray datasets is a complex undertaking as the vast and complex data space is represented by a relatively low number of samples as compared to thousands of available genes. Here, we developed the Interdependent Correlation Clustering (ICC) method to analyze relationships that exist among genes conditioned on the expression of a specific target gene in microarray data. Based on Correlation Clustering, the ICC method analyzes a large set of correlation values related to gene expression profiles extracted from given microarray datasets. ICC can be applied to any microarray dataset and any target gene. We applied this method to microarray data generated from wine fermentations and selected NSF1, which encodes a C2H2 zinc finger-type transcription factor, as the target gene. The validity of the method was verified by accurate identifications of the previously known functional roles of NSF1. In addition, we identified and verified potential new functions for this gene; specifically, NSF1 is a negative regulator for the expression of sulfur metabolism genes, the nuclear localization of Nsf1 protein (Nsf1p) is controlled in a sulfur-dependent manner, and the transcription of NSF1 is regulated by Met4p, an important transcriptional activator of sulfur metabolism genes. The inter-disciplinary approach adopted here highlighted the accuracy and relevancy of the ICC method in mining for novel gene functions using complex microarray datasets with a limited number of samples. PMID:24130853

  10. COGNAT: a web server for comparative analysis of genomic neighborhoods.

    PubMed

    Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y

    2017-11-22

    In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.

  11. The evolution of duplicate gene expression in mammalian organs

    PubMed Central

    Guschanski, Katerina; Warnefors, Maria; Kaessmann, Henrik

    2017-01-01

    Gene duplications generate genomic raw material that allows the emergence of novel functions, likely facilitating adaptive evolutionary innovations. However, global assessments of the functional and evolutionary relevance of duplicate genes in mammals were until recently limited by the lack of appropriate comparative data. Here, we report a large-scale study of the expression evolution of DNA-based functional gene duplicates in three major mammalian lineages (placental mammals, marsupials, egg-laying monotremes) and birds, on the basis of RNA sequencing (RNA-seq) data from nine species and eight organs. We observe dynamic changes in tissue expression preference of paralogs with different duplication ages, suggesting differential contribution of paralogs to specific organ functions during vertebrate evolution. Specifically, we show that paralogs that emerged in the common ancestor of bony vertebrates are enriched for genes with brain-specific expression and provide evidence for differential forces underlying the preferential emergence of young testis- and liver-specific expressed genes. Further analyses uncovered that the overall spatial expression profiles of gene families tend to be conserved, with several exceptions of pronounced tissue specificity shifts among lineage-specific gene family expansions. Finally, we trace new lineage-specific genes that may have contributed to the specific biology of mammalian organs, including the little-studied placenta. Overall, our study provides novel and taxonomically broad evidence for the differential contribution of duplicate genes to tissue-specific transcriptomes and for their importance for the phenotypic evolution of vertebrates. PMID:28743766

  12. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes

    PubMed Central

    Ho Sui, Shannan J.; Mortimer, James R.; Arenillas, David J.; Brumm, Jochen; Walsh, Christopher J.; Kennedy, Brian P.; Wasserman, Wyeth W.

    2005-01-01

    Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes. PMID:15933209
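    The abstract does not name the statistics oPOSSUM applies, so the following is only a sketch of one common way to score over-representation of a binding site in a co-expressed gene set: a one-sided hypergeometric test. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(hits_in_set, set_size, hits_in_background, background_size):
    """P(X >= hits_in_set) when `set_size` genes are drawn without replacement
    from a background in which `hits_in_background` genes carry the site."""
    max_hits = min(set_size, hits_in_background)
    total = comb(background_size, set_size)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 8 of 20 co-expressed genes carry the site,
# versus 100 of 2000 genes in the background.
p_value = hypergeom_enrichment_p(8, 20, 100, 2000)
```

    A small p-value suggests the site occurs in the set more often than chance would predict; thresholds would need to be set empirically, as the record notes.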

  13. dbCPG: A web resource for cancer predisposition genes

    PubMed Central

    Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng

    2016-01-01

    Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown type genes), 637 rat, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the CPG data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes. PMID:27192119

  14. Gene network biological validity based on gene-gene interaction relevance.

    PubMed

    Gómez-Vela, Francisco; Díaz-Díaz, Norberto

    2014-01-01

    In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many gene network inference algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to prove that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in KEGG are one of the most widely used knowledge sources for analyzing relationships between genes. This paper introduces a new methodology, GeneNetVal, to assess the biological validity of gene networks based on the relevance of the gene-gene interactions stored in KEGG metabolic pathways. Hence, a complete KEGG pathway conversion into a gene association network and a new matching distance based on gene-gene interaction relevance are proposed. The performance of GeneNetVal was established with three different experiments. Firstly, our proposal is tested in a comparative ROC analysis. Secondly, a randomness study is presented to show the behavior of GeneNetVal when the noise is increased in the input network. Finally, the ability of GeneNetVal to detect biological functionality of the network is shown.
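    The matching distance used by GeneNetVal is not specified in the abstract. As a much simpler stand-in, the sketch below scores an inferred network against a reference network by plain edge overlap (precision and recall over undirected edges). The function names and the toy edge lists are hypothetical.

```python
def normalize_edges(edges):
    """Treat edges as undirected and drop self-loops."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}


def edge_precision_recall(inferred, reference):
    """Fraction of inferred edges found in the reference, and vice versa."""
    inferred_set = normalize_edges(inferred)
    reference_set = normalize_edges(reference)
    shared = len(inferred_set & reference_set)
    precision = shared / len(inferred_set) if inferred_set else 0.0
    recall = shared / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical toy networks; the reference stands in for pathway-derived edges.
inferred = [("a", "b"), ("b", "c"), ("c", "d")]
reference = [("b", "a"), ("c", "d"), ("d", "e"), ("e", "f")]
precision, recall = edge_precision_recall(inferred, reference)
```

    Exact edge matching ignores indirect relationships, which is one reason a relevance-weighted distance such as the one proposed in this record may be preferable.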

  15. Targeted sequencing identifies genetic polymorphisms of flavin-containing monooxygenase genes contributing to susceptibility of nicotine dependence in European American and African American.

    PubMed

    Zhang, Tian-Xiao; Saccone, Nancy L; Bierut, Laura J; Rice, John P

    2017-04-01

    Smoking is a leading cause of preventable death. Early studies based on samples of twins have linked lifetime smoking practices to genetic predisposition. The flavin-containing monooxygenase (FMO) protein family consists of a group of enzymes that metabolize drugs and xenobiotics. Both FMO1 and FMO3 are potential susceptibility genes for the nicotine metabolism process. In this study, we investigated the potential of FMO genes to confer risk of nicotine dependence via deep targeted sequencing in 2,820 study subjects comprising 1,583 nicotine dependents and 1,237 controls from European American and African American populations. Specifically, we focused on the two genomic segments including FMO1, FMO3, and the pseudogene FMO6P, and aimed to investigate the potential association between FMO genes and nicotine dependence. Both common and low-frequency/rare variants were analyzed using different algorithms. The potential functional significance of SNPs with association signal was investigated with relevant bioinformatics tools. We identified different clusters of significant common variants in European Americans (with the most significant SNP rs6674596, p = .0004, OR = 0.67, MAF_EA = 0.14, FMO1) and African Americans (with the most significant SNP rs6608453, p = .001, OR = 0.64, MAF_AA = 0.1, FMO6P). No significant signals were identified through haplotype-based analyses. Gene network investigation indicated that both FMO1 and FMO3 have a strong relation with a variety of genes belonging to CYP gene families (with combined score greater than 0.9). Most of the significant variants identified were SNPs located within intron regions or with unknown functional significance, indicating a need for future work to understand the underlying functional significance of these signals. Our findings indicated significant association between FMO genes and nicotine dependence. Replication of our findings in other ethnic groups is needed in the future.

  16. Transcriptional Control by PARP-1: Chromatin Modulation, Enhancer-binding, Coregulation, and Insulation

    PubMed Central

    Kraus, W. Lee

    2008-01-01

    The regulation of gene expression requires a wide array of protein factors that can modulate chromatin structure, act at enhancers, function as transcriptional coregulators, or regulate insulator function. Poly(ADP-ribose) polymerase-1 (PARP-1), an abundant and ubiquitous nuclear enzyme that catalyzes the NAD+-dependent addition of ADP-ribose polymers on a variety of nuclear proteins, has been implicated in all of these functions. Recent biochemical, genomic, proteomic, and cell-based studies have highlighted the role of PARP-1 in each of these processes and provided new insights about the molecular mechanisms governing PARP-1-dependent regulation of gene expression. In addition, these studies have demonstrated how PARP-1 functions as an integral part of cellular signaling pathways that culminate in gene regulatory outcomes. PMID:18450439

  17. Design and construction of a first-generation high-throughput integrated robotic molecular biology platform for bioenergy applications.

    PubMed

    Hughes, Stephen R; Butt, Tauseef R; Bartolett, Scott; Riedmuller, Steven B; Farrelly, Philip

    2011-08-01

    The molecular biological techniques for plasmid-based assembly and cloning of gene open reading frames are essential for elucidating the function of the proteins encoded by the genes. High-throughput integrated robotic molecular biology platforms that have the capacity to rapidly clone and express heterologous gene open reading frames in bacteria and yeast and to screen large numbers of expressed proteins for optimized function are an important technology for improving microbial strains for biofuel production. The process involves the production of full-length complementary DNA libraries as a source of plasmid-based clones to express the desired proteins in active form for determination of their functions. Proteins that were identified by high-throughput screening as having desired characteristics are overexpressed in microbes to enable them to perform functions that will allow more cost-effective and sustainable production of biofuels. Because the plasmid libraries are composed of several thousand unique genes, automation of the process is essential. This review describes the design and implementation of an automated integrated programmable robotic workcell capable of producing complementary DNA libraries, colony picking, isolating plasmid DNA, transforming yeast and bacteria, expressing protein, and performing appropriate functional assays. These operations will allow tailoring microbial strains to use renewable feedstocks for production of biofuels, bioderived chemicals, fertilizers, and other coproducts for profitable and sustainable biorefineries. Published by Elsevier Inc.

  18. Microarray profiling of human white adipose tissue after exogenous leptin injection.

    PubMed

    Taleb, S; Van Haaften, R; Henegar, C; Hukshorn, C; Cancello, R; Pelloux, V; Hanczar, B; Viguerie, N; Langin, D; Evelo, C; Zucker, J; Clément, K; Saris, W H M

    2006-03-01

    Leptin is a secreted adipocyte hormone that plays a key role in the regulation of body weight homeostasis. The leptin effect on human white adipose tissue (WAT) is still debated. The aim of this study was to assess whether the administration of polyethylene glycol-leptin (PEG-OB) in a single supraphysiological dose has transcriptional effects on genes of WAT and to identify its target genes and functional pathways in WAT. Blood samples and WAT biopsies were obtained from 10 healthy nonobese men before treatment and 72 h after the PEG-OB injection, leading to an approximate 809-fold increase in circulating leptin. The WAT gene expression profile before and after the PEG-OB injection was compared using pangenomic microarrays. Functional gene annotations based on the gene ontology of the PEG-OB regulated genes were performed using both an 'in house' automated procedure and GenMAPP (Gene Microarray Pathway Profiler), designed for viewing and analyzing gene expression data in the context of biological pathways. Statistical analysis of microarray data revealed that PEG-OB had a major down-regulated effect on WAT gene expression, as we obtained 1,822 and 100 down- and up-regulated genes, respectively. Microarray data were validated using reverse transcription quantitative PCR. Functional gene annotations of PEG-OB regulated genes revealed that the functional class related to immunity and inflammation was among the most mobilized PEG-OB pathway in WAT. These genes are mainly expressed in the cell of the stroma vascular fraction in comparison with adipocytes. Our observations support the hypothesis that leptin could act on WAT, particularly on genes related to inflammation and immunity, which may suggest a novel leptin target pathway in human WAT.

  19. From data to function: functional modeling of poultry genomics data.

    PubMed

    McCarthy, F M; Lyons, E

    2013-09-01

    One of the challenges of functional genomics is to create a better understanding of the biological system being studied so that the data produced are leveraged to provide gains for agriculture, human health, and the environment. Functional modeling enables researchers to make sense of these data as it reframes a long list of genes or gene products (mRNA, ncRNA, and proteins) by grouping based upon function, be it individual molecular functions or interactions between these molecules or broader biological processes, including metabolic and signaling pathways. However, poultry researchers have been hampered by a lack of functional annotation data, tools, and training to use these data and tools. Moreover, this lack is becoming more critical as new sequencing technologies enable us to generate data not only for an increasingly diverse range of species but also individual genomes and populations of individuals. We discuss the impact of these new sequencing technologies on poultry research, with a specific focus on what functional modeling resources are available for poultry researchers. We also describe key strategies for researchers who wish to functionally model their own data, providing background information about functional modeling approaches, the data and tools to support these approaches, and the strengths and limitations of each. Specifically, we describe methods for functional analysis using Gene Ontology (GO) functional summaries, functional enrichment analysis, and pathways and network modeling. 
    As annotation efforts begin to provide the fundamental data that underpin poultry functional modeling (such as improved gene identification, standardized gene nomenclature, temporal and spatial expression data and gene product function), tool developers are incorporating these data into new and existing tools that are used for functional modeling, and cyberinfrastructure is being developed to provide the necessary extendibility and scalability for storing and analyzing these data. This process will support the efforts of poultry researchers to make sense of their functional genomics data sets, and we provide here a starting point for researchers who wish to take advantage of these tools.

  20. Bioinformatics-Based Identification of Candidate Genes from QTLs Associated with Cell Wall Traits in Populus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ranjan, Priya; Yin, Tongming; Zhang, Xinye

    2009-11-01

    Quantitative trait locus (QTL) studies are an integral part of plant research and are used to characterize the genetic basis of phenotypic variation observed in structured populations and inform marker-assisted breeding efforts. These QTL intervals can span large physical regions on a chromosome comprising hundreds of genes, thereby hampering candidate gene identification. Genome history, evolution, and expression evidence can be used to narrow the genes in the interval to a smaller list that is manageable for detailed downstream functional genomics characterization. Our primary motivation for the present study was to address the need for a research methodology that identifies candidate genes within a broad QTL interval. Here we present a bioinformatics-based approach for subdividing candidate genes within QTL intervals into alternate groups of high probability candidates. Application of this approach in the context of studying cell wall traits, specifically lignin content and S/G ratios of stem and root in Populus plants, resulted in manageable sets of genes of both known and putative cell wall biosynthetic function. These results provide a roadmap for future experimental work leading to identification of new genes controlling cell wall recalcitrance and, ultimately, in the utility of plant biomass as an energy feedstock.

  1. Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks.

    PubMed

    Li, Min; Li, Qi; Ganegoda, Gamage Upeksha; Wang, JianXin; Wu, FangXiang; Pan, Yi

    2014-11-01

    Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With the advances of the high-throughput techniques, a large number of protein-protein interactions have been produced. Therefore, to address this issue, several methods based on protein interaction network have been proposed. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm SPGOranker by integrating the semantic similarity of GO annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, for the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
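    The exact scoring used by SPranker is not given in the abstract. The sketch below only illustrates the general idea of shortest-path-based prioritization: candidates are ranked by how close they lie, in an unweighted interaction network, to known disease genes. The closeness score (sum of inverse shortest-path lengths), the function names and the toy network are assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def rank_candidates(graph, seeds, candidates):
    """Order candidates by summed inverse distance to the seed genes.

    Unreachable seeds contribute nothing; ties are broken alphabetically.
    """
    seed_dists = [bfs_distances(graph, seed) for seed in seeds]
    scores = {
        c: sum(1.0 / d[c] for d in seed_dists if d.get(c, 0) > 0)
        for c in candidates
    }
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Hypothetical toy network: a chain A-B-C-D plus an isolated protein E.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
ranking = rank_candidates(network, seeds=["A"], candidates=["D", "C", "E"])
```

    In this toy case the candidate nearest the seed ranks first and the disconnected one last. Adding a GO-based functional similarity term, as SPGOranker does, would require a separate semantic similarity measure that is not sketched here.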

  2. A new paradigm for transcription factor TFIIB functionality

    PubMed Central

    Gelev, Vladimir; Zabolotny, Janice M.; Lange, Martin; Hiromura, Makoto; Yoo, Sang Wook; Orlando, Joseph S.; Kushnir, Anna; Horikoshi, Nobuo; Paquet, Eric; Bachvarov, Dimcho; Schaffer, Priscilla A.; Usheva, Anny

    2014-01-01

    Experimental and bioinformatic studies of transcription initiation by RNA polymerase II (RNAP2) have revealed a mechanism of RNAP2 transcription initiation less uniform across gene promoters than initially thought. However, the general transcription factor TFIIB is presumed to be universally required for RNAP2 transcription initiation. Based on bioinformatic analysis of data and effects of TFIIB knockdown in primary and transformed cell lines on cellular functionality and global gene expression, we report that TFIIB is dispensable for transcription of many human promoters, but is essential for herpes simplex virus-1 (HSV-1) gene transcription and replication. We report a novel cell cycle TFIIB regulation and localization of the acetylated TFIIB variant on the transcriptionally silent mitotic chromatids. Taken together, these results establish a new paradigm for TFIIB functionality in human gene expression, which when downregulated has potent anti-viral effects. PMID:24441171

  3. Genetic Bases of Fungal White Rot Wood Decay Predicted by Phylogenomic Analysis of Correlated Gene-Phenotype Evolution.

    PubMed

    Nagy, László G; Riley, Robert; Bergmann, Philip J; Krizsán, Krisztina; Martin, Francis M; Grigoriev, Igor V; Cullen, Dan; Hibbett, David S

    2017-01-01

    Fungal decomposition of plant cell walls (PCW) is a complex process that has diverse industrial applications and huge impacts on the carbon cycle. White rot (WR) is a powerful mode of PCW decay in which lignin and carbohydrates are both degraded. Mechanistic studies of decay coupled with comparative genomic analyses have provided clues to the enzymatic components of WR systems and their evolutionary origins, but the complete suite of genes necessary for WR remains undetermined. Here, we use phylogenomic comparative methods, which we validate through simulations, to identify shifts in gene family diversification rates that are correlated with evolution of WR, using data from 62 fungal genomes. We detected 409 gene families that appear to be evolutionarily correlated with WR. The identified gene families encode well-characterized decay enzymes, e.g., fungal class II peroxidases and cellobiohydrolases, and enzymes involved in import and detoxification pathways, as well as 73 gene families that have no functional annotation. About 310 of the 409 identified gene families are present in the genome of the model WR fungus Phanerochaete chrysosporium and 192 of these (62%) have been shown to be upregulated under ligninolytic culture conditions, which corroborates the phylogeny-based functional inferences. These results illuminate the complexity of WR and suggest that its evolution has involved a general elaboration of the decay apparatus, including numerous gene families with as-yet unknown exact functions. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. Evolution of the bHLH Genes Involved in Stomatal Development: Implications for the Expansion of Developmental Complexity of Stomata in Land Plants

    PubMed Central

    Ran, Jin-Hua; Shen, Ting-Ting; Liu, Wen-Juan; Wang, Xiao-Quan

    2013-01-01

    Stomata play significant roles in plant evolution. A trio of closely related basic Helix-Loop-Helix (bHLH) subgroup Ia genes, SPCH, MUTE and FAMA, mediate sequential steps of stomatal development, and their functions may be conserved in land plants. However, the evolutionary history of the putative SPCH/MUTE/FAMA genes is still greatly controversial, especially the phylogenetic positions of the bHLH Ia members from basal land plants. To better understand the evolutionary pattern and functional diversity of the bHLH genes involved in stomatal development, we made a comprehensive evolutionary analysis of the homologous genes from 54 species representing the major lineages of green plants. The phylogenetic analysis indicated: (1) All bHLH Ia genes from the two basal land plants Physcomitrella and Selaginella were closely related to the FAMA genes of seed plants; and (2) the gymnosperm ‘SPCH’ genes were sister to a clade comprising the angiosperm SPCH and MUTE genes, while the FAMA genes of gymnosperms and angiosperms had a sister relationship. The revealed phylogenetic relationships are also supported by the distribution of gene structures and previous functional studies. Therefore, we deduce that the function of FAMA might be ancestral in the bHLH Ia subgroup. In addition, the gymnosperm “SPCH” genes may represent an ancestral state and have a dual function of SPCH and MUTE, two genes that could have originated from a duplication event in the common ancestor of angiosperms. Moreover, in angiosperms, SPCHs have experienced more duplications and harbor more copies than MUTEs and FAMAs, which, together with variation of the stomatal development in the entry division, implies that SPCH might have contributed greatly to the diversity of stomatal development. Based on the above, we proposed a model for the correlation between the evolution of stomatal development and the genes involved in this developmental process in land plants. PMID:24244399

  5. Multiple Renal Cyst Development but Not Situs Abnormalities in Transgenic RNAi Mice against Inv::GFP Rescue Gene

    PubMed Central

    Kamijho, Yuki; Shiozaki, Yayoi; Sakurai, Eiki; Hanaoka, Kazunori; Watanabe, Daisuke

    2014-01-01

    In this study we generated RNA interference (RNAi)-mediated gene knockdown transgenic mice (transgenic RNAi mice) against the functional Inv gene. Inv mutant mice show consistently reversed internal organs (situs inversus), multiple renal cysts and neonatal lethality. The Inv::GFP-rescue mice, which introduced the Inv::GFP fusion gene, can rescue inv mutant mice phenotypes. This indicates that the Inv::GFP gene is functional in vivo. To analyze the physiological functions of the Inv gene, and to demonstrate the availability of transgenic RNAi mice, we introduced a short hairpin RNA expression vector against GFP mRNA into Inv::GFP-rescue mice and analyzed the gene silencing effects and Inv functions by examining phenotypes. Transgenic RNAi mice with the Inv::GFP-rescue gene (Inv-KD mice) down-regulated Inv::GFP fusion protein and showed hypomorphic phenotypes of inv mutant mice, such as renal cyst development, but not situs abnormalities or postnatal lethality. This indicates that shRNAi-mediated gene silencing systems that target the tag sequence of the fusion gene work properly in vivo, and suggests that a relatively high level of Inv protein is required for kidney development in contrast to left/right axis determination. Inv::GFP protein was significantly down-regulated in the germ cells of Inv-KD mice testis compared with somatic cells, suggesting the existence of a testicular germ cell-specific enhanced RNAi system that regulates germ cell development. The Inv-KD mouse is useful for studying Inv gene functions in adult tissue that are unable to be analyzed in inv mutant mice showing postnatal lethality. In addition, the shRNA-based gene silencing system against the tag sequence of the fusion gene can be utilized as a new technique to regulate gene expression in either in vitro or in vivo experiments. PMID:24586938

  6. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.

    PubMed

    Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel

    2018-06-01

    The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using an XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  7. Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function

    PubMed Central

    2009-01-01

    Background A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. Either, one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes. Alternatively, one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined. Results We introduce a scoring function, Gene Set Z-score (GSZ), for the analysis of functional class over-representation that combines two previous analysis methods. GSZ encompasses popular functions such as correlation, hypergeometric test, Max-Mean and Random Sets as limiting cases. GSZ is stable against changes in class size as well as across different positions of the analysed gene list in tests with randomized data. GSZ shows the best overall performance in a detailed comparison to popular functions using artificial data. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which clearly reports better p-values for top classes than the other methods. Furthermore, GSZ detects relevant biological themes that are missed by the other methods. These observations also hold when comparing GSZ with popular program packages. Conclusion GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html. PMID:19775443
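
    The abstract names the hypergeometric test as one limiting case of GSZ. The sketch below illustrates only that limiting case, not the GSZ score itself: it computes the over-representation p-value for a functional class among the top genes of an ordered list. The function name and the toy numbers are assumptions made for illustration.

```python
from math import comb

def hypergeom_enrichment_p(n_total, n_class, n_top, n_overlap):
    """P(X >= n_overlap) when n_top genes are drawn without replacement
    from n_total genes, of which n_class belong to the functional class."""
    upper = min(n_class, n_top)
    denom = comb(n_total, n_top)
    tail = sum(
        comb(n_class, k) * comb(n_total - n_class, n_top - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom

# Toy example (hypothetical numbers): 4 genes in total, 2 in the class,
# top-2 list contains both class members.
p_small = hypergeom_enrichment_p(4, 2, 2, 2)
print(p_small)
```

    A threshold-free score such as GSZ avoids having to fix `n_top` in advance; the fixed cutoff here is the simplification that makes this a limiting case.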

  8. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement.

    PubMed

    Blazier, J Chris; Ruhlman, Tracey A; Weng, Mao-Lun; Rehman, Sumaiyah K; Sabir, Jamal S M; Jansen, Robert K

    2016-04-18

    Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA.

  9. Gene Coexpression Network Alignment and Conservation of Gene Modules between Two Grass Species: Maize and Rice

    PubMed Central

    Ficklin, Stephen P.; Feltus, F. Alex

    2011-01-01

    One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species. PMID:21606319

  10. Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice.

    PubMed

    Ficklin, Stephen P; Feltus, F Alex

    2011-07-01

    One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species.
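
    The abstract does not say how coexpression edges were called. A common approach, assumed here purely for illustration, is to link two genes when the absolute Pearson correlation of their expression profiles exceeds a threshold. Gene names, expression values and the 0.9 cutoff below are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def coexpression_edges(expr, threshold=0.9):
    """Return gene pairs whose |r| meets the (assumed) threshold."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    }

# Hypothetical expression profiles across four arrays.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(expr)
print(edges)
```

    Modules would then be found by clustering the resulting graph, a step not shown here.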

  11. Structural and functional annotation of the porcine immunome

    PubMed Central

    2013-01-01

    Background The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems. Results The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. 
    A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome. Conclusions This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response. PMID:23676093

  12. Using whole-exome sequencing to investigate the genetic bases of lysosomal storage diseases of unknown etiology.

    PubMed

    Wang, Nan; Zhang, Yeting; Gedvilaite, Erika; Loh, Jui Wan; Lin, Timothy; Liu, Xiuping; Liu, Chang-Gong; Kumar, Dibyendu; Donnelly, Robert; Raymond, Kimiyo; Schuchman, Edward H; Sleat, David E; Lobel, Peter; Xing, Jinchuan

    2017-11-01

    Lysosomes are membrane-bound, acidic eukaryotic cellular organelles that play important roles in the degradation of macromolecules. Mutations that cause the loss of lysosomal protein function can lead to a group of disorders categorized as the lysosomal storage diseases (LSDs). Suspicion of LSD is frequently based on clinical and pathologic findings, but in some cases, the underlying genetic and biochemical defects remain unknown. Here, we performed whole-exome sequencing (WES) on 14 suspected LSD cases to evaluate the feasibility of using WES for identifying causal mutations. By examining 2,157 candidate genes potentially associated with lysosomal function, we identified eight variants in five genes as candidate disease-causing variants in four individuals. These included both known and novel mutations. Variants were corroborated by targeted sequencing and, when possible, functional assays. In addition, we identified nonsense mutations in two individuals in genes that are not known to have lysosomal function. However, mutations in these genes could have resulted in phenotypes that were diagnosed as LSDs. This study demonstrates that WES can be used to identify causal mutations in suspected LSD cases. We also demonstrate cases where a confounding clinical phenotype may potentially reflect more than one lysosomal protein defect. © 2017 Wiley Periodicals, Inc.

  13. Filtering genetic variants and placing informative priors based on putative biological function.

    PubMed

    Friedrichs, Stefanie; Malzahn, Dörthe; Pugh, Elizabeth W; Almeida, Marcio; Liu, Xiao Qing; Bailey, Julia N

    2016-02-03

    High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.
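
    One standard way to weight p values informatively while controlling FDR is the weighted Benjamini-Hochberg procedure: divide each p value by a prior weight (weights normalized to mean 1) and apply the usual step-up rule. The abstract does not specify which procedure was used, so the sketch below is an assumption; the function name and example values are hypothetical.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg: returns indices of rejected hypotheses.

    Weights must be positive; they are rescaled to have mean 1 so that
    up-weighting some tests is paid for by down-weighting others.
    """
    m = len(pvalues)
    mean_w = sum(weights) / m
    norm = [w / mean_w for w in weights]
    adjusted = [min(1.0, p / w) for p, w in zip(pvalues, norm)]
    order = sorted(range(m), key=lambda i: adjusted[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank that passes its threshold.
        if adjusted[i] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])

# Hypothetical p values; the third test gets a large prior weight,
# e.g. because of supporting expression data.
pvals = [0.01, 0.04, 0.03, 0.5]
unweighted = weighted_bh(pvals, [1.0, 1.0, 1.0, 1.0])
weighted = weighted_bh(pvals, [0.5, 0.5, 2.5, 0.5])
print(unweighted, weighted)
```

    In this toy case the informative weight rescues the third test, which plain Benjamini-Hochberg would not reject.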

  14. Integrating gene and protein expression data with genome-scale metabolic networks to infer functional pathways.

    PubMed

    Pey, Jon; Valgepea, Kaspar; Rubio, Angel; Beasley, John E; Planes, Francisco J

    2013-12-08

    The study of cellular metabolism in the context of high-throughput -omics data has allowed us to decipher novel mechanisms of importance in biotechnology and health. To continue with this progress, it is essential to efficiently integrate experimental data into metabolic modeling. We present here an in-silico framework to infer relevant metabolic pathways for a particular phenotype under study based on its gene/protein expression data. This framework is based on the Carbon Flux Path (CFP) approach, a mixed-integer linear program that expands classical path finding techniques by considering additional biophysical constraints. In particular, the objective function of the CFP approach is amended to account for gene/protein expression data and influence obtained paths. This approach is termed integrative Carbon Flux Path (iCFP). We show that gene/protein expression data also influences the stoichiometric balancing of CFPs, which provides a more accurate picture of active metabolic pathways. This is illustrated in both a theoretical and real scenario. Finally, we apply this approach to find novel pathways relevant in the regulation of acetate overflow metabolism in Escherichia coli. As a result, several targets which could be relevant for better understanding of the phenomenon leading to impaired acetate overflow are proposed. A novel mathematical framework that determines functional pathways based on gene/protein expression data is presented and validated. We show that our approach is able to provide new insights into complex biological scenarios such as acetate overflow in Escherichia coli.

  15. The Sequences of 1504 Mutants in the Model Rice Variety Kitaake Facilitate Rapid Functional Genomic Studies

    PubMed Central

    Pham, Nikki T.; Wei, Tong; Schackwitz, Wendy S.; Lipzen, Anna M.; Duong, Phat Q.; Jones, Kyle C.; Ruan, Deling; Bauer, Diane; Peng, Yi; Schmutz, Jeremy

    2017-01-01

    The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa ssp japonica), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. This work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations. PMID:28576844
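
    The headline figures in the abstract can be cross-checked with simple arithmetic. The snippet below uses only numbers quoted above; the implied total gene count is approximate because the 58% figure is rounded.

```python
# Figures quoted in the abstract.
mutations = 91_513
mutant_lines = 1_504
genes_hit = 32_307
fraction_of_genes = 0.58  # rounded in the abstract

mean_per_line = mutations / mutant_lines
implied_total_genes = genes_hit / fraction_of_genes

print(f"{mean_per_line:.1f} mutations per line on average")
print(f"roughly {implied_total_genes:,.0f} rice genes implied in total")
```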

  16. ISAAC - InterSpecies Analysing Application using Containers.

    PubMed

    Baier, Herbert; Schultz, Jörg

    2014-01-15

    Information about genes, transcripts and proteins is spread over a wide variety of databases. Different tools have been developed using these databases to identify biological signals in gene lists from large-scale analyses. Mostly, they search for enrichments of specific features. However, these tools do not allow an explorative walk through different views, nor do they allow the gene lists to be changed as new findings emerge. To fill this niche, we have developed ISAAC, the InterSpecies Analysing Application using Containers. The central idea of this web-based tool is to enable the analysis of sets of genes, transcripts and proteins under different biological viewpoints and to interactively modify these sets at any point of the analysis. Detailed history and snapshot information allows tracing each action. Furthermore, one can easily switch back to previous states and perform new analyses. Currently, sets can be viewed in the context of genomes, protein functions, protein interactions, pathways, regulation, diseases and drugs. Additionally, users can switch between species with an automatic, orthology-based translation of existing gene sets. As today's research is usually performed in larger teams and consortia, ISAAC provides group-based functionalities. Here, sets as well as results of analyses can be exchanged between members of groups. ISAAC fills the gap between primary databases and tools for the analysis of large gene lists. With its highly modular, JavaEE-based design, the implementation of new modules is straightforward. Furthermore, ISAAC comes with an extensive web-based administration interface including tools for the integration of third-party data. Thus, a local installation is easily feasible. In summary, ISAAC is tailor-made for highly explorative interactive analyses of gene, transcript and protein sets in a collaborative environment.

  17. GoWeb: a semantic search engine for the life science web.

    PubMed

    Dietze, Heiko; Schroeder, Michael

    2009-10-01

    Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines, which will be able to answer questions. Current approaches either apply natural language processing to unstructured text or they assume the existence of structured statements over which they can reason. Here, we introduce a third approach, GoWeb, which combines classical keyword-based Web search with text-mining and ontologies to navigate large result sets and facilitate question answering. We evaluate GoWeb on three benchmarks of questions on genes and functions, on symptoms and diseases, and on proteins and diseases. The first benchmark is based on the BioCreAtivE 1 Task 2 and links 457 gene names with 1352 functions. GoWeb finds 58% of the functional GeneOntology annotations. The second benchmark is based on 26 case reports and links symptoms with diseases. GoWeb achieves a 77% success rate, improving on an existing approach by nearly 20%. The third benchmark is based on 28 questions in the TREC genomics challenge and links proteins to diseases. GoWeb achieves a success rate of 79%. GoWeb's combination of classical Web search with text-mining and ontologies is a first step towards answering questions in the biomedical domain. GoWeb is online at: http://www.gopubmed.org/goweb.

  18. A TAD further: exogenous control of gene activation.

    PubMed

    Mapp, Anna K; Ansari, Aseem Z

    2007-01-23

    Designer molecules that can be used to impose exogenous control on gene transcription, artificial transcription factors (ATFs), are highly desirable as mechanistic probes of gene regulation, as potential therapeutic agents, and as components of cell-based devices. Recently, several advances have been made in the design of ATFs that activate gene transcription (activator ATFs), including reports of small-molecule-based systems and ATFs that exhibit potent activity. However, the many open mechanistic questions about transcriptional activators, in particular, the structure and function of the transcriptional activation domain (TAD), have hindered rapid development of synthetic ATFs. A compelling need thus exists for chemical tools and insights toward a more detailed portrait of the dynamic process of gene activation.

  19. NoGOA: predicting noisy GO annotations using evidences and sparse representation.

    PubMed

    Yu, Guoxian; Lu, Chang; Wang, Jun

    2017-07-21

    Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, identifying noisy annotations is an important yet seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Second, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived at different periods, and then weights entries of the association matrix via estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. 
    Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA.
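
    The neighbor-voting step described in this abstract can be illustrated with a toy sketch. Note the substitution: NoGOA derives gene similarity from sparse representation coefficients, whereas the sketch below uses plain cosine similarity on binary gene-term vectors as a stand-in. It is not the NoGOA implementation; all names and data are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two annotation vectors (stand-in similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def neighbor_votes(annotations, gene, k=2):
    """For each term annotated to `gene`, return the similarity-weighted
    share of its k most similar genes that carry the same term.
    A low share marks the annotation as a candidate noisy annotation."""
    target = annotations[gene]
    neighbors = sorted(
        ((cosine(target, vec), other)
         for other, vec in annotations.items() if other != gene),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    votes = {}
    for term, has_term in enumerate(target):
        if has_term:
            support = sum(sim for sim, other in neighbors
                          if annotations[other][term])
            votes[term] = support / total if total else 0.0
    return votes

# Hypothetical gene-term matrix: rows are genes, columns are GO terms.
annotations = {
    "g1": [1, 1, 0, 1],
    "g2": [1, 1, 0, 0],
    "g3": [1, 1, 1, 0],
    "g4": [0, 0, 1, 1],
}
votes = neighbor_votes(annotations, "g1")
print(votes)
```

    Here the fourth term of `g1` receives no support from its two nearest neighbors, so it would be the candidate flagged for review. The evidence-code weighting that NoGOA combines with these votes is not shown.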

  20. Recrudescence Mechanisms and Gene Expression Profile of the Reproductive Tracts from Chickens during the Molting Period

    PubMed Central

    Ahn, Suzie E.; Lim, Chul-Hong; Lee, Jin-Young; Bae, Seung-Min; Kim, Jinyoung; Bazer, Fuller W.; Song, Gwonhwa

    2013-01-01

    The reproductive system of chickens undergoes dynamic morphological and functional tissue remodeling during the molting period. The present study identified global gene expression profiles following oviductal tissue regression and regeneration in laying hens in which molting was induced by feeding high levels of zinc in the diet. During the molting and recrudescence processes, progressive morphological and physiological changes included regression and re-growth of reproductive organs and fluctuations in concentrations of testosterone, progesterone, estradiol and corticosterone in blood. The cDNA microarray analysis of oviductal tissues revealed the biological significance of gene expression-based modulation in oviductal tissue during its remodeling. Based on the gene expression profiles, selected genes such as TF, ANGPTL3, p20K, PTN, AvBD11 and SERPINB3 exhibited similar expression patterns, with gradual decreases during regression of the oviduct and sequential increases during resurrection of the functional oviduct. Also, miR-1689* inhibited expression of Sp1, while miR-17-3p, miR-22* and miR-1764 inhibited expression of STAT1. Similarly, chicken miR-1562 and miR-138 reduced the expression of ANGPTL3 and p20K, respectively. These results suggest that these differentially regulated genes are closely correlated with the molecular mechanism(s) for development and tissue remodeling of the avian female reproductive tract, and that miRNA-mediated regulation of key genes likely contributes to remodeling of the avian reproductive tract by controlling expression of those genes post-transcriptionally. The discovered global gene profiles provide new molecular candidates responsible for regulating morphological and functional recrudescence of the avian reproductive tract, and provide novel insights into understanding the remodeling process at the genomic and epigenomic levels. PMID:24098561

  1. Metagenomics and novel gene discovery

    PubMed Central

    Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin

    2014-01-01

    Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337

  2. Natural antisense RNAs as mRNA regulatory elements in bacteria: a review on function and applications.

    PubMed

    Saberi, Fatemeh; Kamali, Mehdi; Najafi, Ali; Yazdanparast, Alavieh; Moghaddam, Mehrdad Moosazadeh

    2016-01-01

    Naturally occurring antisense RNAs are small, diffusible, untranslated transcripts that pair to target RNAs at specific regions of complementarity to control their biological function by regulating gene expression at the post-transcriptional level. This review focuses on known cases of antisense RNA control in prokaryotes and provides an overview of some natural RNA-based mechanisms that bacteria use to modulate gene expression, such as mRNA sensors, riboswitches and antisense RNAs. We also highlight recent advances in RNA-based technology. The review shows that studies on both natural and synthetic systems are reciprocally beneficial.

  3. SoyNet: a database of co-functional networks for soybean Glycine max.

    PubMed

    Kim, Eiru; Hwang, Sohyun; Lee, Insuk

    2017-01-04

    Soybean (Glycine max) is a legume crop with substantial economic value, providing a source of oil and protein for humans and livestock. More than 50% of edible oils consumed globally are derived from this crop. Soybean plants are also important for soil fertility, as they fix atmospheric nitrogen by symbiosis with microorganisms. The latest soybean genome annotation (version 2.0) lists 56 044 coding genes, yet their functional contributions to crop traits remain mostly unknown. Co-functional networks have proven useful for identifying genes that are involved in a particular pathway or phenotype with various network algorithms. Here, we present SoyNet (available at www.inetbio.org/soynet), a database of co-functional networks for G. max and a companion web server for network-based functional predictions. SoyNet maps 1 940 284 co-functional links between 40 812 soybean genes (72.8% of the coding genome), which were inferred from 21 distinct types of genomics data including 734 microarrays and 290 RNA-seq samples from soybean. SoyNet provides a new route to functional investigation of the soybean genome, elucidating genes and pathways of agricultural importance. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
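
    The abstract does not name the network algorithms behind the predictions. A simple and widely used option, assumed here, is guilt-by-association: each candidate gene is scored by the summed weight of its links to genes already known to act in a pathway. The gene identifiers and link weights below are made up and do not come from SoyNet.

```python
def rank_candidates(links, seeds):
    """Rank non-seed genes by total link weight to the seed genes.

    links: iterable of (gene_a, gene_b, weight) co-functional links.
    seeds: set of genes already known to act in the pathway of interest.
    """
    scores = {}
    for a, b, weight in links:
        for gene, partner in ((a, b), (b, a)):
            if partner in seeds and gene not in seeds:
                scores[gene] = scores.get(gene, 0.0) + weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical network and seed set.
links = [
    ("G1", "G2", 2.0),
    ("G2", "G3", 2.0),
    ("G3", "G4", 3.0),
    ("G1", "G4", 0.5),
    ("G4", "G5", 1.0),
]
ranked = rank_candidates(links, seeds={"G1", "G3"})
print(ranked)
```

    Genes with no direct link to a seed (such as `G5` here) get no score under this scheme; diffusion-style methods would be needed to reach them.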

  4. Construction and Analysis of Functional Networks in the Gut Microbiome of Type 2 Diabetes Patients.

    PubMed

    Li, Lianshuo; Wang, Zicheng; He, Peng; Ma, Shining; Du, Jie; Jiang, Rui

    2016-10-01

    Although networks of microbial species have been widely used in the analysis of 16S rRNA sequencing data of a microbiome, the construction and analysis of a complete microbial gene network are in general problematic because of the large number of microbial genes in metagenomics studies. To overcome this limitation, we propose to map microbial genes to functional units, including KEGG orthologous groups and the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) orthologous groups, to enable the construction and analysis of a microbial functional network. We devised two statistical methods to infer pairwise relationships between microbial functional units based on a deep sequencing dataset of gut microbiome from type 2 diabetes (T2D) patients as well as healthy controls. Networks containing such functional units and their significant interactions were constructed subsequently. We conducted a variety of analyses of global properties, local properties, and functional modules in the resulting functional networks. Our data indicate that besides the observations consistent with the current knowledge, this study provides novel biological insights into the gut microbiome associated with T2D. Copyright © 2016. Production and hosting by Elsevier Ltd.
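
    The mapping of microbial genes to functional units can be sketched as a simple aggregation: gene-level abundances are summed per orthologous group before any network is built. The gene names, unit labels and abundances below are hypothetical, and the two statistical methods for inferring pairwise relationships are not shown.

```python
from collections import defaultdict

def aggregate_to_units(gene_abundance, gene_to_unit):
    """Sum gene abundances per functional unit (e.g. an orthologous group).
    Genes without a mapping are skipped."""
    unit_abundance = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        unit = gene_to_unit.get(gene)
        if unit is not None:
            unit_abundance[unit] += abundance
    return dict(unit_abundance)

# Hypothetical sample: four genes, three of which map to two units.
gene_abundance = {"gene_1": 3.0, "gene_2": 1.5, "gene_3": 2.0, "gene_4": 4.0}
gene_to_unit = {"gene_1": "unit_A", "gene_2": "unit_A", "gene_3": "unit_B"}
units = aggregate_to_units(gene_abundance, gene_to_unit)
print(units)
```

    Collapsing genes into units in this way shrinks the number of network nodes from the number of genes to the number of orthologous groups, which is what makes the network tractable.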

  5. Gene function in early mouse embryonic stem cell differentiation

    PubMed Central

    Sene, Kagnew Hailesellasse; Porter, Christopher J; Palidwor, Gareth; Perez-Iratxeta, Carolina; Muro, Enrique M; Campbell, Pearl A; Rudnicki, Michael A; Andrade-Navarro, Miguel A

    2007-01-01

    Background: Little is known about the genes that drive embryonic stem cell differentiation. However, such knowledge is necessary if we are to exploit the therapeutic potential of stem cells. To uncover the genetic determinants of mouse embryonic stem cell (mESC) differentiation, we have generated and analyzed 11-point time-series of DNA microarray data for three biologically equivalent but genetically distinct mESC lines (R1, J1, and V6.5) undergoing undirected differentiation into embryoid bodies (EBs) over a period of two weeks. Results: We identified the initial 12-hour period as reflecting the early stages of mESC differentiation and studied probe sets showing consistent changes of gene expression in that period. Gene function analysis indicated significant up-regulation of genes related to regulation of transcription and mRNA splicing, and down-regulation of genes related to intracellular signaling. Phylogenetic analysis indicated that the genes showing the largest expression changes were more likely to have originated in metazoans. The probe sets with the most consistent gene changes in the three cell lines represented 24 down-regulated and 12 up-regulated genes, all with closely related human homologues. Whereas some of these genes are known to be involved in embryonic developmental processes (e.g. Klf4, Otx2, Smn1, Socs3, Tagln, Tdgf1), our analysis points to others (such as transcription factor Phf21a, extracellular matrix related Lama1 and Cyr61, or endoplasmic reticulum related Sc4mol and Scd2) that have not been previously related to mESC function. The majority of identified functions were related to transcriptional regulation, intracellular signaling, and cytoskeleton. Genes involved in other cellular functions important in ESC differentiation such as chromatin remodeling and transmembrane receptors were not observed in this set. Conclusion: Our analysis profiles for the first time gene expression at a very early stage of mESC differentiation, and identifies a functional and phylogenetic signature for the genes involved. The data generated constitute a valuable resource for further studies. All DNA microarray data used in this study are available in the StemBase database of stem cell gene expression data [1] and in the NCBI's GEO database. PMID:17394647

  6. Genome-Wide Comparative Analysis of the Phospholipase D Gene Families among Allotetraploid Cotton and Its Diploid Progenitors

    PubMed Central

    Tang, Kai; Dong, Chun-Juan; Liu, Jin-Yuan

    2016-01-01

    In this study, 40 phospholipase D (PLD) genes were identified from allotetraploid cotton Gossypium hirsutum, and 20 PLD genes were examined in diploid cotton Gossypium raimondii. Combining these with 19 previously identified Gossypium arboreum PLD genes, we performed a comparative analysis of the PLD gene families of the allotetraploid and the two diploid cottons. Based on the orthologous relationships, we found that almost every G. hirsutum PLD had a corresponding homolog in the G. arboreum and G. raimondii genomes, except for GhPLDβ3A, whose homolog GaPLDβ3 may have been lost during the evolution of G. arboreum after the interspecific hybridization. Phylogenetic analysis showed that all of the cotton PLDs were unevenly classified into six numbered subgroups: α, β/γ, δ, ε, ζ and φ. An N-terminal C2 domain was found in the α, β/γ, δ and ε subgroups, while phox homology (PX) and pleckstrin homology (PH) domains were identified in the ζ subgroup. The subgroup φ possessed a single peptide instead of a functional domain. In each phylogenetic subgroup, the PLDs showed high conservation in gene structure and amino acid sequences in functional domains. The expansion of the GhPLD and GrPLD gene families was mainly attributed to segmental duplication and partly attributed to tandem duplication. Furthermore, purifying selection played a critical role in the evolution of PLD genes in cotton. Quantitative RT-PCR documented that allotetraploid cotton PLD genes were broadly expressed and each had a unique spatial and developmental expression pattern, indicating their functional diversification in cotton growth and development. Further analysis of cis-regulatory elements elucidated transcriptional regulation and potential functions. Our comparative analysis provided valuable information for understanding the putative functions of the PLD genes in cotton fiber. PMID:27213891

  7. MicroRNA-integrated and network-embedded gene selection with diffusion distance.

    PubMed

    Huang, Di; Zhou, Xiaobo; Lyon, Christopher J; Hsueh, Willa A; Wong, Stephen T C

    2010-10-29

    Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measures only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (and in certain cases critical) limitation, which almost all existing studies ignore, we design a new strategy that integrates microarray data with information on microRNA, the major post-transcriptional regulatory factor. We also address the challenges posed by gene collaboration mechanisms. To incorporate the biological facts that genes without direct interactions may work closely together through signal transduction and that two genes may be functionally connected through multiple paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly connected, according to the multiple paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
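
    The diffusion-distance idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an undirected, weighted gene network given as a symmetric adjacency matrix, a t-step random walk with transition matrix P, and the standard definition D_t(i, j)^2 = sum over k of (P^t[i][k] - P^t[j][k])^2 / pi[k], where pi is the degree-proportional stationary distribution. The function names, the toy network and the choice t = 3 are assumptions made for illustration.

```python
from math import sqrt


def transition_matrix(adj):
    """Row-normalize a symmetric adjacency matrix into random-walk probabilities."""
    P = []
    for row in adj:
        degree = sum(row)
        if degree == 0:
            raise ValueError("isolated node: random walk is undefined")
        P.append([w / degree for w in row])
    return P


def mat_mul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(P, t):
    """Return P raised to the integer power t (t >= 0)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, P)
    return result


def diffusion_distance(adj, i, j, t=3):
    """Distance between the t-step walk distributions started at nodes i and j."""
    P_t = mat_power(transition_matrix(adj), t)
    total = sum(sum(row) for row in adj)
    pi = [sum(row) / total for row in adj]  # stationary distribution
    return sqrt(sum((P_t[i][k] - P_t[j][k]) ** 2 / pi[k]
                    for k in range(len(adj))))


if __name__ == "__main__":
    # Hypothetical network with edges 0-2, 1-2, 2-3, 3-4 (unit weights).
    adj = [[0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [1, 1, 0, 1, 0],
           [0, 0, 1, 0, 1],
           [0, 0, 0, 1, 0]]
    print(diffusion_distance(adj, 0, 1), diffusion_distance(adj, 0, 4))
```

    In the toy network, nodes 0 and 1 share no edge but have identical neighborhoods, so their walk distributions coincide and their distance is zero, whereas nodes 0 and 4 are far apart. This mirrors the abstract's point that genes without direct interactions may still be functionally close.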

  8. GreenPhylDB v2.0: comparative and functional genomics in plants.

    PubMed

    Rouard, Mathieu; Guignon, Valentin; Aluome, Christelle; Laporte, Marie-Angélique; Droc, Gaëtan; Walde, Christian; Zmasek, Christian M; Périn, Christophe; Conte, Matthieu G

    2011-01-01

    GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families including plant, phylum and species specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve the navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomy context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.

  9. FunSimMat: a comprehensive functional similarity database

    PubMed Central

    Schlicker, Andreas; Albrecht, Mario

    2008-01-01

    Functional similarity based on Gene Ontology (GO) annotation is used in diverse applications like gene clustering, gene expression data analysis, protein interaction prediction and evaluation. However, there exists no comprehensive resource of functional similarity values although such a database would facilitate the use of functional similarity measures in different applications. Here, we describe FunSimMat (Functional Similarity Matrix, http://funsimmat.bioinf.mpi-inf.mpg.de/), a large new database that provides several different semantic similarity measures for GO terms. It offers various precomputed functional similarity values for proteins contained in UniProtKB and for protein families in Pfam and SMART. The web interface allows users to efficiently perform both semantic similarity searches with GO terms and functional similarity searches with proteins or protein families. All results can be downloaded in tab-delimited files for use with other tools. An additional XML–RPC interface gives automatic online access to FunSimMat for programs and remote services. PMID:17932054

  10. The human RHOX gene cluster: target genes and functional analysis of gene variants in infertile men.

    PubMed

    Borgmann, Jennifer; Tüttelmann, Frank; Dworniczak, Bernd; Röpke, Albrecht; Song, Hye-Won; Kliesch, Sabine; Wilkinson, Miles F; Laurentino, Sandra; Gromoll, Jörg

    2016-11-15

    The X-linked reproductive homeobox (RHOX) gene cluster encodes transcription factors preferentially expressed in reproductive tissues. This gene cluster has important roles in male fertility based on phenotypic defects of Rhox-mutant mice and the finding that aberrant RHOX promoter methylation is strongly associated with abnormal human sperm parameters. However, little is known about the molecular mechanism of RHOX function in humans. Using gene expression profiling, we identified genes regulated by members of the human RHOX gene cluster. Some genes were uniquely regulated by RHOXF1 or RHOXF2/2B, while others were regulated by both of these transcription factors. Several of these regulated genes encode proteins involved in processes relevant to spermatogenesis; e.g. stress protection and cell survival. One of the target genes of RHOXF2/2B is RHOXF1, suggesting cross-regulation to enhance transcriptional responses. The potential role of RHOX in human infertility was addressed by sequencing all RHOX exons in a group of 250 patients with severe oligozoospermia. This revealed two mutations in RHOXF1 (c.515G > A and c.522C > T) and four in RHOXF2/2B (-73C > G, c.202G > A, c.411C > T and c.679G > A), of which only one (c.202G > A) was found in a control group of men with normal sperm concentration. Functional analysis demonstrated that c.202G > A and c.679G > A significantly impaired the ability of RHOXF2/2B to regulate downstream genes. Molecular modelling suggested that these mutations alter RHOXF2/F2B protein conformation. By combining clinical data with in vitro functional analysis, we demonstrate how the X-linked RHOX gene cluster may function in normal human spermatogenesis and we provide evidence that it is impaired in human male fertility.

  11. A genetic screen for modifiers of Drosophila caspase Dcp-1 reveals caspase involvement in autophagy and novel caspase-related genes.

    PubMed

    Kim, Young-Il; Ryu, Taewoo; Lee, Judong; Heo, Young-Shin; Ahnn, Joohong; Lee, Seung-Jae; Yoo, OokJoon

    2010-01-25

    Caspases are cysteine proteases with essential functions in the apoptotic pathway; their proteolytic activity toward various substrates is associated with the morphological changes of cells. Recent reports have described non-apoptotic functions of caspases, including autophagy. In this report, we searched for novel modifiers of the phenotype of Dcp-1 gain-of-function (GF) animals by screening promoter element-inserted Drosophila melanogaster lines (EP lines). We screened approximately 15,000 EP lines and identified 72 Dcp-1-interacting genes that were classified into 10 groups based on their functions and pathways: 4 apoptosis signaling genes, 10 autophagy genes, 5 insulin/IGF and TOR signaling pathway genes, 6 MAP kinase and JNK signaling pathway genes, 4 ecdysone signaling genes, 6 ubiquitination genes, 11 various developmental signaling genes, 12 transcription factors, 3 translation factors, and 11 other unclassified genes including 5 functionally undefined genes. Among them, insulin/IGF and TOR signaling pathway, MAP kinase and JNK signaling pathway, and ecdysone signaling are known to be involved in autophagy. Together with the identification of autophagy genes, the results of our screen suggest that autophagy counteracts Dcp-1-induced apoptosis. Consistent with this idea, we show that expression of eGFP-Atg5 rescued the eye phenotype caused by Dcp-1 GF. Paradoxically, we found that over-expression of full-length Dcp-1 induced autophagy, as Atg8b-GFP, an indicator of autophagy, was increased in the eye imaginal discs and in the S2 cell line. Taken together, these data suggest that autophagy suppresses Dcp-1-mediated apoptotic cell death, whereas Dcp-1 positively regulates autophagy, possibly through feedback regulation. We identified a number of Dcp-1 modifiers that genetically interact with Dcp-1-induced cell death. Our results showing that Dcp-1 and autophagy-related genes influence each other will aid future investigations of the complicated relationships between apoptosis and autophagy.

  12. Resolving the homology—function relationship through comparative genomics of membrane-trafficking machinery and parasite cell biology

    PubMed Central

    Klinger, Christen M.; Ramirez-Macias, Inmaculada; Herman, Emily K.; Turkewitz, Aaron P.; Field, Mark C.; Dacks, Joel B.

    2016-01-01

    With advances in DNA sequencing technology, it is increasingly common and tractable to informatically look for genes of interest in the genomic databases of parasitic organisms and infer cellular states. Assignment of a putative gene function based on homology to functionally characterized genes in other organisms, though powerful, relies on the implicit assumption of functional homology, i.e. that orthology indicates conserved function. Eukaryotes reveal a dazzling array of cellular features and structural organization, suggesting a concomitant diversity in their underlying molecular machinery. Significantly, examples of novel functions for pre-existing or new paralogues are not uncommon. Do these examples undermine the basic assumption of functional homology, especially in parasitic protists, which are often highly derived? Here we examine the extent to which functional homology exists between organisms spanning the eukaryotic lineage. By comparing membrane trafficking proteins between parasitic protists and traditional model organisms, where direct functional evidence is available, we find that function is indeed largely conserved between orthologues, albeit with significant adaptation arising from the unique biological features within each lineage. PMID:27444378

  13. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    PubMed Central

    Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C

    2003-01-01

    Background: Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results: 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626

  14. Genome Wide Identification of Orthologous ZIP Genes Associated with Zinc and Iron Translocation in Setaria italica.

    PubMed

    Alagarasan, Ganesh; Dubey, Mahima; Aswathy, Kumar S; Chandel, Girish

    2017-01-01

    Genes in the ZIP family encode transcripts to store and transport bivalent metal micronutrients, particularly iron (Fe) and/or zinc (Zn). These transcripts are important for a variety of functions involved in the developmental and physiological processes in many plant species, including most, if not all, Poaceae plant species and the model species Arabidopsis. Here, we present the report of a genome wide investigation of orthologous ZIP genes in Setaria italica and the identification of 7 single copy genes. RT-PCR shows 4 of them could be used to increase the bio-availability of zinc and iron content in grains. Of 36 ZIP members, 25 genes have traces of signal peptide based sub-cellular localization, as compared to those of plant species studied previously, yet translocation of ions remains unclear. In silico analysis of gene structure and protein nature suggests that these two were preeminent in shaping the functional diversity of the ZIP gene family in S. italica. NAC, bZIP and bHLH are the predominant Fe and Zn responsive transcription factors present in SiZIP genes. Together, our results provide new insights into the signal peptide based/independent iron and zinc translocation in the plant system and allowed identification of ZIP genes that may be involved in the zinc and iron absorption from the soil, and thus transporting it to the cereal grain underlying high micronutrient accumulation.

  15. Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes.

    PubMed

    Tasse, Lena; Bercovici, Juliette; Pizzut-Serin, Sandra; Robe, Patrick; Tap, Julien; Klopp, Christophe; Cantarel, Brandi L; Coutinho, Pedro M; Henrissat, Bernard; Leclerc, Marion; Doré, Joël; Monsan, Pierre; Remaud-Simeon, Magali; Potocki-Veronese, Gabrielle

    2010-11-01

    The human gut microbiome is a complex ecosystem composed mainly of uncultured bacteria. It plays an essential role in the catabolism of dietary fibers, the part of plant material in our diet that is not metabolized in the upper digestive tract, because the human genome does not encode adequate carbohydrate active enzymes (CAZymes). We describe a multi-step functionally based approach to guide the in-depth pyrosequencing of specific regions of the human gut metagenome encoding the CAZymes involved in dietary fiber breakdown. High-throughput functional screens were first applied to a library covering 5.4 × 10^9 bp of metagenomic DNA, allowing the isolation of 310 clones showing beta-glucanase, hemicellulase, galactanase, amylase, or pectinase activities. Based on the results of refined secondary screens, sequencing efforts were reduced to 0.84 Mb of nonredundant metagenomic DNA, corresponding to 26 clones that were particularly efficient for the degradation of raw plant polysaccharides. Seventy-three CAZymes from 35 different families were discovered. This corresponds to a fivefold target-gene enrichment compared to random sequencing of the human gut metagenome. Thirty-three of these CAZy encoding genes are highly homologous to prevalent genes found in the gut microbiome of at least 20 individuals for whom metagenomic data are available. Moreover, 18 multigenic clusters encoding complementary enzyme activities for plant cell wall degradation were also identified. Gene taxonomic assignment is consistent with horizontal gene transfer events in dominant gut species and provides new insights into the human gut functional trophic chain.

  16. Identification of functional elements and regulatory circuits by Drosophila modENCODE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.

    2010-12-22

    To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology. The functions of approximately 40% of the protein and nonprotein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1). Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length protein-coding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3′ untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi) mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.

  17. A high efficiency gene disruption strategy using a positive-negative split selection marker and electroporation for Fusarium oxysporum.

    PubMed

    Liang, Liqin; Li, Jianqiang; Cheng, Lin; Ling, Jian; Luo, Zhongqin; Bai, Miao; Xie, Bingyan

    2014-11-01

    The Fusarium oxysporum species complex consists of fungal pathogens that cause serial vascular wilt disease on more than 100 cultivated species throughout the world. Gene function analysis is rapidly becoming more important as the whole-genome sequences of various F. oxysporum strains are being completed. Gene-disruption techniques are a common molecular tool for studying gene function, yet are often a limiting step in gene function identification. In this study we have developed a F. oxysporum high-efficiency gene-disruption strategy based on split-marker homologous recombination cassettes with dual selection and electroporation transformation. The method was efficiently used to delete three RNA-dependent RNA polymerase (RdRP) genes. The gene-disruption cassettes of three genes can be constructed simultaneously within a short time using this technique. The optimal condition for electroporation is 10 μF capacitance, 300 Ω resistance, and 4 kV/cm field strength, with 1 μg of DNA (gene-disruption cassettes). Under these optimal conditions, we were able to obtain 95 transformants per μg DNA. After positive-negative selection, the transformants were efficiently screened by PCR; screening efficiency averaged 85%: 90% (RdRP1), 85% (RdRP2) and 77% (RdRP3). This gene-disruption strategy should pave the way for high-throughput genetic analysis in F. oxysporum. Copyright © 2014 Elsevier GmbH. All rights reserved.

  18. Homeobox genes in the rodent pineal gland: roles in development and phenotype maintenance.

    PubMed

    Rath, Martin F; Rohde, Kristian; Klein, David C; Møller, Morten

    2013-06-01

    The pineal gland is a neuroendocrine gland responsible for nocturnal synthesis of melatonin. During early development of the rodent pineal gland from the roof of the diencephalon, homeobox genes of the orthodenticle homeobox (Otx)- and paired box (Pax)-families are expressed and are essential for normal pineal development consistent with the well-established role that homeobox genes play in developmental processes. However, the pineal gland appears to be unusual because strong homeobox gene expression persists in the pineal gland of the adult brain. Accordingly, in addition to developmental functions, homeobox genes appear to be key regulators in postnatal phenotype maintenance in this tissue. In this paper, we review ontogenetic and phylogenetic aspects of pineal development and recent progress in understanding the involvement of homeobox genes in rodent pineal development and adult function. A working model is proposed for understanding the sequential action of homeobox genes in controlling development and mature circadian function of the mammalian pinealocyte based on knowledge from detailed developmental and daily gene expression analyses in rats, the pineal phenotypes of homeobox gene-deficient mice and studies on development of the retinal photoreceptor; the pinealocyte and retinal photoreceptor share features not seen in other tissues and are likely to have evolved from the same ancestral photodetector cell.

  19. Homeobox genes in the rodent pineal gland: roles in development and phenotype maintenance

    PubMed Central

    Rath, Martin F.; Rohde, Kristian; Klein, David C.; Møller, Morten

    2012-01-01

    The pineal gland is a neuroendocrine gland responsible for nocturnal synthesis of melatonin. During early development of the rodent pineal gland from the roof of the diencephalon, homeobox genes of the orthodenticle homeobox (Otx)- and paired box (Pax)-families are expressed and are essential for normal pineal development consistent with the well-established role that homeobox genes play in developmental processes. However, the pineal gland appears to be unusual because strong homeobox gene expression persists in the pineal gland of the adult brain. Accordingly, in addition to developmental functions, homeobox genes appear to be key regulators in postnatal phenotype maintenance in this tissue. In this paper, we review ontogenetic and phylogenetic aspects of pineal development and recent progress in understanding the involvement of homeobox genes in rodent pineal development and adult function. A working model is proposed for understanding the sequential action of homeobox genes in controlling development and mature circadian function of the mammalian pinealocyte based on knowledge from detailed developmental and daily gene expression analyses in rats, the pineal phenotypes of homeobox gene-deficient mice and studies on development of the retinal photoreceptor; the pinealocyte and retinal photoreceptor share features not seen in other tissues and are likely to have evolved from the same ancestral photodetector cell. PMID:23076630

  20. Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation

    PubMed Central

    Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; Taylor, Ronald C.; Weisenhorn, Pamela; Olson, Robert D.; Stevens, Rick L.; Rocha, Miguel; Rocha, Isabel; Best, Aaron A.; DeJongh, Matthew; Tintle, Nathan L.; Parrello, Bruce; Overbeek, Ross; Henry, Christopher S.

    2016-01-01

    Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: Hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and find that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, each of which represents increasing degrees of phylogenetic distance from E. coli. 
Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain. PMID:27933038

  1. TMEM88, CCL14 and CLEC3B as prognostic biomarkers for prognosis and palindromia of human hepatocellular carcinoma.

    PubMed

    Zhang, Xin; Wan, Jin-Xiang; Ke, Zun-Ping; Wang, Feng; Chai, Hai-Xia; Liu, Jia-Qiang

    2017-07-01

    Hepatocellular carcinoma is one of the most mortal and prevalent cancers with increasing incidence worldwide. Elucidating genetic driver genes for prognosis and palindromia of hepatocellular carcinoma helps managing clinical decisions for patients. In this study, the high-throughput RNA sequencing data on platform IlluminaHiSeq of hepatocellular carcinoma were downloaded from The Cancer Genome Atlas with 330 primary hepatocellular carcinoma patient samples. Stable key genes with differential expressions were identified with which Kaplan-Meier survival analysis was performed using Cox proportional hazards test in R language. Driver genes influencing the prognosis of this disease were determined using clustering analysis. Functional analysis of driver genes was performed by literature search and Gene Set Enrichment Analysis. Finally, the selected driver genes were verified using external dataset GSE40873. A total of 5781 stable key genes were identified, including 156 genes definitely related to prognoses of hepatocellular carcinoma. Based on the significant key genes, samples were grouped into five clusters which were further integrated into high- and low-risk classes based on clinical features. TMEM88, CCL14, and CLEC3B were selected as driver genes which clustered high-/low-risk patients successfully (generally, p = 0.0005124445). Finally, survival analysis of the high-/low-risk samples from external database illustrated significant difference with p value 0.0198. In conclusion, TMEM88, CCL14, and CLEC3B genes were stable and available in predicting the survival and palindromia time of hepatocellular carcinoma. These genes could function as potential prognostic genes contributing to improve patients' outcomes and survival.
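
    The Kaplan-Meier step mentioned in this abstract can be illustrated with a minimal sketch of the product-limit estimator. This is not the authors' code (they report working in R); the function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration only.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing survival.
        at_risk -= removed
    return curve


# Hypothetical toy cohort: the second patient is censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

    Comparing two such curves (for example high-risk versus low-risk groups) would additionally need a log-rank or Cox test, which this sketch does not cover.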

  2. Application of the CRISPR/Cas9 gene editing technique to research on functional genomes of parasites.

    PubMed

    Cui, Yubao; Yu, Lili

    2016-12-01

    The clustered regularly-interspaced short palindromic repeats (CRISPR) structural family functions as an acquired immune system in prokaryotes. Gene editing techniques have co-opted CRISPR and the associated Cas nucleases to allow for the precise genetic modification of human cells, zebrafish, mice, and other eukaryotes. Indeed, this approach has been used to induce a variety of modifications including directed insertion/deletion (InDel) of bases, gene knock-in, introduction of mutations in both alleles of a target gene, and deletion of small DNA fragments. Thus, CRISPR technology offers a precise molecular tool for directed genome modification with a range of potential applications; further, its high mutation efficiency, simple process, and low cost provide additional advantages over prior editing techniques. This paper will provide an overview of the basic structure and function of the CRISPR gene editing system as well as current and potential applications to research on parasites. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  3. A Translational Pathway Toward a Clinical Trial Using the Second-Generation AAV Micro Dystrophin Vector

    DTIC Science & Technology

    2017-09-01

    future experimental therapeutic studies in the canine model such as CRISPR-mediated gene editing, stem cell therapy, dystrophin-independent disease...There is no scientific/budget overlap with the current proposal.) CRISPR/Cas9-based gene editing for the correction of Duchenne muscular dystrophy...lab will perform in vivo gene delivery and functional outcome measurements in mice treated by AAV-CRISPR gene repair vectors and if needed will also

  4. Genome-wide analysis of TCP family in tobacco.

    PubMed

    Chen, L; Chen, Y Q; Ding, A M; Chen, H; Xia, F; Wang, W F; Sun, Y H

    2016-05-23

    The TCP family is a transcription factor family, members of which are extensively involved in plant growth and development as well as in signal transduction in the response against many physiological and biochemical stimuli. In the present study, 61 TCP genes were identified in tobacco (Nicotiana tabacum) genome. Bioinformatic methods were employed for predicting and analyzing the gene structure, gene expression, phylogenetic analysis, and conserved domains of TCP proteins in tobacco. The 61 NtTCP genes were divided into three diverse groups, based on the division of TCP genes in tomato and Arabidopsis, and the results of the conserved domain and sequence analyses further confirmed the classification of the NtTCP genes. The expression pattern of NtTCP also demonstrated that majority of these genes play important roles in all the tissues, while some special genes exercise their functions only in specific tissues. In brief, the comprehensive and thorough study of the TCP family in other plants provides sufficient resources for studying the structure and functions of TCPs in tobacco.

  5. Targeted Gene Deletion in Cordyceps militaris Using the Split-Marker Approach.

    PubMed

    Lou, HaiWei; Ye, ZhiWei; Yun, Fan; Lin, JunFang; Guo, LiQiong; Chen, BaiXiong; Mu, ZhiXian

    2018-05-01

    The macrofungus Cordyceps militaris contains many kinds of bioactive ingredients that are regulated by functional genes, but the functions of many genes in C. militaris are still unknown. In this study, to improve the frequency of homologous integration, a genetic transformation system based on a split-marker approach was developed for the first time in C. militaris to knock out a gene encoding a terpenoid synthase (Tns). The linear and split-marker deletion cassettes were constructed and introduced into C. militaris protoplasts by PEG-mediated transformation. The transformation of split-marker fragments resulted in a higher efficiency of targeted gene disruption than the transformation of linear deletion cassettes did. The color phenotype of the Tns gene deletion mutants was different from that of wild-type C. militaris. Moreover, a PEG-mediated protoplast transformation system was established, and stable genetic transformants were obtained. This method of targeted gene deletion represents an important tool for investigating the role of C. militaris genes.

  6. Gene-based association study of genes linked to hippocampal sclerosis of aging neuropathology: GRN, TMEM106B, ABCC9, and KCNMB2

    PubMed Central

    Katsumata, Yuriko; Nelson, Peter T.; Ellingson, Sally R.; Fardo, David W.

    2017-01-01

    Hippocampal sclerosis of aging (HS-Aging) is a common neurodegenerative condition associated with dementia. To learn more about genetic risk of HS-Aging pathology, we tested gene-based associations of the GRN, TMEM106B, ABCC9, and KCNMB2 genes, which were reported to be associated with HS-Aging pathology in previous studies. Genetic data were obtained from the Alzheimer’s Disease Genetics Consortium (ADGC), linked to autopsy-derived neuropathological outcomes from the National Alzheimer’s Coordinating Center (NACC). Of the 3,251 subjects included in the study, 271 (8.3%) were identified as an HS-Aging case. The significant gene-based association between the ABCC9 gene and HS-Aging appeared to be driven by a region in which a significant haplotype-based association was found. We tested this haplotype as an expression Quantitative Trait Locus (eQTL) using two different public-access brain gene expression databases. The HS-Aging pathology protective ABCC9 haplotype was associated with decreased ABCC9 expression, indicating a possible toxic gain of function. PMID:28131462

  7. The heptanucleotide motif GAGACGC is a key component of a cis-acting promoter element that is critical for SnSAG1 expression in Sarcocystis neurona.

    PubMed

    Gaji, Rajshekhar Y; Howe, Daniel K

    2009-07-01

    The apicomplexan parasite Sarcocystis neurona undergoes a complex process of intracellular development, during which many genes are temporally regulated. The described study was undertaken to begin identifying the basic promoter elements that control gene expression in S. neurona. Sequence analysis of the 5'-flanking region of five S. neurona genes revealed a conserved heptanucleotide motif GAGACGC that is similar to the WGAGACG motif described upstream of multiple genes in Toxoplasma gondii. The promoter region for the major surface antigen gene SnSAG1, which contains three heptanucleotide motifs within 135 bases of the transcription start site, was dissected by functional analysis using a dual luciferase reporter assay. These analyses revealed that a minimal promoter fragment containing all three motifs was sufficient to drive reporter molecule expression, with the presence and orientation of the 5'-most heptanucleotide motif being absolutely critical for promoter function. Further studies should help to identify additional sequence elements important for promoter function and for controlling gene expression during intracellular development by this apicomplexan pathogen.

  8. Functional modules by relating protein interaction networks and gene expression.

    PubMed

    Tornow, Sabine; Mewes, H W

    2003-11-01

    Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks have a high probability that they perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the presently applied mean Pearson correlations and show that our method is more sensitive in revealing functional relationships.
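
    As a rough illustration of the baseline this abstract compares against (mean pairwise Pearson correlation of a gene group, with a chance probability estimated by random sampling), a minimal sketch follows. It does not implement the superparamagnetic multi-body approach the authors propose. The function names, the `profiles` dictionary layout, and the resampling scheme are assumptions for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mean_pairwise_correlation(profiles, genes):
    """Average Pearson correlation over all pairs of genes in the group."""
    return mean(pearson(profiles[g], profiles[h])
                for g, h in combinations(genes, 2))


def chance_probability(profiles, genes, n_perm=1000, seed=0):
    """Estimate how often a random gene group of the same size
    reaches at least the observed mean correlation."""
    rng = random.Random(seed)
    observed = mean_pairwise_correlation(profiles, genes)
    all_genes = sorted(profiles)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(genes))
        if mean_pairwise_correlation(profiles, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression profiles (gene -> values across four conditions).
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
    "g5": [2, 1, 4, 3],
}
module = ["g1", "g2"]  # e.g. a module found in a protein interaction network
score = mean_pairwise_correlation(profiles, module)
p = chance_probability(profiles, module, n_perm=200)
```

    Profiles with zero variance would cause a division by zero in `pearson`; a real analysis would filter such genes first.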

  9. Functional modules by relating protein interaction networks and gene expression

    PubMed Central

    Tornow, Sabine; Mewes, H. W.

    2003-01-01

    Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks have a high probability that they perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the presently applied mean Pearson correlations and show that our method is more sensitive in revealing functional relationships. PMID:14576317

  10. Multi-tissue analysis of co-expression networks by higher-order generalized singular value decomposition identifies functionally coherent transcriptional modules.

    PubMed

    Xiao, Xiaolin; Moreno-Moral, Aida; Rotival, Maxime; Bottolo, Leonardo; Petretto, Enrico

    2014-01-01

    Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally-efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, we demonstrated distinctive advantages of our method over existing methods, which was able to recover accurately both common and condition-specific network-modules without entailing ad-hoc input parameters as required by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interactions processes in ventricular zones during neocortex expansion and further, we uncovered pathways related to development of later cognitive functions in the cortical plate of the developing brain which were previously unappreciated. 
Analyses of seven rat tissues identified a multi-tissue subnetwork of co-expressed heat shock protein (Hsp) and cardiomyopathy genes (Bag3, Cryab, Kras, Emd, Plec), which was significantly replicated using separate failing heart and liver gene expression datasets in humans, thus revealing a conserved functional role for Hsp genes in cardiovascular disease.

  11. Fast gene ontology based clustering for microarray experiments.

    PubMed

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  12. Investigation of candidate genes for osteoarthritis based on gene expression profiles.

    PubMed

    Dong, Shuanghai; Xia, Tian; Wang, Lei; Zhao, Qinghua; Tian, Jiwei

    2016-12-01

    To explore the mechanism of osteoarthritis (OA) and provide valid biological information for further investigation. Gene expression profile of GSE46750 was downloaded from Gene Expression Omnibus database. The Linear Models for Microarray Data (limma) package (Bioconductor project, http://www.bioconductor.org/packages/release/bioc/html/limma.html) was used to identify differentially expressed genes (DEGs) in inflamed OA samples. Gene Ontology function enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis of DEGs were performed based on Database for Annotation, Visualization and Integrated Discovery data, and protein-protein interaction (PPI) network was constructed based on the Search Tool for the Retrieval of Interacting Genes/Proteins database. Regulatory network was screened based on Encyclopedia of DNA Elements. Molecular Complex Detection was used for sub-network screening. Two sub-networks with highest node degree were integrated with transcriptional regulatory network and KEGG functional enrichment analysis was processed for 2 modules. In total, 401 up- and 196 down-regulated DEGs were obtained. Up-regulated DEGs were involved in inflammatory response, while down-regulated DEGs were involved in cell cycle. PPI network with 2392 protein interactions was constructed. Moreover, 10 genes including Interleukin 6 (IL6) and Aurora B kinase (AURKB) were found to be outstanding in PPI network. There are 214 up- and 8 down-regulated transcription factor (TF)-target pairs in the TF regulatory network. Module 1 had TFs including SPI1, PRDM1, and FOS, while module 2 contained FOSL1. The nodes in module 1 were enriched in chemokine signaling pathway, while the nodes in module 2 were mainly enriched in cell cycle. 
The screened DEGs including IL6, AGT, and AURKB might be potential biomarkers for gene therapy for OA by being regulated by TFs such as FOS and SPI1, and participating in the cell cycle and cytokine-cytokine receptor interaction pathway. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.

  13. Design and construction of a first-generation high-throughput integrated molecular biology platform for production of optimized synthetic genes and improved industrial strains

    USDA-ARS's Scientific Manuscript database

    The molecular biological techniques for plasmid-based assembly and cloning of synthetic assembled gene open reading frames are essential for elucidating the function of the proteins encoded by the genes. These techniques involve the production of full-length cDNA libraries as a source of plasmid-bas...

  14. Identification of differentially expressed genes through RNA sequencing in goats (Capra hircus) at different postnatal stages

    PubMed Central

    Li, Qian; Lin, Sen

    2017-01-01

    Intramuscular fat (IMF) content and fatty acid composition of longissimus dorsi muscle (LM) change with growth, which partially determines the flavor and nutritional value of goat (Capra hircus) meat. However, unlike cattle, little information is available on the transcriptome-wide changes during different postnatal stages in small ruminants, especially goats. In this study, the sequencing reads of goat LM tissues collected from kid, youth, and adult period were mapped to the goat genome. Results showed that out of total 24 689 Unigenes, 20 435 Unigenes were annotated. Based on expected number of fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM), 111 annotated differentially expressed genes (DEGs) were identified among different postnatal stages, which were subsequently assigned to 16 possible expression patterns by series-cluster analysis. Functional classification by Gene Ontology (GO) analysis was used for selecting the genes showing highest expression related to lipid metabolism. Finally, we identified the node genes for lipid metabolism regulation using co-expression analysis. In conclusion, these data may uncover candidate genes having functional roles in regulation of goat muscle development and lipid metabolism during the various growth stages in goats. PMID:28800357
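
    The FPKM normalization referred to in this abstract can be sketched as follows, using the standard definition (fragments per kilobase of transcript per million mapped fragments). The function name and the example numbers are illustrative assumptions, not values from the study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments              -- fragments mapped to this transcript
    transcript_length_bp   -- transcript length in base pairs
    total_mapped_fragments -- all mapped fragments in the library
    """
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2 kb transcript
# in a library of 10 million mapped fragments.
value = fpkm(500, 2000, 10_000_000)
```

    Dividing by length and library size makes expression values comparable across transcripts and samples, which is what allows differentially expressed genes to be called between postnatal stages.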

  15. Identification of differentially expressed genes through RNA sequencing in goats (Capra hircus) at different postnatal stages.

    PubMed

    Lin, Yaqiu; Zhu, Jiangjiang; Wang, Yong; Li, Qian; Lin, Sen

    2017-01-01

    Intramuscular fat (IMF) content and fatty acid composition of longissimus dorsi muscle (LM) change with growth, which partially determines the flavor and nutritional value of goat (Capra hircus) meat. However, unlike cattle, little information is available on the transcriptome-wide changes during different postnatal stages in small ruminants, especially goats. In this study, the sequencing reads of goat LM tissues collected from kid, youth, and adult period were mapped to the goat genome. Results showed that out of total 24 689 Unigenes, 20 435 Unigenes were annotated. Based on expected number of fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM), 111 annotated differentially expressed genes (DEGs) were identified among different postnatal stages, which were subsequently assigned to 16 possible expression patterns by series-cluster analysis. Functional classification by Gene Ontology (GO) analysis was used for selecting the genes showing highest expression related to lipid metabolism. Finally, we identified the node genes for lipid metabolism regulation using co-expression analysis. In conclusion, these data may uncover candidate genes having functional roles in regulation of goat muscle development and lipid metabolism during the various growth stages in goats.

  16. Optimization lighting layout based on gene density improved genetic algorithm for indoor visible light communications

    NASA Astrophysics Data System (ADS)

    Liu, Huanlin; Wang, Xin; Chen, Yong; Kong, Deqian; Xia, Peijie

    2017-05-01

    For indoor visible light communication system, the layout of LED lamps affects the uniformity of the received power on communication plane. In order to find an optimized lighting layout that meets both the lighting needs and communication needs, a gene density genetic algorithm (GDGA) is proposed. In GDGA, a gene indicates a pair of abscissa and ordinate of a LED, and an individual represents a LED layout in the room. The segmented crossover operation and gene mutation strategy based on gene density are put forward to make the received power on communication plane more uniform and increase the population's diversity. A weighted differences function between individuals is designed as the fitness function of GDGA for reserving the population having the useful LED layout genetic information and ensuring the global convergence of GDGA. Comparing square layout and circular layout, with the optimized layout achieved by the GDGA, the power uniformity increases by 83.3%, 83.1% and 55.4%, respectively. Furthermore, the convergence of GDGA is verified compared with evolutionary algorithm (EA). Experimental results show that GDGA can quickly find an approximation of optimal layout.

  17. A genetic replacement system for selection-based engineering of essential proteins

    PubMed Central

    2012-01-01

    Background: Essential genes represent the core of biological functions required for viability. Molecular understanding of essentiality as well as design of synthetic cellular systems includes the engineering of essential proteins. An impediment to this effort is the lack of growth-based selection systems suitable for directed evolution approaches. Results: We established a simple strategy for genetic replacement of an essential gene by a (library of) variant(s) during a transformation. The system was validated using three different essential genes and plasmid combinations and it reproducibly shows transformation efficiencies on the order of 10^7 transformants per microgram of DNA without any identifiable false positives. This allowed for reliable recovery of functional variants out of at least a 10^5-fold excess of non-functional variants. This outperformed selection in conventional bleach-out strains by at least two orders of magnitude, where recombination between functional and non-functional variants interfered with reliable recovery even in recA negative strains. Conclusions: We propose that this selection system is extremely suitable for evaluating large libraries of engineered essential proteins resulting in the reliable isolation of functional variants in a clean strain background which can readily be used for in vivo applications as well as expression and purification for use in in vitro studies. PMID:22898007

  18. Enriching regulatory networks by bootstrap learning using optimised GO-based gene similarity and gene links mined from PubMed abstracts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.

    2011-02-18

    Transcriptional regulatory networks are being determined using “reverse engineering” methods that infer connections based on correlations in gene state. Corroboration of such networks through independent means such as evidence from the biomedical literature is desirable. Here, we explore a novel approach, a bootstrapping version of our previous Cross-Ontological Analytic method (XOA) that can be used for semi-automated annotation and verification of inferred regulatory connections, as well as for discovery of additional functional relationships between the genes. First, we use our annotation and network expansion method on a biological network learned entirely from the literature. We show how new relevant links between genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. Second, we apply our method to annotation, verification, and expansion of a set of regulatory connections found by the Context Likelihood of Relatedness algorithm.

  19. A statistical method for measuring activation of gene regulatory networks.

    PubMed

    Esteves, Gustavo H; Reis, Luiz F L

    2018-06-13

    Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes and enabling studies rooted in systems biology. In this work, we propose a simple statistical model for the activation measuring of gene regulatory networks, instead of the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The real probability distribution for the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measuring of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed for a public database, available through NCBI-GEO, presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.

  20. Transcriptional interference networks coordinate the expression of functionally related genes clustered in the same genomic loci

    PubMed Central

    Boldogköi, Zsolt

    2012-01-01

    The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too. 
PMID:22783276

  1. Identification and expression profiling analysis of TCP family genes involved in growth and development in maize.

    PubMed

    Chai, Wenbo; Jiang, Pengfei; Huang, Guoyu; Jiang, Haiyang; Li, Xiaoyu

    2017-10-01

    The TCP family is a group of plant-specific transcription factors. TCP genes encode proteins harboring bHLH structure, which is implicated in DNA binding and protein-protein interactions and known as the TCP domain. TCP genes play important roles in plant development and have been evolutionarily and functionally elaborated in various plants; however, no overall phylogenetic analysis or expression profiling of TCP genes in Zea mays has been reported. In the present study, a systematic analysis of molecular evolution and functional prediction of TCP family genes in maize (Z. mays L.) has been conducted. We performed a genome-wide survey of TCP genes in maize, revealing the gene structure, chromosomal location and phylogenetic relationship of family members. Microsynteny between grass species and tissue-specific expression profiles were also investigated. In total, 29 TCP genes were identified in the maize genome, unevenly distributed on the 10 maize chromosomes. Additionally, ZmTCP genes were categorized into nine classes based on phylogeny and purifying selection may largely be responsible for maintaining the functions of maize TCP genes. What's more, microsynteny analysis suggested that TCP genes have been conserved during evolution. Finally, expression analysis revealed that most TCP genes are expressed in the stem and ear, which suggests that ZmTCP genes influence stem and ear growth. This result is consistent with the previous finding that maize TCP genes repress the growth of axillary organs and enable the formation of female inflorescences. Altogether, this study presents a thorough overview of TCP family in maize and provides a new perspective on the evolution of this gene family. The results also indicate that TCP family genes may be involved in development stage in plant growing conditions. Additionally, our results will be useful for further functional analysis of the TCP gene family in maize.

  2. Transcriptional interference networks coordinate the expression of functionally related genes clustered in the same genomic loci.

    PubMed

    Boldogköi, Zsolt

    2012-01-01

    The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too.

  3. Genome-wide identification, characterisation and expression analysis of the MADS-box gene family in Prunus mume.

    PubMed

    Xu, Zongda; Zhang, Qixiang; Sun, Lidan; Du, Dongliang; Cheng, Tangren; Pan, Huitang; Yang, Weiru; Wang, Jia

    2014-10-01

    MADS-box genes encode transcription factors that play crucial roles in plant development, especially in flower and fruit development. To gain insight into this gene family in Prunus mume, an important ornamental and fruit plant in East Asia, and to elucidate their roles in flower organ determination and fruit development, we performed a genome-wide identification, characterisation and expression analysis of MADS-box genes in this Rosaceae tree. In this study, 80 MADS-box genes were identified in P. mume and categorised into MIKC, Mα, Mβ, Mγ and Mδ groups based on gene structures and phylogenetic relationships. The MIKC group could be further classified into 12 subfamilies. The FLC subfamily was absent in P. mume and the six tandemly arranged DAM genes might have experienced a species-specific evolution process in P. mume. The MADS-box gene family might have experienced an evolution process from MIKC genes to Mδ genes to Mα, Mβ and Mγ genes. The expression analysis suggests that P. mume MADS-box genes have diverse functions in P. mume development and that the functions of duplicated genes diverged after the duplication events. In addition to their involvement in the development of female gametophytes, type I genes also play roles in male gametophyte development. In conclusion, this study adds to our understanding of the roles that the MADS-box genes play in flower and fruit development and lays a foundation for selecting candidate genes for functional studies in P. mume and other species. Furthermore, this study also provides a basis to study the evolution of the MADS-box family.

  4. Antagonistic Roles for KNOX1 and KNOX2 Genes in Patterning the Land Plant Body Plan Following an Ancient Gene Duplication

    PubMed Central

    Furumizu, Chihiro; Alvarez, John Paul; Sakakibara, Keiko; Bowman, John L.

    2015-01-01

    Neofunctionalization following gene duplication is thought to be one of the key drivers in generating evolutionary novelty. A gene duplication in a common ancestor of land plants produced two classes of KNOTTED-like TALE homeobox genes, class I (KNOX1) and class II (KNOX2). KNOX1 genes are linked to tissue proliferation and maintenance of meristematic potentials of flowering plant and moss sporophytes, and modulation of KNOX1 activity is implicated in contributing to leaf shape diversity of flowering plants. While KNOX2 function has been shown to repress the gametophytic (haploid) developmental program during moss sporophyte (diploid) development, little is known about KNOX2 function in flowering plants, hindering syntheses regarding the relationship between two classes of KNOX genes in the context of land plant evolution. Arabidopsis plants harboring loss-of-function KNOX2 alleles exhibit impaired differentiation of all aerial organs and have highly complex leaves, phenocopying gain-of-function KNOX1 alleles. Conversely, gain-of-function KNOX2 alleles in conjunction with a presumptive heterodimeric BELL TALE homeobox partner suppressed SAM activity in Arabidopsis and reduced leaf complexity in the Arabidopsis relative Cardamine hirsuta, reminiscent of loss-of-function KNOX1 alleles. Little evidence was found indicative of epistasis or mutual repression between KNOX1 and KNOX2 genes. KNOX proteins heterodimerize with BELL TALE homeobox proteins to form functional complexes, and contrary to earlier reports based on in vitro and heterologous expression, we find high selectivity between KNOX and BELL partners in vivo. Thus, KNOX2 genes confer opposing activities rather than redundant roles with KNOX1 genes, and together they act to direct the development of all above-ground organs of the Arabidopsis sporophyte. 
    We infer that following the KNOX1/KNOX2 gene duplication in an ancestor of land plants, neofunctionalization led to evolution of antagonistic biochemical activity thereby facilitating the evolution of more complex sporophyte transcriptional networks, providing plasticity for the morphological evolution of land plant body plans. PMID:25671434

  5. VirtualLeaf: an open-source framework for cell-based modeling of plant tissue growth and development.

    PubMed

    Merks, Roeland M H; Guravage, Michael; Inzé, Dirk; Beemster, Gerrit T S

    2011-02-01

    Plant organs, including leaves and roots, develop by means of a multilevel cross talk between gene regulation, patterned cell division and cell expansion, and tissue mechanics. The multilevel regulatory mechanisms complicate classic molecular genetics or functional genomics approaches to biological development, because these methodologies implicitly assume a direct relation between genes and traits at the level of the whole plant or organ. Instead, understanding gene function requires insight into the roles of gene products in regulatory networks, the conditions of gene expression, etc. This interplay is impossible to understand intuitively. Mathematical and computer modeling allows researchers to design new hypotheses and produce experimentally testable insights. However, the required mathematics and programming experience makes modeling poorly accessible to experimental biologists. Problem-solving environments provide biologically intuitive in silico objects ("cells", "regulation networks") required for setting up a simulation and present those to the user in terms of familiar, biological terminology. Here, we introduce the cell-based computer modeling framework VirtualLeaf for plant tissue morphogenesis. The current version defines a set of biologically intuitive C++ objects, including cells, cell walls, and diffusing and reacting chemicals, that provide useful abstractions for building biological simulations of developmental processes. We present a step-by-step introduction to building models with VirtualLeaf, providing basic example models of leaf venation and meristem development. VirtualLeaf-based models provide a means for plant researchers to analyze the function of developmental genes in the context of the biophysics of growth and patterning. VirtualLeaf is an ongoing open-source software project (http://virtualleaf.googlecode.com) that runs on Windows, Mac, and Linux.

  6. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures.

    PubMed

    Mazandu, Gaston K; Mulder, Nicola J

    2013-09-25

    The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
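
    To make the idea of precomputed term information content concrete, the sketch below computes an annotation-based IC as the log of (total annotated proteins / proteins annotated to the term) and uses it in Resnik's measure, a well-known IC-based similarity. This is only an illustration: the abstract does not say which formulas DaGO-Fun implements, and the term names, counts and ancestor sets are hypothetical toy data.

```python
import math

def information_content(term_counts, total):
    """Annotation-based IC: log(total / count), i.e. -log of the term's frequency.

    term_counts maps a term to the number of proteins annotated to it
    (or to any of its descendants); total is the number of annotated proteins.
    """
    return {term: math.log(total / count)
            for term, count in term_counts.items() if count > 0}

def resnik_similarity(term_a, term_b, ancestors, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors[term_a] & ancestors[term_b]
    return max((ic[t] for t in common), default=0.0)

# Hypothetical toy ontology (not real GO identifiers or counts).
counts = {"root": 1000, "binding": 400, "catalytic": 200,
          "kinase": 25, "phosphatase": 40}
ancestors = {  # each term's ancestor set includes the term itself
    "kinase": {"kinase", "catalytic", "root"},
    "phosphatase": {"phosphatase", "catalytic", "root"},
    "binding": {"binding", "root"},
}

ic = information_content(counts, total=1000)
sim_kp = resnik_similarity("kinase", "phosphatase", ancestors, ic)
sim_kb = resnik_similarity("kinase", "binding", ancestors, ic)
print(sim_kp, sim_kb)
```

    Because IC depends only on annotation counts, it can be computed once per annotation release and looked up at query time, which is the kind of precomputation the abstract describes.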

  7. Microarray and comparative genomics-based identification of genes and gene regulatory regions of the mouse immune system

    PubMed Central

    Hutton, John J; Jegga, Anil G; Kong, Sue; Gupta, Ashima; Ebert, Catherine; Williams, Sarah; Katz, Jonathan D; Aronow, Bruce J

    2004-01-01

    Background In this study we have built and mined a gene expression database composed of 65 diverse mouse tissues for genes preferentially expressed in immune tissues and cell types. Using expression pattern criteria, we identified 360 genes with preferential expression in thymus, spleen, peripheral blood mononuclear cells, lymph nodes (unstimulated or stimulated), or in vitro activated T-cells. Results Gene clusters, formed based on similarity of expression-pattern across either all tissues or the immune tissues only, had highly significant associations both with immunological processes such as chemokine-mediated response, antigen processing, receptor-related signal transduction, and transcriptional regulation, and also with more general processes such as replication and cell cycle control. Within-cluster gene correlations implicated known associations of known genes, as well as immune process-related roles for poorly described genes. To characterize regulatory mechanisms and cis-elements of genes with similar patterns of expression, we used a new version of a comparative genomics-based cis-element analysis tool to identify clusters of cis-elements with compositional similarity among multiple genes. Several clusters contained genes that shared 5–6 cis-elements that included ETS and zinc-finger binding sites. cis-Elements AP2 EGRF ETSF MAZF SP1F ZF5F and AREB ETSF MZF1 PAX5 STAT were shared in a thymus-expressed set; AP4R E2FF EBOX ETSF MAZF SP1F ZF5F and CREB E2FF MAZF PCAT SP1F STAT cis-clusters occurred in activated T-cells; CEBP CREB NFKB SORY and GATA NKXH OCT1 RBIT occurred in stimulated lymph nodes. Conclusion This study demonstrates a series of analytic approaches that have allowed the implication of genes and regulatory elements that participate in the differentiation, maintenance, and function of the immune system. Polymorphism or mutation of these could adversely impact immune system functions. PMID:15504237

  8. Rapid and tunable method to temporally control gene editing based on conditional Cas9 stabilization. | Office of Cancer Genomics

    Cancer.gov

    The CRISPR/Cas9 system is a powerful tool for studying gene function. Here, we describe a method that allows temporal control of CRISPR/Cas9 activity based on conditional Cas9 destabilization. We demonstrate that fusing an FKBP12-derived destabilizing domain to Cas9 (DD-Cas9) enables conditional Cas9 expression and temporal control of gene editing in the presence of an FKBP12 synthetic ligand. This system can be easily adapted to co-express, from the same promoter, DD-Cas9 with any other gene of interest without co-modulation of the latter.

  9. FlyBase: genes and gene models

    PubMed Central

    Drysdale, Rachel A.; Crosby, Madeline A.

    2005-01-01

    FlyBase (http://flybase.org) is the primary repository of genetic and molecular data of the insect family Drosophilidae. For the most extensively studied species, Drosophila melanogaster, a wide range of data are presented in integrated formats. Data types include mutant phenotypes, molecular characterization of mutant alleles and aberrations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models and molecular classification of gene product functions. There is a growing body of data for other Drosophila species; this is expected to increase dramatically over the next year, with the completion of draft-quality genomic sequences of an additional 11 Drosophila species. PMID:15608223

  10. Synthetic biology in mammalian cells: Next generation research tools and therapeutics

    PubMed Central

    Lienert, Florian; Lohmueller, Jason J; Garg, Abhishek; Silver, Pamela A

    2014-01-01

    Recent progress in DNA manipulation and gene circuit engineering has greatly improved our ability to programme and probe mammalian cell behaviour. These advances have led to a new generation of synthetic biology research tools and potential therapeutic applications. Programmable DNA-binding domains and RNA regulators are leading to unprecedented control of gene expression and elucidation of gene function. Rebuilding complex biological circuits such as T cell receptor signalling in isolation from their natural context has deepened our understanding of network motifs and signalling pathways. Synthetic biology is also leading to innovative therapeutic interventions based on cell-based therapies, protein drugs, vaccines and gene therapies. PMID:24434884

  11. Dataset of the human homologues and orthologues of lipid-metabolic genes identified as DAF-16 targets their roles in lipid and energy metabolism.

    PubMed

    Fan, Lavender Yuen-Nam; Saavedra-García, Paula; Lam, Eric Wing-Fai

    2017-04-01

    The data presented in this article are related to the review article entitled 'Unravelling the role of fatty acid metabolism in cancer through the FOXO3-FOXM1 axis' (Saavedra-Garcia et al., 2017) [24]. Here, we have matched the DAF-16/FOXO3 downstream genes with their respective human orthologues and reviewed the roles of these targeted genes in fatty acid (FA) metabolism. The genes listed in this article were selected from literature reviews based on their functions in mammalian FA metabolism. The nematode Caenorhabditis elegans orthologues of the genes were obtained from WormBase, the online biological database of C. elegans. This dataset has not been uploaded to a public repository yet.

  12. Comprehensive Expression Profiling and Functional Network Analysis of Porphyra-334, One Mycosporine-Like Amino Acid (MAA), in Human Keratinocyte Exposed with UV-radiation.

    PubMed

    Suh, Sung-Suk; Lee, Sung Gu; Youn, Ui Joung; Han, Se Jong; Kim, Il-Chan; Kim, Sanghee

    2017-06-24

    Mycosporine-like amino acids (MAAs) have been highlighted as pharmacologically active secondary compounds that protect cells from harmful UV radiation by absorbing its energy. Previous studies have mostly focused on characterizing their physiological properties such as antioxidant activity and osmotic regulation. However, the molecular mechanisms underlying their UV-protective capability have not yet been revealed. In the present study, we investigated the expression profiles of porphyra-334-modulated genes and microRNAs (miRNAs) in response to UV exposure, and their functional networks, using cDNA and miRNA microarrays. Based on our data, we showed that porphyra-334-regulated genes play essential roles in UV-affected biological processes such as the Wnt (Wingless/integrase-1) and Notch pathways, which exhibit an antagonistic relationship in various biological processes; the UV-repressed genes were in the Wnt signaling pathway, while the activated genes were in the Notch signaling pathway. In addition, porphyra-334-regulated miRNAs can target many genes related to UV-mediated biological processes such as apoptosis, cell proliferation and translational elongation. Notably, we observed that the functional roles of the target genes for up-regulated miRNAs are inversely correlated with those for down-regulated miRNAs; the former genes promote apoptosis and translational elongation, whereas the latter function as inhibitors in these processes. Taken together, these data suggest that porphyra-334 protects cells from harmful UV radiation through the comprehensive modulation of expression patterns of genes involved in UV-mediated biological processes, and they provide new insight into its functional molecular networks.

  13. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    PubMed Central

    Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

    2009-01-01

    Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. 
    Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species. PMID:19747386

  14. Advances and perspectives on the use of CRISPR/Cas9 systems in plant genomics research

    DOE PAGES

    Liu, Degao; Hu, Rongbin; Palla, Kaitlin J.; ...

    2016-02-18

    Genome editing with site-specific nucleases has become a powerful tool for functional characterization of plant genes and genetic improvement of agricultural crops. Among the various site-specific nuclease-based technologies available for genome editing, the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems have shown the greatest potential for rapid and efficient editing of genomes in plant species. Here, this article reviews the current status of application of CRISPR/Cas9 to plant genomics research, with a focus on loss-of-function and gain-of-function analysis of individual genes in the context of perennial plants and the potential application of CRISPR/Cas9 to perturbation of gene expression, as well as identification and analysis of gene modules as part of an accelerated domestication and synthetic biology effort.

  15. Advances and perspectives on the use of CRISPR/Cas9 systems in plant genomics research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Degao; Hu, Rongbin; Palla, Kaitlin J.

    Genome editing with site-specific nucleases has become a powerful tool for functional characterization of plant genes and genetic improvement of agricultural crops. Among the various site-specific nuclease-based technologies available for genome editing, the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems have shown the greatest potential for rapid and efficient editing of genomes in plant species. Here, this article reviews the current status of application of CRISPR/Cas9 to plant genomics research, with a focus on loss-of-function and gain-of-function analysis of individual genes in the context of perennial plants and the potential application of CRISPR/Cas9 to perturbation of gene expression, as well as identification and analysis of gene modules as part of an accelerated domestication and synthetic biology effort.

  16. Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures.

    PubMed

    Stamatakis, Alexandros; Ott, Michael

    2008-12-27

    The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations that typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on 'gappy' multi-gene alignments. By 'gappy' we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAXML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.

  17. Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex.

    PubMed

    Pavlidis, Paul; Qin, Jie; Arango, Victoria; Mann, John J; Sibille, Etienne

    2004-06-01

    One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and the results of ORA depended strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
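
    The contrast between the two methods can be sketched in a few lines. In this minimal illustration, ORA is written as a hypergeometric upper-tail test on a thresholded gene list, and FCS as the mean gene score of a class compared with random gene sets of the same size. The resampling scheme, function names and toy data are assumptions made for the example, not the authors' implementation.

```python
import math
import random

def ora_pvalue(n_total, n_class, n_selected, n_overlap):
    """ORA: P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_total genes, of which n_class belong to the ontology class."""
    upper = min(n_class, n_selected)
    tail = sum(math.comb(n_class, k) * math.comb(n_total - n_class, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / math.comb(n_total, n_selected)

def fcs_pvalue(scores, class_genes, n_resamples=2000, seed=0):
    """FCS: compare the class's mean gene score with random gene sets of
    equal size. No gene selection threshold is involved."""
    rng = random.Random(seed)
    genes = list(scores)
    members = [g for g in class_genes if g in scores]
    observed = sum(scores[g] for g in members) / len(members)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(genes, len(members))
        if sum(scores[g] for g in sample) / len(members) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

# Toy data: 100 genes, the first 10 form the class and carry higher scores.
scores = {f"g{i}": (3.0 if i < 10 else 1.0) for i in range(100)}
class_genes = [f"g{i}" for i in range(10)]

p_ora = ora_pvalue(n_total=100, n_class=10, n_selected=20, n_overlap=6)
p_fcs = fcs_pvalue(scores, class_genes)
print(p_ora, p_fcs)
```

    Note that `n_selected` and `n_overlap` in the ORA call both change whenever the selection threshold moves, which is the threshold dependence the abstract reports; the FCS function takes the full score table and has no such parameter.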

  18. Natural killer cell receptor genes in the family Equidae: not only Ly49.

    PubMed

    Futas, Jan; Horin, Petr

    2013-01-01

    Natural killer (NK) cells have important functions in immunity. NK recognition in mammals can be mediated through killer cell immunoglobulin-like receptors (KIR) and/or killer cell lectin-like Ly49 receptors. Genes encoding highly variable NK cell receptors (NKR) represent rapidly evolving genomic regions. No single conservative model of NKR genes was observed in mammals. Single-copy low polymorphic NKR genes present in one mammalian species may expand into highly polymorphic multigene families in other species. In contrast to other non-rodent mammals, multiple Ly49-like genes appear to exist in the horse, while no functional KIR genes were observed in this species. In this study, Ly49 and KIR were sought and their evolution was characterized in the entire family Equidae. Genomic sequences retrieved showed the presence of at least five highly conserved polymorphic Ly49 genes in horses, asses and zebras. These findings confirmed that the expansion of Ly49 occurred in the entire family. Several KIR-like sequences were also identified in the genome of Equids. Besides a previously identified non-functional KIR-Immunoglobulin-like transcript fusion gene (KIR-ILTA) and two putative pseudogenes, a KIR3DL-like sequence was analyzed. In contrast to previous observations made in the horse, the KIR3DL sequence, genomic organization and mRNA expression suggest that all Equids might produce a functional KIR receptor protein molecule with a single non-mutated immune tyrosine-based inhibition motif (ITIM) domain. No evidence for positive selection in the KIR3DL gene was found. Phylogenetic analysis including rhinoceros and tapir genomic DNA and deduced amino acid KIR-related sequences showed differences between families and even between species within the order Perissodactyla. 
    The results suggest that the order Perissodactyla and its family Equidae with expanded Ly49 genes and with a potentially functional KIR gene may represent an interesting model for evolutionary biology of NKR genes.

  19. The shaping and functional consequences of the dosage effect landscape in multiple myeloma.

    PubMed

    Samur, Mehmet K; Shah, Parantu K; Wang, Xujun; Minvielle, Stéphane; Magrangeas, Florence; Avet-Loiseau, Hervé; Munshi, Nikhil C; Li, Cheng

    2013-10-02

    Multiple myeloma (MM) is a malignant proliferation of plasma B cells. Based on recurrent aneuploidy such as copy number alterations (CNAs), myeloma is divided into two subtypes with different CNA patterns and patient survival outcomes. How aneuploidy events arise, and whether they contribute to cancer cell evolution, are actively studied. The large number of transcriptomic changes resulting from CNAs (dosage effect) poses big challenges for identifying functional consequences of CNAs in myeloma in terms of specific driver genes and pathways. In this study, we hypothesize that gene-wise dosage effect varies as a result of complex regulatory networks that translate the impact of CNAs to gene expression, and studying this variation can provide insights into functional effects of CNAs. We propose gene-wise dosage effect score and genome-wide karyotype plot as tools to measure and visualize concordant copy number and expression changes across cancer samples. We find that dosage effect in myeloma is widespread yet variable, and it is correlated with gene expression level and CNA frequencies in different chromosomes. Our analysis suggests that despite the enrichment of differentially expressed genes between hyperdiploid MM and non-hyperdiploid MM in the trisomy chromosomes, the chromosomal proportion of dosage-sensitive genes is higher in the non-trisomy chromosomes. Dosage-sensitive genes are enriched for genes with protein translation and localization functions, and dosage-resistant genes are enriched for apoptosis genes. These results point to future studies on differential dosage sensitivity and resistance of pro- and anti-proliferation pathways and their variation across patients as therapeutic targets and prognosis markers. Our findings support the hypothesis that recurrent CNAs in myeloma are selected by their functional consequences. 
    The novel dosage effect score defined in this work will facilitate integration of copy number and expression data for identifying driver genes in cancer genomics studies. The accompanying R code is available at http://www.canevolve.org/dosageEffect/.
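
    The abstract does not give the formula for the dosage effect score. As a rough stand-in for "concordant copy number and expression changes across samples", the sketch below computes a per-gene Pearson correlation between copy number and expression. This proxy, the gene names and the values are all assumptions for illustration and should not be read as the score defined by the authors or as their R code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: one list entry per sample, same sample order in both tables.
copy_number = {"geneA": [2, 3, 3, 2, 4], "geneB": [2, 3, 3, 2, 4]}
expression = {"geneA": [1.0, 1.5, 1.6, 0.9, 2.1],   # tracks copy number
              "geneB": [1.2, 1.1, 1.3, 1.2, 1.2]}   # flat despite copy changes

proxy_scores = {gene: pearson(copy_number[gene], expression[gene])
                for gene in copy_number}
print(proxy_scores)
```

    Under this proxy, a gene like the hypothetical geneA would count as dosage sensitive and geneB as dosage resistant, mirroring the distinction drawn in the abstract.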

  20. Natural Killer Cell Receptor Genes in the Family Equidae: Not only Ly49

    PubMed Central

    Futas, Jan; Horin, Petr

    2013-01-01

    Natural killer (NK) cells have important functions in immunity. NK recognition in mammals can be mediated through killer cell immunoglobulin-like receptors (KIR) and/or killer cell lectin-like Ly49 receptors. Genes encoding highly variable NK cell receptors (NKR) represent rapidly evolving genomic regions. No single conservative model of NKR genes was observed in mammals. Single-copy low polymorphic NKR genes present in one mammalian species may expand into highly polymorphic multigene families in other species. In contrast to other non-rodent mammals, multiple Ly49-like genes appear to exist in the horse, while no functional KIR genes were observed in this species. In this study, Ly49 and KIR were sought and their evolution was characterized in the entire family Equidae. Genomic sequences retrieved showed the presence of at least five highly conserved polymorphic Ly49 genes in horses, asses and zebras. These findings confirmed that the expansion of Ly49 occurred in the entire family. Several KIR-like sequences were also identified in the genome of Equids. Besides a previously identified non-functional KIR-Immunoglobulin-like transcript fusion gene (KIR-ILTA) and two putative pseudogenes, a KIR3DL-like sequence was analyzed. In contrast to previous observations made in the horse, the KIR3DL sequence, genomic organization and mRNA expression suggest that all Equids might produce a functional KIR receptor protein molecule with a single non-mutated immune tyrosine-based inhibition motif (ITIM) domain. No evidence for positive selection in the KIR3DL gene was found. Phylogenetic analysis including rhinoceros and tapir genomic DNA and deduced amino acid KIR-related sequences showed differences between families and even between species within the order Perissodactyla. The results suggest that the order Perissodactyla and its family Equidae with expanded Ly49 genes and with a potentially functional KIR gene may represent an interesting model for evolutionary biology of NKR genes. PMID:23724088

  1. Short Vegetative Phase-Like MADS-Box Genes Inhibit Floral Meristem Identity in Barley

    PubMed Central

    Trevaskis, Ben; Tadege, Million; Hemming, Megan N.; Peacock, W. James; Dennis, Elizabeth S.; Sheldon, Candice

    2007-01-01

    Analysis of the functions of Short Vegetative Phase (SVP)-like MADS-box genes in barley (Hordeum vulgare) indicated a role in determining meristem identity. Three SVP-like genes are expressed in vegetative tissues of barley: Barley MADS1 (BM1), BM10, and Vegetative to Reproductive Transition gene 2. These genes are induced by cold but are repressed during floral development. Ectopic expression of BM1 inhibited spike development and caused floral reversion in barley, with florets at the base of the spike replaced by tillers. Head emergence was delayed in plants that ectopically express BM1, primarily by delayed development after the floral transition, but expression levels of the barley VRN1 gene (HvVRN1) were not affected. Ectopic expression of BM10 inhibited spike development and caused partial floral reversion, where florets at the base of the spike were replaced by inflorescence-like structures, but did not affect heading date. Floral reversion occurred more frequently when BM1 and BM10 ectopic expression lines were grown in short-day conditions. BM1 and BM10 also inhibited floral development and caused floral reversion when expressed in Arabidopsis (Arabidopsis thaliana). We conclude that SVP-like genes function to suppress floral meristem identity in winter cereals. PMID:17114273

  2. A flexible and economical barcoding approach for highly multiplexed amplicon sequencing of diverse target genes

    PubMed Central

    Herbold, Craig W.; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander

    2015-01-01

    High-throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it requires fewer primers overall and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high-quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305

  3. In planta functions of cytochrome P450 monooxygenase genes in the phytocassane biosynthetic gene cluster on rice chromosome 2.

    PubMed

    Ye, Zhongfeng; Yamazaki, Kohei; Minoda, Hiromi; Miyamoto, Koji; Miyazaki, Sho; Kawaide, Hiroshi; Yajima, Arata; Nojiri, Hideaki; Yamane, Hisakazu; Okada, Kazunori

    2018-06-01

    In response to environmental stressors such as blast fungal infections, rice produces antimicrobial diterpenoid phytoalexins. Together with momilactones, phytocassanes are among the major diterpenoid phytoalexins. The biosynthetic genes of diterpenoid phytoalexins are organized on the chromosome in functional gene clusters, comprising diterpene cyclase, dehydrogenase, and cytochrome P450 monooxygenase genes. Their functions have been studied extensively using in vitro enzyme assay systems. Specifically, P450 genes (CYP71Z6, Z7; CYP76M5, M6, M7, M8) on rice chromosome 2 have multifunctional activities associated with ent-copalyl diphosphate-related diterpene hydrocarbons, but the in planta contribution of these genes to diterpenoid phytoalexin production remains unknown. Here, we characterized a cyp71z7 T-DNA mutant and CYP76M7/M8 RNAi lines and found that potential phytoalexin intermediates accumulated in these P450-suppressed rice plants. The results suggested that in planta, CYP71Z7 is responsible for C2-hydroxylation of phytocassanes and that CYP76M7/M8 is involved in C11α-hydroxylation of 3-hydroxy-cassadiene. Based on these results, we proposed potential routes of phytocassane biosynthesis in planta.

  4. The Variable Regions of Lactobacillus rhamnosus Genomes Reveal the Dynamic Evolution of Metabolic and Host-Adaptation Repertoires

    PubMed Central

    Ceapa, Corina; Davids, Mark; Ritari, Jarmo; Lambert, Jolanda; Wels, Michiel; Douillard, François P.; Smokvina, Tamara; de Vos, Willem M.; Knol, Jan; Kleerebezem, Michiel

    2016-01-01

    Lactobacillus rhamnosus is a diverse Gram-positive species with strains isolated from different ecological niches. Here, we report the genome sequence analysis of 40 diverse strains of L. rhamnosus and their genomic comparison, with a focus on the variable genome. Genomic comparison of 40 L. rhamnosus strains discriminated the conserved genes (core genome) and regions of plasticity involving frequent rearrangements and horizontal transfer (variome). The L. rhamnosus core genome encompasses 2,164 genes, out of 4,711 genes in total (the pan-genome). The accessory genome is dominated by genes encoding carbohydrate transport and metabolism, extracellular polysaccharides (EPS) biosynthesis, bacteriocin production, pili production, the cas system, and the associated clustered regularly interspaced short palindromic repeat (CRISPR) loci, and more than 100 transporter functions and mobile genetic elements like phages, plasmid genes, and transposons. A clade distribution based on amino acid differences between core (shared) proteins matched with the clade distribution obtained from the presence–absence of variable genes. The phylogenetic and variome tree overlap indicated that frequent events of gene acquisition and loss dominated the evolutionary segregation of the strains within this species, which is paralleled by evolutionary diversification of core gene functions. The CRISPR-Cas system could have contributed to this evolutionary segregation. Lactobacillus rhamnosus strains contain the genetic and metabolic machinery with strain-specific gene functions required to adapt to a large range of environments. A remarkable congruency of the evolutionary relatedness of the strains’ core and variome functions, possibly favoring interspecies genetic exchanges, underlines the importance of gene-acquisition and loss within the L. rhamnosus strain diversification. PMID:27358423
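
    The split between core genome, variable genes and pan-genome reported above (2,164 core genes out of 4,711 in total) follows from a gene presence-absence comparison across strains. The sketch below illustrates that bookkeeping only; the strain names and gene identifiers are hypothetical, and it is not the authors' pipeline, which would first need an orthology step to decide which genes count as "the same".

```python
def partition_genome(strain_genes):
    """Split genes into core, accessory and pan-genome sets.

    strain_genes: dict mapping strain name -> iterable of gene identifiers.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    if not gene_sets:
        return set(), set(), set()
    pan = set().union(*gene_sets)        # present in at least one strain
    core = set.intersection(*gene_sets)  # present in every strain
    accessory = pan - core               # variable genes, absent from some strains
    return core, accessory, pan


# Hypothetical presence-absence data for three strains (not from the study).
strains = {
    "strain_1": ["g1", "g2", "g3", "g4"],
    "strain_2": ["g1", "g2", "g4", "g5"],
    "strain_3": ["g1", "g2", "g6"],
}
core, accessory, pan = partition_genome(strains)
print(sorted(core), sorted(accessory), len(pan))
```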

  5. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichment analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  6. Metagenomic Insights into the Fibrolytic Microbiome in Yak Rumen

    PubMed Central

    Song, Lei; Liu, Di; Liu, Li; Chen, Furong; Wang, Min; Li, Jiabao; Zeng, Xiaowei; Dong, Zhiyang; Hu, Songnian; Li, Lingyan; Xu, Jian; Huang, Li; Dong, Xiuzhu

    2012-01-01

    The rumen hosts one of the most efficient microbial systems for degrading plant cell walls, yet the predominant cellulolytic proteins and fibrolytic mechanism(s) remain elusive. Here we investigated the cellulolytic microbiome of the yak rumen by using a combination of metagenome-based and bacterial artificial chromosome (BAC)-based functional screening approaches. In total, 223 fibrolytic BAC clones were pyrosequenced and 10,070 ORFs were identified. Among them, 150 were annotated as the glycoside hydrolase (GH) genes for fibrolytic proteins, and the majority (69%) of them were clustered or linked with genes encoding related functions. Among the 35 fibrolytic contigs of >10 Kb in length, 25 were derived from Bacteroidetes and four from Firmicutes. Coverage analysis indicated that the fibrolytic genes on most Bacteroidetes-contigs were abundantly represented in the metagenomic sequences, and they were frequently linked with genes encoding SusC/SusD-type outer-membrane proteins. GH5, GH9, and GH10 cellulase/hemicellulase genes were predominant, but no GH48 exocellulase gene was found. Most (85%) of the cellulase and hemicellulase proteins possessed a signal peptide; only a few carried carbohydrate-binding modules, and no cellulosomal domains were detected. These findings suggest that the SusC/SusD-involving mechanism, instead of one based on cellulosomes or the free-enzyme system, serves a major role in lignocellulose degradation in yak rumen. Genes encoding an endoglucanase of a novel GH5 subfamily occurred frequently in the metagenome, and the recombinant proteins encoded by the genes displayed moderate Avicelase activity in addition to endoglucanase activity, suggesting their important contribution to lignocellulose degradation in the exocellulase-scarce rumen. PMID:22808161

  7. Combined sequence and sequence-structure-based methods for analyzing RAAS gene SNPs: a computational approach.

    PubMed

    Singh, Kh Dhanachandra; Karthikeyan, Muthusamy

    2014-12-01

    The renin-angiotensin-aldosterone system (RAAS) plays a key role in the regulation of blood pressure (BP). Mutations in the genes that encode components of the RAAS have played a significant role in genetic susceptibility to hypertension and have been intensively scrutinized. Identifying such probably causal mutations not only provides insight into the RAAS but may also yield antihypertensive therapeutic targets and diagnostic markers. Because the large SNP dataset contains both functional and neutral SNPs, determining the biological significance of every SNP experimentally is challenging. To explore the functional significance of these genetic mutations (SNPs), we adopted a combined sequence- and sequence-structure-based SNP analysis algorithm. Out of 3864 SNPs reported in dbSNP, we found 108 missense SNPs in the coding region; the remainder were in the non-coding region. In this study, we report a coding-region SNP as deleterious only when three or more tools predict it to be deleterious and it shows a high RMSD from the native structure. Based on these analyses, we identified two SNPs of the REN gene, eight SNPs of the AGT gene, three SNPs of the ACE gene, two SNPs of the AT1R gene, three SNPs of the CYP11B2 gene and three SNPs of the CMA1 gene in the coding region as deleterious. Studies of this type will help reduce the cost and time of identifying potential SNPs and will help in selecting candidate SNPs from the SNP pool for experimental study.
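
    The reporting rule in this abstract (three or more tools agree, and the structure deviates strongly from the native one) can be sketched as a simple consensus filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tool names, SNP identifiers, RMSD values and the 2.0 angstrom cutoff are all hypothetical placeholders, since the abstract gives none of them.

```python
from dataclasses import dataclass

# Assumed thresholds; the abstract states "three or more tools" but no RMSD cutoff.
MIN_TOOLS_AGREEING = 3
RMSD_CUTOFF = 2.0  # angstroms, chosen for illustration only


@dataclass
class SnpCall:
    snp_id: str
    gene: str
    tool_calls: dict  # tool name -> True if that tool predicts "deleterious"
    rmsd: float       # RMSD of the mutant model from the native structure


def is_consensus_deleterious(call):
    """Two-part rule: enough tools agree AND the RMSD is high."""
    votes = sum(1 for predicted in call.tool_calls.values() if predicted)
    return votes >= MIN_TOOLS_AGREEING and call.rmsd >= RMSD_CUTOFF


def filter_deleterious(calls):
    """Return SNP ids that pass the consensus rule, grouped by gene."""
    by_gene = {}
    for call in calls:
        if is_consensus_deleterious(call):
            by_gene.setdefault(call.gene, []).append(call.snp_id)
    return by_gene


# Made-up records (not data from the study).
example_calls = [
    SnpCall("snpA", "REN", {"tool1": True, "tool2": True, "tool3": True, "tool4": False}, 2.6),
    SnpCall("snpB", "REN", {"tool1": True, "tool2": True, "tool3": False, "tool4": False}, 3.1),
    SnpCall("snpC", "AGT", {"tool1": True, "tool2": True, "tool3": True, "tool4": True}, 0.4),
]
print(filter_deleterious(example_calls))
```

    Requiring both conditions means a SNP flagged by every tool is still dropped when the modelled structure barely changes, as with the hypothetical snpC.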

  8. Genome-wide gene-based analysis suggests an association between Neuroligin 1 (NLGN1) and post-traumatic stress disorder.

    PubMed

    Kilaru, V; Iyer, S V; Almli, L M; Stevens, J S; Lori, A; Jovanovic, T; Ely, T D; Bradley, B; Binder, E B; Koen, N; Stein, D J; Conneely, K N; Wingo, A P; Smith, A K; Ressler, K J

    2016-05-24

    Post-traumatic stress disorder (PTSD) develops in only some people following trauma exposure, but the mechanisms differentially explaining risk versus resilience remain largely unknown. PTSD is heritable but candidate gene studies and genome-wide association studies (GWAS) have identified only a modest number of genes that reliably contribute to PTSD. New gene-based methods may help identify additional genes that increase risk for PTSD development or severity. We applied gene-based testing to GWAS data from the Grady Trauma Project (GTP), a primarily African American cohort, and identified two genes (NLGN1 and ZNRD1-AS1) that associate with PTSD after multiple test correction. Although the top SNP from NLGN1 did not replicate, we observed gene-based replication of NLGN1 with PTSD in the Drakenstein Child Health Study (DCHS) cohort from Cape Town. NLGN1 has previously been associated with autism, and it encodes neuroligin 1, a protein involved in synaptogenesis, learning, and memory. Within the GTP dataset, a single nucleotide polymorphism (SNP), rs6779753, underlying the gene-based association, associated with the intermediate phenotypes of higher startle response and greater functional magnetic resonance imaging activation of the amygdala, orbitofrontal cortex, right thalamus and right fusiform gyrus in response to fearful faces. These findings support a contribution of the NLGN1 gene pathway to the neurobiological underpinnings of PTSD.

  9. Genome-wide gene-based analysis suggests an association between Neuroligin 1 (NLGN1) and post-traumatic stress disorder

    PubMed Central

    Kilaru, V; Iyer, S V; Almli, L M; Stevens, J S; Lori, A; Jovanovic, T; Ely, T D; Bradley, B; Binder, E B; Koen, N; Stein, D J; Conneely, K N; Wingo, A P; Smith, A K; Ressler, K J

    2016-01-01

    Post-traumatic stress disorder (PTSD) develops in only some people following trauma exposure, but the mechanisms differentially explaining risk versus resilience remain largely unknown. PTSD is heritable but candidate gene studies and genome-wide association studies (GWAS) have identified only a modest number of genes that reliably contribute to PTSD. New gene-based methods may help identify additional genes that increase risk for PTSD development or severity. We applied gene-based testing to GWAS data from the Grady Trauma Project (GTP), a primarily African American cohort, and identified two genes (NLGN1 and ZNRD1-AS1) that associate with PTSD after multiple test correction. Although the top SNP from NLGN1 did not replicate, we observed gene-based replication of NLGN1 with PTSD in the Drakenstein Child Health Study (DCHS) cohort from Cape Town. NLGN1 has previously been associated with autism, and it encodes neuroligin 1, a protein involved in synaptogenesis, learning, and memory. Within the GTP dataset, a single nucleotide polymorphism (SNP), rs6779753, underlying the gene-based association, associated with the intermediate phenotypes of higher startle response and greater functional magnetic resonance imaging activation of the amygdala, orbitofrontal cortex, right thalamus and right fusiform gyrus in response to fearful faces. These findings support a contribution of the NLGN1 gene pathway to the neurobiological underpinnings of PTSD. PMID:27219346

  10. The Methanol Dehydrogenase Gene, mxaF, as a Functional and Phylogenetic Marker for Proteobacterial Methanotrophs in Natural Environments

    PubMed Central

    Lau, Evan; Fisher, Meredith C.; Steudler, Paul A.; Cavanaugh, Colleen M.

    2013-01-01

    The mxaF gene, coding for the large (α) subunit of methanol dehydrogenase, is highly conserved among distantly related methylotrophic species in the Alpha-, Beta- and Gammaproteobacteria. It is ubiquitous in methanotrophs, in contrast to other methanotroph-specific genes such as the pmoA and mmoX genes, which are absent in some methanotrophic proteobacterial genera. This study examined the potential for using the mxaF gene as a functional and phylogenetic marker for methanotrophs. mxaF and 16S rRNA gene phylogenies were constructed based on over 100 database sequences of known proteobacterial methanotrophs and other methylotrophs to assess their evolutionary histories. Topology tests revealed that mxaF and 16S rDNA genes of methanotrophs do not show congruent evolutionary histories, with incongruencies in methanotrophic taxa in the Methylococcaceae, Methylocystaceae, and Beijerinckiaceae. However, known methanotrophs generally formed coherent clades based on mxaF gene sequences, allowing for phylogenetic discrimination of major taxa. This feature highlights the mxaF gene’s usefulness as a biomarker in studying the molecular diversity of proteobacterial methanotrophs in nature. To verify this, PCR-directed assays targeting this gene were used to detect novel methanotrophs from diverse environments including soil, peatland, hydrothermal vent mussel tissues, and methanotroph isolates. The placement of the majority of environmental mxaF gene sequences in distinct methanotroph-specific clades (Methylocystaceae and Methylococcaceae) detected in this study supports the use of mxaF as a biomarker for methanotrophic proteobacteria. PMID:23451130

  11. Microarray characterization of gene expression changes in blood during acute ethanol exposure

    PubMed Central

    2013-01-01

    Background: As part of the civil aviation safety program to define the adverse effects of ethanol on flying performance, we performed a DNA microarray analysis of human whole blood samples from a five-time-point study in which subjects were administered ethanol orally and blood alcohol concentration (BAC) was monitored by breathalyzer, in order to discover significant gene expression changes in response to ethanol exposure. Methods: Subjects were administered either orange juice or orange juice with ethanol. Blood samples were taken based on BAC and total RNA was isolated from PaxGene™ blood tubes. The amplified cDNA was used in microarray and quantitative real-time polymerase chain reaction (RT-qPCR) analyses to evaluate differential gene expression. Microarray data were summarized and normalized in a pipeline, and the results were evaluated for relative expression across time points with multiple methods. Candidate genes showing distinctive expression patterns in response to ethanol were clustered by pattern and further analyzed for related function, pathway membership and common transcription factor binding within and across clusters. RT-qPCR was used with representative genes to confirm that relative transcript levels across time matched those detected in microarrays. Results: Microarray analysis of samples representing 0%, 0.04%, 0.08%, return to 0.04%, and 0.02% wt/vol BAC showed that changes in gene expression could be detected across the time course. The expression changes were verified by RT-qPCR. The candidate genes of interest (GOI) identified from the microarray analysis and clustered by expression pattern across the five BAC points showed seven coordinately expressed groups. Analysis showed function-based networks, shared transcription factor binding sites and signaling pathways for members of the clusters. These include hematological functions, innate immunity and inflammation functions, metabolic functions expected of ethanol metabolism, and pancreatic and hepatic function. Five of the seven clusters showed links to the p38 MAPK pathway. Conclusions: The results of this study provide a first look at changing gene expression patterns in human blood during an acute rise in blood ethanol concentration and its depletion because of metabolism and excretion, and demonstrate that it is possible to detect changes in gene expression using total RNA isolated from whole blood. The analysis approach for this study serves as a workflow to investigate the biology linked to expression changes across a time course and, from these changes, to identify target genes that could serve as biomarkers linked to pilot performance. PMID:23883607

  12. Gene Therapy for the Retinal Degeneration of Usher Syndrome Caused by Mutations in MYO7A.

    PubMed

    Lopes, Vanda S; Williams, David S

    2015-01-20

    Usher syndrome is a deaf-blindness disorder. One of the subtypes, Usher 1B, is caused by loss of function of the gene encoding the unconventional myosin, MYO7A. A variety of different viral-based delivery approaches have been tested for retinal gene therapy to prevent the blindness of Usher 1B, and a clinical trial based on one of these approaches has begun. This review evaluates the different approaches. Copyright © 2015 Cold Spring Harbor Laboratory Press; all rights reserved.

  13. [Ubiquitination of recombinant adeno-associated viral vector and its application].

    PubMed

    Wang, Qi-zhao; Lu, Ying-hui; Diao, Yong; Xu, Rui-an

    2012-09-01

    Recombinant adeno-associated virus (rAAV) has been widely used as a vector for gene therapy. However, the effectiveness of gene therapy based on rAAV needs to be further improved. Enhancement of the transduction efficiency is one of the most important fields for rAAV-based gene therapy. Recent results have shown that the ubiquitin-proteasome system plays an important role in the trafficking of the rAAV vector in the cytoplasm, and that regulating its function may significantly improve the transduction efficiency of the rAAV vector in various types of cells and tissues.

  14. Phylogeny and expression profiling of CAD and CAD-like genes in hybrid Populus (P. deltoides × P. nigra): evidence from herbivore damage for subfunctionalization and functional divergence

    PubMed Central

    2010-01-01

    Background: Cinnamyl Alcohol Dehydrogenase (CAD) proteins function in lignin biosynthesis and play a critical role in wood development and plant defense against stresses. Previous phylogenetic studies did not include genes from seedless plants and did not reflect the deep evolutionary history of this gene family. We reanalyzed the phylogeny of CAD and CAD-like genes using a representative dataset including lycophyte and bryophyte sequences. Many CAD/CAD-like genes do not seem to be associated with wood development under normal growth conditions. To gain insight into the functional evolution of CAD/CAD-like genes, we analyzed their expression in Populus plant tissues in response to feeding damage by gypsy moth larvae (Lymantria dispar L.). Expression of CAD/CAD-like genes in Populus tissues (xylem, leaves, and bark) was analyzed in herbivore-treated and non-treated plants by real-time quantitative RT-PCR. Results: CAD family genes were distributed in three classes based on sequence conservation. All three classes are represented by seedless as well as seed plants, including the class of bona fide lignin pathway genes. The expression of some CAD/CAD-like genes that are not associated with xylem development was induced following herbivore damage in leaves, while other genes were induced in only bark or xylem tissues. Five of the CAD/CAD-like genes, however, showed a shift in expression from one tissue to another between non-treated and herbivore-treated plants. Systemic expression of the CAD/CAD-like genes was generally suppressed. Conclusions: Our results indicated a correlation between the evolution of the CAD gene family and lignin and that the three classes of genes may have evolved in the ancestor of land plants. Our results also suggest that the CAD/CAD-like genes have evolved a diversity of expression profiles and potentially different functions, but that they are nonetheless co-regulated under stress conditions. PMID:20509918

  15. Competitive Genomic Screens of Barcoded Yeast Libraries

    PubMed Central

    Urbanus, Malene; Proctor, Michael; Heisler, Lawrence E.; Giaever, Guri; Nislow, Corey

    2011-01-01

    By virtue of advances in next generation sequencing technologies, we have access to new genome sequences almost daily. The tempo of these advances is accelerating, promising greater depth and breadth. In light of these extraordinary advances, the need for fast, parallel methods to define gene function becomes ever more important. Collections of genome-wide deletion mutants in yeasts and E. coli have served as workhorses for the characterization of gene function, but this approach is not scalable: current gene-deletion approaches require each of the thousands of genes that comprise a genome to be deleted and verified. Only after this work is complete can we pursue high-throughput phenotyping. Over the past decade, our laboratory has refined a portfolio of competitive, miniaturized, high-throughput genome-wide assays that can be performed in parallel. This parallelization is possible because of the inclusion of DNA 'tags', or 'barcodes', into each mutant, with the barcode serving as a proxy for the mutation, so that barcode abundance can be measured to assess mutant fitness. In this study, we seek to fill the gap between DNA sequence and barcoded mutant collections. To accomplish this, we introduce a combined transposon disruption-barcoding approach that opens up parallel barcode assays to newly sequenced, but poorly characterized microbes. To illustrate this approach, we present a new Candida albicans barcoded disruption collection and describe how both microarray-based and next generation sequencing-based platforms can be used to collect 10,000 - 1,000,000 gene-gene and drug-gene interactions in a single experiment. PMID:21860376
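
    The use of barcode abundance as a proxy for mutant fitness can be illustrated with a small sketch. It assumes barcode read counts from a control pool and a treated pool, converts each to relative abundance, and reports a log2 ratio per mutant. The mutant names, counts and pseudocount are hypothetical, and this is a generic illustration rather than the authors' analysis pipeline.

```python
import math


def relative_abundance(counts, pseudocount=1.0):
    """Convert raw barcode counts to fractions of the pool (with a pseudocount)."""
    total = sum(c + pseudocount for c in counts.values())
    return {bc: (c + pseudocount) / total for bc, c in counts.items()}


def log2_fitness(control_counts, treated_counts, pseudocount=1.0):
    """log2(treated / control) relative abundance for each shared barcode."""
    control = relative_abundance(control_counts, pseudocount)
    treated = relative_abundance(treated_counts, pseudocount)
    return {
        bc: math.log2(treated[bc] / control[bc])
        for bc in control
        if bc in treated
    }


# Hypothetical barcode counts for three mutants in two pooled cultures.
control = {"mutant_a": 500, "mutant_b": 500, "mutant_c": 500}
treated = {"mutant_a": 900, "mutant_b": 500, "mutant_c": 100}
scores = log2_fitness(control, treated)
for barcode, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{barcode}\t{score:+.2f}")
```

    A negative score marks a mutant depleted under treatment (a fitness defect), and a positive score marks one that is enriched. The pseudocount keeps barcodes with zero reads from producing an undefined logarithm.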

  16. Discovering novel subsystems using comparative genomics

    PubMed Central

    Ferrer, Luciana; Shearer, Alexander G.; Karp, Peter D.

    2011-01-01

    Motivation: Key problems for computational genomics include discovering novel pathways in genome data, and discovering functional interaction partners for genes to define new members of partially elucidated pathways. Results: We propose a novel method for the discovery of subsystems from annotated genomes. For each gene pair, a score measuring the likelihood that the two genes belong to the same subsystem is computed using genome context methods. Genes are then grouped based on these scores, and the resulting groups are filtered to keep only high-confidence groups. Since the method is based on genome context analysis, it relies solely on structural annotation of the genomes. The method can be used to discover new pathways, find missing genes from a known pathway, find new protein complexes or other kinds of functional groups and assign function to genes. We tested the accuracy of our method in Escherichia coli K-12. In one configuration of the system, we find that 31.6% of the candidate groups generated by our method match a known pathway or protein complex closely, and that we rediscover 31.2% of all known pathways and protein complexes of at least 4 genes. We believe that a significant proportion of the candidates that do not match any known group in E. coli K-12 corresponds to novel subsystems that may represent promising leads for future laboratory research. We discuss in-depth examples of these findings. Availability: Predicted subsystems are available at http://brg.ai.sri.com/pwy-discovery/journal.html. Contact: lferrer@ai.sri.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21775308
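
    The three steps in this abstract (score each gene pair, group genes by score, keep only high-confidence groups) can be sketched generically. The abstract does not say which grouping or filtering algorithm is used, so the sketch substitutes a simple stand-in: link pairs whose score reaches a threshold, take connected components as groups, and keep groups with a minimum number of genes. The gene names, scores and both thresholds are hypothetical.

```python
from collections import defaultdict


def group_genes(pair_scores, link_threshold=0.8, min_size=3):
    """Group genes via connected components over high-scoring pairs.

    pair_scores: dict mapping (gene_a, gene_b) -> score in [0, 1].
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Build an undirected graph containing only confident links.
    neighbors = defaultdict(set)
    for (a, b), score in pair_scores.items():
        if score >= link_threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Depth-first search to collect connected components.
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbors[gene] - component)
        seen |= component
        groups.append(sorted(component))

    # Filtering step (assumed criterion): drop groups that are too small.
    return [g for g in groups if len(g) >= min_size]


# Hypothetical pairwise scores (not data from the study).
scores = {
    ("geneA", "geneB"): 0.95,
    ("geneB", "geneC"): 0.90,
    ("geneC", "geneD"): 0.30,
    ("geneD", "geneE"): 0.85,
}
print(group_genes(scores))
```

    Connected components are the loosest possible grouping: one strong link is enough to merge two clusters, so a stricter method would be needed wherever spurious links are common.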

  17. Integrated genome-wide Alu methylation and transcriptome profiling analyses reveal novel epigenetic regulatory networks associated with autism spectrum disorder.

    PubMed

    Saeliw, Thanit; Tangsuwansri, Chayanin; Thongkorn, Surangrat; Chonchaiya, Weerasak; Suphapeetiporn, Kanya; Mutirangura, Apiwat; Tencomnao, Tewin; Hu, Valerie W; Sarachana, Tewarit

    2018-01-01

    Alu elements are a group of repetitive elements that can influence gene expression through CpG residues and transcription factor binding. Altered gene expression and methylation profiles have been reported in various tissues and cell lines from individuals with autism spectrum disorder (ASD). However, the role of Alu elements in ASD remains unclear. We thus investigated whether Alu elements are associated with altered gene expression profiles in ASD. We obtained five blood-based gene expression profiles from the Gene Expression Omnibus database and human Alu-inserted gene lists from the TranspoGene database. Differentially expressed genes (DEGs) in ASD were identified from each study and overlapped with the human Alu-inserted genes. The biological functions and networks of Alu-inserted DEGs were then predicted by Ingenuity Pathway Analysis (IPA). A combined bisulfite restriction analysis of lymphoblastoid cell lines (LCLs) derived from 36 ASD and 20 sex- and age-matched unaffected individuals was performed to assess the global DNA methylation levels within Alu elements, and the Alu expression levels were determined by quantitative RT-PCR. In ASD blood or blood-derived cells, 320 Alu-inserted genes were reproducibly differentially expressed. Biological function and pathway analysis showed that these genes were significantly associated with neurodevelopmental disorders and neurological functions involved in ASD etiology. Interestingly, estrogen receptor and androgen signaling pathways implicated in the sex bias of ASD, as well as IL-6 signaling and neuroinflammation signaling pathways, were also highlighted. Alu methylation was not significantly different between the ASD and sex- and age-matched control groups. However, significantly altered Alu methylation patterns were observed in ASD cases sub-grouped based on Autism Diagnostic Interview-Revised scores compared with matched controls. Quantitative RT-PCR analysis of Alu expression also showed significant differences between ASD subgroups. Interestingly, Alu expression was correlated with methylation status in one phenotypic ASD subgroup. Alu methylation and expression were altered in LCLs from ASD subgroups. Our findings highlight the association of Alu elements with gene dysregulation in ASD blood samples and warrant further investigation. Moreover, the classification of ASD individuals into subgroups based on phenotypes may be beneficial and could provide insights into the still unknown etiology and the underlying mechanisms of ASD.

  18. Drosophila Pelle phosphorylates Dichaete protein and influences its subcellular distribution in developing oocytes.

    PubMed

    Mutsuddi, Mousumi; Mukherjee, Ashim; Shen, Baohe; Manley, James L; Nambu, John R

    2010-01-01

    The Drosophila Dichaete gene encodes a member of the Sox family of high mobility group (HMG) domain proteins that have crucial gene regulatory functions in diverse developmental processes. The subcellular localization and transcriptional regulatory activities of Sox proteins can be regulated by several post-translational modifications. To identify genes that functionally interact with Dichaete, we undertook a genetic modifier screen based on a Dichaete gain-of-function phenotype in the adult eye. Mutations in several genes, including decapentaplegic, engrailed and pelle, behaved as dominant modifiers of this eye phenotype. Further analysis of pelle mutants revealed that loss of pelle function results in alterations in the distinctive cytoplasmic distribution of Dichaete protein within the developing oocyte, as well as defects in the elaboration of individual egg chambers. The death domain-containing region of the Pelle protein kinase was found to associate with both Dichaete and mouse Sox2 proteins, and Pelle can phosphorylate Dichaete protein in vitro. Overall, these findings reveal that maternal functions of pelle are essential for proper localization of Dichaete protein in the oocyte and normal egg chamber formation. Dichaete appears to be a novel phosphorylation substrate for Pelle and may function in a Pelle-dependent signaling pathway during oogenesis.

  19. Functional Responses of Salt Marsh Microbial Communities to Long-Term Nutrient Enrichment

    PubMed Central

    Graves, Christopher J.; Makrides, Elizabeth J.; Schmidt, Victor T.; Giblin, Anne E.; Cardon, Zoe G.

    2016-01-01

    Environmental nutrient enrichment from human agricultural and waste runoff could cause changes to microbial communities that allow them to capitalize on newly available resources. Currently, the response of microbial communities to nutrient enrichment remains poorly understood, and, while some studies have shown no clear changes in community composition in response to heavy nutrient loading, others targeting specific genes have demonstrated clear impacts. In this study, we compared functional metagenomic profiles from sediment samples taken along two salt marsh creeks, one of which was exposed for more than 40 years to treated sewage effluent at its head. We identified strong and consistent increases in the relative abundance of microbial genes related to each of the biochemical steps in the denitrification pathway at enriched sites. Despite fine-scale local increases in the abundance of denitrification-related genes, the overall community structures based on broadly defined functional groups and taxonomic annotations were similar and varied with other environmental factors, such as salinity, which were common to both creeks. Homology-based taxonomic assignments of nitrous oxide reductase sequences in our data show that increases are spread over a broad taxonomic range, thus limiting detection from taxonomic data alone. Together, these results illustrate a functionally targeted yet taxonomically broad response of microbial communities to anthropogenic nutrient loading, indicating some resolution to the apparently conflicting results of existing studies on the impacts of nutrient loading in sediment communities. Importance: In this study, we used environmental metagenomics to assess the response of microbial communities in estuarine sediments to long-term, nutrient-rich sewage effluent exposure. 
Unlike previous studies, which have mainly characterized communities based on taxonomic data or primer-based amplification of specific target genes, our whole-genome metagenomics approach allowed an unbiased assessment of the abundance of denitrification-related genes across the entire community. We identified strong and consistent increases in the relative abundance of gene sequences related to denitrification pathways across a broad phylogenetic range at sites exposed to long-term nutrient addition. While further work is needed to determine the consequences of these community responses in regulating environmental nutrient cycles, the increased abundance of bacteria harboring denitrification genes suggests that such processes may be locally upregulated. In addition, our results illustrate how whole-genome metagenomics combined with targeted hypothesis testing can reveal fine-scale responses of microbial communities to environmental disturbance. PMID:26944843

  20. Functional Responses of Salt Marsh Microbial Communities to Long-Term Nutrient Enrichment.

    PubMed

    Graves, Christopher J; Makrides, Elizabeth J; Schmidt, Victor T; Giblin, Anne E; Cardon, Zoe G; Rand, David M

    2016-05-01

    Environmental nutrient enrichment from human agricultural and waste runoff could cause changes to microbial communities that allow them to capitalize on newly available resources. Currently, the response of microbial communities to nutrient enrichment remains poorly understood, and, while some studies have shown no clear changes in community composition in response to heavy nutrient loading, others targeting specific genes have demonstrated clear impacts. In this study, we compared functional metagenomic profiles from sediment samples taken along two salt marsh creeks, one of which was exposed for more than 40 years to treated sewage effluent at its head. We identified strong and consistent increases in the relative abundance of microbial genes related to each of the biochemical steps in the denitrification pathway at enriched sites. Despite fine-scale local increases in the abundance of denitrification-related genes, the overall community structures based on broadly defined functional groups and taxonomic annotations were similar and varied with other environmental factors, such as salinity, which were common to both creeks. Homology-based taxonomic assignments of nitrous oxide reductase sequences in our data show that increases are spread over a broad taxonomic range, thus limiting detection from taxonomic data alone. Together, these results illustrate a functionally targeted yet taxonomically broad response of microbial communities to anthropogenic nutrient loading, indicating some resolution to the apparently conflicting results of existing studies on the impacts of nutrient loading in sediment communities. In this study, we used environmental metagenomics to assess the response of microbial communities in estuarine sediments to long-term, nutrient-rich sewage effluent exposure. 
Unlike previous studies, which have mainly characterized communities based on taxonomic data or primer-based amplification of specific target genes, our whole-genome metagenomics approach allowed an unbiased assessment of the abundance of denitrification-related genes across the entire community. We identified strong and consistent increases in the relative abundance of gene sequences related to denitrification pathways across a broad phylogenetic range at sites exposed to long-term nutrient addition. While further work is needed to determine the consequences of these community responses in regulating environmental nutrient cycles, the increased abundance of bacteria harboring denitrification genes suggests that such processes may be locally upregulated. In addition, our results illustrate how whole-genome metagenomics combined with targeted hypothesis testing can reveal fine-scale responses of microbial communities to environmental disturbance. Copyright © 2016 Graves et al.
