Sample records for candidate gene database

  1. The Candidate Cancer Gene Database: a database of cancer driver genes from forward genetic screens in mice.

    PubMed

    Abbott, Kenneth L; Nyre, Erik T; Abrahante, Juan; Ho, Yen-Yi; Isaksson Vogel, Rachel; Starr, Timothy K

    2015-01-01

    Identification of cancer driver gene mutations is crucial for advancing cancer therapeutics. Due to the overwhelming number of passenger mutations in the human tumor genome, it is difficult to pinpoint causative driver genes. Using transposon mutagenesis in mice many laboratories have conducted forward genetic screens and identified thousands of candidate driver genes that are highly relevant to human cancer. Unfortunately, this information is difficult to access and utilize because it is scattered across multiple publications using different mouse genome builds and strength metrics. To improve access to these findings and facilitate meta-analyses, we developed the Candidate Cancer Gene Database (CCGD, http://ccgd-starrlab.oit.umn.edu/). The CCGD is a manually curated database containing a unified description of all identified candidate driver genes and the genomic location of transposon common insertion sites (CISs) from all currently published transposon-based screens. To demonstrate relevance to human cancer, we performed a modified gene set enrichment analysis using KEGG pathways and show that human cancer pathways are highly enriched in the database. We also used hierarchical clustering to identify pathways enriched in blood cancers compared to solid cancers. The CCGD is a novel resource available to scientists interested in the identification of genetic drivers of cancer. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Database of cattle candidate genes and genetic markers for milk production and mastitis

    PubMed Central

    Ogorevc, J; Kunej, T; Razpet, A; Dovc, P

    2009-01-01

    A cattle database of candidate genes and genetic markers for milk production and mastitis has been developed to provide an integrated research tool incorporating different types of information supporting a genomic approach to study lactation, udder development and health. The database contains 943 genes and genetic markers involved in mammary gland development and function, representing candidates for further functional studies. The candidate loci were drawn on a genetic map to reveal positional overlaps. For identification of candidate loci, data from seven different research approaches were exploited: (i) gene knockouts or transgenes in mice that result in specific phenotypes associated with mammary gland (143 loci); (ii) cattle QTL for milk production (344) and mastitis related traits (71); (iii) loci with sequence variations that show specific allele-phenotype interactions associated with milk production (24) or mastitis (10) in cattle; (iv) genes with expression profiles associated with milk production (207) or mastitis (107) in cattle or mouse; (v) cattle milk protein genes that exist in different genetic variants (9); (vi) miRNAs expressed in bovine mammary gland (32) and (vii) epigenetically regulated cattle genes associated with mammary gland function (1). Forty-four genes found by multiple independent analyses were suggested as the most promising candidates and were further in silico analysed for expression levels in lactating mammary gland, genetic variability and top biological functions in functional networks. A miRNA target search for mammary gland expressed miRNAs identified 359 putative binding sites in 3′UTRs of candidate genes. PMID:19508288

  3. Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes.

    PubMed

    Hassani-Pak, Keywan; Rawlings, Christopher

    2017-06-13

    Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.

  4. A literature search tool for intelligent extraction of disease-associated genes.

    PubMed

    Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P

    2014-01-01

    To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.

  5. Defining the Human Macula Transcriptome and Candidate Retinal Disease Genes Using EyeSAGE

    PubMed Central

    Rickman, Catherine Bowes; Ebright, Jessica N.; Zavodni, Zachary J.; Yu, Ling; Wang, Tianyuan; Daiger, Stephen P.; Wistow, Graeme; Boon, Kathy; Hauser, Michael A.

    2009-01-01

    Purpose: To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE). Methods: Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors and a human central retina longSAGE library made from 41- to 66-year-old donors were generated. Their transcription profiles were entered into a relational database, EyeSAGE, including microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type specific expression was validated by quantitative and single-cell RT-PCR. Results: Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags enhanced tag-to-gene mapping and revealed alternatively spliced genes. Analysis of candidate gene expression tables for the identified Bardet-Biedl syndrome disease gene (BBS5) in the BBS5 disease region table yielded BBS5 as the top candidate. Compelling candidates for inherited retina diseases were identified. Conclusions: The EyeSAGE database, combining three different gene-profiling platforms including the authors’ multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions. PMID:16723438

  6. Defining the human macula transcriptome and candidate retinal disease genes using EyeSAGE.

    PubMed

    Bowes Rickman, Catherine; Ebright, Jessica N; Zavodni, Zachary J; Yu, Ling; Wang, Tianyuan; Daiger, Stephen P; Wistow, Graeme; Boon, Kathy; Hauser, Michael A

    2006-06-01

    To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE). Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors and a human central retina longSAGE library made from 41- to 66-year-old donors were generated. Their transcription profiles were entered into a relational database, EyeSAGE, including microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type specific expression was validated by quantitative and single-cell RT-PCR. Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags enhanced tag-to-gene mapping and revealed alternatively spliced genes. Analysis of candidate gene expression tables for the identified Bardet-Biedl syndrome disease gene (BBS5) in the BBS5 disease region table yielded BBS5 as the top candidate. Compelling candidates for inherited retina diseases were identified. The EyeSAGE database, combining three different gene-profiling platforms including the authors' multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions.

  7. Identification of Inherited Retinal Disease-Associated Genetic Variants in 11 Candidate Genes.

    PubMed

    Astuti, Galuh D N; van den Born, L Ingeborgh; Khan, M Imran; Hamel, Christian P; Bocquet, Béatrice; Manes, Gaël; Quinodoz, Mathieu; Ali, Manir; Toomes, Carmel; McKibbin, Martin; El-Asrag, Mohammed E; Haer-Wigman, Lonneke; Inglehearn, Chris F; Black, Graeme C M; Hoyng, Carel B; Cremers, Frans P M; Roosing, Susanne

    2018-01-10

    Inherited retinal diseases (IRDs) display an enormous genetic heterogeneity. Whole exome sequencing (WES) recently identified genes that were mutated in a small proportion of IRD cases. Consequently, finding a second case or family carrying pathogenic variants in the same candidate gene often is challenging. In this study, we searched for novel candidate IRD gene-associated variants in isolated IRD families, assessed their causality, and searched for novel genotype-phenotype correlations. Whole exome sequencing was performed in 11 probands affected with IRDs. Homozygosity mapping data was available for five cases. Variants with minor allele frequencies ≤ 0.5% in public databases were selected as candidate disease-causing variants. These variants were ranked based on their: (a) presence in a gene that was previously implicated in IRD; (b) minor allele frequency in the Exome Aggregation Consortium database (ExAC); (c) in silico pathogenicity assessment using the combined annotation dependent depletion (CADD) score; and (d) interaction of the corresponding protein with known IRD-associated proteins. Twelve unique variants were found in 11 different genes in 11 IRD probands. Novel autosomal recessive and dominant inheritance patterns were found for variants in Small Nuclear Ribonucleoprotein U5 Subunit 200 ( SNRNP200 ) and Zinc Finger Protein 513 ( ZNF513 ), respectively. Using our pathogenicity assessment, a variant in DEAH-Box Helicase 32 ( DHX32 ) was the top ranked novel candidate gene to be associated with IRDs, followed by eight medium and lower ranked candidate genes. The identification of candidate disease-associated sequence variants in 11 single families underscores the notion that the previously identified IRD-associated genes collectively carry > 90% of the defects implicated in IRDs. To identify multiple patients or families with variants in the same gene and thereby provide extra proof for pathogenicity, worldwide data sharing is needed.
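
    The filtering and ranking steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract names the four criteria (a) to (d) but does not say how they were combined, so the lexicographic sort order below is an assumption, and the `Variant` fields and gene names are hypothetical. Only the 0.5% minor allele frequency cutoff comes from the text.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.005  # 0.5%, the cutoff stated in the abstract


@dataclass
class Variant:
    gene: str                         # hypothetical gene label
    maf: float                        # minor allele frequency in a public database
    cadd: float                       # CADD score; higher suggests more deleterious
    known_ird_gene: bool              # criterion (a)
    interacts_with_ird_protein: bool  # criterion (d)


def rank_candidates(variants):
    """Keep variants at or below the MAF cutoff, then sort them.

    Assumed ordering (not from the abstract): known IRD genes first,
    then lower MAF, then higher CADD, then known protein interactions.
    """
    rare = [v for v in variants if v.maf <= MAF_CUTOFF]
    return sorted(
        rare,
        key=lambda v: (
            not v.known_ird_gene,
            v.maf,
            -v.cadd,
            not v.interacts_with_ird_protein,
        ),
    )


# Hypothetical input; GENE_B is dropped because its MAF exceeds the cutoff.
example_variants = [
    Variant("GENE_A", 0.0001, 25.0, False, True),
    Variant("GENE_B", 0.02, 30.0, True, True),
    Variant("GENE_C", 0.001, 18.0, True, False),
    Variant("GENE_D", 0.0001, 31.0, False, False),
]
ranked_genes = [v.gene for v in rank_candidates(example_variants)]
print(ranked_genes)
```

    A weighted score could replace the tuple key if the criteria should trade off against one another rather than act as strict tie-breakers.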

  8. Current limitations of SNP data from the public domain for studies of complex disorders: a test for ten candidate genes for obesity and osteoporosis.

    PubMed

    Dvornyk, Volodymyr; Long, Ji-Rong; Xiong, Dong-Hai; Liu, Peng-Yuan; Zhao, Lan-Juan; Shen, Hui; Zhang, Yuan-Yuan; Liu, Yong-Jun; Rocha-Sanchez, Sonia; Xiao, Peng; Recker, Robert R; Deng, Hong-Wen

    2004-02-25

    Public SNP databases are frequently used to choose SNPs for candidate genes in the association and linkage studies of complex disorders. However, their utility for such studies of diseases with ethnic-dependent background has never been evaluated. To estimate the accuracy and completeness of SNP public databases, we analyzed the allele frequencies of 41 SNPs in 10 candidate genes for obesity and/or osteoporosis in a large American-Caucasian sample (1,873 individuals from 405 nuclear families) by PCR-invader assay. We compared our results with those from the databases and other published studies. Of the 41 SNPs, 8 were monomorphic in our sample. Twelve were reported for the first time for Caucasians and the other 29 SNPs in our sample essentially confirmed the respective allele frequencies for Caucasians in the databases and previous studies. The comparison of our data with other ethnic groups showed significant differentiation between the three major world ethnic groups at some SNPs (Caucasians and Africans differed at 3 of the 18 shared SNPs, and Caucasians and Asians differed at 13 of the 22 shared SNPs). This genetic differentiation may have an important implication for studying the well-known ethnic differences in the prevalence of obesity and osteoporosis, and complex disorders in general. A comparative analysis of the SNP data of the candidate genes obtained in the present study, as well as those retrieved from the public domain, suggests that the databases may currently have serious limitations for studying complex disorders with an ethnic-dependent background due to the incomplete and uneven representation of the candidate SNPs in the databases for the major ethnic groups. This conclusion attests to the imperative necessity of large-scale and accurate characterization of these SNPs in different ethnic groups.

  9. Current limitations of SNP data from the public domain for studies of complex disorders: a test for ten candidate genes for obesity and osteoporosis

    PubMed Central

    Dvornyk, Volodymyr; Long, Ji-Rong; Xiong, Dong-Hai; Liu, Peng-Yuan; Zhao, Lan-Juan; Shen, Hui; Zhang, Yuan-Yuan; Liu, Yong-Jun; Rocha-Sanchez, Sonia; Xiao, Peng; Recker, Robert R; Deng, Hong-Wen

    2004-01-01

    Background: Public SNP databases are frequently used to choose SNPs for candidate genes in the association and linkage studies of complex disorders. However, their utility for such studies of diseases with ethnic-dependent background has never been evaluated. Results: To estimate the accuracy and completeness of SNP public databases, we analyzed the allele frequencies of 41 SNPs in 10 candidate genes for obesity and/or osteoporosis in a large American-Caucasian sample (1,873 individuals from 405 nuclear families) by PCR-invader assay. We compared our results with those from the databases and other published studies. Of the 41 SNPs, 8 were monomorphic in our sample. Twelve were reported for the first time for Caucasians and the other 29 SNPs in our sample essentially confirmed the respective allele frequencies for Caucasians in the databases and previous studies. The comparison of our data with other ethnic groups showed significant differentiation between the three major world ethnic groups at some SNPs (Caucasians and Africans differed at 3 of the 18 shared SNPs, and Caucasians and Asians differed at 13 of the 22 shared SNPs). This genetic differentiation may have an important implication for studying the well-known ethnic differences in the prevalence of obesity and osteoporosis, and complex disorders in general. Conclusion: A comparative analysis of the SNP data of the candidate genes obtained in the present study, as well as those retrieved from the public domain, suggests that the databases may currently have serious limitations for studying complex disorders with an ethnic-dependent background due to the incomplete and uneven representation of the candidate SNPs in the databases for the major ethnic groups. This conclusion attests to the imperative necessity of large-scale and accurate characterization of these SNPs in different ethnic groups. PMID:15113403

  10. Mining biological databases for candidate disease genes

    NASA Astrophysics Data System (ADS)

    Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.

    2001-07-01

    The publicly funded effort to determine the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has currently produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary 'draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for the researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of this data identified approximately 30,000-40,000 human 'genes'. However, the bulk of the effort still remains: to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequence and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl syndrome (BBS).

  11. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

    PubMed

    Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

    2014-06-01

    With rapidly changing technology, prediction of candidate genes has become an indispensable task in recent years, mainly in the field of biological research. The empirical methods for candidate gene prioritization that help to explore the potential pathways between genetic determinants and complex diseases are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The prodigious availability of protein interaction data coupled with gene annotation eases the accurate determination of disease-specific candidate genes. In our work we have prioritized the cervix-related cancer candidate genes by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph theoretical centrality measures and gene ontology. With the advantage of the human protein interaction data, cervical cancer gene sets and the ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the anticipated candidate genes was corroborated through a literature survey. In addition, drugs for these candidates were found in the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which supports their potential as drug targets for cervical cancer.
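
    To make the idea of ranking genes by a graph centrality measure concrete, here is a minimal sketch using degree centrality on a toy protein interaction network. The abstract does not state which centrality measures were used or how gene ontology terms were combined with them, so the choice of degree centrality, the function names and the protein labels are all assumptions made for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return degree centrality (degree / (n - 1)) for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def top_candidates(edges, known_disease_genes, k):
    """Rank genes not already in the known set by centrality, highest first."""
    scores = degree_centrality(edges)
    novel = [g for g in scores if g not in known_disease_genes]
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(novel, key=lambda g: (-scores[g], g))[:k]


# Hypothetical interaction network; P1 stands in for a known disease gene.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
toy_scores = degree_centrality(toy_edges)
toy_top = top_candidates(toy_edges, {"P1"}, k=2)
print(toy_scores["P1"], toy_top)
```

    Other measures such as betweenness or closeness centrality would slot into the same ranking function by swapping the scoring step.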

  12. Identification of candidate genes in Populus cell wall biosynthesis using text-mining, co-expression network and comparative genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Ye, Chuyu; Bisaria, Anjali

    2011-01-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high likelihood candidates for functional genomics in relation to cell wall biosynthesis.
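
    The two rounds of bait-gene expansion described above (121 to 548 to 694 genes) follow a simple pattern that can be sketched as below. This is an illustration only: it assumes the co-expression database can be represented as a dictionary mapping each gene to its set of co-expressed neighbors, and the function names and gene labels are hypothetical. The actual database, thresholds and query interface are not given in the abstract.

```python
def expand_once(seed_genes, coexpression):
    """Return the seeds plus every direct co-expression neighbor of a seed."""
    expanded = set(seed_genes)
    for gene in seed_genes:
        expanded.update(coexpression.get(gene, ()))
    return expanded


def expand(seed_genes, coexpression, rounds=2):
    """Repeat the neighbor query, feeding each result back in as the new bait set."""
    current = set(seed_genes)
    for _ in range(rounds):
        current = expand_once(current, coexpression)
    return current


# Hypothetical co-expression neighbors: gene -> set of co-expressed genes.
toy_network = {"A": {"B"}, "B": {"C"}, "C": {"D"}}
after_one = expand({"A"}, toy_network, rounds=1)
after_two = expand({"A"}, toy_network, rounds=2)
print(sorted(after_one), sorted(after_two))
```

    Each round can only grow the set, which matches the increasing gene counts reported in the abstract.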

  13. HerDing: herb recommendation system to treat diseases using genes and chemicals

    PubMed Central

    Choi, Wonjun; Choi, Chan-Hun; Kim, Young Ran; Kim, Seon-Jong; Na, Chang-Su; Lee, Hyunju

    2016-01-01

    In recent years, herbs have been researched for new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies to scientifically prove the medical efficacy of herbs for target diseases often spend a considerable amount of time and effort in choosing candidate herbs and in performing experiments to measure changes of marker genes when treating herbs. A computational approach to recommend herbs for treating diseases might be helpful to promote efficiency in the early stage of such studies. Although several databases related to traditional Chinese medicine have been already developed, there is no specialized Web tool yet recommending herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, focused on retrieving candidate herb-related information with user search terms (a list of genes, a disease name, a chemical name or an herb name). HerDing was built by integrating public databases and by applying a text-mining method. The HerDing website is free and open to all users, and there is no login requirement. Database URL: http://combio.gist.ac.kr/herding PMID:26980517

  14. HerDing: herb recommendation system to treat diseases using genes and chemicals.

    PubMed

    Choi, Wonjun; Choi, Chan-Hun; Kim, Young Ran; Kim, Seon-Jong; Na, Chang-Su; Lee, Hyunju

    2016-01-01

    In recent years, herbs have been researched for new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies to scientifically prove the medical efficacy of herbs for target diseases often spend a considerable amount of time and effort in choosing candidate herbs and in performing experiments to measure changes of marker genes when treating herbs. A computational approach to recommend herbs for treating diseases might be helpful to promote efficiency in the early stage of such studies. Although several databases related to traditional Chinese medicine have been already developed, there is no specialized Web tool yet recommending herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, focused on retrieving candidate herb-related information with user search terms (a list of genes, a disease name, a chemical name or an herb name). HerDing was built by integrating public databases and by applying a text-mining method. The HerDing website is free and open to all users, and there is no login requirement. Database URL: http://combio.gist.ac.kr/herding. © The Author(s) 2016. Published by Oxford University Press.

  15. Genetic basis of interindividual susceptibility to cancer cachexia: selection of potential candidate gene polymorphisms for association studies.

    PubMed

    Johns, N; Tan, B H; MacMillan, M; Solheim, T S; Ross, J A; Baracos, V E; Damaraju, S; Fearon, K C H

    2014-12-01

    Cancer cachexia is a complex and multifactorial disease. Evolving definitions highlight the fact that a diverse range of biological processes contribute to cancer cachexia. Part of the variation in who will and who will not develop cancer cachexia may be genetically determined. As new definitions, classifications and biological targets continue to evolve, there is a need for reappraisal of the literature for future candidate association studies. This review summarizes genes identified or implicated as well as putative candidate genes contributing to cachexia, identified through diverse technology platforms and model systems to further guide association studies. A systematic search covering 1986-2012 was performed for potential candidate genes / genetic polymorphisms relating to cancer cachexia. All candidate genes were reviewed for functional polymorphisms or clinically significant polymorphisms associated with cachexia using the OMIM and GeneRIF databases. Pathway analysis software was used to reveal possible network associations between genes. Functionality of SNPs/genes was explored based on published literature, algorithms for detecting putative deleterious SNPs and interrogating the database for expression of quantitative trait loci (eQTLs). A total of 154 genes associated with cancer cachexia were identified and explored for functional polymorphisms. Of these 154 genes, 119 had a combined total of 281 polymorphisms with functional and/or clinical significance in terms of cachexia associated with them. Of these, 80 polymorphisms (in 51 genes) were replicated in more than one study with 24 polymorphisms found to influence two or more hallmarks of cachexia (i.e., inflammation, loss of fat mass and/or lean mass and reduced survival). Selection of candidate genes and polymorphisms is a key element of multigene study design. The present study provides a contemporary basis to select genes and/or polymorphisms for further association studies in cancer cachexia, and to develop their potential as susceptibility biomarkers of cachexia.

  16. A public platform for the verification of the phenotypic effect of candidate genes for resistance to aflatoxin accumulation and Aspergillus flavus infection in maize.

    PubMed

    Warburton, Marilyn L; Williams, William Paul; Hawkins, Leigh; Bridges, Susan; Gresham, Cathy; Harper, Jonathan; Ozkan, Seval; Mylroie, J Erik; Shan, Xueyan

    2011-07-01

    A public candidate gene testing pipeline for resistance to aflatoxin accumulation or Aspergillus flavus infection in maize is presented here. The pipeline consists of steps for identifying, testing, and verifying the association of selected maize gene sequences with resistance under field conditions. Resources include a database of genetic and protein sequences associated with the reduction in aflatoxin contamination from previous studies; eight diverse inbred maize lines for polymorphism identification within any maize gene sequence; four Quantitative Trait Loci (QTL) mapping populations and one association mapping panel, all phenotyped for aflatoxin accumulation resistance and associated phenotypes; and capacity for Insertion/Deletion (InDel) and SNP genotyping in the population(s) for mapping. To date, ten genes have been identified as possible candidate genes and put through the candidate gene testing pipeline, and results are presented here to demonstrate the utility of the pipeline.

  17. Islander: A database of precisely mapped genomic islands in tRNA and tmRNA genes

    DOE PAGES

    Hudson, Corey M.; Lau, Britney Y.; Williams, Kelly P.

    2014-11-05

    Genomic islands are mobile DNAs that are major agents of bacterial and archaeal evolution. Integration into prokaryotic chromosomes usually occurs site-specifically at tRNA or tmRNA gene (together, tDNA) targets, catalyzed by tyrosine integrases. This splits the target gene, yet sequences within the island restore the disrupted gene; the regenerated target and its displaced fragment precisely mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm identifies tDNAs, finds fragments of those tDNAs in the same replicon and removes unlikely candidate islands through a series of filters. A search for islands in 2168 whole prokaryotic genomes produced 3919 candidates. The website Islander (recently moved to http://bioinformatics.sandia.gov/islander/) presents these precisely mapped candidate islands, the gene content and the island sequence. The algorithm further insists that each island encode an integrase, and attachment site sequence identity is carefully noted; therefore, the database also serves in the study of integrase site-specificity and its evolution.

  18. Identification of possible genetic polymorphisms involved in cancer cachexia: a systematic review.

    PubMed

    Tan, Benjamin H L; Ross, James A; Kaasa, Stein; Skorpen, Frank; Fearon, Kenneth C H

    2011-04-01

    Cancer cachexia is a polygenic and complex syndrome. Genetic variations in regulation of the inflammatory response, muscle and fat metabolic pathways, and pathways in appetite regulation are likely to contribute to the susceptibility or resistance to developing cancer cachexia. A systematic search of Medline and EmBase databases, covering 1986-2008 was performed for potential candidate genes/genetic polymorphisms relating to cancer cachexia. Related genes were then identified using pathway functional analysis software. All candidate genes were reviewed for functional polymorphisms or clinically significant polymorphisms associated with cachexia using the OMIM and GeneRIF databases. Genes with variants which had functional or clinical associations with cachexia and replicated in at least one study were entered into pathway analysis software to reveal possible network associations between genes. A total of 184 polymorphisms with functional or clinical relevance to cancer cachexia were identified in 92 candidate genes. Of these, 42 polymorphisms (in 33 genes) were replicated in more than one study with 13 polymorphisms found to influence two or more hallmarks of cachexia (i.e. inflammation, loss of fat mass and/or lean mass and reduced survival). Thirty-three genes were found to be significantly interconnected in two major networks with four genes (ADIPOQ, IL6, NFKB1 and TLR4) interlinking both networks. Selection of candidate genes and polymorphisms is a key element of multigene study design. The present study provides an initial framework to select genes/polymorphisms for further study in cancer cachexia, and to develop their potential as susceptibility biomarkers of developing cachexia.

  19. PosMed-plus: an intelligent search engine that inferentially integrates cross-species information resources for molecular breeding of plants.

    PubMed

    Makita, Yuko; Kobayashi, Norio; Mochizuki, Yoshiki; Yoshida, Yuko; Asano, Satomi; Heida, Naohiko; Deshpande, Mrinalini; Bhatia, Rinki; Matsushima, Akihiro; Ishii, Manabu; Kawaguchi, Shuji; Iida, Kei; Hanada, Kosuke; Kuromori, Takashi; Seki, Motoaki; Shinozaki, Kazuo; Toyoda, Tetsuro

    2009-07-01

    Molecular breeding of crops is an efficient way to upgrade plant functions useful to mankind. A key step is forward genetics or positional cloning to identify the genes that confer useful functions. In order to accelerate the whole research process, we have developed an integrated database system powered by an intelligent data-retrieval engine termed PosMed-plus (Positional Medline for plant upgrading science), allowing us to prioritize highly promising candidate genes in a given chromosomal interval(s) of Arabidopsis thaliana and rice, Oryza sativa. By inferentially integrating cross-species information resources including genomes, transcriptomes, proteomes, localizomes, phenomes and literature, the system compares a user's query, such as phenotypic or functional keywords, with the literature associated with the relevant genes located within the interval. By utilizing orthologous and paralogous correspondences, PosMed-plus efficiently integrates cross-species information to facilitate the ranking of rice candidate genes based on evidence from other model species such as Arabidopsis. PosMed-plus is a plant science version of the PosMed system widely used by mammalian researchers, and provides both a powerful integrative search function and a rich integrative display of the integrated databases. PosMed-plus is the first cross-species integrated database that inferentially prioritizes candidate genes for forward genetics approaches in plant science, and will be expanded for wider use in plant upgrading in many species.

  20. How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database

    PubMed Central

    Mashiach, R.; Cohen, S.; Kedem, A.; Baron, A.; Zajicek, M.; Feldman, I.; Seidman, D.; Soriano, D.

    2018-01-01

    Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis. PMID:29750165

  1. How Artificial Intelligence Can Improve Our Understanding of the Genes Associated with Endometriosis: Natural Language Processing of the PubMed Database.

    PubMed

    Bouaziz, J; Mashiach, R; Cohen, S; Kedem, A; Baron, A; Zajicek, M; Feldman, I; Seidman, D; Soriano, D

    2018-01-01

    Endometriosis is a disease characterized by the development of endometrial tissue outside the uterus, but its cause remains largely unknown. Numerous genes have been studied and proposed to help explain its pathogenesis. However, the large number of these candidate genes has made functional validation through experimental methodologies nearly impossible. Computational methods could provide a useful alternative for prioritizing those most likely to be susceptibility genes. Using artificial intelligence applied to text mining, this study analyzed the genes involved in the pathogenesis, development, and progression of endometriosis. The data extraction by text mining of the endometriosis-related genes in the PubMed database was based on natural language processing, and the data were filtered to remove false positives. Using data from the text mining and gene network information as input for the web-based tool, 15,207 endometriosis-related genes were ranked according to their score in the database. Characterization of the filtered gene set through gene ontology, pathway, and network analysis provided information about the numerous mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissue, as well as the migration, implantation, survival, and proliferation of ectopic endometrial cells. Finally, the human genome was scanned through various databases using filtered genes as a seed to determine novel genes that might also be involved in the pathogenesis of endometriosis but which have not yet been characterized. These genes could be promising candidates to serve as useful diagnostic biomarkers and therapeutic targets in the management of endometriosis.

  2. Identification of Enzyme Genes Using Chemical Structure Alignments of Substrate-Product Pairs.

    PubMed

    Moriya, Yuki; Yamada, Takuji; Okuda, Shujiro; Nakagawa, Zenichi; Kotera, Masaaki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu

    2016-03-28

    Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a big gap in the numbers between experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies that estimate the number of candidate enzyme genes, these studies required some additional information aside from the structures of metabolites such as gene expression and order in the genome. In this study, we developed a novel method to identify a candidate enzyme gene of a reaction using the chemical structures of the substrate-product pair (reactant pair). The proposed method is based on a search for similar reactant pairs in a reference database and offers ortholog groups that possibly mediate the given reaction. We applied the proposed method to two experimentally validated reactions. As a result, we confirmed that the histidine transaminase was correctly identified. Although our method could not directly identify the asparagine oxo-acid transaminase, we successfully found the paralog gene most similar to the correct enzyme gene. We also applied our method to infer candidate enzyme genes in the mesaconate pathway. The advantage of our method lies in the prediction of possible genes for orphan enzyme reactions where any associated gene sequences are not determined yet. We believe that this approach will facilitate experimental identification of genes for orphan enzymes.

  3. Search for sarcoidosis candidate genes by integration of data from genomic, transcriptomic and proteomic studies.

    PubMed

    Maver, Ales; Medica, Igor; Peterlin, Borut

    2009-12-01

    The search for gene candidates in multifactorial diseases such as sarcoidosis can be based on the integration of linkage association data, gene expression data, and protein profile data from genomic, transcriptomic and proteomic studies, respectively. In this study we performed a literature-based search for studies reporting such data, followed by integration of collected information. Different databases were examined: Medline, HuGE Navigator, ArrayExpress and Gene Expression Omnibus (GEO). Candidate genes were defined as genes which were reported in at least 2 different types of omics studies. Genes previously investigated in sarcoidosis were excluded from further analyses. We identified 177 genes associated with sarcoidosis as potential new candidate genes. Subsequently, 9 gene candidates identified to overlap in 2 different types of studies (genomic, transcriptomic and/or proteomic) were consistently reported in at least 3 studies: SERPINB1, FABP4, S100A8, HBEGF, IL7R, LRIG1, PTPN23, DPM2 and NUP214. These genes are involved in regulation of immune response, cellular proliferation, apoptosis, inhibition of protease activity, lipid metabolism. Exact biological functions of HBEGF, LRIG1, PTPN23, DPM2 and NUP214 remain to be completely elucidated. We propose 9 candidate genes: SERPINB1, FABP4, S100A8, HBEGF, IL7R, LRIG1, PTPN23, DPM2 and NUP214, as genes with high potential for association with sarcoidosis.
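
    The selection rule stated in this abstract (a gene is a candidate if it is reported in at least two different types of omics studies and has not been investigated in the disease before) reduces to a small set operation. The sketch below is only an illustration of that rule: the function name, the input layout, and the placeholder gene labels are assumptions, not data from the study.

```python
from collections import defaultdict


def select_candidates(reports, previously_studied, min_types=2):
    """Apply the 'at least min_types omics types' rule.

    reports            : iterable of (gene, study_type) pairs
    previously_studied : genes to exclude from the result
    Returns a sorted list of candidate gene names.
    """
    types_by_gene = defaultdict(set)
    for gene, study_type in reports:
        types_by_gene[gene].add(study_type)
    return sorted(
        gene for gene, types in types_by_gene.items()
        if len(types) >= min_types and gene not in previously_studied
    )


# Placeholder records (hypothetical, for illustration only)
reports = [
    ("GENE_A", "genomic"), ("GENE_A", "transcriptomic"),
    ("GENE_B", "transcriptomic"), ("GENE_B", "transcriptomic"),
    ("GENE_C", "proteomic"), ("GENE_C", "genomic"),
]
print(select_candidates(reports, previously_studied={"GENE_C"}))
```

    Collecting study types in a set means that repeated reports from the same type of study count once, which matches the "different types" wording of the rule.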

  4. A whole genome SNP genotyping by DNA microarray and candidate gene association study for kidney stone disease

    PubMed Central

    2014-01-01

    Background Kidney stone disease (KSD) is a complex disorder with unknown etiology in majority of the patients. Genetic and environmental factors may cause the disease. In the present study, we used DNA microarray to genotype single nucleotide polymorphisms (SNP) and performed candidate gene association analysis to determine genetic variations associated with the disease. Methods A whole genome SNP genotyping by DNA microarray was initially conducted in 101 patients and 105 control subjects. A set of 104 candidate genes reported to be involved in KSD, gathered from public databases and candidate gene association study databases, were evaluated for their variations associated with KSD. Results Altogether 82 SNPs distributed within 22 candidate gene regions showed significant differences in SNP allele frequencies between the patient and control groups (P < 0.05). Of these, 4 genes including BGLAP, AHSG, CD44, and HAO1, encoding osteocalcin, fetuin-A, CD44-molecule and glycolate oxidase 1, respectively, were further assessed for their associations with the disease because they carried high proportion of SNPs with statistical differences of allele frequencies between the patient and control groups within the gene. The total of 26 SNPs showed significant differences of allele frequencies between the patient and control groups and haplotypes associated with disease risk were identified. The SNP rs759330 located 144 bp downstream of BGLAP where it is a predicted microRNA binding site at 3′UTR of PAQR6 – a gene encoding progestin and adipoQ receptor family member VI, was genotyped in 216 patients and 216 control subjects and found to have significant differences in its genotype and allele frequencies (P = 0.0007, OR 2.02 and P = 0.0001, OR 2.02, respectively). 
    Conclusions Our results suggest that these candidate genes are associated with KSD and PAQR6 comes into our view as the most potent candidate since associated SNP rs759330 is located in the miRNA binding site and may affect mRNA expression level. PMID:24886237
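
    The odds ratios quoted in this abstract come from comparing allele (or genotype) counts between patients and controls. As a reminder of the arithmetic, an allelic odds ratio from a 2x2 table can be computed as below. The counts are hypothetical and are not taken from the study; the function name is likewise an assumption.

```python
def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio for carrying the risk allele in cases versus controls.

    Each argument is an allele count from a 2x2 table:
                  risk allele   other allele
        cases     case_risk     case_other
        controls  control_risk  control_other
    """
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical allele counts, for illustration only
print(allelic_odds_ratio(case_risk=40, case_other=60,
                         control_risk=20, control_other=80))
```

    An odds ratio above 1 indicates that the allele is more frequent among patients; a significance test and confidence interval would be needed before drawing any conclusion.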

  5. Pivotal role of the muscle-contraction pathway in cryptorchidism and evidence for genomic connections with cardiomyopathy pathways in RASopathies.

    PubMed

    Cannistraci, Carlo V; Ogorevc, Jernej; Zorc, Minja; Ravasi, Timothy; Dovc, Peter; Kunej, Tanja

    2013-02-14

    Cryptorchidism is the most frequent congenital disorder in male children; however the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent. Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in MySQL relational database. Genomic view of the loci was presented using Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. The developed gene atlas represents an important resource for the scientific community researching genetics of cryptorchidism. 
    The collected data will further facilitate development of novel genetic markers and could be of interest for functional studies in animals and human. The proposed network-based systems biology approach elucidates molecular mechanisms underlying co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such approach could also aid in molecular explanation of co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.

  6. NCG 4.0: the network of cancer genes in the era of massive mutational screenings of cancer genomes

    PubMed Central

    An, Omer; Pendino, Vera; D’Antonio, Matteo; Ratti, Emanuele; Gentilini, Marco; Ciccarelli, Francesca D.

    2014-01-01

    NCG 4.0 is the latest update of the Network of Cancer Genes, a web-based repository of systems-level properties of cancer genes. In its current version, the database collects information on 537 known (i.e. experimentally supported) and 1463 candidate (i.e. inferred using statistical methods) cancer genes. Candidate cancer genes derive from the manual revision of 67 original publications describing the mutational screening of 3460 human exomes and genomes in 23 different cancer types. For all 2000 cancer genes, duplicability, evolutionary origin, expression, functional annotation, interaction network with other human proteins and with microRNAs are reported. In addition to providing a substantial update of cancer-related information, NCG 4.0 also introduces two new features. The first is the annotation of possible false-positive cancer drivers, defined as candidate cancer genes inferred from large-scale screenings whose association with cancer is likely to be spurious. The second is the description of the systems-level properties of 64 human microRNAs that are causally involved in cancer progression (oncomiRs). Owing to the manual revision of all information, NCG 4.0 constitutes a complete and reliable resource on human coding and non-coding genes whose deregulation drives cancer onset and/or progression. NCG 4.0 can also be downloaded as a free application for Android smart phones. Database URL: http://bio.ieo.eu/ncg/ PMID:24608173

  7. Pea Marker Database (PMD) - A new online database combining known pea (Pisum sativum L.) gene-based markers.

    PubMed

    Kulaeva, Olga A; Zhernakov, Aleksandr I; Afonin, Alexey M; Boikov, Sergei S; Sulima, Anton S; Tikhonovich, Igor A; Zhukov, Vladimir A

    2017-01-01

    Pea (Pisum sativum L.) is the oldest model object of plant genetics and one of the most agriculturally important legumes in the world. Since the pea genome has not been sequenced yet, identification of genes responsible for mutant phenotypes or desirable agricultural traits is usually performed via genetic mapping followed by candidate gene search. Such mapping is best carried out using gene-based molecular markers, as it opens the possibility for exploiting genome synteny between pea and its close relative Medicago truncatula Gaertn., possessing sequenced and annotated genome. In the last 5 years, a large number of pea gene-based molecular markers have been designed and mapped owing to the rapid evolution of "next-generation sequencing" technologies. However, the access to the complete set of markers designed worldwide is limited because the data are not uniformed and therefore hard to use. The Pea Marker Database was designed to combine the information about pea markers in a form of user-friendly and practical online tool. Version 1 (PMD1) comprises information about 2484 genic markers, including their locations in linkage groups, the sequences of corresponding pea transcripts and the names of related genes in M. truncatula. Version 2 (PMD2) is an updated version comprising 15944 pea markers in the same format with several advanced features. To test the performance of the PMD, fine mapping of pea symbiotic genes Sym13 and Sym27 in linkage groups VII and V, respectively, was carried out. The results of mapping allowed us to propose the Sen1 gene (a homologue of SEN1 gene of Lotus japonicus (Regel) K. Larsen) as the best candidate gene for Sym13, and to narrow the list of possible candidate genes for Sym27 to ten, thus proving PMD to be useful for pea gene mapping and cloning. All information contained in PMD1 and PMD2 is available at www.peamarker.arriam.ru.

  8. Online Analytical Processing (OLAP): A Fast and Effective Data Mining Tool for Gene Expression Databases

    PubMed Central

    2005-01-01

    Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with SQLServer2000, to construct an OLAP cube that was used to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments is stored in the soybean genomics and microarray database (SGMD). A number of candidate resistance genes and pathways were found. Compared to traditional cluster analysis of gene expression data, OLAP was more effective and faster in finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB. PMID:16046824

  9. Reconstruction of a Functional Human Gene Network, with an Application for Prioritizing Positional Candidate Genes

    PubMed Central

    Franke, Lude; Bakel, Harm van; Fokkens, Like; de Jong, Edwin D.; Egmont-Petersen, Michael; Wijmenga, Cisca

    2006-01-01

    Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray coexpressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. 
    This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown. PMID:16685651
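
    The idea of ranking positional candidates by their functional interactions with genes in other loci can be sketched in a few lines. The scoring used here (a plain count of network neighbours that lie in a different locus) is a deliberately simple stand-in; the abstract does not specify the actual scoring, and the function name, inputs, and gene labels are hypothetical.

```python
def rank_locus_genes(loci, edges):
    """Rank genes in each locus by links to genes in other loci.

    loci  : dict mapping locus name -> list of gene names
    edges : iterable of (gene, gene) functional links (undirected)
    Returns a dict mapping locus name -> genes, best-scoring first.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    locus_of = {gene: name for name, genes in loci.items() for gene in genes}

    ranked = {}
    for name, genes in loci.items():
        scores = {}
        for gene in genes:
            # Count only neighbours that sit in a *different* locus;
            # neighbours outside every locus are ignored.
            scores[gene] = sum(
                1 for n in neighbours.get(gene, ())
                if n in locus_of and locus_of[n] != name
            )
        # Ties are broken alphabetically so the output is deterministic.
        ranked[name] = sorted(genes, key=lambda g: (-scores[g], g))
    return ranked


# Hypothetical toy network
loci = {"locus1": ["A", "B"], "locus2": ["C", "D"]}
edges = [("A", "C"), ("A", "D"), ("B", "A"), ("D", "X"), ("D", "B")]
print(rank_locus_genes(loci, edges))
```

    Following up only the first few genes per locus from such a ranking is the step the abstract evaluates against random selection.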

  10. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize

    PubMed Central

    2010-01-01

    Background Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. Results In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. Conclusions CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. 
    The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu. PMID:20946609

  11. Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize.

    PubMed

    Kelley, Rowena Y; Gresham, Cathy; Harper, Jonathan; Bridges, Susan M; Warburton, Marilyn L; Hawkins, Leigh K; Pechanova, Olga; Peethambaran, Bela; Pechan, Tibor; Luthe, Dawn S; Mylroie, J E; Ankala, Arunkanth; Ozkan, Seval; Henry, W B; Williams, W P

    2010-10-07

    Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. 
    The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu.

  12. Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity

    NASA Astrophysics Data System (ADS)

    Corcoran, Martin M.; Phad, Ganesh E.; Bernat, Néstor Vázquez; Stahl-Hennig, Christiane; Sumida, Noriyuki; Persson, Mats A. A.; Martin, Marcel; Hedestam, Gunilla B. Karlsson

    2016-12-01

    Comprehensive knowledge of immunoglobulin genetics is required to advance our understanding of B cell biology. Validated immunoglobulin variable (V) gene databases are close to completion only for human and mouse. We present a novel computational approach, IgDiscover, that identifies germline V genes from expressed repertoires to a specificity of 100%. IgDiscover uses a cluster identification process to produce candidate sequences that, once filtered, results in individualized germline V gene databases. IgDiscover was tested in multiple species, validated by genomic cloning and cross library comparisons and produces comprehensive gene databases even where limited genomic sequence is available. IgDiscover analysis of the allelic content of the Indian and Chinese-origin rhesus macaques reveals high levels of immunoglobulin gene diversity in this species. Further, we describe a novel human IGHV3-21 allele and confirm significant gene differences between Balb/c and C57BL6 mouse strains, demonstrating the power of IgDiscover as a germline V gene discovery tool.

  13. Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity

    PubMed Central

    Corcoran, Martin M.; Phad, Ganesh E.; Bernat, Néstor Vázquez; Stahl-Hennig, Christiane; Sumida, Noriyuki; Persson, Mats A.A.; Martin, Marcel; Hedestam, Gunilla B. Karlsson

    2016-01-01

    Comprehensive knowledge of immunoglobulin genetics is required to advance our understanding of B cell biology. Validated immunoglobulin variable (V) gene databases are close to completion only for human and mouse. We present a novel computational approach, IgDiscover, that identifies germline V genes from expressed repertoires to a specificity of 100%. IgDiscover uses a cluster identification process to produce candidate sequences that, once filtered, results in individualized germline V gene databases. IgDiscover was tested in multiple species, validated by genomic cloning and cross library comparisons and produces comprehensive gene databases even where limited genomic sequence is available. IgDiscover analysis of the allelic content of the Indian and Chinese-origin rhesus macaques reveals high levels of immunoglobulin gene diversity in this species. Further, we describe a novel human IGHV3-21 allele and confirm significant gene differences between Balb/c and C57BL6 mouse strains, demonstrating the power of IgDiscover as a germline V gene discovery tool. PMID:27995928

  14. PGMapper: a web-based tool linking phenotype to genes.

    PubMed

    Xiong, Qing; Qiu, Yuhui; Gu, Weikuan

    2008-04-01

    With the availability of whole genome sequence in many species, linkage analysis, positional cloning and microarray are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, in these methods, causative genes underlying a quantitative trait locus, or a disease, are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time consuming and needs to retrieve and integrate the information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. Available online at http://www.genediscovery.org/pgmapper/index.jsp.

  15. The database of chromosome imbalance regions and genes resided in lung cancer from Asian and Caucasian identified by array-comparative genomic hybridization

    PubMed Central

    2012-01-01

    Background Cancer-related genes show racial differences. Therefore, identification and characterization of DNA copy number alteration regions in different racial groups helps to dissect the mechanism of tumorigenesis. Methods Array-comparative genomic hybridization (array-CGH) was used to profile DNA copy number in 40 Asian and 20 Caucasian lung cancer patients. Three methods were used to select novel lung cancer candidate genes: MetaCore analysis for disease and pathway correlations, concordance analysis between the array-CGH database and the expression array database, and a literature search for copy number variation genes. Four candidate oncogenes were validated for DNA copy number and for mRNA and protein expression by quantitative polymerase chain reaction (qPCR), chromogenic in situ hybridization (CISH), reverse transcriptase-qPCR (RT-qPCR), and immunohistochemistry (IHC) in more patients. Results We identified 20 chromosomal imbalance regions harboring 459 genes for Caucasian and 17 regions containing 476 genes for Asian lung cancer patients. Seven common chromosomal imbalance regions harboring 117 genes, including gains on 3p13-14, 6p22.1, 9q21.13, 13q14.1, and 17p13.3 and losses on 3p22.2-22.3 and 13q13.3, were found in both Asian and Caucasian patients. Gene validation for four genes, ARHGAP19 (10q24.1) functioning in Rho activity control, FRAT2 (10q24.1) involved in Wnt signaling, PAFAH1B1 (17p13.3) functioning in motility control, and ZNF322A (6p22.1) involved in MAPK signaling, was performed using qPCR and RT-qPCR. Mean gene dosage and mRNA expression level of the four candidate genes in tumor tissues were significantly higher than in the corresponding normal tissues (P<0.001~P=0.06). In addition, CISH analysis indicated that copy number amplification indeed occurred for the ARHGAP19 and ZNF322A genes in lung cancer patients. IHC analysis of paraffin blocks from Asian and Caucasian patients demonstrated that the frequency of PAFAH1B1 protein overexpression was 68% in Asian and 70% in Caucasian patients. Conclusions Our study provides an invaluable database revealing common and differential imbalance regions at specific chromosomes among Asian and Caucasian lung cancer patients. Four validation methods confirmed our database, which would help in further studies on the mechanism of lung tumorigenesis. PMID:22691236

  16. Endeavour update: a web resource for gene prioritization in multiple species

    PubMed Central

    Tranchevent, Léon-Charles; Barriot, Roland; Yu, Shi; Van Vooren, Steven; Van Loo, Peter; Coessens, Bert; De Moor, Bart; Aerts, Stein; Moreau, Yves

    2008-01-01

    Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, in addition to our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and now includes numerous databases: ontologies and annotations, protein–protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the novel version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis. PMID:18508807
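
    The three-step scheme described above (score each candidate per data source, rank, then merge the rankings) can be sketched in Python. This is a minimal illustration, not Endeavour's implementation: the abstract does not say how the rankings are merged, so averaging rank ratios is an assumption, and the gene names, source names, and scores are hypothetical.

```python
from statistics import mean


def rank_ratios(scores):
    """Map gene -> rank / n for one data source (best-scoring gene gets 1/n)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}


def merge_rankings(per_source_scores):
    """Average each candidate's rank ratio over the sources that scored it.

    Assumption: a plain mean of rank ratios stands in for the fusion step.
    """
    ratios = [rank_ratios(s) for s in per_source_scores.values()]
    genes = set().union(*per_source_scores.values())
    merged = {g: mean(r[g] for r in ratios if g in r) for g in genes}
    # Lower merged ratio = better; the gene name breaks ties deterministically.
    return sorted(merged, key=lambda g: (merged[g], g))


# Hypothetical similarity-to-training-set scores (higher = more similar).
scores = {
    "expression": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "interactions": {"GENE_A": 0.85, "GENE_B": 0.8, "GENE_C": 0.2},
    "text_mining": {"GENE_A": 0.6, "GENE_C": 0.5},
}
global_ranking = merge_rankings(scores)
print(global_ranking)
```

    Rank ratios rather than raw scores are averaged so that sources with different score scales, or with missing candidates, contribute comparably.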

  17. Transactional Database Transformation and Its Application in Prioritizing Human Disease Genes

    PubMed Central

    Xiang, Yang; Payne, Philip R.O.; Huang, Kun

    2013-01-01

    Binary (0,1) matrices, commonly known as transactional databases, can represent many kinds of application data, including gene-phenotype data where “1” represents a confirmed gene-phenotype relation and “0” represents an unknown relation. It is natural to ask what information is hidden behind these “0”s and “1”s. Unfortunately, recent matrix completion methods, though very effective in many cases, are less likely to infer something interesting from these (0,1)-matrices. To address this challenge, we propose IndEvi, a very succinct and effective algorithm to perform independent-evidence-based transactional database transformation. Each entry of a (0,1)-matrix is evaluated by “independent evidence” (maximal supporting patterns) extracted from the whole matrix for this entry. The value of an entry, whether 0 or 1, has no effect at all on its independent evidence. The experiment on a gene-phenotype database shows that our method is highly promising in ranking candidate genes and predicting unknown disease genes. PMID:21422495

  18. AgeFactDB--the JenAge Ageing Factor Database--towards data integration in ageing research.

    PubMed

    Hühne, Rolf; Thalheim, Torsten; Sühnel, Jürgen

    2014-01-01

    AgeFactDB (http://agefactdb.jenage.de) is a database aimed at the collection and integration of ageing phenotype data including lifespan information. Ageing factors are considered to be genes, chemical compounds or other factors such as dietary restriction, whose action results in a changed lifespan or another ageing phenotype. Any information related to the effects of ageing factors is called an observation and is presented on observation pages. To provide concise access to the complete information for a particular ageing factor, corresponding observations are also summarized on ageing factor pages. In a first step, ageing-related data were primarily taken from existing databases such as the Ageing Gene Database--GenAge, the Lifespan Observations Database and the Dietary Restriction Gene Database--GenDR. In addition, we have started to include new ageing-related information. Based on homology data taken from the HomoloGene Database, AgeFactDB also provides observation and ageing factor pages of genes that are homologous to known ageing-related genes. These homologues are considered as candidate or putative ageing-related genes. AgeFactDB offers a variety of search and browse options, and also allows the download of ageing factor or observation lists in TSV, CSV and XML formats.

  19. LAILAPS: the plant science search engine.

    PubMed

    Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias

    2015-01-01

    With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  20. Semantic Web Ontology and Data Integration: a Case Study in Aiding Psychiatric Drug Repurposing.

    PubMed

    Liang, Chen; Sun, Jingchun; Tao, Cui

    2015-01-01

    There remain significant difficulties in selecting probable candidate drugs from existing databases. We describe an ontology-oriented approach to represent the nexus between genes, drugs, phenotypes, symptoms, and diseases from multiple information sources. We also report a case study in which we attempted to explore candidate drugs effective for bipolar disorder and epilepsy. We constructed an ontology incorporating knowledge between the two diseases and performed semantic reasoning tasks with the ontology. The results suggested 48 candidate drugs that hold promise for further breakthroughs. The evaluation demonstrated the validity of our approach. Our approach prioritizes the candidate drugs that have potential associations among genes, phenotypes and symptoms, and thus facilitates data integration and drug repurposing in psychiatric disorders.

  1. Biomine: predicting links between biological entities using network models of heterogeneous databases.

    PubMed

    Eronen, Lauri; Toivonen, Hannu

    2012-06-06

    Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
    In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with a focus on human genetics. Some of its functionality is available in a public query interface at http://biomine.cs.helsinki.fi, which allows searching for and visualizing connections between given biological entities.
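
    A proximity measure on a weighted graph, as used above for link prediction, can be sketched as follows. The abstract does not name the measures that were tested, so the best-path probability below is only one plausible choice; treating edge weights as probabilities in (0, 1], the undirected edges, and all node names and weights are assumptions made for illustration.

```python
import heapq
import math


def best_path_probability(edges, source, target):
    """Return the largest product of edge weights over any source-target path.

    Each weight in (0, 1] is read as the probability that the edge holds, so
    maximizing the product is a shortest-path search on -log(weight).
    """
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))  # links treated as undirected
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            new_cost = cost - math.log(w)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0  # target not reachable from source


# Hypothetical integrated graph: (node, node, edge weight).
edges = [
    ("DISEASE_X", "GENE_A", 0.9),
    ("GENE_A", "GENE_B", 0.8),
    ("DISEASE_X", "GENE_C", 0.3),
    ("GENE_C", "GENE_B", 0.9),
    ("GENE_B", "GENE_D", 0.5),
]
candidates = ["GENE_B", "GENE_D"]
ranked = sorted(
    candidates,
    key=lambda g: best_path_probability(edges, "DISEASE_X", g),
    reverse=True,
)
print(ranked)
```

    Candidates with a stronger best path to the disease node rank first; weighting edge types differently, as the abstract describes, would amount to rescaling the weights before the search.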

  2. In silico analysis of cacao (Theobroma cacao L.) genes that involved in pathogen and disease responses

    NASA Astrophysics Data System (ADS)

    Agung, Muhammad Budi; Budiarsa, I. Made; Suwastika, I. Nengah

    2017-02-01

    Cocoa beans are one of Indonesia's main export commodities, but yields are still reduced by pathogen and disease attack. Developing robust cacao plants that are genetically resistant to pathogens and disease is an ideal solution to this problem. The aim of this study was to identify, through in silico analysis, Theobroma cacao genes in the cacao genome database that are homologous to pathogen- and disease-response genes in other plants. A basic information survey and gene identification were performed in GenBank and The Arabidopsis Information Resource database. The in silico analysis comprised protein BLAST, a homology test of each gene's protein candidates, and identification of homologous genes in the Cacao Genome Database using the "Theobroma cacao cv. Matina 1-6 v1.1" genome as the data source. The analysis identified Thecc1EG011959t1 (EDS1), Thecc1EG006803t1 (EDS5), Thecc1EG013842t1 (ICS1), and Thecc1EG015614t1 (BG_PPAP) in the Cacao Genome Database as Theobroma cacao genes homologous to plant resistance genes, and they are highly likely to have functions similar to those of their homologues.

  3. RiceMetaSys for salt and drought stress responsive genes in rice: a web interface for crop improvement.

    PubMed

    Sandhu, Maninder; Sureshkumar, V; Prakash, Chandra; Dixit, Rekha; Solanke, Amolkumar U; Sharma, Tilak Raj; Mohapatra, Trilochan; S V, Amitha Mithra

    2017-09-30

    Genome-wide microarray analysis has enabled the development of robust databases for functional genomics studies in rice. However, such databases do not directly cater to the needs of breeders. Here, we have attempted to develop a web interface that combines information from functional genomic studies across different genetic backgrounds with DNA markers so that they can be readily deployed in crop improvement. In the current version of the database, we have included drought and salinity stress studies, since these are the two major abiotic stresses in rice. RiceMetaSys, a user-friendly and freely available web interface, provides comprehensive information on salt responsive genes (SRGs) and drought responsive genes (DRGs) across genotypes, crop development stages and tissues, identified from multiple microarray datasets. 'Physical position search' is an attractive tool for those using a QTL-based approach to dissect tolerance to salt and drought stress, since it can provide the list of SRGs and DRGs in any physical interval. To identify robust candidate genes for use in crop improvement, the 'common genes across varieties' search tool is useful. Graphical visualization of expression profiles across genes and rice genotypes has been enabled to make comparisons easier for the user. Simple Sequence Repeat (SSR) search in the SRGs and DRGs is a valuable tool for fine mapping and marker-assisted selection, since it provides primers for surveying polymorphism. An external link to intron-specific markers is also provided for this purpose. Bulk retrieval of data without any limit has been enabled for locus and SSR searches. The aim of this database is to provide users with simple and straightforward search options for identifying robust candidate genes from among thousands of SRGs and DRGs, so as to facilitate linking variation in expression profiles to variation in phenotype. Database URL: http://14.139.229.201.

  4. ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

    PubMed Central

    2011-01-01

    Background Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases. Results We propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which makes it possible to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases. Conclusions ProDiGe implements a new machine learning paradigm for gene prioritization, which could help identify new disease genes. It is freely available at http://cbio.ensmp.fr/prodige. PMID:21977986

  5. Prediction of gene-phenotype associations in humans, mice, and plants using phenologs.

    PubMed

    Woods, John O; Singh-Blom, Ulf Martin; Laurent, Jon M; McGary, Kriston L; Marcotte, Edward M

    2013-06-21

    Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and may be used to predict additional candidate disease genes. In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data--from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans--establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically-induced seizures in mouse, solely based on orthologous phenotypes from E. coli. We also explore the prediction of plant gene-phenotype associations, as for the Arabidopsis response to vernalization phenotype. We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.
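
    To make the idea of "orthologous phenotypes" concrete, the sketch below scores how surprising the overlap is between the ortholog sets of two phenotypes from different species. The abstract does not state the scoring function, so the hypergeometric tail probability used here is an assumption, and the counts in the example are hypothetical.

```python
from math import comb


def overlap_p_value(n_orthologs, size_a, size_b, n_shared):
    """P(overlap >= n_shared) under a hypergeometric null.

    size_b genes are drawn at random from n_orthologs shared orthologs,
    of which size_a are associated with phenotype A in the other species.
    """
    upper = min(size_a, size_b)
    total = comb(n_orthologs, size_b)
    tail = sum(
        comb(size_a, k) * comb(n_orthologs - size_a, size_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / total


# Hypothetical counts: 2000 orthologs shared by the two species, 40 linked to
# phenotype A, 25 linked to phenotype B, 6 linked to both.
p = overlap_p_value(2000, 40, 25, 6)
print(f"overlap p-value: {p:.3g}")
```

    A small p-value flags a phenotype pair as a candidate phenolog; the genes linked to only one of the two phenotypes then become candidate genes for the other.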

  6. Parkinson's disease candidate gene prioritization based on expression profile of midbrain dopaminergic neurons

    PubMed Central

    2010-01-01

    Background Parkinson's disease is the second most common neurodegenerative disorder. The pathological hallmark of the disease is degeneration of midbrain dopaminergic neurons. Genetic association studies have linked 13 human chromosomal loci to Parkinson's disease. Identification of gene(s) that are part of the etiology of Parkinson's disease, within the large number of genes residing in these loci, can be achieved through several approaches, including screening methods, and by considering appropriate criteria. Since several of the identified Parkinson's disease genes are expressed in the substantia nigra pars compacta of the midbrain, expression within the neurons of this area could be a suitable criterion to limit the number of candidates and identify PD genes. Methods In this work we combined findings from six rodent transcriptome analysis studies on the gene expression profile of midbrain dopaminergic neurons with the PARK loci in the OMIM (Online Mendelian Inheritance in Man) database to identify new candidate genes for Parkinson's disease. Results Merging the two datasets, we identified 20 genes within PARK loci, 7 of which are located in an orphan Parkinson's disease locus and one of which had been identified as a disease gene. In addition to identifying a set of candidates for further genetic association studies, these results show that the criterion of expression in midbrain dopaminergic neurons may be used to narrow down the number of genes in PARK loci for such studies. PMID:20716345
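
    The merging step in the Methods (keep genes that are both expressed in the tissue of interest and located inside a disease locus) amounts to an interval-overlap filter. The sketch below assumes simple chromosome/start/end coordinates; the `Interval` type, gene names, locus names, and coordinates are hypothetical and are not taken from the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_in_loci(genes, loci, expressed):
    """Group expressed genes by the locus they overlap on the same chromosome."""
    hits = {}
    for gene in genes:
        if gene.name not in expressed:
            continue
        for locus in loci:
            same_chrom = gene.chrom == locus.chrom
            # Two closed intervals overlap when each starts before the other ends.
            overlaps = gene.start <= locus.end and locus.start <= gene.end
            if same_chrom and overlaps:
                hits.setdefault(locus.name, []).append(gene.name)
    return hits


# Hypothetical gene annotations, linkage loci, and expression calls.
genes = [
    Interval("GENE_A", "chr1", 100, 200),
    Interval("GENE_B", "chr1", 5000, 5100),
    Interval("GENE_C", "chr2", 150, 300),
    Interval("GENE_D", "chr1", 180, 400),
]
loci = [Interval("LOCUS_1", "chr1", 150, 1000), Interval("LOCUS_2", "chr2", 1, 200)]
expressed = {"GENE_A", "GENE_B", "GENE_C"}
candidates = genes_in_loci(genes, loci, expressed)
print(candidates)
```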

  7. The construction of an EST database for Bombyx mori and its application

    PubMed Central

    Mita, Kazuei; Morimyo, Mitsuoki; Okano, Kazuhiro; Koike, Yoshiko; Nohata, Junko; Kawasaki, Hideki; Kadono-Okuda, Keiko; Yamamoto, Kimiko; Suzuki, Masataka G.; Shimada, Toru; Goldsmith, Marian R.; Maeda, Susumu

    2003-01-01

    To build a foundation for the complete genome analysis of Bombyx mori, we have constructed an EST database. Because gene expression patterns deeply depend on tissues as well as developmental stages, we analyzed many cDNA libraries prepared from various tissues and different developmental stages to cover the entire set of Bombyx genes. So far, the Bombyx EST database contains 35,000 ESTs from 36 cDNA libraries, which are grouped into ≈11,000 nonredundant ESTs with the average length of 1.25 kb. The comparison with FlyBase suggests that the present EST database, SilkBase, covers >55% of all genes of Bombyx. The fraction of library-specific ESTs in each cDNA library indicates that we have not yet reached saturation, showing the validity of our strategy for constructing an EST database to cover all genes. To tackle the coming saturation problem, we have checked two methods, subtraction and normalization, to increase coverage and decrease the number of housekeeping genes, resulting in a 5–11% increase of library-specific ESTs. The identification of a number of genes and comprehensive cloning of gene families have already emerged from the SilkBase search. Direct links of SilkBase with FlyBase and WormBase provide ready identification of candidate Lepidoptera-specific genes. PMID:14614147

  8. Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

    PubMed Central

    Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio

    2004-01-01

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
    In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394

  9. Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.

    PubMed

    Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J

    2017-01-01

    The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user-entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository of curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
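
    Finding genes common to multiple gene sets, one of the uses mentioned above, reduces to counting set memberships. The sketch below is a minimal stand-alone illustration and does not use GeneWeaver's tools or API; the set names and gene names are hypothetical, and cross-species sets would first need mapping to shared identifiers, which is omitted here.

```python
from collections import Counter


def genes_shared_by(gene_sets, min_sets=2):
    """Return gene -> number of sets containing it, for genes in >= min_sets sets."""
    counts = Counter(g for members in gene_sets.values() for g in set(members))
    return {g: n for g, n in counts.items() if n >= min_sets}


# Hypothetical gene sets from three kinds of study.
gene_sets = {
    "qtl_positional": ["GENE_A", "GENE_B", "GENE_C"],
    "differential_expression": ["GENE_B", "GENE_C", "GENE_D"],
    "coexpression_module": ["GENE_C", "GENE_E"],
}
shared = genes_shared_by(gene_sets)
# Genes supported by more sets come first; names break ties.
prioritized = sorted(shared, key=lambda g: (-shared[g], g))
print(prioritized)
```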

  10. TOM: a web-based integrated approach for identification of candidate disease genes.

    PubMed

    Rossi, Simona; Masotti, Daniele; Nardini, Christine; Bonora, Elena; Romeo, Giovanni; Macii, Enrico; Benini, Luca; Volinia, Stefano

    2006-07-01

    The massive production of biological data by means of highly parallel devices such as microarrays for gene expression has paved the way for new approaches in molecular genetics, among them the possibility of inferring biological answers by querying large amounts of expression data. Based on this principle, we present here TOM, a web-based resource for the efficient extraction of candidate genes for hereditary diseases. The service requires prior knowledge of at least one other gene responsible for the disease together with the linkage area, or else of two disease-associated genetic intervals. The algorithm uses the information stored in public resources, including mapping, expression and functional databases. Given the queries, TOM will select and list one or more candidate genes. This approach allows the geneticist to bypass the costly and time-consuming tracing of genetic markers through entire families and might improve the chance of identifying disease genes, particularly for rare diseases. We present here the tool and the results obtained on a known benchmark and on hereditary predisposition to familial thyroid cancer. Our algorithm is available at http://www-micrel.deis.unibo.it/~tom/.

  11. A fruit quality gene map of Prunus

    PubMed Central

    2009-01-01

    Background Prunus fruit development, growth, ripening, and senescence includes major biochemical and sensory changes in texture, color, and flavor. The genetic dissection of these complex processes has important applications in crop improvement, to facilitate maximizing and maintaining stone fruit quality from production and processing through to marketing and consumption. Here we present an integrated fruit quality gene map of Prunus containing 133 genes putatively involved in the determination of fruit texture, pigmentation, flavor, and chilling injury resistance. Results A genetic linkage map of 211 markers was constructed for an intraspecific peach (Prunus persica) progeny population, Pop-DG, derived from a canning peach cultivar 'Dr. Davis' and a fresh market cultivar 'Georgia Belle'. The Pop-DG map covered 818 cM of the peach genome and included three morphological markers, 11 ripening candidate genes, 13 cold-responsive genes, 21 novel EST-SSRs from the ChillPeach database, 58 previously reported SSRs, 40 RAFs, 23 SRAPs, 14 IMAs, and 28 accessory markers from candidate gene amplification. The Pop-DG map was co-linear with the Prunus reference T × E map, with 39 SSR markers in common to align the maps. A further 158 markers were bin-mapped to the reference map: 59 ripening candidate genes, 50 cold-responsive genes, and 50 novel EST-SSRs from ChillPeach, with deduced locations in Pop-DG via comparative mapping. Several candidate genes and EST-SSRs co-located with previously reported major trait loci and quantitative trait loci for chilling injury symptoms in Pop-DG. Conclusion The candidate gene approach combined with bin-mapping and availability of a community-recognized reference genetic map provides an efficient means of locating genes of interest in a target genome. We highlight the co-localization of fruit quality candidate genes with previously reported fruit quality QTLs. 
    The fruit quality gene map developed here is a valuable tool for dissecting the genetic architecture of fruit quality traits in Prunus crops. PMID:19995417

  12. Phenome-driven disease genetics prediction toward drug discovery.

    PubMed

    Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

    2015-06-15

    Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed the Disease Manifestation Network (DMN). We combined DMN with mimMiner, a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analyses, our approach achieved areas under the curve of 90.7% and 90.3%, which are significantly higher than the 84.2% (P < e^-4) and 81.3% (P < e^-12) for the baseline approach. We further demonstrated that our predicted genes have translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our ranking of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.

  13. An integrative, translational approach to understanding rare and orphan genetically based diseases

    PubMed Central

    Hoehndorf, Robert; Schofield, Paul N.; Gkoutos, Georgios V.

    2013-01-01

    PhenomeNet is an approach for integrating phenotypes across species and identifying candidate genes for genetic diseases based on the similarity between a disease and animal model phenotypes. In contrast to ‘guilt-by-association’ approaches, PhenomeNet relies exclusively on the comparison of phenotypes to suggest candidate genes, and can, therefore, be applied to study the molecular basis of rare and orphan diseases for which the molecular basis is unknown. In addition to disease phenotypes from the Online Mendelian Inheritance in Man (OMIM) database, we have now integrated the clinical signs from Orphanet into PhenomeNet. We demonstrate that our approach can efficiently identify known candidate genes for genetic diseases in Orphanet and OMIM. Furthermore, we find evidence that mutations in the HIP1 gene might cause Bassoe syndrome, a rare disorder with unknown genetic aetiology. Our results demonstrate that integration and computational analysis of human disease and animal model phenotypes using PhenomeNet has the potential to reveal novel insights into the pathobiology underlying genetic diseases. PMID:23853703

  14. Transcriptome sequencing and identification of cold tolerance genes in hardy Corylus species (C. heterophylla Fisch) floral buds.

    PubMed

    Chen, Xin; Zhang, Jin; Liu, Qingzhong; Guo, Wei; Zhao, Tiantian; Ma, Qinghua; Wang, Guixi

    2014-01-01

    The genus Corylus is an important woody species in Northeast China. Its products, hazelnuts, constitute one of the most important raw materials for the pastry and chocolate industry. However, limited genetic research has focused on Corylus because of the lack of genomic resources. The advent of high-throughput sequencing technologies provides a turning point for Corylus research. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive database for the Corylus heterophylla Fisch floral buds. The C. heterophylla Fisch floral buds transcriptome was sequenced using the Illumina paired-end sequencing technology. We produced 28,930,890 raw reads and assembled them into 82,684 contigs. A total of 40,941 unigenes were identified, among which 30,549 were annotated in the NCBI Non-redundant (Nr) protein database and 18,581 were annotated in the Swiss-Prot database. Of these annotated unigenes, 25,311 and 10,514 unigenes were assigned to gene ontology (GO) categories and clusters of orthologous groups (COG), respectively. We could map 17,207 unigenes onto 128 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Additionally, based on the transcriptome, we constructed a candidate cold tolerance gene set of C. heterophylla Fisch floral buds. The expression patterns of selected genes during four stages of cold acclimation suggested that these genes might be involved in different cold responsive stages in C. heterophylla Fisch floral buds. The transcriptome of C. heterophylla Fisch floral buds was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the C. heterophylla Fisch floral buds transcriptome. Candidate genes potentially involved in cold tolerance were identified, providing a material basis for future molecular mechanism analysis of C. heterophylla Fisch floral buds tolerant to cold stress.

  15. A comparative analysis of genetic diversity of candidate genes associated with type 2 diabetes in worldwide populations.

    PubMed

    Gong, Xian; Zhang, Chao; Yiliyasi Aisa; Shi, Ying; Yang, Xue-wei; Nuersimanguli Aosiman; Guan, Ya-qun; Xu, Shu-hua

    2016-06-20

    Over the last decade, a large number of type 2 diabetes mellitus (T2DM) susceptibility candidate genes have been reported by numerous genome-wide association studies (GWAS). Understanding the genetic diversity of these candidate genes among worldwide populations not only helps elucidate the genetic mechanism of T2DM, but also provides guidance for further studies of the pathogenesis of T2DM in any given population. In this study, we identified 170 genes or genomic regions associated with T2DM by searching the GWAS databases and related literature. We next analyzed the genetic diversity of these genes (or genomic regions) among present-day human populations by curating the 1000 Genomes Project phase 1 dataset covering 14 worldwide populations. We further compared the characteristics of T2DM genes in different populations. No significant differences in genetic diversity were observed among the 14 worldwide populations between the T2DM candidate genes and the non-T2DM genes in terms of overall pattern. However, we observed that some genes, such as IL20RA, RNMTL1-NXN, NOTCH2, ADRA2A-BTBD7P2, TBC1D4, RBM38-HMGB1P1, UBE2E2, and PPARD, show considerable differentiation between populations. In particular, IL20RA (FST=0.1521) displays the greatest population difference, which is mainly driven by the difference between Africans and non-Africans. Moreover, we revealed genetic differences between East Asians and Europeans at some candidate genes such as DGKB-AGMO (FST=0.173) and JAZF1 (FST=0.182). Our results indicate that some T2DM susceptibility candidate genes harbor highly differentiated variants between populations. These analyses, although preliminary, should advance our understanding of population differences in susceptibility to T2DM and provide a useful reference that future studies can rely on.
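
    The FST values quoted in this abstract measure how strongly allele frequencies differ between populations. The abstract does not say which estimator the authors used, so the following is only a minimal sketch of the textbook Wright definition for a single biallelic site, assuming equally weighted populations; the function name and the example frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic site.

    freqs: allele frequency of the same allele in each population.
    Assumption: populations are weighted equally (no sample-size correction).
    """
    if not freqs:
        raise ValueError("at least one population frequency is required")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # site is monomorphic everywhere, no differentiation
    # Mean expected heterozygosity within the individual populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Hypothetical frequencies for two populations.
print(round(fst_biallelic([0.2, 0.8]), 4))
```

    Published analyses usually rely on sample-size-aware estimators, so values from this sketch are not directly comparable to the figures reported above.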

  16. PAINT: a promoter analysis and interaction network generation tool for gene regulatory network identification.

    PubMed

    Vadigepalli, Rajanikanth; Chakravarthula, Praveen; Zak, Daniel E; Schwaber, James S; Gonye, Gregory E

    2003-01-01

    We have developed a bioinformatics tool named PAINT that automates the promoter analysis of a given set of genes for the presence of transcription factor binding sites. Based on coincidence of regulatory sites, this tool produces an interaction matrix that represents a candidate transcriptional regulatory network. This tool currently consists of (1) a database of promoter sequences of known or predicted genes in the Ensembl annotated mouse genome database, (2) various modules that can retrieve and process the promoter sequences for binding sites of known transcription factors, and (3) modules for visualization and analysis of the resulting set of candidate network connections. This information provides a substantially pruned list of genes and transcription factors that can be examined in detail in further experimental studies on gene regulation. Also, the candidate network can be incorporated into network identification methods in the form of constraints on feasible structures in order to render the algorithms tractable for large-scale systems. The tool can also produce output in various formats suitable for use in external visualization and analysis software. In this manuscript, PAINT is demonstrated in two case studies involving analysis of differentially regulated genes chosen from two microarray data sets. The first set is from a neuroblastoma N1E-115 cell differentiation experiment, and the second set is from neuroblastoma N1E-115 cells at different time intervals following exposure to neuropeptide angiotensin II. PAINT is available for use as an agent in BioSPICE simulation and analysis framework (www.biospice.org), and can also be accessed via a WWW interface at www.dbi.tju.edu/dbi/tools/paint/.
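
    The interaction matrix described in this abstract can be pictured as a gene-by-transcription-factor table that records which binding sites occur in which promoters. The sketch below is not PAINT's implementation: it assumes exact matching of a consensus string, whereas real promoter analysis tools typically score weight matrices, and all names, sequences and motifs are made up for illustration.

```python
def build_interaction_matrix(promoters, motifs):
    """Return (genes, factors, matrix).

    matrix[i][j] is 1 if the consensus motif of factors[j] occurs in the
    promoter sequence of genes[i], else 0. Matching is a plain,
    case-insensitive substring test (a deliberate simplification).
    """
    genes = sorted(promoters)
    factors = sorted(motifs)
    matrix = [
        [1 if motifs[tf].upper() in promoters[gene].upper() else 0
         for tf in factors]
        for gene in genes
    ]
    return genes, factors, matrix


# Toy, made-up promoter sequences and motifs (not real binding sites).
promoters = {
    "geneA": "ttgacgtcaatt",
    "geneB": "ccgggtataaag",
    "geneC": "aaaacccc",
}
motifs = {"TF1": "TGACGTCA", "TF2": "TATAAA"}

genes, factors, matrix = build_interaction_matrix(promoters, motifs)
for gene, row in zip(genes, matrix):
    print(gene, dict(zip(factors, row)))
```

    Rows that are all zero (geneC here) would be pruned from the candidate network, which is the kind of reduction the abstract refers to.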

  17. Liverome: a curated database of liver cancer-related gene signatures with self-contained context information.

    PubMed

    Lee, Langho; Wang, Kai; Li, Gang; Xie, Zhi; Wang, Yuli; Xu, Jiangchun; Sun, Shaoxian; Pocalyko, David; Bhak, Jong; Kim, Chulhong; Lee, Kee-Ho; Jang, Ye Jin; Yeom, Young Il; Yoo, Hyang-Sook; Hwang, Seungwoo

    2011-11-30

    Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide. A number of molecular profiling studies have investigated the changes in gene and protein expression that are associated with various clinicopathological characteristics of HCC and generated a wealth of scattered information, usually in the form of gene signature tables. A database of the published HCC gene signatures would be useful to liver cancer researchers seeking to retrieve existing differential expression information on a candidate gene and to make comparisons between signatures for prioritization of common genes. A challenge in constructing such a database is that a direct import of the signatures as they appear in articles would lead to a loss or ambiguity of their context information, which is essential for a correct biological interpretation of a gene's expression change. This challenge arises because the designation of compared sample groups is most often abbreviated, ad hoc, or even missing from published signature tables. Without manual curation, the context information becomes lost, leading to uninformative database contents. Although several databases of gene signatures are available, none of them contains an informative form of signatures or shows comprehensive coverage of liver cancer. Thus we constructed Liverome, a curated database of liver cancer-related gene signatures with self-contained context information. Liverome's data coverage is more than three times larger than any other signature database, consisting of 143 signatures taken from 98 HCC studies, mostly microarray and proteome, and involving 6,927 genes. The signatures were post-processed into an informative and uniform representation and annotated with an itemized summary so that all context information is unambiguously self-contained within the database. The signatures were further informatively named and meaningfully organized according to ten functional categories for guided browsing. Its web interface enables a straightforward retrieval of known differential expression information on a query gene and a comparison of signatures to prioritize common genes. The utility of Liverome-collected data is shown by case studies in which useful biological insights on HCC are produced. The Liverome database provides a comprehensive collection of well-curated HCC gene signatures and straightforward interfaces for gene search and signature comparison. Liverome is available at http://liverome.kobic.re.kr.

  18. Candidate Loci for Yield-Related Traits in Maize Revealed by a Combination of MetaQTL Analysis and Regional Association Mapping

    PubMed Central

    Chen, Lin; An, Yixin; Li, Yong-xiang; Li, Chunhui; Shi, Yunsu; Song, Yanchun; Zhang, Dengfeng; Wang, Tianyu; Li, Yu

    2017-01-01

    Maize grain yield and related traits are complex and are controlled by a large number of genes of small effect or quantitative trait loci (QTL). Over the years, a large number of yield-related QTLs have been identified in maize and deposited in public databases. However, integrating and re-analyzing these data and mining candidate loci for yield-related traits has become a major issue in maize. In this study, we collected information on QTLs conferring maize yield-related traits from 33 published studies. Then, 999 of these QTLs were iteratively projected and subjected to meta-analysis to obtain metaQTLs (MQTLs). A total of 76 MQTLs were found across the maize genome. Based on a comparative genomics strategy, several maize orthologs of rice yield-related genes were identified in these MQTL regions. Furthermore, three potential candidate genes (Gene ID: GRMZM2G359974, GRMZM2G301884, and GRMZM2G083894) associated with kernel size and weight within three MQTL regions were identified using regional association mapping, based on the results of the meta-analysis. This strategy, combining MQTL analysis and regional association mapping, is helpful for functional marker development and rapid identification of candidate genes or loci. PMID:29312420

  19. AgeFactDB—the JenAge Ageing Factor Database—towards data integration in ageing research

    PubMed Central

    Hühne, Rolf; Thalheim, Torsten; Sühnel, Jürgen

    2014-01-01

    AgeFactDB (http://agefactdb.jenage.de) is a database aimed at the collection and integration of ageing phenotype data including lifespan information. Ageing factors are considered to be genes, chemical compounds or other factors such as dietary restriction, whose action results in a changed lifespan or another ageing phenotype. Any information related to the effects of ageing factors is called an observation and is presented on observation pages. To provide concise access to the complete information for a particular ageing factor, corresponding observations are also summarized on ageing factor pages. In a first step, ageing-related data were primarily taken from existing databases such as the Ageing Gene Database—GenAge, the Lifespan Observations Database and the Dietary Restriction Gene Database—GenDR. In addition, we have started to include new ageing-related information. Based on homology data taken from the HomoloGene Database, AgeFactDB also provides observation and ageing factor pages of genes that are homologous to known ageing-related genes. These homologues are considered as candidate or putative ageing-related genes. AgeFactDB offers a variety of search and browse options, and also allows the download of ageing factor or observation lists in TSV, CSV and XML formats. PMID:24217911

  20. An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases.

    PubMed

    Yang, Jin Ok; Hwang, Sohyun; Oh, Jeongsu; Bhak, Jong; Sohn, Tae-Kwon

    2008-12-12

    Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need for analyzing SNP and disease correlation, most existing databases provide information only on functional variants at specific locations on the genome, or deal with only a few genes associated with disease. There is no combined resource to widely support gene-, SNP-, and disease-related information, and to capture relationships among such data. Therefore, we developed an integrated database-pipeline system for studying SNPs and diseases. To implement the pipeline system for the integrated database, we first unified complicated and redundant disease terms and gene names using the Unified Medical Language System (UMLS) for classification and noun modification, and the HUGO Gene Nomenclature Committee (HGNC) and NCBI gene databases. Next, we collected and integrated representative databases for three categories of information. For genes and proteins, we examined the NCBI mRNA, UniProt, UCSC Table Track and MitoDat databases. For genetic variants we used the dbSNP, JSNP, ALFRED, and HGVbase databases. For disease, we employed the OMIM, GAD, and HGMD databases. The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease. The search results for these categories are available on the web page http://diseasome.kobic.re.kr/, and a genome browser is also available to highlight findings, as well as to permit the convenient review of potentially deleterious SNPs among genes strongly associated with specific diseases and clinical phenotypes. Our system is designed to capture the relationships between SNPs associated with disease and disease-causing genes. The integrated database-pipeline provides a list of candidate genes and SNP markers for evaluation in both epidemiological and molecular biological approaches to disease-gene association studies. Furthermore, researchers can then semi-automatically decide on the data set for association studies while considering the relationships between genetic variation and diseases. The database can also be economical for disease-association studies, as well as facilitate an understanding of the processes which cause disease. Currently, the database contains 14,674 SNP records and 109,715 gene records associated with human diseases and it is updated at regular intervals.

  1. GExplore: a web server for integrated queries of protein domains, gene expression and mutant phenotypes

    PubMed Central

    2009-01-01

    Background: The majority of the genes even in well-studied multi-cellular model organisms have not been functionally characterized yet. Mining the numerous genome wide data sets related to protein function to retrieve potential candidate genes for a particular biological process remains a challenge. Description: GExplore has been developed to provide a user-friendly database interface for data mining at the gene expression/protein function level to help in hypothesis development and experiment design. It supports combinatorial searches for proteins with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes. GExplore operates on a stand-alone database and has fast response times, which is essential for exploratory searches. The interface is not only user-friendly, but also modular so that it accommodates additional data sets in the future. Conclusion: GExplore is an online database for quick mining of data related to gene and protein function, providing a multi-gene display of data sets related to the domain composition of proteins as well as expression and phenotype data. GExplore is publicly available at: http://genome.sfu.ca/gexplore/ PMID:19917126

  2. OGRO: The Overview of functionally characterized Genes in Rice online database.

    PubMed

    Yamamoto, Eiji; Yonemaru, Jun-Ichi; Yamamoto, Toshio; Yano, Masahiro

    2012-12-01

    The high-quality sequence information and rich bioinformatics tools available for rice have contributed to remarkable advances in functional genomics. To facilitate the application of gene function information to the study of natural variation in rice, we comprehensively searched for articles related to rice functional genomics and extracted information on functionally characterized genes. As of 31 March 2012, 702 functionally characterized genes were annotated. This number represents about 1.6% of the predicted loci in the Rice Annotation Project Database. The compiled gene information is organized to facilitate direct comparisons with quantitative trait locus (QTL) information in the Q-TARO database. Comparison of genomic locations between functionally characterized genes and the QTLs revealed that QTL clusters were often co-localized with high-density gene regions, and that the genes associated with the QTLs in these clusters were different genes, suggesting that these QTL clusters are likely to be explained by tightly linked but distinct genes. Information on the functionally characterized genes compiled during this study is now available in the Overview of Functionally Characterized Genes in Rice Online database (OGRO) on the Q-TARO website (http://qtaro.abr.affrc.go.jp/ogro). The database has two interfaces: a table containing gene information, and a genome viewer that allows users to compare the locations of QTLs and functionally characterized genes. OGRO on Q-TARO will facilitate a candidate-gene approach to identifying the genes responsible for QTLs. Because the QTL descriptions in Q-TARO contain information on agronomic traits, such comparisons will also facilitate the annotation of functionally characterized genes in terms of their effects on traits important for rice breeding. The increasing amount of information on rice gene function being generated from mutant panels and other types of studies will make the OGRO database even more valuable in the future.

  3. Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Jawdy, Sara; Tschaplinski, Timothy J

    2009-01-01

    Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are in one but not the other two genomes. The DG sets were screened against a plant transcript database, the NR protein database and six newly-sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific genes (SS). Gene expression, protein motif and intron number were examined. 192, 641 and 109 SS genes were identified in Arabidopsis, Oryza and Populus, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifs in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.

  4. dbMDEGA: a database for meta-analysis of differentially expressed genes in autism spectrum disorder.

    PubMed

    Zhang, Shuyun; Deng, Libin; Jia, Qiyue; Huang, Shaoting; Gu, Junwang; Zhou, Fankun; Gao, Meng; Sun, Xinyi; Feng, Chang; Fan, Guangqin

    2017-11-16

    Autism spectrum disorders (ASD) are hereditary, heterogeneous and biologically complex neurodevelopmental disorders. Individual studies on gene expression in ASD cannot provide clear consensus conclusions. Therefore, a systematic review to synthesize the current findings from brain tissues and a search tool to share the meta-analysis results are urgently needed. Here, we conducted a meta-analysis of brain gene expression profiles in the current reported human ASD expression datasets (with 84 frozen male cortex samples, 17 female cortex samples, 32 cerebellum samples and 4 formalin fixed samples) and knock-out mouse ASD model expression datasets (with 80 collective brain samples). Then, we applied R language software and developed an interactive shared and updated database (dbMDEGA) displaying the results of meta-analysis of data from ASD studies regarding differentially expressed genes (DEGs) in the brain. This database, dbMDEGA ( https://dbmdega.shinyapps.io/dbMDEGA/ ), is a publicly available web-portal for manual annotation and visualization of DEGs in the brain from data from ASD studies. This database uniquely presents meta-analysis values and homologous forest plots of DEGs in brain tissues. Gene entries are annotated with meta-values, statistical values and forest plots of DEGs in brain samples. This database aims to provide searchable meta-analysis results based on the current reported brain gene expression datasets of ASD to help detect candidate genes underlying this disorder. This new analytical tool may provide valuable assistance in the discovery of DEGs and the elucidation of the molecular pathogenicity of ASD. This database model may be replicated to study other disorders.
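
    A forest plot of the kind this abstract mentions shows one effect estimate per study together with a pooled value. The abstract does not state which pooling model dbMDEGA uses, so the sketch below only illustrates the standard inverse-variance fixed-effect calculation under that assumption; the function name and the input numbers are hypothetical.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    effects: per-study effect estimates (e.g. log fold changes for one gene).
    std_errors: matching standard errors, all strictly positive.
    Returns (pooled_effect, pooled_standard_error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and equal in length")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical log fold changes for one gene in two datasets.
pooled, pooled_se = fixed_effect_pool([1.0, 3.0], [1.0, 2.0])
print(round(pooled, 3), round(pooled_se, 3))
```

    The more precise study (smaller standard error) pulls the pooled value toward its own estimate; a random-effects model would add a between-study variance term to each weight.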

  5. Bioinformatic analysis of the nucleotide binding site-encoding disease-resistance genes in foxtail millet (Setaria italica (L.) Beauv.).

    PubMed

    Zhu, Y B; Xie, X Q; Li, Z Y; Bai, H; Dong, L; Dong, Z P; Dong, J G

    2014-08-28

    The nucleotide-binding site (NBS) disease-resistance genes are the largest category of plant disease-resistance gene analogs. The complete set of disease-resistant candidate genes, which encode the NBS sequence, was filtered in the genomes of two varieties of foxtail millet (Yugu1 and 'Zhang gu'). This study investigated a number of characteristics of the putative NBS genes, such as structural diversity and phylogenetic relationships. A total of 269 and 281 NBS-coding sequences were identified in Yugu1 and 'Zhang gu', respectively. When the two databases were compared, 72 genes were found to be identical and 164 genes showed more than 90% similarity. Physical positioning and gene family analysis of the NBS disease-resistance genes in the genome revealed that the number of genes on each chromosome was similar in both varieties. The eighth chromosome contained the largest number of genes and the ninth chromosome contained the lowest number of genes. Exactly 34 gene clusters containing the 161 genes were found in the Yugu1 genome, with each cluster containing 4.7 genes on average. In comparison, the 'Zhang gu' genome possessed 28 gene clusters, which had 151 genes, with an average of 5.4 genes in each cluster. The largest gene cluster, located on the eighth chromosome, contained 12 genes in the Yugu1 database, whereas it contained 16 genes in the 'Zhang gu' database. The classification results showed that the CC-NBS-LRR gene made up the largest part of each chromosome in the two databases. Two TIR-NBS genes were also found in the Yugu1 genome.

  6. iSyTE 2.0: a database for expression-based gene discovery in the eye

    PubMed Central

    Kakrana, Atul; Yang, Andrian; Anand, Deepti; Djordjevic, Djordje; Ramachandruni, Deepti; Singh, Abhyudai; Huang, Hongzhan

    2018-01-01

    Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches. PMID:29036527

  7. Transcriptome Profiling of Khat (Catha edulis) and Ephedra sinica Reveals Gene Candidates Potentially Involved in Amphetamine-Type Alkaloid Biosynthesis

    PubMed Central

    Groves, Ryan A.; Hagel, Jillian M.; Zhang, Ye; Kilpatrick, Korey; Levy, Asaf; Marsolais, Frédéric; Lewinsohn, Efraim; Sensen, Christoph W.; Facchini, Peter J.

    2015-01-01

    Amphetamine analogues are produced by plants in the genus Ephedra and by khat (Catha edulis), and include the widely used decongestants and appetite suppressants (1S,2S)-pseudoephedrine and (1R,2S)-ephedrine. The production of these metabolites, which derive from L-phenylalanine, involves a multi-step pathway partially mapped out at the biochemical level using knowledge of benzoic acid metabolism established in other plants, and direct evidence using khat and Ephedra species as model systems. Despite the commercial importance of amphetamine-type alkaloids, only a single step in their biosynthesis has been elucidated at the molecular level. We have employed Illumina next-generation sequencing technology, paired with Trinity and Velvet-Oases assembly platforms, to establish data-mining frameworks for Ephedra sinica and khat plants. Sequence libraries representing a combined 200,000 unigenes were subjected to an annotation pipeline involving direct searches against public databases. Annotations included the assignment of Gene Ontology (GO) terms used to allocate unigenes to functional categories. As part of our functional genomics program aimed at novel gene discovery, the databases were mined for enzyme candidates putatively involved in alkaloid biosynthesis. Queries used for mining included enzymes with established roles in benzoic acid metabolism, as well as enzymes catalyzing reactions similar to those predicted for amphetamine alkaloid metabolism. Gene candidates were evaluated based on phylogenetic relationships, FPKM-based expression data, and mechanistic considerations. Establishment of expansive sequence resources is a critical step toward pathway characterization, a goal with both academic and industrial implications. PMID:25806807
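
    The FPKM-based expression data mentioned in this abstract normalise raw fragment counts by transcript length and sequencing depth. The following is a minimal sketch of the standard FPKM formula, not the authors' pipeline; the function name and the example numbers are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical unigene: 100 fragments, 2,000 bp long, 1 million mapped fragments.
print(fpkm(100, 2000, 1_000_000))
```

    Because the value is scaled by library size, FPKM figures are comparable within a library, while comparisons across libraries usually need additional normalisation.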

  8. RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing.

    PubMed

    Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E

    2015-01-01

    Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.

  9. Organization and annotation of the Xcat critical region: elimination of seven positional candidate genes.

    PubMed

    Huang, Kristen M; Geunes-Boyer, Scarlett; Wu, Sufen; Dutra, Amalia; Favor, Jack; Stambolian, Dwight

    2004-05-01

    Xcat mice display X-linked congenital cataracts and are a mouse model for the human X-linked cataract disease Nance Horan syndrome (NHS). The genetic defect in Xcat mice and NHS patients is not known. We isolated and sequenced a BAC contig representing a portion of the Xcat critical region. We combined our sequencing data with the most recent mouse sequence assemblies from both Celera and public databases. The sequence of the 2.2-Mb Xcat critical region was then analyzed for potential Xcat candidate genes. The coding regions of the seven known genes within this area (Rai2, Rbbp7, Ctps2, Calb3, Grpr, Reps2, and Syap1) were sequenced in Xcat mice and no mutations were detected. The expression of Rai2 was quantitatively identical in wild-type and Xcat mutant eyes. These results indicate that the Xcat mutation is within a novel, undiscovered gene.

  10. Transcriptional sequencing and analysis of major genes involved in the adventitious root formation of mango cotyledon segments.

    PubMed

    Li, Yun-He; Zhang, Hong-Na; Wu, Qing-Song; Muday, Gloria K

    2017-06-01

    A total of 74,745 unigenes were generated and 1975 DEGs were identified. Candidate genes that may be involved in the adventitious root formation of mango cotyledon segments were revealed. Adventitious root formation is a crucial step in plant vegetative propagation, but the molecular mechanism of adventitious root formation remains unclear. Adventitious roots formed only at the proximal cut surface (PCS) of mango cotyledon segments, whereas no roots were formed on the opposite, distal cut surface (DCS). To identify the transcript abundance changes linked to adventitious root development, RNA was isolated from PCS and DCS at 0, 4 and 7 days after culture. Illumina sequencing of libraries generated from these samples yielded 62.36 Gb of high-quality reads that were assembled into 74,745 unigenes with an average sequence length of 807 base pairs, and 33,252 of the assembled unigenes had homologs in at least one of the public databases. Comparative analysis of these transcriptome databases revealed that between the different time points at PCS there were 1966 differentially expressed genes (DEGs), while there were only 51 DEGs for PCS vs. DCS when time-matched samples were compared. Of these DEGs, 1636 were assigned to gene ontology (GO) classes, the majority of which were involved in cellular processes, metabolic processes and single-organism processes. Candidate genes that may be involved in the adventitious root formation of mango cotyledon segments are predicted to encode polar auxin transport carriers, auxin-regulated proteins, cell wall remodeling enzymes and ethylene-related proteins. In order to validate RNA-sequencing results, we further analyzed the expression profiles of 20 genes by quantitative real-time PCR. This study expands the transcriptome information for Mangifera indica and identifies candidate genes involved in adventitious root formation in cotyledon segments of mango.

  11. Phenome-driven disease genetics prediction toward drug discovery

    PubMed Central

    Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

    2015-01-01

    Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. Results: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e−4) and 81.3% (P < e−12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn’s disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn’s disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn’s disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. 
Availability and implementation: nlp.case.edu/public/data/DMN Contact: rxx@case.edu PMID:26072493
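
    The area under the curve reported in this abstract can be read as the probability that a randomly chosen true disease gene is scored above a randomly chosen non-disease gene. The following is a minimal sketch of that rank-based computation, not the authors' evaluation code; the function name `auc_from_scores` and the toy scores and labels are hypothetical.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical prediction scores for six candidate genes (label 1 = known disease gene).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
```

    In this toy case eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9. In a leave-one-out setting, each known disease gene would be held out in turn and scored against the remaining candidates before such a ranking is evaluated.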

  12. Identification and fine mapping of a stay-green gene (Brnye1) in pakchoi (Brassica campestris L. ssp. chinensis).

    PubMed

    Wang, Nan; Liu, Zhiyong; Zhang, Yun; Li, Chengyu; Feng, Hui

    2018-03-01

    Using bulked segregant analysis combined with next-generation sequencing, we delimited the Brnye1 gene responsible for the stay-green trait of nye in pakchoi. Sequence analysis identified Bra019346 as the candidate gene. "Stay-green" refers to a plant trait whereby leaves remain green during senescence. This trait is useful in the cultivation of pakchoi (Brassica campestris L. ssp. chinensis), which is marketed as a green leaf product. This study aimed to identify the gene responsible for the stay-green trait in pakchoi. We identified a stay-green mutant in pakchoi, which we termed "nye". Genetic analysis revealed that the stay-green trait is controlled by a single recessive gene, Brnye1. Using the BSA-seq method, a 3.0-Mb candidate region was mapped on chromosome A03, which helped us localize Brnye1 to an 81.01-kb interval between SSR markers SSRWN27 and SSRWN30 via linkage analysis in an F2 population. We identified 12 genes in this region, 11 of which were annotated based on the Brassica rapa annotation database, and one was a functionally unknown gene. An orthologous gene of the Arabidopsis gene AtNYE1, Bra019346, was identified as the potential candidate for Brnye1. Sequence analysis revealed a 40-bp insertion in the second exon of Bra019346 in nye, which generated the TAA stop codon. A candidate gene-specific Indel marker in 1561 F2 individuals showed perfect cosegregation with Brnye1 in the nye mutant. These results provide a foundation for uncovering the molecular mechanism of the stay-green trait in pakchoi.
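
    The abstract reports that an insertion in an exon generated a TAA stop codon. As a toy illustration of how an insertion can introduce a premature in-frame stop, the sketch below scans a reading frame codon by codon. The sequences are invented (they are not the Bra019346 sequence), the insertion is 4 bp rather than 40 bp, and the helper names `first_stop_index` and `insert_at` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def insert_at(cds: str, position: int, insertion: str) -> str:
    """Insert `insertion` into `cds` before the given 0-based nucleotide position."""
    return cds[:position] + insertion + cds[position:]


# Toy wild-type reading frame: the only stop codon is the last codon.
wild_type = "ATGGCTGCTGGAGCTGCTTGA"
# Toy insertion that places TAA in frame directly after the second codon.
mutant = insert_at(wild_type, 6, "TAAC")

wild_type_stop = first_stop_index(wild_type)  # codon index of the normal stop
mutant_stop = first_stop_index(mutant)        # codon index of the premature stop
```

    Here the wild-type frame is read through to its final codon, while the mutant frame terminates at the third codon, which would truncate the encoded protein.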

  13. Ontology-oriented retrieval of putative microRNAs in Vitis vinifera via GrapeMiRNA: a web database of de novo predicted grape microRNAs.

    PubMed

    Lazzari, Barbara; Caprera, Andrea; Cestaro, Alessandro; Merelli, Ivan; Del Corvo, Marcello; Fontana, Paolo; Milanesi, Luciano; Velasco, Riccardo; Stella, Alessandra

    2009-06-29

    Two complete genome sequences are available for Vitis vinifera Pinot noir. Based on the sequence and gene predictions produced by the IASMA, we performed an in silico detection of putative microRNA genes and of their targets, and collected the most reliable microRNA predictions in a web database. The application is available at http://www.itb.cnr.it/ptp/grapemirna/. The program FindMiRNA was used to detect putative microRNA genes in the grape genome. A very high number of predictions was retrieved, calling for validation. Nine parameters were calculated and, based on the grape microRNAs dataset available at miRBase, thresholds were defined and applied to FindMiRNA predictions having targets in gene exons. In the resulting subset, predictions were ranked according to precursor positions and sequence similarity, and to target identity. To further validate FindMiRNA predictions, comparisons to the Arabidopsis genome, to the grape Genoscope genome, and to the grape EST collection were performed. Results were stored in a MySQL database and a web interface was prepared to query the database and retrieve predictions of interest. The GrapeMiRNA database encompasses 5,778 microRNA predictions spanning the whole grape genome. Predictions are integrated with information that can be of use in selection procedures. Tools added to the web interface also allow users to inspect predictions according to the gene ontology classes and metabolic pathways of their targets. The GrapeMiRNA database can be of help in selecting candidate microRNA genes to be validated.

  14. Screening key candidate genes and pathways involved in insulinoma by microarray analysis.

    PubMed

    Zhou, Wuhua; Gong, Li; Li, Xuefeng; Wan, Yunyan; Wang, Xiangfei; Li, Huili; Jiang, Bin

    2018-06-01

    Insulinoma is a rare type of tumor and its genetic features remain largely unknown. This study aimed to search for potential key genes and relevant enriched pathways of insulinoma. The gene expression data from GSE73338 were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) were identified between insulinoma tissues and normal pancreas tissues, followed by pathway enrichment analysis, protein-protein interaction (PPI) network construction, and module analysis. The expression of candidate key genes was validated by quantitative real-time polymerase chain reaction (RT-PCR) in insulinoma tissues. A total of 1632 DEGs were obtained, including 1117 upregulated genes and 514 downregulated genes. Pathway enrichment results showed that upregulated DEGs were significantly implicated in insulin secretion, and downregulated DEGs were mainly enriched in pancreatic secretion. PPI network analysis revealed 7 hub genes with degrees of more than 10, including GCG (glucagon), GCGR (glucagon receptor), PLCB1 (phospholipase C, beta 1), CASR (calcium sensing receptor), F2R (coagulation factor II thrombin receptor), GRM1 (glutamate metabotropic receptor 1), and GRM5 (glutamate metabotropic receptor 5). DEGs involved in the significant modules were enriched in calcium signaling pathway, protein ubiquitination, and platelet degranulation. Quantitative RT-PCR data confirmed that the expression trends of these hub genes were similar to the results of bioinformatic analysis. The present study demonstrated that candidate DEGs and enriched pathways were potentially critical molecular events involved in the development of insulinoma, and these findings are useful for a better understanding of insulinoma genesis.
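
    Hub genes in a PPI network are commonly selected by node degree, as in the "degrees more than 10" criterion above. The sketch below shows one simple way to compute this from an edge list. The gene labels, the edge list, and the function name `hub_genes` are hypothetical, and the toy threshold is 2 rather than 10.

```python
def hub_genes(edges, min_degree):
    """Return the sorted genes whose number of distinct partners exceeds min_degree."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return sorted(g for g, partners in neighbors.items() if len(partners) > min_degree)


# Hypothetical undirected interaction list; a duplicated edge is counted only once.
toy_edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
    ("geneD", "geneE"),
    ("geneB", "geneA"),  # duplicate of the first edge, in reverse order
]
toy_hubs = hub_genes(toy_edges, min_degree=2)
```

    Storing partners in sets keeps duplicated or reversed edges from inflating a gene's degree, which matters when interaction lists are merged from several sources.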

  15. Search for 5'-leader regulatory RNA structures based on gene annotation aided by the RiboGap database.

    PubMed

    Naghdi, Mohammad Reza; Smail, Katia; Wang, Joy X; Wade, Fallou; Breaker, Ronald R; Perreault, Jonathan

    2017-03-15

    The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first due to the difficulty of retrieving large numbers of sequences for given gene activities, and second due to exponential demands on calculation needed for comparative genomics on a large scale. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database also can be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend analysis further to choose the best candidate ncRNAs for experimental validation. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Detection of alternative splice variants at the proteome level in Aspergillus flavus.

    PubMed

    Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C

    2010-03-05

    Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism by which the exons of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enables higher eukaryotic organisms, with only a limited number of genes, to have the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequences. However, many protein databases only include a limited number of isoforms to keep redundancy minimal. As a result, the database search might not identify a target protein even with high quality tandem MS data and accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. The searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, the AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.

  17. ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining.

    PubMed

    Lee, Myunggyo; Lee, Kyubum; Yu, Namhee; Jang, Insu; Choi, Ikjung; Kim, Pora; Jang, Ye Eun; Kim, Byounggun; Kim, Sunkyu; Lee, Byungwook; Kang, Jaewoo; Lee, Sanghyuk

    2017-01-04

    Fusion gene is an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curations. In this update, the database coverage was enhanced considerably by adding two new modules of The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules of ChimerKB, ChimerPub and ChimerSeq. ChimerKB represents a knowledgebase including 1066 fusion genes with manual curation that were compiled from public resources of fusion genes with experimental evidences. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. ChimerSeq module is designed to archive the fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data of the TCGA project covering 4569 patients in 23 cancer types using two reliable programs of FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters.

    PubMed

    Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S

    2016-12-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.

  19. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters

    PubMed Central

    Schorn, Michelle A.; Alanjary, Mohammad M.; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R.; Ziemert, Nadine

    2016-01-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites. PMID:27902408

  20. Brassica ASTRA: an integrated database for Brassica genomic research.

    PubMed

    Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-01-01

    Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.

  1. FusionHub: A unified web platform for annotation and visualization of gene fusion events in human cancer.

    PubMed

    Panigrahi, Priyabrata; Jere, Abhay; Anamika, Krishanpal

    2018-01-01

    Gene fusion is a chromosomal rearrangement event which plays a significant role in cancer due to the oncogenic potential of the chimeric protein generated through fusions. At present many databases are available in public domain which provides detailed information about known gene fusion events and their functional role. Existing gene fusion detection tools, based on analysis of transcriptomics data usually report a large number of fusion genes as potential candidates, which could be either known or novel or false positives. Manual annotation of these putative genes is indeed time-consuming. We have developed a web platform FusionHub, which acts as integrated search engine interfacing various fusion gene databases and simplifies large scale annotation of fusion genes in a seamless way. In addition, FusionHub provides three ways of visualizing fusion events: circular view, domain architecture view and network view. Design of potential siRNA molecules through ensemble method is another utility integrated in FusionHub that could aid in siRNA-based targeted therapy. FusionHub is freely available at https://fusionhub.persistent.co.in.

  2. STOPGAP: a database for systematic target opportunity assessment by genetic association predictions.

    PubMed

    Shen, Judong; Song, Kijoung; Slater, Andrew J; Ferrero, Enrico; Nelson, Matthew R

    2017-09-01

    We developed the STOPGAP (Systematic Target OPportunity assessment by Genetic Association Predictions) database, an extensive catalog of human genetic associations mapped to effector gene candidates. STOPGAP draws on a variety of publicly available GWAS associations, linkage disequilibrium (LD) measures, functional genomic and variant annotation sources. Algorithms were developed to merge the association data, partition associations into non-overlapping LD clusters, map variants to genes and produce a variant-to-gene score used to rank the relative confidence among potential effector genes. This database can be used for a multitude of investigations into the genes and genetic mechanisms underlying inter-individual variation in human traits, as well as supporting drug discovery applications. Shell, R, Perl and Python scripts and STOPGAP R data files (version 2.5.1 at publication) are available at https://github.com/StatGenPRD/STOPGAP. Some of the most useful STOPGAP fields can be queried through an R Shiny web application at http://stopgapwebapp.com. Contact: matthew.r.nelson@gsk.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
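
    The abstract mentions partitioning associations into non-overlapping LD clusters but does not describe the procedure. One generic way to obtain disjoint groups is single-linkage clustering with a union-find structure, sketched below. This is not STOPGAP's algorithm: the function name `ld_clusters`, the variant identifiers, the r² values, and the 0.8 threshold are all assumptions made for illustration.

```python
def ld_clusters(variants, ld_pairs, r2_threshold=0.8):
    """Group variants into disjoint clusters linked by pairwise r^2 >= r2_threshold.

    `ld_pairs` is an iterable of (variant_a, variant_b, r2) tuples; every variant
    named in a pair is assumed to be listed in `variants`.
    """
    parent = {v: v for v in variants}

    def find(v):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b, r2 in ld_pairs:
        if r2 >= r2_threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for v in variants:
        clusters.setdefault(find(v), []).append(v)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical variants and pairwise LD values.
toy_variants = ["rs1", "rs2", "rs3", "rs4"]
toy_ld = [("rs1", "rs2", 0.95), ("rs2", "rs3", 0.85), ("rs3", "rs4", 0.20)]
toy_clusters = ld_clusters(toy_variants, toy_ld)
```

    With these toy values, rs1, rs2 and rs3 are chained into one cluster and rs4 remains on its own. Single linkage can join variants that are only indirectly linked, so a real pipeline may well use a stricter rule.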

  3. DISSECTING THE GENETICS OF HUMAN HIGH MYOPIA: A MOLECULAR BIOLOGIC APPROACH

    PubMed Central

    Young, Terri L

    2004-01-01

    Purpose: Despite the plethora of experimental myopia animal studies that demonstrate biochemical factor changes in various eye tissues, and limited human studies utilizing pharmacologic agents to thwart axial elongation, we have little knowledge of the basic physiology that drives myopic development. Identifying the implicated genes for myopia susceptibility will provide a fundamental molecular understanding of how myopia occurs and may lead to directed physiologic (ie, pharmacologic, gene therapy) interventions. The purpose of this proposal is to describe the results of positional candidate gene screening of selected genes within the autosomal dominant high-grade myopia-2 locus (MYP2) on chromosome 18p11.31. Methods: A physical map of a contracted MYP2 interval was compiled, and gene expression studies in ocular tissues using complementary DNA library screens, microarray matches, and reverse-transcription techniques aided in prioritizing gene selection for screening. The TGIF, EMLIN-2, MLCB, and CLUL1 genes were screened in DNA samples from unrelated controls and in high-myopia affected and unaffected family members from the original seven MYP2 pedigrees. All candidate genes were screened by direct base pair sequence analysis. Results: Consistent segregation of a gene sequence alteration (polymorphism) with myopia was not demonstrated in any of the seven families. Novel single nucleotide polymorphisms were found. Conclusion: The positional candidate genes TGIF, EMLIN-2, MLCB, and CLUL1 are not associated with MYP2-linked high-grade myopia. Base change polymorphisms discovered with base sequence screening of these genes were submitted to an Internet database. Other genes that also map within the interval are currently undergoing mutation screening. PMID:15747770

  4. Transcript map of the Ovum mutant (Om) locus: isolation by exon trapping of new candidate genes for the DDK syndrome.

    PubMed

    Le Bras, Stéphanie; Cohen-Tannoudji, Michel; Guyot, Valérie; Vandormael-Pournin, Sandrine; Coumailleau, Franck; Babinet, Charles; Baldacci, Patricia

    2002-08-21

    The DDK syndrome is defined as the embryonic lethality of F1 mouse embryos from crosses between DDK females and males from other strains (named hereafter as non-DDK strains). Genetically controlled by the Ovum mutant (Om) locus, it is due to a deleterious interaction between a maternal factor present in DDK oocytes and the non-DDK paternal pronucleus. Therefore, the DDK syndrome constitutes a unique genetic tool to study the crucial interactions that take place between the parental genomes and the egg cytoplasm during mammalian development. In this paper, we present an extensive analysis performed by exon trapping on the Om region. Twenty-seven trapped sequences were from genes in the databases: beta-adaptin, CCT zeta2, DNA LigaseIII, Notchless, Rad51l3 and Scya1. Twenty-eight other sequences presented similarities with expressed sequence tags and genomic sequences whereas 57 did not. The pattern of expression of 37 of these markers was established. Importantly, five of them are expressed in DDK oocytes and are candidate genes for the maternal factor, and 20 are candidate genes for the paternal factor since they are expressed in testis. This data is an important step towards identifying the genes responsible for the DDK syndrome.

  5. Candidate gene database and transcript map for peach, a model species for fruit trees.

    PubMed

    Horn, Renate; Lecouls, Anne-Claire; Callahan, Ann; Dandekar, Abhaya; Garay, Lilibeth; McCord, Per; Howad, Werner; Chan, Helen; Verde, Ignazio; Main, Doreen; Jung, Sook; Georgi, Laura; Forrest, Sam; Mook, Jennifer; Zhebentyayeva, Tatyana; Yu, Yeisoo; Kim, Hye Ran; Jesudurai, Christopher; Sosinski, Bryon; Arús, Pere; Baird, Vance; Parfitt, Dan; Reighard, Gregory; Scorza, Ralph; Tomkins, Jeffrey; Wing, Rod; Abbott, Albert Glenn

    2005-05-01

    Peach (Prunus persica) is a model species for the Rosaceae, which includes a number of economically important fruit tree species. To develop an extensive Prunus expressed sequence tag (EST) database for identifying and cloning the genes important to fruit and tree development, we generated 9,984 high-quality ESTs from a peach cDNA library of developing fruit mesocarp. After assembly and annotation, a putative peach unigene set consisting of 3,842 ESTs was defined. Gene ontology (GO) classification was assigned based on the annotation of the single "best hit" match against the Swiss-Prot database. No significant homology could be found in the GenBank nr databases for 24.3% of the sequences. Using core markers from the general Prunus genetic map, we anchored bacterial artificial chromosome (BAC) clones on the genetic map, thereby providing a framework for the construction of a physical and transcript map. A transcript map was developed by hybridizing 1,236 ESTs from the putative peach unigene set and an additional 68 peach cDNA clones against the peach BAC library. Hybridizing ESTs to genetically anchored BACs immediately localized 11.2% of the ESTs on the genetic map. ESTs showed a clustering of expressed genes in defined regions of the linkage groups. [The data were built into a regularly updated Genome Database for Rosaceae (GDR), available at (http://www.genome.clemson.edu/gdr/).].

  6. Integrated analysis of gene expression and methylation profiles of 48 candidate genes in breast cancer patients.

    PubMed

    Li, Zibo; Heng, Jianfu; Yan, Jinhua; Guo, Xinwu; Tang, Lili; Chen, Ming; Peng, Limin; Wu, Yepeng; Wang, Shouman; Xiao, Zhi; Deng, Zhongping; Dai, Lizhong; Wang, Jun

    2016-11-01

    Gene-specific methylation and expression have shown biological and clinical importance for breast cancer diagnosis and prognosis. Integrated analysis of gene methylation and gene expression may identify genes associated with the biological mechanisms and clinical outcome of breast cancer and aid in clinical management. Using high-throughput microfluidic quantitative PCR, we analyzed the expression profiles of 48 candidate genes in 96 Chinese breast cancer patients and investigated their correlation with gene methylation and associations with breast cancer clinical parameters. Breast cancer-specific gene expression alteration was found in 25 genes with significant expression differences between paired tumor and normal tissues. A total of 9 genes (CCND2, EGFR, GSTP1, PGR, PTGS2, RECK, SOX17, TNFRSF10D, and WIF1) showed a significant negative correlation between methylation and gene expression, which was validated in the TCGA database. A total of 23 genes (ACADL, APC, BRCA2, CADM1, CAV1, CCND2, CST6, EGFR, ESR2, GSTP1, ICAM5, NPY, PGR, PTGS2, RECK, RUNX3, SFRP1, SOX17, SYK, TGFBR2, TNFRSF10D, WIF1, and WRN) annotated with potential TFBSs in the promoter regions showed a negative correlation between methylation and expression. In logistic regression analysis, 31 of the 48 genes showed improved performance in disease prediction when methylation and expression coefficients were combined. Our results demonstrated the complex correlation and the possible regulatory mechanisms between DNA methylation and gene expression. Integrated analysis of methylation and expression of candidate genes could improve performance in breast cancer prediction. These findings would contribute to molecular characterization and identification of biomarkers for potential clinical applications.

  7. Genome-wide copy number variant analysis for congenital ventricular septal defects in Chinese Han population.

    PubMed

    An, Yu; Duan, Wenyuan; Huang, Guoying; Chen, Xiaoli; Li, Li; Nie, Chenxia; Hou, Jia; Gui, Yonghao; Wu, Yiming; Zhang, Feng; Shen, Yiping; Wu, Bailin; Wang, Hongyan

    2016-01-08

    Ventricular septal defects (VSDs) constitute the most prevalent congenital heart disease (CHD), occurring either in isolation (isolated VSD) or in combination with other cardiac defects (complex VSD). Copy number variation (CNV) has been highlighted as a possible contributing factor to the etiology of many congenital diseases. However, little is known concerning the involvement of CNVs in either isolated or complex VSDs. We analyzed 154 unrelated Chinese individuals with VSD by chromosomal microarray analysis. The subjects were recruited from four hospitals across China. Each case underwent clinical assessment to define the type of VSD, either isolated or complex VSD. CNVs detected were categorized into syndrome-related CNVs, recurrent CNVs and rare CNVs. Genes encompassed by the CNVs were analyzed using enrichment and pathway analysis. Among 154 probands, we identified 29 rare CNVs in 26 VSD patients (16.9%, 26/154) and 8 syndrome-related CNVs in 8 VSD patients (5.2%, 8/154). Twelve of the 29 detected rare CNVs (41.3%) were recurrently reported in the DECIPHER or ISCA database as associated with either VSD or general heart disease. Fifteen genes (5%, 15/285) within CNVs were associated with a broad spectrum of complicated CHD. Among these 15 genes, 7 genes were in "abnormal interventricular septum morphology" derived from the MGI (mouse genome informatics) database, and nine genes were associated with cardiovascular system development (GO:0072538). We also found that these VSD-related candidate genes are enriched in chromatin binding and transcription regulation, which are the biological processes underlying heart development. Our study demonstrates the potential clinical diagnostic utility of genomic imbalance profiling in VSD patients. Additionally, gene enrichment and pathway analysis helped us to implicate VSD-related candidate genes.

  8. HuMiChip: Development of a Functional Gene Array for the Study of Human Microbiomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tu, Q.; Deng, Ye; Lin, Lu

    Microbiomes play very important roles in terms of nutrition, health and disease by interacting with their hosts. Based on sequence data currently available in public domains, we have developed a functional gene array to monitor both organismal and functional gene profiles of normal microbiota in human and mouse hosts, and such an array is called human and mouse microbiota array, HMM-Chip. First, seed sequences were identified from KEGG databases, and used to construct a seed database (seedDB) containing 136 gene families in 19 metabolic pathways closely related to human and mouse microbiomes. Second, a mother database (motherDB) was constructed with 81 genomes of bacterial strains with 54 from gut and 27 from oral environments, and 16 metagenomes, and used for selection of genes and probe design. Gene prediction was performed by Glimmer3 for bacterial genomes, and by the Metagene program for metagenomes. In total, 228,240 and 801,599 genes were identified for bacterial genomes and metagenomes, respectively. Then the motherDB was searched against the seedDB using the HMMer program, and gene sequences in the motherDB that were highly homologous with seed sequences in the seedDB were used for probe design by the CommOligo software. Different degrees of specific probes, including gene-specific, inclusive and exclusive group-specific probes were selected. All candidate probes were checked against the motherDB and NCBI databases for specificity. Finally, 7,763 probes covering 91.2% (12,601 out of 13,814) of the HMMer-confirmed sequences from 75 bacterial genomes and 16 metagenomes were selected. This developed HMM-Chip is able to detect the diversity and abundance of functional genes, the gene expression of microbial communities, and potentially, the interactions of microorganisms and their hosts.

  9. Phytoremediation of chromium using Salix species: cloning ESTs and candidate genes involved in the Cr response.

    PubMed

    Quaggiotti, Silvia; Barcaccia, Gianni; Schiavon, Michela; Nicolé, Silvia; Galla, Giulio; Rossignolo, Virginia; Soattin, Marica; Malagoli, Mario

    2007-11-01

    In this research a differential display based on the detection of cDNA-AFLP markers was used to identify candidate genes potentially involved in the regulation of the response to chromium in four different willow species (Salix alba, Salix eleagnos, Salix fragilis and Salix matsudana) chosen on the basis of their suitability in phytoremediation techniques. Our approach enabled the assay of a large set of mRNA-related fragments and increased the reliability of amplification-based transcriptome analysis. The vast majority of transcript-derived fragments were shared among samples within species and thus attributable to constitutively expressed genes. However, a number of differentially expressed mRNAs were scored in each species and a total of 68 transcripts displaying an altered expression in response to Cr were isolated and sequenced. Public database querying revealed that 44.1% and 4.4% of the cloned ESTs score significant similarity with genes encoding proteins having known or putative function, or with genes coding for unknown proteins, respectively, whereas the remaining 51.5% did not retrieve any homology. Semi-quantitative RT-PCR analysis of seven candidate genes fully confirmed the expression patterns obtained by cDNA-AFLP. Our results indicate the existence of common mechanisms of gene regulation in response to Cr, pathogen attack and senescence-mediated programmed cell death, and suggest a role for the genes isolated in the cross-talk of the signaling pathways governing the adaptation to biotic and abiotic stresses.

  10. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Leishmania genome analysis and high-throughput immunological screening identifies tuzin as a novel vaccine candidate against visceral leishmaniasis.

    PubMed

    Lakshmi, Bhavana Sethu; Wang, Ruobing; Madhubala, Rentala

    2014-06-24

    Leishmaniasis is a neglected tropical disease caused by Leishmania species. It is a major health concern affecting 88 countries and threatening 350 million people globally. Unfortunately, there are no vaccines and there are limitations associated with the current therapeutic regimens for leishmaniasis. The emerging cases of drug-resistance further aggravate the situation, demanding rapid drug and vaccine development. The genome sequence of Leishmania, provides access to novel genes that hold potential as chemotherapeutic targets or vaccine candidates. In this study, we selected 19 antigenic genes from about 8000 common Leishmania genes based on the Leishmania major and Leishmania infantum genome information available in the pathogen databases. Potential vaccine candidates thus identified were screened using an in vitro high throughput immunological platform developed in the laboratory. Four candidate genes coding for tuzin, flagellar glycoprotein-like protein (FGP), phospholipase A1-like protein (PLA1) and potassium voltage-gated channel protein (K VOLT) showed a predominant protective Th1 response over disease exacerbating Th2. We report the immunogenic properties and protective efficacy of one of the four antigens, tuzin, as a DNA vaccine against Leishmania donovani challenge. Our results show that administration of tuzin DNA protected BALB/c mice against L. donovani challenge and that protective immunity was associated with higher levels of IFN-γ and IL-12 production in comparison to IL-4 and IL-10. Our study presents a simple approach to rapidly identify potential vaccine candidates using the exhaustive information stored in the genome and an in vitro high-throughput immunological platform. Copyright © 2014. Published by Elsevier Ltd.

  12. Identification and expression of the WRKY transcription factors of Carica papaya in response to abiotic and biotic stresses.

    PubMed

    Pan, Lin-Jie; Jiang, Ling

    2014-03-01

    The WRKY transcription factor (TF) plays a very important role in the response of plants to various abiotic and biotic stresses. A local papaya database was built according to the GenBank expressed sequence tag database using the BioEdit software. Fifty-two coding sequences of Carica papaya WRKY TFs were predicted using the tBLASTn tool. The phylogenetic tree of the WRKY proteins was classified. The expression profiles of 13 selected C. papaya WRKY TF genes under stress induction were constructed by quantitative real-time polymerase chain reaction. The expression levels of these WRKY genes in response to 3 abiotic and 2 biotic stresses were evaluated. TF807.3 and TF72.14 are upregulated by low temperature; TF807.3, TF43.76, TF12.199 and TF12.62 are involved in the response to drought stress; TF9.35, TF18.51, TF72.14 and TF12.199 are involved in the response to wounding; TF12.199, TF807.3, TF21.156 and TF18.51 were induced by the PRSV pathogen; TF72.14 and TF43.76 are upregulated by SA. The regulated expression levels of the above eight genes, normalized against the housekeeping gene actin, were significant at the 0.01 probability level. These WRKY TFs could be related to the corresponding stress resistance and selected as candidate genes, especially the two genes TF807.3 and TF12.199, each of which was notably regulated by four stresses. This study may provide useful information and candidate genes for the development of transgenic stress-tolerant papaya varieties.

  13. Autism genes are selectively targeted by environmental pollutants including pesticides, heavy metals, bisphenol A, phthalates and many others in food, cosmetics or household products.

    PubMed

    Carter, C J; Blizard, R A

    2016-10-27

    The increasing incidence of autism suggests a major environmental influence. Epidemiology has implicated many candidates and genetics many susceptibility genes. Gene/environment interactions in autism were analysed using 206 autism susceptibility genes (ASG's) from the Autworks database to interrogate ∼1 million chemical/gene interactions in the comparative toxicogenomics database. Any bias towards ASG's was statistically determined for each chemical. Many suspect compounds identified in epidemiology, including tetrachlorodibenzodioxin, pesticides, particulate matter, benzo(a)pyrene, heavy metals, valproate, acetaminophen, SSRI's, cocaine, bisphenol A, phthalates, polyhalogenated biphenyls, flame retardants, diesel constituents, terbutaline and oxytocin, inter alia showed a significant degree of bias towards ASG's, as did relevant endogenous agents (retinoids, sex steroids, thyroxine, melatonin, folate, dopamine, serotonin). Numerous other suspected endocrine disruptors (over 100) selectively targeted ASG's including paraquat, atrazine and other pesticides not yet studied in autism and many compounds used in food, cosmetics or household products, including tretinoin, soy phytoestrogens, aspartame, titanium dioxide and sodium fluoride. Autism polymorphisms influence the sensitivity to some of these chemicals and these same genes play an important role in barrier function and control of respiratory cilia sweeping particulate matter from the airways. Pesticides, heavy metals and pollutants also disrupt barrier and/or ciliary function, which is regulated by sex steroids and by bitter/sweet taste receptors. Further epidemiological studies and neurodevelopmental and behavioural research is warranted to determine the relevance of large number of suspect candidates whose addition to the environment, household, food and cosmetics might be fuelling the autism epidemic in a gene-dependent manner. Copyright © 2016. Published by Elsevier Ltd.

  14. Single nucleotide polymorphisms in bone turnover-related genes in Koreans: ethnic differences in linkage disequilibrium and haplotype

    PubMed Central

    Kim, Kyung-Seon; Kim, Ghi-Su; Hwang, Joo-Yeon; Lee, Hye-Ja; Park, Mi-Hyun; Kim, Kwang-joong; Jung, Jongsun; Cha, Hyo-Soung; Shin, Hyoung Doo; Kang, Jong-Ho; Park, Eui Kyun; Kim, Tae-Ho; Hong, Jung-Min; Koh, Jung-Min; Oh, Bermseok; Kimm, Kuchan; Kim, Shin-Yoon; Lee, Jong-Young

    2007-01-01

    Background Osteoporosis is defined as the loss of bone mineral density that leads to bone fragility with aging. Population-based case-control studies have identified polymorphisms in many candidate genes that have been associated with bone mass maintenance or osteoporotic fracture. To investigate single nucleotide polymorphisms (SNPs) that are associated with osteoporosis, we examined the genetic variation among Koreans by analyzing 81 genes according to their function in bone formation and resorption during bone remodeling. Methods We resequenced all the exons, splice junctions and promoter regions of candidate osteoporosis genes using 24 unrelated Korean individuals. Using the common SNPs from our study and the HapMap database, a statistical analysis of deviation in heterozygosity was performed. Results We identified 942 variants, including 888 SNPs, 43 insertion/deletion polymorphisms, and 11 microsatellite markers. Of the SNPs, 557 (63%) had been previously identified and 331 (37%) were newly discovered in the Korean population. When SNPs in the Korean population were compared with those in the HapMap database, 1% (or less) of SNPs in the Japanese and Chinese subpopulations and 20% of those in Caucasian and African subpopulations were significantly differentiated from the Hardy-Weinberg expectations. In addition, an analysis of the genetic diversity showed that there were no significant differences among Korean, Han Chinese and Japanese populations, but African and Caucasian populations were significantly differentiated in selected genes. Nevertheless, in the detailed analysis of genetic properties, the LD and haplotype block patterns among the five sub-populations were substantially different from one another. Conclusion Through the resequencing of 81 osteoporosis candidate genes, 118 unknown SNPs with a minor allele frequency (MAF) > 0.05 were discovered in the Korean population.
In addition, using the common SNPs between our study and HapMap, an analysis of genetic diversity and deviation in heterozygosity was performed and the polymorphisms of the above genes among the five populations were substantially differentiated from one another. Further studies of osteoporosis could utilize the polymorphisms identified in our data since they may have important implications for the selection of highly informative SNPs for future association studies. PMID:18036257
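    The abstract above compares SNP genotypes against Hardy-Weinberg expectations without stating the exact test used. A minimal sketch of one common way to do this, a chi-square statistic on genotype counts for a biallelic SNP, is given below. The function name and the toy counts are illustrative assumptions and are not taken from the study.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed genotype counts for a biallelic SNP.
    This is an illustrative sketch, not the method reported in the abstract.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: the first SNP sits exactly at equilibrium,
# the second has no heterozygotes at all.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

    The statistic would normally be compared against a chi-square distribution with one degree of freedom; with only 24 individuals, as in the study, an exact test may be the more appropriate choice.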

  15. Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation.

    PubMed

    Michael, Sushama; Travé, Gilles; Ramu, Chenna; Chica, Claudia; Gibson, Toby J

    2008-02-15

    KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database, implying that KEN-boxes might be more common than reported. Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C targeted destruction, providing potential new avenues of research.
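    As the abstract notes, a bare motif match is not significant on its own; it is only the first step before filtering by disorder and conservation. A minimal sketch of that first step follows, assuming a candidate is any literal K-E-N tripeptide. The pattern, function name and toy sequence are illustrative assumptions, not the ELM definition or the authors' pipeline.

```python
import re

# Assumption: a candidate is any literal K-E-N tripeptide. A real survey
# would add flanking-residue, disorder and conservation filters.
KEN_PATTERN = re.compile(r"KEN")


def find_ken_boxes(sequence):
    """Return 1-based start positions of every KEN match in a protein sequence."""
    return [m.start() + 1 for m in KEN_PATTERN.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MSTKENLLPAAKENQ"
print(find_ken_boxes(toy_sequence))  # [4, 12]
```

    Each reported position would then be checked against disorder predictions and homologue alignments, since most raw matches are expected to be non-functional.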

  16. Identification of an expressed gene in Dipylidium caninum.

    PubMed

    Miranda, Rodrigo R C; Costa-Júnior, Livio M; Campos, Artur K; Santos, Hudson A; Rabelo, Elida M L

    2004-10-01

    Recombinant DNA studies have been focused on developing vaccines to different cestodes. But few studies involving Dipylidium caninum molecular biology and genes have been done. Only partial sequences of mitochondrial DNA and ribosomal RNA gene are available in databases. Any molecular work with this parasite, including epidemiology, study of drug-resistant strains, and vaccine development, is hampered by the lack of knowledge of its genome. Thus, the knowledge of specific genes of different developmental stages of D. caninum is crucial to locate potential targets to be used as candidates to develop a vaccine and/or new drugs against this parasite. Here we report, for the first time, the sequencing of a fragment of a D. caninum expressed gene.

  17. Huntington's Disease and its therapeutic target genes: a global functional profile based on the HD Research Crossroads database

    PubMed Central

    2012-01-01

    Background Huntington’s disease (HD) is a fatal progressive neurodegenerative disorder caused by the expansion of the polyglutamine repeat region in the huntingtin gene. Although the disease is triggered by the mutation of a single gene, intensive research has linked numerous other genes to its pathogenesis. To obtain a systematic overview of these genes, which may serve as therapeutic targets, CHDI Foundation has recently established the HD Research Crossroads database. With currently over 800 cataloged genes, this web-based resource constitutes the most extensive curation of genes relevant to HD. It provides us with an unprecedented opportunity to survey molecular mechanisms involved in HD in a holistic manner. Methods To gain a synoptic view of therapeutic targets for HD, we have carried out a variety of bioinformatical and statistical analyses to scrutinize the functional association of genes curated in the HD Research Crossroads database. In particular, enrichment analyses were performed with respect to Gene Ontology categories, KEGG signaling pathways, and Pfam protein families. For selected processes, we also analyzed differential expression, using published microarray data. Additionally, we generated a candidate set of novel genetic modifiers of HD by combining information from the HD Research Crossroads database with previous genome-wide linkage studies. Results Our analyses led to a comprehensive identification of molecular mechanisms associated with HD. Remarkably, we not only recovered processes and pathways, which have frequently been linked to HD (such as cytotoxicity, apoptosis, and calcium signaling), but also found strong indications for other potentially disease-relevant mechanisms that have been less intensively studied in the context of HD (such as the cell cycle and RNA splicing, as well as Wnt and ErbB signaling). 
For follow-up studies, we provide a regularly updated compendium of molecular mechanisms that are associated with HD at http://hdtt.sysbiolab.eu. Additionally, we derived a candidate set of 24 novel genetic modifiers, including histone deacetylase 3 (HDAC3), metabotropic glutamate receptor 1 (GRM1), CDK5 regulatory subunit 2 (CDK5R2), and coactivator 1β of the peroxisome proliferator-activated receptor gamma (PPARGC1B). Conclusions The results of our study give us an intriguing picture of the molecular complexity of HD. Our analyses can be seen as a first step towards a comprehensive list of biological processes, molecular functions, and pathways involved in HD, and may provide a basis for the development of more holistic disease models and new therapeutics. PMID:22741533
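    The enrichment analyses mentioned above (Gene Ontology categories, KEGG pathways, Pfam families) are commonly based on a one-sided hypergeometric test, although the abstract does not say which statistic was used. The sketch below shows that test under this assumption; the function name and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background
    annotated:  background genes carrying the category (e.g. a GO term)
    sample:     genes in the list under study
    hits:       genes in the list that carry the category
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 annotated, a list of 5 genes, all 5 hit.
print(enrichment_p_value(10, 5, 5, 5))  # 1/252
```

    When many categories are tested, the resulting p-values would also need a multiple-testing correction.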

  18. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets

    PubMed Central

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas

    2018-01-01

    Abstract ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. PMID:29149270

  19. Genome-Wide Identification of Differentially Expressed Genes Associated with the High Yielding of Oleoresin in Secondary Xylem of Masson Pine (Pinus massoniana Lamb) by Transcriptomic Analysis

    PubMed Central

    Liu, Qinghua; Zhou, Zhichun; Wei, Yongcheng; Shen, Danyu; Feng, Zhongping; Hong, Shanping

    2015-01-01

    Masson pine is an important timber and resource for oleoresin in South China. Increasing yield of oleoresin in stems can raise economic benefits and enhance the resistance to bark beetles. However, the genetic mechanisms for regulating the yield of oleoresin were still unknown. Here, high-throughput sequencing technology was used to investigate the transcriptome and compare the gene expression profiles of high and low oleoresin-yielding genotypes. A total of 40,690,540 reads were obtained and assembled into 137,499 transcripts from the secondary xylem tissues. We identified 84,842 candidate unigenes based on sequence annotation using various databases and 96 unigenes were candidates for terpenoid backbone biosynthesis in pine. By comparing the expression profiles of high and low oleoresin-yielding genotypes, 649 differentially expressed genes (DEGs) were identified. GO enrichment analysis of DEGs revealed that multiple pathways were related to high yield of oleoresin. Nine candidate genes were validated by QPCR analysis. Among them, the candidate genes encoding geranylgeranyl diphosphate synthase (GGPS) and (-)-alpha/beta-pinene synthase were up-regulated in the high oleoresin-yielding genotype, while tricyclene synthase revealed lower expression level, which was in good agreement with the GC/MS result. In addition, DEG encoding ABC transporters, pathogenesis-related proteins (PR5 and PR9), phosphomethylpyrimidine synthase, non-specific lipid-transfer protein-like protein and ethylene responsive transcription factors (ERFs) were also confirmed to be critical for the biosynthesis of oleoresin. The next-generation sequencing strategy used in this study has proven to be a powerful means for analyzing transcriptome variation related to the yield of oleoresin in masson pine. 
The candidate genes encoding GGPS, (-)-alpha/beta-pinene, tricyclene synthase, ABC transporters, non-specific lipid-transfer protein-like protein, phosphomethylpyrimidine synthase, ERFs and pathogen responses may play important roles in regulating the yield of oleoresin. These DEGs are worthy of special attention in future studies. PMID:26167875

  20. Transcriptome analysis reveals candidate genes involved in luciferin metabolism in Luciola aquatilis (Coleoptera: Lampyridae)

    PubMed Central

    Vongsangnak, Wanwipa; Chumnanpuen, Pramote

    2016-01-01

    Bioluminescence, in which living organisms such as fireflies emit light, has been studied extensively for over half a century. This intriguing reaction, having its origins in nature where glowing insects can signal things such as attraction or defense, is now widely used in biotechnology with applications of bioluminescence and chemiluminescence. Luciferase, a key enzyme in this reaction, has been well characterized; however, the enzymes involved in the biosynthetic pathway of its substrate, luciferin, remain unidentified at present. To elucidate the luciferin metabolism, we performed a de novo transcriptome analysis using larvae of the firefly species, Luciola aquatilis. Here, a comparative analysis is performed with the model coleopteran insect Tribolium castaneum to elucidate the metabolic pathways in L. aquatilis. Based on a template luciferin biosynthetic pathway, combined with a range of protein and pathway databases, and various prediction tools for functional annotation, the candidate genes, enzymes, and biochemical reactions involved in luciferin metabolism are proposed for L. aquatilis. The candidate gene expression is validated in the adult L. aquatilis using reverse transcription PCR (RT-PCR). This study provides useful information on the bio-production of luciferin in the firefly and will benefit future applications of the valuable firefly bioluminescence system. PMID:27761329

  1. VTCdb: a gene co-expression database for the crop species Vitis vinifera (grapevine).

    PubMed

    Wong, Darren C J; Sweetman, Crystal; Drew, Damian P; Ford, Christopher M

    2013-12-16

    Gene expression datasets in model plants such as Arabidopsis have contributed to our understanding of gene function and how a single underlying biological process can be governed by a diverse network of genes. The accumulation of publicly available microarray data encompassing a wide range of biological and environmental conditions has enabled the development of additional capabilities including gene co-expression analysis (GCA). GCA is based on the understanding that genes encoding proteins involved in similar and/or related biological processes may exhibit comparable expression patterns over a range of experimental conditions, developmental stages and tissues. We present an open access database for the investigation of gene co-expression networks within the cultivated grapevine, Vitis vinifera. The new gene co-expression database, VTCdb (http://vtcdb.adelaide.edu.au/Home.aspx), offers an online platform for transcriptional regulatory inference in the cultivated grapevine. Using condition-independent and condition-dependent approaches, grapevine co-expression networks were constructed using the latest publicly available microarray datasets from diverse experimental series, utilising the Affymetrix Vitis vinifera GeneChip (16 K) and the NimbleGen Grape Whole-genome microarray chip (29 K), thus making it possible to profile approximately 29,000 genes (95% of the predicted grapevine transcriptome). Applications available with the online platform include the use of gene names, probesets, modules or biological processes to query the co-expression networks, with the option to choose between Affymetrix or Nimblegen datasets and between multiple co-expression measures. Alternatively, the user can browse existing network modules using interactive network visualisation and analysis via CytoscapeWeb. 
To demonstrate the utility of the database, we present examples from three fundamental biological processes (berry development, photosynthesis and flavonoid biosynthesis) whereby the recovered sub-networks reconfirm established plant gene functions and also identify novel associations. Together, we present valuable insights into grapevine transcriptional regulation by developing network models applicable to researchers in their prioritisation of gene candidates, for on-going study of biological processes related to grapevine development, metabolism and stress responses.
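    Gene co-expression analysis, as described above, rests on the idea that functionally related genes show comparable expression patterns across conditions. A minimal sketch using the Pearson correlation coefficient follows. The abstract mentions multiple co-expression measures without naming them, so the choice of Pearson, the 0.9 threshold, and the gene names and values are all illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if r >= threshold:
                pairs.append((g1, g2, r))
    return pairs


# Hypothetical expression values across four conditions.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(coexpressed_pairs(toy_profiles))
```

    Here only geneA and geneB pass the threshold; geneC is perfectly anti-correlated with both and would be excluded unless absolute correlation were used instead.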

  2. Kassiopeia: a database and web application for the analysis of mutually exclusive exomes of eukaryotes

    PubMed Central

    2014-01-01

    Background Alternative splicing is an important process in higher eukaryotes that allows obtaining several transcripts from one gene. A specific case of alternative splicing is mutually exclusive splicing, in which exactly one exon out of a cluster of neighbouring exons is spliced into the mature transcript. Recently, a new algorithm for the prediction of these exons has been developed based on the preconditions that the exons of the cluster have similar lengths, sequence homology, and conserved splice sites, and that they are translated in the same reading frame. Description In this contribution we introduce Kassiopeia, a database and web application for the generation, storage, and presentation of genome-wide analyses of mutually exclusive exomes. Currently, Kassiopeia provides access to the mutually exclusive exomes of twelve Drosophila species, the thale cress Arabidopsis thaliana, the nematode Caenorhabditis elegans, and human. Mutually exclusive spliced exons (MXEs) were predicted based on gene reconstructions from Scipio. Based on the standard prediction values, with which 83.5% of the annotated MXEs of Drosophila melanogaster were reconstructed, the exomes contain surprisingly more MXEs than previously supposed and identified. The user can search Kassiopeia using BLAST or browse the genes of each species optionally adjusting the parameters used for the prediction to reveal more divergent or only very similar exon candidates. Conclusions We developed a pipeline to predict MXEs in the genomes of several model organisms and a web interface, Kassiopeia, for their visualization. For each gene Kassiopeia provides a comprehensive gene structure scheme, the sequences and predicted secondary structures of the MXEs, and, if available, further evidence for MXE candidates from cDNA/EST data, predictions of MXEs in homologous genes of closely related species, and RNA secondary structure predictions. 
Kassiopeia can be accessed at http://www.motorprotein.de/kassiopeia. PMID:24507667
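    The preconditions listed in the abstract (similar exon lengths, sequence homology, same reading frame) can be illustrated with a simple pairwise filter. The sketch below is a rough approximation under stated assumptions: reading-frame compatibility is reduced to a length difference divisible by three, homology to ungapped positional identity, and splice-site conservation is not checked at all. The thresholds and function names are hypothetical and are not the parameters of the published algorithm.

```python
def same_frame(len_a, len_b):
    """Exons of a cluster keep the reading frame if lengths differ by a multiple of 3."""
    return (len_a - len_b) % 3 == 0


def ungapped_identity(seq_a, seq_b):
    """Fraction of identical positions over the shorter sequence (no alignment)."""
    n = min(len(seq_a), len(seq_b))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(seq_a, seq_b)) / n


def is_mxe_candidate(exon_a, exon_b, max_len_diff=20, min_identity=0.15):
    """Apply the three simplified preconditions to a pair of neighbouring exons.

    max_len_diff and min_identity are made-up defaults for illustration only.
    """
    if abs(len(exon_a) - len(exon_b)) > max_len_diff:
        return False
    if not same_frame(len(exon_a), len(exon_b)):
        return False
    return ungapped_identity(exon_a, exon_b) >= min_identity


# Hypothetical toy exons: lengths 9 and 12 (frame-compatible), 8 of 9 positions equal.
print(is_mxe_candidate("ATGGCCAAA", "ATGGCTAAAGGG"))  # True
print(is_mxe_candidate("ATGGCCAAA", "ATGGCCAAAG"))    # False: frame shifts by one
```

    Loosening or tightening the two thresholds mirrors the option described above of revealing more divergent or only very similar exon candidates.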

  3. Retroviral insertions in the VISION database identify molecular pathways in mouse lymphoid leukemia and lymphoma

    PubMed Central

    Weiser, Keith C.; Liu, Bin; Hansen, Gwenn M.; Skapura, Darlene; Hentges, Kathryn E.; Yarlagadda, Sujatha; Morse III, Herbert C.

    2007-01-01

    AKXD recombinant inbred (RI) strains develop a variety of leukemias and lymphomas due to somatically acquired insertions of retroviral DNA into the genome of hematopoietic cells that can mutate cellular proto-oncogenes and tumor suppressor genes. We generated a new set of tumors from nine AKXD RI strains selected for their propensity to develop B-cell tumors, the most common type of human hematopoietic cancers. We employed a PCR technique called viral insertion site amplification (VISA) to rapidly isolate genomic sequence at the site of provirus insertion. Here we describe 550 VISA sequence tags (VSTs) that identify 74 common insertion sites (CISs), of which 21 have not been identified previously. Several suspected proto-oncogenes and tumor suppressor genes lie near CISs, providing supportive evidence for their roles in cancer. Furthermore, numerous previously uncharacterized genes lie near CISs, providing a pool of candidate disease genes for future research. Pathway analysis of candidate genes identified several signaling pathways as common and powerful routes to blood cancer, including Notch, E-protein, NFκB, and Ras signaling. Misregulation of several Notch signaling genes was confirmed by quantitative RT-PCR. Our data suggest that analyses of insertional mutagenesis on a single genetic background are biased toward the identification of cooperating mutations. This tumor collection represents the most comprehensive study of the genetics of B-cell leukemia and lymphoma development in mice. We have deposited the VST sequences, CISs in a genome viewer, histopathology, and molecular tumor typing data in a public web database called VISION (Viral Insertion Sites Identifying Oncogenes), which is located at http://www.mouse-genome.bcm.tmc.edu/vision. PMID:17926094

  4. Retroviral insertions in the VISION database identify molecular pathways in mouse lymphoid leukemia and lymphoma.

    PubMed

    Weiser, Keith C; Liu, Bin; Hansen, Gwenn M; Skapura, Darlene; Hentges, Kathryn E; Yarlagadda, Sujatha; Morse III, Herbert C; Justice, Monica J

    2007-10-01

    AKXD recombinant inbred (RI) strains develop a variety of leukemias and lymphomas due to somatically acquired insertions of retroviral DNA into the genome of hematopoietic cells that can mutate cellular proto-oncogenes and tumor suppressor genes. We generated a new set of tumors from nine AKXD RI strains selected for their propensity to develop B-cell tumors, the most common type of human hematopoietic cancers. We employed a PCR technique called viral insertion site amplification (VISA) to rapidly isolate genomic sequence at the site of provirus insertion. Here we describe 550 VISA sequence tags (VSTs) that identify 74 common insertion sites (CISs), of which 21 have not been identified previously. Several suspected proto-oncogenes and tumor suppressor genes lie near CISs, providing supportive evidence for their roles in cancer. Furthermore, numerous previously uncharacterized genes lie near CISs, providing a pool of candidate disease genes for future research. Pathway analysis of candidate genes identified several signaling pathways as common and powerful routes to blood cancer, including Notch, E-protein, NFkappaB, and Ras signaling. Misregulation of several Notch signaling genes was confirmed by quantitative RT-PCR. Our data suggest that analyses of insertional mutagenesis on a single genetic background are biased toward the identification of cooperating mutations. This tumor collection represents the most comprehensive study of the genetics of B-cell leukemia and lymphoma development in mice. We have deposited the VST sequences, CISs in a genome viewer, histopathology, and molecular tumor typing data in a public web database called VISION (Viral Insertion Sites Identifying Oncogenes), which is located at http://www.mouse-genome.bcm.tmc.edu/vision.

  5. Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data

    PubMed Central

    2013-01-01

    Background Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. Results In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. Conclusions A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression. PMID:23369200

  6. Association study of IL10, IL1beta, and IL1RN and schizophrenia using tag SNPs from a comprehensive database: suggestive association with rs16944 at IL1beta.

    PubMed

    Shirts, Brian H; Wood, Joel; Yolken, Robert H; Nimgaonkar, Vishwajit L

    2006-12-01

    Genetic association studies of several candidate cytokine genes have been motivated by evidence of immune dysfunction among patients with schizophrenia. Intriguing but inconsistent associations have been reported with polymorphisms of three positional candidate genes, namely IL1beta, IL1RN, and IL10. We used comprehensive sequencing data from the Seattle SNPs database to select tag SNPs that represent all common polymorphisms in the Caucasian population at these loci. Associations with 28 tag SNPs were evaluated in 478 cases and 501 unscreened control individuals, while accounting for population sub-structure using the genomic control method. The samples were also stratified by gender, diagnostic category, and exposure to infectious agents. Significant association was not detected after correcting for multiple comparisons. However, meta-analysis of our data combined with previously published association studies of rs16944 (IL1beta -511) suggests that the C allele confers modest risk for schizophrenia among individuals reporting Caucasian ancestry, but not Asians (Caucasians, n=819 cases, 1292 controls; p=0.0013, OR=1.24, 95% CI 1.09, 1.41).
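
    As a rough consistency check on the meta-analysis figures quoted above, the standard error of the log odds ratio can be recovered from the reported 95% CI and converted into a Wald z-score and two-sided p-value. This is a sketch under a normal approximation, not the authors' meta-analysis procedure; the function name `wald_from_ci` is hypothetical, and because the inputs are rounded the result should be expected to agree with the reported p-value only in order of magnitude.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover SE(log OR), the Wald z-score and a two-sided p-value
    from an odds ratio and its 95% confidence interval."""
    # The CI is symmetric on the log scale: log(OR) +/- z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values as reported in the abstract (OR=1.24, 95% CI 1.09, 1.41).
se, z, p = wald_from_ci(1.24, 1.09, 1.41)
print(f"SE(log OR)={se:.4f}, z={z:.2f}, two-sided p={p:.4f}")
```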

  7. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

    DOE PAGES

    Jeffryes, James G.; Colastani, Ricardo L.; Elbadawi-Sidhu, Mona; ...

    2015-08-28

    Metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. 
MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.

  8. Identification of Regulatory Genes Implicated in Continuous Flowering of Longan (Dimocarpus longan L.)

    PubMed Central

    Jia, Tianqi; Wei, Danfeng; Meng, Shan; Allan, Andrew C.; Zeng, Lihui

    2014-01-01

    Longan (Dimocarpus longan L.) is a tropical/subtropical fruit tree of significant economic importance in Southeast Asia. However, a lack of transcriptomic and genomic information hinders research on longan traits, such as the control of flowering. In this study, high-throughput RNA sequencing (RNA-Seq) was used to investigate differentially expressed genes between a unique longan cultivar ‘Sijimi’ (S), which flowers throughout the year, and a more typical cultivar ‘Lidongben’ (L), which flowers only once in the season, with the aim of identifying candidate genes associated with continuous flowering. 36,527 and 40,982 unigenes were obtained by de novo assembly of the clean reads from cDNA libraries of L and S cultivars. Additionally 40,513 unigenes were assembled from combined reads of these libraries. A total of 32,475 unigenes were annotated by BLAST search to NCBI non-redundant protein (NR), Swiss-Prot, Clusters of Orthologous Groups (COGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Of these, almost fifteen thousand unigenes were identified as significantly differentially expressed genes (DEGs) using the Reads Per kb per Million reads (RPKM) method. A total of 6,415 DEGs were mapped to 128 KEGG pathways, and 8,743 DEGs were assigned to 54 Gene Ontology categories. After blasting the DEGs to public sequence databases, 539 potential flowering-related DEGs were identified. In addition, 107 flowering-time genes were identified in longan; their expression levels in the two longan samples were compared by the RPKM method, and the expression levels of 15 were confirmed by real-time quantitative PCR. Our results suggest longan homologues of SHORT VEGETATIVE PHASE (SVP), GIGANTEA (GI), F-BOX 1 (FKF1) and EARLY FLOWERING 4 (ELF4) may be involved in this flowering trait and that ELF4 may be a key gene. 
The identification of candidate genes related to continuous flowering will provide new insight into the molecular process of regulating flowering time in woody plants. PMID:25479005
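
    The RPKM normalisation mentioned in this abstract can be sketched as follows. This is the standard textbook formula, not code from the study; the function name `rpkm` and the example numbers are hypothetical and are not taken from the longan libraries.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads * 10^9 / (gene length in bp * total mapped reads),
    i.e. the read count scaled by gene length (kb) and library size (millions).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2 kb unigene in a 10-million-read library.
example = rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=10_000_000)
print(f"RPKM = {example:.1f}")
```

    Dividing by both gene length and library size is what makes values comparable between unigenes of different lengths and between the two cultivar libraries.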

  9. Dictionary-driven prokaryotic gene finding.

    PubMed

    Shibuya, Tetsuo; Rigoutsos, Isidore

    2002-06-15

    Gene identification, also known as gene finding or gene recognition, is among the important problems of molecular biology that have been receiving increasing attention with the advent of large scale sequencing projects. Previous strategies for solving this problem can be categorized into essentially two schools of thought: one school employs sequence composition statistics, whereas the other relies on database similarity searches. In this paper, we propose a new gene identification scheme that combines the best characteristics from each of these two schools. In particular, our method determines gene candidates among the ORFs that can be identified in a given DNA strand through the use of the Bio-Dictionary, a database of patterns that covers essentially all of the currently available sample of the natural protein sequence space. Our approach relies entirely on the use of redundant patterns as the agents on which the presence or absence of genes is predicated and does not employ any additional evidence, e.g. ribosome-binding site signals. The Bio-Dictionary Gene Finder (BDGF), the algorithm's implementation, is a single computational engine able to handle the gene identification task across distinct archaeal and bacterial genomes. The engine exhibits performance that is characterized by simultaneous very high values of sensitivity and specificity, and a high percentage of correctly predicted start sites. Using a collection of patterns derived from an old (June 2000) release of the Swiss-Prot/TrEMBL database that contained 451 602 proteins and fragments, we demonstrate our method's generality and capabilities through an extensive analysis of 17 complete archaeal and bacterial genomes. Examples of previously unreported genes are also shown and discussed in detail.
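
    The first step described above, enumerating the ORFs on a given DNA strand, can be sketched as below. This is a generic illustration and not the BDGF implementation: the Bio-Dictionary pattern matching that decides which ORFs are genes is not reproduced, only ATG is treated as a start codon (prokaryotes also use alternatives such as GTG and TTG), only one strand is scanned, and the names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) tuples for ATG...stop ORFs on one strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Within a frame, the first ATG after the previous stop opens the ORF.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and look for the next start
    return orfs


# Toy sequence: one ORF (ATG AAA TTT TAG) in reading frame 2.
toy_orfs = find_orfs("CCATGAAATTTTAGGG")
print(toy_orfs)
```

    In a dictionary-driven scheme, each ORF returned here would then be scored against the pattern collection rather than with composition statistics.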

  10. SSER: Species specific essential reactions database.

    PubMed

    Labena, Abraham A; Ye, Yuan-Nong; Dong, Chuan; Zhang, Fa-Z; Guo, Feng-Biao

    2017-04-19

    Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. Especially if a single reaction is catalyzed by multiple enzymes, then inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding gene. The existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions. But none of these databases especially focus on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms which are identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, number of the essential reactions associated with their respective enzyme-encoding genes and shared essential reactions across organisms are the main contents of the database. SSER would be a prime source to obtain essential reactions data and related gene and metabolite information and it can significantly facilitate the metabolic network models reconstruction and analysis, and drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser .

  11. Transcriptome Sequencing of Codonopsis pilosula and Identification of Candidate Genes Involved in Polysaccharide Biosynthesis

    PubMed Central

    Gao, Jian Ping; Wang, Dong; Cao, Ling Ya; Sun, Hai Feng

    2015-01-01

    Background Codonopsis pilosula (Franch.) Nannf. is one of the most widely used medicinal plants. Although chemical and pharmacological studies have shown that codonopsis polysaccharides (CPPs) are bioactive compounds and that their composition is variable, their biosynthetic pathways remain largely unknown. Next-generation sequencing is an efficient and high-throughput technique that allows the identification of candidate genes involved in secondary metabolism. Principal Findings To identify the components involved in CPP biosynthesis, a transcriptome library, prepared using root and other tissues, was assembled with the help of Illumina sequencing. A total of 9.2 Gb of clean nucleotides was obtained comprising 91,175,044 clean reads, 102,125 contigs, and 45,511 unigenes. After aligning the sequences to the public protein databases, 76.1% of the unigenes were annotated. Among these annotated unigenes, 26,189 were assigned to Gene Ontology categories, 11,415 to Clusters of Orthologous Groups, and 18,848 to Kyoto Encyclopedia of Genes and Genomes pathways. Analysis of abundance of transcripts in the library showed that genes, including those encoding metallothionein, aquaporin, and cysteine protease that are related to stress responses, were in the top list. Among genes involved in the biosynthesis of CPP, those responsible for the synthesis of UDP-L-arabinose and UDP-xylose were highly expressed. Significance To our knowledge, this is the first study to provide a public transcriptome dataset prepared from C. pilosula and an outline of the biosynthetic pathway of polysaccharides in a medicinal plant. Identified candidate genes involved in CPP biosynthesis provide understanding of the biosynthesis and regulation of CPP at the molecular level. PMID:25719364

  12. De novo characterization of the Chinese fir (Cunninghamia lanceolata) transcriptome and analysis of candidate genes involved in cellulose and lignin biosynthesis

    PubMed Central

    2012-01-01

    Background Chinese fir (Cunninghamia lanceolata) is an important timber species that accounts for 20–30% of the total commercial timber production in China. However, the available genomic information of Chinese fir is limited, and this severely encumbers functional genomic analysis and molecular breeding in Chinese fir. Recently, major advances in transcriptome sequencing have provided fast and cost-effective approaches to generate large expression datasets that have proven to be powerful tools to profile the transcriptomes of non-model organisms with undetermined genomes. Results In this study, the transcriptomes of nine tissues from Chinese fir were analyzed using the Illumina HiSeq™ 2000 sequencing platform. Approximately 40 million paired-end reads were obtained, generating 3.62 gigabase pairs of sequencing data. These reads were assembled into 83,248 unique sequences (i.e. Unigenes) with an average length of 449 bp, amounting to 37.40 Mb. A total of 73,779 Unigenes were supported by more than 5 reads, 42,663 (57.83%) had homologs in the NCBI non-redundant and Swiss-Prot protein databases, corresponding to 27,224 unique protein entries. Of these Unigenes, 16,750 were assigned to Gene Ontology classes, and 14,877 were clustered into orthologous groups. A total of 21,689 (29.40%) were mapped to 119 pathways by BLAST comparison against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The majority of the genes encoding the enzymes in the biosynthetic pathways of cellulose and lignin were identified in the Unigene dataset by targeted searches of their annotations. A number of candidate Chinese fir genes in the two metabolic pathways were discovered for the first time. Eighteen genes related to cellulose and lignin biosynthesis were cloned for experimental validation of the transcriptome data. Overall, 49 Unigenes, covering different regions of these selected genes, were found by alignment. 
Their expression patterns in different tissues were analyzed by qRT-PCR to explore their putative functions. Conclusions A substantial fraction of transcript sequences was obtained from the deep sequencing of Chinese fir. The assembled Unigene dataset was used to discover candidate genes of cellulose and lignin biosynthesis. This transcriptome dataset will provide a comprehensive sequence resource for molecular genetics research of C. lanceolata. PMID:23171398

  13. MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics.

    PubMed

    Jeffryes, James G; Colastani, Ricardo L; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D; Broadbelt, Linda J; Hanson, Andrew D; Fiehn, Oliver; Tyo, Keith E J; Henry, Christopher S

    2015-01-01

    In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. 
MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures. Graphical abstract: MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.

  14. HIVsirDB: a database of HIV inhibiting siRNAs.

    PubMed

    Tyagi, Atul; Ahmed, Firoz; Thakur, Nishant; Sharma, Arun; Raghava, Gajendra P S; Kumar, Manoj

    2011-01-01

    Human immunodeficiency virus (HIV) is responsible for millions of deaths every year. The current treatment involves the use of multiple antiretroviral agents that may harm patients due to their toxic nature. RNA interference (RNAi), a potent candidate for the future treatment of HIV, uses short interfering RNA (siRNA/shRNA) to silence HIV genes. In this study, we describe HIVsirDB, a database of siRNAs responsible for silencing HIV genes. HIVsirDB is a manually curated database of HIV inhibiting siRNAs that provides comprehensive information about each siRNA or shRNA. Information was collected and compiled from literature and public resources. The database contains around 750 siRNAs, including 75 partially complementary siRNAs that differ from their target sites by one or more bases, and over 100 escape mutant sequences. The HIVsirDB structure contains sixteen fields including siRNA sequence, HIV strain, targeted genome region, efficacy and conservation of target sequences. To assist users, several tools have been integrated into the database: (i) siRNAmap for mapping siRNAs onto a target sequence, (ii) HIVsirblast for BLAST searches against the database, and (iii) siRNAalign for aligning siRNAs. HIVsirDB is a freely accessible database of siRNAs which can silence or degrade HIV genes. It covers 26 types of HIV strains and 28 cell types. This database will be very useful for developing models for predicting the efficacy of HIV inhibiting siRNAs. In summary, this is a useful resource for researchers working in the field of siRNA based HIV therapy. The HIVsirDB database is accessible at http://crdd.osdd.net/raghava/hivsir/.

  15. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    PubMed Central

    Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-01-01

    We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p=6×10⁻⁴). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302

  16. Human genetic factors in tuberculosis: an update.

    PubMed

    van Tong, Hoang; Velavan, Thirumalaisamy P; Thye, Thorsten; Meyer, Christian G

    2017-09-01

    Tuberculosis (TB) is a major threat to human health, especially in many developing countries. Human genetic variability has been recognised to be of great relevance in host responses to Mycobacterium tuberculosis infection and in regulating both the establishment and the progression of the disease. An increasing number of candidate gene and genome-wide association studies (GWAS) have focused on human genetic factors contributing to susceptibility or resistance to TB. To update previous reviews on human genetic factors in TB we searched the MEDLINE database and PubMed for articles from 1 January 2014 through 31 March 2017 and reviewed the role of human genetic variability in TB. Search terms applied in various combinations were 'tuberculosis', 'human genetics', 'candidate gene studies', 'genome-wide association studies' and 'Mycobacterium tuberculosis'. Articles in English retrieved and relevant references cited in these articles were reviewed. Abstracts and reports from meetings were also included. This review provides a recent summary of associations of polymorphisms of human genes with susceptibility/resistance to TB. © 2017 John Wiley & Sons Ltd.

  17. Advances in QTL Mapping in Pigs

    PubMed Central

    Rothschild, Max F.; Hu, Zhi-liang; Jiang, Zhihua

    2007-01-01

    Over the past 15 years advances in the porcine genetic linkage map and discovery of useful candidate genes have led to valuable gene and trait information being discovered. Early use of exotic breed crosses and now commercial breed crosses for quantitative trait loci (QTL) scans and candidate gene analyses have led to 110 publications which have identified 1,675 QTL. Additionally, these studies continue to identify genes associated with economically important traits such as growth rate, leanness, feed intake, meat quality, litter size, and disease resistance. A well developed QTL database called PigQTLdb is now a valuable tool for summarizing and pinpointing in silico regions of interest to researchers. The commercial pig industry is actively incorporating these markers in marker-assisted selection along with traditional performance information to improve traits of economic performance. The long awaited sequencing efforts are also now beginning to make sequence available for both comparative genomics and large scale single nucleotide polymorphism (SNP) association studies. While these advances are all positive, development of useful new trait families and measurement of new or underlying traits still limits future discoveries. A review of these developments is presented. PMID:17384738

  18. Novel strategies to mine alcoholism-related haplotypes and genes by combining existing knowledge framework.

    PubMed

    Zhang, RuiJie; Li, Xia; Jiang, YongShuai; Liu, GuiYou; Li, ChuanXing; Zhang, Fan; Xiao, Yun; Gong, BinSheng

    2009-02-01

    High-throughput single nucleotide polymorphism (SNP) detection technology and existing knowledge provide strong support for mining disease-related haplotypes and genes. In this study, we first apply four haplotype block identification methods (Confidence Intervals, Four Gamete Tests, Solid Spine of LD and fusing method of haplotype block) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select the alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and gene databases to locate the blocks and identify the candidate genes. Finally, we annotate gene function using the KEGG, Biocarta, and GO databases. We find 159 haplotype blocks most likely related to alcoholism on chromosomes 1 to 22, including 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We obtain 121 alcoholism-related genes and verify their reliability by functional annotation. In short, by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework, we can both handle SNP data easily and locate disease-related genes precisely.
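
    Of the block-identification methods named above, the four-gamete test is the simplest to illustrate: for two biallelic SNPs, observing all four allele combinations among the haplotypes implies a historical recombination (or recurrent mutation) between them, so a block is not extended across such a pair. The sketch below is a simple greedy variant for illustration only, not the authors' implementation; haplotypes are assumed to be phased, non-empty strings of '0'/'1', and both function names are hypothetical.

```python
def has_four_gametes(haplotypes, i, j):
    """True if SNPs i and j show all four gametes (00, 01, 10, 11)."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def four_gamete_blocks(haplotypes):
    """Greedily split SNP indices into consecutive runs in which
    no pair of SNPs shows all four gametes."""
    n_snps = len(haplotypes[0])
    blocks, current = [], [0]
    for j in range(1, n_snps):
        if any(has_four_gametes(haplotypes, i, j) for i in current):
            blocks.append(current)  # evidence of recombination: close the block
            current = [j]
        else:
            current.append(j)
    blocks.append(current)
    return blocks


# Toy phased haplotypes (hypothetical data): SNPs 0-1 and SNPs 2-3 are each
# perfectly correlated, but all four gametes occur between the two pairs.
toy_haplotypes = ["0000", "0011", "1100", "1111"]
toy_blocks = four_gamete_blocks(toy_haplotypes)
print(toy_blocks)
```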

  19. Dissecting Vancomycin-Intermediate Resistance in Staphylococcus aureus Using Genome-Wide Association

    PubMed Central

    Alam, Md Tauqeer; Petit, Robert A.; Crispell, Emily K.; Thornton, Timothy A.; Conneely, Karen N.; Jiang, Yunxuan; Satola, Sarah W.; Read, Timothy D.

    2014-01-01

    Vancomycin-intermediate Staphylococcus aureus (VISA) is currently defined as having minimal inhibitory concentration (MIC) of 4–8 µg/ml. VISA evolves through changes in multiple genetic loci with at least 16 candidate genes identified in clinical and in vitro-selected VISA strains. We report a whole-genome comparative analysis of 49 vancomycin-sensitive S. aureus and 26 VISA strains. Resistance to vancomycin was determined by broth microdilution, Etest, and population analysis profile-area under the curve (PAP-AUC). Genome-wide association studies (GWAS) of 55,977 single-nucleotide polymorphisms identified in one or more strains found one highly significant association (P = 8.78E-08) between a nonsynonymous mutation at codon 481 (H481) of the rpoB gene and increased vancomycin MIC. Additionally, we used a database of public S. aureus genome sequences to identify rare mutations in candidate genes associated with VISA. On the basis of these data, we proposed a preliminary model called ECM+RMCG for the VISA phenotype as a benchmark for future efforts. The model predicted VISA based on the presence of a rare mutation in a set of candidate genes (walKR, vraSR, graSR, and agrA) and/or three previously experimentally verified mutations (including the rpoB H481 locus) with an accuracy of 81% and a sensitivity of 73%. Further, the level of resistance measured by both Etest and PAP-AUC regressed positively with the number of mutations present in a strain. This study demonstrated 1) the power of GWAS for identifying common genetic variants associated with antibiotic resistance in bacteria and 2) that rare mutations in candidate genes, identified using large genomic data sets, can also be associated with resistance phenotypes. PMID:24787619

  20. The prescribable drugs with efficacy in experimental epilepsies (PDE3) database for drug repurposing research in epilepsy.

    PubMed

    Sivapalarajah, Shayeeshan; Krishnakumar, Mathangi; Bickerstaffe, Harry; Chan, YikYing; Clarkson, Joseph; Hampden-Martin, Alistair; Mirza, Ahmad; Tanti, Matthew; Marson, Anthony; Pirmohamed, Munir; Mirza, Nasir

    2018-02-01

    Current antiepileptic drugs (AEDs) have several shortcomings. For example, they fail to control seizures in 30% of patients. Hence, there is a need to identify new AEDs. Drug repurposing is the discovery of new indications for approved drugs. This drug "recycling" offers the potential of significant savings in the time and cost of drug development. Many drugs licensed for other indications exhibit antiepileptic efficacy in animal models. Our aim was to create a database of "prescribable" drugs, approved for other conditions, with published evidence of efficacy in animal models of epilepsy, and to collate data that would assist in choosing the most promising candidates for drug repurposing. The database was created by the following: (1) computational literature-mining using novel software that identifies Medline abstracts containing the name of a prescribable drug, a rodent model of epilepsy, and a phrase indicating seizure reduction; then (2) crowdsourced manual curation of the identified abstracts. The final database includes 173 drugs and 500 abstracts. It is made freely available at www.liverpool.ac.uk/D3RE/PDE3. The database is reliable: 94% of the included drugs have corroborative evidence of efficacy in animal models (for example, evidence from multiple independent studies). The database includes many drugs that are appealing candidates for repurposing, as they are widely accepted by prescribers and patients (the database includes half of the 20 most commonly prescribed drugs in England) and they target many proteins involved in epilepsy but not targeted by current AEDs. It is important to note that the drugs are of potential relevance to human epilepsy: the database is highly enriched with drugs that target proteins of known causal human epilepsy genes (Fisher's exact test P-value < 3 × 10⁻⁵). We present data to help prioritize the most promising candidates for repurposing from the database. The PDE3 database is an important new resource for drug repurposing research in epilepsy. Wiley Periodicals, Inc. © 2018 International League Against Epilepsy.
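
    The first, computational step (keeping abstracts that mention a drug, a rodent model, and a seizure-reduction phrase) can be sketched as a simple co-occurrence filter. The term lists and the regular expression below are illustrative placeholders, not the vocabularies or software used to build PDE3.

```python
import re

# Placeholder vocabularies (assumptions for illustration only).
DRUGS = {"metformin", "aspirin"}
MODELS = {"pentylenetetrazole", "kindling", "pilocarpine"}
# A reduction verb followed, within the same sentence, by the word "seizure".
EFFECT = re.compile(r"\b(reduc\w*|suppress\w*|attenuat\w*)\b[^.]*\bseizure", re.IGNORECASE)


def is_candidate_abstract(abstract):
    """True if the abstract names a drug, an epilepsy model, and a seizure-reduction phrase."""
    words = set(re.findall(r"[a-z0-9-]+", abstract.lower()))
    has_drug = bool(DRUGS & words)
    has_model = bool(MODELS & words)
    has_effect = EFFECT.search(abstract) is not None
    return has_drug and has_model and has_effect
```

    A filter of this kind favors recall over precision, which is consistent with the abstract's second step of manual curation of the identified abstracts.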

  1. Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.

    PubMed

    Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K

    2011-01-01

    Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.

  2. Transcriptomic analysis of the mussel Elliptio complanata identifies candidate stress-response genes and an abundance of novel or noncoding transcripts

    USGS Publications Warehouse

    Cornman, Robert S.; Robertson, Laura S.; Galbraith, Heather S.; Blakeslee, Carrie J.

    2014-01-01

    Mussels are useful indicator species of environmental stress and degradation, and the global decline in freshwater mussel diversity and abundance is of conservation concern. Elliptio complanata is a common freshwater mussel of eastern North America that can serve both as an indicator and as an experimental model for understanding mussel physiology and genetics. To support genetic components of these research goals, we assembled transcriptome contigs from Illumina paired-end reads. Despite efforts to collapse similar contigs, the final assembly was in excess of 136,000 contigs with an N50 of 982 bp. Even so, comparisons to the CEGMA database of conserved eukaryotic genes indicated that ∼20% of genes remain unrepresented. However, numerous candidate stress-response genes were present, and we identified lineage-specific patterns of diversification among molluscs for cytochrome P450 detoxification genes and two saccharide-modifying enzymes: 1,3 beta-galactosyltransferase and fucosyltransferase. Less than a quarter of contigs had protein-level similarity based on modest BLAST and Hmmer3 statistical thresholds. These results add comparative genomic resources for molluscs and suggest a wealth of novel proteins and noncoding transcripts.
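
    The N50 statistic quoted for the assembly is the contig length at which the contigs of that length or longer together cover at least half of the total assembled bases. A minimal sketch, with a hypothetical function name and input format:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the assembly is the N50 contig.
        if running >= half_total:
            return length
    raise ValueError("no contigs given")
```

    For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because the three longest contigs (10 + 9 + 8 = 27) reach half of the 54 total bases.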

  3. Partial genome assembly for a candidate division OP11 single cell from an anoxic spring (Zodletone Spring, Oklahoma).

    PubMed

    Youssef, Noha H; Blainey, Paul C; Quake, Stephen R; Elshahed, Mostafa S

    2011-11-01

    Members of candidate division OP11 are widely distributed in terrestrial and marine ecosystems, yet little information regarding their metabolic capabilities and ecological role within such habitats is currently available. Here, we report on the microfluidic isolation, multiple-displacement-amplification, pyrosequencing, and genomic analysis of a single cell (ZG1) belonging to candidate division OP11. Genome analysis of the ∼270-kb partial genome assembly obtained showed that it had no particular similarity to a specific phylum. Four hundred twenty-three open reading frames were identified, 46% of which had no function prediction. In-depth analysis revealed a heterotrophic lifestyle, with genes encoding endoglucanase, amylopullulanase, and laccase enzymes, suggesting a capacity for utilization of cellulose, starch, and, potentially, lignin, respectively. Genes encoding several glycolysis enzymes as well as formate utilization were identified, but no evidence for an electron transport chain was found. The presence of genes encoding various components of lipopolysaccharide biosynthesis indicates a Gram-negative bacterial cell wall. The partial genome also provides evidence for antibiotic resistance (β-lactamase, aminoglycoside phosphotransferase), as well as antibiotic production (bacteriocin) and extracellular bactericidal peptidases. Multiple mechanisms for stress response were identified, as were elements of type I and type IV secretion systems. Finally, housekeeping genes identified within the partial genome were used to demonstrate the OP11 affiliation of multiple hitherto unclassified genomic fragments from multiple database-deposited metagenomic data sets. These results provide the first glimpse into the lifestyle of a member of a ubiquitous, yet poorly understood bacterial candidate division.

  4. Human Disease Insight: An integrated knowledge-based platform for disease-gene-drug information.

    PubMed

    Tasleem, Munazzah; Ishrat, Romana; Islam, Asimul; Ahmad, Faizan; Hassan, Md Imtaiyaz

    2016-01-01

    The scope of the Human Disease Insight (HDI) database is not limited to researchers or physicians as it also provides basic information to non-professionals and creates disease awareness, thereby reducing the chances of patient suffering due to ignorance. HDI is a knowledge-based resource providing information on human diseases to both scientists and the general public. Here, our mission is to provide a comprehensive human disease database containing most of the available useful information, with extensive cross-referencing. HDI is a knowledge management system that acts as a central hub to access information about human diseases and associated drugs and genes. In addition, HDI contains well-classified bioinformatics tools with helpful descriptions. These integrated bioinformatics tools enable researchers to annotate disease-specific genes and perform protein analysis, search for biomarkers and identify potential vaccine candidates. Eventually, these tools will facilitate the analysis of disease-associated data. The HDI provides two types of search capabilities and includes provisions for downloading, uploading and searching disease/gene/drug-related information. The logistical design of the HDI allows for regular updating. The database is designed to work best with Mozilla Firefox and Google Chrome and is freely accessible at http://humandiseaseinsight.com. Copyright © 2015 King Saud Bin Abdulaziz University for Health Sciences. Published by Elsevier Ltd. All rights reserved.

  5. An emerging cyberinfrastructure for biodefense pathogen and pathogen-host data.

    PubMed

    Zhang, C; Crasta, O; Cammer, S; Will, R; Kenyon, R; Sullivan, D; Yu, Q; Sun, W; Jha, R; Liu, D; Xue, T; Zhang, Y; Moore, M; McGarvey, P; Huang, H; Chen, Y; Zhang, J; Mazumder, R; Wu, C; Sobral, B

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and their publication are intended to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data include transcriptional profiles, protein profiles, protein structural data and host-pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/.

  6. De novo comparative transcriptome analysis of genes involved in fruit morphology of pumpkin cultivars with extreme size difference and development of EST-SSR markers.

    PubMed

    Xanthopoulou, Aliki; Ganopoulos, Ioannis; Psomopoulos, Fotis; Manioudaki, Maria; Moysiadis, Theodoros; Kapazoglou, Aliki; Osathanunkul, Maslin; Michailidou, Sofia; Kalivas, Apostolos; Tsaftaris, Athanasios; Nianiou-Obeidat, Irini; Madesis, Panagiotis

    2017-07-30

    The genetic basis of fruit size and shape was investigated for the first time in Cucurbita species and genetic loci associated with fruit morphology have been identified. Although extensive genomic resources are available at present for tomato (Solanum lycopersicum), cucumber (Cucumis sativus), melon (Cucumis melo) and watermelon (Citrullus lanatus), genomic databases for Cucurbita species are limited. Recently, our group reported the generation of pumpkin (Cucurbita pepo) transcriptome databases from two contrasting cultivars with extreme fruit sizes. In the current study we used these databases to perform comparative transcriptome analysis in order to identify genes with potential roles in fruit morphology and fruit size. Differential Gene Expression (DGE) analysis between cv. 'Munchkin' (small-fruit) and cv. 'Big Moose' (large-fruit) revealed a variety of candidate genes associated with fruit morphology with significant differences in gene expression between the two cultivars. In addition, we have set the framework for generating EST-SSR markers, which discriminate different C. pepo cultivars and show transferability to related Cucurbitaceae species. The results of the present study will contribute to both further understanding the molecular mechanisms regulating fruit morphology and furthermore identifying the factors that determine fruit size. Moreover, they may lead to the development of molecular marker tools for selecting genotypes with desired morphological traits. Copyright © 2017. Published by Elsevier B.V.

  7. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets.

    PubMed

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick

    2018-01-04

    ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. De novo sequencing and analysis of the transcriptome of Panax ginseng in the leaf-expansion period.

    PubMed

    Liu, Shichao; Wang, Siming; Liu, Meichen; Yang, Fei; Zhang, Hui; Liu, Shiyang; Wang, Qun; Zhao, Yu

    2016-08-01

    Panax ginseng, a traditional Chinese medicine, is used worldwide for its variety of health benefits and its treatment efficacy. However, it is difficult to cultivate due to its vulnerability to environmental stresses. The present study provided the first report, to the best of our knowledge, of transcriptome analysis of ginseng at the leaf‑expansion stage. Using the Illumina sequencing platform, >40,000,000 high‑quality paired‑end reads were obtained and assembled into 100,533 unique sequences. When the sequences were searched against the publicly available National Center for Biotechnology Information protein database using The Basic Local Alignment Search Tool, 61,599 sequences exhibited similarity to known proteins. Functional annotation and classification, including use of the Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases, revealed that the activated genes in ginseng were predominantly ribonuclease‑like storage genes, environmental stress genes, pathogenesis-related genes and other antioxidant genes. A number of candidate genes in environmental stress‑associated pathways were also identified. These novel data provide useful information on the growth and development stages of ginseng, and serve as an important public information platform for further understanding of the molecular mechanisms and functional genomics of ginseng.

  9. Common variants in Mendelian kidney disease genes and their association with renal function.

    PubMed

    Parsa, Afshin; Fuchsberger, Christian; Köttgen, Anna; O'Seaghdha, Conall M; Pattaro, Cristian; de Andrade, Mariza; Chasman, Daniel I; Teumer, Alexander; Endlich, Karlhans; Olden, Matthias; Chen, Ming-Huei; Tin, Adrienne; Kim, Young J; Taliun, Daniel; Li, Man; Feitosa, Mary; Gorski, Mathias; Yang, Qiong; Hundertmark, Claudia; Foster, Meredith C; Glazer, Nicole; Isaacs, Aaron; Rao, Madhumathi; Smith, Albert V; O'Connell, Jeffrey R; Struchalin, Maksim; Tanaka, Toshiko; Li, Guo; Hwang, Shih-Jen; Atkinson, Elizabeth J; Lohman, Kurt; Cornelis, Marilyn C; Johansson, Asa; Tönjes, Anke; Dehghan, Abbas; Couraki, Vincent; Holliday, Elizabeth G; Sorice, Rossella; Kutalik, Zoltan; Lehtimäki, Terho; Esko, Tõnu; Deshmukh, Harshal; Ulivi, Sheila; Chu, Audrey Y; Murgia, Federico; Trompet, Stella; Imboden, Medea; Kollerits, Barbara; Pistis, Giorgio; Harris, Tamara B; Launer, Lenore J; Aspelund, Thor; Eiriksdottir, Gudny; Mitchell, Braxton D; Boerwinkle, Eric; Schmidt, Helena; Hofer, Edith; Hu, Frank; Demirkan, Ayse; Oostra, Ben A; Turner, Stephen T; Ding, Jingzhong; Andrews, Jeanette S; Freedman, Barry I; Giulianini, Franco; Koenig, Wolfgang; Illig, Thomas; Döring, Angela; Wichmann, H-Erich; Zgaga, Lina; Zemunik, Tatijana; Boban, Mladen; Minelli, Cosetta; Wheeler, Heather E; Igl, Wilmar; Zaboli, Ghazal; Wild, Sarah H; Wright, Alan F; Campbell, Harry; Ellinghaus, David; Nöthlings, Ute; Jacobs, Gunnar; Biffar, Reiner; Ernst, Florian; Homuth, Georg; Kroemer, Heyo K; Nauck, Matthias; Stracke, Sylvia; Völker, Uwe; Völzke, Henry; Kovacs, Peter; Stumvoll, Michael; Mägi, Reedik; Hofman, Albert; Uitterlinden, Andre G; Rivadeneira, Fernando; Aulchenko, Yurii S; Polasek, Ozren; Hastie, Nick; Vitart, Veronique; Helmer, Catherine; Wang, Jie Jin; Stengel, Bénédicte; Ruggiero, Daniela; Bergmann, Sven; Kähönen, Mika; Viikari, Jorma; Nikopensius, Tiit; Province, Michael; Colhoun, Helen; Doney, Alex; Robino, Antonietta; Krämer, Bernhard K; Portas, Laura; Ford, Ian; Buckley, Brendan M; Adam, Martin; Thun, Gian-Andri; Paulweber, Bernhard; Haun, Margot; Sala, Cinzia; Mitchell, Paul; Ciullo, Marina; Vollenweider, Peter; Raitakari, Olli; Metspalu, Andres; Palmer, Colin; Gasparini, Paolo; Pirastu, Mario; Jukema, J Wouter; Probst-Hensch, Nicole M; Kronenberg, Florian; Toniolo, Daniela; Gudnason, Vilmundur; Shuldiner, Alan R; Coresh, Josef; Schmidt, Reinhold; Ferrucci, Luigi; van Duijn, Cornelia M; Borecki, Ingrid; Kardia, Sharon L R; Liu, Yongmei; Curhan, Gary C; Rudan, Igor; Gyllensten, Ulf; Wilson, James F; Franke, Andre; Pramstaller, Peter P; Rettig, Rainer; Prokopenko, Inga; Witteman, Jacqueline; Hayward, Caroline; Ridker, Paul M; Bochud, Murielle; Heid, Iris M; Siscovick, David S; Fox, Caroline S; Kao, W Linda; Böger, Carsten A

    2013-12-01

    Many common genetic variants identified by genome-wide association studies for complex traits map to genes previously linked to rare inherited Mendelian disorders. A systematic analysis of common single-nucleotide polymorphisms (SNPs) in genes responsible for Mendelian diseases with kidney phenotypes has not been performed. We thus developed a comprehensive database of genes for Mendelian kidney conditions and evaluated the association between common genetic variants within these genes and kidney function in the general population. Using the Online Mendelian Inheritance in Man database, we identified 731 unique disease entries related to specific renal search terms and confirmed a kidney phenotype in 218 of these entries, corresponding to mutations in 258 genes. We interrogated common SNPs (minor allele frequency >5%) within these genes for association with the estimated GFR in 74,354 European-ancestry participants from the CKDGen Consortium. However, the top four candidate SNPs (rs6433115 at LRP2, rs1050700 at TSC1, rs249942 at PALB2, and rs9827843 at ROBO2) did not achieve significance in a stage 2 meta-analysis performed in 56,246 additional independent individuals, indicating that these common SNPs are not associated with estimated GFR. The effect of less common or rare variants in these genes on kidney function in the general population and disease-specific cohorts requires further research.

  10. Transcriptome-Wide Identification of Reference Genes for Expression Analysis of Soybean Responses to Drought Stress along the Day.

    PubMed

    Marcolino-Gomes, Juliana; Rodrigues, Fabiana Aparecida; Fuganti-Pagliarini, Renata; Nakayama, Thiago Jonas; Ribeiro Reis, Rafaela; Bouças Farias, Jose Renato; Harmon, Frank G; Correa Molinari, Hugo Bruno; Correa Molinari, Mayla Daiane; Nepomuceno, Alexandre

    2015-01-01

    The soybean transcriptome displays strong variation along the day in optimal growth conditions and also in response to adverse circumstances, like drought stress. However, no study conducted to date has presented suitable reference genes, with stable expression along the day, for relative gene expression quantification in combined studies on drought stress and diurnal oscillations. Recently, water deficit responses have been associated with circadian clock oscillations at the transcription level, revealing the existence of hitherto unknown processes and increasing the demand for studies on plant responses to drought stress and its oscillation during the day. We performed data mining from a transcriptome-wide background using microarrays and RNA-seq databases to select an unpublished set of candidate reference genes, specifically chosen for the normalization of gene expression in studies on soybean under both drought stress and diurnal oscillations. Experimental validation and stability analysis in soybean plants subjected to drought stress and sampled during a 24 h timecourse showed that four of these newer reference genes (FYVE, NUDIX, Golgin-84 and CYST) indeed exhibited greater expression stability than the conventionally used housekeeping genes (ELF1-β and β-actin) under these conditions. We also demonstrated the effect of using reference candidate genes with different stability values to normalize the relative expression data from a drought-inducible soybean gene (DREB5) evaluated in different periods of the day.

  11. Identifying the candidate genes involved in the calyx abscission process of 'Kuerlexiangli' (Pyrus sinkiangensis Yu) by digital transcript abundance measurements.

    PubMed

    Qi, Xiaoxiao; Wu, Jun; Wang, Lifen; Li, Leiting; Cao, Yufen; Tian, Luming; Dong, Xingguang; Zhang, Shaoling

    2013-10-23

    'Kuerlexiangli' (Pyrus sinkiangensis Yu), a native pear of Xinjiang, China, is an important agricultural fruit and primary export to the international market. However, fruit with persistent calyxes affect fruit shape and quality. Although several studies have looked into the physiological aspects of the calyx abscission process, the underlying molecular mechanisms remain unknown. In order to better understand the molecular basis of the process of calyx abscission, materials at three critical stages of regulation, with 6000 × Flusilazole plus 300 × PBO treatment (calyx abscising treatment) and 50 mg·L⁻¹ GA3 treatment (calyx persisting treatment), were collected and cDNA fragments were sequenced using digital transcript abundance measurements to identify candidate genes. Digital transcript abundance measurements were performed using high-throughput Illumina GAII sequencing on seven samples that were collected at three important stages of the calyx abscission process with chemical agent treatments promoting calyx abscission and persistence. Altogether more than 251,123,845 high quality reads were obtained with approximately 8.0 M raw data for each library. The values of 69.85%-71.90% of clean data in the digital transcript abundance measurements could be mapped to the pear genome database. There were 12,054 differentially expressed genes having Gene Ontology (GO) terms and associating with 251 Kyoto Encyclopedia of Genes and Genomes (KEGG) defined pathways. The differentially expressed genes correlated with calyx abscission were mainly involved in photosynthesis, plant hormone signal transduction, cell wall modification, transcriptional regulation, and carbohydrate metabolism. Furthermore, candidate calyx abscission-specific genes, e.g. Inflorescence deficient in abscission gene, were identified. Quantitative real-time PCR was used to confirm the digital transcript abundance measurement results. We identified candidate genes that showed highly dynamic changes in expression during the calyx abscission process. These genes are potential targets for future functional characterization and should be valuable for exploration of the mechanisms of calyx abscission, and eventually for developing methods based on small molecule application to induce calyx abscission in fruit production.

  12. Signature of genetic associations in oral cancer.

    PubMed

    Sharma, Vishwas; Nandan, Amrita; Sharma, Amitesh Kumar; Singh, Harpreet; Bharadwaj, Mausumi; Sinha, Dhirendra Narain; Mehrotra, Ravi

    2017-10-01

    Oral cancer etiology is complex and controlled by multi-factorial events including genetic events. Candidate gene studies, genome-wide association studies, and next-generation sequencing identified various chromosomal loci to be associated with oral cancer. There is no available review that could give us the comprehensive picture of genetic loci identified to be associated with oral cancer by candidate gene studies-based, genome-wide association studies-based, and next-generation sequencing-based approaches. A systematic literature search was performed in the PubMed database to identify the loci associated with oral cancer by exclusive candidate gene studies-based, genome-wide association studies-based, and next-generation sequencing-based study approaches. The information on loci associated with oral cancer is made available online through the resource "ORNATE." Next, screening of the loci validated by candidate gene studies and next-generation sequencing approach or by two independent studies within candidate gene studies or next-generation sequencing approaches was performed. A total of 264 loci were identified to be associated with oral cancer by candidate gene studies, genome-wide association studies, and next-generation sequencing approaches. In total, 28 loci, that is, 14q32.33 (AKT1), 5q22.2 (APC), 11q22.3 (ATM), 2q33.1 (CASP8), 11q13.3 (CCND1), 16q22.1 (CDH1), 9p21.3 (CDKN2A), 1q31.1 (COX-2), 7p11.2 (EGFR), 22q13.2 (EP300), 4q35.2 (FAT1), 4q31.3 (FBXW7), 4p16.3 (FGFR3), 1p13.3 (GSTM1-GSTT1), 11q13.2 (GSTP1), 11p15.5 (H-RAS), 3p25.3 (hOGG1), 1q32.1 (IL-10), 4q13.3 (IL-8), 12p12.1 (KRAS), 12q15 (MDM2), 12q13.12 (MLL2), 9q34.3 (NOTCH1), 17p13.1 (p53), 3q26.32 (PIK3CA), 10q23.31 (PTEN), 13q14.2 (RB1), and 5q14.2 (XRCC4), were validated to be associated with oral cancer. "ORNATE" gives a snapshot of genetic loci associated with oral cancer. All 28 loci were validated as linked to oral cancer; further fine-mapping followed by gene-by-gene and gene-environment interaction studies is needed to confirm their involvement in modifying oral cancer.

  13. Role of skeletal muscle in ear development.

    PubMed

    Rot, Irena; Baguma-Nibasheka, Mark; Costain, Willard J; Hong, Paul; Tafra, Robert; Mardesic-Brakus, Snjezana; Mrduljas-Djujic, Natasa; Saraga-Babic, Mirna; Kablar, Boris

    2017-10-01

    The current paper is a continuation of our work described in Rot and Kablar, 2010. Here, we show lists of 10 up- and 87 down-regulated genes obtained by a cDNA microarray analysis that compared developing Myf5-/-:Myod-/- (and Mrf4-/-) petrous part of the temporal bone, containing middle and inner ear, to the control, at embryonic day 18.5. Myf5-/-:Myod-/- fetuses entirely lack skeletal myoblasts and muscles. They are unable to move their head, which interferes with the perception of angular acceleration. Previously, we showed that the inner ear areas most affected in Myf5-/-:Myod-/- fetuses were the vestibular cristae ampullaris, sensitive to angular acceleration. Our finding that the type I hair cells were absent in the mutants' cristae was further used here to identify a profile of genes specific to the lacking cell type. Microarrays followed by a detailed consultation of web-accessible mouse databases allowed us to identify 6 candidate genes with a possible role in the development of the inner ear sensory organs: Actc1, Pgam2, Ldb3, Eno3, Hspb7 and Smpx. Additionally, we searched for human homologues of the candidate genes since a number of syndromes in humans have associated inner ear abnormalities. Mutations in one of our candidate genes, Smpx, have been reported as the cause of X-linked deafness in humans. Our current study suggests an epigenetic role that mechanical, and potentially other, stimuli originating from muscle, play in organogenesis, and offers an approach to finding novel genes responsible for altered inner ear phenotypes.

  14. Dictionary-driven prokaryotic gene finding

    PubMed Central

    Shibuya, Tetsuo; Rigoutsos, Isidore

    2002-01-01

    Gene identification, also known as gene finding or gene recognition, is among the important problems of molecular biology that have been receiving increasing attention with the advent of large scale sequencing projects. Previous strategies for solving this problem can be categorized into essentially two schools of thought: one school employs sequence composition statistics, whereas the other relies on database similarity searches. In this paper, we propose a new gene identification scheme that combines the best characteristics from each of these two schools. In particular, our method determines gene candidates among the ORFs that can be identified in a given DNA strand through the use of the Bio-Dictionary, a database of patterns that covers essentially all of the currently available sample of the natural protein sequence space. Our approach relies entirely on the use of redundant patterns as the agents on which the presence or absence of genes is predicated and does not employ any additional evidence, e.g. ribosome-binding site signals. The Bio-Dictionary Gene Finder (BDGF), the algorithm’s implementation, is a single computational engine able to handle the gene identification task across distinct archaeal and bacterial genomes. The engine exhibits performance that is characterized by simultaneous very high values of sensitivity and specificity, and a high percentage of correctly predicted start sites. Using a collection of patterns derived from an old (June 2000) release of the Swiss-Prot/TrEMBL database that contained 451 602 proteins and fragments, we demonstrate our method’s generality and capabilities through an extensive analysis of 17 complete archaeal and bacterial genomes. Examples of previously unreported genes are also shown and discussed in detail. PMID:12060689
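
    The starting point described above, the set of ORFs identifiable in a given DNA strand, can be illustrated with a minimal three-frame scan. This sketch only enumerates ORFs. It does not implement the Bio-Dictionary pattern matching used by BDGF, and the function name, the ATG-only start codon, and the minimum-length parameter are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(strand, min_codons=2):
    """Return sorted (start, end) pairs of ORFs on one strand.

    start is the index of the ATG, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop (the start codon included).
    """
    strand = strand.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(strand) - 2, 3):
            codon = strand[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next start
    return sorted(orfs)
```

    A complete gene finder would also scan the reverse complement and then score each ORF. In the approach the abstract describes, that scoring relies on pattern matches rather than on signals such as ribosome-binding sites.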

  15. Whole Transcriptome Analysis Provides Insights into Molecular Mechanisms for Molting in Litopenaeus vannamei

    PubMed Central

    Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Sun, Xiaoqing; Yuan, Jianbo; Li, Fuhua; Xiang, Jianhai

    2015-01-01

    Molting is one of the most important biological processes in shrimp growth and development. All shrimp undergo cyclic molting to shed and replace their exoskeletons. This process is essential for growth, metamorphosis, and reproduction in shrimp. However, the molecular mechanisms underlying shrimp molting remain poorly understood. In this study, we investigated global expression changes in the transcriptomes of the Pacific white shrimp, Litopenaeus vannamei, the most commonly cultured shrimp species worldwide. The transcriptome of whole L. vannamei was investigated by RNA-sequencing (RNA-seq) throughout the molting cycle, including the inter-molt (C), pre-molt (D0, D1, D2, D3, D4), and post-molt (P1 and P2) stages, and 93,756 unigenes were identified. Among these genes, we identified 5,117 genes differentially expressed (log2ratio ≥1 and FDR ≤0.001) in adjacent molt stages. The results were compared against the National Center for Biotechnology Information (NCBI) non-redundant protein/nucleotide sequence database, Swiss-Prot, PFAM database, the Gene Ontology database, and the Kyoto Encyclopedia of Genes and Genomes database in order to annotate gene descriptions, associate them with gene ontology terms, and assign them to pathways. The expression patterns for genes involved in several molecular events critical for molting, such as hormone regulation, triggering events, implementation phases, skelemin, and immune responses, were characterized and considered as mechanisms underlying molting in L. vannamei. Comparisons with transcriptomic analyses in other arthropods were also performed. The characterization of major transcriptional changes in genes involved in the molting cycle provides candidates for future investigation of the molecular mechanisms. The data generated in this study will serve as an important transcriptomic resource for the shrimp research community to facilitate gene and genome annotation and to characterize key molecular processes underlying shrimp development. PMID:26650402
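
    The differential-expression cutoff quoted in this abstract (log2ratio ≥1 and FDR ≤0.001) can be illustrated with a minimal sketch. The record layout, the unigene names and the choice to apply the threshold to the absolute log2 ratio (so that down-regulated genes also pass) are assumptions made for illustration; the abstract does not state how the authors implemented the filter.

```python
def is_differentially_expressed(log2_ratio, fdr, min_abs_log2=1.0, max_fdr=0.001):
    """Apply a fold-change and FDR cutoff to one gene.

    Assumption: the cutoff is applied to |log2 ratio|, so both up- and
    down-regulated genes can pass.
    """
    return abs(log2_ratio) >= min_abs_log2 and fdr <= max_fdr


# Hypothetical (name, log2 ratio, FDR) records for two adjacent molt stages.
records = [
    ("unigene_1", 2.3, 0.0001),    # passes both cutoffs
    ("unigene_2", 0.4, 0.00001),   # fold change too small
    ("unigene_3", -1.5, 0.0005),   # down-regulated, passes
    ("unigene_4", 3.0, 0.05),      # FDR too high
]

degs = [name for name, log2_ratio, fdr in records
        if is_differentially_expressed(log2_ratio, fdr)]
print(degs)
```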

  16. Inferring Gene Regulatory Networks by Singular Value Decomposition and Gravitation Field Algorithm

    PubMed Central

    Zheng, Ming; Wu, Jia-nan; Huang, Yan-xin; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

    2012-01-01

    Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenging computational problem in systems biology. However, every existing algorithm for inferring GRNs from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of previous algorithms are not high enough. In this work, we propose a novel algorithm for inferring GRNs from gene expression data, based on a differential equation model. The algorithm combines two methods. First, singular value decomposition is used to decompose the gene expression data, determine the solution space, and obtain all candidate GRN solutions. Then, within this family of candidate solutions, a modified gravitation field algorithm optimizes the criteria of the differential equation model and searches for the best network structure. The proposed algorithm is validated on both a simulated scale-free network and a real benchmark gene regulatory network from a networks database. A Bayesian method and the traditional differential equation model were also used to infer GRNs, and their results were compared with those of the proposed algorithm. Genetic algorithm and simulated annealing were also used to evaluate the gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which significantly outperforms previous algorithms. PMID:23226565
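
    The idea of using singular value decomposition to obtain a family of candidate networks can be sketched as follows, assuming a linear model dX/dt = J X with fewer samples than genes. This is a generic illustration, not the authors' implementation: the function name, the random test data and the tolerance are hypothetical, and the gravitation field search over the family is not shown.

```python
import numpy as np


def candidate_family(X, X_dot, tol=1e-10):
    """Return (J0, P) such that every J = J0 + Y @ P satisfies J @ X = X_dot.

    X, X_dot: arrays of shape (n_genes, n_samples).
    J0 is the minimum-norm solution; P projects onto the left null space of X,
    so Y @ P never changes the fit, whatever the (n_genes, n_genes) matrix Y.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    inv_s = np.zeros_like(s)
    keep = s > tol * s.max()
    inv_s[keep] = 1.0 / s[keep]            # invert only non-negligible singular values
    X_pinv = Vt.T @ np.diag(inv_s) @ U.T   # pseudo-inverse built from the SVD
    J0 = X_dot @ X_pinv
    P = np.eye(X.shape[0]) - X @ X_pinv
    return J0, P


# Hypothetical underdetermined example: 5 genes, 3 samples.
rng = np.random.default_rng(0)
n_genes, n_samples = 5, 3
J_true = rng.normal(size=(n_genes, n_genes))
X = rng.normal(size=(n_genes, n_samples))
X_dot = J_true @ X

J0, P = candidate_family(X, X_dot)
Y = rng.normal(size=(n_genes, n_genes))
J_alt = J0 + Y @ P   # another member of the candidate family
```

    Both J0 and J_alt reproduce X_dot exactly, which is why a second optimization step is needed to choose one network from the family.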

  17. Transcriptome mining: Multigene panel to test delousing drug response in the sea louse Caligus rogercresseyi.

    PubMed

    Valenzuela-Muñoz, V; Gallardo-Escárate, C

    2016-02-01

    Controlling infestations of copepodid ectoparasites in the salmon industry is increasingly problematic given higher instances of drug resistance or loss of sensitivity. Despite the importance of this issue, the molecular mechanisms and genes implicated in resistance/susceptibility are only scarcely understood. The objective of the present study was to identify and evaluate the expression levels of candidate genes associated with delousing drug response in the sea louse Caligus rogercresseyi. From RNA-seq data obtained for adult male and female sea lice, 62.48 M reads were assembled in 70,349 high-quality contigs. BLASTX analysis against UniprotKB/Swiss-Prot and the ESTs available for crustaceans in the NCBI database identified 870 transcripts previously related to genes associated with delousing drug response. Furthermore, 14 candidate genes were validated through RT-qPCR and were evaluated with deltamethrin and azamethiphos bioassays. The results evidenced an overregulation of genes involved in ion transport in salmon lice treated with deltamethrin, while those treated with azamethiphos evidenced an overregulation of genes such as cytochrome P450, Carboxylesterase, and acetylcholine receptors. The present study provides a multigene panel to test delousing drug response to pyrethroids and organophosphates in a highly prevalent pathogen of the Chilean salmon industry. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. Understanding the pharmacogenetics of selective serotonin reuptake inhibitors.

    PubMed

    Fabbri, Chiara; Minarini, Alessandro; Niitsu, Tomihisa; Serretti, Alessandro

    2014-08-01

    The genetic background of antidepressant response represents a unique opportunity to identify biological markers of treatment outcome. Encouraging results alternating with inconsistent findings made antidepressant pharmacogenetics a stimulating but often discouraging field that requires careful discussion about cumulative evidence and methodological issues. The present review discusses both known and less replicated genes that have been implicated in selective serotonin reuptake inhibitor (SSRI) efficacy and side effects. Candidate gene studies and genome-wide association studies (GWAS) were collected through a MEDLINE database search (articles published until January 2014). Further, GWAS signals localized in promising genetic regions according to candidate gene studies are reported in order to assess the general comparability of results obtained through these two types of pharmacogenetic studies. Finally, a pathway enrichment approach is applied to the top genes (those harboring SNPs with p < 0.0001) outlined by previous GWAS in order to identify possible molecular mechanisms involved in SSRI effect. In order to improve the understanding of SSRI pharmacogenetics, the present review discusses the proposal of moving from the analysis of individual polymorphisms to genes and molecular pathways, and from the separation across different methodological approaches to their combination. Efforts in this direction are justified by the recent evidence of a favorable cost-utility of gene-guided antidepressant treatment.

  19. A conserved gene family encodes transmembrane proteins with fibronectin, immunoglobulin and leucine-rich repeat domains (FIGLER)

    PubMed Central

    Munfus, Delicia L; Haga, Christopher L; Burrows, Peter D; Cooper, Max D

    2007-01-01

    Background In mouse the cytokine interleukin-7 (IL-7) is required for generation of B lymphocytes, but human IL-7 does not appear to have this function. A bioinformatics approach was therefore used to identify IL-7 receptor related genes in the hope of identifying the elusive human cytokine. Results Our database search identified a family of nine gene candidates, which we have provisionally named fibronectin immunoglobulin leucine-rich repeat (FIGLER). The FIGLER 1–9 genes are predicted to encode type I transmembrane glycoproteins with 6–12 leucine-rich repeats (LRR), a C2 type Ig domain, a fibronectin type III domain, a hydrophobic transmembrane domain, and a cytoplasmic domain containing one to four tyrosine residues. Members of this multichromosomal gene family possess 20–47% overall amino acid identity and are differentially expressed in cell lines and primary hematopoietic lineage cells. Genes for FIGLER homologs were identified in macaque, orangutan, chimpanzee, mouse, rat, dog, chicken, toad, and puffer fish databases. The non-human FIGLER homologs share 38–99% overall amino acid identity with their human counterpart. Conclusion The extracellular domain structure and absence of recognizable cytoplasmic signaling motifs in members of the highly conserved FIGLER gene family suggest a trophic or cell adhesion function for these molecules. PMID:17854505

  20. Horizontal gene transfer of acetyltransferases, invertases and chorismate mutases from different bacteria to diverse recipients.

    PubMed

    Noon, Jason B; Baum, Thomas J

    2016-04-12

    Hoplolaimina plant-parasitic nematodes (PPN) are a lineage of animals with many documented cases of horizontal gene transfer (HGT). In a recent study, we reported on three likely HGT candidate genes in the soybean cyst nematode Heterodera glycines, all of which encode secreted candidate effectors with putative functions in the host plant. Hg-GLAND1 is a putative GCN5-related N-acetyltransferase (GNAT), Hg-GLAND13 is a putative invertase (INV), and Hg-GLAND16 is a putative chorismate mutase (CM), and blastp searches of the non-redundant database resulted in highest similarity to bacterial sequences. Here, we searched nematode and non-nematode sequence databases to identify all the nematodes possible that contain these three genes, and to formulate hypotheses about when they most likely appeared in the phylum Nematoda. We then performed phylogenetic analyses combined with model selection tests of alternative models of sequence evolution to determine whether these genes were horizontally acquired from bacteria. Mining of nematode sequence databases determined that GNATs appeared in Hoplolaimina PPN late in evolution, while both INVs and CMs appeared before the radiation of the Hoplolaimina suborder. Also, Hoplolaimina GNATs, INVs and CMs formed well-supported clusters with different rhizosphere bacteria in the phylogenetic trees, and the model selection tests greatly supported models of HGT over descent via common ancestry. Surprisingly, the phylogenetic trees also revealed additional, well-supported clusters of bacterial GNATs, INVs and CMs with diverse eukaryotes and archaea. There were at least eleven and eight well-supported clusters of GNATs and INVs, respectively, from different bacteria with diverse eukaryotes and archaea. Though less frequent, CMs from different bacteria formed supported clusters with multiple different eukaryotes. 
    Moreover, almost all individual clusters containing bacteria and eukaryotes or archaea contained species that inhabit very similar niches. GNATs were horizontally acquired late in Hoplolaimina PPN evolution from bacteria most similar to the saprophytic and plant-pathogenic actinomycetes. INVs and CMs were horizontally acquired from bacteria most similar to rhizobacteria and Burkholderia soil bacteria, respectively, before the radiation of Hoplolaimina. Also, these three gene groups appear to have been frequent subjects of HGT from different bacteria to numerous, diverse lineages of eukaryotes and archaea, which suggests that these genes may confer important evolutionary advantages to many taxa. In the case of Hoplolaimina PPN, this advantage likely was an improved ability to parasitize plants.

  1. Candidate gene biodosimetry markers of exposure to external ionizing radiation in human blood: A systematic review

    PubMed Central

    Sima, Chao; Amundson, Sally A.; Zenhausern, Frederic

    2018-01-01

    Purpose To compile a list of genes that have been reported to be affected by external ionizing radiation (IR) and to assess their performance as candidate biomarkers for individual human radiation dosimetry. Methods Eligible studies were identified through extensive searches of the online databases from 1978 to 2017. Original English-language publications of microarray studies assessing radiation-induced changes in gene expression levels in human blood after external IR were included. Genes identified in at least half of the selected studies were retained for bio-statistical analysis in order to evaluate their diagnostic ability. Results 24 studies met the criteria and were included in this study. Radiation-induced expression of 10,170 unique genes was identified and the 31 genes that have been identified in at least 50% of studies (12/24 studies) were selected for diagnostic power analysis. Twenty-seven genes showed a significant Spearman’s correlation with radiation dose. Individually, TNFSF4, FDXR, MYC, ZMAT3 and GADD45A provided the best discrimination of radiation dose < 2 Gy and dose ≥ 2 Gy according to their maximized Youden’s index (0.67, 0.55, 0.55, 0.55 and 0.53 respectively). Moreover, 12 combinations of three genes display an area under the Receiver Operating Characteristic (ROC) curve (AUC) = 1, reinforcing the concept of biomarker combinations instead of looking for an ideal and unique biomarker. Conclusion Gene expression is a promising approach for radiation dosimetry assessment. A list of robust candidate biomarkers has been identified from analysis of the studies published to date, confirming for example the potential of well-known genes such as FDXR and TNFSF4 or highlighting other promising genes such as ZMAT3. However, heterogeneity in protocols and analysis methods will require additional studies to confirm these results. PMID:29879226
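
    Youden's index is J = sensitivity + specificity - 1, and the "maximized" index is its largest value over all candidate decision thresholds. The sketch below shows one way to compute it for a single gene, assuming that higher expression indicates the higher dose class. The expression values and labels are invented for illustration and are not data from the review.

```python
def max_youden(values, labels):
    """Return (best_J, best_threshold), calling a sample positive when value >= threshold.

    values: expression measurements; labels: True for the positive class
    (here, hypothetically, dose >= 2 Gy).
    """
    positives = [v for v, y in zip(values, labels) if y]
    negatives = [v for v, y in zip(values, labels) if not y]
    best_j, best_threshold = -1.0, None
    for threshold in sorted(set(values)):
        sensitivity = sum(v >= threshold for v in positives) / len(positives)
        specificity = sum(v < threshold for v in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_j, best_threshold = j, threshold
    return best_j, best_threshold


# Hypothetical expression values for one gene in six blood samples.
expression = [1.0, 1.2, 1.3, 2.5, 3.0, 1.1]
high_dose = [False, False, False, True, True, True]

best_j, best_threshold = max_youden(expression, high_dose)
```

    In this toy data set one high-dose sample overlaps the low-dose range, so the best threshold (2.5) reaches a sensitivity of 2/3 at a specificity of 1.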

  2. NEIBank: Genomics and bioinformatics resources for vision research

    PubMed Central

    Peterson, Katherine; Gao, James; Buchoff, Patee; Jaworski, Cynthia; Bowes-Rickman, Catherine; Ebright, Jessica N.; Hauser, Michael A.; Hoover, David

    2008-01-01

    NEIBank is an integrated resource for genomics and bioinformatics in vision research. It includes expressed sequence tag (EST) data and sequence-verified cDNA clones for multiple eye tissues of several species, web-based access to human eye-specific SAGE data through EyeSAGE, and comprehensive, annotated databases of known human eye disease genes and candidate disease gene loci. All expression- and disease-related data are integrated in EyeBrowse, an eye-centric genome browser. NEIBank provides a comprehensive overview of current knowledge of the transcriptional repertoires of eye tissues and their relation to pathology. PMID:18648525

  3. Identification and functional analyses of sex determination genes in the sexually dimorphic stag beetle Cyclommatus metallifer.

    PubMed

    Gotoh, Hiroki; Zinna, Robert A; Warren, Ian; DeNieu, Michael; Niimi, Teruyuki; Dworkin, Ian; Emlen, Douglas J; Miura, Toru; Lavine, Laura C

    2016-03-22

    Genes in the sex determination pathway are important regulators of sexually dimorphic animal traits, including the elaborate and exaggerated male ornaments and weapons of sexual selection. In this study, we identified and functionally analyzed members of the sex determination gene family in the golden metallic stag beetle Cyclommatus metallifer, which exhibits extreme differences in mandible size between males and females. We constructed a C. metallifer transcriptomic database from larval and prepupal developmental stages and tissues of both males and females. Using Roche 454 pyrosequencing, we generated a de novo assembled database from a total of 1,223,516 raw reads, which resulted in 14,565 isotigs (putative transcript isoforms) contained in 10,794 isogroups (putative identified genes). We queried this database for C. metallifer conserved sex determination genes and identified 14 candidate sex determination pathway genes. We then characterized the roles of several of these genes in development of extreme sexual dimorphic traits in this species. We performed molecular expression analyses with RT-PCR and functional analyses using RNAi on three C. metallifer candidate genes--Sex-lethal (CmSxl), transformer-2 (Cmtra2), and intersex (Cmix). No differences in expression pattern were found between the sexes for any of these three genes. In the RNAi gene-knockdown experiments, we found that only the Cmix had any effect on sexually dimorphic morphology, and these mimicked the effects of Cmdsx knockdown in females. Knockdown of CmSxl had no measurable effects on stag beetle phenotype, while knockdown of Cmtra2 resulted in complete lethality at the prepupal period. These results indicate that the roles of CmSxl and Cmtra2 in the sex determination cascade are likely to have diverged in stag beetles when compared to Drosophila. Our results also suggest that Cmix has a conserved role in this pathway. 
    In addition to those three genes, we also performed a more complete functional analysis of the C. metallifer dsx gene (Cmdsx) to identify the isoforms that regulate dimorphism more fully using exon-specific RNAi. We identified a total of 16 alternative splice variants of the Cmdsx gene that code for up to 14 separate exons. Despite the variation in RNA splice products of the Cmdsx gene, only four protein isoforms are predicted. The results of our exon-specific RNAi indicated that the essential CmDsx isoform for postembryonic male differentiation is CmDsxB, whereas postembryonic female specific differentiation is mainly regulated by CmDsxD. Taken together, our results highlight the importance of studying the function of highly conserved sex determination pathways in numerous insect species, especially those with dramatic and exaggerated sexual dimorphism, because conservation in protein structure does not always translate into conservation in downstream function.

  4. An Arrayed Genome-Scale Lentiviral-Enabled Short Hairpin RNA Screen Identifies Lethal and Rescuer Gene Candidates

    PubMed Central

    Bhinder, Bhavneet; Antczak, Christophe; Ramirez, Christina N.; Shum, David; Liu-Sullivan, Nancy; Radu, Constantin; Frattini, Mark G.

    2013-01-01

    Abstract RNA interference technology is becoming an integral tool for target discovery and validation. With perhaps the exception of only a few studies published using arrayed short hairpin RNA (shRNA) libraries, most of the reports have been either against pooled siRNA or shRNA, or arrayed siRNA libraries. For this purpose, we have developed a workflow and performed an arrayed genome-scale shRNA lethality screen against the TRC1 library in HeLa cells. The resulting targets would be a valuable resource of candidates toward a better understanding of cellular homeostasis. Using a high-stringency hit nomination method encompassing criteria of at least three active hairpins per gene and filtered for potential off-target effects (OTEs), referred to as the Bhinder–Djaballah analysis method, we identified 1,252 lethal and 6 rescuer gene candidates, knockdown of which resulted in severe cell death or enhanced growth, respectively. Cross referencing individual hairpins with the TRC1 validated clone database, 239 of the 1,252 candidates were deemed independently validated with at least three validated clones. Through our systematic OTE analysis, we have identified 31 microRNAs (miRNAs) in lethal and 2 in rescuer genes, all having a seed heptamer mimic in the corresponding shRNA hairpins and likely the cause of the OTE observed in our screen, perhaps unraveling a previously unknown plausible essentiality of these miRNAs in cellular viability. Taken together, we report on a methodology for performing large-scale arrayed shRNA screens, a comprehensive analysis method to nominate high-confidence hits, and a performance assessment of the TRC1 library highlighting the intracellular inefficiencies of shRNA processing in general. PMID:23198867
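
    The "at least three active hairpins per gene" criterion can be sketched as a simple count of distinct active hairpins per gene. This is only an illustration of that single criterion under assumed inputs: the gene and hairpin identifiers are hypothetical, and the off-target-effect filtering described in the abstract is not modeled.

```python
from collections import defaultdict


def nominate_hits(active_hairpins, min_active=3):
    """Return genes supported by at least `min_active` distinct active hairpins.

    active_hairpins: iterable of (gene, hairpin_id) pairs that scored as
    active in the screen (hypothetical input format).
    """
    hairpins_by_gene = defaultdict(set)
    for gene, hairpin_id in active_hairpins:
        hairpins_by_gene[gene].add(hairpin_id)   # a set ignores repeated hairpins
    return sorted(gene for gene, hairpins in hairpins_by_gene.items()
                  if len(hairpins) >= min_active)


# Hypothetical screen output.
active = [
    ("GENE_A", "sh1"), ("GENE_A", "sh2"), ("GENE_A", "sh3"),
    ("GENE_B", "sh4"), ("GENE_B", "sh5"),
    ("GENE_C", "sh6"), ("GENE_C", "sh6"), ("GENE_C", "sh7"),
]

hits = nominate_hits(active)
print(hits)
```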

  5. RankProd Combined with Genetic Algorithm Optimized Artificial Neural Network Establishes a Diagnostic and Prognostic Prediction Model that Revealed C1QTNF3 as a Biomarker for Prostate Cancer.

    PubMed

    Hou, Qi; Bing, Zhi-Tong; Hu, Cheng; Li, Mao-Yin; Yang, Ke-Hu; Mo, Zu; Xie, Xiang-Wei; Liao, Ji-Lin; Lu, Yan; Horie, Shigeo; Lou, Ming-Wu

    2018-06-01

    Prostate cancer (PCa) is the most commonly diagnosed cancer in males in the Western world. Although prostate-specific antigen (PSA) has been widely used as a biomarker for PCa diagnosis, its results can be controversial. Therefore, new biomarkers are needed to enhance the clinical management of PCa. From publicly available microarray data, differentially expressed genes (DEGs) were identified by meta-analysis with RankProd. A genetic algorithm optimized artificial neural network (GA-ANN) was introduced to establish a diagnostic prediction model and to filter candidate genes. The diagnostic and prognostic capability of the prediction model and candidate genes was investigated in both GEO and TCGA datasets. Candidate genes were further validated by qPCR, Western blot and tissue microarray. By RankProd meta-analyses, 2306 significantly up- and 1311 down-regulated probes were found in microarray data from 133 cases and 30 controls. The overall accuracy rate of the PCa diagnostic prediction model, consisting of a 15-gene signature, reached up to 100% in both the training and test datasets. The prediction model also showed good results for the diagnosis (AUC = 0.953) and prognosis (AUC of 5 years overall survival time = 0.808) of PCa in the TCGA database. The expression levels of three genes, FABP5, C1QTNF3 and LPHN3, were validated by qPCR. High C1QTNF3 expression was further validated in PCa tissue by Western blot and tissue microarray. In the GEO datasets, C1QTNF3 was a good predictor for the diagnosis of PCa (GSE6956: AUC = 0.791; GSE8218: AUC = 0.868; GSE26910: AUC = 0.972). In the TCGA database, C1QTNF3 was significantly associated with PCa patient recurrence free survival (P < .001, AUC = 0.57). In this study, we have developed a diagnostic and prognostic prediction model for PCa. C1QTNF3 was revealed as a promising biomarker for PCa. This approach can be applied to other high-throughput data from different platforms for the discovery of oncogenes or biomarkers in different kinds of diseases. Copyright © 2018. Published by Elsevier B.V.

  6. Identification of candidate genes associated with porcine meat color traits by genome-wide transcriptome analysis.

    PubMed

    Li, Bojiang; Dong, Chao; Li, Pinghua; Ren, Zhuqing; Wang, Han; Yu, Fengxiang; Ning, Caibo; Liu, Kaiqing; Wei, Wei; Huang, Ruihua; Chen, Jie; Wu, Wangjun; Liu, Honglin

    2016-10-17

    Meat color is considered to be the most important indicator of meat quality; however, the molecular mechanisms underlying traits related to meat color remain mostly unknown. In this study, to elucidate the molecular basis of meat color, we constructed six cDNA libraries from biceps femoris (Bf) and soleus (Sol), which exhibit obvious differences in meat color, and analyzed the whole-transcriptome differences between Bf (white muscle) and Sol (red muscle) using high-throughput sequencing technology. Using the DESeq2 method, we identified 138 differentially expressed genes (DEGs) between Bf and Sol. Using the DEGseq method, we identified 770, 810, and 476 DEGs in comparisons between Bf and Sol in three separate animals. Of these DEGs, 52 were overlapping DEGs. Using these data, we determined the enriched GO terms, metabolic pathways and candidate genes associated with meat color traits. Additionally, we mapped 114 non-redundant DEGs to the meat color QTLs via a comparative analysis with the porcine quantitative trait loci (QTL) database. Overall, our data serve as a valuable resource for identifying genes whose functions are critical for meat color traits and can accelerate studies of the molecular mechanisms of meat color formation.

  7. Identification of candidate genes associated with porcine meat color traits by genome-wide transcriptome analysis

    PubMed Central

    Li, Bojiang; Dong, Chao; Li, Pinghua; Ren, Zhuqing; Wang, Han; Yu, Fengxiang; Ning, Caibo; Liu, Kaiqing; Wei, Wei; Huang, Ruihua; Chen, Jie; Wu, Wangjun; Liu, Honglin

    2016-01-01

    Meat color is considered to be the most important indicator of meat quality; however, the molecular mechanisms underlying traits related to meat color remain mostly unknown. In this study, to elucidate the molecular basis of meat color, we constructed six cDNA libraries from biceps femoris (Bf) and soleus (Sol), which exhibit obvious differences in meat color, and analyzed the whole-transcriptome differences between Bf (white muscle) and Sol (red muscle) using high-throughput sequencing technology. Using the DESeq2 method, we identified 138 differentially expressed genes (DEGs) between Bf and Sol. Using the DEGseq method, we identified 770, 810, and 476 DEGs in comparisons between Bf and Sol in three separate animals. Of these DEGs, 52 were overlapping DEGs. Using these data, we determined the enriched GO terms, metabolic pathways and candidate genes associated with meat color traits. Additionally, we mapped 114 non-redundant DEGs to the meat color QTLs via a comparative analysis with the porcine quantitative trait loci (QTL) database. Overall, our data serve as a valuable resource for identifying genes whose functions are critical for meat color traits and can accelerate studies of the molecular mechanisms of meat color formation. PMID:27748458

  8. Characterization of the OmyY1 region on the rainbow trout Y chromosome

    USGS Publications Warehouse

    Phillips, Ruth B.; DeKoning, Jenefer J.; Brunelli, Joseph P.; Faber-Hammond, Joshua J.; Hansen, John D.; Christensen, Kris A.; Renn, Suzy C.P.; Thorgaard, Gary H.

    2013-01-01

    We characterized the male-specific region on the Y chromosome of rainbow trout, which contains both sdY (the sex-determining gene) and the male-specific genetic marker, OmyY1. Several clones containing the OmyY1 marker were screened from a BAC library from a YY clonal line and found to be part of an 800 kb BAC contig. Using fluorescence in situ hybridization (FISH), these clones were localized to the end of the short arm of the Y chromosome in rainbow trout, with an additional signal on the end of the X chromosome in many cells. We sequenced a minimum tiling path of these clones using Illumina and 454 pyrosequencing. The region is rich in transposons and rDNA, but also appears to contain several single-copy protein-coding genes. Most of these genes are also found on the X chromosome; and in several cases sex-specific SNPs in these genes were identified between the male (YY) and female (XX) homozygous clonal lines. Additional genes were identified by hybridization of the BACs to the cGRASP salmonid 4x44K oligo microarray. By BLASTn evaluations using hypothetical transcripts of OmyY1-linked candidate genes as query against several EST databases, we conclude at least 12 of these candidate genes are likely functional, and expressed.

  9. A taxonomic framework for cable bacteria and proposal of the candidate genera Electrothrix and Electronema.

    PubMed

    Trojan, Daniela; Schreiber, Lars; Bjerg, Jesper T; Bøggild, Andreas; Yang, Tingting; Kjeldsen, Kasper U; Schramm, Andreas

    2016-07-01

    Cable bacteria are long, multicellular filaments that can conduct electric currents over centimeter-scale distances. All cable bacteria identified to date belong to the deltaproteobacterial family Desulfobulbaceae and have not been isolated in pure culture yet. Their taxonomic delineation and exact phylogeny are uncertain, as most studies so far have reported only short partial 16S rRNA sequences or have relied on identification by a combination of filament morphology and 16S rRNA-targeted fluorescence in situ hybridization with a Desulfobulbaceae-specific probe. In this study, nearly full-length 16S rRNA gene sequences of 16 individual cable bacteria filaments from freshwater, salt marsh, and marine sites of four geographic locations are presented. These sequences formed a distinct, monophyletic sister clade to the genus Desulfobulbus and could be divided into six coherent, species-level clusters, arranged as two genus-level groups. The same grouping was retrieved by phylogenetic analysis of full or partial dsrAB genes encoding the dissimilatory sulfite reductase. Based on these results, it is proposed to accommodate cable bacteria within two novel candidate genera: the mostly marine "Candidatus Electrothrix", with four candidate species, and the mostly freshwater "Candidatus Electronema", with two candidate species. This taxonomic framework can be used to assign environmental sequences confidently to the cable bacteria clade, even without morphological information. Database searches revealed 185 16S rRNA gene sequences that affiliated within the clade formed by the proposed cable bacteria genera, of which 120 sequences could be assigned to one of the six candidate species, while the remaining 65 sequences indicated the existence of up to five additional species. Copyright © 2016 The Author(s). Published by Elsevier GmbH. All rights reserved.

  10. Fungal Screening on Olive Oil for Extracellular Triacylglycerol Lipases: Selection of a Trichoderma harzianum Strain and Genome Wide Search for the Genes

    PubMed Central

    Canseco-Pérez, Miguel Angel; Castillo-Avila, Genny Margarita; Islas-Flores, Ignacio; Apolinar-Hernández, Max M.; Rivera-Muñoz, Gerardo; Gamboa-Angulo, Marcela; Couoh-Uicab, Yeny

    2018-01-01

    A lipolytic screening with fungal strains isolated from lignocellulosic waste collected in banana plantation dumps was carried out. A Trichoderma harzianum strain (B13-1) showed good extracellular lipolytic activity (205 U mL−1). Subsequently, functional screening of the lipolytic activity on Rhodamine B enriched with olive oil as the only carbon source was performed. The successful growth of the strain allows us to suggest that a true lipase is responsible for the lipolytic activity in the B13-1 strain. In order to identify the gene(s) encoding the protein responsible for the lipolytic activity, in silico identification and characterization of triacylglycerol lipases from T. harzianum is reported for the first time. A survey of the genome of this fungus retrieved 50 lipases; however, bioinformatic analyses and putative functional descriptions in different databases allowed us to choose seven lipases as candidates. Suitability of the bioinformatic screening to select the candidates was confirmed by reverse transcription polymerase chain reaction (RT-PCR). The gene encoding protein 526309 was expressed when the fungus grew in a medium with olive oil as carbon source. This protein shares homology with commercial lipases, making it a candidate for further applications. The success in identifying a lipase gene inducible with olive oil, and the suitability of the functional screening and bioinformatic survey carried out herein, support the premise that the strategy can be used in other microorganisms with sequenced genomes to search for true lipases, or other enzymes belonging to large protein families. PMID:29370083

  11. Comparative genomics identifies candidate genes for infectious salmon anemia (ISA) resistance in Atlantic salmon (Salmo salar).

    PubMed

    Li, Jieying; Boroevich, Keith A; Koop, Ben F; Davidson, William S

    2011-04-01

    Infectious salmon anemia (ISA) has been described as the hoof and mouth disease of salmon farming. ISA is caused by a lethal and highly communicable virus, which can have a major impact on salmon aquaculture, as demonstrated by an outbreak in Chile in 2007. A quantitative trait locus (QTL) for ISA resistance has been mapped to three microsatellite markers on linkage group (LG) 8 (Chr 15) on the Atlantic salmon genetic map. We identified bacterial artificial chromosome (BAC) clones and three fingerprint contigs from the Atlantic salmon physical map that contain these markers. We made use of the extensive BAC end sequence database to extend these contigs by chromosome walking and identified two additional markers in this region. The BAC end sequences were used to search for conserved synteny between this segment of LG8 and the fish genomes that have been sequenced. An examination of the genes in the syntenic segments of the tetraodon and medaka genomes identified candidates for association with ISA resistance in Atlantic salmon based on differential expression profiles from ISA challenges or on the putative biological functions of the proteins they encode. One gene in particular, HIV-EP2/MBP-2, caught our attention as it may influence the expression of several genes that have been implicated in the response to infection by infectious salmon anemia virus (ISAV). Therefore, we suggest that HIV-EP2/MBP-2 is a very strong candidate for the gene associated with the ISAV resistance QTL in Atlantic salmon and is worthy of further study.

  12. Genome-wide identification of suitable zebrafish Danio rerio reference genes for normalization of gene expression data by RT-qPCR.

    PubMed

    Xu, H; Li, C; Zeng, Q; Agrawal, I; Zhu, X; Gong, Z

    2016-06-01

    In this study, to systematically identify the most stably expressed genes for internal reference in zebrafish Danio rerio investigations, 37 D. rerio transcriptomic datasets (both RNA sequencing and microarray data) were collected from the Gene Expression Omnibus (GEO) database and unpublished data, and gene expression variations were analysed under three experimental conditions: tissue types, developmental stages and chemical treatments. Forty-four putative candidate genes were identified with a coefficient of variation (c.v.) <0·2 across all datasets. Following clustering into different functional groups, 21 genes, in addition to four conventional housekeeping genes (eef1a1l1, b2m, hprt1l and actb1), were selected from different functional groups for further quantitative real-time (qrt-)PCR validation using 25 RNA samples from different adult tissues, developmental stages and chemical treatments. The qrt-PCR data were then analysed using the statistical algorithm RefFinder for gene expression stability. Several new candidate genes showed better expression stability than the conventional housekeeping genes in all three categories. It was found that sep15 and metap1 were the top two stable genes for tissue types, ube2a and tmem50a the top two for different developmental stages, and rpl13a and rplp0 the top two for chemical treatments. Thus, based on the extensive transcriptomic analyses and qrt-PCR validation, these new reference genes are recommended for normalization of D. rerio qrt-PCR data under each of the three experimental conditions, respectively. © 2016 The Fisheries Society of the British Isles.
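
    The stability filter described in this abstract (keep genes whose coefficient of variation stays below 0.2 in every dataset) can be sketched as follows. This is a minimal illustration only: the function names, the data layout and the toy values are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed estimator)."""
    m = mean(values)
    if m == 0:
        raise ValueError("c.v. is undefined for a zero mean")
    return pstdev(values) / m


def stable_genes(expression, threshold=0.2):
    """Return genes whose c.v. is below `threshold` in every dataset.

    `expression` maps gene -> {dataset name -> list of expression values}.
    """
    return sorted(
        gene
        for gene, datasets in expression.items()
        if all(coefficient_of_variation(v) < threshold for v in datasets.values())
    )


# Hypothetical toy values, not taken from the study.
toy = {
    "geneA": {"ds1": [10, 11, 9, 10], "ds2": [20, 22, 18, 20]},
    "geneB": {"ds1": [10, 30, 5, 15], "ds2": [20, 21, 19, 20]},
}
print(stable_genes(toy))
```

    Requiring the threshold to hold in every dataset, rather than on pooled values, keeps a gene from passing merely because one large, quiet dataset dominates the average.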

  13. An emerging cyberinfrastructure for biodefense pathogen and pathogen–host data

    PubMed Central

    Zhang, C.; Crasta, O.; Cammer, S.; Will, R.; Kenyon, R.; Sullivan, D.; Yu, Q.; Sun, W.; Jha, R.; Liu, D.; Xue, T.; Zhang, Y.; Moore, M.; McGarvey, P.; Huang, H.; Chen, Y.; Zhang, J.; Mazumder, R.; Wu, C.; Sobral, B.

    2008-01-01

    The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and their publication support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data include transcriptional profiles, protein profiles, protein structural data and host–pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible at http://www.proteomicsresource.org/. PMID:17984082

  14. Identification of key candidate genes and pathways in hepatitis B virus-associated acute liver failure by bioinformatical analysis

    PubMed Central

    Lin, Huapeng; Zhang, Qian; Li, Xiaocheng; Wu, Yushen; Liu, Ye; Hu, Yingchun

    2018-01-01

    Hepatitis B virus-associated acute liver failure (HBV-ALF) is a rare but life-threatening syndrome that carries high morbidity and mortality. Our study aimed to explore the possible molecular mechanisms of HBV-ALF by means of bioinformatics analysis. In this study, gene expression microarray datasets of HBV-ALF were collected from Gene Expression Omnibus, and differentially expressed genes (DEGs) were identified with the limma package in R. After functional enrichment analysis, we constructed the protein–protein interaction (PPI) network with the Search Tool for the Retrieval of Interacting Genes online database and a weighted gene coexpression network with the WGCNA package in R. Subsequently, we picked out the hub genes among the DEGs. A total of 423 DEGs, comprising 198 upregulated and 225 downregulated genes, were identified between HBV-ALF and normal samples. The upregulated genes were mainly enriched in immune response, and the downregulated genes were mainly enriched in complement and coagulation cascades. Orosomucoid 1 (ORM1), orosomucoid 2 (ORM2), plasminogen (PLG), and aldehyde oxidase 1 (AOX1) were picked out as the hub genes with a high degree in both the PPI network and the weighted gene coexpression network. The weighted gene coexpression network analysis found that 3 of the 5 modules in which upregulated genes were enriched were closely related to the immune system. The downregulated genes were enriched in only one module, and the genes in this module were mainly enriched in the complement and coagulation cascades pathway. In conclusion, 4 genes (ORM1, ORM2, PLG, and AOX1), together with the immune response and the complement and coagulation cascades pathway, may take part in the pathogenesis of HBV-ALF, and these candidate genes and pathways could be therapeutic targets for HBV-ALF. PMID:29384847

  15. Transcriptome Analysis in Sheepgrass (Leymus chinensis): A Dominant Perennial Grass of the Eurasian Steppe

    PubMed Central

    Chen, Shuangyan; Huang, Xin; Yan, Xueqing; Liang, Ye; Wang, Yuezhu; Li, Xiaofeng; Peng, Xianjun; Ma, Xingyong; Zhang, Lexin; Cai, Yueyue; Ma, Tian; Cheng, Liqin; Qi, Dongmei; Zheng, Huajun; Yang, Xiaohan; Li, Xiaoxia; Liu, Gongshe

    2013-01-01

    Background: Sheepgrass [Leymus chinensis (Trin.) Tzvel.] is an important perennial forage grass across the Eurasian Steppe and is known for its adaptability to various environmental conditions. However, insufficient data resources in public databases for sheepgrass limited our understanding of the mechanism of environmental adaptations, gene discovery and molecular marker development. Results: The transcriptome of sheepgrass was sequenced using Roche 454 pyrosequencing technology. We assembled 952,328 high-quality reads into 87,214 unigenes, including 32,416 contigs and 54,798 singletons. There were 15,450 contigs over 500 bp in length. BLAST searches of our database against Swiss-Prot and NCBI non-redundant protein sequences (nr) databases resulted in the annotation of 54,584 (62.6%) of the unigenes. Gene Ontology (GO) analysis assigned 89,129 GO term annotations for 17,463 unigenes. We identified 11,675 core Poaceae-specific and 12,811 putative sheepgrass-specific unigenes by BLAST searches against all plant genome and transcriptome databases. A total of 2,979 specific freezing-responsive unigenes were found from this RNAseq dataset. We identified 3,818 EST-SSRs in 3,597 unigenes, and some SSRs contained unigenes that were also candidates for freezing-response genes. Characterizations of nucleotide repeats and dominant motifs of SSRs in sheepgrass were also performed. Similarity and phylogenetic analysis indicated that sheepgrass is closely related to barley and wheat. Conclusions: This research has greatly enriched sheepgrass transcriptome resources. The identified stress-related genes will help us to decipher the genetic basis of the environmental and ecological adaptations of this species and will be used to improve wheat and barley crops through hybridization or genetic transformation. The EST-SSRs reported here will be a valuable resource for future gene-phenotype studies and for the molecular breeding of sheepgrass and other Poaceae species. PMID:23861841

  16. Transcriptome analysis in sheepgrass (Leymus chinensis): a dominant perennial grass of the Eurasian Steppe.

    PubMed

    Chen, Shuangyan; Huang, Xin; Yan, Xueqing; Liang, Ye; Wang, Yuezhu; Li, Xiaofeng; Peng, Xianjun; Ma, Xingyong; Zhang, Lexin; Cai, Yueyue; Ma, Tian; Cheng, Liqin; Qi, Dongmei; Zheng, Huajun; Yang, Xiaohan; Li, Xiaoxia; Liu, Gongshe

    2013-01-01

    Sheepgrass [Leymus chinensis (Trin.) Tzvel.] is an important perennial forage grass across the Eurasian Steppe and is known for its adaptability to various environmental conditions. However, insufficient data resources in public databases for sheepgrass limited our understanding of the mechanism of environmental adaptations, gene discovery and molecular marker development. The transcriptome of sheepgrass was sequenced using Roche 454 pyrosequencing technology. We assembled 952,328 high-quality reads into 87,214 unigenes, including 32,416 contigs and 54,798 singletons. There were 15,450 contigs over 500 bp in length. BLAST searches of our database against Swiss-Prot and NCBI non-redundant protein sequences (nr) databases resulted in the annotation of 54,584 (62.6%) of the unigenes. Gene Ontology (GO) analysis assigned 89,129 GO term annotations for 17,463 unigenes. We identified 11,675 core Poaceae-specific and 12,811 putative sheepgrass-specific unigenes by BLAST searches against all plant genome and transcriptome databases. A total of 2,979 specific freezing-responsive unigenes were found from this RNAseq dataset. We identified 3,818 EST-SSRs in 3,597 unigenes, and some SSRs contained unigenes that were also candidates for freezing-response genes. Characterizations of nucleotide repeats and dominant motifs of SSRs in sheepgrass were also performed. Similarity and phylogenetic analysis indicated that sheepgrass is closely related to barley and wheat. This research has greatly enriched sheepgrass transcriptome resources. The identified stress-related genes will help us to decipher the genetic basis of the environmental and ecological adaptations of this species and will be used to improve wheat and barley crops through hybridization or genetic transformation. The EST-SSRs reported here will be a valuable resource for future gene-phenotype studies and for the molecular breeding of sheepgrass and other Poaceae species.

  17. Agave tequilana MADS genes show novel expression patterns in meristems, developing bulbils and floral organs.

    PubMed

    Delgado Sandoval, Silvia del Carmen; Abraham Juárez, María Jazmín; Simpson, June

    2012-03-01

    Agave tequilana is a monocarpic perennial species that flowers after 5-8 years of vegetative growth signaling the end of the plant's life cycle. When fertilization is unsuccessful, vegetative bulbils are induced on the umbels of the inflorescence near the bracteoles from newly formed meristems. Although the regulation of inflorescence and flower development has been described in detail for monocarpic annuals and polycarpic species, little is known at the molecular level for these processes in monocarpic perennials, and few studies have been carried out on bulbils. Histological samples revealed the early induction of umbel meristems soon after the initiation of the vegetative to inflorescence transition in A. tequilana. To identify candidate genes involved in the regulation of floral induction, a search for MADS-box transcription factor ESTs was conducted using an A. tequilana transcriptome database. Seven different MIKC MADS genes classified into 6 different types were identified based on previously characterized A. thaliana and O. sativa MADS genes and sequences from non-grass monocotyledons. Quantitative real-time PCR analysis of the seven candidate MADS genes in vegetative, inflorescence, bulbil and floral tissues uncovered novel patterns of expression for some of the genes in comparison with orthologous genes characterized in other species. In situ hybridization studies using two different genes showed expression in specific tissues of vegetative meristems and floral buds. Distinct MADS gene regulatory patterns in A. tequilana may be related to the specific reproductive strategies employed by this species.

  18. Exome sequencing of oral squamous cell carcinoma in users of Arabian snuff reveals novel candidates for driver genes.

    PubMed

    Al-Hebshi, Nezar Noor; Li, Shiyong; Nasher, Akram Thabet; El-Setouhy, Maged; Alsanosi, Rashad; Blancato, Jan; Loffredo, Christopher

    2016-07-15

    The study sought to identify genetic aberrations driving oral squamous cell carcinoma (OSCC) development among users of shammah, an Arabian preparation of smokeless tobacco. Twenty archival OSCC samples, 15 of which had a history of shammah exposure, were whole-exome sequenced at an average depth of 127×. Somatic mutations were identified using a novel, matched controls-independent filtration algorithm. CODEX and ExomeDepth coupled with a novel, Database of Genomic Variants-based filter were employed to call somatic gene-copy number variations. Significantly mutated genes were identified with Oncodrive FM and the Youn and Simon method. Candidate driver genes were nominated based on Gene Set Enrichment Analysis. The observed mutational spectrum was similar to that reported by the TCGA project. In addition to confirming known genes of OSCC (TP53, CDKN2A, CASP8, PIK3CA, HRAS, FAT1, TP63, CCND1 and FADD), the analysis identified several candidate novel driver events, including mutations of NOTCH3, CSMD3, CRB1, CLTCL1, OSMR and TRPM2, amplification of the proto-oncogenes FOSL1, RELA, TRAF6, MDM2, FRS2 and BAG1, and deletion of the recently described tumor suppressor SMARCC1. Analysis also revealed significantly altered pathways not previously implicated in OSCC, including the Oncostatin-M signalling pathway, AP-1 and C-MYB transcription networks and endocytosis. There was a trend towards a higher number of mutations, amplifications and driver events in samples with a history of shammah exposure, particularly those that tested EBV positive, suggesting an interaction between tobacco exposure and EBV. The work provides further evidence for the genetic heterogeneity of oral cancer and suggests shammah-associated OSCC is characterized by extensive amplification of oncogenes. © 2016 UICC.

  19. Common Variants in Mendelian Kidney Disease Genes and Their Association with Renal Function

    PubMed Central

    Fuchsberger, Christian; Köttgen, Anna; O’Seaghdha, Conall M.; Pattaro, Cristian; de Andrade, Mariza; Chasman, Daniel I.; Teumer, Alexander; Endlich, Karlhans; Olden, Matthias; Chen, Ming-Huei; Tin, Adrienne; Kim, Young J.; Taliun, Daniel; Li, Man; Feitosa, Mary; Gorski, Mathias; Yang, Qiong; Hundertmark, Claudia; Foster, Meredith C.; Glazer, Nicole; Isaacs, Aaron; Rao, Madhumathi; Smith, Albert V.; O’Connell, Jeffrey R.; Struchalin, Maksim; Tanaka, Toshiko; Li, Guo; Hwang, Shih-Jen; Atkinson, Elizabeth J.; Lohman, Kurt; Cornelis, Marilyn C.; Johansson, Åsa; Tönjes, Anke; Dehghan, Abbas; Couraki, Vincent; Holliday, Elizabeth G.; Sorice, Rossella; Kutalik, Zoltan; Lehtimäki, Terho; Esko, Tõnu; Deshmukh, Harshal; Ulivi, Sheila; Chu, Audrey Y.; Murgia, Federico; Trompet, Stella; Imboden, Medea; Kollerits, Barbara; Pistis, Giorgio; Harris, Tamara B.; Launer, Lenore J.; Aspelund, Thor; Eiriksdottir, Gudny; Mitchell, Braxton D.; Boerwinkle, Eric; Schmidt, Helena; Hofer, Edith; Hu, Frank; Demirkan, Ayse; Oostra, Ben A.; Turner, Stephen T.; Ding, Jingzhong; Andrews, Jeanette S.; Freedman, Barry I.; Giulianini, Franco; Koenig, Wolfgang; Illig, Thomas; Döring, Angela; Wichmann, H.-Erich; Zgaga, Lina; Zemunik, Tatijana; Boban, Mladen; Minelli, Cosetta; Wheeler, Heather E.; Igl, Wilmar; Zaboli, Ghazal; Wild, Sarah H.; Wright, Alan F.; Campbell, Harry; Ellinghaus, David; Nöthlings, Ute; Jacobs, Gunnar; Biffar, Reiner; Ernst, Florian; Homuth, Georg; Kroemer, Heyo K.; Nauck, Matthias; Stracke, Sylvia; Völker, Uwe; Völzke, Henry; Kovacs, Peter; Stumvoll, Michael; Mägi, Reedik; Hofman, Albert; Uitterlinden, Andre G.; Rivadeneira, Fernando; Aulchenko, Yurii S.; Polasek, Ozren; Hastie, Nick; Vitart, Veronique; Helmer, Catherine; Wang, Jie Jin; Stengel, Bénédicte; Ruggiero, Daniela; Bergmann, Sven; Kähönen, Mika; Viikari, Jorma; Nikopensius, Tiit; Province, Michael; Colhoun, Helen; Doney, Alex; Robino, Antonietta; Krämer, Bernhard K.; Portas, Laura; Ford, Ian; Buckley, Brendan M.; 
    Adam, Martin; Thun, Gian-Andri; Paulweber, Bernhard; Haun, Margot; Sala, Cinzia; Mitchell, Paul; Ciullo, Marina; Vollenweider, Peter; Raitakari, Olli; Metspalu, Andres; Palmer, Colin; Gasparini, Paolo; Pirastu, Mario; Jukema, J. Wouter; Probst-Hensch, Nicole M.; Kronenberg, Florian; Toniolo, Daniela; Gudnason, Vilmundur; Shuldiner, Alan R.; Coresh, Josef; Schmidt, Reinhold; Ferrucci, Luigi; van Duijn, Cornelia M.; Borecki, Ingrid; Kardia, Sharon L.R.; Liu, Yongmei; Curhan, Gary C.; Rudan, Igor; Gyllensten, Ulf; Wilson, James F.; Franke, Andre; Pramstaller, Peter P.; Rettig, Rainer; Prokopenko, Inga; Witteman, Jacqueline; Hayward, Caroline; Ridker, Paul M.; Bochud, Murielle; Heid, Iris M.; Siscovick, David S.; Fox, Caroline S.; Kao, W. Linda; Böger, Carsten A.

    2013-01-01

    Many common genetic variants identified by genome-wide association studies for complex traits map to genes previously linked to rare inherited Mendelian disorders. A systematic analysis of common single-nucleotide polymorphisms (SNPs) in genes responsible for Mendelian diseases with kidney phenotypes has not been performed. We thus developed a comprehensive database of genes for Mendelian kidney conditions and evaluated the association between common genetic variants within these genes and kidney function in the general population. Using the Online Mendelian Inheritance in Man database, we identified 731 unique disease entries related to specific renal search terms and confirmed a kidney phenotype in 218 of these entries, corresponding to mutations in 258 genes. We interrogated common SNPs (minor allele frequency >5%) within these genes for association with the estimated GFR in 74,354 European-ancestry participants from the CKDGen Consortium. However, the top four candidate SNPs (rs6433115 at LRP2, rs1050700 at TSC1, rs249942 at PALB2, and rs9827843 at ROBO2) did not achieve significance in a stage 2 meta-analysis performed in 56,246 additional independent individuals, indicating that these common SNPs are not associated with estimated GFR. The effect of less common or rare variants in these genes on kidney function in the general population and disease-specific cohorts requires further research. PMID:24029420

  20. Norrie disease gene: characterization of deletions and possible function.

    PubMed

    Chen, Z Y; Battinelli, E M; Hendriks, R W; Powell, J F; Middleton-Price, H; Sims, K B; Breakefield, X O; Craig, I W

    1993-05-01

    Positional cloning experiments have resulted recently in the isolation of a candidate gene for Norrie disease (pseudoglioma; NDP), a severe X-linked neurodevelopmental disorder. Here we report the isolation and analysis of human genomic DNA clones encompassing the NDP gene. The gene spans 28 kb and consists of 3 exons, the first of which is entirely contained within the 5' untranslated region. Detailed analysis of genomic deletions in Norrie patients shows that they are heterogeneous, both in size and in position. By PCR analysis, we found that expression of the NDP gene was not confined to the eye or to the brain. An extensive DNA and protein sequence comparison between the human NDP gene and related genes from the database revealed homology with cysteine-rich protein-binding domains of immediate-early genes implicated in the regulation of cell proliferation. We propose that NDP is a molecule related in function to these genes and may be involved in a pathway that regulates neural cell differentiation and proliferation.

  1. Identification of Streptococcus mitis321A vaccine antigens based on reverse vaccinology

    PubMed Central

    Zhang, Qiao; Lin, Kexiong; Wang, Changzheng; Xu, Zhi; Yang, Li; Ma, Qianli

    2018-01-01

    Streptococcus mitis (S. mitis) may transform into highly pathogenic bacteria. The aim of the present study was to identify potential antigen targets for designing an effective vaccine against the pathogenic S. mitis321A. The genome of S. mitis321A was sequenced using an Illumina HiSeq 2000 instrument. Subsequently, Glimmer 3.02 and Tandem Repeat Finder (TRF) 4.04 were used to predict genes and tandem repeats, respectively, with DNA sequence function analysis using the Basic Local Alignment Search Tool (BLAST) in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous Groups of proteins (COG) databases. Putative gene antigen candidates were screened with BLAST ahead of phylogenetic tree analysis. The DNA sequence assembly size was 2,110,680 bp with 40.12% GC, 6 scaffolds and 9 contigs. Consequently, 1,944 genes were predicted, and 119 tandem repeats, 56 microsatellite DNA, 10 minisatellite DNA and 154 transposons were acquired. The predicted genes were associated with various pathways and functions concerning membrane transport and energy metabolism. Multiple putative genes encoding surface proteins, secreted proteins and virulence factors, as well as essential genes, were determined. The majority of essential genes belonged to a phylogenetic lineage, while 321AGL000129 and 321AGL000299 were on the same branch. The current study provides useful information regarding the biological function of the S. mitis321A genome and recommends putative antigen candidates for developing a potent vaccine against S. mitis. PMID:29620181

  2. Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model.

    PubMed

    Ni, Jingchao; Koyuturk, Mehmet; Tong, Hanghang; Haines, Jonathan; Xu, Rong; Zhang, Xiang

    2016-11-10

    Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, a common limitation of the existing methods is that they assume all diseases share the same molecular network and a single generic molecular network is used to predict candidate genes for all diseases. However, different diseases tend to manifest in different tissues, and the molecular networks in different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases. In this paper, we develop a robust and flexible method to integrate tissue-specific molecular networks for disease gene prioritization. Our method allows each disease to have its own tissue-specific network(s). We formulate the problem of candidate gene prioritization as an optimization problem based on network propagation. When there are multiple tissue-specific networks available for a disease, our method can automatically infer the relative importance of each tissue-specific network. Thus it is robust to the noisy and incomplete network data. To solve the optimization problem, we develop fast algorithms which have linear time complexities in the number of nodes in the molecular networks. We also provide rigorous theoretical foundations for our algorithms in terms of their optimality and convergence properties. Extensive experimental results show that our method can significantly improve the accuracy of candidate gene prioritization compared with the state-of-the-art methods. In our experiments, we compare our methods with 7 popular network-based disease gene prioritization algorithms on diseases from Online Mendelian Inheritance in Man (OMIM) database. 
    The experimental results demonstrate that our methods recover true associations more accurately than other methods in terms of AUC values, and the performance differences are significant (with paired t-test p-values less than 0.05). This validates the importance of integrating tissue-specific molecular networks for studying disease gene prioritization and shows the superiority of our network models and ranking algorithms for this purpose. The source code and datasets are available at http://nijingchao.github.io/CRstar/ .
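
    The abstract frames prioritization as network propagation. A generic single-network random walk with restart, sketched below, illustrates that general idea; it is a standard stand-in technique and not the authors' multi-network model, and the toy network, function names and restart probability are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-9, max_iter=1000):
    """Propagate scores from seed nodes over an undirected network.

    `adjacency` maps node -> set of neighbours. The sketch assumes every
    node has at least one neighbour, so that probability mass is conserved.
    """
    nodes = sorted(adjacency)
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fixed fraction of mass returns to the seeds.
        nxt = {u: restart * p0[u] for u in nodes}
        # Walk term: each node splits its remaining mass among its neighbours.
        for u in nodes:
            if not adjacency[u]:
                continue
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def rank_candidates(scores, seeds):
    """Order non-seed nodes by descending propagated score."""
    return sorted((u for u in scores if u not in seeds), key=lambda u: -scores[u])


# Hypothetical toy network: a path A - B - C - D with A as the known disease gene.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_network, {"A"})
print(rank_candidates(scores, {"A"}))
```

    Each iteration touches every edge once, so the cost per sweep is linear in the size of the network.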

  3. Selection of appropriate reference genes for the detection of rhythmic gene expression via quantitative real-time PCR in Tibetan hulless barley.

    PubMed

    Cai, Jing; Li, Pengfei; Luo, Xiao; Chang, Tianliang; Li, Jiaxing; Zhao, Yuwei; Xu, Yao

    2018-01-01

    Hulless barley (Hordeum vulgare L. var. nudum Hook. f.) has been cultivated as a major crop in the Qinghai-Tibet plateau of China for thousands of years. Compared to other cereal crops, the Tibetan hulless barley has developed stronger endogenous resistances to survive in the severe environment of its habitat. To understand the unique resistance mechanisms of this plant, detailed genetic studies need to be performed. The quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) is the most commonly used method for detecting gene expression. However, the selection of stable reference genes under the given experimental conditions is considered an essential step for obtaining accurate results in qRT-PCR. In this study, 10 candidate reference genes, namely ACT (Actin), E2 (Ubiquitin conjugating enzyme 2), TUBα (Alpha-tubulin), TUBβ6 (Beta-tubulin 6), GAPDH (Glyceraldehyde 3-phosphate dehydrogenase), EF-1α (Elongation factor 1-alpha), SAMDC (S-adenosylmethionine decarboxylase), PKABA1 (Gene for protein kinase HvPKABA1), PGK (Phosphoglycerate kinase), and HSP90 (Heat shock protein 90), were selected from the NCBI gene database of barley. Following qRT-PCR amplifications of all candidate reference genes in Tibetan hulless barley seedlings under various stress conditions, the stabilities of these candidates were analyzed by three individual software packages: geNorm, NormFinder, and BestKeeper. The results demonstrated that TUBβ6, E2, TUBα, and HSP90 were generally the most suitable sets under all tested conditions; similarly, TUBα and HSP90 showed peak stability under salt stress, TUBα and EF-1α were the most suitable reference genes under cold stress, and ACT and E2 were the most stable under drought stress. Finally, a known circadian gene, CCA1, was used to verify the serviceability of the chosen reference genes.
    The results confirmed that all reference genes recommended by the three software packages were suitable for gene expression analysis under the tested stress conditions by the qRT-PCR method.

  4. Candidate genes for obesity-susceptibility show enriched association within a large genome-wide association study for BMI.

    PubMed

    Vimaleswaran, Karani S; Tachmazidou, Ioanna; Zhao, Jing Hua; Hirschhorn, Joel N; Dudbridge, Frank; Loos, Ruth J F

    2012-10-15

    Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the genetic investigation of anthropometric traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10(-7). Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits.
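
    One common way to set up a hypergeometric enrichment test of the kind mentioned in this abstract is to ask how likely it is that at least the observed number of candidate genes falls in the top quantile of P-values by chance. The sketch below shows that upper-tail calculation; the function name and the toy numbers are hypothetical, and the exact test configuration used in the study is not described in the abstract.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail hypergeometric probability P(X >= observed).

    population: all genes tested
    successes:  candidate genes among them
    draws:      genes in the selected top quantile
    observed:   candidate genes found in that quantile
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers, not taken from the study:
# 10 genes, 4 candidates, 3 genes in the top quantile, 2 of them candidates.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

    A small tail probability indicates that candidates are over-represented in the selected quantile relative to random sampling without replacement.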

  5. Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations.

    PubMed

    Gupta, Mayetri; Cheung, Ching-Lung; Hsu, Yi-Hsiang; Demissie, Serkalem; Cupples, L Adrienne; Kiel, Douglas P; Karasik, David

    2011-06-01

    Genome-wide association studies (GWAS) using high-density genotyping platforms offer an unbiased strategy to identify new candidate genes for osteoporosis. It is imperative to be able to clearly distinguish signal from noise by focusing on the best phenotype in a genetic study. We performed GWAS of multiple phenotypes associated with fractures [bone mineral density (BMD), bone quantitative ultrasound (QUS), bone geometry, and muscle mass] with approximately 433,000 single-nucleotide polymorphisms (SNPs) and created a database of resulting associations. We performed analysis of GWAS data from 23 phenotypes by a novel modification of a block clustering algorithm followed by gene-set enrichment analysis. A data matrix of standardized regression coefficients was partitioned along both axes--SNPs and phenotypes. Each partition represents a distinct cluster of SNPs that have similar effects over a particular set of phenotypes. Application of this method to our data shows several SNP-phenotype connections. We found a strong cluster of association coefficients of high magnitude for 10 traits (BMD at several skeletal sites, ultrasound measures, cross-sectional bone area, and section modulus of femoral neck and shaft). These clustered traits were highly genetically correlated. Gene-set enrichment analyses indicated the augmentation of genes that cluster with the 10 osteoporosis-related traits in pathways such as aldosterone signaling in epithelial cells, role of osteoblasts, osteoclasts, and chondrocytes in rheumatoid arthritis, and Parkinson signaling. In addition to several known candidate genes, we also identified PRKCH and SCNN1B as potential candidate genes for multiple bone traits. In conclusion, our mining of GWAS results revealed the similarity of association results between bone strength phenotypes that may be attributed to pleiotropic effects of genes. 
    This knowledge may prove helpful in identifying novel genes and pathways that underlie several correlated phenotypes, as well as in deciphering genetic and phenotypic modularity underlying osteoporosis risk. Copyright © 2011 American Society for Bone and Mineral Research.

  6. Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.

    PubMed

    Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C

    2018-08-01

    Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.

  7. Gene Fusion Markup Language: a prototype for exchanging gene fusion data.

    PubMed

    Kalyana-Sundaram, Shanker; Shanmugam, Achiraman; Chinnaiyan, Arul M

    2012-10-16

    An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.

  8. PceRBase: a database of plant competing endogenous RNA.

    PubMed

    Yuan, Chunhui; Meng, Xianwen; Li, Xue; Illing, Nicola; Ingle, Robert A; Wang, Jingjing; Chen, Ming

    2017-01-04

    Competition for microRNA (miRNA) binding between RNA molecules has emerged as a novel mechanism for the regulation of eukaryotic gene expression. Competing endogenous RNA (ceRNA) can act as decoys for miRNA binding, thereby forming a ceRNA network by regulating the abundance of other RNA transcripts which share the same or similar microRNA response elements. Although this type of RNA cross talk was first described in Arabidopsis, and was subsequently shown to be active in animal models, there is no database collecting potential ceRNA data for plants. We have developed a Plant ceRNA database (PceRBase, http://bis.zju.edu.cn/pcernadb/index.jsp) which contains potential ceRNA target-target, and ceRNA target-mimic pairs from 26 plant species. For example, in Arabidopsis lyrata, 311 candidate ceRNAs are identified which could affect 2646 target-miRNA-target interactions. Predicted pairing structure between miRNAs and their target mRNA transcripts, expression levels of ceRNA pairs and associated GO annotations are also stored in the database. A web interface provides convenient browsing and searching for specific genes of interest. Tools are available for the visualization and enrichment analysis of genes in the ceRNA networks. Moreover, users can use PceRBase to predict novel competing mimic-target and target-target interactions from their own data. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Rose Scent

    PubMed Central

    Guterman, Inna; Shalit, Moshe; Menda, Naama; Piestun, Dan; Dafny-Yelin, Mery; Shalev, Gil; Bar, Einat; Davydov, Olga; Ovadis, Mariana; Emanuel, Michal; Wang, Jihong; Adam, Zach; Pichersky, Eran; Lewinsohn, Efraim; Zamir, Dani; Vainstein, Alexander; Weiss, David

    2002-01-01

    For centuries, rose has been the most important crop in the floriculture industry; its economic importance also lies in the use of its petals as a source of natural fragrances. Here, we used genomics approaches to identify novel scent-related genes, using rose flowers from tetraploid scented and nonscented cultivars. An annotated petal EST database of ∼2100 unique genes from both cultivars was created, and DNA chips were prepared and used for expression analyses of selected clones. Detailed chemical analysis of volatile composition in the two cultivars, together with the identification of secondary metabolism–related genes whose expression coincides with scent production, led to the discovery of several novel flower scent–related candidate genes. The function of some of these genes, including a germacrene D synthase, was biochemically determined using an Escherichia coli expression system. This work demonstrates the advantages of using the high-throughput approaches of genomics to detail traits of interest expressed in a cultivar-specific manner in nonmodel plants. PMID:12368489

  10. Molecular and comparative genetics of mental retardation.

    PubMed Central

    Inlow, Jennifer K; Restifo, Linda L

    2004-01-01

    Affecting 1-3% of the population, mental retardation (MR) poses significant challenges for clinicians and scientists. Understanding the biology of MR is complicated by the extraordinary heterogeneity of genetic MR disorders. Detailed analyses of >1000 Online Mendelian Inheritance in Man (OMIM) database entries and literature searches through September 2003 revealed 282 molecularly identified MR genes. We estimate that hundreds more MR genes remain to be identified. A novel test, in which we distributed unmapped MR disorders proportionately across the autosomes, failed to eliminate the well-known X-chromosome overrepresentation of MR genes and candidate genes. This evidence argues against ascertainment bias as the main cause of the skewed distribution. On the basis of a synthesis of clinical and laboratory data, we developed a biological functions classification scheme for MR genes. Metabolic pathways, signaling pathways, and transcription are the most common functions, but numerous other aspects of neuronal and glial biology are controlled by MR genes as well. Using protein sequence and domain-organization comparisons, we found a striking conservation of MR genes and genetic pathways across the approximately 700 million years that separate Homo sapiens and Drosophila melanogaster. Eighty-seven percent have one or more fruit fly homologs and 76% have at least one candidate functional ortholog. We propose that D. melanogaster can be used in a systematic manner to study MR and possibly to develop bioassays for therapeutic drug discovery. We selected 42 Drosophila orthologs as most likely to reveal molecular and cellular mechanisms of nervous system development or plasticity relevant to MR. PMID:15020472

  11. Virus-Induced Gene Silencing Using Tobacco Rattle Virus as a Tool to Study the Interaction between Nicotiana attenuata and Rhizophagus irregularis.

    PubMed

    Groten, Karin; Pahari, Nabin T; Xu, Shuqing; Miloradovic van Doorn, Maja; Baldwin, Ian T

    2015-01-01

    Most land plants live in a symbiotic association with arbuscular mycorrhizal fungi (AMF) that belong to the phylum Glomeromycota. Although a number of plant genes involved in the plant-AMF interactions have been identified by analyzing mutants, the ability to rapidly manipulate gene expression to study the potential functions of new candidate genes remains unrealized. We analyzed changes in gene expression of wild tobacco roots (Nicotiana attenuata) after infection with mycorrhizal fungi (Rhizophagus irregularis) by serial analysis of gene expression (SuperSAGE) combined with next generation sequencing, and established a virus-induced gene-silencing protocol to study the function of candidate genes in the interaction. From 92,434 SuperSAGE Tag sequences, 32,808 (35%) matched with our in-house Nicotiana attenuata transcriptome database and 3,698 (4%) matched to Rhizophagus genes. In total, 11,194 Tags showed a significant change in expression (p<0.05, >2-fold change) after infection. When comparing the functions of highly up-regulated annotated Tags in this study with those of two previous large-scale gene expression studies, 18 gene functions were found to be up-regulated in all three studies mainly playing roles related to phytohormone metabolism, catabolism and defense. To validate the function of identified candidate genes, we used the technique of virus-induced gene silencing (VIGS) to silence the expression of three putative N. attenuata genes: germin-like protein, indole-3-acetic acid-amido synthetase GH3.9 and, as a proof-of-principle, calcium and calmodulin-dependent protein kinase (CCaMK). The silencing of the three plant genes in roots was successful, but only CCaMK silencing had a significant effect on the interaction with R. irregularis. Interestingly, when a highly activated inoculum was used for plant inoculation, the effect of CCaMK silencing on fungal colonization was masked, probably due to trans-complementation. 
This study demonstrates that large-scale gene expression studies across different species reveal the induction of a core set of genes with similar functions. However, additional factors seem to influence the overall pattern of gene expression, resulting in high variability among independent studies with different hosts. We conclude that VIGS is a powerful tool with which to investigate the function of genes involved in plant-AMF interactions but that inoculum strength can strongly influence the outcome of the interaction.
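
    The significance filter described in this abstract (p<0.05 and >2-fold change) can be illustrated with a minimal sketch. The Tag record, its field names, and the symmetric fold-change definition are assumptions made for illustration; the abstract does not specify how counts were normalized or how fold change was computed.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """Hypothetical record for one SuperSAGE tag (field names are assumptions)."""
    tag_id: str
    control: float   # normalized count in non-inoculated roots
    treated: float   # normalized count in inoculated roots
    p_value: float


def fold_change(tag: Tag) -> float:
    """Symmetric fold change: always >= 1, whether the tag goes up or down."""
    low, high = sorted((tag.control, tag.treated))
    if high == 0:
        return 1.0            # no counts in either condition: no change
    if low == 0:
        return float("inf")   # present in only one condition
    return high / low


def significant(tags, alpha=0.05, min_fold=2.0):
    """Keep tags with p < alpha and a change strictly greater than min_fold."""
    return [t for t in tags if t.p_value < alpha and fold_change(t) > min_fold]
```

    A tag with counts of 40 and 10 and p = 0.03 passes this filter (4-fold down-regulation), whereas a tag with exactly a 2-fold change does not, because the threshold is strict.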

  12. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars

    PubMed Central

    2012-01-01

    Background Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants—making up 30% of the floriculture market. However, given high demand for roses, rose breeding programs are limited in molecular resources which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. Conclusions In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars.
The database provides a comprehensive genetic resource which can be used to better understand rose flower development and to identify candidate genes for important phenotypes. PMID:23171001

  13. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars.

    PubMed

    Kim, Jungeun; Park, June Hyun; Lim, Chan Ju; Lim, Jae Yun; Ryu, Jee-Youn; Lee, Bong-Woo; Choi, Jae-Pil; Kim, Woong Bom; Lee, Ha Yeon; Choi, Yourim; Kim, Donghyun; Hur, Cheol-Goo; Kim, Sukweon; Noh, Yoo-Sun; Shin, Chanseok; Kwon, Suk-Yoon

    2012-11-21

    Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants--making up 30% of the floriculture market. However, given high demand for roses, rose breeding programs are limited in molecular resources which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: 'Vital', 'Maroussia', and 'Sympathy' and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars.
The database provides a comprehensive genetic resource which can be used to better understand rose flower development and to identify candidate genes for important phenotypes.

  14. Investigation of candidate genes for osteoarthritis based on gene expression profiles.

    PubMed

    Dong, Shuanghai; Xia, Tian; Wang, Lei; Zhao, Qinghua; Tian, Jiwei

    2016-12-01

    This study aimed to explore the mechanism of osteoarthritis (OA) and provide valid biological information for further investigation. The gene expression profile GSE46750 was downloaded from the Gene Expression Omnibus database. The Linear Models for Microarray Data (limma) package (Bioconductor project, http://www.bioconductor.org/packages/release/bioc/html/limma.html) was used to identify differentially expressed genes (DEGs) in inflamed OA samples. Gene Ontology function enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of DEGs were performed based on Database for Annotation, Visualization and Integrated Discovery data, and a protein-protein interaction (PPI) network was constructed based on the Search Tool for the Retrieval of Interacting Genes/Proteins database. A regulatory network was screened based on the Encyclopedia of DNA Elements. Molecular Complex Detection was used for sub-network screening. The two sub-networks with the highest node degree were integrated with the transcriptional regulatory network, and KEGG functional enrichment analysis was performed for the 2 modules. In total, 401 up- and 196 down-regulated DEGs were obtained. Up-regulated DEGs were involved in inflammatory response, while down-regulated DEGs were involved in cell cycle. A PPI network with 2392 protein interactions was constructed. Moreover, 10 genes including Interleukin 6 (IL6) and Aurora B kinase (AURKB) were found to be outstanding in the PPI network. There are 214 up- and 8 down-regulated transcription factor (TF)-target pairs in the TF regulatory network. Module 1 had TFs including SPI1, PRDM1, and FOS, while module 2 contained FOSL1. The nodes in module 1 were enriched in chemokine signaling pathway, while the nodes in module 2 were mainly enriched in cell cycle.
The screened DEGs including IL6, AGT, and AURKB might be potential biomarkers for gene therapy for OA by being regulated by TFs such as FOS and SPI1, and participating in the cell cycle and cytokine-cytokine receptor interaction pathway. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.

  15. Chasing Migration Genes: A Brain Expressed Sequence Tag Resource for Summer and Migratory Monarch Butterflies (Danaus plexippus)

    PubMed Central

    Zhu, Haisun; Casselman, Amy; Reppert, Steven M.

    2008-01-01

    North American monarch butterflies (Danaus plexippus) undergo a spectacular fall migration. In contrast to summer butterflies, migrants are juvenile hormone (JH) deficient, which leads to reproductive diapause and increased longevity. Migrants also utilize time-compensated sun compass orientation to help them navigate to their overwintering grounds. Here, we describe a brain expressed sequence tag (EST) resource to identify genes involved in migratory behaviors. A brain EST library was constructed from summer and migrating butterflies. Of 9,484 unique sequences, 6,068 had positive hits with the non-redundant protein database; the EST database likely represents ∼52% of the gene-encoding potential of the monarch genome. The brain transcriptome was cataloged using Gene Ontology and compared to Drosophila. Monarch genes were well represented, including those implicated in behavior. Three genes involved in increased JH activity (allatotropin, juvenile hormone acid methyltransferase, and takeout) were upregulated in summer butterflies, compared to migrants. The locomotion-relevant turtle gene was marginally upregulated in migrants, while the foraging and single-minded genes were not differentially regulated. Many of the genes important for the monarch circadian clock mechanism (involved in sun compass orientation) were in the EST resource, including the newly identified cryptochrome 2. The EST database also revealed a novel Na+/K+ ATPase allele predicted to be more resistant to the toxic effects of milkweed than that reported previously. Potential genetic markers were identified from 3,486 EST contigs and included 1,599 double-hit single nucleotide polymorphisms (SNPs) and 98 microsatellite polymorphisms. These data provide a template of the brain transcriptome for the monarch butterfly.
Our “snap-shot” analysis of the differential regulation of candidate genes between summer and migratory butterflies suggests that unbiased, comprehensive transcriptional profiling will inform the molecular basis of migration. The identified SNPs and microsatellite polymorphisms can be used as genetic markers to address questions of population and subspecies structure. PMID:18183285

  16. Identification of learning and memory genes in canine; promoter investigation and determining the selective pressure.

    PubMed

    Seifi Moroudi, Reihane; Masoudi, Ali Akbar; Vaez Torshizi, Rasoul; Zandi, Mohammad

    2014-12-01

    Trainability, an important behavior in dogs, is affected by learning and memory genes. Such genes have not yet been identified in dogs. In the current research, these genes were found in animal models by mining biological data and the scientific literature. The proteins encoded by these genes were obtained from the UniProt database for dogs and humans. Because not all homologous proteins perform similar functions, these proteins were compared in terms of protein families, domains, biological processes, molecular functions, and cellular location of metabolic pathways using the InterPro, KEGG, QuickGO and PSORT databases. The results showed that some of these proteins have the same function in the rat or mouse, dog, and human. It is anticipated that the proteins encoded by these genes may be involved in learning and memory in dogs. Then, the expression pattern of the identified genes was investigated in the dog hippocampus using the existing information in GEO Profiles. The results showed that the BDNF, TAC1 and CCK genes are expressed in the dog hippocampus; these genes could therefore be strong candidates associated with learning and memory in dogs. Subsequently, because of the importance of promoter regions in gene function, this region was investigated in the above genes. Analysis of the promoters indicated that the HNF-4 site of the BDNF gene and the transcription start site of the CCK gene are exposed to methylation. Phylogenetic analysis of the protein sequences of these genes showed high similarity in each of these three genes among the studied species. The dN/dS ratios for the BDNF, TAC1 and CCK genes indicate purifying selection during the evolution of these genes.
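
    The link between the dN/dS ratio and purifying selection can be shown with a short sketch. The convention is standard: a ratio below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution, and a ratio above 1 suggests positive selection. The function name, the tolerance band, and the example rates below are hypothetical and are not taken from the study; real analyses estimate the rates from codon alignments and assess significance statistically.

```python
def classify_selection(dn: float, ds: float, tolerance: float = 0.05):
    """Return (omega, label) for nonsynonymous (dn) and synonymous (ds) rates.

    The tolerance around 1.0 is an arbitrary illustrative choice, not a
    statistical test.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega < 1 - tolerance:
        return omega, "purifying"
    if omega > 1 + tolerance:
        return omega, "positive"
    return omega, "neutral"


# Hypothetical rates for a strongly conserved gene.
omega, label = classify_selection(dn=0.02, ds=0.40)
print(f"dN/dS = {omega:.2f} ({label} selection)")
```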

  17. sRNAdb: A small non-coding RNA database for gram-positive bacteria

    PubMed Central

    2012-01-01

    Background The class of small non-coding RNA molecules (sRNA) regulates gene expression by different mechanisms and enables bacteria to mount a physiological response due to adaptation to the environment or infection. Over the last decades the number of known sRNAs has been increasing rapidly. Several databases like Rfam or fRNAdb were extended to include sRNAs as a class of their own. Furthermore, new specialized databases like sRNAMap (gram-negative bacteria only) and sRNATarBase (target prediction) were established. To the best of the authors’ knowledge no database focusing on sRNAs from gram-positive bacteria is publicly available so far. Description In order to understand the functional and phylogenetic relationships of sRNAs we have developed sRNAdb and provide tools for data analysis and visualization. The data compiled in our database is assembled from experiments as well as from bioinformatics analyses. The software enables comparison and visualization of gene loci surrounding the sRNAs of interest. To accomplish this, we use a client–server based approach. Offline versions of the database including analyses and visualization tools can easily be installed locally on the user’s computer. This feature facilitates customized local addition of unpublished sRNA candidates and related information such as promoters or terminators using tab-delimited files. Conclusion sRNAdb allows a user-friendly and comprehensive comparative analysis of sRNAs from available sequenced gram-positive prokaryotic replicons. Offline versions including analysis and visualization tools facilitate complex user specific bioinformatics analyses. PMID:22883983

  18. EXONSAMPLER: a computer program for genome-wide and candidate gene exon sampling for targeted next-generation sequencing.

    PubMed

    Cosart, Ted; Beja-Pereira, Albano; Luikart, Gordon

    2014-11-01

    The computer program EXONSAMPLER automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of EXONSAMPLER to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16,000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection. © 2014 John Wiley & Sons Ltd.
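
    The user-adjustable limits described in this abstract (minimum and maximum exon length, a per-gene base-pair cap, and a total base-pair cap) can be sketched as below. This is not EXONSAMPLER's code or interface: the Exon record, the function names, the default values, the choice to truncate oversized exons, and the coordinates are all assumptions made for illustration, and strand is ignored.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    gene: str
    chrom: str
    start: int  # 0-based start, as in BED
    end: int    # exclusive end, as in BED


def sample_exons(exons, min_len=100, max_len=500,
                 max_bp_per_gene=1000, max_total_bp=3000):
    """Greedily pick exons in input order, assumed 5' to 3' within each gene."""
    bp_per_gene = {}
    total_bp = 0
    picked = []
    for exon in exons:
        length = exon.end - exon.start
        if length < min_len:
            continue  # too short to be useful
        if length > max_len:
            # Partial sampling of a very large exon: keep its first max_len bp.
            exon = exon._replace(end=exon.start + max_len)
            length = max_len
        if bp_per_gene.get(exon.gene, 0) + length > max_bp_per_gene:
            continue  # this gene has used up its budget
        if total_bp + length > max_total_bp:
            break     # the whole collection has used up its budget
        bp_per_gene[exon.gene] = bp_per_gene.get(exon.gene, 0) + length
        total_bp += length
        picked.append(exon)
    return picked


def to_bed(exons):
    """Format exons as tab-separated BED lines: chrom, start, end, name."""
    return "\n".join(f"{e.chrom}\t{e.start}\t{e.end}\t{e.gene}" for e in exons)
```

    Because exons are taken in input order, supplying them 5' to 3' approximates a preference for upstream exons; the other sampling modes mentioned in the abstract would need a different ordering or selection step.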

  19. Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

    PubMed Central

    Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jakob; Gilchrist, Michael J; Panitz, Frank; Jørgensen, Claus; Scheibye-Knudsen, Karsten; Arvin, Troels; Lumholdt, Steen; Sawera, Milena; Green, Trine; Nielsen, Bente J; Havgaard, Jakob H; Rosenkilde, Carina; Wang, Jun; Li, Heng; Li, Ruiqiang; Liu, Bin; Hu, Songnian; Dong, Wei; Li, Wei; Yu, Jun; Wang, Jian; Stærfeldt, Hans-Henrik; Wernersson, Rasmus; Madsen, Lone B; Thomsen, Bo; Hornshøj, Henrik; Bujie, Zhan; Wang, Xuegang; Wang, Xuefei; Bolund, Lars; Brunak, Søren; Yang, Huanming; Bendixen, Christian; Fredholm, Merete

    2007-01-01

    Background Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. Results Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories. Conclusion This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies. PMID:17407547

  20. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource

    PubMed Central

    Seaver, Samuel M. D.; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M. T.; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D.; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D.; Henry, Christopher S.

    2014-01-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599

  1. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource.

    PubMed

    Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S

    2014-07-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.

  2. Variation in umami perception and in candidate genes for the umami receptor in mice and humans

    PubMed Central

    Shirosaki, Shinya; Ohkuri, Tadahiro; Sanematsu, Keisuke; Islam, AA Shahidul; Ogiwara, Yoko; Kawai, Misako; Yoshida, Ryusuke; Ninomiya, Yuzo

    2009-01-01

    The unique taste induced by monosodium glutamate is referred to as umami taste. The umami taste is also elicited by the purine nucleotides inosine 5′-monophosphate and guanosine 5′-monophosphate. There is evidence that a heterodimeric G protein–coupled receptor, which consists of the T1R1 (taste receptor type 1, member 1, Tas1r1) and the T1R3 (taste receptor type 1, member 3, Tas1r3) proteins, functions as an umami taste receptor for rodents and humans. Splice variants of metabotropic glutamate receptors, mGluR1 (glutamate receptor, metabotropic 1, Grm1) and mGluR4 (glutamate receptor, metabotropic 4, Grm4), also have been proposed as taste receptors for glutamate. The taste sensitivity to umami substances varies in inbred mouse strains and in individual humans. However, little is known about the relation of umami taste sensitivity to variations in candidate umami receptor genes in rodents or in humans. In this article, we summarize current knowledge of the diversity of umami perception in mice and humans. Furthermore, we combine previously published data and new information from the single nucleotide polymorphism databases regarding variation in the mouse and human candidate umami receptor genes: mouse Tas1r1 (TAS1R1 for human), mouse Tas1r3 (TAS1R3 for human), mouse Grm1 (GRM1 for human), and mouse Grm4 (GRM4 for human). Finally, we discuss prospective associations between variation of these genes and umami taste perception in both species. PMID:19625681

  3. PhenomeCentral: A Portal for Phenotypic and Genotypic Matchmaking of Patients with Rare Genetic Diseases

    PubMed Central

    Buske, Orion J.; Girdea, Marta; Dumitriu, Sergiu; Gallinger, Bailey; Hartley, Taila; Trang, Heather; Misyura, Andriy; Friedman, Tal; Beaulieu, Chandree; Bone, William P.; Links, Amanda E.; Washington, Nicole L.; Haendel, Melissa A.; Robinson, Peter N.; Boerkoel, Cornelius F.; Adams, David; Gahl, William A.; Boycott, Kym M.; Brudno, Michael

    2017-01-01

    The discovery of disease-causing mutations typically requires confirmation of the variant or gene in multiple unrelated individuals, and a large number of rare genetic diseases remain unsolved due to difficulty identifying second families. To enable the secure sharing of case records by clinicians and rare disease scientists, we have developed the PhenomeCentral portal (https://phenomecentral.org). Each record includes a phenotypic description and relevant genetic information (exome or candidate genes). PhenomeCentral identifies similar patients in the database based on semantic similarity between clinical features, automatically prioritized genes from whole-exome data, and candidate genes entered by the users, enabling both hypothesis-free and hypothesis-driven matchmaking. Users can then contact other submitters to follow up on promising matches. PhenomeCentral incorporates data for over 1,000 patients with rare genetic diseases, contributed by the FORGE and Care4Rare Canada projects, the US NIH Undiagnosed Diseases Program, the EU Neuromics and ANDDIrare projects, as well as numerous independent clinicians and scientists. Though the majority of these records have associated exome data, most lack a molecular diagnosis. PhenomeCentral has already been used to identify causative mutations for several patients, and its ability to find matching patients and diagnose these diseases will grow with each additional patient that is entered. PMID:26251998

  4. Excess congenital non-synonymous variation in leukemia-associated genes in MLL− infant leukemia: a Children's Oncology Group report

    PubMed Central

    Valentine, M C; Linabery, A M; Chasnoff, S; Hughes, A E O; Mallaney, C; Sanchez, N; Giacalone, J; Heerema, N A; Hilden, J M; Spector, L G; Ross, J A; Druley, T E

    2014-01-01

    Infant leukemia (IL) is a rare sporadic cancer with a grim prognosis. Although most cases are accompanied by MLL rearrangements and harbor very few somatic mutations, less is known about the genetics of the cases without MLL translocations. We performed the largest exome-sequencing study to date on matched non-cancer DNA from pairs of mothers and IL patients to characterize congenital variation that may contribute to early leukemogenesis. Using the COSMIC database to define acute leukemia-associated candidate genes, we find a significant enrichment of rare, potentially functional congenital variation in IL patients compared with randomly selected genes within the same patients and unaffected pediatric controls. IL acute myeloid leukemia (AML) patients had more overall variation than IL acute lymphocytic leukemia (ALL) patients, but less of that variation was inherited from mothers. Of our candidate genes, we found that MLL3 was a compound heterozygote in every infant who developed AML and 50% of infants who developed ALL. These data suggest a model by which known genetic mechanisms for leukemogenesis could be disrupted without an abundance of somatic mutation or chromosomal rearrangements. This model would be consistent with existing models for the establishment of leukemia clones in utero and the high rate of IL concordance in monozygotic twins. PMID:24301523

  5. Array comparative genomic hybridization and computational genome annotation in constitutional cytogenetics: suggesting candidate genes for novel submicroscopic chromosomal imbalance syndromes.

    PubMed

    Van Vooren, Steven; Coessens, Bert; De Moor, Bart; Moreau, Yves; Vermeesch, Joris R

    2007-09-01

    Genome-wide array comparative genomic hybridization screening is uncovering pathogenic submicroscopic chromosomal imbalances in patients with developmental disorders. In those patients, imbalances appear now to be scattered across the whole genome, and most patients carry different chromosomal anomalies. Screening patients with developmental disorders can be considered a forward functional genome screen. The imbalances pinpoint the location of genes that are involved in human development. Because most imbalances encompass regions harboring multiple genes, the challenge is to (1) identify those genes responsible for the specific phenotype and (2) disentangle the role of the different genes located in an imbalanced region. In this review, we discuss novel tools and relevant databases that have recently been developed to aid this gene discovery process. Identification of the functional relevance of genes will not only deepen our understanding of human development but will, in addition, aid in the data interpretation and improve genetic counseling.

  6. Juvenile hormone and colony conditions differentially influence cytochrome P450 gene expression in the termite Reticulitermes flavipes.

    PubMed

    Zhou, X; Song, C; Grzymala, T L; Oi, F M; Scharf, M E

    2006-12-01

    In lower termites, the worker caste is a totipotent immature stage that is capable of differentiating into other adult caste phenotypes. We investigated the diversity of family 4 cytochrome P450 (CYP4) genes in Reticulitermes flavipes workers, with the specific goal of identifying P450s potentially involved in regulating caste differentiation. Seven novel CYP4 genes were identified. Quantitative real-time PCR revealed the tissue distribution of expression for the seven CYP4s, as well as temporal expression changes in workers in association with a release from colony influences and during juvenile hormone (JH)-induced soldier caste differentiation. Several fat-body-related CYP4 genes were differentially expressed after JH treatment. Still other genes changed expression in association with removal from colony influences, suggesting that primer pheromones and/or other colony influences impact their expression. These findings add to a growing database of candidate termite caste-regulatory genes, and provide explicit evidence that colony factors influence termite gene expression.

  7. Impact of training sets on classification of high-throughput bacterial 16s rRNA gene surveys

    PubMed Central

    Werner, Jeffrey J; Koren, Omry; Hugenholtz, Philip; DeSantis, Todd Z; Walters, William A; Caporaso, J Gregory; Angenent, Largus T; Knight, Rob; Ley, Ruth E

    2012-01-01

    Taxonomic classification of the thousands–millions of 16S rRNA gene sequences generated in microbiome studies is often achieved using a naïve Bayesian classifier (for example, the Ribosomal Database Project II (RDP) classifier), due to favorable trade-offs among automation, speed and accuracy. The resulting classification depends on the reference sequences and taxonomic hierarchy used to train the model; although the influence of primer sets and classification algorithms have been explored in detail, the influence of training set has not been characterized. We compared classification results obtained using three different publicly available databases as training sets, applied to five different bacterial 16S rRNA gene pyrosequencing data sets generated (from human body, mouse gut, python gut, soil and anaerobic digester samples). We observed numerous advantages to using the largest, most diverse training set available, that we constructed from the Greengenes (GG) bacterial/archaeal 16S rRNA gene sequence database and the latest GG taxonomy. Phylogenetic clusters of previously unclassified experimental sequences were identified with notable improvements (for example, 50% reduction in reads unclassified at the phylum level in mouse gut, soil and anaerobic digester samples), especially for phylotypes belonging to specific phyla (Tenericutes, Chloroflexi, Synergistetes and Candidate phyla TM6, TM7). Trimming the reference sequences to the primer region resulted in systematic improvements in classification depth, and greatest gains at higher confidence thresholds. Phylotypes unclassified at the genus level represented a greater proportion of the total community variation than classified operational taxonomic units in mouse gut and anaerobic digester samples, underscoring the need for greater diversity in existing reference databases. PMID:21716311
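
    The dependence of a naïve Bayesian classifier on its training set can be illustrated with a small sketch. This is not the RDP classifier itself: the word length k = 4, the fixed smoothing constant 0.5, the toy sequences, and the function names are all assumptions chosen for illustration (the real classifier uses longer words and word-specific priors).

```python
import math
from collections import Counter


def kmers(seq, k=4):
    """Return the set of distinct k-mers (words) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=4):
    """Build a model from a training set: {taxon: [sequences]}.

    For each taxon we store how many of its reference sequences
    contain each word, plus the number of reference sequences.
    """
    model = {}
    for taxon, seqs in references.items():
        counts = Counter()
        for seq in seqs:
            counts.update(kmers(seq, k))
        model[taxon] = (counts, len(seqs))
    return model


def classify(model, query, k=4):
    """Assign the query to the taxon with the highest log-likelihood."""
    words = kmers(query, k)
    best_taxon, best_score = None, -math.inf
    for taxon, (counts, n_seqs) in model.items():
        # Smoothed probability of seeing each query word in this taxon.
        # The constant 0.5 is a simplification assumed for this sketch.
        score = sum(math.log((counts[w] + 0.5) / (n_seqs + 1)) for w in words)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Hypothetical toy training set; real training sets hold many thousands of sequences.
toy_references = {
    "taxon_A": ["ACGTACGTAC", "ACGTACGTTT"],
    "taxon_B": ["GGGGCCCCGG", "GGGCCCCGGG"],
}
toy_model = train(toy_references)
print(classify(toy_model, "ACGTACGA"))
```

    Because the score is computed only from words seen in the training sequences, swapping in a larger or more diverse reference set changes which taxon a query is assigned to, which is the effect the study measures.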

  8. Integration of deep transcriptome and proteome analyses reveals the components of alkaloid metabolism in opium poppy cell cultures

    PubMed Central

    2010-01-01

    Background Papaver somniferum (opium poppy) is the source for several pharmaceutical benzylisoquinoline alkaloids including morphine, codeine and sanguinarine. In response to treatment with a fungal elicitor, the biosynthesis and accumulation of sanguinarine is induced along with other plant defense responses in opium poppy cell cultures. The transcriptional induction of alkaloid metabolism in cultured cells provides an opportunity to identify components of this process via the integration of deep transcriptome and proteome databases generated using next-generation technologies. Results A cDNA library was prepared for opium poppy cell cultures treated with a fungal elicitor for 10 h. Using 454 GS-FLX Titanium pyrosequencing, 427,369 expressed sequence tags (ESTs) with an average length of 462 bp were generated. Assembly of these sequences yielded 93,723 unigenes, of which 23,753 were assigned Gene Ontology annotations. Transcripts encoding all known sanguinarine biosynthetic enzymes were identified in the EST database, 5 of which were represented among the 50 most abundant transcripts. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) of total protein extracts from cell cultures treated with a fungal elicitor for 50 h facilitated the identification of 1,004 proteins. Proteins were fractionated by one-dimensional SDS-PAGE and digested with trypsin prior to LC-MS/MS analysis. Query of an opium poppy-specific EST database substantially enhanced peptide identification. Eight out of 10 known sanguinarine biosynthetic enzymes and many relevant primary metabolic enzymes were represented in the peptide database. Conclusions The integration of deep transcriptome and proteome analyses provides an effective platform to catalogue the components of secondary metabolism, and to identify genes encoding uncharacterized enzymes. The establishment of corresponding transcript and protein databases generated by next-generation technologies in a system with a well-defined metabolite profile facilitates an improved linkage between genes, enzymes, and pathway components. The proteome database represents the most relevant alkaloid-producing enzymes, compared with the much deeper and more complete transcriptome library. The transcript database contained full-length mRNAs encoding most alkaloid biosynthetic enzymes, which is a key requirement for the functional characterization of novel gene candidates. PMID:21083930

  9. PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes.

    PubMed

    Osuna-Cruz, Cristina M; Paytuvi-Gallart, Andreu; Di Donato, Antimo; Sundesha, Vicky; Andolfo, Giuseppe; Aiese Cigliano, Riccardo; Sanseverino, Walter; Ercolano, Maria R

    2018-01-04

    The Plant Resistance Genes database (PRGdb; http://prgdb.org) has been redesigned with a new user interface, new sections, new tools and new data for genetic improvement, allowing easy access not only to the plant science research community but also to breeders who want to improve plant disease resistance. The home page offers an overview of easy-to-read search boxes that streamline data queries and directly show plant species for which data from candidate or cloned genes have been collected. Bulk data files and curated resistance gene annotations are made available for each plant species hosted. The new Gene Model view offers detailed information on each cloned resistance gene structure to highlight shared attributes with other genes. PRGdb 3.0 offers 153 reference resistance genes and 177 072 annotated candidate Pathogen Receptor Genes (PRGs). Compared to the previous release, the number of putative genes has been increased from 106 to 177 K from 76 sequenced Viridiplantae and algae genomes. The DRAGO 2 tool, which automatically annotates and predicts PRGs from DNA and amino acid sequences with high accuracy and sensitivity, has been added. BLAST search has been implemented to offer users the opportunity to annotate and compare their own sequences. The improved section on plant diseases displays useful information linked to genes and genomes to connect complementary data and better address specific needs. Through a revised and enlarged collection of data, the development of new tools and a renewed portal, PRGdb 3.0 engages the plant science community in developing a consensus plan to improve knowledge and strategies to fight diseases that afflict main crops and other plants. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. State-of-the-art on viral microRNAs in HPV infection and cancer development.

    PubMed

    Poltronieri, Palmiro; Sun, Binlian; Huang, Kai-Yao; Chang, Tzu-Hao; Lee, Tzong-Yi

    2018-03-27

    High-risk HPV (HR-HPV) subtypes are driving forces for human cancer development: HPV-16 and HPV-18 are responsible for most HPV-caused cancers. This review describes the present knowledge on the coding potential of HR-HPV genomes for viral miRNAs. An HPV subtype miRNA database, VIRmiRtar, has been constructed by applying bioinformatics and a computational method, ViralMir, which exploits structural features and the presence of hairpins, with validation by comparison with RNA sequencing datasets. Several miRNA candidates have been localised in the genomes of high-risk HPV subtypes. Among these are HPV-16 miR-1, miR-2 and miR-3. The database contains a list of host candidate gene targets that may be responsible for oncogenesis in the various cellular environments. miRNA silencing therapies, based on specific cellular uptake of miRNA mimics and antagomiRs, directed towards HPV-encoded miRNAs and/or microRNAs deregulated in the host cells, could be a valuable approach to support pharmaceutical interventions in the treatment of HPV-dependent cancers. Copyright © Bentham Science Publishers.

  11. Transcriptome Analysis and Differential Gene Expression on the Testis of Orange Mud Crab, Scylla olivacea, during Sexual Maturation

    PubMed Central

    Waiho, Khor; Fazhan, Hanafiah; Shahreza, Md Sheriff; Moh, Julia Hwei Zhong; Noorbaiduri, Shaibani; Wong, Li Lian; Sinnasamy, Saranya

    2017-01-01

    Adequate genetic information is essential for sustainable crustacean fisheries and aquaculture management. The commercially important orange mud crab, Scylla olivacea, is prevalent in the Southeast Asian region and is highly sought after. Although it is a suitable aquaculture candidate, full domestication of this species is hampered by the lack of knowledge about the sexual maturation process and the molecular mechanisms behind it, especially in males. To date, whole-genome data have yet to be reported for S. olivacea. The available transcriptome data published previously on this species focus primarily on females and the role of the central nervous system in reproductive development. De novo transcriptome sequencing of the testes of S. olivacea from immature, maturing and mature stages was performed. A total of approximately 144 million high-quality reads were generated and de novo assembled into 160,569 transcripts with a total length of 142.2 Mb. Approximately 15–23% of the total assembled transcripts were annotated when compared to public protein sequence databases (i.e. UniProt database, Interpro database, Pfam database and Drosophila melanogaster protein database), and categorised with Gene Ontology (GO) terms. A total of 156,181 high-quality single-nucleotide polymorphisms (SNPs) were mined from the transcriptome data of the present study. Transcriptome comparison among the testes of different maturation stages revealed one gene (beta crystallin like gene) with the most significant differential expression: up-regulated in the immature stage and down-regulated in the maturing and mature stages. This was further validated by qRT-PCR. In conclusion, a comprehensive transcriptome of the testis of orange mud crabs from different maturation stages was obtained. This report provides an invaluable resource for enhancing our understanding of this species’ genome structure and biology, as expressed and controlled by their gonads. PMID:28135340

  12. Genome-Wide Association Study of Seed Dormancy and the Genomic Consequences of Improvement Footprints in Rice (Oryza sativa L.)

    PubMed Central

    Lu, Qing; Niu, Xiaojun; Zhang, Mengchen; Wang, Caihong; Xu, Qun; Feng, Yue; Yang, Yaolong; Wang, Shan; Yuan, Xiaoping; Yu, Hanyong; Wang, Yiping; Chen, Xiaoping; Liang, Xuanqiang; Wei, Xinghua

    2018-01-01

    Seed dormancy is an important agronomic trait affecting grain yield and quality because of pre-harvest germination and is influenced by both environmental and genetic factors. However, our knowledge of the factors controlling seed dormancy remains limited. To better reveal the molecular mechanism underlying this trait, a genome-wide association study was conducted in an indica-only population consisting of 453 accessions genotyped using 5,291 SNPs. Nine known and new significant SNPs were identified on eight chromosomes. These lead SNPs explained 34.9% of the phenotypic variation, and four of them were designed as dCAPS markers in the hope of accelerating molecular breeding. Moreover, a total of 212 candidate genes was predicted and eight candidate genes showed plant tissue-specific expression in expression profile data from different public bioinformatics databases. In particular, LOC_Os03g10110, which had a maize homolog involved in embryo development, was identified as a candidate regulator for further biological function investigations. Additionally, a polymorphism information content ratio method was used to screen improvement footprints and 27 selective sweeps were identified, most of which harbored domestication-related genes. Further studies suggested that three significant SNPs were adjacent to the candidate selection signals, supporting the accuracy of our genome-wide association study (GWAS) results. These findings show that genome-wide screening for selective sweeps can be used to identify new improvement-related DNA regions, although the phenotypes are unknown. This study enhances our knowledge of the genetic variation in seed dormancy, and the new dormancy-associated SNPs will provide real benefits in molecular breeding. PMID:29354150
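
    As a rough illustration of the polymorphism information content (PIC) screen mentioned above, the sketch below computes the standard PIC statistic from allele frequencies and compares two groups by a simple ratio. The abstract does not say how the ratio is formed, so the `pic_ratio` function, the group names, and the frequencies are hypothetical and should not be read as the authors' procedure.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one locus.

    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction


def pic_ratio(freqs_group_a, freqs_group_b):
    """Hypothetical ratio of PIC in group A to PIC in group B."""
    denominator = pic(freqs_group_b)
    if denominator == 0:
        raise ZeroDivisionError("group B is monomorphic at this locus")
    return pic(freqs_group_a) / denominator


# Made-up allele frequencies at one biallelic SNP in two groups of accessions.
unimproved = [0.5, 0.5]   # hypothetical: diversity retained
improved = [0.9, 0.1]     # hypothetical: diversity reduced
print(pic(unimproved), pic(improved), pic_ratio(unimproved, improved))
```

    Under these assumptions, a ratio well above 1 marks a locus where diversity is much lower in the second group, which is the kind of signal a selective-sweep screen looks for.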

  13. Identification of immunohistochemical markers for distinguishing lung adenocarcinoma from squamous cell carcinoma

    PubMed Central

    Zhan, Cheng; Yan, Li; Wang, Lin; Sun, Yang; Wang, Xingxing; Lin, Zongwu; Zhang, Yongxing; Wang, Qun

    2015-01-01

    Background Immunohistochemical staining has been widely used in distinguishing lung adenocarcinoma (LUAD) from lung squamous cell carcinoma (LUSC), which is of vital importance for the diagnosis and treatment of lung cancer. Due to the lack of a comprehensive analysis of different lung cancer subtypes, there may still be undiscovered markers with higher diagnostic accuracy. Methods First, we systematically analyzed high-throughput data obtained from The Cancer Genome Atlas (TCGA) database. Combining differentially expressed gene screening and receiver operating characteristic (ROC) curve analysis, we attempted to identify the genes which might be suitable as immunohistochemical markers in distinguishing LUAD from LUSC. Then we detected the expression of six of these genes (MLPH, TMC5, SFTA3, DSG3, DSC3 and CALML3) in lung cancer sections using immunohistochemical staining. Results A number of genes were identified as candidate immunohistochemical markers with high sensitivity and specificity in distinguishing LUAD from LUSC. The staining results then confirmed the potential of the six genes (MLPH, TMC5, SFTA3, DSG3, DSC3 and CALML3) in distinguishing LUAD from LUSC, and their sensitivity and specificity were not less than those of many commonly used markers. Conclusions The results revealed that the six genes (MLPH, TMC5, SFTA3, DSG3, DSC3 and CALML3) might be suitable markers in distinguishing LUAD from LUSC, and also validated the feasibility of our methods for identification of candidate markers from high-throughput data. PMID:26380766
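
    The ROC step can be sketched as follows. The area under the ROC curve (AUC) for a single gene equals the probability that a randomly chosen sample from one class has higher expression than a randomly chosen sample from the other, with ties counted as one half. The expression values below are invented for illustration and are not TCGA data; the function name is also an assumption.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    pos_scores: expression values in the class the marker should flag.
    neg_scores: expression values in the other class.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0      # positive sample ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical expression of one candidate marker in two tumour types.
luad_expression = [3.0, 4.0, 5.0]
lusc_expression = [1.0, 2.0, 3.0]
print(roc_auc(luad_expression, lusc_expression))
```

    A gene with an AUC close to 1 (or close to 0, if it is higher in the other class) separates the two groups well and would be a reasonable candidate to carry forward to staining.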

  14. TCF21 is related to testis growth and development in broiler chickens.

    PubMed

    Zhang, Hui; Na, Wei; Zhang, Hong-Li; Wang, Ning; Du, Zhi-Qiang; Wang, Shou-Zhi; Wang, Zhi-Peng; Zhang, Zhiwu; Li, Hui

    2017-02-24

    Large amounts of fat deposition often lead to loss of reproductive efficiency in humans and animals. We used broiler chickens as a model species to conduct a two-directional selection for and against abdominal fat over 19 generations, which resulted in a lean and a fat line. Direct selection for abdominal fat content also indirectly resulted in significant differences (P < 0.05) in testis weight (TeW) and in TeW as a percentage of total body weight (TeP) between the lean and fat lines. A total of 475 individuals from generation 11 (G11) were genotyped. Genome-wide association studies revealed two regions on chicken chromosomes 3 and 10 that were associated with TeW and TeP. Forty G16 individuals (20 from each line) were further profiled by focusing on these two chromosomal regions, to identify candidate genes with functions that may be potentially related to testis growth and development. Of the nine candidate genes identified with database mining, a significant association was confirmed for one gene, TCF21, based on mRNA expression analysis. Gene expression analysis of the TCF21 gene was conducted again across 30 G19 individuals (15 individuals from each line) and the results confirmed the findings on the G16 animals. This study revealed that the TCF21 gene is related to testis growth and development in male broilers. This finding will be useful to guide future studies to understand the genetic mechanisms that underlie reproductive efficiency.

  15. Host-associated bacterial taxa from Chlorobi, Chloroflexi, GN02, Synergistetes, SR1, TM7, and WPS-2 Phyla/candidate divisions

    PubMed Central

    Camanocha, Anuj; Dewhirst, Floyd E.

    2014-01-01

    Background and objective In addition to the well-known phyla Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, Spirochaetes, Fusobacteria, Tenericutes, and Chlamydiae, the oral microbiomes of mammals contain species from the lesser-known phyla or candidate divisions, including Synergistetes, TM7, Chlorobi, Chloroflexi, GN02, SR1, and WPS-2. The objectives of this study were to create phyla-selective 16S rDNA PCR primer pairs, create selective 16S rDNA clone libraries, identify novel oral taxa, and update canine and human oral microbiome databases. Design 16S rRNA gene sequences for members of the lesser-known phyla were downloaded from GenBank and Greengenes databases and aligned with sequences in our RNA databases. Primers with potential phylum level selectivity were designed heuristically with the goal of producing nearly full-length 16S rDNA amplicons. The specificity of primer pairs was examined by making clone libraries from PCR amplicons and determining phyla identity by BLASTN analysis. Results Phylum-selective primer pairs were identified that allowed construction of clone libraries with 96–100% specificity for each of the lesser-known phyla. From these clone libraries, seven human and two canine novel oral taxa were identified and added to their respective taxonomic databases. For each phylum, genome sequences closest to human oral taxa were identified and added to the Human Oral Microbiome Database to facilitate metagenomic, transcriptomic, and proteomic studies that involve tiling sequences to the most closely related taxon. While examining ribosomal operons in lesser-known phyla from single-cell genomes and metagenomes, we identified a novel rRNA operon order (23S-5S-16S) in three SR1 genomes and the splitting of the 23S rRNA gene by an I-CeuI-like homing endonuclease in a WPS-2 genome. Conclusions This study developed useful primer pairs for making phylum-selective 16S rRNA clone libraries. Phylum-specific libraries were shown to be useful for identifying previously unrecognized taxa in lesser-known phyla and would be useful for future environmental and host-associated studies. PMID:25317252

  16. Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes

    PubMed Central

    Yasui, Yasuo; Hirakawa, Hideki; Ueno, Mariko; Matsui, Katsuhiro; Katsube-Tanaka, Tomoyuki; Yang, Soo Jung; Aii, Jotaro; Sato, Shingo; Mori, Masashi

    2016-01-01

    Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. PMID:27037832
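
    The N50 values quoted above are a standard assembly summary: the length L such that sequences of length L or longer together cover at least half of the total assembled length. A minimal sketch of the calculation, using invented scaffold lengths rather than the FES_r1.0 data:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and walk down the list until
    the running total reaches half of the overall total; the length at
    which that happens is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in bp (total 300, so the halfway mark is 150).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # 100 + 80 = 180 >= 150, so this prints 80
```

    A low N50 relative to total length, as in a draft built from short reads, indicates that the assembly is split into many small scaffolds.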

  17. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    PubMed

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

    PubMed Central

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-01-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

  19. POLR2C Mutations Are Associated With Primary Ovarian Insufficiency in Women.

    PubMed

    Moriwaki, Mika; Moore, Barry; Mosbruger, Timothy; Neklason, Deborah W; Yandell, Mark; Jorde, Lynn B; Welt, Corrine K

    2017-03-01

    Primary ovarian insufficiency (POI) results from a premature loss of oocytes, causing infertility and early menopause. The etiology of POI remains unknown in a majority of cases. To identify candidate genes in families affected by POI. This was a family-based genetic study. The study was performed at two academic institutions. A family with four generations of women affected by POI (n = 5). Four of these women, three with an associated autoimmune diagnosis, were studied. The controls (n = 387) were recruited for health in old age. Whole-genome sequencing was performed. Candidate genes were identified by comparing gene mutations in three family members and 387 control subjects analyzed simultaneously using the pedigree Variant Annotation, Analysis and Search Tool. Data were also compared with that in publicly available databases. We identified a heterozygous nonsense mutation in a subunit of RNA polymerase II (POLR2C) that synthesizes messenger RNA. A rare sequence variant in POLR2C was also identified in one of 96 women with sporadic POI. POLR2C expression was decreased in the proband compared with women with POI from another cause. Knockdown in an embryonic carcinoma cell line resulted in decreased protein production and impaired cell proliferation. These data support a role for RNA polymerase II mutations as candidates in the etiology of POI. The current data also support results from genome-wide association studies that hypothesize a role for RNA polymerase II subunits in age at menopause in the population.

  20. Methodological approaches for using synchrotron X-ray fluorescence (SXRF) imaging as a tool in ionomics: Examples from Arabidopsis thaliana

    PubMed Central

    Hindt, Maria; Socha, Amanda L.; Zuber, Hélène

    2013-01-01

    Here we present approaches for using multi-elemental imaging (specifically synchrotron X-ray fluorescence microscopy, SXRF) in ionomics, with examples using the model plant Arabidopsis thaliana. The complexity of each approach depends on the amount of a priori information available for the gene and/or phenotype being studied. Three approaches are outlined, which apply to experimental situations where a gene of interest has been identified but has an unknown phenotype (Phenotyping), an unidentified gene is associated with a known phenotype (Gene Cloning) and finally, a Screening approach, where both gene and phenotype are unknown. These approaches make use of open-access, online databases with which plant molecular genetics researchers working in the model plant Arabidopsis will be familiar, in particular the Ionomics Hub and online transcriptomic databases such as the Arabidopsis eFP browser. The approaches and examples we describe are based on the assumption that altering the expression of ion transporters can result in changes in elemental distribution. We provide methodological details on using elemental imaging to aid or accelerate gene functional characterization by narrowing down the search for candidate genes to the tissues in which elemental distributions are altered. We use synchrotron X-ray microprobes as a technique of choice, which can now be used to image all parts of an Arabidopsis plant in a hydrated state. We present elemental images of leaves, stem, root, siliques and germinating hypocotyls. PMID:23912758

  1. STARNET 2: a web-based tool for accelerating discovery of gene regulatory networks using microarray co-expression data

    PubMed Central

    Jupiter, Daniel; Chen, Hailin; VanBuren, Vincent

    2009-01-01

    Background Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult. Results STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations that are derived from expression microarray data. STARNET 2 facilitates user discovery of putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large heterogeneous set of preselected microarray experiments. For each of the represented organisms, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform. All pairwise Pearson correlation coefficients were computed for expression profiles measured on each platform, respectively. These precompiled results were stored in a MySQL database, and supplemented by additional data retrieved from NCBI. A web-based tool allows user-specified queries of the database, centered at a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving genes and gene products that are present in the correlation networks, and initial statistical analyses. Two analyses may be performed in parallel to compare networks, which is facilitated by the new HEATSEEKER module. Conclusion STARNET 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and has coverage for 10 species. 
    Interpretation of the correlation networks is supported with a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that may be used to compare two networks. The list of genes in a STARNET network may be useful in developing a list of candidate genes to use for the inference of causal networks. The tool is freely available at , and does not require user registration. PMID:19828039
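
    The precomputation described in this abstract (all pairwise Pearson correlation coefficients, followed by a query centered at a gene of interest) can be illustrated with a minimal Python sketch. The gene names, expression values, the 0.9 cutoff and the function names are assumptions made for illustration only; they are not taken from STARNET 2, which stores its precompiled results in a MySQL database.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles.

    Constant profiles (zero variance) are not handled in this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def pairwise_correlations(profiles):
    """Precompute r for every unordered pair of genes."""
    return {
        (g1, g2): pearson(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


def coexpressed_with(correlations, gene, cutoff):
    """Genes whose correlation with `gene` is at least `cutoff`."""
    partners = []
    for (g1, g2), r in correlations.items():
        if r >= cutoff and gene in (g1, g2):
            partners.append(g2 if g1 == gene else g1)
    return sorted(partners)


# Hypothetical expression profiles across four experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [1.0, 3.0, 2.0, 4.0],  # only loosely follows geneA
}
correlations = pairwise_correlations(profiles)
print(coexpressed_with(correlations, "geneA", cutoff=0.9))
```

    Only positive correlations are kept here; whether strongly negative pairs should also count as edges is a design choice this sketch does not settle.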

  2. Isolation and identification of new pollen-specific SFB genes in Japanese apricot (Prunus mume).

    PubMed

    Wang, P P; Gao, Z H; Ni, Z J; Zhuang, W B; Zhang, Z

    2013-09-03

    SFB, a candidate gene for the pollen S gene, has been identified in several species of Prunus (Rosaceae). We isolated 5 new SFB alleles from 6 Japanese apricot (Prunus mume) lines using a specific Prunus SFB primer pair (SFB-C1F and Pm-Vb), which was designed from conserved regions of Prunus SFB. The nucleotide sequences of these SFB genes were submitted to the GenBank database. The 5 new SFB alleles share typical structural features with SFB alleles from other Prunus species and were found to be polymorphic, with 67.08 to 96.91% amino acid identity. These new SFB alleles were specifically expressed in the pollen. We conclude that the PmSFB alleles that we identified are the pollen S determinants of Japanese apricot; they have potential as a tool for studies of the mechanisms of pollen self-incompatibility.

  3. TargetCompare: A web interface to compare simultaneous miRNAs targets

    PubMed Central

    Moreira, Fabiano Cordeiro; Dustan, Bruno; Hamoy, Igor G; Ribeiro-dos-Santos, André M; dos Santos, Ândrea Ribeiro

    2014-01-01

    MicroRNAs (miRNAs) are small non-coding nucleotide sequences between 17 and 25 nucleotides in length that primarily function in the regulation of gene expression. A single miRNA has thousands of predicted targets in a complex, regulatory cell signaling network. Therefore, it is of interest to study multiple target genes simultaneously. Hence, we describe a web tool (developed using Java programming language and MySQL database server) to analyse multiple targets of pre-selected miRNAs. We cross-validated the tool on the eight most highly expressed miRNAs in the antrum region of the stomach. This helped to identify 43 potential genes that are targets of at least six of the referred miRNAs. The developed tool aims to reduce the randomness and increase the chance of selecting strong candidate target genes and miRNAs responsible for playing important roles in the studied tissue. Availability http://lghm.ufpa.br/targetcompare PMID:25352731
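
    The selection step described in this abstract (keeping genes that are predicted targets of at least a given number of the chosen miRNAs) can be sketched in a few lines of Python. The miRNA and gene identifiers below are hypothetical placeholders, and the function is an illustration of the idea rather than the tool's Java/MySQL implementation.

```python
from collections import Counter


def shared_targets(targets_by_mirna, min_mirnas):
    """Return genes predicted as targets of at least `min_mirnas` miRNAs."""
    counts = Counter()
    for targets in targets_by_mirna.values():
        # set() guards against a gene being listed twice for one miRNA
        counts.update(set(targets))
    return sorted(gene for gene, n in counts.items() if n >= min_mirnas)


# Hypothetical predicted-target lists for three pre-selected miRNAs.
targets_by_mirna = {
    "miR-x1": ["GENE1", "GENE2", "GENE3"],
    "miR-x2": ["GENE1", "GENE3"],
    "miR-x3": ["GENE1", "GENE4"],
}
print(shared_targets(targets_by_mirna, min_mirnas=2))
```

    In the study itself the threshold was six of eight miRNAs; the toy data use two of three only to keep the example short.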

  4. TargetCompare: A web interface to compare simultaneous miRNAs targets.

    PubMed

    Moreira, Fabiano Cordeiro; Dustan, Bruno; Hamoy, Igor G; Ribeiro-Dos-Santos, André M; Dos Santos, Andrea Ribeiro

    2014-01-01

    MicroRNAs (miRNAs) are small non-coding nucleotide sequences between 17 and 25 nucleotides in length that primarily function in the regulation of gene expression. A single miRNA has thousands of predicted targets in a complex, regulatory cell signaling network. Therefore, it is of interest to study multiple target genes simultaneously. Hence, we describe a web tool (developed using Java programming language and MySQL database server) to analyse multiple targets of pre-selected miRNAs. We cross-validated the tool on the eight most highly expressed miRNAs in the antrum region of the stomach. This helped to identify 43 potential genes that are targets of at least six of the referred miRNAs. The developed tool aims to reduce the randomness and increase the chance of selecting strong candidate target genes and miRNAs responsible for playing important roles in the studied tissue. http://lghm.ufpa.br/targetcompare.

  5. Gene Fusion Markup Language: a prototype for exchanging gene fusion data

    PubMed Central

    2012-01-01

    Background An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Results Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. Conclusion The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses. PMID:23072312

  6. Towards barcode markers in Fungi: an intron map of Ascomycota mitochondria.

    PubMed

    Santamaria, Monica; Vicario, Saverio; Pappadà, Graziano; Scioscia, Gaetano; Scazzocchio, Claudio; Saccone, Cecilia

    2009-06-16

    A standardized and cost-effective molecular identification system is now an urgent need for Fungi owing to their wide involvement in human life quality. In particular, the potential use of mitochondrial DNA species markers has been taken into account. Unfortunately, the presence of mobile introns in almost all fungal mitochondrial genes poses a serious difficulty for PCR and bioinformatic surveys. The aim of this work is to verify the incidence of this phenomenon in Ascomycota, testing, at the same time, a new bioinformatic tool for extracting and managing sequence database annotations, in order to identify the mitochondrial gene regions where introns are missing so as to propose them as species markers. The general trend towards a large occurrence of introns in the mitochondrial genome of Fungi has been confirmed in Ascomycota by an extensive bioinformatic analysis, performed on all the entries concerning 11 mitochondrial protein coding genes and 2 mitochondrial rRNA (ribosomal RNA) specifying genes, belonging to this phylum, available in public nucleotide sequence databases. A new query approach has been developed to effectively retrieve the intron information included in these entries. When the new query-based approach was compared with a BLAST-based procedure, with the aim of designing a faithful Ascomycota mitochondrial intron map, the query-based method proved clearly the more accurate. Within this map, despite the pervasiveness of introns, it is possible to distinguish specific regions within several genes, including the full NADH dehydrogenase subunit 6 (ND6) gene, which could be considered as barcode candidates for Ascomycota due to their paucity of introns and to their length, above 400 bp, comparable to the lower end of the length range of barcodes successfully used in animals. The development of the new query system described here would answer the pressing requirement to drastically improve the bioinformatics support to the DNA Barcode Initiative. The large-scale investigation of Ascomycota mitochondrial introns performed through this tool, which allows intron-rich sequences to be excluded from the search for barcode candidates, could be the first step towards a mitochondrial barcoding strategy for these organisms, similar to the standard approach employed in metazoans.

  7. Transcriptome Analysis in Sheepgrass (Leymus chinensis). A Dominant Perennial Grass of the Eurasian Steppe

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Shuangyan; Huang, Xin; Yang, Xiaohan

    BACKGROUND: Sheepgrass [Leymus chinensis (Trin.) Tzvel.] is an important perennial forage grass across the Eurasian Steppe and is known for its adaptability to various environmental conditions. However, insufficient data resources in public databases for sheepgrass limited our understanding of the mechanism of environmental adaptations, gene discovery and molecular marker development. RESULTS: The transcriptome of sheepgrass was sequenced using Roche 454 pyrosequencing technology. We assembled 952,328 high-quality reads into 87,214 unigenes, including 32,416 contigs and 54,798 singletons. There were 15,450 contigs over 500 bp in length. BLAST searches of our database against Swiss-Prot and NCBI non-redundant protein sequences (nr) databases resulted in the annotation of 54,584 (62.6%) of the unigenes. Gene Ontology (GO) analysis assigned 89,129 GO term annotations for 17,463 unigenes. We identified 11,675 core Poaceae-specific and 12,811 putative sheepgrass-specific unigenes by BLAST searches against all plant genome and transcriptome databases. A total of 2,979 specific freezing-responsive unigenes were found from this RNAseq dataset. We identified 3,818 EST-SSRs in 3,597 unigenes, and some SSRs contained unigenes that were also candidates for freezing-response genes. Characterizations of nucleotide repeats and dominant motifs of SSRs in sheepgrass were also performed. Similarity and phylogenetic analysis indicated that sheepgrass is closely related to barley and wheat. CONCLUSIONS: This research has greatly enriched sheepgrass transcriptome resources. The identified stress-related genes will help us to decipher the genetic basis of the environmental and ecological adaptations of this species and will be used to improve wheat and barley crops through hybridization or genetic transformation. The EST-SSRs reported here will be a valuable resource for future gene-phenotype studies and for the molecular breeding of sheepgrass and other Poaceae species.

  8. No Evidence That Schizophrenia Candidate Genes Are More Associated With Schizophrenia Than Noncandidate Genes.

    PubMed

    Johnson, Emma C; Border, Richard; Melroy-Greif, Whitney E; de Leeuw, Christiaan A; Ehringer, Marissa A; Keller, Matthew C

    2017-11-15

    A recent analysis of 25 historical candidate gene polymorphisms for schizophrenia in the largest genome-wide association study conducted to date suggested that these commonly studied variants were no more associated with the disorder than would be expected by chance. However, the same study identified other variants within those candidate genes that demonstrated genome-wide significant associations with schizophrenia. As such, it is possible that variants within historic schizophrenia candidate genes are associated with schizophrenia at levels above those expected by chance, even if the most-studied specific polymorphisms are not. The present study used association statistics from the largest schizophrenia genome-wide association study conducted to date as input to a gene set analysis to investigate whether variants within schizophrenia candidate genes are enriched for association with schizophrenia. As a group, variants in the most-studied candidate genes were no more associated with schizophrenia than were variants in control sets of noncandidate genes. While a small subset of candidate genes did appear to be significantly associated with schizophrenia, these genes were not particularly noteworthy given the large number of more strongly associated noncandidate genes. The history of schizophrenia research should serve as a cautionary tale to candidate gene investigators examining other phenotypes: our findings indicate that the most investigated candidate gene hypotheses of schizophrenia are not well supported by genome-wide association studies, and it is likely that this will be the case for other complex traits as well. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  9. Gene expression profiles of fin regeneration in loach (Paramisgurnus dabryanu).

    PubMed

    Li, Li; He, Jingya; Wang, Linlin; Chen, Weihua; Chang, Zhongjie

    2017-11-01

    Teleost fins can regenerate accurate position-matched structure and function after amputation. However, we still lack systematic transcriptional profiling and methodologies to understand the molecular basis of fin regeneration. After histological analysis, we established a suppression subtraction hybridization library containing 418 distinct sequences expressed differentially during the process of blastema formation and differentiation in caudal fin regeneration. Genome ontology and comparative analysis of differential distribution of our data and the reference zebrafish genome showed notable subcategories, including multi-organism processes, response to stimuli, extracellular matrix, antioxidant activity, and cell junction function. KEGG pathway analysis allowed the effective identification of relevant genes in those pathways involved in tissue morphogenesis and regeneration, including tight junction, cell adhesion molecules, mTOR and Jak-STAT signaling pathway. From relevant function subcategories and signaling pathways, 78 clones were examined for further Southern-blot hybridization. Then, 17 genes were chosen and characterized using semi-quantitative PCR. Then 4 candidate genes were identified, including F11r, Mmp9, Agr2 and one without a match to any database. After real-time quantitative PCR, the results showed obvious expression changes in different periods of caudal fin regeneration. We can assume that the 4 candidates, likely valuable genes associated with fin regeneration, deserve additional attention. Thus, our study demonstrated how to investigate the transcript profiles with an emphasis on bioinformatics intervention and how to identify potential genes related to fin regeneration processes. The results also provide a foundation or knowledge for further research into genes and molecular mechanisms of fin regeneration. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. An intersection network based on combining SNP co-association and RNA co-expression networks for feed utilization traits in Japanese Black cattle.

    PubMed

    Okada, D; Endo, S; Matsuda, H; Ogawa, S; Taniguchi, Y; Katsuta, T; Watanabe, T; Iwaisaki, H

    2018-05-12

    Genome-wide association studies (GWAS) of quantitative traits have detected numerous genetic associations, but they encounter difficulties in pinpointing prominent candidate genes and inferring gene networks. The present study used a systems genetics approach integrating GWAS results with external RNA-expression data to detect candidate gene networks in feed utilization and growth traits of Japanese Black cattle, which are matters of concern. A SNP co-association network was derived from significant correlations between SNPs with effects estimated by GWAS across seven phenotypic traits. The resulting network genes contained significant numbers of annotations related to the traits. Using bovine transcriptome data from a public database, an RNA co-expression network was inferred based on the similarity of expression patterns across different tissues. An intersection network was then generated by superimposing the SNP and RNA networks and extracting shared interactions. This intersection network contained four tissue-specific modules: nervous system, reproductive system, muscular system, and glands. To characterize the structure (topographical properties) of the three networks, their scale-free properties were evaluated, which revealed that the intersection network was the most scale-free. In the sub-network containing the most connected transcription factors (URI1, ROCK2 and ETV6), most genes were widely expressed across tissues, and genes previously shown to be involved in the traits were found. Results indicated that the current approach might be used to construct a gene network that better reflects biological information, providing encouragement for the genetic dissection of economically important quantitative traits.
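
    The superimposition step described in this abstract (extracting the interactions shared by the SNP co-association network and the RNA co-expression network) amounts to intersecting two undirected edge sets. The following Python sketch shows that operation on toy data; the gene labels and function names are hypothetical, and the way edges were actually defined in the study is not reproduced here.

```python
def as_edge_set(pairs):
    """Treat each pair as an undirected edge, so (a, b) equals (b, a)."""
    return {frozenset(pair) for pair in pairs}


def intersection_network(snp_pairs, rna_pairs):
    """Edges present in both networks, returned as sorted tuples."""
    shared = as_edge_set(snp_pairs) & as_edge_set(rna_pairs)
    return sorted(tuple(sorted(edge)) for edge in shared)


# Hypothetical edge lists for the two networks.
snp_network = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
rna_network = [("g2", "g1"), ("g3", "g4"), ("g4", "g5")]

print(intersection_network(snp_network, rna_network))
```

    Using `frozenset` for each edge makes the comparison independent of the order in which the two genes of a pair are listed.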

  11. Transcriptome-wide analysis of WRKY transcription factors in wheat and their leaf rust responsive expression profiling.

    PubMed

    Satapathy, Lopamudra; Singh, Dharmendra; Ranjan, Prashant; Kumar, Dhananjay; Kumar, Manish; Prabhu, Kumble Vinod; Mukhopadhyay, Kunal

    2014-12-01

    WRKY, a plant-specific transcription factor family, has important roles in pathogen defense, abiotic cues and phytohormone signaling, yet little is known about their roles and molecular mechanism of function in response to rust diseases in wheat. We identified 100 TaWRKY sequences using wheat Expressed Sequence Tag database of which 22 WRKY sequences were novel. Identified proteins were characterized based on their zinc finger motifs and phylogenetic analysis clustered them into six clades consisting of class IIc and class III WRKY proteins. Functional annotation revealed major functions in metabolic and cellular processes in control plants; whereas response to stimuli, signaling and defense in pathogen inoculated plants, their major molecular function being binding to DNA. Tag-based expression analysis of the identified genes revealed differential expression between mock and Puccinia triticina inoculated wheat near isogenic lines. Gene expression was also performed with six rust-related microarray experiments at Gene Expression Omnibus database. TaWRKY10, 15, 17 and 56 were common in both tag-based and microarray-based differential expression analysis and could be representing rust specific WRKY genes. The obtained results will bestow insight into the functional characterization of WRKY transcription factors responsive to leaf rust pathogenesis that can be used as candidate genes in molecular breeding programs to improve biotic stress tolerance in wheat.

  12. Identification of Candidate Gene Variants in Korean MODY Families by Whole-Exome Sequencing.

    PubMed

    Shim, Ye Jee; Kim, Jung Eun; Hwang, Su-Kyeong; Choi, Bong Seok; Choi, Byung Ho; Cho, Eun-Mi; Jang, Kyoung Mi; Ko, Cheol Woo

    2015-01-01

    To date, 13 genes causing maturity-onset diabetes of the young (MODY) have been identified. However, there is a large discrepancy in the genetic loci between Asian and Caucasian patients with MODY. Thus, we conducted whole-exome sequencing in Korean MODY families to identify causative gene variants. Six MODY probands and their family members were included. Variants in the dbSNP135 and TIARA databases for Koreans and the variants with minor allele frequencies >0.5% of the 1000 Genomes database were excluded. We selected only the functional variants (gain of stop codon, frameshifts and nonsynonymous single-nucleotide variants) and conducted a case-control comparison in the family members. The selected variants were scanned for the previously introduced gene set implicated in glucose metabolism. Three variants, c.620C>T:p.Thr207Ile in PTPRD, c.559C>G:p.Gln187Glu in SYT9, and c.1526T>G:p.Val509Gly in WFS1, were identified, one in each of 3 families. We could not find any disease-causative alleles of known MODY 1-13 genes. Based on the predictive program, Thr207Ile in PTPRD was considered pathogenic. Whole-exome sequencing is a valuable method for the genetic diagnosis of MODY. Further evaluation of the role of PTPRD, SYT9 and WFS1 in normal insulin release from pancreatic beta cells is necessary. © 2015 S. Karger AG, Basel.
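
    The variant-filtering criteria stated in this abstract can be expressed as a small predicate. The sketch below is a hedged illustration in Python: the record fields, effect labels and example variants are hypothetical and do not correspond to the output format of any particular annotation tool or to the study's data.

```python
from dataclasses import dataclass

# Effect categories kept as "functional" (labels are hypothetical).
FUNCTIONAL_EFFECTS = {"stop_gained", "frameshift", "nonsynonymous_snv"}
MAX_MAF = 0.005  # 0.5% minor allele frequency cutoff


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str
    in_dbsnp: bool    # already listed in dbSNP
    in_tiara: bool    # already listed in TIARA
    maf_1000g: float  # 1000 Genomes frequency; 0.0 when not observed


def passes_filters(v):
    """Apply the three exclusion and selection rules in order."""
    if v.in_dbsnp or v.in_tiara:
        return False
    if v.maf_1000g > MAX_MAF:
        return False
    return v.effect in FUNCTIONAL_EFFECTS


# Hypothetical variants; gene names are placeholders.
variants = [
    Variant("GENE_A", "nonsynonymous_snv", False, False, 0.0),
    Variant("GENE_B", "nonsynonymous_snv", True, False, 0.0),
    Variant("GENE_C", "frameshift", False, False, 0.02),
    Variant("GENE_D", "synonymous_snv", False, False, 0.0),
]
kept = [v.gene for v in variants if passes_filters(v)]
print(kept)
```

    The subsequent case-control comparison within each family and the scan against the glucose-metabolism gene set are not covered by this sketch.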

  13. Isolation and expression analysis of cDNAs that are associated with alternate bearing in Olea europaea L. cv. Ayvalık

    PubMed Central

    2013-01-01

    Background Olive cDNA libraries were constructed and analyzed to isolate candidate genes that can help elucidate the molecular mechanism of periodicity and/or fruit production. For this purpose, cDNA libraries were constructed from the leaves of trees in “on year” and in “off year” in July (when fruits start to appear) and in November (harvest time). One hundred randomly selected positive clones from each library were analyzed with respect to sequence and size. A fruit-flesh cDNA library was also constructed and characterized to confirm the reliability of each library’s temporal and spatial properties. Results Quantitative real-time RT-PCR (qRT-PCR) analyses of the cDNA libraries confirmed cDNA molecules that are associated with different developmental stages (e.g. “on year” leaves in July, “off year” leaves in July, leaves in November) and fruits. Hence, a number of candidate cDNAs associated with “on year” and “off year” were isolated. Comparison of the detected cDNAs to the current EST database of GenBank along with other non-redundant databases of NCBI revealed homologs of previously described genes along with several unknown cDNAs. Of around 500 screened cDNAs, 48 cDNA elements were obtained after eliminating ribosomal RNA sequences. These independent transcripts were analyzed using BLAST searches (cutoff E-value of 1.0E-5) against the KEGG and GenBank nucleotide databases and 37 putative transcripts corresponding to known gene functions were annotated with gene names and Gene Ontology (GO) terms. Transcripts in the biological process category were found to be related to metabolic process (27%), cellular process (23%), response to stimulus (17%), localization process (8.5%), multicellular organismal process (6.25%), developmental process (6.25%) and reproduction (4.2%). Conclusions A putative P450 monooxygenase was expressed fivefold more in “on year” leaves than in “off year” leaves in July. Two putative dehydrins were expressed significantly more in “on year” leaves than in “off year” leaves in November. Homologs of UDP-glucose epimerase, acyl-CoA binding protein, triose phosphate isomerase and a putative nuclear core anchor protein were significant in fruits only, while a homolog of an embryo binding protein/small GTPase regulator was detected in “on year” leaves only. One of the two unknown cDNAs was specific to leaves in July while the other was detected in all of the libraries except fruits. KEGG pathway analyses for the obtained sequences correlated with essential metabolisms such as galactose metabolism, amino sugar and nucleotide sugar metabolisms and photosynthesis. Detailed analysis of the results presents candidate cDNAs that can be used to dissect further the genetic basis of fruit production and/or alternate bearing, which causes significant economic loss for olive growers. PMID:23552171

  14. An informatics approach to integrating genetic and neurological data in speech and language neuroscience.

    PubMed

    Bohland, Jason W; Myers, Emma M; Kim, Esther

    2014-01-01

    A number of heritable disorders impair the normal development of speech and language processes and occur in large numbers within the general population. While candidate genes and loci have been identified, the gap between genotype and phenotype is vast, limiting current understanding of the biology of normal and disordered processes. This gap exists not only in our scientific knowledge, but also in our research communities, where genetics researchers and speech, language, and cognitive scientists tend to operate independently. Here we describe a web-based, domain-specific, curated database that represents information about genotype-phenotype relations specific to speech and language disorders, as well as neuroimaging results demonstrating focal brain differences in relevant patients versus controls. Bringing these two distinct data types into a common database ( http://neurospeech.org/sldb ) is a first step toward bringing molecular level information into cognitive and computational theories of speech and language function. One bridge between these data types is provided by densely sampled profiles of gene expression in the brain, such as those provided by the Allen Brain Atlases. Here we present results from exploratory analyses of human brain gene expression profiles for genes implicated in speech and language disorders, which are annotated in our database. We then discuss how such datasets can be useful in the development of computational models that bridge levels of analysis, necessary to provide a mechanistic understanding of heritable language disorders. We further describe our general approach to information integration, discuss important caveats and considerations, and offer a specific but speculative example based on genes implicated in stuttering and basal ganglia function in speech motor control.

  15. Complex network theory for the identification and assessment of candidate protein targets.

    PubMed

    McGarry, Ken; McDonald, Sharon

    2018-06-01

    In this work we use complex network theory to provide a statistical model of the connectivity patterns of human proteins and their interaction partners. Our intention is to identify important proteins that may be predisposed to be potential candidates as drug targets for therapeutic interventions. Target proteins usually have more interaction partners than non-target proteins, but there are no hard-and-fast rules for defining the actual number of interactions. We devise a statistical measure for identifying hub proteins, and we score our target proteins with gene ontology annotations. The important druggable protein targets are likely to have similar biological functions that can be assessed for their potential therapeutic value. Our system provides a statistical analysis of the local and distant neighborhood protein interactions of the potential targets using complex network measures. This approach builds a more accurate model of drug-to-target activity and therefore the likely impact on treating diseases. We integrate high quality protein interaction data from the HINT database and disease associated proteins from the DrugTarget database. Other sources include biological knowledge from Gene Ontology and drug information from DrugBank. The problem is a very challenging one since the data are highly imbalanced between target proteins and the more numerous nontargets. We use undersampling on the training data and build Random Forest classifier models which are used to identify previously unclassified target proteins. We validate and corroborate these findings from the available literature. Copyright © 2018 Elsevier Ltd. All rights reserved.
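
    The undersampling step mentioned in this abstract can be illustrated with a short Python sketch: the more numerous class is randomly reduced to the size of the rarer one before a classifier is trained. The protein identifiers, the 3-to-17 class ratio, the fixed seed and the function name are assumptions for illustration; the abstract does not specify the sampling ratio, and the Random Forest training itself is omitted because it would require a third-party library.

```python
import random


def undersample(examples, seed=0):
    """Balance a binary dataset by randomly dropping majority-class rows.

    `examples` is a list of (protein_id, label) pairs with label 1 for
    known targets and 0 for non-targets.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    targets = [e for e in examples if e[1] == 1]
    nontargets = [e for e in examples if e[1] == 0]
    minority, majority = sorted((targets, nontargets), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced training set: 3 targets, 17 non-targets.
examples = [(f"P{i}", 1 if i < 3 else 0) for i in range(20)]
balanced = undersample(examples)
print(len(balanced), sum(label for _, label in balanced))
```

    Undersampling discards data from the majority class, so in practice it is often repeated with different seeds; whether the authors did so is not stated.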

  16. Novel approaches to global mining of aberrantly methylated promoter sites in squamous head and neck cancer.

    PubMed

    Worsham, Maria J; Chen, Kang Mei; Stephen, Josena K; Havard, Shaleta; Benninger, Michael S

    2010-07-01

    Promoter hypermethylation is emerging as a promising molecular strategy for early detection of cancer. We examined promoter methylation status of 1143 cancer-associated genes to perform a global but unbiased inspection of methylated regions in head and neck squamous cell carcinoma (HNSCC). Laboratory-based study. Integrated health care system. Five samples, two frozen primary HNSCC biopsies and three HNSCC cell lines, were examined. Whole genomic DNA was interrogated using a combination of DNA immunoprecipitation (IP) and Affymetrix whole-genome tiling arrays. Of the 1143 unique cancer genes on the array, 265 were recorded across five samples. Of the 265 genes, 55 were present in all five samples, and 36 were common to four of five samples, 46 to three of five, 56 to two of five, and 72 to one of five samples. Hypermethylated genes in the five samples were cross-examined against those in PubMeth, a cancer methylation database combining text mining and expert annotation (http://www.pubmeth.org). Of the 441 genes in PubMeth, only 33 are referenced to HNSCC. We matched 34 genes in our samples to the 441 genes in the PubMeth database. Of the 34 genes, eight are reported in PubMeth as HNSCC associated. This pilot study examined the contribution of global DNA hypermethylation to the pathogenesis of HNSCC. The whole-genome methylation approach indicated 231 new genes with methylated promoter regions not yet reported in HNSCC. Examination of this comprehensive gene panel in a larger HNSCC cohort should advance selection of HNSCC-specific candidate genes for further validation as biomarkers in HNSCC. 2010 American Academy of Otolaryngology-Head and Neck Surgery Foundation. Published by Mosby, Inc. All rights reserved.
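
    The gene counts reported in this abstract can be checked for internal consistency with simple arithmetic. The Python lines below use only numbers from the abstract; reading the 231 new genes as the 265 recorded genes minus the 34 matched in PubMeth is an inference from those numbers, not a statement made explicitly by the authors.

```python
# Genes recorded, keyed by the number of samples (out of five) sharing them.
genes_by_sample_count = {5: 55, 4: 36, 3: 46, 2: 56, 1: 72}

total_recorded = sum(genes_by_sample_count.values())
matched_in_pubmeth = 34

# Assumed reading: "new" genes are those with no PubMeth match.
new_genes = total_recorded - matched_in_pubmeth
print(total_recorded, new_genes)
```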

  17. Identification of a QTL in Mus musculus for Alcohol Preference, Withdrawal, and Ap3m2 Expression Using Integrative Functional Genomics and Precision Genetics

    PubMed Central

    Bubier, Jason A.; Jay, Jeremy J.; Baker, Christopher L.; Bergeson, Susan E.; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C.; Chesler, Elissa J.

    2014-01-01

    Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these are evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrate sufficiency of the genetic locus as a source of variation underlying two traits. PMID:24923803

  18. Identification of a QTL in Mus musculus for alcohol preference, withdrawal, and Ap3m2 expression using integrative functional genomics and precision genetics.

    PubMed

    Bubier, Jason A; Jay, Jeremy J; Baker, Christopher L; Bergeson, Susan E; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C; Chesler, Elissa J

    2014-08-01

    Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these are evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrate sufficiency of the genetic locus as a source of variation underlying two traits. Copyright © 2014 by the Genetics Society of America.

  19. Discovering and understanding oncogenic gene fusions through data intensive computational approaches

    PubMed Central

    Latysheva, Natasha S.; Babu, M. Madan

    2016-01-01

    Although gene fusions have been recognized as important drivers of cancer for decades, our understanding of the prevalence and function of gene fusions has been revolutionized by the rise of next-generation sequencing, advances in bioinformatics theory and an increasing capacity for large-scale computational biology. The computational work on gene fusions has been vastly diverse, and the present state of the literature is fragmented. It will be fruitful to merge three camps of gene fusion bioinformatics that appear to rarely cross over: (i) data-intensive computational work characterizing the molecular biology of gene fusions; (ii) development research on fusion detection tools, candidate fusion prioritization algorithms and dedicated fusion databases and (iii) clinical research that seeks to either therapeutically target fusion transcripts and proteins or leverages advances in detection tools to perform large-scale surveys of gene fusion landscapes in specific cancer types. In this review, we unify these different, yet highly complementary and symbiotic, approaches with the view that increased synergy will catalyze advancements in gene fusion identification, characterization and significance evaluation. PMID:27105842

  20. Genotator: a disease-agnostic tool for genetic annotation of disease.

    PubMed

    Wall, Dennis P; Pivovarov, Rimma; Tong, Mark; Jung, Jae-Yoon; Fusaro, Vincent A; DeLuca, Todd F; Tonellato, Peter J

    2010-10-29

    Disease-specific genetic information has been increasing at rapid rates as a consequence of recent improvements and massive cost reductions in sequencing technologies. Numerous systems designed to capture and organize this mounting sea of genetic data have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest, a process that is both time-consuming and error-prone. We designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 externally accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy of coverage of Genotator in three separate diseases for which there exist specialty curated databases, Autism Spectrum Disorder, Parkinson's Disease, and Alzheimer Disease. Genotator is freely available at http://genotator.hms.harvard.edu. Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 2514 genes found in only one of the 11 databases. These findings confirm that the integration of these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified at least 75% of the top ranked genes for all three of our use cases, including a 90% concordance with the top 40 ranked candidates for Alzheimer Disease. As a meta-query engine, Genotator provides high coverage of both historical genetic research as well as recent advances in the genetic understanding of specific diseases. 
    As such, Genotator provides a real-time aggregation of ranked data that remains current with the pace of research in the disease fields. Genotator's algorithm appropriately transforms query terms to match the input requirements of each targeted database and accurately resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an Excel-style output that is consistent across disease queries and readily importable to other applications.

  1. Genome-wide identification of genetic determinants for the cytotoxicity of perifosine

    PubMed Central

    2008-01-01

    Perifosine belongs to the class of alkylphospholipid analogues, which act primarily at the cell membrane, thereby targeting signal transduction pathways. In phase I/II clinical trials, perifosine has induced tumour regression and caused disease stabilisation in a variety of tumour types. The genetic determinants responsible for its cytotoxicity have not been comprehensively studied, however. We performed a genome-wide analysis to identify genes whose expression levels or genotypic variation were correlated with the cytotoxicity of perifosine, using public databases on the US National Cancer Institute (NCI)-60 human cancer cell lines. For demonstrating drug specificity, the NCI Standard Agent Database (including 171 drugs acting through a variety of mechanisms) was used as a control. We identified agents with similar cytotoxicity profiles to that of perifosine in compounds used in the NCI drug screen. Furthermore, Gene Ontology and pathway analyses were carried out on genes more likely to be perifosine specific. The results suggested that genes correlated with perifosine cytotoxicity are connected by certain known pathways that lead to the mitogen-activated protein kinase signalling pathway and apoptosis. Biological processes such as 'response to stress', 'inflammatory response' and 'ubiquitin cycle' were enriched among these genes. Three single nucleotide polymorphisms (SNPs) located in CACNA2D1 and EXOC4 were found to be correlated with perifosine cytotoxicity. Our results provided a manageable list of genes whose expression levels or genotypic variation were strongly correlated with the cytotoxicity of perifosine. These genes could be targets for further studies using candidate-gene approaches. The results also provided insights into the pharmacodynamics of perifosine. PMID:19129090

  2. Immuno-Navigator, a batch-corrected coexpression database, reveals cell type-specific gene networks in the immune system

    PubMed Central

    Vandenbon, Alexis; Dinh, Viet H.; Mikami, Norihisa; Kitagawa, Yohko; Teraguchi, Shunsuke; Ohkura, Naganari; Sakaguchi, Shimon

    2016-01-01

    High-throughput gene expression data are one of the primary resources for exploring complex intracellular dynamics in modern biology. The integration of large amounts of public data may allow us to examine general dynamical relationships between regulators and target genes. However, obstacles for such analyses are study-specific biases or batch effects in the original data. Here we present Immuno-Navigator, a batch-corrected gene expression and coexpression database for 24 cell types of the mouse immune system. We systematically removed batch effects from the underlying gene expression data and showed that this removal considerably improved the consistency between inferred correlations and prior knowledge. The data revealed widespread cell type-specific correlation of expression. Integrated analysis tools allow users to use this correlation of expression for the generation of hypotheses about biological networks and candidate regulators in specific cell types. We show several applications of Immuno-Navigator as examples. In one application we successfully predicted known regulators of importance in naturally occurring Treg cells from their expression correlation with a set of Treg-specific genes. For one high-scoring gene, integrin β8 (Itgb8), we confirmed an association between Itgb8 expression in forkhead box P3 (Foxp3)-positive T cells and Treg-specific epigenetic remodeling. Our results also suggest that the regulation of Treg-specific genes within Treg cells is relatively independent of Foxp3 expression, supporting recent results pointing to a Foxp3-independent component in the development of Treg cells. PMID:27078110

  3. Transcriptome analysis in Coffea eugenioides, an Arabica coffee ancestor, reveals differentially expressed genes in leaves and fruits.

    PubMed

    Yuyama, Priscila Mary; Reis Júnior, Osvaldo; Ivamoto, Suzana Tiemi; Domingues, Douglas Silva; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães; Charmetant, Pierre; Leroy, Thierry; Pereira, Luiz Filipe Protasio

    2016-02-01

    Studies in diploid parental species of polyploid plants are important to understand their contributions to the formation of plant and species evolution. Coffea eugenioides is a diploid species that is considered to be an ancestor of allopolyploid Coffea arabica together with Coffea canephora. Despite its importance in the evolutionary history of the main economic species of coffee, no study has focused on C. eugenioides molecular genetics. RNA-seq creates the possibility to generate reference transcriptomes and identify coding genes and potential candidates related to important agronomic traits. Therefore, the main objectives were to obtain a global overview of transcriptionally active genes in this species using next-generation sequencing and to analyze specific genes that were highly expressed in leaves and fruits with potential exploratory characteristics for breeding and understanding the evolutionary biology of coffee. A de novo assembly generated 36,935 contigs that were annotated using eight databases. We observed a total of ~5000 differentially expressed genes between leaves and fruits. Several genes exclusively expressed in fruits did not exhibit similarities with sequences in any database. We selected ten differentially expressed unigenes in leaves and fruits to evaluate transcriptional profiles using qPCR. Our study provides the first gene catalog for C. eugenioides and enhances the knowledge concerning the mechanisms involved in the C. arabica homeologous. Furthermore, this work will open new avenues for studies into specific genes and pathways in this species, especially related to fruit, and our data have potential value in assisted breeding applications.

  4. Detection of biomarkers for Hepatocellular Carcinoma using a hybrid univariate gene selection methods

    PubMed Central

    2012-01-01

    Background: Discovering new biomarkers has a great role in improving early diagnosis of Hepatocellular carcinoma (HCC). The experimental determination of biomarkers needs a lot of time and money. This motivates this work to use in-silico prediction of biomarkers to reduce the number of experiments required for detecting new ones. This is achieved by extracting the most representative genes in microarrays of HCC. Results: In this work, we provide a method for extracting the differentially expressed genes, up-regulated ones, that can be considered candidate biomarkers in high throughput microarrays of HCC. We examine the power of several gene selection methods (such as Pearson’s correlation coefficient, Cosine coefficient, Euclidean distance, Mutual information and Entropy with different estimators) in selecting informative genes. A biological interpretation of the highly ranked genes is done using KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, ENTREZ and DAVID (Database for Annotation, Visualization, and Integrated Discovery) databases. The top ten genes selected using Pearson’s correlation coefficient and Cosine coefficient contained six genes that have been implicated in cancer (often multiple cancers) genesis in previous studies. Fewer genes were obtained by the other methods (4 genes using Mutual information, 3 genes using Euclidean distance and only one gene using Entropy). A better result was obtained by the utilization of a hybrid approach based on intersecting the highly ranked genes in the output of all investigated methods. This hybrid combination yielded seven genes (2 genes for HCC and 5 genes in different types of cancer) in the top ten genes of the list of intersected genes. Conclusions: To strengthen the effectiveness of the univariate selection methods, we propose a hybrid approach by intersecting several of these methods in a cascaded manner. This approach surpasses all of the univariate selection methods when used individually, according to biological interpretation and the examination of gene expression signal profiles. PMID:22867264
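
    The idea of intersecting the top-ranked genes from several univariate scores can be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: only two of the scores named in the abstract (Pearson's correlation and cosine coefficient) are used, the helper names `top_k` and `hybrid_select` are hypothetical, and the expression values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation between an expression profile and class labels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cosine(x, y):
    """Cosine coefficient between an expression profile and class labels."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def top_k(expr, labels, score, k):
    """Return the k genes with the highest score against the labels."""
    ranked = sorted(expr, key=lambda g: score(expr[g], labels), reverse=True)
    return ranked[:k]

def hybrid_select(expr, labels, scorers, k):
    """Intersect the top-k gene lists of each scorer, one after another."""
    selected = None
    for score in scorers:
        top = set(top_k(expr, labels, score, k))
        selected = top if selected is None else selected & top
    return selected

# Made-up expression values: two control samples, then two tumour samples.
expr = {
    "geneA": [1.0, 1.2, 5.0, 5.5],
    "geneB": [3.0, 3.1, 2.9, 3.0],
    "geneC": [0.5, 0.4, 4.0, 4.2],
    "geneD": [5.0, 5.2, 1.0, 0.8],
}
labels = [0, 0, 1, 1]
selected = hybrid_select(expr, labels, [pearson, cosine], k=2)
print(sorted(selected))
```

    Because high scores here mean positive association with the tumour label, the sketch favours up-regulated genes, in line with the focus described in the abstract.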

  5. Computational genomic analysis of PARK7 interactome reveals high BBS1 gene expression as a prognostic factor favoring survival in malignant pleural mesothelioma.

    PubMed

    Vavougios, Georgios D; Solenov, Evgeniy I; Hatzoglou, Chrissi; Baturina, Galina S; Katkova, Liubov E; Molyvdas, Paschalis Adam; Gourgoulianis, Konstantinos I; Zarogiannis, Sotirios G

    2015-10-01

    The aim of our study was to assess the differential gene expression of Parkinson protein 7 (PARK7) interactome in malignant pleural mesothelioma (MPM) using data mining techniques to identify novel candidate genes that may play a role in the pathogenicity of MPM. We constructed the PARK7 interactome using the ConsensusPathDB database. We then interrogated the Oncomine Cancer Microarray database using the Gordon Mesothelioma Study, for differential gene expression of the PARK7 interactome. In ConsensusPathDB, 38 protein interactors of PARK7 were identified. In the Gordon Mesothelioma Study, 34 of them were assessed out of which SUMO1, UBC3, KIAA0101, HDAC2, DAXX, RBBP4, BBS1, NONO, RBBP7, HTRA2, and STUB1 were significantly overexpressed whereas TRAF6 and MTA2 were significantly underexpressed in MPM patients (network 2). Furthermore, Kaplan-Meier analysis revealed that MPM patients with high BBS1 expression had a median overall survival of 16.5 vs. 8.7 mo of those that had low expression. For validation purposes, we performed a meta-analysis in Oncomine database in five sarcoma datasets. Eight network 2 genes (KIAA0101, HDAC2, SUMO1, RBBP4, NONO, RBBP7, HTRA2, and MTA2) were significantly differentially expressed in an array of 18 different sarcoma types. Finally, Gene Ontology annotation enrichment analysis revealed significant roles of the PARK7 interactome in NuRD, CHD, and SWI/SNF protein complexes. In conclusion, we identified 13 novel genes differentially expressed in MPM, never reported before. Among them, BBS1 emerged as a novel predictor of overall survival in MPM. Finally, we identified that PARK7 interactome is involved in novel pathways pertinent in MPM disease. Copyright © 2015 the American Physiological Society.

  6. Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

    PubMed Central

    Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

    2013-01-01

    Background: Cotton (Gossypium hirsutum L.) is one of the world’s most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. 
These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870

  7. Transcriptome Analysis of Dendrobium officinale and its Application to the Identification of Genes Associated with Polysaccharide Synthesis

    PubMed Central

    Zhang, Jianxia; He, Chunmei; Wu, Kunlin; Teixeira da Silva, Jaime A.; Zeng, Songjun; Zhang, Xinhua; Yu, Zhenming; Xia, Haoqiang; Duan, Jun

    2016-01-01

    Dendrobium officinale is one of the most important Chinese medicinal herbs. Polysaccharides are one of the main active ingredients of D. officinale. To identify the genes that may be related to polysaccharide synthesis, two cDNA libraries were prepared from juvenile and adult D. officinale, and were named Dendrobium-1 and Dendrobium-2, respectively. Illumina sequencing for Dendrobium-1 generated 102 million high quality reads that were assembled into 93,881 unigenes with an average sequence length of 790 base pairs. The sequencing for Dendrobium-2 generated 86 million reads that were assembled into 114,098 unigenes with an average sequence length of 695 base pairs. Two transcriptome databases were integrated and assembled into a total of 145,791 unigenes. Among them, 17,281 unigenes were assigned to 126 KEGG pathways while 135 unigenes were involved in fructose and mannose metabolism. Gene Ontology analysis revealed that the majority of genes were associated with metabolic and cellular processes. Furthermore, 430 glycosyltransferase and 89 cellulose synthase genes were identified. Comparative analysis of both transcriptome databases revealed a total of 32,794 differentially expressed genes (DEGs), including 22,051 up-regulated and 10,743 down-regulated genes in Dendrobium-2 compared to Dendrobium-1. Furthermore, a total of 1142 and 7918 unigenes showed unique expression in Dendrobium-1 and Dendrobium-2, respectively. These DEGs were mainly correlated with metabolic pathways and the biosynthesis of secondary metabolites. In addition, 170 DEGs belonged to glycosyltransferase genes, 37 DEGs were related to cellulose synthase genes and 627 DEGs encoded transcription factors. This study substantially expands the transcriptome information for D. officinale and provides valuable clues for identifying candidate genes involved in polysaccharide biosynthesis and elucidating the mechanism of polysaccharide biosynthesis. PMID:26904032

  8. Insight into Catechins Metabolic Pathways of Camellia sinensis Based on Genome and Transcriptome Analysis.

    PubMed

    Wang, Wenzhao; Zhou, Yihui; Wu, Yingling; Dai, Xinlong; Liu, Yajun; Qian, Yumei; Li, Mingzhuo; Jiang, Xiaolan; Wang, Yunsheng; Gao, Liping; Xia, Tao

    2018-04-25

    Tea is an important economic crop with a 3.02 Gb genome. It accumulates various bioactive compounds, especially catechins, which are closely associated with tea flavor and quality. Catechins are biosynthesized through the phenylpropanoid and flavonoid pathways, with 12 structural genes being involved in their synthesis. However, we found that in Camellia sinensis the understanding of the basic profile of catechins biosynthesis is still unclear. The gene structure, locus, transcript number, transcriptional variation, and function of multigene families have not yet been clarified. Our previous studies demonstrated that the accumulation of flavonoids in tea is species, tissue, and induction specific, which indicates that gene coexpression patterns may be involved in tea catechins and flavonoids biosynthesis. In this paper, we screened candidate genes of multigene families involved in the phenylpropanoid and flavonoid pathways based on an analysis of genome and transcriptome sequence data. The authenticity of candidate genes was verified by PCR cloning, and their function was validated by reverse genetic methods. In the present study, 36 genes from 12 gene families were identified and were accessed in the NCBI database. During this process, some intron retention events of the CsCHI and CsDFR genes were found. Furthermore, the transcriptome sequencing of various tea tissues and subcellular location assays revealed coexpression and colocalization patterns. The correlation analysis showed that CsCHIc, CsF3'H, and CsANRb expression levels are associated significantly with the concentration of soluble PA as well as the expression levels of CsPALc and CsPALf with the concentration of insoluble PA. This work provides insights into catechins metabolism in tea and provides a foundation for future studies.

  9. Adaptation of video game UVW mapping to 3D visualization of gene expression patterns

    NASA Astrophysics Data System (ADS)

    Vize, Peter D.; Gerth, Victor E.

    2007-01-01

    Analysis of gene expression patterns within an organism plays a critical role in associating genes with biological processes in both health and disease. During embryonic development the analysis and comparison of different gene expression patterns allows biologists to identify candidate genes that may regulate the formation of normal tissues and organs and to search for genes associated with congenital diseases. No two individual embryos, or organs, are exactly the same shape or size so comparing spatial gene expression in one embryo to that in another is difficult. We will present our efforts in comparing gene expression data collected using both volumetric and projection approaches. Volumetric data is highly accurate but difficult to process and compare. Projection methods use UV mapping to align texture maps to standardized spatial frameworks. This approach is less accurate but is very rapid and requires very little processing. We have built a database of over 180 3D models depicting gene expression patterns mapped onto the surface of spline based embryo models. Gene expression data in different models can easily be compared to determine common regions of activity. Visualization software, both Java and OpenGL optimized for viewing 3D gene expression data will also be demonstrated.

  10. Computational Analysis of Candidate Disease Genes and Variants for Salt-Sensitive Hypertension in Indigenous Southern Africans

    PubMed Central

    Tiffin, Nicki; Meintjes, Ayton; Ramesar, Rajkumar; Bajic, Vladimir B.; Rayner, Brian

    2010-01-01

    Multiple factors underlie susceptibility to essential hypertension, including a significant genetic and ethnic component, and environmental effects. Blood pressure response of hypertensive individuals to salt is heterogeneous, but salt sensitivity appears more prevalent in people of indigenous African origin. The underlying genetics of salt-sensitive hypertension, however, are poorly understood. In this study, computational methods including text- and data-mining have been used to select and prioritize candidate aetiological genes for salt-sensitive hypertension. Additionally, we have compared allele frequencies and copy number variation for single nucleotide polymorphisms in candidate genes between indigenous Southern African and Caucasian populations, with the aim of identifying candidate genes with significant variability between the population groups: identifying genetic variability between population groups can exploit ethnic differences in disease prevalence to aid with prioritisation of good candidate genes. Our top-ranking candidate genes include parathyroid hormone precursor (PTH) and type-1 angiotensin II receptor (AGTR1). We propose that the candidate genes identified in this study warrant further investigation as potential aetiological genes for salt-sensitive hypertension. PMID:20886000

  11. Detection of Significant Pneumococcal Meningitis Biomarkers by Ego Network.

    PubMed

    Wang, Qian; Lou, Zhifeng; Zhai, Liansuo; Zhao, Haibin

    2017-06-01

    To identify significant biomarkers for detection of pneumococcal meningitis based on ego network. Based on the gene expression data of pneumococcal meningitis and global protein-protein interactions (PPIs) data recruited from open access databases, the authors constructed a differential co-expression network (DCN) to identify pneumococcal meningitis biomarkers in a network view. Here EgoNet algorithm was employed to screen the significant ego networks that could accurately distinguish pneumococcal meningitis from healthy controls, by sequentially seeking ego genes, searching candidate ego networks, refinement of candidate ego networks and significance analysis to identify ego networks. Finally, the functional inference of the ego networks was performed to identify significant pathways for pneumococcal meningitis. By differential co-expression analysis, the authors constructed the DCN that covered 1809 genes and 3689 interactions. From the DCN, a total of 90 ego genes were identified. Starting from these ego genes, three significant ego networks (Module 19, Module 70 and Module 71) that could predict clinical outcomes for pneumococcal meningitis were identified by EgoNet algorithm, and the corresponding ego genes were GMNN, MAD2L1 and TPX2, respectively. Pathway analysis showed that these three ego networks were related to CDT1 association with the CDC6:ORC:origin complex, inactivation of APC/C via direct inhibition of the APC/C complex pathway, and DNA strand elongation, respectively. The authors successfully screened three significant ego modules which could accurately predict the clinical outcomes for pneumococcal meningitis and might play important roles in host response to pathogen infection in pneumococcal meningitis.

  12. Genome-wide generation and use of informative intron-spanning and intron-length polymorphism markers for high-throughput genetic analysis in rice

    PubMed Central

    Badoni, Saurabh; Das, Sweta; Sayal, Yogesh K.; Gopalakrishnan, S.; Singh, Ashok K.; Rao, Atmakuri R.; Agarwal, Pinky; Parida, Swarup K.; Tyagi, Akhilesh K.

    2016-01-01

    We developed genome-wide 84634 ISM (intron-spanning marker) and 16510 InDel-fragment length polymorphism-based ILP (intron-length polymorphism) markers from genes physically mapped on 12 rice chromosomes. These genic markers revealed much higher amplification-efficiency (80%) and polymorphic-potential (66%) among rice accessions even by a cost-effective agarose gel-based assay. A wider level of functional molecular diversity (17–79%) and well-defined precise admixed genetic structure was assayed by 3052 genome-wide markers in a structured population of indica, japonica, aromatic and wild rice. Six major grain weight QTLs (11.9–21.6% phenotypic variation explained) were mapped on five rice chromosomes of a high-density (inter-marker distance: 0.98 cM) genetic linkage map (IR 64 x Sonasal) anchored with 2785 known/candidate gene-derived ISM and ILP markers. The designing of multiple ISM and ILP markers (2 to 4 markers/gene) in an individual gene will broaden the user-preference to select suitable primer combination for efficient assaying of functional allelic variation/diversity and realistic estimation of differential gene expression profiles among rice accessions. The genomic information generated in our study is made publicly accessible through a user-friendly web-resource, “Oryza ISM-ILP marker” database. The known/candidate gene-derived ISM and ILP markers can be enormously deployed to identify functionally relevant trait-associated molecular tags by optimal-resource expenses, leading towards genomics-assisted crop improvement in rice. PMID:27032371

  13. Classification of rare missense substitutions, using risk surfaces, with genetic- and molecular-epidemiology applications.

    PubMed

    Tavtigian, Sean V; Byrnes, Graham B; Goldgar, David E; Thomas, Alun

    2008-11-01

    Many individually rare missense substitutions are encountered during deep resequencing of candidate susceptibility genes and clinical mutation screening of known susceptibility genes. BRCA1 and BRCA2 are among the most resequenced of all genes, and clinical mutation screening of these genes provides an extensive data set for analysis of rare missense substitutions. Align-GVGD is a mathematically simple missense substitution analysis algorithm, based on the Grantham difference, which has already contributed to classification of missense substitutions in BRCA1, BRCA2, and CHEK2. However, the distribution of genetic risk as a function of Align-GVGD's output variables Grantham variation (GV) and Grantham deviation (GD) has not been well characterized. Here, we used data from the Myriad Genetic Laboratories database of nearly 70,000 full-sequence tests plus two risk estimates, one approximating the odds ratio and the other reflecting strength of selection, to display the distribution of risk in the GV-GD plane as a series of surfaces. We abstracted contours from the surfaces and used the contours to define a sequence of missense substitution grades ordered from greatest risk to least risk. The grades were validated internally using a third, personal and family history-based, measure of risk. The Align-GVGD grades defined here are applicable to both the genetic epidemiology problem of classifying rare missense substitutions observed in known susceptibility genes and the molecular epidemiology problem of analyzing rare missense substitutions observed during case-control mutation screening studies of candidate susceptibility genes. (c) 2008 Wiley-Liss, Inc.

  14. De novo assembly of the Japanese lawngrass (Zoysia japonica Steud.) root transcriptome and identification of candidate unigenes related to early responses under salt stress

    PubMed Central

    Xie, Qi; Niu, Jun; Xu, Xilin; Xu, Lixin; Zhang, Yinbing; Fan, Bo; Liang, Xiaohong; Zhang, Lijuan; Yin, Shuxia; Han, Liebao

    2015-01-01

    Japanese lawngrass (Zoysia japonica Steud.) is an important warm-season turfgrass that is able to survive in a range of soils, from infertile sands to clays, and to grow well under saline conditions. However, little is known about the molecular mechanisms involved in its resistance to salt stress. Here, we used high-throughput RNA sequencing (RNA-seq) to investigate the changes in gene expression of Zoysia grass at high NaCl concentrations. We first constructed two sequencing libraries, including control and NaCl-treated samples, and sequenced them using the Illumina HiSeq™ 2000 platform. Approximately 157.20 million paired-end reads with a total length of 68.68 Mb were obtained. Subsequently, 32,849 unigenes with an N50 length of 1781 bp were assembled using Trinity. Furthermore, three public databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-prot, and Clusters of Orthologous Groups (COGs), were used for gene function analysis and enrichment. The annotated genes included 57 Gene Ontology (GO) terms, 120 KEGG pathways, and 24 COGs. Compared with the control, 1455 genes were significantly different (false discovery rate ≤0.01, |log2Ratio |≥1) in the NaCl-treated samples. These genes were enriched in 10 KEGG pathways and 73 GO terms, and subjected to 25 COG categories. Using high-throughput next-generation sequencing, we built a database as a global transcript resource for Z. japonica Steud. roots. The results of this study will advance our understanding of the early salt response in Japanese lawngrass roots. PMID:26347751
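
    The differential-expression cutoff quoted in the abstract above (false discovery rate ≤0.01, |log2Ratio| ≥1) can be illustrated with a minimal Python sketch. The record type, function names and toy values below are hypothetical and are not taken from the study; only the two thresholds come from the abstract.

```python
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene_id: str
    log2_ratio: float  # log2(treated / control)
    fdr: float         # multiple-testing-adjusted p value


def select_degs(results: List[GeneResult],
                max_fdr: float = 0.01,
                min_abs_log2: float = 1.0) -> List[GeneResult]:
    """Keep genes with FDR <= max_fdr and |log2 ratio| >= min_abs_log2."""
    return [r for r in results
            if r.fdr <= max_fdr and abs(r.log2_ratio) >= min_abs_log2]


# Toy values for illustration only; they are not data from the paper.
toy_results = [
    GeneResult("gene_a", 2.3, 0.001),
    GeneResult("gene_b", -1.5, 0.004),
    GeneResult("gene_c", 0.4, 0.0001),   # significant but small change
    GeneResult("gene_d", 3.0, 0.2),      # large change but not significant
    GeneResult("gene_e", -1.0, 0.01),    # exactly on both thresholds
]

degs = select_degs(toy_results)
up = [r.gene_id for r in degs if r.log2_ratio > 0]
down = [r.gene_id for r in degs if r.log2_ratio < 0]
print("up-regulated:", up)
print("down-regulated:", down)
```

    Both comparisons are inclusive, matching the "≤" and "≥" signs in the abstract, so a gene lying exactly on a threshold is retained.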

  15. SZDB: A Database for Schizophrenia Genetic Research

    PubMed Central

    Wu, Yong; Yao, Yong-Gang

    2017-01-01

    Schizophrenia (SZ) is a debilitating brain disorder with a complex genetic architecture. Genetic studies, especially recent genome-wide association studies (GWAS), have identified multiple variants (loci) conferring risk to SZ. However, how to efficiently extract meaningful biological information from bulk genetic findings of SZ remains a major challenge. There is a pressing need to integrate multiple layers of data from various sources, e.g., genetic findings from GWAS, copy number variations (CNVs), association and linkage studies, gene expression, protein–protein interaction (PPI), co-expression, expression quantitative trait loci (eQTL), and Encyclopedia of DNA Elements (ENCODE) data, to provide a comprehensive resource to facilitate the translation of genetic findings into SZ molecular diagnosis and mechanism study. Here we developed the SZDB database (http://www.szdb.org/), a comprehensive resource for SZ research. SZ genetic data, gene expression data, network-based data, brain eQTL data, and SNP function annotation information were systematically extracted, curated and deposited in SZDB. In-depth analyses and systematic integration were performed to identify top prioritized SZ genes and enriched pathways. We further showed that genes implicated in SZ are highly co-expressed in human brain and that proteins encoded by the prioritized SZ risk genes interact significantly. The user-friendly SZDB provides high-confidence candidate variants and genes for further functional characterization. More importantly, SZDB provides convenient online tools for data search and browsing, data integration, and customized data analyses. PMID:27451428

  16. Transcriptome sequencing of a chimaera reveals coordinated expression of anthocyanin biosynthetic genes mediating yellow formation in herbaceous peony (Paeonia lactiflora Pall.).

    PubMed

    Zhao, Daqiu; Jiang, Yao; Ning, Chuanlong; Meng, Jiasong; Lin, Shasha; Ding, Wen; Tao, Jun

    2014-08-19

    Herbaceous peony (Paeonia lactiflora Pall.) is a traditional flower in China and a popular wedding flower worldwide. Among its flower colours, yellow is the rarest and commands ten times the price of the other colours. However, the breeding of new yellow P. lactiflora varieties by genetic engineering is severely limited because the biochemical and molecular mechanisms underlying yellow formation are poorly understood. In this study, two cDNA libraries generated from a P. lactiflora chimaera with red outer-petal and yellow inner-petal were sequenced using an Illumina HiSeq™ 2000 platform. 66,179,398 and 65,481,444 total raw reads from red outer-petal and yellow inner-petal cDNA libraries were generated, which were assembled into 61,431 and 70,359 Unigenes with an average length of 628 and 617 nt, respectively. Moreover, 61,408 non-redundant All-unigenes were obtained, with 37,511 All-unigenes (61.08%) annotated in public databases. In addition, 6,345 All-unigenes were differentially expressed between the red outer-petal and yellow inner-petal, with 3,899 up-regulated and 2,446 down-regulated All-unigenes, and the flavonoid metabolic pathway related to colour development was identified using the Kyoto encyclopedia of genes and genomes database (KEGG). Subsequently, the expression patterns of 10 candidate differentially expressed genes (DEGs) involved in the flavonoid metabolic pathway were examined, and flavonoids were qualitatively and quantitatively analysed. Numerous anthoxanthins (flavone and flavonol) and a few anthocyanins were detected in the yellow inner-petal, which were all lower than those in the red outer-petal due to the low expression levels of the phenylalanine ammonia-lyase gene (PlPAL), flavonol synthase gene (PlFLS), dihydroflavonol 4-reductase gene (PlDFR), anthocyanidin synthase gene (PlANS), anthocyanidin 3-O-glucosyltransferase gene (Pl3GT) and anthocyanidin 5-O-glucosyltransferase gene (Pl5GT). Transcriptome sequencing (RNA-Seq) analysis based on high-throughput sequencing technology was an efficient approach to identify critical genes in P. lactiflora and other non-model plants. The flavonoid metabolic pathway and glucide metabolic pathway were identified as related to yellow formation in P. lactiflora. PlPAL, PlFLS, PlDFR, PlANS, Pl3GT and Pl5GT were selected as potential candidates involved in the flavonoid metabolic pathway whose low expression inhibited anthocyanin biosynthesis, thereby mediating yellow formation in P. lactiflora. This study could lay a theoretical foundation for breeding new yellow P. lactiflora varieties.

  17. microPIR2: a comprehensive database for human–mouse comparative study of microRNA–promoter interactions

    PubMed Central

    Piriyapongsa, Jittima; Bootchai, Chaiwat; Ngamphiw, Chumpol; Tongsima, Sissades

    2014-01-01

    microRNA (miRNA)–promoter interaction resource (microPIR) is a public database containing over 15 million predicted miRNA target sites located within human promoter sequences. These predicted targets are presented along with their related genomic and experimental data, making the microPIR database the most comprehensive repository of miRNA promoter target sites. Here, we describe major updates of the microPIR database including new target predictions in the mouse genome and revised human target predictions. The updated database (microPIR2) now provides ∼80 million human and 40 million mouse predicted target sites. In addition to being a reference database, microPIR2 is a tool for comparative analysis of target sites on the promoters of human–mouse orthologous genes. In particular, this new feature was designed to identify potential miRNA–promoter interactions conserved between species that could be stronger candidates for further experimental validation. We also incorporated additional supporting information to microPIR2 such as nuclear and cytoplasmic localization of miRNAs and miRNA–disease association. Extra search features were also implemented to enable various investigations of targets of interest. Database URL: http://www4a.biotec.or.th/micropir2 PMID:25425035

  18. Single-nucleotide polymorphisms studied for associations with urinary toxicity from (125)I prostate brachytherapy implants.

    PubMed

    Usmani, Nawaid; Leong, Nelson; Martell, Kevin; Lan, Lanna; Ghosh, Sunita; Pervez, Nadeem; Pedersen, John; Yee, Don; Murtha, Albert; Amanie, John; Sloboda, Ron; Murray, David; Parliament, Matthew

    2014-01-01

    To identify clinical, dosimetric, and genetic factors that are associated with late urinary toxicity after a (125)I prostate brachytherapy implant. Genomic DNA from 296 men treated with (125)I prostate brachytherapy monotherapy was extracted from saliva samples for this study. A retrospective database was compiled including clinical, dosimetric, and toxicity data for this cohort of patients. Fourteen candidate single-nucleotide polymorphisms (SNPs) from 13 genes (TP53, ERCC2, GSTP1, NOS, TGFβ1, MSH6, RAD51, ATM, LIG4, XRCC1, XRCC3, GSTA1, and SOD2) were tested in this cohort for correlations with toxicity. This study identified 217 men with at least 2 years of follow-up. Of these, 39 patients developed Grade ≥2 late urinary complications with a transurethral resection of prostate, urethral stricture, gross hematuria, or a sustained increase in their International Prostate Symptom Score. The only clinical or dosimetric factor that was associated with late urinary toxicity was age (p = 0.02). None of the 14 SNPs tested in this study were associated with late urinary toxicity in the univariate analysis. This study identified age as the only variable associated with late urinary toxicity. However, the small sample size and the candidate gene approach used in this study mean that further investigations are essential. Genome-wide association studies are emerging as the preferred approach for future radiogenomic studies to overcome the limitations of a candidate gene approach. Crown Copyright © 2014. Published by Elsevier Inc. All rights reserved.

  19. Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes.

    PubMed

    Yasui, Yasuo; Hirakawa, Hideki; Ueno, Mariko; Matsui, Katsuhiro; Katsube-Tanaka, Tomoyuki; Yang, Soo Jung; Aii, Jotaro; Sato, Shingo; Mori, Masashi

    2016-06-01

    Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
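
    The N50 statistic reported for the scaffolds and CDSs above is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that standard definition follows; the function name and toy lengths are illustrative assumptions, not the authors' code or data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Toy scaffold lengths for illustration only (total 300, half 150):
# 100 -> 100, then 100 + 80 = 180 >= 150, so the N50 is 80.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

    Because the statistic depends only on the sorted lengths, it can be computed from a plain list of scaffold sizes without loading the sequences themselves.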

  20. Accurate Typing of Human Leukocyte Antigen Class I Genes by Oxford Nanopore Sequencing.

    PubMed

    Liu, Chang; Xiao, Fangzhou; Hoisington-Lopez, Jessica; Lang, Kathrin; Quenzel, Philipp; Duffy, Brian; Mitra, Robi David

    2018-04-03

    Oxford Nanopore Technologies' MinION has expanded the current DNA sequencing toolkit by delivering long read lengths and extreme portability. The MinION has the potential to enable expedited point-of-care human leukocyte antigen (HLA) typing, an assay routinely used to assess the immunologic compatibility between organ donors and recipients, but the platform's high error rate makes it challenging to type alleles with accuracy. We developed and validated accurate typing of HLA by Oxford nanopore (Athlon), a bioinformatic pipeline that i) maps nanopore reads to a database of known HLA alleles, ii) identifies candidate alleles with the highest read coverage at different resolution levels that are represented as branching nodes and leaves of a tree structure, iii) generates consensus sequences by remapping the reads to the candidate alleles, and iv) calls the final diploid genotype by blasting consensus sequences against the reference database. Using two independent data sets generated on the R9.4 flow cell chemistry, Athlon achieved a 100% accuracy in class I HLA typing at the two-field resolution. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  1. Identification of candidate genes in osteoporosis by integrated microarray analysis.

    PubMed

    Li, J J; Wang, B Q; Fei, Q; Yang, Y; Li, D

    2016-12-01

    In order to screen the altered gene expression profile in peripheral blood mononuclear cells of patients with osteoporosis, we performed an integrated analysis of the online microarray studies of osteoporosis. We searched the Gene Expression Omnibus (GEO) database for microarray studies of peripheral blood mononuclear cells in patients with osteoporosis. Subsequently, we integrated gene expression data sets from multiple microarray studies to obtain differentially expressed genes (DEGs) between patients with osteoporosis and normal controls. Gene function analysis was performed to uncover the functions of identified DEGs. A total of three microarray studies were selected for integrated analysis. In all, 1125 genes were found to be significantly differentially expressed between osteoporosis patients and normal controls, with 373 upregulated and 752 downregulated genes. Positive regulation of the cellular amino metabolic process (gene ontology (GO): 0033240, false discovery rate (FDR) = 1.00E + 00) was significantly enriched under the GO category for biological processes, while for molecular functions, flavin adenine dinucleotide binding (GO: 0050660, FDR = 3.66E-01) and androgen receptor binding (GO: 0050681, FDR = 6.35E-01) were significantly enriched. DEGs were enriched in many osteoporosis-related signalling pathways, including those of mitogen-activated protein kinase (MAPK) and calcium. Protein-protein interaction (PPI) network analysis showed that the significant hub proteins contained ubiquitin specific peptidase 9, X-linked (Degree = 99), ubiquitin specific peptidase 19 (Degree = 57) and ubiquitin conjugating enzyme E2 B (Degree = 57). Analysis of gene function of identified differentially expressed genes may expand our understanding of fundamental mechanisms leading to osteoporosis. 
    Moreover, significantly enriched pathways, such as MAPK and calcium, may be involved in osteoporosis through osteoblastic differentiation and bone formation. Cite this article: J. J. Li, B. Q. Wang, Q. Fei, Y. Yang, D. Li. Identification of candidate genes in osteoporosis by integrated microarray analysis. Bone Joint Res 2016;5:594-601. DOI: 10.1302/2046-3758.512.BJR-2016-0073.R1. © 2016 Fei et al.
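
    The osteoporosis abstract above ranks hub proteins by their degree in a protein-protein interaction (PPI) network, i.e. the number of distinct interaction partners of each protein. A minimal Python sketch of that ranking follows, assuming an undirected edge list; the function names and the placeholder protein labels are hypothetical and do not reproduce the study's network.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count distinct interaction partners per protein in an undirected network.

    Self-loops are ignored, and an interaction listed twice (in either
    orientation) is counted once.
    """
    degree: Counter = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue
        pair = frozenset((a, b))
        if pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return degree


def top_hubs(edges: Iterable[Tuple[str, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n proteins with the highest degree."""
    return node_degrees(edges).most_common(n)


# Placeholder interactions for illustration only.
toy_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"),
    ("P2", "P1"),  # duplicate of P1-P2 in the opposite orientation
    ("P4", "P4"),  # self-loop, ignored
]

print(top_hubs(toy_edges, n=1))
```

    Degree is only one of several centrality measures used to nominate hubs; the sketch uses it because it is the measure the abstract reports.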

  2. VitisCyc: a metabolic pathway knowledgebase for grapevine (Vitis vinifera)

    PubMed Central

    Naithani, Sushma; Raja, Rajani; Waddell, Elijah N.; Elser, Justin; Gouthu, Satyanarayana; Deluc, Laurent G.; Jaiswal, Pankaj

    2014-01-01

    We have developed VitisCyc, a grapevine-specific metabolic pathway database that allows researchers to (i) search and browse the database for its various components such as metabolic pathways, reactions, compounds, genes and proteins, (ii) compare grapevine metabolic networks with other publicly available plant metabolic networks, and (iii) upload, visualize and analyze high-throughput data such as transcriptomes, proteomes, metabolomes etc. using OMICs-Viewer tool. VitisCyc is based on the genome sequence of the nearly homozygous genotype PN40024 of Vitis vinifera “Pinot Noir” cultivar with 12X v1 annotations and was built on BioCyc platform using Pathway Tools software and MetaCyc reference database. Furthermore, VitisCyc was enriched for plant-specific pathways and grape-specific metabolites, reactions and pathways. Currently VitisCyc harbors 68 super pathways, 362 biosynthesis pathways, 118 catabolic pathways, 5 detoxification pathways, 36 energy related pathways and 6 transport pathways, 10,908 enzymes, 2912 enzymatic reactions, 31 transport reactions and 2024 compounds. VitisCyc, as a community resource, can aid in the discovery of candidate genes and pathways that are regulated during plant growth and development, and in response to biotic and abiotic stress signals generated from a plant's immediate environment. VitisCyc version 3.18 is available online at http://pathways.cgrb.oregonstate.edu. PMID:25538713

  3. Comprehensive analysis of alternative splicing and functionality in neuronal differentiation of P19 cells.

    PubMed

    Suzuki, Hitoshi; Osaki, Ken; Sano, Kaori; Alam, A H M Khurshid; Nakamura, Yuichiro; Ishigaki, Yasuhito; Kawahara, Kozo; Tsukahara, Toshifumi

    2011-02-18

    Alternative splicing, which produces multiple mRNAs from a single gene, occurs in most human genes and contributes to protein diversity. Many alternative isoforms are expressed in a spatio-temporal manner, and function in diverse processes, including in the neural system. The purpose of the present study was to comprehensively investigate neural-splicing using P19 cells. GeneChip Exon Array analysis was performed using total RNAs purified from cells during neuronal cell differentiation. To efficiently and readily extract the alternative exon candidates, 9 filtering conditions were prepared, yielding 262 candidate exons (236 genes). Semiquantitative RT-PCR results in 30 randomly selected candidates suggested that 87% of the candidates were differentially alternatively spliced in neuronal cells compared to undifferentiated cells. Gene ontology and pathway analyses suggested that many of the candidate genes were associated with neural events. Together with 66 genes whose functions in neural cells or organs were reported previously, 47 candidate genes were found to be linked to 189 events in the gene-level profile of neural differentiation. By text-mining for the alternative isoform, distinct functions of the isoforms of 9 candidate genes indicated by the result of Exon Array were confirmed. Alternative exons were successfully extracted. Results from the informatics analyses suggested that neural events were primarily governed by genes whose expression was increased and whose transcripts were differentially alternatively spliced in the neuronal cells. In addition to known functions in neural cells or organs, the uninvestigated alternative splicing events of 11 genes among 47 candidate genes suggested that cell cycle events are also potentially important. These genes may help researchers to differentiate the roles of alternative splicing in cell differentiation and cell proliferation.

  4. Transcriptomic analysis of flower development in wintersweet (Chimonanthus praecox).

    PubMed

    Liu, Daofeng; Sui, Shunzhao; Ma, Jing; Li, Zhineng; Guo, Yulong; Luo, Dengpan; Yang, Jianfeng; Li, Mingyang

    2014-01-01

    Wintersweet (Chimonanthus praecox) is familiar as a garden plant and woody ornamental flower. On account of its unique flowering time and strong fragrance, it has a high ornamental and economic value. Despite a long history of human cultivation, our understanding of wintersweet genetics and molecular biology remains scant, reflecting a lack of basic genomic and transcriptomic data. In this study, we assembled three cDNA libraries, from three successive stages in flower development, designated as the flower bud with displayed petal, open flower and senescing flower stages. Using the Illumina RNA-Seq method, we obtained 21,412,928, 26,950,404, 24,912,954 qualified Illumina reads, respectively, for the three successive stages. The pooled reads from all three libraries were then assembled into 106,995 transcripts, 51,793 of which were annotated in the NCBI non-redundant protein database. Of these annotated sequences, 32,649 and 21,893 transcripts were assigned to gene ontology categories and clusters of orthologous groups, respectively. We could map 15,587 transcripts onto 312 pathways using the Kyoto Encyclopedia of Genes and Genomes pathway database. Based on these transcriptomic data, we obtained a large number of candidate genes that were differentially expressed at the open flower and senescing flower stages. An analysis of differentially expressed genes involved in plant hormone signal transduction pathways indicated that although flower opening and senescence may be independent of the ethylene signaling pathway in wintersweet, salicylic acid may be involved in the regulation of flower senescence. We also succeeded in isolating key genes of floral scent biosynthesis and proposed a biosynthetic pathway for monoterpenes and sesquiterpenes in wintersweet flowers, based on the annotated sequences. This comprehensive transcriptomic analysis presents fundamental information on the genes and pathways which are involved in flower development in wintersweet. 
    Our data also provide a useful database for further research on wintersweet and other plants of the Calycanthaceae family.

  5. Transcriptomic Analysis of Flower Development in Wintersweet (Chimonanthus praecox)

    PubMed Central

    Liu, Daofeng; Sui, Shunzhao; Ma, Jing; Li, Zhineng; Guo, Yulong; Luo, Dengpan; Yang, Jianfeng; Li, Mingyang

    2014-01-01

    Wintersweet (Chimonanthus praecox) is familiar as a garden plant and woody ornamental flower. On account of its unique flowering time and strong fragrance, it has a high ornamental and economic value. Despite a long history of human cultivation, our understanding of wintersweet genetics and molecular biology remains scant, reflecting a lack of basic genomic and transcriptomic data. In this study, we assembled three cDNA libraries, from three successive stages in flower development, designated as the flower bud with displayed petal, open flower and senescing flower stages. Using the Illumina RNA-Seq method, we obtained 21,412,928, 26,950,404, 24,912,954 qualified Illumina reads, respectively, for the three successive stages. The pooled reads from all three libraries were then assembled into 106,995 transcripts, 51,793 of which were annotated in the NCBI non-redundant protein database. Of these annotated sequences, 32,649 and 21,893 transcripts were assigned to gene ontology categories and clusters of orthologous groups, respectively. We could map 15,587 transcripts onto 312 pathways using the Kyoto Encyclopedia of Genes and Genomes pathway database. Based on these transcriptomic data, we obtained a large number of candidate genes that were differentially expressed at the open flower and senescing flower stages. An analysis of differentially expressed genes involved in plant hormone signal transduction pathways indicated that although flower opening and senescence may be independent of the ethylene signaling pathway in wintersweet, salicylic acid may be involved in the regulation of flower senescence. We also succeeded in isolating key genes of floral scent biosynthesis and proposed a biosynthetic pathway for monoterpenes and sesquiterpenes in wintersweet flowers, based on the annotated sequences. This comprehensive transcriptomic analysis presents fundamental information on the genes and pathways which are involved in flower development in wintersweet. 
    Our data also provide a useful database for further research on wintersweet and other plants of the Calycanthaceae family. PMID:24489818

  6. Convergence of GWA and candidate gene studies for alcoholism

    PubMed Central

    Olfson, Emily; Bierut, Laura Jean

    2012-01-01

    Background: Genome-wide association (GWA) studies have led to a paradigm shift in how researchers study the genetics underlying disease. Many GWA studies are now publicly available and can be used to examine whether or not previously proposed candidate genes are supported by GWA data. This approach is particularly important for the field of alcoholism because the contribution of many candidate genes remains controversial. Methods: Using the Human Genome Epidemiology (HuGE) Navigator, we selected candidate genes for alcoholism that have been frequently examined in scientific articles in the past decade. Specific candidate loci as well as all the reported SNPs in candidate genes were examined in the Study of Alcohol Addiction: Genetics and Addiction (SAGE), a GWA study comparing alcohol dependent and non-dependent subjects. Results: Several commonly reported candidate loci, including rs1800497 in DRD2, rs698 in ADH1C, rs1799971 in OPRM1 and rs4680 in COMT, are not replicated in SAGE (p > .05). Among candidate loci available for analysis, only rs279858 in GABRA2 (p = 0.0052, OR = 1.16) demonstrated a modest association. Examination of all SNPs reported in SAGE in over 50 candidate genes revealed no SNPs with large frequency differences between cases and controls, and the lowest p value of any SNP was .0006. Discussion: We provide evidence that several extensively studied candidate loci do not have a strong contribution to risk of developing alcohol dependence in European and African Ancestry populations. Due to lack of coverage, we were unable to rule out the contribution of other variants, and these genes and particular loci warrant further investigation. Our analysis demonstrates that publicly available GWA results can be used to better understand which, if any, of the previously proposed candidate genes contribute to disease. Furthermore, we illustrate how examining the convergence of candidate gene and GWA studies can help elucidate the genetic architecture of alcoholism and, more generally, of complex diseases. PMID:22978509

  7. Antagonist molecules in the treatment of angina

    PubMed Central

    Gupta, Ashish K.; Winchester, David; Pepine, Carl J.

    2017-01-01

    Introduction: Management of chronic angina has evolved dramatically in the last few decades with several options for pharmacotherapy outlined in various evidence-based guidelines. Areas covered: There is a growing list of drugs that are currently being investigated for treatment of chronic angina. These also include several herbal medications, which are now being scientifically evaluated as potential alternative or even adjunctive therapy for angina. Gene- and cell-based therapies have opened yet another avenue for management of chronic refractory angina in ‘no-option’ patients who are not candidates for either percutaneous or surgical revascularization and are on optimal medical therapy. An extensive review of literature using PUBMED, Cochrane database, clinical trial databases of USA and European Union was done and summarized in this review. This review will attempt to discuss the traditional as well as novel therapeutic agents for angina. Expert opinion: Several pharmacological and non-pharmacological therapeutic options are now available for treatment and management of chronic refractory angina. Renewed interest in traditional therapies and cell- and gene-based modalities with targeted drug delivery systems will open the doors for personalized therapy for patients with chronic refractory angina. PMID:24047238

  8. Vγ9 and Vδ2 T cell antigen receptor genes and butyrophilin 3 (BTN3) emerged with placental mammals and are concomitantly preserved in selected species like alpaca (Vicugna pacos).

    PubMed

    Karunakaran, Mohindar M; Göbel, Thomas W; Starick, Lisa; Walter, Lutz; Herrmann, Thomas

    2014-04-01

    Human Vγ9Vδ2 T cells recognize phosphorylated products of isoprenoid metabolism (phosphoantigens, PAg) with TCRs comprising Vγ9JP γ-chains and Vδ2 δ-chains, dependent on butyrophilin 3 (BTN3) expressed by antigen-presenting cells. They are massively activated in many infections and show anti-tumor activity; so far, they have been considered to exist only in higher primates. We performed a comprehensive analysis of databases and identified the three genes in species of both placental magnorders, but not in rodents. The common occurrence or loss of in silico translatable Vγ9, Vδ2, and BTN3 genes suggested their co-evolution based on a functional relationship. In the peripheral lymphocytes of alpaca (Vicugna pacos), characteristic Vγ9JP rearrangements and in-frame Vδ2 rearrangements were found and could be co-expressed in a TCR-negative mouse T cell hybridoma where they rescued CD3 expression and function. Finally, database sequence analysis of the extracellular domain of alpaca BTN3 revealed complete conservation of proposed PAg binding residues of human BTN3A1. In summary, we show emergence and preservation of Vγ9 and Vδ2 TCR genes with the gene of the putative antigen-presenting molecule BTN3 in placental mammals and lay the ground for analysis of alpaca as candidate for a first non-primate species to possess Vγ9Vδ2 T cells.

  9. EDdb: a web resource for eating disorder and its application to identify an extended adipocytokine signaling pathway related to eating disorder.

    PubMed

    Zhao, Min; Li, XiaoMo; Qu, Hong

    2013-12-01

    Eating disorder is a group of physiological and psychological disorders affecting approximately 1% of the female population worldwide. Although the genetic epidemiology of eating disorder is becoming increasingly clear with accumulated studies, the underlying molecular mechanisms are still unclear. Recently, integration of various high-throughput data expanded the range of candidate genes and started to generate hypotheses for understanding potential pathogenesis in complex diseases. This article presents EDdb (Eating Disorder database), the first evidence-based gene resource for eating disorder. Fifty-nine experimentally validated genes from the literature in relation to eating disorder were collected as the core dataset. Another four datasets with 2824 candidate genes across 601 genome regions were expanded based on the core dataset using different criteria (e.g., protein-protein interactions, shared cytobands, and related complex diseases). Based on human protein-protein interaction data, we reconstructed a potential molecular sub-network related to eating disorder. Furthermore, with an integrative pathway enrichment analysis of genes in EDdb, we identified an extended adipocytokine signaling pathway in eating disorder. Three genes in EDdb (ADIPO (adiponectin), TNF (tumor necrosis factor) and NR3C1 (nuclear receptor subfamily 3, group C, member 1)) link the KEGG (Kyoto Encyclopedia of Genes and Genomes) "adipocytokine signaling pathway" with the BioCarta "visceral fat deposits and the metabolic syndrome" pathway to form a joint pathway. In total, the joint pathway contains 43 genes, among which 39 genes are related to eating disorder. As the first comprehensive gene resource for eating disorder, EDdb ( http://eddb.cbi.pku.edu.cn ) enables the exploration of gene-disease relationships and cross-talk mechanisms between related disorders. 
Through pathway statistical studies, we revealed that abnormal body weight caused by eating disorder and obesity may both be related to dysregulation of the novel joint pathway of adipocytokine signaling. In addition, this joint pathway may be the common pathway for body weight regulation in complex human diseases related to unhealthy lifestyle.

  10. Genome-Wide Association Study Identifying Candidate Genes Influencing Important Agronomic Traits of Flax (Linum usitatissimum L.) Using SLAF-seq

    PubMed Central

    Xie, Dongwei; Dai, Zhigang; Yang, Zemao; Sun, Jian; Zhao, Debao; Yang, Xue; Zhang, Liguo; Tang, Qing; Su, Jianguang

    2018-01-01

    Flax (Linum usitatissimum L.) is an important cash crop, and its agronomic traits directly affect yield and quality. Molecular studies on flax remain inadequate because relatively few flax genes have been associated with agronomic traits or have been identified as having potential applications. To identify markers and candidate genes that can potentially be used for genetic improvement of crucial agronomic traits, we examined 224 specimens of core flax germplasm; specifically, phenotypic data for key traits, including plant height, technical length, number of branches, number of fruits, and 1000-grain weight were investigated under three environmental conditions before specific-locus amplified fragment sequencing (SLAF-seq) was employed to perform a genome-wide association study (GWAS) for these five agronomic traits. Subsequently, the results were used to screen single nucleotide polymorphism (SNP) loci and candidate genes that exhibited a significant correlation with the important agronomic traits. Our analyses identified a total of 42 SNP loci that showed significant correlations with the five important agronomic flax traits. Next, candidate genes were screened in the 10 kb zone of each of the 42 SNP loci. These SNP loci were then analyzed by a more stringent screening via co-identification using both a general linear model (GLM) and a mixed linear model (MLM) as well as co-occurrences in at least two of the three environments, whereby 15 final candidate genes were obtained. Based on these results, we determined that UGT and PL are candidate genes for plant height, GRAS and XTH are candidate genes for the number of branches, Contig1437 and LU0019C12 are candidate genes for the number of fruits, and PHO1 is a candidate gene for the 1000-seed weight. We propose that the identified SNP loci and corresponding candidate genes might serve as a biological basis for improving crucial agronomic flax traits. PMID:29375606
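    The screening logic summarized in this abstract (candidate genes taken from the 10 kb zone of each significant SNP, then retained only when the SNP is co-identified by GLM and MLM and observed in at least two of the three environments) can be sketched roughly as below. This is a minimal illustration, not the authors' pipeline: the record layouts, function names, toy coordinates, and the reading of "10 kb zone" as plus or minus 10 kb around the SNP are all assumptions.

```python
from collections import namedtuple

# Hypothetical record layouts; field names are illustrative assumptions.
Snp = namedtuple("Snp", "name chrom pos models environments")
Gene = namedtuple("Gene", "name chrom start end")

WINDOW = 10_000  # assumed reading of "10 kb zone": +/- 10 kb around the SNP


def passes_stringent_filter(snp, min_envs=2):
    """Keep SNPs detected by both GLM and MLM and in at least `min_envs` environments."""
    return {"GLM", "MLM"} <= set(snp.models) and len(set(snp.environments)) >= min_envs


def genes_near(snp, genes, window=WINDOW):
    """Return names of genes overlapping [pos - window, pos + window] on the SNP's chromosome."""
    lo, hi = snp.pos - window, snp.pos + window
    return [g.name for g in genes
            if g.chrom == snp.chrom and g.start <= hi and g.end >= lo]


def screen_candidates(snps, genes):
    """Map each SNP that passes the stringent filter to its nearby genes."""
    return {s.name: genes_near(s, genes) for s in snps if passes_stringent_filter(s)}


# Toy data, invented purely to exercise the functions.
snps = [
    Snp("snp1", "chr1", 50_000, ("GLM", "MLM"), ("E1", "E2")),
    Snp("snp2", "chr1", 90_000, ("GLM",), ("E1", "E2", "E3")),
]
genes = [
    Gene("geneA", "chr1", 55_000, 58_000),
    Gene("geneB", "chr1", 70_000, 72_000),
    Gene("geneC", "chr2", 50_000, 52_000),
]
result = screen_candidates(snps, genes)
```

    With the toy data, only `snp1` passes the filter (it has both models and two environments), and only `geneA` falls inside its window.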

  11. Genome-Wide Association Study Identifying Candidate Genes Influencing Important Agronomic Traits of Flax (Linum usitatissimum L.) Using SLAF-seq.

    PubMed

    Xie, Dongwei; Dai, Zhigang; Yang, Zemao; Sun, Jian; Zhao, Debao; Yang, Xue; Zhang, Liguo; Tang, Qing; Su, Jianguang

    2017-01-01

    Flax (Linum usitatissimum L.) is an important cash crop, and its agronomic traits directly affect yield and quality. Molecular studies on flax remain inadequate because relatively few flax genes have been associated with agronomic traits or have been identified as having potential applications. To identify markers and candidate genes that can potentially be used for genetic improvement of crucial agronomic traits, we examined 224 specimens of core flax germplasm; specifically, phenotypic data for key traits, including plant height, technical length, number of branches, number of fruits, and 1000-grain weight were investigated under three environmental conditions before specific-locus amplified fragment sequencing (SLAF-seq) was employed to perform a genome-wide association study (GWAS) for these five agronomic traits. Subsequently, the results were used to screen single nucleotide polymorphism (SNP) loci and candidate genes that exhibited a significant correlation with the important agronomic traits. Our analyses identified a total of 42 SNP loci that showed significant correlations with the five important agronomic flax traits. Next, candidate genes were screened in the 10 kb zone of each of the 42 SNP loci. These SNP loci were then analyzed by a more stringent screening via co-identification using both a general linear model (GLM) and a mixed linear model (MLM) as well as co-occurrences in at least two of the three environments, whereby 15 final candidate genes were obtained. Based on these results, we determined that UGT and PL are candidate genes for plant height, GRAS and XTH are candidate genes for the number of branches, Contig1437 and LU0019C12 are candidate genes for the number of fruits, and PHO1 is a candidate gene for the 1000-seed weight. We propose that the identified SNP loci and corresponding candidate genes might serve as a biological basis for improving crucial agronomic flax traits.

  12. Profiling and Quantifying Differential Gene Transcription Provide Insights into Ganoderic Acid Biosynthesis in Ganoderma lucidum in Response to Methyl Jasmonate

    PubMed Central

    Shi, Liang; Mu, Da-Shuai; Jiang, Ai-Liang; Han, Qin; Zhao, Ming-Wen

    2013-01-01

    Ganoderma lucidum is a mushroom with traditional medicinal properties that has been widely used in China and other countries in Eastern Asia. Ganoderic acids (GA) produced by G. lucidum exhibit important pharmacological activities. Previous studies have demonstrated that methyl jasmonate (MeJA) is a potent inducer of GA biosynthesis and the expression of genes involved in the GA biosynthesis pathway in G. lucidum. To further explore the mechanism of GA biosynthesis, cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) was used to identify genes that are differentially expressed in response to MeJA. Using 64 primer combinations, over 3910 transcriptionally derived fragments (TDFs) were obtained. Reliable sequence data were obtained for 390 of 458 selected TDFs. Ninety of these TDFs were annotated with known functions through BLASTX searching the GenBank database, and 12 annotated TDFs were assigned into secondary metabolic pathways by searching the KEGG PATHWAY database. Twenty-five TDFs were selected for qRT-PCR analysis to confirm the expression patterns observed with cDNA-AFLP. The qRT-PCR results were consistent with the altered patterns of gene expression revealed by the cDNA-AFLP technique. Additionally, the transcript levels of 10 genes were measured at the mycelium, primordia, and fruiting body developmental stages of G. lucidum. The greatest expression levels were reached during primordia for all of the genes except cytochrome b2, which reached its highest expression level in the mycelium stage. This study not only identifies new candidate genes involved in the regulation of GA biosynthesis but also provides further insight into MeJA-induced gene expression and secondary metabolic response in G. lucidum. PMID:23762280

  13. A direct molecular link between the autism candidate gene RORa and the schizophrenia candidate MIR137

    NASA Astrophysics Data System (ADS)

    Devanna, Paolo; Vernes, Sonja C.

    2014-02-01

    Retinoic acid-related orphan receptor alpha gene (RORa) and the microRNA MIR137 have both recently been identified as novel candidate genes for neuropsychiatric disorders. RORa encodes a ligand-dependent orphan nuclear receptor that acts as a transcriptional regulator and miR-137 is a brain enriched small non-coding RNA that interacts with gene transcripts to control protein levels. Given the mounting evidence for RORa in autism spectrum disorders (ASD) and MIR137 in schizophrenia and ASD, we investigated if there was a functional biological relationship between these two genes. Herein, we demonstrate that miR-137 targets the 3'UTR of RORa in a site specific manner. We also provide further support for MIR137 as an autism candidate by showing that a large number of previously implicated autism genes are also putatively targeted by miR-137. This work supports the role of MIR137 as an ASD candidate and demonstrates a direct biological link between these previously unrelated autism candidate genes.

  14. Candidate genes and molecular markers associated with heat tolerance in colonial Bentgrass.

    PubMed

    Jespersen, David; Belanger, Faith C; Huang, Bingru

    2017-01-01

    Elevated temperature is a major abiotic stress limiting the growth of cool-season grasses during the summer months. The objectives of this study were to determine the genetic variation in the expression patterns of selected genes involved in several major metabolic pathways regulating heat tolerance for two genotypes contrasting in heat tolerance to confirm their status as potential candidate genes, and to identify PCR-based markers associated with candidate genes related to heat tolerance in a colonial (Agrostis capillaris L.) x creeping bentgrass (Agrostis stolonifera L.) hybrid backcross population. Plants were subjected to heat stress in controlled-environmental growth chambers for phenotypic evaluation and determination of genetic variation in candidate gene expression. Molecular markers were developed for genes involved in protein degradation (cysteine protease), antioxidant defense (catalase and glutathione-S-transferase), energy metabolism (glyceraldehyde-3-phosphate dehydrogenase), cell expansion (expansin), and stress protection (heat shock proteins HSP26, HSP70, and HSP101). Kruskal-Wallis analysis, a commonly used non-parametric test used to compare population individuals with or without the gene marker, found the physiological traits of chlorophyll content, electrolyte leakage, normalized difference vegetative index, and turf quality were associated with all candidate gene markers with the exception of HSP101. Differential gene expression was frequently found for the tested candidate genes. The development of candidate gene markers for important heat tolerance genes may allow for the development of new cultivars with increased abiotic stress tolerance using marker-assisted selection.

  15. Candidate genes and molecular markers associated with heat tolerance in colonial Bentgrass

    PubMed Central

    Jespersen, David; Belanger, Faith C.; Huang, Bingru

    2017-01-01

    Elevated temperature is a major abiotic stress limiting the growth of cool-season grasses during the summer months. The objectives of this study were to determine the genetic variation in the expression patterns of selected genes involved in several major metabolic pathways regulating heat tolerance for two genotypes contrasting in heat tolerance to confirm their status as potential candidate genes, and to identify PCR-based markers associated with candidate genes related to heat tolerance in a colonial (Agrostis capillaris L.) x creeping bentgrass (Agrostis stolonifera L.) hybrid backcross population. Plants were subjected to heat stress in controlled-environmental growth chambers for phenotypic evaluation and determination of genetic variation in candidate gene expression. Molecular markers were developed for genes involved in protein degradation (cysteine protease), antioxidant defense (catalase and glutathione-S-transferase), energy metabolism (glyceraldehyde-3-phosphate dehydrogenase), cell expansion (expansin), and stress protection (heat shock proteins HSP26, HSP70, and HSP101). Kruskal-Wallis analysis, a commonly used non-parametric test used to compare population individuals with or without the gene marker, found the physiological traits of chlorophyll content, electrolyte leakage, normalized difference vegetative index, and turf quality were associated with all candidate gene markers with the exception of HSP101. Differential gene expression was frequently found for the tested candidate genes. The development of candidate gene markers for important heat tolerance genes may allow for the development of new cultivars with increased abiotic stress tolerance using marker-assisted selection. PMID:28187136

  16. Prediction model of potential hepatocarcinogenicity of rat hepatocarcinogens using a large-scale toxicogenomics database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Uehara, Takeki, E-mail: takeki.uehara@shionogi.co.jp; Toxicogenomics Informatics Project, National Institute of Biomedical Innovation, 7-6-8 Asagi, Ibaraki, Osaka 567-0085; Minowa, Yohsuke

    2011-09-15

    The present study was performed to develop a robust gene-based prediction model for early assessment of potential hepatocarcinogenicity of chemicals in rats by using our toxicogenomics database, TG-GATEs (Genomics-Assisted Toxicity Evaluation System developed by the Toxicogenomics Project in Japan). The positive training set consisted of high- or middle-dose groups that received 6 different non-genotoxic hepatocarcinogens during a 28-day period. The negative training set consisted of high- or middle-dose groups of 54 non-carcinogens. Support vector machine combined with wrapper-type gene selection algorithms was used for modeling. Consequently, our best classifier yielded prediction accuracies for hepatocarcinogenicity of 99% sensitivity and 97% specificity in the training data set, and false positive prediction was almost completely eliminated. Pathway analysis of feature genes revealed that the mitogen-activated protein kinase p38- and phosphatidylinositol-3-kinase-centered interactome and the v-myc myelocytomatosis viral oncogene homolog-centered interactome were the 2 most significant networks. The usefulness and robustness of our predictor were further confirmed in an independent validation data set obtained from the public database. Interestingly, similar positive predictions were obtained in several genotoxic hepatocarcinogens as well as non-genotoxic hepatocarcinogens. These results indicate that the expression profiles of our newly selected candidate biomarker genes might be common characteristics in the early stage of carcinogenesis for both genotoxic and non-genotoxic carcinogens in the rat liver. Our toxicogenomic model might be useful for the prospective screening of hepatocarcinogenicity of compounds and prioritization of compounds for carcinogenicity testing. Highlights: We developed a toxicogenomic model to predict hepatocarcinogenicity of chemicals. The optimized model consisting of 9 probes had 99% sensitivity and 97% specificity. This model enables us to detect genotoxic as well as non-genotoxic hepatocarcinogens.
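    For readers unfamiliar with the two accuracy figures quoted in this abstract, the sketch below shows how sensitivity and specificity are computed from binary predictions. It is a generic illustration with invented labels, not the authors' classifier or data; the function name and the 1/0 label coding are assumptions.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed coding: 1 = hepatocarcinogen (positive), 0 = non-carcinogen (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # fraction of true positives that were detected
    specificity = tn / (tn + fp)  # fraction of true negatives that were rejected
    return sensitivity, specificity


# Invented labels, used only to exercise the function.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

    In this toy case three of four positives are detected and five of six negatives are correctly rejected.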

  17. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
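    The definition given in this abstract (using nucleotide sequences to generate candidate protein sequences for database searching) can be illustrated with a six-frame translation. This is a minimal sketch assuming the standard genetic code and plain DNA strings; the function names and the minimum-length cutoff are illustrative assumptions, and real proteogenomic workflows add steps (variant calling, splice junctions, protease digestion) not shown here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_candidates(seq, min_len=2):
    """Collect stop-free protein fragments from all six reading frames."""
    seq = seq.upper()
    candidates = set()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for fragment in translate(strand[frame:]).split("*"):
                if len(fragment) >= min_len:
                    candidates.add(fragment)
    return candidates


candidates = six_frame_candidates("ATGGCCTAA")
```

    The resulting set of fragments would then be appended to, or used in place of, a generic protein database for the mass spectrometry search.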

  18. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  19. New Insights into the Function and Global Distribution of Polyethylene Terephthalate (PET)-Degrading Bacteria and Enzymes in Marine and Terrestrial Metagenomes.

    PubMed

    Danso, Dominik; Schmeisser, Christel; Chow, Jennifer; Zimmermann, Wolfgang; Wei, Ren; Leggewie, Christian; Li, Xiangzhen; Hazen, Terry; Streit, Wolfgang R

    2018-04-15

    Polyethylene terephthalate (PET) is one of the most important synthetic polymers used today. Unfortunately, the polymers accumulate in nature and to date no highly active enzymes are known that can degrade it at high velocity. Enzymes involved in PET degradation are mainly α- and β-hydrolases, like cutinases and related enzymes (EC 3.1.1). Currently, only a small number of such enzymes are well characterized. In this work, a search algorithm was developed that identified 504 possible PET hydrolase candidate genes from various databases. A further global search that comprised more than 16 Gb of sequence information within 108 marine and 25 terrestrial metagenomes obtained from the Integrated Microbial Genome (IMG) database detected 349 putative PET hydrolases. Heterologous expression of four such candidate enzymes verified the function of these enzymes and confirmed the usefulness of the developed search algorithm. In this way, two novel and thermostable enzymes with high potential for downstream application were partially characterized. Clustering of 504 novel enzyme candidates based on amino acid similarities indicated that PET hydrolases mainly occur in the phyla of Actinobacteria, Proteobacteria, and Bacteroidetes. Within the Proteobacteria, the Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria were the main hosts. Remarkably enough, in the marine environment, bacteria affiliated with the phylum Bacteroidetes appear to be the main hosts of PET hydrolase genes, rather than Actinobacteria or Proteobacteria, as observed for the terrestrial metagenomes. Our data further imply that PET hydrolases are truly rare enzymes. The highest occurrence of 1.5 hits/Mb was observed in sequences from a sample site containing crude oil. IMPORTANCE Polyethylene terephthalate (PET) accumulates in our environment without significant microbial conversion.
Although a few PET hydrolases are already known, it is still unknown how frequently they appear and with which main bacterial phyla they are affiliated. In this study, deep sequence mining of protein databases and metagenomes demonstrated that PET hydrolases indeed occur at very low frequencies in the environment. Furthermore, it was possible to link them to phyla that were previously not known to harbor such enzymes. This work contributes novel knowledge on the phylogenetic relationships, the recent evolution, and the global distribution of PET hydrolases. Finally, we describe the biochemical traits of four novel PET hydrolases. Copyright © 2018 Danso et al.

  20. New Insights into the Function and Global Distribution of Polyethylene Terephthalate (PET)-Degrading Bacteria and Enzymes in Marine and Terrestrial Metagenomes

    PubMed Central

    Danso, Dominik; Schmeisser, Christel; Chow, Jennifer; Wei, Ren; Leggewie, Christian; Li, Xiangzhen

    2018-01-01

    Polyethylene terephthalate (PET) is one of the most important synthetic polymers used today. Unfortunately, the polymers accumulate in nature and to date no highly active enzymes are known that can degrade it at high velocity. Enzymes involved in PET degradation are mainly α- and β-hydrolases, like cutinases and related enzymes (EC 3.1.1). Currently, only a small number of such enzymes are well characterized. In this work, a search algorithm was developed that identified 504 possible PET hydrolase candidate genes from various databases. A further global search that comprised more than 16 Gb of sequence information within 108 marine and 25 terrestrial metagenomes obtained from the Integrated Microbial Genome (IMG) database detected 349 putative PET hydrolases. Heterologous expression of four such candidate enzymes verified the function of these enzymes and confirmed the usefulness of the developed search algorithm. In this way, two novel and thermostable enzymes with high potential for downstream application were partially characterized. Clustering of 504 novel enzyme candidates based on amino acid similarities indicated that PET hydrolases mainly occur in the phyla of Actinobacteria, Proteobacteria, and Bacteroidetes. Within the Proteobacteria, the Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria were the main hosts. Remarkably enough, in the marine environment, bacteria affiliated with the phylum Bacteroidetes appear to be the main hosts of PET hydrolase genes, rather than Actinobacteria or Proteobacteria, as observed for the terrestrial metagenomes. Our data further imply that PET hydrolases are truly rare enzymes. The highest occurrence of 1.5 hits/Mb was observed in sequences from a sample site containing crude oil. IMPORTANCE Polyethylene terephthalate (PET) accumulates in our environment without significant microbial conversion.
Although a few PET hydrolases are already known, it is still unknown how frequently they appear and with which main bacterial phyla they are affiliated. In this study, deep sequence mining of protein databases and metagenomes demonstrated that PET hydrolases indeed occur at very low frequencies in the environment. Furthermore, it was possible to link them to phyla that were previously not known to harbor such enzymes. This work contributes novel knowledge on the phylogenetic relationships, the recent evolution, and the global distribution of PET hydrolases. Finally, we describe the biochemical traits of four novel PET hydrolases. PMID:29427431

  1. A small number of candidate gene SNPs reveal continental ancestry in African Americans

    PubMed Central

    KODAMAN, NURI; ALDRICH, MELINDA C.; SMITH, JEFFREY R.; SIGNORELLO, LISA B.; BRADLEY, KEVIN; BREYER, JOAN; COHEN, SARAH S.; LONG, JIRONG; CAI, QIUYIN; GILES, JUSTIN; BUSH, WILLIAM S.; BLOT, WILLIAM J.; MATTHEWS, CHARLES E.; WILLIAMS, SCOTT M.

    2013-01-01

    SUMMARY Using genetic data from an obesity candidate gene study of self-reported African Americans and European Americans, we investigated the number of Ancestry Informative Markers (AIMs) and candidate gene SNPs necessary to infer continental ancestry. Proportions of African and European ancestry were assessed with STRUCTURE (K=2), using 276 AIMs. These reference values were compared to estimates derived using 120, 60, 30, and 15 SNP subsets randomly chosen from the 276 AIMs and from 1144 SNPs in 44 candidate genes. All subsets generated estimates of ancestry consistent with the reference estimates, with mean correlations greater than 0.99 for all subsets of AIMs, and mean correlations of 0.99±0.003; 0.98± 0.01; 0.93±0.03; and 0.81± 0.11 for subsets of 120, 60, 30, and 15 candidate gene SNPs, respectively. Among African Americans, the median absolute difference from reference African ancestry values ranged from 0.01 to 0.03 for the four AIMs subsets and from 0.03 to 0.09 for the four candidate gene SNP subsets. Furthermore, YRI/CEU Fst values provided a metric to predict the performance of candidate gene SNPs. Our results demonstrate that a small number of SNPs randomly selected from candidate genes can be used to estimate admixture proportions in African Americans reliably. PMID:23278390

  2. Nursing leadership succession planning in Veterans Health Administration: creating a useful database.

    PubMed

    Weiss, Lizabeth M; Drake, Audrey

    2007-01-01

    An electronic database was developed for succession planning and placement of nursing leaders interested and ready, willing, and able to accept an assignment in a nursing leadership position. The tool is a 1-page form used to identify candidates for nursing leadership assignments. This tool has been deployed nationally, with access to the database restricted to nurse executives at every Veterans Health Administration facility for the purpose of entering the names of developed nurse leaders ready for a leadership assignment. The tool is easily accessed through the Veterans Health Administration Office of Nursing Service, and by limiting access to the nurse executive group, ensures candidates identified are qualified. Demographic information included on the survey tool includes the candidate's demographic information and other certifications/credentials. This completed information form is entered into a database from which a report can be generated, resulting in a listing of potential candidates to contact to supplement a local or Veterans Integrated Service Network wide position announcement. The data forms can be sorted by positions, areas of clinical or functional experience, training programs completed, and geographic preference. The forms can be edited or updated and/or added or deleted in the system as the need is identified. This tool allows facilities with limited internal candidates to have a resource with Department of Veterans Affairs prepared staff in which to seek additional candidates. It also provides a way for interested candidates to be considered for positions outside of their local geographic area.

  3. EnRICH: Extraction and Ranking using Integration and Criteria Heuristics.

    PubMed

    Zhang, Xia; Greenlee, M Heather West; Serb, Jeanne M

    2013-01-15

    High throughput screening technologies enable biologists to generate candidate genes at a rate that, due to time and cost constraints, cannot be studied by experimental approaches in the laboratory. Thus, it has become increasingly important to prioritize candidate genes for experiments. To accomplish this, researchers need to apply selection requirements based on their knowledge, which necessitates qualitative integration of heterogeneous data sources and filtration using multiple criteria. A similar approach can also be applied to putative candidate gene relationships. While automation can assist in this routine and imperative procedure, flexibility of data sources and criteria must not be sacrificed. A tool that can optimize the trade-off between automation and flexibility to simultaneously filter and qualitatively integrate data is needed to prioritize candidate genes and generate composite networks from heterogeneous data sources. We developed the java application, EnRICH (Extraction and Ranking using Integration and Criteria Heuristics), in order to alleviate this need. Here we present a case study in which we used EnRICH to integrate and filter multiple candidate gene lists in order to identify potential retinal disease genes. As a result of this procedure, a candidate pool of several hundred genes was narrowed down to five candidate genes, of which four are confirmed retinal disease genes and one is associated with a retinal disease state. We developed a platform-independent tool that is able to qualitatively integrate multiple heterogeneous datasets and use different selection criteria to filter each of them, provided the datasets are tables that have distinct identifiers (required) and attributes (optional). With the flexibility to specify data sources and filtering criteria, EnRICH automatically prioritizes candidate genes or gene relationships for biologists based on their specific requirements. 
Here, we also demonstrate that this tool can be effectively and easily used to apply highly specific user-defined criteria and can efficiently identify high quality candidate genes from relatively sparse datasets.

  4. Degrees of separation as a statistical tool for evaluating candidate genes.

    PubMed

    Nelson, Ronald M; Pettersson, Mats E

    2014-12-01

    Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, it can cause misleading results if not validated properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes to currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validation of interacting genes by comparing their connectedness with the average connectedness in the gene network will provide support for said interactions by utilising the growing amount of gene network information available. Copyright © 2014 Elsevier Ltd. All rights reserved.
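    The core quantity described in this abstract, the average degree of separation (DoS) between gene pairs in a network, can be sketched with a breadth-first search. This is not the CandidateBacon package or its API; the adjacency-dictionary representation, the function names, and the toy network are assumptions made for illustration only.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of edges from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_dos(graph, genes=None):
    """Average degree of separation over connected pairs.

    With `genes=None`, all pairs in the network are used (the network-wide
    baseline); otherwise only pairs drawn from `genes` are averaged.
    """
    nodes = sorted(graph if genes is None else genes)
    total, count = 0, 0
    for i, a in enumerate(nodes):
        dist = shortest_path_lengths(graph, a)
        for b in nodes[i + 1:]:
            if b in dist:  # unreachable pairs are skipped
                total += dist[b]
                count += 1
    return total / count if count else float("nan")


# Toy undirected network (a simple path A-B-C-D), invented for illustration.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
baseline = average_dos(network)              # average over all pairs
candidate = average_dos(network, ["A", "B"])  # DoS of one candidate pair
```

    A candidate pair whose DoS is well below the network-wide baseline is more closely connected than a typical pair in that network, which is the kind of comparison the abstract describes as validation.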

  5. De Novo Foliar Transcriptome of Chenopodium amaranticolor and Analysis of Its Gene Expression During Virus-Induced Hypersensitive Response

    PubMed Central

    Zhang, Yongqiang; Pei, Xinwu; Zhang, Chao; Lu, Zifeng; Wang, Zhixing; Jia, Shirong; Li, Weimin

    2012-01-01

    Background: The hypersensitive response (HR) system of Chenopodium spp. confers broad-spectrum virus resistance. However, little knowledge exists at the genomic level for Chenopodium, thus impeding the advanced molecular research of this attractive feature. Hence, we took advantage of RNA-seq to survey the foliar transcriptome of C. amaranticolor, a Chenopodium species widely used as laboratory indicator for pathogenic viruses, in order to facilitate the characterization of the HR-type of virus resistance. Methodology and Principal Findings: Using the Illumina HiSeq™ 2000 platform, we obtained 39,868,984 reads with 3,588,208,560 bp, which were assembled into 112,452 unigenes (3,847 clusters and 108,605 singletons). BlastX search against the NCBI NR database identified 61,698 sequences with a cut-off E-value above 10^-5. Assembled sequences were annotated with gene descriptions, GO, COG and KEGG terms, respectively. A total number of 738 resistance gene analogs (RGAs) and homology sequences of 6 key signaling proteins within the R proteins-directed signaling pathway were identified. Based on this transcriptome data, we investigated the gene expression profiles over the stage of HR induced by Tobacco mosaic virus and Cucumber mosaic virus by using digital gene expression analysis. Numerous candidate genes specifically or commonly regulated by these two distinct viruses at early and late stages of the HR were identified, and the dynamic changes of the differently expressed genes enriched in the pathway of plant-pathogen interaction were particularly emphasized. Conclusions: To our knowledge, this study is the first description of the genetic makeup of C. amaranticolor, providing deep insight into the comprehensive gene expression information at transcriptional level in this species.
The 738 RGAs as well as the differentially regulated genes, particularly the common genes regulated by both TMV and CMV, are suitable candidates which merit further functional characterization to dissect the molecular mechanisms and regulatory pathways of the HR-type of virus resistance in Chenopodium. PMID:23029338

  6. Interactogeneous: Disease Gene Prioritization Using Heterogeneous Networks and Full Topology Scores

    PubMed Central

    Gonçalves, Joana P.; Francisco, Alexandre P.; Moreau, Yves; Madeira, Sara C.

    2012-01-01

    Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches yield at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive. 
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease. PMID:23185389
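
    To make the heat diffusion ranking mentioned above more concrete, here is a small pure-Python sketch. It is not the authors' implementation: it approximates diffusion on a weighted graph with explicit Euler steps of dh/dt = -L h, where L is the graph Laplacian, and the edge weights, node names, diffusion time and step count are all hypothetical.

```python
def heat_diffusion(weights, seeds, t=1.0, steps=100):
    """Diffuse unit heat from seed genes over a weighted, undirected network.

    weights maps (gene_u, gene_v) to a confidence weight. Heat flows along
    each edge in proportion to its weight and the heat difference across it.
    """
    nodes = sorted({n for edge in weights for n in edge})
    nbrs = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        nbrs[u][v] = w
        nbrs[v][u] = w
    heat = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    dt = t / steps  # keep dt well below 1 / (largest weighted degree) for stability
    for _ in range(steps):
        outflow = {
            n: sum(w * (heat[n] - heat[m]) for m, w in nbrs[n].items())
            for n in nodes
        }
        heat = {n: heat[n] - dt * outflow[n] for n in nodes}
    return heat


def rank_candidates(heat, seeds):
    """Rank non-seed genes by the heat they received, highest first."""
    candidates = [n for n in heat if n not in seeds]
    return sorted(candidates, key=lambda n: (-heat[n], n))


# Hypothetical association network with confidence weights.
edges = {("seed", "a"): 0.9, ("seed", "d"): 0.2, ("a", "b"): 0.5}
heat = heat_diffusion(edges, seeds={"seed"})
ranking = rank_candidates(heat, {"seed"})
print(ranking)
```

    Unlike a neighborhood or shortest-path score, the diffused heat reaches genes through every path, and strongly weighted links carry more of it, which is the behaviour the abstract attributes to full-topology scoring.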

  7. De novo Transcriptome Analysis of Miscanthus lutarioriparius Identifies Candidate Genes in Rhizome Development

    PubMed Central

    Hu, Ruibo; Yu, Changjiang; Wang, Xiaoyu; Jia, Chunlin; Pei, Shengqiang; He, Kang; He, Guo; Kong, Yingzhen; Zhou, Gongke

    2017-01-01

    HIGHLIGHT De novo transcriptome profiling of five tissues reveals candidate genes putatively involved in rhizome development in M. lutarioriparius. Miscanthus lutarioriparius is a promising lignocellulosic feedstock for second-generation bioethanol production. However, the genomic resources for this species are relatively limited, which hampers our understanding of the molecular mechanisms underlying many important biological processes. In this study, we performed the first de novo transcriptome analysis of five tissues (leaf, stem, root, lateral bud and rhizome bud) of M. lutarioriparius with an emphasis on identifying putative genes involved in rhizome development. Approximately 66 gigabases (Gb) of paired-end clean reads were obtained and assembled into 169,064 unigenes with an average length of 759 bp. Among these unigenes, 103,899 (61.5%) were annotated in seven public protein databases. Differential gene expression profiling analysis revealed that 4,609, 3,188, 1,679, 1,218, and 1,077 genes were predominantly expressed in root, leaf, stem, lateral bud, and rhizome bud, respectively. Their expression patterns were further classified into 12 distinct clusters. Pathway enrichment analysis revealed that genes predominantly expressed in rhizome bud were mainly involved in primary metabolism and hormone signaling and transduction pathways. Notably, 19 transcription factors (TFs) and 16 hormone signaling pathway-related genes were identified to be predominantly expressed in rhizome bud compared with the other tissues, suggesting putative roles in rhizome formation and development. In addition, a predictive regulatory network was constructed between four TFs and six auxin and abscisic acid (ABA)-related genes. Furthermore, the expression of 24 rhizome-specific genes was validated by quantitative real-time RT-PCR (qRT-PCR) analysis.
Taken together, this study provides a global portrait of gene expression across five different tissues and reveals preliminary insights into rhizome growth and development. The data presented will contribute to our understanding of the molecular mechanisms underlying rhizome development in M. lutarioriparius and remarkably enrich the genomic resources of Miscanthus. PMID:28446913

  8. Identification and association analysis of several hundred single nucleotide polymorphisms within candidate genes for back fat thickness in Italian Large White pigs using a selective genotyping approach.

    PubMed

    Fontanesi, L; Galimberti, G; Calò, D G; Fronza, R; Martelli, P L; Scotti, E; Colombo, M; Schiavo, G; Casadio, R; Buttazzoni, L; Russo, V

    2012-08-01

    Combining different approaches (resequencing of portions of 54 obesity candidate genes, literature mining for pig markers associated with fat deposition or related traits in 77 genes, and in silico mining of porcine expressed sequence tags and other sequences available in databases), we identified and analyzed 736 SNP within candidate genes to identify markers associated with back fat thickness (BFT) in Italian Large White sows. Animals were chosen using a selective genotyping approach according to their EBV for BFT (276 with most negative and 279 with most positive EBV) within a population of ≈ 12,000 pigs. Association analysis between the SNP and BFT has been carried out using the MAX test proposed for case-control studies. The designed assays were successful for 656 SNP: 370 were excluded (low call rate or minor allele frequency <5%), whereas the remaining 286 in 212 genes were taken for subsequent analyses, among which 64 showed a P(nominal) value <0.1. To deal with the multiple testing problem in a candidate gene approach, we applied the proportion of false positives (PFP) method. Thirty-eight SNP were significant (P(PFP) < 0.20). The most significant SNP was the IGF2 intron3-g.3072G>A polymorphism (P(nominal) < 1.0E-50). The second most significant SNP was the MC4R c.1426A>G polymorphism (P(nominal) = 8.0E-05). The third top SNP (P(nominal) = 6.2E-04) was the intronic TBC1D1 g.219G>A polymorphic site, in agreement with our previous results obtained in an independent study. The list of significant markers also included SNP in additional genes (ABHD16A, ABHD5, ACP2, ALMS1, APOA2, ATP1A2, CALR, COL14A1, CTSF, DARS, DECR1, ENPP1, ESR1, GH1, GHRL, GNMT, IKBKB, JAK3, MTTP, NFKBIA, NT5E, PLAT, PPARG, PPP2R5D, PRLR, RRAGD, RFC2, SDHD, SERPINF1, UBE2H, VCAM1, and WAT). Functional relationships between genes were obtained using the Ingenuity Pathway Analysis (IPA) Knowledge Base. 
The top scoring pathway included 19 genes with a P(nominal) < 0.1, 2 of which (IKBKB and NFKBIA) are involved in the hypothalamic IKKβ/NFκB program that could represent a key axis to affect fat deposition traits in pigs. These results represent a starting point to plan marker-assisted selection in Italian Large White nuclei for BFT. Because of similarities between humans and pigs, this study might also provide useful clues to investigate genetic factors affecting human obesity.

  9. LOD score exclusion analyses for candidate QTLs using random population samples.

    PubMed

    Deng, Hong-Wen

    2003-11-01

    While extensive analyses have been conducted to test for, no formal analyses have been conducted to test against, the importance of candidate genes as putative QTLs using random population samples. Previously, we developed an LOD score exclusion mapping approach for candidate genes for complex diseases. Here, we extend this LOD score approach for exclusion analyses of candidate genes for quantitative traits. Under this approach, specific genetic effects (as reflected by heritability) and inheritance models at candidate QTLs can be analyzed and if an LOD score is ≤ -2.0, the locus can be excluded from having a heritability larger than that specified. Simulations show that this approach has high power to exclude a candidate gene from having moderate genetic effects if it is not a QTL and is robust to population admixture. Our exclusion analysis complements association analysis for candidate genes as putative QTLs in random population samples. The approach is applied to test the importance of Vitamin D receptor (VDR) gene as a potential QTL underlying the variation of bone mass, an important determinant of osteoporosis.

  10. In silico analysis of expressed sequence tags from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with conventional database searches.

    PubMed

    Nagaraj, Shivashankar H; Gasser, Robin B; Nisbet, Alasdair J; Ranganathan, Shoba

    2008-01-01

    The analysis of expressed sequence tags (EST) offers a rapid and cost effective approach to elucidate the transcriptome of an organism, but requires several computational methods for assembly and annotation. Researchers frequently analyse each step manually, which is laborious and time consuming. We have recently developed ESTExplorer, a semi-automated computational workflow system, in order to achieve the rapid analysis of EST datasets. In this study, we evaluated EST data analysis for the parasitic nematode Trichostrongylus vitrinus (order Strongylida) using ESTExplorer, compared with database matching alone. We functionally annotated 1776 ESTs obtained via suppressive-subtractive hybridisation from T. vitrinus, an important parasitic trichostrongylid of small ruminants. Cluster and comparative genomic analyses of the transcripts using ESTExplorer indicated that 290 (41%) sequences had homologues in Caenorhabditis elegans, 329 (42%) in parasitic nematodes, 202 (28%) in organisms other than nematodes, and 218 (31%) had no significant match to any sequence in the current databases. Of the C. elegans homologues, 90 were associated with 'non-wildtype' double-stranded RNA interference (RNAi) phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth. We could functionally classify 267 (38%) sequences using the Gene Ontologies (GO) and establish pathway associations for 230 (33%) sequences using the Kyoto Encyclopedia of Genes and Genomes (KEGG). Further examination of this EST dataset revealed a number of signalling molecules, proteases, protease inhibitors, enzymes, ion channels and immune-related genes. In addition, we identified 40 putative secreted proteins that could represent potential candidates for developing novel anthelmintics or vaccines. We further compared the automated EST sequence annotations, using ESTExplorer, with database search results for individual T. vitrinus ESTs. 
ESTExplorer reliably and rapidly annotated 301 ESTs, with pathway and GO information, eliminating 60 low quality hits from database searches. We evaluated the efficacy of ESTExplorer in analysing EST data, and demonstrate that computational tools can be used to accelerate the process of gene discovery in EST sequencing projects. The present study has elucidated sets of relatively conserved and potentially novel genes for biological investigation, and the annotated EST set provides further insight into the molecular biology of T. vitrinus, towards the identification of novel drug targets.

  11. Association of candidate genes with drought tolerance traits in diverse perennial ryegrass accessions

    Treesearch

    Xiaoqing Yu; Guihua Bai; Shuwei Liu; Na Luo; Ying Wang; Douglas S. Richmond; Paula M. Pijut; Scott A. Jackson; Jianming Yu; Yiwei Jiang

    2013-01-01

    Drought is a major environmental stress limiting growth of perennial grasses in temperate regions. Plant drought tolerance is a complex trait that is controlled by multiple genes. Candidate gene association mapping provides a powerful tool for dissection of complex traits. Candidate gene association mapping of drought tolerance traits was conducted in 192 diverse...

  12. A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease

    PubMed Central

    McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.

    2016-01-01

    ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. 
Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases. PMID:27822537
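
    The collate-and-count step described in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated assumption that genes detected in more data sets deserve higher priority. The function name, the standardization rule (trim and upper-case), the threshold, and the placeholder gene identifiers are hypothetical and do not come from the paper.

```python
from collections import Counter


def candidate_list(datasets, min_datasets=2):
    """Rank genes by the number of data sets in which they appear.

    datasets maps a study label to its list of reported gene symbols.
    Symbols are standardized and counted at most once per data set.
    """
    counts = Counter()
    for genes in datasets.values():
        counts.update({symbol.strip().upper() for symbol in genes})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n) for gene, n in ranked if n >= min_datasets]


# Hypothetical gene lists with inconsistent formatting across studies.
studies = {
    "study_1": ["gene_a", "GENE_B", "GENE_B "],
    "study_2": ["GENE_A", "GENE_C"],
    "study_3": ["Gene_A", "GENE_B"],
}
candidates = candidate_list(studies)
print(candidates)
```

    Counting each gene once per data set keeps a single large study from dominating the ranking, so the score reflects agreement between studies rather than list length.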

  13. Identification and characterization of a new multigene family in the human MHC: A candidate autoimmune disease susceptibility element (3.8-1)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harris, J.M.; Venditti, C.P.; Chorney, M.J.

    1994-09-01

    An association between idiopathic hemochromatosis (HFE) and the HLA-A3 locus has been previously well-established. In an attempt to identify potential HFE candidate genes, a genomic DNA fragment distal to the HLA-A9 breakpoint was used to screen a B cell cDNA library; a member (3.8-1) of a new multigene family, composed of five distinct genomic cross-reactive fragments, was identified. Clone 3.8-1 represents the 3′ end of a 9.6 kb transcript which is expressed in multiple tissues including the spleen, thymus, lung and kidney. Sequencing and genome database analysis indicate that 3.8-1 is unique, with no homology to any known entries. The genomic residence of 3.8-1, defined by polymorphism analysis and physical mapping using YAC clones, appears to be absent from the genomes of higher primates, although four other cross-reactivities are maintained. The absence of this gene as well as other probes which map in the TNF to HLA-B interval suggests that this portion of the human MHC, located between the Class I and Class III regions, arose in humans as the result of a post-speciation insertional event. The large size of the 3.8-1 gene and the possible categorization of 3.8-1 as a human-specific gene are significant given the genetic data that place an autoimmune susceptibility element for IDDM and myasthenia gravis in the precise region where this gene resides. In an attempt to isolate the 5′ end of this large transcript, we have constructed a cosmid contig which encompasses the genomic locus of this gene and are progressively isolating coding sequences by exon trapping.

  14. Diterpenes biochemical profile and transcriptional analysis of cytochrome P450s genes in leaves, roots, flowers, and during Coffea arabica L. fruit development.

    PubMed

    Ivamoto, Suzana T; Sakuray, Leonardo M; Ferreira, Lucia P; Kitzberger, Cíntia S G; Scholz, Maria B S; Pot, David; Leroy, Thierry; Vieira, Luiz G E; Domingues, Douglas S; Pereira, Luiz F P

    2017-02-01

    Lipids are among the major chemical compounds present in coffee beans, and they affect the flavor and aroma of the coffee beverage. Coffee oil is rich in kaurene diterpene compounds, mainly cafestol (CAF) and kahweol (KAH), which are related to plant defense mechanisms and to nutraceutical and sensorial beverage characteristics. Despite their importance, the final steps of coffee diterpenes biosynthesis remain unknown. To understand the molecular basis of coffee diterpenes biosynthesis, we report the content dynamics of CAF and KAH in several Coffea arabica tissues and the transcriptional analysis of cytochrome P450 genes (P450). We measured CAF and KAH concentrations in leaves, roots, flower buds, flowers and fruit tissues at seven developmental stages (30-240 days after flowering - DAF) using HPLC. Higher CAF levels were detected in flower buds and flowers when compared to fruits. In contrast, KAH concentration increased along fruit development, peaking at 120 DAF. We did not detect CAF or KAH in leaves, and higher amounts of KAH than CAF were detected in roots. Using P450 candidate genes from a coffee EST database, we performed RT-qPCR transcriptional analysis of leaves, flowers and fruits at three developmental stages (90, 120 and 150 DAF). Three P450 genes (CaCYP76C4, CaCYP82C2 and CaCYP74A1) had transcriptional patterns similar to CAF concentration and two P450 genes (CaCYP71A25 and CaCYP701A3) have transcript accumulation similar to KAH concentration. These data warrant further investigation of these P450s as potential candidate genes involved in the final stages of the CAF and KAH biosynthetic pathways. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  15. [Hereditary motor and sensory neuropathy with proximal dominant involvement (HMSN-P) is caused by a mutation in TFG].

    PubMed

    Ishiura, Hiroyuki; Tsuji, Shoji

    2013-01-01

    Hereditary motor and sensory neuropathy with proximal dominant involvement (HMSN-P) is an autosomal dominant neurodegenerative disease characterized by proximal predominant weakness and muscle atrophy accompanied by distal sensory disturbance. Linkage analysis using 4 families identified a region on chromosome 3 showing a LOD score exceeding 4. Further refinement of candidate region was performed by haplotype analysis using high-density SNP data, resulting in a minimum candidate region spanning 3.3 Mb. Exome analysis of an HMSN-P patient revealed a mutation (c.854C>T, p.Pro285Leu) in TRK-fused gene (TFG). The identical mutation was found in the four families, which cosegregated with the disease. The mutation was neither found in Japanese control subjects nor public databases. Detailed haplotype analysis suggested two independent origins of the mutation. These findings indicate that the mutation in TFG causes HMSN-P.

  16. Convergence of genome-wide association and candidate gene studies for alcoholism.

    PubMed

    Olfson, Emily; Bierut, Laura Jean

    2012-12-01

    Genome-wide association (GWA) studies have led to a paradigm shift in how researchers study the genetics underlying disease. Many GWA studies are now publicly available and can be used to examine whether or not previously proposed candidate genes are supported by GWA data. This approach is particularly important for the field of alcoholism because the contribution of many candidate genes remains controversial. Using the Human Genome Epidemiology (HuGE) Navigator, we selected candidate genes for alcoholism that have been frequently examined in scientific articles in the past decade. Specific candidate loci as well as all the reported single nucleotide polymorphisms (SNPs) in candidate genes were examined in the Study of Addiction: Genetics and Environment (SAGE), a GWA study comparing alcohol-dependent and nondependent subjects. Several commonly reported candidate loci, including rs1800497 in DRD2, rs698 in ADH1C, rs1799971 in OPRM1, and rs4680 in COMT, are not replicated in SAGE (p > 0.05). Among candidate loci available for analysis, only rs279858 in GABRA2 (p = 0.0052, OR = 1.16) demonstrated a modest association. Examination of all SNPs reported in SAGE in over 50 candidate genes revealed no SNPs with large frequency differences between cases and controls, and the lowest p-value of any SNP was 0.0006. We provide evidence that several extensively studied candidate loci do not have a strong contribution to risk of developing alcohol dependence in European and African ancestry populations. Owing to the lack of coverage, we were unable to rule out the contribution of other variants, and these genes and particular loci warrant further investigation. Our analysis demonstrates that publicly available GWA results can be used to better understand which if any of previously proposed candidate genes contribute to disease. 
Furthermore, we illustrate how examining the convergence of candidate gene and GWA studies can help elucidate the genetic architecture of alcoholism and more generally complex diseases. Copyright © 2012 by the Research Society on Alcoholism.

  17. Identification and characterization of microRNAs related to salt stress in broccoli, using high-throughput sequencing and bioinformatics analysis.

    PubMed

    Tian, Yunhong; Tian, Yunming; Luo, Xiaojun; Zhou, Tao; Huang, Zuoping; Liu, Ying; Qiu, Yihan; Hou, Bing; Sun, Dan; Deng, Hongyu; Qian, Shen; Yao, Kaitai

    2014-09-03

    MicroRNAs (miRNAs) are a new class of endogenous regulators of a broad range of physiological processes, which act by regulating gene expression post-transcriptionally. The brassica vegetable, broccoli (Brassica oleracea var. italica), is very popular with a wide range of consumers, but environmental stresses such as salinity are a problem worldwide in restricting its growth and yield. Little is known about the role of miRNAs in the response of broccoli to salt stress. In this study, broccoli subjected to salt stress and broccoli grown under control conditions were analyzed by high-throughput sequencing. Differential miRNA expression was confirmed by real-time reverse transcription polymerase chain reaction (RT-PCR). The prediction of miRNA targets was undertaken using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) database and Gene Ontology (GO)-enrichment analyses. Two libraries of small (or short) RNAs (sRNAs) were constructed and sequenced by high-throughput Solexa sequencing. A total of 24,511,963 and 21,034,728 clean reads, representing 9,861,236 (40.23%) and 8,574,665 (40.76%) unique reads, were obtained for control and salt-stressed broccoli, respectively. Furthermore, 42 putative known and 39 putative candidate miRNAs that were differentially expressed between control and salt-stressed broccoli were revealed by their read counts and confirmed by the use of stem-loop real-time RT-PCR. Amongst these, the putative conserved miRNAs, miR393 and miR855, and two putative candidate miRNAs, miR3 and miR34, were the most strongly down-regulated when broccoli was salt-stressed, whereas the putative conserved miRNA, miR396a, and the putative candidate miRNA, miR37, were the most up-regulated. Finally, analysis of the predicted gene targets of miRNAs using the GO and KO databases indicated that a range of metabolic and other cellular functions known to be associated with salt stress were up-regulated in broccoli treated with salt. 
A comprehensive study of broccoli miRNA in relation to salt stress has been performed. We report significant data on the miRNA profile of broccoli that will underpin further studies on stress responses in broccoli and related species. The differential regulation of miRNAs between control and salt-stressed broccoli indicates that miRNAs play an integral role in the regulation of responses to salt stress.

  18. Large-Scale Discovery of Disease-Disease and Disease-Gene Associations

    PubMed Central

    Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J.; Obradovic, Zoran

    2016-01-01

    Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements on disease phenotyping over current computational approaches. In addition, the use of the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set where, again, it revealed very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Thus, our approach provides biomedical researchers with new tools to filter genes of interest, thus, reducing costly lab studies. PMID:27578529

  19. Identification and characterization of two bisabolene synthases from linear glandular trichomes of sunflower (Helianthus annuus L., Asteraceae).

    PubMed

    Aschenbrenner, Anna-Katharina; Kwon, Moonhyuk; Conrad, Jürgen; Ro, Dae-Kyun; Spring, Otmar

    2016-04-01

    Sunflower is known to produce a variety of bisabolene-type sesquiterpenes and accumulates these substances in trichomes of leaves, stems and flowering parts. A bioinformatics approach was used to identify the enzyme responsible for the initial step in the biosynthesis of these compounds from its precursor farnesyl pyrophosphate. Based on sequence similarity with a known bisabolene synthase from Arabidopsis thaliana, AtTPS12, candidate genes of Helianthus were searched in an EST database and used to design specific primers. PCR experiments identified two candidates in the RNA pool of linear glandular trichomes of sunflower. Their sequences contained the typical motifs of sesquiterpene synthases and their expression in yeast functionally characterized them as bisabolene synthases. Spectroscopic analysis identified the stereochemistry of the product of both enzymes as (Z)-γ-bisabolene. The origin of the two sunflower bisabolene synthase genes from the transcripts of linear trichomes indicates that they may be involved in the synthesis of sesquiterpenes produced in these trichomes. Comparison of the amino acid sequences of the sunflower bisabolene synthases showed high similarity with sesquiterpene synthases from other Asteracean species and indicated putative evolutionary origin from a β-farnesene synthase. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. The influence of genetic variation on late toxicities in childhood cancer survivors: A review.

    PubMed

    Clemens, E; van der Kooi, A L F; Broer, L; van Dulmen-den Broeder, E; Visscher, H; Kremer, L; Tissing, W; Loonen, J; Ronckers, C M; Pluijm, S M F; Neggers, S J C M M; Zolk, O; Langer, T; Zehnhoff-Dinnesen, A Am; Wilson, C L; Hudson, M M; Carleton, B; Laven, J S E; Uitterlinden, A G; van den Heuvel-Eibrink, M M

    2018-06-01

    The variability in late toxicities among childhood cancer survivors (CCS) is only partially explained by treatment and baseline patient characteristics. Inter-individual variability in the association between treatment exposure and risk of late toxicity suggests that genetic variation possibly modifies this association. We reviewed the available literature on genetic susceptibility to late toxicity after childhood cancer treatment related to components of metabolic syndrome, bone mineral density (BMD), gonadal impairment and hearing impairment. A systematic literature search was performed, using Embase, Cochrane Library, Google Scholar, MEDLINE, and Web of Science databases. Eligible publications included all English language reports of candidate gene studies and genome wide association studies (GWAS) that aimed to identify genetic risk factors associated with the four late toxicities, defined as toxicity present after end of treatment. Twenty-seven articles were identified, including 26 candidate gene studies: metabolic syndrome (n = 6); BMD (n = 6); gonadal impairment (n = 2); hearing impairment (n = 12) and one GWAS (metabolic syndrome). Eighty percent of the genetic studies on late toxicity after childhood cancer had relatively small sample sizes (n < 200), leading to insufficient power, and lacked adjustment for multiple comparisons. Only four (4/26 = 15%) candidate gene studies had their findings validated in independent replication cohorts as part of their own report. Genetic susceptibility associations are not consistent or not replicated and therefore, currently no evidence-based recommendations can be made for hearing impairment, gonadal impairment, bone mineral density impairment and metabolic syndrome in CCS. To advance knowledge related to genetic variation influencing late toxicities among CCS, future studies need adequate power, independent cohorts for replication, harmonization of disease outcomes and sample collections, and (international) collaboration. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  1. Identification of differentially expressed genes in the oviduct of two rabbit lines divergently selected for uterine capacity using suppression subtractive hybridization.

    PubMed

    Ballester, M; Castelló, A; Peiró, R; Argente, M J; Santacreu, M A; Folch, J M

    2013-06-01

    Suppression subtractive hybridization libraries from oviduct at 62 h post-mating of two lines of rabbits divergently selected for uterine capacity were generated to identify differentially expressed genes. A total of 438 singletons and 126 contigs were obtained by cluster assembly and sequence alignment of 704 expressed sequence tags (ESTs), of which 54% showed homology to known proteins of the non-redundant NCBI databases. Differential screening by dot blot validated 71 ESTs, of which 47 showed similarity to known genes. Transcripts of genes were functionally annotated in the molecular function and the biological process gene ontology categories using the BLAST2GO software and were assigned to reproductive developmental process, immune response, amino acid metabolism and degradation, response to stress and apoptosis terms. Finally, three interesting genes, PGR, HSD17B4 and ERO1L, were identified as overexpressed in the low line using RT-qPCR. Our study provides a list of candidate genes that can be useful for understanding the molecular mechanisms underlying the phenotypic differences observed in early embryo survival and development traits. © 2012 The Authors, Animal Genetics © 2012 Stichting International Foundation for Animal Genetics.

  2. Identification of positive selection in disease response genes within members of the Poaceae.

    PubMed

    Rech, Gabriel E; Vargas, Walter A; Sukno, Serenella A; Thon, Michael R

    2012-12-01

    Millions of years of coevolution between plants and pathogens can leave footprints on their genomes, and genes involved in this interaction are expected to show patterns of positive selection in which novel, beneficial alleles are rapidly fixed within the population. Using information about upregulated genes in maize during Colletotrichum graminicola infection and resources available in the Phytozome database, we looked for evidence of positive selection in the Poaceae lineage, acting on protein coding sequences related to plant defense. We found six genes with evidence of positive selection and another eight with sites showing episodic selection. Some of them have already been described as evolving under positive selection, but others are reported here for the first time, including genes encoding isocitrate lyase, dehydrogenases, a multidrug transporter, a protein containing a putative leucine-rich repeat and other proteins with unknown functions. Mapping positively selected residues onto the predicted 3-D structure of proteins showed that most of them are located on the surface, where proteins are in contact with other molecules. We present here a set of Poaceae genes that are likely to be involved in plant defense mechanisms and have evidence of positive selection. These genes are excellent candidates for future functional validation.

  3. Use of homologous and heterologous gene expression profiling tools to characterize transcription dynamics during apple fruit maturation and ripening.

    PubMed

    Costa, Fabrizio; Alba, Rob; Schouten, Henk; Soglio, Valeria; Gianfranceschi, Luca; Serra, Sara; Musacchi, Stefano; Sansavini, Silviero; Costa, Guglielmo; Fei, Zhangjun; Giovannoni, James

    2010-10-25

    Fruit development, maturation and ripening consist of a complex series of biochemical and physiological changes that in climacteric fruits, including apple and tomato, are coordinated by the gaseous hormone ethylene. These changes lead to final fruit quality and understanding of the functional machinery underlying these processes is of both biological and practical importance. To date many reports have been made on the analysis of gene expression in apple. In this study we focused our investigation on the role of ethylene during apple maturation, specifically comparing transcriptomics of normal ripening with changes resulting from application of the hormone receptor competitor 1-methylcyclopropene. To gain insight into the molecular process regulating ripening in apple, and to compare to tomato (model species for ripening studies), we utilized both homologous and heterologous (tomato) microarray to profile transcriptome dynamics of genes involved in fruit development and ripening, emphasizing those which are ethylene regulated. The use of both types of microarrays facilitated transcriptome comparison between apple and tomato (for the latter using data previously published and available at the TED: tomato expression database) and highlighted genes conserved during ripening of both species, which in turn represent a foundation for further comparative genomic studies. The cross-species analysis had the secondary aim of examining the efficiency of heterologous (specifically tomato) microarray hybridization for candidate gene identification as related to the ripening process. The resulting transcriptomics data revealed coordinated gene expression during fruit ripening of a subset of ripening-related and ethylene responsive genes, further facilitating the analysis of ethylene response during fruit maturation and ripening. Our combined strategy based on microarray hybridization enabled transcriptome characterization during normal climacteric apple ripening, as well as definition of ethylene-dependent transcriptome changes. Comparison with tomato fruit maturation and ethylene responsive transcriptome activity facilitated identification of putative conserved orthologous ripening-related genes, which serve as an initial set of candidates for assessing conservation of gene activity across genomes of fruit bearing plant species.

  4. Forkhead-box series expression network is associated with outcome of clear-cell renal cell carcinoma.

    PubMed

    Jia, Zhongwei; Wan, Fangning; Zhu, Yao; Shi, Guohai; Zhang, Hailiang; Dai, Bo; Ye, Dingwei

    2018-06-01

    Previous studies have demonstrated that several members of the Forkhead-box (FOX) family of genes are associated with tumor progression and metastasis. The objective of the current study was to screen candidate FOX family genes identified from analysis of molecular networks in clear cell renal cell carcinoma (ccRCC). The expression of FOX family genes as well as FOX family-associated genes was examined, and Kaplan-Meier survival analysis was performed in The Cancer Genome Atlas (TCGA) cohort (n=525). Patient characteristics, including sex, age, tumor diameter, laterality, tumor-node-metastasis, tumor grade, stage, white blood cell count, platelet count, the levels of hemoglobin, overall survival (OS) and disease-free survival (DFS), were collected for univariate and multivariate Cox proportional hazards ratio analyses. A total of seven candidate FOX family genes were selected from the TCGA database subsequent to univariate and multivariate Cox proportional hazards ratio analyses. FOXA1, FOXA2, FOXD1, FOXD4L2, FOXK2 and FOXL1 were associated with poor OS time, while FOXA1, FOXA2, FOXD1 and FOXK2 were associated with poor DFS time (P<0.05). FOXN2 was associated with favorable outcomes for overall and disease-free survival (P<0.05). In the gene cluster network analysis, the expression of FOX family-associated genes, including nuclear receptor coactivator 1 (NCOA1), NADH-ubiquinone oxidoreductase flavoprotein 3 (NDUFV3), phosphatidylserine decarboxylase (PISD) and pyruvate kinase liver and red blood cell (PKLR), were independent prognostic factors for OS in patients with ccRCC. Results of the present study revealed that the expression of FOX family genes, including FOXA1, FOXA2, FOXD1, FOXD4L2, FOXK2 and FOXL1, and FOX family-associated genes, including NCOA1, NDUFV3, PISD and PKLR, are independent prognostic factors for patients with ccRCC.
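
    The Kaplan-Meier survival analysis mentioned in this abstract can be illustrated with a minimal sketch. The follow-up times and event flags below are invented, the function name `kaplan_meier` is hypothetical, and the abstract does not say which software the authors used; the code only shows how the product-limit estimate is built up from the number of patients at risk at each event time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical follow-up data for one expression group (illustration only).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

    Comparing outcomes between, say, high- and low-expression groups would mean computing one such curve per group; a formal comparison (log-rank test, Cox regression) is beyond this sketch.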

  5. Using the candidate gene approach for detecting genes underlying seed oil concentration and yield in soybean.

    PubMed

    Eskandari, Mehrzad; Cober, Elroy R; Rajcan, Istvan

    2013-07-01

    Increasing the oil concentration in soybean seeds has been given more attention in recent years because of demand for both edible oil and biodiesel production. Oil concentration in soybean is a complex quantitative trait regulated by many genes as well as environmental conditions. To identify genes governing seed oil concentration in soybean, 16 putative candidate genes of three important gene families (GPAT: acyl-CoA:sn-glycerol-3-phosphate acyltransferase, DGAT: acyl-CoA:diacylglycerol acyltransferase, and PDAT: phospholipid:diacylglycerol acyltransferase) involved in triacylglycerol (TAG) biosynthesis pathways were selected and their sequences retrieved from the soybean database (http://www.phytozome.net/soybean). Three sequence mutations were discovered in either coding or noncoding regions of three DGAT soybean isoforms when comparing the parents of a 203 recombinant inbred line (RIL) population, OAC Wallace and OAC Glencoe. The RIL population was used to study the effects of these mutations on seed oil concentration and other important agronomic and seed composition traits, including seed yield and protein concentration across three field locations in Ontario, Canada, in 2009 and 2010. An insertion/deletion (indel) mutation in the GmDGAT2B gene in OAC Wallace was significantly associated with reduced seed oil concentration across three environments and reduced seed yield at Woodstock in 2010. A mutation in the 3' untranslated region (3'UTR) of GmDGAT2C was associated with seed yield at Woodstock in 2009. A mutation in the intronic region of GmDGAT1B was associated with seed yield and protein concentration at Ottawa in 2010. The genes identified in this study had minor effects on either seed yield or oil concentration, which was in agreement with the quantitative nature of the traits. However, the novel gene-specific markers designed in the present study can be used in soybean breeding for marker-assisted selection aimed at increasing seed yield and oil concentration with no significant impact on seed protein concentration.

  6. Candidate gene identification of ovulation-inducing genes by RNA sequencing with an in vivo assay in zebrafish.

    PubMed

    Klangnurak, Wanlada; Fukuyo, Taketo; Rezanujjaman, M D; Seki, Masahide; Sugano, Sumio; Suzuki, Yutaka; Tokumoto, Toshinobu

    2018-01-01

    We previously reported the microarray-based selection of three ovulation-related genes in zebrafish. In this study we used a different selection method, RNA sequencing analysis. An additional eight candidates were found to be specifically up-regulated in ovulation-induced samples. Changes in gene expression were confirmed by qPCR analysis. Furthermore, up-regulation prior to ovulation during natural spawning was verified in samples from natural pairing. Gene knock-out zebrafish strains of one of the candidates, the starmaker gene (stm), were established by CRISPR genome editing techniques. Unexpectedly, homozygous mutants were fertile and could spawn eggs. However, a high percentage of unfertilized eggs and abnormal embryos were produced from these homozygous females. The results suggest that the stm gene is necessary for fertilization. In this study, we selected additional ovulation-inducing candidate genes, and a novel function of the stm gene was investigated.

  7. RNA-Seq Reveals Dynamic Changes of Gene Expression in Key Stages of Intestine Regeneration in the Sea Cucumber Apostichopus japonicas

    PubMed Central

    Sun, Lina; Yang, Hongsheng; Chen, Muyan; Ma, Deyou; Lin, Chenggang

    2013-01-01

    Background: Sea cucumbers (Holothuroidea; Echinodermata) have the capacity to regenerate lost tissues and organs. Although the histological and cytological aspects of intestine regeneration have been extensively studied, little is known of the genetic mechanisms involved. There has, however, been a renewed effort to develop a database of Expressed Sequence Tags (ESTs) in Apostichopus japonicus, an economically-important species that occurs in China. This is important for studies on genetic breeding, molecular markers and special physiological phenomena. We have also constructed a library of ESTs obtained from the regenerative body wall and intestine of A. japonicus. The database has increased to ∼30000 ESTs. Results: We used RNA-Seq to determine gene expression profiles associated with intestinal regeneration in A. japonicus at 3, 7, 14 and 21 days post evisceration (dpe). This was compared to profiles obtained from a normally-functioning intestine. Approximately 5 million (M) reads were sequenced in every library. Over 2400 up-regulated genes (>10%) and over 1000 down-regulated genes (∼5%) were observed at 3 and 7 dpe (log2Ratio≥1, FDR≤0.001). Specific “GO terms” revealed that the DEGs (Differentially Expressed Genes) performed an important function at every regeneration stage. Besides some expected pathways (for example, Ribosome and Spliceosome pathway term), the “Notch signaling pathway,” the “ECM-receptor interaction” and the “Cytokine-cytokine receptor interaction” were significantly enriched. We also investigated the expression profiles of developmental genes, ECM-associated genes and Cytoskeletal genes. Twenty of the most important differentially expressed genes (DEGs) were verified by Real-time PCR, which resulted in a trend concordance of almost 100% between the two techniques. Conclusion: Our studies demonstrated dynamic changes in global gene expression during intestine regeneration and presented a series of candidate genes and enriched pathways that contribute to intestine regeneration in sea cucumbers. This provides a foundation for future studies on the genetics/molecular mechanisms associated with intestine regeneration. PMID:23936330
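
    The selection thresholds quoted in this abstract (log2Ratio≥1, FDR≤0.001) can be illustrated with a minimal sketch. The gene names, expression values and the function name `call_degs` are hypothetical, and this is not the authors' pipeline. It also assumes that down-regulated genes were called with the mirrored cutoff (log2Ratio≤-1), which the abstract does not state explicitly.

```python
import math

def call_degs(records, min_abs_log2=1.0, max_fdr=0.001):
    """Split genes into up- and down-regulated lists.

    records: iterable of (gene, control_expr, regen_expr, fdr) tuples,
    where the expression values are positive normalized abundances.
    """
    up, down = [], []
    for gene, control, regen, fdr in records:
        if fdr > max_fdr:
            continue  # not significant at the chosen FDR
        log2_ratio = math.log2(regen / control)
        if log2_ratio >= min_abs_log2:
            up.append(gene)
        elif log2_ratio <= -min_abs_log2:  # assumed mirrored cutoff
            down.append(gene)
    return up, down

# Hypothetical values for illustration only.
example = [
    ("geneA", 10.0, 40.0, 0.0001),   # 4-fold up, significant
    ("geneB", 20.0, 5.0, 0.0005),    # 4-fold down, significant
    ("geneC", 10.0, 40.0, 0.05),     # large change but FDR too high
    ("geneD", 10.0, 12.0, 0.0001),   # significant but small change
]
up, down = call_degs(example)
print(up, down)
```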

  8. Potential interaction between the GARS-AIRS-GART Gene and CP2/LBP-1c/LSF transcription factor in Down syndrome-related Alzheimer disease.

    PubMed

    Banerjee, Disha; Nandagopal, Krishnadas

    2007-12-01

    (1) GARS-AIRS-GART is an important candidate gene in studies of Down syndrome (DS)-related Alzheimer's disease (AD), due to its chromosomal localization (21q22.1) in the Down syndrome critical region, involvement in de novo purine biosynthesis, and over-expression in DS brain. The aim of this study was to identify factor(s) likely to enhance transcription of GARS-AIRS-GART in DS-related AD. (2) Based on a bio-informatics approach, the PromoterInspector, Promoter Scan II, and EBI toolbox CpG plot software programs were used to identify GARS-AIRS-GART sequences important for gene transcription. Transcription factor binding motifs within these regions were mapped with the help of the MatInspector and TFSEARCH programs. Factors implicated in neurodevelopment or neurodegeneration were the focus of attention, and mining of human (T1Dbase) and murine (GNF) expression databases revealed information on the regional distribution of these factors and their relative abundance vis-a-vis GARS-AIRS-GART. (3) The Leader-binding protein 1-c (LBP-1c/CP2/LSF) emerged as a promising candidate from these studies, as MatInspector and TFSEARCH analyses revealed a total of four CP2 binding sites with potential for functional interaction(s) within the promoter and CpG islands of GARS-AIRS-GART. Furthermore, two of these sites harbor sequences for methylation-sensitive restriction enzymes, which suggest that methylation status may, in part, regulate CP2-mediated transcription of GARS-AIRS-GART. A search of T1Dbase and GNF expression databases reveals co-expression of CP2 and GARS-AIRS-GART in brain regions relevant to DS-related AD. (4) The virtual screen identified CP2/LBP-1c/LSF as a factor that likely mediates enhanced transcription of GARS-AIRS-GART in DS-related AD.

  9. MitProNet: A Knowledgebase and Analysis Platform of Proteome, Interactome and Diseases for Mammalian Mitochondria

    PubMed Central

    Mao, Song; Chai, Xiaoqiang; Hu, Yuling; Hou, Xugang; Tang, Yiheng; Bi, Cheng; Li, Xiao

    2014-01-01

    The mitochondrion plays a central role in diverse biological processes in most eukaryotes, and its dysfunctions are critically involved in a large number of diseases and the aging process. A systematic identification of mitochondrial proteomes and characterization of functional linkages among mitochondrial proteins are fundamental in understanding the mechanisms underlying biological functions and human diseases associated with mitochondria. Here we present MitProNet, a database that provides a comprehensive knowledgebase for mitochondrial proteome, interactome and human diseases. First an inventory of mammalian mitochondrial proteins was compiled by widely collecting proteomic datasets, and the proteins were classified by machine learning to achieve a high-confidence list of mitochondrial proteins. The current version of MitProNet covers 1124 high-confidence proteins, and the remainder were further classified as middle- or low-confidence. An organelle-specific network of functional linkages among mitochondrial proteins was then generated by integrating genomic features encoded by a wide range of datasets including genomic context, gene expression profiles, protein-protein interactions, functional similarity and metabolic pathways. The functional-linkage network should be a valuable resource for the study of biological functions of mitochondrial proteins and human mitochondrial diseases. Furthermore, we utilized the network to predict candidate genes for mitochondrial diseases using prioritization algorithms. All proteins, functional linkages and disease candidate genes in MitProNet were annotated according to the information collected from their original sources including GO, GEO, OMIM, KEGG, MIPS, HPRD and so on. MitProNet features a user-friendly graphic visualization interface to present functional analysis of linkage networks. As an up-to-date database and analysis platform, MitProNet should be particularly helpful in comprehensive studies of complicated biological mechanisms underlying mitochondrial functions and human mitochondrial diseases. MitProNet is freely accessible at http://bio.scu.edu.cn:8085/MitProNet. PMID:25347823

  10. Selection of Suitable Reference Genes for RT-qPCR Normalization under Abiotic Stresses and Hormone Stimulation in Persimmon (Diospyros kaki Thunb)

    PubMed Central

    Wang, Peihong; Xiong, Aisheng; Gao, Zhihong; Yu, Xinyi; Li, Man; Hou, Yingjun; Sun, Chao; Qu, Shenchun

    2016-01-01

    The success of quantitative real-time reverse transcription polymerase chain reaction (RT-qPCR) to quantify gene expression depends on the stability of the reference genes used for data normalization. To date, systematic screening for reference genes in persimmon (Diospyros kaki Thunb) has never been reported. In this study, 13 candidate reference genes were cloned from 'Nantongxiaofangshi' using information available in the transcriptome database. Their expression stability was assessed by geNorm and NormFinder algorithms under abiotic stress and hormone stimulation. Our results showed that the most suitable reference genes across all samples were UBC and GAPDH, and not the commonly used persimmon reference gene ACT. In addition, UBC combined with RPII or TUA were found to be appropriate for the "abiotic stress" group and α-TUB combined with PP2A were found to be appropriate for the "hormone stimuli" group. For further validation, the transcript level of the DkDREB2C homologue under heat stress was studied with the selected genes (CYP, GAPDH, TUA, UBC, α-TUB, and EF1-α). The results suggested that it is necessary to choose appropriate reference genes according to the test materials or experimental conditions. Our study will be useful for future studies on gene expression in persimmon. PMID:27513755
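
    The geNorm stability ranking mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of the geNorm stability measure M: for each candidate gene, the mean of the standard deviations of its log2 expression ratios against every other candidate, with lower M meaning more stable. The gene labels, quantities and the function name `genorm_m` are hypothetical; the use of the sample standard deviation is also an assumption, and the stepwise exclusion of the least stable gene performed by the actual tool is omitted.

```python
import math
import statistics

def genorm_m(expr):
    """Return {gene: M} where M is the mean pairwise variation.

    expr: dict mapping gene name -> list of relative quantities
    (linear scale, one value per sample, same sample order).
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(statistics.stdev(log_ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values

# Hypothetical relative quantities across four samples.
quantities = {
    "refA": [1.0, 2.0, 4.0, 8.0],
    "refB": [2.0, 4.0, 8.0, 16.0],   # constant ratio to refA
    "refC": [1.0, 8.0, 1.0, 8.0],    # fluctuates relative to both
}
m = genorm_m(quantities)
ranking = sorted(m, key=m.get)  # most stable (lowest M) first
print(ranking)
```

    In this toy data set refA and refB track each other perfectly, so they share the lowest M, while refC is ranked least stable.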

  11. FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context.

    PubMed

    Mader, Malte; Simon, Ronald; Steinbiss, Sascha; Kurtz, Stefan

    2011-07-28

    The rapidly growing amount of array CGH data requires improved visualization software supporting the process of identifying candidate cancer genes. Optimally, such software should work across multiple microarray platforms, should be able to cope with data from different sources and should be easy to operate. We have developed a web-based software FISH Oracle to visualize data from multiple array CGH experiments in a genomic context. Its fast visualization engine and advanced web and database technology support highly interactive use. FISH Oracle comes with a convenient data import mechanism, powerful search options for genomic elements (e.g. gene names or karyobands), quick navigation and zooming into interesting regions, and mechanisms to export the visualization into different high quality formats. These features make the software especially suitable for the needs of life scientists. FISH Oracle offers a fast and easy-to-use visualization tool for array CGH and SNP array data. It allows for the identification of genomic regions representing minimal common changes based on data from one or more experiments. FISH Oracle will be instrumental to identify candidate onco- and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes. The FISH Oracle application and an installed demo web server are available at http://www.zbh.uni-hamburg.de/fishoracle.

  12. FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context

    PubMed Central

    2011-01-01

    Background: The rapidly growing amount of array CGH data requires improved visualization software supporting the process of identifying candidate cancer genes. Optimally, such software should work across multiple microarray platforms, should be able to cope with data from different sources and should be easy to operate. Results: We have developed a web-based software FISH Oracle to visualize data from multiple array CGH experiments in a genomic context. Its fast visualization engine and advanced web and database technology support highly interactive use. FISH Oracle comes with a convenient data import mechanism, powerful search options for genomic elements (e.g. gene names or karyobands), quick navigation and zooming into interesting regions, and mechanisms to export the visualization into different high quality formats. These features make the software especially suitable for the needs of life scientists. Conclusions: FISH Oracle offers a fast and easy-to-use visualization tool for array CGH and SNP array data. It allows for the identification of genomic regions representing minimal common changes based on data from one or more experiments. FISH Oracle will be instrumental to identify candidate onco- and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes. The FISH Oracle application and an installed demo web server are available at http://www.zbh.uni-hamburg.de/fishoracle. PMID:21884636

  13. A tuberculosis biomarker database: the key to novel TB diagnostics.

    PubMed

    Yerlikaya, Seda; Broger, Tobias; MacLean, Emily; Pai, Madhukar; Denkinger, Claudia M

    2017-03-01

    New diagnostic innovations for tuberculosis (TB), including point-of-care solutions, are critical to reach the goals of the End TB Strategy. However, despite decades of research, numerous reports on new biomarker candidates, and significant investment, no well-performing, simple and rapid TB diagnostic test is yet available on the market, and the search for accurate, non-DNA biomarkers remains a priority. To help overcome this 'biomarker pipeline problem', FIND and partners are working on the development of a well-curated and user-friendly TB biomarker database. The web-based database will enable the dynamic tracking of evidence surrounding biomarker candidates in relation to target product profiles (TPPs) for needed TB diagnostics. It will be able to accommodate raw datasets and facilitate the verification of promising biomarker candidates and the identification of novel biomarker combinations. As such, the database will simplify data and knowledge sharing, empower collaboration, help in the coordination of efforts and allocation of resources, streamline the verification and validation of biomarker candidates, and ultimately lead to an accelerated translation into clinically useful tools. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  14. miRnalyze: an interactive database linking tool to unlock intuitive microRNA regulation of cell signaling pathways

    PubMed Central

    Subhra Das, Sankha; James, Mithun; Paul, Sandip

    2017-01-01

    The various pathophysiological processes occurring in living systems are known to be orchestrated by delicate interplays and cross-talks between different genes and their regulators. Among the various regulators of genes, there is a class of small non-coding RNA molecules known as microRNAs. Although the relative simplicity of miRNAs and their ability to modulate cellular processes make them attractive therapeutic candidates, their presence in large numbers makes it challenging for experimental researchers to interpret the intricacies of the molecular processes they regulate. Most of the existing bioinformatic tools fail to address these challenges. Here, we present a new web resource ‘miRnalyze’ that has been specifically designed to directly identify the putative regulation of cell signaling pathways by miRNAs. The tool integrates miRNA-target predictions with signaling cascade members by utilizing the TargetScanHuman 7.1 miRNA-target prediction tool and the KEGG pathway database, and thus provides researchers with in-depth insights into modulation of signal transduction pathways by miRNAs. miRnalyze is capable of identifying common miRNAs targeting more than one gene in the same signaling pathway, a feature that further increases the probability of modulating the pathway and downstream reactions when using miRNA modulators. Additionally, miRnalyze can sort miRNAs according to the seed-match types and TargetScan context++ score, thus providing a hierarchical list of most valuable miRNAs. Furthermore, in order to provide users with comprehensive information regarding miRNAs, genes and pathways, miRnalyze also links to expression data of miRNAs (miRmine) and genes (TiGER) and proteome abundance (PaxDb) data. To validate the capability of the tool, we have documented the correlation of miRnalyze’s prediction with experimental confirmation studies. Database URL: http://www.mirnalyze.in PMID:28365733
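
    The idea of finding miRNAs that target more than one gene in the same signaling pathway can be illustrated with a minimal sketch. The miRNA and gene labels are placeholders, the function name `shared_mirnas` is hypothetical, and nothing here reflects how miRnalyze itself is implemented; the code only shows the underlying intersection of a predicted miRNA-target list with a pathway's gene set.

```python
from collections import defaultdict

def shared_mirnas(targets, pathway_genes, min_genes=2):
    """Map miRNA -> sorted pathway genes it is predicted to target,
    keeping only miRNAs that hit at least `min_genes` pathway members.

    targets: iterable of (mirna, gene) predicted pairs.
    pathway_genes: set of gene symbols in one signaling pathway.
    """
    hits = defaultdict(set)
    for mirna, gene in targets:
        if gene in pathway_genes:
            hits[mirna].add(gene)
    return {m: sorted(g) for m, g in hits.items() if len(g) >= min_genes}

# Placeholder predictions and pathway membership (illustration only).
predictions = [
    ("miR-x", "GENE1"),
    ("miR-x", "GENE2"),
    ("miR-y", "GENE2"),
    ("miR-y", "GENE9"),   # GENE9 is outside the pathway
    ("miR-z", "GENE3"),
]
pathway = {"GENE1", "GENE2", "GENE3"}
shared = shared_mirnas(predictions, pathway)
print(shared)
```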

  15. Next-generation sequencing to identify candidate genes and develop diagnostic markers for a novel Phytophthora resistance gene, RpsHC18, in soybean.

    PubMed

    Zhong, Chao; Sun, Suli; Li, Yinping; Duan, Canxing; Zhu, Zhendong

    2018-03-01

    A novel Phytophthora sojae resistance gene RpsHC18 was identified and finely mapped on soybean chromosome 3. Two NBS-LRR candidate genes were identified and two diagnostic markers of RpsHC18 were developed. Phytophthora root rot caused by Phytophthora sojae is a destructive disease of soybean. The most effective disease-control strategy is to deploy resistant cultivars carrying Phytophthora-resistant Rps genes. The soybean cultivar Huachun 18 has a broad and distinct resistance spectrum to 12 P. sojae isolates. Quantitative trait loci sequencing (QTL-seq), based on the whole-genome resequencing (WGRS) of two extreme resistant and susceptible phenotype bulks from an F2:3 population, was performed, and one 767-kb genomic region with ΔSNP-index ≥ 0.9 on chromosome 3 was identified as the RpsHC18 candidate region in Huachun 18. The candidate region was reduced to a 146-kb region by fine mapping. Nonsynonymous SNP and haplotype analyses were carried out in the 146-kb region among ten soybean genotypes using WGRS. Four specific nonsynonymous SNPs were identified in two nucleotide-binding sites-leucine-rich repeat (NBS-LRR) genes, RpsHC18-NBL1 and RpsHC18-NBL2, which were considered to be the candidate genes. Finally, one specific SNP marker in each candidate gene was successfully developed using a tetra-primer ARMS-PCR assay, and the two markers were verified to be specific for RpsHC18 and to effectively distinguish other known Rps genes. In this study, we applied an integrated genomic-based strategy combining WGRS with traditional genetic mapping to identify RpsHC18 candidate genes and develop diagnostic markers. These results suggest that next-generation sequencing is a precise, rapid and cost-effective way to identify candidate genes and develop diagnostic markers, and it can accelerate Rps gene cloning and marker-assisted selection for breeding of P. sojae-resistant soybean cultivars.
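
    The ΔSNP-index cutoff quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual QTL-seq definition: the SNP-index of a bulk is the fraction of reads carrying the non-reference allele at a site, and the ΔSNP-index is the difference between the two bulks. The read counts, the direction of the subtraction (resistant minus susceptible) and the function names are assumptions for illustration; the sliding-window averaging normally applied in QTL-seq is omitted.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads carrying the non-reference allele."""
    return alt_reads / total_reads

def delta_snp_index(sites, threshold=0.9):
    """Return (position, delta) for sites whose delta meets the threshold.

    sites: list of (position, alt_R, depth_R, alt_S, depth_S), where R and S
    are the resistant and susceptible bulks.
    """
    hits = []
    for pos, alt_r, depth_r, alt_s, depth_s in sites:
        # Assumed direction: resistant bulk minus susceptible bulk.
        delta = snp_index(alt_r, depth_r) - snp_index(alt_s, depth_s)
        if delta >= threshold:
            hits.append((pos, delta))
    return hits

# Hypothetical read counts for illustration only.
sites = [
    (1000, 30, 30, 1, 25),    # delta = 1.00 - 0.04 = 0.96
    (2000, 15, 30, 14, 28),   # delta = 0.50 - 0.50 = 0.00
    (3000, 38, 40, 0, 32),    # delta = 0.95 - 0.00 = 0.95
]
hits = delta_snp_index(sites)
positions = [pos for pos, _ in hits]
print(positions)
```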

  16. Dissecting the organ specificity of insecticide resistance candidate genes in Anopheles gambiae: known and novel candidate genes.

    PubMed

    Ingham, Victoria A; Jones, Christopher M; Pignatelli, Patricia; Balabanidou, Vasileia; Vontas, John; Wagstaff, Simon C; Moore, Jonathan D; Ranson, Hilary

    2014-11-25

    The elevated expression of enzymes with insecticide metabolism activity can lead to high levels of insecticide resistance in the malaria vector, Anopheles gambiae. In this study, adult female mosquitoes from an insecticide susceptible and resistant strain were dissected into four different body parts. RNA from each of these samples was used in microarray analysis to determine the enrichment patterns of the key detoxification gene families within the mosquito and to identify additional candidate insecticide resistance genes that may have been overlooked in previous experiments on whole organisms. A general enrichment in the transcription of genes from the four major detoxification gene families (carboxylesterases, glutathione transferases, UDP glucuronosyltransferases and cytochrome P450s) was observed in the midgut and malpighian tubules. Yet the subset of P450 genes that have previously been implicated in insecticide resistance in An. gambiae shows a surprisingly varied profile of tissue enrichment, confirmed by qPCR and, for three candidates, by immunostaining. A stringent selection process was used to define a list of 105 genes that are significantly (p ≤ 0.001) over expressed in body parts from the resistant versus susceptible strain. Over half of these, including all the cytochrome P450s on this list, were identified in previous whole organism comparisons between the strains, but several new candidates were detected, notably from comparisons of the transcriptomes from dissected abdomen integuments. The use of RNA extracted from the whole organism to identify candidate insecticide resistance genes has a risk of missing candidates if key genes responsible for the phenotype have restricted expression within the body and/or are over expressed only in certain tissues. However, as transcription of genes implicated in metabolic resistance to insecticides is not enriched in any one single organ, comparison of the transcriptome of individual dissected body parts cannot be recommended as a preferred means to identify new candidate insecticide resistance genes. Instead the rich data set on in vivo sites of transcription should be consulted when designing follow up qPCR validation steps, or for screening known candidates in field populations.

  17. Meta-review of protein network regulating obesity between validated obesity candidate genes in the white adipose tissue of high-fat diet-induced obese C57BL/6J mice.

    PubMed

    Kim, Eunjung; Kim, Eun Jung; Seo, Seung-Won; Hur, Cheol-Goo; McGregor, Robin A; Choi, Myung-Sook

    2014-01-01

    Worldwide obesity and related comorbidities are increasing, but identifying new therapeutic targets remains a challenge. A plethora of microarray studies in diet-induced obesity models has provided large datasets of obesity associated genes. In this review, we describe an approach to examine the underlying molecular network regulating obesity, and we discuss interactions between obesity candidate genes. We conducted network analysis on functional protein-protein interactions associated with 25 obesity candidate genes identified in a literature-driven approach based on published microarray studies of diet-induced obesity. The obesity candidate genes were closely associated with lipid metabolism and inflammation. Peroxisome proliferator activated receptor gamma (Pparg) appeared to be a core obesity gene, and obesity candidate genes were highly interconnected, suggesting a coordinately regulated molecular network in adipose tissue. In conclusion, the current network analysis approach may help elucidate the underlying molecular network regulating obesity and identify anti-obesity targets for therapeutic intervention.

  18. Whole exome sequencing identifies novel candidate genes that modify chronic obstructive pulmonary disease susceptibility.

    PubMed

    Bruse, Shannon; Moreau, Michael; Bromberg, Yana; Jang, Jun-Ho; Wang, Nan; Ha, Hongseok; Picchi, Maria; Lin, Yong; Langley, Raymond J; Qualls, Clifford; Klensney-Tait, Julia; Zabner, Joseph; Leng, Shuguang; Mao, Jenny; Belinsky, Steven A; Xing, Jinchuan; Nyunoya, Toru

    2016-01-07

    Chronic obstructive pulmonary disease (COPD) is characterized by an irreversible airflow limitation in response to inhalation of noxious stimuli, such as cigarette smoke. However, only 15-20% of smokers manifest COPD, suggesting a role for genetic predisposition. Although genome-wide association studies have identified common genetic variants that are associated with susceptibility to COPD, effect sizes of the identified variants are modest, as is the total heritability accounted for by these variants. In this study, an extreme phenotype exome sequencing study was combined with in vitro modeling to identify COPD candidate genes. We performed whole exome sequencing of 62 highly susceptible smokers and 30 exceptionally resistant smokers to identify rare variants that may contribute to disease risk or resistance to COPD. This was a cross-sectional case-control study without therapeutic intervention or longitudinal follow-up information. We identified candidate genes based on rare variant analyses and evaluated exonic variants to pinpoint individual genes whose function was computationally established to be significantly different between susceptible and resistant smokers. Top scoring candidate genes from these analyses were further filtered by requiring that each gene be expressed in human bronchial epithelial cells (HBECs). A total of 81 candidate genes were thus selected for in vitro functional testing in cigarette smoke extract (CSE)-exposed HBECs. Using small interfering RNA (siRNA)-mediated gene silencing experiments, we showed that silencing of several candidate genes augmented CSE-induced cytotoxicity in vitro. Our integrative analysis through both genetic and functional approaches identified two candidate genes (TACC2 and MYO1E) that augment cigarette smoke (CS)-induced cytotoxicity and, potentially, COPD susceptibility.

  19. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database

    PubMed Central

    Davis, Allan Peter; Wiegers, Thomas C.; King, Benjamin L.; Wiegers, Jolene; Grondin, Cynthia J.; Sciaky, Daniela; Johnson, Robin J.; Mattingly, Carolyn J.

    2016-01-01

    Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD’s gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. 
This inference set should aid researchers, bioinformaticists, and pharmaceutical drug makers in finding commonalities in disease mechanisms, which in turn could help identify new therapeutics, new indications for existing pharmaceuticals, potential disease comorbidities, and alerts for side effects. PMID:27171405
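
    The integration step described above (combining GO-gene annotations with gene-disease associations through shared gene symbols) can be pictured as a join on the gene column. The following is a minimal sketch under stated assumptions, not CTD's actual pipeline: the function name, the tuple layout and the toy identifiers (GO:A, GENE1, "disease X") are all hypothetical.

```python
from collections import defaultdict


def infer_go_disease(go_gene, gene_disease):
    """Join (go_term, gene) pairs with (gene, disease) pairs on the gene symbol.

    Returns a dict mapping (go_term, disease) to the sorted list of genes
    that support the inferred relationship.
    """
    # Index diseases by gene so each annotation is looked up once.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferences = defaultdict(set)
    for go_term, gene in go_gene:
        for disease in diseases_by_gene.get(gene, ()):
            inferences[(go_term, disease)].add(gene)

    return {pair: sorted(genes) for pair, genes in inferences.items()}


# Toy, made-up data for illustration only (not real CTD or GO records).
GO_GENE = [
    ("GO:A", "GENE1"),
    ("GO:A", "GENE2"),
    ("GO:B", "GENE2"),
    ("GO:C", "GENE3"),  # GENE3 has no disease link, so GO:C yields nothing
]
GENE_DISEASE = [
    ("GENE1", "disease X"),
    ("GENE2", "disease X"),
    ("GENE2", "disease Y"),
]

INFERENCES = infer_go_disease(GO_GENE, GENE_DISEASE)
```

    Keeping the supporting genes with each inferred pair makes it possible to rank inferences by how many genes back them, which is one plausible way to compare diseases by shared GO terms.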

  20. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    PubMed

    Davis, Allan Peter; Wiegers, Thomas C; King, Benjamin L; Wiegers, Jolene; Grondin, Cynthia J; Sciaky, Daniela; Johnson, Robin J; Mattingly, Carolyn J

    2016-01-01

    Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. 
This inference set should aid researchers, bioinformaticists, and pharmaceutical drug makers in finding commonalities in disease mechanisms, which in turn could help identify new therapeutics, new indications for existing pharmaceuticals, potential disease comorbidities, and alerts for side effects.

  1. Genome-wide analyses of late pollen-preferred genes conserved in various rice cultivars and functional identification of a gene involved in the key processes of late pollen development.

    PubMed

    Moon, Sunok; Oo, Moe Moe; Kim, Backki; Koh, Hee-Jong; Oh, Sung Aeong; Yi, Gihwan; An, Gynheung; Park, Soon Ki; Jung, Ki-Hong

    2018-04-23

    Understanding late pollen development, including the maturation and pollination process, is a key component in maintaining crop yields. Transcriptome data obtained through microarray or RNA-seq technologies can provide useful insight into those developmental processes. Six series of microarray data from a public transcriptome database, the Gene Expression Omnibus of the National Center for Biotechnology Information, are related to anther and pollen development. We performed a systematic and functional study across the rice genome of genes that are preferentially expressed in the late stages of pollen development, including maturation and germination. By comparing the transcriptomes of sporophytes and male gametes over time, we identified 627 late pollen-preferred genes that are conserved among japonica and indica rice cultivars. Functional classification analysis with a MapMan tool kit revealed a significant association between cell wall organization/metabolism and mature pollen grains. Comparative analysis of rice and Arabidopsis demonstrated that genes involved in cell wall modifications and the metabolism of major carbohydrates are unique to rice. We used the GUS reporter system to monitor the expression of eight of those genes. In addition, we evaluated the significance of our candidate genes, using T-DNA insertional mutant population and the CRISPR/Cas9 system. Mutants from T-DNA insertion and CRISPR/Cas9 systems of a rice gene encoding glycerophosphoryl diester phosphodiesterase are defective in their male gamete transfer. Through the global analyses of the late pollen-preferred genes from rice, we found several biological features of these genes. First, biological process related to cell wall organization and modification is over-represented in these genes to support rapid tube growth. 
Second, comparative analysis of late pollen-preferred genes between rice and Arabidopsis provides significant insight into the evolutionary divergence in cell wall biogenesis and storage reserves of pollen. In addition, these candidates might be useful targets for future examinations of late pollen development, and will be a valuable resource for accelerating the understanding of molecular mechanisms for pollen maturation and germination processes in rice.

  2. Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

    PubMed Central

    Xu, Shengzhi; Cheng, Xiang; Li, Zhengyi; Xiong, Li

    2016-01-01

    In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm which can achieve both high data utility and a high degree of privacy. We found, in differentially private FSM, the amount of required noise is proportionate to the number of candidate sequences. If we could effectively reduce the number of unpromising candidate sequences, the utility and privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, which is referred to as PFS2. The core of our algorithm is to utilize sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of such private estimations, a sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy. PMID:26973430
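
    The observation that the required noise grows with the number of candidate sequences can be illustrated with the standard Laplace mechanism. The sketch below is not the PFS2 algorithm; it assumes that the privacy budget epsilon is split evenly across the candidates and that each support count has sensitivity 1. The function names are hypothetical.

```python
import random


def laplace_scale(num_candidates, epsilon, sensitivity=1.0):
    """Noise scale per candidate when epsilon is split evenly (assumption).

    Each candidate gets epsilon / num_candidates, so the Laplace scale is
    sensitivity / (epsilon / num_candidates).
    """
    if num_candidates <= 0 or epsilon <= 0:
        raise ValueError("num_candidates and epsilon must be positive")
    return num_candidates * sensitivity / epsilon


def sample_laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_supports(supports, epsilon, rng):
    """Add Laplace noise to every candidate's support count."""
    scale = laplace_scale(len(supports), epsilon)
    return {seq: count + sample_laplace(scale, rng)
            for seq, count in supports.items()}


# Pruning 100 candidates down to 10 shrinks the noise scale tenfold.
SCALE_BEFORE = laplace_scale(100, epsilon=1.0)
SCALE_AFTER = laplace_scale(10, epsilon=1.0)
NOISY = noisy_supports({"ab": 5, "bc": 3}, epsilon=1.0, rng=random.Random(0))
```

    Under these assumptions the noise scale is linear in the number of candidates, which is why discarding unpromising candidates before adding noise improves the utility obtained for a fixed epsilon.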

  3. Frequent silencing of the candidate tumor suppressor TRIM58 by promoter methylation in early-stage lung adenocarcinoma

    PubMed Central

    Naruto, Takuya; Kohmoto, Tomohiro; Watabnabe, Miki; Tsuboi, Mitsuhiro; Takizawa, Hiromitsu; Kondo, Kazuya; Tangoku, Akira; Imoto, Issei

    2017-01-01

    In this study, we aimed to identify novel drivers that would be epigenetically altered through aberrant methylation in early-stage lung adenocarcinoma (LADC), regardless of the presence or absence of tobacco smoking-induced epigenetic field defects. Through genome-wide screening for aberrantly methylated CpG islands (CGIs) in 12 clinically uniform, stage-I LADC cases affecting six non-smokers and six smokers, we identified candidate tumor-suppressor genes (TSGs) inactivated by hypermethylation. Through systematic expression analyses of those candidates in panels of additional tumor samples and cell lines treated or not treated with 5-aza-deoxycytidine, followed by validation analyses of cancer-specific silencing by CGI hypermethylation using a public database, we identified TRIM58 as the most prominent TSG candidate. TRIM58 was robustly silenced by hypermethylation even in early-stage primary LADC, and the restoration of TRIM58 expression in LADC cell lines inhibited cell growth in vitro and in vivo in anchorage-dependent and -independent manners. Our findings suggest that aberrant inactivation of TRIM58 consequent to CGI hypermethylation might stimulate the early carcinogenesis of LADC regardless of smoking status; furthermore, TRIM58 methylation might be a possible early diagnostic and epigenetic therapeutic target in LADC. PMID:27926516

  4. Frequent silencing of the candidate tumor suppressor TRIM58 by promoter methylation in early-stage lung adenocarcinoma.

    PubMed

    Kajiura, Koichiro; Masuda, Kiyoshi; Naruto, Takuya; Kohmoto, Tomohiro; Watabnabe, Miki; Tsuboi, Mitsuhiro; Takizawa, Hiromitsu; Kondo, Kazuya; Tangoku, Akira; Imoto, Issei

    2017-01-10

    In this study, we aimed to identify novel drivers that would be epigenetically altered through aberrant methylation in early-stage lung adenocarcinoma (LADC), regardless of the presence or absence of tobacco smoking-induced epigenetic field defects. Through genome-wide screening for aberrantly methylated CpG islands (CGIs) in 12 clinically uniform, stage-I LADC cases affecting six non-smokers and six smokers, we identified candidate tumor-suppressor genes (TSGs) inactivated by hypermethylation. Through systematic expression analyses of those candidates in panels of additional tumor samples and cell lines treated or not treated with 5-aza-deoxycytidine, followed by validation analyses of cancer-specific silencing by CGI hypermethylation using a public database, we identified TRIM58 as the most prominent TSG candidate. TRIM58 was robustly silenced by hypermethylation even in early-stage primary LADC, and the restoration of TRIM58 expression in LADC cell lines inhibited cell growth in vitro and in vivo in anchorage-dependent and -independent manners. Our findings suggest that aberrant inactivation of TRIM58 consequent to CGI hypermethylation might stimulate the early carcinogenesis of LADC regardless of smoking status; furthermore, TRIM58 methylation might be a possible early diagnostic and epigenetic therapeutic target in LADC.

  5. Evaluating Reported Candidate Gene Associations with Polycystic Ovary Syndrome

    PubMed Central

    Pau, Cindy; Saxena, Richa; Welt, Corrine Kolka

    2013-01-01

    Objective: To replicate variants in candidate genes associated with PCOS in a population of European PCOS and control subjects. Design: Case-control association analysis and meta-analysis. Setting: Major academic hospital. Patients: Women of European ancestry with PCOS (n=525) and controls (n=472), aged 18 to 45 years. Intervention: Variants previously associated with PCOS in candidate gene studies were genotyped (n=39). Metabolic, reproductive and anthropometric parameters were examined as a function of the candidate variants. All genetic association analyses were adjusted for age, BMI and ancestry and were reported after correction for multiple testing. Main Outcome Measure: Association of candidate gene variants with PCOS. Results: Three variants, rs3797179 (SRD5A1), rs12473543 (POMC), and rs1501299 (ADIPOQ), were nominally associated with PCOS. However, they did not remain significant after correction for multiple testing and none of the variants replicated in a sufficiently powered meta-analysis. Variants in the FBN3 gene (rs17202517 and rs73503752) were associated with smaller waist circumferences and variant rs727428 in the SHBG gene was associated with lower SHBG levels. Conclusion: Previously identified variants in candidate genes do not appear to be associated with PCOS risk. PMID:23375202

  6. Reranking candidate gene models with cross-species comparison for improved gene prediction

    PubMed Central

    Liu, Qian; Crammer, Koby; Pereira, Fernando CN; Roos, David S

    2008-01-01

    Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models. PMID:18854050

  7. De novo transcriptome assembly databases for the butterfly orchid Phalaenopsis equestris

    PubMed Central

    Niu, Shan-Ce; Xu, Qing; Zhang, Guo-Qiang; Zhang, Yong-Qiang; Tsai, Wen-Chieh; Hsu, Jui-Ling; Liang, Chieh-Kai; Luo, Yi-Bo; Liu, Zhong-Jian

    2016-01-01

    Orchids are renowned for their spectacular flowers and ecological adaptations. After the sequencing of the genome of the tropical epiphytic orchid Phalaenopsis equestris, we combined Illumina HiSeq2000 for RNA-Seq and Trinity for de novo assembly to characterize the transcriptomes for 11 diverse P. equestris tissues representing the root, stem, leaf, flower buds, column, lip, petal, sepal and three developmental stages of seeds. Our aims were to contribute to a better understanding of the molecular mechanisms driving the analysed tissue characteristics and to enrich the available data for P. equestris. Here, we present three databases. The first dataset is the RNA-Seq raw reads, which can be used to execute new experiments with different analysis approaches. The other two datasets allow different types of searches for candidate homologues. The second dataset includes the sets of assembled unigenes and predicted coding sequences and proteins, enabling a sequence-based search. The third dataset consists of the annotation results of the aligned unigenes versus the Nonredundant (Nr) protein database, Kyoto Encyclopaedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups (COG) databases with low e-values, enabling a name-based search. PMID:27673730

  8. The Comprehensive Phytopathogen Genomics Resource: a web-based resource for data-mining plant pathogen genomes.

    PubMed

    Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N; Perna, Nicole T; Tisserat, Ned; Leach, Jan E; Lévesque, C André; Buell, C Robin

    2011-01-01

    The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and transcriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the A Systematic Annotation Package (ASAP) database, which provides curation of genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes from publicly available single pass Expressed Sequence Tags. Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use for genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome data sets and contains 131 755 rDNA sequences from GenBank for 17 613 species identified as plant pathogens and related genera. Database URL: http://cpgr.plantbiology.msu.edu.

  9. Candidate gene prioritization by network analysis of differential expression using machine learning approaches

    PubMed Central

    2010-01-01

    Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data on differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results: We have proposed three strategies for scoring disease candidate genes that rely on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%. 
Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype. PMID:20840752
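
    The simplest of the strategies mentioned above, a local measure based on the expression of a candidate's direct neighbors, can be sketched as follows. This is an illustrative assumption rather than the authors' formula: the candidate is scored by the mean absolute differential expression of its neighbors, and the gene names and values are invented. The last function shows one reading of "error reduction" (the relative drop in 1 - AUC) that reproduces the reported 52.8% from the two AUC values.

```python
def neighbor_scores(network, diff_expr):
    """Score each gene by the mean |differential expression| of its neighbors.

    network   -- dict: gene -> list of directly connected genes
    diff_expr -- dict: gene -> differential expression (e.g. log fold change)
    """
    scores = {}
    for gene, neighbors in network.items():
        values = [abs(diff_expr[n]) for n in neighbors if n in diff_expr]
        scores[gene] = sum(values) / len(values) if values else 0.0
    return scores


def rank_candidates(candidates, scores):
    """Order candidates from most to least promising."""
    return sorted(candidates, key=lambda g: scores.get(g, 0.0), reverse=True)


def relative_error_reduction(auc_baseline, auc_method):
    """Relative reduction in error, taking error as 1 - AUC (assumption)."""
    baseline_error = 1.0 - auc_baseline
    return (baseline_error - (1.0 - auc_method)) / baseline_error


# Toy network and expression values, invented for illustration.
NETWORK = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
DIFF_EXPR = {"A": 0.1, "B": 2.0, "C": -3.0, "D": 0.2}

SCORES = neighbor_scores(NETWORK, DIFF_EXPR)
RANKING = rank_candidates(["A", "B", "C"], SCORES)
REDUCTION = relative_error_reduction(0.837, 0.923)
```

    Gene A ranks first here even though its own differential expression is small, because its neighbors B and C are strongly differentially expressed. That is the intuition behind scoring a candidate by its network neighborhood rather than by its own expression alone.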

  10. An inventory of continental U.S. terrestrial candidate ecological restoration areas based on landscape context.

    PubMed

    Wickham, James; Riitters, Kurt; Vogt, Peter; Costanza, Jennifer; Neale, Anne

    2017-11-01

    Landscape context is an important factor in restoration ecology, but the use of landscape context for site prioritization has not been as fully developed. We used morphological image processing to identify candidate ecological restoration areas based on their proximity to existing natural vegetation. We identified 1,102,720 candidate ecological restoration areas across the continental United States. Candidate ecological restoration areas were concentrated in the Great Plains and eastern United States. We populated the database of candidate ecological restoration areas with 17 attributes related to site content and context, including factors such as soil fertility and roads (site content), and number and area of potentially conjoined vegetated regions (site context) to facilitate its use for site prioritization. We demonstrate the utility of the database in the state of North Carolina, U.S.A. for a restoration objective related to restoration of water quality (mandated by the U.S. Clean Water Act), wetlands, and forest. The database will be made publicly available on the U.S. Environmental Protection Agency's EnviroAtlas website (http://enviroatlas.epa.gov) for stakeholders interested in ecological restoration.
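
    The abstract does not say which morphological operations were used. As one hedged illustration of selecting areas by proximity to existing natural vegetation, the sketch below applies a binary dilation to a vegetation mask and keeps the non-vegetated cells that the dilation reaches. The grid, the radius and the function names are assumptions for illustration only.

```python
def dilate(mask, radius=1):
    """Binary dilation with a square (2*radius + 1) structuring element."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # Mark every cell in the square window around a True cell.
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = True
    return out


def candidate_cells(vegetation, radius=1):
    """Non-vegetated cells lying within `radius` cells of vegetation."""
    grown = dilate(vegetation, radius)
    return [[grown[r][c] and not vegetation[r][c]
             for c in range(len(vegetation[0]))]
            for r in range(len(vegetation))]


# Toy 3 x 4 land-cover grid: True marks existing natural vegetation.
VEGETATION = [
    [True, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
]

CANDIDATES = candidate_cells(VEGETATION, radius=1)
CANDIDATE_COUNT = sum(cell for row in CANDIDATES for cell in row)
```

    In this toy grid the three cells bordering the single vegetated cell are flagged as candidates. Site attributes such as soil fertility or nearby roads could then be attached to each flagged cell for prioritization.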

  11. An inventory of continental U.S. terrestrial candidate ecological restoration areas based on landscape context

    PubMed Central

    Wickham, James; Riitters, Kurt; Vogt, Peter; Costanza, Jennifer; Neale, Anne

    2018-01-01

    Landscape context is an important factor in restoration ecology, but the use of landscape context for site prioritization has not been as fully developed. We used morphological image processing to identify candidate ecological restoration areas based on their proximity to existing natural vegetation. We identified 1,102,720 candidate ecological restoration areas across the continental United States. Candidate ecological restoration areas were concentrated in the Great Plains and eastern United States. We populated the database of candidate ecological restoration areas with 17 attributes related to site content and context, including factors such as soil fertility and roads (site content), and number and area of potentially conjoined vegetated regions (site context) to facilitate its use for site prioritization. We demonstrate the utility of the database in the state of North Carolina, U.S.A. for a restoration objective related to restoration of water quality (mandated by the U.S. Clean Water Act), wetlands, and forest. The database will be made publicly available on the U.S. Environmental Protection Agency's EnviroAtlas website (http://enviroatlas.epa.gov) for stakeholders interested in ecological restoration. PMID:29683130

  12. [Stability analysis of reference gene based on real-time PCR in Artemisia annua under cadmium treatment].

    PubMed

    Zhou, Liang-Yun; Mo, Ge; Wang, Sheng; Tang, Jin-Fu; Yue, Hong; Huang, Lu-Qi; Shao, Ai-Juan; Guo, Lan-Ping

    2014-03-01

    In this study, Actin, 18S rRNA, PAL, GAPDH and CPR of Artemisia annua were selected as candidate reference genes, and gene-specific primers for real-time PCR were designed. geNorm, NormFinder, BestKeeper, Delta CT and RefFinder were then used to evaluate their expression stability in the leaves of A. annua treated with different concentrations of Cd, with the aim of finding a reliable reference gene for gene-expression analysis. The results showed significant differences among the candidate reference genes under different treatments, and the order of expression stability was Actin > 18S rRNA > PAL > GAPDH > CPR. These results suggested that Actin, 18S rRNA and PAL could be used as ideal reference genes for gene expression analysis in A. annua, and multiple internal control genes were adopted for calibration of the results. In addition, differences in the expression stability of candidate reference genes in the leaves of A. annua under the same concentrations of Cd were observed, which suggested that screening of candidate reference genes was needed even under the same treatment. To the best of our knowledge, this study is the first to provide ideal reference genes under Cd treatment in the leaves of A. annua, and it offers a reference for gene expression analysis of A. annua under other conditions.

  13. Identification of aberrantly expressed long non-coding RNAs in stomach adenocarcinoma.

    PubMed

    Gu, Jianbin; Li, Yong; Fan, Liqiao; Zhao, Qun; Tan, Bibo; Hua, Kelei; Wu, Guobin

    2017-07-25

    Stomach adenocarcinoma (STAD) is a common malignancy worldwide. This study aimed to identify the aberrantly expressed long non-coding RNAs (lncRNAs) in STAD. A total of 74 DElncRNAs and 449 DEmRNAs were identified in STAD compared with paired non-tumor tissues. The DElncRNA/DEmRNA co-expression network was constructed, which covered 519 nodes and 2993 edges. The qRT-PCR validation results of DElncRNAs were consistent with our bioinformatics analysis based on RNA-sequencing. The DEmRNAs co-expressed with DElncRNAs were significantly enriched in gastric acid secretion, complement and coagulation cascades, pancreatic secretion, cytokine-cytokine receptor interaction and Jak-STAT signaling pathway. The expression levels of the nine candidate DElncRNAs in the TCGA database were compatible with our RNA-sequencing. FEZF1-AS1, HOTAIR and LINC01234 had potential diagnostic value for STAD. The lncRNA and mRNA expression profile of 3 STAD tissues and 3 matched adjacent non-tumor tissues was obtained through high-throughput RNA-sequencing. Differentially expressed lncRNAs/mRNAs (DElncRNAs/DEmRNAs) were identified in STAD. DElncRNA/DEmRNA co-expression network construction, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted to predict the biological functions of DElncRNAs. Quantitative real-time polymerase chain reaction (qRT-PCR) was used to validate the expression levels of DEmRNAs and DElncRNAs. Moreover, the expression of DElncRNAs was validated through The Cancer Genome Atlas (TCGA) database. The diagnostic value of candidate DElncRNAs was assessed by receiver operating characteristic (ROC) analysis. Our work might provide useful information for exploring the tumorigenesis mechanism of STAD and pave the way for identification of diagnostic biomarkers in STAD.

  14. Prediction of Human Disease Genes by Human-Mouse Conserved Coexpression Analysis

    PubMed Central

    Grassi, Elena; Damasco, Christian; Silengo, Lorenzo; Oti, Martin; Provero, Paolo; Di Cunto, Ferdinando

    2008-01-01

    Background: Even in the post-genomic era, the identification of candidate genes within loci associated with human genetic diseases is a very demanding task, because the critical region may typically contain hundreds of positional candidates. Since genes implicated in similar phenotypes tend to share very similar expression profiles, high throughput gene expression data may represent a very important resource to identify the best candidates for sequencing. However, so far, gene coexpression has not been used very successfully to prioritize positional candidates. Methodology/Principal Findings: We show that it is possible to reliably identify disease-relevant relationships among genes from massive microarray datasets by concentrating only on genes sharing similar expression profiles in both human and mouse. Moreover, we show systematically that the integration of human-mouse conserved coexpression with a phenotype similarity map allows the efficient identification of disease genes in large genomic regions. Finally, using this approach on 850 OMIM loci characterized by an unknown molecular basis, we propose high-probability candidates for 81 genetic diseases. Conclusion: Our results demonstrate that conserved coexpression, even at the human-mouse phylogenetic distance, represents a very strong criterion to predict disease-relevant relationships among human genes. PMID:18369433

  15. High-resolution phylogenetic microbial community profiling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singer, Esther; Coleman-Derr, Devin; Bowman, Brett

    2014-03-17

    The representation of bacterial and archaeal genome sequences is strongly biased towards cultivated organisms, which belong to merely four phylogenetic groups. Functional information and inter-phylum level relationships are still largely underexplored for candidate phyla, which are often referred to as microbial dark matter. Furthermore, a large portion of the 16S rRNA gene records in the GenBank database are labeled as environmental samples and unclassified, which is in part due to low read accuracy, potential chimeric sequences produced during PCR amplifications and the low resolution of short amplicons. In order to improve the phylogenetic classification of novel species and advance our knowledge of the ecosystem function of uncultivated microorganisms, high-throughput full length 16S rRNA gene sequencing methodologies with reduced biases are needed. We evaluated the performance of PacBio single-molecule real-time (SMRT) sequencing in high-resolution phylogenetic microbial community profiling. For this purpose, we compared PacBio and Illumina metagenomic shotgun and 16S rRNA gene sequencing of a mock community as well as of an environmental sample from Sakinaw Lake, British Columbia. Sakinaw Lake is known to contain a large age of microbial species from candidate phyla. Sequencing results show that community structure based on PacBio shotgun and 16S rRNA gene sequences is highly similar in both the mock and the environmental communities. Resolution power and community representation accuracy from SMRT sequencing data appeared to be independent of GC content of microbial genomes and was higher when compared to Illumina-based metagenome shotgun and 16S rRNA gene (iTag) sequences, e.g. full-length sequencing resolved all 23 OTUs in the mock community, while iTags did not resolve closely related species. SMRT sequencing hence offers various potential benefits when characterizing uncharted microbial communities.

  16. Genome wide association mapping for grain shape traits in indica rice.

    PubMed

    Feng, Yue; Lu, Qing; Zhai, Rongrong; Zhang, Mengchen; Xu, Qun; Yang, Yaolong; Wang, Shan; Yuan, Xiaoping; Yu, Hanyong; Wang, Yiping; Wei, Xinghua

    2016-10-01

    Using genome-wide association mapping, 47 SNPs within 27 significant loci were identified for four grain shape traits, and 424 candidate genes were predicted from public databases. Grain shape is a key determinant of grain yield and quality in rice (Oryza sativa L.). However, our knowledge of genes controlling rice grain shape remains limited. Genome-wide association mapping based on linkage disequilibrium (LD) has recently emerged as an effective approach for identifying genes or quantitative trait loci (QTL) underlying complex traits in plants. In this study, association mapping based on 5291 single nucleotide polymorphisms (SNPs) was conducted to identify significant loci associated with grain shape traits in a global collection of 469 diverse rice accessions. A total of 47 SNPs were located in 27 significant loci for four grain traits, and explained ~44.93-65.90% of the phenotypic variation for each trait. In total, 424 candidate genes within a 200 kb extension region (±100 kb of each locus) of these loci were predicted. Of them, the cloned genes GS3 and qSW5 showed very strong effects on grain length and grain width in our study. Compared with previously reported QTLs for grain shape traits, we found 11 novel loci, including 3, 3, 2 and 3 loci for grain length, grain width, grain length-width ratio and thousand grain weight, respectively. Validation of these new loci will be performed in future studies. These results revealed that besides GS3 and qSW5, multiple novel loci and mechanisms were involved in determining rice grain shape. These findings provided valuable information for understanding the genetic control of grain shape and for molecular marker-assisted selection (MAS) breeding in rice.
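
    The candidate-gene step in this record (genes within ±100 kb of each locus) amounts to an interval-overlap query. The sketch below illustrates that idea under stated assumptions: each locus is treated as a start/end interval, and the gene names and coordinates are hypothetical, not taken from the study.

```python
from typing import NamedTuple

WINDOW_BP = 100_000  # ±100 kb around each locus, as stated in the abstract


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_near_locus(genes, chrom, locus_start, locus_end, window=WINDOW_BP):
    """Return names of genes overlapping [locus_start - window, locus_end + window]."""
    lo = max(0, locus_start - window)
    hi = locus_end + window
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records and locus position, for illustration only.
annotation = [
    Gene("geneA", "chr3", 16_650_000, 16_655_000),
    Gene("geneB", "chr3", 16_900_000, 16_905_000),
    Gene("geneC", "chr5", 5_300_000, 5_310_000),
]
print(genes_near_locus(annotation, "chr3", 16_700_000, 16_720_000))  # ['geneA']
```

    A gene counts as a candidate here if any part of it overlaps the extended window; requiring the whole gene to lie inside the window would be an equally plausible reading of the abstract.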

  17. Identification and analysis of pig chimeric mRNAs using RNA sequencing data

    PubMed Central

    2012-01-01

    Background: Gene fusion is ubiquitous over the course of evolution. It is expected to increase the diversity and complexity of transcriptomes and proteomes through chimeric sequence segments or altered regulation. However, chimeric mRNAs in pigs remain poorly characterized. Here we identified some chimeric mRNAs in pigs and analyzed their expression across individuals and breeds using RNA-sequencing data. Results: The present study identified 669 putative chimeric mRNAs in pigs, of which 251 chimeric candidates were detected in a set of RNA-sequencing data. A total of 618 candidates had clear trans-splicing sites, 537 of which obeyed the canonical GU-AG splice rule. Only two putative pig chimera variants whose fusion junction overlapped with that of a known human chimeric mRNA were found. A set of unique chimeric events showed moderate variance in expression across individuals and breeds, and non-significant variance between sexes. Furthermore, the genomic region of the 5′ partner gene shares a similar DNA sequence with that of the 3′ partner gene for 458 putative chimeric mRNAs. Eighty-one of those shared DNA sequences significantly matched the known DNA-binding motifs in the JASPAR CORE database. Four DNA motifs shared in parental genomic regions had significant similarity with known human CTCF binding sites. Conclusions: The present study provided detailed information on some pig chimeric mRNAs. We proposed a model in which trans-acting factors, such as CTCF, induce the spatial organisation of parental genes into the same transcriptional factory so that parental genes are coordinately transcribed to give rise to chimeric mRNAs. PMID:22925561
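
    The canonical GU-AG splice rule referred to in this record says that the spliced-out segment begins with GU and ends with AG at the RNA level (GT and AG in DNA). A minimal check could look like the sketch below; the function name and example sequences are hypothetical, and the study's actual junction-calling procedure is not described in the abstract.

```python
def is_canonical_junction(spliced_out_seq: str) -> bool:
    """True if the removed segment starts with GT/GU and ends with AG."""
    seq = spliced_out_seq.upper().replace("U", "T")  # accept RNA or DNA letters
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical sequences, for illustration only.
print(is_canonical_junction("GTAAGTCTTTCAG"))  # True
print(is_canonical_junction("guaagucuuucag"))  # True (RNA, lower case)
print(is_canonical_junction("GCAAGTCTTTCAG"))  # False (GC donor site)
```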

  18. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources.

    PubMed

    Karchin, Rachel; Diekhans, Mark; Kelly, Libusha; Thomas, Daryl J; Pieper, Ursula; Eswar, Narayanan; Haussler, David; Sali, Andrej

    2005-06-15

    The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. AVAILABILITY: http://www.salilab.org/LS-SNP CONTACT: rachelk@salilab.org SUPPLEMENTARY INFORMATION: http://salilab.org/LS-SNP/supp-info.pdf

  19. OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster.

    PubMed

    Miles, Alistair; Zhao, Jun; Klyne, Graham; White-Cooper, Helen; Shotton, David

    2010-10-01

    Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavy weight, and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with light-weight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. 
    In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.

  20. Interleukin-27 is a novel candidate diagnostic biomarker for bacterial infection in critically ill children.

    PubMed

    Wong, Hector R; Cvijanovich, Natalie Z; Hall, Mark; Allen, Geoffrey L; Thomas, Neal J; Freishtat, Robert J; Anas, Nick; Meyer, Keith; Checchia, Paul A; Lin, Richard; Bigham, Michael T; Sen, Anita; Nowak, Jeffrey; Quasney, Michael; Henricksen, Jared W; Chopra, Arun; Banschbach, Sharon; Beckman, Eileen; Harmon, Kelli; Lahni, Patrick; Shanley, Thomas P

    2012-10-29

    Differentiating between sterile inflammation and bacterial infection in critically ill patients with fever and other signs of the systemic inflammatory response syndrome (SIRS) remains a clinical challenge. The objective of our study was to mine an existing genome-wide expression database for the discovery of candidate diagnostic biomarkers to predict the presence of bacterial infection in critically ill children. Genome-wide expression data were compared between patients with SIRS having negative bacterial cultures (n = 21) and patients with sepsis having positive bacterial cultures (n = 60). Differentially expressed genes were subjected to a leave-one-out cross-validation (LOOCV) procedure to predict SIRS or sepsis classes. Serum concentrations of interleukin-27 (IL-27) and procalcitonin (PCT) were compared between 101 patients with SIRS and 130 patients with sepsis. All data represent the first 24 hours of meeting criteria for either SIRS or sepsis. Two hundred twenty one gene probes were differentially regulated between patients with SIRS and patients with sepsis. The LOOCV procedure correctly predicted 86% of the SIRS and sepsis classes, and Epstein-Barr virus-induced gene 3 (EBI3) had the highest predictive strength. Computer-assisted image analyses of gene-expression mosaics were able to predict infection with a specificity of 90% and a positive predictive value of 94%. Because EBI3 is a subunit of the heterodimeric cytokine, IL-27, we tested the ability of serum IL-27 protein concentrations to predict infection. At a cut-point value of ≥5 ng/ml, serum IL-27 protein concentrations predicted infection with a specificity and a positive predictive value of >90%, and the overall performance of IL-27 was generally better than that of PCT. A decision tree combining IL-27 and PCT improved overall predictive capacity compared with that of either biomarker alone. 
    Genome-wide expression analysis has provided the foundation for the identification of IL-27 as a novel candidate diagnostic biomarker for predicting bacterial infection in critically ill children. Additional studies will be required to test further the diagnostic performance of IL-27. The microarray data reported in this article have been deposited in the Gene Expression Omnibus under accession number GSE4607.
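
    The specificity and positive predictive value quoted for the ≥5 ng/ml cut-point follow from a simple confusion-matrix count. The sketch below shows the calculation on hypothetical serum values; only the cut-point comes from the abstract, and the function name and data are illustrative assumptions.

```python
def specificity_and_ppv(values, infected, cutoff=5.0):
    """Classify value >= cutoff as test-positive; return (specificity, PPV)."""
    tp = fp = tn = fn = 0
    for value, is_infected in zip(values, infected):
        test_positive = value >= cutoff
        if test_positive and is_infected:
            tp += 1
        elif test_positive:
            fp += 1
        elif is_infected:
            fn += 1
        else:
            tn += 1
    if tn + fp == 0 or tp + fp == 0:
        raise ValueError("need at least one uninfected patient and one positive test")
    return tn / (tn + fp), tp / (tp + fp)


# Hypothetical serum concentrations (ng/ml) and culture results.
il27 = [6.2, 7.5, 3.1, 5.0, 2.0, 4.4, 8.9, 1.2]
culture_positive = [True, True, True, True, False, False, False, False]
print(specificity_and_ppv(il27, culture_positive))  # (0.75, 0.75)
```

    Specificity is the share of uninfected patients who test negative, while the positive predictive value is the share of positive tests that are true infections, so the latter depends on how common infection is in the cohort.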

  1. Sequence and Expression Analyses of Ethylene Response Factors Highly Expressed in Latex Cells from Hevea brasiliensis

    PubMed Central

    Piyatrakul, Piyanuch; Yang, Meng; Putranto, Riza-Arief; Pirrello, Julien; Dessailly, Florence; Hu, Songnian; Summo, Marilyne; Theeravatanasuk, Kannikar; Leclercq, Julie; Kuswanhadi; Montoro, Pascal

    2014-01-01

    The AP2/ERF superfamily encodes transcription factors that play a key role in plant development and responses to abiotic and biotic stress. In Hevea brasiliensis, ERF genes have been identified by RNA sequencing. This study set out to validate the number of HbERF genes, and identify ERF genes involved in the regulation of latex cell metabolism. A comprehensive Hevea transcriptome was improved using additional RNA reads from reproductive tissues. Newly assembled contigs were annotated in the Gene Ontology database and were assigned to 3 main categories. The AP2/ERF superfamily is the third most represented compared with other transcription factor families. A comparison with genomic scaffolds led to an estimation of 114 AP2/ERF genes and 1 soloist in Hevea brasiliensis. Based on a phylogenetic analysis, functions were predicted for 26 HbERF genes. A relative transcript abundance analysis was performed by real-time RT-PCR in various tissues. Transcripts of ERFs from group I and VIII were very abundant in all tissues while those of group VII were highly accumulated in latex cells. Seven of the thirty-five ERF expression marker genes were highly expressed in latex. Subcellular localization and transactivation analyses suggested that HbERF-VII candidate genes encoded functional transcription factors. PMID:24971876

  2. Sequence and expression analyses of ethylene response factors highly expressed in latex cells from Hevea brasiliensis.

    PubMed

    Piyatrakul, Piyanuch; Yang, Meng; Putranto, Riza-Arief; Pirrello, Julien; Dessailly, Florence; Hu, Songnian; Summo, Marilyne; Theeravatanasuk, Kannikar; Leclercq, Julie; Kuswanhadi; Montoro, Pascal

    2014-01-01

    The AP2/ERF superfamily encodes transcription factors that play a key role in plant development and responses to abiotic and biotic stress. In Hevea brasiliensis, ERF genes have been identified by RNA sequencing. This study set out to validate the number of HbERF genes, and identify ERF genes involved in the regulation of latex cell metabolism. A comprehensive Hevea transcriptome was improved using additional RNA reads from reproductive tissues. Newly assembled contigs were annotated in the Gene Ontology database and were assigned to 3 main categories. The AP2/ERF superfamily is the third most represented compared with other transcription factor families. A comparison with genomic scaffolds led to an estimation of 114 AP2/ERF genes and 1 soloist in Hevea brasiliensis. Based on a phylogenetic analysis, functions were predicted for 26 HbERF genes. A relative transcript abundance analysis was performed by real-time RT-PCR in various tissues. Transcripts of ERFs from group I and VIII were very abundant in all tissues while those of group VII were highly accumulated in latex cells. Seven of the thirty-five ERF expression marker genes were highly expressed in latex. Subcellular localization and transactivation analyses suggested that HbERF-VII candidate genes encoded functional transcription factors.

  3. Looking into flowering time in almond (Prunus dulcis (Mill) D. A. Webb): the candidate gene approach.

    PubMed

    Silva, C; Garcia-Mas, J; Sánchez, A M; Arús, P; Oliveira, M M

    2005-03-01

    Blooming time is one of the most important agronomic traits in almond. Biochemical and molecular events underlying flowering regulation must be understood before methods to stimulate late flowering can be developed. Attempts to elucidate the genetic control of this process have led to the identification of a major gene (Lb) and quantitative trait loci (QTLs) linked to observed phenotypic differences, but although this gene and these QTLs have been placed on the Prunus reference genetic map, their sequences and specific functions remain unknown. The aim of our investigation was to associate these loci with known genes using a candidate gene approach. Two almond cDNAs and eight Prunus expressed sequence tags were selected as candidate genes (CGs) since their sequences were highly identical to those of flowering regulatory genes characterized in other species. The CGs were amplified from both parental lines of the mapping population using specific primers. Sequence comparison revealed DNA polymorphisms between the parental lines, mainly of the single nucleotide type. Polymorphisms were used to develop co-dominant cleaved amplified polymorphic sequence markers or length polymorphisms based on insertion/deletion events for mapping the candidate genes on the Prunus reference map. Ten candidate genes were assigned to six linkage groups in the Prunus genome. The positions of two of these were compatible with the regions where two QTLs for blooming time were detected. One additional candidate was localized close to the position of the Evergrowing gene, which determines a non-deciduous behaviour in peach.

  4. Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillus sphaericus

    NASA Astrophysics Data System (ADS)

    Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, we performed bioinformatics analysis of the genome sequence of Lysinibacillus sphaericus (L. sphaericus) to identify the gene encoding gelatinase. L. sphaericus was isolated from soil and is a bacterium whose gelatinase is species-specific to porcine and bovine gelatin. This bacterium therefore offers the possibility of producing enzymes specific to each of these two meat species. The main focus of this research was to identify the gelatinase-encoding gene in L. sphaericus using bioinformatics analysis of the partially sequenced genome. Three candidate genes were identified: first, gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21, which is 1563 base pairs (bp) long and encodes a 520-amino-acid sequence; second, gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17, which is 1776 bp long and encodes a 591-amino-acid sequence; and third, gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8, which is 1701 bp long and encodes a 566-amino-acid sequence. Three pairs of oligonucleotide primers, named F1, R1, F2, R2, F3 and R3, were designed to target short cDNA sequences by PCR. PCR reliably yielded amplicons of 1563 bp for candidate gene P1 and 1701 bp for candidate gene P3. Thus, bioinformatics analysis of L. sphaericus identified candidate genes encoding gelatinase.
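
    The gene and protein lengths reported in this record are mutually consistent if each gene length is read as an open reading frame that includes its stop codon, which is an assumption not stated in the abstract. The short check below makes that arithmetic explicit.

```python
def protein_length(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes one stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # subtract the stop codon


# (gene length in bp, reported amino acids) as given in the abstract.
candidates = {"P1": (1563, 520), "P2": (1776, 591), "P3": (1701, 566)}
for name, (bp, reported_aa) in candidates.items():
    print(name, protein_length(bp), protein_length(bp) == reported_aa)
```

    All three candidates satisfy bp / 3 - 1 = amino acids (for example, 1563 / 3 = 521 codons, of which 520 are coding).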

  5. Evolutionary appearance of genes encoding proteins associated with box H/ACA snoRNAs: Cbf5p in Euglena gracilis, an early diverging eukaryote, and candidate Gar1p and Nop10p homologs in archaebacteria

    PubMed Central

    Watanabe, Yoh-ichi; Gray, Michael W.

    2000-01-01

    A reverse transcription–polymerase chain reaction (RT–PCR) approach was used to clone a cDNA encoding the Euglena gracilis homolog of yeast Cbf5p, a protein component of the box H/ACA class of snoRNPs that mediate pseudouridine formation in eukaryotic rRNA. Cbf5p is a putative pseudouridine synthase, and the Euglena homolog is the first full-length Cbf5p sequence to be reported for an early diverging unicellular eukaryote (protist). Phylogenetic analysis of putative pseudouridine synthase sequences confirms that archaebacterial and eukaryotic (including Euglena) Cbf5p proteins are specifically related and are distinct from the TruB/Pus4p clade that is responsible for formation of pseudouridine at position 55 in eubacterial (TruB) and eukaryotic (Pus4p) tRNAs. Using a bioinformatics approach, we also identified archaebacterial genes encoding candidate homologs of yeast Gar1p and Nop10p, two additional proteins known to be associated with eukaryotic box H/ACA snoRNPs. These observations raise the possibility that pseudouridine formation in archaebacterial rRNA may be dependent on analogs of the eukaryotic box H/ACA snoRNPs, whose evolutionary origin may therefore predate the split between Archaea (archaebacteria) and Eucarya (eukaryotes). Database searches further revealed, in archaebacterial and some eukaryotic genomes, two previously unrecognized groups of genes (here designated ‘PsuX’ and ‘PsuY’) distantly related to the Cbf5p/TruB gene family. PMID:10871366

  6. In Silico Comparative Transcriptome Analysis of Two Color Morphs of the Common Coral Trout (Plectropomus Leopardus)

    PubMed Central

    Wang, Le; Yu, Cuiping; Guo, Liang; Lin, Haoran; Meng, Zining

    2015-01-01

    The common coral trout is a species of major importance in commercial fisheries and aquaculture. Recently, two different color morphs of Plectropomus leopardus were discovered and the biological importance of the color difference is unknown. Since coral trout species are poorly characterized at the molecular level, we undertook the transcriptomic characterization of the two color morphs, one black and one red coral trout, using Illumina next generation sequencing technologies. The study produced 55162966 and 54588952 paired-end reads, for black and red trout, respectively. De novo transcriptome assembly generated 95367 and 99424 unique sequences in black and red trout, respectively, with 88813 sequences shared between them. Approximately 50% of both transcriptomes were functionally annotated by BLAST searches against protein databases. The two transcriptomes were classified into 25 functional categories and showed similar profiles of Gene Ontology category compositions. 34110 unigenes were grouped into 259 KEGG pathways. Moreover, we identified 14649 simple sequence repeats (SSRs) and designed primers for potential application. We also discovered 130524 putative single nucleotide polymorphisms (SNPs) in the two transcriptomes, supplying potential genomic resources for the coral trout species. In addition, we identified 936 fast-evolving genes and 165 candidate genes under positive selection between the two color morphs. Finally, 38 candidate genes underlying the mechanism of color and pigmentation were also isolated. This study presents the first transcriptome resources for the common coral trout and provides basic information for the development of genomic tools for the identification, conservation, and understanding of the speciation and local adaptation of coral reef fish species. PMID:26713756

  7. Association and linkage studies of candidate genes involved in GABAergic neurotransmission in lithium-responsive bipolar disorder.

    PubMed Central

    Duffy, A; Turecki, G; Grof, P; Cavazzoni, P; Grof, E; Joober, R; Ahrens, B; Berghöfer, A; Müller-Oerlinghausen, B; Dvoráková, M; Libigerová, E; Vojtĕchovský, M; Zvolský, P; Nilsson, A; Licht, R W; Rasmussen, N A; Schou, M; Vestergaard, P; Holzinger, A; Schumann, C; Thau, K; Robertson, C; Rouleau, G A; Alda, M

    2000-01-01

    OBJECTIVE: To test for genetic linkage and association with GABAergic candidate genes in lithium-responsive bipolar disorder. DESIGN: Polymorphisms located in genes that code for GABRA3, GABRA5 and GABRB3 subunits of the GABAA receptor were investigated using association and linkage strategies. PARTICIPANTS: A total of 138 patients with bipolar 1 disorder with a clear response to lithium prophylaxis, selected from specialized lithium clinics in Canada and Europe that are part of the International Group for the Study of Lithium-Treated Patients, and 108 psychiatrically healthy controls. Families of 24 probands were suitable for linkage analysis. OUTCOME MEASURES: The association between the candidate genes and patients with bipolar disorder versus that of controls and genetic linkage within families. RESULTS: There was no significant association or linkage found between lithium-responsive bipolar disorder and the GABAergic candidate genes investigated. CONCLUSIONS: This study does not support a major role for the GABAergic candidate genes tested in lithium-responsive bipolar disorder. PMID:11022400

  8. [Role of Serotonin Transporter Gene in Eating Disorders].

    PubMed

    Hernández-Muñoz, Sandra; Camarena-Medellin, Beatriz

    2014-01-01

    The serotoninergic system has been implicated in mood and appetite regulation, and the serotonin transporter gene (SLC6A4) is a commonly studied candidate gene for eating disorders. However, most studies have focused on a single polymorphism (5-HTTLPR) in SLC6A4. We present the studies published on the association between eating disorders (ED) and the 5-HTTLPR polymorphism in anorexia nervosa (AN), bulimia nervosa (BN), and eating disorders not otherwise specified (EDNOS). The MEDLINE, ISI, and PubMed databases were searched for SLC6A4 and ED. A review of 37 original articles suggested that carrying the S allele is a risk factor for eating disorders, especially for AN. However, BN did not show any association. In addition, BMI, impulsivity, anxiety, depression, and age of onset have been associated with the S allele in ED patients. Copyright © 2013 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.

  9. Candidate chemosensory genes identified in the endoparasitoid Meteorus pulchricornis (Hymenoptera: Braconidae) by antennal transcriptome analysis.

    PubMed

    Sheng, Sheng; Liao, Cheng-Wu; Zheng, Yu; Zhou, Yu; Xu, Yan; Song, Wen-Miao; He, Peng; Zhang, Jian; Wu, Fu-An

    2017-06-01

    Meteorus pulchricornis is an endoparasitoid wasp which attacks the larvae of various lepidopteran pests. We present the first antennal transcriptome dataset for M. pulchricornis. A total of 48,845,072 clean reads were obtained and 34,967 unigenes were assembled. Of these, 15,458 unigenes showed a significant similarity (E-value < 10^-5) to known proteins in the NCBI non-redundant protein database. Gene ontology (GO) and cluster of orthologous groups (COG) analyses were used to classify the functions of M. pulchricornis antennae genes. We identified 16 putative odorant-binding protein (OBP) genes, eight chemosensory protein (CSP) genes, 99 olfactory receptor (OR) genes, 19 ionotropic receptor (IR) genes and one sensory neuron membrane protein (SNMP) gene. BLASTx best hit results and phylogenetic analysis both indicated that these chemosensory genes were most closely related to those found in other hymenopteran species. Real-time quantitative PCR assays showed that 14 MpulOBP genes were antennae-specific. Of these, MpulOBP6, MpulOBP9, MpulOBP10, MpulOBP12, MpulOBP15 and MpulOBP16 were found to have greater expression in the antennae than in other body parts, while MpulOBP2 and MpulOBP3 were expressed predominately in the legs and abdomens, respectively. These results might provide a foundation for future studies of olfactory genes and chemoreception in M. pulchricornis. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Identification of candidate chemosensory genes in the antennal transcriptome of Tenebrio molitor (Coleoptera: Tenebrionidae).

    PubMed

    Liu, Su; Rao, Xiang-Jun; Li, Mao-Ye; Feng, Ming-Feng; He, Meng-Zhu; Li, Shi-Guang

    2015-03-01

    We present the first antennal transcriptome sequencing information for the yellow mealworm beetle, Tenebrio molitor (Coleoptera: Tenebrionidae). Analysis of the transcriptome dataset obtained 52,216,616 clean reads, from which 35,363 unigenes were assembled. Of these, 18,820 unigenes showed significant similarity (E-value < 10^-5) to known proteins in the NCBI non-redundant protein database. Gene ontology (GO) and Cluster of Orthologous Groups (COG) analyses were used for functional classification of these unigenes. We identified 19 putative odorant-binding protein (OBP) genes, 12 chemosensory protein (CSP) genes, 20 olfactory receptor (OR) genes, 6 ionotropic receptor (IR) genes and 2 sensory neuron membrane protein (SNMP) genes. BLASTX best hit results indicated that these chemosensory genes were most identical to their respective orthologs from Tribolium castaneum. Phylogenetic analyses also revealed that the T. molitor OBPs and CSPs are closely related to those of T. castaneum. Real-time quantitative PCR assays showed that eight TmolOBP genes were antennae-specific. Of these, TmolOBP5, TmolOBP7 and TmolOBP16 were found to be predominantly expressed in male antennae, while TmolOBP17 was expressed mainly in the legs of males. Several other genes were identified that were neither tissue-specific nor sex-specific. These results establish a firm foundation for future studies of the chemosensory genes in T. molitor. Copyright © 2015 Elsevier Inc. All rights reserved.
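
    Both antennal-transcriptome records above count unigenes with a significant database hit at an E-value threshold of 10^-5. The sketch below illustrates such a filter, assuming the hits are available in BLAST's 12-column tabular layout (E-value in the 11th column); the abstracts do not state the output format used, and the hit lines shown are hypothetical.

```python
def annotated_queries(blast_tab_lines, max_evalue=1e-5):
    """Return the set of query IDs having at least one hit with E-value < max_evalue."""
    hits = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[10]) < max_evalue:  # column 11 holds the E-value
            hits.add(fields[0])             # column 1 holds the query ID
    return hits


# Hypothetical tabular hits, for illustration only.
example = [
    "unigene1\tprotA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "unigene2\tprotB\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
    "unigene3\tprotC\t60.0\t120\t48\t0\t1\t120\t1\t120\t2e-6\t90",
]
print(sorted(annotated_queries(example)))  # ['unigene1', 'unigene3']
```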

  11. Gene Identification of Pheromone Gland Genes Involved in Type II Sex Pheromone Biosynthesis and Transportation in Female Tea Pest Ectropis grisescens

    PubMed Central

    Li, Zhao-Qun; Ma, Long; Yin, Qian; Cai, Xiao-Ming; Luo, Zong-Xiu; Bian, Lei; Xin, Zhao-Jun; He, Peng; Chen, Zong-Mao

    2018-01-01

    Moths can biosynthesize sex pheromones in the female sex pheromone glands (PGs) and can distinguish species-specific sex pheromones using their antennae. However, the biosynthesis and transportation mechanism for Type II sex pheromone components has rarely been documented in moths. In this study, we constructed a massive PG transcriptome database (14.72 Gb) from a moth species, Ectropis grisescens, which uses type II sex pheromones and is a major tea pest in China. We further identified putative sex pheromone biosynthesis and transportation-related unigenes: 111 cytochrome P450 monooxygenases (CYPs), 25 odorant-binding proteins (OBPs), and 20 chemosensory proteins (CSPs). Tissue expression and phylogenetic tree analyses showed that one CYP (EgriCYP341-fragment3), one OBP (EgriOBP4), and one CSP (EgriCSP10) gene displayed an enriched expression in the PGs, and that EgriOBP2, 3, and 25 are clustered in the moth pheromone-binding protein clade. We considered these our candidate genes. Our results yielded large-scale PG sequence information for further functional studies. PMID:29317471

  12. Defining a new candidate gene for amelogenesis imperfecta: from molecular genetics to biochemistry.

    PubMed

    Urzúa, Blanca; Ortega-Pinto, Ana; Morales-Bozo, Irene; Rojas-Alcayaga, Gonzalo; Cifuentes, Víctor

    2011-02-01

    Amelogenesis imperfecta is a group of genetic conditions that affect the structure and clinical appearance of tooth enamel. The types (hypoplastic, hypocalcified, and hypomature) are correlated with defects in different stages of the process of enamel synthesis. Autosomal dominant, recessive, and X-linked types have been previously described. These disorders are considered clinically and genetically heterogeneous in etiology, involving a variety of genes, such as AMELX, ENAM, DLX3, FAM83H, MMP-20, KLK4, and WDR72. The mutations identified within these causal genes explain less than half of all cases of amelogenesis imperfecta. Most of the candidate and causal genes currently identified encode proteins involved in enamel synthesis. We think it is necessary to refocus the search for candidate genes using biochemical processes. This review provides theoretical evidence that the human SLC4A4 gene (sodium bicarbonate cotransporter) may be a new candidate gene.

  13. ENU Mutagenesis in Mice Identifies Candidate Genes For Hypogonadism

    PubMed Central

    Weiss, Jeffrey; Hurley, Lisa A.; Harris, Rebecca M.; Finlayson, Courtney; Tong, Minghan; Fisher, Lisa A.; Moran, Jennifer L.; Beier, David R.; Mason, Christopher; Jameson, J. Larry

    2012-01-01

    Genome-wide mutagenesis was performed in mice to identify candidate genes for male infertility, for which the predominant causes remain idiopathic. Mice were mutagenized using N-ethyl-N-nitrosourea (ENU), bred, and screened for phenotypes associated with the male urogenital system. Fifteen heritable lines were isolated and chromosomal loci were assigned using low density genome-wide SNP arrays. Ten of the fifteen lines were pursued further using higher resolution SNP analysis to narrow the candidate gene regions. Exon sequencing of candidate genes identified mutations in mice with cystic kidneys (Bicc1), cryptorchidism (Rxfp2), restricted germ cell deficiency (Plk4), and severe germ cell deficiency (Prdm9). In two other lines with severe hypogonadism candidate sequencing failed to identify mutations, suggesting defects in genes with previously undocumented roles in gonadal function. These genomic intervals were sequenced in their entirety and a candidate mutation was identified in SnrpE in one of the two lines. The line harboring the SnrpE variant retains substantial spermatogenesis despite small testis size, an unusual phenotype. In addition to the reproductive defects, heritable phenotypes were observed in mice with ataxia (Myo5a), tremors (Pmp22), growth retardation (unknown gene), and hydrocephalus (unknown gene). These results demonstrate that the ENU screen is an effective tool for identifying potential causes of male infertility. PMID:22258617

  14. Tempest: Accelerated MS/MS database search software for heterogeneous computing platforms

    PubMed Central

    Adamo, Mark E.; Gerber, Scott A.

    2017-01-01

    MS/MS database search algorithms derive a set of candidate peptide sequences from in-silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU generates peptide candidates that are asynchronously sent to a discrete GPU to be scored against experimental spectra in parallel (Milloy et al., 2012). The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. PMID:27603022
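
    The candidate-generation step described above (in-silico digest of a protein sequence into peptides) can be sketched as follows. This is a minimal illustration, not Tempest's implementation: the cleavage rule (after K or R unless followed by P), the function name tryptic_digest and its parameters are all assumptions chosen for the example.

```python
import re

def tryptic_digest(protein, missed_cleavages=1, min_len=6, max_len=40):
    """Return candidate peptides from a simple in-silico digest.

    Assumed rule (not taken from the Tempest abstract): cleave after
    K or R unless the next residue is P.
    """
    # Zero-width split points: preceded by K/R, not followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = set()
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` extra adjacent pieces.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            pep = "".join(pieces[i:j + 1])
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return sorted(peptides)
```

    Each returned peptide would then be scored against observed spectra; the length limits and the number of allowed missed cleavages are typical search settings and would need to match the actual search configuration.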

  15. Systematic analysis of microarray datasets to identify Parkinson's disease‑associated pathways and genes.

    PubMed

    Feng, Yinling; Wang, Xuefeng

    2017-03-01

    In order to investigate commonly disturbed genes and pathways in various brain regions of patients with Parkinson's disease (PD), microarray datasets from previous studies were collected and systematically analyzed. Different normalization methods were applied to microarray datasets from different platforms. A strategy combining gene co‑expression networks and clinical information was adopted, using weighted gene co‑expression network analysis (WGCNA) to screen for commonly disturbed genes in different brain regions of patients with PD. Functional enrichment analysis of commonly disturbed genes was performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Co‑pathway relationships were identified with Pearson's correlation coefficient tests and a hypergeometric distribution‑based test. Common genes in pathway pairs were selected out and regarded as risk genes. A total of 17 microarray datasets from 7 platforms were retained for further analysis. Five gene coexpression modules were identified, containing 9,745, 736, 233, 101 and 93 genes, respectively. One module was significantly correlated with PD samples and thus the 736 genes it contained were considered to be candidate PD‑associated genes. Functional enrichment analysis demonstrated that these genes were implicated in oxidative phosphorylation and PD. A total of 44 pathway pairs and 52 risk genes were revealed, and a risk gene pathway relationship network was constructed. Eight modules were identified and were revealed to be associated with PD, cancers and metabolism. A number of disturbed pathways and risk genes were unveiled in PD, and these findings may help advance understanding of PD pathogenesis.
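
    A hypergeometric distribution-based test of the kind mentioned above can be sketched as an overlap test between two pathways. The abstract does not give the exact formulation, so the setup below (a background of N genes, one pathway with K genes, another with n genes, k shared genes) and the function name hypergeom_sf are assumptions for illustration only.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) of the hypergeometric distribution.

    N: background gene count, K: genes in pathway A,
    n: genes in pathway B, k: genes shared by both pathways.
    """
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total
```

    A small p-value would suggest that two pathways share more genes than expected by chance; any significance cut-off and multiple-testing correction are left to the analyst.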

  16. The cld mutation: narrowing the critical chromosomal region and selecting candidate genes.

    PubMed

    Péterfy, Miklós; Mao, Hui Z; Doolittle, Mark H

    2006-10-01

    Combined lipase deficiency (cld) is a recessive, lethal mutation specific to the tw73 haplotype on mouse Chromosome 17. While the cld mutation results in lipase proteins that are inactive, aggregated, and retained in the endoplasmic reticulum (ER), it maps separately from the lipase structural genes. We have narrowed the gene critical region by about 50% using the tw18 haplotype for deletion mapping and a recombinant chromosome used originally to map cld with respect to the phenotypic marker tf. The region now extends from 22 to 25.6 Mbp on the wild-type chromosome, currently containing 149 genes and 50 expressed sequence tags (ESTs). To identify the affected gene, we have selected candidates based on their known role in associated biological processes, cellular components, and molecular functions that best fit with the predicted function of the cld gene. A secondary approach was based on differences in mRNA levels between mutant (cld/cld) and unaffected (+/cld) cells. Using both approaches, we have identified seven functional candidates with an ER localization and/or an involvement in protein maturation and folding that could explain the lipase deficiency, and six expression candidates that exhibit large differences in mRNA levels between mutant and unaffected cells. Significantly, two genes were found to be candidates with regard to both function and expression, thus emerging as the strongest candidates for cld. We discuss the implications of our mapping results and our selection of candidates with respect to other genes, deletions, and mutations occurring in the cld critical region.

  17. LGscore: A method to identify disease-related genes using biological literature and Google data.

    PubMed

    Kim, Jeongwoo; Kim, Hyunjin; Yoon, Youngmi; Park, Sanghyun

    2015-04-01

    Since the genome project in the 1990s, a number of studies associated with genes have been conducted and researchers have confirmed that genes are involved in disease. For this reason, the identification of the relationships between diseases and genes is important in biology. We propose a method called LGscore, which identifies disease-related genes using Google data and literature data. To implement this method, first, we construct a disease-related gene network using text-mining results. We then extract gene-gene interactions based on co-occurrences in abstract data obtained from PubMed, and calculate the weights of edges in the gene network by means of Z-scoring. The weights contain two values: the frequency and the Google search results. The frequency value is extracted from literature data, and the Google search result is obtained using Google. We assign a score to each gene through a network analysis. We assume that genes with a large number of links and numerous Google search results and frequency values are more likely to be involved in disease. For validation, we investigated the top 20 inferred genes for five different diseases using answer sets. The answer sets comprised six databases that contain information on disease-gene relationships. We identified a significant number of disease-related genes as well as candidate genes for Alzheimer's disease, diabetes, colon cancer, lung cancer, and prostate cancer. Our method was up to 40% more accurate than existing methods. Copyright © 2015 Elsevier Inc. All rights reserved.
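
    The edge-weighting idea above (Z-scoring co-occurrence counts, then scoring genes through the network) can be sketched as follows. This is not the LGscore algorithm itself: the use of a single co-occurrence count per gene pair, the weighted-degree gene score, and the names zscore_weights and gene_scores are assumptions made for the example, and the Google search component is omitted.

```python
from statistics import mean, pstdev

def zscore_weights(cooccurrence):
    """Map each gene pair's co-occurrence count to a z-score."""
    values = list(cooccurrence.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All counts identical: no pair stands out.
        return {pair: 0.0 for pair in cooccurrence}
    return {pair: (count - mu) / sigma for pair, count in cooccurrence.items()}

def gene_scores(weights):
    """Sum the weights of all edges touching each gene (weighted degree)."""
    scores = {}
    for (a, b), w in weights.items():
        scores[a] = scores.get(a, 0.0) + w
        scores[b] = scores.get(b, 0.0) + w
    return scores
```

    For instance, passing a dictionary such as {("G1", "G2"): 1, ("G1", "G3"): 3} (hypothetical gene labels) yields standardized edge weights that gene_scores then aggregates per gene.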

  18. Weighted gene co-expression network analysis of gene modules for the prognosis of esophageal cancer.

    PubMed

    Zhang, Cong; Sun, Qian

    2017-06-01

    Esophageal cancer is a common malignant tumor, whose pathogenesis and prognostic factors are not fully understood. This study aimed to discover the gene clusters that have similar functions and can be used to predict the prognosis of esophageal cancer. The matched microarray and RNA sequencing data of 185 patients with esophageal cancer were downloaded from The Cancer Genome Atlas (TCGA), and gene co-expression networks were built without distinguishing between squamous carcinoma and adenocarcinoma. The result showed that 12 modules were associated with one or more survival data such as recurrence status, recurrence time, vital status or vital time. Furthermore, survival analysis showed that 5 out of the 12 modules were related to progression-free survival (PFS) or overall survival (OS). As the most important module, the midnight blue module with 82 genes was related to PFS, apart from the patient age, tumor grade, primary treatment success, and duration of smoking and tumor histological type. Gene ontology enrichment analysis revealed that "glycoprotein binding" was the top enriched function of midnight blue module genes. Additionally, the blue module was the only gene cluster related to OS. Platelet activating factor receptor (PTAFR) and feline Gardner-Rasheed (FGR) were the top hub genes in both modeling datasets and the STRING protein interaction database. In conclusion, our study provides novel insights into the prognosis-associated genes and identifies candidate biomarkers for esophageal cancer.

  19. Salivary miRNA profiles identify children with autism spectrum disorder, correlate with adaptive behavior, and implicate ASD candidate genes involved in neurodevelopment.

    PubMed

    Hicks, Steven D; Ignacio, Cherry; Gentile, Karen; Middleton, Frank A

    2016-04-22

    Autism spectrum disorder (ASD) is a common neurodevelopmental disorder that lacks adequate screening tools, often delaying diagnosis and therapeutic interventions. Despite a substantial genetic component, no single gene variant accounts for >1 % of ASD incidence. Epigenetic mechanisms that include microRNAs (miRNAs) may contribute to the ASD phenotype by altering networks of neurodevelopmental genes. The extracellular availability of miRNAs allows for painless, noninvasive collection from biofluids. In this study, we investigated the potential for saliva-based miRNAs to serve as diagnostic screening tools and evaluated their potential functional importance. Salivary miRNA was purified from 24 ASD subjects and 21 age- and gender-matched control subjects. The ASD group included individuals with mild ASD (DSM-5 criteria and Autism Diagnostic Observation Schedule) and no history of neurologic disorder, pre-term birth, or known chromosomal abnormality. All subjects completed a thorough neurodevelopmental assessment with the Vineland Adaptive Behavior Scales at the time of saliva collection. A total of 246 miRNAs were detected and quantified in at least half the samples by RNA-Seq and used to perform between-group comparisons with non-parametric testing, multivariate logistic regression and classification analyses, as well as Monte-Carlo Cross-Validation (MCCV). The top miRNAs were examined for correlations with measures of adaptive behavior. Functional enrichment analysis of the highest confidence mRNA targets of the top differentially expressed miRNAs was performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID), as well as the Simons Foundation Autism Database (AutDB) of ASD candidate genes. Fourteen miRNAs were differentially expressed in ASD subjects compared to controls (p <0.05; FDR <0.15) and showed more than 95 % accuracy at distinguishing subject groups in the best-fit logistic regression model. 
    MCCV revealed an average ROC-AUC value of 0.92 across 100 simulations, further supporting the robustness of the findings. Most of the 14 miRNAs showed significant correlations with Vineland neurodevelopmental scores. Functional enrichment analysis detected significant over-representation of target gene clusters related to transcriptional activation, neuronal development, and AutDB genes. Measurement of salivary miRNA in this pilot study of subjects with mild ASD demonstrated differential expression of 14 miRNAs that are expressed in the developing brain, impact mRNAs related to brain development, and correlate with neurodevelopmental measures of adaptive behavior. These miRNAs have high specificity and cross-validated utility as a potential screening tool for ASD.

  20. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences.

    PubMed

    Medema, Marnix H; Blin, Kai; Cimermancic, Peter; de Jager, Victor; Zakrzewski, Piotr; Fischbach, Michael A; Weber, Tilmann; Takano, Eriko; Breitling, Rainer

    2011-07-01

    Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.

  1. Connection Map for Compounds (CMC): A Server for Combinatorial Drug Toxicity and Efficacy Analysis.

    PubMed

    Liu, Lei; Tsompana, Maria; Wang, Yong; Wu, Dingfeng; Zhu, Lixin; Zhu, Ruixin

    2016-09-26

    Drug discovery and development is a costly and time-consuming process with a high risk for failure resulting primarily from a drug's associated clinical safety and efficacy potential. Identifying and eliminating inapt candidate drugs as early as possible is an effective way of reducing unnecessary costs, but limited analytical tools are currently available for this purpose. Recent growth in the area of toxicogenomics and pharmacogenomics has provided a vast amount of drug expression microarray data. Web servers such as CMap and LTMap have used this information to evaluate drug toxicity and mechanisms of action independently; however, their wider applicability has been limited by the lack of a combinatorial drug-safety type of analysis. Using available genome-wide drug transcriptional expression profiles, we developed the first web server for combinatorial evaluation of toxicity and efficacy of candidate drugs named "Connection Map for Compounds" (CMC). Using CMC, researchers can initially compare their query drug gene signatures with prebuilt gene profiles generated from two large-scale toxicogenomics databases, and subsequently perform a drug efficacy analysis for identification of known mechanisms of drug action or generation of new predictions. CMC provides a novel approach for drug repositioning and early evaluation in drug discovery with its unique combination of toxicity and efficacy analyses, expansibility of data and algorithms, and customization of reference gene profiles. CMC can be freely accessed at http://cadd.tongji.edu.cn/webserver/CMCbp.jsp .

  2. A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis

    PubMed Central

    2011-01-01

    Background Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches - the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, in accompaniment with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification were achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA are integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly ranked XLMR candidates. Conclusions The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlights an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR. 
    Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi). PMID:21668950

  3. The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes.

    PubMed

    Smith, Adam Alexander Thil; Belda, Eugeni; Viari, Alain; Medigue, Claudine; Vallenet, David

    2012-05-01

    Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.

  4. Transcriptome analysis of Brassica napus pod using RNA-Seq and identification of lipid-related candidate genes.

    PubMed

    Xu, Hai-Ming; Kong, Xiang-Dong; Chen, Fei; Huang, Ji-Xiang; Lou, Xiang-Yang; Zhao, Jian-Yi

    2015-10-24

    Brassica napus is an important oilseed crop. Dissection of the genetic architecture underlying oil-related biological processes will greatly facilitate the genetic improvement of rapeseed. The differential gene expression during pod development offers a snapshot of the genes responsible for oil accumulation in. To identify candidate genes in the linkage peaks reported previously, we used RNA sequencing (RNA-Seq) technology to analyze the pod transcriptomes of German cultivar Sollux and Chinese inbred line Gaoyou. The RNA samples were collected for RNA-Seq at 5-7, 15-17 and 25-27 days after flowering (DAF). Bioinformatics analysis was performed to investigate differentially expressed genes (DEGs). Gene annotation analysis was integrated with QTL mapping and Brassica napus pod transcriptome profiling to detect potential candidate genes in oilseed. Four hundred sixty five and two thousand, one hundred fourteen candidate DEGs were identified, respectively, between two varieties at the same stages and across different periods of each variety. Then, 33 DEGs between Sollux and Gaoyou were identified as the candidate genes affecting seed oil content by combining those DEGs with the quantitative trait locus (QTL) mapping results, of which one was found to be homologous to Arabidopsis thaliana lipid-related genes. Intervarietal DEGs of lipid pathways in QTL regions represent important candidate genes for oil-related traits. Integrated analysis of transcriptome profiling, QTL mapping and comparative genomics with other related species leads to efficient identification of the most plausible functional genes underlying oil-content related characters, offering valuable resources for improving breeding programs of Brassica napus. This study provided a comprehensive overview of the pod transcriptomes of two varieties with different oil contents at the three developmental stages.

  5. Transcriptome analysis of Bupleurum chinense focusing on genes involved in the biosynthesis of saikosaponins

    PubMed Central

    2011-01-01

    Background Bupleurum chinense DC. is a widely used traditional Chinese medicinal plant. Saikosaponins are the major bioactive constituents of B. chinense, but relatively little is known about saikosaponin biosynthesis. The 454 pyrosequencing technology provides a promising opportunity for finding novel genes that participate in plant metabolism. Consequently, this technology may help to identify the candidate genes involved in the saikosaponin biosynthetic pathway. Results One-quarter of the 454 pyrosequencing runs produced a total of 195,088 high-quality reads, with an average read length of 356 bases (NCBI SRA accession SRA039388). A de novo assembly generated 24,037 unique sequences (22,748 contigs and 1,289 singletons), 12,649 (52.6%) of which were annotated against three public protein databases using a basic local alignment search tool (E-value ≤1e-10). All unique sequences were compared with NCBI expressed sequence tags (ESTs) (237) and encoding sequences (44) from the Bupleurum genus, and with a Sanger-sequenced EST dataset (3,111). The 23,173 (96.4%) unique sequences obtained in the present study represent novel Bupleurum genes. The ESTs of genes related to saikosaponin biosynthesis were found to encode known enzymes that catalyze the formation of the saikosaponin backbone; 246 cytochrome P450 (P450s) and 102 glycosyltransferases (GTs) unique sequences were also found in the 454 dataset. Full length cDNAs of 7 P450s and 7 uridine diphosphate GTs (UGTs) were verified by reverse transcriptase polymerase chain reaction or by cloning using 5' and/or 3' rapid amplification of cDNA ends. Two P450s and three UGTs were identified as the most likely candidates involved in saikosaponin biosynthesis. This finding was based on the coordinate up-regulation of their expression with β-AS in methyl jasmonate-treated adventitious roots and on their similar expression patterns with β-AS in various B. chinense tissues. 
    Conclusions A collection of high-quality ESTs for B. chinense obtained by 454 pyrosequencing is provided here for the first time. These data should aid further research on the functional genomics of B. chinense and other Bupleurum species. The candidate genes for enzymes involved in saikosaponin biosynthesis, especially the P450s and UGTs, that were revealed provide a substantial foundation for follow-up research on the metabolism and regulation of the saikosaponins. PMID:22047182

  6. Validation of SmartRank: A likelihood ratio software for searching national DNA databases with complex DNA profiles.

    PubMed

    Benschop, Corina C G; van de Merwe, Linda; de Jong, Jeroen; Vanvooren, Vanessa; Kempenaers, Morgane; Kees van der Beek, C P; Barni, Filippo; Reyes, Eusebio López; Moulin, Léa; Pene, Laurent; Haned, Hinda; Sijen, Titia

    2017-07-01

    Searching a national DNA database with complex and incomplete profiles usually yields very large numbers of possible matches that can present many candidate suspects to be further investigated by the forensic scientist and/or police. Current practice in most forensic laboratories consists of ordering these 'hits' based on the number of matching alleles with the searched profile. Thus, candidate profiles that share the same number of matching alleles are not differentiated and due to the lack of other ranking criteria for the candidate list it may be difficult to discern a true match from the false positives or notice that all candidates are in fact false positives. SmartRank was developed to put forward only relevant candidates and rank them accordingly. The SmartRank software computes a likelihood ratio (LR) for the searched profile and each profile in the DNA database and ranks database entries above a defined LR threshold according to the calculated LR. In this study, we examined for mixed DNA profiles of variable complexity whether the true donors are retrieved, what the number of false positives above an LR threshold is and the ranking position of the true donors. Using 343 mixed DNA profiles over 750 SmartRank searches were performed. In addition, the performance of SmartRank and CODIS were compared regarding DNA database searches and SmartRank was found complementary to CODIS. We also describe the applicable domain of SmartRank and provide guidelines. The SmartRank software is open-source and freely available. Using the best practice guidelines, SmartRank enables obtaining investigative leads in criminal cases lacking a suspect. Copyright © 2017 Elsevier B.V. All rights reserved.
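
    The ranking step described above (keep database entries whose likelihood ratio exceeds a defined threshold, then order them by LR) can be sketched as follows. The LR computation itself is not shown and is taken as given; the function name rank_candidates, the profile identifiers, and the default threshold value are hypothetical and not taken from SmartRank.

```python
def rank_candidates(lr_by_profile, lr_threshold=1000.0):
    """Return (profile_id, LR) pairs with LR above the threshold, highest LR first.

    lr_by_profile: dict mapping a database profile identifier to a
    precomputed likelihood ratio. The default threshold is illustrative only.
    """
    kept = [(pid, lr) for pid, lr in lr_by_profile.items() if lr > lr_threshold]
    # Sort by the likelihood ratio, largest first.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

    Unlike ordering hits by the number of matching alleles, this kind of ranking separates candidates that would otherwise tie, and an empty result signals that no database entry cleared the threshold.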

  7. Identification of downy mildew resistance gene candidates by positional cloning in maize (Zea mays subsp. mays; Poaceae)

    PubMed Central

    Kim, Jae Yoon; Moon, Jun-Cheol; Kim, Hyo Chul; Shin, Seungho; Song, Kitae; Kim, Kyung-Hee; Lee, Byung-Moo

    2017-01-01

    Premise of the study: Positional cloning in combination with phenotyping is a general approach to identify disease-resistance gene candidates in plants; however, it requires several time-consuming steps including population or fine mapping. Therefore, in the present study, we suggest a new combined strategy to improve the identification of disease-resistance gene candidates. Methods and Results: Downy mildew (DM)–resistant maize was selected from five cultivars using a spreader row technique. Positional cloning and bioinformatics tools were used to identify the DM-resistance quantitative trait locus marker (bnlg1702) and 47 protein-coding gene annotations. Eventually, five DM-resistance gene candidates, including bZIP34, Bak1, and Ppr, were identified by quantitative reverse-transcription PCR (RT-PCR) without fine mapping of the bnlg1702 locus. Conclusions: The combined protocol with the spreader row technique, quantitative trait locus positional cloning, and quantitative RT-PCR was effective for identifying DM-resistance candidate genes. This cloning approach may be applied to other whole-genome-sequenced crops or resistance to other diseases. PMID:28224059

  8. Integrative strategies to identify candidate genes in rodent models of human alcoholism.

    PubMed

    Treadwell, Julie A

    2006-01-01

    The search for genes underlying alcohol-related behaviours in rodent models of human alcoholism has been ongoing for many years with only limited success. Recently, new strategies that integrate several of the traditional approaches have provided new insights into the molecular mechanisms underlying ethanol's actions in the brain. We have used alcohol-preferring C57BL/6J (B6) and alcohol-avoiding DBA/2J (D2) genetic strains of mice in an integrative strategy combining high-throughput gene expression screening, genetic segregation analysis, and mapping to previously published quantitative trait loci to uncover candidate genes for the ethanol-preference phenotype. In our study, 2 genes, retinaldehyde binding protein 1 (Rlbp1) and syntaxin 12 (Stx12), were found to be strong candidates for ethanol preference. Such experimental approaches have the power and the potential to greatly speed up the laborious process of identifying candidate genes for the animal models of human alcoholism.

  9. LOD score exclusion analyses for candidate genes using random population samples.

    PubMed

    Deng, H W; Li, J; Recker, R R

    2001-05-01

    While extensive analyses have been conducted to test for, no formal analyses have been conducted to test against, the importance of candidate genes with random population samples. We develop a LOD score approach for exclusion analyses of candidate genes with random population samples. Under this approach, specific genetic effects and inheritance models at candidate genes can be analysed and if a LOD score is ≤ −2.0, the locus can be excluded from having an effect larger than that specified. Computer simulations show that, with sample sizes often employed in association studies, this approach has high power to exclude a gene from having moderate genetic effects. In contrast to regular association analyses, population admixture will not affect the robustness of our analyses; in fact, it renders our analyses more conservative and thus any significant exclusion result is robust. Our exclusion analysis complements association analysis for candidate genes in random population samples and is parallel to the exclusion mapping analyses that may be conducted in linkage analyses with pedigrees or relative pairs. The usefulness of the approach is demonstrated by an application to test the importance of vitamin D receptor and estrogen receptor genes underlying the differential risk to osteoporotic fractures.
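
    The exclusion rule quoted in this abstract can be illustrated with a short sketch. It relies only on the general definition of a LOD score as the base-10 logarithm of a likelihood ratio. The function names and the log-likelihood values are hypothetical, and the choice of reference likelihood is an assumption, since the abstract does not say which model the authors compare against.

```python
import math


def lod_score(loglik_specified: float, loglik_reference: float) -> float:
    """LOD = log10 of the likelihood ratio, computed from natural-log likelihoods.

    loglik_specified: log-likelihood under the specified genetic effect.
    loglik_reference: log-likelihood under the reference model (an assumption here).
    """
    return (loglik_specified - loglik_reference) / math.log(10.0)


def is_excluded(lod: float, cutoff: float = -2.0) -> bool:
    """Apply the exclusion criterion from the abstract (LOD at or below -2.0)."""
    return lod <= cutoff


# Made-up log-likelihoods: the data are far less likely under the specified effect.
lod = lod_score(loglik_specified=-20.0, loglik_reference=-10.0)
excluded = is_excluded(lod)
```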

  10. Whole-exome sequencing identifies novel candidate predisposition genes for familial polycythemia vera.

    PubMed

    Hirvonen, Elina A M; Pitkänen, Esa; Hemminki, Kari; Aaltonen, Lauri A; Kilpivaara, Outi

    2017-04-20

    Polycythemia vera (PV), characterized by massive production of erythrocytes, is one of the myeloproliferative neoplasms. Most patients carry a somatic gain-of-function mutation in JAK2, c.1849G > T (p.Val617Phe), leading to constitutive activation of JAK-STAT signaling pathway. Familial clustering is also observed occasionally, but high-penetrance predisposition genes to PV have remained unidentified. We studied the predisposition to PV by exome sequencing (three cases) in a Finnish PV family with four patients. The 12 shared variants (maximum allowed minor allele frequency <0.001 in Finnish population in ExAC database) predicted damaging in silico and absent in an additional control set of over 500 Finns were further validated by Sanger sequencing in a fourth affected family member. Three novel predisposition candidate variants were identified: c.1254C > G (p.Phe418Leu) in ZXDC, c.1931C > G (p.Pro644Arg) in ATN1, and c.701G > A (p.Arg234Gln) in LRRC3. We also observed a rare, predicted benign germline variant c.2912C > G (p.Ala971Gly) in BCORL1 in all four patients. Somatic mutations in BCORL1 have been reported in myeloid malignancies. We further screened the variants in eight PV patients in six other Finnish families, but no other carriers were found. Exome sequencing provides a powerful tool for the identification of novel variants, and understanding the familial predisposition of diseases. This is the first report on Finnish familial PV cases, and we identified three novel candidate variants that may predispose to the disease.

  11. A functional promoter variant of the human formimidoyltransferase cyclodeaminase (FTCD) gene is associated with working memory performance in young but not older adults.

    PubMed

    Greenwood, Pamela M; Schmidt, Kevin; Lin, Ming-Kuan; Lipsky, Robert; Parasuraman, Raja; Jankord, Ryan

    2018-06-21

    The central role of working memory in IQ and the high heritability of working memory performance motivated interest in identifying the specific genes underlying this heritability. The FTCD (formimidoyltransferase cyclodeaminase) gene was identified as a candidate gene for allelic association with working memory in part from genetic mapping studies of mouse Morris water maze performance. The present study tested variants of this gene for effects on a delayed match-to-sample task of a large sample of younger and older participants. The rs914246 variant, but not the rs914245 variant, of the FTCD gene modulated accuracy in the task for younger, but not older, people under high working memory load. The interaction of haplotype × distance × load had a partial eta squared effect size of 0.015. Analysis of simple main effects had partial eta squared effect sizes ranging from 0.012 to 0.040. A reporter gene assay revealed that the C allele of the rs914246 genotype is functional and a main factor regulating FTCD gene expression. This study extends previous work on the genetics of working memory by revealing that a gene in the glutamatergic pathway modulates working memory in young people but not in older people. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  12. Incorporating Information of microRNAs into Pathway Analysis in a Genome-Wide Association Study of Bipolar Disorder

    PubMed Central

    Shih, Wei-Liang; Kao, Chung-Feng; Chuang, Li-Chung; Kuo, Po-Hsiu

    2012-01-01

    MicroRNAs (miRNAs) are known to be important post-transcriptional regulators that are involved in the etiology of complex psychiatric traits. The present study aimed to incorporate miRNA information into pathway analysis using a genome-wide association dataset to identify relevant biological pathways for bipolar disorder (BPD). We selected psychiatric- and neurological-associated miRNAs (N = 157) from the PhenomiR database. The miRNA target gene (miTG) predictions were obtained from microRNA.org. Canonical pathways (N = 4,051) were downloaded from the Molecular Signatures Database. We employed a novel weighting scheme for miTGs in pathway analysis using methods of gene set enrichment analysis and sum-statistic. Under four statistical scenarios, 38 significantly enriched pathways (P-value < 0.01 after multiple testing correction) were identified for the risk of developing BPD, including ion channel-associated pathways (e.g., gated channel activity, ion transmembrane transporter activity, and ion channel activity) and nervous system-related biological processes (e.g., nervous system development, cytoskeleton, and neuroactive ligand receptor interaction). Among them, 19 were identified only when the weighting scheme was applied. Many miRNA-targeted genes were functionally related to ion channels, collagen, and axonal growth and guidance, which have previously been suggested to be associated with BPD. Some of these genes are linked to the regulation of miRNA machinery in the literature. Our findings provide support for the potential involvement of miRNAs in the psychopathology of BPD. Further investigations to elucidate the functions and mechanisms of identified candidate pathways are needed. PMID:23264780

  13. Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling

    PubMed Central

    2013-01-01

    Background: Isatis indigotica is a widely used herb for the clinical treatment of colds, fever, and influenza in Traditional Chinese Medicine (TCM). Various structural classes of compounds have been identified as effective ingredients. However, little is known at the genetic level about these active metabolites. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive dataset of I. indigotica. Results: A database of 36,367 unigenes (average length = 1,115.67 bases) was generated by performing transcriptome sequencing. Based on the gene annotation of the transcriptome, 104 unigenes were identified covering most of the catalytic steps in the general biosynthetic pathways of indole, terpenoid, and phenylpropanoid. Subsequently, the organ-specific expression patterns of the genes involved in these pathways, and their responses to methyl jasmonate (MeJA) induction, were investigated. Metabolite profiling of the effective phenylpropanoids showed that the accumulation patterns of secondary metabolites were mostly correlated with the transcription of their biosynthetic genes. According to the analysis of the UDP-dependent glycosyltransferase (UGT) family, several flavonoids were indicated to exist in I. indigotica and were further identified by metabolic profiling using UPLC/Q-TOF. Moreover, applying transcriptome co-expression analysis, nine new, putative UGTs were suggested as flavonol glycosyltransferases and lignan glycosyltransferases. Conclusions: This database provides a pool of candidate genes involved in biosynthesis of effective metabolites in I. indigotica. Furthermore, the comprehensive analysis and characterization of the significant pathways are expected to give a better insight regarding the diversity of chemical composition, synthetic characteristics, and the regulatory mechanisms that operate in this medicinal herb. PMID:24308360

  14. Computing Prediction and Functional Analysis of Prokaryotic Propionylation.

    PubMed

    Wang, Li-Na; Shi, Shao-Ping; Wen, Ping-Ping; Zhou, Zhi-You; Qiu, Jian-Ding

    2017-11-27

    Identification and systematic analysis of candidates for protein propionylation are crucial steps for understanding its molecular mechanisms and biological functions. Although several proteome-scale methods have been performed to delineate potential propionylated proteins, the majority of lysine-propionylated substrates and their role in pathological physiology still remain largely unknown. By gathering various databases and literatures, experimental prokaryotic propionylation data were collated to be trained in a support vector machine with various features via a three-step feature selection method. A novel online tool for seeking potential lysine-propionylated sites (PropSeek) ( http://bioinfo.ncu.edu.cn/PropSeek.aspx ) was built. Independent test results of leave-one-out and n-fold cross-validation were similar to each other, showing that PropSeek is a stable and robust predictor with satisfying performance. Meanwhile, analyses of Gene Ontology, Kyoto Encyclopedia of Genes and Genomes pathways, and protein-protein interactions implied a potential role of prokaryotic propionylation in protein synthesis and metabolism.

  15. Phenoscape: Identifying Candidate Genes for Evolutionary Phenotypes

    PubMed Central

    Edmunds, Richard C.; Su, Baofeng; Balhoff, James P.; Eames, B. Frank; Dahdul, Wasila M.; Lapp, Hilmar; Lundberg, John G.; Vision, Todd J.; Dunham, Rex A.; Mabee, Paula M.; Westerfield, Monte

    2016-01-01

    Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in nonmodel organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes in the ostariophysan fishes that bear resemblance to mutant phenotypes in zebrafish. For this, we searched ZFIN for genetic perturbations that result in either loss of basihyal element or loss of scales phenotypes, because these are the ancestral phenotypes observed in catfishes (Siluriformes). We tested the identified candidate genes by examining their endogenous expression patterns in the channel catfish, Ictalurus punctatus. The experimental results were consistent with the hypotheses that these features evolved through disruption in developmental pathways at, or upstream of, brpf1 and eda/edar for the ancestral losses of basihyal element and scales, respectively. These results demonstrate that ontological annotations of the phenotypic effects of genetic alterations in model organisms, when aggregated within a knowledgebase, can be used effectively to generate testable, and useful, hypotheses about evolutionary changes in morphology. PMID:26500251

  16. Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases

    DOE PAGES

    Smedley, Damian; Kohler, Sebastian; Czeschik, Johanna Christina; ...

    2014-07-30

    Here, whole-exome sequencing (WES) has opened up previously unheard of possibilities for identifying novel disease genes in Mendelian disorders, only about half of which have been elucidated to date. However, interpretation of WES data remains challenging. As a result, we analyze protein–protein association (PPA) networks to identify candidate genes in the vicinity of genes previously implicated in a disease. The analysis, using a random-walk with restart (RWR) method, is adapted to the setting of WES by developing a composite variant-gene relevance score based on the rarity, location and predicted pathogenicity of variants and the RWR evaluation of genes harboring the variants. Benchmarking using known disease variants from 88 disease-gene families reveals that the correct gene is ranked among the top 10 candidates in ≥50% of cases, a figure which we confirmed using a prospective study of disease genes identified in 2012 and PPA data produced before that date. In conclusion, we implement our method in a freely available Web server, ExomeWalker, that displays a ranked list of candidates together with information on PPAs, frequency and predicted pathogenicity of the variants to allow quick and effective searches for candidates that are likely to reward closer investigation.

  17. Integration of QTL and bioinformatic tools to identify candidate genes for triglycerides in mice

    PubMed Central

    Leduc, Magalie S.; Hageman, Rachael S.; Verdugo, Ricardo A.; Tsaih, Shirng-Wern; Walsh, Kenneth; Churchill, Gary A.; Paigen, Beverly

    2011-01-01

    To identify genetic loci influencing lipid levels, we performed quantitative trait loci (QTL) analysis between inbred mouse strains MRL/MpJ and SM/J, measuring triglyceride levels at 8 weeks of age in F2 mice fed a chow diet. We identified one significant QTL on chromosome (Chr) 15 and three suggestive QTL on Chrs 2, 7, and 17. We also carried out microarray analysis on the livers of parental strains of 282 F2 mice and used these data to find cis-regulated expression QTL. We then narrowed the list of candidate genes under significant QTL using a “toolbox” of bioinformatic resources, including haplotype analysis; parental strain comparison for gene expression differences and nonsynonymous coding single nucleotide polymorphisms (SNP); cis-regulated eQTL in livers of F2 mice; correlation between gene expression and phenotype; and conditioning of expression on the phenotype. We suggest Slc25a7 as a candidate gene for the Chr 7 QTL and, based on expression differences, five genes (Polr3h, Cyp2d22, Cyp2d26, Tspo, and Ttll12) as candidate genes for Chr 15 QTL. This study shows how bioinformatics can be used effectively to reduce candidate gene lists for QTL related to complex traits. PMID:21622629

  18. Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smedley, Damian; Kohler, Sebastian; Czeschik, Johanna Christina

    Here, whole-exome sequencing (WES) has opened up previously unheard of possibilities for identifying novel disease genes in Mendelian disorders, only about half of which have been elucidated to date. However, interpretation of WES data remains challenging. As a result, we analyze protein–protein association (PPA) networks to identify candidate genes in the vicinity of genes previously implicated in a disease. The analysis, using a random-walk with restart (RWR) method, is adapted to the setting of WES by developing a composite variant-gene relevance score based on the rarity, location and predicted pathogenicity of variants and the RWR evaluation of genes harboring the variants. Benchmarking using known disease variants from 88 disease-gene families reveals that the correct gene is ranked among the top 10 candidates in ≥50% of cases, a figure which we confirmed using a prospective study of disease genes identified in 2012 and PPA data produced before that date. In conclusion, we implement our method in a freely available Web server, ExomeWalker, that displays a ranked list of candidates together with information on PPAs, frequency and predicted pathogenicity of the variants to allow quick and effective searches for candidates that are likely to reward closer investigation.

  19. Prioritization of candidate disease genes by topological similarity between disease and protein diffusion profiles.

    PubMed

    Zhu, Jie; Qin, Yufang; Liu, Taigang; Wang, Jun; Zheng, Xiaoqi

    2013-01-01

    Identification of gene-phenotype relationships is a fundamental challenge in clinical human health research. Based on the observation that genes causing the same or similar phenotypes tend to correlate with each other in the protein-protein interaction network, many network-based approaches have been proposed based on different underlying models. A recent comparative study showed that diffusion-based methods achieve state-of-the-art predictive performance. In this paper, a new diffusion-based method was proposed to prioritize candidate disease genes. The diffusion profile of a disease was defined as the stationary distribution of candidate genes given a random walk with restart where similarities between phenotypes are incorporated. Then, candidate disease genes are prioritized by comparing their diffusion profiles with that of the disease. Finally, the effectiveness of our method was demonstrated through leave-one-out cross-validation against control genes from artificial linkage intervals and randomly chosen genes. A comparative study showed that our method achieves improved performance compared to some classical diffusion-based methods. To further illustrate our method, we used our algorithm to predict new causal genes of 16 multifactorial diseases including prostate cancer and Alzheimer's disease, and the top predictions were in good agreement with literature reports. Our study indicates that integration of multiple information sources, especially the phenotype similarity profile data, and introduction of a global similarity measure between disease and gene diffusion profiles are helpful for prioritizing candidate disease genes. Programs and data are available upon request.
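
    The random walk with restart mentioned in this abstract can be sketched in a few lines. This is the plain, textbook iteration on a toy network, not the authors' method: it does not incorporate phenotype similarities or compare diffusion profiles, and the function name, restart probability, and example graph are all hypothetical.

```python
from typing import List


def random_walk_with_restart(
    adjacency: List[List[float]],
    seeds: List[int],
    restart: float = 0.5,
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> List[float]:
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W is the column-normalised adjacency matrix. Assumes every node has at
    least one edge, so that each column of W sums to 1.
    """
    n = len(adjacency)
    degree = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    # Start (and restart) distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] += 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(adjacency[i][j] / degree[j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        # Stop when the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seeds=[0])
```

    On this toy graph the stationary scores decrease with distance from the seed, which is the property that network-based prioritization exploits.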

  20. PharmMapper server: a web server for potential drug target identification using pharmacophore mapping approach

    PubMed Central

    Liu, Xiaofeng; Ouyang, Sisheng; Yu, Biao; Liu, Yabo; Huang, Kai; Gong, Jiayu; Zheng, Siyuan; Li, Zhihua; Li, Honglin; Jiang, Hualiang

    2010-01-01

    In silico drug target identification, which includes many distinct algorithms for finding disease genes and proteins, is the first step in the drug discovery pipeline. When the 3D structures of the targets are available, the problem of target identification is usually converted to finding the best interaction mode between the potential target candidates and small molecule probes. A pharmacophore, the spatial arrangement of features essential for a molecule to interact with a specific target receptor, offers an alternative to molecular docking for achieving this goal. PharmMapper server is a freely accessible web server designed to identify potential target candidates for the given small molecules (drugs, natural products or other newly discovered compounds with unidentified binding targets) using a pharmacophore mapping approach. PharmMapper hosts a large, in-house pharmacophore database (PharmTargetDB) annotated from all the target information in TargetBank, BindingDB, DrugBank and the potential drug target database, including over 7000 receptor-based pharmacophore models (covering over 1500 drug targets). PharmMapper automatically finds the best mapping poses of the query molecule against all the pharmacophore models in PharmTargetDB and lists the top N best-fitted hits with appropriate target annotations, along with the aligned poses of the query molecule. Benefiting from the highly efficient and robust triangle hashing mapping method, PharmMapper has high-throughput capability and takes about 1 h on average to screen the whole PharmTargetDB. The protocol was successful in finding the proper targets among the top 300 pharmacophore candidates in the retrospective benchmarking test of tamoxifen. PharmMapper is available at http://59.78.96.61/pharmmapper. PMID:20430828

  1. In Silico Gene Prioritization by Integrating Multiple Data Sources

    PubMed Central

    Zhou, Yingyao; Shields, Robert; Chanda, Sumit K.; Elston, Robert C.; Li, Jing

    2011-01-01

    Identifying disease genes is crucial to the understanding of disease pathogenesis, and to the improvement of disease diagnosis and treatment. In recent years, many researchers have proposed approaches to prioritize candidate genes by considering the relationship of candidate genes and existing known disease genes, reflected in other data sources. In this paper, we propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graphic representation. Gene-gene relationships and gene-disease relationships are then defined based on the overall topology of each network using a diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is utilized to rank all candidate genes. Based on the informativeness of available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed large-scale cross-validation analysis on 110 disease families using three data sources. Results have shown that our approach consistently outperforms two other state-of-the-art programs. A case study using Parkinson disease (PD) has identified four candidate genes (UBB, SEPT5, GPR37 and TH) that ranked higher than our adaptive threshold, all of which are involved in the PD pathway. In particular, a very recent study has observed a deletion of TH in a patient with PD, which supports the importance of the TH gene in PD pathogenesis. A web tool has been implemented to assist scientists in their genetic studies. PMID:21731658

  2. Mutation databases for inherited renal disease: are they complete, accurate, clinically relevant, and freely available?

    PubMed

    Savige, Judy; Dagher, Hayat; Povey, Sue

    2014-07-01

    This study examined whether gene-specific DNA variant databases for inherited diseases of the kidney fulfilled the Human Variome Project recommendations of being complete, accurate, clinically relevant and freely available. A recent review identified 60 inherited renal diseases caused by mutations in 132 genes. The disease name, MIM number, gene name, together with "mutation" or "database," were used to identify web-based databases. Fifty-nine diseases (98%) due to mutations in 128 genes had a variant database. Altogether there were 349 databases (a median of 3 per gene, range 0-6), but no gene had two databases with the same number of variants, and 165 (50%) databases included fewer than 10 variants. About half the databases (180, 54%) had been updated in the previous year. Few (77, 23%) were curated by "experts" but these included nine of the 11 with the most variants. Even fewer databases (41, 12%) included clinical features apart from the name of the associated disease. Most (223, 67%) could be accessed without charge, including those for 50 genes (40%) with the maximum number of variants. Future efforts should focus on encouraging experts to collaborate on a single database for each gene affected in inherited renal disease, including both unpublished variants, and clinical phenotypes. © 2014 WILEY PERIODICALS, INC.

  3. The Genetic Basis for Variation in Sensitivity to Lead Toxicity in Drosophila melanogaster.

    PubMed

    Zhou, Shanshan; Morozova, Tatiana V; Hussain, Yasmeen N; Luoma, Sarah E; McCoy, Lenovia; Yamamoto, Akihiko; Mackay, Trudy F C; Anholt, Robert R H

    2016-07-01

    Lead toxicity presents a worldwide health problem, especially due to its adverse effects on cognitive development in children. However, identifying genes that give rise to individual variation in susceptibility to lead toxicity is challenging in human populations. Our goal was to use Drosophila melanogaster to identify evolutionarily conserved candidate genes associated with individual variation in susceptibility to lead exposure. To identify candidate genes associated with variation in susceptibility to lead toxicity, we measured effects of lead exposure on development time, viability and adult activity in the Drosophila melanogaster Genetic Reference Panel (DGRP) and performed genome-wide association analyses to identify candidate genes. We used mutants to assess functional causality of candidate genes and constructed a genetic network associated with variation in sensitivity to lead exposure, on which we could superimpose human orthologs. We found substantial heritabilities for all three traits and identified candidate genes associated with variation in susceptibility to lead exposure for each phenotype. The genetic architectures that determine variation in sensitivity to lead exposure are highly polygenic. Gene ontology and network analyses showed enrichment of genes associated with early development and function of the nervous system. Drosophila melanogaster presents an advantageous model to study the genetic underpinnings of variation in susceptibility to lead toxicity. Evolutionary conservation of cellular pathways that respond to toxic exposure allows predictions regarding orthologous genes and pathways across phyla. Thus, studies in the D. melanogaster model system can identify candidate susceptibility genes to guide subsequent studies in human populations. Zhou S, Morozova TV, Hussain YN, Luoma SE, McCoy L, Yamamoto A, Mackay TF, Anholt RR. 2016. The genetic basis for variation in sensitivity to lead toxicity in Drosophila melanogaster. Environ Health Perspect 124:1062-1070; http://dx.doi.org/10.1289/ehp.1510513.

  4. Replication of type 2 diabetes candidate genes variations in three geographically unrelated Indian population groups.

    PubMed

    Ali, Shafat; Chopra, Rupali; Manvati, Siddharth; Singh, Yoginder Pal; Kaul, Nabodita; Behura, Anita; Mahajan, Ankit; Sehajpal, Prabodh; Gupta, Subash; Dhar, Manoj K; Chainy, Gagan B N; Bhanwer, Amarjit S; Sharma, Swarkar; Bamezai, Rameshwar N K

    2013-01-01

    Type 2 diabetes (T2D) is a syndrome of multiple metabolic disorders and is genetically heterogeneous. India comprises one of the largest global populations with highest number of reported type 2 diabetes cases. However, limited information about T2D associated loci is available for Indian populations. It is, therefore, pertinent to evaluate the previously associated candidates as well as identify novel genetic variations in Indian populations to understand the extent of genetic heterogeneity. We chose to do a cost effective high-throughput mass-array genotyping and studied the candidate gene variations associated with T2D in literature. In this case-control candidate genes association study, 91 SNPs from 55 candidate genes have been analyzed in three geographically independent population groups from India. We report the genetic variants in five candidate genes: TCF7L2, HHEX, ENPP1, IDE and FTO, are significantly associated (after Bonferroni correction, p<5.5E-04) with T2D susceptibility in combined population. Interestingly, SNP rs7903146 of the TCF7L2 gene passed the genome wide significance threshold (combined P value = 2.05E-08) in the studied populations. We also observed the association of rs7903146 with blood glucose (fasting and postprandial) levels, supporting the role of TCF7L2 gene in blood glucose homeostasis. Further, we noted that the moderate risk provided by the independently associated loci in combined population with Odds Ratio (OR)<1.38 increased to OR = 2.44, (95%CI = 1.67-3.59) when the risk providing genotypes of TCF7L2, HHEX, ENPP1 and FTO genes were combined, suggesting the importance of gene-gene interactions evaluation in complex disorders like T2D.
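
    The Bonferroni threshold quoted in this abstract (p<5.5E-04) is consistent with dividing a family-wise significance level by the number of SNPs tested. The sketch below shows that arithmetic. The family-wise level of 0.05 and the use of 91 tests are assumptions inferred from the numbers in the abstract, not details the authors state, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: family-wise alpha of 0.05 and one test per genotyped SNP (91 SNPs).
threshold = bonferroni_threshold(0.05, 91)
print(f"{threshold:.2e}")  # prints 5.49e-04
```

    Rounded to two significant figures this gives 5.5E-04, the cutoff reported above.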

  5. Replication of Type 2 Diabetes Candidate Genes Variations in Three Geographically Unrelated Indian Population Groups

    PubMed Central

    Ali, Shafat; Chopra, Rupali; Manvati, Siddharth; Mahajan, Ankit; Sehajpal, Prabodh; Gupta, Subash; Dhar, Manoj K.; Chainy, Gagan B. N.; Bhanwer, Amarjit S.; Sharma, Swarkar; Bamezai, Rameshwar N. K.

    2013-01-01

    Type 2 diabetes (T2D) is a syndrome of multiple metabolic disorders and is genetically heterogeneous. India comprises one of the largest global populations with highest number of reported type 2 diabetes cases. However, limited information about T2D associated loci is available for Indian populations. It is, therefore, pertinent to evaluate the previously associated candidates as well as identify novel genetic variations in Indian populations to understand the extent of genetic heterogeneity. We chose to do a cost effective high-throughput mass-array genotyping and studied the candidate gene variations associated with T2D in literature. In this case-control candidate genes association study, 91 SNPs from 55 candidate genes have been analyzed in three geographically independent population groups from India. We report the genetic variants in five candidate genes: TCF7L2, HHEX, ENPP1, IDE and FTO, are significantly associated (after Bonferroni correction, p<5.5E−04) with T2D susceptibility in combined population. Interestingly, SNP rs7903146 of the TCF7L2 gene passed the genome wide significance threshold (combined P value = 2.05E−08) in the studied populations. We also observed the association of rs7903146 with blood glucose (fasting and postprandial) levels, supporting the role of TCF7L2 gene in blood glucose homeostasis. Further, we noted that the moderate risk provided by the independently associated loci in combined population with Odds Ratio (OR)<1.38 increased to OR = 2.44, (95%CI = 1.67–3.59) when the risk providing genotypes of TCF7L2, HHEX, ENPP1 and FTO genes were combined, suggesting the importance of gene-gene interactions evaluation in complex disorders like T2D. PMID:23527042

  6. Lung cancer signature biomarkers: tissue specific semantic similarity based clustering of digital differential display (DDD) data.

    PubMed

    Srivastava, Mousami; Khurana, Pankaj; Sugadev, Ragumani

    2012-11-02

    The tissue-specific Unigene Sets derived from more than one million expressed sequence tags (ESTs) in the NCBI GenBank database offer a platform for identifying significantly and differentially expressed tissue-specific genes by in-silico methods. Digital differential display (DDD) rapidly creates transcription profiles based on EST comparisons and numerically calculates, as a fraction of the pool of ESTs, the relative sequence abundance of known and novel genes. However, the process of identifying the most likely tissue for a specific disease in which to search for candidate genes from the pool of differentially expressed genes remains difficult. Therefore, we have used 'Gene Ontology semantic similarity score' to measure the GO similarity between gene products of lung tissue-specific candidate genes from control (normal) and disease (cancer) sets. This semantic similarity score matrix, based on hierarchical clustering, is represented in the form of a dendrogram. The dendrogram cluster stability was assessed by multiple bootstrapping. Multiple bootstrapping also computes a p-value for each cluster and corrects the bias of the bootstrap probability. Subsequent hierarchical clustering by the multiple bootstrapping method (α = 0.95) identified seven clusters. The comparative, as well as subtractive, approach revealed a set of 38 biomarkers comprising four distinct lung cancer signature biomarker clusters (panel 1-4). Further gene enrichment analysis of the four panels revealed that each panel represents a set of lung cancer linked metastasis diagnostic biomarkers (panel 1), chemotherapy/drug resistance biomarkers (panel 2), hypoxia regulated biomarkers (panel 3) and lung extra cellular matrix biomarkers (panel 4).
Expression analysis reveals that hypoxia induced lung cancer related biomarkers (panel 3), HIF and its modulating proteins (TGM2, CSNK1A1, CTNNA1, NAMPT/Visfatin, TNFRSF1A, ETS1, SRC-1, FN1, APLP2, DMBT1/SAG, AIB1 and AZIN1) are significantly down regulated. All down regulated genes in this panel were highly up regulated in most other types of cancers. These panels of proteins may represent signature biomarkers for lung cancer and will aid in lung cancer diagnosis and disease monitoring as well as in the prediction of responses to therapeutics.

  7. Use of homologous and heterologous gene expression profiling tools to characterize transcription dynamics during apple fruit maturation and ripening

    PubMed Central

    2010-01-01

    Background Fruit development, maturation and ripening consist of a complex series of biochemical and physiological changes that in climacteric fruits, including apple and tomato, are coordinated by the gaseous hormone ethylene. These changes lead to final fruit quality and understanding of the functional machinery underlying these processes is of both biological and practical importance. To date many reports have been made on the analysis of gene expression in apple. In this study we focused our investigation on the role of ethylene during apple maturation, specifically comparing transcriptomics of normal ripening with changes resulting from application of the hormone receptor competitor 1-Methylcyclopropene. Results To gain insight into the molecular process regulating ripening in apple, and to compare to tomato (model species for ripening studies), we utilized both homologous and heterologous (tomato) microarray to profile transcriptome dynamics of genes involved in fruit development and ripening, emphasizing those which are ethylene regulated. The use of both types of microarrays facilitated transcriptome comparison between apple and tomato (for the latter using data previously published and available at the TED: tomato expression database) and highlighted genes conserved during ripening of both species, which in turn represent a foundation for further comparative genomic studies. The cross-species analysis had the secondary aim of examining the efficiency of heterologous (specifically tomato) microarray hybridization for candidate gene identification as related to the ripening process. The resulting transcriptomics data revealed coordinated gene expression during fruit ripening of a subset of ripening-related and ethylene responsive genes, further facilitating the analysis of ethylene response during fruit maturation and ripening.
Conclusion Our combined strategy based on microarray hybridization enabled transcriptome characterization during normal climacteric apple ripening, as well as definition of ethylene-dependent transcriptome changes. Comparison with tomato fruit maturation and ethylene responsive transcriptome activity facilitated identification of putative conserved orthologous ripening-related genes, which serve as an initial set of candidates for assessing conservation of gene activity across genomes of fruit bearing plant species. PMID:20973957

  8. AmpuBase: a transcriptome database for eight species of apple snails (Gastropoda: Ampullariidae).

    PubMed

    Ip, Jack C H; Mu, Huawei; Chen, Qian; Sun, Jin; Ituarte, Santiago; Heras, Horacio; Van Bocxlaer, Bert; Ganmanee, Monthon; Huang, Xin; Qiu, Jian-Wen

    2018-03-05

    Gastropoda, with approximately 80,000 living species, is the largest class of Mollusca. Among gastropods, apple snails (family Ampullariidae) are globally distributed in tropical and subtropical freshwater ecosystems and many species are ecologically and economically important. Ampullariids exhibit various morphological and physiological adaptations to their respective habitats, which make them ideal candidates for studying adaptation, population divergence, speciation, and larger-scale patterns of diversity, including the biogeography of native and invasive populations. The limited availability of genomic data, however, hinders in-depth ecological and evolutionary studies of these non-model organisms. Using Illumina Hiseq platforms, we sequenced 1220 million reads for seven species of apple snails. Together with the previously published RNA-Seq data of two apple snails, we conducted de novo transcriptome assembly of eight species that belong to five genera of Ampullariidae, two of which represent Old World lineages and the other three New World lineages. There were 20,730 to 35,828 unigenes with predicted open reading frames for the eight species, with N50 (shortest sequence length at 50% of the unigenes) ranging from 1320 to 1803 bp. 69.7% to 80.2% of these unigenes were functionally annotated by searching against NCBI's non-redundant, Gene Ontology database and the Kyoto Encyclopaedia of Genes and Genomes. With these data we developed AmpuBase, a relational database that features online BLAST functionality for DNA/protein sequences, keyword searching for unigenes/functional terms, and download functions for sequences and whole transcriptomes. In summary, we have generated comprehensive transcriptome data for multiple ampullariid genera and species, and created a publicly accessible database with a user-friendly interface to facilitate future basic and applied studies on ampullariids, and comparative molecular studies with other invertebrates.
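
    The N50 statistic mentioned in this abstract can be illustrated with a short sketch. It assumes the common definition, namely the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length; the abstract's own wording is looser, so the authors' exact computation may differ. The example lengths are made up for illustration and are not AmpuBase data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; N50 is the length of
    the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")


# Hypothetical unigene lengths in bp, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

    Here the total is 1500 bp; the two longest sequences (500 + 400 = 900 bp) are the first to cover half of it, so the N50 is 400.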

  9. Actinobase: Database on molecular diversity, phylogeny and biocatalytic potential of salt tolerant alkaliphilic actinomycetes.

    PubMed

    Sharma, Amit K; Gohel, Sangeeta; Singh, Satya P

    2012-01-01

    Actinobase is a relational database of molecular diversity, phylogeny and biocatalytic potential of haloalkaliphilic actinomycetes. The main objective of this database is to provide easy access to a range of information, data storage, comparison and analysis, while reducing data redundancy and data entry, storage and retrieval costs and improving data security. Information related to habitat, cell morphology, Gram reaction, biochemical characterization and molecular features would help researchers understand the identification and stress adaptation of the existing and new candidates belonging to salt tolerant alkaliphilic actinomycetes. The PHP front end helps to add nucleotide and protein sequences of reported entries, which directly helps researchers to obtain the required details. Analysis of the genus wise status of the salt tolerant alkaliphilic actinomycetes indicated 6 different genera among the 40 classified entries of the salt tolerant alkaliphilic actinomycetes. The results represented widespread occurrence of salt tolerant alkaliphilic actinomycetes belonging to diverse taxonomic positions. Entries and information related to actinomycetes in the database are publicly accessible at http://www.actinobase.in. On clustalW/X multiple sequence alignment of the alkaline protease gene sequences, different clusters emerged among the groups. The narrow search and limit options of the constructed database provided comparable information. The user-friendly PHP front end facilitates the addition of sequences of reported entries. The database is available for free at http://www.actinobase.in.

  10. Epigenomic Elements Analyses for Promoters Identify ESRRG as a New Susceptibility Gene for Obesity-related Traits

    PubMed Central

    Dong, Shan-Shan; Guo, Yan; Zhu, Dong-Li; Chen, Xiao-Feng; Wu, Xiao-Ming; Shen, Hui; Chen, Xiang-Ding; Tan, Li-Jun; Tian, Qing; Deng, Hong-Wen; Yang, Tie-Lin

    2016-01-01

    OBJECTIVES With ENCODE epigenomic data and results from published genome-wide association studies (GWASs), we aimed to find regulatory signatures of obesity genes and discover novel susceptibility genes. METHODS Obesity genes were obtained from public GWASs databases and their promoters were annotated based on the regulatory elements information. Significantly enriched or depleted epigenomic elements in the promoters of obesity genes were evaluated and all human genes were then prioritized according to the existence of the selected elements to predict new candidate genes. Top ranked genes were subsequently applied to validate their associations with obesity-related traits in three independent in-house GWASs samples. RESULTS We identified RAD21 and EZH2 as over-represented, STAT2 and IRF3 as depleted transcription factors. Histone modification of H3K9me3 and chromatin state segmentation of “poised promoter” and “repressed” were overrepresented. All genes were prioritized and we selected the top five genes for validation at population level. In the combined results from the three GWASs samples, rs7522101 in ESRRG remained significantly associated with BMI after multiple testing corrections (P = 7.25 × 10^−5). It was also associated with β-cell function (P = 1.99 × 10^−3) and fasting glucose level (P < 0.05) in the meta-analyses of glucose and insulin-related traits consortium (MAGIC) dataset. CONCLUSIONS In summary, we identified epigenomic characteristics for obesity genes and suggested ESRRG as a novel obesity susceptibility gene. PMID:27113491

  11. PIGD: a database for intronless genes in the Poaceae.

    PubMed

    Yan, Hanwei; Jiang, Cuiping; Li, Xiaoyu; Sheng, Lei; Dong, Qing; Peng, Xiaojian; Li, Qian; Zhao, Yang; Jiang, Haiyang; Cheng, Beijiu

    2014-10-01

    Intronless genes are a feature of prokaryotes; however, they are widespread and unequally distributed among eukaryotes and represent an important resource to study the evolution of gene architecture. Although many databases on exons and introns exist, there is currently no cohesive database that collects intronless genes in plants into a single database. In this study, we present the Poaceae Intronless Genes Database (PIGD), a user-friendly web interface to explore information on intronless genes from different plants. Five Poaceae species, Sorghum bicolor, Zea mays, Setaria italica, Panicum virgatum and Brachypodium distachyon, are included in the current release of PIGD. Gene annotations and sequence data were collected and integrated from different databases. The primary focus of this study was to provide gene descriptions and gene product records. In addition, functional annotations, subcellular localization prediction and taxonomic distribution are reported. PIGD allows users to readily browse, search and download data. BLAST and comparative analyses are also provided through this online database, which is available at http://pigd.ahau.edu.cn/. PIGD provides a solid platform for the collection, integration and analysis of intronless genes in the Poaceae. As such, this database will be useful for subsequent bio-computational analysis in comparative genomics and evolutionary studies.

  12. Candidate Gene Identification of Feed Efficiency and Coat Color Traits in a C57BL/6J × Kunming F2 Mice Population Using Genome-Wide Association Study.

    PubMed

    Miao, Yuanxin; Soudy, Fathia; Xu, Zhong; Liao, Mingxing; Zhao, Shuhong; Li, Xinyun

    2017-01-01

    Feed efficiency (FE) is a very important trait in livestock industry. Identification of the candidate genes could be of benefit for the improvement of FE trait. Mouse is used as the model for many studies in mammals. In this study, the candidate genes related to FE and coat color were identified using C57BL/6J (C57) × Kunming (KM) F2 mouse population. GWAS results showed that 61 and 2 SNPs were genome-wise suggestive significantly associated with feed conversion ratio (FCR) and feed intake (FI) traits, respectively. Moreover, the Erbin, Msrb2, Ptf1a, and Fgf10 were considered as the candidate genes of FE. The Lpl was considered as the candidate gene of FI. Further, the coat color trait was studied. KM mice are white and C57 ones are black. The GWAS results showed that the most significant SNP was located at chromosome 7, and the closely linked gene was Tyr. Therefore, our study offered useful target genes related to FE in mice; these genes may play similar roles in FE of livestock. Also, we identified the major gene of coat color in mice, which would be useful for better understanding of natural mutation of the coat color in mice.

  13. Genetic susceptibility to renal scar formation after urinary tract infection: a systematic review and meta-analysis of candidate gene polymorphisms.

    PubMed

    Zaffanello, Marco; Tardivo, Stefano; Cataldi, Luigi; Fanos, Vassilios; Biban, Paolo; Malerba, Giovanni

    2011-07-01

    Identifying patients who may develop renal scarring after urinary tract infections (UTI) remains challenging, as clinical determinants explain only a portion of individual risk. An additional factor that likely affects risk is individual genetic variability. We searched for peer-reviewed articles from 1980 to December 2009 in electronic databases that reported results showing an association between gene polymorphisms and renal scarring after UTI. Two independent researchers screened articles using predetermined criteria. Studies were assessed for methodological quality using an aggregate scoring system. The 18 studies ultimately included in the review had investigated 16 polymorphisms in nine genes in association with renal scarring formation after UTI. Based on the predetermined criteria for assessing the quality of the studies, 12 studies (67%) were identified as being of poor quality design. A meta-analysis of cumulative studies showed an association between renal scarring formation after UTI and the angiotensin converting enzyme insertion/deletion polymorphism [ACE I/D; recessive model for D allele; odds ratio (OR) 1.73, 95% confidence interval (CI) 1.09-2.74, P = 0.02] or transforming growth factor (TGF)-β1 c.-509 T > C polymorphism (dominant model for T allele; OR 2.24, 95% CI 1.34-3.76, P = 0.002). However, heterogeneity among studies was large, indicating a strong difference that cannot only be explained by differences in study design. The studies reviewed in this article support a modest involvement of the vasomotor and inflammatory genes in the development of renal scarring after UTIs. This review also shows that only few possible candidate genes have been investigated for an association with renal scarring, raising the hypothesis that some gene polymorphisms may exert their effects through an interaction with as yet uninvestigated factors that may be related to geographic and/or socio-economic differences.

  14. LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes.

    PubMed

    Wang, Dapeng; Zhang, Yubin; Fan, Zhonghua; Liu, Guiming; Yu, Jun

    2012-01-01

    Animal genes of different lineages, such as vertebrates and arthropods, are well-organized and blended into dynamic chromosomal structures that represent a primary regulatory mechanism for body development and cellular differentiation. The majority of genes in a genome are actually clustered, which are evolutionarily stable to different extents and biologically meaningful when evaluated among genomes within and across lineages. Until now, many questions concerning gene organization, such as what is the minimal number of genes in a cluster and what is the driving force leading to gene co-regulation, remain to be addressed. Here, we provide a user-friendly database, LCGbase (a comprehensive database for lineage-based co-regulated genes), hosting information on evolutionary dynamics of gene clustering and ordering within animal kingdoms in two different lineages: vertebrates and arthropods. The database is constructed on a web-based Linux-Apache-MySQL-PHP framework and effective interactive user-inquiry service. Compared to other gene annotation databases with similar purposes, our database has three comprehensible advantages. First, our database is inclusive, including all high-quality genome assemblies of vertebrates and representative arthropod species. Second, it is human-centric since we map all gene clusters from other genomes in an order of lineage-ranks (such as primates, mammals, warm-blooded, and reptiles) onto human genome and start the database from well-defined gene pairs (a minimal cluster where the two adjacent genes are oriented as co-directional, convergent, and divergent pairs) to large gene clusters. Furthermore, users can search for any adjacent genes and their detailed annotations. Third, the database provides flexible parameter definitions, such as the distance of transcription start sites between two adjacent genes, which is extendable to genes flanking the cluster across species.
We also provide useful tools for sequence alignment, gene ontology (GO) annotation, promoter identification, gene expression (co-expression), and evolutionary analysis. This database not only provides a way to define lineage-specific and species-specific gene clusters but also facilitates future studies on gene co-regulation, epigenetic control of gene expression (DNA methylation and histone marks), and chromosomal structures in a context of gene clusters and species evolution. LCGbase is freely available at http://lcgbase.big.ac.cn/LCGbase.
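
    The three gene-pair orientations named in this abstract (co-directional, convergent and divergent) follow directly from the strands of two adjacent genes. The sketch below is an illustration of that classification only, not LCGbase's actual implementation; the function name and the "+"/"-" strand encoding are assumptions.

```python
def classify_pair(left_strand: str, right_strand: str) -> str:
    """Classify two adjacent genes by their relative orientation.

    `left_strand` belongs to the gene with the smaller chromosomal
    coordinate, `right_strand` to its neighbour. Strands are "+" or "-".
    """
    valid = ("+", "-")
    if left_strand not in valid or right_strand not in valid:
        raise ValueError("strands must be '+' or '-'")
    if left_strand == right_strand:
        # Both genes are transcribed in the same direction.
        return "co-directional"
    if left_strand == "+":
        # "+" followed by "-": transcription runs toward each other.
        return "convergent"
    # "-" followed by "+": transcription runs away from a shared region.
    return "divergent"


# Hypothetical adjacent pairs, for illustration only.
for pair in [("+", "+"), ("+", "-"), ("-", "+")]:
    print(pair, classify_pair(*pair))
```

    Divergent pairs are the ones whose transcription start sites face each other, which is why a start-site distance parameter is a natural filter for them.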

  15. Defining the role of the MADS-box gene, Zea agamous like1, in maize domestication

    USDA-ARS's Scientific Manuscript database

    Genomic scans for genes that show the signature of past selection have been widely applied to a number of species and have identified a large number of selection candidate genes. In cultivated maize (Zea mays ssp. mays) selection scans have identified several hundred candidate domestication genes...

  16. Genetic and Proteomic Interrogation of Lower Confidence Candidate Genes Reveals Signaling Networks in beta-Catenin-Active Cancers

    Cancer.gov

    Genome-scale expression studies and comprehensive loss-of-function genetic screens have focused almost exclusively on the highest confidence candidate genes. Here, we describe a strategy for characterizing the lower confidence candidates identified by such approaches.

  17. Bisphenol A-associated epigenomic changes in prepubescent girls: a cross-sectional study in Gharbiah, Egypt

    PubMed Central

    2013-01-01

    Background There is now compelling evidence that epigenetic modifications link adult disease susceptibility to environmental exposures during specific life stages, including pre-pubertal development. Animal studies indicate that bisphenol A (BPA), the monomer used in epoxy resins and polycarbonate plastics, may impact health through epigenetic mechanisms, and epidemiological data associate BPA levels with metabolic disorders, behavior changes, and reproductive effects. Thus, we conducted an environmental epidemiology study of BPA exposure and CpG methylation in pre-adolescent girls from Gharbiah, Egypt hypothesizing that methylation profiles exhibit exposure-dependent trends. Methods Urinary concentrations of total (free plus conjugated) species of BPA in spot samples were quantified for 60 girls aged 10 to 13. Genome-wide CpG methylation was concurrently measured in bisulfite-converted saliva DNA using the Infinium HumanMethylation27 BeadChip (N = 46). CpG sites from four candidate genes were validated via quantitative bisulfite pyrosequencing. Results CpG methylation varied widely among girls, and higher urinary BPA concentrations were generally associated with less genomic methylation. Based on pathway analyses, genes exhibiting reduced methylation with increasing urinary BPA were involved in immune function, transport activity, metabolism, and caspase activity. In particular, hypomethylation of CpG targets on chromosome X was associated with higher urinary BPA. Using the Comparative Toxicogenomics Database, we identified a number of candidate genes in our sample that previously have been associated with BPA-related expression change. Conclusions These data indicate that BPA may affect human health through specific epigenomic modification of genes in relevant pathways. Thus, epigenetic epidemiology holds promise for the identification of biomarkers from previous exposures and the development of epigenetic-based diagnostic strategies. PMID:23590724

  18. Novel Antigen Identification Method for Discovery of Protective Malaria Antigens by Rapid Testing of DNA Vaccines Encoding Exons from the Parasite Genome

    PubMed Central

    Haddad, Diana; Bilcikova, Erika; Witney, Adam A.; Carlton, Jane M.; White, Charles E.; Blair, Peter L.; Chattopadhyay, Rana; Russell, Joshua; Abot, Esteban; Charoenvit, Yupin; Aguiar, Joao C.; Carucci, Daniel J.; Weiss, Walter R.

    2004-01-01

    We describe a novel approach for identifying target antigens for preerythrocytic malaria vaccines. Our strategy is to rapidly test hundreds of DNA vaccines encoding exons from the Plasmodium yoelii yoelii genomic sequence. In this antigen identification method, we measure reduction in parasite burden in the liver after sporozoite challenge in mice. Orthologs of protective P. y. yoelii genes can then be identified in the genomic databases of Plasmodium falciparum and Plasmodium vivax and investigated as candidate antigens for a human vaccine. A pilot study to develop the antigen identification method approach used 192 P. y. yoelii exons from genes expressed during the sporozoite stage of the life cycle. A total of 182 (94%) exons were successfully cloned into a DNA immunization vector with the Gateway cloning technology. To assess immunization strategies, mice were vaccinated with 19 of the new DNA plasmids in addition to the well-characterized protective plasmid encoding P. y. yoelii circumsporozoite protein. Single plasmid immunization by gene gun identified a novel vaccine target antigen which decreased liver parasite burden by 95% and which has orthologs in P. vivax and P. knowlesi but not P. falciparum. Intramuscular injection of DNA plasmids produced a different pattern of protective responses from those seen with gene gun immunization. Intramuscular immunization with plasmid pools could reduce liver parasite burden in mice despite the fact that none of the plasmids was protective when given individually. We conclude that high-throughput cloning of exons into DNA vaccines and their screening is feasible and can rapidly identify new malaria vaccine candidate antigens. PMID:14977966

  19. Cryptosporidium hominis gene catalog: a resource for the selection of novel Cryptosporidium vaccine candidates

    PubMed Central

    Ifeonu, Olukemi O.; Simon, Raphael; Tennant, Sharon M.; Sheoran, Abhineet S.; Daly, Maria C.; Felix, Victor; Kissinger, Jessica C.; Widmer, Giovanni; Levine, Myron M.; Tzipori, Saul; Silva, Joana C.

    2016-01-01

    Human cryptosporidiosis, caused primarily by Cryptosporidium hominis and a subset of Cryptosporidium parvum, is a major cause of moderate-to-severe diarrhea in children under 5 years of age in developing countries and can lead to nutritional stunting and death. Cryptosporidiosis is particularly severe and potentially lethal in immunocompromised hosts. Biological and technical challenges have impeded traditional vaccinology approaches to identify novel targets for the development of vaccines against C. hominis, the predominant species associated with human disease. We deemed that the existence of genomic resources for multiple species in the genus, including a much-improved genome assembly and annotation for C. hominis, makes a reverse vaccinology approach feasible. To this end, we sought to generate a searchable online resource, termed C. hominis gene catalog, which registers all C. hominis genes and their properties relevant for the identification and prioritization of candidate vaccine antigens, including physical attributes, properties related to antigenic potential and expression data. Using bioinformatic approaches, we identified ∼400 C. hominis genes containing properties typical of surface-exposed antigens, such as predicted glycosylphosphatidylinositol (GPI)-anchor motifs, multiple transmembrane motifs and/or signal peptides targeting the encoded protein to the secretory pathway. This set can be narrowed further, e.g. by focusing on potential GPI-anchored proteins lacking homologs in the human genome, but with homologs in the other Cryptosporidium species for which genomic data are available, and with low amino acid polymorphism. Additional selection criteria related to recombinant expression and purification include minimizing predicted post-translation modifications and potential disulfide bonds. Forty proteins satisfying these criteria were selected from 3745 proteins in the updated C. hominis annotation. 
The immunogenic potential of a few of these is currently being tested. Database URL: http://cryptogc.igs.umaryland.edu PMID:28095366

  20. Biallelic missense variants in ZBTB11 can cause intellectual disability in human.

    PubMed

    Fattahi, Zohreh; Sheikh, Taimoor I; Musante, Luciana; Rasheed, Memoona; Taskiran, Ibrahim Ihsan; Harripaul, Ricardo; Hu, Hao; Kazeminasab, Somayeh; Alam, Muhammad Rizwan; Hosseini, Masoumeh; Larti, Farzaneh; Ghaderi, Zhila; Celik, Arzu; Ayub, Muhammad; Ansar, Muhammad; Haddadi, Mohammad; Wienker, Thomas F; Ropers, Hans Hilger; Kahrizi, Kimia; Vincent, John B; Najmabadi, H

    2018-06-08

    Exploring genes and pathways underlying Intellectual Disability (ID) provides insight into brain development and function, clarifying the complex puzzle of how cognition develops. As part of ongoing systematic studies to identify candidate ID genes, linkage analysis and next generation sequencing revealed ZBTB11 as a novel candidate ID gene. ZBTB11 encodes a less-studied transcription regulator and the two identified missense variants in this study may disrupt canonical Zn2+-binding residues of its C2H2 zinc finger domain, leading to possible altered DNA binding. Using HEK293T cells transfected with wild type and mutant GFP-ZBTB11 constructs, we found that the ZBTB11 mutants were excluded from the nucleolus, where the wild-type recombinant protein is predominantly localized. Pathway analysis applied to ChIP-seq data deposited in the ENCODE database supports the localization of ZBTB11 in nucleoli, highlighting associated pathways such as rRNA synthesis, ribosomal assembly, RNA modification and stress sensing, and provides a direct link between subcellular ZBTB11 location and its function. Furthermore, considering the report of prominent brain and spinal cord degeneration in a zebrafish Zbtb11 mutant, we investigated ZBTB11-ortholog knockdown in Drosophila melanogaster brain by targeting RNAi using the UAS/Gal4 system. The observed approximate reduction to a third of the mushroom body size - possibly through neuronal reduction or degeneration - may affect neuronal circuits in the brain that are required for adaptive behavior, specifying the role of this gene in the nervous system. In conclusion, we report two ID families segregating ZBTB11 biallelic mutations disrupting Zn2+-binding motifs, and provide functional evidence linking ZBTB11 dysfunction to this phenotype.

  1. Weighted gene co-expression network analysis of colorectal cancer liver metastasis genome sequencing data and screening of anti-metastasis drugs.

    PubMed

    Gao, Bo; Shao, Qin; Choudhry, Hani; Marcus, Victoria; Dong, Kung; Ragoussis, Jiannis; Gao, Zu-Hua

    2016-09-01

    Approximately 9% of cancer-related deaths are caused by colorectal cancer (CRC). CRC patients are prone to liver metastasis, which is the most important cause for the high CRC mortality rate. Understanding the molecular mechanism of CRC liver metastasis could help us to find novel targets for the effective treatment of this deadly disease. Using weighted gene co-expression network analysis on the sequencing data of CRC with and without metastasis, we identified 5 colorectal cancer liver metastasis related modules which were labeled as brown, blue, grey, yellow and turquoise. In the brown module, which represents the metastatic tumor in the liver, gene ontology (GO) analysis revealed functions including the G-protein coupled receptor protein signaling pathway, epithelial cell differentiation and cell surface receptor linked signal transduction. In the blue module, which represents the primary CRC that has metastasized, GO analysis showed that the genes were mainly enriched in GO terms including G-protein coupled receptor protein signaling pathway, cell surface receptor linked signal transduction, and negative regulation of cell differentiation. In the yellow and turquoise modules, which represent the primary non-metastatic CRC, 13 downregulated CRC liver metastasis-related candidate miRNAs were identified (e.g. hsa-miR-204, hsa-miR-455, etc.). Furthermore, analyzing the DrugBank database and mining the literature identified 25 and 12 candidate drugs that could potentially block the metastatic processes of the primary tumor and inhibit the progression of metastatic tumors in the liver, respectively. Data generated from this study not only further our understanding of the genetic alterations that drive the metastatic process, but also guide the development of molecular-targeted therapy of colorectal cancer liver metastasis.

  2. Phenotypic evaluation and genetic dissection of resistance to Phytophthora sojae in the Chinese soybean mini core collection.

    PubMed

    Huang, Jing; Guo, Na; Li, Yinghui; Sun, Jutao; Hu, Guanjun; Zhang, Haipeng; Li, Yanfei; Zhang, Xing; Zhao, Jinming; Xing, Han; Qiu, Lijuan

    2016-06-18

    Phytophthora root and stem rot (PRR) caused by Phytophthora sojae is one of the most serious diseases affecting soybean (Glycine max (L.) Merr.) production all over the world. The most economical and environmentally-friendly way to control the disease is the exploration and utilization of resistant varieties. We screened a soybean mini core collection composed of 224 germplasm accessions for resistance against eleven P. sojae isolates. Soybean accessions from the Southern and Huanghuai regions, especially the Hubei, Jiangsu, Sichuan and Fujian provinces, had the most varied and broadest spectrum of resistance. Based on gene postulation, Rps1b, Rps1c, Rps4, Rps7 and novel resistance genes were identified in resistant accessions. Consequently, association mapping of resistance to each isolate was performed with 1,645 single nucleotide polymorphism (SNP) markers. A total of 14 marker-trait associations for Phytophthora resistance were identified. Among them, four were located in known PRR resistance loci intervals, five were located in other disease resistance quantitative trait locus (QTL) regions, and five associations unmasked novel loci for PRR resistance. In addition, we also identified candidate genes related to resistance. This is the first P. sojae resistance evaluation conducted using the Chinese soybean mini core collection, which is a representative sample of Chinese soybean cultivars. The resistance reaction analyses provided an excellent database of resistant resources and genetic variations for future breeding programs. The SNP markers associated with resistance will facilitate marker-assisted selection (MAS) in breeding programs for resistance to PRR, and the candidate genes may be useful for exploring the mechanism underlying P. sojae resistance.

  3. Revealing the Bacterial Butyrate Synthesis Pathways by Analyzing (Meta)genomic Data

    PubMed Central

    Vital, Marius; Howe, Adina Chuang

    2014-01-01

    ABSTRACT Butyrate-producing bacteria have recently gained attention, since they are important for a healthy colon and when altered contribute to emerging diseases, such as ulcerative colitis and type II diabetes. This guild is polyphyletic and cannot be accurately detected by 16S rRNA gene sequencing. Consequently, approaches targeting the terminal genes of the main butyrate-producing pathway have been developed. However, since additional pathways exist and alternative, newly recognized enzymes catalyzing the terminal reaction have been described, previous investigations are often incomplete. We undertook a broad analysis of butyrate-producing pathways and individual genes by screening 3,184 sequenced bacterial genomes from the Integrated Microbial Genome database. Genomes of 225 bacteria with a potential to produce butyrate were identified, including many previously unknown candidates. The majority of candidates belong to distinct families within the Firmicutes, but members of nine other phyla, especially from Actinobacteria, Bacteroidetes, Fusobacteria, Proteobacteria, Spirochaetes, and Thermotogae, were also identified as potential butyrate producers. The established gene catalogue (3,055 entries) was used to screen for butyrate synthesis pathways in 15 metagenomes derived from stool samples of healthy individuals provided by the HMP (Human Microbiome Project) consortium. A high percentage of total genomes exhibited a butyrate-producing pathway (mean, 19.1%; range, 3.2% to 39.4%), where the acetyl-coenzyme A (CoA) pathway was the most prevalent (mean, 79.7% of all pathways), followed by the lysine pathway (mean, 11.2%). Diversity analysis for the acetyl-CoA pathway showed that the same few firmicute groups associated with several Lachnospiraceae and Ruminococcaceae were dominating in most individuals, whereas the other pathways were associated primarily with Bacteroidetes. PMID:24757212

  4. A candidate gene study in low HDL-cholesterol families provides evidence for the involvement of the APOA2 gene and the APOA1C3A4 gene cluster.

    PubMed

    Lilja, Heidi E; Soro, Aino; Ylitalo, Kati; Nuotio, Ilpo; Viikari, Jorma S A; Salomaa, Veikko; Vartiainen, Erkki; Taskinen, Marja-Riitta; Peltonen, Leena; Pajukanta, Päivi

    2002-09-01

    In patients with premature coronary heart disease, the most common lipoprotein abnormality is high-density lipoprotein (HDL) deficiency. To assess the genetic background of the low HDL-cholesterol trait, we performed a candidate gene study in 25 families with low HDL, collected from the genetically isolated population of Finland. We studied 21 genes encoding essential proteins involved in the HDL metabolism by genotyping intragenic and flanking markers for these genes. We found suggestive evidence for linkage in two candidate regions: Marker D1S2844, in the apolipoprotein A-II (APOA2) region, yielded a LOD score of 2.14 and marker D11S939 flanking the apolipoprotein A-I/C-III/A-IV gene cluster (APOA1C3A4) produced a LOD score of 1.69. Interestingly, we identified potential shared haplotypes in these two regions in a subset of low HDL families. These families also contributed to the obtained positive LOD scores, whereas the rest of the families produced negative LOD scores. None of the remaining candidate regions provided any evidence for linkage. Since only a limited number of loci were tested in this candidate gene study, these LOD scores suggest significant involvement of the APOA2 gene and the APOA1C3A4 gene cluster, or loci in their immediate vicinity, in the pathogenesis of low HDL.
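
    As a reading aid for the LOD scores quoted above: a LOD score is the base-10 logarithm of the odds in favour of linkage, so it can be converted back to an odds ratio directly. The sketch below is illustrative only; the helper name `lod_to_odds` is hypothetical and is not taken from the study, and the threshold of 3 is the conventional benchmark for significant linkage rather than a value stated in the abstract.

```python
# Minimal sketch: convert LOD scores to odds ratios (odds = 10 ** LOD).
# `lod_to_odds` is a hypothetical helper name, not part of the cited study.

CONVENTIONAL_SIGNIFICANCE_LOD = 3.0  # classical benchmark; an assumption, not from the abstract


def lod_to_odds(lod: float) -> float:
    """Return the odds in favour of linkage implied by a LOD score."""
    return 10.0 ** lod


# Marker names and LOD values as reported in the abstract.
reported = {"D1S2844 (APOA2 region)": 2.14, "D11S939 (APOA1C3A4 cluster)": 1.69}

for marker, lod in reported.items():
    label = "significant" if lod >= CONVENTIONAL_SIGNIFICANCE_LOD else "suggestive"
    print(f"{marker}: LOD {lod:.2f}, odds about {lod_to_odds(lod):.0f}:1 ({label})")
```

    Both reported scores fall below the conventional benchmark, which is consistent with the abstract describing the evidence as suggestive.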

  5. Combining mouse mammary gland gene expression and comparative mapping for the identification of candidate genes for QTL of milk production traits in cattle

    PubMed Central

    Ron, Micha; Israeli, Galit; Seroussi, Eyal; Weller, Joel I; Gregg, Jeffrey P; Shani, Moshe; Medrano, Juan F

    2007-01-01

    Background Many studies have found segregating quantitative trait loci (QTL) for milk production traits in different dairy cattle populations. However, even for relatively large effects with a saturated marker map the confidence interval for QTL location by linkage analysis spans tens of map units, or hundreds of genes. Combining mapping and arraying has been suggested as an approach to identify candidate genes. Thus, gene expression analysis in the mammary gland of genes positioned in the confidence interval of the QTL can bridge the gap between fine mapping and quantitative trait nucleotide (QTN) determination. Results We hybridized Affymetrix microarray (MG-U74v2), containing 12,488 murine probes, with RNA derived from mammary gland of virgin, pregnant, lactating and involuting C57BL/6J mice in a total of nine biological replicates. We combined microarray data from two additional studies that used the same design in mice with a total of 75 biological replicates. The same filtering and normalization were applied to each microarray dataset using GeneSpring software. Analysis of variance identified 249 differentially expressed probe sets common to the three experiments along the four developmental stages of puberty, pregnancy, lactation and involution. 212 genes were assigned to their bovine map positions through comparative mapping, and thus form a list of candidate genes for previously identified QTLs for milk production traits. A total of 82 of the genes showed mammary gland-specific expression with at least 3-fold expression over the median representing all tissues tested in GeneAtlas. Conclusion This work presents a web tool for candidate genes for QTL (cgQTL) that allows navigation between the map of bovine milk production QTL, potential candidate genes and their level of expression in mammary gland arrays and in GeneAtlas. Three out of four confirmed genes that affect QTL in livestock (ABCG2, DGAT1, GDF8, IGF2) were overexpressed in the target organ. Thus, cgQTL can be used to determine priority of candidate genes for QTN analysis based on differential expression in the target organ. PMID:17584498

  6. [Establishment of a comprehensive database for laryngeal cancer related genes and the miRNAs].

    PubMed

    Li, Mengjiao; E, Qimin; Liu, Jialin; Huang, Tingting; Liang, Chuanyu

    2015-09-01

    To make research and teaching more convenient and efficient, we collected and analyzed laryngeal cancer-related genes and miRNAs and built a comprehensive laryngeal cancer-related gene database. Unlike existing biological information databases, which have complex and unwieldy structures, this database focuses specifically on genes and miRNAs. It was built on a browser/server (B/S) architecture, using Apache as the web server, MySQL for the database design and PHP for the web design, and it provides gene tables, protein tables, miRNA tables and clinical information tables for patients with laryngeal cancer. The established database contains 207 laryngeal cancer-related genes, 243 proteins and 26 miRNAs, together with detailed information such as mutations, methylation, differential expression and the literature references for molecules relevant to laryngeal cancer. The database can be accessed and operated via the Internet for browsing and retrieval of information, and it is maintained and updated regularly. The database for laryngeal cancer-related genes is resource-integrated and user-friendly, providing a genetic information query tool for the study of laryngeal cancer.

  7. Breast Tumors with Elevated Expression of 1q Candidate Genes Confer Poor Clinical Outcome and Sensitivity to Ras/PI3K Inhibition

    PubMed Central

    Viveka Thangaraj, Soundara; Periasamy, Jayaprakash; Bhaskar Rao, Divya; Barnabas, Georgina D.; Raghavan, Swetha; Ganesan, Kumaresan

    2013-01-01

    Genomic aberrations are common in cancers and the long arm of chromosome 1 is known for its frequent amplifications in breast cancer. However, the key candidate genes of 1q and their contribution to breast cancer pathogenesis remain unexplored. We have analyzed the gene expression profiles of 1635 breast tumor samples using a meta-analysis-based approach and identified clinically significant candidates from chromosome 1q. Seven candidate genes including exonuclease 1 (EXO1) are consistently overexpressed in breast tumors, specifically in high grade and aggressive breast tumors with poor clinical outcome. We derived an EXO1 co-expression module from the mRNA profiles of breast tumors which comprises 1q candidate genes and their co-expressed genes. By integrative functional genomics investigation, we identified the involvement of EGFR, RAS, PI3K/AKT, MYC and E2F signaling in the regulation of these selected 1q genes in breast tumors and breast cancer cell lines. Expression of the EXO1 module was found to be indicative of elevated cell proliferation, genomic instability, activated RAS/AKT/MYC/E2F1 signaling pathways and loss of p53 activity in breast tumors. mRNA–drug connectivity analysis indicates inhibition of RAS/PI3K as a possible targeted therapeutic approach for patients with an activated EXO1 module in breast tumors. Thus, we identified seven 1q candidate genes strongly associated with the poor survival of breast cancer patients and identified the possibility of targeting them with EGFR/RAS/PI3K inhibitors. PMID:24147022

  8. Exploration of ToxCast/Tox21 bioassays as candidate bioanalytical tools for measuring groups of chemicals in water.

    PubMed

    Louisse, Jochem; Dingemans, Milou M L; Baken, Kirsten A; van Wezel, Annemarie P; Schriks, Merijn

    2018-06-14

    The present study explores the ToxCast/Tox21 database to select candidate bioassays as bioanalytical tools for measuring groups of chemicals in water. To this aim, the ToxCast/Tox21 database was explored for bioassays that detect polycyclic aromatic hydrocarbons (PAHs), aromatic amines (AAs), (chloro)phenols ((C)Ps) and halogenated aliphatic hydrocarbons (HAliHs), which are included in the European and/or Dutch Drinking Water Directives. Based on the analysis of the availability and performance of bioassays included in the database, we concluded that several bioassays are suitable as bioanalytical tools for assessing the presence of PAHs and (C)Ps in drinking water sources. No bioassays were identified for AAs and HAliHs, due to the limited activity of these chemicals and/or the limited amount of data on these chemicals in the database. A series of bioassays was selected that measure molecular or cellular effects that are covered by bioassays currently in use for chemical water quality monitoring. Interestingly, also bioassays were selected that represent molecular or cellular effects that are not covered by bioassays currently applied. The usefulness of these newly identified bioassays as bioanalytical tools should be further evaluated in follow-up studies. Altogether, this study shows how exploration of the ToxCast/Tox21 database provides a series of candidate bioassays as bioanalytical tools for measuring groups of chemicals in water. This assessment can be performed for any group of chemicals of interest (if represented in the database), and may provide candidate bioassays that can be used to complement the currently applied bioassays for chemical water quality assessment. Copyright © 2018. Published by Elsevier Ltd.

  9. Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.

    PubMed

    Adamo, Mark E; Gerber, Scott A

    2016-09-07

    MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  10. Identification of genes from the Treacher Collins candidate region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dixon, M.; Dixon, J.; Edwards, S.

    Treacher Collins syndrome (TCOF1) is an autosomal dominant disorder of craniofacial development. The TCOF1 locus has previously been mapped to chromosome 5q32-33. The candidate gene region has been defined as being between two flanking markers, ribosomal protein S14 (RPS14) and Annexin 6 (ANX6), by analyzing recombination events in affected individuals. It is estimated that the distance between these flanking markers is 500 kb by three separate analysis methods: (1) radiation hybrid mapping; (2) genetic linkage; and (3) YAC contig analysis. A cosmid contig which spans the candidate gene region for TCOF1 has been constructed by screening the Los Alamos National Laboratory flow-sorted chromosome 5 cosmid library. Cosmids were obtained by using a combination of probes generated from YAC end clones, Alu-PCR fragments from YACs, and asymmetric PCR fragments from both T7 and T3 cosmid ends. Exon amplification, the selection of genomic coding sequences based upon the presence of functional splice acceptor and donor sites, was used to identify potential exon sequences. Sequences found to be conserved between species were then used to screen cDNA libraries in order to identify candidate genes. To date, four different cDNAs have been isolated from this region and are being analyzed as potential candidate genes for TCOF1. These include the genes encoding plasma glutathione peroxidase (GPX3), heparin sulfate sulfotransferase (HSST), a gene with homology to the ETS family of proteins and one which shows no homology to any known genes. Work is also in progress to identify and characterize additional cDNAs from the candidate gene region.

  11. Mapping a candidate gene (MdMYB10) for red flesh and foliage colour in apple

    PubMed Central

    Chagné, David; Carlisle, Charmaine M; Blond, Céline; Volz, Richard K; Whitworth, Claire J; Oraguzie, Nnadozie C; Crowhurst, Ross N; Allan, Andrew C; Espley, Richard V; Hellens, Roger P; Gardiner, Susan E

    2007-01-01

    Background Integrating plant genomics and classical breeding is a challenge for both plant breeders and molecular biologists. Marker-assisted selection (MAS) is a tool that can be used to accelerate the development of novel apple varieties such as cultivars that have fruit with anthocyanin through to the core. In addition, determining the inheritance of novel alleles, such as the one responsible for red flesh, adds to our understanding of allelic variation. Our goal was to map candidate anthocyanin biosynthetic and regulatory genes in a population segregating for the red flesh phenotypes. Results We have identified the Rni locus, a major genetic determinant of the red foliage and red colour in the core of apple fruit. In a population segregating for the red flesh and foliage phenotype we have determined the inheritance of the Rni locus and DNA polymorphisms of candidate anthocyanin biosynthetic and regulatory genes. Simple Sequence Repeats (SSRs) and Single Nucleotide Polymorphisms (SNPs) in the candidate genes were also located on an apple genetic map. We have shown that the MdMYB10 gene co-segregates with the Rni locus and is on Linkage Group (LG) 09 of the apple genome. Conclusion We have performed candidate gene mapping in a fruit tree crop and have provided genetic evidence that red colouration in the fruit core as well as red foliage are both controlled by a single locus named Rni. We have shown that the transcription factor MdMYB10 may be the gene underlying Rni as there were no recombinants between the marker for this gene and the red phenotype in a population of 516 individuals. Associating markers derived from candidate genes with a desirable phenotypic trait has demonstrated the application of genomic tools in a breeding programme of a horticultural crop species. PMID:17608951

  12. Antennal transcriptome analysis of the piercing moth Oraesia emarginata (Lepidoptera: Noctuidae)

    PubMed Central

    Feng, Bo; Guo, Qianshuang; Zheng, Kaidi; Qin, Yuanxia; Du, Yongjun

    2017-01-01

    The piercing fruit moth Oraesia emarginata is an economically significant pest; however, our understanding of its olfactory mechanisms in infestation is limited. The present study conducted antennal transcriptome analysis of olfactory genes using real-time quantitative reverse transcription PCR analysis (RT-qPCR). We identified a total of 104 candidate chemosensory genes from several gene families, including 35 olfactory receptors (ORs), 41 odorant-binding proteins (OBPs), 20 chemosensory proteins, 6 ionotropic receptors, and 2 sensory neuron membrane proteins. Seven candidate pheromone receptors (PRs) and 3 candidate pheromone-binding proteins (PBPs) for sex pheromone recognition were found. OemaOR29 and OemaPBP1 had the highest fragments per kb per million fragments (FPKM) values among all ORs and OBPs, respectively. Eighteen olfactory genes were upregulated in females, including 5 candidate PRs, and 20 olfactory genes were upregulated in males, including 2 candidate PRs (OemaOR29 and 4) and 2 PBPs (OemaPBP1 and 3). These genes may have roles in mediating sex-specific behaviors. Most candidate olfactory genes of sex pheromone recognition (except OemaOR29 and OemaPBP3) in O. emarginata were not clustered with those of studied noctuid species (type I pheromone). In addition, OemaOR29 belonged to cluster PRIII, which comprises proteins that recognize type II pheromones instead of type I pheromones. The structure and function of olfactory genes that encode sex pheromones in O. emarginata might thus differ from those of other studied noctuids. The findings of the present study may help explain the molecular mechanism underlying olfaction and the evolution of olfactory genes encoding sex pheromones in O. emarginata. PMID:28614384
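
    The FPKM values mentioned above normalize a raw fragment count by both transcript length and sequencing depth. A minimal sketch of the standard formula follows; the function name and the example numbers are hypothetical and are not data from the study.

```python
# Minimal sketch of the standard FPKM normalization:
#   FPKM = fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
# The function name and the example inputs are illustrative assumptions.


def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical gene: 100 fragments on a 1,000 bp transcript in a library of 1,000,000 fragments.
example = fpkm(fragments=100, transcript_length_bp=1_000, total_mapped_fragments=1_000_000)
print(f"FPKM = {example:.1f}")
```

    Because the count is divided by transcript length, a long and a short gene with the same raw count receive different FPKM values, which is what makes the per-family comparison of ORs and OBPs meaningful.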

  13. Elevated transcription factor specificity protein 1 in autistic brains alters the expression of autism candidate genes.

    PubMed

    Thanseem, Ismail; Anitha, Ayyappan; Nakamura, Kazuhiko; Suda, Shiro; Iwata, Keiko; Matsuzaki, Hideo; Ohtsubo, Masafumi; Ueki, Takatoshi; Katayama, Taiichi; Iwata, Yasuhide; Suzuki, Katsuaki; Minoshima, Shinsei; Mori, Norio

    2012-03-01

    Profound changes in gene expression can result from abnormalities in the concentrations of sequence-specific transcription factors like specificity protein 1 (Sp1). Specificity protein 1 binding sites have been reported in the promoter regions of several genes implicated in autism. We hypothesize that dysfunction of Sp1 could affect the expression of multiple autism candidate genes, contributing to the heterogeneity of autism. We assessed any alterations in the expression of Sp1 and that of autism candidate genes in the postmortem brain (anterior cingulate gyrus [ACG], motor cortex, and thalamus) of autism patients (n = 8) compared with healthy control subjects (n = 13). Alterations in the expression of candidate genes upon Sp1/DNA binding inhibition with mithramycin and Sp1 silencing by RNAi were studied in SK-N-SH neuronal cells. We observed elevated expression of Sp1 in ACG of autism patients (p = .010). We also observed altered expression of several autism candidate genes. GABRB3, RELN, and HTR2A showed reduced expression, whereas CD38, ITGB3, MAOA, MECP2, OXTR, and PTEN showed elevated expression in autism. In SK-N-SH cells, OXTR, PTEN, and RELN showed reduced expression upon Sp1/DNA binding inhibition and Sp1 silencing. The RNA integrity number was not available for any of the samples. Transcription factor Sp1 is dysfunctional in the ACG of autistic brain. Consequently, the expression of potential autism candidate genes regulated by Sp1, especially OXTR and PTEN, could be affected. The diverse downstream pathways mediated by the Sp1-regulated genes, along with the environmental and intracellular signal-related regulation of Sp1, could explain the complex phenotypes associated with autism.

  14. EBF factors drive expression of multiple classes of target genes governing neuronal development.

    PubMed

    Green, Yangsook S; Vetter, Monica L

    2011-04-30

    Early B cell factor (EBF) family members are transcription factors known to have important roles in several aspects of vertebrate neurogenesis, including commitment, migration and differentiation. Knowledge of how EBF family members contribute to neurogenesis is limited by a lack of detailed understanding of genes that are transcriptionally regulated by these factors. We performed a microarray screen in Xenopus animal caps to search for targets of EBF transcriptional activity, and identified candidate targets with multiple roles, including transcription factors of several classes. We determined that, among the most upregulated candidate genes with expected neuronal functions, most require EBF activity for some or all of their expression, and most have overlapping expression with ebf genes. We also found that the candidate target genes that had the most strongly overlapping expression patterns with ebf genes were predicted to be direct transcriptional targets of EBF transcriptional activity. The identification of candidate targets that are transcription factor genes, including nscl-1, emx1 and aml1, improves our understanding of how EBF proteins participate in the hierarchy of transcription control during neuronal development, and suggests novel mechanisms by which EBF activity promotes migration and differentiation. Other candidate targets, including pcdh8 and kcnk5, expand our knowledge of the types of terminal differentiated neuronal functions that EBF proteins regulate.

  15. Development of New Candidate Gene and EST-Based Molecular Markers for Gossypium Species

    PubMed Central

    Buyyarapu, Ramesh; Kantety, Ramesh V.; Yu, John Z.; Saha, Sukumar; Sharma, Govind C.

    2011-01-01

    New sources of molecular markers accelerate the efforts in improving cotton fiber traits and aid in developing high-density integrated genetic maps. We developed new markers based on candidate genes and G. arboreum EST sequences that were used for polymorphism detection followed by genetic and physical mapping. Nineteen gene-based markers were surveyed for polymorphism detection in 26 Gossypium species. Cluster analysis generated a phylogenetic tree with four major sub-clusters for 23 species while three species branched out individually. The CAP method enhanced the rate of polymorphism of candidate gene-based markers between G. hirsutum and G. barbadense. Two hundred A-genome based SSR markers were designed after datamining of G. arboreum EST sequences (Mississippi Gossypium arboreum EST-SSR: MGAES). Over 70% of MGAES markers successfully produced amplicons while 65 of them demonstrated polymorphism between the parents of the G. hirsutum and G. barbadense RIL population and formed 14 linkage groups. Chromosomal localization of both candidate gene-based and MGAES markers was assisted by euploid and hypoaneuploid CS-B analysis. Gene-based and MGAES markers were highly informative as they were designed from candidate genes and fiber transcriptome with a potential to be integrated into the existing cotton genetic and physical maps. PMID:22315588

  16. Identification of candidate transmission-blocking antigen genes in Theileria annulata and related vector-borne apicomplexan parasites.

    PubMed

    Lempereur, Laetitia; Larcombe, Stephen D; Durrani, Zeeshan; Karagenc, Tulin; Bilgic, Huseyin Bilgin; Bakirci, Serkan; Hacilarlioglu, Selin; Kinnaird, Jane; Thompson, Joanne; Weir, William; Shiels, Brian

    2017-06-05

    Vector-borne apicomplexan parasites are a major cause of mortality and morbidity to humans and livestock globally. The most important disease syndromes caused by these parasites are malaria, babesiosis and theileriosis. Strategies for control often target parasite stages in the mammalian host that cause disease, but this can result in reservoir infections that promote pathogen transmission and generate economic loss. Optimal control strategies should protect against clinical disease, block transmission and be applicable across related genera of parasites. We have used bioinformatics and transcriptomics to screen for transmission-blocking candidate antigens in the tick-borne apicomplexan parasite, Theileria annulata. A number of candidate antigen genes were identified which encoded amino acid domains that are conserved across vector-borne Apicomplexa (Babesia, Plasmodium and Theileria), including the Pfs48/45 6-cys domain and a novel cysteine-rich domain. Expression profiling confirmed that selected candidate genes are expressed by life cycle stages within infected ticks. Additionally, putative B cell epitopes were identified in the T. annulata gene sequences encoding the 6-cys and cysteine rich domains, in a gene encoding a putative papain-family cysteine peptidase, with similarity to the Plasmodium SERA family, and the gene encoding the T. annulata major merozoite/piroplasm surface antigen, Tams1. Candidate genes were identified that encode proteins with similarity to known transmission blocking candidates in related parasites, while one is a novel candidate conserved across vector-borne apicomplexans and has a potential role in the sexual phase of the life cycle. The results indicate that a 'One Health' approach could be utilised to develop a transmission-blocking strategy effective against vector-borne apicomplexan parasites of animals and humans.

  17. Sleeping Beauty transposon mutagenesis identifies genes that cooperate with mutant Smad4 in gastric cancer development

    PubMed Central

    Takeda, Haruna; Rust, Alistair G.; Ward, Jerrold M.; Yew, Christopher Chin Kuan; Jenkins, Nancy A.; Copeland, Neal G.

    2016-01-01

    Mutations in SMAD4 predispose to the development of gastrointestinal cancer, which is the third leading cause of cancer-related deaths. To identify genes driving gastric cancer (GC) development, we performed a Sleeping Beauty (SB) transposon mutagenesis screen in the stomach of Smad4+/− mutant mice. This screen identified 59 candidate GC trunk drivers and a much larger number of candidate GC progression genes. Strikingly, 22 SB-identified trunk drivers are known or candidate cancer genes, whereas four SB-identified trunk drivers, including PTEN, SMAD4, RNF43, and NF1, are known human GC trunk drivers. Similar to human GC, pathway analyses identified WNT, TGF-β, and PI3K-PTEN signaling, ubiquitin-mediated proteolysis, adherens junctions, and RNA degradation in addition to genes involved in chromatin modification and organization as highly deregulated pathways in GC. Comparative oncogenomic filtering of the complete list of SB-identified genes showed that they are highly enriched for genes mutated in human GC and identified many candidate human GC genes. Finally, by comparing our complete list of SB-identified genes against the list of mutated genes identified in five large-scale human GC sequencing studies, we identified LDL receptor-related protein 1B (LRP1B) as a previously unidentified human candidate GC tumor suppressor gene. In LRP1B, 129 mutations were found in 462 human GC samples sequenced, and LRP1B is one of the top 10 most deleted genes identified in a panel of 3,312 human cancers. SB mutagenesis has, thus, helped to catalog the cooperative molecular mechanisms driving SMAD4-induced GC growth and discover genes with potential clinical importance in human GC. PMID:27006499

  18. Sleeping Beauty transposon mutagenesis identifies genes that cooperate with mutant Smad4 in gastric cancer development.

    PubMed

    Takeda, Haruna; Rust, Alistair G; Ward, Jerrold M; Yew, Christopher Chin Kuan; Jenkins, Nancy A; Copeland, Neal G

    2016-04-05

    Mutations in SMAD4 predispose to the development of gastrointestinal cancer, which is the third leading cause of cancer-related deaths. To identify genes driving gastric cancer (GC) development, we performed a Sleeping Beauty (SB) transposon mutagenesis screen in the stomach of Smad4(+/-) mutant mice. This screen identified 59 candidate GC trunk drivers and a much larger number of candidate GC progression genes. Strikingly, 22 SB-identified trunk drivers are known or candidate cancer genes, whereas four SB-identified trunk drivers, including PTEN, SMAD4, RNF43, and NF1, are known human GC trunk drivers. Similar to human GC, pathway analyses identified WNT, TGF-β, and PI3K-PTEN signaling, ubiquitin-mediated proteolysis, adherens junctions, and RNA degradation in addition to genes involved in chromatin modification and organization as highly deregulated pathways in GC. Comparative oncogenomic filtering of the complete list of SB-identified genes showed that they are highly enriched for genes mutated in human GC and identified many candidate human GC genes. Finally, by comparing our complete list of SB-identified genes against the list of mutated genes identified in five large-scale human GC sequencing studies, we identified LDL receptor-related protein 1B (LRP1B) as a previously unidentified human candidate GC tumor suppressor gene. In LRP1B, 129 mutations were found in 462 human GC samples sequenced, and LRP1B is one of the top 10 most deleted genes identified in a panel of 3,312 human cancers. SB mutagenesis has, thus, helped to catalog the cooperative molecular mechanisms driving SMAD4-induced GC growth and discover genes with potential clinical importance in human GC.

  19. Identification of Candidate Genes Responsible for Stem Pith Production Using Expression Analysis in Solid-Stemmed Wheat.

    PubMed

    Oiestad, A J; Martin, J M; Cook, J; Varella, A C; Giroux, M J

    2017-07-01

    The wheat stem sawfly (WSS) is an economically important pest of wheat in the Northern Great Plains. The primary means of WSS control is resistance associated with a single quantitative trait locus (QTL), which controls most stem solidness variation. The goal of this study was to identify stem solidness candidate genes via RNA-seq. This study made use of 28 single nucleotide polymorphism (SNP) markers derived from expressed sequence tags (ESTs) linked to the QTL and contained within a 5.13 cM region. Allele specific expression of EST markers was examined in stem tissue for solid and hollow-stemmed pairs of two spring wheat near isogenic lines (NILs) differing for the QTL. Of the 28 ESTs, 13 were located within annotated genes and 10 had detectable stem expression. Annotated genes corresponding to four of the ESTs were differentially expressed between solid and hollow-stemmed NILs and represent possible stem solidness gene candidates. Further examination of the 5.13 cM region containing the 28 EST markers identified 260 annotated genes. Twenty of the 260 linked genes were up-regulated in hollow NIL stems, while only seven genes were up-regulated in solid NIL stems. An -methyltransferase within the region of interest was identified as a candidate based on differential expression between solid and hollow-stemmed NILs and putative function. Further study of these candidate genes may lead to the identification of the gene(s) controlling stem solidness and an increased ability to select for wheat stem solidness and manage WSS. Copyright © 2017 Crop Science Society of America.

  20. confFuse: High-Confidence Fusion Gene Detection across Tumor Entities.

    PubMed

    Huang, Zhiqin; Jones, David T W; Wu, Yonghe; Lichter, Peter; Zapatka, Marc

    2017-01-01

    Background: Fusion genes play an important role in the tumorigenesis of many cancers. Next-generation sequencing (NGS) technologies have been successfully applied in fusion gene detection for the last several years, and a number of NGS-based tools have been developed for identifying fusion genes during this period. Most fusion gene detection tools based on RNA-seq data report a large number of candidates (mostly false positives), making it hard to prioritize candidates for experimental validation and further analysis. Selection of reliable fusion genes for downstream analysis becomes very important in cancer research. We therefore developed confFuse, a scoring algorithm to reliably select high-confidence fusion genes which are likely to be biologically relevant. Results: confFuse takes multiple parameters into account in order to assign each fusion candidate a confidence score, of which score ≥8 indicates high-confidence fusion gene predictions. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. Compared with alternative tools, based on 96 published RNA-seq samples from different tumor entities, our method can significantly reduce the number of fusion candidates (301 high-confidence from 8,083 total predicted fusion genes) and keep high detection accuracy (recovery rate 85.7%). Validation of 18 novel, high-confidence fusions detected in three breast tumor samples resulted in a 100% validation rate. Conclusions: confFuse is a novel downstream filtering method that allows selection of highly reliable fusion gene candidates for further downstream analysis and experimental validations. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.

  1. Comparative analysis of protein interactome networks prioritizes candidate genes with cancer signatures.

    PubMed

    Li, Yongsheng; Sahni, Nidhi; Yi, Song

    2016-11-29

    Comprehensive understanding of human cancer mechanisms requires the identification of a thorough list of cancer-associated genes, which could serve as biomarkers for diagnoses and therapies in various types of cancer. Although substantial progress has been made in functional studies to uncover genes involved in cancer, these efforts are often time-consuming and costly. Therefore, it remains challenging to comprehensively identify cancer candidate genes. Network-based methods have accelerated this process through the analysis of complex molecular interactions in the cell. However, the extent to which various interactome networks can contribute to prediction of candidate genes responsible for cancer is still enigmatic. In this study, we evaluated different human protein-protein interactome networks and compared their application to cancer gene prioritization. Our results indicate that network analyses can increase the power to identify novel cancer genes. In particular, such predictive power can be enhanced with the use of unbiased systematic protein interaction maps for cancer gene prioritization. Functional analysis reveals that the top ranked genes from network predictions co-occur often with cancer-related terms in literature, and further, these candidate genes are indeed frequently mutated across cancers. Finally, our study suggests that integrating interactome networks with other omics datasets could provide novel insights into cancer-associated genes and underlying molecular mechanisms.

  2. Survey of candidate genes for maize resistance to infection by Aspergillus flavus and/or aflatoxin contamination

    Treesearch

    Leigh Hawkins; Marilyn Warburton; Juliet Tang; John Tomashek; Dafne Alves Oliveira; Oluwaseun Ogunola; J. Smith; W. Williams

    2018-01-01

    Many projects have identified candidate genes for resistance to aflatoxin accumulation or Aspergillus flavus infection and growth in maize using genetic mapping, genomics, transcriptomics and/or proteomics studies. However, only a small percentage of these candidates have been validated in field conditions, and their relative contribution to...

  3. Screening for susceptibility genes in hereditary non-polyposis colorectal cancer.

    PubMed

    Yu, Li; Yin, Bo; Qu, Kaiying; Li, Jingjing; Jin, Qiao; Liu, Ling; Liu, Chunlan; Zhu, Yuxing; Wang, Qi; Peng, Xiaowei; Zhou, Jianda; Cao, Peiguo; Cao, Ke

    2018-06-01

    In the present study, hereditary non-polyposis colorectal cancer (HNPCC) susceptibility genes were screened using whole exome sequencing in 3 HNPCC patients from 1 family and using single nucleotide polymorphism (SNP) genotyping assays in 96 other colorectal cancer and control samples. Peripheral blood was obtained from 3 HNPCC patients from 1 family: the proband and the proband's brother and cousin. High-throughput sequencing was performed using whole exome capture technology. Sequences were aligned against the HAPMAP, dbSNP130 and 1000 Genomes Project databases. Reported common variations and synonymous mutations were filtered out. Non-synonymous single nucleotide variants in the 3 HNPCC patients were integrated and the candidate genes were identified. Finally, SNP genotyping was performed for the genes in 96 peripheral blood samples. In total, 60.4 Gb of data was retrieved from the 3 HNPCC patients using whole exome capture technology. Subsequently, according to certain screening criteria, 15 candidate genes were identified. Among the 96 samples that had been SNP genotyped, 92 were successfully genotyped for 15 gene loci, while genotyping for HTRA1 failed in 4 sporadic colorectal cancer patient samples. In 12 control subjects and 81 sporadic colorectal cancer patients, genotypes at 13 loci were wild-type, namely DDX20, ZFYVE26, PIK3R3, SLC26A8, ZEB2, TP53INP1, SLC11A1, LRBA, CEBPZ, ETAA1, SEMA3G, IFRD2 and FAT1. The CEP290 genotype was mutant in 1 sporadic colorectal cancer patient and was wild-type in all other subjects. A total of 5 of the 12 control subjects and 30 of the 81 sporadic colorectal cancer patients had a mutant HTRA1 genotype. In all 3 HNPCC patients, the same mutant genotypes were identified at all 15 gene loci. Overall, 13 potential susceptibility genes for HNPCC were identified, namely DDX20, ZFYVE26, PIK3R3, SLC26A8, ZEB2, TP53INP1, SLC11A1, LRBA, CEBPZ, ETAA1, SEMA3G, IFRD2 and FAT1.

  4. The Pratylenchus penetrans Transcriptome as a Source for the Development of Alternative Control Strategies: Mining for Putative Genes Involved in Parasitism and Evaluation of in planta RNAi

    PubMed Central

    Vieira, Paulo; Eves-van den Akker, Sebastian; Verma, Ruchi; Wantoch, Sarah; Eisenback, Jonathan D.; Kamo, Kathryn

    2015-01-01

    The root lesion nematode Pratylenchus penetrans is considered one of the most economically important species within the genus. Host range studies have shown that nearly 400 plant species can be parasitized by this species. To obtain insight into the transcriptome of this migratory plant-parasitic nematode, we used Illumina mRNA sequencing analysis of a mixed population, as well as nematode reads detected in infected soybean roots 3 and 7 days after nematode infection. Over 140 million paired end reads were obtained for this species, and de novo assembly resulted in a total of 23,715 transcripts. Homology searches showed significant hit matches to 58% of the total number of transcripts using different protein and EST databases. In general, the transcriptome of P. penetrans follows common features reported for other root lesion nematode species. We also explored the efficacy of RNAi, delivered from the host, as a strategy to control P. penetrans, by targeted knock-down of selected nematode genes. Different comparisons were performed to identify putative nematode genes with a role in parasitism, resulting in the identification of transcripts with similarities to other nematode parasitism genes. Focusing on the predicted nematode secreted proteins found in this transcriptome, we observed specific members to be up-regulated at the early time points of infection. In the present study, we observed an enrichment of predicted secreted proteins along the early time points of parasitism by this species, with a significant number being pioneer candidate genes. A representative set of genes examined using RT-PCR confirms their expression during the host infection. The expression patterns of the different candidate genes raise the possibility that they might be involved in critical steps of P. penetrans parasitism. This analysis sheds light on the transcriptional changes that accompany plant infection by P. penetrans, and will aid in identifying potential gene targets for selection and use to design effective control strategies against root lesion nematodes. PMID:26658731

  5. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    PubMed

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops. This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.
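
    The GC3 measure used in this abstract (the fraction of cytosine and guanine at the third position of a codon) can be illustrated with a minimal Python sketch. The function names and the treatment of trailing partial codons are assumptions made for illustration; they are not taken from the Fgenesh++ or Seqping pipelines, and the abstract does not say whether stop codons were counted.

```python
GC3_RICH_THRESHOLD = 0.75286  # cut-off quoted in the abstract for "GC3-rich" genes


def gc3_fraction(cds: str) -> float:
    """Return the fraction of G or C bases at the third position of each complete codon.

    Assumes `cds` is an in-frame coding sequence; a trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop any incomplete final codon
    third_bases = seq[2:usable:3]         # every third base, starting at index 2
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    gc_count = sum(base in "GC" for base in third_bases)
    return gc_count / len(third_bases)


def is_gc3_rich(cds: str) -> bool:
    """Hypothetical helper: apply the abstract's GC3 threshold to one coding sequence."""
    return gc3_fraction(cds) >= GC3_RICH_THRESHOLD
```

    For example, the toy sequence ATGGCCAAA has third-position bases G, C and A, so two of its three codons end in G or C.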

  6. Identification of Immunity Related Genes to Study the Physalis peruviana – Fusarium oxysporum Pathosystem

    PubMed Central

    Enciso-Rodríguez, Felix E.; González, Carolina; Rodríguez, Edwin A.; López, Camilo E.; Landsman, David; Barrero, Luz Stella; Mariño-Ramírez, Leonardo

    2013-01-01

    The Cape gooseberry (Physalis peruviana L.) is an Andean exotic fruit with high nutritional value and appealing medicinal properties. However, its cultivation faces important phytosanitary problems mainly due to pathogens like Fusarium oxysporum, Cercospora physalidis and Alternaria spp. Here we used the Cape gooseberry foliar transcriptome to search for proteins that encode conserved domains related to plant immunity including: NBS (Nucleotide Binding Site), CC (Coiled-Coil), TIR (Toll/Interleukin-1 Receptor). We identified 74 immunity related gene candidates in P. peruviana which have the typical resistance gene (R-gene) architecture, 17 Receptor like kinase (RLKs) candidates related to PAMP-Triggered Immunity (PTI), eight (TIR-NBS-LRR, or TNL) and nine (CC-NBS-LRR, or CNL) candidates related to Effector-Triggered Immunity (ETI) genes among others. These candidate genes were categorized by molecular function (98%), biological process (85%) and cellular component (79%) using gene ontology. Some of the most interesting predicted roles were those associated with binding and transferase activity. We designed 94 primer pairs from the 74 immunity-related genes (IRGs) to amplify the corresponding genomic regions on six genotypes that included resistant and susceptible materials. From these, we selected 17 single band amplicons and sequenced them in 14 F. oxysporum resistant and susceptible genotypes. Sequence polymorphisms were analyzed through preliminary candidate gene association, which allowed the detection of one SNP at the PpIRG-63 marker revealing a nonsynonymous mutation in the predicted LRR domain suggesting functional roles for resistance. PMID:23844210

  7. Identification of immunity related genes to study the Physalis peruviana--Fusarium oxysporum pathosystem.

    PubMed

    Enciso-Rodríguez, Felix E; González, Carolina; Rodríguez, Edwin A; López, Camilo E; Landsman, David; Barrero, Luz Stella; Mariño-Ramírez, Leonardo

    2013-01-01

    The Cape gooseberry (Physalis peruviana L.) is an Andean exotic fruit with high nutritional value and appealing medicinal properties. However, its cultivation faces important phytosanitary problems mainly due to pathogens like Fusarium oxysporum, Cercospora physalidis and Alternaria spp. Here we used the Cape gooseberry foliar transcriptome to search for proteins that encode conserved domains related to plant immunity including: NBS (Nucleotide Binding Site), CC (Coiled-Coil), TIR (Toll/Interleukin-1 Receptor). We identified 74 immunity related gene candidates in P. peruviana which have the typical resistance gene (R-gene) architecture, 17 Receptor like kinase (RLKs) candidates related to PAMP-Triggered Immunity (PTI), eight (TIR-NBS-LRR, or TNL) and nine (CC-NBS-LRR, or CNL) candidates related to Effector-Triggered Immunity (ETI) genes among others. These candidate genes were categorized by molecular function (98%), biological process (85%) and cellular component (79%) using gene ontology. Some of the most interesting predicted roles were those associated with binding and transferase activity. We designed 94 primer pairs from the 74 immunity-related genes (IRGs) to amplify the corresponding genomic regions on six genotypes that included resistant and susceptible materials. From these, we selected 17 single band amplicons and sequenced them in 14 F. oxysporum resistant and susceptible genotypes. Sequence polymorphisms were analyzed through preliminary candidate gene association, which allowed the detection of one SNP at the PpIRG-63 marker revealing a nonsynonymous mutation in the predicted LRR domain suggesting functional roles for resistance.

  8. HLA genotyping by next-generation sequencing of complementary DNA.

    PubMed

    Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

    2017-11-28

    Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
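
    The analysis principle stated in this abstract (build candidate sequences from all combinations of variable bases, then order them by read count) might be sketched as follows in Python. This is an illustrative reading of that one sentence, not the authors' implementation: the function name, the assumption that reads are already aligned and of equal length, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter
from itertools import product


def rank_candidates(reads, variable_positions):
    """Enumerate candidate base combinations and sort them by supporting read count.

    reads: aligned read strings of equal length covering one target region (assumed).
    variable_positions: 0-based indices at which the reads disagree (assumed known).
    Returns a list of (combination, read_count) pairs, most supported first.
    """
    # Bases actually observed at each variable position.
    observed = [sorted({read[pos] for read in reads}) for pos in variable_positions]
    # Number of reads carrying each combination of bases at those positions.
    support = Counter(tuple(read[pos] for pos in variable_positions) for read in reads)
    # Every possible combination becomes a candidate, including ones with zero reads.
    candidates = [(combo, support.get(combo, 0)) for combo in product(*observed)]
    # Decreasing read count; ties are broken alphabetically for reproducibility.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates


reads = ["ACGT", "ACGT", "ACGT", "TCGA", "TCGA", "ACGA"]
ranked = rank_candidates(reads, variable_positions=[0, 3])
```

    In this toy input the two best-supported combinations would be the natural haplotype candidates for the region, while low-count combinations such as the single ACGA read are the kind of artificial sequences the abstract's later filtering steps are meant to remove.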

  9. Next-generation sequencing for identification of candidate genes for Fusarium wilt and sterility mosaic disease in pigeonpea (Cajanus cajan).

    PubMed

    Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Kumar, Vinay; Kale, Sandip M; Sinha, Pallavi; Chitikineni, Annapurna; Pazhamala, Lekha T; Garg, Vanika; Sharma, Mamta; Sameer Kumar, Chanda Venkata; Parupalli, Swathi; Vechalapu, Suryanarayana; Patil, Suyash; Muniswamy, Sonnappa; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Dharmaraj, Pallavi Subbanna; Varshney, Rajeev K

    2016-05-01

    To map resistance genes for Fusarium wilt (FW) and sterility mosaic disease (SMD) in pigeonpea, sequencing-based bulked segregant analysis (Seq-BSA) was used. Resistant (R) and susceptible (S) bulks from the extreme recombinant inbred lines of ICPL 20096 × ICPL 332 were sequenced. Subsequently, SNP index was calculated between R- and S-bulks with the help of draft genome sequence and reference-guided assembly of ICPL 20096 (resistant parent). Seq-BSA has provided seven candidate SNPs for FW and SMD resistance in pigeonpea. In parallel, four additional genotypes were re-sequenced and their combined analysis with R- and S-bulks has provided a total of 8362 nonsynonymous (ns) SNPs. Of 8362 nsSNPs, 60 were found within the 2-Mb flanking regions of seven candidate SNPs identified through Seq-BSA. Haplotype analysis narrowed these down to eight nsSNPs in seven genes. These eight nsSNPs were further validated by re-sequencing 11 genotypes that are resistant and susceptible to FW and SMD. This analysis revealed association of four candidate nsSNPs in four genes with FW resistance and four candidate nsSNPs in three genes with SMD resistance. Further, in silico protein analysis and expression profiling identified the two most promising candidate genes, namely C.cajan_01839 for SMD resistance and C.cajan_03203 for FW resistance. Identified candidate genomic regions/SNPs will be useful for genomics-assisted breeding in pigeonpea. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
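
    The abstract does not give the SNP-index formula. As a rough sketch, the Python below uses the definition common in bulked segregant sequencing analyses: the proportion of reads at a site that carry the non-reference allele, compared between the two bulks. The function names, the (alt_reads, total_reads) tuple layout, and the sign convention (R-bulk minus S-bulk) are assumptions for illustration only.

```python
def snp_index(alt_reads: int, total_reads: int) -> float:
    """Proportion of reads at one SNP site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def delta_snp_index(r_bulk, s_bulk) -> float:
    """Difference in SNP index between bulks at the same site.

    r_bulk, s_bulk: (alt_reads, total_reads) tuples for the resistant and
    susceptible bulks (hypothetical layout). Sign convention assumed: R minus S.
    """
    return snp_index(*r_bulk) - snp_index(*s_bulk)


# Toy site: the resistant bulk mostly matches the reference (here the resistant
# parent), while the susceptible bulk mostly carries the alternative allele.
delta = delta_snp_index(r_bulk=(2, 20), s_bulk=(18, 20))
```

    Sites where the two bulks diverge strongly, as in this toy example, are the ones such an analysis would flag as candidate SNPs; the thresholds used in the study are not stated in the abstract.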

  10. A Novel Strategy for Selection and Validation of Reference Genes in Dynamic Multidimensional Experimental Design in Yeast

    PubMed Central

    Cankorur-Cetinkaya, Ayca; Dereli, Elif; Eraslan, Serpil; Karabekmez, Erkan; Dikicioglu, Duygu; Kirdar, Betul

    2012-01-01

    Background: Understanding the dynamic mechanism behind the transcriptional organization of genes in response to varying environmental conditions requires time-dependent data. The dynamic transcriptional response obtained by real-time RT-qPCR experiments could only be correctly interpreted if suitable reference genes are used in the analysis. The lack of available studies on the identification of candidate reference genes in dynamic gene expression studies necessitates the identification and the verification of a suitable gene set for the analysis of transient gene expression response. Principal Findings: In this study, a candidate reference gene set for RT-qPCR analysis of dynamic transcriptional changes in Saccharomyces cerevisiae was determined using 31 different publicly available time series transcriptome datasets. Ten of the twelve candidates (TPI1, FBA1, CCW12, CDC19, ADH1, PGK1, GCN4, PDC1, RPS26A and ARF1) we identified were not previously reported as potential reference genes. Our method also identified the commonly used reference genes ACT1 and TDH3. The most stable reference genes from this pool were determined as TPI1, FBA1, CDC19 and ACT1 in response to a perturbation in the amount of available glucose and as FBA1, TDH3, CCW12 and ACT1 in response to a perturbation in the amount of available ammonium. The use of these newly proposed gene sets outperformed the use of common reference genes in the determination of dynamic transcriptional response of the target genes, HAP4 and MEP2, in response to relaxation from glucose and ammonium limitations, respectively. Conclusions: A candidate reference gene set to be used in dynamic real-time RT-qPCR expression profiling in yeast was proposed for the first time in the present study. Suitable pools of stable reference genes to be used under different experimental conditions could be selected from this candidate set in order to successfully determine the expression profiles for the genes of interest. PMID:22675547

  11. Supersonic and hypersonic shock/boundary-layer interaction database

    NASA Technical Reports Server (NTRS)

    Settles, Gary S.; Dodson, Lori J.

    1994-01-01

    An assessment is given of existing shock wave/turbulent boundary-layer interaction experiments having sufficient quality to guide turbulence modeling and code validation efforts. Although the focus of this work is hypersonic, experiments at Mach numbers as low as 3 were considered. The principal means of identifying candidate studies was a computerized search of the AIAA Aerospace Database. Several hundred candidate studies were examined and over 100 of these were subjected to a rigorous set of acceptance criteria for inclusion in the database. Nineteen experiments were found to meet these criteria, of which only seven were in the hypersonic regime (M is greater than 5).

  12. Bioinformatic analysis of Msx1 and Msx2 involved in craniofacial development.

    PubMed

    Dai, Jiewen; Mou, Zhifang; Shen, Shunyao; Dong, Yuefu; Yang, Tong; Shen, Steve Guofang

    2014-01-01

    Msx1 and Msx2 were revealed to be candidate genes for some craniofacial deformities, such as cleft lip with/without cleft palate (CL/P) and craniosynostosis. Many other genes were demonstrated to have a cross-talk with MSX genes in causing these defects. However, there is no systematic evaluation for these MSX gene-related factors. In this study, we performed systematic bioinformatic analysis for MSX genes by combining the GeneDecks, DAVID, and STRING databases, and the results showed that there were numerous genes related to MSX genes, such as Irf6, TP63, Dlx2, Dlx5, Pax3, Pax9, Bmp4, Tgf-beta2, and Tgf-beta3 that have been demonstrated to be involved in CL/P, and Fgfr2, Fgfr1, Fgfr3, and Twist1 that were involved in craniosynostosis. Many of these genes could be enriched into different gene groups involved in different signaling pathways, different craniofacial deformities, and different biological processes. These findings allow us to analyze the function of MSX genes in a gene network. In addition, our findings showed that Sumo, a novel gene whose polymorphisms were demonstrated to be associated with nonsyndromic CL/P by genome-wide association study, has protein-protein interaction with MSX1, which may offer an alternative method to perform bioinformatic analysis for genes found by genome-wide association study and may allow us to predict the disrupted protein function due to a mutation in a gene's DNA sequence. These findings may guide us to perform further functional studies in the future.

  13. NeuroRDF: semantic integration of highly curated data to prioritize biomarker candidates in Alzheimer's disease.

    PubMed

    Iyappan, Anandhi; Kawalia, Shweta Bagewadi; Raschka, Tamara; Hofmann-Apitius, Martin; Senger, Philipp

    2016-07-08

    Neurodegenerative diseases are incurable and debilitating indications with huge social and economic impact, where much is still to be learnt about the underlying molecular events. Mechanistic disease models could offer a knowledge framework to help decipher the complex interactions that occur at molecular and cellular levels. This motivates the need for the development of an approach integrating highly curated and heterogeneous data into a disease model of different regulatory data layers. Although several disease models exist, they often do not consider the quality of underlying data. Moreover, even with the current advancements in semantic web technology, we still do not have a cure for complex diseases like Alzheimer's disease. One of the key reasons for this could be the increasing gap between generated data and the derived knowledge. In this paper, we describe an approach, called NeuroRDF, to develop an integrative framework for modeling curated knowledge in the area of complex neurodegenerative diseases. The core of this strategy lies in the usage of well curated and context specific data for integration into one single semantic web-based framework, RDF. This increases the probability of the derived knowledge being novel and reliable in a specific disease context. This infrastructure integrates highly curated data from databases (Bind, IntAct, etc.), literature (PubMed), and gene expression resources (such as GEO and ArrayExpress). We illustrate the effectiveness of our approach by asking real-world biomedical questions that link these resources to prioritize the plausible biomarker candidates. Among the 13 prioritized candidate genes, we identified MIF to be a potential emerging candidate due to its role as a pro-inflammatory cytokine. We additionally report on the effort and challenges faced during generation of such an indication-specific knowledge base comprising curated and quality-controlled data. Although many alternative approaches have been proposed and practiced for modeling diseases, the semantic web technology is a flexible and well established solution for harmonized aggregation. The benefit of this work, to use high quality and context specific data, becomes apparent in speculating previously unattended biomarker candidates around a well-known mechanism, further leveraged for experimental investigations.

  14. Exome capture sequencing reveals new insights into hepatitis B virus-induced hepatocellular carcinoma at the early stage of tumorigenesis.

    PubMed

    Chen, Yong; Wang, Lijuan; Xu, Hexiang; Liu, Xingxiang; Zhao, Yingren

    2013-10-01

    Hepatocellular carcinoma (HCC), the most common type of liver cancer, is the third primary cause of cancer-related mortality worldwide. The molecular mechanisms underlying the initiation and formation of HCC remain obscure. In the present study, we performed exome sequencing using tumor and normal tissues from 3 hepatitis B virus (HBV)-positive BCLC stage A HCC patients. Bioinformatic analysis was performed to find candidate protein-altering somatic mutations. Eighty damaging mutations were validated and 59 genes were reported to be mutated in HBV-related HCCs for the first time here. Further analysis using whole genome sequencing (WGS) data of 88 HBV-related HCC patients from the European Genome-phenome Archive database showed that mutations in 33 of the 59 genes were also detected in other samples. Variants of two newly found genes, ZNF717 and PARP4, were detected in more than 10% of the WGS samples. Several other genes, such as FLNA and CNTN2, are also noteworthy. Thus, the exome sequencing analysis of three BCLC stage A patients provides new insights into the molecular events governing the early steps of HBV-induced HCC tumorigenesis.

  15. Transcriptome analysis of molecular mechanisms responsible for light-stress response in Mythimna separata (Walker)

    PubMed Central

    Duan, Yun; Gong, ZhongJun; Wu, RenHai; Miao, Jin; Jiang, YueLi; Li, Tong; Wu, XiaoBo; Wu, YuQing

    2017-01-01

    Light is an important environmental signal for most insects. The Oriental Armyworm, Mythimna separata, is a serious pest of cereal crops worldwide, and is highly sensitive to light signals during its developmental and reproductive stages. However, molecular biological studies of its response to light stress are scarce, and related genomic information is not available. In this study, we sequenced and de novo assembled the transcriptomes of M. separata exposed to four different light conditions: dark, white light (WL), UV light (UVL) and yellow light (YL). A total of 46,327 unigenes with an average size of 571 base pairs (bp) were obtained, among which 24,344 (52.55%) matched to public databases. The numbers of genes differentially expressed between dark vs WL, dark vs UVL, dark vs YL, and UVL vs YL were 12,012, 12,950, 14,855, and 13,504, respectively. These results suggest that light exposure altered gene expression patterns in M. separata. Putative genes involved in phototransduction-fly, phototransduction, circadian rhythm-fly, olfactory transduction, and taste transduction were identified. This study thus identified a series of candidate genes and pathways potentially related to light stress in M. separata. PMID:28345615

  16. Schizophrenia, vitamin D, and brain development.

    PubMed

    Mackay-Sim, Alan; Féron, François; Eyles, Darryl; Burne, Thomas; McGrath, John

    2004-01-01

    Schizophrenia research is invigorated at present by the recent discovery of several plausible candidate susceptibility genes identified from genetic linkage and gene expression studies of brains from persons with schizophrenia. It is a current challenge to reconcile this gathering evidence for specific candidate susceptibility genes with the "neurodevelopmental hypothesis," which posits that schizophrenia arises from gene-environment interactions that disrupt brain development. We make the case here that schizophrenia may result not from numerous genes of small effect, but a few genes of transcriptional regulation acting during brain development. In particular we propose that low vitamin D during brain development interacts with susceptibility genes to alter the trajectory of brain development, probably by epigenetic regulation that alters gene expression throughout adult life. Vitamin D is an attractive "environmental" candidate because it appears to explain several key epidemiological features of schizophrenia. Vitamin D is an attractive "genetic" candidate because its nuclear hormone receptor regulates gene expression and nervous system development. The polygenic quality of schizophrenia, with linkage to many genes of small effect, may be brought together via this "vitamin D hypothesis." We also discuss the possibility of a broader set of environmental and genetic factors interacting via the nuclear hormone receptors to affect the development of the brain leading to schizophrenia.

  17. Multi-Dimensional Prioritization of Dental Caries Candidate Genes and Its Enriched Dense Network Modules

    PubMed Central

    Wang, Quan; Jia, Peilin; Cuenco, Karen T.; Feingold, Eleanor; Marazita, Mary L.; Wang, Lily; Zhao, Zhongming

    2013-01-01

    A number of genetic studies have suggested numerous susceptibility genes for dental caries over the past decade with few definite conclusions. The rapid accumulation of relevant information, along with the complex architecture of the disease, provides a challenging but also unique opportunity to review and integrate the heterogeneous data for follow-up validation and exploration. In this study, we collected and curated candidate genes from four major categories: association studies, linkage scans, gene expression analyses, and literature mining. Candidate genes were prioritized according to the magnitude of evidence related to dental caries. We then searched for dense modules enriched with the prioritized candidate genes through their protein-protein interactions (PPIs). We identified 23 modules comprising 53 genes. Functional analyses of these 53 genes revealed three major clusters: cytokine network relevant genes, the matrix metalloproteinase (MMP) family, and the transforming growth factor-beta (TGF-β) family, all of which have previously been implicated in tooth development and carious lesions. Through our extensive data collection and an integrative application of gene prioritization and PPI network analyses, we built a dental caries-specific sub-network for the first time. Our study provided insights into the molecular mechanisms underlying dental caries. The framework we proposed in this work can be applied to other complex diseases. PMID:24146904

  18. Integrative Approach to Pain Genetics Identifies Pain Sensitivity Loci across Diseases

    PubMed Central

    Ruau, David; Dudley, Joel T.; Chen, Rong; Phillips, Nicholas G.; Swan, Gary E.; Lazzeroni, Laura C.; Clark, J. David

    2012-01-01

    Identifying human genes relevant for the processing of pain requires difficult-to-conduct and expensive large-scale clinical trials. Here, we examine a novel integrative paradigm for data-driven discovery of pain gene candidates, taking advantage of the vast amount of existing disease-related clinical literature and gene expression microarray data stored in large international repositories. First, thousands of diseases were ranked according to a disease-specific pain index (DSPI), derived from Medical Subject Heading (MeSH) annotations in MEDLINE. Second, gene expression profiles of 121 of these human diseases were obtained from public sources. Third, genes with expression variation significantly correlated with DSPI across diseases were selected as candidate pain genes. Finally, selected candidate pain genes were genotyped in an independent human cohort and prospectively evaluated for significant association between variants and measures of pain sensitivity. The strongest signal was with rs4512126 (5q32, ABLIM3, P = 1.3×10^−10) for the sensitivity to cold pressor pain in males, but not in females. Significant associations were also observed with rs12548828, rs7826700 and rs1075791 on 8q22.2 within NCALD (P = 1.7×10^−4, 1.8×10^−4, and 2.2×10^−4, respectively). Our results demonstrate the utility of a novel paradigm that integrates publicly available disease-specific gene expression data with clinical data curated from MEDLINE to facilitate the discovery of pain-relevant genes. This data-derived list of pain gene candidates enables additional focused and efficient biological studies validating additional candidates. PMID:22685391
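
    The third step of this record (selecting genes whose expression varies with the pain index across diseases) can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the toy numbers, and the use of a plain Pearson correlation with a fixed |r| cutoff are assumptions made for illustration, since the abstract does not state which statistic or threshold was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no trend
    return cov / sqrt(vx * vy)

def candidate_genes(expression, pain_index, min_abs_r=0.9):
    """Keep genes whose per-disease expression tracks the pain index.

    `expression` maps a gene name to one value per disease, in the same
    disease order as `pain_index`. The cutoff is a hypothetical choice.
    """
    hits = {}
    for gene, values in expression.items():
        r = pearson(values, pain_index)
        if abs(r) >= min_abs_r:
            hits[gene] = r
    return hits

# Hypothetical toy data: four diseases, two genes.
pain_index = [0.1, 0.4, 0.5, 0.9]
expression = {
    "GENE_A": [1.0, 2.1, 2.4, 4.0],  # rises with the pain index
    "GENE_B": [3.0, 1.0, 3.2, 0.9],  # no consistent trend
}
hits = candidate_genes(expression, pain_index)
```

    In this toy input only GENE_A passes the cutoff. A real analysis would also need a significance test and a correction for testing many genes at once.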

  19. Genome-wide association studies and epistasis analyses of candidate genes related to age at menarche and age at natural menopause in a Korean population.

    PubMed

    Pyun, Jung-A; Kim, Sunshin; Cho, Nam H; Koh, InSong; Lee, Jong-Young; Shin, Chol; Kwack, KyuBum

    2014-05-01

    The aim of this study was to identify polymorphisms and gene-gene interactions that are significantly associated with age at menarche and age at menopause in a Korean population. A total of 3,452 and 1,827 women participated in studies of age at menarche and age at natural menopause, respectively. Linear regression analyses adjusted for residence area were used to perform genome-wide association studies (GWAS), candidate gene association studies, and interactions between the candidate genes for age at menarche and age at natural menopause. In GWAS, four single nucleotide polymorphisms (SNPs; rs7528241, rs1324329, rs11597068, and rs6495785) were strongly associated with age at natural menopause (lowest P = 9.66 × 10). However, GWAS of age at menarche did not reveal any strong associations. In candidate gene association studies, SNPs with P < 0.01 were selected to test their synergistic interactions. For age at natural menopause, there was a significant interaction between intronic SNPs on ADAM metallopeptidase with thrombospondin type I motif 9 (ADAMTS9) and SMAD family member 3 (SMAD3) genes (P = 9.52 × 10). For age at menarche, there were three significant interactions between three intronic SNPs on follicle-stimulating hormone receptor (FSHR) gene and one SNP located at the 3' flanking region of insulin-like growth factor 2 receptor (IGF2R) gene (lowest P = 1.95 × 10). Novel SNPs and synergistic interactions between candidate genes are significantly associated with age at menarche and age at natural menopause in a Korean population.

  20. 50 CFR 660.17 - Catch monitors and catch monitor providers.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... work competently with standard database software and computer hardware. (v) Have a current and valid... candidate's academic transcripts and resume; (4) A statement signed by the candidate under penalty of...

  1. 50 CFR 660.17 - Catch monitors and catch monitor service providers.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... work competently with standard database software and computer hardware. (v) Have a current and valid... candidate's academic transcripts and resume; (4) A statement signed by the candidate under penalty of...

  2. 50 CFR 660.17 - Catch monitors and catch monitor service providers.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... work competently with standard database software and computer hardware. (v) Have a current and valid... candidate's academic transcripts and resume; (4) A statement signed by the candidate under penalty of...

  3. 50 CFR 660.17 - Catch monitors and catch monitor service providers.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... work competently with standard database software and computer hardware. (v) Have a current and valid... candidate's academic transcripts and resume; (4) A statement signed by the candidate under penalty of...

  4. Interleukin-27 is a novel candidate diagnostic biomarker for bacterial infection in critically ill children

    PubMed Central

    2012-01-01

    Introduction: Differentiating between sterile inflammation and bacterial infection in critically ill patients with fever and other signs of the systemic inflammatory response syndrome (SIRS) remains a clinical challenge. The objective of our study was to mine an existing genome-wide expression database for the discovery of candidate diagnostic biomarkers to predict the presence of bacterial infection in critically ill children. Methods: Genome-wide expression data were compared between patients with SIRS having negative bacterial cultures (n = 21) and patients with sepsis having positive bacterial cultures (n = 60). Differentially expressed genes were subjected to a leave-one-out cross-validation (LOOCV) procedure to predict SIRS or sepsis classes. Serum concentrations of interleukin-27 (IL-27) and procalcitonin (PCT) were compared between 101 patients with SIRS and 130 patients with sepsis. All data represent the first 24 hours of meeting criteria for either SIRS or sepsis. Results: Two hundred twenty-one gene probes were differentially regulated between patients with SIRS and patients with sepsis. The LOOCV procedure correctly predicted 86% of the SIRS and sepsis classes, and Epstein-Barr virus-induced gene 3 (EBI3) had the highest predictive strength. Computer-assisted image analyses of gene-expression mosaics were able to predict infection with a specificity of 90% and a positive predictive value of 94%. Because EBI3 is a subunit of the heterodimeric cytokine, IL-27, we tested the ability of serum IL-27 protein concentrations to predict infection. At a cut-point value of ≥5 ng/ml, serum IL-27 protein concentrations predicted infection with a specificity and a positive predictive value of >90%, and the overall performance of IL-27 was generally better than that of PCT. A decision tree combining IL-27 and PCT improved overall predictive capacity compared with that of either biomarker alone. Conclusions: Genome-wide expression analysis has provided the foundation for the identification of IL-27 as a novel candidate diagnostic biomarker for predicting bacterial infection in critically ill children. Additional studies will be required to test further the diagnostic performance of IL-27. The microarray data reported in this article have been deposited in the Gene Expression Omnibus under accession number GSE4607. PMID:23107287
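
    The two performance measures quoted in this record, specificity and positive predictive value at a cut-point, can be computed as in the sketch below. The function names and the eight toy measurements are hypothetical and are not data from the study; only the ≥5 ng/ml cut-point comes from the abstract.

```python
def confusion_counts(values, infected, cut_point=5.0):
    """Count (tp, fp, tn, fn) when values >= cut_point are called 'infected'."""
    tp = fp = tn = fn = 0
    for value, truth in zip(values, infected):
        predicted = value >= cut_point
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and not truth:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def specificity(tn, fp):
    """Fraction of truly uninfected patients called uninfected."""
    return tn / (tn + fp)

def positive_predictive_value(tp, fp):
    """Fraction of patients called infected who truly are infected."""
    return tp / (tp + fp)

# Hypothetical serum concentrations (ng/ml) and culture-based truth labels.
values = [7.2, 5.0, 3.1, 6.4, 1.2, 4.9, 8.8, 2.0]
infected = [True, True, True, False, False, False, True, False]

tp, fp, tn, fn = confusion_counts(values, infected)
spec = specificity(tn, fp)
ppv = positive_predictive_value(tp, fp)
```

    Unlike specificity, the positive predictive value depends on how common infection is in the cohort, so a value measured in one population does not automatically transfer to another.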

  5. Novel face-detection method under various environments

    NASA Astrophysics Data System (ADS)

    Jing, Min-Quan; Chen, Ling-Hwei

    2009-06-01

    We propose a method to detect a face with different poses under various environments. On the basis of skin color information, skin regions are first extracted from an input image. Next, the shoulder part is cut out by using shape information and the head part is then identified as a face candidate. For a face candidate, a set of geometric features is applied to determine if it is a profile face. If not, then a set of eyelike rectangles extracted from the face candidate and the lighting distribution are used to determine if the face candidate is a nonprofile face. Experimental results show that the proposed method is robust under a wide range of lighting conditions, different poses, and races. The detection rate for the HHI face database is 93.68%. For the Champion face database, the detection rate is 95.15%.

  6. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”

    PubMed Central

    Tettelin, Hervé; Masignani, Vega; Cieslewicz, Michael J.; Donati, Claudio; Medini, Duccio; Ward, Naomi L.; Angiuoli, Samuel V.; Crabtree, Jonathan; Jones, Amanda L.; Durkin, A. Scott; DeBoy, Robert T.; Davidsen, Tanja M.; Mora, Marirosa; Scarselli, Maria; Margarit y Ros, Immaculada; Peterson, Jeremy D.; Hauser, Christopher R.; Sundaram, Jaideep P.; Nelson, William C.; Madupu, Ramana; Brinkac, Lauren M.; Dodson, Robert J.; Rosovitz, Mary J.; Sullivan, Steven A.; Daugherty, Sean C.; Haft, Daniel H.; Selengut, Jeremy; Gwinn, Michelle L.; Zhou, Liwei; Zafar, Nikhat; Khouri, Hoda; Radune, Diana; Dimitrov, George; Watkins, Kisha; O'Connor, Kevin J. B.; Smith, Shannon; Utterback, Teresa R.; White, Owen; Rubens, Craig E.; Grandi, Guido; Madoff, Lawrence C.; Kasper, Dennis L.; Telford, John L.; Wessels, Michael R.; Rappuoli, Rino; Fraser, Claire M.

    2005-01-01

    The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes. PMID:16172379
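
    The partition described in this record (a core genome shared by all isolates plus a dispensable genome of partially shared and strain-specific genes) maps directly onto set operations. The sketch below is only an illustration with hypothetical strain and gene identifiers; it assumes genes have already been grouped into comparable identifiers across strains, which is the hard part in practice.

```python
from collections import Counter

def pan_genome_partition(strains):
    """Split genes into core, dispensable and strain-specific sets.

    `strains` maps a strain name to an iterable of gene identifiers and
    must contain at least one strain.
    """
    gene_sets = [set(genes) for genes in strains.values()]
    pan = set().union(*gene_sets)          # every gene seen in any strain
    core = set.intersection(*gene_sets)    # genes present in all strains
    dispensable = pan - core               # everything that is not core
    # Strain-specific genes are the dispensable genes found in one strain only.
    counts = Counter(gene for genes in gene_sets for gene in genes)
    strain_specific = {gene for gene, k in counts.items() if k == 1}
    return core, dispensable, strain_specific

# Hypothetical toy input: three strains with four genes each.
strains = {
    "strain_1": {"g1", "g2", "g3", "g4"},
    "strain_2": {"g1", "g2", "g3", "g5"},
    "strain_3": {"g1", "g2", "g4", "g6"},
}
core, dispensable, strain_specific = pan_genome_partition(strains)

# Share of each single genome that belongs to the core.
core_fraction = {name: len(core) / len(set(genes)) for name, genes in strains.items()}
```

    Re-running the partition as strains are added shows how the core shrinks and the pan-genome grows, which is the quantity the authors extrapolate.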

  7. HomoTarget: a new algorithm for prediction of microRNA targets in Homo sapiens.

    PubMed

    Ahmadi, Hamed; Ahmadi, Ali; Azimzadeh-Jamalkandi, Sadegh; Shoorehdeli, Mahdi Aliyari; Salehzadeh-Yazdi, Ali; Bidkhori, Gholamreza; Masoudi-Nejad, Ali

    2013-02-01

    MiRNAs play an essential role in the networks of gene regulation by inhibiting the translation of target mRNAs. Several computational approaches have been proposed for the prediction of miRNA target-genes. Reports reveal a large fraction of under-predicted or falsely predicted target genes. Thus, there is an imperative need to develop a computational method by which the target mRNAs of existing miRNAs can be correctly identified. In this study, a combined pattern recognition neural network (PRNN) and principal component analysis (PCA) architecture has been proposed in order to model the complicated relationship between miRNAs and their target mRNAs in humans. The results of several types of intelligent classifiers and our proposed model were compared, showing that our algorithm outperformed them with higher sensitivity and specificity. Using the recent release of the miRBase database to find potential targets of miRNAs, this model incorporated twelve structural, thermodynamic and positional features of miRNA:mRNA binding sites to select target candidates. Copyright © 2012 Elsevier Inc. All rights reserved.

  8. The Effects of Selenium Supplementation on Gene Expression Related to Insulin and Lipid in Infertile Polycystic Ovary Syndrome Women Candidate for In Vitro Fertilization: a Randomized, Double-Blind, Placebo-Controlled Trial.

    PubMed

    Zadeh Modarres, Shahrzad; Heidar, Zahra; Foroozanfard, Fatemeh; Rahmati, Zahra; Aghadavod, Esmat; Asemi, Zatollah

    2018-06-01

    This study was conducted to evaluate the effects of selenium supplementation on gene expression related to insulin and lipid in infertile women with polycystic ovary syndrome (PCOS) candidate for in vitro fertilization (IVF). This randomized double-blind, placebo-controlled trial was conducted among 40 infertile women with PCOS candidate for IVF. Subjects were randomly allocated into two groups to intake either 200-μg selenium (n = 20) or placebo (n = 20) per day for 8 weeks. Gene expression levels related to insulin and lipid were quantified in lymphocytes of women with PCOS candidate for IVF with RT-PCR method. Results of RT-PCR demonstrated that after the 8-week intervention, compared with the placebo, selenium supplementation upregulated gene expression of peroxisome proliferator-activated receptor gamma (PPAR-γ) (1.06 ± 0.15-fold increase vs. 0.94 ± 0.18-fold reduction, P = 0.02) and glucose transporter 1 (GLUT-1) (1.07 ± 0.20-fold increase vs. 0.87 ± 0.18-fold reduction, P = 0.003) in lymphocytes of women with PCOS candidate for IVF. In addition, compared with the placebo, selenium supplementation downregulated gene expression of low-density lipoprotein receptor (LDLR) (0.88 ± 0.17-fold reduction vs. 1.05 ± 0.22-fold increase, P = 0.01) in lymphocytes of women with PCOS candidate for IVF. We did not observe any significant effect of selenium supplementation on gene expression levels of lipoprotein(a) [LP(a)] in lymphocytes of women with PCOS candidate for IVF. Overall, selenium supplementation for 8 weeks in lymphocytes of women with infertile PCOS candidate for IVF significantly increased gene expression levels of PPAR-γ and GLUT-1 and significantly decreased gene expression levels of LDLR, but did not affect LP(a). http://www.irct.ir : IRCT201704245623N113.

  9. Finding gene regulatory network candidates using the gene expression knowledge base.

    PubMed

    Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin

    2014-12-10

    Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

  10. The effects of polymorphisms in IL-2, IFN-γ, TGF-β2, IgL, TLR-4, MD-2, and iNOS genes on resistance to Salmonella enteritidis in indigenous chickens.

    PubMed

    Tohidi, Reza; Idris, Ismail Bin; Panandam, Jothi Malar; Bejo, Mohd Hair

    2012-12-01

    Salmonella Enteritidis is a major cause of food poisoning worldwide, and poultry products are the main source of S. Enteritidis contamination for humans. Among the numerous strategies for disease control, improving genetic resistance to S. Enteritidis has been the most effective approach. We investigated the association between S. Enteritidis burden in the caecum, spleen, and liver of young indigenous chickens and seven candidate genes, selected on the basis of their critical roles in immunological functions. The genes included those encoding interleukin 2 (IL-2), interferon-γ (IFN-γ), transforming growth factor β2 (TGF-β2), immunoglobulin light chain (IgL), toll-like receptor 4 (TLR-4), myeloid differentiation protein 2 (MD-2), and inducible nitric oxide synthase (iNOS). Two Malaysian indigenous chicken breeds were used as sustainable genetic sources of alleles that are resistant to salmonellosis. The polymerase chain reaction restriction fragment-length polymorphism technique was used to genotype the candidate genes. Three different genotypes were observed in all of the candidate genes, except for MD-2. All of the candidate genes showed the Hardy-Weinberg equilibrium for the two populations. The IL-2-MnlI polymorphism was associated with S. Enteritidis burden in the caecum and spleen. The TGF-β2-RsaI, TLR-4-Sau 96I, and iNOS-AluI polymorphisms were associated with the caecum S. Enteritidis load. The other candidate genes were not associated with S. Enteritidis load in any organ. The results indicate that the IL-2, TGF-β2, TLR-4, and iNOS genes are potential candidates for use in selection programmes for increasing genetic resistance against S. Enteritidis in Malaysian indigenous chickens.
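
    A Hardy-Weinberg equilibrium check like the one mentioned in this record is commonly done with a chi-square goodness-of-fit test on the three genotype counts of a biallelic marker. The sketch below shows that standard calculation; the abstract does not say which test the authors used, and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg proportions at a biallelic locus.

    Arguments are the observed counts of the AA, AB and BB genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expected count (a monomorphic locus).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# With three genotype classes and one estimated allele frequency the test
# has 1 degree of freedom; 3.841 is the 5% critical value for 1 df.
CRITICAL_5_PERCENT_1_DF = 3.841

in_equilibrium = hwe_chi_square(25, 50, 25)      # matches p^2, 2pq, q^2 exactly
heterozygote_deficit = hwe_chi_square(40, 20, 40)  # far too few heterozygotes
```

    A statistic below the critical value means the counts are compatible with equilibrium. For small samples an exact test is usually preferred over the chi-square approximation.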

  11. Thyroid Hypoplasia in Congenital Hypothyroidism Associated with Thyroid Peroxidase Mutations.

    PubMed

    Stoupa, Athanasia; Chaabane, Rim; Guériouz, Manelle; Raynaud-Ravni, Catherine; Nitschke, Patrick; Bole-Feyset, Christine; Mnif, Mouna; Ammar Keskes, Leila; Hachicha, Mongia; Belguith, Neila; Polak, Michel; Carré, Aurore

    2018-05-23

    Primary congenital hypothyroidism (CH) affects about 1:3000 newborns worldwide and is mainly caused by defects in thyroid gland development (thyroid dysgenesis, TD) or hormone synthesis. A genetic cause is identified in less than 10% of TD patients. Our aim was to identify novel candidate genes in patients with TD using next-generation sequencing tools. We used whole exome sequencing (WES) to study two families, a consanguineous Tunisian family (one child with severe thyroid hypoplasia) and a French family (two newborn siblings, with a thyroid in situ that was not enlarged on ultrasound at diagnosis). Variants in candidate genes were filtered according to type of variation, frequency in public and in-house databases, in silico prediction tools, and inheritance mode. We unexpectedly identified three different variants of the thyroid peroxidase (TPO) gene. A homozygous missense mutation (c.875C>T, p.S292F) was found in the Tunisian patient with severe thyroid hypoplasia. The two French siblings were compound heterozygotes (c.387delC/c.2578G>A, p.N129Kfs*80/p.G860R) for TPO mutations. All three mutations have been previously described in patients with goitrous CH. In our patients treatment was initiated immediately after diagnosis and the effect, if any, of TSH stimulation of these thyroids remains unclear. We report the first cases of thyroid hypoplasia at diagnosis during neonatal period in patients with CH and TPO mutations. These cases highlight the importance of screening for TPO mutations not only in goitrous CH, but also in thyroids of normal or small size, and they broaden the clinical spectrum of described phenotypes.

  12. Microarray‑based bioinformatics analysis of the prospective target gene network of key miRNAs influenced by long non‑coding RNA PVT1 in HCC.

    PubMed

    Zhang, Yu; Mo, Wei-Jia; Wang, Xiao; Zhang, Tong-Tong; Qin, Yuan; Wang, Han-Lin; Chen, Gang; Wei, Dan-Ming; Dang, Yi-Wu

    2018-05-02

    The long non‑coding RNA (lncRNA) PVT1 plays vital roles in the tumorigenesis and development of various types of cancer. However, the potential expression profiling, functions and pathways of PVT1 in HCC remain unknown. PVT1 was knocked down in SMMC‑7721 cells, and a miRNA microarray analysis was performed to detect the differentially expressed miRNAs. Twelve target prediction algorithms were used to predict the underlying targets of these differentially expressed miRNAs. Bioinformatics analysis was performed to explore the underlying functions, pathways and networks of the targeted genes. Furthermore, the relationship between PVT1 and the clinical parameters in HCC was confirmed based on the original data in the TCGA database. Among the differentially expressed miRNAs, the top two upregulated and downregulated miRNAs were selected for further analysis based on the false discovery rate (FDR), fold‑change (FC) and P‑values. Based on the TCGA database, PVT1 was obviously highly expressed in HCC, and a statistically higher PVT1 expression was found for sex (male), ethnicity (Asian) and pathological grade (G3+G4) compared to the control groups (P<0.05). Furthermore, Gene Ontology (GO) analysis revealed that the target genes were involved in complex cellular pathways, such as the macromolecule biosynthetic process, compound metabolic process, and transcription. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed that the MAPK and Wnt signaling pathways may be correlated with the regulation of the four candidate miRNAs. The results therefore provide significant information on the differentially expressed miRNAs associated with PVT1 in HCC, and we hypothesized that PVT1 may play vital roles in HCC by regulating different miRNAs or target gene expression (particularly MAPK8) via the MAPK or Wnt signaling pathways. Thus, further investigation of the molecular mechanism of PVT1 in HCC is needed.

  13. A Web-based Tool for SDSS and 2MASS Database Searches

    NASA Astrophysics Data System (ADS)

    Hendrickson, M. A.; Uomoto, A.; Golimowski, D. A.

    We have developed a web site using HTML, PHP, Python, and MySQL that extracts, processes, and displays data from the Sloan Digital Sky Survey (SDSS) and the Two-Micron All-Sky Survey (2MASS). The goal is to locate brown dwarf candidates in the SDSS database by looking at color cuts; however, this site could also be useful for targeted searches of other databases. MySQL databases are created from broad searches of SDSS and 2MASS data. Broad queries on the SDSS and 2MASS database servers are run weekly so that observers have the most up-to-date information from which to select candidates for observation. Observers can look at detailed information about specific objects including finding charts, images, and available spectra. In addition, updates from previous observations can be added by any collaborators; this format makes observational collaboration simple. Observers can also restrict the database search, just before or during an observing run, to select objects of special interest.

  14. HRGFish: A database of hypoxia responsive genes in fishes

    NASA Astrophysics Data System (ADS)

    Rashid, Iliyas; Nagpure, Naresh Sahebrao; Srivastava, Prachi; Kumar, Ravindra; Pathak, Ajey Kumar; Singh, Mahender; Kushwaha, Basdeo

    2017-02-01

    Several studies have highlighted the changes in gene expression due to the hypoxia response in fishes, but a systematic organization of the information and an analytical platform for such genes are lacking. In the present study, an attempt was made to develop a database of hypoxia responsive genes in fishes (HRGFish), integrated with analytical tools, using LAMPP technology. Genes reported in hypoxia response for fishes were compiled through a literature survey, and the database presently covers 818 gene sequences and 35 gene types from 38 fishes. The upstream fragments (3,000 bp) covered in this database enable computation of CG dinucleotide frequencies, motif finding for the hypoxia response element, identification of CpG islands, and mapping to the reference promoter of zebrafish. The database also includes functional annotation of genes and provides tools for analyzing sequences and designing primers for selected gene fragments. This may be the first database on the hypoxia response genes in fishes that provides a workbench to the scientific community involved in studying the evolution and ecological adaptation of the fish species in relation to hypoxia.
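
    The CG dinucleotide statistics mentioned in this record can be sketched in a few lines. This is not code from HRGFish: the function name and the toy sequence are hypothetical, and the observed/expected ratio shown is the common definition (CG count times length, divided by the product of the C and G counts), which the abstract does not confirm as the one the database uses.

```python
def cpg_stats(seq):
    """Return (CG count, GC content, CpG observed/expected ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Expected CG count under independence is (c / n) * (g / n) * n = c * g / n.
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return cg, gc_content, obs_exp

# Hypothetical 10 bp fragment; a real upstream fragment would be about 3,000 bp.
cg, gc_content, obs_exp = cpg_stats("ACGCGTTACG")
```

    CpG island calls are usually made by sliding a window along the sequence and applying thresholds to the GC content and the observed/expected ratio. The window size and thresholds vary between tools.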

  15. The Genetic Basis for Variation in Sensitivity to Lead Toxicity in Drosophila melanogaster

    PubMed Central

    Zhou, Shanshan; Morozova, Tatiana V.; Hussain, Yasmeen N.; Luoma, Sarah E.; McCoy, Lenovia; Yamamoto, Akihiko; Mackay, Trudy F.C.; Anholt, Robert R.H.

    2016-01-01

    Background: Lead toxicity presents a worldwide health problem, especially due to its adverse effects on cognitive development in children. However, identifying genes that give rise to individual variation in susceptibility to lead toxicity is challenging in human populations. Objectives: Our goal was to use Drosophila melanogaster to identify evolutionarily conserved candidate genes associated with individual variation in susceptibility to lead exposure. Methods: To identify candidate genes associated with variation in susceptibility to lead toxicity, we measured effects of lead exposure on development time, viability and adult activity in the Drosophila melanogaster Genetic Reference Panel (DGRP) and performed genome-wide association analyses to identify candidate genes. We used mutants to assess functional causality of candidate genes and constructed a genetic network associated with variation in sensitivity to lead exposure, on which we could superimpose human orthologs. Results: We found substantial heritabilities for all three traits and identified candidate genes associated with variation in susceptibility to lead exposure for each phenotype. The genetic architectures that determine variation in sensitivity to lead exposure are highly polygenic. Gene ontology and network analyses showed enrichment of genes associated with early development and function of the nervous system. Conclusions: Drosophila melanogaster presents an advantageous model to study the genetic underpinnings of variation in susceptibility to lead toxicity. Evolutionary conservation of cellular pathways that respond to toxic exposure allows predictions regarding orthologous genes and pathways across phyla. Thus, studies in the D. melanogaster model system can identify candidate susceptibility genes to guide subsequent studies in human populations. Citation: Zhou S, Morozova TV, Hussain YN, Luoma SE, McCoy L, Yamamoto A, Mackay TF, Anholt RR. 2016. The genetic basis for variation in sensitivity to lead toxicity in Drosophila melanogaster. Environ Health Perspect 124:1062–1070; http://dx.doi.org/10.1289/ehp.1510513 PMID:26859824

  16. Secretome Characterization and Correlation Analysis Reveal Putative Pathogenicity Mechanisms and Identify Candidate Avirulence Genes in the Wheat Stripe Rust Fungus Puccinia striiformis f. sp. tritici.

    PubMed

    Xia, Chongjing; Wang, Meinan; Cornejo, Omar E; Jiwan, Derick A; See, Deven R; Chen, Xianming

    2017-01-01

    Stripe (yellow) rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat worldwide. Planting resistant cultivars is an effective way to control this disease, but race-specific resistance can be overcome quickly due to the rapidly evolving Pst population. Studying the pathogenicity mechanisms is critical for understanding how Pst virulence changes and how to develop wheat cultivars with durable resistance to stripe rust. We re-sequenced 7 Pst isolates and included 7 additional previously sequenced isolates to represent balanced virulence/avirulence profiles for several avirulence loci in secretome analyses. We observed an uneven distribution of heterozygosity among the isolates. Secretome comparison of Pst with other rust fungi identified a large portion of species-specific secreted proteins, suggesting that they may have specific roles when interacting with the wheat host. Thirty-two effectors of Pst were identified from its secretome. We identified candidates for Avr genes corresponding to six Yr genes by correlating polymorphisms for effector genes to the virulence/avirulence profiles of the 14 Pst isolates. The putative AvYr76 was present in the avirulent isolates, but absent in the virulent isolates, suggesting that deleting the coding region of the candidate avirulence gene has produced races virulent to resistance gene Yr76. We conclude that incorporating avirulence/virulence phenotypes into correlation analysis with variations in genomic structure and secretome, particularly presence/absence polymorphisms of effectors, is an efficient way to identify candidate Avr genes in Pst. The candidate effector genes provide a rich resource for further studies to determine the evolutionary history of Pst populations and the co-evolutionary arms race between Pst and wheat. The Avr candidates identified in this study will lead to cloning avirulence genes in Pst, which will enable us to understand molecular mechanisms underlying Pst-wheat interactions, to determine the effectiveness of resistance genes and further to develop durable resistance to stripe rust.

  17. Array-based comparative genomic hybridization-guided identification of reference genes for normalization of real-time quantitative polymerase chain reaction assay data for lymphomas, histiocytic sarcomas, and osteosarcomas of dogs.

    PubMed

    Tsai, Pei-Chien; Breen, Matthew

    2012-09-01

    To identify suitable reference genes for normalization of real-time quantitative PCR (RT-qPCR) assay data for common tumors of dogs. Malignant lymph node (n = 8), appendicular osteosarcoma (9), and histiocytic sarcoma (12) samples and control samples of various nonneoplastic canine tissues. Array-based comparative genomic hybridization (aCGH) data were used to guide selection of 9 candidate reference genes. Expression stability of candidate reference genes and 4 commonly used reference genes was determined for tumor samples with RT-qPCR assays and 3 software programs. LOC611555 was the candidate reference gene with the highest expression stability among the 3 tumor types. Of the commonly used reference genes, expression stability of HPRT was high in histiocytic sarcoma samples, and expression stability of Ubi and RPL32 was high in osteosarcoma samples. Some of the candidate reference genes had higher expression stability than did the commonly used reference genes. Data for constitutively expressed genes with high expression stability are required for normalization of RT-qPCR assay results. Without such data, accurate quantification of gene expression in tumor tissue samples is difficult. Results of the present study indicated LOC611555 may be a useful RT-qPCR assay reference gene for multiple tissue types. Some commonly used reference genes may be suitable for normalization of gene expression data for tumors of dogs, such as lymphomas, osteosarcomas, or histiocytic sarcomas.

  18. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    PubMed

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
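
    The reverse database searching mentioned above can be illustrated with a generic target-decoy FDR estimate. This is a minimal sketch only: it is not the Average Peptide Scoring method used by the authors, and the function names, the (score, is_decoy) tuple layout and the example scores are all hypothetical.

```python
def decoy_fdr(psms, threshold):
    """Estimate FDR at a score threshold as decoy hits / target hits.

    psms is a list of (score, is_decoy) pairs; is_decoy marks matches
    to the reversed (decoy) database. Layout is an assumption.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_within_fdr(psms, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    # Walk thresholds from strictest to most permissive and keep the
    # most permissive one that still satisfies the FDR limit.
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Hypothetical peptide-spectrum matches: (score, matched_decoy_database)
example_psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
```

    With the example data, accepting every match scoring 5 or more would count two decoy hits against four target hits, so a stricter cutoff is needed to reach a low FDR.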

  19. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    PubMed Central

    2010-01-01

    Background The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245

  20. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships.

    PubMed

    Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong

    2010-01-18

    The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.

  1. Comparison of Gene Expression in Human Embryonic Stem Cells, hESC-Derived Mesenchymal Stem Cells and Human Mesenchymal Stem Cells.

    PubMed

    Barbet, Romain; Peiffer, Isabelle; Hatzfeld, Antoinette; Charbord, Pierre; Hatzfeld, Jacques A

    2011-01-01

    We present a strategy to identify developmental/differentiation and plasma membrane marker genes of the most primitive human Mesenchymal Stem Cells (hMSCs). Using sensitive and quantitative TaqMan Low Density Arrays (TLDA) methodology, we compared the expression of 381 genes in human Embryonic Stem Cells (hESCs), hESC-derived MSCs (hES-MSCs), and hMSCs. Analysis of differentiation genes indicated that hES-MSCs express the sarcomeric muscle lineage in addition to the classical mesenchymal lineages, suggesting they are more primitive than hMSCs. Transcript analysis of membrane antigens suggests that IL1R1(low), BMPR1B(low), FLT4(low), LRRC32(low), and CD34 may be good candidates for the detection and isolation of the most primitive hMSCs. The expression in hMSCs of cytokine genes, such as IL6, IL8, or FLT3LG, without expression of the corresponding receptor, suggests a role for these cytokines in the paracrine control of stem cell niches. Our database may be shared with other laboratories in order to explore the considerable clinical potential of hES-MSCs, which appear to represent an intermediate developmental stage between hESCs and hMSCs.

  2. Identification and characterization of UDP-glucose:Phloretin 4'-O-glycosyltransferase from Malus x domestica Borkh.

    PubMed

    Yahyaa, Mosaab; Davidovich-Rikanati, Rachel; Eyal, Yoram; Sheachter, Alona; Marzouk, Sally; Lewinsohn, Efraim; Ibdah, Mwafaq

    2016-10-01

    Apples (Malus x domestica Borkh.) are among the world's most important food crops with nutritive and medicinal importance. Many of the health beneficial properties of apple fruit are suggested to be due to (poly)phenolic metabolites, including various dihydrochalcones. Although many of the genes and enzymes involved in polyphenol biosynthesis are known in many plant species, the specific reactions that lead to the biosynthesis of the sweet tasting dihydrochalcones, such as trilobatin, are unknown. To identify candidate genes for involvement in the glycosylation of dihydrochalcones, existing genome databases of the Rosaceae were screened for apple genes with significant sequence similarity to Bacillus subtilis phloretin glycosyltransferase. Herein reported is the identification and functional characterization of a Malus x domestica gene encoding phloretin-4'-O-glycosyltransferase designated MdPh-4'-OGT. Recombinant MdPh-4'-OGT protein glycosylates phloretin in the presence of UDP-glucose into trilobatin in vitro. Its apparent Km values for phloretin and UDP-glucose were 26.1 μM and 1.2 mM, respectively. Expression analysis of the MdPh-4'-OGT gene indicated that its transcript levels showed significant variation in apple tissues of different developmental stages. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. In Silico Computational Transcriptomics Reveals Novel Endocrine Disruptors in Largemouth Bass (Micropterus salmoides).

    PubMed

    Basili, Danilo; Zhang, Ji-Liang; Herbert, John; Kroll, Kevin; Denslow, Nancy D; Martyniuk, Christopher J; Falciani, Francesco; Antczak, Philipp

    2018-06-15

    In recent years, decreases in fish populations have been attributed, in part, to the effect of environmental chemicals on ovarian development. To understand the underlying molecular events we developed a dynamic model of ovary development linking gene transcription to key physiological end points, such as gonadosomatic index (GSI), plasma levels of estradiol (E2) and vitellogenin (VTG), in largemouth bass (Micropterus salmoides). We were able to identify specific clusters of genes, which are affected at different stages of ovarian development. A subnetwork was identified that closely linked gene expression and physiological end points and by interrogating the Comparative Toxicogenomic Database (CTD), quercetin and tretinoin (ATRA) were identified as two potential candidates that may perturb this system. Predictions were validated by investigation of reproductive associated transcripts using qPCR in ovary and in the liver of both male and female largemouth bass treated after a single injection of quercetin and tretinoin (10 and 100 μg/kg). Both compounds were found to significantly alter the expression of some of these genes. Our findings support the use of omics and online repositories for identification of novel, yet untested, compounds. This is the first study of a dynamic model that links gene expression patterns across stages of ovarian development.

  4. Identification and characterization of a gene encoding for a nucleotidase from Phaseolus vulgaris.

    PubMed

    Cabello-Díaz, Juan Miguel; Gálvez-Valdivieso, Gregorio; Caballo, Cristina; Lambert, Rocío; Quiles, Francisco Antonio; Pineda, Manuel; Piedras, Pedro

    2015-08-01

    Nucleotidases are phosphatases that catalyze the removal of phosphate from nucleotides, compounds with an important role in plant metabolism. A phosphatase enzyme with high affinity for nucleoside monophosphates, previously identified and purified in embryonic axes from French bean, has been analyzed by MALDI TOF/TOF and two internal peptides have been obtained. The information from these peptide sequences has been used to search the genome database, and only one candidate gene encoding the phosphatase was identified (PvNTD1). The putative protein contains the conserved domains (motif I-IV) of the haloacid dehalogenase-like hydrolase superfamily. The residues involved in the catalytic activity are also conserved. A recombinant protein overexpressed in Escherichia coli has shown molybdate resistant phosphatase activity with nucleoside monophosphates as substrate, confirming that the identified gene encodes the phosphatase with high affinity for nucleotides purified in French bean embryonic axes. The activity of the purified protein was inhibited by adenosine. The expression of the PvNTD1 gene was induced at the specific moment of radicle protrusion in embryonic axes. The gene was also highly expressed in young leaves whereas the level of expression in mature tissues was minimal. Copyright © 2015 The Authors. Published by Elsevier GmbH. All rights reserved.

  5. Investigation of SnSPR1, a novel and abundant surface protein of Sarcocystis neurona merozoites.

    PubMed

    Zhang, Deqing; Howe, Daniel K

    2008-04-15

    An expressed sequence tag (EST) sequencing project has produced over 15,000 partial cDNA sequences from the equine pathogen Sarcocystis neurona. While many of the sequences are clear homologues of previously characterized genes, a significant number of the S. neurona ESTs do not exhibit similarity to anything in the extensive sequence databases that have been generated. In an effort to characterize parasite proteins that are novel to S. neurona, a seemingly unique gene was selected for further investigation based on its abundant representation in the collection of ESTs and the predicted presence of a signal peptide and glycolipid anchor addition on the encoded protein. The gene was expressed in E. coli, and monospecific polyclonal antiserum against the recombinant protein was produced by immunization of a rabbit. Characterization of the native protein in S. neurona merozoites and schizonts revealed that it is a low molecular weight surface protein that is expressed throughout intracellular development of the parasite. The protein was designated Surface Protein 1 (SPR1) to reflect its display on the outer surface of merozoites and to distinguish it from the ubiquitous SAG/SRS surface antigens of the heteroxenous Coccidia. Interestingly, infection assays in the presence of the polyclonal antiserum suggested that SnSPR1 plays some role in attachment and/or invasion of host cells by S. neurona merozoites. The work described herein represents a general template for selecting and characterizing the various unidentified gene sequences that are plentiful in the EST databases for S. neurona and other apicomplexans. Furthermore, this study illustrates the value of investigating these novel sequences since it can offer new candidates for diagnostic or vaccine development while also providing greater insight into the biology of these parasites.

  6. Isolated chromosome 8p23.2‑pter deletion: Novel evidence for developmental delay, intellectual disability, microcephaly and neurobehavioral disorders.

    PubMed

    Shi, Shanshan; Lin, Shaobin; Chen, Baojiang; Zhou, Yi

    2017-11-01

    The current study presents a patient carrying a de novo ~6 Mb deletion of the isolated chromosome 8p23.2‑pter that was identified with a single‑nucleotide polymorphism array. The patient was characterized by developmental delay (DD)/intellectual disability (ID), microcephaly, autism spectrum disorder, attention‑deficit/hyperactivity disorders and mildly dysmorphic features. The location, size and gene content of the deletion observed in this patient were compared with those in 7 patients with isolated 8p23.2 to 8pter deletions reported in previous studies (4 patients) or recorded in the Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources (DECIPHER) database (3 patients). The deletions reported in previous studies were assessed using a chromosomal microarray analysis. The 8p23.2‑pter deletion was a distinct microdeletion syndrome, as similar phenotypes were observed in patients with this deletion. Furthermore, following a detailed review of the potential associations between the genes located from 8p23.2 to 8pter and their clinical significance, it was hypothesized that DLG associated protein 2, ceroid‑lipofuscinosis neuronal 8, Rho guanine nucleotide exchange factor 10 and CUB and sushi multiple domains 1 may be candidate genes for DD/ID, microcephaly and neurobehavioral disorders. However, firm evidence should be accumulated from high‑resolution studies of patients with small, isolated, overlapping and interstitial deletions involving the region from 8p23.2 to 8pter. These studies will allow determination of genotype‑phenotype associations for the specific genes crucial to 8p23.2‑pter.

  7. A high-density genetic map of Arachis duranensis, a diploid ancestor of cultivated peanut

    PubMed Central

    2012-01-01

    Background Cultivated peanut (Arachis hypogaea) is an allotetraploid species whose ancestral genomes are most likely derived from the A-genome species, A. duranensis, and the B-genome species, A. ipaensis. The very recent (several millennia) evolutionary origin of A. hypogaea has imposed a bottleneck for allelic and phenotypic diversity within the cultigen. However, wild diploid relatives are a rich source of alleles that could be used for crop improvement and their simpler genomes can be more easily analyzed while providing insight into the structure of the allotetraploid peanut genome. The objective of this research was to establish a high-density genetic map of the diploid species A. duranensis based on de novo generated EST databases. Arachis duranensis was chosen for mapping because it is the A-genome progenitor of cultivated peanut and also in order to circumvent the confounding effects of gene duplication associated with allopolyploidy in A. hypogaea. Results More than one million expressed sequence tag (EST) sequences generated from normalized cDNA libraries of A. duranensis were assembled into 81,116 unique transcripts. Mining this dataset, 1236 EST-SNP markers were developed between two A. duranensis accessions, PI 475887 and Grif 15036. An additional 300 SNP markers also were developed from genomic sequences representing conserved legume orthologs. Of the 1536 SNP markers, 1054 were placed on a genetic map. In addition, 598 EST-SSR markers identified in A. hypogaea assemblies were included in the map along with 37 disease resistance gene candidate (RGC) and 35 other previously published markers. In total, 1724 markers spanning 1081.3 cM over 10 linkage groups were mapped. Gene sequences that provided mapped markers were annotated using similarity searches in three different databases, and gene ontology descriptions were determined using the Medicago Gene Atlas and TAIR databases. Synteny analysis between A. duranensis, Medicago and Glycine revealed significant stretches of conserved gene clusters spread across the peanut genome. A higher level of colinearity was detected between A. duranensis and Glycine than with Medicago. Conclusions The first high-density, gene-based linkage map for A. duranensis was generated that can serve as a reference map for both wild and cultivated Arachis species. The markers developed here are valuable resources for the peanut, and more broadly, to the legume research community. The A-genome map will have utility for fine mapping in other peanut species and has already had application for mapping a nematode resistance gene that was introgressed into A. hypogaea from A. cardenasii. PMID:22967170

  8. Genome-wide association links candidate genes to resistance to Plum Pox Virus in apricot (Prunus armeniaca).

    PubMed

    Mariette, Stéphanie; Wong Jun Tai, Fabienne; Roch, Guillaume; Barre, Aurélien; Chague, Aurélie; Decroocq, Stéphane; Groppi, Alexis; Laizet, Yec'han; Lambert, Patrick; Tricon, David; Nikolski, Macha; Audergon, Jean-Marc; Abbott, Albert G; Decroocq, Véronique

    2016-01-01

    In fruit tree species, many important traits have been characterized genetically by using single-family descent mapping in progenies segregating for the traits. However, most mapped loci have not been sufficiently resolved to the individual genes due to insufficient progeny sizes for high resolution mapping and the previous lack of whole-genome sequence resources of the study species. To address this problem for Plum Pox Virus (PPV) candidate resistance gene identification in Prunus species, we implemented a genome-wide association (GWA) approach in apricot. This study exploited the broad genetic diversity of the apricot (Prunus armeniaca) germplasm containing resistance to PPV, next-generation sequence-based genotyping, and the high-quality peach (Prunus persica) genome reference sequence for single nucleotide polymorphism (SNP) identification. The results of this GWA study validated previously reported PPV resistance quantitative trait loci (QTL) intervals, highlighted other potential resistance loci, and resolved each to a limited set of candidate genes for further study. This work substantiates the association genetics approach for resolution of QTL to candidate genes in apricot and suggests that this approach could simplify identification of other candidate genes for other marked trait intervals in this germplasm. © 2015 INRA, UMR 1332 BFP New Phytologist © 2015 New Phytologist Trust.

  9. Mutational Landscape of Candidate Genes in Familial Prostate Cancer

    PubMed Central

    Johnson, Anna M.; Zuhlke, Kimberly A.; Plotts, Chris; McDonnell, Shannon K.; Middha, Sumit; Riska, Shaun M.; Thibodeau, Stephen N.; Douglas, Julie A.; Cooney, Kathleen A.

    2014-01-01

    Background Family history is a major risk factor for prostate cancer (PCa), suggesting a genetic component to the disease. However, traditional linkage and association studies have failed to fully elucidate the underlying genetic basis of familial PCa. Methods Here we use a candidate gene approach to identify potential PCa susceptibility variants in whole exome sequencing data from familial PCa cases. Six hundred ninety-seven candidate genes were identified based on function, location near a known chromosome 17 linkage signal, and/or previous association with prostate or other cancers. Single nucleotide variants (SNVs) in these candidate genes were identified in whole exome sequence data from 33 PCa cases from 11 multiplex PCa families (3 cases/family). Results Overall, 4856 candidate gene SNVs were identified, including 1052 missense and 10 nonsense variants. Twenty missense variants were shared by all 3 family members in each family in which they were observed. Additionally, 15 missense variants were shared by 2 of 3 family members and predicted to be deleterious by 5 different algorithms. Four missense variants, BLM Gln123Arg, PARP2 Arg283Gln, LRCC46 Ala295Thr and KIF2B Pro91Leu, and 1 nonsense variant, CYP3A43 Arg441Ter, showed complete co-segregation with PCa status. Twelve additional variants displayed partial co-segregation with PCa. Conclusions Forty-three nonsense and shared, missense variants were identified in our candidate genes. Further research is needed to determine the contribution of these variants to PCa susceptibility. PMID:25111073
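
    The sharing filter described above (variants carried by all sequenced cases in each family in which they were observed) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function name, the (variant, family, case) record layout and the example identifiers are hypothetical.

```python
from collections import defaultdict


def shared_by_all_cases(carriers, family_sizes):
    """Return variants carried by every sequenced case in each family
    in which the variant was observed.

    carriers: iterable of (variant, family, case_id) tuples (assumed layout).
    family_sizes: dict mapping family -> number of sequenced cases.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for variant, family, case_id in carriers:
        seen[variant][family].add(case_id)
    return sorted(
        variant
        for variant, families in seen.items()
        if all(len(cases) == family_sizes[fam] for fam, cases in families.items())
    )


# Hypothetical example: two families with three sequenced cases each.
example_sizes = {"F1": 3, "F2": 3}
example_carriers = [
    ("v1", "F1", "a"), ("v1", "F1", "b"), ("v1", "F1", "c"),  # all of F1
    ("v2", "F1", "a"), ("v2", "F1", "b"),                      # 2 of 3 in F1
    ("v3", "F2", "x"), ("v3", "F2", "y"), ("v3", "F2", "z"),  # all of F2 ...
    ("v3", "F1", "a"),                                         # ... but 1 of 3 in F1
]
```

    In the example only v1 passes, because v2 is missing from one case and v3 is incompletely shared in one of the two families where it occurs.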

  10. Transcript and proteomic analysis of developing white lupin (Lupinus albus L.) roots

    PubMed Central

    Tian, Li; Peel, Gregory J; Lei, Zhentian; Aziz, Naveed; Dai, Xinbin; He, Ji; Watson, Bonnie; Zhao, Patrick X; Sumner, Lloyd W; Dixon, Richard A

    2009-01-01

    Background White lupin (Lupinus albus L.) roots efficiently take up and accumulate (heavy) metals, adapt to phosphate deficiency by forming cluster roots, and secrete antimicrobial prenylated isoflavones during development. Genomic and proteomic approaches were applied to identify candidate genes and proteins involved in antimicrobial defense and (heavy) metal uptake and translocation. Results A cDNA library was constructed from roots of white lupin seedlings. Eight thousand clones were randomly sequenced and assembled into 2,455 unigenes, which were annotated based on homologous matches in the NCBInr protein database. A reference map of developing white lupin root proteins was established through 2-D gel electrophoresis and peptide mass fingerprinting. High quality peptide mass spectra were obtained for 170 proteins. Microsomal membrane proteins were separated by 1-D gel electrophoresis and identified by LC-MS/MS. A total of 74 proteins were putatively identified by the peptide mass fingerprinting and the LC-MS/MS methods. Genomic and proteomic analyses identified candidate genes and proteins encoding metal binding and/or transport proteins, transcription factors, ABC transporters and phenylpropanoid biosynthetic enzymes. Conclusion The combined EST and protein datasets will facilitate the understanding of white lupin's response to biotic and abiotic stresses and its utility for phytoremediation. The root ESTs provided 82 perfect simple sequence repeat (SSR) markers with potential utility in breeding white lupin for enhanced agronomic traits. PMID:19123941

  11. Pharmacogenetics and Metabolism from Science to Implementation in Clinical Practice: The Example of Dihydropyrimidine Dehydrogenase.

    PubMed

    Del Re, Marzia; Restante, Giuliana; Di Paolo, Antonello; Crucitta, Stefania; Rofi, Eleonora; Danesi, Romano

    2017-01-01

    Fluoropyrimidines are widely used in the treatment of solid tumors and remain the backbone of many combination chemotherapy regimens. Despite their clinical benefit, they are associated with frequent gastrointestinal and hematological toxicities, which often lead to treatment discontinuation. Fluoropyrimidines undergo complex anabolic and catabolic biotransformation. Enzymes involved in this pathway include dihydropyrimidine dehydrogenase (DPD), which breaks down 5-FU and its prodrugs. Candidate gene approaches have demonstrated associations between 5-FU treatment outcomes and germline polymorphisms in DPD. The aim of this review is to report and discuss the latest results on fluoropyrimidine pharmacogenetics. Literature from PubMed databases and bibliography from retrieved publications have been analyzed according to terms such as DPD, DPYD, fluoropyrimidines, polymorphisms, toxicity, pharmacogenetics. To date, many sequence variations have been identified within the DPYD gene, although the majority of these have no functional consequences on enzymatic activity. There is now general agreement on the clinical significance of DPD deficiency in patients who suffer from severe, life-threatening drug toxicity, although preemptive testing is not applied to all patients. Considering the published literature, clinicians are strongly encouraged to consider testing for DPD poor metabolizer variants as a rational pre-treatment screening for patients who are candidates for fluoropyrimidine-based regimens, in order to prevent toxicities and personalise treatments. Copyright© Bentham Science Publishers.

  12. Candidate genes for idiopathic epilepsy in four dog breeds.

    PubMed

    Ekenstedt, Kari J; Patterson, Edward E; Minor, Katie M; Mickelson, James R

    2011-04-25

    Idiopathic epilepsy (IE) is a naturally occurring and significant seizure disorder affecting all dog breeds. Because dog breeds are genetically isolated populations, it is possible that IE is attributable to common founders and is genetically homogenous within breeds. In humans, a number of mutations, the majority of which are in genes encoding ion channels, neurotransmitters, or their regulatory subunits, have been discovered to cause rare, specific types of IE. It was hypothesized that there are simple genetic bases for IE in some purebred dog breeds, specifically in Vizslas, English Springer Spaniels (ESS), Greater Swiss Mountain Dogs (GSMD), and Beagles, and that the gene(s) responsible may, in some cases, be the same as those already discovered in humans. Candidate genes known to be involved in human epilepsy, along with selected additional genes in the same gene families that are involved in murine epilepsy or are expressed in neural tissue, were examined in populations of affected and unaffected dogs. Microsatellite markers in close proximity to each candidate gene were genotyped and subjected to two-point linkage in Vizslas, and association analysis in ESS, GSMD and Beagles. Most of these candidate genes were not significantly associated with IE in these four dog breeds, while results for a few genes remained inconclusive. Other genes not included in this study may still be causing monogenic IE in these breeds or, like many cases of human IE, the disease in dogs may be polygenic.

  13. Adaptation to climate through flowering phenology: a case study in Medicago truncatula.

    PubMed

    Burgarella, Concetta; Chantret, Nathalie; Gay, Laurène; Prosperi, Jean-Marie; Bonhomme, Maxime; Tiffin, Peter; Young, Nevin D; Ronfort, Joelle

    2016-07-01

    Local climatic conditions likely constitute an important selective pressure on genes underlying important fitness-related traits such as flowering time, and in many species, flowering phenology and climatic gradients strongly covary. To test whether climate shapes the genetic variation on flowering time genes and to identify candidate flowering genes involved in the adaptation to environmental heterogeneity, we used a large Medicago truncatula core collection to examine the association between nucleotide polymorphisms at 224 candidate genes and both climate variables and flowering phenotypes. Unlike genome-wide studies, candidate gene approaches are expected to enrich for the number of meaningful trait associations because they specifically target genes that are known to affect the trait of interest. We found that flowering time mediates adaptation to climatic conditions mainly by variation at genes located upstream in the flowering pathways, close to the environmental stimuli. Variables related to the annual precipitation regime reflected selective constraints on flowering time genes better than the other variables tested (temperature, altitude, latitude or longitude). By comparing phenotype and climate associations, we identified 12 flowering genes as the most promising candidates responsible for phenological adaptation to climate. Four of these genes were located in the known flowering time QTL region on chromosome 7. However, climate and flowering associations also highlighted largely distinct gene sets, suggesting different genetic architectures for adaptation to climate and flowering onset. © 2016 John Wiley & Sons Ltd.

  14. Epigenomic elements analyses for promoters identify ESRRG as a new susceptibility gene for obesity-related traits.

    PubMed

    Dong, S-S; Guo, Y; Zhu, D-L; Chen, X-F; Wu, X-M; Shen, H; Chen, X-D; Tan, L-J; Tian, Q; Deng, H-W; Yang, T-L

    2016-07-01

    With ENCODE epigenomic data and results from published genome-wide association studies (GWASs), we aimed to find regulatory signatures of obesity genes and discover novel susceptibility genes. Obesity genes were obtained from public GWAS databases and their promoters were annotated based on the regulatory element information. Significantly enriched or depleted epigenomic elements in the promoters of obesity genes were evaluated and all human genes were then prioritized according to the existence of the selected elements to predict new candidate genes. Top-ranked genes were subsequently tested for association with obesity-related traits in three independent in-house GWAS samples. We identified RAD21 and EZH2 as over-represented, and STAT2 (signal transducer and activator of transcription 2) and IRF3 (interferon regulatory transcription factor 3) as depleted transcription factors. Histone modification of H3K9me3 and chromatin state segmentation of 'poised promoter' and 'repressed' were over-represented. All genes were prioritized and we selected the top five genes for validation at the population level. Combining results from the three GWAS samples, rs7522101 in ESRRG (estrogen-related receptor-γ) remained significantly associated with body mass index after multiple testing corrections (P=7.25 × 10^-5). It was also associated with β-cell function (P=1.99 × 10^-3) and fasting glucose level (P<0.05) in the meta-analyses of glucose and insulin-related traits consortium (MAGIC) data set. Conclusions: In summary, we identified epigenomic characteristics for obesity genes and suggested ESRRG as a novel obesity-susceptibility gene.

  15. Transcriptome profiling of two maize inbreds with distinct responses to Gibberella ear rot disease to identify candidate resistance genes.

    PubMed

    Kebede, Aida Z; Johnston, Anne; Schneiderman, Danielle; Bosnich, Whynn; Harris, Linda J

    2018-02-09

    Gibberella ear rot (GER) is one of the most economically important fungal diseases of maize in the temperate zone due to moldy grain contaminated with health threatening mycotoxins. To develop resistant genotypes and control the disease, understanding the host-pathogen interaction is essential. RNA-Seq-derived transcriptome profiles of fungal- and mock-inoculated developing kernel tissues of two maize inbred lines were used to identify differentially expressed transcripts and propose candidate genes mapping within GER resistance quantitative trait loci (QTL). A total of 1255 transcripts were significantly (P ≤ 0.05) up regulated due to fungal infection in both susceptible and resistant inbreds. A greater number of transcripts were up regulated in the former (1174) than the latter (497) and increased as the infection progressed from 1 to 2 days after inoculation. Focusing on differentially expressed genes located within QTL regions for GER resistance, we identified 81 genes involved in membrane transport, hormone regulation, cell wall modification, cell detoxification, and biosynthesis of pathogenesis related proteins and phytoalexins as candidate genes contributing to resistance. Applying droplet digital PCR, we validated the expression profiles of a subset of these candidate genes from QTL regions contributed by the resistant inbred on chromosomes 1, 2 and 9. By screening global gene expression profiles for differentially expressed genes mapping within resistance QTL regions, we have identified candidate genes for gibberella ear rot resistance on several maize chromosomes which could potentially lead to a better understanding of Fusarium resistance mechanisms.

  16. An Integrative Genetics Approach to Identify Candidate Genes Regulating BMD: Combining Linkage, Gene Expression, and Association

    PubMed Central

    Farber, Charles R; van Nas, Atila; Ghazalpour, Anatole; Aten, Jason E; Doss, Sudheer; Sos, Brandon; Schadt, Eric E; Ingram-Drake, Leslie; Davis, Richard C; Horvath, Steve; Smith, Desmond J; Drake, Thomas A; Lusis, Aldons J

    2009-01-01

    Numerous quantitative trait loci (QTLs) affecting bone traits have been identified in the mouse; however, few of the underlying genes have been discovered. To improve the process of transitioning from QTL to gene, we describe an integrative genetics approach, which combines linkage analysis, expression QTL (eQTL) mapping, causality modeling, and genetic association in outbred mice. In C57BL/6J × C3H/HeJ (BXH) F2 mice, nine QTLs regulating femoral BMD were identified. To select candidate genes from within each QTL region, microarray gene expression profiles from individual F2 mice were used to identify 148 genes whose expression was correlated with BMD and regulated by local eQTLs. Many of the genes that were the most highly correlated with BMD have been previously shown to modulate bone mass or skeletal development. Candidates were further prioritized by determining whether their expression was predicted to underlie variation in BMD. Using network edge orienting (NEO), a causality modeling algorithm, 18 of the 148 candidates were predicted to be causally related to differences in BMD. To fine-map QTLs, markers in outbred MF1 mice were tested for association with BMD. Three chromosome 11 SNPs were identified that were associated with BMD within the Bmd11 QTL. Finally, our approach provides strong support for Wnt9a, Rasd1, or both underlying Bmd11. Integration of multiple genetic and genomic data sets can substantially improve the efficiency of QTL fine-mapping and candidate gene identification. PMID:18767929

  17. A Stratified Transcriptomics Analysis of Polygenic Fat and Lean Mouse Adipose Tissues Identifies Novel Candidate Obesity Genes

    PubMed Central

    Morton, Nicholas M.; Nelson, Yvonne B.; Michailidou, Zoi; Di Rollo, Emma M.; Ramage, Lynne; Hadoke, Patrick W. F.; Seckl, Jonathan R.; Bunger, Lutz; Horvat, Simon; Kenyon, Christopher J.; Dunbar, Donald R.

    2011-01-01

    Background: Obesity and metabolic syndrome result from a complex interaction between genetic and environmental factors. In addition to brain-regulated processes, recent genome wide association studies have indicated that genes highly expressed in adipose tissue affect the distribution and function of fat and thus contribute to obesity. Using a stratified transcriptome gene enrichment approach we attempted to identify adipose tissue-specific obesity genes in the unique polygenic Fat (F) mouse strain generated by selective breeding over 60 generations for divergent adiposity from a comparator Lean (L) strain. Results: To enrich for adipose tissue obesity genes a ‘snap-shot’ pooled-sample transcriptome comparison of key fat depots and non adipose tissues (muscle, liver, kidney) was performed. Known obesity quantitative trait loci (QTL) information for the model allowed us to further filter genes for increased likelihood of being causal or secondary for obesity. This successfully identified several genes previously linked to obesity (C1qr1 and Npr3) as positional QTL candidate genes elevated specifically in F line adipose tissue. A number of novel obesity candidate genes were also identified (Thbs1, Ppp1r3d, Tmepai, Trp53inp2, Ttc7b, Tuba1a, Fgf13, Fmr) that have inferred roles in fat cell function. Quantitative microarray analysis was then applied to the most phenotypically divergent adipose depot after exaggerating F and L strain differences with chronic high fat feeding which revealed a distinct gene expression profile of line, fat depot and diet-responsive inflammatory, angiogenic and metabolic pathways. Selected candidate genes Npr3 and Thbs1, as well as Gys2, a non-QTL gene that otherwise passed our enrichment criteria, were characterised, revealing novel functional effects consistent with a contribution to obesity. Conclusions: A focussed candidate gene enrichment strategy in the unique F and L model has identified novel adipose tissue-enriched genes contributing to obesity. PMID:21915269

  18. A stratified transcriptomics analysis of polygenic fat and lean mouse adipose tissues identifies novel candidate obesity genes.

    PubMed

    Morton, Nicholas M; Nelson, Yvonne B; Michailidou, Zoi; Di Rollo, Emma M; Ramage, Lynne; Hadoke, Patrick W F; Seckl, Jonathan R; Bunger, Lutz; Horvat, Simon; Kenyon, Christopher J; Dunbar, Donald R

    2011-01-01

    Obesity and metabolic syndrome result from a complex interaction between genetic and environmental factors. In addition to brain-regulated processes, recent genome wide association studies have indicated that genes highly expressed in adipose tissue affect the distribution and function of fat and thus contribute to obesity. Using a stratified transcriptome gene enrichment approach we attempted to identify adipose tissue-specific obesity genes in the unique polygenic Fat (F) mouse strain generated by selective breeding over 60 generations for divergent adiposity from a comparator Lean (L) strain. To enrich for adipose tissue obesity genes a 'snap-shot' pooled-sample transcriptome comparison of key fat depots and non adipose tissues (muscle, liver, kidney) was performed. Known obesity quantitative trait loci (QTL) information for the model allowed us to further filter genes for increased likelihood of being causal or secondary for obesity. This successfully identified several genes previously linked to obesity (C1qr1 and Npr3) as positional QTL candidate genes elevated specifically in F line adipose tissue. A number of novel obesity candidate genes were also identified (Thbs1, Ppp1r3d, Tmepai, Trp53inp2, Ttc7b, Tuba1a, Fgf13, Fmr) that have inferred roles in fat cell function. Quantitative microarray analysis was then applied to the most phenotypically divergent adipose depot after exaggerating F and L strain differences with chronic high fat feeding which revealed a distinct gene expression profile of line, fat depot and diet-responsive inflammatory, angiogenic and metabolic pathways. Selected candidate genes Npr3 and Thbs1, as well as Gys2, a non-QTL gene that otherwise passed our enrichment criteria, were characterised, revealing novel functional effects consistent with a contribution to obesity. A focussed candidate gene enrichment strategy in the unique F and L model has identified novel adipose tissue-enriched genes contributing to obesity.

  19. Transcriptome analysis in different developmental stages of Batocera horsfieldi (Coleoptera: Cerambycidae) and comparison of candidate olfactory genes

    PubMed Central

    Yang, Wei; Yang, Chunping; Zhang, Jin; Yang, Yang; Wang, Baoxin; Guan, Fengrong

    2018-01-01

    The white-striped longhorn beetle Batocera horsfieldi (Coleoptera: Cerambycidae) is a polyphagous wood-boring pest that causes substantial damage to the lumber industry. Olfactory proteins are crucial components of related processes, but the B. horsfieldi genome is not readily available for olfactory protein analysis. In the present study, developmental transcriptomes of larvae from the first instar to the prepupal stage, pupae, and adults (females and males) from emergence to mating were built by RNA sequencing to establish a genetic background that may help understand olfactory genes. Approximately 199 million clean reads were obtained and assembled into 171,664 transcripts, which were classified into 23,380, 26,511, 22,393, 30,270, and 87,732 unigenes for larvae, pupae, females, males, and combined datasets, respectively. The unigenes were annotated against NCBI’s non-redundant nucleotide and protein sequences, Swiss-Prot, Gene Ontology (GO), Pfam, Clusters of Eukaryotic Orthologous Groups (KOG), and KEGG Orthology (KO) databases. A total of 43,197 unigenes were annotated into 55 sub-categories under the three main GO categories; 25,237 unigenes were classified into 26 functional KOG categories, and 25,814 unigenes were classified into five functional KEGG Pathway categories. RSEM software identified 2,983, 3,097, 870, 2,437, 5,161, and 2,882 genes that were differentially expressed between larvae and males, larvae and pupae, larvae and females, males and females, males and pupae, and females and pupae, respectively. Among them, genes encoding seven candidate odorant binding proteins (OBPs) and three chemosensory proteins (CSPs) were identified. RT-PCR and RT-qPCR analyses showed that BhorOBP3, BhorCSP2, and BhorOBPC1/C3/C4 were highly expressed in the antenna of males, indicating these genes may play key roles in foraging and host-orientation in B. horsfieldi. Our results provide valuable molecular information about the olfactory system in B. horsfieldi and will help guide future functional studies on olfactory genes. PMID:29474419

  20. Transcriptome analysis of olive flounder (Paralichthys olivaceus) head kidney infected with moderate and high virulent strains of infectious viral hemorrhagic septicaemia virus (VHSV).

    PubMed

    Hwang, Jee Youn; Markkandan, Kesavan; Kwon, Mun Gyeong; Seo, Jung Soo; Yoo, Seung-Il; Hwang, Seong Don; Son, Maeng-Hyun; Park, Junhyung

    2018-05-01

    Olive flounder (Paralichthys olivaceus) is one of the most valuable marine aquatic species in South Korea and faces tremendous exposure to the viral hemorrhagic septicemia virus (VHSV). Given the growing importance of flounder, it is essential to understand the host defense of P. olivaceus against VHSV infection, but studies on its immune mechanism are hindered by the lack of genomic resources. In this study, P. olivaceus was infected with the disease-causing VHSV isolates ADC-VHS2012-11 and ADC-VHS2014-5, which showed moderate (20% mortality) and high (65% mortality) virulence, respectively, in order to investigate the effect of the difference in pathogenicity in head kidney at 1, 3 and 7 days post-infection using Illumina sequencing. After removing low-quality sequences, we obtained 144,933,160 high quality reads from thirty-six libraries which were further assembled into 53,384 unigenes with an average length of 563 bp and a range of 200 to 9605 bp. Transcriptome annotation revealed that 30,475 unigenes were functionally annotated with a cut-off e-value of 10⁻⁵. In total, 10,046 unigenes were clustered into 26 functional categories by searching against the eggNOG database, and 22,233 unigenes were assigned to 52 GO terms. In addition, 12,985 unigenes were grouped into 387 KEGG pathways. Among the 13,270 differentially expressed genes, 6578 and 6692 were differentially expressed only in the moderately and highly virulent infections, respectively. Based on our sequence analysis, many candidate genes with fundamental roles in the innate immune system, including pattern recognition receptors (TLRs & RLRs), Mx, complement proteins, lectins, and cytokines (chemokines, IFN, IRF, IL, TRF), were differentially expressed. Furthermore, GO enrichment analysis for these genes revealed defense response to virus, apoptotic process and transcription factor activity. In summary, this study identifies several putative immune pathways and candidate genes deserving further investigation in the context of novel gene discovery, gene expression and regulation studies and lays the foundation for fish immunology especially in P. olivaceus against VHSV. Copyright © 2018. Published by Elsevier Ltd.

  1. Genome-Wide Prediction of the Polymorphic Ser Gene Family in Tetrahymena thermophila Based on Motif Analysis

    PubMed Central

    Ponsuwanna, Patrath; Kümpornsin, Krittikorn; Chookajorn, Thanat

    2014-01-01

    Even though antigenic variation is employed among parasitic protozoa for host immune evasion, Tetrahymena thermophila, a free-living ciliate, can also change its surface protein antigens. These cysteine-rich glycosylphosphatidylinositol (GPI)-linked surface proteins are encoded by a family of polymorphic Ser genes. Despite the availability of T. thermophila genome, a comprehensive analysis of the Ser family is limited by its high degree of polymorphism. In order to overcome this problem, a new approach was adopted by searching for Ser candidates with common motif sequences, namely length-specific repetitive cysteine pattern and GPI anchor site. The candidate genes were phylogenetically compared with the previously identified Ser genes and classified into subtypes. Ser candidates were often found to be located as tandem arrays of the same subtypes on several chromosomal scaffolds. Certain Ser candidates located in the same chromosomal arrays were transcriptionally expressed at specific T. thermophila developmental stages. These Ser candidates selected by the motif analysis approach can form the foundation for a systematic identification of the entire Ser gene family, which will contribute to the understanding of their function and the basis of T. thermophila antigenic variation. PMID:25133747

  2. Analysis of Craniocardiac Malformations in Xenopus using Optical Coherence Tomography

    PubMed Central

    Deniz, Engin; Jonas, Stephan; Hooper, Michael; N. Griffin, John; Choma, Michael A.; Khokha, Mustafa K.

    2017-01-01

    Birth defects affect 3% of children in the United States. Among the birth defects, congenital heart disease and craniofacial malformations are major causes of mortality and morbidity. Unfortunately, the genetic mechanisms underlying craniocardiac malformations remain largely uncharacterized. To address this, human genomic studies are identifying sequence variations in patients, resulting in numerous candidate genes. However, the molecular mechanisms of pathogenesis for most candidate genes are unknown. Therefore, there is a need for functional analyses in rapid and efficient animal models of human disease. Here, we coupled the frog Xenopus tropicalis with Optical Coherence Tomography (OCT) to create a fast and efficient system for testing craniocardiac candidate genes. OCT can image cross-sections of microscopic structures in vivo at resolutions approaching histology. Here, we identify optimal OCT imaging planes to visualize and quantitate Xenopus heart and facial structures establishing normative data. Next we evaluate known human congenital heart diseases: cardiomyopathy and heterotaxy. Finally, we examine craniofacial defects by a known human teratogen, cyclopamine. We recapitulate human phenotypes readily and quantify the functional and structural defects. Using this approach, we can quickly test human craniocardiac candidate genes for phenocopy as a critical first step towards understanding disease mechanisms of the candidate genes. PMID:28195132

  3. Using Association Mapping in Teosinte (Zea Mays ssp Parviglumis) to Investigate the Function of Selection-Candidate Genes

    USDA-ARS's Scientific Manuscript database

    Large-scale screens of the maize genome identified 48 genes that show the putative signature of artificial selection during maize domestication or improvement. These selection-candidate genes may act as quantitative trait loci (QTL) that control the phenotypic differences between maize and its proge...

  4. PINTA: a web server for network-based gene prioritization from expression data

    PubMed Central

    Nitsch, Daniela; Tranchevent, Léon-Charles; Gonçalves, Joana P.; Vogt, Josef Korbinian; Madeira, Sara C.; Moreau, Yves

    2011-01-01

    PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user. PMID:21602267

  5. Allie: a database and a search service of abbreviations and long forms.

    PubMed

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader's expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/.

  6. Identifying positive selection candidate loci for high-altitude adaptation in Andean populations

    PubMed Central

    2009-01-01

    High-altitude environments (>2,500 m) provide scientists with a natural laboratory to study the physiological and genetic effects of low ambient oxygen tension on human populations. One approach to understanding how life at high altitude has affected human metabolism is to survey genome-wide datasets for signatures of natural selection. In this work, we report on a study to identify selection-nominated candidate genes involved in adaptation to hypoxia in one highland group, Andeans from the South American Altiplano. We analysed dense microarray genotype data using four test statistics that detect departures from neutrality. Using a candidate gene, single nucleotide polymorphism-based approach, we identified genes exhibiting preliminary evidence of recent genetic adaptation in this population. These included genes that are part of the hypoxia-inducible transcription factor (HIF) pathway, a biochemical pathway involved in oxygen homeostasis, as well as three other genomic regions previously not known to be associated with high-altitude phenotypes. In addition to identifying selection-nominated candidate genes, we also tested whether the HIF pathway shows evidence of natural selection. Our results indicate that the genes of this biochemical pathway as a group show no evidence of having evolved in response to hypoxia in Andeans. Results from particular HIF-targeted genes, however, suggest that genes in this pathway could play a role in Andean adaptation to high altitude, even if the pathway as a whole does not show higher relative rates of evolution. These data suggest a genetic role in high-altitude adaptation and provide a basis for genotype/phenotype association studies that are necessary to confirm the role of putative natural selection candidate genes and gene regions in adaptation to altitude. PMID:20038496

  7. Analysis of 60 reported glioma risk SNPs replicates published GWAS findings but fails to replicate associations from published candidate-gene studies.

    PubMed

    Walsh, Kyle M; Anderson, Erik; Hansen, Helen M; Decker, Paul A; Kosel, Matt L; Kollmeyer, Thomas; Rice, Terri; Zheng, Shichun; Xiao, Yuanyuan; Chang, Jeffrey S; McCoy, Lucie S; Bracci, Paige M; Wiemels, Joe L; Pico, Alexander R; Smirnov, Ivan; Lachance, Daniel H; Sicotte, Hugues; Eckel-Passow, Jeanette E; Wiencke, John K; Jenkins, Robert B; Wrensch, Margaret R

    2013-02-01

    Genomewide association studies (GWAS) and candidate-gene studies have implicated single-nucleotide polymorphisms (SNPs) in at least 45 different genes as putative glioma risk factors. Attempts to validate these associations have yielded variable results and few genetic risk factors have been consistently replicated. We conducted a case-control study of Caucasian glioma cases and controls from the University of California San Francisco (810 cases, 512 controls) and the Mayo Clinic (852 cases, 789 controls) in an attempt to replicate previously reported genetic risk factors for glioma. Sixty SNPs selected from the literature (eight from GWAS and 52 from candidate-gene studies) were successfully genotyped on an Illumina custom genotyping panel. Eight SNPs in/near seven different genes (TERT, EGFR, CCDC26, CDKN2A, PHLDB1, RTEL1, TP53) were significantly associated with glioma risk in the combined dataset (P < 0.05), with all associations in the same direction as in previous reports. Several SNP associations showed considerable differences across histologic subtype. All eight successfully replicated associations were first identified by GWAS, whereas none of the putative risk SNPs from candidate-gene studies was associated in the full case-control sample (all P values > 0.05). Although several confirmed associations are located near genes long known to be involved in gliomagenesis (e.g., EGFR, CDKN2A, TP53), these associations were first discovered by the GWAS approach and are in noncoding regions. These results highlight that the deficiencies of the candidate-gene approach lie in selecting both appropriate genes and relevant SNPs within these genes. © 2012 WILEY PERIODICALS, INC.

  8. Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules.

    PubMed

    Acevedo-Luna, Natalia; Mariño-Ramírez, Leonardo; Halbert, Armand; Hansen, Ulla; Landsman, David; Spouge, John L

    2016-11-21

    Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS. Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in a RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43*42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs. 
    Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS.
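
    The counts quoted in this abstract can be checked with simple arithmetic, and the notion of a gene-group intersection "enriched beyond chance" can be illustrated with a hypergeometric tail probability. The sketch below is illustrative only: the abstract does not say which test the authors used, so the hypergeometric test is an assumption, and the background size, group sizes and overlap in the usage lines are hypothetical values.

```python
from math import comb


def pair_count(n_groups: int) -> int:
    """Number of unordered pairs that can be formed from n_groups gene groups."""
    return comb(n_groups, 2)


def hypergeom_tail(overlap: int, size_a: int, size_b: int, background: int) -> float:
    """P(X >= overlap) when size_b genes are drawn at random from a background
    that contains size_a genes of group A. Assumed test, not necessarily the
    one used in the paper."""
    upper = min(size_a, size_b)
    total = comb(background, size_b)
    hits = sum(
        comb(size_a, k) * comb(background - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Figures quoted in the abstract.
n_pairs = pair_count(43)                  # 43*42/2 intersections
pct_enriched = round(100 * 768 / 903)     # share of enriched intersections
pct_validated = round(100 * 564 / 768)    # share of those validated by DAVID

# Hypothetical example: two groups of 500 and 400 genes sharing 60 genes,
# against a background of 20,000 genes (10 shared genes expected by chance).
p_value = hypergeom_tail(overlap=60, size_a=500, size_b=400, background=20000)

print(n_pairs, pct_enriched, pct_validated)
print(p_value < 0.05)
```

    With the hypothetical sizes above, an overlap of 60 genes is far larger than the 10 expected by chance, so the tail probability is very small; in a real analysis the 903 tests would also need a multiple-testing correction such as the FDR threshold the abstract mentions.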

  9. Indel-seq: a fast-forward genetics approach for identification of trait-associated putative candidate genomic regions and its application in pigeonpea (Cajanus cajan).

    PubMed

    Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Sinha, Pallavi; Kale, Sandip M; Parupalli, Swathi; Kumar, Vinay; Chitikineni, Annapurna; Vechalapu, Suryanarayana; Sameer Kumar, Chanda Venkata; Sharma, Mamta; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Muniswamy, Sonnappa; Varshney, Rajeev K

    2017-07-01

    Identification of candidate genomic regions associated with target traits using conventional mapping methods is challenging and time-consuming. In recent years, a number of single nucleotide polymorphism (SNP)-based mapping approaches have been developed and used for identification of candidate/putative genomic regions. However, in the majority of these studies, insertion-deletions (Indels) were largely ignored. For efficient use of Indels in mapping target traits, we propose the Indel-seq approach, which is a combination of whole-genome resequencing (WGRS) and bulked segregant analysis (BSA) and relies on the Indel frequencies in extreme bulks. Deployment of the Indel-seq approach for identification of candidate genomic regions associated with fusarium wilt (FW) and sterility mosaic disease (SMD) resistance in pigeonpea has identified 16 Indels affecting 26 putative candidate genes. Of these 26 affected putative candidate genes, 24 were affected in the regions upstream/downstream of the gene and two were affected within the gene itself. Validation of these 16 candidate Indels in other FW- and SMD-resistant and FW- and SMD-susceptible genotypes revealed a significant association of five Indels (three for FW and two for SMD resistance). Comparative analysis of Indel-seq with other genetic mapping approaches highlighted the importance of the approach in identification of significant genomic regions associated with target traits. Therefore, the Indel-seq approach can be used for quick and precise identification of candidate genomic regions for any target traits in any crop species. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
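
    The abstract says only that Indel-seq "relies on the Indel frequencies in extreme bulks". A minimal sketch of that idea, under stated assumptions, is to compute the fraction of reads supporting an Indel in each bulk and keep sites where the two bulks differ strongly. The site names, read counts and the 0.8 cut-off below are all hypothetical, and the published method may filter sites quite differently.

```python
from typing import NamedTuple


class BulkCounts(NamedTuple):
    indel_reads: int   # reads supporting the Indel allele at a site
    total_reads: int   # all reads covering the site


def indel_frequency(counts: BulkCounts) -> float:
    """Fraction of reads in one bulk that carry the Indel allele."""
    if counts.total_reads == 0:
        raise ValueError("site has no coverage")
    return counts.indel_reads / counts.total_reads


def delta_frequency(resistant: BulkCounts, susceptible: BulkCounts) -> float:
    """Difference in Indel frequency between the two extreme bulks."""
    return indel_frequency(resistant) - indel_frequency(susceptible)


# Hypothetical read counts: (resistant bulk, susceptible bulk) per site.
sites = {
    "indel_A": (BulkCounts(38, 40), BulkCounts(3, 42)),
    "indel_B": (BulkCounts(20, 41), BulkCounts(19, 39)),
}

THRESHOLD = 0.8  # hypothetical cut-off, not taken from the paper

candidates = [
    name
    for name, (resistant, susceptible) in sites.items()
    if abs(delta_frequency(resistant, susceptible)) >= THRESHOLD
]
print(candidates)
```

    In this toy input only indel_A separates the bulks (about 0.95 versus 0.07), while indel_B is present at roughly 0.49 in both and is therefore uninformative.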

  10. Candidate EDA targets revealed by expression profiling of primary keratinocytes from Tabby mutant mice

    PubMed Central

    Esibizione, Diana; Cui, Chang-Yi; Schlessinger, David

    2009-01-01

    EDA, the gene mutated in anhidrotic ectodermal dysplasia, encodes ectodysplasin, a TNF superfamily member that activates NF-kB mediated transcription. To identify EDA target genes, we have earlier used expression profiling to infer genes differentially expressed at various developmental time points in Tabby (Eda-deficient) compared to wild-type mouse skin. To increase the resolution to find genes whose expression may be restricted to epidermal cells, we have now extended studies to primary keratinocyte cultures established from E19 wild-type and Tabby skin. Using microarrays bearing 44,000 gene probes, we found 385 preliminary candidate genes whose expression was significantly affected by Eda loss. By comparing expression profiles to those from Eda-A1 transgenic skin, we restricted the list to 38 “candidate EDA targets”, 14 of which were already known to be expressed in hair follicles or epidermis. We confirmed expression changes for 3 selected genes, Tbx1, Bmp7, and Jag1, both in keratinocytes and in whole skin, by Q-PCR and Western blotting analyses. Thus, by the analysis of keratinocytes, novel candidate pathways downstream of EDA were detected. PMID:18848976

  11. Massive NGS Data Analysis Reveals Hundreds Of Potential Novel Gene Fusions in Human Cell Lines.

    PubMed

    Gioiosa, Silvia; Bolis, Marco; Flati, Tiziano; Massini, Annalisa; Garattini, Enrico; Chillemi, Giovanni; Fratelli, Maddalena; Castrignanò, Tiziana

    2018-06-01

    Gene fusions derive from chromosomal rearrangements and the resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognosis and, in some cases, they can provide specific drug targets. So far, many efforts have been made to study gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive Next Generation Sequencing dataset for all the existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events. In our work, we have extensively reanalyzed 935 paired-end RNA-seq experiments downloaded from "The Cancer Cell Line Encyclopedia" repository, aiming to identify novel putative cell-line specific gene fusion events in human malignancies. The bioinformatics analysis has been performed by the execution of four different gene fusion detection algorithms. The results have been further prioritized by running a Bayesian classifier that performs an in silico validation. The collection of fusion events supported by all of the predictive software tools results in a robust set of ~1,700 in-silico predicted novel candidates suitable for downstream analyses. Given the huge amount of data and information produced, computational results have been systematized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, further integrated with validated data from other well known repositories. Taking advantage of the intuitive query forms, the users can easily access, navigate, filter and select the putative gene fusions for further validations and studies. They can also find suitable experimental models for a given fusion of interest. We believe that the LiGeA resource can represent not only the first compendium of both known and putative novel gene fusion events in the catalog of all of the human malignant cell lines, but it can also become a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets.
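
    The consensus step described in this abstract (keeping only fusion events supported by all of the detection tools) can be illustrated with a set intersection. This is a minimal sketch, not the LiGeA pipeline: the tool names, gene names, and the representation of a fusion as a (5' gene, 3' gene) pair are all hypothetical placeholders.

```python
# Toy illustration of a "supported by all tools" consensus filter.
# Tool names and gene pairs are hypothetical placeholders, not real calls.
calls_by_tool = {
    "tool_1": {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")},
    "tool_2": {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")},
    "tool_3": {("GENE_A", "GENE_B"), ("GENE_E", "GENE_F")},
    "tool_4": {("GENE_A", "GENE_B"), ("GENE_G", "GENE_H")},
}


def consensus_fusions(calls):
    """Return the fusions reported by every tool (set intersection)."""
    call_sets = list(calls.values())
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


consensus = consensus_fusions(calls_by_tool)
print(sorted(consensus))
```

    Requiring agreement from every tool trades sensitivity for specificity; a looser rule (for example, support from at least three of four tools) would be an equally plausible design under different assumptions.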

  12. Doubling down on phosphorylation as a variable peptide modification.

    PubMed

    Cooper, Bret

    2016-09-01

    Some mass spectrometrists believe that searching for variable PTMs like phosphorylation of serine or threonine when using database-search algorithms to interpret peptide tandem mass spectra will increase false-positive matching. The basis for this is the premise that the algorithm compares a spectrum to both a nonphosphorylated peptide candidate and a phosphorylated candidate, which is double the number of candidates compared to a search with no possible phosphorylation. Hence, if the search space doubles, false-positive matching could increase accordingly as the algorithm considers more candidates to which false matches could be made. In this study, it is shown that the search for variable phosphoserine and phosphothreonine modifications does not always double the search space or unduly impinge upon the FDR. A breakdown of how one popular database-search algorithm deals with variable phosphorylation is presented. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
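
    The search-space argument above can be made concrete by counting candidate forms per peptide. The following is a simplified sketch and not the algorithm examined in the article: it assumes every serine or threonine may independently be phosphorylated or not, with no cap on the number of modifications per peptide, and the peptide sequences are toy examples.

```python
def modified_forms(peptide, residues="ST"):
    """Count candidate forms if each listed residue may or may not be phosphorylated."""
    sites = sum(peptide.count(r) for r in residues)
    return 2 ** sites


# Toy peptide sequences (hypothetical, for illustration only).
peptides = ["GALVK", "ASLDK", "SSTPR"]

unmodified_candidates = len(peptides)
variable_candidates = sum(modified_forms(p) for p in peptides)

for p in peptides:
    print(p, modified_forms(p))
print(unmodified_candidates, variable_candidates)
```

    Under these assumptions a peptide with no serine or threonine contributes no extra candidates, so the growth of the search space depends on the composition of the candidate peptides rather than being a uniform doubling.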

  13. A comprehensive study of the genomic differentiation between temperate Dent and Flint maize.

    PubMed

    Unterseer, Sandra; Pophaly, Saurabh D; Peis, Regina; Westermeier, Peter; Mayer, Manfred; Seidel, Michael A; Haberer, Georg; Mayer, Klaus F X; Ordas, Bernardo; Pausch, Hubert; Tellier, Aurélien; Bauer, Eva; Schön, Chris-Carolin

    2016-07-08

    Dent and Flint represent two major germplasm pools exploited in maize breeding. Several traits differentiate the two pools, like cold tolerance, early vigor, and flowering time. A comparative investigation of their genomic architecture relevant for quantitative trait expression has not been reported so far. Understanding the genomic differences between germplasm pools may contribute to a better understanding of the complementarity in heterotic patterns exploited in hybrid breeding and of mechanisms involved in adaptation to different environments. We perform whole-genome screens for signatures of selection specific to temperate Dent and Flint maize by comparing high-density genotyping data of 70 American and European Dent and 66 European Flint inbred lines. We find 2.2 % and 1.4 % of the genes are under selective pressure, respectively, and identify candidate genes associated with agronomic traits known to differ between the two pools. Taking flowering time as an example for the differentiation between Dent and Flint, we investigate candidate genes involved in the flowering network by phenotypic analyses in a Dent-Flint introgression library and find that the Flint haplotypes of the candidates promote earlier flowering. Within the flowering network, the majority of Flint candidates are associated with endogenous pathways in contrast to Dent candidate genes, which are mainly involved in response to environmental factors like light and photoperiod. The diversity patterns of the candidates in a unique panel of more than 900 individuals from 38 European landraces indicate a major contribution of landraces from France, Germany, and Spain to the candidate gene diversity of the Flint elite lines. In this study, we report the investigation of pool-specific differences between temperate Dent and Flint on a genome-wide scale. 
    The identified candidate genes represent a promising source for the functional investigation of pool-specific haplotypes in different genetic backgrounds and for the evaluation of their potential for future crop improvement like the adaptation to specific environments.

  14. Identification of candidate genes associated with fibromyalgia susceptibility in southern Spanish women: the al-Ándalus project.

    PubMed

    Estévez-López, Fernando; Camiletti-Moirón, Daniel; Aparicio, Virginia A; Segura-Jiménez, Víctor; Álvarez-Gallardo, Inmaculada C; Soriano-Maldonado, Alberto; Borges-Cosic, Milkana; Acosta-Manzano, Pedro; Geenen, Rinie; Delgado-Fernández, Manuel; Martínez-González, Luis J; Ruiz, Jonatan R; Álvarez-Cubero, María J

    2018-02-27

    Candidate-gene studies on fibromyalgia susceptibility often include a small number of single nucleotide polymorphisms (SNPs), which is a limitation. Moreover, there is a paucity of evidence in Europe. Therefore, we compared genotype frequencies of candidate SNPs in a well-characterised sample of Spanish women with fibromyalgia and healthy non-fibromyalgia women. A total of 314 women with a diagnosis of fibromyalgia (cases) and 112 healthy non-fibromyalgia women (controls) participated in this candidate-gene study. Buccal swabs were collected for DNA extraction. Using TaqMan™ OpenArray™, we analysed 61 SNPs of 33 genes related to fibromyalgia susceptibility, symptoms, or potential mechanisms. The GG genotype of rs841 and of rs1799971 was more frequent in fibromyalgia than in controls (p = 0.04 and p = 0.02, respectively). The rs2097903 AT/TT genotypes were also more often present in the fibromyalgia participants than in their control peers (p = 0.04). There were no differences for the remaining SNPs. We identified, for the first time, associations of the rs841 (guanosine triphosphate cyclohydrolase 1 gene) and rs2097903 (catechol-O-methyltransferase gene) SNPs with higher risk of fibromyalgia susceptibility. We also confirmed that the rs1799971 SNP (opioid receptor μ1 gene) might confer genetic risk of fibromyalgia. We did not adjust for multiple comparisons, which would be too stringent and would yield non-significant differences in the genotype frequencies between cases and controls. Our findings may be biologically meaningful and informative, and should be further investigated in other populations. Of particular interest would be a replication of the present study in a larger independent sample to confirm or refute our findings. On the other hand, by including 61 SNPs of 33 candidate genes with a strong rationale (they were previously investigated in relation to fibromyalgia susceptibility, symptoms or potential mechanisms), the present research is the most comprehensive candidate-gene study on fibromyalgia susceptibility to date.
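
    The remark about multiple comparisons can be checked with simple arithmetic. The sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05 across the 61 SNPs; the abstract does not name a correction method, so this choice is an assumption made only for illustration. The p-values are those reported in the abstract.

```python
# Assumption: Bonferroni correction with family-wise alpha = 0.05.
n_tests = 61          # number of SNPs analysed, from the abstract
alpha = 0.05
threshold = alpha / n_tests

# Unadjusted p-values as reported in the abstract.
reported = {"rs841": 0.04, "rs1799971": 0.02, "rs2097903": 0.04}

survivors = [snp for snp, p in reported.items() if p <= threshold]

print(f"per-test threshold: {threshold:.6f}")
print("SNPs passing the corrected threshold:", survivors)
```

    Under this assumption none of the three reported associations would pass the corrected threshold, which is consistent with the authors' statement that adjustment would yield non-significant differences.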

  15. Renal Gene Expression Database (RGED): a relational database of gene expression profiles in kidney disease

    PubMed Central

    Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo

    2014-01-01

    We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore gene profiles and the relationships between genes of interest, and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Availability and implementation: The website is implemented in PHP, R, MySQL and Nginx and is freely available from http://rged.wall-eva.net. Database URL: http://rged.wall-eva.net PMID:25252782

  16. Comparative molecular analyses of select pH- and osmoregulatory genes in three freshwater crayfish Cherax quadricarinatus, C. destructor and C. cainii.

    PubMed

    Ali, Muhammad Y; Pavasovic, Ana; Dammannagoda, Lalith K; Mather, Peter B; Prentis, Peter J

    2017-01-01

    Systemic acid-base balance and osmotic/ionic regulation in decapod crustaceans are in part maintained by a set of transport-related enzymes such as carbonic anhydrase (CA), Na+/K+-ATPase (NKA), H+-ATPase (HAT), Na+/K+/2Cl- cotransporter (NKCC), Na+/Cl-/HCO3- cotransporter (NBC), Na+/H+ exchanger (NHE), arginine kinase (AK), sarcoplasmic Ca2+-ATPase (SERCA) and calreticulin (CRT). We carried out a comparative molecular analysis of these genes in three commercially important yet eco-physiologically distinct freshwater crayfish, Cherax quadricarinatus, C. destructor and C. cainii, with the aim to identify mutations in these genes and determine if observed patterns of mutations were consistent with the action of natural selection. We also conducted a tissue-specific expression analysis of these genes across seven different organs, including gills, hepatopancreas, heart, kidney, liver, nerve and testes using NGS transcriptome data. The molecular analysis of the candidate genes revealed a high level of sequence conservation across the three Cherax sp. HyPhy analysis revealed that all candidate genes showed patterns of molecular variation consistent with neutral evolution. The tissue-specific expression analysis showed that 46% of candidate genes were expressed in all tissue types examined, while approximately 10% of candidate genes were only expressed in a single tissue type. The largest number of genes was observed in nerve (84%) and gills (78%) and the lowest in testes (66%). The tissue-specific expression analysis also revealed that most of the master genes regulating pH and osmoregulation (CA, NKA, HAT, NKCC, NBC, NHE) were expressed in all tissue types, indicating an important physiological role for these genes outside of osmoregulation in other tissue types. The high level of sequence conservation observed in the candidate genes may be explained by the important role of these genes, which potentially also have a number of other basic physiological functions in different tissue types.

  17. How rare bone diseases have informed our knowledge of complex diseases.

    PubMed

    Johnson, Mark L

    2016-01-01

    Rare bone diseases, generally defined as monogenic traits with either autosomal recessive or dominant patterns of inheritance, have provided a rich database of genes and associated pathways over the past 2-3 decades. The molecular genetic dissection of these bone diseases has yielded some major surprises in terms of the causal genes and/or involved pathways. The discovery of genes/pathways involved in diseases such as osteopetrosis, osteosclerosis, osteogenesis imperfecta and many other rare bone diseases has accelerated our understanding of complex traits. Importantly, these discoveries have provided either direct validation for a specific gene embedded in a group of genes within an interval identified through a complex trait genome-wide association study (GWAS) or, based upon the pathway associated with a monogenic trait gene, a means to prioritize a large number of genes for functional validation studies. In some instances, GWAS have yielded candidate genes that fall within linkage intervals associated with monogenic traits and resulted in the identification of causal mutations in those rare diseases. Driving all of this discovery is a complement of technologies such as genome sequencing, bioinformatics and advanced statistical analysis methods that have accelerated genetic dissection and greatly reduced the cost. Thus, rare bone disorders in partnership with GWAS have brought us to the brink of a new era of personalized genomic medicine in which the prevention and management of complex diseases will be driven by the molecular understanding of each individual's contributing genetic risks for disease.

  18. Patterns of population differentiation of candidate genes for cardiovascular disease.

    PubMed

    Kullo, Iftikhar J; Ding, Keyue

    2007-07-12

    The basis for ethnic differences in cardiovascular disease (CVD) susceptibility is not fully understood. We investigated patterns of population differentiation (FST) of a set of genes in etiologic pathways of CVD among 3 ethnic groups: Yoruba in Nigeria (YRI), Utah residents with European ancestry (CEU), and Han Chinese (CHB) + Japanese (JPT). We identified 37 pathways implicated in CVD based on the PANTHER classification and 416 genes in these pathways were further studied; these genes belonged to 6 biological processes (apoptosis, blood circulation and gas exchange, blood clotting, homeostasis, immune response, and lipoprotein metabolism). Genotype data were obtained from the HapMap database. We calculated FST for 15,559 common SNPs (minor allele frequency ≥ 0.10 in at least one population) in genes that co-segregated among the populations, as well as an average-weighted FST for each gene. SNPs were classified as putatively functional (non-synonymous and untranslated regions) or non-functional (intronic and synonymous sites). Mean FST values for common putatively functional variants were significantly higher than FST values for nonfunctional variants. A significant variation in FST was also seen based on biological processes; the processes of 'apoptosis' and 'lipoprotein metabolism' showed an excess of genes with high FST. Thus, putative functional SNPs in genes in etiologic pathways for CVD show greater population differentiation than non-functional SNPs and a significant variance of FST values was noted among pairwise population comparisons for different biological processes. These results suggest a possible basis for varying susceptibility to CVD among ethnic groups.
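
    For readers unfamiliar with FST, a per-SNP value can be sketched from allele frequencies alone. The abstract does not state which estimator was used, so the code below assumes the textbook heterozygosity-based definition, FST = (HT - HS) / HT, for a biallelic SNP with equally weighted populations and no sample-size correction. The allele frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based FST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    Assumes equal population weights and no sample-size correction.
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)                             # HT: pooled heterozygosity
    h_sub = sum(2 * p * (1 - p) for p in freqs) / len(freqs)      # HS: mean within-population
    if h_total == 0:
        return 0.0  # SNP is monomorphic overall; no differentiation to measure
    return (h_total - h_sub) / h_total


# Hypothetical allele frequencies in three populations.
differentiated = fst_biallelic([0.1, 0.5, 0.9])
uniform = fst_biallelic([0.3, 0.3, 0.3])
print(round(differentiated, 4), round(uniform, 4))
```

    Identical frequencies across populations give a value of zero, and larger frequency differences push the value towards one, which is the sense in which high-FST SNPs show greater population differentiation.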

  19. Comprehensive Gene expression meta-analysis and integrated bioinformatic approaches reveal shared signatures between thrombosis and myeloproliferative disorders

    PubMed Central

    Jha, Prabhash Kumar; Vijay, Aatira; Sahu, Anita; Ashraf, Mohammad Zahid

    2016-01-01

    Thrombosis is a leading cause of morbidity and mortality in patients with myeloproliferative disorders (MPDs), particularly polycythemia vera (PV) and essential thrombocythemia (ET). Despite the attempts to establish a link between them, the shared biological mechanisms are yet to be characterized. An integrated gene expression meta-analysis of five independent publicly available microarray datasets of the three diseases was conducted to identify shared gene expression signatures and overlapping biological processes. Using the INMEX bioinformatic tool, based on combined Effect Size (ES) approaches, we identified a total of 1,157 differentially expressed genes (DEGs) (697 overexpressed and 460 underexpressed genes) shared between the three diseases. The EnrichR tool’s rich library was used for comprehensive functional enrichment and pathway analysis, which revealed “mRNA Splicing” and “SUMO E3 ligases SUMOylate target proteins” among the most enriched terms. Network-based meta-analysis identified MYC and FN1 to be the most highly ranked hub genes. Our results reveal that the alterations in biomarkers of the coagulation cascade like F2R, PROS1, SELPLG and ITGB2 were common between the three diseases. Interestingly, the study has generated a novel database of candidate genetic markers, pathways and transcription factors shared between thrombosis and MPDs, which might aid in the development of prognostic therapeutic biomarkers. PMID:27892526

  20. Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

    NASA Astrophysics Data System (ADS)

    Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, bioinformatic analysis of the genome sequence of Enterobacter aerogenes was carried out to identify the gene encoding gelatinase. E. aerogenes was isolated from hot spring water and is described as a gelatinase-producing bacterium that is species-specific to porcine and fish gelatin. This bacterium therefore offers the possibility of producing enzymes specific to the gelatin of each of these species. The E. aerogenes genome was partially sequenced, giving a total sequence size of 5.0 megabase pairs (Mbp). The pre-processing pipeline reported 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, 67.5% of them longer than 1 kilobase pair (kbp), with an N50 contig length of 124,856 bp and a GC content of 55.17%. Protein prediction analysis identified about 4,705 protein-coding genes. Two candidate genes were selected for having the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases: NODE_9_length_26866_cov_148.013245_12, a 1,029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, a 717 bp sequence encoding 238 amino acids. Two pairs of primers (forward and reverse) were then designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
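
    The N50 contig length quoted in this abstract is a standard assembly statistic: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly size. A minimal sketch follows; the contig lengths are hypothetical and are not taken from the E. aerogenes assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def fraction_longer_than(contig_lengths, min_length=1000):
    """Fraction of contigs strictly longer than min_length bp."""
    return sum(1 for n in contig_lengths if n > min_length) / len(contig_lengths)


# Hypothetical contig lengths in bp.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
print(fraction_longer_than(toy_contigs, min_length=250))
```

    In the toy example the two longest contigs (500 and 400 bp) together cover 900 of 1,500 bp, so the N50 is 400 bp.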

  1. Identification of an miRNA candidate reflects the possible significance of transcribed microsatellites in the hairpin precursors of black pepper.

    PubMed

    Joy, Nisha; Soniya, Eppurathu Vasudevan

    2012-06-01

    Plant miRNAs (18-24 nt) are generated by the RNase III-type Dicer endonuclease from the endogenous hairpin precursors ('pre-miRNAs') with significant regulatory functions. The transcribed regions display a higher frequency of microsatellites, when compared to other regions of the genomic DNA. Simple sequence repeats (SSRs) resulting from replication slippage occurring in transcripts affect the expression of genes. The available experimental evidence for the incidence of SSRs in the miRNA precursors is limited. Considering the potential significance of SSRs in the miRNA genes, we carried out a preliminary analysis to verify the presence of SSRs in the pri-miRNAs of black pepper (Piper nigrum L.). We isolated a (CT) dinucleotide SSR-bearing transcript using the SMART strategy. The transcript was predicted to be a 'pri-miRNA candidate' with Dicer sites based on miRNA prediction tools and MFOLD structural predictions. The presence of this 'miRNA candidate' was confirmed by real-time TaqMan assays. The upstream sequence of the 'miRNA candidate', obtained by genome walking, showed the presence of certain promoter elements when subjected to PlantCARE, and the deduced amino acid sequence showed significant similarity with the NAP1 gene, which affects the transcription of many genes. Moreover, the hairpin-like precursor overlapped the neighbouring NAP1 gene. In silico analysis revealed distinct putative functions for the 'miRNA candidate', the majority of which were related to growth. Hence, we assume that this 'miRNA candidate' may get activated during transcription of the NAP gene, thereby regulating the expression of many genes involved in developmental processes.

  2. Novel candidate genes may be possible predisposing factors revealed by whole exome sequencing in familial esophageal squamous cell carcinoma.

    PubMed

    Forouzanfar, Narjes; Baranova, Ancha; Milanizadeh, Saman; Heravi-Moussavi, Alireza; Jebelli, Amir; Abbaszadegan, Mohammad Reza

    2017-05-01

    Esophageal squamous cell carcinoma is one of the deadliest of all the cancers. Its metastatic properties portend poor prognosis and high rate of recurrence. A more advanced method to identify new molecular biomarkers predicting disease prognosis can be whole exome sequencing. Here, we report the most effective genetic variants of the Notch signaling pathway in esophageal squamous cell carcinoma susceptibility by whole exome sequencing. We analyzed nine probands in unrelated familial esophageal squamous cell carcinoma pedigrees to identify candidate genes. Genomic DNA was extracted and whole exome sequencing performed to generate information about genetic variants in the coding regions. Bioinformatics software applications were utilized to exploit statistical algorithms to demonstrate protein structure and variant conservation. Polymorphic regions were excluded by false-positive investigations. Gene-gene interactions were analyzed for Notch signaling pathway candidates. We identified novel and damaging variants of the Notch signaling pathway through extensive pathway-oriented filtering and functional predictions, which led to the study of 27 candidate novel mutations in all nine patients. Detection of the trinucleotide repeat containing 6B gene mutation (a splice site alteration) in five of the nine probands, but not in any of the healthy samples, suggested that it may be a susceptibility factor for familial esophageal squamous cell carcinoma. Noticeably, 8 of 27 novel candidate gene mutations (e.g. epidermal growth factor, signal transducer and activator of transcription 3, MET) act in a cascade leading to cell survival and proliferation. Our results suggest that the trinucleotide repeat containing 6B mutation may be a candidate predisposing gene in esophageal squamous cell carcinoma. In addition, some of the Notch signaling pathway genetic mutations may act as key contributors to esophageal squamous cell carcinoma.

  3. A large-scale RNA interference screen identifies genes that regulate autophagy at different stages.

    PubMed

    Guo, Sujuan; Pridham, Kevin J; Virbasius, Ching-Man; He, Bin; Zhang, Liqing; Varmark, Hanne; Green, Michael R; Sheng, Zhi

    2018-02-12

    Dysregulated autophagy is central to the pathogenesis and therapeutic development of cancer. However, how autophagy is regulated in cancer is not well understood and genes that modulate cancer autophagy are not fully defined. To gain more insights into autophagy regulation in cancer, we performed a large-scale RNA interference screen in K562 human chronic myeloid leukemia cells using monodansylcadaverine staining, an autophagy-detecting approach equivalent to immunoblotting of the autophagy marker LC3B or fluorescence microscopy of GFP-LC3B. By coupling monodansylcadaverine staining with fluorescence-activated cell sorting, we successfully isolated autophagic K562 cells in which we identified 336 short hairpin RNAs. After candidate validation using Cyto-ID fluorescence spectrophotometry, LC3B immunoblotting, and quantitative RT-PCR, 82 genes were identified as autophagy-regulating genes. Twenty genes have been reported previously and the remaining 62 candidates are novel autophagy mediators. Bioinformatic analyses revealed that most candidate genes were involved in molecular pathways regulating autophagy, rather than directly participating in the autophagy process. Further autophagy flux assays revealed that 57 autophagy-regulating genes suppressed autophagy initiation, whereas 21 candidates promoted autophagy maturation. Our RNA interference screen identified genes that regulate autophagy at different stages, which helps decode autophagy regulation in cancer and offers novel avenues to develop autophagy-related therapies for cancer.

  4. Bioinformatics-Based Identification of Candidate Genes from QTLs Associated with Cell Wall Traits in Populus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ranjan, Priya; Yin, Tongming; Zhang, Xinye

    2009-11-01

    Quantitative trait locus (QTL) studies are an integral part of plant research and are used to characterize the genetic basis of phenotypic variation observed in structured populations and inform marker-assisted breeding efforts. These QTL intervals can span large physical regions on a chromosome comprising hundreds of genes, thereby hampering candidate gene identification. Genome history, evolution, and expression evidence can be used to narrow the genes in the interval to a smaller list that is manageable for detailed downstream functional genomics characterization. Our primary motivation for the present study was to address the need for a research methodology that identifies candidate genes within a broad QTL interval. Here we present a bioinformatics-based approach for subdividing candidate genes within QTL intervals into alternate groups of high probability candidates. Application of this approach in the context of studying cell wall traits, specifically lignin content and S/G ratios of stem and root in Populus plants, resulted in manageable sets of genes of both known and putative cell wall biosynthetic function. These results provide a roadmap for future experimental work leading to identification of new genes controlling cell wall recalcitrance and, ultimately, in the utility of plant biomass as an energy feedstock.

  5. Identification, Classification, and Expression Analysis of GRAS Gene Family in Malus domestica

    PubMed Central

    Fan, Sheng; Zhang, Dong; Gao, Cai; Zhao, Ming; Wu, Haiqin; Li, Youmei; Shen, Yawen; Han, Mingyu

    2017-01-01

    GRAS genes encode plant-specific transcription factors that play important roles in plant growth and development. However, little is known about the GRAS gene family in apple. In this study, 127 GRAS genes were identified in the apple (Malus domestica Borkh.) genome and named MdGRAS1 to MdGRAS127 according to their chromosomal locations. The chemical characteristics, gene structures and evolutionary relationships of the MdGRAS genes were investigated. The 127 MdGRAS genes could be grouped into eight subfamilies based on their structural features and phylogenetic relationships. Further analysis of gene structures, segmental and tandem duplication, gene phylogeny and tissue-specific expression with the ArrayExpress database indicated their diversification in quantity, structure and function. We further examined the expression pattern of MdGRAS genes during apple flower induction with transcriptome sequencing. Eight MdGRAS genes with higher expression (MdGRAS6, 26, 28, 44, 53, 64, 107, and 122) emerged. Further quantitative reverse transcription PCR indicated that the eight candidate genes showed distinct expression patterns among different tissues (leaves, stems, flowers, buds, and fruits). The transcription levels of the eight genes were also investigated under various flowering-related treatments (GA3, 6-BA, and sucrose) and in different flowering varieties (Yanfu No. 6 and Nagafu No. 2). All were affected by flowering-related conditions and showed different expression levels. Changes in response to these hormone- or sugar-related treatments indicated their potential involvement in apple flower induction. Taken together, our results provide rich resources for studying GRAS genes and clues to their potential in the genetic improvement of apple flowering, enriching biological theories of GRAS genes in apple and their involvement in flower induction of fruit trees. PMID:28503152

  6. Integrated pathway-based approach identifies association between genomic regions at CTCF and CACNB2 and schizophrenia.

    PubMed

    Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt

    2014-06-01

    In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.

  7. Genome Comparison of Human and Non-Human Malaria Parasites Reveals Species Subset-Specific Genes Potentially Linked to Human Disease

    PubMed Central

    Frech, Christian; Chen, Nansheng

    2011-01-01

    Genes underlying important phenotypic differences between Plasmodium species, the causative agents of malaria, are frequently found in only a subset of species and cluster at dynamically evolving subtelomeric regions of chromosomes. We hypothesized that chromosome-internal regions of Plasmodium genomes harbour additional species subset-specific genes that underlie differences in human pathogenicity, human-to-human transmissibility, and human virulence. We combined sequence similarity searches with synteny block analyses to identify species subset-specific genes in chromosome-internal regions of six published Plasmodium genomes, including Plasmodium falciparum, Plasmodium vivax, Plasmodium knowlesi, Plasmodium yoelii, Plasmodium berghei, and Plasmodium chabaudi. To improve comparative analysis, we first revised incorrectly annotated gene models using homology-based gene finders and examined putative subset-specific genes within syntenic contexts. Confirmed subset-specific genes were then analyzed for their role in biological pathways and examined for molecular functions using publicly available databases. We identified 16 genes that are well conserved in the three primate parasites but not found in rodent parasites, including three key enzymes of the thiamine (vitamin B1) biosynthesis pathway. Thirteen genes were found to be present in both human parasites but absent in the monkey parasite P. knowlesi, including genes specifically upregulated in sporozoites or gametocytes that could be linked to parasite transmission success between humans. Furthermore, we propose 15 chromosome-internal P. falciparum-specific genes as new candidate genes underlying increased human virulence and detected a currently uncharacterized cluster of P. vivax-specific genes on chromosome 6 likely involved in erythrocyte invasion. 
    In conclusion, Plasmodium species harbour many chromosome-internal differences in the form of protein-coding genes, some of which are potentially linked to human disease and thus promising leads for future laboratory research. PMID:22215999

  8. Identification, Classification, and Expression Analysis of GRAS Gene Family in Malus domestica.

    PubMed

    Fan, Sheng; Zhang, Dong; Gao, Cai; Zhao, Ming; Wu, Haiqin; Li, Youmei; Shen, Yawen; Han, Mingyu

    2017-01-01

    GRAS genes encode plant-specific transcription factors that play important roles in plant growth and development. However, little is known about the GRAS gene family in apple. In this study, 127 GRAS genes were identified in the apple (Malus domestica Borkh.) genome and named MdGRAS1 to MdGRAS127 according to their chromosomal locations. The chemical characteristics, gene structures and evolutionary relationships of the MdGRAS genes were investigated. The 127 MdGRAS genes could be grouped into eight subfamilies based on their structural features and phylogenetic relationships. Further analysis of gene structures, segmental and tandem duplication, gene phylogeny and tissue-specific expression with the ArrayExpress database indicated their diversification in quantity, structure and function. We further examined the expression pattern of MdGRAS genes during apple flower induction with transcriptome sequencing. Eight more highly expressed MdGRAS genes (MdGRAS6, 26, 28, 44, 53, 64, 107, and 122) emerged as candidates. Further quantitative reverse transcription PCR indicated that the eight candidate genes showed distinct expression patterns among different tissues (leaves, stems, flowers, buds, and fruits). The transcription levels of the eight genes were also investigated under various flowering-related treatments (GA3, 6-BA, and sucrose) and in different flowering varieties (Yanfu No. 6 and Nagafu No. 2). All of them were affected by flowering-related conditions and showed different expression levels. Changes in response to these hormone- or sugar-related treatments indicated their potential involvement in apple flower induction. Taken together, our results provide rich resources for studying GRAS genes and clues to their potential use in the genetic improvement of apple flowering, enriching the understanding of GRAS genes in apple and their involvement in flower induction of fruit trees.

  9. A data-mining approach to rank candidate protein-binding partners: The case of biogenesis of lysosome-related organelles complex-1 (BLOC-1).

    PubMed

    Rodriguez-Fernandez, I A; Dell'Angelica, E C

    2009-04-01

    The study of protein-protein interactions is a powerful approach to uncovering the molecular function of gene products associated with human disease. Protein-protein interaction data are accumulating at an unprecedented pace owing to interactomics projects, although it has been recognized that a significant fraction of these data likely represents false positives. During our studies of biogenesis of lysosome-related organelles complex-1 (BLOC-1), a protein complex involved in protein trafficking and containing the products of genes mutated in Hermansky-Pudlak syndrome, we faced the problem of having too many candidate binding partners to pursue experimentally. In this work, we have explored ways of efficiently gathering high-quality information about candidate binding partners and presenting the information in a visually friendly manner. We applied the approach to rank 70 candidate binding partners of human BLOC-1 and 102 candidates of its counterpart from Drosophila melanogaster. The top candidate for human BLOC-1 was the small GTPase encoded by the RAB11A gene, which is a paralogue of the Rab38 and Rab32 proteins in mammals and the lightoid gene product in flies. Interestingly, genetic analyses in D. melanogaster uncovered a synthetic sick/lethal interaction between Rab11 and lightoid. The data-mining approach described herein can be customized to study candidate binding partners for other proteins or possibly candidates derived from other types of 'omics' data.

  10. RNA-Seq and molecular docking reveal multi-level pesticide resistance in the bed bug

    PubMed Central

    2012-01-01

    Background: Bed bugs (Cimex lectularius) are hematophagous nocturnal parasites of humans that have attained high impact status due to their worldwide resurgence. The sudden and rampant resurgence of C. lectularius has been attributed to numerous factors including frequent international travel, narrower pest management practices, and insecticide resistance. Results: We performed a next-generation RNA sequencing (RNA-Seq) experiment to find differentially expressed genes between pesticide-resistant (PR) and pesticide-susceptible (PS) strains of C. lectularius. A reference transcriptome database of 51,492 expressed sequence tags (ESTs) was created by combining the databases derived from de novo assembled mRNA-Seq tags (30,404 ESTs) and our previous 454 pyrosequenced database (21,088 ESTs). The two-way GLMseq analysis revealed ~15,000 highly significant differentially expressed ESTs between the PR and PS strains. Among the top 5,000 differentially expressed ESTs, 109 putative defense genes (cuticular proteins, cytochrome P450s, antioxidant genes, ABC transporters, glutathione S-transferases, carboxylesterases and acetyl cholinesterase) involved in penetration resistance and metabolic resistance were identified. Tissue and development-specific expression of P450 CYP3 clan members showed high mRNA levels in the cuticle, Malpighian tubules, and midgut; and in early instar nymphs, respectively. Lastly, molecular modeling and docking of a candidate cytochrome P450 (CYP397A1V2) revealed the flexibility of the deduced protein to metabolize a broad range of insecticide substrates including DDT, deltamethrin, permethrin, and imidacloprid. Conclusions: We developed significant molecular resources for C. lectularius putatively involved in metabolic resistance as well as those participating in other modes of insecticide resistance. RNA-Seq profiles of PR strains combined with tissue-specific profiles and molecular docking revealed multi-level insecticide resistance in C. lectularius. Future research that is targeted towards RNA interference (RNAi) on the identified metabolic targets such as cytochrome P450s and cuticular proteins could lay the foundation for a better understanding of the genetic basis of insecticide resistance in C. lectularius. PMID:22226239

  11. Biocatalytic Production of Trehalose from Maltose by Using Whole Cells of Permeabilized Recombinant Escherichia coli

    PubMed Central

    Sun, Ye; Mei, Wending; Ouyang, Jia

    2015-01-01

    Trehalose is a non-reducing disaccharide, which can protect proteins, lipid membranes, and cells from desiccation, refrigeration, dehydration, and other harsh environments. Trehalose can be produced by different pathways and trehalose synthase pathway is a convenient, practical, and low-cost pathway for the industrial production of trehalose. In this study, 3 candidate treS genes were screened from genomic databases of Pseudomonas and expressed in Escherichia coli. One of them from P. stutzeri A1501 exhibited the best transformation ability from maltose into trehalose and the least byproduct. Thus, whole cells of this recombinant E. coli were used as biocatalyst for trehalose production. In order to improve the conversion rate of maltose to trehalose, optimization of the permeabilization and biotransformation were carried out. Under optimal conditions, 92.2 g/l trehalose was produced with a high productivity of 23.1 g/(l h). No increase of glucose was detected during the whole course. The biocatalytic process developed in this study might serve as a candidate for the large scale production of trehalose. PMID:26462117

  12. A public platform for the verification of the phenotypic effect of candidate genes for resistance to aflatoxin accumulation and Aspergillus flavus infection in maize

    USDA-ARS's Scientific Manuscript database

    A public candidate gene testing pipeline for resistance to aflatoxin accumulation or Aspergillus flavus infection in maize is presented here. The pipeline consists of steps for identifying, testing, and verifying the association of any maize gene sequence with resistance under field conditions. Reso...

  13. SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate

    Treesearch

    Gretchen H. Roffler; Stephen J. Amish; Seth Smith; Ted Cosart; Marty Kardos; Michael K. Schwartz; Gordon Luikart

    2016-01-01

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding...

  14. MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset.

    PubMed

    Mallik, Saurav; Maulik, Ujjwal

    2015-10-01

    Gene ranking is an important problem in bioinformatics. Here, we propose a new framework for ranking biomolecules (viz., miRNAs, transcription factors/TFs and genes) in a multi-informative uterine leiomyoma dataset having both gene expression and methylation data, using a (statistical) eigenvector centrality based approach. First, genes that are both differentially expressed and differentially methylated are identified using the Limma statistical test. A network comprising these genes, the corresponding TFs from the TRANSFAC and ITFP databases, and the targeting miRNAs from the miRWalk database is then built. The biomolecules are then ranked based on eigenvector centrality. Our proposed method provides better average accuracy in hub gene and non-hub gene classification than other methods. Furthermore, to compare the ranking methods, pre-ranked gene set enrichment analysis is applied to the pathway database and the GO-term databases of the Molecular Signatures Database, using pre-ranked gene lists based on the different centrality values. Finally, top novel potential gene markers for uterine leiomyoma are provided. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. BGDB: a database of bivalent genes.

    PubMed

    Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua

    2013-01-01

    A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same area, and is proposed to play a pivotal role related to pluripotency in embryonic stem (ES) cells. Identifying these bivalent genes and understanding their functions are important for further research on lineage specification and embryo development. So far, a large amount of genome-wide histone modification data has been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repositories or analysis tools are currently available for bivalent genes. In this work, we develop BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, which were manually collected from the scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of the BGDB database were implemented with PHP + MySQL + JavaScript, and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/

  16. A genomic scan for selection reveals candidates for genes involved in the evolution of cultivated sunflower (Helianthus annuus).

    PubMed

    Chapman, Mark A; Pashley, Catherine H; Wenzler, Jessica; Hvala, John; Tang, Shunxue; Knapp, Steven J; Burke, John M

    2008-11-01

    Genomic scans for selection are a useful tool for identifying genes underlying phenotypic transitions. In this article, we describe the results of a genome scan designed to identify candidates for genes targeted by selection during the evolution of cultivated sunflower. This work involved screening 492 loci derived from ESTs on a large panel of wild, primitive (i.e., landrace), and improved sunflower (Helianthus annuus) lines. This sampling strategy allowed us to identify candidates for selectively important genes and investigate the likely timing of selection. Thirty-six genes showed evidence of selection during either domestication or improvement based on multiple criteria, and a sequence-based test of selection on a subset of these loci confirmed this result. In view of what is known about the structure of linkage disequilibrium across the sunflower genome, these genes are themselves likely to have been targeted by selection, rather than being merely linked to the actual targets. While the selection candidates showed a broad range of putative functions, they were enriched for genes involved in amino acid synthesis and protein catabolism. Given that a similar pattern has been detected in maize (Zea mays), this finding suggests that selection on amino acid composition may be a general feature of the evolution of crop plants. In terms of genomic locations, the selection candidates were significantly clustered near quantitative trait loci (QTL) that contribute to phenotypic differences between wild and cultivated sunflower, and specific instances of QTL colocalization provide some clues as to the roles that these genes may have played during sunflower evolution.

  17. A priori and a posteriori approaches for finding genes of evolutionary interest in non-model species: osmoregulatory genes in the kidney transcriptome of the desert rodent Dipodomys spectabilis (banner-tailed kangaroo rat).

    PubMed

    Marra, Nicholas J; Eo, Soo Hyung; Hale, Matthew C; Waser, Peter M; DeWoody, J Andrew

    2012-12-01

    One common goal in evolutionary biology is the identification of genes underlying adaptive traits of evolutionary interest. Recently next-generation sequencing techniques have greatly facilitated such evolutionary studies in species otherwise depauperate of genomic resources. Kangaroo rats (Dipodomys sp.) serve as exemplars of adaptation in that they inhabit extremely arid environments, yet require no drinking water because of ultra-efficient kidney function and osmoregulation. As a basis for identifying water conservation genes in kangaroo rats, we conducted a priori bioinformatics searches in model rodents (Mus musculus and Rattus norvegicus) to identify candidate genes with known or suspected osmoregulatory function. We then obtained 446,758 reads via 454 pyrosequencing to characterize genes expressed in the kidney of banner-tailed kangaroo rats (Dipodomys spectabilis). We also determined candidates a posteriori by identifying genes that were overexpressed in the kidney. The kangaroo rat sequences revealed nine different a priori candidate genes predicted from our Mus and Rattus searches, as well as 32 a posteriori candidate genes that were overexpressed in kidney. Mutations in two of these genes, Slc12a1 and Slc12a3, cause human renal diseases that result in the inability to concentrate urine. These genes are likely key determinants of physiological water conservation in desert rodents. Copyright © 2012 Elsevier Inc. All rights reserved.

  18. An expressed sequence tag (EST) data mining strategy succeeding in the discovery of new G-protein coupled receptors.

    PubMed

    Wittenberger, T; Schaller, H C; Hellebrand, S

    2001-03-30

    We have developed a comprehensive expressed sequence tag database search method and used it for the identification of new members of the G-protein coupled receptor superfamily. Our approach proved to be especially useful for the detection of expressed sequence tag sequences that do not encode conserved parts of a protein, making it an ideal tool for the identification of members of divergent protein families or of protein parts without conserved domain structures in the expressed sequence tag database. At least 14 of the expressed sequence tags found with this strategy are promising candidates for new putative G-protein coupled receptors. Here, we describe the sequence and expression analysis of five new members of this receptor superfamily, namely GPR84, GPR86, GPR87, GPR90 and GPR91. We also studied the genomic structure and chromosomal localization of the respective genes applying in silico methods. A cluster of six closely related G-protein coupled receptors was found on the human chromosome 3q24-3q25. It consists of four orphan receptors (GPR86, GPR87, GPR91, and H963), the purinergic receptor P2Y1, and the uridine 5'-diphosphoglucose receptor KIAA0001. It seems likely that these receptors evolved from a common ancestor and therefore might have related ligands. In conclusion, we describe a data mining procedure that proved to be useful for the identification and first characterization of new genes and is well applicable for other gene families. Copyright 2001 Academic Press.

  19. De novo Transcriptome Assembly of Common Wild Rice (Oryza rufipogon Griff.) and Discovery of Drought-Response Genes in Root Tissue Based on Transcriptomic Data.

    PubMed

    Tian, Xin-Jie; Long, Yan; Wang, Jiao; Zhang, Jing-Wen; Wang, Yan-Yan; Li, Wei-Min; Peng, Yu-Fa; Yuan, Qian-Hua; Pei, Xin-Wu

    2015-01-01

    The perennial O. rufipogon (common wild rice), which is considered to be the ancestor of Asian cultivated rice species, contains many useful genetic resources, including drought resistance genes. However, few studies have identified the drought resistance and tissue-specific genes in common wild rice. In this study, transcriptome sequencing libraries were constructed, including drought-treated roots (DR) and control leaves (CL) and roots (CR). Using Illumina sequencing technology, we generated 16.75 million bases of high-quality sequence data for common wild rice and conducted de novo assembly and annotation of genes without prior genome information. These reads were assembled into 119,332 unigenes with an average length of 715 bp. A total of 88,813 distinct sequences (74.42% of unigenes) significantly matched known genes in the NCBI NT database. Differentially expressed gene (DEG) analysis showed that 3617 genes were up-regulated and 4171 genes were down-regulated in the CR library compared with the CL library. Among the DEGs, 535 genes were expressed in roots but not in shoots. A similar comparison between the DR and CR libraries showed that 1393 genes were up-regulated and 315 genes were down-regulated in the DR library compared with the CR library. Finally, 37 genes that were specifically expressed in roots were screened after comparing the DEGs identified in the above-described analyses. This study provides a transcriptome sequence resource for common wild rice plants and establishes a digital gene expression profile of wild rice plants under drought conditions using the assembled transcriptome data as a reference. Several tissue-specific and drought-stress-related candidate genes were identified, representing a fully characterized transcriptome and providing a valuable resource for genetic and genomic studies in plants.

  20. Leveraging lung tissue transcriptome to uncover candidate causal genes in COPD genetic associations.

    PubMed

    Lamontagne, Maxime; Bérubé, Jean-Christophe; Obeidat, Ma'en; Cho, Michael H; Hobbs, Brian D; Sakornsakolpat, Phuwanat; de Jong, Kim; Boezen, H Marike; Nickle, David; Hao, Ke; Timens, Wim; van den Berge, Maarten; Joubert, Philippe; Laviolette, Michel; Sin, Don D; Paré, Peter D; Bossé, Yohan

    2018-05-15

    Causal genes of chronic obstructive pulmonary disease (COPD) remain elusive. The current study aims at integrating genome-wide association studies (GWAS) and lung expression quantitative trait loci (eQTL) data to map COPD candidate causal genes and gain biological insights into the recently discovered COPD susceptibility loci. Two complementary genomic datasets on COPD were studied. First, the lung eQTL dataset which included whole-genome gene expression and genotyping data from 1038 individuals. Second, the largest COPD GWAS to date from the International COPD Genetics Consortium (ICGC) with 13 710 cases and 38 062 controls. Methods that integrated GWAS with eQTL signals including transcriptome-wide association study (TWAS), colocalization and Mendelian randomization-based (SMR) approaches were used to map causality genes, i.e. genes with the strongest evidence of being the functional effector at specific loci. These methods were applied at the genome-wide level and at COPD risk loci derived from the GWAS literature. Replication was performed using lung data from GTEx. We collated 129 non-overlapping risk loci for COPD from the GWAS literature. At the genome-wide scale, 12 new COPD candidate genes/loci were revealed and six replicated in GTEx including CAMK2A, DMPK, MYO15A, TNFRSF10A, BTN3A2 and TRBV30. In addition, we mapped candidate causal genes for 60 out of the 129 GWAS-nominated loci and 23 of them were replicated in GTEx. Mapping candidate causal genes in lung tissue represents an important contribution to the genetics of COPD, enriches our biological interpretation of GWAS findings, and brings us closer to clinical translation of genetic associations.
