Science.gov

Sample records for global gene mining

  1. Global gene mining and the pharmaceutical industry

    SciTech Connect

    Knudsen, Lisbeth E.

    2005-09-01

    Worldwide efforts are ongoing in optimizing medical treatment by searching for the right medicine at the right dose for the individual. Metabolism is regulated by polymorphisms, which may be tested by relatively simple SNP analysis, although this requires DNA from the test individuals. Target genes for the efficiency of a given medicine or predisposition to a given disease are also subject to population studies, e.g., in Iceland, Estonia, Sweden, etc. For hypothesis testing and generation, several bio-banks with samples from patients and healthy persons within the pharmaceutical industry have been established during the past 10 years. Thus, more than 100,000 samples are stored in the freezers of either the pharmaceutical companies or their contractual partners at universities and test institutions. Ethical issues related to data protection of the individuals providing samples to bio-banks are several: nature and extent of information prior to consent, coverage of the consent given by the study person, labeling and storage of the sample and data (coded or anonymized). In general, genetic test data, once obtained, are permanent and cannot be changed. The test data may imply information that is not beneficial to the patient and his/her family (e.g., employment opportunities, insurance, etc.). Furthermore, there may be a long latency between the analysis of the genetic test and the clinical expression of the disease and wide differences in the disease patterns. Consequently, information about some genetic test data may stigmatize patients, leading to poor quality of life. This has raised the issue of 'genetic exceptionalism' justifying specific regulation of use of genetic information. Discussions on how to handle sampling and data are ongoing within the industry and the regulatory sphere, the European Agency for the Evaluation of Medicinal Products (EMEA) having issued a position paper, the Council for International Organizations of Medical Sciences (CIOMS) having a working

  2. Coal mine methane global review

    SciTech Connect

    2008-07-01

    This is the second edition of the Coal Mine Methane Global Overview, updated in the summer of 2008. This document contains individual, comprehensive profiles that characterize the coal and coal mine methane sectors of 33 countries - 22 Methane to Markets partners and an additional 11 coal-producing nations. The executive summary provides summary tables that include statistics on coal reserves, coal production, methane emissions, and CMM project activity. An International Coal Mine Methane Projects Database accompanies this overview. It contains more detailed and comprehensive information on over two hundred CMM recovery and utilization projects around the world. Project information in the database is updated regularly. This document will be updated annually. Suggestions for updates and revisions can be submitted to the Administrative Support Group and will be incorporated into the document as appropriate.

  3. Mining for survival genes.

    PubMed

    Dawson, V L; Dawson, T M

    2006-12-01

    Many stressful, but not lethal, stimuli activate endogenous protective mechanisms that significantly decrease the degree of injury to subsequent injurious stimuli. This protective mechanism is termed preconditioning and tolerance. It occurs across organ systems including the brain and nervous system. Preconditioning has been investigated in cell and animal models and has recently been shown to potentially occur in human brain. Learning more about these powerful endogenous neuroprotective mechanisms could help identify new approaches to treat patients with stroke and other central nervous system disorders or injury. Cell and animal models are helping us to better understand the network response of gene and protein expression that activates the neuroprotective response.

  4. Improving mine safety technology and training: establishing US global leadership

    SciTech Connect

    2006-12-15

    In 2006, the USA's record of mine safety was interrupted by fatalities that rocked the industry and caused the National Mining Association and its members to recommit to returning the US underground coal mining industry to a global mine safety leadership role. This report details a comprehensive approach to increase the odds of survival for miners in emergency situations and to create a culture of prevention of accidents. Among its 75 recommendations are a need to improve communications, mine rescue training, and escape and protection of miners. Section headings of the report are: Introduction; Review of mine emergency situations in the past 25 years: identifying and addressing the issues and complexities; Risk-based design and management; Communications technology; Escape and protection strategies; Emergency response and mine rescue procedures; Training for preparedness; Summary of recommendations; and Conclusions. 37 refs., 3 figs., 5 apps.

  5. Mining biological databases for candidate disease genes

    NASA Astrophysics Data System (ADS)

    Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.

    2001-07-01

    The publicly-funded effort to sequence the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has so far produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary 'draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for the researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of this data identified approximately 30,000 - 40,000 human 'genes.' However, the bulk of the effort still remains -- to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequences and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).

  6. Mining gene-chip data

    NASA Astrophysics Data System (ADS)

    Kloster, Morten

    2005-03-01

    DNA microarray ("gene chip") technology has enabled a rapid accumulation of gene-expression data for model organisms such as S. cerevisiae and C. elegans, as well as for H. sapiens, raising the issue of how best to extract information about the gene regulatory networks of these organisms from this data. While basic clustering algorithms have been successful at finding genes that are coregulated for a small, specific set of experimental conditions, these algorithms are less effective when applied to large, varied data sets. One of the major challenges in analyzing the data is the diversity in both size and signal strength of the various transcriptional modules, i.e. sets of coregulated genes along with the sets of conditions for which the genes are strongly coregulated. One method that has proven successful at identifying large and/or strong modules is the Iterative Signature Algorithm (ISA) [1]. A modified version of the ISA, the Progressive Iterative Signature Algorithm (PISA), is also able to identify smaller, weaker modules by sequentially eliminating transcriptional modules as they are identified. Applying these algorithms to a large set of yeast gene expression data illustrates the strengths and weaknesses of each approach. [1] Bergmann, S., Ihmels, J., and Barkai, N., Phys. Rev. E 67, 031902 (2002).
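
    The alternating gene/condition refinement that this abstract attributes to signature-style algorithms can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the published ISA or PISA: the function name isa_sketch, the toy matrix EXPR, and the use of fixed thresholds on raw averages are all hypothetical choices made for illustration, and no normalization step is modeled.

```python
def isa_sketch(expr, seed_genes, cond_threshold, gene_threshold, max_iter=50):
    """Simplified signature-style iteration (illustrative only).

    expr[g][c] is the expression of gene g under condition c.
    Starting from a seed gene set, alternately (1) keep conditions whose
    average expression over the current genes exceeds cond_threshold and
    (2) keep genes whose average expression over those conditions exceeds
    gene_threshold, until the gene set stops changing.
    """
    n_genes, n_conds = len(expr), len(expr[0])
    genes, conds = set(seed_genes), set()
    for _ in range(max_iter):
        if not genes:
            conds = set()
            break
        # Score each condition by its mean expression over the current genes.
        cond_scores = [sum(expr[g][c] for g in genes) / len(genes)
                       for c in range(n_conds)]
        conds = {c for c, s in enumerate(cond_scores) if s > cond_threshold}
        if not conds:
            genes = set()
            break
        # Score each gene by its mean expression over the selected conditions.
        gene_scores = [sum(expr[g][c] for c in conds) / len(conds)
                       for g in range(n_genes)]
        new_genes = {g for g, s in enumerate(gene_scores) if s > gene_threshold}
        if new_genes == genes:  # fixed point: a candidate module
            break
        genes = new_genes
    return genes, conds


# Hypothetical toy data: genes 0-1 respond to conditions 0-1,
# genes 2-3 respond to conditions 2-3.
EXPR = [
    [2.0, 2.1, 0.1, -0.1],
    [1.9, 2.2, 0.0, 0.2],
    [0.1, -0.2, 1.8, 2.0],
    [0.0, 0.1, 2.1, 1.9],
]

module_a = isa_sketch(EXPR, {0}, cond_threshold=1.0, gene_threshold=1.0)
module_b = isa_sketch(EXPR, {2}, cond_threshold=1.0, gene_threshold=1.0)
```

    Seeding from a single gene grows the set to the full coregulated pair in each case. A progressive variant in the spirit of PISA would additionally remove each identified module's signal before searching again; that step is not shown here.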

  7. Microarray data and gene expression statistics for Saccharomyces cerevisiae exposed to simulated asbestos mine drainage.

    PubMed

    Driscoll, Heather E; Murray, Janet M; English, Erika L; Hunter, Timothy C; Pivarski, Kara; Dolci, Elizabeth D

    2017-08-01

    Here we describe microarray expression data (raw and normalized), experimental metadata, and gene-level data with expression statistics from Saccharomyces cerevisiae exposed to simulated asbestos mine drainage from the Vermont Asbestos Group (VAG) Mine on Belvidere Mountain in northern Vermont, USA. For nearly 100 years (between the late 1890s and 1993), chrysotile asbestos fibers were extracted from serpentinized ultramafic rock at the VAG Mine for use in construction and manufacturing industries. Studies have shown that water courses and streambeds nearby have become contaminated with asbestos mine tailings runoff, including elevated levels of magnesium, nickel, chromium, and arsenic, elevated pH, and chrysotile asbestos-laden mine tailings, due to leaching and gradual erosion of massive piles of mine waste covering approximately 9 km². We exposed yeast to simulated VAG Mine tailings leachate to help gain insight on how eukaryotic cells exposed to VAG Mine drainage may respond in the mine environment. Affymetrix GeneChip® Yeast Genome 2.0 Arrays were utilized to assess gene expression after 24-h exposure to simulated VAG Mine tailings runoff. The chemistry of mine-tailings leachate, mine-tailings leachate plus yeast extract peptone dextrose media, and control yeast extract peptone dextrose media is also reported. To our knowledge this is the first dataset to assess global gene expression patterns in a eukaryotic model system simulating asbestos mine tailings runoff exposure. Raw and normalized gene expression data are accessible through the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) Database Series GSE89875 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89875).

  8. Documenting the global impacts of beach sand mining

    NASA Astrophysics Data System (ADS)

    Young, R.; Griffith, A.

    2009-04-01

    For centuries, beach sand has been mined for use as aggregate in concrete, for heavy minerals, and for construction fill. The global extent and impact of this phenomenon have gone relatively unnoticed by academics, NGOs, and major news sources. Most reports of sand mining activities are found at the very local scale (if the mining is ever documented at all). Yet, sand mining in many localities has resulted in the complete destruction of beach (and related) ecosystems along with severe impacts to coastal protection and tourism. The Program for the Study of Developed Shorelines at Western Carolina University and Beachcare.org have initiated the construction of a global database of beach sand mining activities. The database is being built through a combination of site visits and the data mining of media resources, peer reviewed papers, and reports from private and governmental entities. Currently, we have documented sand mining in 35 countries on 6 continents representing the removal of millions of cubic meters of sand. Problems extend from Asia where critical infrastructure has been disrupted by sand mining to the Caribbean where policy reform has swiftly followed a highly publicized theft of sand. The Program for the Study of Developed Shorelines recently observed extensive sand mining in Morocco at the regional scale. Tens of kilometers of beach have been stripped of sand and the mining continues southward reducing hope of a thriving tourism-based economy. Problems caused by beach sand mining include: destruction of natural beaches and the ecosystems they protect (e.g. dunes, wetlands), habitat loss for globally important species (e.g. turtles, shorebirds), destruction of nearshore marine ecosystems, increased shoreline erosion rates, reduced protection from storms, tsunamis, and wave events, and economic losses through tourist abandonment and loss of coastal aesthetics. The threats posed by sand mining are made even more critical given the prospect of a

  9. Mining Gene Ontology Data with AGENDA.

    PubMed

    Ovezmyradov, Guvanch; Lu, Qianhao; Göpfert, Martin C

    2012-01-01

    The Gene Ontology (GO) initiative is a collaborative effort that uses controlled vocabularies for annotating genetic information. We here present AGENDA (Application for mining Gene Ontology Data), a novel web-based tool for accessing the GO database. AGENDA allows the user to simultaneously retrieve and compare gene lists linked to different GO terms in diverse species using batch queries, facilitating comparative approaches to genetic information. The web-based application offers diverse search options and allows the user to bookmark, visualize, and download the results. AGENDA is an open source web-based application that is freely available for non-commercial use at the project homepage. URL: http://sourceforge.net/projects/bioagenda.

  10. Text Mining in Cancer Gene and Pathway Prioritization

    PubMed Central

    Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

    2014-01-01

    Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes. PMID:25392685

  11. Text mining in cancer gene and pathway prioritization.

    PubMed

    Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

    2014-01-01

    Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.

  12. Mining gene expression data by interpreting principal components

    PubMed Central

    Roden, Joseph C; King, Brandon W; Trout, Diane; Mortazavi, Ali; Wold, Barbara J; Hart, Christopher E

    2006-01-01

    Background: There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis. Results: We present a method for automatically identifying such candidate sets of biologically relevant genes using a combination of principal components analysis and information theoretic metrics. To enable easy use of our methods, we have developed a data analysis package that facilitates visualization and subsequent data mining of the independent sources of significant variation present in gene microarray expression datasets (or in any other similarly structured high-dimensional dataset). We applied these tools to two public datasets, and highlight sets of genes most affected by specific subsets of conditions (e.g. tissues, treatments, samples, etc.). Statistically significant associations for highlighted gene sets were shown via global analysis for Gene Ontology term enrichment. Together with covariate associations, the tool provides a basis for building testable hypotheses about the biological or experimental causes of observed variation. Conclusion: We provide an unsupervised data mining technique for diverse microarray expression datasets that is distinct from major methods now in routine use. In test uses, this method, based on publicly available gene annotations, appears to identify numerous sets of biologically relevant genes. It has proven especially

  13. Data mining and genetic algorithm based gene/SNP selection.

    PubMed

    Shah, Shital C; Kusiak, Andrew

    2004-07-01

    Genomic studies provide large volumes of data with the number of single nucleotide polymorphisms (SNPs) ranging into thousands. The analysis of SNPs permits determining relationships between genotypic and phenotypic information as well as the identification of SNPs related to a disease. The growing wealth of information and advances in biology call for the development of approaches for discovery of new knowledge. One such area is the identification of gene/SNP patterns impacting cure/drug development for various diseases. A new approach for predicting drug effectiveness is presented. The approach is based on data mining and genetic algorithms. A global search mechanism, weighted decision tree, decision-tree-based wrapper, a correlation-based heuristic, and the identification of intersecting feature sets are employed for selecting significant genes. The feature selection approach has resulted in an 85% reduction in the number of features. The relative increase in cross-validation accuracy and specificity for the significant gene/SNP set was 10% and 3.2%, respectively. The feature selection approach was successfully applied to data sets for drug and placebo subjects. The number of features has been significantly reduced while the quality of knowledge was enhanced. The feature set intersection approach provided the most significant genes/SNPs. The results reported in the paper discuss associations among SNPs resulting in patient-specific treatment protocols.
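
    The idea of intersecting the feature sets chosen by several selectors can be sketched in a few lines of Python. This is an illustrative sketch only: the selector names, the SNP identifiers, and the feature counts below are hypothetical and are not taken from the paper, whose actual selection procedures are not reproduced here.

```python
def intersect_feature_sets(selections):
    """Return the features chosen by every selector.

    selections maps a selector name to the iterable of features it kept.
    """
    sets = [set(features) for features in selections.values()]
    return set.intersection(*sets) if sets else set()


def relative_reduction(n_before, n_after):
    """Fraction of features removed, e.g. 0.85 for an 85% reduction."""
    return (n_before - n_after) / n_before


# Hypothetical outputs of three selectors (placeholder SNP identifiers).
selections = {
    "weighted_decision_tree": {"snp1", "snp4", "snp7"},
    "decision_tree_wrapper": {"snp1", "snp4", "snp9"},
    "correlation_heuristic": {"snp1", "snp2", "snp4"},
}

common = intersect_feature_sets(selections)  # features all selectors agree on
reduction = relative_reduction(20, 3)        # hypothetical: 20 features cut to 3
```

    Requiring agreement across selectors is a conservative choice: it trades recall for a smaller set of features that no single method's bias can explain on its own.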

  14. Gene prioritization and clustering by multi-view text mining.

    PubMed

    Yu, Shi; Tranchevent, Leon-Charles; De Moor, Bart; Moreau, Yves

    2010-01-14

    Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the other methods compared. In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification.

  15. Gene prioritization and clustering by multi-view text mining

    PubMed Central

    2010-01-01

    Background: Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. Results: We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the other methods compared. Conclusions: In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification. PMID:20074336

  16. Novel approaches to global mining of aberrantly methylated promoter sites in squamous head and neck cancer.

    PubMed

    Worsham, Maria J; Chen, Kang Mei; Stephen, Josena K; Havard, Shaleta; Benninger, Michael S

    2010-07-01

    Promoter hypermethylation is emerging as a promising molecular strategy for early detection of cancer. We examined promoter methylation status of 1143 cancer-associated genes to perform a global but unbiased inspection of methylated regions in head and neck squamous cell carcinoma (HNSCC). Laboratory-based study. Integrated health care system. Five samples, two frozen primary HNSCC biopsies and three HNSCC cell lines, were examined. Whole genomic DNA was interrogated using a combination of DNA immunoprecipitation (IP) and Affymetrix whole-genome tiling arrays. Of the 1143 unique cancer genes on the array, 265 were recorded across five samples. Of the 265 genes, 55 were present in all five samples, and 36 were common to four of five samples, 46 to three of five, 56 to two of five, and 72 to one of five samples. Hypermethylated genes in the five samples were cross-examined against those in PubMeth, a cancer methylation database combining text mining and expert annotation (http://www.pubmeth.org). Of the 441 genes in PubMeth, only 33 are referenced to HNSCC. We matched 34 genes in our samples to the 441 genes in the PubMeth database. Of the 34 genes, eight are reported in PubMeth as HNSCC associated. This pilot study examined the contribution of global DNA hypermethylation to the pathogenesis of HNSCC. The whole-genome methylation approach indicated 231 new genes with methylated promoter regions not yet reported in HNSCC. Examination of this comprehensive gene panel in a larger HNSCC cohort should advance selection of HNSCC-specific candidate genes for further validation as biomarkers in HNSCC. 2010 American Academy of Otolaryngology-Head and Neck Surgery Foundation. Published by Mosby, Inc. All rights reserved.
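
    The gene counts quoted in this abstract can be cross-checked with a short Python calculation. The variable names are illustrative, and reading the 231 "new genes" as the detected genes minus those matched in PubMeth is an assumption that happens to agree with the stated figures, not a derivation given in the abstract.

```python
# Number of hypermethylated genes by how many of the five samples share them,
# as reported in the abstract.
genes_by_sample_count = {5: 55, 4: 36, 3: 46, 2: 56, 1: 72}

total_detected = sum(genes_by_sample_count.values())

# Genes matched to the PubMeth database, per the abstract.
matched_in_pubmeth = 34

# Assumed reading: "new" genes are those detected but not matched in PubMeth.
new_genes = total_detected - matched_in_pubmeth
```

    Under that reading the breakdown sums to the 265 genes recorded across the five samples, and the difference reproduces the 231 genes described as not yet reported.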

  17. Biomedical hypothesis generation by text mining and gene prioritization.

    PubMed

    Petric, Ingrid; Ligeti, Balazs; Gyorffy, Balazs; Pongor, Sandor

    2014-01-01

    Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al, J. Biomed. Inform. 42(2): 219-227, 2009) in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.

  18. A gene pattern mining algorithm using interchangeable gene sets for prokaryotes.

    PubMed

    Hu, Meng; Choi, Kwangmin; Su, Wei; Kim, Sun; Yang, Jiong

    2008-02-26

    Mining gene patterns that are common to multiple genomes is an important biological problem, which can lead us to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when family classification information is not available, mining gene patterns is a challenging problem. There are several well developed algorithms for predicting gene patterns in a pair of genomes, such as FISH and DAGchainer. These algorithms use the optimization problem formulation which is solved using the dynamic programming technique. Unfortunately, extending these algorithms to multiple genome cases is not trivial due to the rapid increase in time and space complexity. In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community to handle the situation where family classification information is not available using interchangeable sets. In an experiment with four newly sequenced genomes (where the gene annotation is unavailable), we show that the gene pattern can capture important biological information. To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare our method to the bi-directional best hit (BBH) technique in terms of COG orthologous gene classification information. The experiments show that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection. The discovered gene patterns can be used for detecting orthologs and genes that collaborate for a common biological function.
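
    The bi-directional best hit (BBH) baseline mentioned in this abstract can be sketched in Python as follows. This is a minimal illustration of the general BBH idea, not the authors' implementation: the function name, the gene identifiers, and the similarity scores are hypothetical, and real pipelines would derive the scores from sequence alignments.

```python
def bidirectional_best_hits(scores_ab, scores_ba):
    """Return gene pairs (a, b) that are each other's best-scoring hit.

    scores_ab[a][b] is the similarity score of gene a (genome A) searched
    against gene b (genome B); scores_ba holds the reverse searches.
    """
    # Best hit in B for every gene in A, and vice versa.
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    # Keep only reciprocal pairs.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between two small genomes.
scores_ab = {
    "a1": {"b1": 200, "b2": 50},
    "a2": {"b1": 180, "b2": 90},
    "a3": {"b3": 75},
}
scores_ba = {
    "b1": {"a1": 210, "a2": 170},
    "b2": {"a2": 95, "a1": 40},
    "b3": {"a3": 80},
}

pairs = bidirectional_best_hits(scores_ab, scores_ba)
```

    In this toy input, a2 and b2 are not paired even though b2's best hit is a2, because a2's own best hit is b1. Missed pairs of this kind are one way a strictly reciprocal criterion can lose recall.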

  19. Metagenomics: Future of microbial gene mining.

    PubMed

    Vakhlu, J; Sudan, Avneet Kour; Johri, B N

    2008-06-01

    Modern biotechnology has a steadily increasing demand for novel genes for application in various industrial processes and development of genetically modified organisms. Identification, isolation and cloning of novel genes at a reasonable pace is the main driving force behind the development of unprecedented experimental approaches. Metagenomics is one such novel approach for engendering novel genes. Metagenomics of complex microbial communities (both cultivable and uncultivable) is a rich source of novel genes for biotechnological purposes. The contributions made by metagenomics to the already existing repository of prokaryotic genes are quite impressive; nevertheless, this technique is still in its infancy. In the present review we have drawn a comparison between routine cloning techniques and the metagenomic approach for harvesting novel microbial genes and described various methods to reach down to the specific genes in the metagenome. Accomplishments made thus far, limitations and future prospects of this resourceful technique are discussed.

  20. Integrative Data Mining Highlights Candidate Genes for Monogenic Myopathies

    PubMed Central

    Neto, Osorio Abath; Tassy, Olivier; Biancalana, Valérie; Zanoteli, Edmar; Pourquié, Olivier; Laporte, Jocelyn

    2014-01-01

    Inherited myopathies are a heterogeneous group of disabling disorders with still barely understood pathological mechanisms. Around 40% of afflicted patients remain without a molecular diagnosis after exclusion of known genes. The advent of high-throughput sequencing has opened avenues to the discovery of new implicated genes, but a working list of prioritized candidate genes is necessary to deal with the complexity of analyzing large-scale sequencing data. Here we used an integrative data mining strategy to analyze the genetic network linked to myopathies, derive specific signatures for inherited myopathy and related disorders, and identify and rank candidate genes for these groups. Training sets of genes were selected after literature review and used in Manteia, a public web-based data mining system, to extract disease group signatures in the form of enriched descriptor terms, which include functional annotation, human and mouse phenotypes, as well as biological pathways and protein interactions. These specific signatures were then used as an input to mine and rank candidate genes, followed by filtration against skeletal muscle expression and association with known diseases. Signatures and identified candidate genes highlight both potential common pathological mechanisms and allelic disease groups. Recent discoveries of gene associations to diseases, like B3GALNT2, GMPPB and B3GNT1 to congenital muscular dystrophies, were prioritized in the ranked lists, suggesting a posteriori validation of our approach and predictions. We show an example of how the ranked lists can be used to help analyze high-throughput sequencing data to identify candidate genes, and highlight the best candidate genes matching genomic regions linked to myopathies without known causative genes. This strategy can be automatized to generate fresh candidate gene lists, which help cope with database annotation updates as new knowledge is incorporated. PMID:25353622

  1. Mining Gene Expression Data of Multiple Sclerosis

    PubMed Central

    Zhu, Zhenli; Huang, Zhengliang; Li, Ke

    2014-01-01

    Objectives: Microarrays produce a large amount of gene expression data containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed an analysis to identify disease-related genes using multiple sclerosis as an example. Materials and methods: Gene expression profiles based on the transcriptome of peripheral blood mononuclear cells from a total of 44 samples from 26 multiple sclerosis patients and 18 individuals with other neurological diseases (control) were analyzed. Feature selection algorithms including Support Vector Machine based on Recursive Feature Elimination, Receiver Operating Characteristic Curve, and Boruta algorithms were jointly performed to select candidate genes associated with multiple sclerosis. Multiple classification models categorized samples into two different groups based on the identified genes. Models’ performance was evaluated using cross-validation methods, and an optimal classifier for gene selection was determined. Results: An overlapping feature set was identified consisting of 8 genes that were differentially expressed between the two phenotype groups. The genes were significantly associated with the pathways of apoptosis and cytokine-cytokine receptor interaction. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model was established based on the featured genes and gave a practical accuracy of ∼86%. This binary classification model also outperformed the other models in terms of Sensitivity, Specificity and F1 score. Conclusions: The combined analytical framework integrating feature ranking algorithms and Support Vector Machine model could be used for selecting genes for other diseases. PMID:24932510
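
    The abstract above compares classifiers by sensitivity, specificity and F1 score. As a minimal sketch of how those figures follow from a binary confusion matrix, the snippet below computes them in plain Python. The function name and the toy labels are illustrative assumptions and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, precision, F1 and accuracy for a binary task.

    Labels equal to `positive` are treated as the disease class; every other
    label is treated as control. All names here are hypothetical.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on patients
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall on controls
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Toy example (made-up labels): 1 = patient, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

    In a cross-validation setting these metrics would typically be computed per fold and then averaged.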

  2. Mining phenotypes for gene function prediction

    PubMed Central

    Groth, Philip; Weiss, Bertram; Pohlenz, Hans-Dieter; Leser, Ulf

    2008-01-01

    Background: Health and disease of organisms are reflected in their phenotypes. Often, a genetic component to a disease is discovered only after clearly defining its phenotype. In the past years, many technologies to systematically generate phenotypes in a high-throughput manner, such as RNA interference or gene knock-out, have been developed and used to decipher functions for genes. However, there have been relatively few efforts to make use of phenotype data beyond the single genotype-phenotype relationships. Results: We present results on a study where we use a large set of phenotype data – in textual form – to predict gene annotation. To this end, we use text clustering to group genes based on their phenotype descriptions. We show that these clusters correlate well with several indicators for biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. We exploit these clusters for predicting gene function by carrying over annotations from well-annotated genes to other, less-characterized genes in the same cluster. For a subset of groups selected by applying objective criteria, we can predict GO-term annotations from the biological process sub-ontology with up to 72.6% precision and 16.7% recall, as evaluated by cross-validation. We manually verified some of these clusters and found them to exhibit high biological coherence, e.g. a group containing all available antennal Drosophila odorant receptors despite inconsistent GO-annotations. Conclusion: The intrinsic nature of phenotypes to visibly reflect genetic activity underlines their usefulness in inferring new gene functions. Thus, systematically analyzing these data on a large scale offers many possibilities for inferring functional annotation of genes. We show that text clustering can play an important role in this process. PMID:18315868

  3. Integrative literature and data mining to rank disease candidate genes.

    PubMed

    Wu, Chao; Zhu, Cheng; Jegga, Anil G

    2014-01-01

    While the genomics-derived discoveries promise benefits to basic research and health care, the speed and affordability of sequencing following recent technological advances have further aggravated the data deluge. Seamless integration of the ever-increasing clinical, genomic, and experimental data and efficient mining for knowledge extraction, delivering actionable insight and generating testable hypotheses are therefore critical for the needs of biomedical research. For instance, high-throughput techniques are frequently applied to detect disease candidate genes. Experimental validation of these candidates, however, is both time-consuming and expensive. Hence, several computational approaches based on literature and data mining have been developed to identify the most promising candidates for follow-up studies. Based on the "guilt by association" principle, most of these methods use prior knowledge about a disease of interest to discover and rank novel candidate genes. In this chapter, we provide a brief overview of recent advances made in literature- and data-mining-based approaches for candidate gene prioritization. As a case study, we focus on a Web-based computational approach that uses integrated heterogeneous data sources including gene-literature associations for ranking disease candidate genes and explain how to run typical queries using this system.

  4. MiningABs: mining associated biomarkers across multi-connected gene expression datasets.

    PubMed

    Cheng, Chun-Pei; DeBoever, Christopher; Frazer, Kelly A; Liu, Yu-Cheng; Tseng, Vincent S

    2014-06-08

    Human disease often arises as a consequence of alterations in a set of associated genes rather than alterations to a set of unassociated individual genes. Most previous microarray-based meta-analyses identified disease-associated genes or biomarkers independent of genetic interactions. Therefore, in this study, we present the first meta-analysis method capable of taking gene combination effects into account to efficiently identify associated biomarkers (ABs) across different microarray platforms. We propose a new meta-analysis approach called MiningABs to mine ABs across different array-based datasets. The similarity between paired probe sequences is quantified as a bridge to connect these datasets together. The ABs can be subsequently identified from an "improved" common logit model (c-LM) by combining several sibling-like LMs in a heuristic genetic algorithm selection process. Our approach is evaluated with two sets of gene expression datasets: i) 4 esophageal squamous cell carcinoma and ii) 3 hepatocellular carcinoma datasets. Based on an unbiased reciprocal test, we demonstrate that each gene in a group of ABs is required to maintain high cancer sample classification accuracy, and we observe that ABs are not limited to genes common to all platforms. Investigating the ABs using Gene Ontology (GO) enrichment, literature survey, and network analyses indicated that our ABs are not only strongly related to cancer development but also highly connected in a diverse network of biological interactions. The proposed meta-analysis method called MiningABs is able to efficiently identify ABs from different independently performed array-based datasets, and we show its validity in cancer biology via GO enrichment, literature survey and network analyses. We postulate that the ABs may facilitate novel target and drug discovery, leading to improved clinical treatment. Java source code, tutorial, example and related materials are available at "http://sourceforge.net/projects/miningabs/".

  5. OntoGene web services for biomedical text mining

    PubMed Central

    2014-01-01

    Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them. PMID:25472638

  6. OntoGene web services for biomedical text mining.

    PubMed

    Rinaldi, Fabio; Clematide, Simon; Marques, Hernani; Ellendorff, Tilia; Romacker, Martin; Rodriguez-Esteban, Raul

    2014-01-01

    Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top ranked results in several of them.

  7. Mining Bacterial Genomes for Secondary Metabolite Gene Clusters.

    PubMed

    Adamek, Martina; Spohn, Marius; Stegmann, Evi; Ziemert, Nadine

    2017-01-01

    With the emergence of bacterial resistance against frequently used antibiotics, novel antibacterial compounds are urgently needed. Traditional bioactivity-guided drug discovery strategies involve laborious screening efforts and display high rediscovery rates. With the progress in next generation sequencing methods and the knowledge that the majority of antibiotics in clinical use are produced as secondary metabolites by bacteria, mining bacterial genomes for secondary metabolites with antimicrobial activity is a promising approach, which can guide a more time- and cost-effective identification of novel compounds. However, what sounds easy to accomplish comes with several challenges. To date, several tools for the prediction of secondary metabolite gene clusters are available, some of which are based on the detection of signature genes, while others search for specific patterns in gene content or regulation. Apart from the mere identification of gene clusters, several other factors such as determining cluster boundaries and assessing the novelty of the detected cluster are important. For this purpose, comparison of the predicted secondary metabolite genes with different cluster and compound databases is necessary. Furthermore, it is advisable to classify detected clusters into gene cluster families. So far, there is no standardized procedure for genome mining; however, different approaches to overcome all of these challenges exist and are addressed in this chapter. We give practical guidance on the workflow for secondary metabolite gene cluster identification, which includes the determination of gene cluster boundaries, addresses problems occurring with the use of draft genomes, and gives an outlook on the different methods for gene cluster classification. Based on comprehensible examples, a protocol is set out that should enable readers to mine their own genome data for interesting secondary metabolites.

  8. ESTIMATE OF GLOBAL METHANE EMISSIONS FROM COAL MINES

    EPA Science Inventory

    Country-specific emissions of methane (CH4) from underground coal mines, surface coal mines, and coal crushing and transport operations are estimated for 1989. Emissions for individual countries are estimated by using two sets of regression equations (R2 values range from 0.56 to...

  9. Computer aided gene mining for gingerol biosynthesis.

    PubMed

    James, Priyanka; Baby, Bincy; Charles, SonaSona; Nair, Lekshmysree Saraschandran; Nazeem, Puthiyaveetil Abdulla

    2015-01-01

    In spite of the large body of genomic data obtained from the transcriptome of Zingiber officinale, very few studies have focused on the identification and characterization of miRNAs in gingerol biosynthesis. Zingiber officinale transcriptome was analyzed using EST dataset (38169 total) deposited in public domains. In this paper computational functional annotation of the available ESTs and identification of genes which play a significant role in gingerol biosynthesis are described. Zingiber officinale transcriptome was analyzed using EST dataset (38169 total) from NCBI. ESTs were clustered and assembled, resulting in 8624 contigs and 8821 singletons. The assembled dataset was then submitted to the EST functional annotation workflow including BLAST, Gene Ontology (GO) analysis, and pathway enrichment by Kyoto Encyclopedia of Genes and Genomes (KEGG) and InterProScan. The unigene datasets were further exploited to identify simple sequence repeats that enable linkage mapping. A total of 409 simple sequence repeats were identified from the contigs. Furthermore, we examined the existence of novel miRNAs from the ESTs in rhizome, root and leaf tissues. EST analysis revealed the presence of a single hypothetical miRNA in rhizome tissue. The hypothetical miRNA may play an important role in controlling genes involved in gingerol biosynthesis and hence warrants experimental validation. The assembly and associated information of transcriptome data provides a comprehensive functional and evolutionary characterization of genomics of Zingiber officinale. As an effort to make the genomic and transcriptomic data widely available to the public domain, the results were integrated into a web-based Ginger EST database which is freely accessible at http://www.kaubic.in/gingerest/.
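
    The abstract above mentions identifying simple sequence repeats (SSRs) in assembled contigs but does not say which tool or thresholds were used. As a hedged illustration only, the sketch below finds perfect SSRs with a regular expression. The motif lengths (2 to 6 bases), the minimum of five repeats, and the example sequence are all assumptions chosen for demonstration.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit.

    Example: "AT" is primitive, "ATAT" and "AA" are not.
    """
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_len=2, max_len=6, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Thresholds are hypothetical defaults, not values from the study.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_len, max_len + 1):
        # A k-mer followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. poly-A reported as "AA"
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Made-up contig: (AT)x6 starting at index 3 and (GAA)x5 starting at index 18.
contig = "GGC" + "AT" * 6 + "CCG" + "GAA" * 5 + "TC"
ssrs = find_ssrs(contig)
```

    Dedicated SSR finders also handle imperfect and compound repeats, which this sketch deliberately ignores.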

  10. Computer aided gene mining for gingerol biosynthesis

    PubMed Central

    James, Priyanka; Baby, Bincy; Charles, SonaSona; Nair, Lekshmysree Saraschandran; Nazeem, Puthiyaveetil Abdulla

    2015-01-01

    In spite of the large body of genomic data obtained from the transcriptome of Zingiber officinale, very few studies have focused on the identification and characterization of miRNAs in gingerol biosynthesis. Zingiber officinale transcriptome was analyzed using EST dataset (38169 total) deposited in public domains. In this paper computational functional annotation of the available ESTs and identification of genes which play a significant role in gingerol biosynthesis are described. Zingiber officinale transcriptome was analyzed using EST dataset (38169 total) from NCBI. ESTs were clustered and assembled, resulting in 8624 contigs and 8821 singletons. The assembled dataset was then submitted to the EST functional annotation workflow including BLAST, Gene Ontology (GO) analysis, and pathway enrichment by Kyoto Encyclopedia of Genes and Genomes (KEGG) and InterProScan. The unigene datasets were further exploited to identify simple sequence repeats that enable linkage mapping. A total of 409 simple sequence repeats were identified from the contigs. Furthermore, we examined the existence of novel miRNAs from the ESTs in rhizome, root and leaf tissues. EST analysis revealed the presence of a single hypothetical miRNA in rhizome tissue. The hypothetical miRNA may play an important role in controlling genes involved in gingerol biosynthesis and hence warrants experimental validation. The assembly and associated information of transcriptome data provides a comprehensive functional and evolutionary characterization of genomics of Zingiber officinale. As an effort to make the genomic and transcriptomic data widely available to the public domain, the results were integrated into a web-based Ginger EST database which is freely accessible at http://www.kaubic.in/gingerest/. PMID:26229293

  11. Text Mining to Support Gene Ontology Curation and Vice Versa.

    PubMed

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.

  12. Integrative data-mining tools to link gene and function.

    PubMed

    El Yacoubi, Basma; de Crécy-Lagard, Valérie

    2014-01-01

    Information derived from genomic and post-genomic data can be efficiently used to link gene and function. Several web-based platforms have been developed to mine these types of data by integrating different tools. This method paper is designed to allow the user to navigate these platforms in order to make functional predictions. The main focus is on phylogenetic distribution and physical clustering tools, but other tools such as pathway reconstruction, gene fusions, and analysis of high-throughput experimental data are also surveyed.

  13. The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

    ERIC Educational Resources Information Center

    Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

    2013-01-01

    The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…

  15. Incremental fuzzy mining of gene expression data for gene function prediction.

    PubMed

    Ma, Patrick C H; Chan, Keith C C

    2011-05-01

    Due to the complexity of the underlying biological processes, gene expression data obtained from DNA microarray technologies are typically noisy and have very high dimensionality, which makes mining such data for gene function prediction very difficult. To tackle these difficulties, we propose a technique called incremental fuzzy mining (IFM). By transforming quantitative expression values into linguistic terms, such as highly or lowly expressed, IFM can effectively capture heterogeneity in expression data for pattern discovery. It does so using a fuzzy measure to determine if interesting association patterns exist between the linguistic gene expression levels. Based on these patterns, IFM can make accurate gene function predictions, and these predictions can be made in such a way that each gene can belong to more than one functional class with different degrees of membership. The gene function prediction problem can be formulated as either a classification or a clustering problem, and IFM can be used either as a classification technique or together with existing clustering algorithms to improve the cluster groupings discovered for greater prediction accuracies. IFM is also an incremental data mining technique, so the discovered patterns can be continually refined based only on newly collected data without the need for retraining using the whole dataset. For performance evaluation, IFM has been tested with real expression datasets for both classification and clustering tasks. Experimental results show that it can effectively uncover hidden patterns for accurate gene function predictions. © 2011 IEEE
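
    The step of turning quantitative expression values into linguistic terms with degrees of membership can be illustrated with simple piecewise-linear membership functions. This is a generic fuzzification sketch, not the IFM algorithm itself. The three terms, the breakpoints `lo`, `mid` and `hi`, and the example values are assumptions.

```python
def fuzzify(x, lo, mid, hi):
    """Degrees of membership of value x in 'low', 'medium' and 'high'.

    Memberships are piecewise linear and always sum to 1. The breakpoints
    are hypothetical and would normally be derived from the data.
    """
    if not lo < mid < hi:
        raise ValueError("breakpoints must satisfy lo < mid < hi")
    if x <= lo:
        return {"low": 1.0, "medium": 0.0, "high": 0.0}
    if x >= hi:
        return {"low": 0.0, "medium": 0.0, "high": 1.0}
    if x <= mid:
        m = (x - lo) / (mid - lo)      # rises from 0 at lo to 1 at mid
        return {"low": 1.0 - m, "medium": m, "high": 0.0}
    h = (x - mid) / (hi - mid)         # rises from 0 at mid to 1 at hi
    return {"low": 0.0, "medium": 1.0 - h, "high": h}


# Made-up expression values on an arbitrary 0-10 scale.
levels = {value: fuzzify(value, lo=0.0, mid=5.0, hi=10.0)
          for value in (0.0, 2.5, 7.5, 12.0)}
```

    A value such as 2.5 then belongs partly to 'low' and partly to 'medium', which is what lets a gene carry several labels with different degrees of membership.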

  16. GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction

    PubMed Central

    Yu, Yao; Tu, Kang; Zheng, Siyuan; Li, Yun; Ding, Guohui; Ping, Jie; Hao, Pei; Li, Yixue

    2009-01-01

    Background: In the post-genomic era, the development of high-throughput gene expression detection technology provides huge amounts of experimental data, which challenges the traditional pipelines for data processing and analysis in scientific research. Results: In our work, we integrated gene expression information from Gene Expression Omnibus (GEO), biomedical ontology from Medical Subject Headings (MeSH) and signaling pathway knowledge from sigPathway entries to develop a context mining tool for gene expression analysis – GEOGLE. GEOGLE offers a rapid and convenient way of searching relevant experimental datasets, pathways and biological terms according to multiple types of queries, including biomedical vocabularies, GDS IDs, gene IDs, pathway names and signature lists. Moreover, GEOGLE summarizes the signature genes from a subset of GDSes and estimates the correlation between gene expression and the phenotypic distinction with an integrated p value. Conclusion: This approach of performing global searching of expression data may expand the traditional way of collecting heterogeneous gene expression experiment data. GEOGLE is a novel tool that provides researchers a quantitative way to understand the correlation between gene expression and phenotypic distinction through meta-analysis of gene expression datasets from different experiments, as well as the biological meaning behind it. The web site and user guide of GEOGLE are available at: PMID:19703314

  17. Gene association analysis: a survey of frequent pattern mining from gene expression data.

    PubMed

    Alves, Ronnie; Rodriguez-Baena, Domingo S; Aguilar-Ruiz, Jesus S

    2010-03-01

    Establishing an association between variables is always of interest in genomic studies. Generation of DNA microarray gene expression data introduces a variety of data analysis issues not encountered in traditional molecular biology or medicine. Frequent pattern mining (FPM) has been applied successfully in business and scientific data for discovering interesting association patterns, and is becoming a promising strategy in microarray gene expression analysis. We review the most relevant FPM strategies, as well as surrounding main issues when devising efficient and practical methods for gene association analysis (GAA). We observed that, so far, scalability achieved by efficient methods does not imply biological soundness of the discovered association patterns, and vice versa. Ideally, GAA should employ a balanced mining model taking into account best practices employed by methods reviewed in this survey. Integrative approaches, in which biological knowledge plays an important role within the mining process, are becoming more reliable.
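
    To make the idea of frequent pattern mining on expression data concrete, the sketch below runs a small level-wise (Apriori-style) search over samples that have already been discretized into items such as "gene up" or "gene down". It is a generic illustration rather than any specific method from the survey. The item names, the transactions and the support threshold are invented.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support.

    Level-wise search: frequent k-itemsets are joined into (k+1)-candidates,
    whose support is then counted directly against the transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        frequent = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        candidates = list({a | b for a, b in combinations(frequent, 2)
                           if len(a | b) == k})
    return result


# Each transaction is one sample; items are hypothetical discretized states.
samples = [
    {"A_up", "B_up", "C_down"},
    {"A_up", "B_up"},
    {"A_up", "C_down"},
    {"B_up", "C_down"},
]
patterns = frequent_itemsets(samples, min_support=0.5)
```

    As the survey points out, a pattern that is frequent in this sense is not necessarily biologically meaningful, so support alone is a weak filter.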

  18. Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

    PubMed Central

    Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417

  19. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    PubMed

    Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.

  20. Resistance Genes in Global Crop Breeding Networks.

    PubMed

    Garrett, K A; Andersen, K F; Asche, F; Bowden, R L; Forbes, G A; Kulakow, P A; Zhou, B

    2017-08-31

    Resistance genes are a major tool for managing crop diseases. The networks of crop breeders who exchange resistance genes and deploy them in varieties help to determine the global landscape of resistance and epidemics, an important system for maintaining food security. These networks function as a complex adaptive system, with associated strengths and vulnerabilities, and implications for policies to support resistance gene deployment strategies. Extensions of epidemic network analysis can be used to evaluate the multilayer agricultural networks that support and influence crop breeding networks. Here, we evaluate the general structure of crop breeding networks for cassava, potato, rice, and wheat. All four are clustered due to phytosanitary and intellectual property regulations, and linked through CGIAR hubs. Cassava networks primarily include public breeding groups, whereas others are more mixed. These systems must adapt to global change in climate and land use, the emergence of new diseases, and disruptive breeding technologies. Research priorities to support policy include how best to maintain both diversity and redundancy in the roles played by individual crop breeding groups (public versus private and global versus local), and how best to manage connectivity to optimize resistance gene deployment while avoiding risks to the useful life of resistance genes. Copyright © 2017 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.

  1. Beegle: from literature mining to disease-gene discovery

    PubMed Central

    ElShal, Sarah; Tranchevent, Léon-Charles; Sifrim, Alejandro; Ardeshirdavani, Amin; Davis, Jesse; Moreau, Yves

    2016-01-01

    Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/. PMID:26384564
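
    The evaluation figure quoted above (recall within the top 100 returned genes) is a recall-at-k measure. A minimal sketch of that calculation follows. The gene identifiers and cut-offs are made up, and this is not Beegle's own evaluation code.

```python
def recall_at_k(ranked_genes, known_genes, k):
    """Fraction of known disease genes found among the top-k ranked genes."""
    known = set(known_genes)
    if not known:
        raise ValueError("known_genes must not be empty")
    top_k = set(ranked_genes[:k])
    return len(top_k & known) / len(known)


# Hypothetical ranking returned by a search engine, best candidate first.
ranking = ["G1", "G7", "G3", "G9", "G2"]
gold_standard = {"G1", "G2", "G3", "G4"}

recall_top3 = recall_at_k(ranking, gold_standard, k=3)
recall_top5 = recall_at_k(ranking, gold_standard, k=5)
```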

  2. Impact of mercury emissions from historic gold and silver mining: Global modeling

    NASA Astrophysics Data System (ADS)

    Strode, Sarah; Jaeglé, Lyatt; Selin, Noelle E.

    We compare a global model of mercury to sediment core records to constrain mercury emissions from 19th-century North American gold and silver mining. We use information on gold and silver production, the ratio of mercury lost to precious metal produced, and the fraction of mercury lost to the atmosphere to calculate an a priori mining inventory for the 1870s, when the historical gold rush was at its highest. The resulting global mining emissions are 1630 Mg yr⁻¹, consistent with previously published studies. Using this a priori estimate, we find that our 1880 simulation over-predicts the mercury deposition enhancements archived in lake sediment records. Reducing the mining emissions to 820 Mg yr⁻¹ improves agreement with observations, and leads to a 30% enhancement in global deposition in 1880 compared to the pre-industrial period. For North America, where 83% of the mining emissions are located, deposition increases by 60%. While our lower emissions of atmospheric mercury lead to a smaller impact of the North American gold rush on global mercury deposition than previously estimated, they also imply that a larger fraction of the mercury used in extracting precious metals could have been directly lost to local soils and watersheds.

  3. Whole-body gene expression by data mining.

    PubMed

    Pires Martins, R; Leach, R E; Krawetz, S A

    2001-02-15

    To date, a comprehensive survey of the expression of lysyl oxidase (LOX), lysyl oxidase-like 1 (LOXL1), and lysyl oxidase-like 2 (LOXL2) has yet to be performed. The use of in vitro strategies to accomplish this task would prove daunting as it is both time-consuming and costly. We present a new in silico data mining strategy that directly addresses these limitations. Sequences corresponding to the 3' untranslated regions of LOX, LOXL1, and LOXL2 were individually queried against the human expressed sequence tag database (dbEST). In this manner, the entire tissue repertoire available in the dbEST was surveyed. This provided an estimate of the levels of mRNA transcripts in a variety of adult and fetal tissues. We have also employed this strategy to determine the pattern of expression and levels of a newly discovered gene, CGI-15. The veracity of this technique has been independently assessed by semiquantitative PCR analysis. The application of this technology is bounded only by the ever-growing information available in the GenBank, UniGene, and human EST databases. The utility of our data mining strategy to establish relative transcript levels in numerous tissues is presented.

  4. Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

    PubMed

    Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

    2015-01-01

    In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.
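The random walk with restart (RWR) named in this abstract can be sketched in a few lines. The version below is a generic textbook formulation, not the authors' implementation: the toy network, the node labels, and the restart probability of 0.7 are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 on an undirected graph.

    adj   : dict mapping each node to the set of its neighbours (symmetric).
    seeds : set of known disease genes; the walker restarts uniformly on them.
    Returns a dict node -> steady-state visiting probability.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m passes on its probability split evenly over its edges.
            inflow = sum(p[m] / len(adj[m]) for m in adj[n])
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-gene path network: A - B - C - D, with A as the only seed.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_network, seeds={"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Candidate genes are then ranked by their steady-state score. Integrating two networks, as the abstract describes, would amount to running the same iteration on a combined adjacency structure; how the two edge sets are merged and weighted is not specified here.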

  5. Horizontal gene transfer in an acid mine drainage microbial community.

    PubMed

    Guo, Jiangtao; Wang, Qi; Wang, Xiaoqi; Wang, Fumeng; Yao, Jinxian; Zhu, Huaiqiu

    2015-07-04

    Horizontal gene transfer (HGT) has been widely identified in complete prokaryotic genomes. However, the roles of HGT among members of a microbial community and in evolution remain largely unknown. With the emergence of metagenomics, it is nontrivial to investigate such horizontal flow of genetic materials among members in a microbial community from the natural environment. Because of the lack of suitable methods for metagenomics gene transfer detection, microorganisms from a low-complexity community acid mine drainage (AMD) with near-complete genomes were used to detect possible gene transfer events and suggest the biological significance. Using the annotation of coding regions by the current tools, a phylogenetic approach, and an approximately unbiased test, we found that HGTs in AMD organisms are not rare, and we predicted 119 putative transferred genes. Among them, 14 HGT events were determined to be transfer events among the AMD members. Further analysis of the 14 transferred genes revealed that the HGT events affected the functional evolution of archaea or bacteria in AMD, and it probably shaped the community structure, such as the dominance of G-plasma in archaea in AMD through HGT. Our study provides a novel insight into HGT events among microorganisms in natural communities. The interconnectedness between HGT and community evolution is essential to understand microbial community formation and development.

  6. Literature Mining and Ontology based Analysis of Host-Brucella Gene-Gene Interaction Network.

    PubMed

    Karadeniz, İlknur; Hur, Junguk; He, Yongqun; Özgür, Arzucan

    2015-01-01

    Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host-pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene-gene interactions from the abstracts of articles in PubMed. The gene-gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence based approaches, as well as sentence-level machine learning based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty four of these interactions were identified from sentence-level processing. Twenty two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types at a hierarchical ontology structure. Ontological modeling of specific gene-gene interactions demonstrates that host-pathogen gene-gene interactions occur at experimental conditions which can be ontologically represented. Our
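The difference between sentence-level and abstract-level co-occurrence described above can be illustrated with a small sketch. It is not SciMiner and does no real named-entity recognition: the gene names are placeholders, the lexicons are hypothetical, and the sentence splitter is a naive regular expression.

```python
import re
from itertools import product


def find_mentions(text, lexicon):
    """Return the lexicon entries that occur as whole words in the text."""
    return {name for name in lexicon
            if re.search(r"\b" + re.escape(name) + r"\b", text)}


def cooccurrence_pairs(abstract, host_genes, pathogen_genes):
    """Return (sentence_level_pairs, additional_abstract_level_pairs)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]
    sentence_pairs = set()
    for sentence in sentences:
        hosts = find_mentions(sentence, host_genes)
        pathogens = find_mentions(sentence, pathogen_genes)
        sentence_pairs |= set(product(hosts, pathogens))
    abstract_pairs = set(product(find_mentions(abstract, host_genes),
                                 find_mentions(abstract, pathogen_genes)))
    return sentence_pairs, abstract_pairs - sentence_pairs


# Made-up text with placeholder gene names; it asserts nothing about real biology.
text = ("HOSTG1 was induced in cells infected with a BRUG1 mutant. "
        "HOSTG2 was also measured. BRUG2 was required for survival.")
in_sentence, abstract_only = cooccurrence_pairs(
    text, host_genes={"HOSTG1", "HOSTG2"}, pathogen_genes={"BRUG1", "BRUG2"})
print(sorted(in_sentence))
print(sorted(abstract_only))
```

Abstract-level matching recovers more candidate pairs at the cost of more false positives, which is one reason extracted interactions are evaluated manually in the study.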

  7. Science and Technology Text Mining: Global Technology Watch

    DTIC Science & Technology

    2003-07-01

    Prog Bio, 57: (3), 1998. 20. Swanson, D.R., “Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge”, Perspect Biol Med, 30: (1), 1986. 21...based, required to understand the status of science and technology (S&T) globally. Since one important dissemination avenue for S&T is its literature...literature in S&T development and exploitation. Ready access to the results of all global research performed is required in order to:

  8. A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records.

    PubMed

    Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter

    2014-09-24

    Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data

  9. Mining Gene Expression Data Focusing Cancer Therapeutics: A Digest.

    PubMed

    Jauhari, Shaurya; Rizvi, S A M

    2014-01-01

    An understanding of genetics and epigenetics is essential to cope with the paradigm shift that is underway. Personalized medicine and gene therapy will converge in the days to come. This review highlights traditional approaches as well as current advancements in the analysis of gene expression data from a cancer perspective. Due to improvements in biometric instrumentation and automation, it has become easier to collect large amounts of experimental data in molecular biology. Analysis of such data is extremely important, as it leads to knowledge discovery that can be validated by experiments. Previously, the diagnosis of complex genetic diseases was conventionally based on non-molecular characteristics such as the kind of tumor tissue, pathological characteristics, and clinical phase. Microarray data are characterized by high dimensionality and noise, and these were the reasons for ineffective and imprecise results. Several machine learning and data mining techniques are presently applied for identifying cancer using gene expression data. While differences in efficiency do exist, none of the well-established approaches is uniformly superior to the others. The quality of an algorithm is important, but is not in itself a guarantee of the quality of a specific data analysis.

  10. Shift in Global Tantalum Mine Production, 2000–2014

    USGS Publications Warehouse

    Bleiwas, Donald I.; Papp, John F.; Yager, Thomas R.

    2015-12-10

    One of the activities of the U.S. Geological Survey National Minerals Information Center (USGS-NMIC) is to analyze global supply chains and characterize major components of mineral and material flows from ore extraction through processing to first tier products. These analyses support the core mission of the USGS-NMIC as the Federal entity responsible for the collection, analysis, and dissemination of objective, unbiased, factual information on minerals essential to the U.S. economy and national security.

  11. Exploring the diversity of arsenic resistance genes from acid mine drainage microorganisms.

    PubMed

    Morgante, Verónica; Mirete, Salvador; de Figueras, Carolina G; Postigo Cacho, Marina; González-Pastor, José E

    2015-06-01

    The microbial communities from the Tinto River, a natural acid mine drainage environment, were explored to search for novel genes involved in arsenic resistance using a functional metagenomic approach. Seven pentavalent arsenate resistance clones were selected and analysed to find the genes responsible for this phenotype. Insights about their possible mechanisms of resistance were obtained from sequence similarities and cellular arsenic concentration. A total of 19 individual open reading frames were analysed, and each one was individually cloned and assayed for its ability to confer arsenic resistance in Escherichia coli cells. A total of 13 functionally active genes involved in arsenic resistance were identified, and they could be classified into different global processes: transport, stress response, DNA damage repair, phospholipids biosynthesis, amino acid biosynthesis and RNA-modifying enzymes. Most genes (11) encode proteins not previously related to heavy metal resistance or hypothetical or unknown proteins. On the other hand, two genes were previously related to heavy metal resistance in microorganisms. In addition, the ClpB chaperone and the RNA-modifying enzymes retrieved in this work were shown to increase the cell survival under different stress conditions (heat shock, acid pH and UV radiation). Thus, these results reveal novel insights about unidentified mechanisms of arsenic resistance.

  12. Identification of underground mine workings with the use of global positioning system technology

    SciTech Connect

    Canty, G.A.; Everett, J.W.; Sharp, M.

    1998-12-31

    Identification of underground mine workings for well drilling is a difficult task given the limited resources available and lack of reliable information. Relic mine maps of questionable accuracy and difficulty in correlating the subsurface to the surface make the process of locating wells arduous. With the development of the global positioning system (GPS), specific locations on the earth can be identified with the aid of satellites. This technology can be applied to mine workings identification given a few necessary, precursory details. For an abandoned mine treatment project conducted by the University of Oklahoma, in conjunction with the Oklahoma Conservation Commission, a Trimble ProXL 8 channel GPS receiver was employed to locate specific points on the surface with respect to a mine map. A 1925 mine map was digitized into AutoCAD version 13 software. Surface features identified on the map, such as mine adits, were located and marked in the field using the GPS receiver. These features were then imported into AutoCAD and referenced with the same points drawn on the map. A rubber sheeting program, Multric, was used to tweak the points so the map features correlated with the surface points. The correlation of these features allowed the map to be geo-referenced with the surface. Specific drilling points were located on the digitized map and assigned a latitude and longitude. The GPS receiver, using real time differential correction, was used to locate these points in the field. This method was assumed to be relatively accurate, to within 5 to 15 feet.

  13. Global direct pressures on biodiversity by large-scale metal mining: Spatial distribution and implications for conservation.

    PubMed

    Murguía, Diego I; Bringezu, Stefan; Schaldach, Rüdiger

    2016-09-15

    Biodiversity loss is widely recognized as a serious global environmental change process. While large-scale metal mining activities do not belong to the top drivers of such change, these operations exert or may intensify pressures on biodiversity by adversely changing habitats, directly and indirectly, at local and regional scales. So far, analyses of global spatial dynamics of mining and its burden on biodiversity focused on the overlap between mines and protected areas or areas of high value for conservation. However, it is less clear how operating metal mines are globally exerting pressure on zones of different biodiversity richness; a similar gap exists for unmined but known mineral deposits. By using vascular plants' diversity as a proxy to quantify overall biodiversity, this study provides a first examination of the global spatial distribution of mines and deposits for five key metals across different biodiversity zones. The results indicate that mines and deposits are not randomly distributed, but concentrated within intermediate and high diversity zones, especially bauxite and silver. In contrast, iron, gold, and copper mines and deposits are closer to a more proportional distribution while showing a high concentration in the intermediate biodiversity zone. Considering the five metals together, 63% and 61% of available mines and deposits, respectively, are located in intermediate diversity zones, comprising 52% of the global land terrestrial surface. 23% of mines and 20% of ore deposits are located in areas of high plant diversity, covering 17% of the land. 13% of mines and 19% of deposits are in areas of low plant diversity, comprising 31% of the land surface. Thus, there seems to be potential for opening new mines in areas of low biodiversity in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.
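One way to read the percentages in this abstract is to divide each zone's share of mines (or deposits) by that zone's share of land: a value above 1 means the zone holds more mines than its area alone would suggest. The sketch below does this arithmetic with the figures quoted above; the name "representation ratio" is a label introduced here for illustration and is not a metric defined by the authors.

```python
# Shares in percent, as quoted in the abstract (five metals combined).
zones = {
    "low":          {"mines": 13, "deposits": 19, "land": 31},
    "intermediate": {"mines": 63, "deposits": 61, "land": 52},
    "high":         {"mines": 23, "deposits": 20, "land": 17},
}


def representation_ratio(share_pct, land_pct):
    """Share of sites in a zone divided by the zone's share of land."""
    return share_pct / land_pct


ratios = {
    zone: {kind: representation_ratio(values[kind], values["land"])
           for kind in ("mines", "deposits")}
    for zone, values in zones.items()
}

for zone, by_kind in ratios.items():
    print(zone, {kind: round(r, 2) for kind, r in by_kind.items()})
```

Under this reading, low-diversity zones are under-represented for both mines and deposits, which is consistent with the abstract's closing remark about potential for new mines there.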

  14. Mining the glioma susceptibility genes in children from gene expression profiles and a methylation database

    PubMed Central

    Xi, Yongqiang; Tang, Wanzhong; Yang, Song; Li, Maolei; He, Yuchao; Fu, Xianhua

    2017-01-01

    Glioma is the most common type of primary brain tumor, which is associated with a poor prognosis due to its aggressive growth behavior and highly invasive nature. Research regarding glioma pathogenesis is expected to provide novel methods of adjuvant therapy for the treatment of glioma. The use of bioinformatics to identify candidate genes is commonly used to understand the genetic basis of disease. The present study used bioinformatics to mine the disease-related genes using gene expression profiles (GSE50021) and dual-channel DNA methylation data (GSE50022). The results identified 17 methylation sites located on 33 transcription factor binding sites, which may be responsible for downregulation of 17 target genes. Glutamate metabotropic receptor 2 was one of the 17 downregulated target genes. Furthermore, inositol-trisphosphate 3-kinase A (ITPKA) was revealed to be the gene most associated with the risk of glioma in children. The protein coded by the ITPKA gene appeared in all risk sub-pathways, thus suggesting that ITPKA was the gene most associated with the risk of glioma, and inositol phosphate metabolism may be a key pathway associated with glioma in children. The identification of specific genes helps to determine the pathogenesis and possible therapeutic targets for the treatment of glioma in children. PMID:28927102

  15. Allele Mining Strategies: Principles and Utilisation for Blast Resistance Genes in Rice (Oryza sativa L.).

    PubMed

    Ashkani, Sadegh; Yusop, Mohd Rafii; Shabanimofrad, Mahmoodreza; Azady, Amin; Ghasemzadeh, Ali; Azizi, Parisa; Latif, Mohammad Abdul

    2015-01-01

    Allele mining is a promising way to dissect naturally occurring allelic variants of candidate genes with essential agronomic qualities. With the identification, isolation and characterisation of blast resistance genes in rice, it is now possible to dissect the actual allelic variants of these genes within an array of rice cultivars via allele mining. Multiple alleles from the complex locus serve as a reservoir of variation to generate functional genes. The routine sequence exchange is one of the main mechanisms of R gene evolution and development. Allele mining for resistance genes can be an important method to identify additional resistance alleles and new haplotypes along with the development of allele-specific markers for use in marker-assisted selection. Allele mining can be visualised as a vital link between effective utilisation of genetic and genomic resources in genomics-driven modern plant breeding. This review studies the actual concepts and potential of mining approaches for the discovery of alleles and their utilisation for blast resistance genes in rice. The details provided here will be important to provide the rice breeder with a worthwhile introduction to allele mining and its methodology for breakthrough discovery of fresh alleles hidden in hereditary diversity, which is vital for crop improvement.

  16. Mining and analysing spatio-temporal patterns of gene expression in an integrative database framework.

    PubMed

    Belmamoune, M; Potikanond, D; Verbeek, F J

    2010-03-25

    Mining patterns of gene expression provides a crucial approach in discovering knowledge such as finding genetic networks that underpin the embryonic development. Analysis of mining results and evaluation of their relevance in the domain remains a major concern. In this paper we describe our explorative studies in support of solutions to facilitate the analysis and interpretation of mining results. In our particular case we describe a solution that is found in the extension of the Gene Expression Management System (GEMS), i.e. an integrative framework for spatio-temporal organization of gene expression patterns of zebrafish, to a framework supporting data mining, data analysis and pattern interpretation. As a proof of principle, the GEMS has been equipped with data mining functionality suitable for spatio-temporal tracking, thereby generating added value to the submission of data for data mining and analysis. The analysis of the genetic networks is based on the availability of domain ontologies which dynamically provide meaning to the discovered patterns of gene expression data. Combination of data mining with the already available capabilities of GEMS will significantly augment current data processing and functional analysis strategies.

  17. The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

    NASA Astrophysics Data System (ADS)

    Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

    2013-04-01

    The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for the students' inclusion of words associated with the shape (i.e., phase), orientation and location of the Moon along with words about similarities and differences. Almost all students wrote about shape but fewer wrote about orientation or location. Students infrequently included words about similarities or differences in the same sentence with shape, orientation or location. Similar to studies about children's and adults' lunar misconceptions, it was found that male and female early adolescents also lacked a robust understanding of global lunar patterns.

  18. Online Analytical Processing (OLAP): A Fast and Effective Data Mining Tool for Gene Expression Databases

    PubMed Central

    2005-01-01

    Gene expression databases contain a wealth of information, but current data mining tools are limited in their speed and effectiveness in extracting meaningful biological knowledge from them. Online analytical processing (OLAP) can be used as a supplement to cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, a product that ships with SQLServer2000, to construct an OLAP cube that was used to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments is stored in the soybean genomics and microarray database (SGMD). A number of candidate resistance genes and pathways were found. Compared to traditional cluster analysis of gene expression data, OLAP was more effective and faster in finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB. PMID:16046824

  19. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

    PubMed

    Xiang, Zuoshuang; Qin, Tingting; Qin, Zhaohui S; He, Yongqun

    2013-10-16

    The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. 
    The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining
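The three-step pipeline described in this abstract (gene-article matrix, normalized gene-MeSH term matrix, gene-gene dissimilarity matrix) can be sketched on toy data. This is not the GenoMesh code: the row normalization and the cosine dissimilarity used below are assumptions standing in for the optimized score the authors selected, and the gene and article identifiers are hypothetical.

```python
import math
from collections import Counter


def gene_mesh_profiles(gene_articles, article_mesh):
    """Build a normalized MeSH-term profile for each gene from its articles."""
    profiles = {}
    for gene, articles in gene_articles.items():
        counts = Counter(term for a in articles for term in article_mesh.get(a, ()))
        total = sum(counts.values())
        profiles[gene] = {t: c / total for t, c in counts.items()} if total else {}
    return profiles


def cosine_dissimilarity(u, v):
    """1 - cosine similarity of two sparse term profiles (1.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical gene-to-article links and article-to-MeSH annotations.
gene_articles = {"g1": ["a1", "a2"], "g2": ["a2", "a3"], "g3": ["a4"]}
article_mesh = {
    "a1": ["Iron", "Biological Transport"],
    "a2": ["Iron"],
    "a3": ["Iron", "Virulence"],
    "a4": ["DNA Repair"],
}

profiles = gene_mesh_profiles(gene_articles, article_mesh)
genes = sorted(profiles)
gene_gene = {(a, b): cosine_dissimilarity(profiles[a], profiles[b])
             for a in genes for b in genes if a < b}
for pair, d in gene_gene.items():
    print(pair, round(d, 3))
```

Gene pairs with low dissimilarity share MeSH context even if no single article mentions both genes, which is the sense in which such relationships are "implicit".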

  20. Development of biomarkers for screening hepatocellular carcinoma using global data mining and multiple reaction monitoring.

    PubMed

    Kim, Hyunsoo; Kim, Kyunggon; Yu, Su Jong; Jang, Eun Sun; Yu, Jiyoung; Cho, Geunhee; Yoon, Jung-Hwan; Kim, Youngsoo

    2013-01-01

    Hepatocellular carcinoma (HCC) is one of the most common and aggressive cancers and is associated with a poor survival rate. Clinically, the level of alpha-fetoprotein (AFP) has been used as a biomarker for the diagnosis of HCC. The discovery of useful biomarkers for HCC, focused solely on the proteome, has been difficult; thus, wide-ranging global data mining of genomic and proteomic databases from previous reports would be valuable in screening biomarker candidates. Further, multiple reaction monitoring (MRM), based on triple quadrupole mass spectrometry, has been effective with regard to high-throughput verification, complementing antibody-based verification pipelines. In this study, global data mining was performed using 5 types of HCC data to screen for candidate biomarker proteins: cDNA microarray, copy number variation, somatic mutation, epigenetic, and quantitative proteomics data. Next, we applied MRM to verify HCC candidate biomarkers in individual serum samples from 3 groups: a healthy control group, patients who have been diagnosed with HCC (Before HCC treatment group), and HCC patients who underwent locoregional therapy (After HCC treatment group). After determining the relative quantities of the candidate proteins by MRM, we compared their expression levels between the 3 groups, identifying 4 potential biomarkers: the actin-binding protein anillin (ANLN), filamin-B (FLNB), complementary C4-A (C4A), and AFP. The combination of 2 markers (ANLN, FLNB) improved the discrimination of the before HCC treatment group from the healthy control group compared with AFP. We conclude that the combination of global data mining and MRM verification enhances the screening and verification of potential HCC biomarkers. This efficacious integrative strategy is applicable to the development of markers for cancer and other diseases.

  1. Development of Biomarkers for Screening Hepatocellular Carcinoma Using Global Data Mining and Multiple Reaction Monitoring

    PubMed Central

    Yu, Su Jong; Jang, Eun Sun; Yu, Jiyoung; Cho, Geunhee; Yoon, Jung-Hwan; Kim, Youngsoo

    2013-01-01

    Hepatocellular carcinoma (HCC) is one of the most common and aggressive cancers and is associated with a poor survival rate. Clinically, the level of alpha-fetoprotein (AFP) has been used as a biomarker for the diagnosis of HCC. The discovery of useful biomarkers for HCC, focused solely on the proteome, has been difficult; thus, wide-ranging global data mining of genomic and proteomic databases from previous reports would be valuable in screening biomarker candidates. Further, multiple reaction monitoring (MRM), based on triple quadrupole mass spectrometry, has been effective with regard to high-throughput verification, complementing antibody-based verification pipelines. In this study, global data mining was performed using 5 types of HCC data to screen for candidate biomarker proteins: cDNA microarray, copy number variation, somatic mutation, epigenetic, and quantitative proteomics data. Next, we applied MRM to verify HCC candidate biomarkers in individual serum samples from 3 groups: a healthy control group, patients who have been diagnosed with HCC (Before HCC treatment group), and HCC patients who underwent locoregional therapy (After HCC treatment group). After determining the relative quantities of the candidate proteins by MRM, we compared their expression levels between the 3 groups, identifying 4 potential biomarkers: the actin-binding protein anillin (ANLN), filamin-B (FLNB), complementary C4-A (C4A), and AFP. The combination of 2 markers (ANLN, FLNB) improved the discrimination of the before HCC treatment group from the healthy control group compared with AFP. We conclude that the combination of global data mining and MRM verification enhances the screening and verification of potential HCC biomarkers. This efficacious integrative strategy is applicable to the development of markers for cancer and other diseases. PMID:23717429

  2. Automatic extraction of reference gene from literature in plants based on texting mining.

    PubMed

    He, Lin; Shen, Gengyu; Li, Fei; Huang, Shuiqing

    2015-01-01

    Real-time quantitative polymerase chain reaction (qRT-PCR) is widely used in biological research. Selecting a stable reference gene is key to the validity of a qRT-PCR experiment. However, selecting an appropriate reference gene usually requires rigorous and costly experimental verification. The scientific literature has accumulated many results on reference gene selection. Therefore, mining reference genes used under specific experimental conditions from the literature can provide reliable reference genes for similar qRT-PCR experiments, with advantages in reliability, economy, and efficiency. This paper proposes an auxiliary method for discovering reference genes from the literature that integrates machine learning, natural language processing, and text mining approaches. Validity tests showed that the new method achieves better precision and recall in extracting reference genes and their environments.

  3. Network-based prediction and knowledge mining of disease genes.

    PubMed

    Carson, Matthew B; Lu, Hui

    2015-01-01

    In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. 
In addition, we showed that first- and second-order neighbors in the PPI network
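
    Two of the topological metrics named in this abstract can be sketched on a toy network. This is only an illustration: the edge list, the protein names, and the definition of "disease neighbor ratio" used here (the fraction of a protein's direct neighbors that are disease-associated) are assumptions, not taken from the paper.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def degree_centrality(graph, node):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return len(graph[node]) / (n - 1) if n > 1 else 0.0

def disease_neighbor_ratio(graph, node, disease_proteins):
    """Assumed definition: share of direct neighbors that are disease-associated."""
    neighbors = graph[node]
    if not neighbors:
        return 0.0
    return len(neighbors & disease_proteins) / len(neighbors)

# Hypothetical toy interaction network and disease annotations.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
graph = build_graph(edges)
disease = {"B", "D"}

for protein in sorted(graph):
    print(protein,
          round(degree_centrality(graph, protein), 3),
          round(disease_neighbor_ratio(graph, protein, disease), 3))
```

    Features of this kind would then be fed to a classifier; the ADTree training step itself is not reproduced here.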

  4. Network-based prediction and knowledge mining of disease genes

    PubMed Central

    2015-01-01

    Background: In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. Methods: We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Results: Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics, including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity, helps to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second

  5. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions.

    PubMed

    Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. 
The analysis of these
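
    The abstract above relies on a modified Fisher's exact test whose modification is not described here. As a point of reference only, the standard one-sided Fisher's exact test for a 2x2 table can be sketched as follows; the table layout and the example counts are assumptions made for illustration.

```python
from math import comb

def hypergeom_pmf(k, total, successes, draws):
    """Probability of exactly k successes in `draws` draws without replacement."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)

def fisher_over(a, b, c, d):
    """One-sided p-value P(X >= a) for the table [[a, b], [c, d]] (over-representation)."""
    total, successes, draws = a + b + c + d, a + c, a + b
    upper = min(successes, draws)
    return sum(hypergeom_pmf(k, total, successes, draws) for k in range(a, upper + 1))

def fisher_under(a, b, c, d):
    """One-sided p-value P(X <= a) for the table [[a, b], [c, d]] (under-representation)."""
    total, successes, draws = a + b + c + d, a + c, a + b
    lower = max(0, draws - (total - successes))
    return sum(hypergeom_pmf(k, total, successes, draws) for k in range(lower, a + 1))

# Hypothetical counts: rows = gene pairs inside / outside the sub-network,
# columns = pairs with / without a given interaction term.
table = (3, 1, 1, 3)
print(fisher_over(*table), fisher_under(*table))
```

    In practice, p-values computed across many interaction terms would also need a multiple-testing correction, which this sketch omits.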

  6. Gold mining in the Peruvian Amazon: global prices, deforestation, and mercury imports.

    PubMed

    Swenson, Jennifer J; Carter, Catherine E; Domec, Jean-Christophe; Delgado, Cesar I

    2011-04-19

    Many factors such as poverty, ineffective institutions and environmental regulations may prevent developing countries from managing how natural resources are extracted to meet a strong market demand. Extraction for some resources has reached such proportions that evidence is measurable from space. We present recent evidence of the global demand for a single commodity and the ecosystem destruction resulting from commodity extraction, recorded by satellites for one of the most biodiverse areas of the world. We find that since 2003, recent mining deforestation in Madre de Dios, Peru is increasing nonlinearly alongside a constant annual rate of increase in international gold price (∼18%/yr). We detect that the new pattern of mining deforestation (1915 ha/year, 2006-2009) is outpacing that of nearby settlement deforestation. We show that gold price is linked with exponential increases in Peruvian national mercury imports over time (R² = 0.93, p = 0.04, 2003-2009). Given the past rates of increase we predict that mercury imports may more than double for 2011 (∼500 t/year). Virtually all of Peru's mercury imports are used in artisanal gold mining. Much of the mining increase is unregulated/artisanal in nature, lacking environmental impact analysis or miner education. As a result, large quantities of mercury are being released into the atmosphere, sediments and waterways. Other developing countries endowed with gold deposits are likely experiencing similar environmental destruction in response to recent record high gold prices. The increasing availability of satellite imagery ought to evoke further studies linking economic variables with land use and cover changes on the ground.

  7. Gold Mining in the Peruvian Amazon: Global Prices, Deforestation, and Mercury Imports

    PubMed Central

    Swenson, Jennifer J.; Carter, Catherine E.; Domec, Jean-Christophe; Delgado, Cesar I.

    2011-01-01

    Many factors such as poverty, ineffective institutions and environmental regulations may prevent developing countries from managing how natural resources are extracted to meet a strong market demand. Extraction for some resources has reached such proportions that evidence is measurable from space. We present recent evidence of the global demand for a single commodity and the ecosystem destruction resulting from commodity extraction, recorded by satellites for one of the most biodiverse areas of the world. We find that since 2003, recent mining deforestation in Madre de Dios, Peru is increasing nonlinearly alongside a constant annual rate of increase in international gold price (∼18%/yr). We detect that the new pattern of mining deforestation (1915 ha/year, 2006–2009) is outpacing that of nearby settlement deforestation. We show that gold price is linked with exponential increases in Peruvian national mercury imports over time (R² = 0.93, p = 0.04, 2003–2009). Given the past rates of increase we predict that mercury imports may more than double for 2011 (∼500 t/year). Virtually all of Peru's mercury imports are used in artisanal gold mining. Much of the mining increase is unregulated/artisanal in nature, lacking environmental impact analysis or miner education. As a result, large quantities of mercury are being released into the atmosphere, sediments and waterways. Other developing countries endowed with gold deposits are likely experiencing similar environmental destruction in response to recent record high gold prices. The increasing availability of satellite imagery ought to evoke further studies linking economic variables with land use and cover changes on the ground. PMID:21526143

  8. GeneKeyDB: a lightweight, gene-centric, relational database to support data mining environments.

    PubMed

    Kirov, S A; Peng, X; Baker, E; Schmoyer, D; Zhang, B; Snoddy, J

    2005-03-24

    The analysis of biological data is greatly enhanced by existing or emerging databases. Most existing databases, with few exceptions, are not designed to easily support large-scale computational analysis, but rather offer exclusively a web interface to the resource. We have recognized the growing need for a database that can be used successfully as a backend to computational analysis tools and pipelines. Such a database should be sufficiently versatile to allow easy system integration. GeneKeyDB is a gene-centered relational database developed to enhance data mining in biological data sets. The system provides an underlying data layer for computational analysis tools and visualization tools. GeneKeyDB relies primarily on existing database identifiers derived from community databases (NCBI, GO, Ensembl, et al.) as well as the known relationships among those identifiers. It is a lightweight, portable, and extensible platform for integration with computational tools and analysis environments. GeneKeyDB can enable analysis tools and users to manipulate the intersections, unions, and differences among different data sets.
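
    The set operations mentioned at the end of this abstract map naturally onto SQL compound queries. The sketch below uses an in-memory SQLite database; the table name, column names, set names, and gene lists are all hypothetical and do not reflect the actual GeneKeyDB schema.

```python
import sqlite3

# Hypothetical schema: one row per (gene set, gene identifier).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_set (set_name TEXT, gene_id TEXT)")
conn.executemany(
    "INSERT INTO gene_set VALUES (?, ?)",
    [("microarray_hits", g) for g in ("EGFR", "MYC", "KRAS")]
    + [("go_annotated", g) for g in ("TP53", "BRCA1", "EGFR", "MYC")],
)

def combine(operator):
    """Combine the two gene sets with UNION, INTERSECT, or EXCEPT."""
    sql = (
        "SELECT gene_id FROM gene_set WHERE set_name = 'microarray_hits' "
        f"{operator} "
        "SELECT gene_id FROM gene_set WHERE set_name = 'go_annotated' "
        "ORDER BY gene_id"
    )
    return [row[0] for row in conn.execute(sql)]

for op in ("INTERSECT", "UNION", "EXCEPT"):
    print(op, combine(op))
```

    Keeping the identifiers in a single keyed table is one simple design choice; it lets any pair of data sets be compared with the same query shape.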

  9. Global demand for rare earth resources and strategies for green mining.

    PubMed

    Dutta, Tanushree; Kim, Ki-Hyun; Uchimiya, Minori; Kwon, Eilhann E; Jeon, Byong-Hun; Deep, Akash; Yun, Seong-Taek

    2016-10-01

    Rare earth elements (REEs) are essential raw materials for emerging renewable energy resources and 'smart' electronic devices. Global REE demand is slated to grow at an annual rate of 5% by 2020. This high growth rate will require a steady supply base of REEs in the long run. At present, China is responsible for 85% of global rare earth oxide (REO) production. To overcome this monopolistic supply situation, new strategies and investments are necessary to satisfy domestic supply demands. Concurrently, environmental, economic, and social problems arising from REE mining must be addressed. There is an urgent need to develop efficient REE recycling techniques from end-of-life products, technologies to minimize the amount of REEs required per unit device, and methods to recover them from fly ash or fossil fuel-burning wastes.

  10. Integration of text- and data-mining using ontologies successfully selects disease gene candidates.

    PubMed

    Tiffin, Nicki; Kelso, Janet F; Powell, Alan R; Pan, Hong; Bajic, Vladimir B; Hide, Winston A

    2005-01-01

    Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (+/-18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.

  11. Cross-Ontology multi-level association rule mining in the Gene Ontology.

    PubMed

    Manda, Prashanti; Ozkan, Seval; Wang, Hui; McCarthy, Fiona; Bridges, Susan M

    2012-01-01

    The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method on publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross ontology rules from the two datasets respectively. We show that our approach discovers more and higher quality association rules from the GO as evaluated by biologists in comparison to previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.
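
    The core quantities behind association rule mining, support and confidence, can be sketched on toy annotation "transactions". This is a loose illustration, not the COLL procedure: the transactions, the term labels, and the one-step ancestor map used for generalization are assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every term in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0

# Hypothetical ancestor map used to lift terms to a more abstract level.
ANCESTOR = {
    "MF:kinase activity": "MF:catalytic activity",
    "MF:transferase activity": "MF:catalytic activity",
}

def generalize(transaction):
    """Replace each term by its ancestor where one is listed."""
    return {ANCESTOR.get(term, term) for term in transaction}

# Each transaction is the set of GO terms annotated to one gene product (toy data).
transactions = [
    {"MF:kinase activity", "BP:phosphorylation"},
    {"MF:kinase activity", "BP:phosphorylation"},
    {"MF:transferase activity", "BP:phosphorylation"},
    {"MF:kinase activity", "BP:transport"},
]
lifted = [generalize(t) for t in transactions]

print(confidence(transactions, {"MF:kinase activity"}, {"BP:phosphorylation"}))
print(confidence(lifted, {"MF:catalytic activity"}, {"BP:phosphorylation"}))
```

    Lifting terms to a more abstract level raises support for the generalized rule, which is the general motivation for mining at multiple levels of abstraction.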

  12. GeneMining: identification, visualization, and interpretation of brain ageing signatures.

    PubMed

    Salle, Paola; Bringay, Sandra; Teisseire, Maguelonne; Chakkour, Feirouz; Roche, Mathieu; Rassoul, Ronza Abdel; Verdier, Jean-Michel; Devau, Gina

    2009-01-01

    Transcriptomic technologies are promising tools for identifying new genes involved in cerebral ageing or in neurodegenerative diseases such as Alzheimer's disease. These technologies produce massive biological data, which so far are extremely difficult to exploit. In this context, we propose GeneMining, a multidisciplinary methodology, which aims at developing new strategies to analyse such data, and to design interactive tools to help biologists to identify, visualize and interpret brain ageing signatures. In order to address the specific problem of brain ageing signature discovery, we combine and apply existing tools with emphasis on a new efficient data mining method based on sequential patterns.

  13. Global Identification of Disease Associated Genes in Fragile X Cells

    DTIC Science & Technology

    2016-08-01

    Award number: W81XWH-15-1-0204. Title: Global Identification of Disease-Associated Genes in Fragile X Cells. Principal investigator: Wenyi Feng... We have performed three biological replicate experiments to rigorously test if the Fragile X cell line produces more DSBs than the normal

  14. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    PubMed

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this large network, a sub-network consisting of 5 E. coli vaccine genes (carA, carB, fimH, fepA, and vat), 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of

  15. Global Gene Expression Analysis for the Assessment of Nanobiomaterials.

    PubMed

    Hanagata, Nobutaka

    2015-01-01

    Using global gene expression analysis, the effects of biomaterials and nanomaterials can be analyzed at the genetic level. Even though information obtained from global gene expression analysis can be useful for the evaluation and design of biomaterials and nanomaterials, its use for these purposes is not widespread. This is due to the difficulties involved in data analysis. Because the expression data of about 20,000 genes can be obtained at once with global gene expression analysis, the data must be analyzed using bioinformatics. A method of bioinformatic analysis called gene ontology can estimate the kinds of changes on cell functions caused by genes whose expression level is changed by biomaterials and nanomaterials. Also, by applying a statistical analysis technique called hierarchical clustering to global gene expression data between a variety of biomaterials, the effects of the properties of materials on cell functions can be estimated. In this chapter, these theories of analysis and examples of applications to nanomaterials and biomaterials are described. Furthermore, global microRNA analysis, a method that has gained attention in recent years, and its application to nanomaterials are introduced.

  16. A semiautomated approach to gene discovery through expressed sequence tag data mining: discovery of new human transporter genes.

    PubMed

    Brown, Shoshana; Chang, Jean L; Sadée, Wolfgang; Babbitt, Patricia C

    2003-01-01

    Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.

  17. Implications for global climate change from microbially-produced acid mine drainage

    NASA Astrophysics Data System (ADS)

    Norlund, K. L.; Hitchcock, A. P.; Warren, L. A.

    2009-05-01

    Microbial catalysis of sulphur cycling in acid mine drainage (AMD) environments is well known but the reaction pathways are poorly characterised. These reaction pathways involve both acid-consuming and acid-generating steps, with important consequences for overall AMD production as well as sulphur and carbon global biogeochemical cycles. Mining-associated sulphuric acid has been implicated in climate change through the weathering of carbonate minerals resulting in the release of 29 Tg C/year as carbon dioxide. Understanding of microbial AMD generation is based predominantly on studies of Acidithiobacillus ferrooxidans despite the knowledge that other environmentally common strains of bacteria are also active sulphur oxidizers and that microbial consortia are likely very important in environmental processes. Using an integrated experimental approach including geochemical experimentation, scanning transmission X-ray microscopy (STXM) and fluorescent in situ hybridization (FISH), we document a novel syntrophic sulphur metabolism involving two common mine bacteria: autotrophic sulphur oxidizing Acidithiobacillus ferrooxidans and heterotrophic Acidiphilium spp. The proposed sulphur geochemistry associated with this bacterial consortium produces 40-90% less acid than expected based on abiotic AMD models, with significant implications for both AMD mitigation and AMD carbon flux modelling. The two bacterial strains are specifically spatially segregated within a macrostructure of extracellular polymeric substance (EPS) that provides the necessary microgeochemical conditions for coupled sulphur oxidation and reduction reactions. STXM results identify multiple sulphur oxidation states associated with the pods, indicating that they are the sites of active sulphur disproportionation and recycling. Recent laboratory experimentation using type culture strains of the bacteria involved in pod-formation suggests that this phenomenon is likely to be widespread in environments

  18. Global Analysis of Horizontal Gene Transfer in Fusarium verticillioides

    USDA-ARS's Scientific Manuscript database

    The co-occurrence of microbes within plants and other specialized niches may facilitate horizontal gene transfer (HGT) affecting host-pathogen interactions. We recently identified fungal-to-fungal HGTs involving metabolic gene clusters. For a global analysis of HGTs in the maize pathogen Fusarium ve...

  19. Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    ERIC Educational Resources Information Center

    Huang, Zhong

    2014-01-01

    Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…

  1. RiceGeneThresher: a web-based application for mining genes underlying QTL in rice genome.

    PubMed

    Thongjuea, Supat; Ruanjaichon, Vinitchan; Bruskiewich, Richard; Vanavichit, Apichart

    2009-01-01

    RiceGeneThresher is a public online resource for mining genes underlying genome regions of interest or quantitative trait loci (QTL) in the rice genome. It is a compendium of rice genomic resources consisting of genetic markers, genome annotation, expressed sequence tags (ESTs), protein domains, gene ontology, plant stress-responsive genes, metabolic pathways and prediction of protein-protein interactions. The RiceGeneThresher system integrates these diverse data sources and provides powerful web-based applications and flexible tools for delivering customized sets of biological data on rice. The system supports whole-genome gene mining for QTL by querying using DNA marker intervals or genomic loci. RiceGeneThresher provides biologically supported evidence that is essential for targeting groups or networks of genes involved in controlling traits underlying QTL. Users can use it to discover and to assign the most promising candidate genes in preparation for further gene function validation analysis. The web-based application is freely available at http://rice.kps.ku.ac.th.

  2. Intrinsic limits to gene regulation by global crosstalk

    PubMed Central

    Friedlander, Tamar; Prizak, Roshan; Guet, Călin C.; Barton, Nicholas H.; Tkačik, Gašper

    2016-01-01

    Gene regulation relies on the specificity of transcription factor (TF)–DNA interactions. Limited specificity may lead to crosstalk: a regulatory state in which a gene is either incorrectly activated due to noncognate TF–DNA interactions or remains erroneously inactive. As each TF can have numerous interactions with noncognate cis-regulatory elements, crosstalk is inherently a global problem, yet has previously not been studied as such. We construct a theoretical framework to analyse the effects of global crosstalk on gene regulation. We find that crosstalk presents a significant challenge for organisms with low-specificity TFs, such as metazoans. Crosstalk is not easily mitigated by known regulatory schemes acting at equilibrium, including variants of cooperativity and combinatorial regulation. Our results suggest that crosstalk imposes a previously unexplored global constraint on the functioning and evolution of regulatory networks, which is qualitatively distinct from the known constraints that act at the level of individual gene regulatory elements. PMID:27489144

  3. A global test for gene-gene interactions based on random matrix theory.

    PubMed

    Frost, H Robert; Amos, Christopher I; Moore, Jason H

    2016-12-01

    Statistical interactions between markers of genetic variation, or gene-gene interactions, are believed to play an important role in the etiology of many multifactorial diseases and other complex phenotypes. Unfortunately, detecting gene-gene interactions is extremely challenging due to the large number of potential interactions and ambiguity regarding marker coding and interaction scale. For many data sets, there is insufficient statistical power to evaluate all candidate gene-gene interactions. In these cases, a global test for gene-gene interactions may be the best option. Global tests have much greater power relative to multiple individual interaction tests and can be used on subsets of the markers as an initial filter prior to testing for specific interactions. In this paper, we describe a novel global test for gene-gene interactions, the global epistasis test (GET), that is based on results from random matrix theory. As we show via simulation studies based on previously proposed models for common diseases including rheumatoid arthritis, type 2 diabetes, and breast cancer, our proposed GET method has superior performance characteristics relative to existing global gene-gene interaction tests. A glaucoma GWAS data set is used to demonstrate the practical utility of the GET method.

  4. miRTex: A Text Mining System for miRNA-Gene Relation Extraction.

    PubMed

    Li, Gang; Ross, Karen E; Arighi, Cecilia N; Peng, Yifan; Wu, Cathy H; Vijay-Shanker, K

    2015-01-01

    MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes.
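
    The F-score reported in this abstract is the harmonic mean of precision and recall. A minimal sketch of the calculation follows; the counts of true positives, false positives, and false negatives are hypothetical and are not taken from the miRTex evaluation.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from relation-extraction counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one relation type on an annotated corpus.
p, r, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=10)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```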

  5. miRTex: A Text Mining System for miRNA-Gene Relation Extraction

    PubMed Central

    Li, Gang; Ross, Karen E.; Arighi, Cecilia N.; Peng, Yifan; Wu, Cathy H.; Vijay-Shanker, K.

    2015-01-01

    MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes. PMID:26407127

  6. Clique-based data mining for related genes in a biomedical database

    PubMed Central

    Matsunaga, Tsutomu; Yonemori, Chikara; Tomita, Etsuji; Muramatsu, Masaaki

    2009-01-01

    Background: Progress in the life sciences cannot be made without integrating biomedical knowledge on numerous genes in order to help formulate hypotheses on the genetic mechanisms behind various biological phenomena, including diseases. There is thus a strong need for a way to automatically and comprehensively search from biomedical databases for related genes, such as genes in the same families and genes encoding components of the same pathways. Here we address the extraction of related genes by searching for densely-connected subgraphs, which are modeled as cliques, in a biomedical relational graph. Results: We constructed a graph whose nodes were gene or disease pages, and edges were the hyperlink connections between those pages in the Online Mendelian Inheritance in Man (OMIM) database. We obtained over 20,000 sets of related genes (called 'gene modules') by enumerating cliques computationally. The modules included genes in the same family, genes for proteins that form a complex, and genes for components of the same signaling pathway. The results of experiments using 'metabolic syndrome'-related gene modules show that the gene modules can be used to get a coherent holistic picture helpful for interpreting relations among genes. Conclusion: We presented a data mining approach extracting related genes by enumerating cliques. The extracted gene sets provide a holistic picture useful for comprehending complex disease mechanisms. PMID:19566964
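
    Clique enumeration of the kind described above can be sketched with the basic Bron-Kerbosch algorithm. This is a textbook version without pivoting, shown for illustration; it is not claimed to be the authors' implementation, and the page names and links are hypothetical.

```python
def maximal_cliques(graph):
    """Enumerate all maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques

# Hypothetical hyperlink graph between gene pages (adjacency sets).
graph = {
    "G1": {"G2", "G3"},
    "G2": {"G1", "G3"},
    "G3": {"G1", "G2", "G4"},
    "G4": {"G3"},
}

for module in maximal_cliques(graph):
    print(sorted(module))
```

    On large graphs a pivoting variant is usually preferred, since it prunes many redundant branches.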

  7. EXCAVATOR: a computer program for efficiently mining gene expression data.

    PubMed

    Xu, Dong; Olman, Victor; Wang, Li; Xu, Ying

    2003-10-01

    Massive amounts of gene expression data are generated using microarrays for functional studies of genes, and gene expression data clustering is a useful tool for studying the functional relationship among genes in a biological process. We have developed a computer package EXCAVATOR for clustering gene expression profiles based on our new framework for representing gene expression data as a minimum spanning tree. EXCAVATOR uses a number of rigorous and efficient clustering algorithms. This program has a number of unique features, including capabilities for: (i) data-constrained clustering; (ii) identification of genes with similar expression profiles to pre-specified seed genes; (iii) cluster identification from a noisy background; (iv) computational comparison between different clustering results of the same data set. EXCAVATOR can be run from a Unix/Linux/DOS shell, from a Java interface or from a Web server. The clustering results can be visualized as colored figures and 2-dimensional plots. Moreover, EXCAVATOR provides a wide range of options for data formats, distance measures, objective functions, clustering algorithms, methods to choose the number of clusters, etc. The effectiveness of EXCAVATOR has been demonstrated on several experimental data sets. Its performance compares favorably against the popular K-means clustering method in terms of clustering quality and computing time.
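
    The idea of clustering expression profiles through a minimum spanning tree can be sketched as follows. This is a generic illustration only: it builds the tree with Prim's algorithm over Euclidean distances and cuts the k-1 longest edges, which is one simple MST-based scheme and not EXCAVATOR's actual objective functions. All names and the toy profiles are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[float, int, int]  # (distance, node, node)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points: Sequence[Sequence[float]]) -> List[Edge]:
    """Prim's algorithm on the complete graph of pairwise distances."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Cut the k-1 longest tree edges and return one cluster label per point."""
    edges = sorted(minimum_spanning_tree(points))
    kept = edges[: max(len(points) - k, 0)]  # n-1 tree edges minus the k-1 longest
    root = list(range(len(points)))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]  # path compression
            i = root[i]
        return i

    for _, a, b in kept:
        root[find(a)] = find(b)
    return [find(i) for i in range(len(points))]


# Toy expression profiles (made-up numbers): two well-separated groups
profiles = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
labels = mst_clusters(profiles, k=2)
```

    The labels are arbitrary integers; only equality between labels is meaningful, so the first three profiles share one label and the last two share another.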

  8. DISEASES: text mining and data integration of disease-gene associations.

    PubMed

    Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

    2015-03-01

    Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
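
    A dictionary-based tagger combined with co-occurrence scoring within and between sentences can be sketched in a few lines. The sketch below is a strong simplification and not the DISEASES scoring scheme: it matches single-token names only, and the weights `w_sentence` and `w_abstract` are hypothetical values chosen for illustration. The example sentences are made up.

```python
import re
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple


def tag(sentence: str, dictionary: Iterable[str]) -> Set[str]:
    """Return the dictionary names that occur as whole tokens in the sentence."""
    tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence.lower()))
    return {name for name in dictionary if name.lower() in tokens}


def cooccurrence_scores(
    abstracts: List[str],
    genes: Set[str],
    diseases: Set[str],
    w_sentence: float = 1.0,   # assumed weight for same-sentence mentions
    w_abstract: float = 0.25,  # assumed weight for same-abstract mentions only
) -> Dict[Tuple[str, str], float]:
    """Sum a weighted co-occurrence score for every (gene, disease) pair."""
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for text in abstracts:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
        tagged = [(tag(s, genes), tag(s, diseases)) for s in sentences]
        doc_genes = set().union(*(g for g, _ in tagged))
        doc_diseases = set().union(*(d for _, d in tagged))
        same_sentence = set()
        for g_set, d_set in tagged:
            same_sentence.update(product(g_set, d_set))
        # Each abstract contributes once per pair, at the stronger level it supports
        for pair in product(doc_genes, doc_diseases):
            scores[pair] += w_sentence if pair in same_sentence else w_abstract
    return dict(scores)


# Made-up example sentences for illustration only
toy_abstracts = [
    "BRCA1 is mutated in cancer. TP53 was also measured.",
    "TP53 loss drives cancer.",
]
toy_scores = cooccurrence_scores(toy_abstracts, {"BRCA1", "TP53"}, {"cancer"})
```

    Here the TP53 and cancer pair accumulates one between-sentence and one within-sentence contribution, while the BRCA1 and cancer pair receives a single within-sentence contribution.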

  9. Global phylogeography of chitinase genes in aquatic metagenomes.

    PubMed

    Beier, Sara; Jones, Christopher M; Mohit, Vani; Hallin, Sara; Bertilsson, Stefan

    2011-02-01

    Phylogeny-based analysis of chitinase and 16S rRNA genes from metagenomic data suggests that salinity is a major driver for the distribution of both chitinolytic and total bacterial communities in aquatic systems. Additionally, more acidic chitinase proteins were observed with increasing salinity. Congruent habitat separation was further observed for both genes according to latitude and proximity to the coastline. However, comparison of chitinase and 16S rRNA genes extracted from different geographic locations showed little congruence in distribution. There was no indication that dispersal limited the global distribution of either gene.

  10. Transcriptome-Guided Mining of Genes Involved in Crocin Biosynthesis

    PubMed Central

    Ji, Aijia; Jia, Jing; Xu, Zhichao; Li, Ying; Bi, Wu; Ren, Fengming; He, Chunnian; Liu, Jie; Hu, Kaizhi; Song, Jingyuan

    2017-01-01

    Gardenia jasminoides is used in traditional Chinese medicine and has drawn attention as a rich source of crocin, a compound with reported activity against various cancers, depression and cardiovascular disease. However, genetic information on the crocin biosynthetic pathway of G. jasminoides is scarce. In this study, we performed a transcriptome analysis of the leaves, green fruits, and red fruits of G. jasminoides to identify and predict the genes that encode key enzymes responsible for crocin production, compared with Crocus sativus. Twenty-seven putative pathway genes were specifically expressed in the fruits, consistent with the distribution of crocin in G. jasminoides. Twenty-four of these genes were reported for the first time, and a novel CCD4a gene was predicted that encodes carotenoid cleavage dioxygenase leading to crocin synthesis, in contrast to CCD2 of C. sativus. In addition, 6 other candidate genes (ALDH12, ALDH14, UGT94U1, UGT86D1, UGT71H4, and UGT85K18) were predicted to be involved in crocin biosynthesis following phylogenetic analysis and different gene expression profiles. Identifying the genes that encode key enzymes should help elucidate the crocin biosynthesis pathway. PMID:28443112

  11. Mining Gene Regulatory Networks by Neural Modeling of Expression Time-Series.

    PubMed

    Rubiolo, Mariano; Milone, Diego H; Stegmayer, Georgina

    2015-01-01

    Discovering gene regulatory networks from data is one of the most studied topics in recent years. Neural networks can be successfully used to infer an underlying gene network by modeling expression profiles as time series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. They are used for modeling each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the subjacent relations among genes. The results obtained on artificial and real datasets confirm the method's effectiveness for discovering regulatory networks from a proper modeling of the temporal dynamics of gene expression profiles.

  12. Integrated pathway-based transcription regulation network mining and visualization based on gene expression profiles.

    PubMed

    Kibinge, Nelson; Ono, Naoaki; Horie, Masafumi; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Saito, Akira; Kanaya, Shigehiko

    2016-06-01

    Conventionally, workflows examining transcription regulation networks from gene expression data involve distinct analytical steps. There is a need for pipelines that unify data mining and inference deduction into a singular framework to enhance interpretation and hypotheses generation. We propose a workflow that merges network construction with gene expression data mining focusing on regulation processes in the context of transcription factor driven gene regulation. The pipeline implements pathway-based modularization of expression profiles into functional units to improve biological interpretation. The integrated workflow was implemented as a web application software (TransReguloNet) with functions that enable pathway visualization and comparison of transcription factor activity between sample conditions defined in the experimental design. The pipeline merges differential expression, network construction, pathway-based abstraction, clustering and visualization. The framework was applied in analysis of actual expression datasets related to lung, breast and prostate cancer.

  13. Integrating constitutive gene expression and chemoactivity: mining the NCI60 anticancer screen.

    PubMed

    Covell, David G

    2012-01-01

    Studies into the genetic origins of tumor cell chemoactivity pose significant challenges to bioinformatic mining efforts. Connections between measures of gene expression and chemoactivity have the potential to identify clinical biomarkers of compound response, cellular pathways important to efficacy and potential toxicities; all vital to anticancer drug development. An investigation has been conducted that jointly explores tumor-cell constitutive NCI60 gene expression profiles and small-molecule NCI60 growth inhibition chemoactivity profiles, viewed from novel applications of self-organizing maps (SOMs) and pathway-centric analyses of gene expressions, to identify subsets of over- and under-expressed pathway genes that discriminate chemo-sensitive and chemo-insensitive tumor cell types. Linear Discriminant Analysis (LDA) is used to quantify the accuracy of discriminating genes to predict tumor cell chemoactivity. LDA results find 15% higher prediction accuracies, using ∼30% fewer genes, for pathway-derived discriminating genes when compared to genes derived using conventional gene expression-chemoactivity correlations. The proposed pathway-centric data mining procedure was used to derive discriminating genes for ten well-known compounds. Discriminating genes were further evaluated using gene set enrichment analysis (GSEA) to reveal a cellular genetic landscape, comprised of small numbers of key over- and under-expressed on- and off-target pathway genes, as important for a compound's tumor cell chemoactivity. Literature-based validations are provided as support for chemo-important pathways derived from this procedure. Qualitatively similar results are found when using gene expression measurements derived from different microarray platforms. The data used in this analysis are available at http://pubchem.ncbi.nlm.nih.gov/ and http://www.ncbi.nlm.nih.gov/projects/geo (GPL96, GSE32474).

  14. Ionospheric Signature of Surface Mine Blasts from Global Positioning System Measurements

    NASA Technical Reports Server (NTRS)

    Calais, Eric; Minster, J. Bernard; Hofton, Michelle A.; Hedlin, Michael A. H.

    1998-01-01

    Sources such as atmospheric or buried explosions and shallow earthquakes are known to produce infrasonic pressure waves in the atmosphere. Because of the coupling between neutral particles and electrons at ionospheric altitudes, these acoustic and gravity waves induce variations of the ionospheric electron density. The Global Positioning System (GPS) provides a way of directly measuring the total electron content in the ionosphere and, therefore, of detecting such perturbations in the upper atmosphere. In July and August 1996, three large surface mine blasts (1.5 Kt each) were detonated at the Black Thunder coal mine in eastern Wyoming. As part of a seismic and acoustic monitoring experiment, we deployed five dual-frequency GPS receivers at distances ranging from 50 to 200 km from the mine and were able to detect the ionospheric perturbation caused by the blasts. The perturbation starts 10 to 15 min after the blast, lasts for about 30 min, and propagates with an apparent horizontal velocity of 1200 meters per second. Its amplitude reaches 3 x 10^14 electrons per square meter in the 7-3 min period band, a value close to the ionospheric perturbation caused by the M = 6.7 Northridge earthquake. The small signal-to-noise ratio of the perturbation can be improved by slant-stacking the electron content time-series recorded by the different GPS receivers taking into account the horizontal propagation of the perturbation. The energy of the perturbation is concentrated in the 200 to 300 second period band, a result consistent with previous observations and numerical model predictions. The 300 second band probably corresponds to gravity modes and shorter periods to acoustic modes, respectively. Using a 1-D stratified velocity model of the atmosphere we show that linear acoustic ray tracing fits arrival times at all GPS receivers. We interpret the perturbation as a direct acoustic wave caused by the explosion itself. This study shows that even relatively small subsurface

  16. RESEARCH PAPERS : Ionospheric signature of surface mine blasts from Global Positioning System measurements

    NASA Astrophysics Data System (ADS)

    Calais, Eric; Bernard Minster, J.; Hofton, Michelle; Hedlin, Michael

    1998-01-01

    Sources such as atmospheric or buried explosions and shallow earthquakes are known to produce infrasonic pressure waves in the atmosphere. Because of the coupling between neutral particles and electrons at ionospheric altitudes, these acoustic and gravity waves induce variations of the ionospheric electron density. The Global Positioning System (GPS) provides a way of directly measuring the total electron content in the ionosphere and, therefore, of detecting such perturbations in the upper atmosphere. In July and August 1996, three large surface mine blasts (1.5 Kt each) were detonated at the Black Thunder coal mine in eastern Wyoming. As part of a seismic and acoustic monitoring experiment, we deployed five dual-frequency GPS receivers at distances ranging from 50 to 200 km from the mine and were able to detect the ionospheric perturbation caused by the blasts. The perturbation starts 10 to 15 min after the blast, lasts for about 30 min, and propagates with an apparent horizontal velocity of 1200 m s^-1. Its amplitude reaches 3 × 10^14 el m^-2 in the 7-3 min period band, a value close to the ionospheric perturbation caused by the M=6.7 Northridge earthquake (Calais & Minster 1995). The small signal-to-noise ratio of the perturbation can be improved by slant-stacking the electron content time-series recorded by the different GPS receivers taking into account the horizontal propagation of the perturbation. The energy of the perturbation is concentrated in the 200 to 300 s period band, a result consistent with previous observations and numerical model predictions. The 300 s band probably corresponds to gravity modes and shorter periods to acoustic modes, respectively. Using a 1-D stratified velocity model of the atmosphere we show that linear acoustic ray tracing fits arrival times at all GPS receivers. We interpret the perturbation as a direct acoustic wave caused by the explosion itself. This study shows that even relatively small subsurface events can produce

  17. An efficient method for mining cross-timepoint gene regulation sequential patterns from time course gene expression datasets.

    PubMed

    Cheng, Chun-Pei; Liu, Yu-Cheng; Tsai, Yi-Lin; Tseng, Vincent S

    2013-01-01

    Observation of gene expression changes implying gene regulations using a repetitive experiment in time course has become more and more important. However, no effective method exists for handling this kind of data. For instance, in a clinical/biological progression like inflammatory response or cancer formation, a great number of differentially expressed genes at different time points could be identified through a large-scale microarray approach. For each repetitive experiment with different samples, converting the microarray datasets into transactional databases with significant singleton genes at each time point would allow sequential patterns implying gene regulations to be identified. Although traditional sequential pattern mining methods have been successfully proposed and widely used in different interesting topics, like mining customer purchasing sequences from a transactional database, to our knowledge, the methods are not suitable for such biological datasets because every transaction in the converted database may contain too many items/genes. In this paper, we propose a new algorithm called CTGR-Span (Cross-Timepoint Gene Regulation Sequential pattern) to efficiently mine CTGR-SPs (Cross-Timepoint Gene Regulation Sequential Patterns) even on larger datasets where traditional algorithms are infeasible. The CTGR-Span includes several biologically designed parameters based on the characteristics of gene regulation. We perform an optimal parameter tuning process using a GO enrichment analysis to yield more biologically meaningful CTGR-SPs. The proposed method was evaluated with two publicly available human time course microarray datasets and it was shown that it outperformed the traditional methods in terms of execution efficiency. After evaluating with previous literature, the resulting patterns also strongly correlated with the experimental backgrounds of the datasets used in this study. We propose an efficient CTGR-Span to mine several biologically

  18. Association analysis of reactive oxygen species-hypertension genes discovered by literature mining.

    PubMed

    Lim, Ji Eun; Hong, Kyung-Won; Jin, Hyun-Seok; Oh, Bermseok

    2012-12-01

    Oxidative stress, which results in an excessive production of reactive oxygen species (ROS), is one of the fundamental mechanisms of the development of hypertension. In the vascular system, ROS have physical and pathophysiological roles in vascular remodeling and endothelial dysfunction. In this study, ROS-hypertension-related genes were collected by biological literature-mining tools, such as SciMiner and gene2pubmed, in order to identify the genes that would cause hypertension through ROS. Further, single nucleotide polymorphisms (SNPs) located within these gene regions were examined statistically for their association with hypertension in 6,419 Korean individuals, and pathway enrichment analysis using the associated genes was performed. The 2,945 SNPs of 237 ROS-hypertension genes were analyzed, and 68 genes were significantly associated with hypertension (p < 0.05). The most significant SNP was rs2889611 within MAPK8 (p = 2.70 × 10^-5; odds ratio, 0.82; confidence interval, 0.75 to 0.90). This study demonstrates that a text mining approach combined with association analysis may be useful to identify the candidate genes that cause hypertension through ROS or oxidative stress.
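
    For readers unfamiliar with the odds ratio and confidence interval reported above, the following sketch shows how such values are commonly computed from a 2 x 2 table of allele counts using the Wald method. It is an illustration under assumptions: the study may have used a different model (for example, regression with covariates), the function name is hypothetical, and the counts are invented.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a: cases carrying the allele      b: cases without it
    c: controls carrying the allele   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Invented counts, not taken from the study
or_value, ci_low, ci_high = odds_ratio_ci(a=10, b=20, c=20, d=10)
```

    An odds ratio below 1 with an interval that excludes 1, as in the reported 0.82 (0.75 to 0.90), indicates that the allele is associated with lower odds of the condition.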

  19. Novel strategies to mine alcoholism-related haplotypes and genes by combining existing knowledge framework.

    PubMed

    Zhang, RuiJie; Li, Xia; Jiang, YongShuai; Liu, GuiYou; Li, ChuanXing; Zhang, Fan; Xiao, Yun; Gong, BinSheng

    2009-02-01

    High-throughput single nucleotide polymorphism detection technology and the existing knowledge provide strong support for mining the disease-related haplotypes and genes. In this study, first, we apply four kinds of haplotype identification methods (Confidence Intervals, Four Gamete Tests, Solid Spine of LD and fusing method of haplotype block) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select the alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and gene databases to locate the blocks and identify the candidate genes. Finally, we annotate gene function using the KEGG, Biocarta, and GO databases. We find 159 haplotype blocks most likely related to alcoholism on chromosomes 1-22, including 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We obtain 121 alcoholism-related genes and verify their reliability by biological functional annotation. In short, by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework, we can both handle the SNP data easily and locate the disease-related genes precisely.

  20. Study of Lateral Gene Transfer in an Acid Mine Drainage Community Enabled by Comparative Genomics

    NASA Astrophysics Data System (ADS)

    Hugenholtz, P.; Croft, L.; Tyson, G. W.; Baker, B. J.; Detter, C.; Richardson, P. M.; Banfield, J. F.

    2002-12-01

    Lateral gene transfer (LGT) is thought to play a crucial role in the ecology and evolution of prokaryotes. We are investigating the role of LGT in an acid mine drainage community hosted in a pyrite-dominated metal sulfide deposit at the Richmond mine at Iron Mountain, CA. Due to biologically-mediated pyrite dissolution, the prevailing conditions within the mine are extremely low pH (< 1.0), very high ionic concentrations (molar concentrations of iron sulfate and mM concentrations of arsenic, copper and zinc), and moderate to high temperatures (30 to >50 C). These conditions are thought to largely isolate the community from potential external gene donors since naked DNA, phage and prokaryotes native to neutral pH habitats do not persist at pH <1.0 precluding an external influx of genes by transformation, transduction and conjugation, respectively. Microbial communities exist in several distinct habitats within Richmond mine including biofilms (subaqueous slime streamers and subaerial slimes) and cells attached directly to pyrite granules. This, however, belies an unusual simplicity in community composition. All communities investigated to date comprise only a handful of phylogenetically distinct organisms, typically dominated by the iron-oxidizing genera Leptospirillum and Ferroplasma. We have undertaken a community genomics analysis of a subaerial biofilm dominated by a Leptospirillum population to facilitate the study of LGT in this type of environment. The genome of Ferroplasma acidarmanus fer1, a minor component of the target community (but a major component of other Richmond mine communities), has been sequenced. Comparative genome analyses indicate that F. acidarmanus and the ancestor of two acidophilic Thermoplasma species belonging to the Euryarchaeota have traded many genes with phylogenetically remote acidophilic Sulfolobus species (Crenarchaeota). The putatively transferred sets of Sulfolobus genes in Ferroplasma and the Thermoplasma ancestor are distinct

  1. Mining the archives: a cross-platform analysis of gene ...

    EPA Pesticide Factsheets

    Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation of nucleic acids. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues using two DNA microarray protocols and two whole transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other three methods by having the highest correlations of differentially expressed genes (DEGs) and best overlap of pathways between FRO and FFPE groups. We next tested the effect of sample time in formalin (18 hours or 3 weeks) on gene expression profiles. Hierarchical clustering of the datasets indicated that test article treatment, and not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18-hour and 3-week FFPE samples compared to FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of age in FFPE block on genomic profiles. RNA-seq analysis of 8-, 19-, and 26-year-old control blocks using the ribo-depletion protocol resulted in comparable quality metrics, inc

  3. Soil metatranscriptomics for mining eukaryotic heavy metal resistance genes.

    PubMed

    Lehembre, Frédéric; Doillon, Didier; David, Elise; Perrotto, Sandrine; Baude, Jessica; Foulon, Julie; Harfouche, Lamia; Vallon, Laurent; Poulain, Julie; Da Silva, Corinne; Wincker, Patrick; Oger-Desfeux, Christine; Richaud, Pierre; Colpaert, Jan V; Chalot, Michel; Fraissinet-Tachet, Laurence; Blaudez, Damien; Marmeisse, Roland

    2013-10-01

    Heavy metals are pollutants which affect all organisms. Since a small number of eukaryotes have been investigated with respect to metal resistance, we hypothesize that many genes that control this phenomenon remain to be identified. This was tested by screening soil eukaryotic metatranscriptomes which encompass RNA from organisms belonging to the main eukaryotic phyla. Soil-extracted polyadenylated mRNAs were converted into cDNAs and 35 of them were selected for their ability to rescue the metal (Cd or Zn) sensitive phenotype of yeast mutants. Few of the genes belonged to families known to confer metal resistance when overexpressed in yeast. Several of them were homologous to genes that had not been studied in the context of metal resistance. For instance, the BOLA ones, which conferred cross metal (Zn, Co, Cd, Mn) resistance may act by interfering with Fe homeostasis. Other genes, such as those encoding 110- to 130-amino-acid-long, cysteine-rich polypeptides, had no homologues in databases. This study confirms that functional metatranscriptomics represents a powerful approach to address basic biological processes in eukaryotes. The selected genes can be used to probe new pathways involved in metal homeostasis and to manipulate the resistance level of selected organisms. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  4. Integrative mining of traditional Chinese medicine literature and MEDLINE for functional gene networks.

    PubMed

    Zhou, Xuezhong; Liu, Baoyan; Wu, Zhaohui; Feng, Yi

    2007-10-01

    The amount of biomedical data in different disciplines is growing at an exponential rate. Integrating these significant knowledge sources to generate novel hypotheses for systems biology research is difficult. Traditional Chinese medicine (TCM) is a completely different discipline, and is a complementary knowledge system to modern biomedical science. This paper uses a significant TCM bibliographic literature database in China, together with MEDLINE, to help discover novel gene functional knowledge. We present an integrative mining approach to uncover the functional gene relationships from MEDLINE and TCM bibliographic literature. This paper introduces TCM literature (about 50,000 records) as one knowledge source for constructing literature-based gene networks. We use the TCM diagnosis, TCM syndrome, to automatically congregate the related genes. The syndrome-gene relationships are discovered based on the syndrome-disease relationships extracted from TCM literature and the disease-gene relationships in MEDLINE. Based on the bubble-bootstrapping and relation weight computing methods, we have developed a prototype system called MeDisco/3S, which has named entity and relation extraction, and online analytical processing (OLAP) capabilities, to perform the integrative mining process. We obtained about 200,000 syndrome-gene relations, which could help generate syndrome-based gene networks, and help analyze the functional knowledge of genes from a syndrome perspective. We take the gene network of Kidney-Yang Deficiency syndrome (KYD syndrome) and the functional analysis of some genes, such as CRH (corticotropin releasing hormone), PTH (parathyroid hormone), PRL (prolactin), BRCA1 (breast cancer 1, early onset) and BRCA2 (breast cancer 2, early onset), to demonstrate the preliminary results. The underlying hypothesis is that the related genes of the same syndrome will have some biological functional relationships, and will constitute a functional network. This paper presents

  5. The Gene Expression Barcode 3.0: improved data processing and mining tools.

    PubMed

    McCall, Matthew N; Jaffee, Harris A; Zelisko, Susan J; Sinha, Neeraj; Hooiveld, Guido; Irizarry, Rafael A; Zilliox, Michael J

    2014-01-01

    The Gene Expression Barcode project, http://barcode.luhs.org, seeks to determine the genes expressed for every tissue and cell type in humans and mice. Understanding the absolute expression of genes across tissues and cell types has applications in basic cell biology, hypothesis generation for gene function and clinical predictions using gene expression signatures. In its current version, this project uses the abundant publicly available microarray data sets combined with a suite of single-array preprocessing, quality control and analysis methods. In this article, we present the improvements that have been made since the previous version of the Gene Expression Barcode in 2011. These include a variety of new data mining tools and summaries, estimated transcriptomes and curated annotations.

  6. Immune gene mining by pyrosequencing in the rockshell, Thais clavigera.

    PubMed

    Rhee, Jae-Sung; Kim, Bo-Mi; Jeong, Chang-Bum; Horiguchi, Toshihiro; Lee, Young-Mi; Kim, Il-Chan; Lee, Jae-Seong

    2012-05-01

    The rockshell, Thais clavigera (Gastropoda: Muricidae) has been shown to be a useful species as a potential indicator for diverse pollution in the marine environment. However, their genetic information is still not widely available. Here, we performed an extensive transcriptome analysis of T. clavigera using the pyrosequencing method, and selected innate immune-related genes. Among the unigenes obtained in this species, we annotated a number of immune system-related genes (e.g. adhesive protein, antimicrobial protein, apoptosis- and cell cycle-related protein, cellular defense effector, immune regulator, pattern recognition protein, protease, protease inhibitor, reduction/oxidation-related protein, signal transduction-related protein and stress protein), which are potentially useful for immunity research in this species. To confirm the usefulness of potential immune-biomarker genes, we checked the transcript level of specific immune genes in both different tissues and LPS-exposed rockshells within the T. clavigera transcript database. This study would be helpful to extend our knowledge on the immune system of rockshell in comparative aspects. Also it would be useful to develop the rockshell as a potential test organism for monitoring of marine environment quality. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. DGIdb 2.0: mining clinically relevant drug-gene interactions.

    PubMed

    Wagner, Alex H; Coffman, Adam C; Ainscough, Benjamin J; Spies, Nicholas C; Skidmore, Zachary L; Campbell, Katie M; Krysiak, Kilannin; Pan, Deng; McMichael, Joshua F; Eldred, James M; Walker, Jason R; Wilson, Richard K; Mardis, Elaine R; Griffith, Malachi; Griffith, Obi L

    2016-01-04

    The Drug-Gene Interaction Database (DGIdb, www.dgidb.org) is a web resource that consolidates disparate data sources describing drug-gene interactions and gene druggability. It provides an intuitive graphical user interface and a documented application programming interface (API) for querying these data. DGIdb was assembled through an extensive manual curation effort, reflecting the combined information of twenty-seven sources. For DGIdb 2.0, substantial updates have been made to increase content and improve its usefulness as a resource for mining clinically actionable drug targets. Specifically, nine new sources of drug-gene interactions have been added, including seven resources specifically focused on interactions linked to clinical trials. These additions have more than doubled the overall count of drug-gene interactions. The total number of druggable gene claims has also increased by 30%. Importantly, a majority of the unrestricted, publicly-accessible sources used in DGIdb are now automatically updated on a weekly basis, providing the most current information for these sources. Finally, a new web view and API have been developed to allow searching for interactions by drug identifiers to complement existing gene-based search functionality. With these updates, DGIdb represents a comprehensive and user friendly tool for mining the druggable genome for precision medicine hypothesis generation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

    PubMed

    Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

    2015-01-01

    Gathering information about associations between methylated genes and diseases is important for diseases diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is provided. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Global and gene specific DNA methylation changes during zebrafish development

    USDA-ARS's Scientific Manuscript database

    DNA methylation is dynamic through the life of an organism. In this study, we measured the global and gene specific DNA methylation changes in zebrafish at different developmental stages. We found that the methylation percentage of cytosines was 11.75 ± 0.96% in 3.3 hour post fertilization (hpf) zeb...

  10. A high-resolution network model for global gene regulation in Mycobacterium tuberculosis

    PubMed Central

    Peterson, Eliza J.R.; Reiss, David J.; Turkarslan, Serdar; Minch, Kyle J.; Rustad, Tige; Plaisier, Christopher L.; Longabaugh, William J.R.; Sherman, David R.; Baliga, Nitin S.

    2014-01-01

    The resilience of Mycobacterium tuberculosis (MTB) is largely due to its ability to effectively counteract and even take advantage of the hostile environments of a host. In order to accelerate the discovery and characterization of these adaptive mechanisms, we have mined a compendium of 2325 publicly available transcriptome profiles of MTB to decipher a predictive, systems-scale gene regulatory network model. The resulting modular organization of 98% of all MTB genes within this regulatory network was rigorously tested using two independently generated datasets: a genome-wide map of 7248 DNA-binding locations for 143 transcription factors (TFs) and global transcriptional consequences of overexpressing 206 TFs. This analysis has discovered specific TFs that mediate conditional co-regulation of genes within 240 modules across 14 distinct environmental contexts. In addition to recapitulating previously characterized regulons, we discovered 454 novel mechanisms for gene regulation during stress, cholesterol utilization and dormancy. Significantly, 183 of these mechanisms act uniquely under conditions experienced during the infection cycle to regulate diverse functions including 23 genes that are essential to host-pathogen interactions. These and other insights underscore the power of a rational, model-driven approach to unearth novel MTB biology that operates under some but not all phases of infection. PMID:25232098

  11. A high-resolution network model for global gene regulation in Mycobacterium tuberculosis.

    PubMed

    Peterson, Eliza J R; Reiss, David J; Turkarslan, Serdar; Minch, Kyle J; Rustad, Tige; Plaisier, Christopher L; Longabaugh, William J R; Sherman, David R; Baliga, Nitin S

    2014-10-01

    The resilience of Mycobacterium tuberculosis (MTB) is largely due to its ability to effectively counteract and even take advantage of the hostile environments of a host. In order to accelerate the discovery and characterization of these adaptive mechanisms, we have mined a compendium of 2325 publicly available transcriptome profiles of MTB to decipher a predictive, systems-scale gene regulatory network model. The resulting modular organization of 98% of all MTB genes within this regulatory network was rigorously tested using two independently generated datasets: a genome-wide map of 7248 DNA-binding locations for 143 transcription factors (TFs) and global transcriptional consequences of overexpressing 206 TFs. This analysis has discovered specific TFs that mediate conditional co-regulation of genes within 240 modules across 14 distinct environmental contexts. In addition to recapitulating previously characterized regulons, we discovered 454 novel mechanisms for gene regulation during stress, cholesterol utilization and dormancy. Significantly, 183 of these mechanisms act uniquely under conditions experienced during the infection cycle to regulate diverse functions including 23 genes that are essential to host-pathogen interactions. These and other insights underscore the power of a rational, model-driven approach to unearth novel MTB biology that operates under some but not all phases of infection. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. [Gene mining of sulfur-containing amino acid metabolic enzymes in soybean].

    PubMed

    Qiu, Hongmei; Hao, Wenyuan; Gao, Shuqin; Ma, Xiaoping; Zheng, Yuhong; Meng, Fanfan; Fan, Xuhong; Wang, Yang; Wang, Yueqiang; Wang, Shuming

    2014-09-01

    The genes of sulfur-containing amino acid synthetases in soybean are essential for the synthesis of sulfur-containing amino acids. Gene mining of these enzymes is the basis for molecular-assisted breeding of soybean with a high content of sulfur-containing amino acids. In this study, using the software BioMercator2.1, 113 genes of sulfur-containing amino acid enzymes and 33 QTLs controlling the content of sulfur-containing amino acids were mapped onto Consensus Map 4.0, which integrates the genetic and physical maps of soybean. Sixteen candidate genes associated with the synthesis of sulfur-containing amino acids were screened based on the synteny between gene loci and QTLs, and the effect values of QTLs. Through a bioinformatic analysis of the copy number, SNP information, and expression profile of candidate genes, 12 related enzyme genes were identified and mapped on 8 linkage groups, such as D1a, M, A2, K, and G. The genes corresponding to QTL regions can explain 6%–38.5% of the genetic variation of sulfur-containing amino acids, and among them, the indirect effect values of 9 genes were more than 10%. These 12 genes were involved in sulfur-containing amino acid metabolism and were highly expressed in the cotyledons and flowers, showing an abundance of SNPs. These genes can be used as candidate genes for the development of functional markers, and will lay a foundation for molecular design breeding in soybean.

  13. Literature Mining and Ontology based Analysis of Host-Brucella Gene–Gene Interaction Network

    PubMed Central

    Karadeniz, İlknur; Hur, Junguk; He, Yongqun; Özgür, Arzucan

    2015-01-01

    Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host–pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene–gene interactions from the abstracts of articles in PubMed. The gene–gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence based approaches, as well as sentence-level machine learning based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty four of these interactions were identified from sentence-level processing. Twenty two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types at a hierarchical ontology structure. Ontological modeling of specific gene–gene interactions demonstrates that host–pathogen gene–gene interactions occur at experimental conditions which can be ontologically
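
    The two co-occurrence levels mentioned in this record can be illustrated with a short sketch. It assumes gene names have already been recognized and are supplied as plain sets; the function name, the naive sentence splitter, and the placeholder gene names are hypothetical, and this is not the SciMiner pipeline or the machine learning method used by the authors.

```python
import re
from itertools import product


def cooccurrences(abstract, host_genes, pathogen_genes):
    """Return (sentence_level, abstract_only) sets of (host, pathogen) pairs.

    sentence_level: pairs mentioned together in at least one sentence.
    abstract_only: pairs that co-occur in the abstract but in no single sentence.
    """
    def mentioned(text, names):
        # Whole-word match so that a name is not found inside a longer token.
        return {n for n in names if re.search(r"\b" + re.escape(n) + r"\b", text)}

    # Naive sentence splitting on terminal punctuation (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())

    sentence_level = set()
    for sentence in sentences:
        hosts = mentioned(sentence, host_genes)
        pathogens = mentioned(sentence, pathogen_genes)
        sentence_level |= set(product(hosts, pathogens))

    abstract_level = set(
        product(mentioned(abstract, host_genes), mentioned(abstract, pathogen_genes))
    )
    return sentence_level, abstract_level - sentence_level


# Toy abstract with placeholder gene names.
text = (
    "HOSTA expression rose after infection. "
    "Deletion of PATHB reduced HOSTC levels. "
    "The role of PATHD is unclear."
)
sentence_pairs, abstract_only_pairs = cooccurrences(
    text, {"HOSTA", "HOSTC"}, {"PATHB", "PATHD"}
)
```

    Abstract-level matching recovers pairs that never share a sentence, at the cost of more false positives, which is consistent with the record's note that the extracted interactions were manually evaluated.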

  14. Bacteria and Genes Involved in Arsenic Speciation in Sediment Impacted by Long-Term Gold Mining

    PubMed Central

    Costa, Patrícia S.; Scholte, Larissa L. S.; Reis, Mariana P.; Chaves, Anderson V.; Oliveira, Pollyanna L.; Itabayana, Luiza B.; Suhadolnik, Maria Luiza S.; Barbosa, Francisco A. R.; Chartone-Souza, Edmar; Nascimento, Andréa M. A.

    2014-01-01

    The bacterial community and genes involved in geobiocycling of arsenic (As) from sediment impacted by long-term gold mining were characterized through culture-based analysis of As-transforming bacteria and metagenomic studies of the arsC, arrA, and aioA genes. Sediment was collected from the historically gold mining impacted Mina stream, located in one of the world’s largest mining regions known as the “Iron Quadrangle”. A total of 123 As-resistant bacteria were recovered from the enrichment cultures, which were phenotypically and genotypically characterized for As-transformation. A diverse As-resistant bacterial community was found through phylogenetic analyses of the 16S rRNA gene. Bacterial isolates were affiliated with Proteobacteria, Firmicutes, and Actinobacteria and were represented by 20 genera. Most were AsV-reducing (72%), whereas AsIII-oxidizing accounted for 20%. Bacteria harboring the arsC gene predominated (85%), followed by aioA (20%) and arrA (7%). Additionally, we identified two novel As-transforming genera, Thermomonas and Pannonibacter. Metagenomic analysis of arsC, aioA, and arrA sequences confirmed the presence of these genes, with arrA sequences being more closely related to uncultured organisms. Evolutionary analyses revealed high genetic similarity between some arsC and aioA sequences obtained from isolates and clone libraries, suggesting that those isolates may represent environmentally important bacteria acting in As speciation. In addition, our findings show that the diversity of arrA genes is wider than previously described, since no arrA OTUs were affiliated with known reference strains. Therefore, the molecular diversity of arrA genes is far from being fully explored and deserves further attention. PMID:24755825

  15. [Study of gene data mining based on informatics theory].

    PubMed

    Ang, Qing; Wang, Weidong; Wang, Guojing; Peng, Fulai

    2012-07-01

    Drawing on informatics theory, a system model consisting of feature selection based on redundancy and correlation is presented for disease classification research using five gene data sets (NCI, Lymphoma, Lung, Leukemia, Colon). The results indicate that this modeling method can not only reduce the computational cost of data management, but also help determine the number of features and further improve classification accuracy. The model has bright prospects in disease analysis and in the design of individualized treatment plans.
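
    Feature selection driven by correlation (relevance to the class label) and redundancy (overlap between features) can be sketched as a greedy filter. The abstract does not give the actual criterion, so everything below is an assumption for illustration: Pearson correlation as the measure, the 0.9 redundancy cutoff, the function names, and the toy expression values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def select_features(features, labels, max_redundancy=0.9):
    """Greedy filter: visit features from most to least label-correlated and
    keep one only if it is not too correlated with any feature already kept."""
    ranked = sorted(
        features, key=lambda name: abs(pearson(features[name], labels)), reverse=True
    )
    selected = []
    for name in ranked:
        if all(
            abs(pearson(features[name], features[kept])) <= max_redundancy
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy expression values for four samples (made up for illustration).
labels = [0, 0, 1, 1]
features = {
    "g1": [1, 2, 8, 9],   # highly relevant
    "g2": [1, 2, 8, 10],  # relevant, but nearly a copy of g1
    "g3": [5, 1, 4, 6],   # weakly relevant, not redundant with g1
}
chosen = select_features(features, labels)
```

    Here g2 is dropped because it carries almost the same signal as g1, which is how a redundancy criterion reduces the number of features without discarding independent information.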

  16. TreeDT: tree pattern mining for gene mapping.

    PubMed

    Sevon, Petteri; Toivonen, Hannu; Ollikainen, Vesa

    2006-01-01

    We describe TreeDT, a novel association-based gene mapping method. Given a set of disease-associated haplotypes and a set of control haplotypes, TreeDT predicts likely locations of a disease susceptibility gene. TreeDT extracts, essentially in the form of haplotype trees, information about historical recombinations in the population: A haplotype tree constructed at a given chromosomal location is an estimate of the genealogy of the haplotypes. TreeDT constructs these trees for all locations on the given haplotypes and performs a novel disequilibrium test on each tree: Is there a small set of subtrees with relatively high proportions of disease-associated chromosomes, suggesting shared genetic history for those and a likely disease gene location? We give a detailed description of TreeDT and the tree disequilibrium tests, we analyze the algorithm formally, and we evaluate its performance experimentally on both simulated and real data sets. Experimental results demonstrate that TreeDT has high accuracy on difficult mapping tasks and comparisons to other methods (EATDT, HPM, TDT) show that TreeDT is very competitive.

  17. Global gene expression of Listeria monocytogenes to salt stress.

    PubMed

    Bae, Dongryeoul; Liu, Connie; Zhang, Ting; Jones, Marcus; Peterson, Scott N; Wang, Chinling

    2012-05-01

    Outbreaks of listeriosis caused by the ingestion of Listeria-contaminated ready-to-eat foods have been reported worldwide. Many ready-to-eat foods, such as deli meat products, contain high amounts of salt, which can disrupt the maintenance of osmotic balance within bacterial cells. To understand how Listeria monocytogenes adapts to salt stress, we examined the growth and global gene expression profiles of L. monocytogenes strain F2365 under salt stress using oligonucleotide probe-based DNA array and quantitative real-time PCR (qRT-PCR) analyses. The growth of L. monocytogenes in brain heart infusion (BHI) medium with various concentrations of NaCl (2.5, 5, and 10%) was significantly inhibited (P < 0.01) when compared with growth in BHI with no NaCl supplementation. Microarray data indicated that growth in BHI medium with 1.2% NaCl upregulated 4 genes and down-regulated 24 genes in L. monocytogenes, which was confirmed by qRT-PCR. The transcript levels of genes involved in the uptake of glycine betaine/(L)-proline were increased, whereas genes associated with a putative phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), metabolic enzymes, and virulence factor were down-regulated. Specifically, the expression levels of PTS transport genes were shown to be dependent on NaCl concentration. To further examine whether the down-regulation of PTS genes is related to decreased cell growth, the transcript levels of genes encoding components of enzyme II, involved in the uptake of various sugars used as the primary carbon source in bacteria, were also measured using qRT-PCR. Our results suggest that the decreased transcript levels of PTS genes may be caused by salt stress or reduced cell growth through salt stress. Here, we report global transcriptional profiles of L. monocytogenes in response to salt stress, contributing to an improved understanding of osmotolerance in this bacterium.

  18. Discovery of Phytophthora infestans Genes Expressed in Planta through Mining of cDNA Libraries

    PubMed Central

    Chaves, Diego; Pinzón, Andrés; Grajales, Alejandro; Rojas, Alejandro; Mutis, Gabriel; Cárdenas, Martha; Burbano, Daniel; Jiménez, Pedro; Bernal, Adriana; Restrepo, Silvia

    2010-01-01

    Background Phytophthora infestans (Mont.) de Bary causes late blight of potato and tomato, and has a broad host range within the Solanaceae family. Most studies of the Phytophthora – Solanum pathosystem have focused on gene expression in the host and have not analyzed pathogen gene expression in planta. Methodology/Principal Findings We describe in detail an in silico approach to mine ESTs from inoculated host plants deposited in a database in order to identify particular pathogen sequences associated with disease. We identified candidate effector genes through mining of 22,795 ESTs corresponding to P. infestans cDNA libraries in compatible and incompatible interactions with hosts from the Solanaceae family. Conclusions/Significance We annotated genes of P. infestans expressed in planta associated with late blight using different approaches and assigned putative functions to 373 out of the 501 sequences found in the P. infestans genome draft, including putative secreted proteins, domains associated with pathogenicity and poorly characterized proteins ideal for further experimental studies. Our study provides a methodology for analyzing cDNA libraries and provides an understanding of the plant – oomycete pathosystems that is independent of the host, condition, or type of sample by identifying genes of the pathogen expressed in planta. PMID:20352100

  19. [Literature mining and bioinformatic analysis of dysregulated genes in hypertrophic scar].

    PubMed

    Huang, Chen; Li, Bo-Lun; Qin, Ze-Lian

    2011-11-01

    To explore the pathogenesis of hypertrophic scar (HS) and effective means for its clinical treatment, gene expression in HS and normal skin was compared. The differentially expressed genes between HS and normal skin were obtained by mining PubMed. The dysregulated genes in HS were analyzed by a series of bioinformatics methods, including protein-protein interaction networks, pathways, Gene Ontology and functional annotation clustering analysis. A total of 55 dysregulated genes in HS were identified (46 up-regulated genes and 9 down-regulated genes). Fifty-one genes were found to encode proteins in an interaction network, with the up-regulated genes TGFB1, FN1, JUN, COL1A1, CTGF, VEGFA, FOS, COL3A1, IGF1, IL4, PELO, SMAD2, TIMP1, PCNA, and ITGA4 and the down-regulated genes ITGB1 and DCN as the central nodes of this network. The dysregulated genes in HS were involved in a variety of biological pathways, such as focal adhesion formation, integrin signal transduction, and tumor formation. Furthermore, the dysregulated genes in HS played important roles in the biological processes of cell surface receptor linked signal transduction, tissue development, cell proliferation and apoptosis, and macromolecule biosynthetic process, as well as in the molecular functions of calcium ion binding, double-stranded DNA binding, heparin binding, promoter binding and MAP kinase activity. The results of functional annotation clustering analysis revealed that the dysregulated genes in HS were involved in epidermis development, angiogenesis, and apoptosis. Key genes such as TGFB1, FN1, and JUN, along with the pathways, biological processes and molecular functions involving epidermis development, angiogenesis, and extracellular matrix-integrin-focal adhesion signal transduction, may play important roles in the development of HS. Investigation of the dysregulated genes in HS could provide new targets for clinical treatment.

  20. Data mining of gene expression data by fuzzy and hybrid fuzzy methods.

    PubMed

    Schaefer, Gerald; Nakashima, Tomoharu

    2010-01-01

    Microarray studies and gene expression analysis have received tremendous attention over the last few years and provide many promising avenues toward the understanding of fundamental questions in biology and medicine. Data mining of these vast amounts of data is crucial in gaining this understanding. In this paper, we present a fuzzy rule-based classification system that allows for effective analysis of gene expression data. The applied classifier consists of a set of fuzzy if-then rules that enable accurate nonlinear classification of input patterns. We further present a hybrid fuzzy classification scheme in which a small number of fuzzy if-then rules are selected through means of a genetic algorithm, leading to a compact classifier for gene expression analysis. Extensive experimental results on various well-known gene expression datasets confirm the efficacy of our approaches.
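
    A classifier built from fuzzy if-then rules can be sketched in a few lines. The example assumes triangular membership functions on expression values normalized to [0, 1], product compatibility, and single-winner inference weighted by a rule certainty grade; these choices, the rule set, and all names are hypothetical and are not taken from the record, and the genetic algorithm used for rule selection is not shown.

```python
def triangular(x, left, peak, right):
    """Membership of x in a triangular fuzzy set with the given corners."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Fuzzy sets over normalized expression values in [0, 1] (assumed partition).
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "high": (0.5, 1.0, 1.5),
}

# Each rule: (fuzzy set per gene, class label, certainty grade). Made-up rules.
RULES = [
    (("high", "low"), "class A", 0.9),
    (("low", "high"), "class B", 0.8),
]


def classify(sample, rules=RULES):
    """Single-winner inference: return the class of the rule with the largest
    compatibility * certainty, or None if no rule fires."""
    best_label, best_score = None, 0.0
    for antecedents, label, certainty in rules:
        compatibility = 1.0
        for value, set_name in zip(sample, antecedents):
            compatibility *= triangular(value, *FUZZY_SETS[set_name])
        score = compatibility * certainty
        if score > best_score:
            best_label, best_score = label, score
    return best_label


first = classify([0.9, 0.2])   # gene 1 high, gene 2 low
second = classify([0.1, 0.7])  # gene 1 low, gene 2 high
third = classify([0.5, 0.5])   # on the boundary: no rule fires
```

    A sample that no rule covers is left unclassified here; a real system would need a policy for such cases, for example a default class.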

  1. Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks.

    PubMed

    Cao, Renzhi; Cheng, Jianlin

    2016-01-15

    Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein-protein interactions has been used mostly separately for protein function prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information such as spatial gene-gene interaction networks generated from chromosomal conformation data together to improve protein function prediction. In this work, we developed three different probabilistic scores (MIS, SEQ, and NET score) to combine protein sequence, function associations, and protein-protein interaction and spatial gene-gene interaction networks for protein function prediction. The MIS score is mainly generated from homologous proteins found by PSI-BLAST search, and also association rules between Gene Ontology terms, which are learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein-protein interaction and spatial gene-gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of 2011 Critical Assessment of Function Annotation (CAFA). The method performed substantially better than three base-line methods and an advanced method based on protein profile-sequence comparison, profile-profile comparison, and domain co-occurrence networks according to the maximum F-measure. Copyright © 2015 Elsevier Inc. All rights reserved.
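
    The maximum F-measure used as the evaluation criterion in this record can be sketched as follows. The sketch assumes the protein-centric definition commonly used in CAFA: at each score threshold, precision is averaged over proteins with at least one prediction and recall over all benchmark proteins, and the best F over thresholds is reported. Function and variable names and the toy scores are hypothetical, and the term identifiers are placeholders.

```python
def f_max(predictions, truth, thresholds):
    """Maximum over thresholds of F = 2PR / (P + R).

    predictions: {protein: {term: score}}; truth: {protein: set of true terms}.
    """
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, score in scored.items() if score >= t}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with a prediction
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy benchmark with placeholder term names.
truth = {"P1": {"term_a", "term_b"}, "P2": {"term_c"}}
predictions = {
    "P1": {"term_a": 0.9, "term_b": 0.4, "term_x": 0.6},
    "P2": {"term_c": 0.7},
}
score = f_max(predictions, truth, thresholds=[0.3, 0.5, 0.8])
```

    In this toy case the lowest threshold gives the best trade-off (precision 5/6, recall 1), so the maximum F-measure is 10/11.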

  2. Data Mining for Global Change: A Vision for "Big Data" in the Earth Sciences

    NASA Astrophysics Data System (ADS)

    Steinhaeuser, K.

    2012-12-01

    Over the past several decades, the Earth sciences have undergone a rapid transformation from a historically data-poor to a relatively data-rich environment. This development is largely due to significant improvements in observation technologies (notably satellites since the 1970s) on one hand, and advances in computational tools (both hardware and software) on the other. As a result the Earth sciences are primed to enter the Fourth Paradigm, a term coined by the late Jim Gray to describe a new realm of scientific discovery driven by data analysis - the other three being theory, experimentation, and computer simulation. In particular, observations from remote sensors on satellites and weather radars, in situ sensors and sensor networks, along with outputs of global climate or Earth system models from large-scale simulations as well as regional modeling studies, produce data approaching the Tera- and Petabyte scales. These massive and information-rich datasets offer a significant opportunity for advancing our understanding of the global climate system and in turn our ability to make better informed projections of future climate change, yet current data analysis techniques are not able to realize their full potential. We will outline a vision for the application of "Big Data" tools and technologies in the Earth sciences, which have the potential to make a transformative impact on the toolbox available to the scientist as well as the way science is conducted. For instance, data mining and machine learning could provide novel computational tools that empower scientists to perform analyses more efficiently and effectively than ever before: tedious routine tasks become automated, existing methods scale to significantly larger datasets, and innovative methods may provide new capabilities altogether. Most notably we are not interested in leveraging computation for simulations of increasing scale or resolution but rather in the analysis of datasets of increasing size and

  3. Mining the transcriptomes of four commercially important shellfish species for single nucleotide polymorphisms within biomineralization genes.

    PubMed

    Vendrami, David L J; Shah, Abhijeet; Telesca, Luca; Hoffman, Joseph I

    2016-06-01

    Transcriptional profiling not only provides insights into patterns of gene expression, but also generates sequences that can be mined for molecular markers, which in turn can be used for population genetic studies. As part of a large-scale effort to better understand how commercially important European shellfish species may respond to ocean acidification, we therefore mined the transcriptomes of four species (the Pacific oyster Crassostrea gigas, the blue mussel Mytilus edulis, the great scallop Pecten maximus and the blunt gaper Mya truncata) for single nucleotide polymorphisms (SNPs). Illumina data for C. gigas, M. edulis and P. maximus and 454 data for M. truncata were interrogated using GATK and SWAP454 respectively to identify between 8267 and 47,159 high quality SNPs per species (total=121,053 SNPs residing within 34,716 different contigs). We then annotated the transcripts containing SNPs to reveal homology to diverse genes. Finally, as oceanic pH affects the ability of organisms to incorporate calcium carbonate, we honed in on genes implicated in the biomineralization process to identify a total of 1899 SNPs in 157 genes. These provide good candidates for biomarkers with which to study patterns of selection in natural or experimental populations. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Global Effects of Catecholamines on Actinobacillus pleuropneumoniae Gene Expression

    PubMed Central

    Li, Lu; Xu, Zhuofei; Zhou, Yang; Sun, Lili; Liu, Ziduo; Chen, Huanchun; Zhou, Rui

    2012-01-01

    Bacteria can use mammalian hormones to modulate pathogenic processes that play essential roles in disease development. Actinobacillus pleuropneumoniae is an important porcine respiratory pathogen causing great economic losses in the pig industry globally. Stress is known to contribute to the outcome of A. pleuropneumoniae infection. To test whether A. pleuropneumoniae could respond to stress hormone catecholamines, gene expression profiles after epinephrine (Epi) and norepinephrine (NE) treatment were compared with those from untreated bacteria. The microarray results showed that 158 and 105 genes were differentially expressed in the presence of Epi and NE, respectively. These genes were assigned to various functional categories including many virulence factors. Only 18 genes were regulated by both hormones. These genes included apxIA (the ApxI toxin structural gene), pgaB (involved in biofilm formation), APL_0443 (an autotransporter adhesin) and genes encoding potential hormone receptors such as tyrP2, the ygiY-ygiX (qseC-qseB) operon and narQ-narP (involved in nitrate metabolism). Further investigations demonstrated that cytotoxic activity was enhanced by Epi but repressed by NE in accordance with apxIA gene expression changes. Biofilm formation was not affected by either of the two hormones despite pgaB expression being affected. Adhesion to host cells was induced by NE but not by Epi, suggesting that the hormones affect other putative adhesins in addition to APL_0443. This study revealed that A. pleuropneumoniae gene expression, including those encoding virulence factors, was altered in response to both catecholamines. The differential regulation of A. pleuropneumoniae gene expression by the two hormones suggests that this pathogen may have multiple responsive systems for the two catecholamines. PMID:22347439

  5. Mining differential top-k co-expression patterns from time course comparative gene expression datasets

    PubMed Central

    2013-01-01

    Background: Frequent pattern mining analysis applied on microarray dataset appears to be a promising strategy for identifying relationships between gene expression levels. Unfortunately, too many itemsets (co-expressed genes) are identified by this analysis method since it does not consider the importance of each gene within biological processes to a cellular response and does not take into account temporal properties under biological treatment-control matched conditions in a microarray dataset. Results: We propose a method termed TIIM (Top-k Impactful Itemsets Miner), which only requires specifying a user-defined number k to explore the top k itemsets with the most significantly differentially co-expressed genes between 2 conditions in a time course. To give genes different weights, a table with impact degrees for each gene was constructed based on the number of neighboring genes that are differently expressed in the dataset within gene regulatory networks. Finally, the resulting top-k impactful itemsets were manually evaluated using previous literature and analyzed by a Gene Ontology enrichment method. Conclusions: In this study, the proposed method was evaluated in 2 publicly available time course microarray datasets with 2 different experimental conditions. Both datasets identified potential itemsets with co-expressed genes evaluated from the literature and showed higher accuracies compared to the 2 corresponding control methods: i) performing TIIM without considering the gene expression differentiation between 2 different experimental conditions and impact degrees, and ii) performing TIIM with a constant impact degree for each gene. Our proposed method found that several new gene regulations involved in these itemsets were useful for biologists and provided further insights into the mechanisms underpinning biological processes. The Java source code and other related materials used in this study are available at

  6. MARQ: an online tool to mine GEO for experiments with similar or opposite gene expression signatures.

    PubMed

    Vazquez, Miguel; Nogales-Cadenas, Ruben; Arroyo, Javier; Botías, Pedro; García, Raul; Carazo, Jose M; Tirado, Francisco; Pascual-Montano, Alberto; Carmona-Saez, Pedro

    2010-07-01

    The enormous amount of data available in public gene expression repositories such as Gene Expression Omnibus (GEO) offers an inestimable resource to explore gene expression programs across several organisms and conditions. This information can be used to discover experiments that induce similar or opposite gene expression patterns to a given query, which in turn may lead to the discovery of new relationships among diseases, drugs or pathways, as well as the generation of new hypotheses. In this work, we present MARQ, a web-based application that allows researchers to compare a query set of genes, e.g. a set of over- and under-expressed genes, against a signature database built from GEO datasets for different organisms and platforms. MARQ offers an easy-to-use and integrated environment to mine GEO, in order to identify conditions that induce similar or opposite gene expression patterns to a given experimental condition. MARQ also includes additional functionalities for the exploration of the results, including a meta-analysis pipeline to find genes that are differentially expressed across different experiments. The application is freely available at http://marq.dacya.ucm.es.
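
    The comparison described above, a query of over- and under-expressed genes matched against stored signatures, can be illustrated with a minimal sketch. This is not MARQ's actual scoring method: the score (mean signature value of the "up" genes minus that of the "down" genes), the function names, and the toy database are all assumptions made for illustration.

```python
from statistics import mean


def signature_score(up_genes, down_genes, signature):
    """Score how well a query matches one signature.

    `signature` maps gene -> differential expression value (e.g. a log ratio).
    A positive result suggests a similar pattern, a negative one an opposite
    pattern. This scoring rule is an illustrative assumption.
    """
    up = [signature[g] for g in up_genes if g in signature]
    down = [signature[g] for g in down_genes if g in signature]
    if not up or not down:
        return 0.0  # not enough overlap to compare
    return mean(up) - mean(down)


def rank_signatures(up_genes, down_genes, database):
    """Return (name, score) pairs, most similar first, most opposite last."""
    scores = {
        name: signature_score(up_genes, down_genes, sig)
        for name, sig in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical signature database; gene names and values are invented.
database = {
    "experiment_A": {"g1": 2.0, "g2": 1.5, "g3": -1.0, "g4": -2.0},
    "experiment_B": {"g1": -1.8, "g2": -1.2, "g3": 1.1, "g4": 2.2},
    "experiment_C": {"g1": 0.1, "g2": -0.1, "g3": 0.0, "g4": 0.1},
}
ranked = rank_signatures({"g1", "g2"}, {"g3", "g4"}, database)
```

    In this toy case the top of the ranking is the experiment with the most similar pattern and the bottom is the one with the most opposite pattern, which mirrors the "similar or opposite" search the abstract describes.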

  7. Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining.

    PubMed

    Hur, Junguk; Ozgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2012-12-20

    multiple TLRs were found in the generic fever network, it is reasonable to hypothesize that vaccine-TLR interactions may play an important role in inducing fever response, which deserves a further investigation. This study demonstrated that ontology-based literature mining is a powerful method for analyzing gene interaction networks and generating new scientific hypotheses.

  8. Global Patterns of Diversity and Selection in Human Tyrosinase Gene

    PubMed Central

    Hudjashov, Georgi; Villems, Richard; Kivisild, Toomas

    2013-01-01

    Global variation in skin pigmentation is one of the most striking examples of environmental adaptation in humans. More than two hundred loci have been identified as candidate genes in model organisms and a few tens of these have been found to be significantly associated with human skin pigmentation in genome-wide association studies. However, the evolutionary history of different pigmentation genes is rather complex: some loci have been subjected to strong positive selection, while others evolved under the relaxation of functional constraints in low UV environment. Here we report the results of a global study of the human tyrosinase gene, which is one of the key enzymes in melanin production, to assess the role of its variation in the evolution of skin pigmentation differences among human populations. We observe a higher rate of non-synonymous polymorphisms in the European sample consistent with the relaxation of selective constraints. A similar pattern was previously observed in the MC1R gene and concurs with UV radiation-driven model of skin color evolution by which mutations leading to lower melanin levels and decreased photoprotection are subject to purifying selection at low latitudes while being tolerated or even favored at higher latitudes because they facilitate UV-dependent vitamin D production. Our coalescent date estimates suggest that the non-synonymous variants, which are frequent in Europe and North Africa, are recent and have emerged after the separation of East and West Eurasian populations. PMID:24040225

  9. Mining royalties: a global study of their impact on investors, government and civil society

    SciTech Connect

    Otto, James

    2006-08-15

    The book discusses the history of royalties and the types currently in use, covering issues such as tax administration, revenue distribution and reporting. It identifies the strengths and weaknesses of various royalty approaches and their impact on production decisions and mine economics. A section on governance looks at the management of mining revenue by governments and the need for transparency. There is an attached CD with 4 appendixes with examples of royalty legislation from over 40 countries. 10 figs., 40 tabs., 4 apps.

  10. Mining expressed sequence tag (EST) libraries for cancer-associated genes.

    PubMed

    Schmitt, Armin O

    2010-01-01

    Originally established in the beginning of the 1990s as a direct route to gene finding, expressed sequence tags (ESTs) still lend themselves as a means to analyze gene expression in almost all human tissues. The type of questions that can be addressed using public EST libraries ranges from tissue-specific gene profiling to the comparison between tissues in diseased and healthy states. Thanks to a multitude of web-based online bioinformatics resources, mining in EST libraries is not restricted to experts in the field of data analysis, but can readily be performed by the medical or life scientist. In this chapter, a couple of case studies are presented that guide the scientist to the most useful online resources so that they can conduct their own research.

  11. Mining maximal cohesive induced subnetworks and patterns by integrating biological networks with gene profile data.

    PubMed

    Alroobi, Rami; Ahmed, Syed; Salem, Saeed

    2013-09-01

    With the availability of vast amounts of protein-protein, protein-DNA interactions, and genome-wide mRNA expression data for several organisms, identifying biological complexes has emerged as a major task in systems biology. Most of the existing approaches for complex identification have focused on utilizing one source of data. Recent research has shown that systematic integration of gene profile data with interaction data yields significant patterns. In this paper, we introduce the problem of mining maximal cohesive subnetworks that satisfy user-defined constraints defined over the gene profiles of the reported subnetworks. Moreover, we introduce the problem of finding maximal cohesive patterns which are sets of cohesive genes. Experiments on Yeast and Human datasets show the effectiveness of the proposed approach by assessing the overlap of the discovered subnetworks with known biological complexes. Moreover, GO enrichment analysis shows that the discovered subnetworks are biologically significant.

  12. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction

    PubMed Central

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining. PMID:26751200

  13. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.

    PubMed

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.

  14. Drug repurposing by integrated literature mining and drug-gene-disease triangulation.

    PubMed

    Sun, Peng; Guo, Jiong; Winnenburg, Rainer; Baumbach, Jan

    2016-10-22

    Drug design is expensive, time-consuming and becoming increasingly complicated. Computational approaches for inferring potentially new purposes of existing drugs, referred to as drug repositioning, play an increasingly important part in current pharmaceutical studies. Here, we first summarize recent developments in computational drug repositioning and introduce the utilized data sources. Afterwards, we introduce a new data fusion model based on n-cluster editing as a novel multi-source triangulation strategy, which was further combined with semantic literature mining. Our evaluation suggests that utilizing drug-gene-disease triangulation coupled to sophisticated text analysis is a robust approach for identifying new drug candidates for repurposing.

  15. A genome-wide association study of global gene expression.

    PubMed

    Dixon, Anna L; Liang, Liming; Moffatt, Miriam F; Chen, Wei; Heath, Simon; Wong, Kenny C C; Taylor, Jenny; Burnett, Edward; Gut, Ivo; Farrall, Martin; Lathrop, G Mark; Abecasis, Gonçalo R; Cookson, William O C

    2007-10-01

    We have created a global map of the effects of polymorphism on gene expression in 400 children from families recruited through a proband with asthma. We genotyped 408,273 SNPs and identified expression quantitative trait loci from measurements of 54,675 transcripts representing 20,599 genes in Epstein-Barr virus-transformed lymphoblastoid cell lines. We found that 15,084 transcripts (28%) representing 6,660 genes had narrow-sense heritabilities (H²) > 0.3. We executed genome-wide association scans for these traits and found peak lod scores between 3.68 and 59.1. The most highly heritable traits were markedly enriched in Gene Ontology descriptors for response to unfolded protein (chaperonins and heat shock proteins), regulation of progression through the cell cycle, RNA processing, DNA repair, immune responses and apoptosis. SNPs that regulate expression of these genes are candidates in the study of degenerative diseases, malignancy, infection and inflammation. We have created a downloadable database to facilitate use of our findings in the mapping of complex disease loci.

  16. Global regulation of gene expression in Escherichia coli.

    PubMed Central

    Chuang, S E; Daniels, D L; Blattner, F R

    1993-01-01

    Global transcription responses of Escherichia coli to various stimuli or genetic defects were studied by measuring mRNA levels in about 400 segments of the genome. Measuring mRNA levels was done by analyzing hybridization to DNA dot blots made with overlapping lambda clones spanning the genome of E. coli K-12. Conditions examined included isopropyl-beta-D-thiogalactopyranoside (IPTG) induction, heat shock, osmotic shock, starvation for various nutrients, entrance of cells into the stationary phase of growth, anaerobic growth in a tube, growth in the gnotobiotic mouse gut, and effects of pleiotropic mutations rpoH, himA, topA, and crp. Most mapped genes known to be regulated by a particular situation were successfully detected. In addition, many chromosomal regions containing no previously known regulated genes were discovered that responded to various stimuli. This new method for studying globally regulated genetic systems in E. coli combines detection, cloning, and physical mapping of a battery of coregulated genes in one step. PMID:8458845

  17. The future of Yellowcake: a global assessment of uranium resources and mining.

    PubMed

    Mudd, Gavin M

    2014-02-15

    Uranium (U) mining remains controversial in many parts of the world, especially in a post-Fukushima context, and often in areas with significant U resources. Although nuclear proponents point to the relatively low carbon intensity of nuclear power compared to fossil fuels, opponents argue that this will be eroded in the future as ore grades decline and energy and greenhouse gas emissions (GGEs) intensity increases as a result. Invariably both sides fail to make use of the increasingly available data reported by some U mines through sustainability reporting - allowing a comprehensive assessment of recent trends in the energy and GGE intensity of U production, as well as combining this with reported mineral resources to allow more comprehensive modelling of future energy and GGEs intensity. In this study, detailed data sets are compiled on reported U resources by deposit type, as well as mine production, energy and GGE intensity. Some important aspects included are the relationship between ore grade, deposit type and recovery, which are crucial in future projections of U mining. Overall, the paper demonstrates that there are extensive U resources known to meet potential short to medium term demand, although the future of U mining remains uncertain due to the doubt about the future of nuclear power as well as a range of complex social, environmental, economic and some site-specific technical issues.

  18. Mining rare and ubiquitous toxin genes from a large collection of Bacillus thuringiensis strains.

    PubMed

    Li, Ying; Shu, Changlong; Zhang, Xuewen; Crickmore, Neil; Liang, Gemei; Jiang, Xingfu; Liu, Rongmei; Song, Fuping; Zhang, Jie

    2014-10-01

    There has been considerable effort made in recent years for research groups and other organizations to build up large collections of strains of Bacillus thuringiensis in the search for genes encoding novel insecticidal toxins, or encoding novel metabolic pathways. Whilst next generation sequencing allows the detailed genetic characterization of a bacterial strain with relative ease it is still not practicable for large strain collections. In this work we assess the practicability of mining a mixture of genomic DNA from a two thousand strain collection for particular genes. Using PCR the collection was screened for both a rare (cry15) toxin gene as well as a more commonly found gene (vip3A). The method was successful in identifying both a cry15 gene and multiple examples of the vip3A gene family including a novel member of this family (vip3Aj). A number of variants of vip3Ag were cloned and expressed, and differences in toxicity observed despite extremely high sequence similarity.

  19. Association mining of mutated cancer genes in different clinical stages across 11 cancer types

    PubMed Central

    Wang, Tingzhang; Zheng, Shu

    2016-01-01

    Many studies have demonstrated that some genes (e.g. APC, BRAF, KRAS, PTEN, TP53) are frequently mutated in cancer; however, the underlying mechanism that contributes to their high mutation frequency remains unclear. Here we used the Apriori algorithm to find the frequent mutational gene sets (FMGSs) from 4,904 tumors across 11 cancer types as part of the TCGA Pan-Cancer effort and then mined the hidden association rules (ARs) within these FMGSs. Intriguingly, we found that well-known cancer driver genes such as BRAF, KRAS, PTEN, and TP53 often co-occurred with other driver genes and that FMGS size peaked at an itemset size of 3 to 4 genes. Besides, the number and constitution of FMGSs and ARs differed greatly among different cancers and stages. In addition, FMGSs and ARs were rare in endocrine-related cancers such as breast carcinoma, ovarian cystadenocarcinoma, and thyroid carcinoma, but abundant in cancers in direct contact with external environments such as skin melanoma and stomach adenocarcinoma. Furthermore, we observed more rules in stage IV than in other stages, indicating that distant metastasis needed a more sophisticated gene regulatory network. PMID:27556693

  20. Association mining of mutated cancer genes in different clinical stages across 11 cancer types.

    PubMed

    Hu, Wangxiong; Li, Xiaofen; Wang, Tingzhang; Zheng, Shu

    2016-10-18

    Many studies have demonstrated that some genes (e.g. APC, BRAF, KRAS, PTEN, TP53) are frequently mutated in cancer; however, the underlying mechanism that contributes to their high mutation frequency remains unclear. Here we used the Apriori algorithm to find the frequent mutational gene sets (FMGSs) from 4,904 tumors across 11 cancer types as part of the TCGA Pan-Cancer effort and then mined the hidden association rules (ARs) within these FMGSs. Intriguingly, we found that well-known cancer driver genes such as BRAF, KRAS, PTEN, and TP53 often co-occurred with other driver genes and that FMGS size peaked at an itemset size of 3 to 4 genes. Besides, the number and constitution of FMGSs and ARs differed greatly among different cancers and stages. In addition, FMGSs and ARs were rare in endocrine-related cancers such as breast carcinoma, ovarian cystadenocarcinoma, and thyroid carcinoma, but abundant in cancers in direct contact with external environments such as skin melanoma and stomach adenocarcinoma. Furthermore, we observed more rules in stage IV than in other stages, indicating that distant metastasis needed a more sophisticated gene regulatory network.
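
    As a rough illustration of the Apriori step named in this abstract, the sketch below mines frequent itemsets from a handful of toy "tumors", each represented as a set of mutated genes, and computes the confidence of one association rule. The toy mutation sets, the support threshold, and the function names are assumptions for illustration only; they are not the authors' data, parameters, or code.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical mutation profiles, one set of mutated genes per tumor.
tumors = [
    {"TP53", "KRAS", "APC"},
    {"TP53", "KRAS"},
    {"TP53", "APC"},
    {"BRAF", "PTEN"},
    {"TP53", "KRAS", "APC"},
]
frequent = apriori(tumors, min_support=2)
rule_conf = confidence(frequent, {"TP53"}, {"KRAS"})
```

    The support threshold is given here as an absolute count; a relative threshold (a fraction of all tumors) would work the same way after multiplying by the number of transactions.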

  1. Stress-Survival Gene Identification From an Acid Mine Drainage Algal Mat Community

    NASA Astrophysics Data System (ADS)

    Urbina-Navarrete, J.; Fujishima, K.; Paulino-Lima, I. G.; Rothschild-Mancinelli, B.; Rothschild, L. J.

    2014-12-01

    Microbial communities from acid mine drainage environments are exposed to multiple stressors, including low pH, high dissolved metal loads, seasonal freezing, and desiccation. The microbial and algal communities that inhabit these niche environments have evolved strategies that allow for their ecological success. Metagenomic analyses are useful in identifying species diversity; however, they do not elucidate the mechanisms that allow for the resilience of a community under these extreme conditions. Many known or predicted genes encode for protein products that are unknown, or similarly, many proteins cannot be traced to their gene of origin. This investigation seeks to identify genes that are active in an algal consortium during stress from living in an acid mine drainage environment. Our approach involves using the entire community transcriptome for a functional screen in an Escherichia coli host. This approach directly targets the genes involved in survival, without need for characterizing the members of the consortium. The consortium was harvested and stressed with conditions similar to the native environment it was collected from. Exposure to low pH (< 3.2), high metal load, desiccation, and deep freeze resulted in the expression of stress-induced genes that were transcribed into messenger RNA (mRNA). These mRNA transcripts were harvested to build complementary DNA (cDNA) libraries in E. coli. The transformed E. coli were exposed to the same stressors as the original algal consortium to select for surviving cells. Successful cells incorporated the transcripts that encode survival mechanisms, thus allowing for selection and identification of the gene(s) involved. Initial selection screens for freeze and desiccation tolerance have yielded E. coli that are 1 order of magnitude more resistant to freezing (0.01% survival of control with no transcript, 0.2% survival of E. coli with transcript) and 3 orders of magnitude more resistant to desiccation (0.005% survival of

  2. Bioactivity-guided genome mining reveals the lomaiviticin biosynthetic gene cluster in Salinispora tropica.

    PubMed

    Kersten, Roland D; Lane, Amy L; Nett, Markus; Richter, Taylor K S; Duggan, Brendan M; Dorrestein, Pieter C; Moore, Bradley S

    2013-05-27

    The use of genome sequences has become routine in guiding the discovery and identification of microbial natural products and their biosynthetic pathways. In silico prediction of molecular features, such as metabolic building blocks, physico-chemical properties or biological functions, from orphan gene clusters has opened up the characterization of many new chemo- and genotypes in genome mining approaches. Here, we guided our genome mining of two predicted enediyne pathways in Salinispora tropica CNB-440 by a DNA interference bioassay to isolate DNA-targeting enediyne polyketides. An organic extract of S. tropica showed DNA-interference activity that surprisingly was not abolished in genetic mutants of the targeted enediyne pathways, ST_pks1 and spo. Instead we showed that the product of the orphan type II polyketide synthase pathway, ST_pks2, is solely responsible for the DNA-interfering activity of the parent strain. Subsequent comparative metabolic profiling revealed the lomaiviticins, glycosylated diazofluorene polyketides, as the ST_pks2 products. This study marks the first report of the 59 open reading frame lomaiviticin gene cluster (lom) and supports the biochemical logic of their dimeric construction through a pathway related to the kinamycin monomer. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Gene Network Reconstruction using Global-Local Shrinkage Priors

    PubMed Central

    Leday, Gwenaël G.R.; de Gunst, Mathisca C.M.; Kpogbezan, Gino B.; van der Vaart, Aad W.; van Wieringen, Wessel N.; van de Wiel, Mark A.

    2016-01-01

    Reconstructing a gene network from high-throughput molecular data is an important but challenging task, as the number of parameters to estimate easily is much larger than the sample size. A conventional remedy is to regularize or penalize the model likelihood. In network models, this is often done locally in the neighbourhood of each node or gene. However, estimation of the many regularization parameters is often difficult and can result in large statistical uncertainties. In this paper we propose to combine local regularization with global shrinkage of the regularization parameters to borrow strength between genes and improve inference. We employ a simple Bayesian model with non-sparse, conjugate priors to facilitate the use of fast variational approximations to posteriors. We discuss empirical Bayes estimation of hyper-parameters of the priors, and propose a novel approach to rank-based posterior thresholding. Using extensive model- and data-based simulations, we demonstrate that the proposed inference strategy outperforms popular (sparse) methods, yields more stable edges, and is more reproducible. The proposed method, termed ShrinkNet, is then applied to Glioblastoma to investigate the interactions between genes associated with patient survival. PMID:28408966

  4. Regulation of global gene expression and cell proliferation by APP

    PubMed Central

    Wu, Yili; Zhang, Si; Xu, Qin; Zou, Haiyan; Zhou, Weihui; Cai, Fang; Li, Tingyu; Song, Weihong

    2016-01-01

    Down syndrome (DS), caused by trisomy of chromosome 21, is one of the most common genetic disorders. Patients with DS display growth retardation and inevitably develop characteristic Alzheimer’s disease (AD) neuropathology, including neurofibrillary tangles and neuritic plaques. The expression of amyloid precursor protein (APP) is increased in both DS and AD patients. To reveal the function of APP and elucidate the pathogenic role of increased APP expression in DS and AD, we performed gene expression profiling using microarray method in human cells overexpressing APP. A set of genes are significantly altered, which are involved in cell cycle, cell proliferation and p53 signaling. We found that overexpression of APP inhibits cell proliferation. Furthermore, we confirmed that the downregulation of two validated genes, PSMA5 and PSMB7, inhibits cell proliferation, suggesting that the downregulation of PSMA5 and PSMB7 is involved in APP-induced cell proliferation impairment. Taken together, this study suggests that APP regulates global gene expression and increased APP expression inhibits cell proliferation. Our study provides a novel insight that APP overexpression may contribute to the growth impairment in DS patients and promote AD pathogenesis by inhibiting cell proliferation including neural stem cell proliferation and neurogenesis. PMID:26936520

  5. Regulation of global gene expression and cell proliferation by APP.

    PubMed

    Wu, Yili; Zhang, Si; Xu, Qin; Zou, Haiyan; Zhou, Weihui; Cai, Fang; Li, Tingyu; Song, Weihong

    2016-03-03

    Down syndrome (DS), caused by trisomy of chromosome 21, is one of the most common genetic disorders. Patients with DS display growth retardation and inevitably develop characteristic Alzheimer's disease (AD) neuropathology, including neurofibrillary tangles and neuritic plaques. The expression of amyloid precursor protein (APP) is increased in both DS and AD patients. To reveal the function of APP and elucidate the pathogenic role of increased APP expression in DS and AD, we performed gene expression profiling using microarray method in human cells overexpressing APP. A set of genes are significantly altered, which are involved in cell cycle, cell proliferation and p53 signaling. We found that overexpression of APP inhibits cell proliferation. Furthermore, we confirmed that the downregulation of two validated genes, PSMA5 and PSMB7, inhibits cell proliferation, suggesting that the downregulation of PSMA5 and PSMB7 is involved in APP-induced cell proliferation impairment. Taken together, this study suggests that APP regulates global gene expression and increased APP expression inhibits cell proliferation. Our study provides a novel insight that APP overexpression may contribute to the growth impairment in DS patients and promote AD pathogenesis by inhibiting cell proliferation including neural stem cell proliferation and neurogenesis.

  6. Global gene expression profile progression in Gaucher disease mouse models

    PubMed Central

    2011-01-01

    Background: Gaucher disease is caused by defective glucocerebrosidase activity and the consequent accumulation of glucosylceramide. The pathogenic pathways resulting from lipid laden macrophages (Gaucher cells) in visceral organs and their abnormal functions are obscure. Results: To elucidate this pathogenic pathway, developmental global gene expression analyses were conducted in distinct Gba1 point-mutated mice (V394L/V394L and D409V/null). About 0.9 to 3% of genes had altered expression patterns (≥ ±1.8-fold change), representing several categories, but particularly macrophage activation and immune response genes. Time course analyses (12 to 28 wk) of IFNγ-regulated pro-inflammatory (13) and IL-4-regulated anti-inflammatory (11) cytokine/mediator networks showed tissue differential profiles in the lung and liver of the Gba1 mutant mice, implying that the lipid-storage macrophages were not functionally inert. The time course alterations of the IFNγ and IL-4 pathways were similar, but varied in degree in these tissues and with the Gba1 mutation. Conclusions: Biochemical and pathological analyses demonstrated direct relationships between the degree of tissue glucosylceramides and the gene expression profile alterations. These analyses implicate IFNγ-regulated pro-inflammatory and IL-4-regulated anti-inflammatory networks in differential disease progression with implications for understanding the Gaucher disease course and pathophysiology. PMID:21223590
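
    The ±1.8 fold-change cutoff mentioned in this abstract can be made concrete with a small sketch. The signed fold-change convention used here (ratio when expression goes up, negative reciprocal when it goes down), the function name, and the toy expression values are assumptions for illustration; the abstract does not state how the authors computed fold change.

```python
def altered_genes(mutant, wild_type, threshold=1.8):
    """Return {gene: signed fold change} for genes at or beyond the cutoff.

    Assumes strictly positive expression values. An increase is reported as
    mutant / wild_type; a decrease as -(wild_type / mutant).
    """
    altered = {}
    for gene, m in mutant.items():
        w = wild_type[gene]
        ratio = m / w
        fold = ratio if ratio >= 1 else -1 / ratio
        if abs(fold) >= threshold:
            altered[gene] = fold
    return altered


# Hypothetical expression levels (arbitrary units).
wild_type = {"g1": 100, "g2": 100, "g3": 100}
mutant = {"g1": 250, "g2": 120, "g3": 50}
altered = altered_genes(mutant, wild_type)
```

    With these toy numbers, g1 (2.5-fold up) and g3 (2-fold down) pass the cutoff, while g2 (1.2-fold up) does not.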

  7. Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm.

    PubMed

    Tchagang, Alain B; Phan, Sieu; Famili, Fazel; Shearer, Heather; Fobert, Pierre; Huang, Yi; Zou, Jitao; Huang, Daiqing; Cutler, Adrian; Liu, Ziying; Pan, Youlian

    2012-04-04

    Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
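
    The order-preserving idea on the time dimension can be sketched as follows: two genes belong together when their expression values rise and fall in the same order across time points, regardless of magnitude. This is a simplified illustration and not the OPTricluster implementation; it requires the ordering to match in every sample, skips the combinatorial search over sample subsets mentioned in the abstract, and breaks ties by time index, which is an arbitrary choice. All names and values are hypothetical.

```python
from collections import defaultdict


def order_pattern(series):
    """Return time-point indices sorted by increasing expression value."""
    return tuple(sorted(range(len(series)), key=lambda i: series[i]))


def group_by_order(data):
    """Group genes whose order pattern matches in every sample.

    `data` has the gene-sample-time shape {gene: {sample: [values over time]}}.
    """
    groups = defaultdict(list)
    for gene, samples in data.items():
        key = tuple((s, order_pattern(v)) for s, v in sorted(samples.items()))
        groups[key].append(gene)
    return dict(groups)


# Hypothetical 3D data: 3 genes x 2 samples x 3 time points.
data = {
    "geneA": {"s1": [1.0, 2.0, 3.0], "s2": [0.5, 0.9, 1.4]},
    "geneB": {"s1": [10.0, 20.0, 35.0], "s2": [2.0, 3.0, 7.0]},
    "geneC": {"s1": [3.0, 2.0, 1.0], "s2": [0.2, 0.6, 0.9]},
}
groups = group_by_order(data)
clusters = sorted(sorted(genes) for genes in groups.values())
```

    Here geneA and geneB fall in one group because both increase over time in both samples, even though their magnitudes differ; geneC is separated because it decreases in sample s1.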

  8. Genomic prediction contributing to a promising global strategy to turbocharge gene banks.

    PubMed

    Yu, Xiaoqing; Li, Xianran; Guo, Tingting; Zhu, Chengsong; Wu, Yuye; Mitchell, Sharon E; Roozeboom, Kraig L; Wang, Donghai; Wang, Ming Li; Pederson, Gary A; Tesso, Tesfaye T; Schnable, Patrick S; Bernardo, Rex; Yu, Jianming

    2016-10-03

    The 7.4 million plant accessions in gene banks are largely underutilized due to various resource constraints, but current genomic and analytic technologies are enabling us to mine this natural heritage. Here we report a proof-of-concept study to integrate genomic prediction into a broad germplasm evaluation process. First, a set of 962 biomass sorghum accessions were chosen as a reference set by germplasm curators. With high throughput genotyping-by-sequencing (GBS), we genetically characterized this reference set with 340,496 single nucleotide polymorphisms (SNPs). A set of 299 accessions was selected as the training set to represent the overall diversity of the reference set, and we phenotypically characterized the training set for biomass yield and other related traits. Cross-validation with multiple analytical methods using the data of this training set indicated high prediction accuracy for biomass yield. Empirical experiments with a 200-accession validation set chosen from the reference set confirmed high prediction accuracy. The potential to apply the prediction model to broader genetic contexts was also examined with an independent population. Detailed analyses on prediction reliability provided new insights into strategy optimization. The success of this project illustrates that a global, cost-effective strategy may be designed to assess the vast amount of valuable germplasm archived in 1,750 gene banks.

  9. Independent component analysis: mining microarray data for fundamental human gene expression modules

    PubMed Central

    Engreitz, Jesse M.; Daigle, Bernie J.; Marshall, Jonathan J.; Altman, Russ B.

    2010-01-01

    As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9,395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually-curated ontologies. In order to understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anticancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets. PMID:20619355

  10. Gene Mining for Proline Based Signaling Proteins in Cell Wall of Arabidopsis thaliana.

    PubMed

    Ihsan, Muhammad Z; Ahmad, Samina J N; Shah, Zahid Hussain; Rehman, Hafiz M; Aslam, Zubair; Ahuja, Ishita; Bones, Atle M; Ahmad, Jam N

    2017-01-01

    The cell wall (CW) as a first line of defense against biotic and abiotic stresses is of primary importance in plant biology. The proteins associated with cell walls play a significant role in determining a plant's sustainability to adverse environmental conditions. In this work, the genes encoding cell wall proteins (CWPs) in Arabidopsis were identified and functionally classified using geneMANIA and GENEVESTIGATOR with published microarrays data. This yielded 1605 genes, out of which 58 genes encoded proline-rich proteins (PRPs) and glycine-rich proteins (GRPs). Here, we have focused on the cellular compartmentalization, biological processes, and molecular functioning of proline-rich CWPs along with their expression at different plant developmental stages. The mined genes were categorized into five classes on the basis of the type of PRPs encoded in the cell wall of Arabidopsis thaliana. We review the domain structure and function of each class of protein, many with respect to the developmental stages of the plant. We have then used networks, hierarchical clustering and correlations to analyze co-expression, co-localization, genetic, and physical interactions and shared protein domains of these PRPs. This has given us further insight into these functionally important CWPs and identified a number of potentially new cell-wall related proteins in A. thaliana.

  11. Gene Mining for Proline Based Signaling Proteins in Cell Wall of Arabidopsis thaliana

    PubMed Central

    Ihsan, Muhammad Z.; Ahmad, Samina J. N.; Shah, Zahid Hussain; Rehman, Hafiz M.; Aslam, Zubair; Ahuja, Ishita; Bones, Atle M.; Ahmad, Jam N.

    2017-01-01

    The cell wall (CW) as a first line of defense against biotic and abiotic stresses is of primary importance in plant biology. The proteins associated with cell walls play a significant role in determining a plant's sustainability to adverse environmental conditions. In this work, the genes encoding cell wall proteins (CWPs) in Arabidopsis were identified and functionally classified using geneMANIA and GENEVESTIGATOR with published microarrays data. This yielded 1605 genes, out of which 58 genes encoded proline-rich proteins (PRPs) and glycine-rich proteins (GRPs). Here, we have focused on the cellular compartmentalization, biological processes, and molecular functioning of proline-rich CWPs along with their expression at different plant developmental stages. The mined genes were categorized into five classes on the basis of the type of PRPs encoded in the cell wall of Arabidopsis thaliana. We review the domain structure and function of each class of protein, many with respect to the developmental stages of the plant. We have then used networks, hierarchical clustering and correlations to analyze co-expression, co-localization, genetic, and physical interactions and shared protein domains of these PRPs. This has given us further insight into these functionally important CWPs and identified a number of potentially new cell-wall related proteins in A. thaliana. PMID:28289422

  12. Genome mining demonstrates the widespread occurrence of gene clusters encoding bacteriocins in cyanobacteria.

    PubMed

    Wang, Hao; Fewer, David P; Sivonen, Kaarina

    2011-01-01

    Cyanobacteria are a rich source of natural products with interesting biological activities. Many of these are peptides and the end products of a non-ribosomal pathway. However, several cyanobacterial peptide classes were recently shown to be produced through the proteolytic cleavage and post-translational modification of short precursor peptides. A new class of bacteriocins produced through the proteolytic cleavage and heterocyclization of precursor proteins was recently identified from marine cyanobacteria. Here we show the widespread occurrence of bacteriocin gene clusters in cyanobacteria through comparative analysis of 58 cyanobacterial genomes. A total of 145 bacteriocin gene clusters were discovered through genome mining. These clusters encoded 290 putative bacteriocin precursors. They ranged in length from 28 to 164 amino acids with very little sequence conservation of the core peptide. The gene clusters could be classified into seven groups according to their gene organization and domain composition. This classification is supported by phylogenetic analysis, which further indicated independent evolutionary trajectories of gene clusters in different groups. Our data suggests that cyanobacteria are a prolific source of low-molecular weight post-translationally modified peptides.

  13. Automatic construction of gene relation networks using text mining and gene expression data.

    PubMed

    Karopka, Thomas; Scheel, Thomas; Bansemer, Sven; Glass, Anne

    2004-06-01

    Microarray gene expression analysis is a powerful high-throughput technique that enables researchers to monitor the expression of thousands of genes simultaneously. This methodology produces huge amounts of data that have to be analysed. Clustering algorithms are used to group genes together based on a predefined distance measure. However, clustering algorithms do not necessarily group the genes in a biologically meaningful way. Additional information is needed to improve the identification of disease-relevant genes. The primary objective of our project is to support the analysis of microarray gene expression data by constructing gene relation networks (GRNs). The required information cannot be found in a structured representation such as a database. In contrast, a large number of relations are described in the biomedical literature. The main outcome of this project is a software system that gives clinicians and researchers a tool for analysing microarray gene expression data by mapping known relationships from the biomedical literature to local gene expression experiments.

  14. Identification of disease-causing genes using microarray data mining and Gene Ontology.

    PubMed

    Mohammadi, Azadeh; Saraee, Mohammad H; Salehi, Mansoor

    2011-01-26

    One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene
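
    The filtering stage mentioned above can be sketched as follows. This is a minimal illustration that assumes one common two-class form of the Fisher score, (mean1 - mean0)^2 / (var1 + var0); the abstract does not state which variant the authors used, and the SVMRFE, redundancy reduction, and Gene Ontology stages are not shown. Gene names, values, and labels are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score: squared mean difference over summed variances."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff_sq = (mean(pos) - mean(neg)) ** 2
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if diff_sq > 0 else 0.0
    return diff_sq / denom


def rank_genes(expression, labels):
    """Return gene names sorted from most to least discriminative."""
    scores = {g: fisher_score(v, labels) for g, v in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: four samples, labels 1 = disease, 0 = control.
labels = [1, 1, 0, 0]
expression = {
    "geneX": [5.0, 7.0, 1.0, 3.0],  # class means 6 and 2, variances 1 and 1
    "geneY": [2.0, 4.0, 3.0, 3.0],  # class means equal, score 0
}
ranking = rank_genes(expression, labels)
print(ranking)
```

    A filter of this kind scores each gene independently, which is why a separate redundancy reduction step is still needed for genes that are highly correlated with one another.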

  15. Identification of disease-causing genes using microarray data mining and Gene Ontology

    PubMed Central

    2011-01-01

    Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy

  16. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers from biological datasets using an integrated approach of statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate the insignificant, low-significance, and redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special type of rules is then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  17. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers from biological datasets using an integrated approach of statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate the insignificant, low-significance, and redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special type of rules is then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  18. Topological origin of global attractors in gene regulatory networks

    NASA Astrophysics Data System (ADS)

    Zhang, YunJun; Ouyang, Qi; Geng, Zhi

    2015-02-01

    Fixed-point attractors with global stability manifest themselves in a number of gene regulatory networks. This property indicates the stability of regulatory networks against small state perturbations and is closely related to other complex dynamics. In this paper, we aim to reveal the core modules in regulatory networks that determine their global attractors and the relationship between these core modules and other motifs. This work has been done via three steps. Firstly, inspired by the signal transmission in the regulation process, we extract the model of chain-like network from regulation networks. We propose a module of "ideal transmission chain (ITC)", which is proved sufficient and necessary (under certain condition) to form a global fixed-point in the context of chain-like network. Secondly, by examining two well-studied regulatory networks (i.e., the cell-cycle regulatory networks of Budding yeast and Fission yeast), we identify the ideal modules in true regulation networks and demonstrate that the modules have a superior contribution to network stability (quantified by the relative size of the biggest attraction basin). Thirdly, in these two regulation networks, we find that the double negative feedback loops, which are the key motifs of forming bistability in regulation, are connected to these core modules with high network stability. These results have shed new light on the connection between the topological feature and the dynamic property of regulatory networks.
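
    The stability measure mentioned above, the relative size of the biggest attraction basin, can be computed by brute force for a small synchronous Boolean network. The sketch below is an illustration only: the three-node chain and its update rule are hypothetical and are not the "ideal transmission chain" defined in the paper or either yeast cell-cycle network.

```python
from itertools import product


def find_basins(update, n):
    """Map each attractor (a frozenset of states) to its basin size.

    Every one of the 2**n initial states is iterated under the synchronous
    update rule until a state repeats; the repeating part is the attractor.
    """
    basins = {}
    for state in product((0, 1), repeat=n):
        seen = []
        while state not in seen:
            seen.append(state)
            state = update(state)
        attractor = frozenset(seen[seen.index(state):])
        basins[attractor] = basins.get(attractor, 0) + 1
    return basins


def chain_update(state):
    """Hypothetical chain: node 0 switches off, node 1 copies 0, node 2 copies 1."""
    return (0, state[0], state[1])


basins = find_basins(chain_update, 3)
largest = max(basins.values())
relative_size = largest / 2 ** 3
print(basins, relative_size)
```

    In this toy chain every initial state ends in the all-off state, so the single fixed point is a global attractor with relative basin size 1.0. Exhaustive enumeration is only feasible for small networks because the state space doubles with each added node.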

  19. Isolation of a low-sulfur tolerance gene from Eichhornia crassipes using a functional gene-mining approach.

    PubMed

    Liu, Xiao; Chen, Xi; Oliver, David J; Xiang, Cheng-Bin

    2009-12-01

    Genes enhancing nutrient utilization efficiency are needed for crop improvement. Here, we report the isolation of a gene conferring low-sulfur tolerance from water hyacinth (Eichhornia crassipes) using a functional gene-mining method. In doing this, an entry cDNA library was constructed from the roots of nutrient-starved water hyacinth using recombination cloning and subsequently shuttled into the plant transformation- and expression-ready vector. The plant transformation- and expression-ready library was transferred into Arabidopsis and a seed library of 50,000 independent transgenic lines was generated. Three transgenic lines with enhanced low-sulfur tolerance were isolated from the seed library. One of the transgenic lines, shl143-1, with improved tolerance to sulfate deficiency and an improved root system was further analyzed. It was found that a water hyacinth jacalin-related lectin gene (EcJRL-1) was overexpressed in shl143-1. Recapitulation analysis confirmed that the overexpression of the EcJRL-1 cDNA caused the phenotype. Therefore, this study demonstrates that a jacalin-related lectin is involved in root elongation under sulfur-deficient conditions.

  20. Global distribution of the CCR5 gene 32-basepair deletion.

    PubMed

    Martinson, J J; Chapman, N H; Rees, D C; Liu, Y T; Clegg, J B

    1997-05-01

    A mutant allele of the beta-chemokine receptor gene CCR5 bearing a 32-basepair (bp) deletion (denoted delta ccr5) which prevents cell invasion by the primary transmitting strain of HIV-1 has recently been characterized. Homozygotes for the mutation are resistant to infection, even after repeated high-risk exposures, but this resistance appears not to be total, as isolated cases of HIV-positive deletion homozygotes are now emerging. The consequence of the heterozygous state is not clear, but it may delay the progression to AIDS in infected individuals. A gene frequency of approximately 10% was found for delta ccr5 in populations of European descent, but no mutant alleles were reported in indigenous non-European populations. As the total number of non-European samples surveyed was small in comparison with the Europeans the global distribution of this mutation is far from clear. We have devised a rapid PCR assay for delta ccr5 and used it to screen 3,342 individuals from a globally-distributed range of populations. We find that delta ccr5 is not confined to people of European descent but is found at frequencies of 2-5% throughout Europe, the Middle East and the Indian subcontinent (Fig. 1). Isolated occurrences are seen elsewhere throughout the world, but these most likely represent recent European gene flow into the indigenous populations. The inter-population differences in delta ccr5 frequency may influence the pattern of HIV transmission and so will need to be incorporated into future predictions of HIV levels.
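
    An allele frequency of the kind reported above is obtained from genotype counts by counting chromosomes. The sketch below shows the arithmetic; the counts are hypothetical and are not taken from the study, and the expected homozygote frequency assumes Hardy-Weinberg proportions.

```python
def allele_frequency(n_total, n_het, n_hom):
    """Frequency of the deletion allele among the 2 * n_total chromosomes.

    Each heterozygote carries one copy and each homozygote carries two.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (2 * n_hom + n_het) / (2 * n_total)


# Hypothetical sample: 100 individuals, 18 heterozygotes, 1 deletion homozygote.
q = allele_frequency(n_total=100, n_het=18, n_hom=1)

# Under Hardy-Weinberg proportions (an assumption), homozygotes occur at q**2.
expected_hom = q ** 2
print(round(q, 3), round(expected_hom, 4))
```

    With these illustrative counts, 20 of 200 chromosomes carry the deletion, so the allele frequency is 0.10 and about 1% of individuals would be expected to be homozygous.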

  1. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

    PubMed

    Stelzer, Gil; Rosen, Naomi; Plaschkes, Inbar; Zimmerman, Shahar; Twik, Michal; Fishilevich, Simon; Stein, Tsippi Iny; Nudel, Ron; Lieder, Iris; Mazor, Yaron; Kaplan, Sergey; Dahary, Dvir; Warshawsky, David; Guan-Golan, Yaron; Kohn, Asher; Rappaport, Noa; Safran, Marilyn; Lancet, Doron

    2016-06-20

    GeneCards, the human gene compendium, enables researchers to effectively navigate and inter-relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways. Our recently launched Version 4 has a revamped infrastructure facilitating faster data updates, better-targeted data queries, and friendlier user experience. It also provides a stronger foundation for the GeneCards suite of companion databases and analysis tools. Improved data unification includes gene-disease links via MalaCards and merged biological pathways via PathCards, as well as drug information and proteome expression. VarElect, another suite member, is a phenotype prioritizer for next-generation sequencing, leveraging the GeneCards and MalaCards knowledgebase. It automatically infers direct and indirect scored associations between hundreds or even thousands of variant-containing genes and disease phenotype terms. VarElect's capabilities, either independently or within TGex, our comprehensive variant analysis pipeline, help prepare for the challenge of clinical projects that involve thousands of exome/genome NGS analyses. © 2016 by John Wiley & Sons, Inc.

  2. Phylogenetic diversity of archaea and the archaeal ammonia monooxygenase gene in uranium mining-impacted locations in Bulgaria.

    PubMed

    Radeva, Galina; Kenarova, Anelia; Bachvarova, Velina; Flemming, Katrin; Popov, Ivan; Vassilev, Dimitar; Selenska-Pobell, Sonja

    2014-01-01

    Uranium mining and milling activities adversely affect the microbial populations of impacted sites. The negative effects of uranium on soil bacteria and fungi are well studied, but little is known about the effects of radionuclides and heavy metals on archaea. The composition and diversity of archaeal communities inhabiting the waste pile of the Sliven uranium mine and the soil of the Buhovo uranium mine were investigated using 16S rRNA gene retrieval. A total of 355 archaeal clones were selected, and their 16S rDNA inserts were analysed by restriction fragment length polymorphism (RFLP) discriminating 14 different RFLP types. All evaluated archaeal 16S rRNA gene sequences belong to the 1.1b/Nitrososphaera cluster of Crenarchaeota. The composition of the archaeal community is distinct for each site of interest and dependent on environmental characteristics, including pollution levels. Since the members of 1.1b/Nitrososphaera cluster have been implicated in the nitrogen cycle, the archaeal communities from these sites were probed for the presence of the ammonia monooxygenase gene (amoA). Our data indicate that amoA gene sequences are distributed in a similar manner as in Crenarchaeota, suggesting that archaeal nitrification processes in uranium mining-impacted locations are under the control of the same key factors controlling archaeal diversity.

  3. Phylogenetic Diversity of Archaea and the Archaeal Ammonia Monooxygenase Gene in Uranium Mining-Impacted Locations in Bulgaria

    PubMed Central

    Radeva, Galina; Kenarova, Anelia; Bachvarova, Velina; Popov, Ivan; Selenska-Pobell, Sonja

    2014-01-01

    Uranium mining and milling activities adversely affect the microbial populations of impacted sites. The negative effects of uranium on soil bacteria and fungi are well studied, but little is known about the effects of radionuclides and heavy metals on archaea. The composition and diversity of archaeal communities inhabiting the waste pile of the Sliven uranium mine and the soil of the Buhovo uranium mine were investigated using 16S rRNA gene retrieval. A total of 355 archaeal clones were selected, and their 16S rDNA inserts were analysed by restriction fragment length polymorphism (RFLP) discriminating 14 different RFLP types. All evaluated archaeal 16S rRNA gene sequences belong to the 1.1b/Nitrososphaera cluster of Crenarchaeota. The composition of the archaeal community is distinct for each site of interest and dependent on environmental characteristics, including pollution levels. Since the members of 1.1b/Nitrososphaera cluster have been implicated in the nitrogen cycle, the archaeal communities from these sites were probed for the presence of the ammonia monooxygenase gene (amoA). Our data indicate that amoA gene sequences are distributed in a similar manner as in Crenarchaeota, suggesting that archaeal nitrification processes in uranium mining-impacted locations are under the control of the same key factors controlling archaeal diversity. PMID:24711725

  4. Metal bioaccumulation, genotoxicity and gene expression in the European wood mouse (Apodemus sylvaticus) inhabiting an abandoned uranium mining area.

    PubMed

    Lourenço, Joana; Pereira, Ruth; Gonçalves, Fernando; Mendo, Sónia

    2013-01-15

    Genotoxic effects caused by exposure to wastes containing metals and radionuclides were investigated in the European wood mouse (Apodemus sylvaticus). The animals were captured in the surroundings of an abandoned uranium mining site. DNA damage was assessed by comet assay; gene expression and single nucleotide polymorphisms (SNPs) were assessed by Real-Time PCR and melt curve analysis, respectively. The bioaccumulation of metals in the liver, kidney and bones was also determined to help clarify cause-effect relationships. Results confirmed the bioaccumulation of cadmium and uranium in organisms exposed to uranium mining wastes. The P53 gene was found to be significantly up-regulated in the liver of those organisms, and SNPs in the Rb gene were also detected in the kidney. Our results showed that uranium mining wastes caused serious DNA damage resulting in genomic instability, disclosed by the significant increase in DNA strand breaks and the disturbance of P53 gene expression. These effects can have severe consequences, since they may contribute to the emergence of serious genetic diseases. The fact that mice are often used as bioindicator species for evaluating the risks of environmental exposure to humans raises concerns about the risks for human populations living near uranium mining areas. Copyright © 2012 Elsevier B.V. All rights reserved.

  5. Of text and gene – using text mining methods to uncover hidden knowledge in toxicogenomics

    PubMed Central

    2014-01-01

    Background: Toxicogenomics studies often profile gene expression from assays involving multiple doses and time points. The dose- and time-dependent pattern is of great importance to assess toxicity but computational approaches are lacking to effectively utilize this characteristic in toxicity assessment. Topic modeling is a text mining approach, but may be used analogously in toxicogenomics due to the similar data structures between text and gene dysregulation. Results: Topic modeling was applied to a very large toxicogenomics dataset containing microarray gene expression data from >15,000 samples associated with 131 drugs tested in three different assay platforms (i.e., in vitro assay, in vivo repeated dose study and in vivo single dose experiment) with a design including multiple doses and time points. A set of “topics” which each consist of a set of genes was determined, by which the varying sensitivity of three assay systems was observed. We found that the drug-dependent effect was more pronounced in the two in vivo systems than the in vitro system, while the time-dependent effect was most strongly reflected in the in vitro system followed by the single dose study and lastly the repeated dose experiment. The dose-dependent effect was similar across three assay systems. Although the results indicated a challenge to extrapolate the in vitro results to the in vivo situation, we did notice that, for some drugs but not for all the drugs, the similarity in gene expression patterns was observed across all three assay systems, indicating a possibility of using in vitro systems with careful designs (such as the choice of dose and time point), to replace the in vivo testing strategy. Nonetheless, a potential to replace the repeated dose study by the single-dose short-term methodology was strongly implied. Conclusions: The study demonstrated that text mining methodologies such as topic modeling provide an alternative method compared to traditional means for data

  6. Identification of sulfate-reducing bacteria in methylmercury-contaminated mine tailings by analysis of SSU rRNA genes.

    PubMed

    Winch, Susan; Mills, Heath J; Kostka, Joel E; Fortin, Danielle; Lean, David R S

    2009-04-01

    Sulfate-reducing bacteria (SRB) are often used in bioremediation of acid mine drainage because microbial sulfate reduction increases pH and produces sulfide that binds with metals. Mercury methylation has also been linked with sulfate reduction. Previous geochemical analysis indicated the occurrence of sulfate reduction in mine tailings, but no molecular characterization of the mine tailings-associated microbial community has determined which SRB are present. This study characterizes the bacterial communities of two geochemically contrasting, high-methylmercury mine tailing environments, with emphasis on SRB, by analyzing small subunit (SSU) rRNA genes present in the tailings sediments and in enrichment cultures inoculated with tailings. Novel Deltaproteobacteria and Firmicutes-related sequences were detected in both the pH-neutral gold mine tailings and the acidic high-sulfide base-metal tailings. At the subphylum level, the SRB communities differed between sites, suggesting that the community structure was dependent on local geochemistry. Clones obtained from the gold tailings and enrichment cultures were more similar to previously cultured isolates whereas clones from acidic tailings were more closely related to uncultured lineages identified from other acidic sediments worldwide. This study provides new insights into the novelty and diversity of bacteria colonizing mine tailings, and identifies specific organisms that warrant further investigation with regard to their roles in mercury methylation and sulfur cycling in these environments.

  7. Gene expression data analysis using closed item set mining for labeled data.

    PubMed

    Rotter, Ana; Novak, Petra Kralj; Baebler, Spela; Toplak, Natasa; Blejec, Andrej; Lavrac, Nada; Gruden, Kristina

    2010-04-01

    This article presents an approach to microarray data analysis using discretised expression values in combination with a methodology of closed item set mining for class-labeled data (RelSets). A statistical 2 x 2 factorial design analysis was run in parallel. The approach was validated on two independent sets of two-color microarray experiments using potato plants. Our results demonstrate that the two analytical procedures, applied to the same data, are each adequate for answering a different biological question. Statistical analysis is appropriate if an overview of the consequences of treatments and their interaction terms on the studied system is needed. If, on the other hand, a list of genes whose expression (upregulation or downregulation) differentiates between classes of data is required, the use of the RelSets algorithm is preferred. The algorithms used are freely available from the authors upon request.
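    As a rough illustration of closed item set mining on discretised, class-labeled expression data, the sketch below enumerates closed item sets by brute force. It is not the RelSets algorithm itself; the function name, the item labels such as "geneA=up", and the sample data are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def closed_itemsets(transactions, min_support):
    """Return {closed item set: support count} by brute-force enumeration.

    An item set is closed when no proper superset has the same support.
    Suitable only for tiny inputs; this is a teaching sketch.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            covering = [t for t in transactions if cand <= t]
            if len(covering) < min_support:
                continue
            # The closure is the intersection of every transaction
            # that contains the candidate item set.
            closure = frozenset.intersection(*covering)
            closed[closure] = len(covering)
    return closed

# Hypothetical discretised samples: each item is "gene=state" or a class label.
samples = [
    {"geneA=up", "geneB=down", "class=treated"},
    {"geneA=up", "geneB=down", "class=treated"},
    {"geneA=up", "geneC=up", "class=control"},
]
result = closed_itemsets(samples, min_support=2)
for itemset, support in result.items():
    print(sorted(itemset), support)
```

    Closed item sets that contain a class label point to gene states that co-occur with that class, which is the kind of class-differentiating gene list the abstract describes.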

  8. Mining high-throughput experimental data to link gene and function

    PubMed Central

    Blaby-Haas, Crysten E.; de Crécy-Lagard, Valérie

    2011-01-01

    Nearly 2200 genomes encoding some 6 million proteins have now been sequenced. Around 40% of these proteins are of unknown function even when function is loosely and minimally defined as “belonging to a superfamily”. In addition to in silico methods, the swelling stream of high-throughput experimental data can give valuable clues for linking these “unknowns” with precise biological roles. The goal is to develop integrative data-mining platforms that allow the scientific community at large to access and utilize this rich source of experimental knowledge. To this end, we review recent advances in generating whole-genome experimental datasets, where this data can be accessed, and how it can be used to drive prediction of gene function. PMID:21310501

  9. Mining genomic patterns in Mycobacterium tuberculosis H37Rv using a web server Tuber-Gene.

    PubMed

    Rishishwar, Lavanya; Pant, Bhasker; Pant, Kumud; Pardasani, Kamal R

    2011-10-01

    Tuberculosis, caused by Mycobacterium tuberculosis (MTB), is one of the most dreaded diseases of the century. The organism has long been studied by researchers throughout the world using various wet-lab and dry-lab techniques. In this study, we focus on mining useful patterns at the genomic level that can be applied for in silico functional characterization of genes from the MTB complex. The model developed on the basis of the patterns found in this study can correctly identify 99.77% of the input genes from the genome of MTB strain H37Rv. The model was tested against four other MTB strains and the homologue M. bovis to further evaluate its generalization capability. The mean prediction accuracy was 85.76%. It was also observed that the GC content remained fairly constant throughout the genome, implying the absence of any pathogenicity island transferred from other organisms. This study reveals that dinucleotide composition is an efficient functional class discriminator for the MTB complex. To facilitate the application of this model, a web server, Tuber-Gene, has been developed, which can be freely accessed at http://www.bifmanit.org/tb2/.
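    The two sequence features mentioned above, GC content and dinucleotide composition, can be computed as in the following minimal sketch. The function names and the toy sequence are assumptions for illustration; the sketch does not reproduce the Tuber-Gene model or its classifier.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in the sequence."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}

# Toy sequence, not taken from the H37Rv genome.
toy = "ATGCGC"
print(gc_content(toy))
print(dinucleotide_frequencies(toy))
```

    A feature vector of the 16 dinucleotide frequencies per gene could then be passed to any classifier; which classifier the authors used is not stated in the abstract.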

  10. Novel Nickel Resistance Genes from the Rhizosphere Metagenome of Plants Adapted to Acid Mine Drainage

    PubMed Central

    Mirete, Salvador; de Figueras, Carolina G.; González-Pastor, Jose E.

    2007-01-01

    Metal resistance determinants have traditionally been found in cultivated bacteria. To search for genes involved in nickel resistance, we analyzed the bacterial community of the rhizosphere of Erica andevalensis, an endemic heather which grows at the banks of the Tinto River, a naturally metal-enriched and extremely acidic environment in southwestern Spain. 16S rRNA gene sequence analysis of rhizosphere DNA revealed the presence of members of five phylogenetic groups of Bacteria and the two main groups of Archaea mostly associated with sites impacted by acid mine drainage (AMD). The diversity observed and the presence of heavy metals in the rhizosphere led us to construct and screen five different metagenomic libraries hosted in Escherichia coli for searching novel nickel resistance determinants. A total of 13 positive clones were detected and analyzed. Insights about their possible mechanisms of resistance were obtained from cellular nickel content and sequence similarities. Two clones encoded putative ABC transporter components, and a novel mechanism of metal efflux is suggested. In addition, a nickel hyperaccumulation mechanism is proposed for a clone encoding a serine O-acetyltransferase. Five clones encoded proteins similar to well-characterized proteins but not previously reported to be related to nickel resistance, and the remaining six clones encoded hypothetical or conserved hypothetical proteins of uncertain functions. This is the first report documenting nickel resistance genes recovered from the metagenome of an AMD environment. PMID:17675438

  11. Literature mining for the discovery of hidden connections between drugs, genes and diseases.

    PubMed

    Frijters, Raoul; van Vugt, Marianne; Smeets, Ruben; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand

    2010-09-23

    The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs.
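    The ABC-principle described above can be sketched in a few lines: build a co-occurrence graph from documents, then report concepts C that never co-occur with A but share at least one intermediate B. This is only an illustrative sketch under simplifying assumptions; it omits the statistical scoring and thesauri that CoPub Discovery uses, and the concept names and documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(documents):
    """Map each concept to the set of concepts it co-occurs with in any document."""
    neighbors = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def hidden_links(documents, source):
    """Return {C: set of shared B-intermediates} for concepts C with no direct link to source."""
    neighbors = cooccurrence_graph(documents)
    direct = set(neighbors.get(source, set()))
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbors[b]:
            if c != source and c not in direct:
                candidates[c].add(b)
    return dict(candidates)

# Hypothetical documents, each reduced to the set of concepts it mentions.
docs = [
    {"drugX", "geneY"},
    {"geneY", "diseaseZ"},
    {"drugX", "pathwayP"},
]
links = hidden_links(docs, "drugX")
print(links)
```

    In practice the candidate links would be ranked, for example by the number or strength of shared intermediates, before any experimental validation.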

  12. Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases

    PubMed Central

    Frijters, Raoul; van Vugt, Marianne; Smeets, Ruben; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand

    2010-01-01

    The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs. PMID:20885778

  13. Novel nickel resistance genes from the rhizosphere metagenome of plants adapted to acid mine drainage.

    PubMed

    Mirete, Salvador; de Figueras, Carolina G; González-Pastor, Jose E

    2007-10-01

    Metal resistance determinants have traditionally been found in cultivated bacteria. To search for genes involved in nickel resistance, we analyzed the bacterial community of the rhizosphere of Erica andevalensis, an endemic heather which grows at the banks of the Tinto River, a naturally metal-enriched and extremely acidic environment in southwestern Spain. 16S rRNA gene sequence analysis of rhizosphere DNA revealed the presence of members of five phylogenetic groups of Bacteria and the two main groups of Archaea mostly associated with sites impacted by acid mine drainage (AMD). The diversity observed and the presence of heavy metals in the rhizosphere led us to construct and screen five different metagenomic libraries hosted in Escherichia coli for searching novel nickel resistance determinants. A total of 13 positive clones were detected and analyzed. Insights about their possible mechanisms of resistance were obtained from cellular nickel content and sequence similarities. Two clones encoded putative ABC transporter components, and a novel mechanism of metal efflux is suggested. In addition, a nickel hyperaccumulation mechanism is proposed for a clone encoding a serine O-acetyltransferase. Five clones encoded proteins similar to well-characterized proteins but not previously reported to be related to nickel resistance, and the remaining six clones encoded hypothetical or conserved hypothetical proteins of uncertain functions. This is the first report documenting nickel resistance genes recovered from the metagenome of an AMD environment.

  14. How to learn about gene function: text-mining or ontologies?

    PubMed

    Soldatos, Theodoros G; Perdigão, Nelson; Brown, Nigel P; Sabir, Kenneth S; O'Donoghue, Seán I

    2015-03-01

    As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that try to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic

  15. Global gene expression response to telomerase in bovine adrenocortical cells

    SciTech Connect

    Perrault, Steven D.; Hornsby, Peter J.; Betts, Dean H. (e-mail: bettsd@uoguelph.ca)

    2005-09-30

    The infinite proliferative capability of most immortalized cells is dependent upon the presence of the enzyme telomerase and its ability to maintain telomere length and structure. However, telomerase may be involved in a greater system than telomere length regulation, as recent evidence has shown it capable of increasing wound healing in vivo, and improving cellular proliferation rate and survival from apoptosis in vitro. Here, we describe the global gene expression response to ectopic telomerase expression in an in vitro bovine adrenocortical cell model. Telomerase-immortalized cells showed an increased ability for proliferation and survival in minimal essential medium compared with cells transgenic for GFP. cDNA microarray analyses revealed an altered cell state indicative of increased adrenocortical cell proliferation regulated by the IGF2 pathway and alterations in members of the TGF-β family. In addition, we identified alterations in genes associated with development and wound healing that support a model in which high telomerase expression induces a highly adaptable, progenitor-like state.

  16. Differential global gene expression in red and white skeletal muscle

    NASA Technical Reports Server (NTRS)

    Campbell, W. G.; Gordon, S. E.; Carlson, C. J.; Pattison, J. S.; Hamilton, M. T.; Booth, F. W.

    2001-01-01

    The differences in gene expression among the fiber types of skeletal muscle have long fascinated scientists, but for the most part, previous experiments have only reported differences of one or two genes at a time. The evolving technology of global mRNA expression analysis was employed to determine the potential differential expression of approximately 3,000 mRNAs between the white quad (white muscle) and the red soleus muscle (mixed red muscle) of female ICR mice (30-35 g). Microarray analysis identified 49 mRNA sequences that were differentially expressed between white and mixed red skeletal muscle, including newly identified differential expressions between muscle types. For example, the current findings increase the number of known, differentially expressed mRNAs for transcription factors/coregulators by nine and signaling proteins by three. The expanding knowledge of the diversity of mRNA expression between white and mixed red muscle suggests that there could be quite a complex regulation of phenotype between muscles of different fiber types.

  18. Genes Involved in the Evolution of Herbivory by a Leaf-Mining, Drosophilid Fly

    PubMed Central

    Whiteman, Noah K.; Gloss, Andrew D.; Sackton, Timothy B.; Groen, Simon C.; Humphrey, Parris T.; Lapoint, Richard T.; Sønderby, Ida E.; Halkier, Barbara A.; Kocks, Christine; Ausubel, Frederick M.; Pierce, Naomi E.

    2012-01-01

    Herbivorous insects are among the most successful radiations of life. However, we know little about the processes underpinning the evolution of herbivory. We examined the evolution of herbivory in the fly, Scaptomyza flava, whose larvae are leaf miners on species of Brassicaceae, including the widely studied reference plant, Arabidopsis thaliana (Arabidopsis). Scaptomyza flava is phylogenetically nested within the paraphyletic genus Drosophila, and the whole genome sequences available for 12 species of Drosophila facilitated phylogenetic analysis and assembly of a transcriptome for S. flava. A time-calibrated phylogeny indicated that leaf mining in Scaptomyza evolved between 6 and 16 million years ago. Feeding assays showed that biosynthesis of glucosinolates, the major class of antiherbivore chemical defense compounds in mustard leaves, was upregulated by S. flava larval feeding. The presence of glucosinolates in wild-type (WT) Arabidopsis plants reduced S. flava larval weight gain and increased egg–adult development time relative to flies reared in glucosinolate knockout (GKO) plants. An analysis of gene expression differences in 5-day-old larvae reared on WT versus GKO plants showed a total of 341 transcripts that were differentially regulated by glucosinolate uptake in larval S. flava. Of these, approximately a third corresponded to homologs of Drosophila melanogaster genes associated with starvation, dietary toxin-, heat-, oxidation-, and aging-related stress. The upregulated transcripts exhibited elevated rates of protein evolution compared with unregulated transcripts. The remaining differentially regulated transcripts also contained a higher proportion of novel genes than the unregulated transcripts. Thus, the transition to herbivory in Scaptomyza appears to be coupled with the evolution of novel genes and the co-option of conserved stress-related genes. PMID:22813779

  19. Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts.

    PubMed

    Cellier, Peggy; Charnois, Thierry; Plantevit, Marc; Rigotti, Christophe; Crémilleux, Bruno; Gandrillon, Olivier; Kléma, Jiří; Manguin, Jean-Luc

    2015-01-01

    Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Indeed, text collections are large and it is very difficult for biologists to take full advantage of this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and thus are time consuming and often devoted to a specific corpus. Machine learning based NLP methods give good results but generate outcomes that are not readily understandable by a user. We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method to automatically produce patterns conveying gene interactions and their characterizations. Our method therefore detects not only gene interactions but also semantic information on the extracted interactions (e.g., modalities, biological contexts, interaction types). Only a limited resource is required: the text collection that is used as a training corpus. Our approach gives results comparable to those given by state-of-the-art methods and is even better for gene interaction detection in AIMed. Experiments show how our approach enables the discovery of interactions and their characterizations. To the best of our knowledge, there are few methods that automatically extract the interactions and also the associated semantic information. The extracted gene interactions from PubMed are available through a simple web interface at https://bingotexte.greyc.fr/. The software is available at https://bingo2.greyc.fr/?q=node/22.

  20. The Algorithm of Development the World Ocean Mining of the Industry During the Global Crisis

    NASA Astrophysics Data System (ADS)

    Nyrkov, Anatoliy; Budnik, Vladislav; Sokolov, Sergei; Chernyi, Sergei

    2016-08-01

    The article reviews the effect of hydrocarbon extraction on a country's overall development under the influence of economic, demographic and technological factors, as well as its future role in the world energy balance. It also presents facts that identify offshore and deep-water production of conventional and unconventional hydrocarbons, including the mining of marine mineral resources, as a promising area for future development despite the difficulties of this sector. Finally, the article considers the state and prospects of the Russian continental shelf, taking into account its geographical location and its existing problems.

  1. pubmed2ensembl: A Resource for Mining the Biological Literature on Genes

    PubMed Central

    Baran, Joachim; Gerner, Martin; Haeussler, Maximilian; Nenadic, Goran; Bergman, Casey M.

    2011-01-01

    Background The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms dedicated teams manually curate publications about genes; however for species with no such dedicated staff many thousands of articles are never mapped to genes or genomic regions. Methodology/Principal Findings To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several sources of curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text-mining on MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data. Conclusion/Significance By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much

  2. Data mining and multiparameter analysis of lung surfactant protein genes in bronchopulmonary dysplasia.

    PubMed

    Rova, Meri; Haataja, Ritva; Marttila, Riitta; Ollikainen, Vesa; Tammela, Outi; Hallman, Mikko

    2004-06-01

    Bronchopulmonary dysplasia (BPD), the most common chronic lung disease in infancy, is influenced by a number of antenatal and postnatal risk factors and is mostly preceded by respiratory distress syndrome (RDS) in the newborn. Surfactant protein (SP-A, -B, -C and -D) gene variations may play a role in both BPD and RDS. An association study between these candidate genes and BPD was performed. A total of 365 preterm Finnish infants in a high-risk population with gestational age genes. A multiparameter analysis was performed using Agrawal's algorithm based data mining and conventional methods of statistical allelic association. In singletons and presenting multiples, the frequency of SP-B intron 4 deletion variant allele was increased in BPD versus controls (P=0.008, OR=2.0, 95%CI 1.2-3.4). The presence of the SP-B intron 4 deletion variant was a risk factor for BPD even when essential external confounding factors were included in the analyses. No other SP polymorphisms associated with BPD, and the SP-B intron 4 variation did not associate with RDS. Transcription Element Search Software predicted allele-specific differences at several putative transcription factor binding sites that may be important in SP-B regulation. The present multiparameter analysis demonstrates the presumable direct involvement of the SP-B intron 4 deletion variant allele as a genetic risk factor to BPD. We propose that two separate SP-B gene polymorphisms have a phenotypic significance via separate molecular mechanisms: the intron 4 length variation affecting transcriptional regulation, and the exonic Ile131Thr variation affecting post-translationally.
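    For readers unfamiliar with the reported statistic, an odds ratio and its 95% confidence interval can be derived from a 2 x 2 table of allele counts as sketched below, using the standard log-odds (Woolf) approximation. The function name and the counts are hypothetical; they are not the study's data, and the abstract does not state which interval method the authors used.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2 x 2 table.

    a: variant alleles in cases      b: other alleles in cases
    c: variant alleles in controls   d: other alleles in controls
    z=1.96 corresponds to a 95% interval under the normal approximation.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts chosen only to exercise the formula.
or_value, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(or_value, lower, upper)
```

    An interval whose lower bound lies above 1 indicates an association at the chosen confidence level, which is how the reported 95%CI of 1.2-3.4 is read.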

  3. H-InvDB in 2009: extended database and data mining resources for human genes and transcripts.

    PubMed

    Yamasaki, Chisato; Murakami, Katsuhiko; Takeda, Jun-ichi; Sato, Yoshiharu; Noda, Akiko; Sakate, Ryuichi; Habara, Takuya; Nakaoka, Hajime; Todokoro, Fusano; Matsuya, Akihiro; Imanishi, Tadashi; Gojobori, Takashi

    2010-01-01

    We report the extended database and data mining resources newly released in the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). H-InvDB is a comprehensive annotation resource of human genes and transcripts, and consists of two main views and six sub-databases. The latest release of H-InvDB (release 6.2) provides the annotation for 219,765 human transcripts in 43,159 human gene clusters based on human full-length cDNAs and mRNAs. H-InvDB now provides several new annotation features, such as mapping of microarray probes, new gene models, relation to known ncRNAs and information from the Glycogene database. H-InvDB also provides useful data mining resources: 'Navigation search', the 'H-InvDB Enrichment Analysis Tool (HEAT)' and web service APIs. 'Navigation search' is an extended search system that enables complicated searches by combining 16 different search options. HEAT is a data mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set, as compared with the entire H-InvDB representative transcripts. H-InvDB now has SOAP and REST web service APIs to allow the use of H-InvDB data in programs, giving users extended data accessibility.
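    Enrichment of an annotation in a user-defined gene set relative to a background is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation; it is an assumption that a test of this kind is appropriate here, since the abstract does not say which statistic HEAT uses, and the function name and numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, sample_annotated):
    """Upper-tail hypergeometric probability P(X >= sample_annotated).

    population:       number of background genes
    annotated:        background genes carrying the annotation
    sample:           size of the user-defined gene set
    sample_annotated: genes in the set carrying the annotation
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(sample_annotated, upper + 1)
    )
    return tail / total

# Hypothetical example: 10 background genes, 5 annotated, and a gene set
# of 5 in which all 5 carry the annotation.
p = enrichment_p_value(10, 5, 5, 5)
print(p)
```

    When many annotations are tested at once, the resulting p-values would normally be corrected for multiple testing.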

  4. Efficient Mining of Discriminative Co-clusters from Gene Expression Data.

    PubMed

    Odibat, Omar; Reddy, Chandan K

    2014-12-01

    Discriminative models are used to analyze the differences between two classes and to identify class-specific patterns. Most of the existing discriminative models depend on using the entire feature space to compute the discriminative patterns for each class. Co-clustering has been proposed to capture the patterns that are correlated in a subset of features, but it cannot handle discriminative patterns in labeled datasets. In certain biological applications such as gene expression analysis, it is critical to consider the discriminative patterns that are correlated only in a subset of the feature space. The objective of this paper is two-fold: first, it presents an algorithm to efficiently find arbitrarily positioned co-clusters from complex data. Second, it extends this co-clustering algorithm to discover discriminative co-clusters by incorporating the class information into the co-cluster search process. In addition, we also characterize the discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace pattern mining algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and our experimental results showed that the proposed algorithms outperformed several existing algorithms available in the literature.

  5. Phylogenomic study of lipid genes involved in microalgal biofuel production-candidate gene mining and metabolic pathway analyses.

    PubMed

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

    2012-01-01

    Optimizing microalgal biofuel production using metabolic engineering tools requires an in-depth understanding of the structure-function relationship of genes involved in the lipid biosynthetic pathway. In the present study, genome-wide identification and characterization of 398 putative genes involved in lipid biosynthesis in Arabidopsis thaliana, Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Ostreococcus tauri and Cyanidioschyzon merolae was undertaken on the basis of their conserved motif/domain organization and phylogenetic profile. The results indicated that the core lipid metabolic pathways in all the species are carried out by a comparable number of orthologous proteins. Although the fundamental gene organizations were observed to be invariantly conserved between the microalgal and Arabidopsis genomes, increasing genome complexity appears to be associated with a greater number of genes involved in triacylglycerol (TAG) biosynthesis and catabolism. Further, phylogenomic analysis of the genes provided insights into the molecular evolution of the lipid biosynthetic pathway in microalgae and confirmed the close evolutionary proximity between the Streptophyte and Chlorophyte lineages. Together, these studies will improve our understanding of the global lipid metabolic pathway and contribute to the engineering of regulatory networks of algal strains for higher accumulation of oil.

  6. Phylogenomic Study of Lipid Genes Involved in Microalgal Biofuel Production—Candidate Gene Mining and Metabolic Pathway Analyses

    PubMed Central

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

    2012-01-01

    Optimizing microalgal biofuel production using metabolic engineering tools requires an in-depth understanding of the structure-function relationship of genes involved in the lipid biosynthetic pathway. In the present study, genome-wide identification and characterization of 398 putative genes involved in lipid biosynthesis in Arabidopsis thaliana, Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Ostreococcus tauri and Cyanidioschyzon merolae was undertaken on the basis of their conserved motif/domain organization and phylogenetic profile. The results indicated that the core lipid metabolic pathways in all the species are carried out by a comparable number of orthologous proteins. Although the fundamental gene organizations were observed to be invariantly conserved between the microalgal and Arabidopsis genomes, increasing genome complexity appears to be associated with a greater number of genes involved in triacylglycerol (TAG) biosynthesis and catabolism. Further, phylogenomic analysis of the genes provided insights into the molecular evolution of the lipid biosynthetic pathway in microalgae and confirmed the close evolutionary proximity between the Streptophyte and Chlorophyte lineages. Together, these studies will improve our understanding of the global lipid metabolic pathway and contribute to the engineering of regulatory networks of algal strains for higher accumulation of oil. PMID:23032611

  7. Assessment of Local Biodiversity Loss in Uranium Mining-Tales And Its Projections On Global Scale

    NASA Astrophysics Data System (ADS)

    Sharshenova, D.; Zhamangulova, N.

    2015-12-01

    In Min-Kush, northern Kyrgyzstan, there are 8 mine tailings sites holding an estimated 1,961,000 tonnes of industrial uranium. Local ecosystem services have declined rapidly. We analyzed a terrestrial assemblage database for the uranium mine tailings to quantify local biodiversity responses to land use and environmental changes. In the worst-affected habitats, species richness was reduced by 95.7%, total abundance by 60.9% and rarefaction-based richness by 72.5%. We estimate that, across the regional mountain ecosystem affected by this pressure, average within-sample richness was reduced by 17.01%, total abundance by 16.5% and rarefaction-based richness by 14.5%. Business-as-usual scenarios are widely practiced in the region and, owing to economic constraints, the country cannot afford any mitigation scenarios. We project that biodiversity loss and ecosystem service impairment will spread in the region through groundwater, soil, plants, animals and microorganisms at a rate of 1 km/year. The entire Tian-Shan mountain chain will be in danger within the next 5-10 years. Our preliminary data show that local people living in this area have developed various forms of cancer, and the rate of premature death is as high as 40%. A strong international scientific and socio-economic partnership is needed to develop models and predictions.

  8. Identification of candidate genes in Populus cell wall biosynthesis using text-mining, co-expression network and comparative genomics

    SciTech Connect

    Yang, Xiaohan; Ye, Chuyu; Bisaria, Anjali; Tuskan, Gerald A; Kalluri, Udaya C

    2011-01-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database, and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high-likelihood candidates for functional genomics in relation to cell wall biosynthesis.
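
    The bait-and-expand step described in this abstract can be illustrated with a minimal sketch. It assumes the co-expression database can be reduced to a list of undirected gene pairs; the gene names, the edge list and the helper names (build_network, expand) are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def expand(adj, genes):
    """Return the query genes plus every direct network neighbor."""
    expanded = set(genes)
    for gene in genes:
        expanded |= adj.get(gene, set())
    return expanded


# Hypothetical toy co-expression edges (illustration only).
edges = [("bait1", "n1"), ("bait2", "n1"), ("n1", "n2"), ("n2", "n3"), ("x", "y")]
adj = build_network(edges)

round1 = expand(adj, {"bait1", "bait2"})  # bait genes plus first neighbors
round2 = expand(adj, round1)              # re-query with the expanded set

print(sorted(round1))
print(sorted(round2))
```

    Each round adds only direct neighbors, so genes in a disconnected part of the network (here "x" and "y") are never pulled in, whatever the number of rounds.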

  9. Mining Genes Involved in Insecticide Resistance of Liposcelis bostrychophila Badonnel by Transcriptome and Expression Profile Analysis

    PubMed Central

    Dou, Wei; Shen, Guang-Mao; Niu, Jin-Zhi; Ding, Tian-Bo; Wei, Dan-Dan; Wang, Jin-Jun

    2013-01-01

    Background: Recent studies indicate that infestations of psocids pose a new risk for global food security. Among the psocid species, Liposcelis bostrychophila Badonnel has gained recognition because of its parthenogenic reproduction, rapid adaptation, and increased worldwide distribution. To date, the molecular data available for L. bostrychophila are largely limited to genes identified through homology. Also, no transcriptome data relevant to psocids infection are available. Methodology and Principal Findings: In this study, we generated a de novo assembly of the L. bostrychophila transcriptome using short-read sequencing technology (Illumina). In a single run, we obtained more than 51 million sequencing reads that were assembled into 60,012 unigenes (mean size = 711 bp) by Trinity. The transcriptome sequences from different developmental stages of L. bostrychophila, including egg, nymph and adult, were annotated with the non-redundant (Nr) protein database, gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). The analysis revealed three major enzyme families involved in insecticide metabolism as differentially expressed in the L. bostrychophila transcriptome. A total of 49 P450-, 31 GST- and 21 CES-specific genes representing the three enzyme families were identified. In addition, 16 transcripts were identified as containing target site sequences of resistance genes. Furthermore, we profiled gene expression patterns upon insecticide (malathion and deltamethrin) exposure using the tag-based digital gene expression (DGE) method. Conclusion: The L. bostrychophila transcriptome and DGE data provide gene expression data that would further our understanding of molecular mechanisms in psocids. In particular, the findings of this investigation will facilitate identification of genes involved in insecticide resistance and the design of new compounds for control of psocids. PMID:24278202

  10. Data-mining analysis of the global distribution of soil carbon in observational databases and Earth system models

    NASA Astrophysics Data System (ADS)

    Hashimoto, Shoji; Nanko, Kazuki; Ťupek, Boris; Lehtonen, Aleksi

    2017-03-01

    Future climate change will dramatically change the carbon balance in the soil, and this change will affect the terrestrial carbon stock and the climate itself. Earth system models (ESMs) are used to understand the current climate and to project future climate conditions, but the soil organic carbon (SOC) stock simulated by ESMs and those of observational databases are not well correlated when the two are compared at fine grid scales. However, the specific key processes and factors, as well as the relationships among these factors that govern the SOC stock, remain unclear; the inclusion of such missing information would improve the agreement between modeled and observational data. In this study, we sought to identify the influential factors that govern global SOC distribution in observational databases, as well as those simulated by ESMs. We used a data-mining (machine-learning) scheme, boosted regression trees (BRT), to identify the factors affecting the SOC stock. We applied the BRT scheme to three observational databases and 15 ESM outputs from the fifth phase of the Coupled Model Intercomparison Project (CMIP5) and examined the effects of 13 variables/factors categorized into five groups (climate, soil property, topography, vegetation, and land-use history). Globally, the contributions of mean annual temperature, clay content, carbon-to-nitrogen (CN) ratio, wetland ratio, and land cover were high in observational databases, whereas the contributions of the mean annual temperature, land cover, and net primary productivity (NPP) were predominant in the SOC distribution in ESMs. A comparison of the influential factors at a global scale revealed that the most distinct differences between the SOCs from the observational databases and ESMs were the low clay content and CN ratio contributions, and the high NPP contribution in the ESMs. The results of this study will aid in identifying the causes of the current mismatches between observational SOC databases and ESM outputs

  11. Global demand for rare earth resources and strategies for green mining

    USDA-ARS's Scientific Manuscript database

    Rare earth elements (REEs) are essential raw materials for the emerging green (low-carbon) energy technologies and ‘smart’ electronic devices. Global REE demand is slated to grow at a compound annual rate of 5% by 2020. Such a high growth rate would require a steady supply base of REEs in the long ru...

  12. A GLOBAL METHANE EMISSIONS PROGRAM FOR LANDFILLS, COAL MINES, AND NATURAL GAS SYSTEMS

    EPA Science Inventory

    The paper gives the scope and methodology of EPA/AEERL's methane emissions studies and discloses data accumulated thus far in the program. Anthropogenic methane emissions are a principal focus in AEERL's global climate research program, including three major sources: municipal so...

  13. Your Place or Mine? Global Imbalances in Internationalisation and Mobilisation in Educational Professional Experience

    ERIC Educational Resources Information Center

    Buchanan, John; Widodo, Ari

    2016-01-01

    International mobility programmes and opportunities have enthusiastically been embraced by universities as part of a growing demand for graduates with global, international and intercultural capital on the part of graduates. In this project, we take two universities, one Australian and one Indonesian, as illustrative case studies of some of the…

  15. Gold Mining in the Peruvian Amazon: Global Prices, Deforestation, and Mercury Imports

    Treesearch

    Jennifer J Swenson; Catherine E Carter; Jean-Christophe Domec; Cesar I Delgado

    2011-01-01

    Many factors such as poverty, ineffective institutions and environmental regulations may prevent developing countries from managing how natural resources are extracted to meet a strong market demand. Extraction for some resources has reached such proportions that evidence is measurable from space. We present recent evidence of the global demand for a single commodity...

  17. The use and re-use of unsustainably mined groundwater: A global budget

    NASA Astrophysics Data System (ADS)

    Grogan, D. S.; Prousevitch, A.; Wisser, D.; Lammers, R. B.; Frolking, S. E.

    2015-12-01

    Many of the world's major groundwater aquifers are rapidly depleting due to unsustainable groundwater pumping, while demand for food production - and therefore demand for irrigation water - is increasing. While it is likely that groundwater users will be impacted by the future's inevitable reduction in groundwater availability, there is a major gap in our understanding of potential impacts downstream of pumping sites. Due to inefficiencies in irrigation systems, significant amounts of abstracted groundwater become runoff, entering surface waters and flowing downstream to be re-abstracted and used again. In this study, we use a gridded water balance model to calculate the amount of unsustainably pumped groundwater that enters surface water systems by way of irrigation runoff, and quantify the additional irrigation water supplied by the re-use of this water. We assess the global budget of unsustainable groundwater sources and sinks, including downstream re-use, groundwater recharge, and flow to the oceans. Globally, we find that 80% of unsustainable groundwater is re-abstracted for irrigation either downstream or locally from groundwater recharge. This re-abstracted water contributes the water equivalent needed to irrigate 200,000 km2 of cropland globally. Including irrigation runoff reuse in an assessment of irrigation efficiency, we see that the traditional concept of irrigation efficiency (net irrigation/gross irrigation) significantly overestimates water "waste". We define a basin efficiency for unsustainable groundwater use that includes re-use, and see that while global irrigation efficiency is often estimated at 50%, global average unsustainable water use efficiency is > 60%. Losing this re-use resource by increasing irrigation efficiency does little to alleviate unsustainable groundwater demands.
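
    A small sketch can show why counting re-use raises the apparent efficiency above the classical net/gross ratio. It is not the study's gridded water balance model: it assumes a single chain in which a fixed fraction of each abstraction is consumed by crops and a fixed fraction of the remainder is re-abstracted downstream. The reuse fraction of 0.4 and both function names are hypothetical.

```python
def classical_efficiency(net_irrigation, gross_irrigation):
    """Traditional irrigation efficiency: net irrigation / gross irrigation."""
    return net_irrigation / gross_irrigation


def basin_efficiency(field_efficiency, reuse_fraction, tol=1e-12):
    """Fraction of one unit of pumped water eventually consumed by crops
    when non-consumed water can be re-abstracted and used again.

    Assumed model: at every use, `field_efficiency` of the water is
    consumed; `reuse_fraction` of the remainder is re-abstracted.
    """
    if not 0 < field_efficiency <= 1 or not 0 <= reuse_fraction <= 1:
        raise ValueError("fractions must lie in (0, 1] and [0, 1]")
    remaining = 1.0   # water available for the current use
    consumed = 0.0    # water consumed by crops so far
    while remaining > tol:
        consumed += remaining * field_efficiency
        remaining *= (1 - field_efficiency) * reuse_fraction
    return consumed


field = classical_efficiency(net_irrigation=50.0, gross_irrigation=100.0)
basin = basin_efficiency(field, reuse_fraction=0.4)  # hypothetical reuse fraction
print(field, round(basin, 4))
```

    Under these assumptions the loop sums a geometric series, so the result equals field_efficiency / (1 - (1 - field_efficiency) * reuse_fraction).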

  18. Nitrogenase gene amplicons from global marine surface waters are dominated by genes of non-cyanobacteria.

    PubMed

    Farnelid, Hanna; Andersson, Anders F; Bertilsson, Stefan; Al-Soud, Waleed Abu; Hansen, Lars H; Sørensen, Søren; Steward, Grieg F; Hagström, Åke; Riemann, Lasse

    2011-04-29

    Cyanobacteria are thought to be the main N(2)-fixing organisms (diazotrophs) in marine pelagic waters, but recent molecular analyses indicate that non-cyanobacterial diazotrophs are also present and active. Existing data are, however, restricted geographically and by limited sequencing depths. Our analysis of 79,090 nitrogenase (nifH) PCR amplicons encoding 7,468 unique proteins from surface samples (ten DNA samples and two RNA samples) collected at ten marine locations world-wide provides the first in-depth survey of a functional bacterial gene and yields insights into the composition and diversity of the nifH gene pool in marine waters. Great divergence in nifH composition was observed between sites. Cyanobacteria-like genes were most frequent among amplicons from the warmest waters, but overall the data set was dominated by nifH sequences most closely related to non-cyanobacteria. Clusters related to Alpha-, Beta-, Gamma-, and Delta-Proteobacteria were most common and showed distinct geographic distributions. Sequences related to anaerobic bacteria (nifH Cluster III) were generally rare, but preponderant in cold waters, especially in the Arctic. Although the two transcript samples were dominated by unicellular cyanobacteria, 42% of the identified non-cyanobacterial nifH clusters from the corresponding DNA samples were also detected in cDNA. The study indicates that non-cyanobacteria account for a substantial part of the nifH gene pool in marine surface waters and that these genes are at least occasionally expressed. The contribution of non-cyanobacterial diazotrophs to the global N(2) fixation budget cannot be inferred from sequence data alone, but the prevalence of non-cyanobacterial nifH genes and transcripts suggests that these bacteria are ecologically significant.

  19. Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database

    PubMed Central

    Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.

    2013-01-01

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709

  20. Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database.

    PubMed

    Davis, Allan Peter; Wiegers, Thomas C; Johnson, Robin J; Lay, Jean M; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J

    2013-01-01

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency.

  1. Mining topological structures of protein-protein interaction networks for human brain-specific genes.

    PubMed

    Cui, W J; Gong, X J; Yu, H; Zhang, X C

    2015-10-16

    Compared to other placental mammals, humans have unique thinking and cognitive abilities because of their developed cerebral cortex composed of billions of neurons and synaptic connections. As the primary effectors of the mechanisms of life, proteins and their interactions form the basis of cellular and molecular functions in the living body. In this paper, we developed a pipeline for mining topological structures, identifying functional modules, and analyzing their functions from publicly available datasets. A human brain-specific protein-protein interaction network with 1482 nodes and 3105 edges was built using a MapReduce-based shortest path algorithm. Within this network, 7 functional cliques were identified using a network clustering method, 98 hub proteins were obtained by calculating betweenness and connectivity, and the 5 connector proteins most closely related to the cliques were recognized by combining scores of topological distance and gene ontology similarity. Furthermore, we discovered functional modules interacting with the TP53 protein, which draw together the conclusions of several fragmented research studies and might be an important clue for further in vivo or in silico experiments to confirm these associations.
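
    The hub-identification step (betweenness and connectivity) can be sketched on a toy network. The example below uses Brandes' algorithm for unweighted graphs and node degree as "connectivity"; the protein names and edges are hypothetical, and the abstract does not state which implementation or thresholds the authors used.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality of an undirected, unweighted
    graph given as {node: set(neighbors)} (Brandes' algorithm)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # shortest-path predecessors
        sigma = dict.fromkeys(adj, 0)       # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):           # accumulate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


# Hypothetical toy interaction network (illustration only).
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P3", "P5")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

bc = betweenness(adj)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
hubs = sorted(adj, key=lambda v: (bc[v], degree[v]), reverse=True)
print(hubs[:2], bc, degree)
```

    Ranking by betweenness first and degree second is only one way to combine the two measures; a real pipeline would need an explicit cutoff for what counts as a hub.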

  2. ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining

    PubMed Central

    Lee, Myunggyo; Lee, Kyubum; Yu, Namhee; Jang, Insu; Choi, Ikjung; Kim, Pora; Jang, Ye Eun; Kim, Byounggun; Kim, Sunkyu; Lee, Byungwook; Kang, Jaewoo; Lee, Sanghyuk

    2017-01-01

    Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curation. In this update, the database coverage was enhanced considerably by adding two new modules: The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub and ChimerSeq. ChimerKB is a knowledgebase of 1066 manually curated fusion genes compiled from public resources of fusion genes with experimental evidence. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive the fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data of the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/. PMID:27899563

  3. ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining.

    PubMed

    Lee, Myunggyo; Lee, Kyubum; Yu, Namhee; Jang, Insu; Choi, Ikjung; Kim, Pora; Jang, Ye Eun; Kim, Byounggun; Kim, Sunkyu; Lee, Byungwook; Kang, Jaewoo; Lee, Sanghyuk

    2017-01-04

    Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curation. In this update, the database coverage was enhanced considerably by adding two new modules: The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub and ChimerSeq. ChimerKB is a knowledgebase of 1066 manually curated fusion genes compiled from public resources of fusion genes with experimental evidence. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive the fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data of the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Isolation and characterisation of mineral-oxidising "Acidibacillus" spp. from mine sites and geothermal environments in different global locations.

    PubMed

    Holanda, Roseanne; Hedrich, Sabrina; Ňancucheo, Ivan; Oliveira, Guilherme; Grail, Barry M; Johnson, D Barrie

    2016-09-01

    Eight strains of acidophilic bacteria, isolated from mine-impacted and geothermal sites from different parts of the world, were shown to form a distinct clade (proposed genus "Acidibacillus") within the phylum Firmicutes, well separated from the acidophilic genera Sulfobacillus and Alicyclobacillus. Two of the strains (both isolated from sites in Yellowstone National Park, USA) were moderate thermophiles that oxidised both ferrous iron and elemental sulphur, while the other six were mesophiles that also oxidised ferrous iron, but not sulphur. All eight isolates reduced ferric iron to varying degrees. The two groups shared <95% similarity of their 16S rRNA genes and were therefore considered to be distinct species: "Acidibacillus sulfuroxidans" (moderately thermophilic isolates) and "Acidibacillus ferrooxidans" (mesophilic isolates). Both species were obligate heterotrophs; none of the eight strains grew in the absence of organic carbon. "Acidibacillus" spp. were generally highly tolerant of elevated concentrations of cationic transition metals, though "A. sulfuroxidans" strains were more sensitive to some (e.g. nickel and zinc) than those of "A. ferrooxidans". Initial annotation of the genomes of two strains of "A. ferrooxidans" revealed the presence of genes (cbbL) involved in the RuBisCO pathway for CO2 assimilation and iron oxidation (rus), though with relatively low sequence identities.

  5. Mountaintop mining consequences

    Treesearch

    M.A. Palmer; E.S. Bernhardt; W.H. Schlesinger; K.N. Eshleman; E. Foufoula-Georgiou; M.S. Hendryx; A.D. Lemly; G.E. Likens; O.L. Loucks; M.E. Power; P.S. White; P.R. Wilcock

    2010-01-01

    There has been a global, 30-year increase in surface mining (1), which is now the dominant driver of land-use change in the central Appalachian ecoregion of the United States (2). One major form of such mining, mountaintop mining with valley fills (MTM/VF) (3), is widespread throughout eastern Kentucky, West Virginia (WV), and southwestern Virginia. Upper elevation...

  6. bc-GenExMiner 3.0: new mining module computes breast cancer gene expression correlation analyses.

    PubMed

    Jézéquel, Pascal; Frénel, Jean-Sébastien; Campion, Loïc; Guérin-Charbonnel, Catherine; Gouraud, Wilfried; Ricolleau, Gabriel; Campone, Mario

    2013-01-01

    We recently developed a user-friendly web-based application called bc-GenExMiner (http://bcgenex.centregauducheau.fr), which offered the possibility to evaluate prognostic informativity of genes in breast cancer by means of a 'prognostic module'. In this study, we develop a new module called 'correlation module', which includes three kinds of gene expression correlation analyses. The first one computes correlation coefficient between 2 or more (up to 10) chosen genes. The second one produces two lists of genes that are most correlated (positively and negatively) to a 'tested' gene. A gene ontology (GO) mining function is also proposed to explore GO 'biological process', 'molecular function' and 'cellular component' terms enrichment for the output lists of most correlated genes. The third one explores gene expression correlation between the 15 telomeric and 15 centromeric genes surrounding a 'tested' gene. These correlation analyses can be performed in different groups of patients: all patients (without any subtyping), in molecular subtypes (basal-like, HER2+, luminal A and luminal B) and according to oestrogen receptor status. Validation tests based on published data showed that these automatized analyses lead to results consistent with studies' conclusions. In brief, this new module has been developed to help basic researchers explore molecular mechanisms of breast cancer. DATABASE URL: http://bcgenex.centregauducheau.fr
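
    The first kind of analysis, a correlation coefficient between each pair of chosen genes, can be sketched as follows. The abstract does not name the coefficient, so Pearson's r is an assumption here, and the gene labels and expression values are made up for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression values for three genes across four patients.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}

# One coefficient per unordered pair of chosen genes.
pairwise = {
    (g1, g2): pearson(expression[g1], expression[g2])
    for g1, g2 in combinations(sorted(expression), 2)
}
for pair, r in pairwise.items():
    print(pair, round(r, 3))
```

    Restricting the patient columns to one subgroup (for example a molecular subtype) before calling pearson would mimic the per-group analyses the abstract describes.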

  7. bc-GenExMiner 3.0: new mining module computes breast cancer gene expression correlation analyses

    PubMed Central

    Jézéquel, Pascal; Frénel, Jean-Sébastien; Campion, Loïc; Guérin-Charbonnel, Catherine; Gouraud, Wilfried; Ricolleau, Gabriel; Campone, Mario

    2013-01-01

    We recently developed a user-friendly web-based application called bc-GenExMiner (http://bcgenex.centregauducheau.fr), which offered the possibility to evaluate prognostic informativity of genes in breast cancer by means of a ‘prognostic module’. In this study, we develop a new module called ‘correlation module’, which includes three kinds of gene expression correlation analyses. The first one computes correlation coefficient between 2 or more (up to 10) chosen genes. The second one produces two lists of genes that are most correlated (positively and negatively) to a ‘tested’ gene. A gene ontology (GO) mining function is also proposed to explore GO ‘biological process’, ‘molecular function’ and ‘cellular component’ terms enrichment for the output lists of most correlated genes. The third one explores gene expression correlation between the 15 telomeric and 15 centromeric genes surrounding a ‘tested’ gene. These correlation analyses can be performed in different groups of patients: all patients (without any subtyping), in molecular subtypes (basal-like, HER2+, luminal A and luminal B) and according to oestrogen receptor status. Validation tests based on published data showed that these automatized analyses lead to results consistent with studies’ conclusions. In brief, this new module has been developed to help basic researchers explore molecular mechanisms of breast cancer. Database URL: http://bcgenex.centregauducheau.fr PMID:23325629

  8. Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD)

    PubMed Central

    Wiegers, Thomas C; Davis, Allan Peter; Cohen, K Bretonnel; Hirschman, Lynette; Mattingly, Carolyn J

    2009-01-01

    Background: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage. Results: Prototype text-mining applications were developed and evaluated using a CTD data set consisting of manually curated molecular interactions and relationships from 1,600 documents. Preliminary results indicated that the prototype found 80% of the gene, chemical, and disease terms appearing in curated interactions. These terms were used to re-rank documents for curation, resulting in increases in mean average precision (63% for the baseline vs. 73% for a rule-based re-ranking), and in the correlation coefficient of rank vs. number of curatable interactions per document (baseline 0.14 vs. 0.38 for the rule-based re-ranking). Conclusion: This text-mining project is unique in its integration of existing tools into a single workflow with direct application to CTD. We performed a baseline assessment of the inter-curator consistency and coverage in CTD, which allowed us to measure the potential of these integrated tools to improve prioritization of journal articles for manual curation. Our study presents a feasible and cost-effective approach for developing a text mining solution to enhance manual curation throughput and efficiency. PMID:19814812
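
    Mean average precision, the ranking metric quoted in this abstract, can be computed as in the sketch below. It uses the standard definition (precision averaged over the ranks of the relevant documents, then averaged over rankings); the relevance lists are hypothetical and do not reproduce the 63% and 73% figures reported by the study.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list of True/False relevance flags."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision over several ranked lists."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical rankings: True marks a document worth curating.
baseline = [[False, True, False, True], [True, False, False, True]]
reranked = [[True, True, False, False], [True, True, False, False]]

print(mean_average_precision(baseline))
print(mean_average_precision(reranked))
```

    Moving relevant documents toward the top of each list raises the score, which is why the metric suits the evaluation of a re-ranking step.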

  9. Text mining and manual curation of chemical-gene-disease networks for the comparative toxicogenomics database (CTD).

    PubMed

    Wiegers, Thomas C; Davis, Allan Peter; Cohen, K Bretonnel; Hirschman, Lynette; Mattingly, Carolyn J

    2009-10-08

    The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage. Prototype text-mining applications were developed and evaluated using a CTD data set consisting of manually curated molecular interactions and relationships from 1,600 documents. Preliminary results indicated that the prototype found 80% of the gene, chemical, and disease terms appearing in curated interactions. These terms were used to re-rank documents for curation, resulting in increases in mean average precision (63% for the baseline vs. 73% for a rule-based re-ranking), and in the correlation coefficient of rank vs. number of curatable interactions per document (baseline 0.14 vs. 0.38 for the rule-based re-ranking). This text-mining project is unique in its integration of existing tools into a single workflow with direct application to CTD. We performed a baseline assessment of the inter-curator consistency and coverage in CTD, which allowed us to measure the potential of these integrated tools to improve prioritization of journal articles for manual curation. Our study presents a feasible and cost-effective approach for developing a text mining solution to enhance manual curation throughput and efficiency.

  10. Functions and Unique Diversity of Genes and Microorganisms Involved in Arsenite Oxidation from the Tailings of a Realgar Mine.

    PubMed

    Zeng, Xian-Chun; E, Guoji; Wang, Jianing; Wang, Nian; Chen, Xiaoming; Mu, Yao; Li, Hao; Yang, Ye; Liu, Yichen; Wang, Yanxin

    2016-12-15

    The tailings of the Shimen realgar mine have unique geochemical features. Arsenite oxidation is one of the major biogeochemical processes that occurs in the tailings. However, little is known about the functional and molecular aspects of the microbial community involved in arsenite oxidation. Here, we fully explored the functional and molecular features of the microbial communities from the tailings of the Shimen realgar mine. We collected six samples of tailings from sites A, B, C, D, E, and F. Microcosm assays indicated that all of the six sites contain both chemoautotrophic and heterotrophic arsenite-oxidizing microorganisms; their activities differed considerably from each other. The microbial arsenite-oxidizing activities show a positive correlation with soluble arsenic concentrations. The microbial communities of the six sites contain 40 phyla of bacteria and 2 phyla of archaea that show extremely high diversity. Soluble arsenic, sulfate, pH, and total organic carbon (TOC) are the key environmental factors that shape the microbial communities. We further identified 114 unique arsenite oxidase genes from the samples; all of them code for new or new-type arsenite oxidases. We also isolated 10 novel arsenite oxidizers from the samples, of which 4 are chemoautotrophic and 6 are heterotrophic. These data highlight the unique diversities of the arsenite-oxidizing microorganisms and their oxidase genes from the tailings of the Shimen realgar mine. To the best of our knowledge, this is the first report describing the functional and molecular features of microbial communities from the tailings of a realgar mine. This study focused on the functional and molecular characterizations of microbial communities from the tailings of the Shimen realgar mine. We fully explored, for the first time, the arsenite-oxidizing activities and the functional gene diversities of microorganisms from the tailings, as well as the correlation of the microbial activities/diversities with

  11. Functions and Unique Diversity of Genes and Microorganisms Involved in Arsenite Oxidation from the Tailings of a Realgar Mine

    PubMed Central

    E, Guoji; Wang, Jianing; Wang, Nian; Chen, Xiaoming; Mu, Yao; Li, Hao; Yang, Ye; Liu, Yichen; Wang, Yanxin

    2016-01-01

    ABSTRACT The tailings of the Shimen realgar mine have unique geochemical features. Arsenite oxidation is one of the major biogeochemical processes that occurs in the tailings. However, little is known about the functional and molecular aspects of the microbial community involved in arsenite oxidation. Here, we fully explored the functional and molecular features of the microbial communities from the tailings of the Shimen realgar mine. We collected six samples of tailings from sites A, B, C, D, E, and F. Microcosm assays indicated that all of the six sites contain both chemoautotrophic and heterotrophic arsenite-oxidizing microorganisms; their activities differed considerably from each other. The microbial arsenite-oxidizing activities show a positive correlation with soluble arsenic concentrations. The microbial communities of the six sites contain 40 phyla of bacteria and 2 phyla of archaea that show extremely high diversity. Soluble arsenic, sulfate, pH, and total organic carbon (TOC) are the key environmental factors that shape the microbial communities. We further identified 114 unique arsenite oxidase genes from the samples; all of them code for new or new-type arsenite oxidases. We also isolated 10 novel arsenite oxidizers from the samples, of which 4 are chemoautotrophic and 6 are heterotrophic. These data highlight the unique diversities of the arsenite-oxidizing microorganisms and their oxidase genes from the tailings of the Shimen realgar mine. To the best of our knowledge, this is the first report describing the functional and molecular features of microbial communities from the tailings of a realgar mine. IMPORTANCE This study focused on the functional and molecular characterizations of microbial communities from the tailings of the Shimen realgar mine. We fully explored, for the first time, the arsenite-oxidizing activities and the functional gene diversities of microorganisms from the tailings, as well as the correlation of the microbial activities

  12. Identification of novel target genes for safer and more specific control of root-knot nematodes from a pan-genome mining.

    PubMed

    Danchin, Etienne G J; Arguel, Marie-Jeanne; Campan-Fournier, Amandine; Perfus-Barbeoch, Laetitia; Magliano, Marc; Rosso, Marie-Noëlle; Da Rocha, Martine; Da Silva, Corinne; Nottet, Nicolas; Labadie, Karine; Guy, Julie; Artiguenave, François; Abad, Pierre

    2013-10-01

    Root-knot nematodes are globally the most aggressive and damaging plant-parasitic nematodes. Chemical nematicides have so far constituted the most efficient control measures against these agricultural pests. Because of their toxicity for the environment and danger for human health, these nematicides have now been banned from use. Consequently, new and more specific control means, safe for the environment and human health, are urgently needed to avoid worldwide proliferation of these devastating plant-parasites. Mining the genomes of root-knot nematodes through an evolutionary and comparative genomics approach, we identified and analyzed 15,952 nematode genes conserved in genomes of plant-damaging species but absent from non target genomes of chordates, plants, annelids, insect pollinators and mollusks. Functional annotation of the corresponding proteins revealed a relative abundance of putative transcription factors in this parasite-specific set compared to whole proteomes of root-knot nematodes. This may point to important and specific regulators of genes involved in parasitism. Because these nematodes are known to secrete effector proteins in planta, essential for parasitism, we searched and identified 993 such effector-like proteins absent from non-target species. Aiming at identifying novel targets for the development of future control methods, we biologically tested the effect of inactivation of the corresponding genes through RNA interference. A total of 15 novel effector-like proteins and one putative transcription factor compatible with the design of siRNAs were present as non-redundant genes and had transcriptional support in the model root-knot nematode Meloidogyne incognita. Infestation assays with siRNA-treated M. incognita on tomato plants showed significant and reproducible reduction of the infestation for 12 of the 16 tested genes compared to control nematodes. These 12 novel genes, showing efficient reduction of parasitism when silenced, constitute

  13. Identification of Novel Target Genes for Safer and More Specific Control of Root-Knot Nematodes from a Pan-Genome Mining

    PubMed Central

    Danchin, Etienne G. J.; Perfus-Barbeoch, Laetitia; Magliano, Marc; Rosso, Marie-Noëlle; Da Rocha, Martine; Da Silva, Corinne; Nottet, Nicolas; Labadie, Karine; Guy, Julie; Artiguenave, François; Abad, Pierre

    2013-01-01

    Root-knot nematodes are globally the most aggressive and damaging plant-parasitic nematodes. Chemical nematicides have so far constituted the most efficient control measures against these agricultural pests. Because of their toxicity for the environment and danger for human health, these nematicides have now been banned from use. Consequently, new and more specific control means, safe for the environment and human health, are urgently needed to avoid worldwide proliferation of these devastating plant-parasites. Mining the genomes of root-knot nematodes through an evolutionary and comparative genomics approach, we identified and analyzed 15,952 nematode genes conserved in genomes of plant-damaging species but absent from non target genomes of chordates, plants, annelids, insect pollinators and mollusks. Functional annotation of the corresponding proteins revealed a relative abundance of putative transcription factors in this parasite-specific set compared to whole proteomes of root-knot nematodes. This may point to important and specific regulators of genes involved in parasitism. Because these nematodes are known to secrete effector proteins in planta, essential for parasitism, we searched and identified 993 such effector-like proteins absent from non-target species. Aiming at identifying novel targets for the development of future control methods, we biologically tested the effect of inactivation of the corresponding genes through RNA interference. A total of 15 novel effector-like proteins and one putative transcription factor compatible with the design of siRNAs were present as non-redundant genes and had transcriptional support in the model root-knot nematode Meloidogyne incognita. Infestation assays with siRNA-treated M. incognita on tomato plants showed significant and reproducible reduction of the infestation for 12 of the 16 tested genes compared to control nematodes. These 12 novel genes, showing efficient reduction of parasitism when silenced, constitute

  14. Multi-edge gene set networks reveal novel insights into global relationships between biological themes.

    PubMed

    Parikh, Jignesh R; Xia, Yu; Marto, Jarrod A

    2012-01-01

    Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of the approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.
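    The co-membership edge type mentioned above can be illustrated with a small sketch. The abstract does not specify how co-membership is scored, so the Jaccard overlap, the `min_jaccard` threshold, and the function name below are all assumptions made for illustration; this is not MetaNet's algorithm.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def co_membership_edges(
    gene_sets: Dict[str, Set[str]], min_jaccard: float = 0.1
) -> List[Tuple[str, str, float]]:
    """Connect two gene sets when they share enough member genes.

    The edge weight is the Jaccard index (shared genes / all genes in
    either set); this scoring choice is hypothetical.
    """
    edges = []
    # Sort by name so the edge order is deterministic.
    for (name_a, genes_a), (name_b, genes_b) in combinations(
        sorted(gene_sets.items()), 2
    ):
        union = genes_a | genes_b
        if not union:
            continue  # two empty sets carry no evidence
        score = len(genes_a & genes_b) / len(union)
        if score >= min_jaccard:
            edges.append((name_a, name_b, score))
    return edges
```

    With three toy sets, `{"A": {"g1", "g2", "g3"}, "B": {"g2", "g3", "g4"}, "C": {"g9"}}`, only A and B are linked, because C shares no genes with either. A multi-edge network of the kind described would keep several such edge lists, one per evidence type, over the same nodes.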

  15. Molecular Networking and Pattern-Based Genome Mining Improves discovery of biosynthetic gene clusters and their products from Salinispora species

    PubMed Central

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.

    2015-01-01

    Summary Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308

  16. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

    SciTech Connect

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.

    2015-04-09

    Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.

  17. Relationships between subducting bathymetric ridges and significant subduction earthquakes from global geophysical data mining

    NASA Astrophysics Data System (ADS)

    Müller, R.; Landgrebe, T. C.

    2012-12-01

    The subduction of linear bathymetric asperities has been linked with the location and rupture characteristics of significant subduction earthquakes in many regions. This suggests that earthquake occurrence is biased toward the subduction of particular types of ocean floor fabric that have formed over tens or hundreds of millions of years, but have only recently been transported into the subduction coupling zone as a consequence of long-term plate tectonic processes. Open-access geophysical data sets offer the opportunity to carry out global investigations of the spatial association between significant earthquakes and well-defined subducting bathymetric features including volcanic ridges, fracture zones and seamount chains. We filter a global significant earthquake database to separate events from the subduction coupling zone only. The coupling zone is established by integrating recent 3-dimensional models of subducting slabs and the lithospheric thickness of overriding plates. A statistical methodology is used to compare spatial associations between subducting linear asperities and significant earthquakes with randomly chosen coupling zone locations to establish sensitivity/specificity relationships as a function of proximity, ruling out random effects and establishing meaningful spatial interpretations for hazard analysis. Our association analysis reveals that significant earthquakes are significantly biased towards localities involving both subducting fracture zones and volcanic ridges/chains. Fracture zone intersections are found to exhibit a stronger association within 50 km proximity that rapidly diminishes with increasing distance from the targeted regions, whereas volcanic ridges/chains demonstrate a smaller but broader effect. Fracture zone intersections also display strong relationships with earthquakes with moment magnitudes greater than or equal to 8.5, whereas the opposite is the case for volcanic ridges/seamount chains, associated strongly only with events

  18. Banking biological collections: data warehousing, data mining, and data dilemmas in genomics and global health policy.

    PubMed

    Blatt, R J R

    2000-01-01

    While DNA databases may offer the opportunity to (1) assess population-based prevalence of specific genes and variants, (2) simplify the search for molecular markers, (3) improve targeted drug discovery and development for disease management, (4) refine strategies for disease prevention, and (5) provide the data necessary for evidence-based decision-making, serious scientific and social questions remain. Whether samples are identified, coded, or anonymous, biological banking raises profound ethical and legal issues pertaining to access, informed consent, privacy and confidentiality of genomic information, civil liberties, patenting, and proprietary rights. This paper provides an overview of key policy issues and questions pertaining to biological banking, with a focus on developments in specimen collection, transnational distribution, and public health and academic-industry research alliances. It highlights the challenges posed by the commercialization of genomics, and proposes the need for harmonization of biological banking policies.

  19. TCMGeneDIT: a database for associated traditional Chinese medicine, gene and disease information using text mining.

    PubMed

    Fang, Yu-Ching; Huang, Hsuan-Cheng; Chen, Hsin-Hsi; Juan, Hsueh-Fen

    2008-10-14

    Traditional Chinese Medicine (TCM), a complementary and alternative medical system in Western countries, has been used to treat various diseases over thousands of years in East Asian countries. In recent years, many herbal medicines were found to exhibit a variety of effects through regulating a wide range of gene expressions or protein activities. As available TCM data continue to accumulate rapidly, an urgent need for exploring these resources systematically is imperative, so as to effectively utilize the large volume of literature. TCM, gene, disease, biological pathway and protein-protein interaction information were collected from public databases. For association discovery, the TCM names, gene names, disease names, TCM ingredients and effects were used to annotate the literature corpus obtained from PubMed. The concept to mine entity associations was based on hypothesis testing and collocation analysis. The annotated corpus was processed with natural language processing tools and rule-based approaches were applied to the sentences for extracting the relations between TCM effectors and effects. We developed a database, TCMGeneDIT, to provide association information about TCMs, genes, diseases, TCM effects and TCM ingredients mined from vast amount of biomedical literature. Integrated protein-protein interaction and biological pathways information are also available for exploring the regulations of genes associated with TCM curative effects. In addition, the transitive relationships among genes, TCMs and diseases could be inferred through the shared intermediates. Furthermore, TCMGeneDIT is useful in understanding the possible therapeutic mechanisms of TCMs via gene regulations and deducing synergistic or antagonistic contributions of the prescription components to the overall therapeutic effects. The database is now available at http://tcm.lifescience.ntu.edu.tw/. TCMGeneDIT is a unique database that offers diverse association information on TCMs. This

  20. Mining Metatranscriptomic Data of a Cyanobacterial Bloom for Patterns of Secondary Metabolism Gene Expression

    NASA Astrophysics Data System (ADS)

    Penn, K.; Wang, J.; Thompson, J. R.

    2012-12-01

    The secondary metabolism of bacterial cells produces small molecules that can have both medicinal properties and toxigenic effects. This study focuses on mining metatranscriptomes from a tropical eutrophic water reservoir in Singapore experiencing a cyanobacterial Harmful Algal Bloom dominated by Microcystis, to identify the types of secondary metabolite genes being expressed and by what taxa. A phylogenomic approach as implemented in the online tool Natural Product Domain Seeker (NaPDoS) was used. NaPDoS was recently developed to classify ketosynthase and condensation domains from polyketide synthases and non-ribosomal peptide synthetases, respectively, to provide insight into potential types of pathway products. Water samples from the reservoir were collected six times over a day/night cycle. Total RNA was extracted and subjected to ribosomal depletion followed by cDNA synthesis and next-generation Illumina DNA sequencing, generating 493,468 to 678,064 post-quality-control reads of 95-101 base pairs per sample. Evidence for expression of PKS- and NRPS-type genes, based on identification of ketosynthase and condensation domains, is present at all time points. KS domains fall into two main phylogenetic groups, type I and type II; the type II group includes domains for fatty acid biosynthesis (fab), which is considered a part of primary metabolism. Type I KS domains are part of the classic PKS natural product biosynthetic genes that produce compounds such as antibiotics and toxins such as microcystin. A total of 2849 KS domains were detected in the combined reservoir samples; of these, 1141 were likely from fatty acid biosynthesis and 1708 were related to secondary-metabolism-type KS domains. The most abundant KS domains (485) besides the fab genes are closely related to a KS domain that is not currently experimentally linked to a known secondary metabolite, but the domain is found in four Microcystis genomes along with two other species of cyanobacteria. The three

  1. The Influence of the Global Gene Expression Shift on Downstream Analyses.

    PubMed

    Xu, Qifeng; Zhang, Xuegong

    2016-01-01

    The assumption that total abundance of RNAs in a cell is roughly the same in different cells is underlying most studies based on gene expression analyses. But experiments have shown that changes in the expression of some master regulators such as c-MYC can cause global shift in the expression of almost all genes in some cell types like cancers. Such shift will violate this assumption and can cause wrong or biased conclusions for standard data analysis practices, such as detection of differentially expressed (DE) genes and molecular classification of tumors based on gene expression. Most existing gene expression data were generated without considering this possibility, and are therefore at the risk of having produced unreliable results if such global shift effect exists in the data. To evaluate this risk, we conducted a systematic study on the possible influence of the global gene expression shift effect on differential expression analysis and on molecular classification analysis. We collected data with known global shift effect and also generated data to simulate different situations of the effect based on a wide collection of real gene expression data, and conducted comparative studies on representative existing methods. We observed that some DE analysis methods are more tolerant to the global shift while others are very sensitive to it. Classification accuracy is not sensitive to the shift and actually can benefit from it, but genes selected for the classification can be greatly affected.
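    The way a global shift can undermine standard analyses may be easier to see in a toy calculation. The sketch below uses invented counts and a simple total-count normalization (both are assumptions for illustration, not the data or the methods evaluated in the study): when every gene doubles, the normalized profiles are identical, so the shift becomes invisible.

```python
import math
from typing import Dict


def total_count_normalize(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale counts so they sum to 1, which implicitly assumes that total
    RNA abundance is the same in every sample."""
    total = sum(counts.values())
    return {gene: value / total for gene, value in counts.items()}


# Hypothetical expression levels in a baseline cell.
baseline = {"geneA": 100.0, "geneB": 300.0, "geneC": 600.0}
# A global shift: every gene is expressed at twice the baseline level.
shifted = {gene: 2.0 * value for gene, value in baseline.items()}

norm_baseline = total_count_normalize(baseline)
norm_shifted = total_count_normalize(shifted)

# After normalization the two profiles match gene for gene, so a
# comparison based on them would report no differential expression.
shift_hidden = all(
    math.isclose(norm_baseline[gene], norm_shifted[gene]) for gene in baseline
)
```

    Here `shift_hidden` evaluates to `True` even though every gene changed twofold, which is the kind of biased conclusion the abstract warns about.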

  2. Constructing a molecular interaction network for thyroid cancer via large-scale text mining of gene and pathway events.

    PubMed

    Wu, Chengkun; Schwartz, Jean-Marc; Brabant, Georg; Peng, Shao-Liang; Nenadic, Goran

    2015-01-01

    Biomedical studies need assistance from automated tools and easily accessible data to address the problem of the rapidly accumulating literature. Text-mining tools and curated databases have been developed to address such needs and they can be applied to improve the understanding of molecular pathogenesis of complex diseases like thyroid cancer. We have developed a system, PWTEES, which extracts pathway interactions from the literature utilizing an existing event extraction tool (TEES) and pathway named entity recognition (PathNER). We then applied the system on a thyroid cancer corpus and systematically extracted molecular interactions involving either genes or pathways. With the extracted information, we constructed a molecular interaction network taking genes and pathways as nodes. Using curated pathway information and network topological analyses, we highlight key genes and pathways involved in thyroid carcinogenesis. Mining events involving genes and pathways from the literature and integrating curated pathway knowledge can help improve the understanding of molecular interactions of complex diseases. The system developed for this study can be applied in studies other than thyroid cancer. The source code is freely available online at https://github.com/chengkun-wu/PWTEES.

  3. The systems genetics resource: a web application to mine global data for complex disease traits.

    PubMed

    van Nas, Atila; Pan, Calvin; Ingram-Drake, Leslie A; Ghazalpour, Anatole; Drake, Thomas A; Sobel, Eric M; Papp, Jeanette C; Lusis, Aldons J

    2013-01-01

    The Systems Genetics Resource (SGR) (http://systems.genetics.ucla.edu) is a new open-access web application and database that contains genotypes and clinical and intermediate phenotypes from both human and mouse studies. The mouse data include studies using crosses between specific inbred strains and studies using the Hybrid Mouse Diversity Panel. SGR is designed to assist researchers studying genes and pathways contributing to complex disease traits, including obesity, diabetes, atherosclerosis, heart failure, osteoporosis, and lipoprotein metabolism. Over the next few years, we hope to add data relevant to deafness, addiction, hepatic steatosis, toxin responses, and vascular injury. The intermediate phenotypes include expression array data for a variety of tissues and cultured cells, metabolite levels, and protein levels. Pre-computed tables of genetic loci controlling intermediate and clinical phenotypes, as well as phenotype correlations, are accessed via a user-friendly web interface. The web site includes detailed protocols for all of the studies. Data from published studies are freely available; unpublished studies have restricted access during their embargo period.

  4. Biotic Stress Globally Down-Regulates Photosynthesis Genes

    USDA-ARS's Scientific Manuscript database

    Upon herbivore and pathogen attacks, plants switch from processes supporting growth and reproduction to defense by inducing a set of defense genes and down-regulating most of the nuclear encoded photosynthetic genes. To determine if this transcriptional response is universal we used transcriptome da...

  5. Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations.

    PubMed

    Torbati, Mahbaneh Eshaghzadeh; Mitreva, Makedonka; Gopalakrishnan, Vanathi

    2016-12-01

    Human microbiome data from genomic sequencing technologies is fast accumulating, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection from parasitic worms, such as helminths, can help in the detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data produces the challenge of needing more observations for accurate predictive modeling and has been dealt with previously, using different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping would exist among members of each individual cluster, grouped based on phylogenetic dependency among taxa and their association to the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. In this paper, we test our hypothesis by using classification models that detect helminth infection in microbiota of human fecal samples obtained from Indonesia and Liberia countries. In our experiments, we first learn binary classifiers for helminth infection detection by using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the

  6. Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations

    PubMed Central

    Torbati, Mahbaneh Eshaghzadeh; Mitreva, Makedonka; Gopalakrishnan, Vanathi

    2017-01-01

    Human microbiome data from genomic sequencing technologies is fast accumulating, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection from parasitic worms, such as helminths, can help in the detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data produces the challenge of needing more observations for accurate predictive modeling and has been dealt with previously, using different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping would exist among members of each individual cluster, grouped based on phylogenetic dependency among taxa and their association to the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. In this paper, we test our hypothesis by using classification models that detect helminth infection in microbiota of human fecal samples obtained from Indonesia and Liberia countries. In our experiments, we first learn binary classifiers for helminth infection detection by using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the

  7. Global gene expression analysis of reactive stroma in prostate cancer.

    PubMed

    Dakhova, Olga; Ozen, Mustafa; Creighton, Chad J; Li, Rile; Ayala, Gustavo; Rowley, David; Ittmann, Michael

    2009-06-15

    Marked reactive stroma formation, designated as grade 3 reactive stroma, is associated with poor outcome in clinically localized prostate cancer. To understand the biological processes and signaling mechanisms underlying the formation of such reactive stroma, we carried out microarray gene expression analysis of laser-captured reactive stroma and matched normal stroma. Seventeen cases of reactive stroma grade 3 cancer were used to laser-capture tumor and normal stroma. Expression analysis was carried out using Agilent 44K arrays. Up-regulation of selected genes was confirmed by quantitative reverse transcription-PCR. Expression data was analyzed to identify significantly up- and down-regulated genes, and gene ontology analysis was used to define pathways altered in reactive stroma. A total of 544 unique genes were significantly higher in the reactive stroma and 606 unique genes were lower. Gene ontology analysis revealed significant alterations in a number of novel processes in prostate cancer reactive stroma, including neurogenesis, axonogenesis, and the DNA damage/repair pathways, as well as evidence of increases in stem cells in prostate cancer reactive stroma. Formation of reactive stroma in prostate cancer is a dynamic process characterized by significant alterations in growth factor and signal transduction pathways and formation of new structures, including nerves and axons.

  8. Global Identification of Genes Specific for Rice Meiosis.

    PubMed

    Zhang, Bingwei; Xu, Meng; Bian, Shiquan; Hou, Lili; Tang, Ding; Li, Yafei; Gu, Minghong; Cheng, Zhukuan; Yu, Hengxiu

    2015-01-01

    The leptotene-zygotene transition is a major step in meiotic progression during which pairing between homologous chromosomes is initiated and double strand breaks occur. OsAM1, a homologue of maize AM1 and Arabidopsis SWI1, encodes a protein with a coiled-coil domain in its central region that is required for the leptotene-zygotene transition during rice meiosis. To gain more insight into the role of OsAM1 in rice meiosis and identify additional meiosis-specific genes, we characterized the transcriptomes of young panicles of Osam1 mutant and wild-type rice plants using RNA-Seq combined with bioinformatic and statistical analyses. As a result, a total of 25,750 and 28,455 genes were expressed in young panicles of wild-type and Osam1 mutant plants, respectively, and 4,400 differentially expressed genes (DEGs; log2 Ratio ≥ 1, FDR ≤ 0.05) were identified. Of these DEGs, four known rice meiosis-specific genes were detected, and 22 new putative meiosis-related genes were found by mapping these DEGs to reference biological pathways in the KEGG database. We identified eight additional well-conserved OsAM1-responsive rice meiotic genes by comparing our RNA-Seq data with known meiotic genes in Arabidopsis and fission yeast.
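
    The DEG criterion quoted above (log2 Ratio ≥ 1, FDR ≤ 0.05) can be illustrated with a minimal sketch. It assumes the log2 ratio is taken as mutant over wild type and that the cutoff applies to its absolute value; the abstract does not state either point. The function name, gene names, and expression values are hypothetical.

```python
import math


def is_deg(wt_expr, mut_expr, fdr, min_abs_log2=1.0, max_fdr=0.05):
    """Return True when a gene passes both the fold-change and FDR cutoffs.

    Assumption: the ratio is mutant / wild type and the cutoff is applied
    to the absolute log2 ratio, so both up- and down-regulation count.
    """
    log2_ratio = math.log2(mut_expr / wt_expr)
    return abs(log2_ratio) >= min_abs_log2 and fdr <= max_fdr


# Hypothetical records: gene -> (wild-type expression, mutant expression, FDR)
genes = {
    "geneA": (10.0, 40.0, 0.01),   # 4-fold up, significant
    "geneB": (10.0, 12.0, 0.001),  # significant but small change
    "geneC": (20.0, 5.0, 0.20),    # 4-fold down, not significant
    "geneD": (16.0, 4.0, 0.03),    # 4-fold down, significant
}

degs = [name for name, (wt, mut, fdr) in genes.items() if is_deg(wt, mut, fdr)]
print(degs)
```

    Requiring both cutoffs keeps genes whose change is large as well as statistically supported, which is why geneB and geneC are excluded in this toy input.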

  9. Global Identification of Genes Specific for Rice Meiosis

    PubMed Central

    Zhang, Bingwei; Xu, Meng; Bian, Shiquan; Hou, Lili; Tang, Ding; Li, Yafei; Gu, Minghong; Cheng, Zhukuan; Yu, Hengxiu

    2015-01-01

    The leptotene-zygotene transition is a major step in meiotic progression during which pairing between homologous chromosomes is initiated and double strand breaks occur. OsAM1, a homologue of maize AM1 and Arabidopsis SWI1, encodes a protein with a coiled-coil domain in its central region that is required for the leptotene-zygotene transition during rice meiosis. To gain more insight into the role of OsAM1 in rice meiosis and identify additional meiosis-specific genes, we characterized the transcriptomes of young panicles of Osam1 mutant and wild-type rice plants using RNA-Seq combined with bioinformatic and statistical analyses. As a result, a total of 25,750 and 28,455 genes were expressed in young panicles of wild-type and Osam1 mutant plants, respectively, and 4,400 differentially expressed genes (DEGs; log2 Ratio ≥ 1, FDR ≤ 0.05) were identified. Of these DEGs, four known rice meiosis-specific genes were detected, and 22 new putative meiosis-related genes were found by mapping these DEGs to reference biological pathways in the KEGG database. We identified eight additional well-conserved OsAM1-responsive rice meiotic genes by comparing our RNA-Seq data with known meiotic genes in Arabidopsis and fission yeast. PMID:26394329

  10. Using the iHOP information resource to mine the biomedical literature on genes, proteins, and chemical compounds.

    PubMed

    Hoffmann, Robert

    2007-12-01

    The iHOP (Information Hyperlinked over Proteins) resource employs sophisticated text-mining methods to assist researchers in their quest for information on specific genes and proteins, their physical interactions and regulatory relationships, their relevance in pathologies, and their interactions with chemical compounds. iHOP parses thousands of PubMed documents every day and collects information specific to thousands of different biological molecules. Rather than providing long lists of entire abstracts upon keyword searches, iHOP selectively retrieves information that is specific to genes and proteins, and summarizes their interactions and functions. In iHOP, with genes and proteins acting as hyperlinks between sentences and abstracts, a large part of the PubMed knowledgebase becomes a giant navigable information network. (c) 2007 by John Wiley & Sons, Inc.

  11. Computational Genomics: From Genome Sequence To Global Gene Regulation

    NASA Astrophysics Data System (ADS)

    Li, Hao

    2000-03-01

    As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a "text" consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into "words" and a "dictionary" of "words" is built concurrently. For the control regions in the yeast genome, we built a "dictionary" of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.
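
    The segmentation idea in this abstract can be illustrated with a much simplified sketch. It assumes a fixed, already known dictionary with word probabilities and finds the single most probable split by dynamic programming. The approach described above instead builds the dictionary concurrently and treats segmentations probabilistically, which this sketch does not attempt. The dictionary, probabilities, and sequence are hypothetical.

```python
import math


def best_segmentation(seq, word_probs):
    """Most probable split of seq into dictionary words (Viterbi-style DP).

    Returns an empty list when seq cannot be covered by dictionary words.
    """
    max_len = max(len(w) for w in word_probs)
    # best[i] holds (log-probability, words) for the best split of seq[:i]
    best = [(-math.inf, [])] * (len(seq) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(seq) + 1):
        for start in range(max(0, end - max_len), end):
            word = seq[start:end]
            if word in word_probs and best[start][0] > -math.inf:
                score = best[start][0] + math.log(word_probs[word])
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    return best[len(seq)][1]


# Hypothetical dictionary: single letters plus two longer "words"
toy_dictionary = {
    "A": 0.1, "C": 0.1, "G": 0.1, "T": 0.1,
    "TATA": 0.3, "CGCG": 0.3,
}

print(best_segmentation("ATATACGCGT", toy_dictionary))
```

    Longer words win here because one word of probability 0.3 scores higher than four single letters of probability 0.1 each.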

  12. Cancer Evolution Is Associated with Pervasive Positive Selection on Globally Expressed Genes

    PubMed Central

    Ostrow, Sheli L.; Barshir, Ruth; DeGregori, James; Yeger-Lotem, Esti; Hershberg, Ruth

    2014-01-01

    Cancer is an evolutionary process in which cells acquire new transformative, proliferative and metastatic capabilities. A full understanding of cancer requires learning the dynamics of the cancer evolutionary process. We present here a large-scale analysis of the dynamics of this evolutionary process within tumors, with a focus on breast cancer. We show that the cancer evolutionary process differs greatly from organismal (germline) evolution. Organismal evolution is dominated by purifying selection (that removes mutations that are harmful to fitness). In contrast, in the cancer evolutionary process the dominance of purifying selection is much reduced, allowing for a much easier detection of the signals of positive selection (adaptation). We further show that, as a group, genes that are globally expressed across human tissues show a very strong signal of positive selection within tumors. Indeed, known cancer genes are enriched for global expression patterns. Yet, positive selection is prevalent even on globally expressed genes that have not yet been associated with cancer, suggesting that globally expressed genes are enriched for yet undiscovered cancer related functions. We find that the increased positive selection on globally expressed genes within tumors is not due to their expression in the tissue relevant to the cancer. Rather, such increased adaptation is likely due to globally expressed genes being enriched in important housekeeping and essential functions. Thus, our results suggest that tumor adaptation is most often mediated through somatic changes to those genes that are important for the most basic cellular functions. Together, our analysis reveals the uniqueness of the cancer evolutionary process and the particular importance of globally expressed genes in driving cancer initiation and progression. PMID:24603726

  13. Identification and activation of novel biosynthetic gene clusters by genome mining in the kirromycin producer Streptomyces collinus Tü 365.

    PubMed

    Iftime, Dumitrita; Kulik, Andreas; Härtner, Thomas; Rohrer, Sabrina; Niedermeyer, Timo Horst Johannes; Stegmann, Evi; Weber, Tilmann; Wohlleben, Wolfgang

    2016-03-01

    Streptomycetes are prolific sources of novel biologically active secondary metabolites with pharmaceutical potential. S. collinus Tü 365 is a Streptomyces strain, isolated in 1972 from Kouroussa (Guinea). It is best known as a producer of the antibiotic kirromycin, an inhibitor of protein biosynthesis that interacts with elongation factor EF-Tu. Genome mining revealed 32 gene clusters encoding the biosynthesis of diverse secondary metabolites in the genome of Streptomyces collinus Tü 365, indicating an enormous biosynthetic potential of this strain. The structural diversity of secondary metabolites predicted for S. collinus Tü 365 includes PKS, NRPS, PKS-NRPS hybrids, a lanthipeptide, terpenes and siderophores. While some of these gene clusters were found to contain genes related to known secondary metabolites, which also could be detected in HPLC-MS analyses, most of the uncharacterized gene clusters are not expressed under standard laboratory conditions. With this study we aimed to characterize the genome information of S. collinus Tü 365 to make use of gene clusters that have not previously been described for this strain. We were able to connect the gene clusters of a lanthipeptide, a carotenoid, five terpenoid compounds, an ectoine, a siderophore and a spore pigment-associated gene cluster to their respective biosynthesis products.

  14. Diversity and Distribution of Arsenic-Related Genes Along a Pollution Gradient in a River Affected by Acid Mine Drainage.

    PubMed

    Desoeuvre, Angélique; Casiot, Corinne; Héry, Marina

    2016-04-01

    Some microorganisms have the capacity to interact with arsenic through resistance or metabolic processes. Their activities contribute to the fate of arsenic in contaminated ecosystems. To investigate the genetic potential involved in these interactions in a zone of confluence between a pristine river and an arsenic-rich acid mine drainage, we explored the diversity of marker genes for arsenic resistance (arsB, acr3.1, acr3.2), methylation (arsM), and respiration (arrA) in waters characterized by contrasted concentrations of metallic elements (including arsenic) and pH. While arsB-carrying bacteria were representative of pristine waters, Acr3 proteins may confer to generalist bacteria the capacity to cope with an increase of contamination. arsM showed an unexpected wide distribution, suggesting biomethylation may impact arsenic fate in contaminated aquatic ecosystems. arrA gene survey suggested that only specialist microorganisms (adapted to moderately or extremely contaminated environments) have the capacity to respire arsenate. Their distribution, modulated by water chemistry, attested the specialist nature of the arsenate respirers. This is the first report of the impact of an acid mine drainage on the diversity and distribution of arsenic (As)-related genes in river waters. The fate of arsenic in this ecosystem is probably under the influence of the abundance and activity of specific microbial populations involved in different As biotransformations.

  15. Global gene expression in channel catfish after vaccination with an attenuated Edwardsiella ictaluri

    USDA-ARS's Scientific Manuscript database

    To understand the global gene expression in channel catfish after immersion vaccination with an attenuated Edwardsiella ictaluri (AquaVac ESC™), microarray analysis of 65,182 UniGene transcripts was performed. With a filter of false-discovery rate less than 0.05 and fold change greater than 2, a t...

  16. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    PubMed Central

    2013-01-01

    Background Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants
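
    The Methods above rely on false discovery rate-corrected p-values below 0.05. The abstract does not say which correction was used, so the sketch below assumes the Benjamini-Hochberg procedure purely for illustration. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, one per tested gene set
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

    In this toy input three raw p-values are below 0.05, but only the first gene set remains below 0.05 after adjustment.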

  17. A global test for gene‐gene interactions based on random matrix theory

    PubMed Central

    Amos, Christopher I.; Moore, Jason H.

    2016-01-01

    ABSTRACT Statistical interactions between markers of genetic variation, or gene‐gene interactions, are believed to play an important role in the etiology of many multifactorial diseases and other complex phenotypes. Unfortunately, detecting gene‐gene interactions is extremely challenging due to the large number of potential interactions and ambiguity regarding marker coding and interaction scale. For many data sets, there is insufficient statistical power to evaluate all candidate gene‐gene interactions. In these cases, a global test for gene‐gene interactions may be the best option. Global tests have much greater power relative to multiple individual interaction tests and can be used on subsets of the markers as an initial filter prior to testing for specific interactions. In this paper, we describe a novel global test for gene‐gene interactions, the global epistasis test (GET), that is based on results from random matrix theory. As we show via simulation studies based on previously proposed models for common diseases including rheumatoid arthritis, type 2 diabetes, and breast cancer, our proposed GET method has superior performance characteristics relative to existing global gene‐gene interaction tests. A glaucoma GWAS data set is used to demonstrate the practical utility of the GET method. PMID:27386793

  18. Carbohydrate metabolic pathway genes associated with quantitative trait loci (QTL) for obesity and type 2 diabetes: identification by data mining.

    PubMed

    Varma, Vijayalakshmi; Wise, Carolyn; Kaput, Jim

    2010-09-01

    Increasing consumption of refined carbohydrates is now being recognized as a primary contributor to the development of nutritionally related chronic diseases such as obesity and type 2 diabetes mellitus (T2DM). A data mining approach was used to evaluate the role of carbohydrate metabolic pathway genes in the development of obesity and T2DM. Data from public databases were used to map the position of the carbohydrate metabolic pathway genes to known quantitative trait loci (QTL) for obesity and T2DM and for examining the pathway genes for the presence of sequence and structural genetic variants such as single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), respectively. The results demonstrated that a majority of the genes of the carbohydrate metabolic pathways are associated with QTL for obesity and many for T2DM. In addition, some key genes of the pathways also encode non-synonymous SNPs that exhibit significant differences in population frequencies. This study emphasizes the significance of the metabolic pathway genes in the development of disease phenotypes, its differential occurrence across populations and between individuals, and a strategy for interpreting an individual's risk for disease.

  19. tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.

    PubMed

    Cejuela, Juan Miguel; McQuilton, Peter; Ponting, Laura; Marygold, Steven J; Stefancsik, Raymund; Millburn, Gillian H; Rost, Burkhard

    2014-01-01

    The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.

  20. Understanding gene expression in coronary artery disease through global profiling, network analysis and independent validation of key candidate genes.

    PubMed

    Arvind, Prathima; Jayashree, Shanker; Jambunathan, Srikarthika; Nair, Jiny; Kakkar, Vijay V

    2015-12-01

    The molecular mechanism underlying the patho-physiology of coronary artery disease (CAD) is complex. We used global expression profiling combined with analysis of biological networks to dissect out potential genes and pathways associated with CAD in a representative case-control Asian Indian cohort. We initially performed blood transcriptomics profiling in 20 subjects, including 10 CAD patients and 10 healthy controls, on the Agilent microarray platform. Data were analysed with GeneSpring GX 12.5, followed by network analysis using the DAVID v6.7 and Reactome databases. The most significant differentially expressed genes from the microarray were independently validated by real-time PCR in 97 cases and 97 controls. A total of 190 gene transcripts showed significant differential expression (fold change > 2, P < 0.05) between the cases and the controls, of which 142 genes were upregulated and 48 genes were downregulated. Genes associated with inflammation, immune response, cell regulation, proliferation and apoptotic pathways were enriched, while inflammatory and immune response genes were displayed as hubs in the network, having a greater number of interactions with the neighbouring genes. Expression of the EGR1/2/3, IL8, CXCL1, PTGS2, CD69, IFNG, FASLG, CCL4, CDC42, DDX58, NFKBID and NR4A2 genes was independently validated; EGR1/2/3 and IL8 showed >8-fold higher expression in cases relative to the controls, implying their important role in CAD. In conclusion, global gene expression profiling combined with network analysis can help in identifying key genes and pathways for CAD.

  1. Cell types differ in global coordination of splicing and proportion of highly expressed genes.

    PubMed

    Trakhtenberg, Ephraim F; Pho, Nam; Holton, Kristina M; Chittenden, Thomas W; Goldberg, Jeffrey L; Dong, Lingsheng

    2016-08-31

    Balance in the transcriptome is regulated by coordinated synthesis and degradation of RNA molecules. Here we investigated whether mammalian cell types intrinsically differ in global coordination of gene splicing and expression levels. We analyzed RNA-seq transcriptome profiles of 8 different purified mouse cell types. We found that different cell types vary in proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene, and that the cell types that express more variants of alternatively spliced transcripts per gene are those that have higher proportion of highly expressed genes. Cell types segregated into two clusters based on high or low proportion of highly expressed genes. Biological functions involved in negative regulation of gene expression were enriched in the group of cell types with low proportion of highly expressed genes, and biological functions involved in regulation of transcription and RNA splicing were enriched in the group of cell types with high proportion of highly expressed genes. Our findings show that cell types differ in proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene, which represent distinct properties of the transcriptome and may reflect intrinsic differences in global coordination of synthesis, splicing, and degradation of RNA molecules.

  2. Cell types differ in global coordination of splicing and proportion of highly expressed genes

    PubMed Central

    Trakhtenberg, Ephraim F.; Pho, Nam; Holton, Kristina M.; Chittenden, Thomas W.; Goldberg, Jeffrey L.; Dong, Lingsheng

    2016-01-01

    Balance in the transcriptome is regulated by coordinated synthesis and degradation of RNA molecules. Here we investigated whether mammalian cell types intrinsically differ in global coordination of gene splicing and expression levels. We analyzed RNA-seq transcriptome profiles of 8 different purified mouse cell types. We found that different cell types vary in proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene, and that the cell types that express more variants of alternatively spliced transcripts per gene are those that have higher proportion of highly expressed genes. Cell types segregated into two clusters based on high or low proportion of highly expressed genes. Biological functions involved in negative regulation of gene expression were enriched in the group of cell types with low proportion of highly expressed genes, and biological functions involved in regulation of transcription and RNA splicing were enriched in the group of cell types with high proportion of highly expressed genes. Our findings show that cell types differ in proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene, which represent distinct properties of the transcriptome and may reflect intrinsic differences in global coordination of synthesis, splicing, and degradation of RNA molecules. PMID:27577089

  3. Global mining risk footprint of critical metals necessary for low-carbon technologies: the case of neodymium, cobalt, and platinum in Japan.

    PubMed

    Nansai, Keisuke; Nakajima, Kenichi; Kagawa, Shigemi; Kondo, Yasushi; Shigetomi, Yosuke; Suh, Sangwon

    2015-02-17

    Meeting the 2-degree global warming target requires wide adoption of low-carbon energy technologies. Many such technologies rely on the use of precious metals, however, increasing the dependence of national economies on these resources. Among such metals, those with supply security concerns are referred to as critical metals. Using the Policy Potential Index developed by the Fraser Institute, this study developed a new footprint indicator, the mining risk footprint (MRF), to quantify the mining risk directly and indirectly affecting a national economy through its consumption of critical metals. We formulated the MRF as a product of the material footprint (MF) of the consuming country and the mining risks of the countries where the materials are mined. A case study was conducted for the 2005 Japanese economy to determine the MF and MRF for three critical metals essential for emerging energy technologies: neodymium, cobalt and platinum. The results indicate that in 2005 the MFs generated by Japanese domestic final demand, that is, the consumption-based metal output of Japan, were 1.0 × 10^3 t for neodymium, 9.4 × 10^3 t for cobalt, and 2.1 × 10 t for platinum. Export demand contributes most to the MF, accounting for 3.0 × 10^3 t, 1.3 × 10^5 t, and 3.1 × 10 t, respectively. The MRFs of Japanese total final demand (domestic plus export) were calculated to be 1.7 × 10 points for neodymium, 4.5 × 10^-2 points for cobalt, and 5.6 points for platinum, implying that the Japanese economy is incurring a high mining risk through its use of neodymium. This country's MRFs are all dominated by export demand. The paper concludes by discussing the policy implications and future research directions for measuring the MFs and MRFs of critical metals. For countries poorly endowed with mineral resources, adopting low-carbon energy technologies may imply a shifting of risk from carbon resources to other natural resources, in particular critical metals, and a trade
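
    The MRF definition in this abstract (material footprint multiplied by the mining risk of the source countries) can be sketched as a weighted sum. Summing the products over source countries is an assumption; the abstract does not give the exact formula, scaling, or units behind its "points". Country labels, tonnages, and risk scores below are hypothetical.

```python
def mining_risk_footprint(material_footprint_t, mining_risk):
    """Sum of (tonnes sourced from a country) x (that country's mining risk).

    Assumption: risk scores are dimensionless, and contributions from all
    source countries are simply added together.
    """
    return sum(
        tonnes * mining_risk[country]
        for country, tonnes in material_footprint_t.items()
    )


# Hypothetical material footprint of one metal, split by mining country
footprint_t = {"country_X": 100.0, "country_Y": 50.0}
# Hypothetical mining risk score per country (higher means riskier)
risk = {"country_X": 0.2, "country_Y": 0.6}

print(mining_risk_footprint(footprint_t, risk))
```

    With these toy numbers the smaller supplier contributes more to the footprint than the larger one, because its risk score is three times higher.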

  4. Global analysis of gene transcription regulation in prokaryotes.

    PubMed

    Zhou, D; Yang, R

    2006-10-01

    Prokaryotes have complex mechanisms to regulate their gene transcription, through the action of transcription factors (TFs). This review deals with current strategies, approaches and challenges in the understanding of i) how to map the repertoires of TF and operon on a genome, ii) how to identify the specific cis-acting DNA elements and their DNA-binding TFs that are required for expression of a given gene, iii) how to define the regulon members of a given TF, iv) how a given TF interacts with its target promoters, v) how these TF-promoter DNA interactions constitute regulatory networks, and vi) how transcriptional regulatory networks can be reconstructed by the reverse-engineering methods. Our goal is to depict the power of newly developed genomic techniques and computational tools, alone or in combination, to dissect the genetic circuitry of transcription regulation, and how this has the tremendous potential to model the regulatory networks in the prokaryotic cells.

  5. Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation.

    PubMed

    Pahikkala, Tapio; Ginter, Filip; Boberg, Jorma; Järvinen, Jouni; Salakoski, Tapio

    2005-06-22

    The ability to distinguish between genes and proteins is essential for understanding biological text. Support Vector Machines (SVMs) have been proven to be very efficient in general data mining tasks. We explore their capability for the gene versus protein name disambiguation task. We incorporated into the conventional SVM a weighting scheme based on distances of context words from the word to be disambiguated. This weighting scheme increased the performance of SVMs by five percentage points giving performance better than 85% as measured by the area under ROC curve and outperformed the Weighted Additive Classifier, which also incorporates the weighting, and the Naive Bayes classifier. We show that the performance of SVMs can be improved by the proposed weighting scheme. Furthermore, our results suggest that in this study the increase of the classification performance due to the weighting is greater than that obtained by selecting the underlying classifier or the kernel part of the SVM.
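
    The distance-based context weighting described above can be illustrated with a minimal feature-extraction sketch. The 1/distance weight, the window size, and the function name are assumptions made for illustration; the abstract does not specify the weighting function, and the SVM training step is omitted.

```python
from collections import defaultdict


def weighted_context_features(tokens, target_index, window=3):
    """Bag-of-words features for the context of tokens[target_index].

    Assumption: each context word contributes 1/distance, so words closer
    to the ambiguous name count more than distant ones.
    """
    features = defaultdict(float)
    for i, token in enumerate(tokens):
        distance = abs(i - target_index)
        if 0 < distance <= window:
            features[token.lower()] += 1.0 / distance
    return dict(features)


# Hypothetical sentence; the ambiguous name "p53" is at index 1
sentence = ["the", "p53", "gene", "encodes", "a", "protein"]
features = weighted_context_features(sentence, target_index=1)
print(features)
```

    A feature dictionary like this could then be vectorized and passed to any linear classifier; the unweighted baseline corresponds to giving every word in the window a weight of 1.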

  6. Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation

    PubMed Central

    Pahikkala, Tapio; Ginter, Filip; Boberg, Jorma; Järvinen, Jouni; Salakoski, Tapio

    2005-01-01

    Background The ability to distinguish between genes and proteins is essential for understanding biological text. Support Vector Machines (SVMs) have been proven to be very efficient in general data mining tasks. We explore their capability for the gene versus protein name disambiguation task. Results We incorporated into the conventional SVM a weighting scheme based on distances of context words from the word to be disambiguated. This weighting scheme increased the performance of SVMs by five percentage points giving performance better than 85% as measured by the area under ROC curve and outperformed the Weighted Additive Classifier, which also incorporates the weighting, and the Naive Bayes classifier. Conclusion We show that the performance of SVMs can be improved by the proposed weighting scheme. Furthermore, our results suggest that in this study the increase of the classification performance due to the weighting is greater than that obtained by selecting the underlying classifier or the kernel part of the SVM. PMID:15972097

  7. Globalization of diabetes: the role of diet, lifestyle, and genes.

    PubMed

    Hu, Frank B

    2011-06-01

    Type 2 diabetes is a global public health crisis that threatens the economies of all nations, particularly developing countries. Fueled by rapid urbanization, nutrition transition, and increasingly sedentary lifestyles, the epidemic has grown in parallel with the worldwide rise in obesity. Asia's large population and rapid economic development have made it an epicenter of the epidemic. Asian populations tend to develop diabetes at younger ages and lower BMI levels than Caucasians. Several factors contribute to accelerated diabetes epidemic in Asians, including the "normal-weight metabolically obese" phenotype; high prevalence of smoking and heavy alcohol use; high intake of refined carbohydrates (e.g., white rice); and dramatically decreased physical activity levels. Poor nutrition in utero and in early life combined with overnutrition in later life may also play a role in Asia's diabetes epidemic. Recent advances in genome-wide association studies have contributed substantially to our understanding of diabetes pathophysiology, but currently identified genetic loci are insufficient to explain ethnic differences in diabetes risk. Nonetheless, interactions between Westernized diet and lifestyle and genetic background may accelerate the growth of diabetes in the context of rapid nutrition transition. Epidemiologic studies and randomized clinical trials show that type 2 diabetes is largely preventable through diet and lifestyle modifications. Translating these findings into practice, however, requires fundamental changes in public policies, the food and built environments, and health systems. To curb the escalating diabetes epidemic, primary prevention through promotion of a healthy diet and lifestyle should be a global public policy priority.

  8. Global Gene Expression Profile of the Hippocampus in a Rat Model of Vascular Dementia.

    PubMed

    Wu, Lin; Feng, Xiao-Tao; Hu, Yue-Qiang; Tang, Nong; Zhao, Qing-Shan; Li, Tian-Wei; Li, Hai-Yuan; Wang, Qing-Bi; Bi, Xin-Ya; Cai, Xin-Kun

    2015-09-01

    Vascular dementia (VD) has been one of the most serious public health problems worldwide. It is well known that cerebral hypoperfusion is the key pathophysiological basis of VD, but it remains unclear how global genes in the hippocampus respond to cerebral ischemia-reperfusion. In this study, we aimed to reveal the global gene expression profile in the hippocampus of VD using a rat model. VD was induced by repeated occlusion of the common carotid arteries followed by reperfusion. The rats with VD were characterized by deficits of memory and cognitive function and by histopathological changes in the hippocampus, such as a reduction in the number and the size of neurons accompanied by an increase in intercellular space. Microarray analysis of global genes displayed up-regulation of 7 probesets with genes with fold change more than 1.5 (P < 0.05) and down-regulation of 13 probesets with genes with fold change less than 0.667 (P < 0.05) in the hippocampus. Gene Ontology (GO) and pathway analysis showed that the up-regulated genes are mainly involved in oxygen binding and transport, autoimmune response and inflammation, and that the down-regulated genes are related to glucose metabolism, autoimmune response and inflammation, and other biological processes related to memory and cognitive function. Thus, the abnormally expressed genes are closely related to oxygen transport, glucose metabolism, and autoimmune response. The current findings display the global gene expression profile of the hippocampus in a rat model of VD, providing new insights into the molecular pathogenesis of VD.

  9. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    PubMed Central

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT: Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE: Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  10. A prioritization analysis of disease association by data-mining of functional annotation of human genes.

    PubMed

    Taniya, Takayuki; Tanaka, Susumu; Yamaguchi-Kabata, Yumi; Hanaoka, Hideki; Yamasaki, Chisato; Maekawa, Harutoshi; Barrero, Roberto A; Lenhard, Boris; Datta, Milton W; Shimoyama, Mary; Bumgarner, Roger; Chakraborty, Ranajit; Hopkinson, Ian; Jia, Libin; Hide, Winston; Auffray, Charles; Minoshima, Shinsei; Imanishi, Tadashi; Gojobori, Takashi

    2012-01-01

    Complex diseases result from contributions of multiple genes that act in concert through pathways. Here we present a method to prioritize novel candidates of disease-susceptibility genes depending on the biological similarities to the known disease-related genes. The extent of disease-susceptibility of a gene is prioritized by analyzing seven features of human genes captured in H-InvDB. Taking rheumatoid arthritis (RA) and prostate cancer (PC) as two examples, we evaluated the efficiency of our method. Highly scored genes obtained included TNFSF12 and OSM as candidate disease genes for RA and PC, respectively. Subsequent characterization of these genes based upon an extensive literature survey reinforced the validity of these highly scored genes as possible disease-susceptibility genes. Our approach, Prioritization ANalysis of Disease Association (PANDA), is an efficient and cost-effective method to narrow down a large set of genes into smaller subsets that are most likely to be involved in the disease pathogenesis.

  11. Using PharmGKB to train text mining approaches for identifying potential gene targets for pharmacogenomic studies.

    PubMed

    Pakhomov, S; McInnes, B T; Lamba, J; Liu, Y; Melton, G B; Ghodke, Y; Bhise, N; Lamba, V; Birnbaum, A K

    2012-10-01

    The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with the text of MEDLINE abstracts for a text mining approach to the identification of potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieved an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets "suggested" by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, thus making it a valuable tool for pathway-driven pharmacogenomics research. Copyright © 2012 Elsevier Inc. All rights reserved.
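
    The abstract describes single words (unigrams) as classifier features. As a rough illustration of what that representation looks like, the sketch below turns a sentence into unigram counts using only the Python standard library. The tokenization rule, the `unigram_features` name and the sample sentence are assumptions made for illustration; the study's actual preprocessing and its support vector machine training are not shown here.

```python
import re
from collections import Counter


def unigram_features(text):
    """Return a bag-of-words count of lowercase alphanumeric tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


# Hypothetical sentence; it makes no claim about any real drug or gene.
sample = "Drug alpha binds gene beta, and gene beta responds."
features = unigram_features(sample)
print(features.most_common(3))
```

    Each distinct token becomes one feature dimension, so a classifier trained on such vectors sees word frequencies but no word order.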

  12. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    PubMed

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL), that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures, Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence), customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications: (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
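
    For orientation, the sketch below computes the standard support and confidence of an association rule over a toy set of annotations. These are the classical measures only; the abstract does not define MOSupport or MOConfidence, so they are not reproduced here. The term labels and the `annotations` list are hypothetical placeholders, not real GO identifiers or data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical annotation sets, one per gene product; labels are placeholders.
annotations = [
    {"MF:kinase", "BP:signaling"},
    {"MF:kinase", "BP:signaling", "CC:membrane"},
    {"MF:kinase", "BP:metabolism"},
    {"MF:transporter", "CC:membrane"},
]
rule_support = support(annotations, {"MF:kinase", "BP:signaling"})
rule_confidence = confidence(annotations, {"MF:kinase"}, {"BP:signaling"})
print(rule_support, rule_confidence)
```

    Here the rule "MF:kinase implies BP:signaling" holds in two of the four annotation sets, and in two of the three sets that contain the antecedent.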

  13. Global and gene-specific DNA methylation pattern discriminates cholecystitis from gallbladder cancer patients in Chile.

    PubMed

    Kagohara, Luciane Tsukamoto; Schussel, Juliana L; Subbannayya, Tejaswini; Sahasrabuddhe, Nandini; Lebron, Cynthia; Brait, Mariana; Maldonado, Leonel; Valle, Blanca L; Pirini, Francesca; Jahuira, Martha; Lopez, Jaime; Letelier, Pablo; Brebi-Mieville, Priscilla; Ili, Carmen; Pandey, Akhilesh; Chatterjee, Aditi; Sidransky, David; Guerrero-Preston, Rafael

    2015-01-01

    The aim of the study was to evaluate the use of global and gene-specific DNA methylation changes as potential biomarkers for gallbladder cancer (GBC) in a cohort from Chile. DNA methylation was analyzed through an ELISA-based technique and quantitative methylation-specific PCR. Global DNA Methylation Index (p = 0.02) and promoter methylation of SSBP2 (p = 0.01) and ESR1 (p = 0.05) were significantly different in GBC when compared with cholecystitis. Receiver operating characteristic curve analysis revealed that promoter methylation of APC, CDKN2A, ESR1, PGP9.5 and SSBP2, together with the Global DNA Methylation Index, had 71% sensitivity, 95% specificity, a 0.97 area under the curve and a positive predictive value of 90%. Global and gene-specific DNA methylation may be useful biomarkers for GBC clinical assessment.
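
    Sensitivity, specificity and positive predictive value, as reported in this abstract, all follow from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical and were chosen only to make the calculation easy to follow; they are not the study's data and do not reproduce its reported figures.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute basic diagnostic test metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all non-diseased
        "ppv": tp / (tp + fp),          # true positives among all positive calls
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=15, fp=2, tn=38, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    Unlike sensitivity and specificity, the positive predictive value depends on how common the disease is in the cohort, so it does not transfer directly to populations with a different prevalence.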

  14. Global and gene-specific DNA methylation pattern discriminates cholecystitis from gallbladder cancer patients in Chile

    PubMed Central

    Kagohara, Luciane Tsukamoto; Schussel, Juliana L; Subbannayya, Tejaswini; Sahasrabuddhe, Nandini; Lebron, Cynthia; Brait, Mariana; Maldonado, Leonel; Valle, Blanca L; Pirini, Francesca; Jahuira, Martha; Lopez, Jaime; Letelier, Pablo; Brebi-Mieville, Priscilla; Ili, Carmen; Pandey, Akhilesh; Chatterjee, Aditi; Sidransky, David; Guerrero-Preston, Rafael

    2015-01-01

    Aim: The aim of the study was to evaluate the use of global and gene-specific DNA methylation changes as potential biomarkers for gallbladder cancer (GBC) in a cohort from Chile. Material & methods: DNA methylation was analyzed through an ELISA-based technique and quantitative methylation-specific PCR. Results: Global DNA Methylation Index (p = 0.02) and promoter methylation of SSBP2 (p = 0.01) and ESR1 (p = 0.05) were significantly different in GBC when compared with cholecystitis. Receiver operating characteristic curve analysis revealed that promoter methylation of APC, CDKN2A, ESR1, PGP9.5 and SSBP2, together with the Global DNA Methylation Index, had 71% sensitivity, 95% specificity, a 0.97 area under the curve and a positive predictive value of 90%. Conclusion: Global and gene-specific DNA methylation may be useful biomarkers for GBC clinical assessment. PMID:25066711

  15. Re-engineering cellular physiology by rewiring high-level global regulatory genes

    PubMed Central

    Fitzgerald, Stephen; Dillon, Shane C.; Chao, Tzu-Chiao; Wiencko, Heather L.; Hokamp, Karsten; Cameron, Andrew D. S.; Dorman, Charles J.

    2015-01-01

    Knowledge of global regulatory networks has been exploited to rewire the gene control programmes of the model bacterium Salmonella enterica serovar Typhimurium. The product is an organism with competitive fitness that is superior to that of the wild type but tuneable under specific growth conditions. The paralogous hns and stpA global regulatory genes are located in distinct regions of the chromosome and control hundreds of target genes, many of which contribute to stress resistance. The locations of the hns and stpA open reading frames were exchanged reciprocally, each acquiring the transcription control signals of the other. The new strain had none of the compensatory mutations normally associated with alterations to hns expression in Salmonella; instead it displayed rescheduled expression of the stress and stationary phase sigma factor RpoS and its regulon. Thus the expression patterns of global regulators can be adjusted artificially to manipulate microbial physiology, creating a new and resilient organism. PMID:26631971

  16. The alpha- and beta-expansin and xyloglucan endotransglucosylase/hydrolase gene families of wheat: molecular cloning, gene expression, and EST data mining.

    PubMed

    Liu, Yong; Liu, Dongcheng; Zhang, Haiying; Gao, Hongbo; Guo, Xiaoli; Wang, Daowen; Zhang, Xiangqi; Zhang, Aimin

    2007-10-01

    Expansins and xyloglucan endotransglucosylase/hydrolases (XTHs) are families of extracellular proteins with members that have been shown to play an important role in cell wall growth. In this study, three, six, and five members of the wheat alpha-expansin (TaEXPA1 to TaEXPA3), beta-expansin (TaEXPB1 to TaEXPB6), and XTH (TaXTH1 to TaXTH5) gene families, respectively, were isolated from a dwarf wheat line. The mRNA expression analysis by real-time RT-PCR indicates that these genes display different transcription levels in different stages/organs/treatments, possibly suggesting their functional roles in the cell wall expansion process. Moreover, the comparison of the expression levels reveals that most of the expansins show lower expression than the XTHs. Finally, we present the analysis of wheat alpha- and beta-expansins and XTH families by expressed sequence tag data mining.

  17. Global gene expression responses to waterlogging in roots and leaves of cotton (Gossypium hirsutum L.).

    PubMed

    Christianson, Jed A; Llewellyn, Danny J; Dennis, Elizabeth S; Wilson, Iain W

    2010-01-01

    Waterlogging stress causes yield reduction in cotton (Gossypium hirsutum L.). A major component of waterlogging stress is the lack of oxygen available to submerged tissues. While changes in expressed protein, gene transcription and metabolite levels have been studied in response to low oxygen stress, little research has been done on molecular responses to waterlogging in cotton. We assessed cotton growth responses to waterlogging and assayed global gene transcription responses in root and leaf cotton tissues of partially submerged plants. Waterlogging caused significant reductions in stem elongation, shoot mass, root mass and leaf number, and altered the expression of 1,012 genes (4% of genes assayed) in root tissue as early as 4 h after flooding. Many of these genes were associated with cell wall modification and growth pathways, glycolysis, fermentation, mitochondrial electron transport and nitrogen metabolism. Waterlogging of plant roots also altered global gene expression in leaf tissues, significantly changing the expression of 1,305 genes (5% of genes assayed) after 24 h of flooding. Genes affected were associated with cell wall growth and modification, tetrapyrrole synthesis, hormone response, starch metabolism and nitrogen metabolism. The implications of these results for the development of waterlogging-tolerant cotton are discussed.

  18. Analysis of global changes in gene expression induced by human polynucleotide phosphorylase (hPNPaseold-35)

    PubMed Central

    Sokhi, Upneet K.; Bacolod, Manny D.; Emdad, Luni; Das, Swadesh K.; Dumur, Catherine I.; Miles, Michael F.; Sarkar, Devanand; Fisher, Paul B.

    2014-01-01

    As a strategy to identify gene expression changes affected by human polynucleotide phosphorylase (hPNPaseold-35), we performed gene expression analysis of HeLa cells in which hPNPaseold-35 was overexpressed. The observed changes were then compared to those of HO-1 melanoma cells in which hPNPaseold-35 was stably knocked down. Through this analysis, 90 transcripts, which positively or negatively correlated with hPNPaseold-35 expression, were identified. The majority of these genes were associated with cell communication, cell cycle and chromosomal organization gene ontology categories. For a number of these genes, the positive or negative correlations with hPNPaseold-35 expression were consistent with transcriptional data extracted from the TCGA (The Cancer Genome Atlas) expression datasets for colon adenocarcinoma (COAD), skin cutaneous melanoma (SKCM), ovarian serous cyst adenocarcinoma (OV), and prostate adenocarcinoma (PRAD). Further analysis comparing the gene expression changes between Ad.hPNPaseold-35 infected HO-1 melanoma cells and HeLa cells overexpressing hPNPaseold-35 under the control of a doxycycline-inducible promoter, revealed global changes in genes involved in cell cycle and mitosis. Overall, this study provides further evidence that hPNPaseold-35 is associated with global changes in cell cycle-associated genes and identifies potential gene targets for future investigation. PMID:24729470

  19. Heavy metals in wild house mice from coal-mining areas of Colombia and expression of genes related to oxidative stress, DNA damage and exposure to metals.

    PubMed

    Guerrero-Castilla, Angélica; Olivero-Verbel, Jesús; Marrugo-Negrete, José

    2014-03-01

    Coal mining is a source of pollutants that impact on environmental and human health. This study examined the metal content and the transcriptional status of gene markers associated with oxidative stress, metal transport and DNA damage in livers of feral mice collected near coal-mining operations, in comparison with mice obtained from a reference site. Mus musculus specimens were caught from La Loma and La Jagua, two coal-mining sites in the north of Colombia, as well as from Valledupar (Cesar Department), a city located 100 km north of the mines. Concentrations in liver tissue of Hg, Zn, Pb, Cd, Cu and As were determined by differential stripping voltammetry, and real-time PCR was used to measure gene expression. Compared with the reference group (Valledupar), hepatic concentrations of Cd, Cu and Zn were significantly higher in animals living near mining areas. In exposed animals from the coal-mining sites, the mRNA expression of NQO1, MT1, SOD1, MT2, and DDIT3 was 4.2-, 7.3-, 2.5-, 4.6- and 3.4-fold greater, respectively, than in animals from the reference site (p < 0.05). These results suggest that activities related to coal mining may generate pollutants that could affect the biota, inducing the transcription of biochemical markers related to oxidative stress, metal exposure, and DNA damage. These changes may be in part linked to metal toxicity, and could have implications for the development of chronic disease. Therefore, it is essential to implement preventive measures to minimize the effects of coal mining on its nearby environment, in order to protect human health.

  20. Standardizing global gene expression analysis between laboratories and across platforms.

    PubMed

    Bammler, Theodore; Beyer, Richard P; Bhattacharya, Sanchita; Boorman, Gary A; Boyles, Abee; Bradford, Blair U; Bumgarner, Roger E; Bushel, Pierre R; Chaturvedi, Kabir; Choi, Dongseok; Cunningham, Michael L; Deng, Shibing; Dressman, Holly K; Fannin, Rickie D; Farin, Fredrico M; Freedman, Jonathan H; Fry, Rebecca C; Harper, Angel; Humble, Michael C; Hurban, Patrick; Kavanagh, Terrance J; Kaufmann, William K; Kerr, Kathleen F; Jing, Li; Lapidus, Jodi A; Lasarev, Michael R; Li, Jianying; Li, Yi-Ju; Lobenhofer, Edward K; Lu, Xinfang; Malek, Renae L; Milton, Sean; Nagalla, Srinivasa R; O'malley, Jean P; Palmer, Valerie S; Pattee, Patrick; Paules, Richard S; Perou, Charles M; Phillips, Ken; Qin, Li-Xuan; Qiu, Yang; Quigley, Sean D; Rodland, Matthew; Rusyn, Ivan; Samson, Leona D; Schwartz, David A; Shi, Yan; Shin, Jung-Lim; Sieber, Stella O; Slifer, Susan; Speer, Marcy C; Spencer, Peter S; Sproles, Dean I; Swenberg, James A; Suk, William A; Sullivan, Robert C; Tian, Ru; Tennant, Raymond W; Todd, Signe A; Tucker, Charles J; Van Houten, Bennett; Weis, Brenda K; Xuan, Shirley; Zarbl, Helmut

    2005-05-01

    To facilitate collaborative research efforts between multi-investigator teams using DNA microarrays, we identified sources of error and data variability between laboratories and across microarray platforms, and methods to accommodate this variability. RNA expression data were generated in seven laboratories, which compared two standard RNA samples using 12 microarray platforms. At least two standard microarray types (one spotted, one commercial) were used by all laboratories. Reproducibility for most platforms within any laboratory was typically good, but reproducibility between platforms and across laboratories was generally poor. Reproducibility between laboratories increased markedly when standardized protocols were implemented for RNA labeling, hybridization, microarray processing, data acquisition and data normalization. Reproducibility was highest when analysis was based on biological themes defined by enriched Gene Ontology (GO) categories. These findings indicate that microarray results can be comparable across multiple laboratories, especially when a common platform and set of procedures are used.

  1. Linking genes to literature: text mining, information extraction, and retrieval applications for biology

    PubMed Central

    Krallinger, Martin; Valencia, Alfonso; Hirschman, Lynette

    2008-01-01

    Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet. PMID:18834499

  2. Linking genes to literature: text mining, information extraction, and retrieval applications for biology.

    PubMed

    Krallinger, Martin; Valencia, Alfonso; Hirschman, Lynette

    2008-01-01

    Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet http://zope.bioinfo.cnio.es/bionlp_tools/.

  3. GEM-TREND: a web tool for gene expression data mining toward relevant network discovery

    PubMed Central

    Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

    2009-01-01

    Background: DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which is available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). To help researchers use public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results: GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression patterns between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. For further analysis, a network visualization interface is also provided, whereby genes and gene annotations

  4. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins

    PubMed Central

    Rouillard, Andrew D.; Gundersen, Gregory W.; Fernandez, Nicolas F.; Wang, Zichen; Monteiro, Caroline D.; McDermott, Michael G.; Ma’ayan, Avi

    2016-01-01

    Genomics, epigenomics, transcriptomics, proteomics and metabolomics efforts rapidly generate a plethora of data on the activity and levels of biomolecules within mammalian cells. At the same time, curation projects that organize knowledge from the biomedical literature into online databases are expanding. Hence, there is a wealth of information about genes, proteins and their associations, with an urgent need for data integration to achieve better knowledge extraction and data reuse. For this purpose, we developed the Harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins from over 70 major online resources. We extracted, abstracted and organized data into ∼72 million functional associations between genes/proteins and their attributes. Such attributes could be physical relationships with other biomolecules, expression in cell lines and tissues, genetic associations with knockout mouse or human phenotypes, or changes in expression after drug treatment. We stored these associations in a relational database along with rich metadata for the genes/proteins, their attributes and the original resources. The freely available Harmonizome web portal provides a graphical user interface, a web service and a mobile app for querying, browsing and downloading all of the collected data. To demonstrate the utility of the Harmonizome, we computed and visualized gene–gene and attribute–attribute similarity networks, and through unsupervised clustering, identified many unexpected relationships by combining pairs of datasets such as the association between kinase perturbations and disease signatures. We also applied supervised machine learning methods to predict novel substrates for kinases, endogenous ligands for G-protein coupled receptors, mouse phenotypes for knockout genes, and classified unannotated transmembrane proteins for likelihood of being ion channels. The Harmonizome is a comprehensive resource of knowledge

  5. Growth-rate dependent global effects on gene expression in bacteria

    PubMed Central

    Klumpp, Stefan; Zhang, Zhongge; Hwa, Terence

    2010-01-01

    Summary: Bacterial gene expression depends not only on specific regulations but also directly on bacterial growth, because important global parameters such as the abundance of RNA polymerases and ribosomes are all growth-rate dependent. Understanding these global effects is necessary for a quantitative understanding of gene regulation and for the robust design of synthetic genetic circuits. The observed growth-rate dependence of constitutive gene expression can be explained by a simple model using the measured growth-rate dependence of the relevant cellular parameters. More complex growth dependences for genetic circuits involving activators, repressors and feedback control were analyzed, and salient features were verified experimentally using synthetic circuits. The results suggest a novel feedback mechanism mediated by general growth-dependent effects and not requiring explicit gene regulation, if the expressed protein affects cell growth. This mechanism can lead to growth bistability and promote the acquisition of important physiological functions such as antibiotic resistance and tolerance (persistence). PMID:20064380

  6. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters.

    PubMed

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H

    2015-07-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software.

  7. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

    DOE PAGES

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; ...

    2015-04-09

    Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.

  8. Expression Analysis of Ni- and V-Associated Resistance Genes in a Bacillus megaterium Strain Isolated from a Mining Site.

    PubMed

    Fierros Romero, Grisel; Rivas Castillo, Andrea; Gómez Ramírez, Marlenne; Pless, Reynaldo; Rojas Avelizapa, Norma

    2016-08-01

    Bacillus megaterium strain MNSH1-9K-1 was isolated from a mining site in Guanajuato, Mexico. This B. megaterium strain presented the ability to remove Ni and V from a spent catalyst. Also, its associated metal resistance genes nccA, hant, VAN2, and smtAB were previously identified by a PCR approach. The present study reports for the first time, in B. megaterium, the changes in the expression of the genes nccA (Ni-Co-Cd resistance); hant (high-affinity nickel transporter); smtAB, a metal-binding protein gene; and VAN2 (V resistance) after exposure to 200 ppm of Ni and 200 ppm of V during the stationary phase of the microorganism in PHGII liquid media. The data presented here may contribute to the knowledge of the genes involved in the Ni and V resistances of B. megaterium, and the possible pathways implicated in the Ni-V removal processes, which may be potentiated for the biological treatment of high metal content residues.

  9. antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

    PubMed Central

    Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A.; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H.

    2015-01-01

    Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. PMID:25948579

  10. SSH gene expression profile of Eisenia andrei exposed in situ to a naturally contaminated soil from an abandoned uranium mine.

    PubMed

    Lourenço, Joana; Pereira, Ruth; Gonçalves, Fernando; Mendo, Sónia

    2013-02-01

    The effects of the exposure of earthworms (Eisenia andrei) to contaminated soil from an abandoned uranium mine were assessed through gene expression profile evaluation by Suppression Subtractive Hybridization (SSH). Organisms were exposed in situ for 56 days, in containers placed both in a contaminated and in a non-contaminated site (reference). Organisms were sampled after 14 and 56 days of exposure. Results showed that the main physiological functions affected by the exposure to metals and radionuclides were: metabolism, oxidoreductase activity, redox homeostasis and response to chemical stimulus and stress. The relative expression of NADH dehydrogenase subunit 1 and elongation factor 1 alpha was also affected, since the genes encoding these enzymes were significantly up- and down-regulated, after 14 and 56 days of exposure, respectively. Also, an EST with homology for SET oncogene was found to be up-regulated. To the best of our knowledge, this is the first time that this gene was identified in earthworms and thus, further studies are required to clarify its involvement in the toxicity of metals and radionuclides. Considering the results herein presented, gene expression profiling proved to be a very useful tool to detect earthworms' underlying responses to metal and radionuclide exposure, pointing toward the detection and development of potential new biomarkers.

  11. Molecular networking and pattern-based genome mining improves discovery of biosynthetic gene clusters and their products from Salinispora species.

    PubMed

    Duncan, Katherine R; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S; Dorrestein, Pieter C; Jensen, Paul R

    2015-04-23

    Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Mining gene expression data for pollutants (dioxin, toluene, formaldehyde) and low dose of gamma-irradiation.

    PubMed

    Moskalev, Alexey; Shaposhnikov, Mikhail; Snezhkina, Anastasia; Kogan, Valeria; Plyusnina, Ekaterina; Peregudova, Darya; Melnikova, Nataliya; Uroshlev, Leonid; Mylnikov, Sergey; Dmitriev, Alexey; Plusnin, Sergey; Fedichev, Peter; Kudryavtseva, Anna

    2014-01-01

    General and specific effects of molecular genetic responses to adverse environmental factors are not well understood. This study examines genome-wide gene expression profiles of Drosophila melanogaster in response to ionizing radiation, formaldehyde, toluene, and 2,3,7,8-tetrachlorodibenzo-p-dioxin. We performed RNA-seq analysis on 25,415 transcripts to measure the change in gene expression in males and females separately. An analysis of the genes unique to each treatment yielded a list of genes as a gene expression signature. In the case of radiation exposure, both sexes exhibited a reproducible increase in their expression of the transcription factors sugarbabe and tramtrack. The influence of dioxin up-regulated metabolic genes, such as anachronism, CG16727, and several genes with unknown function. Toluene activated a gene involved in the response to the toxins, Cyp12d1-p; the transcription factor Fer3's gene; the metabolic genes CG2065, CG30427, and CG34447; and the genes Spn28Da and Spn3, which are responsible for reproduction and immunity. All significantly differentially expressed genes, including those shared among the stressors, can be divided into gene groups using Gene Ontology Biological Process identifiers. These gene groups are related to defense response, biological regulation, the cell cycle, metabolic process, and circadian rhythms. KEGG molecular pathway analysis revealed alteration of the Notch signaling pathway, TGF-beta signaling pathway, proteasome, basal transcription factors, nucleotide excision repair, Jak-STAT signaling pathway, circadian rhythm, Hippo signaling pathway, mTOR signaling pathway, ribosome, mismatch repair, RNA polymerase, mRNA surveillance pathway, Hedgehog signaling pathway, and DNA replication genes. Females and, to a lesser extent, males actively metabolize xenobiotics by the action of cytochrome P450 when under the influence of dioxin and toluene. Finally, in this work we obtained gene expression signatures pollutants

  13. SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS

    PubMed Central

    2013-01-01

    Background: The capability of correlating specific genotypes with human diseases is a complex issue in spite of all advantages arisen from high-throughput technologies, such as Genome Wide Association Studies (GWAS). New tools for genetic variants interpretation and for Single Nucleotide Polymorphisms (SNPs) prioritization are actually needed. Given a list of the most relevant SNPs statistically associated to a specific pathology as result of a genotype study, a critical issue is the identification of genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Vice versa, given a list of genes, it can be of great importance to predict which SNPs can be involved in the onset of a particular disease, in order to focus the research on their effects. Results: We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated to pathologies. This system can be employed to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The proposed method relies (1) on the data integration of public resources using a gene-centric database design, (2) on the evaluation of a set of static biomolecular annotations, defined as features, and (3) on the SNP scoring function, which computes SNP scores using parameters and weights set by users. We employed a machine learning classifier to set default feature weights and an ontological annotation layer to enable the enrichment of the input gene set. We implemented our method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), improving our first published release of this system. A user-friendly interface allows the input of a list of genes, SNPs or a biological process, and to customize the features set with relative weights. As result, SNPranker 2.0 returns a list of SNPs, localized within input and ontologically enriched genes, combined with their prioritization scores

  14. SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS.

    PubMed

    Merelli, Ivan; Calabria, Andrea; Cozzi, Paolo; Viti, Federica; Mosca, Ettore; Milanesi, Luciano

    2013-01-01

    The capability of correlating specific genotypes with human diseases is a complex issue in spite of all advantages arisen from high-throughput technologies, such as Genome Wide Association Studies (GWAS). New tools for genetic variants interpretation and for Single Nucleotide Polymorphisms (SNPs) prioritization are actually needed. Given a list of the most relevant SNPs statistically associated to a specific pathology as result of a genotype study, a critical issue is the identification of genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Vice versa, given a list of genes, it can be of great importance to predict which SNPs can be involved in the onset of a particular disease, in order to focus the research on their effects. We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated to pathologies. This system can be employed to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The proposed method relies (1) on the data integration of public resources using a gene-centric database design, (2) on the evaluation of a set of static biomolecular annotations, defined as features, and (3) on the SNP scoring function, which computes SNP scores using parameters and weights set by users. We employed a machine learning classifier to set default feature weights and an ontological annotation layer to enable the enrichment of the input gene set. We implemented our method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), improving our first published release of this system. A user-friendly interface allows the input of a list of genes, SNPs or a biological process, and to customize the features set with relative weights. As result, SNPranker 2.0 returns a list of SNPs, localized within input and ontologically enriched genes, combined with their prioritization scores. Different
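
    The scoring step in point (3) can be illustrated with a minimal sketch. It assumes the score is a plain weighted sum of per-SNP feature values, which the abstract does not confirm; the feature names, SNP identifiers and function names below are hypothetical and are not taken from SNPranker 2.0.

```python
def score_snp(features, weights):
    """Weighted sum of feature values (assumed scoring form).

    Features without a user-supplied weight contribute nothing.
    """
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank_snps(snp_features, weights):
    """Return (snp_id, score) pairs sorted from highest to lowest score."""
    scored = [(snp, score_snp(feats, weights)) for snp, feats in snp_features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical user-set weights and binary annotations for three SNPs.
weights = {"in_coding_region": 2.0, "conserved_site": 1.0}
snp_features = {
    "snp_a": {"in_coding_region": 1, "conserved_site": 0},
    "snp_b": {"in_coding_region": 1, "conserved_site": 1},
    "snp_c": {"in_coding_region": 0, "conserved_site": 1},
}
ranking = rank_snps(snp_features, weights)
```

    Changing a weight and re-running rank_snps is the sketch's stand-in for the user customization described above.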

  15. Polyketide and nonribosomal peptide retro-biosynthesis and global gene cluster matching.

    PubMed

    Dejong, Chris A; Chen, Gregory M; Li, Haoxin; Johnston, Chad W; Edwards, Mclean R; Rees, Philip N; Skinnider, Michael A; Webster, Andrew L H; Magarvey, Nathan A

    2016-12-01

    Polyketides (PKs) and nonribosomal peptides (NRPs) are profoundly important natural products, forming the foundations of many therapeutic regimes. Decades of research have revealed over 11,000 PK and NRP structures, and genome sequencing is uncovering new PK and NRP gene clusters at an unprecedented rate. However, only ∼10% of PK and NRPs are currently associated with gene clusters, and it is unclear how many of these orphan gene clusters encode previously isolated molecules. Therefore, to efficiently guide the discovery of new molecules, we must first systematically de-orphan emergent gene clusters from genomes. Here we provide to our knowledge the first comprehensive retro-biosynthetic program, generalized retro-biosynthetic assembly prediction engine (GRAPE), for PK and NRP families and introduce a computational pipeline, global alignment for natural products cheminformatics (GARLIC), to uncover how observed biosynthetic gene clusters relate to known molecules, leading to the identification of gene clusters that encode new molecules.

  16. Polymorphisms in metabolism and repair genes affects DNA damage caused by open-cast coal mining exposure.

    PubMed

    Espitia-Pérez, Lyda; Sosa, Milton Quintana; Salcedo-Arteaga, Shirley; León-Mejía, Grethel; Hoyos-Giraldo, Luz Stella; Brango, Hugo; Kvitko, Katia; da Silva, Juliana; Henriques, João A P

    2016-09-15

    Increasing evidence suggests that occupational exposure to open-cast coal mining residues like dust particles, heavy metals and Polycyclic Aromatic Hydrocarbons (PAHs) may cause a wide range of DNA damage and genomic instability that could be associated with initial steps in cancer development and other work-related diseases. The aim of our study was to evaluate if key polymorphisms in metabolism genes CYP1A1Msp1, GSTM1Null, GSTT1Null and DNA repair genes XRCC1Arg194Trp and hOGG1Ser326Cys could modify individual susceptibility to adverse coal exposure effects, considering the DNA damage (Comet assay) and micronucleus formation in lymphocytes (CBMN) and buccal mucosa cells (BMNCyt) as endpoints for genotoxicity. The study population is comprised of 200 healthy male subjects, 100 open-cast coal-mining workers from "El Cerrejón" (world's largest open-cast coal mine located in Guajira - Colombia) and 100 non-exposed referents from general population. The data revealed a significant increase of CBMN frequency in peripheral lymphocytes of occupationally exposed workers carrying the wild-type variant of GSTT1 (+) gene. Exposed subjects carrying GSTT1null polymorphism showed a lower micronucleus frequency compared with their positive counterparts (FR: 0.83; P=0.04), while BMNCyt frequency and Comet assay parameters in lymphocytes: Damage Index (DI) and percentage of DNA in the tail (Tail % DNA) were significantly higher in exposed workers with the GSTM1Null polymorphism. Other exfoliated buccal mucosa abnormalities related to cell death (Karyorrhexis and Karyolysis) were increased in GSTT/M1Null carriers. Nuclear buds were significantly higher in workers carrying the CYP1A1Msp1 (m1/m2, m2/m2) allele. Moreover, BMNCyt frequency and Comet assay parameters were significantly lower in exposed carriers of XRCC1Arg194Trp (Arg/Trp, Trp/Trp) and hOGG1Ser326Cys (Ser/Cys, Cys/Cys), thereby providing new data to the increasing evidence about the protective role of these polymorphisms

  17. Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles.

    PubMed

    Tien, Yin-Jing; Lee, Yun-Shien; Wu, Han-Ming; Chen, Chun-Houh

    2008-03-20

    The hierarchical clustering tree (HCT) with a dendrogram [1] and the singular value decomposition (SVD) with a dimension-reduced representative map [2] are popular methods for two-way sorting the gene-by-array matrix map employed in gene expression profiling. While HCT dendrograms tend to optimize local coherent clustering patterns, SVD leading eigenvectors usually identify better global grouping and transitional structures. This study proposes a flipping mechanism for a conventional agglomerative HCT using a rank-two ellipse (R2E, an improved SVD algorithm for sorting purpose) seriation by Chen [3] as an external reference. While HCTs always produce permutations with good local behaviour, the rank-two ellipse seriation gives the best global grouping patterns and smooth transitional trends. The resulting algorithm automatically integrates the desirable properties of each method so that users have access to a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends. We demonstrate, through four examples, that the proposed method not only possesses better numerical and statistical properties, it also provides more meaningful biomedical insights than other sorting algorithms. We suggest that sorted proximity matrices for genes and arrays, in addition to the gene-by-array expression matrix, can greatly aid in the search for comprehensive understanding of gene expression structures. Software for the proposed methods can be obtained at http://gap.stat.sinica.edu.tw/Software/GAP.

  18. A global evolutionary and metabolic analysis of human obesity gene risk variants.

    PubMed

    Castillo, Joseph J; Hazlett, Zachary S; Orlando, Robert A; Garver, William S

    2017-09-05

    It is generally accepted that the selection of gene variants during human evolution optimized energy metabolism that now interacts with our obesogenic environment to increase the prevalence of obesity. The purpose of this study was to perform a global evolutionary and metabolic analysis of human obesity gene risk variants (110 human obesity genes with 127 nearest gene risk variants) identified using genome-wide association studies (GWAS) to enhance our knowledge of early and late genotypes. As a result of determining the mean frequency of these obesity gene risk variants in 13 available populations from around the world our results provide evidence for the early selection of ancestral risk variants (defined as selection before migration from Africa) and late selection of derived risk variants (defined as selection after migration from Africa). Our results also provide novel information for association of these obesity genes or encoded proteins with diverse metabolic pathways and other human diseases. The overall results indicate a significant differential evolutionary pattern for the selection of obesity gene ancestral and derived risk variants proposed to optimize energy metabolism in varying global environments and complex association with metabolic pathways and other human diseases. These results are consistent with obesity genes that encode proteins possessing a fundamental role in maintaining energy metabolism and survival during the course of human evolution. Copyright © 2017. Published by Elsevier B.V.

  19. Metagenomic data utilization and analysis (MEDUSA) and construction of a global gut microbial gene catalogue.

    PubMed

    Karlsson, Fredrik H; Nookaew, Intawat; Nielsen, Jens

    2014-07-01

    Metagenomic sequencing has contributed important new knowledge about the microbes that live in a symbiotic relationship with humans. With modern sequencing technology it is possible to generate large numbers of sequencing reads from a metagenome but analysis of the data is challenging. Here we present the bioinformatics pipeline MEDUSA that facilitates analysis of metagenomic reads at the gene and taxonomic level. We also constructed a global human gut microbial gene catalogue by combining data from 4 studies spanning 3 continents. Using MEDUSA we mapped 782 gut metagenomes to the global gene catalogue and a catalogue of sequenced microbial species. Hereby we find that all studies share about half a million genes and that on average 300,000 genes are shared by half the studied subjects. The gene richness is higher in the European studies compared to Chinese and American and this is also reflected in the species richness. Even though it is possible to identify common species and a core set of genes, we find that there are large variations in abundance of species and genes.

  20. Global expression differences and tissue specific expression differences in rice evolution result in two contrasting types of differentially expressed genes.

    PubMed

    Horiuchi, Youko; Harushima, Yoshiaki; Fujisawa, Hironori; Mochizuki, Takako; Fujita, Masahiro; Ohyanagi, Hajime; Kurata, Nori

    2015-12-23

    Since the development of transcriptome analysis systems, many expression evolution studies characterized evolutionary forces acting on gene expression, without explicit discrimination between global expression differences and tissue specific expression differences. However, different types of gene expression alteration should have different effects on an organism, the evolutionary forces that act on them might be different, and different types of genes might show different types of differential expression between species. To confirm this, we studied differentially expressed (DE) genes among closely related groups that have extensive gene expression atlases, and clarified characteristics of different types of DE genes including the identification of regulating loci for differential expression using expression quantitative loci (eQTL) analysis data. We detected differentially expressed (DE) genes between rice subspecies in five homologous tissues that were verified using japonica and indica transcriptome atlases in public databases. Using the transcriptome atlases, we classified DE genes into two types, global DE genes and changed-tissues DE genes. Global type DE genes were not expressed in any tissues in the atlas of one subspecies, however changed-tissues type DE genes were expressed in both subspecies with different tissue specificity. For the five tissues in the two japonica-indica combinations, 4.6 ± 0.8 and 5.9 ± 1.5 % of highly expressed genes were global and changed-tissues DE genes, respectively. Changed-tissues DE genes varied in number between tissues, increasing linearly with the abundance of tissue specifically expressed genes in the tissue. Molecular evolution of global DE genes was rapid, unlike that of changed-tissues DE genes. Based on gene ontology, global and changed-tissues DE genes were different, having no common GO terms. Expression differences of most global DE genes were regulated by cis-eQTLs. Expression evolution of changed-tissues DE
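
    The two-way classification described above can be sketched as a small rule. This is only an illustration of the stated definitions: it assumes each atlas is reduced to the set of tissues in which a gene is expressed, and the function name, labels and tissue names are hypothetical.

```python
def classify_de_gene(tissues_a, tissues_b):
    """Classify a differentially expressed gene from two expression atlases.

    tissues_a, tissues_b: sets of tissues in which the gene is expressed
    in subspecies A and subspecies B (assumed input representation).
    """
    if not tissues_a and not tissues_b:
        return "not expressed"        # absent from both atlases
    if not tissues_a or not tissues_b:
        return "global"               # silent in every tissue of one subspecies
    if tissues_a != tissues_b:
        return "changed-tissues"      # expressed in both, different tissue sets
    return "same-tissues"             # expressed in both, identical tissue sets


# Hypothetical tissue sets for three genes.
example_global = classify_de_gene({"leaf", "root"}, set())
example_changed = classify_de_gene({"leaf"}, {"root", "panicle"})
example_same = classify_de_gene({"leaf"}, {"leaf"})
```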

  1. Global identification of target genes regulated by APETALA3 and PISTILLATA floral homeotic gene action.

    PubMed

    Zik, Moriyah; Irish, Vivian F

    2003-01-01

    Identifying the genes regulated by the floral homeotic genes APETALA3 (AP3) and PISTILLATA (PI) is crucial for understanding the molecular mechanisms that lead to petal and stamen formation. We have used microarray analysis to conduct a broad survey of genes whose expression is affected by AP3 and PI activity. DNA microarrays consisting of 9216 Arabidopsis ESTs were screened with probes corresponding to mRNAs from different mutant and transgenic lines that misexpress AP3 and/or PI. The microarray results were further confirmed by RNA gel blot analyses. Our results suggest that AP3 and PI regulate a relatively small number of genes, implying that many genes used in petal and stamen development are not tissue specific and likely have roles in other processes as well. We recovered genes similar to previously identified petal- and stamen-expressed genes as well as genes that were not implicated previously in petal and stamen development. A very low percentage of the genes recovered encoded transcription factors. This finding suggests that AP3 and PI act relatively directly to regulate the genes required for the basic cellular processes responsible for petal and stamen morphogenesis.

  2. Mining for novel candidate clock genes in the circadian regulatory network.

    PubMed

    Bhargava, Anuprabha; Herzel, Hanspeter; Ananthasubramaniam, Bharath

    2015-11-14

    Most physiological processes in mammals are temporally regulated by means of a master circadian clock in the brain and peripheral oscillators in most other tissues. A transcriptional-translation feedback network of clock genes produces near 24 h oscillations in clock gene and protein expression. Here, we aim to identify novel additions to the clock network using a meta-analysis of public chromatin immunoprecipitation sequencing (ChIP-seq), proteomics and protein-protein interaction data starting from a published list of 1000 genes with robust transcriptional rhythms and circadian phenotypes of knockdowns. We identified 20 candidate genes including nine known clock genes that received significantly high scores and were also robust to the relative weights assigned to different data types. Our scoring was consistent with the original ranking of the 1000 genes, but also provided novel complementary insights. Candidate genes were enriched for genes expressed in a circadian manner in multiple tissues with regulation driven mainly by transcription factors BMAL1 and REV-ERB α,β. Moreover, peak transcription of candidate genes was remarkably consistent across tissues. While peaks of the 1000 genes were distributed uniformly throughout the day, candidate gene peaks were strongly concentrated around dusk. Finally, we showed that binding of specific transcription factors to a gene promoter was predictive of peak transcription at a certain time of day and discuss combinatorial phase regulation. Combining complementary publicly-available data targeting different levels of regulation within the circadian network, we filtered the original list and found 11 novel robust candidate clock genes. Using the criteria of circadian proteomic expression, circadian expression in multiple tissues and independent gene knockdown data, we propose six genes (Por, Mtss1, Dgat2, Pim3, Ppp1r3b, Upp2) involved in metabolism and cancer for further experimental investigation. The availability of

  3. Function Clustering Self-Organization Maps (FCSOMs) for mining differentially expressed genes in Drosophila and its correlation with the growth medium.

    PubMed

    Liu, L L; Liu, M J; Ma, M

    2015-09-28

    The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, these experimental results show a marked improvement in the accuracy of classification with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.

  4. In silico clustering of Salmonella global gene expression data reveals novel genes co-regulated with the SPI-1 virulence genes through HilD

    PubMed Central

    Martínez-Flores, Irma; Pérez-Morales, Deyanira; Sánchez-Pérez, Mishael; Paredes, Claudia C.; Collado-Vides, Julio; Salgado, Heladia; Bustamante, Víctor H.

    2016-01-01

    A wide variety of Salmonella enterica serovars cause intestinal and systemic infections in humans and animals. Salmonella Pathogenicity Island 1 (SPI-1) is a chromosomal region containing 39 genes that have crucial virulence roles. The AraC-like transcriptional regulator HilD, encoded in SPI-1, positively controls the expression of the SPI-1 genes, as well as of several other virulence genes located outside SPI-1. In this study, we applied a clustering method to the global gene expression data of S. enterica serovar Typhimurium from the COLOMBOS database; thus genes that show an expression pattern similar to that of SPI-1 genes were selected. This analysis revealed nine novel genes that are co-expressed with SPI-1, which are located in different chromosomal regions. Expression analyses and protein-DNA interaction assays showed regulation by HilD for six of these genes: gtgE, phoH, sinR, SL1263 (lpxR) and SL4247 were regulated directly, whereas SL1896 was regulated indirectly. Interestingly, phoH is an ancestral gene conserved in most bacteria, whereas the other genes show characteristics of genes acquired by Salmonella. A role in virulence has been previously demonstrated for gtgE, lpxR and sinR. Our results further expand the regulon of HilD and thus identify novel possible Salmonella virulence genes. PMID:27886269

  5. Leveraging global gene expression patterns to predict expression of unmeasured genes.

    PubMed

    Rudd, James; Zelaya, René A; Demidenko, Eugene; Goode, Ellen L; Greene, Casey S; Doherty, Jennifer A

    2015-12-15

    Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes. We developed a greedy gene set selection (GGS) algorithm which returns a DM set of user specified size based on a specific correlation threshold (|rP|) and minimum number of DM genes that must be correlated to an unmeasured gene in order to infer the value of the unmeasured gene (redundancy). We evaluated GGS in the Cancer Genome Atlas (TCGA) HGSC data across 144 combinations of DM size, redundancy (1-3), and |rP| (0.60, 0.65, 0.70). Across the parameter sweep, GGS allows on average 9 times more gene expression information to be captured compared to the DM set alone. GGS successfully augments prognostic HGSC gene sets; the addition of 20 GGS selected genes more than doubles the number of genes whose expression is predictable. Moreover, the expression prediction is highly accurate. After training regression models for the predictable gene set using 2/3 of the TCGA data, the average accuracy (ranked correlation of true and predicted values) in the 1/3 testing partition and four independent populations is above 0.65 and approaches 0.8 for conservative parameter sets. We observe similar accuracies in the TCGA HGSC RNA-sequencing data. Specifically, the prediction accuracy increases with increasing redundancy and increasing |rP|. GGS-selected genes, which maximize expression information about unmeasured genes, can be combined with candidate gene sets as a cost effective way to increase the amount of gene expression information obtained in large studies. 
This method can be applied
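
    The greedy selection described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' GGS implementation: the function names, the toy data, and the exact greedy criterion (add the gene that makes the most unmeasured genes predictable, breaking ties by number of correlated partners) are assumptions made for the example. Only the three parameters (DM set size, |rP| threshold, redundancy) come from the abstract.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def predictable(dm, neighbors, redundancy):
    """Unmeasured genes correlated with at least `redundancy` genes in the DM set."""
    return {g for g in neighbors
            if g not in dm and len(neighbors[g] & dm) >= redundancy}


def greedy_gene_set(expr, size, threshold=0.60, redundancy=1):
    """Hypothetical greedy selection of a directly measured (DM) gene set.

    expr maps gene name -> list of expression values across samples.
    Returns (DM set, set of unmeasured genes predictable from it).
    """
    genes = sorted(expr)
    # Genes whose absolute correlation with g meets the threshold.
    neighbors = {
        g: {h for h in genes
            if h != g and abs(pearson(expr[g], expr[h])) >= threshold}
        for g in genes
    }
    dm = set()
    while len(dm) < min(size, len(genes)):
        # Assumed criterion: maximize the number of predictable genes,
        # then prefer genes with more correlated partners.
        best = max(
            (g for g in genes if g not in dm),
            key=lambda g: (len(predictable(dm | {g}, neighbors, redundancy)),
                           len(neighbors[g])),
        )
        dm.add(best)
    return dm, predictable(dm, neighbors, redundancy)


# Toy data (invented): A, B, C move together; D and E move together.
toy = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
    "D": [1, -1, 1, -1, 1],
    "E": [2, -2, 2, -2, 2],
}
dm_set, inferred = greedy_gene_set(toy, size=2, threshold=0.60, redundancy=1)
print(sorted(dm_set), sorted(inferred))
```

    In this toy case two measured genes are enough to cover both correlated groups. Raising `redundancy` requires each unmeasured gene to have several measured partners, which the abstract reports as improving prediction accuracy.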

  6. Benzo[a]pyrene decreases global and gene specific DNA methylation during zebrafish development

    USDA-ARS's Scientific Manuscript database

    DNA methylation is important for gene regulation and is vulnerable to early-life exposure to environmental contaminants. We found that direct waterborne benzo[a]pyrene (BaP) exposure at 24 µg/L from 2.5 to 96 hours post fertilization (hpf) to zebrafish embryos significantly decreased global cytosine...

  7. The facilitating roles and uses of gene banks in addressing the global plan of action

    USDA-ARS's Scientific Manuscript database

    Contractions of livestock genetic resources are occurring as countries strive to meet increasing demand for livestock products. The Global Plan of Action’s (GPA) Strategic Priority Area 3 – Conservation, calls for governments to establish gene banks for ex-situ cryogenic conservation. Establishment ...

  8. [Expression of the genes encoding RhtB family proteins depends on global regulator Lrp].

    PubMed

    Kutukova, E A; Zakataeva, N P; Livshits, V A

    2005-01-01

    In the present work, further study of the genes encoding RhtB family proteins is presented. In our previous work the involvement of two family members, RhtB and RhtC, in efflux of amino acids was demonstrated. Now we investigated regulation of expression of the rhtB, rhtC, yeaS and yahN genes. It is shown that expression of these genes is under control of the global regulator Lrp, depends on the presence of some amino acids in growth medium, and increases during different physiological stresses.

  9. Expression of immunoregulatory genes and its relationship to lead exposure and lead-mediated oxidative stress in wild ungulates from an abandoned mining area.

    PubMed

    Rodríguez-Estival, Jaime; de la Lastra, José M Pérez; Ortiz-Santaliestra, Manuel E; Vidal, Dolors; Mateo, Rafael

    2013-04-01

    Lead (Pb) is a highly toxic metal that can induce oxidative stress and affect the immune system by modifying the expression of immunomodulator-related genes. The aim of the present study was to investigate the association between Pb exposure and the transcriptional profiles of some cytokines, as well as the relationship between Pb exposure and changes in oxidative stress biomarkers observed in the spleen of wild ungulates exposed to mining pollution. Red deer and wild boar from the mining area studied had higher spleen, liver, and bone Pb levels than controls, indicating a chronic exposure to Pb pollution. Such exposure caused a depletion of spleen glutathione levels in both species and disrupted the activity of antioxidant enzymes, suggesting the generation of oxidative stress conditions. Deer from the mining area also showed an induced T-helper (Th)-dependent immune response toward the Th2 pathway, whereas boar from the mining area showed a cytokine profile suggesting an inclination of the immune response toward the Th1 pathway. These results indicate that environmental exposure to Pb may alter immune responses in wild ungulates exposed to mining pollution. However, evidence of direct relationships between Pb-mediated oxidative stress and the changes detected in immune responses was not found. Further research is needed to evaluate the immunotoxic potential of Pb pollution, also considering the prevalence of chronic infectious diseases in wildlife in environments affected by mining activities.

  10. Mobile genes in the human microbiome are structured from global to individual scales

    PubMed Central

    Brito, IL; Jupiter, SD; Jenkins, AP; Naisilisili, W; Tamminen, M; Smillie, CS; Wortman, JR; Birren, BW; Xavier, RJ; Blainey, PC; Singh, AK; Gevers, D; Alm, EJ

    2016-01-01

    Recent work has underscored the importance of the microbiome in human health, largely attributing differences in phenotype to differences in the species present across individuals. But mobile genes can confer profoundly different phenotypes on different strains of the same species. Little is known about the function and distribution of mobile genes in the human microbiome, and in particular whether the gene pool is globally homogenous or constrained by human population structure. Here, we investigate this question by comparing the mobile genes found in the microbiomes of 81 metropolitan North Americans with that of 172 agrarian Fiji islanders using a combination of single-cell genomics and metagenomics. We find large differences in mobile gene content between the Fijian and North American microbiomes, with functional variation that mirrors known dietary differences such as the excess of plant-based starch degradation genes. Remarkably, differences are also observed between the mobile gene pools of proximal Fijian villages, even though microbiome composition across villages is similar. Finally, we observe high rates of recombination leading to individual-specific mobile elements, suggesting that the abundance of some genes may reflect environmental selection rather than dispersal limitation. Together, these data support the hypothesis that human activities and behaviors provide selective pressures that shape mobile gene pools, and that acquisition of mobile genes is important to colonizing specific human populations. PMID:27409808

  11. Effect of ovarian hormones on the healthy equine uterus: a global gene expression analysis.

    PubMed

    Marth, Christina D; Young, Neil D; Glenton, Lisa Y; Noden, Drew M; Browning, Glenn F; Krekeler, Natali

    2015-05-20

    The physiological changes associated with the varying hormonal environment throughout the oestrous cycle are linked to the different functions the uterus needs to fulfil. The aim of the present study was to generate global gene expression profiles for the equine uterus during oestrus and Day 5 of dioestrus. To achieve this, samples were collected from five horses during oestrus (follicle >35 mm in diameter) and dioestrus (5 days after ovulation) and analysed using high-throughput RNA sequencing techniques (RNA-Seq). Differentially expressed genes between the two cycle stages were further investigated using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. The expression of 1577 genes was found to be significantly upregulated during oestrus, whereas 1864 genes were expressed at significantly higher levels in dioestrus. Most genes upregulated during oestrus were associated with the extracellular matrix, signal interaction and transduction, cell communication or immune function, whereas genes expressed at higher levels in early dioestrus were most commonly associated with metabolic or transport functions, correlating well with the physiological functions of the uterus. These results allow for a more complete understanding of the hormonal influence on gene expression in the equine uterus by functional analysis of up- and downregulated genes in oestrus and dioestrus, respectively. In addition, a valuable baseline is provided for further research, including analyses of changes associated with uterine inflammation.

  12. Mining microbial metatranscriptomes for expression of antibiotic resistance genes under natural conditions

    NASA Astrophysics Data System (ADS)

    Versluis, Dennis; D'Andrea, Marco Maria; Ramiro Garcia, Javier; Leimena, Milkha M.; Hugenholtz, Floor; Zhang, Jing; Öztürk, Başak; Nylund, Lotta; Sipkema, Detmer; Schaik, Willem Van; de Vos, Willem M.; Kleerebezem, Michiel; Smidt, Hauke; Passel, Mark W. J. Van

    2015-07-01

    Antibiotic resistance genes are found in a broad range of ecological niches associated with complex microbiota. Here we investigated if resistance genes are not only present, but also transcribed under natural conditions. Furthermore, we examined the potential for antibiotic production by assessing the expression of associated secondary metabolite biosynthesis gene clusters. Metatranscriptome datasets from intestinal microbiota of four human adults, one human infant, 15 mice and six pigs, of which only the latter have received antibiotics prior to the study, as well as from sea bacterioplankton, a marine sponge, forest soil and sub-seafloor sediment, were investigated. We found that resistance genes are expressed in all studied ecological niches, albeit with niche-specific differences in relative expression levels and diversity of transcripts. For example, in mice and human infant microbiota predominantly tetracycline resistance genes were expressed while in human adult microbiota the spectrum of expressed genes was more diverse, and also included β-lactam, aminoglycoside and macrolide resistance genes. Resistance gene expression could result from the presence of natural antibiotics in the environment, although we could not link it to expression of corresponding secondary metabolites biosynthesis clusters. Alternatively, resistance gene expression could be constitutive, or these genes serve alternative roles besides antibiotic resistance.

  13. Mining microbial metatranscriptomes for expression of antibiotic resistance genes under natural conditions

    PubMed Central

    Versluis, Dennis; D’Andrea, Marco Maria; Ramiro Garcia, Javier; Leimena, Milkha M.; Hugenholtz, Floor; Zhang, Jing; Öztürk, Başak; Nylund, Lotta; Sipkema, Detmer; Schaik, Willem van; de Vos, Willem M.; Kleerebezem, Michiel; Smidt, Hauke; Passel, Mark W.J. van

    2015-01-01

    Antibiotic resistance genes are found in a broad range of ecological niches associated with complex microbiota. Here we investigated if resistance genes are not only present, but also transcribed under natural conditions. Furthermore, we examined the potential for antibiotic production by assessing the expression of associated secondary metabolite biosynthesis gene clusters. Metatranscriptome datasets from intestinal microbiota of four human adults, one human infant, 15 mice and six pigs, of which only the latter have received antibiotics prior to the study, as well as from sea bacterioplankton, a marine sponge, forest soil and sub-seafloor sediment, were investigated. We found that resistance genes are expressed in all studied ecological niches, albeit with niche-specific differences in relative expression levels and diversity of transcripts. For example, in mice and human infant microbiota predominantly tetracycline resistance genes were expressed while in human adult microbiota the spectrum of expressed genes was more diverse, and also included β-lactam, aminoglycoside and macrolide resistance genes. Resistance gene expression could result from the presence of natural antibiotics in the environment, although we could not link it to expression of corresponding secondary metabolites biosynthesis clusters. Alternatively, resistance gene expression could be constitutive, or these genes serve alternative roles besides antibiotic resistance. PMID:26153129

  14. Mining microbial metatranscriptomes for expression of antibiotic resistance genes under natural conditions.

    PubMed

    Versluis, Dennis; D'Andrea, Marco Maria; Ramiro Garcia, Javier; Leimena, Milkha M; Hugenholtz, Floor; Zhang, Jing; Öztürk, Başak; Nylund, Lotta; Sipkema, Detmer; van Schaik, Willem; de Vos, Willem M; Kleerebezem, Michiel; Smidt, Hauke; van Passel, Mark W J

    2015-07-08

    Antibiotic resistance genes are found in a broad range of ecological niches associated with complex microbiota. Here we investigated if resistance genes are not only present, but also transcribed under natural conditions. Furthermore, we examined the potential for antibiotic production by assessing the expression of associated secondary metabolite biosynthesis gene clusters. Metatranscriptome datasets from intestinal microbiota of four human adults, one human infant, 15 mice and six pigs, of which only the latter have received antibiotics prior to the study, as well as from sea bacterioplankton, a marine sponge, forest soil and sub-seafloor sediment, were investigated. We found that resistance genes are expressed in all studied ecological niches, albeit with niche-specific differences in relative expression levels and diversity of transcripts. For example, in mice and human infant microbiota predominantly tetracycline resistance genes were expressed while in human adult microbiota the spectrum of expressed genes was more diverse, and also included β-lactam, aminoglycoside and macrolide resistance genes. Resistance gene expression could result from the presence of natural antibiotics in the environment, although we could not link it to expression of corresponding secondary metabolites biosynthesis clusters. Alternatively, resistance gene expression could be constitutive, or these genes serve alternative roles besides antibiotic resistance.

  15. Data mining in networks of differentially expressed genes during sow pregnancy.

    PubMed

    Wang, Ligang; Zhang, Longchao; Li, Yong; Li, Wen; Luo, Weizhen; Cheng, Duxue; Yan, Hua; Ma, Xiaojun; Liu, Xin; Song, Xin; Liang, Jing; Zhao, Kebin; Wang, Lixian

    2012-01-01

    Small to moderate gains in pig fertility can mean large returns in overall efficiency, and developing methods to improve it is highly desirable. High fertility rates depend on completion of successful pregnancies. To understand the molecular signals associated with pregnancy in sows, expression profiling experiments were conducted to identify differentially expressed genes in ovary and myometrium at different pregnancy periods using the Affymetrix Porcine GeneChip(TM). A total of 974, 1800, 335 and 710 differentially expressed transcripts were identified in the myometrium during early pregnancy (EP) and late pregnancy (LP), and in the ovary during EP and LP, respectively. Self-Organizing Map (SOM) clusters indicated the differentially expressed genes belonged to 7 different functional groups. Based on BLASTX searches and Gene Ontology (GO) classifications, 129 unique genes closely related to pregnancy showed differential expression patterns. GO analysis also indicated that there were 21 different molecular function categories, 20 different biological process categories, and 8 different cellular component categories of genes differentially expressed during sow pregnancy. Gene regulatory network reconstruction provided us with an interaction model of known genes such as insulin-like growth factor 2 (IGF2) gene, estrogen receptor (ESR) gene, retinol-binding protein-4 (RBP4) gene, and several unknown candidate genes related to reproduction. Several pitch point genes were selected for association study with reproduction traits. For instance, DPPA5 g.363 T>C was found to be significantly associated with litter born weight at later parities in Beijing Black pigs (p < 0.05). Overall, this study contributes to elucidating the mechanism underlying pregnancy processes, which may provide valuable information for pig reproduction improvement.

  16. Mining for Nonribosomal Peptide Synthetase and Polyketide Synthase Genes Revealed a High Level of Diversity in the Sphagnum Bog Metagenome

    PubMed Central

    Müller, Christina A.; Oberauner-Wappis, Lisa; Peyman, Armin; Amos, Gregory C. A.; Wellington, Elizabeth M. H.

    2015-01-01

    Sphagnum bog ecosystems are among the oldest vegetation forms harboring a specific microbial community and are known to produce an exceptionally wide variety of bioactive substances. Although the Sphagnum metagenome shows a rich secondary metabolism, the genes have not yet been explored. To analyze nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), the diversity of NRPS and PKS genes in Sphagnum-associated metagenomes was investigated by in silico data mining and sequence-based screening (PCR amplification of 9,500 fosmid clones). The in silico Illumina-based metagenomic approach resulted in the identification of 279 NRPSs and 346 PKSs, as well as 40 PKS-NRPS hybrid gene sequences. The occurrence of NRPS sequences was strongly dominated by the members of the Proteobacteria phylum, especially by species of the Burkholderia genus, while PKS sequences were mainly affiliated with Actinobacteria. Thirteen novel NRPS-related sequences were identified by PCR amplification screening, displaying amino acid identities of 48% to 91% to annotated sequences of members of the phyla Proteobacteria, Actinobacteria, and Cyanobacteria. Some of the identified metagenomic clones showed the closest similarity to peptide synthases from Burkholderia or Lysobacter, which are emerging bacterial sources of as-yet-undescribed bioactive metabolites. This report highlights the role of the extreme natural ecosystems as a promising source for detection of secondary compounds and enzymes, serving as a source for biotechnological applications. PMID:26002894

  17. Mining for Nonribosomal Peptide Synthetase and Polyketide Synthase Genes Revealed a High Level of Diversity in the Sphagnum Bog Metagenome.

    PubMed

    Müller, Christina A; Oberauner-Wappis, Lisa; Peyman, Armin; Amos, Gregory C A; Wellington, Elizabeth M H; Berg, Gabriele

    2015-08-01

    Sphagnum bog ecosystems are among the oldest vegetation forms harboring a specific microbial community and are known to produce an exceptionally wide variety of bioactive substances. Although the Sphagnum metagenome shows a rich secondary metabolism, the genes have not yet been explored. To analyze nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), the diversity of NRPS and PKS genes in Sphagnum-associated metagenomes was investigated by in silico data mining and sequence-based screening (PCR amplification of 9,500 fosmid clones). The in silico Illumina-based metagenomic approach resulted in the identification of 279 NRPSs and 346 PKSs, as well as 40 PKS-NRPS hybrid gene sequences. The occurrence of NRPS sequences was strongly dominated by the members of the Proteobacteria phylum, especially by species of the Burkholderia genus, while PKS sequences were mainly affiliated with Actinobacteria. Thirteen novel NRPS-related sequences were identified by PCR amplification screening, displaying amino acid identities of 48% to 91% to annotated sequences of members of the phyla Proteobacteria, Actinobacteria, and Cyanobacteria. Some of the identified metagenomic clones showed the closest similarity to peptide synthases from Burkholderia or Lysobacter, which are emerging bacterial sources of as-yet-undescribed bioactive metabolites. This report highlights the role of the extreme natural ecosystems as a promising source for detection of secondary compounds and enzymes, serving as a source for biotechnological applications.

  18. Global analysis of genes involved in freshwater adaptation in threespine sticklebacks (Gasterosteus aculeatus).

    PubMed

    DeFaveri, Jacquelin; Shikano, Takahito; Shimada, Yukinori; Goto, Akira; Merilä, Juha

    2011-06-01

    Examples of parallel evolution of phenotypic traits have been repeatedly demonstrated in threespine sticklebacks (Gasterosteus aculeatus) across their global distribution. Using these as a model, we performed a targeted genome scan--focusing on physiologically important genes potentially related to freshwater adaptation--to identify genetic signatures of parallel physiological evolution on a global scale. To this end, 50 microsatellite loci, including 26 loci within or close to (<6 kb) physiologically important genes, were screened in paired marine and freshwater populations from six locations across the Northern Hemisphere. Signatures of directional selection were detected in 24 loci, including 17 physiologically important genes, in at least one location. Although no loci showed consistent signatures of selection in all divergent population pairs, several outliers were common in multiple locations. In particular, seven physiologically important genes, as well as reference ectodysplasin gene (EDA), showed signatures of selection in three or more locations. Hence, although these results give some evidence for consistent parallel molecular evolution in response to freshwater colonization, they suggest that different evolutionary pathways may underlie physiological adaptation to freshwater habitats within the global distribution of the threespine stickleback.

  19. Global gene expression profiles reveal significant nuclear reprogramming by the blastocyst stage after cloning.

    PubMed

    Smith, Sadie L; Everts, Robin E; Tian, X Cindy; Du, Fuliang; Sung, Li-Ying; Rodriguez-Zas, Sandra L; Jeong, Byeong-Seon; Renard, Jean-Paul; Lewin, Harris A; Yang, Xiangzhong

    2005-12-06

    Nuclear transfer (NT) has potential applications in agriculture and biomedicine, but the technology is hindered by low efficiency. Global gene expression analysis of clones is important for the comprehensive study of nuclear reprogramming. Here, we compared global gene expression profiles of individual bovine NT blastocysts with their somatic donor cells and fertilized control embryos using cDNA microarray technology. The NT embryos' gene expression profiles were drastically different from those of their donor cells and closely resembled those of the naturally fertilized embryos. Our findings demonstrate that the NT embryos have undergone significant nuclear reprogramming by the blastocyst stage; however, problems may occur during redifferentiation for tissue genesis and organogenesis, and small reprogramming errors may be magnified downstream in development.

  20. Drug Repositioning through Systematic Mining of Gene Coexpression Networks in Cancer

    PubMed Central

    Ivliev, Alexander E.; ‘t Hoen, Peter A. C.; Borisevich, Dmitrii; Nikolsky, Yuri; Sergeeva, Marina G.

    2016-01-01

    Gene coexpression network analysis is a powerful “data-driven” approach essential for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise “meta-analysis” framework. We generated such a resource by exploring gene coexpression networks in 82 microarray datasets from 9 major human cancer types. The analysis was conducted using an elaborate weighted gene coexpression network (WGCNA) methodology and identified over 3,000 robust gene coexpression modules. The modules covered a range of known tumor features, such as proliferation, extracellular matrix remodeling, hypoxia, inflammation, angiogenesis, tumor differentiation programs, specific signaling pathways, genomic alterations, and biomarkers of individual tumor subtypes. To prioritize genes with respect to those tumor features, we ranked genes within each module by connectivity, leading to identification of module-specific functionally prominent hub genes. To showcase the utility of this network information, we positioned known cancer drug targets within the coexpression networks and predicted that Anakinra, an anti-rheumatoid therapeutic agent, may be promising for development in colorectal cancer. We offer a comprehensive, normalized and well documented collection of >3000 gene coexpression modules in a variety of cancers as a rich data resource to facilitate further progress in cancer research. PMID:27824868

  1. Temporal representation for gene networks: towards a qualitative temporal data mining.

    PubMed

    Turenne, Nicolas; Schwer, Sylviane R

    2008-01-01

    Processing literature (i.e., text corpora) to capture gene regulation events is not easy and can be driven by the final data representation. We propose to build, manually, an example of temporal representation (whole gene networks for coat formation in Bacillus subtilis). Our temporal representation is based on a generalised formal language theory (S-languages). We propose an algorithm to link bags of relations with representation, by ordering interactions. In this paper, starting from the network made manually from text data, we show that S-languages are quite relevant to encapsulate gene properties, and infer knowledge across timestamped gene relations found in texts.

  2. Drug Repositioning through Systematic Mining of Gene Coexpression Networks in Cancer.

    PubMed

    Ivliev, Alexander E; 't Hoen, Peter A C; Borisevich, Dmitrii; Nikolsky, Yuri; Sergeeva, Marina G

    2016-01-01

    Gene coexpression network analysis is a powerful "data-driven" approach essential for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. We generated such a resource by exploring gene coexpression networks in 82 microarray datasets from 9 major human cancer types. The analysis was conducted using an elaborate weighted gene coexpression network (WGCNA) methodology and identified over 3,000 robust gene coexpression modules. The modules covered a range of known tumor features, such as proliferation, extracellular matrix remodeling, hypoxia, inflammation, angiogenesis, tumor differentiation programs, specific signaling pathways, genomic alterations, and biomarkers of individual tumor subtypes. To prioritize genes with respect to those tumor features, we ranked genes within each module by connectivity, leading to identification of module-specific functionally prominent hub genes. To showcase the utility of this network information, we positioned known cancer drug targets within the coexpression networks and predicted that Anakinra, an anti-rheumatoid therapeutic agent, may be promising for development in colorectal cancer. We offer a comprehensive, normalized and well documented collection of >3000 gene coexpression modules in a variety of cancers as a rich data resource to facilitate further progress in cancer research.
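
    The step "ranked genes within each module by connectivity" can be illustrated with a short sketch. It assumes the usual unsigned WGCNA definitions, where the adjacency between two genes is |correlation| raised to a soft-threshold power and a gene's connectivity is the sum of its adjacencies to the other genes in the module. The function names, the power `beta=6`, and the toy module are assumptions for illustration and are not taken from the study.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_by_connectivity(module_expr, beta=6):
    """Rank the genes of one module by intramodular connectivity.

    module_expr maps gene name -> expression values across samples.
    Adjacency is |cor|**beta (unsigned network); connectivity is the
    sum of a gene's adjacencies to every other gene in the module.
    Returns (genes sorted from most to least connected, connectivity dict).
    """
    genes = list(module_expr)
    connectivity = {
        g: sum(abs(pearson(module_expr[g], module_expr[h])) ** beta
               for h in genes if h != g)
        for g in genes
    }
    ranking = sorted(genes, key=lambda g: connectivity[g], reverse=True)
    return ranking, connectivity


# Toy module (invented): "hub" correlates with both other genes,
# which are uncorrelated with each other.
toy_module = {
    "hub": [2, 0, 0, -2],
    "g1": [1, 1, -1, -1],
    "g2": [1, -1, 1, -1],
}
ranking, k = rank_by_connectivity(toy_module)
print(ranking[0], k)
```

    The top-ranked genes in such a list are the candidates for "hub genes" in the sense used by the abstract; a real analysis would first build the modules from many samples rather than from four toy values.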

  3. Mining for Candidate Genes in an Introgression Line by Using RNA Sequencing: The Anthocyanin Overaccumulation Phenotype in Brassica

    PubMed Central

    Xie, Lulu; Li, Fei; Zhang, Shifan; Zhang, Hui; Qian, Wei; Li, Peirong; Zhang, Shujiang; Sun, Rifei

    2016-01-01

    Introgression breeding is a widely used method for the genetic improvement of crop plants; however, the mechanism underlying candidate gene flow patterns during hybridization is poorly understood. In this study, we used a powerful pipeline to investigate a Chinese cabbage (Brassica rapa L. ssp. pekinensis) introgression line with the anthocyanin overaccumulation phenotype. Our purpose was to analyze the gene flow patterns during hybridization and elucidate the genetic factors responsible for the accumulation of this important pigment compound. We performed RNA-seq analysis by using two pipelines, one with and one without a reference sequence, to obtain transcriptome data. We identified 930 significantly differentially expressed genes (DEGs) between the purple-leaf introgression line and B. rapa green cultivar, namely, 389 up-regulated and 541 down-regulated DEGs that mapped to the B. rapa reference genome. Since only one anthocyanin pathway regulatory gene was identified, i.e., Bra037887 (bHLH), we mined unmapped reads, revealing 2031 de novo assembled unigenes, including c3563g1i2. Phylogenetic analysis suggested that c3563g1i2, which was transferred from the Brassica B genome of the donor parental line Brassica juncea, may represent an R2R3-MYB transcription factor that participates in the ternary transcriptional activation complex responsible for the anthocyanin overaccumulation phenotype of the B. rapa introgression line. We also identified genes involved in cold and light reaction pathways that were highly upregulated in the introgression line, as confirmed using quantitative real-time PCR analysis. The results of this study shed light on the mechanisms underlying the purple leaf trait in Brassica plants and may facilitate the use of introgressive hybridization for many traits of interest. PMID:27597857

  4. The impact of endurance exercise on global and AMPK gene-specific DNA methylation

    SciTech Connect

    King-Himmelreich, Tanya S.; Schramm, Stefanie; Wolters, Miriam C.; Schmetzer, Julia; Möser, Christine V.; Knothe, Claudia; Geisslinger, Gerd

    2016-05-27

    Alterations in gene expression as a consequence of physical exercise are frequently described. The mechanism of these regulations might depend on epigenetic changes in global or gene-specific DNA methylation levels. The AMP-activated protein kinase (AMPK) plays a key role in maintenance of energy homeostasis and is activated by increases in the AMP/ATP ratio as occurring in skeletal muscles after sporting activity. To analyze whether exercise has an impact on the methylation status of the AMPK promoter, we determined the AMPK methylation status in human blood samples from patients before and after sporting activity in the context of rehabilitation as well as in skeletal muscles of trained and untrained mice. Further, we examined long interspersed nuclear element 1 (LINE-1) as indicator of global DNA methylation changes. Our results revealed that light sporting activity in mice and humans does not alter global DNA methylation but has an effect on methylation of specific CpG sites in the AMPKα2 gene. These regulations were associated with a reduced AMPKα2 mRNA and protein expression in muscle tissue, pointing at a contribution of the methylation status to AMPK expression. Taken together, these results suggest that exercise influences AMPKα2 gene methylation in human blood and eminently in the skeletal muscle of mice and therefore might repress AMPKα2 gene expression. -- Highlights: •AMPK gene methylation increases after moderate endurance exercise in humans and mice. •AMPKα mRNA and protein decrease after moderate endurance exercise in mice. •Global DNA methylation is not affected under the same conditions.

  5. Manteia, a predictive data mining system for vertebrate genes and its applications to human genetic diseases.

    PubMed

    Tassy, Olivier; Pourquié, Olivier

    2014-01-01

    The function of genes is often evolutionarily conserved, and comparing the annotation of ortholog genes in different model organisms has proved to be a powerful predictive tool to identify the function of human genes. Here, we describe Manteia, a resource available online at http://manteia.igbmc.fr. Manteia allows the comparison of embryological, expression, molecular and etiological data from human, mouse, chicken and zebrafish simultaneously to identify new functional and structural correlations and gene-disease associations. Manteia is particularly useful for the analysis of gene lists produced by high-throughput techniques such as microarrays or proteomics. Data can be easily analyzed statistically to characterize the function of groups of genes and to correlate the different aspects of their annotation. Sophisticated querying tools provide unlimited ways to merge the information contained in Manteia along with the possibility of introducing custom user-designed biological questions into the system. This makes it possible, for example, to connect all the animal experimental results and annotations to the human genome, and to take advantage of data not available for human to look for candidate genes responsible for genetic disorders. Here, we demonstrate the predictive and analytical power of the system to predict candidate genes responsible for human genetic diseases.

  6. Global gene expression profiling reveals genes expressed differentially in cattle with high and low residual feed intake.

    PubMed

    Chen, Y; Gondro, C; Quinn, K; Herd, R M; Parnell, P F; Vanselow, B

    2011-10-01

    Feed efficiency is an economically important trait in beef production. It can be measured as residual feed intake. This is the difference between actual feed intake recorded over a test period and the expected feed intake of an animal based on its size and growth rate. DNA-based marker-assisted selection would help beef breeders to accelerate genetic improvement for feed efficiency by reducing the generation interval and would obviate the high cost of measuring residual feed intake. Although numbers of quantitative trait loci and candidate genes have been identified with the advance of molecular genetics, our understanding of the physiological mechanisms and the nature of genes underlying residual feed intake is limited. The aim of the study was to use global gene expression profiling by microarray to identify genes that are differentially expressed in cattle, using lines genetically selected for low and high residual feed intake, and to uncover candidate genes for residual feed intake. A long-oligo microarray with 24 000 probes was used to profile the liver transcriptome of 44 cattle selected for high or low residual feed intake. One hundred and sixty-one unique genes were identified as being differentially expressed between animals with high and low residual feed intake. These genes were involved in seven gene networks affecting cellular growth and proliferation, cellular assembly and organization, cell signalling, drug metabolism, protein synthesis, lipid metabolism, and carbohydrate metabolism. Analysis of functional data using a transcriptional approach allows a better understanding of the underlying biological processes involved in residual feed intake and also allows the identification of candidate genes for marker-assisted selection.
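
    The definition of residual feed intake given in this abstract (actual intake minus the intake expected from size and growth rate) can be made concrete with a small sketch. It assumes the expected intake is obtained from an ordinary least-squares regression of intake on body weight and daily gain within the test group; the function names and the example numbers are invented for illustration and do not come from the study.

```python
def solve(a, b):
    """Solve a small linear system a.x = b by Gauss-Jordan elimination
    with partial pivoting (assumes the system is non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_feed_intake(intake, weight, gain):
    """Residual feed intake per animal.

    Fits intake = b0 + b1*weight + b2*gain by least squares (normal
    equations) and returns actual minus expected intake for each animal.
    """
    rows = [[1.0, w, g] for w, g in zip(weight, gain)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, intake)) for i in range(3)]
    b0, b1, b2 = solve(xtx, xty)
    return [y - (b0 + b1 * w + b2 * g) for y, w, g in zip(intake, weight, gain)]


# Invented example: daily feed intake (kg), body weight (kg), daily gain (kg).
intake = [10.4, 11.9, 11.2, 13.5, 12.6]
weight = [400, 450, 500, 550, 600]
gain = [1.0, 1.4, 1.1, 1.6, 1.2]
rfi = residual_feed_intake(intake, weight, gain)
print([round(v, 3) for v in rfi])
```

    Animals with negative values ate less than expected for their size and growth, which is the "low residual feed intake" (more efficient) group in the abstract's terms.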

  7. Global Landscape of a Co-Expressed Gene Network in Barley and its Application to Gene Discovery in Triticeae Crops

    PubMed Central

    Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

    2011-01-01

    Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/. PMID:21441235
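
    The general idea of building a co-expression network and splitting it into modules can be sketched as follows. This is not the procedure used in the study: the abstract mentions a weighted correlation coefficient and its own clustering step, whereas this sketch substitutes a plain Pearson correlation, a fixed threshold (0.9, chosen arbitrarily), and connected components as a crude stand-in for modules. All names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    denom = sqrt(sum(d * d for d in dev_x) * sum(d * d for d in dev_y))
    # A flat profile has no defined correlation; treat it as uncorrelated.
    return sum(a * b for a, b in zip(dev_x, dev_y)) / denom if denom else 0.0


def coexpression_modules(profiles, threshold=0.9):
    """Link genes whose correlation is >= threshold, then return the
    connected components of that graph as sorted lists of gene names."""
    genes = sorted(profiles)
    neighbours = {g: set() for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)
    modules, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        stack, module = [g], set()
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(neighbours[node] - module)
        seen |= module
        modules.append(sorted(module))
    return modules
```

    Only positive correlations create links in this sketch; whether anti-correlated genes should share a module is a design choice the abstract does not address.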

  8. Global variability in gene expression and alternative splicing is modulated by mitochondrial content.

    PubMed

    Guantes, Raul; Rastrojo, Alberto; Neves, Ricardo; Lima, Ana; Aguado, Begoña; Iborra, Francisco J

    2015-05-01

    Noise in gene expression is a main determinant of phenotypic variability. Increasing experimental evidence suggests that genome-wide cellular constraints largely contribute to the heterogeneity observed in gene products. It is still unclear, however, which global factors affect gene expression noise and to what extent. Since eukaryotic gene expression is an energy demanding process, differences in the energy budget of each cell could determine gene expression differences. Here, we quantify the contribution of mitochondrial variability (a natural source of ATP variation) to global variability in gene expression. We find that changes in mitochondrial content can account for ∼50% of the variability observed in protein levels. This is the combined result of the effect of mitochondria dosage on transcription and translation apparatus content and activities. Moreover, we find that mitochondrial levels have a large impact on alternative splicing, thus modulating both the abundance and type of mRNAs. A simple mathematical model in which mitochondrial content simultaneously affects transcription rate and splicing site choice can explain the alternative splicing data. The results of this study show that mitochondrial content (and/or probably function) influences mRNA abundance, translation, and alternative splicing, which ultimately affects cellular phenotype.

  9. Global variability in gene expression and alternative splicing is modulated by mitochondrial content

    PubMed Central

    Guantes, Raul; Rastrojo, Alberto; Neves, Ricardo; Lima, Ana; Aguado, Begoña; Iborra, Francisco J.

    2015-01-01

    Noise in gene expression is a main determinant of phenotypic variability. Increasing experimental evidence suggests that genome-wide cellular constraints largely contribute to the heterogeneity observed in gene products. It is still unclear, however, which global factors affect gene expression noise and to what extent. Since eukaryotic gene expression is an energy demanding process, differences in the energy budget of each cell could determine gene expression differences. Here, we quantify the contribution of mitochondrial variability (a natural source of ATP variation) to global variability in gene expression. We find that changes in mitochondrial content can account for ∼50% of the variability observed in protein levels. This is the combined result of the effect of mitochondria dosage on transcription and translation apparatus content and activities. Moreover, we find that mitochondrial levels have a large impact on alternative splicing, thus modulating both the abundance and type of mRNAs. A simple mathematical model in which mitochondrial content simultaneously affects transcription rate and splicing site choice can explain the alternative splicing data. The results of this study show that mitochondrial content (and/or probably function) influences mRNA abundance, translation, and alternative splicing, which ultimately affects cellular phenotype. PMID:25800673

  10. Global Gene Expression Profiling in Lung Tissues of Rat Exposed to Lunar Dust Particles

    NASA Technical Reports Server (NTRS)

    Yeshitla, Samrawit A.; Lam, Chiu-Wing; Kidane, Yared H.; Feiveson, Alan H.; Ploutz-Snyder, Robert; Wu, Honglu; James, John T.; Meyers, Valerie E.; Zhang, Ye

    2014-01-01

    The Moon's surface is covered by a layer of fine, potentially reactive dust. Lunar dust contains about 1-2% respirable very fine dust (less than 3 micrometers). The habitable area of any lunar landing vehicle and outpost would inevitably be contaminated with lunar dust that could pose a health risk. The purpose of the study is to analyze the dynamics of global gene expression changes in lung tissues of rats exposed to lunar dust particles. F344 rats were exposed for 4 weeks (6h/d; 5d/wk) in nose-only inhalation chambers to concentrations of 0 (control air), 2.1, 6.8, 21, and 61 mg/m3 of lunar dust. Animals were euthanized at 1 day and 13 weeks after the last inhalation exposure. After being lavaged, lung tissue from each animal was collected and total RNA was isolated. Four samples of each dose group were analyzed using Agilent Rat GE v3 microarray to profile global gene expression of 44K transcripts. After background subtraction, normalization, and log transformation, t tests were used to compare the mean expression levels of each exposed group to the control group. Correction for multiple testing was made using the method of Benjamini, Krieger, and Yekutieli (1) to control the false discovery rate. Genes with significant changes of at least 1.75 fold were identified as genes of interest. Both low and high doses of lunar dust caused dramatic, dose-dependent global gene expression changes in the lung tissues. However, the responses of lung tissue to low dose lunar dust are distinguished from those of high doses, especially those associated with 61 mg/m3 dust exposure. The data were further integrated into the Ingenuity system to analyze the gene ontology (GO), pathway distribution and putative upstream regulators and gene targets. Multiple pathways, functions, and upstream regulators have been identified in response to lunar dust-induced damage in the lung tissue.
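
    The selection step described above (false discovery rate control followed by a 1.75-fold cutoff) can be sketched as below. Note the substitution: the abstract cites the two-stage procedure of Benjamini, Krieger, and Yekutieli, while this sketch uses the simpler Benjamini-Hochberg step-up adjustment. The significance level of 0.05, the use of fold changes expressed as exposed/control ratios, and all function names are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def genes_of_interest(p_values, fold_changes, alpha=0.05, min_fold=1.75):
    """Indices of genes that pass the FDR cutoff and change at least
    min_fold in either direction (ratio >= min_fold or <= 1 / min_fold)."""
    q_values = benjamini_hochberg(p_values)
    return [
        i
        for i, (q, fc) in enumerate(zip(q_values, fold_changes))
        if q <= alpha and (fc >= min_fold or fc <= 1.0 / min_fold)
    ]
```

    In the study design described, such a filter would be applied separately to each comparison of an exposed dose group against the control group.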

  11. Global Regulation of Gene Expression by the MafR Protein of Enterococcus faecalis

    PubMed Central

    Ruiz-Cruz, Sofía; Espinosa, Manuel; Goldmann, Oliver; Bravo, Alicia

    2016-01-01

    Enterococcus faecalis is a natural inhabitant of the human gastrointestinal tract. However, as an opportunistic pathogen, it is able to colonize other host niches and cause life-threatening infections. Its adaptation to new environments involves global changes in gene expression. The EF3013 gene (here named mafR) of E. faecalis strain V583 encodes a protein (MafR, 482 residues) that has sequence similarity to global response regulators of the Mga/AtxA family. The enterococcal OG1RF genome also encodes the MafR protein (gene OG1RF_12293). In this work, we have identified the promoter of the mafR gene using several in vivo approaches. Moreover, we show that MafR positively influences the transcription of many genes on a genome-wide scale. The most significant target genes encode components of PTS-type membrane transporters, components of ABC-type membrane transporters, and proteins involved in the metabolism of carbon sources. Some of these genes were previously reported to be up-regulated during the growth of E. faecalis in blood and/or in human urine. Furthermore, we show that a mafR deletion mutant strain induces a significantly lower degree of inflammation in the peritoneal cavity of mice, suggesting that enterococcal cells deficient in MafR are less virulent. Our work indicates that MafR is a global transcriptional regulator. It might facilitate the adaptation of E. faecalis to particular host niches and, therefore, contribute to its potential virulence. PMID:26793169

  12. GeoChip-Based Analysis of the Functional Gene Diversity and Metabolic Potential of Microbial Communities in Acid Mine Drainage

    PubMed Central

    Xie, Jianping; He, Zhili; Liu, Xinxing; Liu, Xueduan; Van Nostrand, Joy D.; Deng, Ye; Wu, Liyou; Zhou, Jizhong; Qiu, Guanzhou

    2011-01-01

    Acid mine drainage (AMD) is an extreme environment, usually with low pH and high concentrations of metals. Although the phylogenetic diversity of AMD microbial communities has been examined extensively, little is known about their functional gene diversity and metabolic potential. In this study, a comprehensive functional gene array (GeoChip 2.0) was used to analyze the functional diversity, composition, structure, and metabolic potential of AMD microbial communities from three copper mines in China. GeoChip data indicated that these microbial communities were functionally diverse as measured by the number of genes detected, gene overlapping, unique genes, and various diversity indices. Almost all key functional gene categories targeted by GeoChip 2.0 were detected in the AMD microbial communities, including carbon fixation, carbon degradation, methane generation, nitrogen fixation, nitrification, denitrification, ammonification, nitrogen reduction, sulfur metabolism, metal resistance, and organic contaminant degradation, which suggested that the functional gene diversity was higher than was previously thought. Mantel test results indicated that AMD microbial communities are shaped largely by surrounding environmental factors (e.g., S, Mg, and Cu). Functional genes (e.g., narG and norB) and several key functional processes (e.g., methane generation, ammonification, denitrification, sulfite reduction, and organic contaminant degradation) were significantly (P < 0.10) correlated with environmental variables. This study presents an overview of functional gene diversity and the structure of AMD microbial communities and also provides insights into our understanding of metabolic potential in AMD ecosystems. PMID:21097602

  13. Reshaping of global gene expression networks and sex-biased gene expression by integration of a young gene

    PubMed Central

    Chen, Sidi; Ni, Xiaochun; Krinsky, Benjamin H; Zhang, Yong E; Vibranovski, Maria D; White, Kevin P; Long, Manyuan

    2012-01-01

    New genes originate frequently across diverse taxa. Given that genetic networks are typically comprised of robust, co-evolved interactions, the emergence of new genes raises an intriguing question: how do new genes interact with pre-existing genes? Here, we show that a recently originated gene rapidly evolved new gene networks and impacted sex-biased gene expression in Drosophila. This 4–6 million-year-old factor, named Zeus for its role in male fecundity, originated through retroposition of a highly conserved housekeeping gene, Caf40. Zeus acquired male reproductive organ expression patterns and phenotypes. Comparative expression profiling of mutants and closely related species revealed that Zeus has recruited a new set of downstream genes, and shaped the evolution of gene expression in germline. Comparative ChIP-chip revealed that the genomic binding profile of Zeus diverged rapidly from Caf40. These data demonstrate, for the first time, how a new gene quickly evolved novel networks governing essential biological processes at the genomic level. PMID:22543869

  14. Data Mining of Gene Arrays for Biomarkers of Survival in Ovarian Cancer.

    PubMed

    Coveney, Clare; Boocock, David J; Rees, Robert C; Deen, Suha; Ball, Graham R

    2015-07-17

    The expected five-year survival rate from a stage III ovarian cancer diagnosis is a mere 22%; this applies to the 7000 new cases diagnosed yearly in the UK. Stratification of patients with this heterogeneous disease, based on active molecular pathways, would aid a targeted treatment improving the prognosis for many cases. While hundreds of genes have been associated with ovarian cancer, few have yet been verified by peer research for clinical significance. Here, a meta-analysis approach was applied to two carefully selected gene expression microarray datasets. Artificial neural networks, Cox univariate survival analyses and T-tests identified genes whose expression was consistently and significantly associated with patient survival. The rigor of this experimental design increases confidence in the genes found to be of interest. A list of 56 genes was distilled from a potential 37,000 to be significantly related to survival in both datasets with an FDR of 1.39859 × 10^-11, the identities of which both verify genes already implicated with this disease and provide novel genes and pathways to pursue. Further investigation and validation of these may lead to clinical insights and have potential to predict a patient's response to treatment or be used as a novel target for therapy.

  15. Data Mining of Gene Arrays for Biomarkers of Survival in Ovarian Cancer

    PubMed Central

    Coveney, Clare; Boocock, David J.; Rees, Robert C.; Deen, Suha; Ball, Graham R.

    2015-01-01

    The expected five-year survival rate from a stage III ovarian cancer diagnosis is a mere 22%; this applies to the 7000 new cases diagnosed yearly in the UK. Stratification of patients with this heterogeneous disease, based on active molecular pathways, would aid a targeted treatment improving the prognosis for many cases. While hundreds of genes have been associated with ovarian cancer, few have yet been verified by peer research for clinical significance. Here, a meta-analysis approach was applied to two carefully selected gene expression microarray datasets. Artificial neural networks, Cox univariate survival analyses and T-tests identified genes whose expression was consistently and significantly associated with patient survival. The rigor of this experimental design increases confidence in the genes found to be of interest. A list of 56 genes was distilled from a potential 37,000 to be significantly related to survival in both datasets with an FDR of 1.39859 × 10^-11, the identities of which both verify genes already implicated with this disease and provide novel genes and pathways to pursue. Further investigation and validation of these may lead to clinical insights and have potential to predict a patient’s response to treatment or be used as a novel target for therapy. PMID:27600227

  16. Global gene expression analysis of iron-inducible genes in Magnetospirillum magneticum AMB-1.

    PubMed

    Suzuki, Takeyuki; Okamura, Yoshiko; Calugay, Ronie J; Takeyama, Haruko; Matsunaga, Tadashi

    2006-03-01

    Iron uptake systems were identified by global expression profiling of Magnetospirillum magneticum AMB-1. feo, tpd, and ftr, which encode ferrous iron transporters, were up-regulated under iron-rich conditions. The concomitant rapid iron uptake and magnetite formation suggest that these uptake systems serve as iron supply lines for magnetosome synthesis.

  17. Global Gene Expression Analysis of Iron-Inducible Genes in Magnetospirillum magneticum AMB-1

    PubMed Central

    Suzuki, Takeyuki; Okamura, Yoshiko; Calugay, Ronie J.; Takeyama, Haruko; Matsunaga, Tadashi

    2006-01-01

    Iron uptake systems were identified by global expression profiling of Magnetospirillum magneticum AMB-1. feo, tpd, and ftr, which encode ferrous iron transporters, were up-regulated under iron-rich conditions. The concomitant rapid iron uptake and magnetite formation suggest that these uptake systems serve as iron supply lines for magnetosome synthesis. PMID:16513757

  18. The impacts of neutralized acid mine drainage contaminated water on the expression of selected endocrine-linked genes in juvenile Mozambique tilapia Oreochromis mossambicus exposed in vivo.

    PubMed

    Truter, Johannes Christoff; van Wyk, Johannes Hendrik; Oberholster, Paul Johan; Botha, Anna-Maria

    2014-02-01

    Acid mine drainage (AMD) is a global environmental concern due to detrimental impacts on river ecosystems. However, little is known regarding the biological impacts of neutralized AMD on aquatic vertebrates despite excessive discharge into watercourses. The aim of this investigation was to evaluate the endocrine modulatory potential of neutralized AMD, using molecular biomarkers in the teleost fish Oreochromis mossambicus in exposure studies. Surface water was collected from six locations downstream of a high density sludge (HDS) AMD treatment plant and a reference site unimpacted by AMD. The concentrations of 28 elements, including 22 metals, were quantified in the exposure water in order to identify potential links to altered gene expression. Relatively high concentrations of manganese (~10 mg/l), nickel (~0.1 mg/l) and cobalt (~0.03 mg/l) were detected downstream of the HDS plant. The expression of thyroid receptor-α (trα), trβ, androgen receptor-1 (ar1), ar2, glucocorticoid receptor-1 (gr1), gr2, mineralocorticoid receptor (mr) and aromatase (cyp19a1b) was quantified in juvenile fish after 48 h exposure. Slight but significant changes were observed in the expression of gr1 and mr in fish exposed to water collected directly downstream of the HDS plant, consisting of approximately 95 percent neutralized AMD. The most pronounced alterations in gene expression (i.e. trα, trβ, gr1, gr2, ar1 and mr) were associated with water collected further downstream at a location with no other apparent contamination vectors apart from the neutralized AMD. The altered gene expression associated with the "downstream" locality coincided with higher concentrations of certain metals relative to the locality adjacent to the HDS plant, which may indicate a causative link. The current study provides evidence of endocrine disruptive activity associated with neutralized AMD contamination in regard to alterations in the expression of key genes linked to the thyroid, interrenal and

  19. Global adaptive rank truncated product method for gene-set analysis in association studies.

    PubMed

    Vilor-Tejedor, Natalia; Calle, M Luz

    2014-09-01

    Gene set analysis (GSA) aims to assess the overall association of a set of genetic variants with a phenotype and has the potential to detect subtle effects of variants in a gene or a pathway that might be missed when assessed individually. We present a new implementation of the Adaptive Rank Truncated Product method (ARTP) for analyzing the association of a set of Single Nucleotide Polymorphisms (SNPs) in a gene or pathway. The new implementation, referred to as globalARTP, improves the original one by allowing the different SNPs in the set to have different modes of inheritance. We perform a simulation study for exploring the power of the proposed methodology in a set of scenarios with different numbers of causal SNPs with different effect sizes. Moreover, we show the advantage of using the gene set approach in the context of an Alzheimer's disease case-control study where we explore the endocytosis pathway. The new method is implemented in the R function globalARTP of the globalGSA package available at http://cran.r-project.org. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Extremely Acidophilic Protists from Acid Mine Drainage Host Rickettsiales-Lineage Endosymbionts That Have Intervening Sequences in Their 16S rRNA Genes

    PubMed Central

    Baker, Brett J.; Hugenholtz, Philip; Dawson, Scott C.; Banfield, Jillian F.

    2003-01-01

    During a molecular phylogenetic survey of extremely acidic (pH < 1), metal-rich acid mine drainage habitats in the Richmond Mine at Iron Mountain, Calif., we detected 16S rRNA gene sequences of a novel bacterial group belonging to the order Rickettsiales in the Alphaproteobacteria. The closest known relatives of this group (92% 16S rRNA gene sequence identity) are endosymbionts of the protist Acanthamoeba. Oligonucleotide 16S rRNA probes were designed and used to observe members of this group within acidophilic protists. To improve visualization of eukaryotic populations in the acid mine drainage samples, broad-specificity probes for eukaryotes were redesigned and combined to highlight this component of the acid mine drainage community. Approximately 4% of protists in the acid mine drainage samples contained endosymbionts. Measurements of internal pH of the protists showed that their cytosol is close to neutral, indicating that the endosymbionts may be neutrophilic. The endosymbionts had a conserved 273-nucleotide intervening sequence (IVS) in variable region V1 of their 16S rRNA genes. The IVS does not match any sequence in current databases, but the predicted secondary structure forms well-defined stem loops. IVSs are uncommon in rRNA genes and appear to be confined to bacteria living in close association with eukaryotes. Based on the phylogenetic novelty of the endosymbiont sequences and initial culture-independent characterization, we propose the name “Candidatus Captivus acidiprotistae.” To our knowledge, this is the first report of an endosymbiotic relationship in an extremely acidic habitat. PMID:12957940

  1. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation

    PubMed Central

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  2. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    PubMed

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  3. Global Gene-Expression Analysis to Identify Differentially Expressed Genes Critical for the Heat Stress Response in Brassica rapa.

    PubMed

    Dong, Xiangshu; Yi, Hankuil; Lee, Jeongyeo; Nou, Ill-Sup; Han, Ching-Tack; Hur, Yoonkang

    2015-01-01

    Genome-wide dissection of the heat stress response (HSR) is necessary to overcome problems in crop production caused by global warming. To identify HSR genes, we profiled gene expression in two Chinese cabbage inbred lines with different thermotolerances, Chiifu and Kenshin. Many genes exhibited >2-fold changes in expression upon exposure to 0.5–4 h at 45°C (high temperature, HT): 5.2% (2,142 genes) in Chiifu and 3.7% (1,535 genes) in Kenshin. The most enriched GO (Gene Ontology) items included 'response to heat', 'response to reactive oxygen species (ROS)', 'response to temperature stimulus', 'response to abiotic stimulus', and 'MAPKKK cascade'. In both lines, the genes most highly induced by HT encoded small heat shock proteins (Hsps) and heat shock factor (Hsf)-like proteins such as HsfB2A (Bra029292), whereas high-molecular weight Hsps were constitutively expressed. Other upstream HSR components were also up-regulated: ROS-scavenging genes like glutathione peroxidase 2 (BrGPX2, Bra022853), protein kinases, and phosphatases. Among heat stress (HS) marker genes in Arabidopsis, only exportin 1A (XPO1A) (Bra008580, Bra006382) can be applied to B. rapa for basal thermotolerance (BT) and short-term acquired thermotolerance (SAT) gene. CYP707A3 (Bra025083, Bra021965), which is involved in the dehydration response in Arabidopsis, was associated with membrane leakage in both lines following HS. Although many transcription factor (TF) genes, including DREB2A (Bra005852), were involved in HS tolerance in both lines, Bra024224 (MYB41) and Bra021735 (a bZIP/AIR1 [Anthocyanin-Impaired-Response-1]) were specific to Kenshin. Several candidate TFs involved in thermotolerance were confirmed as HSR genes by real-time PCR, and these assignments were further supported by promoter analysis. Although some of our findings are similar to those obtained using other plant species, clear differences in Brassica rapa reveal a distinct HSR in this species. Our data could also provide a

  4. Global Gene-Expression Analysis to Identify Differentially Expressed Genes Critical for the Heat Stress Response in Brassica rapa

    PubMed Central

    Dong, Xiangshu; Yi, Hankuil; Lee, Jeongyeo; Nou, Ill-Sup; Han, Ching-Tack; Hur, Yoonkang

    2015-01-01

    Genome-wide dissection of the heat stress response (HSR) is necessary to overcome problems in crop production caused by global warming. To identify HSR genes, we profiled gene expression in two Chinese cabbage inbred lines with different thermotolerances, Chiifu and Kenshin. Many genes exhibited >2-fold changes in expression upon exposure to 0.5–4 h at 45°C (high temperature, HT): 5.2% (2,142 genes) in Chiifu and 3.7% (1,535 genes) in Kenshin. The most enriched GO (Gene Ontology) items included ‘response to heat’, ‘response to reactive oxygen species (ROS)’, ‘response to temperature stimulus’, ‘response to abiotic stimulus’, and ‘MAPKKK cascade’. In both lines, the genes most highly induced by HT encoded small heat shock proteins (Hsps) and heat shock factor (Hsf)-like proteins such as HsfB2A (Bra029292), whereas high-molecular weight Hsps were constitutively expressed. Other upstream HSR components were also up-regulated: ROS-scavenging genes like glutathione peroxidase 2 (BrGPX2, Bra022853), protein kinases, and phosphatases. Among heat stress (HS) marker genes in Arabidopsis, only exportin 1A (XPO1A) (Bra008580, Bra006382) can be applied to B. rapa for basal thermotolerance (BT) and short-term acquired thermotolerance (SAT) gene. CYP707A3 (Bra025083, Bra021965), which is involved in the dehydration response in Arabidopsis, was associated with membrane leakage in both lines following HS. Although many transcription factor (TF) genes, including DREB2A (Bra005852), were involved in HS tolerance in both lines, Bra024224 (MYB41) and Bra021735 (a bZIP/AIR1 [Anthocyanin-Impaired-Response-1]) were specific to Kenshin. Several candidate TFs involved in thermotolerance were confirmed as HSR genes by real-time PCR, and these assignments were further supported by promoter analysis. Although some of our findings are similar to those obtained using other plant species, clear differences in Brassica rapa reveal a distinct HSR in this species. Our data

  5. Allele Mining in Barley Genetic Resources Reveals Genes of Race-Non-Specific Powdery Mildew Resistance

    PubMed Central

    Spies, Annika; Korzun, Viktor; Bayles, Rosemary; Rajaraman, Jeyaraman; Himmelbach, Axel; Hedley, Pete E.; Schweizer, Patrick

    2012-01-01

    Race-non-specific, or quantitative, pathogen resistance is of high importance to plant breeders due to its expected durability. However, it is usually controlled by multiple quantitative trait loci (QTL) and therefore difficult to handle in practice. Knowing the genes that underlie race-non-specific resistance (NR) would allow its exploitation in a more targeted manner. Here, we performed an association-genetic study in a customized worldwide collection of spring barley accessions for candidate genes of race-NR to the powdery mildew fungus Blumeria graminis f. sp. hordei (Bgh) and combined data with results from QTL mapping as well as functional-genomics approaches. This led to the identification of 11 associated genes with converging evidence for an important role in race-NR in the presence of the Mlo gene for basal susceptibility. Outstanding in this respect was the gene encoding the transcription factor WRKY2. The results suggest that unlocking plant genetic resources and integrating functional-genomic with genetic approaches can accelerate the discovery of genes underlying race-NR in barley and other crop plants. PMID:22629270

  6. Harnessing opportunities for good governance of health impacts of mining projects in Mongolia: results of a global partnership.

    PubMed

    Pfeiffer, Michaela; Vanya, Delgermaa; Davison, Colleen; Lkhagvasuren, Oyunaa; Johnston, Lesley; Janes, Craig R

    2017-06-27

    The Sustainable Development Goals call for the effective governance of shared natural resources in ways that support inclusive growth, safeguard the integrity of the natural and physical environment, and promote health and well-being for all. For large-scale resource extraction projects -- e.g. in the mining sector -- environmental regulations and in particular environmental impact assessments (EIA) provide an important but insufficiently developed avenue to ensure that wider sustainable development issues, such as health, have been considered prior to the permitting of projects. In recognition of the opportunity provided in EIA to influence the extent to which health issues would be addressed in the design and delivery of mining projects, an international and intersectoral partnership, with the support of WHO and public funds from Canadian sources, engaged over a period of six years in a series of capacity development activities and knowledge translation/dissemination events aimed at influencing policy change in the extractives sector so as to include consideration of human health impacts. Early efforts significantly increased awareness of the need to include health considerations in EIAs. Coupling effective knowledge translation about health in EIA with the development of networks that fostered good intersectoral partnerships, this awareness supported the development and implementation of key pieces of legislation. These results show that intersectoral collaboration is essential, and must be supported by an effective conceptual understanding about which methods and models of impact assessment, particularly for health, lend themselves to integration within EIA. The results of our partnership demonstrate that when specific conditions are met, integrating health into the EIA system represents a promising avenue to ensure that mining activities contribute to wider sustainable development goals and objectives.

  7. Uranium from Africa - An overview on past and current mining activities: Re-appraising associated risks and chances in a global context

    NASA Astrophysics Data System (ADS)

    Winde, Frank; Brugge, Doug; Nidecker, Andreas; Ruegg, Urs

    2017-05-01

    In 2003, nuclear power received renewed interest as a perceived climate-neutral way to meet high energy demands of large industrialized countries, such as China, India, Russia and the USA. It triggered a growing demand for uranium (U) as nuclear fuel. Dubbed the 'nuclear renaissance', the U-price rose over tenfold before the global credit crisis dampened the rush. Many efforts to capitalise on the renewed demand focused on Africa. This paper provides an overview on the type and extent of uranium mining, production and exploration on the African continent and discusses the economic benefits as well as the potential environmental and health risks and the long-term needs for remediation of legacy sites. The actual historical results of uranium mining activities in more than thirty African countries provide data against which to assess the existing risks of uranium development. The already existing uraniferous waste in several African countries threatens scarce water resources and the health of adjacent residents. Responsibility should rest with the governments and the companies to ensure that these threats are not realized.

  8. Text mining.

    PubMed

    Clegg, Andrew B; Shepherd, Adrian J

    2008-01-01

    One of the fastest-growing fields in bioinformatics is text mining: the application of natural language processing techniques to problems of knowledge management and discovery, using large collections of biological or biomedical text such as MEDLINE. The techniques used in text mining range from the very simple (e.g., the inference of relationships between genes from frequent proximity in documents) to the complex and computationally intensive (e.g., the analysis of sentence structures with parsers in order to extract facts about protein-protein interactions from statements in the text). This chapter presents a general introduction to some of the key principles and challenges of natural language processing, and introduces some of the tools available to end-users and developers. A case study describes the construction and testing of a simple tool designed to tackle a task that is crucial to almost any application of text mining in bioinformatics--identifying gene/protein names in text and mapping them onto records in an external database.
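
    The "very simple" end of the range described above, inferring gene relationships from frequent proximity in documents, can be sketched as a co-occurrence count. This is only an illustration under stated assumptions: the function name, the whitespace tokenizer, and the three toy sentences are hypothetical and do not come from the chapter, and real gene-name recognition is much harder than exact string matching.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, gene_names):
    """Count how many documents mention each pair of gene names.

    Assumption: a gene is "mentioned" if a whitespace-separated token,
    stripped of surrounding punctuation, equals its name (case-insensitive).
    """
    names = {g.lower(): g for g in gene_names}
    counts = Counter()
    for text in documents:
        tokens = {tok.strip(".,;:()[]").lower() for tok in text.split()}
        present = sorted(names[t] for t in tokens if t in names)
        # Each unordered pair is counted once per document.
        counts.update(combinations(present, 2))
    return counts


# Toy sentences written for this sketch, not taken from MEDLINE.
docs = [
    "BRCA1 interacts with BARD1 in DNA repair.",
    "Loss of BRCA1 or BARD1 impairs repair.",
    "TP53 is mutated in many tumours.",
]
pairs = cooccurrence_counts(docs, ["BRCA1", "BARD1", "TP53"])
print(pairs.most_common())
```

    Pairs with high counts are only candidate relationships; the parser-based methods mentioned in the abstract are needed to say what kind of relationship a sentence asserts.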

  9. Global gene expression changes of amniotic fluid cell free RNA according to fetal development.

    PubMed

    Jang, Ji Hyon; Jung, Yong Wook; Shim, Sung Han; Sin, Yun Jeong; Lee, Kyoung Jin; Shim, Sung Shin; Ahn, Eun Hee; Cha, Dong Hyun

    2017-09-01

    This study aimed to evaluate the effect of in utero fetal development on the cell-free transcriptome of amniotic fluid by analyzing global gene expression in the amniotic fluid supernatant obtained at different gestational ages from euploid fetuses. STUDY DESIGN: Thirteen amniotic fluid samples were obtained from five individuals at 28 gestational weeks and eight individuals at full term pregnancy. Transcriptome data previously analyzed by our group from 14 euploid mid-trimester amniotic fluid samples were used for comparative analysis. RNA was extracted from amniotic fluid supernatants, hybridized to Affymetrix GeneChip Human arrays, and the transcriptome was analyzed using the DAVID toolkit. We evaluated 27 samples, which were divided into three groups as follows: 14 subjects between 16 and 18 gestational weeks from our previous study (group 1), five subjects in late second trimester (group 2), and eight subjects at full term pregnancy (group 3). No genes were significantly differentially regulated between group 3 and group 2. We identified 545 probe sets that were significantly differentially expressed between group 1 and group 2 and 3 samples (FDR P-value <0.05). Based on tissue expression analysis, 396 genes that were upregulated in group 1 were enriched in the nervous system including brain and endocrine organs such as pancreas and adrenal gland. In addition, 136 genes that were upregulated in group 2 and 3 were specific to bronchioepithelial cells. Functional pathway analysis revealed that there was no significantly enriched pathway in terms of genes that were upregulated in either group 2 or group 3. Comparing the amniotic fluid cell-free transcriptome of group 1 and 2 with that of group 3, 18 genes were significantly differently modulated. Fetal development affects the amniotic fluid cell-free transcriptome. Fetal skin keratinization, which begins at 19 gestational weeks, might play an important role in changes in global gene expression in the amniotic

  10. Impact of smoking cessation on global gene expression in the bronchial epithelium of chronic smokers

    PubMed Central

    Zhang, Li; Lee, Jack; Tang, Hongli; Fan, You-Hong; Xiao, Lianchun; Ren, Hening; Kurie, Jonathan; Morice, Rodolfo C; Hong, Waun Ki; Mao, Li

    2014-01-01

    Cigarette smoke is the major cause of lung cancer and can interact in complex ways with drugs for lung cancer prevention or therapy. Molecular genetic research promises to elucidate the biologic mechanisms underlying divergent drug effects in smokers versus non-smokers and to help in developing new approaches for controlling lung cancer. The present study compared global gene expression profiles (determined via Affymetrix microarray measurements in bronchial epithelial cells) between chronic smokers, former smokers, and never smokers. Smoking effects on global gene expression were determined from a combined analysis of three independent datasets. Differential expression between current and never smokers occurred in 591 of the 13,902 genes measured on the microarrays (P < 0.01 and >2 fold change; pooled data)—a profound effect. In contrast, differential expression between current and former smokers occurred in only 145 of the measured genes (P < 0.01 and >2 fold change; pooled data). Nine of these 145 genes showed consistent and significant changes in each of the three datasets (P < 0.01 and >2 fold change), with 8 being down-regulated in former smokers. Seven of the 8 down-regulated genes, including CYP1B1 and 3 AKR genes, influence the metabolism of carcinogens and/or therapeutic/chemopreventive agents. Our data comparing former and current smokers allowed us to pinpoint the genes involved in smoking–drug interactions in lung cancer prevention and therapy. These findings have important implications for developing new targeted and dosing approaches for prevention and therapy in the lung and other sites, highlighting the importance of monitoring smoking status in patients receiving oncologic drug interventions. PMID:19138944
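
    The selection rule quoted above (P < 0.01 and >2 fold change) can be sketched as a simple filter. The abstract does not say which statistical test was used or whether intensities were log-transformed, so this sketch assumes positive, linear-scale group means and precomputed P values; the function name and the toy records are hypothetical.

```python
def differentially_expressed(records, p_cutoff=0.01, fold_cutoff=2.0):
    """Keep genes passing both a P-value cutoff and a fold-change cutoff.

    records: iterable of (gene, mean_a, mean_b, p_value), where the means are
    positive, linear-scale expression values (an assumption of this sketch).
    """
    hits = []
    for gene, mean_a, mean_b, p_value in records:
        ratio = mean_a / mean_b
        # Fold change is symmetric: 0.5x counts as a 2-fold decrease.
        fold = max(ratio, 1.0 / ratio)
        if p_value < p_cutoff and fold > fold_cutoff:
            direction = "up" if ratio > 1 else "down"
            hits.append((gene, direction, round(fold, 2)))
    return hits


# Made-up values for illustration only.
toy = [
    ("geneA", 900.0, 300.0, 0.001),   # 3-fold up, significant
    ("geneB", 100.0, 250.0, 0.004),   # 2.5-fold down, significant
    ("geneC", 500.0, 400.0, 0.0005),  # significant but only 1.25-fold
    ("geneD", 800.0, 200.0, 0.2),     # 4-fold but not significant
]
selected = differentially_expressed(toy)
print(selected)
```

    Requiring both criteria is what shrinks the list: a gene with a large but noisy change (geneD) and a gene with a small but consistent change (geneC) are both excluded.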

  11. Identification of candidate genes in Arabidopsis and Populus cell wall biosynthesis using text-mining, co-expression network analysis and comparative genomics.

    PubMed

    Yang, Xiaohan; Ye, Chu-Yu; Bisaria, Anjali; Tuskan, Gerald A; Kalluri, Udaya C

    2011-12-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of biofuels from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database, and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high-likelihood candidates for functional characterization in relation to cell wall biosynthesis.
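
    The two query rounds described above (bait genes, then their co-expression neighbors, then the neighbors of that enlarged set) amount to a repeated one-step expansion over a network. A minimal sketch follows, assuming the co-expression database can be represented as a dictionary from gene to its set of neighbors; the dictionary, the gene labels, and the function name are hypothetical stand-ins, not the database or code used in the study.

```python
def expand_once(genes, neighbors):
    """Return the input genes plus all of their direct co-expression neighbors."""
    expanded = set(genes)
    for gene in genes:
        expanded.update(neighbors.get(gene, ()))
    return expanded


# Hypothetical toy network: gene -> co-expressed genes.
coexpr = {
    "bait1": {"n1", "n2"},
    "bait2": {"n2"},
    "n1": {"bait1", "n3"},
    "n2": {"bait1", "bait2"},
    "n3": {"n1"},
}

baits = {"bait1", "bait2"}             # stands in for the 121 text-mined genes
round1 = expand_once(baits, coexpr)    # analogous to the 548-gene set
round2 = expand_once(round1, coexpr)   # analogous to the 694-gene set
print(sorted(round1), sorted(round2))
```

    Stopping after a fixed number of rounds matters: repeating the expansion until nothing changes would eventually pull in every gene connected to a bait, which would dilute the candidate list.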

  12. Global assessment of imprinted gene expression in the bovine conceptus by next generation sequencing

    PubMed Central

    Chen, Zhiyuan; Hagen, Darren E.; Wang, Juanbin; Elsik, Christine G.; Ji, Tieming; Siqueira, Luiz G.; Hansen, Peter J.; Rivera, Rocío M.

    2016-01-01

    Genomic imprinting is an epigenetic mechanism that leads to parental-allele-specific gene expression. Approximately 150 imprinted genes have been identified in humans and mice but less than 30 have been described as imprinted in cattle. For the purpose of de novo identification of imprinted genes in bovine, we determined global monoallelic gene expression in brain, skeletal muscle, liver, kidney and placenta of day ∼105 Bos taurus indicus × Bos taurus taurus F1 conceptuses using RNA sequencing. To accomplish this, we developed a bioinformatics pipeline to identify parent-specific single nucleotide polymorphism alleles after filtering adenosine to inosine (A-to-I) RNA editing sites. We identified 53 genes subject to monoallelic expression. Twenty three are genes known to be imprinted in the cow and an additional 7 have previously been characterized as imprinted in human and/or mouse that have not been reported as imprinted in cattle. Of the remaining 23 genes, we found that 10 are uncharacterized or unannotated transcripts located in known imprinted clusters, whereas the other 13 genes are distributed throughout the bovine genome and are not close to any known imprinted clusters. To exclude potential cis-eQTL effects on allele expression, we corroborated the parental specificity of monoallelic expression in day 86 Bos taurus taurus × Bos taurus taurus conceptuses and identified 8 novel bovine imprinted genes. Further, we identified 671 candidate A-to-I RNA editing sites and describe random X-inactivation in day 15 bovine extraembryonic membranes. Our results expand the imprinted gene list in bovine and demonstrate that monoallelic gene expression can be the result of cis-eQTL effects. PMID:27245094

  13. Global assessment of imprinted gene expression in the bovine conceptus by next generation sequencing.

    PubMed

    Chen, Zhiyuan; Hagen, Darren E; Wang, Juanbin; Elsik, Christine G; Ji, Tieming; Siqueira, Luiz G; Hansen, Peter J; Rivera, Rocío M

    2016-07-02

    Genomic imprinting is an epigenetic mechanism that leads to parental-allele-specific gene expression. Approximately 150 imprinted genes have been identified in humans and mice but less than 30 have been described as imprinted in cattle. For the purpose of de novo identification of imprinted genes in bovine, we determined global monoallelic gene expression in brain, skeletal muscle, liver, kidney and placenta of day ∼105 Bos taurus indicus × Bos taurus taurus F1 conceptuses using RNA sequencing. To accomplish this, we developed a bioinformatics pipeline to identify parent-specific single nucleotide polymorphism alleles after filtering adenosine to inosine (A-to-I) RNA editing sites. We identified 53 genes subject to monoallelic expression. Twenty three are genes known to be imprinted in the cow and an additional 7 have previously been characterized as imprinted in human and/or mouse that have not been reported as imprinted in cattle. Of the remaining 23 genes, we found that 10 are uncharacterized or unannotated transcripts located in known imprinted clusters, whereas the other 13 genes are distributed throughout the bovine genome and are not close to any known imprinted clusters. To exclude potential cis-eQTL effects on allele expression, we corroborated the parental specificity of monoallelic expression in day 86 Bos taurus taurus × Bos taurus taurus conceptuses and identified 8 novel bovine imprinted genes. Further, we identified 671 candidate A-to-I RNA editing sites and describe random X-inactivation in day 15 bovine extraembryonic membranes. Our results expand the imprinted gene list in bovine and demonstrate that monoallelic gene expression can be the result of cis-eQTL effects.

  14. Global Analysis of the Human Pathophenotypic Similarity Gene Network Merges Disease Module Components

    PubMed Central

    Reyes-Palomares, Armando; Rodríguez-López, Rocío; Ranea, Juan A. G.; Jiménez, Francisca Sánchez; Medina, Miguel Angel

    2013-01-01

    The molecular complexity of genetic diseases requires novel approaches to break it down into coherent biological modules. For this purpose, many disease network models have been created and analyzed. We highlight two of them, “the human diseases networks” (HDN) and “the orphan disease networks” (ODN). However, in these models, each single node represents one disease or an ambiguous group of diseases. In these cases, the notion of diseases as unique entities reduces the usefulness of network-based methods. We hypothesize that using the clinical features (pathophenotypes) to define pathophenotypic connections between disease-causing genes improves our understanding of the molecular events originated by genetic disturbances. For this, we have built a pathophenotypic similarity gene network (PSGN) and compared it with the unipartite projections (based on gene-to-gene edges) similar to those used in previous network models (HDN and ODN). Unlike these disease network models, the PSGN uses semantic similarities. This pathophenotypic similarity has been calculated by comparing pathophenotypic annotations of genes (human abnormalities of HPO terms) in the “Human Phenotype Ontology”. The resulting network contains 1075 genes (nodes) and 26197 significant pathophenotypic similarities (edges). A global analysis of this network reveals: unnoticed pairs of genes showing significant pathophenotypic similarity, a biologically meaningful re-arrangement of the pathological relationships between genes, correlations of biochemical interactions with higher similarity scores and functional biases in metabolic and essential genes toward the pathophenotypic specificity and the pleiotropy, respectively. Additionally, pathophenotypic similarities and metabolic interactions of genes associated with maple syrup urine disease (MSUD) have been used to merge into a coherent pathological module. Our results indicate that pathophenotypes contribute to identify underlying co

  15. Targeting c-Myc-activated genes with a correlation method: Detection of global changes in large gene expression network dynamics

    PubMed Central

    Remondini, D.; O'Connell, B.; Intrator, N.; Sedivy, J. M.; Neretti, N.; Castellani, G. C.; Cooper, L. N.

    2005-01-01

    This work studies the dynamics of a gene expression time series network. The network, which is obtained from the correlation of gene expressions, exhibits global dynamic properties that emerge after a cell state perturbation. The main features of this network appear to be more robust when compared with those obtained with a network obtained from a linear Markov model. In particular, the network properties strongly depend on the exact time sequence relationships between genes and are destroyed by random temporal data shuffling. We discuss in detail the problem of finding targets of the c-myc protooncogene, which encodes a transcriptional regulator whose inappropriate expression has been correlated with a wide array of malignancies. The data used for network construction are a time series of gene expression, collected by microarray analysis of a rat fibroblast cell line expressing a conditional Myc-estrogen receptor oncoprotein. We show that the correlation-based model can establish a clear relationship between network structure and the cascade of c-myc-activated genes. PMID:15867157
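
    The idea of a network "obtained from the correlation of gene expressions", and the check that its structure is destroyed by temporal shuffling, can be sketched as follows. This is an assumption-laden illustration rather than the authors' method: it uses plain zero-lag Pearson correlation, an arbitrary threshold of 0.9, and independent shuffling of each gene's time points, none of which are specified in the abstract. All names and values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0


def correlation_network(series, threshold=0.9):
    """Link two genes when |correlation| of their time series reaches the threshold."""
    genes = sorted(series)
    edges = set()
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if abs(pearson(series[a], series[b])) >= threshold:
                edges.add((a, b))
    return edges


def shuffle_time(series, rng):
    """Null model: permute each gene's time points independently."""
    shuffled = {}
    for gene, values in series.items():
        values = list(values)
        rng.shuffle(values)
        shuffled[gene] = values
    return shuffled


# Toy time series invented for this sketch.
series = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12],
    "g3": [5, 1, 4, 2, 6, 3],
}
edges = correlation_network(series)
null_edges = correlation_network(shuffle_time(series, random.Random(0)))
print(edges, null_edges)
```

    Comparing the edge set from the real ordering with edge sets from many shuffled copies gives a rough sense of which links depend on the time sequence rather than on the value distributions alone.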

  16. Prioritization of candidate genes for cattle reproductive traits, based on protein-protein interactions, gene expression, and text-mining.

    PubMed

    Hulsegge, Ina; Woelders, Henri; Smits, Mari; Schokker, Dirkjan; Jiang, Li; Sørensen, Peter

    2013-05-15

    Reproduction is of significant economic importance in dairy cattle. Improved understanding of mechanisms that control estrous behavior and other reproduction traits could help in developing strategies to improve and/or monitor these traits. The objective of this study was to predict and rank genes and processes in brain areas and pituitary involved in reproductive traits in cattle using information derived from three different data sources: gene expression, protein-protein interactions, and literature. We identified 59, 89, 53, 23, and 71 genes in bovine amygdala, dorsal hypothalamus, hippocampus, pituitary, and ventral hypothalamus, respectively, potentially involved in processes underlying estrus and estrous behavior. Functional annotation of the candidate genes points to a number of tissue-specific processes of which the "neurotransmitter/ion channel/synapse" process in the amygdala, "steroid hormone receptor activity/ion binding" in the pituitary, "extracellular region" in the ventral hypothalamus, and "positive regulation of transcription/metabolic process" in the dorsal hypothalamus are most prominent. The regulation of the functional processes in the various tissues operates at different biological levels, including transcriptional, posttranscriptional, extracellular, and intercellular signaling levels.

  17. Differentiation in neutral genes and a candidate gene in the pied flycatcher: using biological archives to track global climate change.

    PubMed

    Kuhn, Kerstin; Schwenk, Klaus; Both, Christiaan; Canal, David; Johansson, Ulf S; van der Mije, Steven; Töpfer, Till; Päckert, Martin

    2013-11-01

    Global climate change is one of the major driving forces for adaptive shifts in migration and breeding phenology and possibly impacts demographic changes if a species fails to adapt sufficiently. In Western Europe, pied flycatchers (Ficedula hypoleuca) have insufficiently adapted their breeding phenology to the ongoing advance of food peaks within their breeding area and consequently suffered local population declines. We address the question whether this population decline led to a loss of genetic variation, using two neutral marker sets (mitochondrial control region and microsatellites), and one potentially selectively non-neutral marker (avian Clock gene). We report temporal changes in genetic diversity in extant populations and biological archives over more than a century, using samples from sites differing in the extent of climate change. Comparing genetic differentiation over this period revealed that only the recent Dutch population, which underwent population declines, showed slightly lower genetic variation than the historic Dutch population. As that loss of variation was only moderate and not observed in all markers, current gene flow across Western and Central European populations might have compensated local loss of variation over the last decades. A comparison of genetic differentiation in neutral loci versus the Clock gene locus provided evidence for stabilizing selection. Furthermore, in all genetic markers, we found a greater genetic differentiation in space than in time. This pattern suggests that local adaptation or historic processes might have a stronger effect on the population structure and genetic variation in the pied flycatcher than recent global climate changes.

  18. Benzo[a]pyrene decreases global and gene specific DNA methylation during zebrafish development

    PubMed Central

    Fang, Xiefan; Thornton, Cammi; Scheffler, Brian E.; Willett, Kristine L.

    2013-01-01

    DNA methylation is important for gene regulation and is vulnerable to early-life exposure to environmental contaminants. We found that direct waterborne benzo[a]pyrene (BaP) exposure at 24 μg/L from 2.5 to 96 hours post fertilization (hpf) to zebrafish embryos significantly decreased global cytosine methylation by 44.8% and promoter methylation in vasa by 17%. Consequently, vasa expression was significantly increased by 33%. In contrast, BaP exposure at environmentally relevant concentrations did not change CpG island methylation or gene expression in cancer genes such as ras-association domain family member 1 (rassf1), telomerase reverse transcriptase (tert), c-jun, and c-myca. Similarly, BaP did not change gene expression of DNA methyltransferase 1 (dnmt1) and glycine N-methyltransferase (gnmt). While total DNMT activity was not affected, GNMT enzyme activity was moderately increased. In summary, BaP is an epigenetic modifier for global and gene specific DNA methylation status in zebrafish larvae. PMID:23542452

  19. Local Mobile Gene Pools Rapidly Cross Species Boundaries To Create Endemicity within Global Vibrio cholerae Populations

    PubMed Central

    Boucher, Yan; Cordero, Otto X.; Takemura, Alison; Hunt, Dana E.; Schliep, Klaus; Bapteste, Eric; Lopez, Philippe; Tarr, Cheryl L.; Polz, Martin F.

    2011-01-01

    Vibrio cholerae represents both an environmental pathogen and a widely distributed microbial species comprised of closely related strains occurring in the tropical to temperate coastal ocean across the globe (Colwell RR, Science 274:2025–2031, 1996; Griffith DC, Kelly-Hope LA, Miller MA, Am. J. Trop. Med. Hyg. 75:973–977, 2006; Reidl J, Klose KE, FEMS Microbiol. Rev. 26:125–139, 2002). However, although this implies dispersal and growth across diverse environmental conditions, how locally successful populations assemble from a possibly global gene pool, relatively unhindered by geographic boundaries, remains poorly understood. Here, we show that environmental Vibrio cholerae possesses two, largely distinct gene pools: one is vertically inherited and globally well mixed, and the other is local and rapidly transferred across species boundaries to generate an endemic population structure. While phylogeographic analysis of isolates collected from Bangladesh and the U.S. east coast suggested strong panmixis for protein-coding genes, there was geographic structure in integrons, which are the only genomic islands present in all strains of V. cholerae (Chun J, et al., Proc. Natl. Acad. Sci. U. S. A. 106:15442–15447, 2009) and are capable of acquiring and expressing mobile gene cassettes. Geographic differentiation in integrons arises from high gene turnover, with acquisition from a locally cooccurring sister species being up to twice as likely as exchange with conspecific but geographically distant V. cholerae populations. PMID:21486909

  20. Global Developmental Gene Programing Involves a Nuclear Form of Fibroblast Growth Factor Receptor-1 (FGFR1).

    PubMed

    Terranova, Christopher; Narla, Sridhar T; Lee, Yu-Wei; Bard, Jonathan; Parikh, Abhirath; Stachowiak, Ewa K; Tzanakakis, Emmanuel S; Buck, Michael J; Birkaya, Barbara; Stachowiak, Michal K

    2015-01-01

    Genetic studies have placed the Fgfr1 gene at the top of major ontogenic pathways that enable gastrulation, tissue development and organogenesis. Using genome-wide sequencing and loss and gain of function experiments the present investigation reveals a mechanism that underlies global and direct gene regulation by the nuclear form of FGFR1, ensuring that pluripotent Embryonic Stem Cells differentiate into Neuronal Cells in response to Retinoic Acid. Nuclear FGFR1, both alone and with its partner nuclear receptors RXR and Nur77, targets thousands of active genes and controls the expression of pluripotency, homeobox, neuronal and mesodermal genes. Nuclear FGFR1 targets genes in developmental pathways represented by Wnt/β-catenin, CREB, BMP, the cell cycle and cancer-related TP53 pathway, neuroectodermal and mesodermal programing networks, axonal growth and synaptic plasticity pathways. Nuclear FGFR1 targets the consensus sequences of transcription factors known to engage CREB-binding protein, a common coregulator of transcription and established binding partner of nuclear FGFR1. This investigation reveals the role of nuclear FGFR1 as a global genomic programmer of cell, neural and muscle development.

  1. Mining 3D Patterns from Gene Expression Temporal Data: A New Tricluster Evaluation Measure

    PubMed Central

    2014-01-01

    Microarrays have revolutionized biotechnological research. The analysis of new data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are applied to create groups of genes that exhibit a similar behavior. Biclustering emerges as a valuable tool for microarray data analysis since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions. However, if a third dimension appears in the data, triclustering is the appropriate tool for the analysis. This occurs in longitudinal experiments in which the genes are evaluated under conditions at several time points. All clustering, biclustering, and triclustering techniques guide their search for solutions by a measure that evaluates the quality of clusters. We present an evaluation measure for triclusters called Mean Square Residue 3D. This measure is based on the classic biclustering measure Mean Square Residue. Mean Square Residue 3D has been applied to both synthetic and real data and has proved capable of extracting groups of genes with homogeneous patterns in subsets of conditions and times. These groups show a high correlation level and are also related to their functional annotations extracted from the Gene Ontology project. PMID:25143987
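
    The abstract does not give the formula for Mean Square Residue 3D, so the following is only a sketch of the natural three-way extension of the classic biclustering residue: each value is compared with what an additive gene + condition + time model would predict from the tricluster's own means. The exact definition in the paper may differ; the function name and the toy tensors are hypothetical.

```python
def msr3d(x):
    """Mean squared three-way residue of a tricluster x[gene][condition][time].

    Assumed definition (not quoted from the paper): the residue of a cell is
    value + one-way means - two-way-collapsed means - overall mean, which is
    zero whenever the data are a sum of gene, condition and time effects.
    """
    G, C, T = len(x), len(x[0]), len(x[0][0])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    m_all = mean(x[g][c][t] for g in range(G) for c in range(C) for t in range(T))
    m_gene = [mean(x[g][c][t] for c in range(C) for t in range(T)) for g in range(G)]
    m_cond = [mean(x[g][c][t] for g in range(G) for t in range(T)) for c in range(C)]
    m_time = [mean(x[g][c][t] for g in range(G) for c in range(C)) for t in range(T)]
    # Means over a single axis, indexed by the two remaining axes.
    over_genes = [[mean(x[g][c][t] for g in range(G)) for t in range(T)] for c in range(C)]
    over_conds = [[mean(x[g][c][t] for c in range(C)) for t in range(T)] for g in range(G)]
    over_times = [[mean(x[g][c][t] for t in range(T)) for c in range(C)] for g in range(G)]

    total = 0.0
    for g in range(G):
        for c in range(C):
            for t in range(T):
                r = (x[g][c][t] + m_gene[g] + m_cond[c] + m_time[t]
                     - over_genes[c][t] - over_conds[g][t] - over_times[g][c] - m_all)
                total += r * r
    return total / (G * C * T)


# Purely additive toy tricluster: every gene follows the same shifted pattern.
additive = [[[g + 2 * c + 3 * t for t in range(4)] for c in range(3)] for g in range(2)]
# Toy tricluster with a three-way interaction (only one non-zero cell).
interacting = [[[g * c * t for t in range(2)] for c in range(2)] for g in range(2)]
print(msr3d(additive), msr3d(interacting))
```

    A lower value means the genes behave more coherently across the selected conditions and time points, which is why a search procedure can use such a measure as the quantity to minimize.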

  2. Potential impact of human mitochondrial replacement on global policy regarding germline gene modification.

    PubMed

    Ishii, Tetsuya

    2014-08-01

    Previous discussions regarding human germline gene modification led to a global consensus that no germline should undergo genetic modification. However, the UK Human Fertilisation and Embryology Authority, having conducted at the UK Government's request a scientific review and a wide public consultation, provided advice to the Government on the pros and cons of Parliament's lifting a ban on altering mitochondrial DNA content of human oocytes and embryos, so as to permit the prevention of maternal transmission of mitochondrial diseases. In this commentary, relevant ethical and biomedical issues are examined and requirements for proceeding with this novel procedure are suggested. Additionally, potentially significant impacts of the UK legalization on global policy concerning germline gene modification are discussed in the context of recent advances in genome-editing technology. It is concluded that international harmonization is needed, as well as further ethical and practical consideration, prior to the legalization of human mitochondrial replacement.

  3. Global gene expression in neuroendocrine tumors from patients with the MEN1 syndrome

    PubMed Central

    Dilley, William G; Kalyanaraman, Somasundaram; Verma, Sulekha; Cobb, J Perren; Laramie, Jason M; Lairmore, Terry C

    2005-01-01

    Background: Multiple Endocrine Neoplasia type 1 (MEN1, OMIM 131100) is an autosomal dominant disorder characterized by endocrine tumors of the parathyroids, pancreatic islets and pituitary. The disease is caused by the functional loss of the tumor suppressor protein menin, coded by the MEN1 gene. The protein sequence has no significant homology to known consensus motifs. In vitro studies have shown menin binding to JunD, Pem, Smad3, NF-kappaB, nm23H1, and RPA2 proteins. However, none of these binding studies have led to a convincing theory of how loss-of-menin leads to neoplasia. Results: Global gene expression studies on eight neuroendocrine tumors from MEN1 patients and 4 normal islet controls were performed utilizing Affymetrix U95Av2 chips. Overall hierarchical clustering placed all tumors in one group separate from the group of normal islets. Within the group of tumors, those of the same type were mostly clustered together. The clustering analysis also revealed 19 apoptosis-related genes that were under-expressed in the group of tumors. There were 193 genes that were increased/decreased by at least 2-fold in the tumors relative to the normal islets and that had a t-test significance value of p ≤ 0.005. Forty-five of these genes were increased and 148 were decreased in the tumors relative to the controls. One hundred and four of the genes could be classified as being involved in cell growth, cell death, or signal transduction. The results from 11 genes were selected for validation by quantitative RT-PCR. The average correlation coefficient was 0.655 (range 0.235–0.964). Conclusion: This is the first analysis of global gene expression in MEN1-associated neuroendocrine tumors. Many genes were identified which were differentially expressed in neuroendocrine tumors arising in patients with the MEN1 syndrome, as compared with normal human islet cells. The expression of a group of apoptosis-related genes was significantly suppressed, suggesting that these genes may

  4. The Global Regulator Genes from Biocontrol Strain Serratia plymuthica IC1270: Cloning, Sequencing, and Functional Studies†

    PubMed Central

    Ovadis, Marianna; Liu, Xiaoguang; Gavriel, Sagi; Ismailov, Zafar; Chet, Ilan; Chernin, Leonid

    2004-01-01

    The biocontrol activity of various fluorescent pseudomonads towards plant-pathogenic fungi is dependent upon the GacA/GacS-type two-component system of global regulators and the RpoS transcription sigma factor. In particular, these components are required for the production of antifungal antibiotics and exoenzymes. To investigate the effects of these global regulators on the expression of biocontrol factors by plant-associated bacteria other than Pseudomonas spp., gacA/gacS and rpoS homologues were cloned from biocontrol strain IC1270 of Serratia plymuthica, which produces a set of antifungal compounds, including chitinolytic enzymes and the antibiotic pyrrolnitrin. The nucleotide and deduced protein sequence alignments of the cloned gacA/gacS-like genes, tentatively designated grrA (global response regulation activator) and grrS (global response regulation sensor), and of the cloned rpoS gene revealed 64 to 93% identity with matching genes and proteins of the enteric bacteria Escherichia coli, Pectobacterium carotovora subsp. carotovora, and Serratia marcescens. grrA, grrS, and rpoS gene replacement mutants of strain IC1270 were deficient in the production of pyrrolnitrin, an exoprotease, and N-acylhomoserine lactone quorum-sensing signal molecules. However, neither mutant appeared to differ from the parental strain in the production of siderophores, and only grrA and grrS mutants were deficient in the production of a 58-kDa endochitinase, representing the involvement of other sigma factors in the regulation of strain IC1270's chitinolytic activity. Compared to the parental strain, the grrA, grrS, and rpoS mutants were markedly less capable of suppressing Rhizoctonia solani and Pythium aphanidermatum under greenhouse conditions, indicating the dependence of strain IC1270's biocontrol property on the GrrA/GrrS and RpoS global regulators. PMID:15262936

  5. Global gene profiling in human endometrium during the window of implantation.

    PubMed

    Kao, L C; Tulac, S; Lobo, S; Imani, B; Yang, J P; Germeyer, A; Osteen, K; Taylor, R N; Lessey, B A; Giudice, L C

    2002-06-01

    Implantation in humans is a complex process that is temporally and spatially restricted. Over the past decade, using a one-by-one approach, several genes and gene products that may participate in this process have been identified in secretory phase endometrium. Herein, we have investigated global gene expression during the window of implantation (peak E2 and progesterone levels) in well characterized human endometrial biopsies timed to the LH surge, compared with the late proliferative phase (peak E2 level) of the menstrual cycle. Tissues were processed for poly(A(+)) RNA and hybridization of chemically fragmented, biotinylated cRNAs on high density oligonucleotide microarrays, screening for 12,686 genes and expressed sequence tags. After data normalization, mean values were obtained for gene readouts and fold ratios were derived comparing genes up- and down-regulated in the window of implantation vs. the late proliferative phase. Nonparametric testing revealed 156 significantly (P < 0.05) up-regulated genes and 377 significantly down-regulated genes in the implantation window. Up-regulated genes included those for cholesterol trafficking and transport [apolipoprotein (Apo)E being the most induced gene, 100-fold], prostaglandin (PG) biosynthesis (PLA2) and action (PGE2 receptor), proteoglycan synthesis (glucuronyltransferase), secretory proteins [glycodelin, mammaglobin, Dickkopf-1 (Dkk-1, a Wnt inhibitor)], IGF binding protein (IGFBP), and TGF-beta superfamilies, signal transduction, extracellular matrix components (osteopontin, laminin), neurotransmitter synthesis (monoamine oxidase) and receptors (gamma aminobutyric acid A receptor pi subunit), numerous immune modulators, detoxification genes (metallothioneins), and genes involved in water and ion transport [e.g. Clostridia Perfringens Enterotoxin (CPE) 1 receptor (CPE1-R) and K(+) ion channel], among others. Down-regulated genes included intestinal trefoil factor (ITF) [the most repressed gene (50-fold

  6. Global Regulation of Differential Gene Expression by c-Abl/Arg Oncogenic Kinases.

    PubMed

    Dong, Qincai; Li, Chenggong; Qu, Xiuhua; Cao, Cheng; Liu, Xuan

    2017-05-30

    BACKGROUND: Studies have found that c-Abl oncogenic kinases may regulate gene transcription by RNA polymerase II phosphorylation or by direct regulation of specific transcription factors or coactivators. However, the global regulation of differential gene expression by c-Abl/Arg is largely unknown. In this study, differentially expressed genes (DEGs) regulated by c-Abl/Arg were identified, and related cellular functions and associated pathways were investigated. MATERIAL AND METHODS: RNA obtained from wild-type and c-Abl/Arg gene-silenced MCF-7 cells was analyzed by RNA-Seq. DEGs were identified using edgeR software and partially validated by qRT-PCR. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were used to explore the potential functions of these DEGs. RESULTS: A total of 1,034 DEGs were significantly regulated by c-Abl/Arg (399 were up-regulated and 635 were down-regulated after c-Abl/Arg double knockdown). GO and KEGG analyses showed that the DEGs were primarily involved in cellular metabolic processes, neurodegenerative disease, the metabolic process and signaling pathway of cAMP, angiogenesis, and cell proliferation. CONCLUSIONS: Our data collectively support the hypothesis that c-Abl/Arg regulate differential gene expression, providing new insights into the biological functions of c-Abl and Arg.

  7. Global Expression Profiling of Low Temperature Induced Genes in the Chilling Tolerant Japonica Rice Jumli Marshi

    PubMed Central

    Chawade, Aakash; Lindlöf, Angelica; Olsson, Björn; Olsson, Olof

    2013-01-01

    Low temperature is a key factor that limits growth and productivity of many important agronomical crops worldwide. Rice (Oryza sativa L.) is already negatively affected at temperatures below +10°C and is therefore denoted as chilling sensitive. However, chilling tolerant rice cultivars exist and can be commercially cultivated at altitudes up to 3,050 meters with temperatures reaching as low as +4°C. In this work, the global transcriptional response to cold stress (+4°C) was studied in the Nepalese highland variety Jumli Marshi (ssp. japonica) and 4,636 genes were identified as significantly differentially expressed within 24 hours of cold stress. Comparison with previously published microarray data from one chilling tolerant and two sensitive rice cultivars identified 182 genes differentially expressed (DE) upon cold stress in all four rice cultivars and 511 genes DE only in the chilling tolerant rice. Promoter analysis of the 182 genes suggests a complex cross-talk between ABRE and CBF regulons. Promoter analysis of the 511 genes identified over-represented ABRE motifs but not DRE motifs, suggesting a role for ABA signaling in cold tolerance. Moreover, 2,101 genes were DE in Jumli Marshi alone. By chromosomal localization analysis, 473 of these cold responsive genes were located within 13 different QTLs previously identified as cold associated. PMID:24349120

  8. Global irradiation effects, stem cell genes and rare transcripts in the planarian transcriptome.

    PubMed

    Galloni, Mireille

    2012-01-01

    Stem cells are the closest relatives of the totipotent primordial cell, which is able to spawn millions of daughter cells and hundreds of cell types in multicellular organisms. Stem cells are involved in tissue homeostasis and regeneration, and may play a major role in cancer development. Among animals, planarians host a model stem cell type, called the neoblast, which essentially confers immortality. Gaining insights into the global transcriptional landscape of these exceptional cells takes an unprecedented turn with the advent of Next Generation Sequencing methods. Two Digital Gene Expression transcriptomes of Schmidtea mediterranea planarians, with or without neoblasts lost through irradiation, were produced and analyzed. Twenty-one-bp NlaIII tags were mapped to transcripts in the Schmidtea and Dugesia taxids. Differential representation of tags in normal versus irradiated animals reflects differential gene expression. Canonical and non-canonical tags were included in the analysis, and comparative studies with human orthologs were conducted. Transcripts fell into 3 categories: invariant (including housekeeping genes), absent in irradiated animals (potential neoblast-specific genes, IRDOWN) and induced in irradiated animals (potential cellular stress response, IRUP). Different mRNA variants and gene family members were recovered. In the IRDOWN class, almost all of the neoblast-specific genes previously described were found. In irradiated animals, a larger number of genes were induced rather than lost. A significant fraction of IRUP genes behaved as if transcript versions of different lengths were produced. Several novel potential neoblast-specific genes have been identified that varied in relative abundance, including highly conserved as well as novel proteins without predicted orthologs. Evidence for a large body of antisense transcripts, for example regulated antisense for the Smed-piwil1 gene, and evidence for RNA shortening in irradiated animals is presented.

  9. The global gene expression profile of the secondary transition during pancreatic development.

    PubMed

    Willmann, Stefanie J; Mueller, Nikola S; Engert, Silvia; Sterr, Michael; Burtscher, Ingo; Raducanu, Aurelia; Irmler, Martin; Beckers, Johannes; Sass, Steffen; Theis, Fabian J; Lickert, Heiko

    2016-02-01

    Pancreas organogenesis is a highly dynamic process where neighboring tissue interactions lead to dynamic changes in gene regulatory networks that orchestrate endocrine, exocrine, and ductal lineage formation. To understand the spatio-temporal regulatory logic we have used the Forkhead transcription factor Foxa2-Venus fusion (FVF) knock-in reporter mouse to separate the FVF(+) pancreatic epithelium from the FVF(−) surrounding tissue (mesenchyme, neurons, blood, and blood vessels) to perform a genome-wide mRNA expression profiling at embryonic days (E) 12.5-15.5. Annotation of genes and molecular processes suggests that FVF marks endoderm-derived multipotent epithelial progenitors at several lineage restriction steps, when the bulk of endocrine, exocrine and ductal cells are formed during the secondary transition. In the pancreatic epithelial compartment, we identified most known endocrine and exocrine lineage determining factors and diabetes-associated genes, but also unknown genes with spatio-temporal regulated pancreatic expression. In the non-endoderm-derived compartment, we identified many well-described regulatory genes that are not yet functionally annotated in pancreas development, emphasizing that neighboring tissue interactions are still ill defined. Pancreatic expression of over 635 genes was analyzed with the mRNA in situ hybridization Genepaint public database. This validated the quality of the profiling data set and identified hundreds of genes with spatially restricted expression patterns in the pancreas. Some of these genes are also targeted by pancreatic transcription factors and show active chromatin marks in human islets of Langerhans. Thus, with the highest spatio-temporal resolution of a global gene expression profile during the secondary transition, our study helps to shed light on neighboring tissue interactions, developmental timing and diabetes gene regulation.

  10. Global gene expression in cotton (Gossypium hirsutum L.) leaves to waterlogging stress.

    PubMed

    Zhang, Yanjun; Kong, Xiangqiang; Dai, Jianlong; Luo, Zhen; Li, Zhenhuai; Lu, Hequan; Xu, Shizhen; Tang, Wei; Zhang, Dongmei; Li, Weijiang; Xin, Chengsong; Dong, Hezhong

    2017-01-01

    Cotton is sensitive to waterlogging stress, which usually results in stunted growth and yield loss. To date, the molecular mechanisms underlying the responses to waterlogging in cotton remain elusive. Cotton was grown in a rain-shelter and subjected to 0 (control)-, 10-, 15- and 20-d waterlogging at flowering stage. The fourth leaves from the top of the main stem were sampled and immediately frozen in liquid nitrogen for physiological measurement. Global gene transcription in the leaves of 15-d waterlogged plants was analyzed by RNA-Seq. A total of 794 genes were up-regulated and 1018 genes were down-regulated in waterlogged cotton leaves compared with the non-waterlogged control. The differentially expressed genes were mainly related to photosynthesis, nitrogen metabolism, starch and sucrose metabolism, glycolysis and plant hormone signal transduction. KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis indicated that most genes related to flavonoid biosynthesis, oxidative phosphorylation, amino acid metabolism and biosynthesis as well as circadian rhythm pathways were differentially expressed. Waterlogging increased the expression of anaerobic fermentation related genes, such as alcohol dehydrogenase (ADH), but decreased the leaf chlorophyll concentration and photosynthesis by down-regulating the expression of photosynthesis related genes. Many genes related to plant hormones and transcription factors were differentially expressed under waterlogging stress. Most of the ethylene related genes and ethylene-responsive factor-type transcription factors were up-regulated under waterlogging stress, suggesting that ethylene may play key roles in the survival of cotton under waterlogging stress.

  11. Ubiquitous cyanobacterial podoviruses in the global oceans unveiled through viral DNA polymerase gene sequences.

    PubMed

    Huang, Sijun; Wilhelm, Steven W; Jiao, Nianzhi; Chen, Feng

    2010-10-01

    As a major cyanophage group, cyanobacterial podoviruses are important in regulating the biomass and population structure of picocyanobacteria in the ocean. However, little is known about their biogeography in the open ocean. This study represents the first survey of the biodiversity of cyanopodoviruses in the global oceans based on the viral encoded DNA polymerase (pol) gene. A total of 303 DNA pol sequences were amplified by PCR from 10 virus communities collected in the Atlantic and Pacific oceans and the South China Sea. At least five subclusters of cyanopodoviruses were identified in these samples, and one subcluster (subcluster VIII) was found in all sampling sites and comprised approximately 50% of total sequences. The diversity index based on the DNA pol gene sequences recovered through PCR suggests that cyanopodoviruses are less diverse in these oceanic samples than in a previously studied estuarine environment. Although diverse podoviruses were present in the global ocean, each sample was dominated by one major group of cyanopodoviruses. No clear biogeographic patterns were observed using statistical analysis. A metagenomic analysis based on the Global Ocean Sampling database indicates that other types of cyanopodovirus-like DNA pol sequences were present in the global ocean. Together, our study results suggest that cyanopodoviruses are widely distributed in the ocean but their community composition varies with local environments.

  12. Mining from transcriptomes: 315 single-copy orthologous genes concatenated for the phylogenetic analyses of Orchidaceae.

    PubMed

    Deng, Hua; Zhang, Guo-Qiang; Lin, Min; Wang, Yan; Liu, Zhong-Jian

    2015-09-01

    Phylogenetic relationships are hotspots for orchid studies with controversial standpoints. Traditionally, the phylogenies of orchids are based on morphology and subjective factors. Although more reliable than classic phylogenic analyses, the current methods are based on a few gene markers and PCR amplification, which are labor intensive and cannot identify the placement of some species with degenerated plastid genomes. Therefore, a more efficient, labor-saving and reliable method is needed for phylogenic analysis. Here, we present a method of orchid phylogeny construction using transcriptomes. Ten representative species covering five subfamilies of Orchidaceae were selected, and 315 single-copy orthologous genes extracted from the transcriptomes of these organisms were applied to reconstruct a more robust phylogeny of orchids. This approach provided a rapid and reliable method of phylogeny construction for Orchidaceae, one of the most diversified families of angiosperms. We also showed the rigorous systematic position of holomycotrophic species, which has previously been difficult to determine because of the degenerated plastid genome. We concluded that the method presented in this study is more efficient and reliable than methods based on a few gene markers for phylogenic analyses, especially for the holomycotrophic species or those whose DNA sequences have been difficult to amplify. Meanwhile, a total of 315 single-copy orthologous genes of orchids are offered and more informative loci could be used in future orchid phylogenetic studies.

  13. Mining from transcriptomes: 315 single-copy orthologous genes concatenated for the phylogenetic analyses of Orchidaceae

    PubMed Central

    Deng, Hua; Zhang, Guo-Qiang; Lin, Min; Wang, Yan; Liu, Zhong-Jian

    2015-01-01

    Phylogenetic relationships are hotspots for orchid studies with controversial standpoints. Traditionally, the phylogenies of orchids are based on morphology and subjective factors. Although more reliable than classic phylogenic analyses, the current methods are based on a few gene markers and PCR amplification, which are labor intensive and cannot identify the placement of some species with degenerated plastid genomes. Therefore, a more efficient, labor-saving and reliable method is needed for phylogenic analysis. Here, we present a method of orchid phylogeny construction using transcriptomes. Ten representative species covering five subfamilies of Orchidaceae were selected, and 315 single-copy orthologous genes extracted from the transcriptomes of these organisms were applied to reconstruct a more robust phylogeny of orchids. This approach provided a rapid and reliable method of phylogeny construction for Orchidaceae, one of the most diversified families of angiosperms. We also showed the rigorous systematic position of holomycotrophic species, which has previously been difficult to determine because of the degenerated plastid genome. We concluded that the method presented in this study is more efficient and reliable than methods based on a few gene markers for phylogenic analyses, especially for the holomycotrophic species or those whose DNA sequences have been difficult to amplify. Meanwhile, a total of 315 single-copy orthologous genes of orchids are offered and more informative loci could be used in future orchid phylogenetic studies. PMID:26380706

  14. Identification of genes specific to mouse primordial germ cells through dynamic global gene expression.

    PubMed

    Sabour, Davood; Araúzo-Bravo, Marcos J; Hübner, Karin; Ko, Kinarm; Greber, Boris; Gentile, Luca; Stehling, Martin; Schöler, Hans R

    2011-01-01

    Molecular mechanisms underlying the commitment of cells to the germ cell lineage during mammalian embryogenesis remain poorly understood due to the limited availability of cellular materials to conduct in vitro analyses. Although primordial germ cells (PGCs), the precursors to germ cells, have been generated in vitro from embryonic stem cells (ESCs; pluripotent stem cells derived from the inner cell mass of the blastocyst of the early embryo), the simultaneous expression of cell surface receptors and transcription factors complicates the detection of PGCs. To date, only a few genes that mark the onset of germ cell commitment in the epiblast (the outer layer of cells of the embryo), including tissue non-specific alkaline phosphatase (TNAP), Blimp1, Stella and Fragilis, have been used with some success to detect PGC formation in in vitro model systems. Here, we identified 11 genes (three of which are novel) that are specifically expressed in male and female fetal germ cells, both in vivo and in vitro, but are not expressed in ESCs. Expression of these genes allows us to distinguish committed germ cells from undifferentiated pluripotent cell populations, a prerequisite for the successful derivation of germ cells and gametes in vitro.

  15. Genome Mining for Radical SAM Protein Determinants Reveals Multiple Sactibiotic-Like Gene Clusters

    PubMed Central

    Murphy, Kiera; O'Sullivan, Orla; Rea, Mary C.; Cotter, Paul D.; Ross, R. Paul; Hill, Colin

    2011-01-01

    Thuricin CD is a two-component bacteriocin produced by Bacillus thuringiensis that kills a wide range of clinically significant Clostridium difficile. This bacteriocin has recently been characterized and consists of two distinct peptides, Trnβ and Trnα, which both possess 3 intrapeptide sulphur to α-carbon bridges and act synergistically. Indeed, thuricin CD and subtilosin A are the only antimicrobials known to possess these unusual structures and are known as the sactibiotics (sulphur to alpha carbon-containing antibiotics). Analysis of the thuricin CD-associated gene cluster revealed the presence of genes encoding two highly unusual SAM proteins (TrnC and TrnD) which are proposed to be responsible for these unusual post-translational modifications. On the basis of the frequently high conservation among enzymes responsible for the post-translational modification of specific antimicrobials, we performed an in silico screen for novel thuricin CD–like gene clusters using the TrnC and TrnD radical SAM proteins as driver sequences to perform an initial homology search against the complete non-redundant database. Fifteen novel thuricin CD–like gene clusters were identified, based on the presence of TrnC and TrnD homologues in the context of neighbouring genes encoding potential bacteriocin structural peptides. Moreover, metagenomic analysis revealed that TrnC or TrnD homologs are present in a variety of metagenomic environments, suggesting a widespread distribution of thuricin-like operons in a variety of environments. In silico analysis of radical SAM proteins is sufficient to identify novel putative sactibiotic clusters. PMID:21760885

  16. Text mining and expert curation to develop a database on psychiatric diseases and their genes

    PubMed Central

    Gutiérrez-Sacristán, Alba; Bravo, Àlex; Portero-Tresserra, Marta; Valverde, Olga; Armario, Antonio; Blanco-Gandía, M.C.; Farré, Adriana; Fernández-Ibarrondo, Lierni; Fonseca, Francina; Giraldo, Jesús; Leis, Angela; Mané, Anna; Mayer, M.A.; Montagud-Romero, Sandra; Nadal, Roser; Ortiz, Jordi; Pavon, Francisco Javier; Perez, Ezequiel Jesús; Rodríguez-Arias, Marta; Serrano, Antonia; Torrens, Marta; Warnault, Vincent; Sanz, Ferran

    2017-01-01

    Psychiatric disorders constitute one of the main causes of disability worldwide. During the past years, considerable research has been conducted on the genetic architecture of such diseases, although little understanding of their etiology has been achieved. The difficulty of accessing up-to-date, relevant genotype-phenotype information has hampered the application of this wealth of knowledge to translational research and clinical practice in order to improve diagnosis and treatment of psychiatric patients. PsyGeNET (http://www.psygenet.org/) has been developed with the aim of supporting research on the genetic architecture of psychiatric diseases, by providing integrated and structured accessibility to their genotype–phenotype association data, together with analysis and visualization tools. In this article, we describe the protocol developed for the sustainable update of this knowledge resource. It includes the recruitment of a team of domain experts in order to perform the curation of the data extracted by text mining. Annotation guidelines and a web-based annotation tool were developed to support the curators’ tasks. A curation workflow was designed including a pilot phase and two rounds of curation and analysis phases. Negative evidence from the literature on gene–disease associations (GDAs) was taken into account in the curation process. We report the results of the application of this workflow to the curation of GDAs for PsyGeNET, including the analysis of the inter-annotator agreement, and suggest this model as a suitable approach for the sustainable development and update of knowledge resources. Database URL: http://www.psygenet.org PsyGeNET corpus: http://www.psygenet.org/ds/PsyGeNET/results/psygenetCorpus.tar

  17. Global gene analysis of oocytes from early stages in human folliculogenesis shows high expression of novel genes in reproduction.

    PubMed

    Markholt, S; Grøndahl, M L; Ernst, E H; Andersen, C Yding; Ernst, E; Lykke-Hartmann, K

    2012-02-01

    The pool of primordial follicles in humans is laid down during embryonic development and follicles can remain dormant for prolonged intervals, often decades, until individual follicles resume growth. The mechanisms that induce growth and maturation of primordial follicles are poorly understood but follicles once activated either continue growth or undergo atresia. We have isolated pure populations of oocytes from human primordial, intermediate and primary follicles using laser capture micro-dissection microscopy and evaluated the global gene expression profiles by whole-genome microarray analysis. The array data were confirmed by qPCR for selected genes. A total of 6301 unique genes were identified as significantly expressed representing enriched specific functional categories such as 'RNA binding', 'translation initiation' and 'structural molecule activity'. Several genes, some not previously known to be associated with early oocyte development, were identified with exceptionally high expression levels, such as the anti-proliferative transmembrane protein with an epidermal growth factor-like and two follistatin-like domains (TMEFF2), the Rho-GTPase-activating protein oligophrenin 1 (OPHN1) and the mitochondrial-encoded ATPase6 (ATP6). Thus, the present study provides not only a technique to capture and perform transcriptome analysis of the sparse material of human oocytes from the earliest follicle stages but further includes a comprehensive basis for our understanding of the regulatory factors and pathways present during early human folliculogenesis.

  18. Phosphorylation Events in the Multiple Gene Regulator of Group A Streptococcus Significantly Influence Global Gene Expression and Virulence

    PubMed Central

    Sanson, Misu; Makthal, Nishanth; Gavagan, Maire; Cantu, Concepcion; Olsen, Randall J.; Musser, James M.

    2015-01-01

    Whole-genome sequencing analysis of ∼800 strains of group A Streptococcus (GAS) found that the gene encoding the multiple virulence gene regulator of GAS (mga) is highly polymorphic in serotype M59 strains but not in strains of other serotypes. To help understand the molecular mechanism of gene regulation by Mga and its contribution to GAS pathogenesis in serotype M59 GAS, we constructed an isogenic mga mutant strain. Transcriptome studies indicated a significant regulatory influence of Mga and altered metabolic capabilities conferred by Mga-regulated genes. We assessed the phosphorylation status of Mga in GAS cell lysates with Phos-tag gels. The results revealed that Mga is phosphorylated at histidines in vivo. Using phosphomimetic and nonphosphomimetic substitutions at conserved phosphoenolpyruvate:carbohydrate phosphotransferase regulation domain (PRD) histidines of Mga, we demonstrated that phosphorylation-mimicking aspartate replacements at H207 and H273 of PRD-1 and at H327 of PRD-2 are inhibitory to Mga-dependent gene expression. Conversely, non-phosphorylation-mimicking alanine substitutions at H273 and H327 relieved inhibition, and the mutant strains exhibited a wild-type phenotype. The opposing regulatory profiles observed for phosphorylation- and non-phosphorylation-mimicking substitutions at H273 extended to global gene regulation by Mga. Consistent with these observations, the H273D mutant strain attenuated GAS virulence, whereas the H273A strain exhibited a wild-type virulence phenotype in a mouse model of necrotizing fasciitis. Together, our results demonstrate phosphoregulation of Mga and its direct link to virulence in M59 GAS strains. These data also lay a foundation toward understanding how naturally occurring gain-of-function variations in mga, such as H201R, may confer an advantage to the pathogen and contribute to M59 GAS pathogenesis. PMID:25824840

  19. Alcohol consumption induces global gene expression changes in VTA dopaminergic neurons.

    PubMed

    Marballi, K; Genabai, N K; Blednov, Y A; Harris, R A; Ponomarev, I

    2016-03-01

    Alcoholism is associated with dysregulation in the neural circuitry that mediates motivated and goal-directed behaviors. The dopaminergic (DA) connection between the ventral tegmental area (VTA) and the nucleus accumbens is viewed as a critical component of the neurocircuitry mediating alcohol's rewarding and behavioral effects. We sought to determine the effects of binge alcohol drinking on global gene expression in VTA DA neurons. Alcohol-preferring C57BL/6J × FVB/NJ F1 hybrid female mice were exposed to a modified drinking in the dark (DID) procedure for 3 weeks, while control animals had access to water only. Global gene expression of laser-captured tyrosine hydroxylase (TH)-positive VTA DA neurons was measured using microarrays. A total of 644 transcripts were differentially expressed between the drinking and nondrinking mice, and 930 transcripts correlated with alcohol intake during the last 2 days of drinking in the alcohol group. Bioinformatics analysis of alcohol-responsive genes identified molecular pathways and networks perturbed in DA neurons by alcohol consumption, which included neuroimmune and epigenetic functions, alcohol metabolism and brain disorders. The majority of genes with high and specific expression in DA neurons were downregulated by or negatively correlated with alcohol consumption, suggesting a decreased activity of DA neurons in high drinking animals. These changes in the DA transcriptome provide a foundation for alcohol-induced neuroadaptations that may play a crucial role in the transition to addiction. © 2015 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.

  20. Alcohol consumption induces global gene expression changes in VTA dopaminergic neurons

    PubMed Central

    Marballi, Ketan; Genabai, Naresh K.; Blednov, Yuri A.; Harris, R. Adron; Ponomarev, Igor

    2016-01-01

    Alcoholism is associated with dysregulation in the neural circuitry that mediates motivated and goal-directed behaviors. The dopaminergic connection between the ventral tegmental area (VTA) and the nucleus accumbens is viewed as a critical component of the neurocircuitry mediating alcohol’s rewarding and behavioral effects. We sought to determine the effects of binge alcohol drinking on global gene expression in VTA dopaminergic (DA) neurons. Alcohol-preferring C57BL/6J × FVB/NJ F1 hybrid female mice were exposed to a modified drinking in the dark (DID) procedure for 3 weeks, while control animals had access to water only. Global gene expression of laser-captured tyrosine hydroxylase-positive VTA DA neurons was measured using microarrays. A total of 644 transcripts were differentially expressed between the drinking and non-drinking mice, and 930 transcripts correlated with alcohol intake during the last two days of drinking in the alcohol group. Bioinformatics analysis of alcohol-responsive genes identified molecular pathways and networks perturbed in DA neurons by alcohol consumption, which included neuroimmune and epigenetic functions, alcohol metabolism and brain disorders. The majority of genes with high and specific expression in DA neurons were downregulated by or negatively correlated with alcohol consumption, suggesting a decreased activity of DA neurons in high drinking animals. These changes in the dopaminergic transcriptome provide a foundation for alcohol-induced neuroadaptations that may play a crucial role in the transition to addiction. PMID:26482798

  1. Mapping global vulnerability index in mining sectors: A case study Moulares-Redayef aquifer system, southwestern Tunisia

    NASA Astrophysics Data System (ADS)

    Khelif, Nadia; Jmal, Ikram; Bouri, Salem

    2016-09-01

    Contrary to the DRASTIC model, which groups together the saturated and unsaturated zones to compute a global intrinsic vulnerability index, the global vulnerability index method incorporates both hydrogeological and hydrochemical data for comprehensive index mapping of the saturated zones. This concept depends on the behavior and the uses of the groundwater. The main aim of this study is to propose a scientific basis for sustainable land use planning and groundwater management of the Moulares-Redayef aquifer, located in southwestern Tunisia. Overexploitation of this aquifer leaves groundwater quality threatened by various sources of pollution. The global vulnerability index was applied to the Moulares-Redayef aquifer. The results show that the zones most favorable to pollutant percolation are situated along the wadis (Tabaddit, Zallaz, Berka, …), which are drained by continuous discharges. The global vulnerability values were correlated with nitrate values for validation. This revealed a significant correlation (Pearson coefficient of 0.69), showing that high nitrate values occurred in highly vulnerable zones. The global vulnerability evaluation shows that the aquifer is characterized by high vertical vulnerability and high susceptibility.
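
    The validation step in this abstract, correlating vulnerability values with nitrate values, can be illustrated with a minimal sketch of the Pearson coefficient. The function name and the sample numbers below are hypothetical and are not taken from the study; the record does not say which software the authors used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squares (denominator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not data from the study).
vulnerability_index = [1.0, 2.0, 3.0, 4.0, 5.0]
nitrate_mg_per_l = [12.0, 15.0, 30.0, 28.0, 45.0]
r = pearson_r(vulnerability_index, nitrate_mg_per_l)
print(f"Pearson r = {r:.2f}")
```

    A coefficient near +1 would mean that nitrate concentrations rise with the vulnerability index; the abstract reports 0.69 for the real data.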

  2. Global gene profiling of aging lungs in Atp8b1 mutant mice

    PubMed Central

    Soundararajan, Ramani; Stearns, Timothy M.; Czachor, Alexander; Fukumoto, Jutaro; Turn, Christina; Westermann-Clark, Emma; Breitzig, Mason; Tan, Lee; Lockey, Richard F.; King, Benjamin L.; Kolliputi, Narasaiah

    2016-01-01

    Objective: Recent studies implicate cardiolipin oxidation in several age-related diseases. Atp8b1 encoding Type 4 P-type ATPases is a cardiolipin transporter. Mutation in the Atp8b1 gene or inflammation of the lungs impairs the capacity of Atp8b1 to clear cardiolipin from lung fluid. However, the link between Atp8b1 mutation and age-related gene alteration is unknown. Therefore, we investigated how Atp8b1 mutation alters age-related genes. Methods: We performed Affymetrix gene profiling of lungs isolated from young (7-9 wks, n=6) and aged (14 months, 14 M, n=6) C57BL/6 and Atp8b1 mutant mice. In addition, Ingenuity Pathway Analysis (IPA) was performed. Differentially expressed genes were validated by quantitative real-time PCR (qRT-PCR). Results: Global transcriptome analysis revealed 532 differentially expressed genes in Atp8b1 lungs, 157 differentially expressed genes in C57BL/6 lungs, and 37 overlapping genes. IPA of age-related genes in Atp8b1 lungs showed enrichment of Xenobiotic metabolism and Nrf2-mediated signaling pathways. The increase in Adamts2 and Mmp13 transcripts in aged Atp8b1 lungs was validated by qRT-PCR. Similarly, the decrease in Col1a1 and increase in Cxcr6 transcripts was confirmed in both Atp8b1 mutant and C57BL/6 lungs. Conclusion: Based on transcriptome profiling, our study indicates that Atp8b1 mutant mice may be susceptible to age-related lung diseases. PMID:27689529

  3. Temporal Global Changes in Gene Expression during Temperature Transition in Yersinia pestis

    PubMed Central

    Motin, Vladimir L.; Georgescu, Anca M.; Fitch, Joseph P.; Gu, Pauline P.; Nelson, David O.; Mabery, Shalini L.; Garnham, Janine B.; Sokhansanj, Bahrad A.; Ott, Linda L.; Coleman, Matthew A.; Elliott, Jeffrey M.; Kegelmeyer, Laura M.; Wyrobek, Andrew J.; Slezak, Thomas R.; Brubaker, Robert R.; Garcia, Emilio

    2004-01-01

    DNA microarrays encompassing the entire genome of Yersinia pestis were used to characterize global regulatory changes during steady-state vegetative growth occurring after shift from 26 to 37°C in the presence and absence of Ca2+. Transcriptional profiles revealed that 51, 4, and 13 respective genes and open reading frames (ORFs) on pCD, pPCP, and pMT were thermoinduced and that the majority of these genes carried by pCD were downregulated by Ca2+. In contrast, Ca2+ had little effect on chromosomal genes and ORFs, of which 235 were thermally upregulated and 274 were thermally downregulated. The primary consequence of these regulatory events is profligate catabolism of numerous metabolites available in the mammalian host. PMID:15342600

  4. Temporal global changes in gene expression during temperature transition in Yersinia pestis.

    PubMed

    Motin, Vladimir L; Georgescu, Anca M; Fitch, Joseph P; Gu, Pauline P; Nelson, David O; Mabery, Shalini L; Garnham, Janine B; Sokhansanj, Bahrad A; Ott, Linda L; Coleman, Matthew A; Elliott, Jeffrey M; Kegelmeyer, Laura M; Wyrobek, Andrew J; Slezak, Thomas R; Brubaker, Robert R; Garcia, Emilio

    2004-09-01

    DNA microarrays encompassing the entire genome of Yersinia pestis were used to characterize global regulatory changes during steady-state vegetative growth occurring after shift from 26 to 37°C in the presence and absence of Ca2+. Transcriptional profiles revealed that 51, 4, and 13 respective genes and open reading frames (ORFs) on pCD, pPCP, and pMT were thermoinduced and that the majority of these genes carried by pCD were downregulated by Ca2+. In contrast, Ca2+ had little effect on chromosomal genes and ORFs, of which 235 were thermally upregulated and 274 were thermally downregulated. The primary consequence of these regulatory events is profligate catabolism of numerous metabolites available in the mammalian host.

  5. Bacterial growth: global effects on gene expression, growth feedback and proteome partition.

    PubMed

    Klumpp, Stefan; Hwa, Terence

    2014-08-01

    The function of endogenous as well as synthetic genetic circuits is generically coupled to the physiological state of the cell. For exponentially growing bacteria, a key characteristic of the state of the cell is the growth rate and thus gene expression is often growth-rate dependent. Here we review recent results on growth-rate dependent gene expression. We distinguish different types of growth-rate dependencies by the mechanisms of regulation involved and the presence or absence of an effect of the gene product on growth. The latter can lead to growth feedback, feedback mediated by changes of the global state of the cell. Moreover, we discuss how growth rate dependence can be used as a guide to study the molecular implementation of physiological regulation. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. Targeted mining of drought stress-responsive genes from EST resources in Cleistogenes songorica.

    PubMed

    Zhang, Jiyu; John, Ulrik P; Wang, Yanrong; Li, Xi; Gunawardana, Dilini; Polotnianka, Renata M; Spangenberg, German C; Nan, Zhibiao

    2011-10-15

    Cleistogenes songorica is an important perennial grass found in the pastoral steppe of Inner Mongolia. C. songorica flourishes in drought-prone environments, and therefore provides an ideal candidate plant system for the identification of drought-tolerance-conferring genes. We constructed cDNA libraries from leaves and roots of drought-stressed C. songorica seedlings. Expressed sequence tag (EST) sequencing of 5664 random cDNA clones produced 3579 high-quality, trimmed sequences. The average read length of trimmed ESTs was 613 bp. Clustering and assembly identified a non-redundant set of 1499 contigs, including 805 singleton unigenes and 694 multi-member unigenes. The resulting unigenes were functionally categorized according to the Gene Ontology (GO) hierarchy using the in-house Bioinformatic Advanced Scientific Computing (BASC) annotation pipeline. Among the total 2.2 Mbp of EST sequence data, 161 putative SSRs were found, a frequency similar to that previously observed in oat and Arabidopsis ESTs. Sixty-three unigenes were functionally annotated as being stress responsive, of which 22 were similar to genes implicated in drought stress response. Using quantitative real-time RT-PCR, transcripts of 13 of these 22 genes were shown to be at least threefold more or less abundant in drought-stressed leaves or roots, with 8 increased and 5 decreased in relative transcript abundance. The C. songorica EST and cDNA collections generated in this study are a valuable resource for microarray-based expression profiling and functional genomics, in order to elucidate their role and to understand the underlying mechanisms of drought tolerance in C. songorica.

  7. Effect of metformin on global gene expression in liver of KKAy mice.

    PubMed

    Liu, Zhi-Qin; Song, Xiao-Mei; Chen, Que-Ting; Liu, Ting; Teng, Ji-Tao; Zhou, Kun; Luo, Du-Qiang

    2016-12-01

    Metformin is a first-line drug for treating type 2 diabetes mellitus, yet its mechanism remains only partially understood and controversial. In this study we assessed the global gene expression profile in the liver of KKAy mice treated with metformin, with the aim of identifying novel anti-diabetic mechanisms of metformin. After KKAy mice were administered metformin for 5 weeks, the profile of gene changes in the livers of KKAy mice was assessed using the Agilent whole mouse genome oligo microarray. Metformin altered the gene expression profiles in the liver of KKAy mice. To the best of our knowledge, some of these genes, such as Anxa2 and Atf6, have not been reported until now. These genes were involved in many pathways, such as the peroxisome proliferator-activated receptor signaling pathway. Gene expression changes induced by metformin supported the improvement of glucolipid metabolism and insulin resistance in KKAy mice. These findings expand our knowledge of the pharmacological action of metformin and provide potential novel insights into the molecules involved in the antidiabetic effects of metformin. Copyright © 2016. Published by Elsevier Urban & Partner Sp. z o.o.

  8. Global features of gene expression on the proteome and transcriptome levels in S. coelicolor during germination.

    PubMed

    Strakova, Eva; Bobek, Jan; Zikova, Alice; Vohradsky, Jiri

    2013-01-01

    Streptomycetes have been studied mostly as producers of secondary metabolites, while the transition from dormant spores to an exponentially growing culture has largely been ignored. Here, we focus on a comparative analysis of fluorescently and radioactively labeled proteome and microarray acquired transcriptome expressed during the germination of Streptomyces coelicolor. The time-dynamics is considered, starting from dormant spores through 5.5 hours of growth with 13 time points. Time series of the gene expressions were analyzed using correlation, principal components analysis and an analysis of coding genes utilization. Principal component analysis was used to identify principal kinetic trends in gene expression and the corresponding genes driving S. coelicolor germination. In contrast with the correlation analysis, global trends in the gene/protein expression reflected by the first principal components showed that the prominent patterns in both the protein and the mRNA domains are surprisingly well correlated. Analysis of the number of expressed genes identified functional groups activated during different time intervals of the germination.

  9. Discovery of the rhizopodin biosynthetic gene cluster in Stigmatella aurantiaca Sg a15 by genome mining.

    PubMed

    Pistorius, Dominik; Müller, Rolf

    2012-02-13

    The field of bacterial natural product research is currently undergoing a paradigm change concerning the discovery of natural products. Previously most efforts were based on isolation of the most abundant compound in an extract, or on tracking bioactivity. However, traditional activity-guided approaches are limited by the available test panels and frequently lead to the rediscovery of already known compounds. The constantly increasing availability of bacterial genome sequences provides the potential for the discovery of a huge number of new natural compounds by in silico identification of biosynthetic gene clusters. Examination of the information on the biosynthetic machinery can further prevent rediscovery of known compounds, and can help identify so far unknown biosynthetic pathways of known compounds. By in silico screening of the genome of the myxobacterium Stigmatella aurantiaca Sg a15, a trans-AT polyketide synthase/non-ribosomal peptide synthetase (PKS/NRPS) gene cluster was identified that could not be correlated to any secondary metabolite known to be produced by this strain. Targeted gene inactivation and analysis of extracts from the resulting mutants by high performance liquid chromatography coupled to high resolution mass spectrometry (HPLC-HRMS), in combination with the use of statistical tools, resulted in the identification of a compound that was absent in the mutant extracts. By matching with our in-house database of myxobacterial secondary metabolites, this compound was identified as rhizopodin. A detailed analysis of the rhizopodin biosynthetic machinery is presented in this manuscript.

  10. Identification of novel breast cancer-associated transcripts by UniGene database mining and gene expression analysis in normal and malignant cells.

    PubMed

    Laversin, Stéphanie A-S; Phatak, Vinaya M; Powe, Des G; Li, Geng; Miles, Amanda K; Hughes, David C; Ball, Graham R; Ellis, Ian O; Gritzapis, Angelos D; Missitzis, Ioannis; McArdle, Stéphanie E B; Rees, Robert C

    2013-03-01

    Breast cancer is a heterogeneous and complex disease. Although the use of tumor biomarkers has improved individualized breast cancer care, i.e., assessment of risk, diagnosis, prognosis, and prediction of treatment outcome, new markers are required to further improve patient clinical management. In the present study, a search for novel breast cancer-associated genes was performed by mining the UniGene database for expressed sequence tags (ESTs) originating from human normal breast, breast cancer tissue, or breast cancer cell lines. Two hundred and twenty-eight distinct breast-associated UniGene Clusters (BUC1-228) matched the search criteria. Four BUC ESTs (BUC6, BUC9, BUC10, and BUC11) were subsequently selected for extensive in silico database searches, and in vitro analyses through sequencing and RT-PCR based assays on well-characterized cell lines and tissues of normal and cancerous origin. BUC6, BUC9, BUC10, and BUC11 are clustered on 10p11.21-12.1 and showed no homology to any known RNAs. Overall, expression of the four BUC transcripts was high in normal breast and testis tissue, and in some breast cancers; in contrast, BUC expression was low in other normal tissues, peripheral blood mononuclear cells (PBMCs), and other cancer cell lines. Results to date suggest that BUC11 and BUC9 translate to protein, and BUC11 cytoplasmic and nuclear protein expression was detected in a large cohort of breast cancer samples using immunohistochemistry. This study demonstrates the discovery and expression analysis of a tissue-restricted novel transcript set that is strongly expressed in breast tissue; the application of these transcripts as clinical cancer biomarkers clearly warrants further investigation. Copyright © 2012 Wiley Periodicals, Inc.

  11. Triterpenoid Saponin Biosynthetic Pathway Profiling and Candidate Gene Mining of the Ilex asprella Root Using RNA-Seq

    PubMed Central

    Zheng, Xiasheng; Xu, Hui; Ma, Xinye; Zhan, Ruoting; Chen, Weiwen

    2014-01-01

    Ilex asprella, which contains abundant α-amyrin type triterpenoid saponins, is an anti-influenza herbal drug widely used in south China. In this work, we first analysed the transcriptome of the I. asprella root using RNA-Seq, which provided a dataset for functional gene mining. mRNA was isolated from the total RNA of the I. asprella root and reverse-transcribed into cDNA. Then, the cDNA library was sequenced using an Illumina HiSeq™ 2000, which generated 55,028,452 clean reads. De novo assembly of these reads generated 51,865 unigenes, in which 39,269 unigenes were annotated (75.71% yield). According to the structures of the triterpenoid saponins of I. asprella, a putative biosynthetic pathway downstream of 2,3-oxidosqualene was proposed and candidate unigenes in the transcriptome data that were potentially involved in the pathway were screened using homology-based BLAST and phylogenetic analysis. Further amplification and functional analysis of these putative unigenes will provide insight into the biosynthesis of Ilex triterpenoid saponins. PMID:24722569
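
    The annotation yield quoted in this record can be checked directly from the two unigene counts it reports. The short sketch below does that arithmetic; the function name `annotation_rate` is an illustrative choice, not something from the paper.

```python
def annotation_rate(annotated: int, total: int) -> float:
    """Return the share of annotated unigenes as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * annotated / total


# Counts as reported in the abstract: 39,269 annotated out of 51,865 unigenes.
rate = annotation_rate(39_269, 51_865)
print(f"{rate:.2f}%")  # prints 75.71%
```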

  12. Triterpenoid saponin biosynthetic pathway profiling and candidate gene mining of the Ilex asprella root using RNA-Seq.

    PubMed

    Zheng, Xiasheng; Xu, Hui; Ma, Xinye; Zhan, Ruoting; Chen, Weiwen

    2014-04-09

    Ilex asprella, which contains abundant α-amyrin type triterpenoid saponins, is an anti-influenza herbal drug widely used in south China. In this work, we first analysed the transcriptome of the I. asprella root using RNA-Seq, which provided a dataset for functional gene mining. mRNA was isolated from the total RNA of the I. asprella root and reverse-transcribed into cDNA. Then, the cDNA library was sequenced using an Illumina HiSeq™ 2000, which generated 55,028,452 clean reads. De novo assembly of these reads generated 51,865 unigenes, in which 39,269 unigenes were annotated (75.71% yield). According to the structures of the triterpenoid saponins of I. asprella, a putative biosynthetic pathway downstream of 2,3-oxidosqualene was proposed and candidate unigenes in the transcriptome data that were potentially involved in the pathway were screened using homology-based BLAST and phylogenetic analysis. Further amplification and functional analysis of these putative unigenes will provide insight into the biosynthesis of Ilex triterpenoid saponins.

  13. Calibration of the maximum carboxylation velocity (Vcmax) using data mining techniques and ecophysiological data from the Brazilian semiarid region, for use in Dynamic Global Vegetation Models.

    PubMed

    Rezende, L F C; Arenque-Musa, B C; Moura, M S B; Aidar, S T; Von Randow, C; Menezes, R S C; Ometto, J P B H

    2016-06-01

    The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs) that are currently used to simulate the responses of vegetation in the face of global changes. In fieldwork carried out in an area of preserved Caatinga forest located in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR) and data mining techniques such as the Classification And Regression Tree (CART) and K-MEANS. The results were compared to the UNCALIBRATED model. It was found that simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the UNCALIBRATED approach accounted for 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data are scarce or non-existent, such as in the Caatinga.

  14. The population genomics of begomoviruses: global scale population structure and gene flow

    PubMed Central

    2010-01-01

    Background: The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underlie observable patterns of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets that are available is that for the dicotyledonous plant infecting genus, Begomovirus, in the Family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions where they seriously threaten the food security of the world's poorest people. Results: We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can further be sub-divided into as many as thirty-four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions: Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions, we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified. We believe that the global Begomovirus population structure revealed here could facilitate population genetics studies

  15. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia.

    PubMed

    Chen, X; Lee, G; Maher, B S; Fanous, A H; Chen, J; Zhao, Z; Guo, A; van den Oord, E; Sullivan, P F; Shi, J; Levinson, D F; Gejman, P V; Sanders, A; Duan, J; Owen, M J; Craddock, N J; O'Donovan, M C; Blackman, J; Lewis, D; Kirov, G K; Qin, W; Schwab, S; Wildenauer, D; Chowdari, K; Nimgaonkar, V; Straub, R E; Weinberger, D R; O'Neill, F A; Walsh, D; Bronstein, M; Darvasi, A; Lencz, T; Malhotra, A K; Rujescu, D; Giegling, I; Werge, T; Hansen, T; Ingason, A; Nöethen, M M; Rietschel, M; Cichon, S; Djurovic, S; Andreassen, O A; Cantor, R M; Ophoff, R; Corvin, A; Morris, D W; Gill, M; Pato, C N; Pato, M T; Macedo, A; Gurling, H M D; McQuillin, A; Pimm, J; Hultman, C; Lichtenstein, P; Sklar, P; Purcell, S M; Scolnick, E; St Clair, D; Blackwood, D H R; Kendler, K S

    2011-11-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed bioinformatic prioritization for all the markers with P-values ≤0.05 in both data sets. In this process, we found that in the CMYA5 gene, there were two non-synonymous markers, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both the CATIE and MGS-GAIN samples, rs4704591 was identified as the most significant marker in the gene. Linkage disequilibrium analyses indicated that these markers were in low LD (rs3828611-rs10043986, r²=0.008; rs10043986-rs4704591, r²=0.204). In addition, CMYA5 was reported to be physically interacting with the DTNBP1 gene, a promising candidate for schizophrenia, suggesting that CMYA5 may be involved in the same biological pathway and process. On the basis of this information, we performed replication studies for these three single-nucleotide polymorphisms. The rs3828611 was found to have conflicting results in our Irish samples and was dropped out without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case-control samples, 11 380 cases and 15 021 controls), we found that both markers are significantly associated with schizophrenia (rs10043986, odds ratio (OR)=1.11, 95% confidence interval (CI)=1.04-1.18, P=8.2 × 10⁻⁴ and rs4704591, OR=1.07, 95% CI=1.03-1.11, P=3.0 × 10⁻⁴). The results were also significant for the 22 Caucasian replication samples (rs10043986, OR=1.11, 95% CI=1.03-1.17, P=0.0026 and rs4704591, OR=1.07, 95% CI=1.02-1.11, P=0.0015). Furthermore, haplotype conditioned analyses indicated that the association

  16. GWA study data mining and independent replication identify cardiomyopathy-associated 5 (CMYA5) as a risk gene for schizophrenia

    PubMed Central

    Chen, X; Lee, G; Maher, BS; Fanous, AH; Chen, J; Zhao, Z; Guo, A; van den Oord, E; Sullivan, PF; Shi, J; Levinson, DF; Gejman, PV; Sanders, A; Duan, J; Owen, MJ; Craddock, NJ; O’Donovan, MC; Blackman, J; Lewis, D; Kirov, GK; Qin, W; Schwab, S; Wildenauer, D; Chowdari, K; Nimgaonkar, V; Straub, RE; Weinberger, DR; O’Neill, FA; Walsh, D; Bronstein, M; Darvasi, A; Lencz, T; Malhotra, AK; Rujescu, D; Giegling, I; Werge, T; Hansen, T; Ingason, A; Nöethen, MM; Rietschel, M; Cichon, S; Djurovic, S; Andreassen, OA; Cantor, RM; Ophoff, R; Corvin, A; Morris, DW; Gill, M; Pato, CN; Pato, MT; Macedo, A; Gurling, HMD; McQuillin, A; Pimm, J; Hultman, C; Lichtenstein, P; Sklar, P; Purcell, SM; Scolnick, E; St Clair, D; Blackwood, DHR; Kendler, KS

    2012-01-01

    We conducted data-mining analyses using the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and molecular genetics of schizophrenia genome-wide association study supported by the genetic association information network (MGS-GAIN) schizophrenia data sets and performed bioinformatic prioritization for all the markers with P-values ≤0.05 in both data sets. In this process, we found that in the CMYA5 gene, there were two non-synonymous markers, rs3828611 and rs10043986, showing nominal significance in both the CATIE and MGS-GAIN samples. In a combined analysis of both the CATIE and MGS-GAIN samples, rs4704591 was identified as the most significant marker in the gene. Linkage disequilibrium analyses indicated that these markers were in low LD (rs3828611–rs10043986, r² = 0.008; rs10043986–rs4704591, r² = 0.204). In addition, CMYA5 was reported to be physically interacting with the DTNBP1 gene, a promising candidate for schizophrenia, suggesting that CMYA5 may be involved in the same biological pathway and process. On the basis of this information, we performed replication studies for these three single-nucleotide polymorphisms. The rs3828611 was found to have conflicting results in our Irish samples and was dropped out without further investigation. The other two markers were verified in 23 other independent data sets. In a meta-analysis of all 23 replication samples (family samples, 912 families with 4160 subjects; case–control samples, 11 380 cases and 15 021 controls), we found that both markers are significantly associated with schizophrenia (rs10043986, odds ratio (OR) = 1.11, 95% confidence interval (CI) = 1.04–1.18, P = 8.2 × 10⁻⁴ and rs4704591, OR = 1.07, 95% CI = 1.03–1.11, P = 3.0 × 10⁻⁴). The results were also significant for the 22 Caucasian replication samples (rs10043986, OR = 1.11, 95% CI = 1.03–1.17, P = 0.0026 and rs4704591, OR = 1.07, 95% CI = 1.02–1.11, P = 0.0015). Furthermore, haplotype conditioned analyses

  17. Global aspects of pacC regulation of pathogenicity genes in Colletotrichum gloeosporioides as revealed by transcriptome analysis.

    PubMed

    Alkan, Noam; Meng, Xiangchun; Friedlander, Gilgi; Reuveni, Eli; Sukno, Serenella; Sherman, Amir; Thon, Michael; Fluhr, Robert; Prusky, Dov

    2013-11-01

    Colletotrichum gloeosporioides alkalinizes its surroundings during colonization of host tissue. The transcription factor pacC is a regulator of pH-controlled genes and is essential for successful colonization. We present here the sequence assembly of the Colletotrichum fruit pathogen and use it to explore the global regulation of pathogenicity by ambient pH. The assembled genome size was 54 Mb, encoding 18,456 genes. Transcriptomes of the wild type and ΔpacC mutant were established by RNA-seq and explored for their global pH-dependent gene regulation. The analysis showed that pacC upregulates 478 genes and downregulates 483 genes, comprising 5% of the fungal genome, including transporters, antioxidants, and cell-wall-degrading enzymes. Interestingly, gene families with similar functionality are both up- and downregulated by pacC. Global analysis of secreted genes showed significant pacC activation of degradative enzymes at alkaline pH and during fruit infection. Select genes from alkalizing-type pathogen C. gloeosporioides and from acidifying-type pathogen Sclerotinia sclerotiorum were verified by quantitative reverse-transcription polymerase chain reaction analysis at different pH values. Knock out of several pacC-activated genes confirmed their involvement in pathogenic colonization of alkalinized surroundings. The results suggest a global regulation by pacC of key pathogenicity genes during pH change in alkalinizing and acidifying pathogens.

  18. Gene Classification and Mining of Molecular Markers Useful in Red Clover (Trifolium pratense) Breeding

    PubMed Central

    Ištvánek, Jan; Dluhošová, Jana; Dluhoš, Petr; Pátková, Lenka; Nedělník, Jan; Řepková, Jana

    2017-01-01

    Red clover (Trifolium pratense) is an important forage plant worldwide. This study was directed to broadening current knowledge of red clover's coding regions and enhancing its utilization in practice by specific reanalysis of a previously published assembly. A total of 42,996 genes were characterized using Illumina paired-end sequencing after manual revision of Blast2GO annotation. Genes were classified into metabolic and biosynthetic pathways in response to biological processes, with 7,517 genes being assigned to specific pathways. Moreover, 17,727 enzymatic nodes in all pathways were described. We identified 6,749 potential microsatellite loci in red clover coding sequences, and we characterized 4,005 potential simple sequence repeat (SSR) markers as generating polymerase chain reaction products preferentially within 100–350 bp. Marker density of 1 SSR marker per 12.39 kbp was achieved. Aligning reads against predicted coding sequences resulted in the identification of 343,027 single nucleotide polymorphism (SNP) markers, providing marker density of one SNP marker per 144.6 bp. Altogether, 95 SSRs in coding sequences were analyzed for 50 red clover varieties and a collection of 22 highly polymorphic SSRs with pooled polymorphism information content >0.9 was generated, thus obtaining primer pairs for application to diversity studies in T. pratense. A set of 8,623 genome-wide distributed SNPs was developed and used for polymorphism evaluation in individual plants. The polymorphism information content ranged from 0 to 0.375. Temperature switch PCR was successfully used in single-marker SNP genotyping for targeted coding sequences and for heterozygosity or homozygosity confirmation in five validated loci. Predicted large sets of SSRs and SNPs throughout the genome are key to rapidly implementing genome-based breeding approaches, for identifying genes underlying key traits, and for genome-wide association studies. Detailed knowledge of genetic relationships among
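
    The reported SNP range of 0 to 0.375 for polymorphism information content (PIC) matches the theoretical bounds for a biallelic marker. The sketch below assumes the commonly used Botstein et al. (1980) definition of PIC; the record does not state which formula the authors applied, and the function name `pic` is illustrative.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content from allele frequencies.

    Assumed formula (Botstein et al., 1980):
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    if any(p < 0 for p in allele_freqs) or abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction


# A biallelic SNP is most informative when both alleles are equally frequent.
print(pic([0.5, 0.5]))  # prints 0.375
# A monomorphic locus carries no information.
print(pic([1.0, 0.0]))  # prints 0.0
```

    Under this definition a two-allele marker cannot exceed 0.375, whereas multi-allelic SSRs can approach 1, which is consistent with the >0.9 pooled values quoted for the SSR set.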

  19. Global Analysis of Transcriptome Responses and Gene Expression Profiles to Cold Stress of Jatropha curcas L.

    PubMed Central

    Wang, Haibo; Zou, Zhurong; Wang, Shasha; Gong, Ming

    2013-01-01

    Background: Jatropha curcas L., also called the Physic nut, is an oil-rich shrub with multiple uses, including biodiesel production, and is currently exploited as a renewable energy resource in many countries. Nevertheless, because of its origin in the tropical Mid-American zone, J. curcas has an inherent but undesirable characteristic (low cold resistance) that may seriously restrict its large-scale popularization. This adaptive flaw can be genetically improved by elucidating the mechanisms underlying plant tolerance to cold temperatures. The newly developed Illumina HiSeq™ 2000 RNA-seq and Digital Gene Expression (DGE) are deep high-throughput approaches for gene expression analysis at the transcriptome level, using which we carefully investigated the gene expression profiles in response to cold stress to gain insight into the molecular mechanisms of cold response in J. curcas. Results: In total, 45,251 unigenes were obtained by assembly of clean data generated by RNA-seq analysis of the J. curcas transcriptome. A total of 33,363 and 912 complete or partial coding sequences (CDSs) were determined by protein database alignments and ESTScan prediction, respectively. Among these unigenes, more than 41.52% were involved in approximately 128 known metabolic or signaling pathways, and 4,185 were possibly associated with cold resistance. DGE analysis was used to assess the changes in gene expression on exposure to cold conditions (12°C) for 12, 24, and 48 h. The results showed that 3,178 genes were significantly upregulated and 1,244 were downregulated under cold stress. These genes were then functionally annotated based on the transcriptome data from RNA-seq analysis. Conclusions: This study provides a global view of transcriptome response and gene expression profiling of J. curcas in response to cold stress. The results can help improve our current understanding of the mechanisms underlying plant cold resistance and favor the screening of crucial genes for

  20. The Global Response Regulator RegR Controls Expression of Denitrification Genes in Bradyrhizobium japonicum

    PubMed Central

    Torres, Maria J.; Argandoña, Montserrat; Vargas, Carmen; Bedmar, Eulogio J.; Fischer, Hans-Martin; Mesa, Socorro; Delgado, María J.

    2014-01-01

    Bradyrhizobium japonicum RegSR regulatory proteins belong to the family of two-component regulatory systems, and orthologs are present in many Proteobacteria where they globally control gene expression mostly in a redox-responsive manner. In this work, we have performed a transcriptional profiling of wild-type and regR mutant cells grown under anoxic denitrifying conditions. The comparative analyses of wild-type and regR strains revealed that almost 620 genes induced in the wild type under denitrifying conditions were regulated (directly or indirectly) by RegR, pointing out the important role of this protein as a global regulator of denitrification. Genes controlled by RegR included nor and nos structural genes encoding nitric oxide and nitrous oxide reductase, respectively, genes encoding electron transport proteins such as cycA (blr7544) or cy2 (bll2388), and genes involved in nitric oxide detoxification (blr2806-09) and copper homeostasis (copCAB), as well as two regulatory genes (bll3466, bll4130). Purified RegR interacted with the promoters of norC (blr3214), nosR (blr0314), a fixK-like gene (bll3466), and bll4130, which encodes a LysR-type regulator. By using fluorescently labeled oligonucleotide extension (FLOE), we were able to identify two transcriptional start sites located at about 35 (P1) and 22 (P2) bp upstream of the putative translational start codon of norC. P1 matched with the previously mapped 5′ end of norC mRNA which we demonstrate in this work to be under FixK2 control. P2 is a start site modulated by RegR and specific for anoxic conditions. Moreover, qRT-PCR experiments, expression studies with a norC-lacZ fusion, and heme c-staining analyses revealed that anoxia and nitrate are required for RegR-dependent induction of nor genes, and that this control is independent of the sensor protein RegS. PMID:24949739

  1. The global response regulator RegR controls expression of denitrification genes in Bradyrhizobium japonicum.

    PubMed

    Torres, Maria J; Argandoña, Montserrat; Vargas, Carmen; Bedmar, Eulogio J; Fischer, Hans-Martin; Mesa, Socorro; Delgado, María J

    2014-01-01

    Bradyrhizobium japonicum RegSR regulatory proteins belong to the family of two-component regulatory systems, and orthologs are present in many Proteobacteria where they globally control gene expression mostly in a redox-responsive manner. In this work, we have performed a transcriptional profiling of wild-type and regR mutant cells grown under anoxic denitrifying conditions. The comparative analyses of wild-type and regR strains revealed that almost 620 genes induced in the wild type under denitrifying conditions were regulated (directly or indirectly) by RegR, pointing out the important role of this protein as a global regulator of denitrification. Genes controlled by RegR included nor and nos structural genes encoding nitric oxide and nitrous oxide reductase, respectively, genes encoding electron transport proteins such as cycA (blr7544) or cy2 (bll2388), and genes involved in nitric oxide detoxification (blr2806-09) and copper homeostasis (copCAB), as well as two regulatory genes (bll3466, bll4130). Purified RegR interacted with the promoters of norC (blr3214), nosR (blr0314), a fixK-like gene (bll3466), and bll4130, which encodes a LysR-type regulator. By using fluorescently labeled oligonucleotide extension (FLOE), we were able to identify two transcriptional start sites located at about 35 (P1) and 22 (P2) bp upstream of the putative translational start codon of norC. P1 matched with the previously mapped 5′ end of norC mRNA which we demonstrate in this work to be under FixK2 control. P2 is a start site modulated by RegR and specific for anoxic conditions. Moreover, qRT-PCR experiments, expression studies with a norC-lacZ fusion, and heme c-staining analyses revealed that anoxia and nitrate are required for RegR-dependent induction of nor genes, and that this control is independent of the sensor protein RegS.

  2. Understanding TCV L-mode plasmas via global gyrokinetic GENE simulations

    NASA Astrophysics Data System (ADS)

    Merlo, Gabriele; Brunner, Stephan; Coda, Stefano; Goerler, Tobias; Huang, Zhouji; Jenko, Frank; Told, Daniel; Sauter, Olivier; Villard, Laurent

    2016-10-01

    It is known that global effects can have a significant influence on turbulent transport driven by microinstabilities, especially for small-size machines like the TCV tokamak. The global version of the gyrokinetic GENE code has been extensively used to model TCV plasmas for which finite ρ* effects are expected to be crucial in order to recover the experimentally observed behaviour. We will address in particular: (i) The effect of negative triangularity, which has been experimentally observed to lower by up to a factor of two the heat flux through the electron channel at all radial locations. Global effects and the inclusion of carbon impurities turn out to be the key elements required in order to match experiments and simulation results. (ii) The formation of either radially coherent or dispersive axisymmetric density fluctuations, experimentally interpreted as Geodesic Acoustic Modes. GENE simulations reproduce the observed behaviour and allow us to conclude that the modification of the safety factor alone cannot explain the transition between these two different fluctuation regimes.

  3. Genome mining unearths a hybrid nonribosomal peptide synthetase-like-pteridine synthase biosynthetic gene cluster

    PubMed Central

    Park, Hyun Bong; Perez, Corey E; Barber, Karl W; Rinehart, Jesse; Crawford, Jason M

    2017-01-01

    Nonribosomal peptides represent a large class of metabolites with pharmaceutical relevance. Pteridines, such as pterins, folates, and flavins, are heterocyclic metabolites that often serve as redox-active cofactors. The biosynthetic machineries for construction of these distinct classes of small molecules operate independently in the cell. Here, we discovered an unprecedented nonribosomal peptide synthetase-like-pteridine synthase hybrid biosynthetic gene cluster in Photorhabdus luminescens using genome synteny analysis. P. luminescens is a Gammaproteobacterium that undergoes phenotypic variation and can have both pathogenic and mutualistic roles. Through extensive gene deletion, pathway-targeted molecular networking, quantitative proteomic analysis, and NMR, we show that the genetic locus affects the regulation of quorum sensing and secondary metabolic enzymes and encodes new pteridine metabolites functionalized with cis-amide acyl-side chains, termed pepteridine A (1) and B (2). The pepteridines are produced in the pathogenic phenotypic variant and represent the first reported metabolites to be synthesized by a hybrid NRPS-pteridine pathway. These studies expand our view of the combinatorial biosynthetic potential available in bacteria. DOI: http://dx.doi.org/10.7554/eLife.25229.001

  4. Self-Organizing Global Gene Expression Regulated through Criticality: Mechanism of the Cell-Fate Change

    PubMed Central

    Tsuchiya, Masa; Giuliani, Alessandro; Hashimoto, Midori; Erenpreisa, Jekaterina; Yoshikawa, Kenichi

    2016-01-01

    Background A fundamental issue in bioscience is to understand the mechanism that underlies the dynamic control of genome-wide expression through the complex temporal-spatial self-organization of the genome to regulate the change in cell fate. We address this issue by elucidating a physically motivated mechanism of self-organization. Principal Findings Building upon transcriptome experimental data for seven distinct cell fates, including early embryonic development, we demonstrate that self-organized criticality (SOC) plays an essential role in the dynamic control of global gene expression regulation at both the population and single-cell levels. The novel findings are as follows: i) Mechanism of cell-fate changes: A sandpile-type critical transition self-organizes overall expression into a few transcription response domains (critical states). A cell-fate change occurs by means of a dissipative pulse-like global perturbation in self-organization through the erasure of initial-state critical behaviors (criticality). Most notably, the reprogramming of early embryo cells destroys the zygote SOC control to initiate self-organization in the new embryonal genome, which passes through a stochastic overall expression pattern. ii) Mechanism of perturbation of SOC controls: Global perturbations in self-organization involve the temporal regulation of critical states. Quantitative evaluation of this perturbation in terminal cell fates reveals that dynamic interactions between critical states determine the critical-state coherent regulation. The occurrence of a temporal change in criticality perturbs this between-states interaction, which directly affects the entire genomic system. 
    Surprisingly, a sub-critical state, corresponding to an ensemble of genes that show only marginal changes in expression and consequently are considered to be devoid of any interest, plays an essential role in generating a global perturbation in self-organization directed toward the cell-fate change

  5. Global characterization of interferon regulatory factor (IRF) genes in vertebrates: Glimpse of the diversification in evolution

    PubMed Central

    2010-01-01

    Background Interferon regulatory factors (IRFs), which can be identified based on a unique helix-turn-helix DNA-binding domain (DBD), are a large family of transcription factors involved in host immune response, haematopoietic differentiation and immunomodulation. Despite the identification of ten IRF family members in mammals, and some recent effort to identify these members in fish, relatively little is known about the composition of these members in other classes of vertebrates, and the evolution and probable origin of the IRF family have not been investigated in vertebrates. Results Genome data mining has been performed to identify any possible IRF family members in human, mouse, dog, chicken, anole lizard, frog, and some teleost fish, mainly zebrafish and stickleback, and also in non-vertebrate deuterostomes including the hemichordate, cephalochordate, urochordate and echinoderm. In vertebrates, all ten IRF family members, i.e. IRF-1 to IRF-10, were identified, with two genes of IRF-4 and IRF-6 identified in fish and frog, respectively, except that three IRF-4 genes exist in zebrafish. Surprisingly, an additional member of the IRF family, IRF-11, was found in teleost fish. A range of two to ten IRF-like genes were detected in the non-vertebrate deuterostomes, and they had little similarity to the IRF family members in vertebrates, as revealed in genomic structure and in phylogenetic analysis. However, the ten IRF family members, IRF-1 to IRF-10, showed certain degrees of conservation in terms of genomic structure and gene synteny. In particular, IRF-1, IRF-2, IRF-6 and IRF-8 are quite conserved in their genomic structure in all vertebrates, and to a lesser degree, some IRF family members, such as IRF-5 and IRF-9, are comparable in structure. Synteny analysis revealed that the gene loci for the ten IRF family members in vertebrates were also quite conserved, but in zebrafish the conserved genes were distributed over a much longer distance in chromosomes. 
Furthermore

  6. Global characterization of interferon regulatory factor (IRF) genes in vertebrates: glimpse of the diversification in evolution.

    PubMed

    Huang, Bei; Qi, Zhi T; Xu, Zhen; Nie, Pin

    2010-05-05

    Interferon regulatory factors (IRFs), which can be identified based on a unique helix-turn-helix DNA-binding domain (DBD), are a large family of transcription factors involved in host immune response, haematopoietic differentiation and immunomodulation. Despite the identification of ten IRF family members in mammals, and some recent effort to identify these members in fish, relatively little is known about the composition of these members in other classes of vertebrates, and the evolution and probable origin of the IRF family have not been investigated in vertebrates. Genome data mining has been performed to identify any possible IRF family members in human, mouse, dog, chicken, anole lizard, frog, and some teleost fish, mainly zebrafish and stickleback, and also in non-vertebrate deuterostomes including the hemichordate, cephalochordate, urochordate and echinoderm. In vertebrates, all ten IRF family members, i.e. IRF-1 to IRF-10, were identified, with two genes of IRF-4 and IRF-6 identified in fish and frog, respectively, except that three IRF-4 genes exist in zebrafish. Surprisingly, an additional member of the IRF family, IRF-11, was found in teleost fish. A range of two to ten IRF-like genes were detected in the non-vertebrate deuterostomes, and they had little similarity to the IRF family members in vertebrates, as revealed in genomic structure and in phylogenetic analysis. However, the ten IRF family members, IRF-1 to IRF-10, showed certain degrees of conservation in terms of genomic structure and gene synteny. In particular, IRF-1, IRF-2, IRF-6 and IRF-8 are quite conserved in their genomic structure in all vertebrates, and to a lesser degree, some IRF family members, such as IRF-5 and IRF-9, are comparable in structure. Synteny analysis revealed that the gene loci for the ten IRF family members in vertebrates were also quite conserved, but in zebrafish the conserved genes were distributed over a much longer distance in chromosomes. 
Furthermore, all ten different

  7. Global gene expression defines faded whorl specification of double flower domestication in Camellia.

    PubMed

    Li, Xinlei; Li, Jiyuan; Fan, Zhengqi; Liu, Zhongchi; Tanaka, Takayuki; Yin, Hengfu

    2017-06-09

    Double flowers in cultivated camellias are divergent in floral patterns which present a rich resource for demonstrating molecular modifications influenced by the human demands. Despite the key principle of ABCE model in whorl specification, the underlying mechanism of fine-tuning double flower formation remains largely unclear. Here a comprehensive comparative transcriptomics interrogation of gene expression among floral organs of wild type and "formal double" and "anemone double" is presented. Through a combination of transcriptome, small RNA and "degradome" sequencing, we studied the regulatory gene expression network underlying the double flower formation. We obtained the differentially expressed genes between whorls in wild and cultivated Camellia. We showed that the formation of double flowers tends to demolish gene expression canalization of key functions; the faded whorl specification mechanism was fundamental under the diverse patterns of double flowers. Furthermore, we identified conserved miRNA-targets regulations in the control of double flowers, and we found that miR172-AP2, miR156-SPLs were critical regulatory nodes contributing to the diversity of double flower forms. This work highlights the hierarchical patterning of global gene expression in floral development, and supports the roles of "faded ABC model" mechanism and miRNA-targets regulations underlying the double flower domestication.

  8. Global analysis of gene expression profiles in developing physic nut (Jatropha curcas L.) seeds.

    PubMed

    Jiang, Huawu; Wu, Pingzhi; Zhang, Sheng; Song, Chi; Chen, Yaping; Li, Meiru; Jia, Yongxia; Fang, Xiaohua; Chen, Fan; Wu, Guojiang

    2012-01-01

    Physic nut (Jatropha curcas L.) is an oilseed plant species with high potential utility as a biofuel. Furthermore, following recent sequencing of its genome and the availability of expressed sequence tag (EST) libraries, it is a valuable model plant for studying carbon assimilation in endosperms of oilseed plants. There have been several transcriptomic analyses of developing physic nut seeds using ESTs, but they have provided limited information on the accumulation of stored resources in the seeds. We applied next-generation Illumina sequencing technology to analyze global gene expression profiles of developing physic nut seeds 14, 19, 25, 29, 35, 41, and 45 days after pollination (DAP). The acquired profiles reveal the key genes, and their expression timeframes, involved in major metabolic processes including: carbon flow, starch metabolism, and synthesis of storage lipids and proteins in the developing seeds. The main period of storage reserves synthesis in the seeds appears to be 29-41 DAP, and the fatty acid composition of the developing seeds is consistent with relative expression levels of different isoforms of acyl-ACP thioesterase and fatty acid desaturase genes. Several transcription factor genes whose expression coincides with storage reserve deposition correspond to those known to regulate the process in Arabidopsis. The results will facilitate searches for genes that influence de novo lipid synthesis, accumulation and their regulatory networks in developing physic nut seeds, and other oil seeds. Thus, they will be helpful in attempts to modify these plants for efficient biofuel production.

  9. Shared control of gene expression in bacteria by transcription factors and global physiology of the cell

    PubMed Central

    Berthoumieux, Sara; de Jong, Hidde; Baptist, Guillaume; Pinel, Corinne; Ranquet, Caroline; Ropers, Delphine; Geiselmann, Johannes

    2013-01-01

    Gene expression is controlled by the joint effect of (i) the global physiological state of the cell, in particular the activity of the gene expression machinery, and (ii) DNA-binding transcription factors and other specific regulators. We present a model-based approach to distinguish between these two effects using time-resolved measurements of promoter activities. We demonstrate the strength of the approach by analyzing a circuit involved in the regulation of carbon metabolism in E. coli. Our results show that the transcriptional response of the network is controlled by the physiological state of the cell and the signaling metabolite cyclic AMP (cAMP). The absence of a strong regulatory effect of transcription factors suggests that they are not the main coordinators of gene expression changes during growth transitions, but rather that they complement the effect of global physiological control mechanisms. This change of perspective has important consequences for the interpretation of transcriptome data and the design of biological networks in biotechnology and synthetic biology. PMID:23340840

  10. Global analysis of the regulatory network structure of gene expression in Saccharomyces cerevisiae.

    PubMed

    Gunji, Wataru; Kai, Takahito; Takahashi, Yoriko; Maki, Yukihiro; Kurihara, Wataru; Utsugi, Takahiko; Fujimori, Fumihiro; Murakami, Yasufumi

    2004-06-30

    Gene expression in eukaryotic cells is controlled by the concerted action of various transcription factors. To help clarify these complex mechanisms, we attempted to develop a method for extracting maximal information regarding the transcriptional control pathways. To this end, we first analyzed the expression profiles of numerous transcription factors in yeast cells, under the assumption that the expression levels of these factors would be elevated under conditions in which the factors were active in the cells. Based on the results, we successfully categorized about 400 transcription factors into three groups based on their expression profiles. We then analyzed the effect of the loss of function of various induced transcription factors on the global expression profile to investigate the above-mentioned assumption of a correlation between transcription elevation and functional activity. By comparing the expression profiles of wild-type with those of disruption mutants using microarrays, we were able to detect a substantial number of relations between transcription factors and the genes they regulate. The results of these experiments suggested that our approach is useful for understanding the global transcriptional networks of eukaryotic cells, in which most genes are regulated in a temporal and conditional manner.

  11. Global gene expression profiling of individual human oocytes and embryos demonstrates heterogeneity in early development.

    PubMed

    Shaw, Lisa; Sneddon, Sharon F; Zeef, Leo; Kimber, Susan J; Brison, Daniel R

    2013-01-01

    Early development in humans is characterised by low and variable embryonic viability, reflected in low fecundity and high rates of miscarriage, relative to other mammals. Data from assisted reproduction programmes provide additional evidence that this is largely mediated at the level of embryonic competence and is highly heterogeneous among embryos. Understanding the basis of this heterogeneity has important implications in a number of areas including: the regulation of early human development, disorders of pregnancy, assisted reproduction programmes, the long-term health of children which may be programmed in early development, and the molecular basis of pluripotency in human stem cell populations. We have therefore investigated global gene expression profiles using polyAPCR amplification and microarray technology applied to individual human oocytes and 4-cell and blastocyst stage embryos. In order to explore the basis of any variability in detail, each developmental stage is replicated in triplicate. Our data show that although transcript profiles are highly stage-specific, within each stage they are relatively variable. We describe expression of a number of gene families and pathways including apoptosis, cell cycle and amino acid metabolism, which are variably expressed and may be reflective of embryonic developmental competence. Overall, our data suggest that heterogeneity in human embryo developmental competence is reflected in global transcript profiles, and that the vast majority of existing human embryo gene expression data based on pooled oocytes and embryos need to be reinterpreted.

  12. Gene expression profiling--Opening the black box of plant ecosystem responses to global change

    SciTech Connect

    Leakey, A.D.B.; Ainsworth, E.A.; Bernard, S.M.; Markelz, R.J.C.; Ort, D.R.; Placella, S.A.P.; Rogers, A.; Smith, M.D.; Sudderth, E.A.; Weston, D.J.; Wullschleger, S.D.; Yuan, S.

    2009-11-01

    The use of genomic techniques to address ecological questions is emerging as the field of genomic ecology. Experimentation under environmentally realistic conditions to investigate the molecular response of plants to meaningful changes in growth conditions and ecological interactions is the defining feature of genomic ecology. Since the impacts of global change factors on plant performance are mediated by direct effects at the molecular, biochemical and physiological scales, gene expression analysis promises important advances in understanding factors that have previously been consigned to the 'black box' of unknown mechanism. Various tools and approaches are available for assessing gene expression in model and non-model species as part of global change biology studies. Each approach has its own unique advantages and constraints. A first generation of genomic ecology studies in managed ecosystems and mesocosms has provided a testbed for the approach and has begun to reveal how the experimental design and data analysis of gene expression studies can be tailored for use in an ecological context.

  13. Mining the genetic diversity of Ehrlichia ruminantium using map genes family.

    PubMed

    Raliniaina, Modestine; Meyer, Damien F; Pinarello, Valérie; Sheikboudou, Christian; Emboulé, Loic; Kandassamy, Yane; Adakal, Hassane; Stachurski, Frédéric; Martinez, Dominique; Lefrançois, Thierry; Vachiéry, Nathalie

    2010-02-10

    Understanding bacterial genetic diversity is crucial to comprehend pathogenesis. Ehrlichia ruminantium (E. ruminantium), a tick-transmitted intracellular bacterial pathogen, causes heartwater disease in ruminants. This model rickettsia, whose genome has been recently sequenced, is restricted to neutrophils and reticulo-endothelial cells of its mammalian host and to the midgut and salivary glands of its vector tick. E. ruminantium harbors a multigene family encoding for 16 outer membrane proteins including MAP1, a major antigenic protein. All the 16 map paralogs are expressed in bovine endothelial cells and some are specifically translated in the tick or in the mammalian host. In this study, we carried out phylogenetic analyses of E. ruminantium using sequences of 6 MAP proteins, MAP1, MAP1-2, MAP1-6, MAP1-5, MAP1+1 and MAP1-14, localized either in the center or at the borders of the map genes cluster. We show that (i) map1 gene is a good tool to characterize the genetic diversity among Africa, Caribbean islands and Madagascar strains including new emerging isolates of E. ruminantium; (ii) the different map paralogs define different genotypes showing divergent evolution; (iii) there is no correlation between all MAP genotypes and the geographic origins of the strains; (iv) The genetic diversity revealed by MAP proteins is conserved whatever is the scale of strains sampling (village, region, continent) and thus was not related to the different timing of strains introduction, i.e. continuous introduction of strains versus punctual introduction (Africa versus Caribbean islands). These results provide therefore a significant advance towards the management of E. ruminantium diversity. The differential evolution of these paralogs suggests specific roles of these proteins in host-vector-pathogen interactions that could be crucial for developing broad-spectrum vaccines.

  14. GeneLab for High Schools: Data Mining for the Next Generation

    NASA Technical Reports Server (NTRS)

    Blaber, Elizabeth A.; Ly, Diana; Sato, Kevin Y.; Taylor, Elizabeth

    2016-01-01

    Modern biological sciences have become increasingly based on molecular biology and high-throughput molecular techniques, such as genomics, transcriptomics, and proteomics. NASA scientists and the NASA Space Biology Program have aimed to examine the fundamental building blocks of life (RNA, DNA and protein) in order to understand the response of living organisms to space and aid in fundamental research discoveries on Earth. In an effort to make NASA-funded science available to everyone, NASA has collected the data from omics studies and curated them in a data system called GeneLab. Whilst most college-level interns, academics and other scientists have had some interaction with omics data sets and analysis tools, high school students often have not. Therefore, the Space Biology Program is implementing a new Summer Program for high-school students that aims to inspire the next generation of scientists to learn about and get involved in space research using GeneLab's Data System. The program consists of three main components: core learning modules, focused on developing students' knowledge of the Space Biology Program and Space Biology research, GeneLab and the data system, and previous research conducted on model organisms in space; networking and team work, enabling students to interact with guest lecturers from local universities and their fellow peers, and also enabling them to visit local universities and genomics centers around the Bay Area; and finally an independent learning project, whereby students will be required to form small groups, analyze a dataset on the GeneLab platform, generate a hypothesis and develop a research plan to test their hypothesis. 
This program will not only help inspire high-school students to become involved in space-based research but will also help them develop key critical thinking and bioinformatics skills required for most college degrees and furthermore, will enable them to establish networks with their peers and connections

  15. A data mining approach for identifying pathway-gene biomarkers for predicting clinical outcome: A case study of erlotinib and sorafenib

    PubMed Central

    2017-01-01

    A novel data mining procedure is proposed for identifying potential pathway-gene biomarkers from preclinical drug sensitivity data for predicting clinical responses to erlotinib or sorafenib. The analysis applies linear ridge regression modeling to generate a small (N~1000) set of baseline gene expressions that jointly yield quality predictions of preclinical drug sensitivity data and clinical responses. Standard clustering of the pathway-gene combinations from gene set enrichment analysis of this initial gene set, according to their shared appearance in molecular function pathways, yields a reduced (N~300) set of potential pathway-gene biomarkers. A modified method for quantifying pathway fitness is used to determine smaller numbers of over and under expressed genes that correspond with favorable and unfavorable clinical responses. Detailed literature-based evidence is provided in support of the roles of these under and over expressed genes in compound efficacy. RandomForest analysis of potential pathway-gene biomarkers finds average treatment prediction errors of 10% and 22%, respectively, for patients receiving erlotinib or sorafenib that had a favorable clinical response. Higher errors were found for both compounds when predicting an unfavorable clinical response. Collectively these results suggest complementary roles for biomarker genes and biomarker pathways when predicting clinical responses from preclinical data. PMID:28792525

  16. Global Transcriptome Analysis Reveals That Poly(ADP-Ribose) Polymerase 1 Regulates Gene Expression through EZH2.

    PubMed

    Martin, Kayla A; Cesaroni, Matteo; Denny, Michael F; Lupey, Lena N; Tempera, Italo

    2015-12-01

    Posttranslational modifications, such as poly(ADP-ribosyl)ation (PARylation), regulate chromatin-modifying enzymes, ultimately affecting gene expression. This study explores the role of poly(ADP-ribose) polymerase (PARP) on global gene expression in a lymphoblastoid B cell line. We found that inhibition of PARP catalytic activity with olaparib resulted in global gene deregulation, affecting approximately 11% of the genes expressed. Gene ontology analysis revealed that PARP could exert these effects through transcription factors and chromatin-remodeling enzymes, including the polycomb repressive complex 2 (PRC2) member EZH2. EZH2 mediates the trimethylation of histone H3 at lysine 27 (H3K27me3), a modification associated with chromatin compaction and gene silencing. Both pharmacological inhibition of PARP and knockdown of PARP1 induced the expression of EZH2, which resulted in increased global H3K27me3. Chromatin immunoprecipitation confirmed that PARP1 inhibition led to H3K27me3 deposition at EZH2 target genes, which resulted in gene silencing. Moreover, increased EZH2 expression is attributed to the loss of the occupancy of the transcription repressor E2F4 at the EZH2 promoter following PARP inhibition. Together, these data show that PARP plays an important role in global gene regulation and identifies for the first time a direct role of PARP1 in regulating the expression and function of EZH2. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  17. The role of wildlife (wild birds) in the global transmission of antimicrobial resistance genes.

    PubMed

    Wang, Jing; Ma, Zhen-Bao; Zeng, Zhen-Ling; Yang, Xue-Wen; Huang, Ying; Liu, Jian-Hua

    2017-03-18

    Antimicrobial resistance is an urgent global health challenge in human and veterinary medicine. Wild animals are not directly exposed to clinically relevant antibiotics; however, antibacterial resistance in wild animals has been increasingly reported worldwide in parallel to the situation in human and veterinary medicine. This underlines the complexity of bacterial resistance in wild animals and the possible interspecies transmission between humans, domestic animals, the environment, and wildlife. This review summarizes the current data on extended-spectrum β-lactamase (ESBL), AmpC β-lactamase, carbapenemase, and colistin resistance genes in Enterobacteriaceae isolates of wildlife origin. The aim of this review is to better understand the important role of wild animals as reservoirs and vectors in the global dissemination of crucial clinical antibacterial resistance. In this regard, continued surveillance is urgently needed worldwide.

  18. The role of wildlife (wild birds) in the global transmission of antimicrobial resistance genes

    PubMed Central

    Wang, Jing; Ma, Zhen-Bao; Zeng, Zhen-Ling; Yang, Xue-Wen; Huang, Ying; Liu, Jian-Hua

    2017-01-01

    Antimicrobial resistance is an urgent global health challenge in human and veterinary medicine. Wild animals are not directly exposed to clinically relevant antibiotics; however, antibacterial resistance in wild animals has been increasingly reported worldwide in parallel to the situation in human and veterinary medicine. This underscores the complexity of bacterial resistance in wild animals and the possible interspecies transmission between humans, domestic animals, the environment, and wildlife. This review summarizes the current data on extended-spectrum β-lactamase (ESBL), AmpC β-lactamase, carbapenemase, and colistin resistance genes in Enterobacteriaceae isolates of wildlife origin. The aim of this review is to better understand the important role of wild animals as reservoirs and vectors in the global dissemination of crucial clinical antibacterial resistance. In this regard, continued surveillance is urgently needed worldwide. PMID:28409502

  19. Global gene expression analyses of hematopoietic stem cell-like cell lines with inducible Lhx2 expression

    PubMed Central

    Richter, Karin; Wirta, Valtteri; Dahl, Lina; Bruce, Sara; Lundeberg, Joakim; Carlsson, Leif; Williams, Cecilia

    2006-01-01

    Background: Expression of the LIM-homeobox gene Lhx2 in murine hematopoietic cells allows for the generation of hematopoietic stem cell (HSC)-like cell lines. To address the molecular basis of Lhx2 function, we generated HSC-like cell lines where Lhx2 expression is regulated by a tet-on system and hence dependent on the presence of doxycycline (dox). These cell lines efficiently down-regulate Lhx2 expression upon dox withdrawal, leading to a rapid differentiation into various myeloid cell types. Results: Global gene expression of these cell lines cultured in dox was compared to different time points after dox withdrawal using microarray technology. We identified 267 differentially expressed genes. The majority of the genes overlapping with HSC-specific databases were those down-regulated after turning off Lhx2 expression, and a majority of the genes overlapping with those defined as late progenitor-specific genes were the up-regulated genes, suggesting that these cell lines represent a relevant model system for normal HSCs also at the level of global gene expression. Moreover, in situ hybridisations of several genes down-regulated after dox withdrawal showed overlapping expression patterns with Lhx2 in various tissues during embryonic development. Conclusion: Global gene expression analysis of HSC-like cell lines with inducible Lhx2 expression has identified genes putatively linked to self-renewal/differentiation of HSCs, and function of Lhx2 in organ development and stem/progenitor cells of non-hematopoietic origin. PMID:16600034

  20. Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.)

    PubMed Central

    Ohtsu, Kazuhiro; Smith, Marianne B; Emrich, Scott J; Borsuk, Lisa A; Zhou, Ruilian; Chen, Tianle; Zhang, Xiaolan; Timmermans, Marja C P; Beck, Jon; Buckner, Brent; Janick-Buckner, Diane; Nettleton, Dan; Scanlon, Michael J; Schnable, Patrick S

    2007-01-01

    All above-ground plant organs are derived from shoot apical meristems (SAMs). Global analyses of gene expression were conducted on maize (Zea mays L.) SAMs to identify genes preferentially expressed in the SAM. The SAMs were collected from 14-day-old B73 seedlings via laser capture microdissection (LCM). The RNA samples extracted from LCM-collected SAMs and from seedlings were hybridized to microarrays spotted with 37 660 maize cDNAs. Approximately 30% (10 816) of these cDNAs were prepared as part of this study from manually dissected B73 maize apices. Over 5000 expressed sequence tags (ESTs) (about 13% of the total) were differentially expressed (P<0.0001) between SAMs and seedlings. Of these, 2783 and 2248 ESTs were up- and down-regulated in the SAM, respectively. The expression in the SAM of several of the differentially expressed ESTs was validated via quantitative RT-PCR and/or in situ hybridization. The up-regulated ESTs included many regulatory genes including transcription factors, chromatin remodeling factors and components of the gene-silencing machinery, as well as about 900 genes with unknown functions. Surprisingly, transcripts that hybridized to 62 retrotransposon-related cDNAs were also substantially up-regulated in the SAM. Complementary DNAs derived from the LCM-collected SAMs were sequenced to identify additional genes that are expressed in the SAM. This generated around 550 000 ESTs (454-SAM ESTs) from two genotypes. Consistent with the microarray results, approximately 14% of the 454-SAM ESTs from B73 were retrotransposon-related. Possible roles of genes that are preferentially expressed in the SAM are discussed. PMID:17764504

  1. Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.).

    PubMed

    Ohtsu, Kazuhiro; Smith, Marianne B; Emrich, Scott J; Borsuk, Lisa A; Zhou, Ruilian; Chen, Tianle; Zhang, Xiaolan; Timmermans, Marja C P; Beck, Jon; Buckner, Brent; Janick-Buckner, Diane; Nettleton, Dan; Scanlon, Michael J; Schnable, Patrick S

    2007-11-01

    All above-ground plant organs are derived from shoot apical meristems (SAMs). Global analyses of gene expression were conducted on maize (Zea mays L.) SAMs to identify genes preferentially expressed in the SAM. The SAMs were collected from 14-day-old B73 seedlings via laser capture microdissection (LCM). The RNA samples extracted from LCM-collected SAMs and from seedlings were hybridized to microarrays spotted with 37 660 maize cDNAs. Approximately 30% (10 816) of these cDNAs were prepared as part of this study from manually dissected B73 maize apices. Over 5000 expressed sequence tags (ESTs) (about 13% of the total) were differentially expressed (P < 0.0001) between SAMs and seedlings. Of these, 2783 and 2248 ESTs were up- and down-regulated in the SAM, respectively. The expression in the SAM of several of the differentially expressed ESTs was validated via quantitative RT-PCR and/or in situ hybridization. The up-regulated ESTs included many regulatory genes including transcription factors, chromatin remodeling factors and components of the gene-silencing machinery, as well as about 900 genes with unknown functions. Surprisingly, transcripts that hybridized to 62 retrotransposon-related cDNAs were also substantially up-regulated in the SAM. Complementary DNAs derived from the LCM-collected SAMs were sequenced to identify additional genes that are expressed in the SAM. This generated around 550 000 ESTs (454-SAM ESTs) from two genotypes. Consistent with the microarray results, approximately 14% of the 454-SAM ESTs from B73 were retrotransposon-related. Possible roles of genes that are preferentially expressed in the SAM are discussed.

  2. X-linked paroxysmal dyskinesia and severe global retardation caused by defective MCT8 gene.

    PubMed

    Brockmann, Knut; Dumitrescu, Alexandra M; Best, Thomas T; Hanefeld, Folker; Refetoff, Samuel

    2005-06-01

    We previously reported two unrelated boys aged 3 and 8 years with mutations in the thyroid hormone transporter gene MCT8 resulting in severe global retardation and an uncommon pattern of thyroid hormone abnormalities. We now further describe an unusual neurological phenotype associated with these mutations, namely paroxysmal kinesigenic dyskinesias (PKD), provoked by certain stimuli including changing of their clothes or diapers. It is not clear how the MCT8 defect causes PKDs. PKDs have been previously noted in patients with thyroid abnormalities. This novel X-linked condition widens the spectrum of secondary PKDs.

  3. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus.

    PubMed

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently underway that are leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large set of EST sequences is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to get full-length sequences. The maximum frequency was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas the minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motif codes for glutamic acid (GAA), while AT/TA was the most frequent repeat among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function.
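
    The SSR-mining step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`find_ssrs`, `is_primitive`) and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration, since the abstract does not state which tool or thresholds were used. The sketch finds perfect repeats of 1 to 6 nucleotide motifs in a single sequence and reports their position, motif and repeat count.

```python
import re

# Minimum number of tandem repeats per motif length.
# These thresholds are assumed for illustration; the abstract does not give them.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a shorter unit repeated (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (hypothetical, not from the hops EST data).
    example = "GGC" + "GAA" * 6 + "TTC" + "AT" * 7 + "C"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

    Counting hits by motif length across all contigs and singletons would give frequency distributions of the kind the abstract reports; primer design and gene ontology annotation are separate steps not covered here.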

  4. Heme Signaling Impacts Global Gene Expression, Immunity and Dengue Virus Infectivity in Aedes aegypti

    PubMed Central

    Bottino-Rojas, Vanessa; Talyuli, Octávio A. C.; Jupatanakul, Natapong; Sim, Shuzhen; Dimopoulos, George; Venancio, Thiago M.; Bahia, Ana C.; Sorgine, Marcos H.; Oliveira, Pedro L.; Paiva-Silva, Gabriela O.

    2015-01-01

    Blood-feeding mosquitoes are exposed to high levels of heme, the product of hemoglobin degradation. Heme is a pro-oxidant that influences a variety of cellular processes. We performed a global analysis of heme-regulated Aedes aegypti (yellow fever mosquito) transcriptional changes to better understand influence on mosquito physiology at the molecular level. We observed an iron- and reactive oxygen species (ROS)-independent signaling induced by heme that comprised genes related to redox metabolism. By modulating the abundance of these transcripts, heme possibly acts as a danger signaling molecule. Furthermore, heme triggered critical changes in the expression of energy metabolism and immune response genes, altering the susceptibility towards bacteria and dengue virus. These findings seem to have implications on the adaptation of mosquitoes to hematophagy and consequently on their ability to transmit diseases. Altogether, these results may also contribute to the understanding of heme cell biology in eukaryotic cells. PMID:26275150

  5. LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes

    PubMed Central

    Cañada, Andres; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso

    2017-01-01

    A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes, CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es PMID:28531339

  6. LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes.

    PubMed

    Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin

    2017-05-22

    A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. The Global Relationship between Chromatin Physical Topology, Fractal Structure, and Gene Expression

    PubMed Central

    Almassalha, L. M.; Tiwari, A.; Ruhoff, P. T.; Stypula-Cyrus, Y.; Cherkezyan, L.; Matsuda, H.; Dela Cruz, M. A.; Chandler, J. E.; White, C.; Maneval, C.; Subramanian, H.; Szleifer, I.; Roy, H. K.; Backman, V.

    2017-01-01

    Most of what we know about gene transcription comes from the view of cells as molecular machines: focusing on the role of molecular modifications to the proteins carrying out transcriptional reactions on a locus-by-locus basis. This view ignores a critical reality: biological reactions do not happen in an empty space, but in a highly complex, interrelated, and dense nanoenvironment that profoundly influences chemical interactions. We explored the relationship between the physical nanoenvironment of chromatin and gene transcription in vitro. We analytically show that changes in the fractal dimension, D, of chromatin correspond to simultaneous increases in chromatin accessibility and compaction heterogeneity. Using these predictions, we demonstrate experimentally that nanoscopic changes to chromatin D within thirty minutes correlate with concomitant enhancement and suppression of transcription. Further, we show that the increased heterogeneity of physical structure of chromatin due to increase in fractal dimension correlates with increased heterogeneity of gene networks. These findings indicate that the higher order folding of chromatin topology may act as a molecular-pathway independent code regulating global patterns of gene expression. Since physical organization of chromatin is frequently altered in oncogenesis, this work provides evidence pairing molecular function to physical structure for processes frequently altered during tumorigenesis. PMID:28117353

  8. Impact of Hfq on Global Gene Expression and Intracellular Survival in Brucella melitensis

    PubMed Central

    Du, Xinying; Yuan, Xitong; Wang, Zhoujia; Gong, Chunli; Zhuang, Yubin; Lei, Shuangshuang; Su, Xiao; Wang, Xuesong; Huang, Liuyu; Zhong, Zhijun; Peng, Guangneng; Yuan, Jing; Chen, Zeliang; Wang, Yufei

    2013-01-01

    Brucella melitensis is a facultative intracellular bacterium that replicates within macrophages. The ability of brucellae to survive and multiply in the hostile environment of host macrophages is essential to its virulence. The RNA-binding protein Hfq is a global regulator that is involved in stress resistance and pathogenicity. Here we demonstrate that Hfq is essential for stress adaptation and intracellular survival in B. melitensis. A B. melitensis hfq deletion mutant exhibits reduced survival under environmental stresses and is attenuated in cultured macrophages and mice. Microarray-based transcriptome analyses revealed that 359 genes involved in numerous cellular processes were dysregulated in the hfq mutant. From these same samples the proteins were also prepared for proteomic analysis to directly identify Hfq-regulated proteins. Fifty-five proteins with significantly affected expression were identified in the hfq mutant. Our results demonstrate that Hfq regulates many genes and/or proteins involved in metabolism, virulence, and stress responses, including those potentially involved in the adaptation of Brucella to the oxidative, acid, heat stress, and antibacterial peptides encountered within the host. The dysregulation of such genes and/or proteins could contribute to the attenuated hfq mutant phenotype. These findings highlight the involvement of Hfq as a key regulator of Brucella gene expression and facilitate our understanding of the role of Hfq in environmental stress adaptation and intracellular survival of B. melitensis. PMID:23977181

  9. Global gene expression profiles for life stages of the deadly amphibian pathogen Batrachochytrium dendrobatidis

    PubMed Central

    Rosenblum, Erica Bree; Stajich, Jason E.; Maddox, Nicole; Eisen, Michael B.

    2008-01-01

    Amphibians around the world are being threatened by an emerging pathogen, the chytrid fungus Batrachochytrium dendrobatidis (Bd). Despite intensive ecological study in the decade since Bd was discovered, little is known about the mechanism by which Bd kills frogs. Here, we compare patterns of global gene expression in controlled laboratory conditions for the two phases of the life cycle of Bd: the free-living zoospore and the substrate-embedded sporangia. We find zoospores to be transcriptionally less complex than sporangia. Several transcripts more abundant in zoospores provide clues about how this motile life stage interacts with its environment. Genes with higher levels of expression in sporangia provide new hypotheses about the molecular pathways involved in metabolic activity, flagellar function, and pathogenicity in Bd. We highlight expression patterns for a group of fungalysin metallopeptidase genes, a gene family thought to be involved in pathogenicity in another group of fungal pathogens that similarly cause cutaneous infection of vertebrates. Finally we discuss the challenges inherent in developing a molecular toolkit for chytrids, a basal fungal lineage separated by vast phylogenetic distance from other well characterized fungi. PMID:18852473

  10. Global gene expression profiles for life stages of the deadly amphibian pathogen Batrachochytrium dendrobatidis.

    PubMed

    Rosenblum, Erica Bree; Stajich, Jason E; Maddox, Nicole; Eisen, Michael B

    2008-11-04

    Amphibians around the world are being threatened by an emerging pathogen, the chytrid fungus Batrachochytrium dendrobatidis (Bd). Despite intensive ecological study in the decade since Bd was discovered, little is known about the mechanism by which Bd kills frogs. Here, we compare patterns of global gene expression in controlled laboratory conditions for the two phases of the life cycle of Bd: the free-living zoospore and the substrate-embedded sporangia. We find zoospores to be transcriptionally less complex than sporangia. Several transcripts more abundant in zoospores provide clues about how this motile life stage interacts with its environment. Genes with higher levels of expression in sporangia provide new hypotheses about the molecular pathways involved in metabolic activity, flagellar function, and pathogenicity in Bd. We highlight expression patterns for a group of fungalysin metallopeptidase genes, a gene family thought to be involved in pathogenicity in another group of fungal pathogens that similarly cause cutaneous infection of vertebrates. Finally we discuss the challenges inherent in developing a molecular toolkit for chytrids, a basal fungal lineage separated by vast phylogenetic distance from other well characterized fungi.

  11. The Global Relationship between Chromatin Physical Topology, Fractal Structure, and Gene Expression.

    PubMed

    Almassalha, L M; Tiwari, A; Ruhoff, P T; Stypula-Cyrus, Y; Cherkezyan, L; Matsuda, H; Dela Cruz, M A; Chandler, J E; White, C; Maneval, C; Subramanian, H; Szleifer, I; Roy, H K; Backman, V

    2017-01-24

    Most of what we know about gene transcription comes from the view of cells as molecular machines: focusing on the role of molecular modifications to the proteins carrying out transcriptional reactions on a locus-by-locus basis. This view ignores a critical reality: biological reactions do not happen in an empty space, but in a highly complex, interrelated, and dense nanoenvironment that profoundly influences chemical interactions. We explored the relationship between the physical nanoenvironment of chromatin and gene transcription in vitro. We analytically show that changes in the fractal dimension, D, of chromatin correspond to simultaneous increases in chromatin accessibility and compaction heterogeneity. Using these predictions, we demonstrate experimentally that nanoscopic changes to chromatin D within thirty minutes correlate with concomitant enhancement and suppression of transcription. Further, we show that the increased heterogeneity of physical structure of chromatin due to increase in fractal dimension correlates with increased heterogeneity of gene networks. These findings indicate that the higher order folding of chromatin topology may act as a molecular-pathway independent code regulating global patterns of gene expression. Since physical organization of chromatin is frequently altered in oncogenesis, this work provides evidence pairing molecular function to physical structure for processes frequently altered during tumorigenesis.

  12. Global Analysis of Serine-Threonine Protein Kinase Genes in Neurospora crassa

    PubMed Central

    Park, Gyungsoon; Servin, Jacqueline A.; Turner, Gloria E.; Altamirano, Lorena; Colot, Hildur V.; Collopy, Patrick; Litvinkova, Liubov; Li, Liande; Jones, Carol A.; Diala, Fitz-Gerald; Dunlap, Jay C.; Borkovich, Katherine A.

    2011-01-01

    Serine/threonine (S/T) protein kinases are crucial components of diverse signaling pathways in eukaryotes, including the model filamentous fungus Neurospora crassa. In order to assess the importance of S/T kinases to Neurospora biology, we embarked on a global analysis of 86 S/T kinase genes in Neurospora. We were able to isolate viable mutants for 77 of the 86 kinase genes. Of these, 57% exhibited at least one growth or developmental phenotype, with a relatively large fraction (40%) possessing a defect in more than one trait. S/T kinase knockouts were subjected to chemical screening using a panel of eight chemical treatments, with 25 mutants exhibiting sensitivity or resistance to at least one chemical. This brought the total percentage of S/T mutants with phenotypes in our study to 71%. Mutants lacking apg-1, an S/T kinase required for autophagy in other organisms, possessed the greatest number of phenotypes, with defects in asexual and sexual growth and development and in altered sensitivity to five chemical treatments. We showed that NCU02245/stk-19 is required for chemotropic interactions between female and male cells during mating. Finally, we demonstrated allelism between the S/T kinase gene NCU00406 and velvet (vel), encoding a p21-activated protein kinase (PAK) gene important for asexual and sexual growth and development in Neurospora. PMID:21965514

  13. Global Gene Expression Profiling through the Complete Life Cycle of Trypanosoma vivax.

    PubMed

    Jackson, Andrew P; Goyard, Sophie; Xia, Dong; Foth, Bernardo J; Sanders, Mandy; Wastling, Jonathan M; Minoprio, Paola; Berriman, Matthew

    2015-01-01

    The parasitic flagellate Trypanosoma vivax is a cause of animal trypanosomiasis across Africa and South America. The parasite has a digenetic life cycle, passing between mammalian hosts and insect vectors, and a series of developmental forms adapted to each life cycle stage. Each point in the life cycle presents radically different challenges to parasite metabolism and physiology and distinct host interactions requiring remodeling of the parasite cell surface. Transcriptomic and proteomic studies of the related parasites T. brucei and T. congolense have shown how gene expression is regulated during their development. New methods for in vitro culture of the T. vivax insect stages have allowed us to describe global gene expression throughout the complete T. vivax life cycle for the first time. We combined transcriptomic and proteomic analysis of each life stage using RNA-seq and mass spectrometry respectively, to identify genes with patterns of preferential transcription or expression. While T. vivax conforms to a pattern of highly conserved gene expression found in other African trypanosomes, (e.g. developmental regulation of energy metabolism, restricted expression of a dominant variant antigen, and expression of 'Fam50' proteins in the insect mouthparts), we identified significant differences in gene expression affecting metabolism in the fly and a suite of T. vivax-specific genes with predicted cell-surface expression that are preferentially expressed in the mammal ('Fam29, 30, 42') or the vector ('Fam34, 35, 43'). T. vivax differs significantly from other African trypanosomes in the developmentally-regulated proteins likely to be expressed on its cell surface and thus, in the structure of the host-parasite interface. These unique features may yet explain the species differences in life cycle and could, in the form of bloodstream-stage proteins that do not undergo antigenic variation, provide targets for therapy.

  14. Escherichia coli Global Gene Expression in Urine from Women with Urinary Tract Infection

    PubMed Central

    Rasko, David A.; Faerber, Gary J.; Mobley, Harry L. T.

    2010-01-01

    Murine models of urinary tract infection (UTI) have provided substantial data identifying uropathogenic E. coli (UPEC) virulence factors and assessing their expression in vivo. However, it is unclear how gene expression in these animal models compares to UPEC gene expression during UTI in humans. To address this, we used a UPEC strain CFT073-specific microarray to measure global gene expression in eight E. coli isolates monitored directly from the urine of eight women presenting at a clinic with bacteriuria. The resulting gene expression profiles were compared to those of the same E. coli isolates cultured statically to exponential phase in pooled, sterilized human urine ex vivo. Known fitness factors, including iron acquisition and peptide transport systems, were highly expressed during human UTI and support a model in which UPEC replicates rapidly in vivo. While these findings were often consistent with previous data obtained from the murine UTI model, host-specific differences were observed. Most strikingly, expression of type 1 fimbrial genes, which are among the most highly expressed genes during murine experimental UTI and encode an essential virulence factor for this experimental model, was undetectable in six of the eight E. coli strains from women with UTI. Despite the lack of type 1 fimbrial expression in the urine samples, these E. coli isolates were generally capable of expressing type 1 fimbriae in vitro and highly upregulated fimA upon experimental murine infection. The findings presented here provide insight into the metabolic and pathogenic profile of UPEC in urine from women with UTI and represent the first transcriptome analysis for any pathogenic E. coli during a naturally occurring infection in humans. PMID:21085611

  15. Impact of Neutron Exposure on Global Gene Expression in a Human Peripheral Blood Model.

    PubMed

    Broustas, Constantinos G; Xu, Yanping; Harken, Andrew D; Chowdhury, Mashkura; Garty, Guy; Amundson, Sally A

    2017-04-01

    The detonation of an improvised nuclear device would produce prompt radiation consisting of both photons (gamma rays) and neutrons. While much effort in recent years has gone into the development of radiation biodosimetry methods suitable for mass triage, the possible effect of neutrons on the endpoints studied has remained largely uninvestigated. We have used a novel neutron irradiator with an energy spectrum based on that 1-1.5 km from the epicenter of the Hiroshima blast to begin examining the effect of neutrons on global gene expression, and the impact this may have on the development of gene expression signatures for radiation biodosimetry. We have exposed peripheral blood from healthy human donors to 0.1, 0.3, 0.5 or 1 Gy of neutrons ex vivo using our neutron irradiator, and compared the transcriptomic response 24 h later to that resulting from sham exposure or exposure to 0.1, 0.3, 0.5, 1, 2 or 4 Gy of photons (X rays). We identified 125 genes that responded significantly to both radiation qualities as a function of dose, with the magnitude of response to neutrons generally being greater than that seen after X-ray exposure. Gene ontology analysis suggested broad involvement of the p53 signaling pathway and general DNA damage response functions across all doses of both radiation qualities. Regulation of immune response and chromatin-related functions were implicated only following the highest doses of neutrons, suggesting a physiological impact of greater DNA damage. We also identified several genes that seem to respond primarily as a function of dose, with less effect of radiation quality. We confirmed this pattern of response by quantitative real-time RT-PCR for BAX, TNFRSF10B, ITLN2 and AEN and suggest that gene expression may provide a means to differentiate between total dose and a neutron component.

  16. Global Gene Expression Profiling through the Complete Life Cycle of Trypanosoma vivax

    PubMed Central

    Jackson, Andrew P.; Goyard, Sophie; Xia, Dong; Foth, Bernardo J.; Sanders, Mandy; Wastling, Jonathan M.; Minoprio, Paola; Berriman, Matthew

    2015-01-01

    The parasitic flagellate Trypanosoma vivax is a cause of animal trypanosomiasis across Africa and South America. The parasite has a digenetic life cycle, passing between mammalian hosts and insect vectors, and a series of developmental forms adapted to each life cycle stage. Each point in the life cycle presents radically different challenges to parasite metabolism and physiology and distinct host interactions requiring remodeling of the parasite cell surface. Transcriptomic and proteomic studies of the related parasites T. brucei and T. congolense have shown how gene expression is regulated during their development. New methods for in vitro culture of the T. vivax insect stages have allowed us to describe global gene expression throughout the complete T. vivax life cycle for the first time. We combined transcriptomic and proteomic analysis of each life stage using RNA-seq and mass spectrometry respectively, to identify genes with patterns of preferential transcription or expression. While T. vivax conforms to a pattern of highly conserved gene expression found in other African trypanosomes, (e.g. developmental regulation of energy metabolism, restricted expression of a dominant variant antigen, and expression of ‘Fam50’ proteins in the insect mouthparts), we identified significant differences in gene expression affecting metabolism in the fly and a suite of T. vivax-specific genes with predicted cell-surface expression that are preferentially expressed in the mammal (‘Fam29, 30, 42’) or the vector (‘Fam34, 35, 43’). T. vivax differs significantly from other African trypanosomes in the developmentally-regulated proteins likely to be expressed on its cell surface and thus, in the structure of the host-parasite interface. These unique features may yet explain the species differences in life cycle and could, in the form of bloodstream-stage proteins that do not undergo antigenic variation, provide targets for therapy. PMID:26266535

  17. Impact of Neutron Exposure on Global Gene Expression in a Human Peripheral Blood Model

    PubMed Central

    Broustas, Constantinos G.; Xu, Yanping; Harken, Andrew D.; Chowdhury, Mashkura; Garty, Guy; Amundson, Sally A.

    2017-01-01

    The detonation of an improvised nuclear device would produce prompt radiation consisting of both photons (gamma rays) and neutrons. While much effort in recent years has gone into the development of radiation biodosimetry methods suitable for mass triage, the possible effect of neutrons on the endpoints studied has remained largely uninvestigated. We have used a novel neutron irradiator with an energy spectrum based on that 1–1.5 km from the epicenter of the Hiroshima blast to begin examining the effect of neutrons on global gene expression, and the impact this may have on the development of gene expression signatures for radiation biodosimetry. We have exposed peripheral blood from healthy human donors to 0.1, 0.3, 0.5 or 1 Gy of neutrons ex vivo using our neutron irradiator, and compared the transcriptomic response 24 h later to that resulting from sham exposure or exposure to 0.1, 0.3, 0.5, 1, 2 or 4 Gy of photons (X rays). We identified 125 genes that responded significantly to both radiation qualities as a function of dose, with the magnitude of response to neutrons generally being greater than that seen after X-ray exposure. Gene ontology analysis suggested broad involvement of the p53 signaling pathway and general DNA damage response functions across all doses of both radiation qualities. Regulation of immune response and chromatin-related functions were implicated only following the highest doses of neutrons, suggesting a physiological impact of greater DNA damage. We also identified several genes that seem to respond primarily as a function of dose, with less effect of radiation quality. We confirmed this pattern of response by quantitative real-time RT-PCR for BAX, TNFRSF10B, ITLN2 and AEN and suggest that gene expression may provide a means to differentiate between total dose and a neutron component. PMID:28140791

  18. Role of Global and Local Topology in the Regulation of Gene Expression in Streptococcus pneumoniae

    PubMed Central

    Ferrándiz, María-José; Arnanz, Cristina; Martín-Galiano, Antonio J.; Rodríguez-Martín, Carlos; de la Campa, Adela G.

    2014-01-01

    The most basic level of transcription regulation in Streptococcus pneumoniae is the organization of its chromosome in topological domains. In response to drugs that caused DNA relaxation, a global transcriptional response was observed. Several chromosomal domains were identified based on the transcriptional response of their genes: up-regulated (U), down-regulated (D), non-regulated (N), and flanking (F). We show that these distinct domains have different expression and conservation characteristics. Microarray fluorescence units under non-relaxation conditions were used as a measure of gene transcriptional level. Fluorescence units were significantly lower in F genes than in the other domains with a similar AT content. Ranked by transcriptional level, the domains were ordered D>U>F. In addition, a comparison of 12 S. pneumoniae genome sequences showed a conservation of gene composition within U and D domains, and an extensive gene interchange in F domains. We tested the organization of chromosomal domains by measuring the relaxation-mediated transcription of eight insertions of a heterologous Ptccat cassette, two in each type of domain, showing that transcription depended on their chromosomal location. Moreover, transcription from the four promoters directing the five genes involved in supercoiling homeostasis, located either in U (gyrB), D (topA), or N (gyrA and parEC) domains, was analyzed both in their chromosomal locations and in a replicating plasmid. Although expression from the chromosomal PgyrB and PtopA showed the expected domain regulation, their expression was down-regulated in the plasmid, which behaved as a D domain. However, both PparE and PgyrA carried their own regulatory signals, their topology-dependent expression being equivalent in the plasmid or in the chromosome. In PgyrA a DNA bend acted as a DNA supercoiling sensor. These results revealed that DNA topology functions as a general transcriptional regulator, superimposed upon other more

  19. Role of global and local topology in the regulation of gene expression in Streptococcus pneumoniae.

    PubMed

    Ferrándiz, María-José; Arnanz, Cristina; Martín-Galiano, Antonio J; Rodríguez-Martín, Carlos; de la Campa, Adela G

    2014-01-01

    The most basic level of transcription regulation in Streptococcus pneumoniae is the organization of its chromosome in topological domains. In response to drugs that caused DNA relaxation, a global transcriptional response was observed. Several chromosomal domains were identified based on the transcriptional response of their genes: up-regulated (U), down-regulated (D), non-regulated (N), and flanking (F). We show that these distinct domains have different expression and conservation characteristics. Microarray fluorescence units under non-relaxation conditions were used as a measure of gene transcriptional level. Fluorescence units were significantly lower in F genes than in the other domains with a similar AT content. Ranked by transcriptional level, the domains were ordered D>U>F. In addition, a comparison of 12 S. pneumoniae genome sequences showed a conservation of gene composition within U and D domains, and an extensive gene interchange in F domains. We tested the organization of chromosomal domains by measuring the relaxation-mediated transcription of eight insertions of a heterologous Ptccat cassette, two in each type of domain, showing that transcription depended on their chromosomal location. Moreover, transcription from the four promoters directing the five genes involved in supercoiling homeostasis, located either in U (gyrB), D (topA), or N (gyrA and parEC) domains, was analyzed both in their chromosomal locations and in a replicating plasmid. Although expression from the chromosomal PgyrB and PtopA showed the expected domain regulation, their expression was down-regulated in the plasmid, which behaved as a D domain. However, both PparE and PgyrA carried their own regulatory signals, their topology-dependent expression being equivalent in the plasmid or in the chromosome. In PgyrA a DNA bend acted as a DNA supercoiling sensor. These results revealed that DNA topology functions as a general transcriptional regulator, superimposed upon other more

  20. DGIdb: mining the druggable genome.

    PubMed

    Griffith, Malachi; Griffith, Obi L; Coffman, Adam C; Weible, James V; McMichael, Josh F; Spies, Nicholas C; Koval, James; Das, Indraniel; Callaway, Matthew B; Eldred, James M; Miller, Christopher A; Subramanian, Janakiraman; Govindan, Ramaswamy; Kumar, Runjun D; Bose, Ron; Ding, Li; Walker, Jason R; Larson, David E; Dooling, David J; Smith, Scott M; Ley, Timothy J; Mardis, Elaine R; Wilson, Richard K

    2013-12-01

    The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development. It provides an interface for searching lists of genes against a compendium of drug-gene interactions and potentially 'druggable' genes. DGIdb can be accessed at http://dgidb.org/.

  1. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites.

    PubMed

    Cheng, Dean; Knox, Craig; Young, Nelson; Stothard, Paul; Damaraju, Sambasivarao; Wishart, David S

    2008-07-01

    A particular challenge in biomedical text mining is to find ways of handling 'comprehensive' or 'associative' queries such as 'Find all genes associated with breast cancer'. Given that many queries in genomics, proteomics or metabolomics involve these kinds of comprehensive searches, we believe that a web-based tool that could support these searches would be quite useful. In response to this need, we have developed the PolySearch web server. PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is 'Given X, find all Y's' where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites. PolySearch also exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences. PolySearch's performance has been assessed in tasks such as gene synonym identification, protein-protein interaction identification and disease gene identification using a variety of manually assembled 'gold standard' text corpora. Its f-measure on these tasks is 88, 81 and 79%, respectively. These values are between 5 and 50% better than other published tools. The server is freely available at http://wishart.biology.ualberta.ca/polysearch.

  2. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites

    PubMed Central

    Cheng, Dean; Knox, Craig; Young, Nelson; Stothard, Paul; Damaraju, Sambasivarao; Wishart, David S.

    2008-01-01

    A particular challenge in biomedical text mining is to find ways of handling ‘comprehensive’ or ‘associative’ queries such as ‘Find all genes associated with breast cancer’. Given that many queries in genomics, proteomics or metabolomics involve these kinds of comprehensive searches, we believe that a web-based tool that could support these searches would be quite useful. In response to this need, we have developed the PolySearch web server. PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is ‘Given X, find all Y's’ where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites. PolySearch also exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences. PolySearch's performance has been assessed in tasks such as gene synonym identification, protein–protein interaction identification and disease gene identification using a variety of manually assembled ‘gold standard’ text corpora. Its f-measure on these tasks is 88, 81 and 79%, respectively. These values are between 5 and 50% better than other published tools. The server is freely available at http://wishart.biology.ualberta.ca/polysearch. PMID:18487273

  3. Gene expression during the first 28 days of axolotl limb regeneration I: Experimental design and global analysis of gene expression

    PubMed Central

    Palumbo, Alex; Nagarajan, Radha; Gardiner, David M.; Muneoka, Ken; Stromberg, Arnold J.; Athippozhy, Antony T.

    2015-01-01

    While it is appreciated that global gene expression analyses can provide novel insights about complex biological processes, experiments are generally insufficiently powered to achieve this goal. Here we report the results of a robust microarray experiment of axolotl forelimb regeneration. At each of 20 post‐amputation time points, we estimated gene expression for 10 replicate RNA samples that were isolated from 1 mm of heterogeneous tissue collected from the distal limb tip. We show that the limb transcription program diverges progressively with time from the non‐injured state, and divergence among time-adjacent samples is mostly gradual. However, punctuated episodes of transcription were identified for five intervals of time, with four of these coinciding with well‐described stages of limb regeneration—amputation, early bud, late bud, and palette. The results suggest that regeneration is highly temporally structured and regulated by mechanisms that function within narrow windows of time to coordinate transcription within and across cell types of the regenerating limb. Our results provide an integrative framework for hypothesis generation using this complex and highly informative data set. PMID:27168937

  4. Longwall mining

    SciTech Connect

    1995-03-14

    As part of EIA's program to provide information on coal, this report, Longwall-Mining, describes longwall mining and compares it with other underground mining methods. Using data from EIA and private sector surveys, the report describes major changes in the geologic, technological, and operating characteristics of longwall mining over the past decade. Most important, the report shows how these changes led to dramatic improvements in longwall mining productivity. For readers interested in the history of longwall mining and greater detail on recent developments affecting longwall mining, the report includes a bibliography.

  5. NusA-dependent transcription termination prevents misregulation of global gene expression

    PubMed Central

    Mondal, Smarajit; Yakhnin, Alexander V.; Sebastian, Aswathy; Albert, Istvan; Babitzke, Paul

    2017-01-01

    Intrinsic transcription terminators consist of an RNA hairpin followed by a U-rich tract, and these signals can trigger termination without the involvement of additional factors. Although NusA is known to stimulate intrinsic termination in vitro, the in vivo targets and global impact of NusA are not known because it is essential for viability. Using genome-wide 3′ end-mapping on an engineered Bacillus subtilis NusA depletion strain, we show that weak suboptimal terminators are the principal NusA substrates. Moreover, a subclass of weak non-canonical terminators was identified that completely depends on NusA for effective termination. NusA-dependent terminators tend to have weak hairpins and/or distal U-tract interruptions, supporting a model in which NusA is directly involved in the termination mechanism. Depletion of NusA altered global gene expression directly and indirectly via readthrough of suboptimal terminators. Readthrough of NusA-dependent terminators caused misregulation of genes involved in essential cellular functions, especially DNA replication and metabolism. We further show that nusA is autoregulated by a transcription attenuation mechanism that does not rely on antiterminator structures. Instead, NusA-stimulated termination in its 5′ UTR dictate