Sample records for combining gene-based methods

  1. Pathway Distiller - multisource biological pathway consolidation

    PubMed Central

    2012-01-01

    Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to give researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene lists using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users to extract functional explanations of their genome-wide experiments. PMID:23134636
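
The idea of iteratively combining enriched pathways that share similar signature gene sets can be illustrated with a minimal sketch. It is not the Pathway Distiller implementation: the Jaccard similarity measure, the 0.5 threshold, and the function names `jaccard` and `consolidate` are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two gene sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consolidate(pathways, threshold=0.5):
    """Greedily merge pathways whose signature gene sets overlap.

    pathways: dict mapping pathway name -> set of signature genes.
    Returns a list of (pathway_names, merged_gene_set) tuples.
    The threshold is a hypothetical parameter, not a published value.
    """
    clusters = [({name}, set(genes)) for name, genes in pathways.items()]
    merged = True
    while merged:  # repeat until no pair of clusters is similar enough
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if jaccard(clusters[i][1], clusters[j][1]) >= threshold:
                    names = clusters[i][0] | clusters[j][0]
                    genes = clusters[i][1] | clusters[j][1]
                    clusters[i] = (names, genes)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


example = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g9"},
}
concepts = consolidate(example, threshold=0.5)
```

In this toy input, pathways A and B share two of four genes and are merged into one concept, while C stays on its own.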

  2. Pathway Distiller - multisource biological pathway consolidation.

    PubMed

    Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong

    2012-01-01

    One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to give researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene lists using our unique consolidation methods. By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users to extract functional explanations of their genome-wide experiments.

  3. Prediction and Validation of Disease Genes Using HeteSim Scores.

    PubMed

    Zeng, Xiangxiang; Liao, Yuanlu; Liu, Yuansheng; Zou, Quan

    2017-01-01

    Deciphering gene-disease associations is an important goal in biomedical research. In this paper, we use a novel relevance measure, called HeteSim, to prioritize candidate disease genes. Two methods based on heterogeneous networks constructed using protein-protein interaction, gene-phenotype associations, and phenotype-phenotype similarity are presented. In HeteSim_MultiPath (HSMP), HeteSim scores of different paths are combined with a constant that dampens the contributions of longer paths. In HeteSim_SVM (HSSVM), HeteSim scores are combined with a machine learning method. The 3-fold experiments show that our non-machine learning method HSMP performs better than the existing non-machine learning methods, and that our machine learning method HSSVM obtains accuracy similar to that of the best existing machine learning method, CATAPULT. From the analysis of the top 10 predicted genes for different diseases, we found that HSSVM avoids the disadvantage of the existing machine learning based methods, which always predict similar genes for different diseases. The data sets and Matlab code for the two methods are freely available for download at http://lab.malab.cn/data/HeteSim/index.jsp.
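
As a rough illustration of combining per-path scores with a damping constant, the sketch below weights each score by the constant raised to the path length. The exponential form, the default value 0.5, and the name `combine_path_scores` are assumptions made for this example. The abstract states only that a constant dampens longer paths, and the authors' code is in Matlab.

```python
def combine_path_scores(path_scores, beta=0.5):
    """Combine relevance scores from paths of different lengths.

    path_scores: iterable of (path_length, score) pairs.
    beta: damping constant in (0, 1); longer paths contribute less.
    The weighting beta ** path_length is an assumed form.
    """
    return sum((beta ** length) * score for length, score in path_scores)


# Hypothetical scores for one gene-disease pair: a short path and a longer one.
combined = combine_path_scores([(1, 0.8), (2, 0.4)], beta=0.5)
```

With these made-up inputs the shorter path contributes 0.4 and the longer path 0.1, so the shorter path dominates the combined score.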

  4. Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

    PubMed

    Schuemie, Martijn J; Mons, Barend; Weeber, Marc; Kors, Jan A

    2007-06-01

    Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in the literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appeared to have a detrimental effect on precision.

  5. Evaluation of the efficacy of twelve mitochondrial protein-coding genes as barcodes for mollusk DNA barcoding.

    PubMed

    Yu, Hong; Kong, Lingfeng; Li, Qi

    2016-01-01

    In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance, monophyly and character-based methods) were used in species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly and character-based methods provided higher resolution than the distance-based method in species delimitation. The character-based method showed some advantages, especially in closely related taxa. The results suggested that, besides the standard COI barcode, the other 11 mitochondrial protein-coding genes could also potentially be used as molecular diagnostics for molluscan species discrimination. Our results also showed that combining mitochondrial genes did not enhance the efficacy of species identification and that a single mitochondrial gene would be fully competent.

  6. Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

    PubMed Central

    Aubry, Marc; Monnier, Annabelle; Chicault, Celine; de Tayrac, Marie; Galibert, Marie-Dominique; Burgun, Anita; Mosser, Jean

    2006-01-01

    Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO terms networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions. PMID:16674810

  7. A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    PubMed Central

    Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

    2006-01-01

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943

  8. Identifying metabolic enzymes with multiple types of association evidence

    PubMed Central

    Kharchenko, Peter; Chen, Lifeng; Freund, Yoav; Vitkup, Dennis; Church, George M

    2006-01-01

    Background Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. Results We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of the cases. Conclusion We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities. PMID:16571130

  9. Research on the Bionics Design of Automobile Styling Based on the Form Gene

    NASA Astrophysics Data System (ADS)

    Aili, Zhao; Long, Jiang

    2017-09-01

    From the perspective of form-gene heritage, this thesis analyzes the gene make-up, cultural inheritance and aesthetic features in the evolution and development of the forms of brand automobiles, and proposes a bionic design concept and methods for automobile styling design. This innovative method must be based on the form gene, and the consistency and combination of form elements must be maintained during the design. Taking the design of Maserati as an example, the thesis presents the design method and philosophy in terms of form gene expression and bionic design innovation for future automobile styling.

  10. An efficient ensemble learning method for gene microarray classification.

    PubMed

    Osareh, Alireza; Shadgar, Bita

    2013-01-01

    Gene microarray analysis and classification have proven effective for the diagnosis of diseases and cancers. However, it has also been shown that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  11. Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function

    PubMed Central

    Tian, Weidong; Zhang, Lan V; Taşan, Murat; Gibbons, Francis D; King, Oliver D; Park, Julie; Wunderlich, Zeba; Cherry, J Michael; Roth, Frederick P

    2008-01-01

    Background: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships. Results: We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships. Conclusion: Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions. PMID:18613951

  12. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

    PubMed

    Alhamdoosh, Monther; Ng, Milica; Wilson, Nicholas J; Sheridan, Julie M; Huynh, Huy; Wilson, Michael J; Ritchie, Matthew E

    2017-02-01

    Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. The ensemble of gene set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/. The gene set collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  13. Revealing cell cycle control by combining model-based detection of periodic expression with novel cis-regulatory descriptors

    PubMed Central

    Andersson, Claes R; Hvidsten, Torgeir R; Isaksson, Anders; Gustafsson, Mats G; Komorowski, Jan

    2007-01-01

    Background We address the issue of explaining the presence or absence of phase-specific transcription in budding yeast cultures under different conditions. To this end we use a model-based detector of gene expression periodicity to divide genes into classes depending on their behavior in experiments using different synchronization methods. While computational inference of gene regulatory circuits typically relies on expression similarity (clustering) in order to find classes of potentially co-regulated genes, this method instead takes advantage of known time profile signatures related to the studied process. Results We explain the regulatory mechanisms of the inferred periodic classes with cis-regulatory descriptors that combine upstream sequence motifs with experimentally determined binding of transcription factors. By systematic statistical analysis we show that periodic classes are best explained by combinations of descriptors rather than single descriptors, and that different combinations correspond to periodic expression in different classes. We also find evidence for additive regulation in that the combinations of cis-regulatory descriptors associated with genes periodically expressed in fewer conditions are frequently subsets of combinations associated with genes periodically expressed in more conditions. Finally, we demonstrate that our approach retrieves combinations that are more specific towards known cell-cycle related regulators than the frequently used clustering approach. Conclusion The results illustrate how a model-based approach to expression analysis may be particularly well suited to detect biologically relevant mechanisms. Our new approach makes it possible to provide more refined hypotheses about regulatory mechanisms of the cell cycle and it can easily be adjusted to reveal regulation of other, non-periodic, cellular processes. PMID:17939860

  14. Gene-Based Testing of Interactions in Association Studies of Quantitative Traits

    PubMed Central

    Ma, Li; Clark, Andrew G.; Keinan, Alon

    2013-01-01

    Various methods have been developed for identifying gene–gene interactions in genome-wide association studies (GWAS). However, most methods focus on individual markers as the testing unit, and the large number of such tests drastically erodes statistical power. In this study, we propose novel interaction tests of quantitative traits that are gene-based and that confer advantages in both statistical power and biological interpretation. The framework of gene-based gene–gene interaction (GGG) tests combines marker-based interaction tests between all pairs of markers in two genes to produce a gene-level test for interaction between the two. The tests are based on an analytical formula we derive for the correlation between marker-based interaction tests due to linkage disequilibrium. We propose four GGG tests that extend the following P value combining methods: minimum P value, extended Simes procedure, truncated tail strength, and truncated P value product. Extensive simulations point to correct type I error rates of all tests and show that the two truncated tests are more powerful than the other tests in cases where markers involved in the underlying interaction are not directly genotyped and in cases of multiple underlying interactions. We applied our tests to pairs of genes that exhibit a protein–protein interaction to test for gene-level interactions underlying lipid levels using genotype data from the Atherosclerosis Risk in Communities study. We identified five novel interactions that are not evident from marker-based interaction testing and successfully replicated one of these interactions, between SMAD3 and NEDD9, in an independent sample from the Multi-Ethnic Study of Atherosclerosis. We conclude that our GGG tests show improved power to identify gene-level interactions in existing, as well as emerging, association studies. PMID:23468652
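
For orientation, three of the basic P value combining statistics named above can be sketched in their textbook forms. This sketch assumes independent tests and does not reproduce the paper's GGG extensions, which correct for correlation due to linkage disequilibrium. The standard Simes procedure is shown in place of the extended one, truncated tail strength is omitted, and the function names and the truncation point `tau=0.05` are assumptions.

```python
import math


def min_p(pvals):
    """Smallest marker-pair P value (uncorrected statistic)."""
    return min(pvals)


def simes(pvals):
    """Standard Simes combined P value: min over ranks i of m * p_(i) / i."""
    m = len(pvals)
    ordered = sorted(pvals)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


def truncated_log_product(pvals, tau=0.05):
    """Log of the product of all P values at or below tau.

    Returns 0 (log of an empty product) when no P value passes the cutoff.
    Converting this statistic into a P value requires a null distribution,
    which is not modeled here.
    """
    return sum(math.log(p) for p in pvals if p <= tau)


# Hypothetical marker-pair interaction P values for one gene pair.
pair_pvals = [0.01, 0.04, 0.5]
stats = (min_p(pair_pvals), simes(pair_pvals), truncated_log_product(pair_pvals))
```

The truncated product only accumulates evidence from P values below the cutoff, which is one intuition for why truncated tests can do well when several weak signals are spread across marker pairs.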

  15. In search of functional association from time-series microarray data based on the change trend and level of gene expression

    PubMed Central

    He, Feng; Zeng, An-Ping

    2006-01-01

    Background The increasing availability of time-series expression data opens up new possibilities to study functional linkages of genes. Present methods used to infer functional linkages between genes from expression data are mainly based on a point-to-point comparison. Change trends between consecutive time points in time-series data have so far not been well explored. Results In this work we present a new method based on extracting the main features of the change trend and level of gene expression between consecutive time points. The method, termed trend correlation (TC), includes two major steps: 1, calculating a maximal local alignment of change trend score by dynamic programming and a change trend correlation coefficient between the maximal matched change levels of each gene pair; 2, inferring relationships of gene pairs based on two statistical extraction procedures. The new method considers time shifts and inverted relationships in a similar way to the local clustering (LC) method, but the latter is based merely on a point-to-point comparison. The TC method is demonstrated with data from the yeast cell cycle and compared with the LC method and the widely used Pearson correlation coefficient (PCC) based clustering method. The biological significance of the gene pairs is examined with several large-scale yeast databases. Although the TC method predicts an overall lower number of gene pairs than the other two methods at the same p-value threshold, the additional number of gene pairs inferred by the TC method is considerable: e.g. 20.5% compared with the LC method and 49.6% with the PCC method for a p-value threshold of 2.7E-3. Moreover, the percentage of the inferred gene pairs consistent with databases by our method is generally higher than that of the LC method and similar to that of the PCC method. 
A significant number of the gene pairs inferred only by the TC method are process-identity or function-similarity pairs or have well-documented biological interactions, including 443 known protein interactions and some known cell cycle related regulatory interactions. It should be emphasized that the overlap of gene pairs detected by the three methods is normally not very high, indicating a need to combine the different methods in the search for functional association of genes from time-series data. For a p-value threshold of 1E-5 the percentage of process-identity and function-similarity gene pairs among the shared part of the three methods reaches 60.2% and 55.6% respectively, building a good basis for further experimental and functional study. Furthermore, the combined use of methods is important to infer more complete regulatory circuits and networks, as exemplified in this study. Conclusion The TC method can significantly augment the current major methods to infer functional linkages and biological networks and is well suited for exploring temporal relationships of gene expression in time-series data. PMID:16478547

  16. Combining Evidence of Preferential Gene-Tissue Relationships from Multiple Sources

    PubMed Central

    Guo, Jing; Hammar, Mårten; Öberg, Lisa; Padmanabhuni, Shanmukha S.; Bjäreland, Marcus; Dalevi, Daniel

    2013-01-01

    An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing considerably higher expression in those tissues compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree, and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice of method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim of eliminating method/study-specific biases and improving the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the highest-scoring genes with the public databases PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap with PaGenBase, 71% with TiGER and only 28% with HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases at identifying drug targets and biomarkers with known tissue specificity. PMID:23950964

  17. Biomedical discovery acceleration, with applications to craniofacial development.

    PubMed

    Leach, Sonia M; Tipney, Hannah; Feng, Weiguo; Baumgartner, William A; Kasliwal, Priyanka; Schuyler, Ronald P; Williams, Trevor; Spritz, Richard A; Hunter, Lawrence

    2009-03-01

    The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.

  18. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. 
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579
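
    Illustrative sketch (not taken from the paper or its software): the abstract states only that the re-calculated gene trees are weighted by bin size. One simple way to realize such a weighting, assumed here, is to repeat each bin's tree once per gene in the bin before passing the list to a summary method. The function name, bin identifiers, and Newick strings below are hypothetical.

```python
from typing import Dict, List


def weight_supergene_trees(bins: Dict[str, List[str]],
                           supergene_trees: Dict[str, str]) -> List[str]:
    """Return the tree list for a summary method, with one copy of each
    bin's re-calculated tree per gene assigned to that bin.

    Assumption: weighting is realized by replication; the abstract does
    not say how the weights are applied.
    """
    weighted: List[str] = []
    for bin_id, genes in bins.items():
        tree = supergene_trees[bin_id]
        weighted.extend([tree] * len(genes))  # weight = bin size
    return weighted


# Hypothetical toy input: two bins of sizes 3 and 1.
bins = {"bin1": ["g1", "g2", "g3"], "bin2": ["g4"]}
trees = {"bin1": "((A,B),(C,D));", "bin2": "((A,C),(B,D));"}
weighted = weight_supergene_trees(bins, trees)
```

    With this input the larger bin contributes three copies of its tree and the singleton bin contributes one, so the summary method sees as many trees as there are original genes.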

  19. AUC-based biomarker ensemble with an application on gene scores predicting low bone mineral density.

    PubMed

    Zhao, X G; Dai, W; Li, Y; Tian, L

    2011-11-01

    The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a 'golden' measure for the predictiveness of a continuous score, has propelled the need to develop AUC-based predictors. However, AUC-based ensemble methods are rather scant, largely because the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm for identifying the optimal combination of a set of biomarkers to maximize the AUC, especially when the number of biomarkers is large. We propose a novel AUC-based statistical ensemble method for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we propose to replace the non-continuous and non-convex AUC objective function with a convex surrogate loss function, whose minimizer can be efficiently identified. With the established framework, the lasso and other regularization techniques enable feature selection. Extensive simulations have demonstrated the superiority of the new methods over the existing methods. The proposal has been applied to a gene expression dataset to construct gene expression scores that differentiate elderly women with low bone mineral density (BMD) from those with normal BMD. The AUCs of the resulting scores in the independent test dataset have been satisfactory. Aiming to maximize the AUC directly, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful in high-dimensional settings. Contact: lutian@stanford.edu. Supplementary data are available at Bioinformatics online.
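
    Illustrative sketch (not the authors' implementation): the abstract does not name the convex surrogate, so this example assumes a pairwise logistic loss over (case, control) pairs and plain gradient descent without regularization. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear biomarker combination w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def empirical_auc(w, pos, neg) -> float:
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            d = score(w, p) - score(w, n)
            wins += 1.0 if d > 0 else (0.5 if d == 0 else 0.0)
    return wins / (len(pos) * len(neg))


def surrogate_loss(w, pos, neg) -> float:
    """Mean pairwise logistic loss log(1 + exp(-d)), convex in w (assumed surrogate)."""
    total = 0.0
    for p in pos:
        for n in neg:
            total += math.log1p(math.exp(-(score(w, p) - score(w, n))))
    return total / (len(pos) * len(neg))


def fit(pos, neg, steps: int = 200, lr: float = 0.5) -> List[float]:
    """Minimize the surrogate by gradient descent, starting from w = 0."""
    dim = len(pos[0])
    w = [0.0] * dim
    n_pairs = len(pos) * len(neg)
    for _ in range(steps):
        grad = [0.0] * dim
        for p in pos:
            for n in neg:
                diff = [pi - ni for pi, ni in zip(p, n)]
                d = score(w, diff)
                coeff = -1.0 / (1.0 + math.exp(d))  # d/dd of log(1 + exp(-d))
                for j in range(dim):
                    grad[j] += coeff * diff[j] / n_pairs
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: two biomarkers, three cases and three controls.
cases = [[2.0, 0.1], [1.5, -0.2], [1.8, 0.3]]
controls = [[0.2, 0.0], [0.5, 0.4], [-0.1, -0.3]]
weights = fit(cases, controls)
```

    The step function inside the AUC is swapped for a smooth convex penalty on each pairwise score difference, which is what makes a gradient-based minimizer applicable; a lasso penalty could be added to the loss for feature selection.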

  20. A Heterogeneous Network Based Method for Identifying GBM-Related Genes by Integrating Multi-Dimensional Data.

    PubMed

    Chen Peng; Ao Li

    2017-01-01

    The emergence of multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of human diseases and therefore improving diagnosis, treatment, and prevention. In this study, we proposed a heterogeneous network based method by integrating multi-dimensional data (HNMD) to identify GBM-related genes. The novelty of the method lies in that the multi-dimensional data of GBM from TCGA dataset that provide comprehensive information of genes, are combined with protein-protein interactions to construct a weighted heterogeneous network, which reflects both the general and disease-specific relationships between genes. In addition, a propagation algorithm with resistance is introduced to precisely score and rank GBM-related genes. The results of comprehensive performance evaluation show that the proposed method significantly outperforms the network based methods with single-dimensional data and other existing approaches. Subsequent analysis of the top ranked genes suggests they may be functionally implicated in GBM, which further corroborates the superiority of the proposed method. The source code and the results of HNMD can be downloaded from the following URL: http://bioinformatics.ustc.edu.cn/hnmd/ .

  1. Performance Comparison of Bench-Top Next Generation Sequencers Using Microdroplet PCR-Based Enrichment for Targeted Sequencing in Patients with Autism Spectrum Disorder

    PubMed Central

    Okamoto, Nobuhiko; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Matsumoto, Naomichi

    2013-01-01

    Next-generation sequencing (NGS) combined with enrichment of target genes enables highly efficient and low-cost sequencing of multiple genes for genetic diseases. The aim of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection in autism spectrum disorder (ASD). We assessed the performance of the bench-top Ion Torrent PGM and Illumina MiSeq platforms as optimized solutions for mutation detection, using microdroplet PCR-based enrichment of 62 ASD associated genes. Ten patients with known mutations were sequenced using NGS to validate the sensitivity of our method. The overall read quality was better with MiSeq, largely because of the increased indel-related error associated with PGM. The sensitivity of SNV detection was similar between the two platforms, suggesting they are both suitable for SNV detection in the human genome. Next, we used these methods to analyze 28 patients with ASD, and identified 22 novel variants in genes associated with ASD, with one mutation detected by MiSeq only. Thus, our results support the combination of target gene enrichment and NGS as a valuable molecular method for investigating rare variants in ASD. PMID:24066114

  2. Prioritization of candidate disease genes by combining topological similarity and semantic similarity.

    PubMed

    Liu, Bin; Jin, Min; Zeng, Pan

    2015-10-01

    The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate genes and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performances. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactor diseases, including Parkinson's disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method. Copyright © 2015 Elsevier Inc. All rights reserved.
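
    Illustrative sketch (not the authors' code): a generic random walk with restart in which the start vector is built from per-gene similarity scores rather than a uniform seed set. The restart probability, the toy network, and the similarity values are assumptions chosen for the example.

```python
from typing import List


def random_walk_with_restart(adj: List[List[float]],
                             p0: List[float],
                             restart: float = 0.7,
                             tol: float = 1e-10,
                             max_iter: int = 1000) -> List[float]:
    """Iterate p <- (1 - r) * W p + r * s until the L1 change is below tol.

    adj is a symmetric adjacency matrix; W is its column-normalized form.
    s is p0 rescaled to sum to 1 (p0 holds the similarity-based seeds).
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalize; an isolated node keeps an all-zero column.
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(p0)
    start = [v / total for v in p0]
    p = start[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * start[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-gene path network 0 - 1 - 2 - 3.
network = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
# Hypothetical semantic-similarity seeds: gene 0 is most similar to known disease genes.
similarity = [0.9, 0.1, 0.0, 0.0]
ranking_scores = random_walk_with_restart(network, similarity)
```

    Candidate genes are then ranked by their steady-state scores; here gene 1, adjacent to the strongly seeded gene 0, ends up above the distant gene 3.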

  3. Multiconstrained gene clustering based on generalized projections

    PubMed Central

    2010-01-01

    Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple pieces of constraints for an optimal clustering solution still remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints from different nature without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints from different nature for gene clustering. Also, the proposed GLL is an effective performance measure for the soft clustering solutions. PMID:20356386
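
    Illustrative sketch (not the paper's MGC method): the core idea of projection onto convex sets is to cycle through the constraint sets, projecting the current solution onto each in turn until it lies in their intersection. The two sets used here (soft-membership weights that sum to 1, and non-negative weights) are assumptions chosen only to show the iteration.

```python
from typing import List


def project_onto_sum_one(x: List[float]) -> List[float]:
    """Euclidean projection onto the hyperplane {x : sum(x) = 1}."""
    shift = (sum(x) - 1.0) / len(x)
    return [v - shift for v in x]


def project_onto_nonnegative(x: List[float]) -> List[float]:
    """Euclidean projection onto the non-negative orthant."""
    return [max(v, 0.0) for v in x]


def pocs(x: List[float], n_iter: int = 200) -> List[float]:
    """Alternate the two projections; the iterates approach the intersection."""
    for _ in range(n_iter):
        x = project_onto_sum_one(x)
        x = project_onto_nonnegative(x)
    return x


# Hypothetical unconstrained membership scores of one gene for three clusters.
membership = pocs([1.4, 0.9, -0.3])
```

    The result satisfies both constraints at once, which is the property the abstract relies on when each source of prior knowledge is expressed as its own set.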

  4. Simultaneous mutation detection of three homoeologous genes in wheat by High Resolution Melting analysis and Mutation Surveyor.

    PubMed

    Dong, Chongmei; Vincent, Kate; Sharp, Peter

    2009-12-04

    TILLING (Targeting Induced Local Lesions IN Genomes) is a powerful tool for reverse genetics, combining traditional chemical mutagenesis with high-throughput PCR-based mutation detection to discover induced mutations that alter protein function. The most popular mutation detection method for TILLING is a mismatch cleavage assay using the endonuclease CelI. For this method, locus-specific PCR is essential. Most wheat genes are present as three similar sequences with high homology in exons and low homology in introns. Locus-specific primers can usually be designed in introns. However, it is sometimes difficult to design locus-specific PCR primers in a conserved region with high homology among the three homoeologous genes, or in a gene lacking introns, or if information on introns is not available. Here we describe a mutation detection method which combines High Resolution Melting (HRM) analysis of mixed PCR amplicons containing three homoeologous gene fragments and sequence analysis using Mutation Surveyor software, aimed at simultaneous detection of mutations in three homoeologous genes. We demonstrate that High Resolution Melting (HRM) analysis can be used in mutation scans in mixed PCR amplicons containing three homoeologous gene fragments. Combining HRM scanning with sequence analysis using Mutation Surveyor is sensitive enough to detect a single nucleotide mutation in the heterozygous state in a mixed PCR amplicon containing three homoeoloci. The method was tested and validated in an EMS (ethylmethane sulfonate)-treated wheat TILLING population, screening mutations in the carboxyl terminal domain of the Starch Synthase II (SSII) gene. Selected identified mutations of interest can be further analysed by cloning to confirm the mutation and determine the genomic origin of the mutation. Polyploidy is common in plants. Conserved regions of a gene often represent functional domains and have high sequence similarity between homoeologous loci. 
The method described here is a useful alternative to locus-specific based methods for screening mutations in conserved functional domains of homoeologous genes. This method can also be used for SNP (single nucleotide polymorphism) marker development and eco-TILLING in polyploid species.

  5. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach.

    PubMed

    Peng, Jiajie; Zhang, Xuanshuo; Hui, Weiwei; Lu, Junya; Li, Qianqian; Liu, Shuhui; Shang, Xuequn

    2018-03-19

    Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations. We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to decrease the noise information. Based on the EC number (Enzyme Commission)-based groups of yeast and Arabidopsis, evaluation test shows that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved after effectively incorporating the global gene-to-gene interactions in the co-functional network, especially on the species that gene annotations in GO are far from complete.

  6. Enhanced capillary electrophoretic screening of Alzheimer based on direct apolipoprotein E genotyping and one-step multiplex PCR.

    PubMed

    Woo, Nain; Kim, Su-Kang; Sun, Yucheng; Kang, Seong Ho

    2018-01-01

    Human apolipoprotein E (ApoE) is associated with high cholesterol levels, coronary artery disease, and especially Alzheimer's disease. In this study, we developed an ApoE genotyping and one-step multiplex polymerase chain reaction (PCR)-based capillary electrophoresis (CE) method for the enhanced diagnosis of Alzheimer's. The primer mixture of ApoE genes enabled the performance of direct one-step multiplex PCR from whole blood without DNA purification. The combination of direct ApoE genotyping and one-step multiplex PCR minimized the risk of DNA loss or contamination due to the process of DNA purification. All amplified PCR products with different DNA lengths (112-, 253-, 308-, 444-, and 514-bp DNA) of the ApoE genes were analyzed within 2 min by an extended voltage programming (VP)-based CE under the optimal conditions. The extended VP-based CE method was at least 120-180 times faster than conventional slab gel electrophoresis methods. In particular, all amplified DNA fragments were detected in less than 10 PCR cycles using a laser-induced fluorescence detector. The detection limits of the ApoE genes were 6.4-62.0 pM, which were approximately 100-100,000 times more sensitive than previous Alzheimer's diagnosis methods. In addition, the combined one-step multiplex PCR and extended VP-based CE method was also successfully applied to the analysis of ApoE genotypes in Alzheimer's patients and normal samples and confirmed the distribution probability of allele frequencies. This combination of direct one-step multiplex PCR and an extended VP-based CE method should increase the diagnostic reliability of Alzheimer's with high sensitivity and short analysis time even with direct use of whole blood. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Integrating Multiple Data Sources for Combinatorial Marker Discovery: A Study in Tumorigenesis.

    PubMed

    Bandyopadhyay, Sanghamitra; Mallik, Saurav

    2018-01-01

    Identification of combinatorial markers from multiple data sources is a challenging task in bioinformatics. Here, we propose a novel computational framework for identifying significant combinatorial markers ( s) using both gene expression and methylation data. The gene expression and methylation data are integrated into a single continuous data as well as a (post-discretized) boolean data based on their intrinsic (i.e., inverse) relationship. A novel combined score of methylation and expression data (viz., ) is introduced which is computed on the integrated continuous data for identifying initial non-redundant set of genes. Thereafter, (maximal) frequent closed homogeneous genesets are identified using a well-known biclustering algorithm applied on the integrated boolean data of the determined non-redundant set of genes. A novel sample-based weighted support ( ) is then proposed that is consecutively calculated on the integrated boolean data of the determined non-redundant set of genes in order to identify the non-redundant significant genesets. The top few resulting genesets are identified as potential s. Since our proposed method generates a smaller number of significant non-redundant genesets than those by other popular methods, the method is much faster than the others. Application of the proposed technique on an expression and a methylation data for Uterine tumor or Prostate Carcinoma produces a set of significant combination of markers. We expect that such a combination of markers will produce lower false positives than individual markers.

  8. Evaluation of Gene-Based Family-Based Methods to Detect Novel Genes Associated With Familial Late Onset Alzheimer Disease

    PubMed Central

    Fernández, Maria V.; Budde, John; Del-Aguila, Jorge L.; Ibañez, Laura; Deming, Yuetiva; Harari, Oscar; Norton, Joanne; Morris, John C.; Goate, Alison M.; Cruchaga, Carlos

    2018-01-01

    Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD. PMID:29670507

  9. Evaluation of Gene-Based Family-Based Methods to Detect Novel Genes Associated With Familial Late Onset Alzheimer Disease.

    PubMed

    Fernández, Maria V; Budde, John; Del-Aguila, Jorge L; Ibañez, Laura; Deming, Yuetiva; Harari, Oscar; Norton, Joanne; Morris, John C; Goate, Alison M; Cruchaga, Carlos

    2018-01-01

    Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD.

  10. Logic Learning Machine and standard supervised methods for Hodgkin's lymphoma prognosis using gene expression data and clinical variables.

    PubMed

    Parodi, Stefano; Manneschi, Chiara; Verda, Damiano; Ferrari, Enrico; Muselli, Marco

    2018-03-01

    This study evaluates the performance of a set of machine learning techniques in predicting the prognosis of Hodgkin's lymphoma using clinical factors and gene expression data. Analysed samples from 130 Hodgkin's lymphoma patients included a small set of clinical variables and more than 54,000 gene features. Machine learning classifiers included three black-box algorithms (k-nearest neighbour, Artificial Neural Network, and Support Vector Machine) and two methods based on intelligible rules (Decision Tree and the innovative Logic Learning Machine method). Support Vector Machine clearly outperformed any of the other methods. Among the two rule-based algorithms, Logic Learning Machine performed better and identified a set of simple intelligible rules based on a combination of clinical variables and gene expressions. Decision Tree identified a non-coding gene (XIST) involved in the early phases of X chromosome inactivation that was overexpressed in females and in non-relapsed patients. XIST expression might be responsible for the better prognosis of female Hodgkin's lymphoma patients.

  11. A robust multifactor dimensionality reduction method for detecting gene-gene interactions with application to the genetic analysis of bladder cancer susceptibility

    PubMed Central

    Gui, Jiang; Andrew, Angeline S.; Andrews, Peter; Nelson, Heather M.; Kelsey, Karl T.; Karagas, Margaret R.; Moore, Jason H.

    2010-01-01

    A central goal of human genetics is to identify and characterize susceptibility genes for common complex human diseases. An important challenge in this endeavor is the modeling of gene-gene interaction or epistasis that can result in non-additivity of genetic effects. The multifactor dimensionality reduction (MDR) method was developed as a machine learning alternative to parametric logistic regression for detecting interactions in the absence of significant marginal effects. The goal of MDR is to reduce the dimensionality inherent in modeling combinations of polymorphisms using a computational approach called constructive induction. Here, we propose a Robust Multifactor Dimensionality Reduction (RMDR) method that performs constructive induction using a Fisher’s Exact Test rather than a predetermined threshold. The advantage of this approach is that only those genotype combinations that are determined to be statistically significant are considered in the MDR analysis. We use two simulation studies to demonstrate that this approach will increase the success rate of MDR when there are only a few genotype combinations that are significantly associated with case-control status. We show that there is no loss of success rate when this is not the case. We then apply the RMDR method to the detection of gene-gene interactions in genotype data from a population-based study of bladder cancer in New Hampshire. PMID:21091664
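
    Illustrative sketch (not the RMDR software): each multilocus genotype combination is labelled high risk, low risk, or left out, depending on a two-sided Fisher's exact test. The abstract does not give the exact contingency table, so this example assumes a 2x2 table of cases and controls inside the cell versus outside it, and a significance level of 0.05. Function names, genotype labels, and counts are hypothetical.

```python
from math import comb
from typing import Dict, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def label_cells(counts: Dict[str, Tuple[int, int]],
                alpha: float = 0.05) -> Dict[str, str]:
    """Map each genotype combination to 'high', 'low', or 'unknown'.

    counts[genotype] = (cases, controls). Cells that are not significant
    are labelled 'unknown' and would be excluded from the MDR model.
    """
    total_cases = sum(c for c, _ in counts.values())
    total_controls = sum(k for _, k in counts.values())
    labels = {}
    for genotype, (cases, controls) in counts.items():
        p = fisher_exact_two_sided(cases, controls,
                                   total_cases - cases,
                                   total_controls - controls)
        if p >= alpha:
            labels[genotype] = "unknown"
        elif cases * total_controls > controls * total_cases:
            labels[genotype] = "high"   # case:control ratio above overall ratio
        else:
            labels[genotype] = "low"
    return labels


# Hypothetical two-locus genotype cells with (cases, controls) counts.
cells = {"AA/BB": (30, 5), "Aa/Bb": (5, 30), "aa/bb": (15, 15)}
cell_labels = label_cells(cells)
```

    In contrast to a fixed case:control threshold, a cell whose counts are balanced or too sparse to reach significance does not force a high-risk or low-risk call.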

  12. Direct isolation of differentially expressed genes from a specific chromosome region of common wheat: application of the amplified fragment length polymorphism-based mRNA fingerprinting (AMF) method in combination with a deletion line of wheat.

    PubMed

    Kojima, T; Habu, Y; Iida, S; Ogihara, Y

    2000-05-01

    The amplified restriction fragment length polymorphism (AFLP)-based mRNA fingerprinting (AMF) method makes it possible systematically and conveniently to identify differentially expressed cDNAs with high reproducibility. We have applied the AMF method to the cloning of the Q gene of common wheat, which is located on the long arm of chromosome 5A and pleiotropically controls the spike morphology and the threshing character of seeds. Using the AMF method, we compared the fingerprints of mRNA samples extracted from the young spikes of Triticum aestivum cv. Chinese Spring (CS) carrying the Q gene to those of a chromosome deletion line of CS, namely, q5, which lacks 15% of 5AL including the Q gene. Approximately 12,200 fragments were produced after PCR with 256 primer combinations. Of these, 92 fragments were differentially expressed between CS and q5. Northern and Southern analyses showed that 16 fragments gave specific or relatively stronger transcript signals in CS, and these clones were present in single copy or in low copy numbers in the wheat genome. Four clones were genetically mapped to the region deleted in q5. Subsequently, one clone, pTaQ22, was mapped at the same locus as the Q gene, indicating that pTaQ22 corresponds to the Q gene or is tightly linked to it. DNA sequence data showed that pTaQ22 had no homology to any known genes, thus suggesting a novel function for this gene in flower morphogenesis. This AMF method might provide a straightforward method for isolating genes in the hexaploid background of common wheat.

  13. A cluster merging method for time series microarray with production values.

    PubMed

    Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio

    2014-09-01

    A challenging task in time-course microarray data analysis is to cluster genes meaningfully combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal obtaining groups with highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured in the same time points) and merging them by taking into account the frequency by which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim to find co-expressed genes related to the production and growth of a certain bacteria. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.
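
    Illustrative sketch (not the authors' algorithm): the abstract describes merging per-replicate clusterings according to how often two genes are grouped together. This example assumes a simple rule, linking genes whose co-clustering frequency reaches a threshold and returning the connected groups. It also assumes every replicate clusters the same genes. Names, labels, and the threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def co_association(clusterings: List[Dict[str, int]]) -> Dict[Tuple[str, str], float]:
    """Fraction of clusterings in which each gene pair shares a cluster."""
    counts: Dict[Tuple[str, str], int] = {}
    for labels in clusterings:
        for g1, g2 in combinations(sorted(labels), 2):
            if labels[g1] == labels[g2]:
                counts[(g1, g2)] = counts.get((g1, g2), 0) + 1
    return {pair: c / len(clusterings) for pair, c in counts.items()}


def merge_clusters(clusterings: List[Dict[str, int]],
                   min_freq: float = 1.0) -> List[List[str]]:
    """Group genes linked by co-association >= min_freq (union-find)."""
    freq = co_association(clusterings)
    genes = sorted(clusterings[0])
    parent = {g: g for g in genes}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for (g1, g2), f in freq.items():
        if f >= min_freq:
            parent[find(g1)] = find(g2)

    groups: Dict[str, List[str]] = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


# Hypothetical cluster labels from three biological replicates.
replicates = [
    {"geneA": 0, "geneB": 0, "geneC": 1, "geneD": 1},
    {"geneA": 1, "geneB": 1, "geneC": 0, "geneD": 0},
    {"geneA": 0, "geneB": 0, "geneC": 0, "geneD": 1},
]
strict_groups = merge_clusters(replicates, min_freq=1.0)
relaxed_groups = merge_clusters(replicates, min_freq=0.6)
```

    The co-association frequency also gives a natural ranking of the merged groups by how strongly the replicates agree on them.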

  14. Analysis of Gene Expression Profiles of Soft Tissue Sarcoma Using a Combination of Knowledge-Based Filtering with Integration of Multiple Statistics

    PubMed Central

    Doi, Ayano; Ichinohe, Risa; Ikuyo, Yoriko; Takahashi, Teruyoshi; Marui, Shigetaka; Yasuhara, Koji; Nakamura, Tetsuro; Sugita, Shintaro; Sakamoto, Hiromi; Yoshida, Teruhiko; Hasegawa, Tadashi

    2014-01-01

    The diagnosis and treatment of soft tissue sarcomas (STS) have been difficult. Of the diverse histological subtypes, undifferentiated pleomorphic sarcoma (UPS) is particularly difficult to diagnose accurately, and its classification per se is still controversial. Recent advances in genomic technologies provide an excellent way to address such problems. However, it is often difficult, if not impossible, to identify definitive disease-associated genes using genome-wide analysis alone, primarily because of multiple testing problems. In the present study, we analyzed microarray data from 88 STS patients using a combination method that used knowledge-based filtering and a simulation based on the integration of multiple statistics to reduce multiple testing problems. We identified 25 genes, including hypoxia-related genes (e.g., MIF, SCD1, P4HA1, ENO1, and STAT1) and cell cycle- and DNA repair-related genes (e.g., TACC3, PRDX1, PRKDC, and H2AFY). These genes showed significant differential expression among histological subtypes, including UPS, and showed associations with overall survival. STAT1 showed a strong association with overall survival in UPS patients (logrank p = 1.84×10−6 and adjusted p value 2.99×10−3 after the permutation test). According to the literature, the 25 genes selected are useful not only as markers of differential diagnosis but also as prognostic/predictive markers and/or therapeutic targets for STS. Our combination method can identify genes that are potential prognostic/predictive factors and/or therapeutic targets in STS and possibly in other cancers. These disease-associated genes deserve further preclinical and clinical validation. PMID:25188299

  15. Gene Prioritization of Resistant Rice Gene against Xanthomas oryzae pv. oryzae by Using Text Mining Technologies

    PubMed Central

    Xia, Jingbo; Zhang, Xing; Yuan, Daojun; Chen, Lingling; Webster, Jonathan; Fang, Alex Chengyu

    2013-01-01

    To effectively assess the possibility of the unknown rice protein resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed to enhance gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency inverse document frequency is used to measure the importance of distinguished terms which reflect biomedical activity in rice before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier under the chaos games representation algorithm is used to sieve the best possible candidate gene. Our experiment results show that the combination of these two methods achieves enhanced gene prioritization. PMID:24371834
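
    Illustrative sketch (not the authors' pipeline): term frequency-inverse document frequency scores a term highly when it is frequent in one document but rare across the collection. The abstract does not state which weighting variant was used; this example assumes the plain form tf x log(N / df). The token lists are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq: Counter = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized abstracts.
docs = [
    ["rice", "resistance", "blight", "resistance"],
    ["rice", "yield"],
    ["rice", "blight", "kinase"],
]
term_scores = tf_idf(docs)
```

    A term such as "rice" that occurs in every document receives a score of zero, while "resistance", which is frequent in one document and absent elsewhere, is ranked highest there.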

  16. Gene prioritization of resistant rice gene against Xanthomas oryzae pv. oryzae by using text mining technologies.

    PubMed

    Xia, Jingbo; Zhang, Xing; Yuan, Daojun; Chen, Lingling; Webster, Jonathan; Fang, Alex Chengyu

    2013-01-01

    To effectively assess the possibility of the unknown rice protein resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed to enhance gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency inverse document frequency is used to measure the importance of distinguished terms which reflect biomedical activity in rice before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier under the chaos games representation algorithm is used to sieve the best possible candidate gene. Our experiment results show that the combination of these two methods achieves enhanced gene prioritization.

  17. A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis

    PubMed Central

    2011-01-01

    Background Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches - the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, in accompaniment with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification were achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA are integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly ranked XLMR candidates. Conclusions The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlights an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR.
Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi). PMID:21668950

  18. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations.

    PubMed

    Dwivedi, Bhakti; Kowalski, Jeanne

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.

  19. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations

    PubMed Central

    Dwivedi, Bhakti

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010

  20. Weighted functional linear regression models for gene-based association analysis.

    PubMed

    Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I

    2018-01-01

    Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10^-6), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
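
    Weights "defined by allele frequencies via the beta distribution" can be sketched as the beta density evaluated at each variant's minor allele frequency. The shape parameters a = 1 and b = 25 below are an assumption (a choice commonly used in kernel-based tests), and the names `beta_pdf` and `maf_weights` are hypothetical; neither is taken from the abstract or from the FREGAT package.

```python
import math


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def maf_weights(mafs, a=1.0, b=25.0):
    """Weight each variant by the beta density at its minor allele frequency.

    With a=1, b=25 (assumed here), rarer variants receive larger weights.
    """
    return [beta_pdf(maf, a, b) for maf in mafs]


# Illustrative minor allele frequencies, invented for this sketch.
mafs = [0.001, 0.01, 0.05, 0.2]
weights = maf_weights(mafs)
```

    Under these assumed parameters the weights fall monotonically as the allele frequency rises, so rare variants contribute more to the combined gene-level statistic than common ones.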

  1. Prediction of essential proteins based on gene expression programming.

    PubMed

    Zhong, Jiancheng; Wang, Jianxin; Peng, Wei; Zhang, Zhen; Pan, Yi

    2013-01-01

    Essential proteins are indispensable for cell survival. Identifying essential proteins is very important for improving our understanding of how a cell works. Various types of features are related to the essentiality of proteins, and many methods have been proposed that combine some of them to predict essential proteins. However, it is still a big challenge to design an effective method that predicts them by integrating different features and explains how the selected features determine the essentiality of a protein. Gene expression programming (GEP) is a learning algorithm that learns relationships between variables in sets of data and then builds models to explain these relationships. In this work, we propose a GEP-based method to predict essential proteins by combining some biological features and topological features. We carry out experiments on S. cerevisiae data. The experimental results show that our method achieves better prediction performance than methods using individual features. Moreover, our method outperforms some machine learning methods and performs as well as a method obtained by combining the outputs of eight machine learning methods. The accuracy of predicting essential proteins can be improved by using the GEP method to combine some topological features and biological features.

  2. Classification of intramural metastases and lymph node metastases of esophageal cancer from gene expression based on boosting and projective adaptive resonance theory.

    PubMed

    Takahashi, Hiro; Aoyagi, Kazuhiko; Nakanishi, Yukihiro; Sasaki, Hiroki; Yoshida, Teruhiko; Honda, Hiroyuki

    2006-07-01

    Esophageal cancer is a well-known cancer with poorer prognosis than other cancers. An optimal and individualized treatment protocol based on accurate diagnosis is urgently needed to improve the treatment of cancer patients. For this purpose, it is important to develop a sophisticated algorithm that can manage a large amount of data, such as gene expression data from DNA microarrays, for optimal and individualized diagnosis. Marker gene selection is essential in the analysis of gene expression data. We have already developed PART-BFCS, a method that combines projective adaptive resonance theory with a boosted fuzzy classifier with the SWEEP operator. This method is superior to other methods, and has four features, namely fast calculation, accurate prediction, reliable prediction, and rule extraction. In this study, we applied this method to analyze microarray data obtained from esophageal cancer patients. A combination of PART-BFCS and the U-test was also investigated. It was necessary to use a specific type of BFCS, namely, BFCS-1,2, because the esophageal cancer data were very complex. The PART-BFCS model and the PART-BFCS with U-test model showed higher performance than two conventional methods, namely, k-nearest neighbor (kNN) and weighted voting (WV). Genes including CDK6 could be found by our methods, and excellent IF-THEN rules could be extracted. The genes selected in this study have a high potential as new diagnosis markers for esophageal cancer. These results indicate that the new methods can be used in marker gene selection for the diagnosis of cancer patients.

  3. Identification of type 2 diabetes-associated combination of SNPs using support vector machine.

    PubMed

    Ban, Hyo-Jeong; Heo, Jee Yeon; Oh, Kyung-Soo; Park, Keun-Joon

    2010-04-23

    Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by insulin resistance and relative insulin deficiency, is a complex disease of major public health importance. Its incidence is rapidly increasing in the developed countries. Complex diseases are caused by interactions between multiple genes and environmental factors. Most association studies aim to identify individual susceptibility single markers using a simple disease model. Recent studies are trying to estimate the effects of multiple genes and multi-locus in genome-wide association. However, estimating the effects of association is very difficult. We aim to assess the rules for classifying diseased and normal subjects by evaluating potential gene-gene interactions in the same or distinct biological pathways. We analyzed the importance of gene-gene interactions in T2D susceptibility by investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major T2D-related pathways in 462 T2D patients and 456 healthy controls from the Korean cohort studies. We evaluated the support vector machine (SVM) method to differentiate between cases and controls using SNP information in a 10-fold cross-validation test. We achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes by using the radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation data sets of men and women and identified different SNP combinations with the prediction rates of 70.9% and 70.6%, respectively. As the high-throughput technology for genome-wide SNPs improves, it is likely that a much higher prediction rate with biologically more interesting combination of SNPs can be acquired by using this method. Support Vector Machine based feature selection method in this research found novel association between combinations of SNPs and T2D in a Korean population.

  4. Improving the detection of pathways in genome-wide association studies by combined effects of SNPs from Linkage Disequilibrium blocks.

    PubMed

    Zhao, Huiying; Nyholt, Dale R; Yang, Yuanhao; Wang, Jihua; Yang, Yuedong

    2017-06-14

    Genome-wide association studies (GWAS) have successfully identified single variants associated with diseases. To increase the power of GWAS, gene-based and pathway-based tests are commonly employed to detect more risk factors. However, the gene- and pathway-based association tests may be biased towards genes or pathways containing a large number of single-nucleotide polymorphisms (SNPs) with small P-values caused by high linkage disequilibrium (LD) correlations. To address such bias, numerous pathway-based methods have been developed. Here we propose a novel method, DGAT-path, to divide all SNPs assigned to genes in each pathway into LD blocks, and to sum the chi-square statistics of LD blocks for assessing the significance of the pathway by permutation tests. The method was proven robust with the type I error rate >1.6 times lower than other methods. Meanwhile, the method displays a higher power and is not biased by the pathway size. The applications to the GWAS summary statistics for schizophrenia and breast cancer indicate that the detected top pathways contain more genes close to associated SNPs than other methods. As a result, the method identified 17 and 12 significant pathways containing 20 and 21 novel associated genes, respectively, for the two diseases. The method is available online at http://sparks-lab.org/server/DGAT-path.
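
    The idea of summing per-block chi-square statistics and judging the sum against a resampled null can be sketched as follows. This is a simplified illustration and not the DGAT-path implementation: it assumes one precomputed chi-square value per LD block and, as a stand-in null, draws random sets of the same number of blocks from the genome-wide pool. The function name `pathway_p_value` and all numbers are hypothetical.

```python
import random


def pathway_p_value(pathway_blocks, all_blocks, n_perm=2000, seed=0):
    """Empirical p-value for the summed block statistic of one pathway.

    pathway_blocks: chi-square statistic of each LD block in the pathway
    all_blocks:     chi-square statistics of all LD blocks (null pool)
    """
    rng = random.Random(seed)
    observed = sum(pathway_blocks)
    k = len(pathway_blocks)
    hits = 0
    for _ in range(n_perm):
        # Null draw: a random set of k blocks, matching the pathway size.
        if sum(rng.sample(all_blocks, k)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Invented example: a flat background and one pathway with large statistics.
background = [1.0] * 100
p_strong = pathway_p_value([30.0, 25.0, 40.0], background)
p_null = pathway_p_value([1.0, 1.0, 1.0], background)
```

    Because every null draw contains the same number of blocks as the pathway, a pathway does not gain significance merely by containing many SNPs in strong LD, which is the bias the abstract sets out to address.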

  5. Allelic-based gene-gene interaction associated with quantitative traits.

    PubMed

    Jung, Jeesun; Sun, Bin; Kwon, Deukwoo; Koller, Daniel L; Foroud, Tatiana M

    2009-05-01

    Recent studies have shown that quantitative phenotypes may be influenced not only by multiple single nucleotide polymorphisms (SNPs) within a gene but also by the interaction between SNPs at unlinked genes. We propose a new statistical approach that can detect gene-gene interactions at the allelic level which contribute to the phenotypic variation in a quantitative trait. By testing for the association of allelic combinations at multiple unlinked loci with a quantitative trait, we can detect the SNP allelic interaction whether or not it can be detected as a main effect. Our proposed method assigns a score to unrelated subjects according to their allelic combination inferred from observed genotypes at two or more unlinked SNPs, and then tests for the association of the allelic score with a quantitative trait. To investigate the statistical properties of the proposed method, we performed a simulation study to estimate type I error rates and power and demonstrated that this allelic approach achieves greater power than the more commonly used genotypic approach to test for gene-gene interaction. As an example, the proposed method was applied to data obtained as part of a candidate gene study of sodium retention by the kidney. We found that this method detects an interaction between the calcium-sensing receptor gene (CaSR), the chloride channel gene (CLCNKB) and the Na, K, 2Cl cotransporter gene (CLC12A1) that contributes to variation in diastolic blood pressure.

  6. Cancer diagnosis marker extraction for soft tissue sarcomas based on gene expression profiling data by using projective adaptive resonance theory (PART) filtering method

    PubMed Central

    Takahashi, Hiro; Nemoto, Takeshi; Yoshida, Teruhiko; Honda, Hiroyuki; Hasegawa, Tadashi

    2006-01-01

    Background Recent advances in genome technologies have provided an excellent opportunity to determine the complete biological characteristics of neoplastic tissues, resulting in improved diagnosis and selection of treatment. To accomplish this objective, it is important to establish a sophisticated algorithm that can deal with large quantities of data such as gene expression profiles obtained by DNA microarray analysis. Results Previously, we developed the projective adaptive resonance theory (PART) filtering method as a gene filtering method. This is one of the clustering methods that can select specific genes for each subtype. In this study, we applied the PART filtering method to analyze microarray data that were obtained from soft tissue sarcoma (STS) patients for the extraction of subtype-specific genes. The performance of the filtering method was evaluated by comparison with other widely used methods, such as signal-to-noise, significance analysis of microarrays, and nearest shrunken centroids. In addition, various combinations of filtering and modeling methods were used to extract essential subtype-specific genes. The combination of the PART filtering method and boosting – the PART-BFCS method – showed the highest accuracy. Seven genes among the 15 genes that are frequently selected by this method – MIF, CYFIP2, HSPCB, TIMP3, LDHA, ABR, and RGS3 – are known prognostic marker genes for other tumors. These genes are candidate marker genes for the diagnosis of STS. Correlation analysis was performed to extract marker genes that were not selected by PART-BFCS. Sixteen genes among those extracted are also known prognostic marker genes for other tumors, and they could be candidate marker genes for the diagnosis of STS. Conclusion The procedure that consisted of two steps, such as the PART-BFCS and the correlation analysis, was proposed. 
The results suggest that novel diagnostic and therapeutic targets for STS can be extracted by a procedure that includes the PART filtering method. PMID:16948864

  7. SFM: A novel sequence-based fusion method for disease genes identification and prioritization.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2015-10-21

    The identification of disease genes from the human genome is of great importance for improving the diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the ways of selecting negative data (non-disease genes) where there is no investigational approach to find them, and the classification methods used to make the final decision. In this work, a novel sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, instead of using noisy and incomplete prior knowledge, it uses the amino acid sequence of the proteins, which is universal data, to represent the genes (proteins) as four different feature vectors. To select more likely negative data from candidate genes, the intersection of four negative sets generated using a distance approach is considered. Then, a Decision Tree (C4.5) is applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm, and to make the final decision. The experimental results of the proposed method have been evaluated by some standard measures. The results indicate precision, recall and F-measure of 82.6%, 85.6% and 84, respectively. These results confirm the efficiency and validity of the proposed method.

  8. cDREM: inferring dynamic combinatorial gene regulation.

    PubMed

    Wise, Aaron; Bar-Joseph, Ziv

    2015-04-01

    Genes are often combinatorially regulated by multiple transcription factors (TFs). Such combinatorial regulation plays an important role in development and facilitates the ability of cells to respond to different stresses. While a number of approaches have utilized sequence and ChIP-based datasets to study combinational regulation, these have often ignored the combinational logic and the dynamics associated with such regulation. Here we present cDREM, a new method for reconstructing dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and utilizes the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation, and the logical function they implement. We tested cDREM on yeast and human data sets. Using yeast we show that the predicted combinatorial sets agree with other high throughput genomic datasets and improve upon prior methods developed to infer combinatorial regulation. Applying cDREM to study human response to flu, we were able to identify several combinatorial TF sets, some of which were known to regulate immune response while others represent novel combinations of important TFs.

  9. A roadmap for natural product discovery based on large-scale genomics and metabolomics

    USDA-ARS's Scientific Manuscript database

    Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics we developed a method for global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic ca...

  10. In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development.

    PubMed

    Ozerov, Ivan V; Lezhnina, Ksenia V; Izumchenko, Evgeny; Artemov, Artem V; Medintsev, Sergey; Vanhaelen, Quentin; Aliper, Alexander; Vijg, Jan; Osipov, Andreyan N; Labat, Ivan; West, Michael D; Buzdin, Anton; Cantor, Charles R; Nikolsky, Yuri; Borisov, Nikolay; Irincheeva, Irina; Khokhlovich, Edward; Sidransky, David; Camargo, Miguel Luiz; Zhavoronkov, Alex

    2016-11-16

    Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy.

  11. In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development

    PubMed Central

    Ozerov, Ivan V.; Lezhnina, Ksenia V.; Izumchenko, Evgeny; Artemov, Artem V.; Medintsev, Sergey; Vanhaelen, Quentin; Aliper, Alexander; Vijg, Jan; Osipov, Andreyan N.; Labat, Ivan; West, Michael D.; Buzdin, Anton; Cantor, Charles R.; Nikolsky, Yuri; Borisov, Nikolay; Irincheeva, Irina; Khokhlovich, Edward; Sidransky, David; Camargo, Miguel Luiz; Zhavoronkov, Alex

    2016-01-01

    Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy. PMID:27848968

  12. Integrative approach for inference of gene regulatory networks using lasso-based random featuring and application to psychiatric disorders.

    PubMed

    Kim, Dongchul; Kang, Mingon; Biswas, Ashis; Liu, Chunyu; Gao, Jean

    2016-08-10

    Inferring gene regulatory networks is one of the most interesting research areas in systems biology. Many inference methods have been developed by using a variety of computational models and approaches. However, there are two issues to solve. First, depending on the structural or computational model of the inference method, the results tend to be inconsistent due to innately different advantages and limitations of the methods. Therefore, a combination of dissimilar approaches is needed to overcome the limitations of standalone methods through complementary integration. Second, sparse linear regression penalized by a regularization parameter (lasso) and bootstrapping-based sparse linear regression methods have been suggested in state of the art methods for network inference, but they are not effective for data with a small sample size, and a true regulator could be missed if the target gene is strongly affected by an indirect regulator with high correlation or by another true regulator. We present two novel network inference methods based on the integration of three different criteria, (i) z-score to measure the variation of gene expression from knockout data, (ii) mutual information for the dependency between two genes, and (iii) linear regression-based feature selection. Based on these criteria, we propose a lasso-based random feature selection algorithm (LARF) to achieve better performance, overcoming the limitations of bootstrapping mentioned above. In this work, there are three main contributions. First, our z score-based method to measure gene expression variations from knockout data is more effective than similar criteria of related works. Second, we confirmed that true regulator selection can be effectively improved by LARF. Lastly, we verified that an integrative approach can clearly outperform a single method when two different methods are effectively joined.
In the experiments, our methods were validated by outperforming the state of the art methods on DREAM challenge data, and then LARF was applied to inferences of gene regulatory network associated with psychiatric disorders.

  13. Functional characterisation of metal(loid) processes in planta through the integration of synchrotron techniques and plant molecular biology

    PubMed Central

    Donner, Erica; Punshon, Tracy; Guerinot, Mary Lou; Lombi, Enzo

    2013-01-01

    Functional characterisation of the genes regulating metal(loid) homeostasis in plants is a major focus of crop biofortification, phytoremediation, and food security research. This paper focuses on the potential for advancing plant metal(loid) research by combining molecular biology and synchrotron-based techniques. Recent advances in x-ray focussing optics and fluorescence detection have greatly improved the potential of synchrotron techniques for plant science research, allowing metal(loids) to be imaged in vivo in hydrated plant tissues at sub-micron resolution. Laterally resolved metal(loid) speciation can also be determined. By using molecular techniques to probe the location of gene expression and protein localisation and combining it with this synchrotron-derived data, functional information can be effectively and efficiently assigned to specific genes. This paper provides a review of the state of the art in this field, and provides examples as to how synchrotron-based methods can be combined with molecular techniques to facilitate functional characterisation of genes in planta. PMID:22200921

  14. Constructing an integrated gene similarity network for the identification of disease genes.

    PubMed

    Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin

    2017-09-20

    Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from the IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/.
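
    The random walk with restart step named in this abstract can be sketched on a single small network. This is not the RWRB implementation: the bilayer phenotype-gene network and the fused similarity weights are omitted, and the function name `random_walk_with_restart`, the restart probability of 0.5, and the four-node path graph are assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol.

    adj:   symmetric adjacency (or similarity) matrix as a list of lists
    seeds: indices of the known disease genes where the walk restarts
    """
    n = len(adj)
    # Column-normalise so each node spreads its probability over its neighbours.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed gene.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seeds=[0])
```

    Candidates are then ranked by their stationary score; in this toy graph the score decays with distance from the seed, so nodes closer to known disease genes rank higher.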

  15. UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets.

    PubMed

    Abu-Jamous, Basel; Fa, Rui; Roberts, David J; Nandi, Asoke K

    2015-06-04

    Collective analysis of the increasing number of emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts.
Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.

  16. Robust diagnosis of non-Hodgkin lymphoma phenotypes validated on gene expression data from different laboratories.

    PubMed

    Bhanot, Gyan; Alexe, Gabriela; Levine, Arnold J; Stolovitzky, Gustavo

    2005-01-01

    A major challenge in cancer diagnosis from microarray data is the need for robust, accurate classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers, is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.

  17. A multistage gene normalization system integrating multiple effective methods.

    PubMed

    Li, Lishuang; Liu, Shanshan; Li, Lihua; Fan, Wenting; Huang, Degen; Zhou, Huiwei

    2013-01-01

    Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems.

  18. MorphDB: Prioritizing Genes for Specialized Metabolism Pathways and Gene Ontology Categories in Plants.

    PubMed

    Zwaenepoel, Arthur; Diels, Tim; Amar, David; Van Parys, Thomas; Shamir, Ron; Van de Peer, Yves; Tzfadia, Oren

    2018-01-01

    Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.

  19. Capturing the Alternative Cleavage and Polyadenylation Sites of 14 NAC Genes in Populus Using a Combination of 3'-RACE and High-Throughput Sequencing.

    PubMed

    Wang, Haoran; Wang, Mingxiu; Cheng, Qiang

    2018-03-08

    Detection of complex splice sites (SSs) and polyadenylation sites (PASs) of eukaryotic genes is essential for the elucidation of gene regulatory mechanisms. Transcriptome-wide studies using high-throughput sequencing (HTS) have revealed prevalent alternative splicing (AS) and alternative polyadenylation (APA) in plants. However, small-scale and high-depth HTS aimed at detecting genes or gene families are very few and limited. We explored a convenient and flexible method for profiling SSs and PASs, which combines rapid amplification of 3'-cDNA ends (3'-RACE) and HTS. Fourteen NAC (NAM, ATAF1/2, CUC2) transcription factor genes of Populus trichocarpa were analyzed by 3'-RACE-seq. Based on experimental reproducibility, boundary sequence analysis and reverse transcription PCR (RT-PCR) verification, only canonical SSs were considered to be authentic. Based on stringent criteria, candidate PASs without any internal priming features were chosen as authentic PASs and assumed to be PAS-rich markers. Thirty-four novel canonical SSs, six intronic/internal exons and thirty 3'-UTR PAS-rich markers were revealed by 3'-RACE-seq. Using 3'-RACE and real-time PCR, we confirmed that three APA transcripts ending in/around PAS-rich markers were differentially regulated in response to plant hormones. Our results indicate that 3'-RACE-seq is a robust and cost-effective method to discover SSs and label active regions subjected to APA for genes or gene families. The method is suitable for small-scale AS and APA research in the initial stage.

  20. Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities

    PubMed Central

    2011-01-01

    Background Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models that do not include the TFA layer, thus missing a key regulatory element. Results This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network, including the predicted new regulatory links, shows promising biological significance. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress. 
    In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by using our reconstructed network. Conclusions The GTRNetwork algorithm introduces the hidden layer TFA into classic relevance score-based gene regulatory network reconstruction processes. Integrating the TFA biological information with regulatory network reconstruction algorithms significantly improves detection of new links and reduces the rate of false positives. The application of GTRNetwork on E. coli gene transcriptome data gives a set of potential regulatory links with promising biological significance for isobutanol stress and other conditions. PMID:21668997

  1. Gene Expression-Based Survival Prediction in Lung Adenocarcinoma: A Multi-Site, Blinded Validation Study

    PubMed Central

    Shedden, Kerby; Taylor, Jeremy M.G.; Enkemann, Steve A.; Tsao, Ming S.; Yeatman, Timothy J.; Gerald, William L.; Eschrich, Steve; Jurisica, Igor; Venkatraman, Seshan E.; Meyerson, Matthew; Kuick, Rork; Dobbin, Kevin K.; Lively, Tracy; Jacobson, James W.; Beer, David G.; Giordano, Thomas J.; Misek, David E.; Chang, Andrew C.; Zhu, Chang Qi; Strumpf, Dan; Hanash, Samir; Shepherd, Francis A.; Ding, Kuyue; Seymour, Lesley; Naoki, Katsuhiko; Pennell, Nathan; Weir, Barbara; Verhaak, Roel; Ladd-Acosta, Christine; Golub, Todd; Gruidl, Mike; Szoke, Janos; Zakowski, Maureen; Rusch, Valerie; Kris, Mark; Viale, Agnes; Motoi, Noriko; Travis, William; Sharma, Anupama

    2009-01-01

    Although prognostic gene expression signatures for survival in early stage lung cancer have been proposed, for clinical application it is critical to establish their performance across different subject populations and in different laboratories. Here we report a large, training-testing, multi-site blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses proposed examined whether microarray measurements of gene expression either alone or combined with basic clinical covariates (stage, age, sex) can be used to predict overall survival in lung cancer subjects. Several models examined produced risk scores that substantially correlated with actual subject outcome. Most methods performed better with clinical data, supporting the combined use of clinical and molecular information when building prognostic models for early stage lung cancer. This study also provides the largest available set of microarray data with extensive pathological and clinical annotation for lung adenocarcinomas. PMID:18641660

  2. Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences

    PubMed Central

    Huynen, Martijn; Snel, Berend; Lathe, Warren; Bork, Peer

    2000-01-01

    Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes. PMID:10958638

  3. Combined sequence and sequence-structure-based methods for analyzing RAAS gene SNPs: a computational approach.

    PubMed

    Singh, Kh Dhanachandra; Karthikeyan, Muthusamy

    2014-12-01

    The renin-angiotensin-aldosterone system (RAAS) plays a key role in the regulation of blood pressure (BP). Mutations in the genes that encode components of the RAAS have played a significant role in genetic susceptibility to hypertension and have been intensively scrutinized. The identification of such probably causal mutations not only provides insight into the RAAS but may also yield antihypertensive therapeutic targets and diagnostic markers. Analyzing a huge dataset containing both functional and neutral SNPs is challenging, because an experimental approach would have to test every SNP to determine its biological significance. To explore the functional significance of genetic mutations (SNPs), we adopted a combined sequence and sequence-structure-based SNP analysis algorithm. Out of 3864 SNPs reported in dbSNP, we found 108 missense SNPs in the coding region and the remainder in the non-coding region. In this study, we report a coding-region SNP as deleterious only when three or more tools predict it to be deleterious and it has a high RMSD from the native structure. Based on these analyses, we identified two SNPs of the REN gene, eight SNPs of the AGT gene, three SNPs of the ACE gene, two SNPs of the AT1R gene, three SNPs of the CYP11B2 gene and three SNPs of the CMA1 gene in the coding region as deleterious. Further, this type of study will help reduce the cost and time needed to identify potential SNPs and help select potential SNPs for experimental study out of the SNP pool.

  4. Combined SOM-portrayal of gene expression and DNA methylation landscapes disentangles modes of epigenetic regulation in glioblastoma.

    PubMed

    Hopp, Lydia; Löffler-Wirth, Henry; Galle, Jörg; Binder, Hans

    2018-06-11

    We present here a novel method that enables unraveling the interplay between gene expression and DNA methylation in complex diseases such as cancer. The method is based on self-organizing maps and allows for analysis of data landscapes from 'governed by methylation' to 'governed by expression'. We identified regulatory modules of coexpressed and comethylated genes in high-grade gliomas: two modes are governed by genes hypermethylated and underexpressed in IDH-mutated cases, while two other modes reflect immune and stromal signatures in the classical and mesenchymal subtypes. A fifth mode with proneural characteristics comprises genes of repressed and poised chromatin states active in healthy brain. Two additional modes enrich genes either in active or repressed chromatin states. The method disentangles the interplay between gene expression and methylation. It has the potential to integrate also mutation and copy number data and to apply to large sample cohorts.

  5. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, K; Ivanova, N; Barry, Kerrie

    2007-01-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

  6. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri

    2006-12-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

  7. Promoter library-based module combination (PLMC) technology for optimization of threonine biosynthesis in Corynebacterium glutamicum.

    PubMed

    Wei, Liang; Xu, Ning; Wang, Yiran; Zhou, Wei; Han, Guoqiang; Ma, Yanhe; Liu, Jun

    2018-05-01

    Due to the lack of efficient control elements and tools, the fine-tuning of gene expression in multi-gene metabolic pathways is still a great challenge for engineering microbial cell factories, especially for the important industrial microorganism Corynebacterium glutamicum. In this study, the promoter library-based module combination (PLMC) technology was developed to efficiently optimize the expression of genes in C. glutamicum. A random promoter library was designed to contain the putative -10 (NNTANANT) and -35 (NNGNCN) consensus motifs, and refined through a three-step screening procedure to achieve numerous genetic control elements with different strength levels, including fluorescence-activated cell sorting (FACS) screening, agar plate screening, and 96-well plate screening. Multiple conventional strategies were employed for further precise characterizations of the promoter library, such as real-time quantitative PCR, sodium dodecyl sulfate polyacrylamide gel electrophoresis, FACS analysis, and the lacZ reporter system. These results suggested that the established promoter elements effectively regulated gene expression and showed varying strengths over a wide range. Subsequently, a multi-module combination technology was created based on the efficient promoter elements for combination and optimization of modules in the multi-gene pathways. Using this technology, the threonine biosynthesis pathway was reconstructed and optimized by predictable tuning expression of five modules in C. glutamicum. The threonine titer of the optimized strain was significantly improved to 12.8 g/L, approximately 6.1-fold higher than that of the control strain. Overall, the PLMC technology presented in this study provides a rapid and effective method for combination and optimization of multi-gene pathways in C. glutamicum.
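    The degenerate motifs quoted in this abstract can be read as IUPAC patterns in which N stands for any base. The following is a minimal sketch, not the authors' pipeline, of how such a motif could be located in a sequence; the helper names and the example sequences are hypothetical.

```python
import re

# Only the symbols used by the two motifs in the abstract are mapped here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Translate a degenerate motif such as 'NNTANANT' into a regex string."""
    return "".join(IUPAC[base] for base in motif)


def find_motif(sequence, motif):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [match.start() for match in pattern.finditer(sequence)]


if __name__ == "__main__":
    print(find_motif("GGTATAATCC", "NNTANANT"))  # hypothetical -10 region
    print(find_motif("TTGACA", "NNGNCN"))        # hypothetical -35 region
```

    Because most positions are N, many sequences satisfy each motif; that is consistent with the abstract's idea of a random library whose members differ in strength.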

  8. Assembly and features of secondary metabolite biosynthetic gene clusters in Streptomyces ansochromogenes.

    PubMed

    Zhong, Xingyu; Tian, Yuqing; Niu, Guoqing; Tan, Huarong

    2013-07-01

    A draft genome sequence of Streptomyces ansochromogenes 7100 was generated using 454 sequencing technology. In combination with local BLAST searches and gap filling techniques, a comprehensive antiSMASH-based method was adopted to assemble the secondary metabolite biosynthetic gene clusters in the draft genome of S. ansochromogenes. A total of at least 35 putative gene clusters were identified and assembled. Transcriptional analysis showed that 20 of the 35 gene clusters were expressed in either or all of the three different media tested, whereas the other 15 gene clusters were silent in all three different media. This study provides a comprehensive method to identify and assemble secondary metabolite biosynthetic gene clusters in draft genomes of Streptomyces, and will significantly promote functional studies of these secondary metabolite biosynthetic gene clusters.

  9. Prior knowledge driven Granger causality analysis on gene regulatory network discovery

    DOE PAGES

    Yao, Shun; Yoo, Shinjae; Yu, Dantong

    2015-08-28

    Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, the number of available time points (T) usually is much smaller than the number of target genes (n) in biological datasets. The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n>>T. In this study, we proposed a new method, viz., CGC-2SPR (CGC using two-step prior Ridge regularization) to resolve the problem by incorporating prior biological knowledge about a target gene data set. In our simulation experiments, the proposed new methodology CGC-2SPR showed significant performance improvement in terms of accuracy over other widely used GC modeling (PGC, Ridge and Lasso) and MI-based (MRNET and ARACNE) methods. In addition, we applied CGC-2SPR to a real biological dataset, i.e., the yeast metabolic cycle, and discovered more true positive edges with CGC-2SPR than with the other existing methods. In our research, we noticed a “1+1>2” effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on causality networks, we made a functional prediction that the Abm1 gene (its functions previously were unknown) might be related to the yeast’s responses to different levels of glucose. In conclusion, our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction in system biology. Furthermore, we proposed a method of Monte Carlo significance estimation (MCSE) to calculate the edge significances which provide statistical meanings to the discovered causality networks. All of our data and source codes will be available under the link https://bitbucket.org/dtyu/granger-causality/wiki/Home.

  10. Discovering semantic features in the literature: a foundation for building functional associations

    PubMed Central

    Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto

    2006-01-01

    Background Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. Results We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification or can be even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. Conclusion The presented method can create literature profiles from documents relevant to sets of genes. 
The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data. PMID:16438716

  11. SGFSC: speeding the gene functional similarity calculation based on hash tables.

    PubMed

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-11-04

    In recent years, many measures of gene functional similarity have been proposed and widely used in all kinds of essential research. These methods are mainly divided into two categories: pairwise approaches and group-wise approaches. However, a common problem with these methods is their time consumption, especially when measuring the gene functional similarities of a large number of gene pairs. The problem of computational efficiency for pairwise approaches is even more prominent because they are dependent on the combination of semantic similarity. Therefore, the efficient measurement of gene functional similarity remains a challenging problem. To speed current gene functional similarity calculation methods, a novel two-step computing strategy is proposed: (1) establish a hash table for each method to store essential information obtained from the Gene Ontology (GO) graph and (2) measure gene functional similarity based on the corresponding hash table. There is no need to traverse the GO graph repeatedly for each method with the help of the hash table. The analysis of time complexity shows that the computational efficiency of these methods is significantly improved. We also implement a novel Speeding Gene Functional Similarity Calculation tool, namely SGFSC, which is bundled with seven typical measures using our proposed strategy. Further experiments show the great advantage of SGFSC in measuring gene functional similarity on the whole genomic scale. The proposed strategy is successful in speeding current gene functional similarity calculation methods. SGFSC is an efficient tool that is freely available at http://nclab.hit.edu.cn/SGFSC . The source code of SGFSC can be downloaded from http://pan.baidu.com/s/1dFFmvpZ .
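    The two-step strategy described above (precompute a hash table from the GO graph, then answer similarity queries from the table) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the SGFSC implementation: the toy term names are hypothetical, the graph is assumed to be acyclic, and the Jaccard overlap of ancestor-closed term sets is only a simple stand-in for the seven measures the tool bundles.

```python
def build_ancestor_table(parents):
    """Step 1: map each term to the frozenset of itself plus all its ancestors.

    `parents` maps a term to its direct parents and must describe a DAG.
    Each term is expanded once; later lookups are plain dictionary reads.
    """
    table = {}

    def visit(term):
        if term not in table:
            closure = {term}
            for parent in parents.get(term, ()):
                closure |= visit(parent)
            table[term] = frozenset(closure)
        return table[term]

    for term in parents:
        visit(term)
    return table


def gene_similarity(terms_a, terms_b, table):
    """Step 2: Jaccard overlap of the ancestor-closed annotation sets."""
    closed_a = set().union(*(table[t] for t in terms_a))
    closed_b = set().union(*(table[t] for t in terms_b))
    union = closed_a | closed_b
    return len(closed_a & closed_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical miniature ontology, not real GO identifiers.
    toy_parents = {
        "root": [],
        "metabolism": ["root"],
        "signaling": ["root"],
        "lipid_metabolism": ["metabolism"],
        "sugar_metabolism": ["metabolism"],
    }
    table = build_ancestor_table(toy_parents)
    print(gene_similarity(["lipid_metabolism"], ["sugar_metabolism"], table))
```

    The point of the design is that the graph traversal happens once per term during table construction, so comparing many gene pairs afterwards never walks the graph again.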

  12. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
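    The combination step mentioned above, the weighted Z-method, can be sketched in a few lines. This is a generic illustration that assumes one-sided p-values strictly between 0 and 1 and takes the weights as given; how SGSE derives its weights from PC variances and Tracy-Widom p-values is not reproduced here, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method.

    Each p-value is converted to a standard normal score, the scores are
    averaged with the given weights, and the result is converted back.
    """
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("p_values and weights must be non-empty and equal in length")
    std_normal = NormalDist()
    # Small p-values map to large positive Z-scores.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return 1.0 - std_normal.cdf(numerator / denominator)


if __name__ == "__main__":
    # Hypothetical PC-level p-values and weights for one gene set.
    print(weighted_z_combine([0.01, 0.20, 0.60], [0.5, 0.3, 0.2]))
```

    A component with a larger weight pulls the combined p-value toward its own evidence, which is how down-weighting noisy components can suppress their influence.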

  13. Prediction of regulatory gene pairs using dynamic time warping and gene ontology.

    PubMed

    Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K

    2014-01-01

    Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.
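    Dynamic Time Warping, one of the components combined in this approach, aligns two time series that may be shifted or stretched relative to each other. The sketch below shows only the textbook DTW distance with an absolute-difference cost; the DTW-GO distance and the KNN imputation step proposed by the authors are not specified in the abstract and are not reproduced. The example profiles are hypothetical.

```python
def dtw_distance(series_a, series_b):
    """Textbook dynamic time warping distance between two numeric sequences."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning the first i
    # points of series_a with the first j points of series_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # series_a advances, series_b point is reused
                cost[i][j - 1],      # series_b advances, series_a point is reused
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical expression profiles: the second is a stretched copy.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

    Unlike a point-by-point Euclidean distance, DTW accepts sequences of different lengths, which is why a stretched copy of a profile can still score as identical.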

  14. A combined enrichment/polymerase chain reaction based method for the routine screening of Streptococcus agalactiae in pregnant women.

    PubMed

    Munari, F M; De-Paris, F; Salton, G D; Lora, P S; Giovanella, P; Machado, A B M P; Laybauer, L S; Oliveira, K R P; Ferri, C; Silveira, J L S; Laurino, C C F C; Xavier, R M; Barth, A L; Echeverrigaray, S; Laurino, J P

    2012-01-01

    Group B Streptococcus (GBS) is the most common cause of life-threatening infection in neonates. Guidelines from CDC recommend universal screening of pregnant women for rectovaginal GBS colonization. The objective of this study was to compare the performance of a combined enrichment/PCR based method targeting the atr gene in relation to culture using enrichment with selective broth medium (standard method) to identify the presence of GBS in pregnant women. Rectovaginal GBS samples from women at ≥36 weeks of pregnancy were obtained with a swab and analyzed by the two methods. A total of 89 samples were evaluated. The prevalence of positive results for GBS detection was considerably higher when assessed by the combined enrichment/PCR method than with the standard method (35.9% versus 22.5%, respectively). The results demonstrated that the use of selective enrichment broth followed by PCR targeting the atr gene is a highly sensitive, specific and accurate test for GBS screening in pregnant women, allowing the detection of the bacteria even in lightly colonized patients. This PCR methodology may provide a useful diagnostic tool for GBS detection and contribute to more accurate and effective intrapartum antibiotic use and to lower newborn mortality and morbidity.

  15. Reference gene selection for quantitative gene expression studies during biological invasions: A test on multiple genes and tissues in a model ascidian Ciona savignyi.

    PubMed

    Huang, Xuena; Gao, Yangchun; Jiang, Bei; Zhou, Zunchun; Zhan, Aibin

    2016-01-15

    As invasive species have successfully colonized a wide range of dramatically different local environments, they offer a good opportunity to study interactions between species and rapidly changing environments. Gene expression represents one of the primary and crucial mechanisms for rapid adaptation to local environments. Here, we aim to select reference genes for quantitative gene expression analysis based on quantitative Real-Time PCR (qRT-PCR) for a model invasive ascidian, Ciona savignyi. We analyzed the stability of ten candidate reference genes in three tissues (siphon, pharynx and intestine) under two key environmental stresses (temperature and salinity) in the marine realm based on three programs (geNorm, NormFinder and delta Ct method). Our results demonstrated only minor differences in stability rankings among the three methods. The use of different single reference genes might influence data interpretation, while multiple reference genes could minimize possible errors. Therefore, reference gene combinations were recommended for different tissues - the optimal reference gene combination for siphon was RPS15 and RPL17 under temperature stress, and RPL17, UBQ and TubA under salinity treatment; for pharynx, TubB, TubA and RPL17 were the most stable genes under temperature stress, while TubB, TubA and UBQ were the best under salinity stress; for intestine, UBQ, RPS15 and RPL17 were the most reliable reference genes under both treatments. Our results suggest the necessity of selecting and testing reference genes for different tissues under varying environmental stresses. The results obtained here are expected to reveal mechanisms of gene expression-mediated invasion success using C. savignyi as a model species. Copyright © 2015 Elsevier B.V. All rights reserved.
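
    Of the three stability measures named above, the comparative delta Ct method is the simplest to sketch: for every pair of candidate genes, the standard deviation of their Ct difference across samples is computed, and a gene's stability score is the mean of those standard deviations over all its pairings (lower is more stable). The gene names and Ct values below are made up for illustration and are not data from the study; geNorm and NormFinder use different statistics and are not shown.

```python
from statistics import mean, stdev


def delta_ct_stability(ct_values):
    """Rank candidate reference genes by the comparative delta Ct method.

    ct_values maps a gene name to its Ct values, one per sample, with all
    genes measured on the same samples in the same order. At least two
    samples and two genes are needed. Returns (gene, score) pairs sorted
    from most stable (lowest score) to least stable.
    """
    stability = {}
    for gene, gene_ct in ct_values.items():
        pair_sds = []
        for other, other_ct in ct_values.items():
            if other == gene:
                continue
            deltas = [a - b for a, b in zip(gene_ct, other_ct)]
            pair_sds.append(stdev(deltas))
        stability[gene] = mean(pair_sds)
    return sorted(stability.items(), key=lambda item: item[1])


# Hypothetical Ct values for three candidate genes across three samples.
example_ct = {
    "gene_A": [20.0, 21.0, 22.0],
    "gene_B": [25.0, 26.0, 27.0],
    "gene_C": [18.0, 22.0, 19.0],
}
ranking = delta_ct_stability(example_ct)
```

    In this toy input, gene_A and gene_B move in lockstep across samples while gene_C fluctuates independently, so gene_C receives the worst (highest) score.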

  16. Prevalence of extended-spectrum beta-lactamases-producing microorganisms in nosocomial patients and molecular characterization of the shv type isolates

    PubMed Central

    de Oliveira, Caio Fernando; Salla, Adenilde; Lara, Valéria Maria; Rieger, Alexandre; Horta, Jorge André; Alves, Sydney Hartz

    2010-01-01

    The emergence of Extended-Spectrum Beta-Lactamase (ESBL)-producing microorganisms in Brazilian hospitals is a challenge that concerns scientists, clinicians and healthcare institutions due to the serious risk they pose to confined patients. The goal of this study was the detection of ESBL production by clinical strains of Escherichia coli and Klebsiella sp. isolated from pus, urine and blood of patients at Hospital Universitário Santa Maria, Rio Grande do Sul, RS, Brazil and the genotyping of the isolates based on bla SHV genes. The ESBL study was carried out using the Combined Disc Method, while Polymerase Chain Reaction (PCR) was used to study the bla SHV genes. Of the 90 tested isolates, 55 (61.1%) were identified as ESBL-producing by the combined disk method. The bla SHV genes were found in 67.8% of these microorganisms. K. pneumoniae predominated in the samples, presenting the highest frequency of positive results from the combined disk and PCR. PMID:24031491

  17. Thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis: a novel method of primer design for high-fidelity assembly of longer gene sequences

    PubMed Central

    Gao, Xinxin; Yo, Peggy; Keith, Andrew; Ragan, Timothy J.; Harris, Thomas K.

    2003-01-01

    A novel thermodynamically-balanced inside-out (TBIO) method of primer design was developed and compared with a thermodynamically-balanced conventional (TBC) method of primer design for PCR-based gene synthesis of codon-optimized gene sequences for the human protein kinase B-2 (PKB2; 1494 bp), p70 ribosomal S6 subunit protein kinase-1 (S6K1; 1622 bp) and phosphoinositide-dependent protein kinase-1 (PDK1; 1712 bp). Each of the 60mer TBIO primers coded for identical nucleotide regions that the 60mer TBC primers covered, except that half of the TBIO primers were reverse complement sequences. In addition, the TBIO and TBC primers contained identical regions of temperature-optimized primer overlaps. The TBC method was optimized to generate sequential overlapping fragments (∼0.4–0.5 kb) for each of the gene sequences, and simultaneous and sequential combinations of overlapping fragments were tested for their ability to be assembled under an array of PCR conditions. However, no fully synthesized gene sequences could be obtained by this approach. In contrast, the TBIO method generated an initial central fragment (∼0.4–0.5 kb), which could be gel purified and used for further inside-out bidirectional elongation by additional increments of 0.4–0.5 kb. By using the newly developed TBIO method of PCR-based gene synthesis, error-free synthetic genes for the human protein kinases PKB2, S6K1 and PDK1 were obtained with little or no corrective mutagenesis. PMID:14602936

  18. Toward the identification of causal genes in complex diseases: a gene-centric joint test of significance combining genomic and transcriptomic data.

    PubMed

    Charlesworth, Jac C; Peralta, Juan M; Drigalenko, Eugene; Göring, Harald Hh; Almasy, Laura; Dyer, Thomas D; Blangero, John

    2009-12-15

    Gene identification using linkage, association, or genome-wide expression is often underpowered. We propose that formal combination of information from multiple gene-identification approaches may lead to the identification of novel loci that are missed when only one form of information is available. Firstly, we analyze the Genetic Analysis Workshop 16 Framingham Heart Study Problem 2 genome-wide association data for HDL-cholesterol using a "gene-centric" approach. Then we formally combine the association test results with genome-wide transcriptional profiling data for high-density lipoprotein cholesterol (HDL-C), from the San Antonio Family Heart Study, using a Z-transform test (Stouffer's method). We identified 39 genes by the joint test at a conservative 1% false-discovery rate, including 9 from the significant gene-based association test and 23 whose expression was significantly correlated with HDL-C. Seven genes identified as significant in the joint test were not independently identified by either the association or expression tests. This combined approach has increased power and leads to the direct nomination of novel candidate genes likely to be involved in the determination of HDL-C levels. Such information can then be used as justification for a more exhaustive search for functional sequence variation within the nominated genes. We anticipate that this type of analysis will improve our speed of identification of regulatory genes causally involved in disease risk.

  19. Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification

    PubMed Central

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice. PMID:25823003

  20. Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification.

    PubMed

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.

  1. Detection of gene communities in multi-networks reveals cancer drivers

    NASA Astrophysics Data System (ADS)

    Cantini, Laura; Medico, Enzo; Fortunato, Santo; Caselle, Michele

    2015-12-01

    We propose a new multi-network-based strategy to integrate different layers of genomic information and use them in a coordinated way to identify driving cancer genes. The multi-networks that we consider combine transcription factor co-targeting, microRNA co-targeting, protein-protein interaction and gene co-expression networks. The rationale behind this choice is that gene co-expression and protein-protein interactions require a tight coregulation of the partners and that such a fine-tuned regulation can be obtained only by combining both the transcriptional and post-transcriptional layers of regulation. To extract the relevant biological information from the multi-network we studied its partition into communities. To this end we applied a consensus clustering algorithm based on state-of-the-art community detection methods. Even if our procedure is valid in principle for any pathology, in this work we concentrate on gastric, lung, pancreas and colorectal cancer and identified from the enrichment analysis of the multi-network communities a set of candidate driver cancer genes. Some of them were already known oncogenes while a few are new. The combination of the different layers of information allowed us to extract from the multi-network indications on the regulatory pattern and functional role of both the already known and the new candidate driver genes.

  2. Domain selection combined with improved cloning strategy for high throughput expression of higher eukaryotic proteins

    PubMed Central

    Chen, Yunjia; Qiu, Shihong; Luan, Chi-Hao; Luo, Ming

    2007-01-01

    Background Expression of higher eukaryotic genes as soluble, stable recombinant proteins is still a bottleneck step in biochemical and structural studies of novel proteins today. Correct identification of stable domains/fragments within the open reading frame (ORF), combined with proper cloning strategies, can greatly enhance the success rate when higher eukaryotic proteins are expressed as these domains/fragments. Furthermore, a HTP cloning pipeline incorporated with bioinformatics domain/fragment selection methods will be beneficial to studies of structure and function genomics/proteomics. Results With bioinformatics tools, we developed a domain/domain boundary prediction (DDBP) method, which was trained by available experimental data. Combined with an improved cloning strategy, DDBP was applied to 57 proteins from C. elegans. Expression and purification results showed there was a 10-fold increase in terms of obtaining purified proteins. Based on the DDBP method, the improved GATEWAY cloning strategy and a robotic platform, we constructed a high throughput (HTP) cloning pipeline, including PCR primer design, PCR, BP reaction, transformation, plating, colony picking and entry clone extraction, which has been successfully applied to 90 C. elegans genes, 88 Brucella genes, and 188 human genes. More than 97% of the targeted genes were obtained as entry clones. This pipeline has a modular design and can adopt different operations for a variety of cloning/expression strategies. Conclusion The DDBP method and improved cloning strategy were satisfactory. The cloning pipeline, combined with our recombinant protein HTP expression pipeline and the crystal screening robots, constitutes a complete platform for structure genomics/proteomics. This platform will increase the success rate of purification and crystallization dramatically and promote the further advancement of structure genomics/proteomics. PMID:17663785

  3. An automatic and efficient pipeline for disease gene identification through utilizing family-based sequencing data.

    PubMed

    Song, Dandan; Li, Ning; Liao, Lejian

    2015-01-01

    Due to the generation of enormous amounts of data at lower costs and in shorter times, whole-exome sequencing technologies provide dramatic opportunities for identifying disease genes implicated in Mendelian disorders. Since upwards of thousands of genomic variants can be sequenced in each exome, it is challenging to filter pathogenic variants in protein coding regions and reduce the number of missing true variants. Therefore, an automatic and efficient pipeline for finding disease variants in Mendelian disorders is designed by exploiting a combination of variant filtering steps to analyze the family-based exome sequencing approach. Recent studies on the Freeman-Sheldon disease are revisited and show that the proposed method outperforms other existing candidate gene identification methods.

  4. Essential protein discovery based on a combination of modularity and conservatism.

    PubMed

    Zhao, Bihai; Wang, Jianxin; Li, Xueyong; Wu, Fang-Xiang

    2016-11-01

    Essential proteins are indispensable for the survival of a living organism and play important roles in the emerging field of synthetic biology. Many computational methods have been proposed to identify essential proteins by using the topological features of interactome networks. However, most of these methods ignored the intrinsic biological meaning of proteins. Research shows that essentiality is tied not only to the protein or gene itself, but also to the molecular modules to which that protein belongs. The results of this study reveal the modularity of essential proteins. On the other hand, essential proteins are more evolutionarily conserved than nonessential proteins and frequently bind each other. That is to say, conservatism is another important feature of essential proteins. Multiple networks are constructed by integrating protein-protein interaction (PPI) networks, time course gene expression data and protein domain information. Based on these networks, a new essential protein identification method is proposed based on a combination of modularity and conservatism of proteins. Experimental results show that the proposed method outperforms other essential protein identification methods in terms of the number of essential proteins among the top-ranked candidates. Copyright © 2016. Published by Elsevier Inc.

  5. Improved Classification of Lung Cancer Using Radial Basis Function Neural Network with Affine Transforms of Voss Representation.

    PubMed

    Adetiba, Emmanuel; Olugbara, Oludayo O

    2015-01-01

    Lung cancer is one of the diseases responsible for a large number of cancer related death cases worldwide. The recommended standard for screening and early detection of lung cancer is the low dose computed tomography. However, many patients diagnosed die within one year, which makes it essential to find alternative approaches for screening and early detection of lung cancer. We present computational methods that can be implemented in a functional multi-genomic system for classification, screening and early detection of lung cancer victims. Samples of top ten biomarker genes previously reported to have the highest frequency of lung cancer mutations and sequences of normal biomarker genes were respectively collected from the COSMIC and NCBI databases to validate the computational methods. Experiments were performed based on the combinations of Z-curve and tetrahedron affine transforms, Histogram of Oriented Gradient (HOG), Multilayer perceptron and Gaussian Radial Basis Function (RBF) neural networks to obtain an appropriate combination of computational methods to achieve improved classification of lung cancer biomarker genes. Results show that a combination of affine transforms of Voss representation, HOG genomic features and Gaussian RBF neural network perceptibly improves classification accuracy, specificity and sensitivity of lung cancer biomarker genes as well as achieving low mean square error.
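
    For readers unfamiliar with the Voss representation mentioned above, it maps a nucleotide sequence to four binary indicator sequences, one per base. A minimal sketch follows; the function name is hypothetical, and the affine transforms, HOG features and neural network stages of the authors' pipeline are not shown because the record does not describe them in enough detail.

```python
def voss_representation(sequence):
    """Map a DNA string to four binary indicator sequences (A, C, G, T).

    Position i of the indicator for base X is 1 when the nucleotide at
    position i is X, and 0 otherwise. Characters other than A, C, G, T
    (for example N) yield 0 in all four indicators.
    """
    sequence = sequence.upper()
    return {
        base: [1 if nucleotide == base else 0 for nucleotide in sequence]
        for base in "ACGT"
    }


# Hypothetical short sequence for illustration.
indicators = voss_representation("ACGTA")
```

    The four indicator sequences turn a symbolic sequence into numeric signals, which is what makes downstream numeric feature extraction possible.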

  6. PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION.

    PubMed

    Elati, Mohamed; Nicolle, Rémy; Junier, Ivan; Fernández, David; Fekih, Rim; Font, Julio; Képès, François

    2013-02-01

    Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.

  7. A versatile genome-scale PCR-based pipeline for high-definition DNA FISH.

    PubMed

    Bienko, Magda; Crosetto, Nicola; Teytelman, Leonid; Klemm, Sandy; Itzkovitz, Shalev; van Oudenaarden, Alexander

    2013-02-01

    We developed a cost-effective genome-scale PCR-based method for high-definition DNA FISH (HD-FISH). We visualized gene loci with diffraction-limited resolution, chromosomes as spot clusters and single genes together with transcripts by combining HD-FISH with single-molecule RNA FISH. We provide a database of over 4.3 million primer pairs targeting the human and mouse genomes that is readily usable for rapid and flexible generation of probes.

  8. Nonviral vectors for cancer gene therapy: prospects for integrating vectors and combination therapies.

    PubMed

    Ohlfest, John R; Freese, Andrew B; Largaespada, David A

    2005-12-01

    Gene therapy has the potential to improve the clinical outcome of many cancers by transferring therapeutic genes into tumor cells or normal host tissue. Gene transfer into tumor cells or tumor-associated stroma is being employed to induce tumor cell death, stimulate anti-tumor immune response, inhibit angiogenesis, and control tumor cell growth. Viral vectors have been used to achieve this proof of principle in animal models and, in select cases, in human clinical trials. Nevertheless, there has been considerable interest in developing nonviral vectors for cancer gene therapy. Nonviral vectors are simpler, more amenable to large-scale manufacture, and potentially safer for clinical use. Nonviral vectors were once limited by low gene transfer efficiency and transient or steadily declining gene expression. However, recent improvements in plasmid-based vectors and delivery methods are showing promise in circumventing these obstacles. This article reviews the current status of nonviral cancer gene therapy, with an emphasis on combination strategies, long-term gene transfer using transposons and bacteriophage integrases, and future directions.

  9. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.

    PubMed

    Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T

    2017-10-01

    Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

  10. Lung tumor diagnosis and subtype discovery by gene expression profiling.

    PubMed

    Wang, Lu-yong; Tu, Zhuowen

    2006-01-01

    The optimal treatment of patients with complex diseases, such as cancers, depends on accurate diagnosis using a combination of clinical and histopathological data. In many scenarios, this becomes tremendously difficult because of the limitations in clinical presentation and histopathology. To accurately diagnose complex diseases, molecular classification based on gene or protein expression profiles is indispensable for modern medicine. Moreover, many heterogeneous diseases consist of various potential subtypes on a molecular basis and differ remarkably in their response to therapies. It is critical to accurately predict subgroups from disease gene expression profiles. More fundamental knowledge of the molecular basis and classification of disease could aid in the prediction of patient outcome, the informed selection of therapies, and identification of novel molecular targets for therapy. In this paper, we propose a new disease diagnostic method, the probabilistic boosting tree (PB tree) method, on gene expression profiles of lung tumors. It enables accurate disease classification and subtype discovery in disease. It automatically constructs a tree in which each node combines a number of weak classifiers into a strong classifier. Also, subtype discovery is naturally embedded in the learning process. Our algorithm achieves excellent diagnostic performance, and meanwhile it is capable of detecting the disease subtype based on gene expression profile.

  11. Phylogenetic Analysis of Shewanella Strains by DNA Relatedness Derived from Whole Genome Microarray DNA-DNA Hybridization and Comparison with Other Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy

    Phylogenetic analyses were done for the Shewanella strains isolated from Baltic Sea (38 strains), US DOE Hanford Uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and strains from other resources (16 strains) with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of 16S rRNA gene and gyrB gene, and sequence similarities of 6 loci of Shewanella genome selected from a shared gene list of the Shewanella strains with whole genomes sequenced, based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences, and DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains share exactly the same sub-clusters with very few exceptions, in which the strains were basically grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain level within the Shewanella genus. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and it is superior to the phylogenetics methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.

  12. Xander: employing a novel method for efficient gene-targeted metagenomic assembly

    DOE PAGES

    Wang, Qiong; Fish, Jordan A.; Gilman, Mariah; ...

    2015-08-05

    Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. In conclusion, Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.

  13. Feature weight estimation for gene selection: a local hyperlinear learning approach

    PubMed Central

    2014-01-01

    Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability with respect to various classification algorithms. PMID:24625071
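
    To make the starting point of this work concrete, the following is a minimal sketch of the basic RELIEF weighting scheme that LHR builds on, not of LHR itself: the local hyperplane approximation is not described in enough detail in the record to reproduce. The sketch assumes numeric features on comparable scales, a Manhattan distance, and at least two samples per class; it visits every sample once instead of sampling at random.

```python
def relief_weights(samples, labels):
    """Basic RELIEF feature weights (deterministic pass over all samples).

    For each sample, a feature gains weight when it differs from the
    nearest sample of another class (nearest miss) and loses weight when
    it differs from the nearest sample of the same class (nearest hit).
    """
    n_features = len(samples[0])
    weights = [0.0] * n_features

    def distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (sample, label) in enumerate(zip(samples, labels)):
        hits = [s for j, (s, l) in enumerate(zip(samples, labels))
                if l == label and j != i]
        misses = [s for s, l in zip(samples, labels) if l != label]
        nearest_hit = min(hits, key=lambda s: distance(sample, s))
        nearest_miss = min(misses, key=lambda s: distance(sample, s))
        for f in range(n_features):
            weights[f] += (abs(sample[f] - nearest_miss[f])
                           - abs(sample[f] - nearest_hit[f]))
    return [w / len(samples) for w in weights]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
toy_samples = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
toy_labels = [0, 0, 1, 1]
toy_weights = relief_weights(toy_samples, toy_labels)
```

    Because each update depends on a single nearest hit and a single nearest miss, one noisy neighbor can swing the weights, which is the instability the abstract attributes to RELIEF-based methods.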

  14. Xander: employing a novel method for efficient gene-targeted metagenomic assembly.

    PubMed

    Wang, Qiong; Fish, Jordan A; Gilman, Mariah; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

    2015-01-01

    Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines. This method is implemented as open source software and is available at https://github.com/rdpstaff/Xander_assembler.

  15. Predicting miRNA targets for head and neck squamous cell carcinoma using an ensemble method.

    PubMed

    Gao, Hong; Jin, Hui; Li, Guijun

    2018-01-01

    This study aimed to uncover potential microRNA (miRNA) targets in head and neck squamous cell carcinoma (HNSCC) using an ensemble method which combined 3 different methods: Pearson's correlation coefficient (PCC), Lasso and a causal inference method (i.e., intervention calculus when the directed acyclic graph (DAG) is absent [IDA]), based on Borda count election. The Borda count election method was used to integrate the top 100 predicted targets of each miRNA generated by individual methods. Afterwards, to validate the performance ability of our method, we checked the TarBase v6.0, miRecords v2013, miRWalk v2.0 and miRTarBase v4.5 databases to validate predictions for miRNAs. Pathway enrichment analysis of target genes in the top 1,000 miRNA-messenger RNA (mRNA) interactions was conducted to focus on significant KEGG pathways. Finally, we extracted target genes based on occurrence frequency ≥3. Based on an absolute value of PCC >0.7, we found 33 miRNAs and 288 mRNAs for further analysis. We extracted 10 target genes with predicted frequencies not less than 3. The target gene MYO5C possessed the highest frequency, which was predicted by 7 different miRNAs. Significantly, a total of 8 pathways were identified; the pathways of cytokine-cytokine receptor interaction and chemokine signaling pathway were the most significant. We successfully predicted target genes and pathways for HNSCC relying on miRNA expression data, mRNA expression profile, an ensemble method and pathway information. Our results may offer new information for the diagnosis and estimation of the prognosis of HNSCC.
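
    The Borda count integration described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `borda_merge`, the scoring rule (a target at 0-based position i in a top-k list earns k - i points, absent targets earn nothing) and the alphabetical tie-break are choices made here, not details confirmed by the abstract.

```python
from collections import defaultdict


def borda_merge(rankings, top_k=100):
    """Merge best-first ranked lists with a Borda count (illustrative sketch).

    rankings: one ranked list of candidate targets per method.
    Returns the candidates ordered by total Borda score, highest first.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        for position, target in enumerate(ranking[:top_k]):
            scores[target] += top_k - position
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical rankings from three methods for one miRNA.
pcc = ["A", "B", "C"]
lasso = ["B", "A", "D"]
ida = ["B", "C", "A"]
merged = borda_merge([pcc, lasso, ida], top_k=3)  # ["B", "A", "C", "D"]
```

    A rank-based vote of this kind sidesteps the problem that correlation coefficients, Lasso coefficients and causal-effect estimates are not on a common scale.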

  16. Methods for simultaneous control of lignin content and composition, and cellulose content in plants

    DOEpatents

    Chiang, Vincent Lee C.; Li, Laigeng

    2005-02-15

    A method of concurrently introducing multiple genes into plants and trees is provided. The method includes simultaneous transformation of plants with multiple genes from the phenylpropanoid pathways including 4CL, CAld5H, AldOMT, SAD and CAD genes and combinations thereof to produce various lines of transgenic plants displaying altered agronomic traits. The agronomic traits of the plants are regulated by the orientation of the specific genes and the selected gene combinations, which are incorporated into the plant genome.

  17. Integrated rare variant-based risk gene prioritization in disease case-control sequencing studies.

    PubMed

    Lin, Jhih-Rong; Zhang, Quanwei; Cai, Ying; Morrow, Bernice E; Zhang, Zhengdong D

    2017-12-01

    Rare variants of major effect play an important role in human complex diseases and can be discovered by sequencing-based genome-wide association studies. Here, we introduce an integrated approach that combines the rare variant association test with gene network and phenotype information to identify risk genes implicated by rare variants for human complex diseases. Our data integration method follows a 'discovery-driven' strategy without relying on prior knowledge about the disease and thus maintains the unbiased character of genome-wide association studies. Simulations reveal that our method can outperform a widely-used rare variant association test method by 2 to 3 times. In a case study of a small disease cohort, we uncovered putative risk genes and the corresponding rare variants that may act as genetic modifiers of congenital heart disease in 22q11.2 deletion syndrome patients. These variants were missed by a conventional approach that relied on the rare variant association test alone.

  18. Meganucleases and Other Tools for Targeted Genome Engineering: Perspectives and Challenges for Gene Therapy

    PubMed Central

    Silva, George; Poirot, Laurent; Galetto, Roman; Smith, Julianne; Montoya, Guillermo; Duchateau, Philippe; Pâques, Frédéric

    2011-01-01

    The importance of safer approaches for gene therapy has been underscored by a series of severe adverse events (SAEs) observed in patients involved in clinical trials for Severe Combined Immune Deficiency Disease (SCID) and Chronic Granulomatous Disease (CGD). While a new generation of viral vectors is in the process of replacing the classical gamma-retrovirus–based approach, a number of strategies have emerged based on non-viral vectorization and/or targeted insertion aimed at achieving safer gene transfer. Currently, these methods display lower efficacies than viral transduction although many of them can yield more than 1% engineered cells in vitro. Nuclease-based approaches, wherein an endonuclease is used to trigger site-specific genome editing, can significantly increase the percentage of targeted cells. These methods therefore provide a real alternative to classical gene transfer as well as gene editing. However, the first endonuclease to be in clinic today is not used for gene transfer, but to inactivate a gene (CCR5) required for HIV infection. Here, we review these alternative approaches, with a special emphasis on meganucleases, a family of naturally occurring rare-cutting endonucleases, and speculate on their current and future potential. PMID:21182466

  19. Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function

    PubMed Central

    2009-01-01

    Background A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. Either, one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes. Alternatively, one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined. Results We introduce a scoring function, Gene Set Z-score (GSZ), for the analysis of functional class over-representation that combines two previous analysis methods. GSZ encompasses popular functions such as correlation, hypergeometric test, Max-Mean and Random Sets as limiting cases. GSZ is stable against changes in class size as well as across different positions of the analysed gene list in tests with randomized data. GSZ shows the best overall performance in a detailed comparison to popular functions using artificial data. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which clearly reports better p-values for top classes than the other methods. Furthermore, GSZ detects relevant biological themes that are missed by the other methods. These observations also hold when comparing GSZ with popular program packages. Conclusion GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html. PMID:19775443
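
    The abstract above lists the hypergeometric test among the limiting cases of GSZ. For orientation, the sketch below computes a plain hypergeometric over-representation P-value for one functional class; it is not the GSZ score, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_genes, n_in_class, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw (illustrative sketch).

    n_genes:    all genes measured in the experiment
    n_in_class: genes annotated to the functional class
    n_selected: genes in the top of the ordered list
    n_overlap:  selected genes that belong to the class
    """
    total = comb(n_genes, n_selected)
    upper = min(n_in_class, n_selected)
    favourable = sum(
        comb(n_in_class, i) * comb(n_genes - n_in_class, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / total


# Hypothetical example: 10 genes, 5 in the class, top 5 selected, all 5 overlap.
p = hypergeom_upper_tail(10, 5, 5, 5)  # 1 / 252
```

    A test of this form depends on where the list is cut, which is the threshold dependence that a threshold-free score is meant to remove.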

  20. Reranking candidate gene models with cross-species comparison for improved gene prediction

    PubMed Central

    Liu, Qian; Crammer, Koby; Pereira, Fernando CN; Roos, David S

    2008-01-01

    Background Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models. PMID:18854050

  1. OncoBinder facilitates interpretation of proteomic interaction data by capturing coactivation pairs in cancer.

    PubMed

    Van Coillie, Samya; Liang, Lunxi; Zhang, Yao; Wang, Huanbin; Fang, Jing-Yuan; Xu, Jie

    2016-04-05

    High-throughput methods such as co-immunoprecipitation-mass spectrometry (coIP-MS) and yeast two-hybrid screening (Y2H) have suggested a broad range of unannotated protein-protein interactions (PPIs), and interpretation of these PPIs remains a challenging task. Advances in cancer genomics research allow for the inference of "coactivation pairs" in cancer, which may facilitate the identification of PPIs involved in cancer. Here we present OncoBinder as a tool for the assessment of proteomic interaction data based on the functional synergy of oncoproteins in cancer. This decision tree-based method combines gene mutation, copy number and mRNA expression information to infer the functional status of protein-coding genes. We applied OncoBinder to evaluate the potential binders of EGFR and ERK2 proteins based on the gastric cancer dataset of The Cancer Genome Atlas (TCGA). As a result, OncoBinder identified high confidence interactions (annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) or validated by low-throughput assays) more efficiently than a co-expression-based method. Taken together, our results suggest that evaluation of gene functional synergy in cancer may facilitate the interpretation of proteomic interaction data. The OncoBinder toolbox for Matlab is freely accessible online.

  2. Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods

    PubMed Central

    2014-01-01

    Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped into organism-specific ones based on the organism's genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene / protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct the metabolic networks from yeast gene expression data and compared its results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods. PMID:25374614

  3. Accuracy of different bioinformatics methods in detecting antibiotic resistance and virulence factors from Staphylococcus aureus whole genome sequences.

    PubMed

    Mason, Amy; Foster, Dona; Bradley, Phelim; Golubchik, Tanya; Doumith, Michel; Gordon, N Claire; Pichon, Bruno; Iqbal, Zamin; Staves, Peter; Crook, Derrick; Walker, A Sarah; Kearns, Angela; Peto, Tim

    2018-06-20

    Background: In principle, whole genome sequencing (WGS) can predict phenotypic resistance directly from genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. Methods: We compared three WGS-based bioinformatics methods (Genefinder (read-based), Mykrobe (de Bruijn graph-based) and Typewriter (BLAST-based)) for predicting presence/absence of 83 different resistance determinants and virulence genes, and overall antimicrobial susceptibility, in 1379 Staphylococcus aureus isolates previously characterised by standard laboratory methods (disc diffusion, broth and/or agar dilution and PCR). Results: 99.5% (113830/114457) of individual resistance-determinant/virulence gene predictions were identical between all three methods, with only 627 (0.5%) discordant predictions, demonstrating high overall agreement (Fleiss' kappa = 0.98, p<0.0001). Discrepancies when identified were in only one of the three methods for all genes except the cassette recombinase, ccrC(b). Genotypic antimicrobial susceptibility prediction matched laboratory phenotype in 98.3% (14224/14464) of cases (2720 (18.8%) resistant, 11504 (79.5%) susceptible). There was greater disagreement between the laboratory phenotypes and the combined genotypic predictions (97 (0.7%) phenotypically-susceptible but all bioinformatic methods reported resistance; 89 (0.6%) phenotypically-resistant, but all bioinformatics methods reported susceptible) than within the three bioinformatics methods (54 (0.4%) cases, 16 phenotypically-resistant, 38 phenotypically-susceptible). However, in 36/54 (67%), the consensus genotype matched the laboratory phenotype. Conclusions: In this study, the choice between these three specific bioinformatic methods to identify resistance-determinants or other genes in S. aureus did not prove critical, with all demonstrating high concordance with each other and phenotypic/molecular methods. However, each has some limitations and therefore consensus methods provide some assurance. Copyright © 2018 American Society for Microbiology.
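
    The agreement statistic quoted above is Fleiss' kappa, which measures agreement among a fixed number of raters beyond what chance would give. The sketch below applies the standard formula to presence/absence calls from several methods; the function name, the input layout and the toy calls are assumptions made for illustration and are not the study's data.

```python
def fleiss_kappa(calls):
    """Fleiss' kappa for categorical calls (illustrative sketch).

    calls: one tuple per item (e.g. isolate-gene pair) holding the call
    made by each method. Every item must have the same number of calls,
    and at least two distinct categories must occur overall.
    """
    n_items = len(calls)
    n_raters = len(calls[0])
    categories = sorted({c for item in calls for c in item})
    # counts[i][j] = number of methods assigning item i to category j
    counts = [[item.count(c) for c in categories] for item in calls]

    # Observed agreement, averaged over items.
    p_items = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Agreement expected by chance from the overall category frequencies.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical calls from three methods (1 = gene present, 0 = absent).
toy_calls = [(1, 1, 1), (0, 0, 0), (1, 1, 0), (0, 0, 1)]
kappa = fleiss_kappa(toy_calls)  # 1/3 for this toy input
```

    With three methods, any discordant item is necessarily a two-against-one split, so a kappa near 1 requires that such splits be rare.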

  4. Joint amalgamation of most parsimonious reconciled gene trees

    PubMed Central

    Scornavacca, Celine; Jacox, Edwin; Szöllősi, Gergely J.

    2015-01-01

    Motivation: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods, which are generally computationally more efficient, require a prior estimate of parameters and of the statistical support. Results: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony based, species tree aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events. Availability and implementation: The algorithm is implemented in our program TERA, which is freely available from http://mbb.univ-montp2.fr/MBB/download_sources/16__TERA. Contact: celine.scornavacca@univ-montp2.fr, ssolo@angel.elte.hu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25380957

  5. In silico identification of novel ligands for G-quadruplex in the c-MYC promoter

    NASA Astrophysics Data System (ADS)

    Kang, Hyun-Jin; Park, Hyun-Ju

    2015-04-01

    G-quadruplex DNA formed in the NHEIII1 region of the oncogene promoter inhibits transcription of the gene. In this study, virtual screening combining pharmacophore-based search and structure-based docking screening was conducted to discover ligands binding to the G-quadruplex in the promoter region of c-MYC. Several hit ligands showed selective PCR-arresting effects for an oligonucleotide containing the c-MYC G-quadruplex forming sequence. Among them, three hits selectively inhibited cell proliferation and decreased the c-MYC mRNA level in Ramos cells, where NHEIII1 is included in the translocated c-MYC gene for overexpression. A promoter assay using two kinds of constructs with wild-type and mutant sequences showed that interaction of these ligands with the G-quadruplex resulted in turning-off of the reporter gene. In conclusion, combined virtual screening methods were successfully used for discovery of selective c-MYC promoter G-quadruplex binders with anticancer activity.

  6. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    PubMed

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. In silico mining and PCR-based approaches to transcription factor discovery in non-model plants: gene discovery of the WRKY transcription factors in conifers.

    PubMed

    Liu, Jun-Jun; Xiang, Yu

    2011-01-01

    WRKY transcription factors are key regulators of numerous biological processes in plant growth and development, as well as plant responses to abiotic and biotic stresses. Research on biological functions of plant WRKY genes has focused in the past on model plant species or species with largely characterized transcriptomes. However, a variety of non-model plants, such as forest conifers, are essential as feed, biofuel, and wood or for sustainable ecosystems. Identification of WRKY genes in these non-model plants is equally important for understanding the evolutionary and function-adaptive processes of this transcription factor family. Because of limited genomic information, the rarity of regulatory gene mRNAs in transcriptomes, and the sequence divergence to model organism genes, identification of transcription factors in non-model plants using methods similar to those generally used for model plants is difficult. This chapter describes a gene family discovery strategy for identification of WRKY transcription factors in conifers by a combination of in silico-based prediction and PCR-based experimental approaches. Compared to traditional cDNA library screening or EST sequencing at transcriptome scales, this integrated gene discovery strategy provides fast, simple, reliable, and specific methods to unveil the WRKY gene family at both genome and transcriptome levels in non-model plants.

  8. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    PubMed Central

    2010-01-01

    Background The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245

  9. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships.

    PubMed

    Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong

    2010-01-18

    The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.

  10. The extraction of drug-disease correlations based on module distance in incomplete human interactome.

    PubMed

    Yu, Liang; Wang, Bingbo; Ma, Xiaoke; Gao, Lin

    2016-12-23

    Extracting drug-disease correlations is crucial in unveiling disease mechanisms, as well as discovering new indications of available drugs, or drug repositioning. Both the interactome and the knowledge of disease-associated and drug-associated genes remain incomplete. We present a new method to predict the associations between drugs and diseases. Our method is based on a module distance, which was originally proposed to calculate distances between modules in the incomplete human interactome. We first map all the disease genes and drug genes to a combined protein interaction network. Then, based on the module distance, we calculate the distances between drug gene sets and disease gene sets, and take the distances as the relationships of drug-disease pairs. We also filter possible false positive drug-disease correlations by p-value. Finally, we validate the top-100 drug-disease associations related to six drugs in the predicted results. The overlap between our predicted correlations and those reported in the Comparative Toxicogenomics Database (CTD) and the literature, together with their enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, demonstrates that our approach can not only effectively identify new drug indications, but also provide new insight into drug-disease discovery.

  11. OPATs: Omnibus P-value association tests.

    PubMed

    Chen, Chia-Wei; Yang, Hsin-Chou

    2017-07-10

    Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular analysis methods of P-value combinations. The OPATs software, programmed in R with an R graphical user interface, features a user-friendly interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. Performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost the statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples, and user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm. © The Author 2017. Published by Oxford University Press.
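
    The idea of combining P-values with threshold or rank truncation can be sketched as below. This is not the OPATs implementation (which is written in R) and does not use its interface: the function names are hypothetical, the statistic is a Fisher-style sum of -2 log p over the retained P-values, and the null distribution is simulated from independent uniform P-values, whereas a real analysis would typically resample the genotype and phenotype data so that correlation between markers is preserved.

```python
import math
import random


def truncated_stat(pvalues, threshold=None, rank=None):
    """Fisher-style statistic over the retained P-values (illustrative sketch).

    threshold: keep only P-values <= threshold (threshold truncation).
    rank:      keep only the `rank` smallest P-values (rank truncation).
    With neither option this is Fisher's combination statistic.
    All P-values must be strictly positive.
    """
    kept = sorted(pvalues)
    if threshold is not None:
        kept = [p for p in kept if p <= threshold]
    if rank is not None:
        kept = kept[:rank]
    return -2.0 * sum(math.log(p) for p in kept)


def monte_carlo_pvalue(pvalues, n_draws=2000, seed=0, **truncation):
    """Set-level significance by simulation under an independence null."""
    rng = random.Random(seed)
    observed = truncated_stat(pvalues, **truncation)
    hits = 0
    for _ in range(n_draws):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in pvalues]
        if truncated_stat(null, **truncation) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)


# Hypothetical single-locus P-values for the markers of one gene.
gene_pvalues = [1e-6, 1e-5, 1e-4]
set_p = monte_carlo_pvalue(gene_pvalues, rank=2)
```

    Truncation keeps a few strong signals from being diluted by the many null markers that a window, gene or pathway usually contains.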

  12. StemTextSearch: Stem cell gene database with evidence from abstracts.

    PubMed

    Chen, Chou-Cheng; Ho, Chung-Liang

    2017-05-01

    Previous studies have used many methods to find biomarkers in stem cells, including text mining, experimental data and image storage. However, no text-mining methods have yet been developed which can identify whether a gene plays a positive or negative role in stem cells. StemTextSearch identifies the role of a gene in stem cells by using a text-mining method to find combinations of gene regulation, stem-cell regulation and cell processes in the same sentences of biomedical abstracts. The dataset includes 5797 genes, with 1534 genes having positive roles in stem cells, 1335 genes having negative roles, 1654 genes with both positive and negative roles, and 1274 with an uncertain role. The precision of gene role in StemTextSearch is 0.66, and the recall is 0.78. StemTextSearch is a web-based engine with queries that specify (i) gene, (ii) category of stem cell, (iii) gene role, (iv) gene regulation, (v) cell process, (vi) stem-cell regulation, and (vii) species. StemTextSearch is available through http://bio.yungyun.com.tw/StemTextSearch.aspx. Copyright © 2017. Published by Elsevier Inc.

  13. Fast gene ontology based clustering for microarray experiments.

    PubMed

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  14. A simple, rapid, high-fidelity and cost-effective PCR-based two-step DNA synthesis method for long gene sequences.

    PubMed

    Xiong, Ai-Sheng; Yao, Quan-Hong; Peng, Ri-He; Li, Xian; Fan, Hui-Qin; Cheng, Zong-Ming; Li, Yi

    2004-07-07

    Chemical synthesis of DNA sequences provides a powerful tool for modifying genes and for studying gene function, structure and expression. Here, we report a simple, high-fidelity and cost-effective PCR-based two-step DNA synthesis (PTDS) method for synthesis of long segments of DNA. The method involves two steps. (i) Synthesis of individual fragments of the DNA of interest: ten to twelve 60mer oligonucleotides with 20 bp overlap are mixed and a PCR reaction is carried out with high-fidelity DNA polymerase Pfu to produce DNA fragments that are approximately 500 bp in length. (ii) Synthesis of the entire sequence of the DNA of interest: five to ten PCR products from the first step are combined and used as the template for a second PCR reaction using high-fidelity DNA polymerase pyrobest, with the two outermost oligonucleotides as primers. Compared with the previously published methods, the PTDS method is rapid (5-7 days) and suitable for synthesizing long segments of DNA (5-6 kb) with high G + C contents, repetitive sequences or complex secondary structures. Thus, the PTDS method provides an alternative tool for synthesizing and assembling long genes with complex structures. Using the newly developed PTDS method, we have successfully obtained several genes of interest with sizes ranging from 1.0 to 5.4 kb.
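
    The fragment size quoted for the first PTDS step can be checked with a little arithmetic. The sketch below assumes an idealized tiling in which every oligonucleotide has the same length and shares exactly 20 bp with its neighbour; the function name and defaults are illustrative and do not come from the paper.

```python
def assembled_length(n_oligos: int, oligo_len: int = 60, overlap: int = 20) -> int:
    """Length of a fragment tiled from n overlapping oligonucleotides.

    The first oligo contributes its full length; every further oligo adds
    only the bases that do not overlap its predecessor.
    """
    if n_oligos < 1:
        raise ValueError("need at least one oligonucleotide")
    if not 0 <= overlap < oligo_len:
        raise ValueError("overlap must be non-negative and shorter than the oligo")
    return oligo_len + (n_oligos - 1) * (oligo_len - overlap)


# Ten to twelve 60mers with 20 bp overlap, as described in the abstract.
for n in (10, 11, 12):
    print(n, "oligos ->", assembled_length(n), "bp")
```

    Under this assumption, ten to twelve 60mers give fragments of 420 to 500 bp, in line with the "approximately 500 bp" stated above.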

  15. Analysis of Temporal-spatial Co-variation within Gene Expression Microarray Data in an Organogenesis Model

    NASA Astrophysics Data System (ADS)

    Ehler, Martin; Rajapakse, Vinodh; Zeeberg, Barry; Brooks, Brian; Brown, Jacob; Czaja, Wojciech; Bonner, Robert F.

    The gene networks underlying closure of the optic fissure during vertebrate eye development are poorly understood. We used a novel clustering method based on Laplacian Eigenmaps, a nonlinear dimension reduction method, to analyze microarray data from laser capture microdissected (LCM) cells at the site and developmental stages (days 10.5 to 12.5) of optic fissure closure. Our new method provided greater biological specificity than classical clustering algorithms in terms of identifying more biological processes and functions related to eye development as defined by Gene Ontology at lower false discovery rates. This new methodology builds on the advantages of LCM to isolate pure phenotypic populations within complex tissues and allows improved ability to identify critical gene products expressed at lower copy number. The combination of LCM of embryonic organs, gene expression microarrays, and extracting spatial and temporal co-variations appears to be a powerful approach to understanding the gene regulatory networks that specify mammalian organogenesis.

  16. Simultaneous differential detection of human pathogenic and nonpathogenic Vibrio species using a multiplex PCR based on gyrB and pntA genes.

    PubMed

    Teh, C S J; Chua, K H; Thong, K L

    2010-06-01

    To develop a multiplex PCR targeting the gyrB and pntA genes for Vibrio species differentiation. Four pairs of primers targeting gyrB gene of Vibrios at genus level and pntA gene of Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus were designed. This PCR method precisely identified 250 Vibrio species and demonstrated sensitivity in the range of 4 x 10(4) CFU ml(-1) (c. 200 CFU per PCR) to 2 x 10(3) CFU ml(-1) (c. 10 CFU per PCR). Overall, the gyrB gene marker showed a higher specificity than the dnaJ gene marker for Vibrio detection and was able to distinguish Aeromonas from Vibrio species. The multiplex PCR based on combined gyrB and pntA provides a high discriminatory power in the differentiation between Vibrio alginolyticus and V. parahaemolyticus, and between V. cholerae and Vibrio mimicus. This assay will be useful for rapid differentiation of various Vibrio species from clinical and environmental sources and significantly overcomes the limitations of the conventional methods.

  17. AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

    PubMed

    Yu, Wenbao; Park, Taesung

    2014-01-01

    It is common to seek an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus apply the penalized regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray data sets and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
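
    To make the objective concrete, the sketch below evaluates the empirical AUC of one fixed linear combination of two markers. It does not implement AucPR or any penalized estimator; the marker values and coefficients are made-up illustrative numbers, and the function names are hypothetical.

```python
def empirical_auc(case_scores, control_scores):
    """Fraction of (case, control) pairs ranked correctly; ties count one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def linear_score(markers, coefficients):
    """Linear combination of marker values with the given coefficients."""
    return sum(b * m for b, m in zip(coefficients, markers))


# Hypothetical two-marker measurements for three cases and three controls.
cases = [[2.0, 1.0], [1.5, 0.5], [0.5, 2.0]]
controls = [[1.0, 1.0], [0.2, 0.4], [1.2, 0.1]]
coefficients = (1.0, 0.5)  # assumed coefficients, not fitted by any method

auc = empirical_auc(
    [linear_score(m, coefficients) for m in cases],
    [linear_score(m, coefficients) for m in controls],
)
print(round(auc, 3))
```

    A method such as the one described above would search for the coefficients that make this quantity as large as possible, subject to a penalty on the coefficients.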

  18. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods.

    PubMed

    Valentini, Giorgio; Paccanaro, Alberto; Caniza, Horacio; Romero, Alfonso E; Re, Matteo

    2014-06-01

    In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. Network integration is necessary to boost the performances of gene prioritization methods. 
    Moreover, the methods based on kernelized score functions can further enhance disease gene ranking results, by adopting both local and global learning strategies, able to exploit the overall topology of the network.
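
    Random walk with restart, one of the classical algorithms named above, can be sketched in a few lines. The example is a minimal illustration on a five-gene toy network; the gene names, edge weights and restart probability of 0.3 are assumptions, and the kernelized score functions and network integration steps of the paper are not shown.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seeds`.

    adj maps each node to a dict of neighbour -> edge weight; an undirected
    edge must be listed in both directions.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed genes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    # Total edge weight leaving each node, used to form transition probabilities.
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: return its probability mass to the seeds.
                for v in nodes:
                    nxt[v] += (1 - restart) * p[u] * p0[v]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        diff = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical functional network; G1 is the only known disease gene.
network = {
    "G1": {"G2": 1, "G3": 1},
    "G2": {"G1": 1, "G3": 1},
    "G3": {"G1": 1, "G2": 1, "G4": 1},
    "G4": {"G3": 1, "G5": 1},
    "G5": {"G4": 1},
}
scores = random_walk_with_restart(network, {"G1"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

    Candidate genes are then ranked by their score, so genes close to the seed in the network come before distant ones.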

  19. Gene doping.

    PubMed

    Haisma, H J; de Hon, O

    2006-04-01

    Together with the rapidly increasing knowledge on genetic therapies as a promising new branch of regular medicine, the issue has arisen whether these techniques might be abused in the field of sports. Previous experiences have shown that drugs that are still in the experimental phases of research may find their way into the athletic world. Both the World Anti-Doping Agency (WADA) and the International Olympic Committee (IOC) have expressed concerns about this possibility. As a result, the method of gene doping has been included in the list of prohibited classes of substances and prohibited methods. This review addresses the possible ways in which knowledge gained in the field of genetic therapies may be misused in elite sports. Many genes are readily available which may potentially have an effect on athletic performance. The sporting world will eventually be faced with the phenomenon of gene doping to improve athletic performance. A combination of developing detection methods based on gene arrays or proteomics and a clear education program on the associated risks seems to be the most promising preventive method to counteract the possible application of gene doping.

  20. Identification of Novel Pre-Erythrocytic Malaria Antigen Candidates for Combination Vaccines with Circumsporozoite Protein

    PubMed Central

    Sahu, Tejram; Malkov, Vlad; Morrison, Robert; Pei, Ying; Juompan, Laure; Milman, Neta; Zarling, Stasya; Anderson, Charles; Wong-Madden, Sharon; Wendler, Jason; Ishizuka, Andrew; MacMillen, Zachary W.; Garcia, Valentino; Kappe, Stefan H. I.; Krzych, Urszula; Duffy, Patrick E.

    2016-01-01

    Malaria vaccine development has been hampered by the limited availability of antigens identified through conventional discovery approaches, and improvements are needed to enhance the efficacy of the leading vaccine candidate RTS,S that targets the circumsporozoite protein (CSP) of the infective sporozoite. Here we report a transcriptome-based approach to identify novel pre-erythrocytic vaccine antigens that could potentially be used in combination with CSP. We hypothesized that stage-specific upregulated genes would enrich for protective vaccine targets, and used tiling microarray to identify P. falciparum genes transcribed at higher levels during liver stage versus sporozoite or blood stages of development. We prepared DNA vaccines for 21 genes using the predicted orthologues in P. yoelii and P. berghei and tested their efficacy using different delivery methods against pre-erythrocytic malaria in rodent models. In our primary screen using P. yoelii in BALB/c mice, we found that 16 antigens significantly reduced liver stage parasite burden. In our confirmatory screen using P. berghei in C57Bl/6 mice, we confirmed 6 antigens that were protective in both models. Two antigens, when combined with CSP, provided significantly greater protection than CSP alone in both models. Based on the observations reported here, transcriptional patterns of Plasmodium genes can be useful in identifying novel pre-erythrocytic antigens that induce protective immunity alone or in combination with CSP. PMID:27434123

  1. Identification of Novel Pre-Erythrocytic Malaria Antigen Candidates for Combination Vaccines with Circumsporozoite Protein.

    PubMed

    Speake, Cate; Pichugin, Alexander; Sahu, Tejram; Malkov, Vlad; Morrison, Robert; Pei, Ying; Juompan, Laure; Milman, Neta; Zarling, Stasya; Anderson, Charles; Wong-Madden, Sharon; Wendler, Jason; Ishizuka, Andrew; MacMillen, Zachary W; Garcia, Valentino; Kappe, Stefan H I; Krzych, Urszula; Duffy, Patrick E

    2016-01-01

    Malaria vaccine development has been hampered by the limited availability of antigens identified through conventional discovery approaches, and improvements are needed to enhance the efficacy of the leading vaccine candidate RTS,S that targets the circumsporozoite protein (CSP) of the infective sporozoite. Here we report a transcriptome-based approach to identify novel pre-erythrocytic vaccine antigens that could potentially be used in combination with CSP. We hypothesized that stage-specific upregulated genes would enrich for protective vaccine targets, and used tiling microarray to identify P. falciparum genes transcribed at higher levels during liver stage versus sporozoite or blood stages of development. We prepared DNA vaccines for 21 genes using the predicted orthologues in P. yoelii and P. berghei and tested their efficacy using different delivery methods against pre-erythrocytic malaria in rodent models. In our primary screen using P. yoelii in BALB/c mice, we found that 16 antigens significantly reduced liver stage parasite burden. In our confirmatory screen using P. berghei in C57Bl/6 mice, we confirmed 6 antigens that were protective in both models. Two antigens, when combined with CSP, provided significantly greater protection than CSP alone in both models. Based on the observations reported here, transcriptional patterns of Plasmodium genes can be useful in identifying novel pre-erythrocytic antigens that induce protective immunity alone or in combination with CSP.

  2. Regularized rare variant enrichment analysis for case-control exome sequencing data.

    PubMed

    Larson, Nicholas B; Schaid, Daniel J

    2014-02-01

    Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research.

  3. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks.

    PubMed

    Blatti, Charles; Sinha, Saurabh

    2016-07-15

    Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or 'properties' such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene-gene or gene-property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. DRaWR was implemented as an R package available at veda.cs.illinois.edu/DRaWR. 
    Contact: blatti@illinois.edu. Supplementary data are available at Bioinformatics online.

  4. Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.

    PubMed

    Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J

    2007-09-21

    Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. 
Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
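
    The abstract does not say how the voting weights are chosen, so the following is only a sketch of weighted voting in general. It assumes each base classifier returns a score per gene and that the weights reflect some cross-validated quality measure; the classifier names, gene labels and all numbers are hypothetical.

```python
def weighted_vote(scores_by_classifier, weights):
    """Combine per-gene scores from several classifiers into a weighted average."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # All classifiers are assumed to score the same set of genes.
    genes = next(iter(scores_by_classifier.values())).keys()
    return {
        g: sum(weights[c] * s[g] for c, s in scores_by_classifier.items()) / total
        for g in genes
    }


# Hypothetical stress-response scores from three base classifiers.
scores_by_classifier = {
    "svm": {"geneA": 0.9, "geneB": 0.2},
    "knn": {"geneA": 0.6, "geneB": 0.4},
    "nb": {"geneA": 0.3, "geneB": 0.8},
}
# Assumed weights, e.g. derived from each classifier's cross-validated accuracy.
weights = {"svm": 0.5, "knn": 0.3, "nb": 0.2}

combined = weighted_vote(scores_by_classifier, weights)
for gene, score in combined.items():
    print(gene, round(score, 2))
```

    Giving the better-validated classifier more weight lets it dominate where the base classifiers disagree, without discarding the others.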

  5. Effect of Aggregation Operators on Network-Based Disease Gene Prioritization: A Case Study on Blood Disorders.

    PubMed

    Grewal, Nivit; Singh, Shailendra; Chand, Trilok

    2017-01-01

    Owing to the innate noise in biological data sources, a single source or a single measure does not suffice for effective disease gene prioritization, so multiple data sources must be integrated or multiple measures aggregated. Aggregation operators combine multiple related data values into a single value such that the combined value reflects all the individual values. In this paper, we apply fuzzy aggregation to network-based disease gene prioritization and investigate its effect under noisy conditions. The study was conducted for a set of 15 blood disorders by fusing four different network measures, computed from the protein interaction network, using a selected set of aggregation operators and ranking the genes on the basis of the aggregated value. The aggregation operator-based rankings have been compared with the "Random walk with restart" gene prioritization method. The impact of noise has also been investigated by adding varying proportions of noise to the seed set. The results reveal that, for all the selected blood disorders, the Mean of Maximal operator outperformed the other aggregation operators for noisy as well as non-noisy data.

  6. Development and validation of real-time PCR screening methods for detection of cry1A.105 and cry2Ab2 genes in genetically modified organisms.

    PubMed

    Dinon, Andréia Z; Prins, Theo W; van Dijk, Jeroen P; Arisi, Ana Carolina M; Scholtens, Ingrid M J; Kok, Esther J

    2011-05-01

    Primers and probes were developed for the element-specific detection of cry1A.105 and cry2Ab2 genes, based on their DNA sequence as present in GM maize MON89034. Cry genes are present in many genetically modified (GM) plants and they are important targets for developing GMO element-specific detection methods. Element-specific methods can be of use to screen for the presence of GMOs in food and feed supply chains. Moreover, a combination of GMO elements may indicate the potential presence of unapproved GMOs (UGMs). Primer-probe combinations were evaluated in terms of specificity, efficiency and limit of detection. Except for specificity, the complete experiment was performed in 9 PCR runs, on 9 different days and by testing 8 DNA concentrations. The results showed a high specificity and efficiency for cry1A.105 and cry2Ab2 detection. The limit of detection was between 0.05 and 0.01 ng DNA per PCR reaction for both assays. These data confirm the applicability of these new primer-probe combinations for element detection that can contribute to the screening for GM and UGM crops in food and feed samples.

  7. Efficient Exploration of the Space of Reconciled Gene Trees

    PubMed Central

    Szöllősi, Gergely J.; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent

    2013-01-01

    Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree–species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree–species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family, respectively. 
The open source implementation of ALE is available from https://github.com/ssolo/ALE.git. [amalgamation; gene tree reconciliation; gene tree reconstruction; lateral gene transfer; phylogeny.] PMID:23925510

  8. Identifying cooperative transcriptional regulations using protein–protein interactions

    PubMed Central

    Nagamine, Nobuyoshi; Kawada, Yuji; Sakakibara, Yasubumi

    2005-01-01

    Cooperative transcriptional activations among multiple transcription factors (TFs) are important to understand the mechanisms of complex transcriptional regulations in eukaryotes. Previous studies have attempted to find cooperative TFs based on gene expression data with gene expression profiles as a measure of similarity of gene regulations. In this paper, we use protein–protein interaction data to infer synergistic binding of cooperative TFs. Our fundamental idea is based on the assumption that genes contributing to a similar biological process are regulated under the same control mechanism. First, the protein–protein interaction networks are used to calculate the similarity of biological processes among genes. Second, we integrate this similarity and the chromatin immuno-precipitation data to identify cooperative TFs. Our computational experiments in yeast show that our method identified eight pairs of cooperative TFs that are supported by literature evidence but could not be identified by the previous method. Further, 12 new possible pairs have been inferred and we have examined their biological relevance. However, since protein–protein interaction data typically contain many false positives, we propose a method combining various biological data to increase the prediction accuracy. PMID:16126847

  9. Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance.

    PubMed

    Emad, Amin; Cairns, Junmei; Kalari, Krishna R; Wang, Liewei; Sinha, Saurabh

    2017-08-11

    Identification of genes whose basal mRNA expression predicts the sensitivity of tumor cells to cytotoxic treatments can play an important role in individualized cancer medicine. It enables detailed characterization of the mechanism of action of drugs. Furthermore, screening the expression of these genes in the tumor tissue may suggest the best course of chemotherapy or a combination of drugs to overcome drug resistance. We developed a computational method called ProGENI to identify genes most associated with the variation of drug response across different individuals, based on gene expression data. In contrast to existing methods, ProGENI also utilizes prior knowledge of protein-protein and genetic interactions, using random walk techniques. Analysis of two relatively new and large datasets including gene expression data on hundreds of cell lines and their cytotoxic responses to a large compendium of drugs reveals a significant improvement in prediction of drug sensitivity using genes identified by ProGENI compared to other methods. Our siRNA knockdown experiments on ProGENI-identified genes confirmed the role of many new genes in sensitivity to three chemotherapy drugs: cisplatin, docetaxel, and doxorubicin. Based on such experiments and extensive literature survey, we demonstrate that about 73% of our top predicted genes modulate drug response in selected cancer cell lines. In addition, global analysis of genes associated with groups of drugs uncovered pathways of cytotoxic response shared by each group. Our results suggest that knowledge-guided prioritization of genes using ProGENI gives new insight into mechanisms of drug resistance and identifies genes that may be targeted to overcome this phenomenon.

  10. Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops.

    PubMed

    Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi

    2016-01-01

    Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it relies on inaccurate single-plant evaluation for selection. Genomic selection (GS), which can predict genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an "island model" inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross in whole cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. 
The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement.

  11. Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops

    PubMed Central

    Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi

    2016-01-01

    Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it relies on inaccurate single-plant evaluation for selection. Genomic selection (GS), which can predict genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an “island model” inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross in whole cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. 
The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement. PMID:27115872

  12. Combinatorial therapy discovery using mixed integer linear programming.

    PubMed

    Pang, Kaifang; Wan, Ying-Wooi; Choi, William T; Donehower, Lawrence A; Sun, Jingchun; Pant, Dhruv; Liu, Zhandong

    2014-05-15

    Combinatorial therapies play increasingly important roles in combating complex diseases. Owing to the huge cost associated with experimental methods in identifying optimal drug combinations, computational approaches can provide a guide to limit the search space and reduce cost. However, few computational approaches have been developed for this purpose, and thus there is a great need for new algorithms for drug combination prediction. Here we propose to formulate the optimal combinatorial therapy problem as two complementary mathematical algorithms, Balanced Target Set Cover (BTSC) and Minimum Off-Target Set Cover (MOTSC). Given a disease gene set, BTSC seeks a balanced solution that maximizes the coverage of the disease genes and minimizes the off-target hits at the same time. MOTSC seeks full coverage of the disease gene set while minimizing the off-target set. In simulations, both BTSC and MOTSC ran much faster than exhaustive search while achieving the same accuracy. When applied to real disease gene sets, our algorithms not only identified known drug combinations, but also predicted novel drug combinations that are worth further testing. In addition, we developed a web-based tool to allow users to iteratively search for optimal drug combinations given a user-defined gene set. Our tool is freely available for noncommercial use at http://www.drug.liuzlab.org/. Contact: zhandong.liu@bcm.edu. Supplementary data are available at Bioinformatics online.
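The MOTSC objective described in this record (cover every disease gene while hitting as few off-target genes as possible) can be illustrated with a small sketch. This is not the authors' mixed integer linear program: it is the exhaustive-search baseline the abstract compares against, and the drug names, gene names, and the function `min_off_target_cover` are hypothetical.

```python
from itertools import combinations


def min_off_target_cover(drug_targets, disease_genes):
    """Exhaustively search drug combinations that cover all disease genes.

    Returns (combination, off_target_genes) with the fewest off-target genes,
    breaking ties in favour of fewer drugs, or None if no cover exists.
    """
    disease = set(disease_genes)
    drugs = sorted(drug_targets)
    best = None  # ((n_off_targets, n_drugs), combination, off_target_set)
    for size in range(1, len(drugs) + 1):
        for combo in combinations(drugs, size):
            hit = set().union(*(drug_targets[d] for d in combo))
            if not disease <= hit:
                continue  # this combination misses at least one disease gene
            off = hit - disease
            key = (len(off), size)
            if best is None or key < best[0]:
                best = (key, combo, off)
    return None if best is None else (best[1], best[2])


# Hypothetical toy data: g* are disease genes, x* are off-target genes.
drug_targets = {
    "drugA": {"g1", "g2", "x1"},
    "drugB": {"g3", "x2", "x3"},
    "drugC": {"g2", "g3"},
    "drugD": {"g1", "x1", "x2"},
}
disease_genes = {"g1", "g2", "g3"}

combo, off_targets = min_off_target_cover(drug_targets, disease_genes)
print(combo, sorted(off_targets))
```

Exhaustive search grows exponentially with the number of drugs, which is the cost that motivates an integer-programming formulation for realistic drug libraries.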

  13. Inherited variation in circadian rhythm genes and risks of prostate cancer and three other cancer sites in combined cancer consortia.

    PubMed

    Gu, Fangyi; Zhang, Han; Hyland, Paula L; Berndt, Sonja; Gapstur, Susan M; Wheeler, William; Ellipse Consortium, The; Amos, Christopher I; Bezieau, Stephane; Bickeböller, Heike; Brenner, Hermann; Brennan, Paul; Chang-Claude, Jenny; Conti, David V; Doherty, Jennifer Anne; Gruber, Stephen B; Harrison, Tabitha A; Hayes, Richard B; Hoffmeister, Michael; Houlston, Richard S; Hung, Rayjean J; Jenkins, Mark A; Kraft, Peter; Lawrenson, Kate; McKay, James; Markt, Sarah; Mucci, Lorelei; Phelan, Catherine M; Qu, Conghui; Risch, Angela; Rossing, Mary Anne; Wichmann, H-Erich; Shi, Jianxin; Schernhammer, Eva; Yu, Kai; Landi, Maria Teresa; Caporaso, Neil E

    2017-11-01

    Circadian disruption has been linked to carcinogenesis in animal models, but the evidence in humans is inconclusive. Genetic variation in circadian rhythm genes provides a tool to investigate such associations. We examined associations of genetic variation in nine core circadian rhythm genes and six melatonin pathway genes with risk of colorectal, lung, ovarian and prostate cancers using data from the Genetic Associations and Mechanisms in Oncology (GAME-ON) network. The major results for prostate cancer were replicated in the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial, and for colorectal cancer in the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO). The total number of cancer cases and controls was 15,838/18,159 for colorectal, 14,818/14,227 for prostate, 12,537/17,285 for lung and 4,369/9,123 for ovary. For each cancer site, we conducted gene-based and pathway-based analyses by applying the summary-based Adaptive Rank Truncated Product method (sARTP) on the summary association statistics for each SNP within the candidate gene regions. Aggregate genetic variation in circadian rhythm and melatonin pathways was significantly associated with the risk of prostate cancer in data combining GAME-ON and PLCO, after Bonferroni correction (p_pathway < 0.00625). The two most significant genes were NPAS2 (p_gene = 0.0062) and AANAT (p_gene = 0.00078), the latter being significant after Bonferroni correction. For colorectal cancer, we observed a suggestive association with the circadian rhythm pathway in GAME-ON (p_pathway = 0.021); this association was not confirmed in GECCO (p_pathway = 0.76) or the combined data (p_pathway = 0.17). No significant association was observed for ovarian and lung cancer. These findings support a potential role for circadian rhythm and melatonin pathways in prostate carcinogenesis. Further functional studies are needed to better understand the underlying biologic mechanisms. © 2017 UICC.

  14. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    PubMed

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension to two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked in descending order according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.

  15. Quantitative Resistance to Plant Pathogens in Pyramiding Strategies for Durable Crop Protection.

    PubMed

    Pilet-Nayel, Marie-Laure; Moury, Benoît; Caffier, Valérie; Montarry, Josselin; Kerlan, Marie-Claire; Fournet, Sylvain; Durel, Charles-Eric; Delourme, Régine

    2017-01-01

    Quantitative resistance has gained interest in plant breeding for pathogen control in low-input cropping systems. Although quantitative resistance frequently has only a partial effect and is difficult to select, it is considered more durable than major resistance (R) genes. With the exponential development of molecular markers over the past 20 years, resistance QTL have been more accurately detected and better integrated into breeding strategies for resistant varieties with increased potential for durability. This review summarizes current knowledge on the genetic inheritance, molecular basis, and durability of quantitative resistance. Based on this knowledge, we discuss how strategies that combine major R genes and QTL in crops can maintain the effectiveness of plant resistance to pathogens. Combining resistance QTL with complementary modes of action appears to be an interesting strategy for breeding effective and potentially durable resistance. Combining quantitative resistance with major R genes has proven to be a valuable approach for extending the effectiveness of major genes. In the plant genomics era, improved tools and methods are becoming available to better integrate quantitative resistance into breeding strategies. Nevertheless, optimal combinations of resistance loci will still have to be identified to preserve resistance effectiveness over time for durable crop protection.

  16. Transcriptome inference and systems approaches to polypharmacology and drug discovery in herbal medicine.

    PubMed

    Li, Peng; Chen, Jianxin; Zhang, Wuxia; Fu, Bangze; Wang, Wei

    2017-01-04

    Herbal medicine is a concoction of numerous chemical ingredients, and it exhibits polypharmacological effects to act on multiple pharmacological targets, regulating different biological mechanisms and treating a variety of diseases. Thus, this complexity is impossible to deconvolute by the reductionist method of extracting one active ingredient acting on one biological target. To dissect the polypharmacological effects of herbal medicines, their underlying pharmacological targets, and their corresponding active ingredients, we propose a systems-biology strategy that combines omics and bioinformatics methodologies for exploring the polypharmacology of herbal mixtures. The myocardial ischemia model was induced by Ameroid constriction of the left anterior descending coronary artery in Ba-Ma miniature pigs. RNA-seq analysis was utilized to find the differential genes induced by myocardial ischemia in pigs treated with formula QSKL. A transcriptome-based inference method was used to find the landmark drugs with similar mechanisms to QSKL. Gene-level analysis of RNA-seq data in QSKL-treated cases versus control animals yielded 279 differential genes. Transcriptome-based inference methods identified 80 landmark drugs that covered nearly all drug classes. Then, based on the landmark drugs, 155 potential pharmacological targets and 57 indications were identified for QSKL. Our results demonstrate the power of a combined approach for exploring the pharmacological target and chemical space of herbal medicines. We hope that our method could enhance our understanding of the molecular mechanisms of herbal systems and further accelerate the exploration of the value of traditional herbal medicine systems. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. Completing and Adapting Models of Biological Processes

    NASA Technical Reports Server (NTRS)

    Margaria, Tiziana; Hinchey, Michael G.; Raffelt, Harald; Rash, James L.; Rouff, Christopher A.; Steffen, Bernhard

    2006-01-01

    We present a learning-based method for model completion and adaptation, which is based on the combination of two approaches: 1) R2D2C, a technique for mechanically transforming system requirements via provably equivalent models to running code, and 2) automata learning-based model extrapolation. The intended impact of this new combination is to make model completion and adaptation accessible to experts of the field, like biologists or engineers. The principle is briefly illustrated by generating models of biological procedures concerning gene activities in the production of proteins, although the main application is going to concern autonomic systems for space exploration.

  18. Effects of phylogenetic reconstruction method on the robustness of species delimitation using single-locus data

    PubMed Central

    Tang, Cuong Q; Humphreys, Aelys M; Fontaneto, Diego; Barraclough, Timothy G; Paradis, Emmanuel

    2014-01-01

    Coalescent-based species delimitation methods combine population genetic and phylogenetic theory to provide an objective means for delineating evolutionarily significant units of diversity. The generalised mixed Yule coalescent (GMYC) and the Poisson tree process (PTP) are methods that use ultrametric (GMYC or PTP) or non-ultrametric (PTP) gene trees as input, intended for use mostly with single-locus data such as DNA barcodes. Here, we assess how robust the GMYC and PTP are to different phylogenetic reconstruction and branch smoothing methods. We reconstruct over 400 ultrametric trees using up to 30 different combinations of phylogenetic and smoothing methods and perform over 2000 separate species delimitation analyses across 16 empirical data sets. We then assess how variable diversity estimates are, in terms of richness and identity, with respect to species delimitation, phylogenetic and smoothing methods. The PTP method generally generates diversity estimates that are more robust to different phylogenetic methods. The GMYC is more sensitive, but provides consistent estimates for BEAST trees. The lower consistency of GMYC estimates is likely a result of differences among gene trees introduced by the smoothing step. Unresolved nodes (real anomalies or methodological artefacts) affect both GMYC and PTP estimates, but have a greater effect on GMYC estimates. Branch smoothing is a difficult step and perhaps an underappreciated source of bias that may be widespread among studies of diversity and diversification. Nevertheless, careful choice of phylogenetic method does produce equivalent PTP and GMYC diversity estimates. We recommend simultaneous use of the PTP model with any model-based gene tree (e.g. RAxML) and GMYC approaches with BEAST trees for obtaining species hypotheses. PMID:25821577

  19. Analysis of Rare, Exonic Variation amongst Subjects with Autism Spectrum Disorders and Population Controls

    PubMed Central

    Liu, Li; Sabo, Aniko; Neale, Benjamin M.; Nagaswamy, Uma; Stevens, Christine; Lim, Elaine; Bodea, Corneliu A.; Muzny, Donna; Reid, Jeffrey G.; Banks, Eric; Coon, Hillary; DePristo, Mark; Dinh, Huyen; Fennel, Tim; Flannick, Jason; Gabriel, Stacey; Garimella, Kiran; Gross, Shannon; Hawes, Alicia; Lewis, Lora; Makarov, Vladimir; Maguire, Jared; Newsham, Irene; Poplin, Ryan; Ripke, Stephan; Shakir, Khalid; Samocha, Kaitlin E.; Wu, Yuanqing; Boerwinkle, Eric; Buxbaum, Joseph D.; Cook, Edwin H.; Devlin, Bernie; Schellenberg, Gerard D.; Sutcliffe, James S.; Daly, Mark J.; Gibbs, Richard A.; Roeder, Kathryn

    2013-01-01

    We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analyses of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD. PMID:23593035

  20. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  1. Molecular diversification of Trichuris spp. from Sigmodontinae (Cricetidae) rodents from Argentina based on mitochondrial DNA sequences.

    PubMed

    Callejón, Rocío; Robles, María Del Rosario; Panei, Carlos Javier; Cutillas, Cristina

    2016-08-01

    A molecular phylogenetic hypothesis is presented for the genus Trichuris based on sequence data from mitochondrial cytochrome c oxidase 1 (cox1) and cytochrome b (cob). The taxa consisted of nine populations of whipworm from five species of Sigmodontinae rodents from Argentina. Bayesian Inference, Maximum Parsimony, and Maximum Likelihood methods were used to infer phylogenies for each gene separately, for the combined mitochondrial data, and for the combined mitochondrial and nuclear dataset. Phylogenetic results based on cox1 and cob mitochondrial DNA (mtDNA) revealed three strongly resolved clades corresponding to three different species (Trichuris navonae, Trichuris bainae, and Trichuris pardinasi) showing phylogeographic variation, but relationships among Trichuris species were poorly resolved. Phylogenetic reconstruction based on concatenated sequences had greater phylogenetic resolution for delimiting species and intraspecific populations of Trichuris than reconstructions based on partitioned genes. Thus, populations of T. bainae and T. pardinasi could be affected by geographical factors and parasite-host co-divergence.

  2. Blueberry (Vaccinium corymbosum L.).

    PubMed

    Song, Guo-Qing

    2015-01-01

    Vaccinium consists of approximately 450 species, of which highbush blueberry (Vaccinium corymbosum) is one of the three major Vaccinium fruit crops (i.e., blueberry, cranberry, and lingonberry) domesticated in the twentieth century. In blueberry the adventitious shoot regeneration using leaf explants has been the most desirable regeneration system to date; Agrobacterium tumefaciens-mediated transformation is the major gene delivery method and effective selection has been reported using either the neomycin phosphotransferase II gene (nptII) or the bialaphos resistance (bar) gene as selectable markers. The A. tumefaciens-mediated transformation protocol described in this chapter is based on combining the optimal conditions for efficient plant regeneration, reliable gene delivery, and effective selection. The protocol has led to successful regeneration of transgenic plants from leaf explants of four commercially important highbush blueberry cultivars for multiple purposes, providing a powerful approach to supplement conventional breeding methods for blueberry by introducing genes of interest.

  3. Genetic Doping and Health Damages

    PubMed Central

    Fallahi, AA; Ravasi, AA; Farhud, DD

    2011-01-01

    Background: The use of genetic doping, or gene transfer technology, is expected to be the newest and a potentially lethal method of doping in the future, with unpleasant consequences for sports, athletes, and the outcomes of competitions. The World Anti-Doping Agency (WADA) defines genetic doping as “the non-therapeutic use of genes, genetic elements, and/or cells that have the capacity to enhance athletic performance”. The purpose of this review is to consider genetic doping and the health damage and risks of new genes if delivered to athletes. Methods: This review is based on relevant publications, primarily journals available through GOOGLE, ELSEVIER, and PUBMED in the fields of genetic technology and health, found using a combination of keywords (e.g., genetic doping, genes, exercise, performance, athletes) and published until July 2010. Conclusion: Several genes are related to sport performance, and their use carries health risks and severe damage such as cancer, autoimmunization, and heart attack. PMID:23113049

  4. SigEMD: A powerful method for differential gene expression analysis in single-cell RNA sequencing data.

    PubMed

    Wang, Tianyu; Nabavi, Sheida

    2018-04-24

    Differential gene expression analysis is one of the significant efforts in single cell RNA sequencing (scRNAseq) analysis to discover the specific changes in expression levels of individual cell types. Since scRNAseq exhibits multimodality, large amounts of zero counts, and sparsity, it is different from the traditional bulk RNA sequencing (RNAseq) data. The new challenges of scRNAseq data promote the development of new methods for identifying differentially expressed (DE) genes. In this study, we proposed a new method, SigEMD, that combines a data imputation approach, a logistic regression model and a nonparametric method based on the Earth Mover's Distance, to precisely and efficiently identify DE genes in scRNAseq data. The regression model and data imputation are used to reduce the impact of large amounts of zero counts, and the nonparametric method is used to improve the sensitivity of detecting DE genes from multimodal scRNAseq data. By additionally employing gene interaction network information to adjust the final states of DE genes, we further reduce the false positives of calling DE genes. We used simulated datasets and real datasets to evaluate the detection accuracy of the proposed method and to compare its performance with those of other differential expression analysis methods. Results indicate that the proposed method has an overall powerful performance in terms of precision in detection, sensitivity, and specificity. Copyright © 2018 Elsevier Inc. All rights reserved.
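The nonparametric component of this record relies on the Earth Mover's Distance (EMD) between expression distributions. As a minimal sketch, assuming two groups of cells of equal size and one gene at a time, the one-dimensional EMD reduces to the mean absolute difference between the sorted values. The function name `emd_1d` and the expression values are illustrative; the imputation, logistic regression, and network adjustment steps of SigEMD are not reproduced here.

```python
def emd_1d(xs, ys):
    """Earth Mover's Distance between two equal-size 1-D samples.

    Each observation carries mass 1/n, so the optimal transport plan simply
    matches the i-th smallest value of xs to the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Illustrative expression values for one gene in two groups of cells.
group_a = [0.0, 0.0, 1.0, 5.0]  # many zero counts, as is typical for scRNAseq
group_b = [0.0, 2.0, 3.0, 5.0]

print(emd_1d(group_a, group_b))  # 1.0
print(emd_1d(group_a, group_a))  # 0.0
```

A larger distance suggests that the gene's expression distribution differs more between the two groups; in practice, significance would still need to be assessed, for example with a permutation test.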

  5. Detection of biomarkers for Hepatocellular Carcinoma using a hybrid univariate gene selection methods

    PubMed Central

    2012-01-01

    Background Discovering new biomarkers has a great role in improving early diagnosis of Hepatocellular carcinoma (HCC). The experimental determination of biomarkers needs a lot of time and money. This motivates the use of in-silico prediction of biomarkers to reduce the number of experiments required for detecting new ones. This is achieved by extracting the most representative genes in microarrays of HCC. Results In this work, we provide a method for extracting the differentially expressed genes, specifically the up-regulated ones, that can be considered candidate biomarkers in high-throughput microarrays of HCC. We examine the power of several gene selection methods (such as Pearson’s correlation coefficient, Cosine coefficient, Euclidean distance, Mutual information and Entropy with different estimators) in selecting informative genes. A biological interpretation of the highly ranked genes is done using KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, ENTREZ and DAVID (Database for Annotation, Visualization, and Integrated Discovery) databases. The top ten genes selected using Pearson’s correlation coefficient and Cosine coefficient contained six genes that have been implicated in the genesis of cancer (often multiple cancers) in previous studies. Fewer such genes were obtained by the other methods (4 genes using Mutual information, 3 genes using Euclidean distance and only one gene using Entropy). A better result was obtained by a hybrid approach based on intersecting the highly ranked genes in the output of all investigated methods. This hybrid combination yielded seven genes (2 genes for HCC and 5 genes in different types of cancer) in the top ten genes of the list of intersected genes. Conclusions To strengthen the effectiveness of the univariate selection methods, we propose a hybrid approach by intersecting several of these methods in a cascaded manner. 
This approach surpasses all of univariate selection methods when used individually according to biological interpretation and the examination of gene expression signal profiles. PMID:22867264
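The hybrid idea in this record (rank genes with several univariate scores, then keep only the genes that every method ranks highly) can be sketched as follows. Only two of the scores named in the abstract are used, Pearson's correlation and the cosine coefficient, each computed between a gene's expression profile and a 0/1 class label. The gene names, expression values, cut-off `k`, and function names are hypothetical, and the authors' exact cascade order is not specified in the abstract.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(x, y):
    """Cosine coefficient between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def top_k(expr, labels, score, k):
    """Return the k genes with the highest score against the class labels."""
    ranked = sorted(expr, key=lambda g: score(expr[g], labels), reverse=True)
    return ranked[:k]


def hybrid_selection(expr, labels, scores, k):
    """Keep only genes that appear in the top k of every scoring method."""
    selected = None
    for score in scores:
        top = set(top_k(expr, labels, score, k))
        selected = top if selected is None else selected & top
    return selected


# Hypothetical data: 3 normal samples (label 0) followed by 3 tumor samples (label 1).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1, 1, 1, 5, 5, 5],
    "geneC": [5, 5, 5, 1, 1, 1],
    "geneD": [0, 0, 1, 9, 8, 9],
    "geneE": [9, 9, 9, 10, 10, 11],
    "geneF": [0, 0, 3, 4, 4, 4],
}

print(top_k(expr, labels, pearson, 3))  # ['geneA', 'geneD', 'geneE']
print(top_k(expr, labels, cosine, 3))   # ['geneD', 'geneA', 'geneF']
print(sorted(hybrid_selection(expr, labels, [pearson, cosine], 3)))  # ['geneA', 'geneD']
```

In this toy data the two scores disagree about the third-ranked gene, so the intersection keeps only the two genes on which both agree.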

  6. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    PubMed Central

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from a microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes when tested on both types of datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028

  7. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    PubMed

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from a microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes when tested on both types of datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  8. A novel method of predicting microRNA-disease associations based on microRNA, disease, gene and environment factor networks.

    PubMed

    Peng, Wei; Lan, Wei; Zhong, Jiancheng; Wang, Jianxin; Pan, Yi

    2017-07-15

    MicroRNAs have been reported to be closely related to diseases because of their deregulation of the expression of target mRNAs. Detecting disease-related microRNAs is helpful for disease therapies. With the development of high-throughput experimental techniques, a large number of microRNAs have been sequenced. However, it is still a big challenge to identify which microRNAs are related to diseases. Recently, researchers have become interested in combining multiple sources of biological information to identify the associations between microRNAs and diseases. In this work, we propose a novel method to predict microRNA-disease associations based on four biological properties: microRNA, disease, gene and environment factor. Compared with previous methods, our method makes predictions not only by using the prior knowledge of associations among microRNAs, diseases, environment factors and genes, but also by using the internal relationships among these biological properties. We constructed four biological networks based on the similarity of microRNAs, diseases, environment factors and genes, respectively. Then random walking was implemented on the four networks unequally. During the walk, associations can be inferred from the neighbors in the same network. Meanwhile, the association information can be transferred from one network to another. The experimental results showed that our method achieved better prediction performance than other existing state-of-the-art methods. Copyright © 2017 Elsevier Inc. All rights reserved.
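The network-propagation idea in this record can be illustrated with a random walk with restart on a single similarity network. This is a deliberately simplified sketch: the method described above walks on four coupled networks with unequal step counts and transfers information between them, none of which is modeled here. The node names, edge weights, restart probability, and function name are all assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed scores over a weighted undirected network.

    adj maps every node to a dict {neighbour: weight}; each node is assumed to
    have at least one neighbour. seeds are the nodes with a known association.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed node.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise it moves to a neighbour in proportion to the edge weight.
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical microRNA similarity network: a chain m1 - m2 - m3 - m4.
adj = {
    "m1": {"m2": 1.0},
    "m2": {"m1": 1.0, "m3": 1.0},
    "m3": {"m2": 1.0, "m4": 1.0},
    "m4": {"m3": 1.0},
}
scores = random_walk_with_restart(adj, seeds={"m1"})

# Rank the microRNAs without a known association by their propagated score.
candidates = sorted((n for n in scores if n != "m1"), key=scores.get, reverse=True)
print(candidates)  # ['m2', 'm3', 'm4']
```

Candidates closer to the known disease-associated microRNA receive higher scores, which is the sense in which associations are inferred from neighbors in the same network.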

  9. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction

    PubMed Central

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.

    2017-01-01

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623

  10. Construction of ontology augmented networks for protein complex prediction.

    PubMed

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.

  11. Identification of altered pathways in breast cancer based on individualized pathway aberrance score.

    PubMed

    Shi, Sheng-Hong; Zhang, Wei; Jiang, Jing; Sun, Long

    2017-08-01

    The objective of the present study was to identify altered pathways in breast cancer based on the individualized pathway aberrance score (iPAS) method combined with the normal reference (nRef). There were 4 steps to identify altered pathways using the iPAS method: data preprocessing conducted by the robust multi-array average (RMA) algorithm; gene-level statistics based on average Z; pathway-level statistics according to iPAS; and a significance test dependent on a one-sample Wilcoxon test. The altered pathways were validated by calculating the changed percentage of each pathway in tumor samples and comparing them with pathways from differentially expressed genes (DEGs). A total of 688 altered pathways with P<0.01 were identified, including kinesin (KIF)- and polo-like kinase (PLK)-mediated events. When the percentage of change reached 50%, 310 pathways were involved in the total 688 altered pathways, which may validate the present results. In addition, there were 324 DEGs and 155 common genes between DEGs and pathway genes. DEGs and common genes were enriched in the same 9 significant terms, which also were members of altered pathways. The iPAS method was suitable for identifying altered pathways in breast cancer. Altered pathways (such as KIF and PLK mediated events) were important for understanding breast cancer mechanisms and for the future application of customized therapeutic decisions.
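
    The gene-level and pathway-level steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the "average Z" score is the mean of per-gene Z-scores, each computed against the mean and sample standard deviation of the normal reference. The function names, the gene symbols used as dictionary keys, and all expression values are hypothetical toy inputs; RMA preprocessing and the Wilcoxon test are omitted.

```python
from statistics import mean, stdev

def gene_z_scores(tumor, normals):
    """Standardize one tumor sample against the normal reference, gene by gene."""
    z = {}
    for gene, value in tumor.items():
        ref = [sample[gene] for sample in normals]
        # Assumption: sample standard deviation of the normal reference.
        z[gene] = (value - mean(ref)) / stdev(ref)
    return z

def average_z(z, pathway_genes):
    """Pathway-level score: mean Z over the pathway genes that were measured."""
    present = [z[g] for g in pathway_genes if g in z]
    return mean(present)

# Toy, made-up expression values (not data from the study).
normals = [
    {"KIF11": 5.0, "PLK1": 4.0, "ACTB": 10.0},
    {"KIF11": 6.0, "PLK1": 5.0, "ACTB": 11.0},
    {"KIF11": 7.0, "PLK1": 6.0, "ACTB": 12.0},
]
tumor = {"KIF11": 9.0, "PLK1": 8.0, "ACTB": 11.0}

z = gene_z_scores(tumor, normals)
score = average_z(z, ["KIF11", "PLK1"])  # hypothetical two-gene pathway
print(score)
```

    Because each tumor sample is scored on its own against the reference, the result is one pathway score per patient, which is what makes the approach "individualized".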

  12. Pre-Clinical Drug Prioritization via Prognosis-Guided Genetic Interaction Networks

    PubMed Central

    Xiong, Jianghui; Liu, Juan; Rayner, Simon; Tian, Ze; Li, Yinghui; Chen, Shanguang

    2010-01-01

    The high rates of failure in oncology drug clinical trials highlight the problems of using pre-clinical data to predict the clinical effects of drugs. Patient population heterogeneity and unpredictable physiology complicate pre-clinical cancer modeling efforts. We hypothesize that gene networks associated with cancer outcome in heterogeneous patient populations could serve as a reference for identifying drug effects. Here we propose a novel in vivo genetic interaction which we call ‘synergistic outcome determination’ (SOD), a concept similar to ‘Synthetic Lethality’. SOD is defined as the synergy of a gene pair with respect to cancer patients' outcome, whose correlation with outcome is due to cooperative, rather than independent, contributions of genes. The method combines microarray gene expression data with cancer prognostic information to identify synergistic gene-gene interactions that are then used to construct interaction networks based on gene modules (a group of genes which share similar function). In this way, we identified a cluster of important epigenetically regulated gene modules. By projecting drug sensitivity-associated genes on to the cancer-specific inter-module network, we defined a perturbation index for each drug based upon its characteristic perturbation pattern on the inter-module network. Finally, by calculating this index for compounds in the NCI Standard Agent Database, we significantly discriminated successful drugs from a broad set of test compounds, and further revealed the mechanisms of drug combinations. Thus, prognosis-guided synergistic gene-gene interaction networks could serve as an efficient in silico tool for pre-clinical drug prioritization and rational design of combinatorial therapies. PMID:21085674

  13. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming

    PubMed Central

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A.

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina HiSeq 2500 instrument. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs. PMID:26656830

  14. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    PubMed

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs.

  15. Identification of Proteins Using iTRAQ and Virus-Induced Gene Silencing Reveals Three Bread Wheat Proteins Involved in the Response to Combined Osmotic-Cold Stress.

    PubMed

    Zhang, Ning; Zhang, Lingran; Shi, Chaonan; Zhao, Lei; Cui, Dangqun; Chen, Feng

    2018-05-25

    Crops are often subjected to a combination of stresses in the field. To date, however, the physiological and molecular responses of common wheat to a combination of osmotic and cold stresses remain unknown. In this study, wheat seedlings exposed to osmotic-cold stress for 24 h showed inhibited growth, as well as increased lipid peroxidation, relative electrolyte leakage, and soluble sugar contents. An iTRAQ-based quantitative proteomic method was employed to determine the proteomic profiles of the roots and leaves of wheat seedlings exposed to osmotic-cold stress conditions. A total of 250 and 258 proteins with significantly altered abundance in the roots and leaves were identified, respectively, and the majority of these proteins displayed differential abundance, thereby revealing organ-specific differences in adaptation to osmotic-cold stress. A yeast two-hybrid assay examined five pairs of stress/defense-related protein-protein interactions in the predicted protein interaction network. Furthermore, quantitative real-time PCR analysis indicated that abiotic stresses increased the expression of three candidate protein genes, i.e., TaGRP2, CDCP, and Wcor410c in wheat leaves. Virus-induced gene silencing indicated that the three genes TaGRP2, CDCP, and Wcor410c were involved in modulating osmotic-cold stress in common wheat. Our study provides useful information for the elucidation of the molecular and genetic bases of osmotic-cold combined stress in bread wheat.

  16. GeneNetFinder2: Improved Inference of Dynamic Gene Regulatory Relations with Multiple Regulators.

    PubMed

    Han, Kyungsook; Lee, Jeonghoon

    2016-01-01

    A gene involved in complex regulatory interactions may have multiple regulators since gene expression in such interactions is often controlled by more than one gene. Another thing that makes gene regulatory interactions complicated is that regulatory interactions are not static, but change over time during the cell cycle. Most research so far has focused on identifying gene regulatory relations between individual genes in a particular stage of the cell cycle. In this study we developed a method for identifying dynamic gene regulations of several types from the time-series gene expression data. The method can find gene regulations with multiple regulators that work in combination or individually as well as those with single regulators. The method has been implemented as the second version of GeneNetFinder (hereafter called GeneNetFinder2) and tested on several gene expression datasets. Experimental results with gene expression data revealed the existence of genes that are not regulated by individual genes but rather by a combination of several genes. Such gene regulatory relations cannot be found by conventional methods. Our method finds such regulatory relations as well as those with multiple, independent regulators or single regulators, and represents gene regulatory relations as a dynamic network in which different gene regulatory relations are shown in different stages of the cell cycle. GeneNetFinder2 is available at http://bclab.inha.ac.kr/GeneNetFinder and will be useful for modeling dynamic gene regulations with multiple regulators.

  17. Combination of Metagenomics and Culture-Based Methods to Study the Interaction Between Ochratoxin A and Gut Microbiota

    PubMed Central

    Guo, Mingzhang; Huang, Kunlun; Chen, Siyuan; Qi, Xiaozhe; He, Xiaoyun; Cheng, Wen-Hsing; Luo, Yunbo; Xia, Kai; Xu, Wentao

    2014-01-01

    Gut microbiota represent an important bridge between environmental substances and host metabolism. Here we reported a comprehensive study of gut microbiota interaction with ochratoxin A (OTA), a major food-contaminating mycotoxin, using the combination of metagenomics and culture-based methods. Rats were given OTA (0, 70, or 210 μg/kg body weight) by gavage and fecal samples were collected at day 0 and day 28. Bacterial genomic DNA was extracted from the fecal samples and both 16S rRNA and shotgun sequencing (two main methods of metagenomics) were performed. The results indicated OTA treatment decreased the within-subject diversity of the gut microbiota, and the relative abundance of Lactobacillus increased considerably. Changes in functional genes of gut microbiota including signal transduction, carbohydrate transport, transposase, amino acid transport system, and mismatch repair were observed. To further understand the biological sense of increased Lactobacillus, Lactobacillus selective medium was used to isolate Lactobacillus species from fecal samples, and a strain with 99.8% 16S rRNA similarity with Lactobacillus plantarum strain PFK2 was obtained. Thin-layer chromatography showed that this strain could absorb but not degrade OTA, which was in agreement with the result in metagenomics that no genes related to OTA degradation increased. In conclusion, combination of metagenomics and culture-based methods can be a new strategy to study intestinal toxicity of toxins and find applicable bacterial strains for detoxification. When it comes to OTA, this kind of mycotoxin can cause compositional and functional changes of gut microbiota, and Lactobacillus are key genus to detoxify OTA in vivo. PMID:24973096

  18. Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold.

    PubMed

    Zitnik, Marinka; Zupan, Blaž

    2014-01-01

    The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker's yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps.

  19. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method

    PubMed Central

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-01-01

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. DBNCS algorithm first uses CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs. PMID:29113310

  20. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method.

    PubMed

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-10-06

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. DBNCS algorithm first uses CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reducing the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.

  1. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes

    PubMed Central

    Ho Sui, Shannan J.; Mortimer, James R.; Arenillas, David J.; Brumm, Jochen; Walsh, Christopher J.; Kennedy, Brian P.; Wasserman, Wyeth W.

    2005-01-01

    Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes. PMID:15933209
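
    The idea of testing whether a binding site is over-represented in a co-expressed gene set can be illustrated with a one-sided hypergeometric test. This is only a sketch under stated assumptions: the abstract does not name the statistics oPOSSUM uses, so the hypergeometric tail is chosen here as a common enrichment test, and the function name, parameter names, and counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k genes with the site in a set of n,
    when K of the N background genes carry the site (sampling without replacement)."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy counts (made up): 4 background genes, 2 carry the site,
# and both genes in a co-expressed set of size 2 carry it.
p = hypergeom_upper_tail(k=2, n=2, K=2, N=4)
print(p)
```

    A small p suggests the site occurs in the co-expressed set more often than expected from the background; in practice the threshold would be set empirically, as the abstract indicates.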

  2. [Clinical utility of real-time fluorescent PCR for combined detection of anaplastic lymphoma kinase and c-ros oncogene 1 receptor tyrosine kinase in non-small cell lung cancer].

    PubMed

    Bai, D Y; Zhang, H P; Zhong, S; Suo, W H; Gao, D H; Ding, Y; Tu, J H

    2016-12-23

    Objective: To investigate the clinical application value of combined detection of ALK fusion gene and c-ros oncogene 1 receptor tyrosine kinase (ROS1) fusion gene in non-small cell lung cancer (NSCLC) using real-time fluorescent PCR. Methods: A kit for combined detection of ALK fusion gene and ROS1 fusion gene based on fluorescent PCR was used to simultaneously detect the two fusion genes in 302 cases of NSCLC specimens. The results were validated through Sanger sequencing. The consistency of the two detection methods was analyzed. Results: All 302 cases of NSCLC specimens were successfully analyzed through fluorescent PCR (302/302). 12 cases (4.0%) were found to contain ALK fusion gene, including 3 cases with ALK-M1, 3 with ALK-M2, 3 with ALK-M3, 1 with ALK-M4, and 2 with ALK-M6 fusion gene. 12 cases (4.0%) were found to contain ROS1 fusion gene, including 1 case with ROS1-M7, 8 cases with ROS1-M8, 1 case with ROS1-M12, 1 case with ROS1-M14, and 1 case with double-positive ROS1-M3 and ROS1-M8 fusion genes. The total detection rate of ALK fusion gene and ROS1 fusion gene was 7.9% (24/302), and 278 cases were negative for ALK fusion gene and ROS1 fusion gene. The successful detection rates for Sanger DNA sequencing were also 100%. The positive, negative and total coincidence rates obtained by real-time fluorescent PCR and by Sanger DNA sequencing were all 100%. Conclusions: The results of Sanger DNA sequencing demonstrate that the real-time fluorescent PCR assay is equally effective in detecting ALK and ROS1 fusion genes in NSCLC tissues. Furthermore, real-time fluorescent PCR assay can be used to detect trace ALK and ROS1 fusion gene simultaneously in tiny samples, and can save time and avoid repeated sampling. It is worthy of recommendation as a rapid and reliable detection technique.

  3. A Comparative Study on Multifactor Dimensionality Reduction Methods for Detecting Gene-Gene Interactions with the Survival Phenotype

    PubMed Central

    Lee, Seungyeoun; Kim, Yongkang; Kwon, Min-Seok; Park, Taesung

    2015-01-01

    Genome-wide association studies (GWAS) have extensively analyzed single SNP effects on a wide variety of common and complex diseases and found many genetic variants associated with diseases. However, there is still a large portion of the genetic variants left unexplained. This missing heritability problem might be due to the analytical strategy that limits analyses to only single SNPs. One of possible approaches to the missing heritability problem is to consider identifying multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction method has been widely used to detect gene-gene interactions based on the constructive induction by classifying high-dimensional genotype combinations into one-dimensional variable with two attributes of high risk and low risk for the case-control study. Many modifications of MDR have been proposed and also extended to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare the proposed extensions with earlier MDR through comprehensive simulation studies. PMID:26339630
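
    The constructive-induction step of MDR for a case-control study, as described above, can be sketched as follows. This is a simplified illustration and not the survival-phenotype extensions proposed in the paper: it assumes a genotype combination is labelled high risk when its case:control ratio meets or exceeds a threshold, defaulting to the overall ratio in the data. The function name, argument names, and toy genotypes are hypothetical.

```python
from collections import defaultdict

def mdr_risk_labels(genotypes, status, threshold=None):
    """Collapse multi-locus genotype combinations into 'high' / 'low' risk.

    genotypes: list of tuples, one per subject, e.g. (0, 2) for two SNPs
    status:    list of 1 (case) or 0 (control), same order as genotypes
    threshold: case:control ratio; defaults to the overall ratio (assumption)
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for g, s in zip(genotypes, status):
        if s == 1:
            cases[g] += 1
        else:
            controls[g] += 1
    if threshold is None:
        threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for g in set(cases) | set(controls):
        # Compare cases against threshold * controls to avoid dividing by zero.
        labels[g] = "high" if cases[g] >= threshold * controls[g] else "low"
    return labels

# Toy, made-up data: six subjects genotyped at two SNPs.
genotypes = [(0, 0), (0, 0), (0, 0), (1, 2), (1, 2), (1, 2)]
status = [1, 0, 0, 1, 1, 0]
labels = mdr_risk_labels(genotypes, status)
print(labels)
```

    The resulting one-dimensional high/low variable is what a downstream classifier or test would use in place of the original high-dimensional genotype table.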

  4. An integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features.

    PubMed

    Nandi, Sutanu; Subramanian, Abhishek; Sarkar, Ram Rup

    2017-07-25

    Prediction of essential genes helps to identify a minimal set of genes that are absolutely required for the appropriate functioning and survival of a cell. The available machine learning techniques for essential gene prediction have inherent problems, like imbalanced provision of training datasets, biased choice of the best model for a given balanced dataset, choice of a complex machine learning algorithm, and data-based automated selection of biologically relevant features for classification. Here, we propose a simple support vector machine-based learning strategy for the prediction of essential genes in Escherichia coli K-12 MG1655 metabolism that integrates a non-conventional combination of an appropriate sample balanced training set, a unique organism-specific genotype, phenotype attributes that characterize essential genes, and optimal parameters of the learning algorithm to generate the best machine learning model (the model with the highest accuracy among all the models trained for different sample training sets). For the first time, we also introduce flux-coupled metabolic subnetwork-based features for enhancing the classification performance. Our strategy proves to be superior as compared to previous SVM-based strategies in obtaining a biologically relevant classification of genes with high sensitivity and specificity. This methodology was also trained with datasets of other recent supervised classification techniques for essential gene classification and tested using reported test datasets. The testing accuracy was always high as compared to the known techniques, proving that our method outperforms known methods. Observations from our study indicate that essential genes are conserved among homologous bacterial species, demonstrate high codon usage bias, GC content and gene expression, and predominantly possess a tendency to form physiological flux modules in metabolism.

  5. Terminator Operon Reporter: combining a transcription termination switch with reporter technology for improved gene synthesis and synthetic biology applications.

    PubMed

    Zampini, Massimiliano; Mur, Luis A J; Rees Stevens, Pauline; Pachebat, Justin A; Newbold, C James; Hayes, Finbarr; Kingston-Smith, Alison

    2016-05-25

    Synthetic biology is characterized by the development of novel and powerful DNA fabrication methods and by the application of engineering principles to biology. The current study describes Terminator Operon Reporter (TOR), a new gene assembly technology based on the conditional activation of a reporter gene in response to sequence errors occurring at the assembly stage of the synthetic element. These errors are monitored by a transcription terminator that is placed between the synthetic gene and reporter gene. Switching of this terminator between active and inactive states dictates the transcription status of the downstream reporter gene to provide a rapid and facile readout of the accuracy of synthetic assembly. Designed specifically and uniquely for the synthesis of protein coding genes in bacteria, TOR allows the rapid and cost-effective fabrication of synthetic constructs by employing oligonucleotides at the most basic purification level (desalted) and without the need for costly and time-consuming post-synthesis correction methods. Thus, TOR streamlines gene assembly approaches, which are central to the future development of synthetic biology.

  6. Rapid and efficient gene delivery into the adult mouse brain via focal electroporation

    PubMed Central

    Nomura, Tadashi; Nishimura, Yusuke; Gotoh, Hitoshi; Ono, Katsuhiko

    2016-01-01

    In vivo gene delivery is required for studying the cellular and molecular mechanisms of various biological events. Virus-mediated gene transfer or generation of transgenic animals is widely used; however, these methods are time-consuming and expensive. Here we show an improved electroporation technique for acute gene delivery into the adult mouse brain. Using a syringe-based microelectrode, local DNA injection and the application of electric current can be performed simultaneously; this allows rapid and efficient gene transduction of adult non-neuronal cells. Combining this technique with various expression vectors that carry specific promoters resulted in targeted gene expression in astrocytic cells. Our results constitute a powerful strategy for the genetic manipulation of adult brains in a spatio-temporally controlled manner. PMID:27430903

  7. Targeted gene knock-in by homology-directed genome editing using Cas9 ribonucleoprotein and AAV donor delivery.

    PubMed

    Gaj, Thomas; Staahl, Brett T; Rodrigues, Gonçalo M C; Limsirichai, Prajit; Ekman, Freja K; Doudna, Jennifer A; Schaffer, David V

    2017-06-20

    Realizing the full potential of genome editing requires the development of efficient and broadly applicable methods for delivering programmable nucleases and donor templates for homology-directed repair (HDR). The RNA-guided Cas9 endonuclease can be introduced into cells as a purified protein in complex with a single guide RNA (sgRNA). Such ribonucleoproteins (RNPs) can facilitate the high-fidelity introduction of single-base substitutions via HDR following co-delivery with a single-stranded DNA oligonucleotide. However, combining RNPs with transgene-containing donor templates for targeted gene addition has proven challenging, which in turn has limited the capabilities of the RNP-mediated genome editing toolbox. Here, we demonstrate that combining RNP delivery with naturally recombinogenic adeno-associated virus (AAV) donor vectors enables site-specific gene insertion by homology-directed genome editing. Compared to conventional plasmid-based expression vectors and donor templates, we show that combining RNP and AAV donor delivery increases the efficiency of gene addition by up to 12-fold, enabling the creation of lineage reporters that can be used to track the conversion of striatal neurons from human fibroblasts in real time. These results thus illustrate the potential for unifying nuclease protein delivery with AAV donor vectors for homology-directed genome editing. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Automating gene library synthesis by structure-based combinatorial protein engineering: examples from plant sesquiterpene synthases.

    PubMed

    Dokarry, Melissa; Laurendon, Caroline; O'Maille, Paul E

    2012-01-01

    Structure-based combinatorial protein engineering (SCOPE) is a homology-independent recombination method to create multiple crossover gene libraries by assembling defined combinations of structural elements ranging from single mutations to domains of protein structure. SCOPE was originally inspired by DNA shuffling, which mimics recombination during meiosis, where mutations from parental genes are "shuffled" to create novel combinations in the resulting progeny. DNA shuffling utilizes sequence identity between parental genes to mediate template-switching events (the annealing and extension of one parental gene fragment on another) in PCR reassembly reactions to generate crossovers and hence recombination between parental genes. In light of the conservation of protein structure and degeneracy of sequence, SCOPE was developed to enable the "shuffling" of distantly related genes with no requirement for sequence identity. The central principle involves the use of oligonucleotides to encode for crossover regions to choreograph template-switching events during PCR assembly of gene fragments to create chimeric genes. This approach was initially developed to create libraries of hybrid DNA polymerases from distantly related parents, and later developed to create a combinatorial mutant library of sesquiterpene synthases to explore the catalytic landscapes underlying the functional divergence of related enzymes. This chapter presents a simplified protocol of SCOPE that can be integrated with different mutagenesis techniques and is suitable for automation by liquid-handling robots. Two examples are presented to illustrate the application of SCOPE to create gene libraries using plant sesquiterpene synthases as the model system. In the first example, we outline how to create an active-site library as a series of complex mixtures of diverse mutants. 
In the second example, we outline how to create a focused library as an array of individual clones to distil minimal combinations of functionally important mutations. Through these examples, the principles of the technique are illustrated and the suitability of automating various aspects of the procedure for given applications is discussed. Copyright © 2012 Elsevier Inc. All rights reserved.

  9. A Gene Module-Based eQTL Analysis Prioritizing Disease Genes and Pathways in Kidney Cancer.

    PubMed

    Yang, Mary Qu; Li, Dan; Yang, William; Zhang, Yifan; Liu, Jun; Tong, Weida

    2017-01-01

    Clear cell renal cell carcinoma (ccRCC) is the most common and most aggressive form of renal cell cancer (RCC). The incidence of RCC has increased steadily in recent years. The pathogenesis of renal cell cancer remains poorly understood. Many of the tumor suppressor genes, oncogenes, and dysregulated pathways in ccRCC need to be revealed for improvement of the overall clinical outlook of the disease. Here, we developed a systems biology approach to prioritize the somatically mutated genes that lead to dysregulation of pathways in ccRCC. The method integrated multi-layer information to infer causative mutations and disease genes. First, we identified differential gene modules in ccRCC by coupling transcriptome and protein-protein interactions. Each of these modules consisted of interacting genes that were involved in similar biological processes, and their combined expression alterations were significantly associated with disease type. Then, subsequent gene module-based eQTL analysis revealed somatically mutated genes that had driven the expression alterations of differential gene modules. Our study yielded a list of candidate disease genes, including several known ccRCC causative genes such as BAP1 and PBRM1, as well as novel genes such as NOD2, RRM1, CSRNP1, SLC4A2, TTLL1 and CNTN1. The differential gene modules and their driver genes revealed by our study provided a new perspective for understanding the molecular mechanisms underlying the disease. Moreover, we validated the results in independent ccRCC patient datasets. Our study provided a new method for prioritizing disease genes and pathways.

  10. Identification of Bacillus Probiotics Isolated from Soil Rhizosphere Using 16S rRNA, recA, rpoB Gene Sequencing and RAPD-PCR.

    PubMed

    Mohkam, Milad; Nezafat, Navid; Berenjian, Aydin; Mobasher, Mohammad Ali; Ghasemi, Younes

    2016-03-01

    Some Bacillus species, especially those of the Bacillus subtilis and Bacillus pumilus groups, have highly similar 16S rRNA gene sequences and are therefore hard to identify by 16S rDNA sequence analysis. To overcome this drawback, rpoB and recA sequence analysis along with randomly amplified polymorphic DNA (RAPD) fingerprinting was examined as an alternative method for differentiating Bacillus species. The 16S rRNA, rpoB and recA genes were amplified by polymerase chain reaction using their specific primers. The resulting PCR amplicons were sequenced, and phylogenetic analysis was performed with MEGA 6 software. Identification based on 16S rRNA gene sequencing was supported by rpoB and recA gene sequencing as well as the RAPD-PCR technique. Subsequently, concatenation and phylogenetic analysis showed that the extent of diversity and similarity was better resolved by rpoB and recA primers, a result reinforced by RAPD-PCR. However, in one case these approaches failed to identify an isolate, an issue that was offset by combining them with the phenotypic method. Overall, RAPD fingerprinting and rpoB and recA analysis, along with sequence analysis of the concatenated genes, discriminated closely related Bacillus species, which highlights the value of the multigenic method for distinguishing Bacillus strains more precisely. This research emphasizes the benefit of RAPD fingerprinting and rpoB and recA sequence analysis over 16S rRNA gene sequence analysis for suitable and effective identification of Bacillus species as recommended for probiotic products.

  11. PGMapper: a web-based tool linking phenotype to genes.

    PubMed

    Xiong, Qing; Qiu, Yuhui; Gu, Weikuan

    2008-04-01

    With the availability of whole-genome sequences for many species, linkage analysis, positional cloning and microarrays are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, with these methods, the causative genes underlying a quantitative trait locus or a disease are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time consuming and requires retrieving and integrating information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. Available online at http://www.genediscovery.org/pgmapper/index.jsp.

  12. Extracting rate changes in transcriptional regulation from MEDLINE abstracts.

    PubMed

    Liu, Wenting; Miao, Kui; Li, Guangxia; Chang, Kuiyu; Zheng, Jie; Rajapakse, Jagath C

    2014-01-01

    Time delays are important factors that are often neglected in gene regulatory network (GRN) inference models. Validating time delays from knowledge bases is a challenge since the vast majority of biological databases do not record temporal information on gene regulation. Biological knowledge and facts on gene regulation are typically extracted from bio-literature with specialized methods that depend on the regulation task. In this paper, we mine evidence for time delays related to the transcriptional regulation of yeast from PubMed abstracts. Since the vast majority of abstracts lack quantitative time information, we can only collect qualitative evidence of time delays. Specifically, a speed-up or delay in the transcriptional regulation rate can provide evidence for time delays (shorter or longer) in a GRN. Thus, we focus on deriving events related to rate changes in transcriptional regulation. A corpus of yeast regulation-related abstracts was manually labeled with such events. In order to capture these events automatically, we create an ontology of sub-processes that are likely to result in transcription rate changes by combining textual patterns and biological knowledge. We also propose effective feature extraction methods based on the created ontology to identify direct evidence with specific details of these events. Our ontologies outperform existing state-of-the-art gene regulation ontologies in the automatic rule learning method applied to our corpus. The proposed deterministic ontology rule-based method can achieve performance comparable to the automatic rule learning method based on decision trees. This demonstrates the effectiveness of our ontology in identifying rate-changing events. We also tested the effectiveness of the proposed feature mining methods on detecting direct evidence of events. Experimental results show that the machine learning method on these features achieves an F1-score of 71.43%. The manually labeled corpus of events relating to rate changes in transcriptional regulation for yeast is available at https://sites.google.com/site/wentingntu/data. The created ontologies summarize both biological causes of rate changes in transcriptional regulation and corresponding positive and negative textual patterns from the corpus. They are demonstrated to be effective in identifying rate-changing events, which shows the benefits of combining textual patterns and biological knowledge for extracting complex biological events.

  13. Recognition of Protein-coding Genes Based on Z-curve Algorithms

    PubMed Central

    Guo, Feng-Biao; Lin, Yan; Chen, Ling-Ling

    2014-01-01

    Recognition of protein-coding genes, a classical bioinformatics problem, is an essential step in annotating newly sequenced genomes. The Z-curve algorithm, one of the most effective methods for this task, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. These four tools can be used for genome annotation or re-annotation, either independently or combined with other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation. PMID:24822027
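
    As a rough illustration of the transform that underlies these tools, the sketch below computes the cumulative Z-curve coordinates of a DNA sequence: x contrasts purines with pyrimidines, y contrasts amino with keto bases, and z contrasts weak with strong hydrogen-bonding bases. The function name `z_curve` is hypothetical, and the sketch covers only the basic coordinate transform; it does not reproduce the feature extraction or classification performed by ZCURVE or the other programs named above.

```python
from typing import List, Tuple


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for a DNA sequence.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
    z_n = (A_n + T_n) - (G_n + C_n)   weak vs. strong hydrogen bonding
    where A_n, C_n, G_n, T_n are base counts in the first n positions.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            raise ValueError(f"unexpected base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


if __name__ == "__main__":
    for n, point in enumerate(z_curve("ACGT"), start=1):
        print(n, point)
```

    Because the three coordinates together determine the base counts at every position, the curve is an alternative representation of the sequence itself, not a lossy summary.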

  14. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.

    PubMed

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-21

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 expression studies of diabetes and five of insulin response in various tissues. We tested differential gene expression with an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate the expression association profiles of genes and gene sets between studies and tissues. Our analysis showed that the PGRMC1 and HADH genes were significant across diabetes studies, while the IRS1 and MPST genes were significant across insulin response studies, and joint analysis showed that the HADH and MPST genes were significant across all combined data sets. The pathway analysis identified six significant gene sets across all studies. KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of study pairs had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of study pairs had independent expression association for genes, but no study pairs were observed to differ significantly in expression association of gene sets. Our analysis indicated that there are both tissue-specific and tissue non-specific genes and pathways associated with diabetes pathogenesis. Compared with gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.
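
    The abstract does not name the meta-analysis methods that were applied. Purely as an illustration of how per-study p-values for one gene can be combined, the sketch below implements Fisher's combined probability test, a standard technique that may or may not be among those the authors used. The function name and the example p-values are hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For an even
    number of degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no statistics library is needed.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical p-values for one gene in three independent studies.
    print(fisher_combined_p([0.04, 0.10, 0.03]))
```

    Fisher's method assumes the studies are independent; correlated studies would call for a different combination rule.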

  15. Semi-Supervised Multi-View Learning for Gene Network Reconstruction

    PubMed Central

    Ceci, Michelangelo; Pio, Gianvito; Kuzmanovski, Vladimir; Džeroski, Sašo

    2015-01-01

    The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it would also carry substantial benefits. These would come from the automatic adaptation to patterns in the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827. PMID:26641091

  16. Inductive matrix completion for predicting gene-disease associations.

    PubMed

    Natarajan, Nagarajan; Dhillon, Inderjit S

    2014-06-15

    Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared with the recently proposed Catapult method (second best), which has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes that were not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease. © The Author 2014. Published by Oxford University Press.
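
    To make the "inductive" property concrete, the sketch below scores a gene-disease pair from feature vectors alone, assuming the usual low-rank bilinear form score = x^T W H^T y, where x is a gene feature vector, y is a disease feature vector, and W and H are learned factor matrices. All names and numbers are hypothetical, the learning of W and H is not shown, and the sketch is not taken from the authors' released code.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def project(features: Sequence[float], factors: Matrix) -> List[float]:
    """Map a feature vector into the latent space: factors^T @ features.

    `factors` has one row per feature and one column per latent dimension.
    """
    if len(features) != len(factors):
        raise ValueError("feature length must match the number of factor rows")
    rank = len(factors[0])
    return [
        sum(factors[i][r] * features[i] for i in range(len(features)))
        for r in range(rank)
    ]


def imc_score(gene_x: Sequence[float], disease_y: Sequence[float],
              W: Matrix, H: Matrix) -> float:
    """Bilinear association score x^T W H^T y for one gene-disease pair."""
    u = project(gene_x, W)      # latent representation of the gene
    v = project(disease_y, H)   # latent representation of the disease
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    # Toy factors of rank 2: two gene features, one disease feature.
    W = [[1.0, 0.0], [0.0, 1.0]]
    H = [[2.0, 1.0]]
    # A disease unseen during training can still be scored from its features.
    print(imc_score([1.0, 2.0], [3.0], W, H))
```

    Because the score depends only on features, a new disease with no known gene associations receives predictions as soon as its feature vector is available, which is the contrast with transductive approaches drawn in the abstract.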

  17. Identification of essential genes and synthetic lethal gene combinations in Escherichia coli K-12.

    PubMed

    Mori, Hirotada; Baba, Tomoya; Yokoyama, Katsushi; Takeuchi, Rikiya; Nomura, Wataru; Makishi, Kazuichi; Otsuka, Yuta; Dose, Hitomi; Wanner, Barry L

    2015-01-01

    Here we describe the systematic identification of single genes and gene pairs whose knockout causes lethality in Escherichia coli K-12. During construction of a precise single-gene knockout library of E. coli K-12, we identified 328 essential gene candidates for growth in complex (LB) medium. Upon establishment of the Keio single-gene deletion library, we undertook the development of the ASKA single-gene deletion library carrying a different antibiotic resistance. In addition, we developed tools for identification of synthetic lethal gene combinations by systematic construction of double-gene knockout mutants. We introduce these methods herein.

  18. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.

    PubMed

    Zhang, Shu-Bo; Tang, Qiang-Rong

    2016-07-21

    Identifying protein-protein interactions is important in molecular biology. Experimental methods for this task have their limitations, and computational approaches have attracted increasing attention from the biological community. The semantic similarity derived from Gene Ontology (GO) annotation has been regarded as one of the most powerful indicators of protein interaction. However, conventional methods based on GO similarity fail to take advantage of the specificity of GO terms in the ontology graph. We proposed a GO-based method to predict protein-protein interaction by integrating different kinds of similarity measures derived from the intrinsic structure of the GO graph. We extended five existing methods to derive the semantic similarity measures from the descending part of two GO terms in the GO graph, then adopted a feature integration strategy that combines both the ascending and the descending similarity scores derived from the three sub-ontologies to construct various kinds of features to characterize each protein pair. Support vector machines (SVM) were employed as discriminative classifiers, and five-fold cross-validation experiments were conducted on both human and yeast protein-protein interaction datasets to evaluate the performance of different kinds of integrated features. The experimental results suggest that the best performance is obtained with the feature that combines information from both the ascending and the descending parts of the three ontologies. Our method is appealing for effective prediction of protein-protein interaction. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. How DNA barcoding can be more effective in microalgae identification: a case of cryptic diversity revelation in Scenedesmus (Chlorophyceae)

    PubMed Central

    Zou, Shanmei; Fei, Cong; Wang, Chun; Gao, Zhan; Bao, Yachao; He, Meilin; Wang, Changhai

    2016-01-01

    Microalgae identification is extremely difficult. The efficiency of DNA barcoding in microalgae identification depends on ideal gene markers and the approaches employed, both of which are still being worked out. Although Scenedesmus has been studied extensively for lipid production, its identification is difficult. Here we present comprehensive coalescent, distance and character-based DNA barcoding for 118 Scenedesmus strains based on rbcL, tufA, ITS and 16S. The four genes and their combined data sets (rbcL + tufA + ITS + 16S, rbcL + tufA, and ITS + 16S) were each analyzed by GMYC, P ID, PTP, ABGD, and character-based barcoding. The three combined data sets showed a higher proportion of resolution success than the single genes. In comparison, the GMYC and PTP analyses produced more taxonomic lineages. ABGD generated varying resolution in discrimination among the single and combined data. Character-based barcoding proved to be the most effective approach for species discrimination in both single and combined data, producing consistent species identification. The integrated results recovered 11 species, five of which were revealed as potential cryptic species. We suggest that character-based DNA barcoding, together with other approaches based on multiple genes and their combined data, could be more effective in revealing microalgae diversity. PMID:27827440

  20. How DNA barcoding can be more effective in microalgae identification: a case of cryptic diversity revelation in Scenedesmus (Chlorophyceae).

    PubMed

    Zou, Shanmei; Fei, Cong; Wang, Chun; Gao, Zhan; Bao, Yachao; He, Meilin; Wang, Changhai

    2016-11-09

    Microalgae identification is extremely difficult. The efficiency of DNA barcoding in microalgae identification depends on ideal gene markers and the approaches employed, both of which are still being worked out. Although Scenedesmus has been studied extensively for lipid production, its identification is difficult. Here we present comprehensive coalescent, distance and character-based DNA barcoding for 118 Scenedesmus strains based on rbcL, tufA, ITS and 16S. The four genes and their combined data sets (rbcL + tufA + ITS + 16S, rbcL + tufA, and ITS + 16S) were each analyzed by GMYC, P ID, PTP, ABGD, and character-based barcoding. The three combined data sets showed a higher proportion of resolution success than the single genes. In comparison, the GMYC and PTP analyses produced more taxonomic lineages. ABGD generated varying resolution in discrimination among the single and combined data. Character-based barcoding proved to be the most effective approach for species discrimination in both single and combined data, producing consistent species identification. The integrated results recovered 11 species, five of which were revealed as potential cryptic species. We suggest that character-based DNA barcoding, together with other approaches based on multiple genes and their combined data, could be more effective in revealing microalgae diversity.

  1. A graph-based semantic similarity measure for the gene ontology.

    PubMed

    Alvarez, Marco A; Yan, Changhui

    2011-12-01

    Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases like Gene Ontology Annotation (GOA) that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure of semantic similarity independent of external databases of functional-annotation observations.
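
    Two of the three ingredients named in the abstract are purely graph-based and can be sketched on a toy ontology. The code below computes, for a pair of terms, the shortest path that passes through a common ancestor and the depth of their deepest common ancestor. The toy hierarchy, the term names, and both function names are hypothetical; the abstract does not say how SSA combines these quantities with the definition-based score, so no combined similarity is attempted here.

```python
from collections import deque
from typing import Dict, List, Tuple

# Toy "is_a" hierarchy: each term maps to its parents (a DAG, not a tree).
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "AB": ["A", "B"],
}


def ancestor_distances(term: str) -> Dict[str, int]:
    """Breadth-first search upward; maps each ancestor (and the term itself)
    to the minimum number of edges separating it from `term`."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def path_and_ancestor_depth(t1: str, t2: str) -> Tuple[int, int]:
    """Return (shortest path length via a common ancestor,
    depth of the deepest common ancestor), with depth measured
    as the minimum number of edges to the root."""
    d1, d2 = ancestor_distances(t1), ancestor_distances(t2)
    common = set(d1) & set(d2)
    shortest = min(d1[c] + d2[c] for c in common)
    deepest = max(ancestor_distances(c)["root"] for c in common)
    return shortest, deepest


if __name__ == "__main__":
    print(path_and_ancestor_depth("A1", "A2"))
    print(path_and_ancestor_depth("A1", "B"))
```

    Intuitively, a short path and a deep common ancestor both indicate closely related terms, while two terms that meet only at the root are far apart.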

  2. Analysis of differential gene expression by bead-based fiber-optic array in growth-hormone-secreting pituitary adenomas.

    PubMed

    Jiang, Zhiquan; Gui, Songbo; Zhang, Yazhuo

    2010-09-01

    Growth-hormone-secreting pituitary adenomas (GHomas) account for approximately 20% of all pituitary neoplasms. However, the pathogenesis of GHomas remains to be elucidated. To explore the possible pathogenesis of GHomas, we used bead-based fiber-optic arrays to examine the gene expression in five GHomas and compared them to three healthy pituitaries. Four differentially expressed genes were chosen randomly for validation by quantitative real-time reverse transcription-polymerase chain reaction. We then performed pathway analysis on the identified differentially expressed genes using the Kyoto Encyclopedia of Genes and Genomes. Array analysis showed significant increases in the expression of 353 genes and 206 expressed sequence tags (ESTs) and decreases in 565 genes and 29 ESTs. Bioinformatic analysis showed that the genes HIGD1B, HOXB2, ANGPT2, HPGD and BTG2 may play an important role in the tumorigenesis and progression of GHomas. Pathway analysis showed that the wingless-type signaling pathway and extracellular-matrix receptor interactions may play a key role in the tumorigenesis and progression of GHomas. Our data suggested that there are numerous aberrantly expressed genes and pathways involved in the pathogenesis of GHomas. Bead-based fiber-optic arrays combined with pathway analysis of differentially expressed genes appear to be a valid method for investigating the pathogenesis of tumors.

  3. Analysis of differential gene expression by bead-based fiber-optic array in growth-hormone-secreting pituitary adenomas

    PubMed Central

    JIANG, ZHIQUAN; GUI, SONGBO; ZHANG, YAZHUO

    2010-01-01

    Growth-hormone-secreting pituitary adenomas (GHomas) account for approximately 20% of all pituitary neoplasms. However, the pathogenesis of GHomas remains to be elucidated. To explore the possible pathogenesis of GHomas, we used bead-based fiber-optic arrays to examine the gene expression in five GHomas and compared them to three healthy pituitaries. Four differentially expressed genes were chosen randomly for validation by quantitative real-time reverse transcription-polymerase chain reaction. We then performed pathway analysis on the identified differentially expressed genes using the Kyoto Encyclopedia of Genes and Genomes. Array analysis showed significant increases in the expression of 353 genes and 206 expressed sequence tags (ESTs) and decreases in 565 genes and 29 ESTs. Bioinformatic analysis showed that the genes HIGD1B, HOXB2, ANGPT2, HPGD and BTG2 may play an important role in the tumorigenesis and progression of GHomas. Pathway analysis showed that the wingless-type signaling pathway and extracellular-matrix receptor interactions may play a key role in the tumorigenesis and progression of GHomas. Our data suggested that there are numerous aberrantly expressed genes and pathways involved in the pathogenesis of GHomas. Bead-based fiber-optic arrays combined with pathway analysis of differentially expressed genes appear to be a valid method for investigating the pathogenesis of tumors. PMID:22993617

  4. dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees.

    PubMed

    Wise, Michael J

    2016-01-01

    Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. 
Finally, sequences with high cladistic information produce more consistent trees for the same taxa.

  5. dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees

    PubMed Central

    2016-01-01

    Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. 
Finally, sequences with high cladistic information produce more consistent trees for the same taxa. PMID:27898695

  6. Unsupervised automated high throughput phenotyping of RNAi time-lapse movies.

    PubMed

    Failmezger, Henrik; Fröhlich, Holger; Tresch, Achim

    2013-10-04

    Gene perturbation experiments in combination with fluorescence time-lapse cell imaging are a powerful tool in reverse genetics. High content applications require tools for the automated processing of the large amounts of data. These tools include in general several image processing steps, the extraction of morphological descriptors, and the grouping of cells into phenotype classes according to their descriptors. This phenotyping can be applied in a supervised or an unsupervised manner. Unsupervised methods are suitable for the discovery of formerly unknown phenotypes, which are expected to occur in high-throughput RNAi time-lapse screens. We developed an unsupervised phenotyping approach based on Hidden Markov Models (HMMs) with multivariate Gaussian emissions for the detection of knockdown-specific phenotypes in RNAi time-lapse movies. The automated detection of abnormal cell morphologies allows us to assign a phenotypic fingerprint to each gene knockdown. By applying our method to the Mitocheck database, we show that a phenotypic fingerprint is indicative of a gene's function. Our fully unsupervised HMM-based phenotyping is able to automatically identify cell morphologies that are specific for a certain knockdown. Beyond the identification of genes whose knockdown affects cell morphology, phenotypic fingerprints can be used to find modules of functionally related genes.

  7. Herpes Simplex Virus-based gene Therapy Enhances the Efficacy of Mitomycin-C in the Treatment of Human Bladder Transitional Cell Carcinoma

    PubMed Central

    Mullerad, Michael; Bochner, Bernard H.; Adusumilli, Prasad S.; Bhargava, Amit; Kikuchi, Eiji; Hui-Ni, Chen; Kattan, Michael W.; Chou, Ting-Chao; Fong, Yuman

    2005-01-01

    Purpose Oncolytic replication-competent herpes simplex virus type-1 (HSV) mutants have the ability to replicate in and kill malignant cells. We have previously demonstrated the ability of replication-competent HSV to control bladder cancer growth in an orthotopic murine model. We hypothesized that a combination of a chemotherapeutic agent used for intravesical treatment - mitomycin-C (MMC) - and oncolytic HSV would exert a synergistic effect in the treatment of human transitional cell carcinoma (TCC). Materials and Methods We used the mutant HSV NV1066, which is deleted for viral genes ICP0 and ICP4 and selectively infects cancer cells, to treat TCC lines, KU19-19 and SKUB. Cell survival was determined by lactate dehydrogenase (LDH) assay for each agent as well as for drug-viral combinations from days 1 to 5. The isobologram method and the combination index method of Chou-Talalay were used to assess for synergistic effect. Results NV1066 enhanced MMC mediated cytotoxicity at all combinations tested for both KU19-19 and SKUB. Combination of both agents demonstrated a synergistic effect and allowed dose reduction by 12 and 10.4 times (NV1066) and by 3 and 156 times (MMC) in the treatment of KU19-19 and SKUB respectively, while achieving an estimated 90% cell kill. Conclusion These data provide the cellular basis for the clinical investigation of combined mitomycin-C and oncolytic HSV therapy in the treatment of bladder cancer. PMID:16006968
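
    The combination index method of Chou-Talalay named in this abstract can be sketched as follows. The median-effect equation gives the single-agent dose needed for a chosen fraction affected, and the combination index (CI) sums the ratios of the combination doses to those single-agent doses; CI below 1 is read as synergy, CI equal to 1 as additivity, and CI above 1 as antagonism. The parameter names (`dm` for the median-effect dose, `m` for the slope) and all numbers are assumptions for illustration and are not values from the study.

```python
def dose_for_effect(fa, dm, m):
    """Single-agent dose producing fraction affected `fa` (median-effect equation)."""
    if not 0.0 < fa < 1.0:
        raise ValueError("fa must lie strictly between 0 and 1")
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(fa, d1, d2, dm1, m1, dm2, m2):
    """Chou-Talalay CI for doses d1 and d2 that together produce effect `fa`."""
    dx1 = dose_for_effect(fa, dm1, m1)
    dx2 = dose_for_effect(fa, dm2, m2)
    return d1 / dx1 + d2 / dx2


def dose_reduction_index(fa, d_combo, dm, m):
    """Fold reduction in dose of one agent when used in the combination."""
    return dose_for_effect(fa, dm, m) / d_combo


# Hypothetical example: both agents have dm = 1 and m = 1, and a quarter of
# each single-agent median dose is enough in combination to reach fa = 0.5.
print(combination_index(0.5, 0.25, 0.25, 1.0, 1.0, 1.0, 1.0))
```

    The dose-reduction figures quoted in the abstract correspond to `dose_reduction_index` evaluated at a fraction affected of 0.9, but the fitted `dm` and `m` values are not given in this record.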

  8. RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes.

    PubMed

    Ono, Hiromasa; Ogasawara, Osamu; Okubo, Kosaku; Bono, Hidemasa

    2017-08-29

    Gene expression data are exponentially accumulating; thus, the functional annotation of such sequence data from metadata is urgently required. However, life scientists have difficulty utilizing the available data due to its sheer magnitude and complicated access. We have developed a web tool for browsing reference gene expression pattern of mammalian tissues and cell lines measured using different methods, which should facilitate the reuse of the precious data archived in several public databases. The web tool is called Reference Expression dataset (RefEx), and RefEx allows users to search by the gene name, various types of IDs, chromosomal regions in genetic maps, gene family based on InterPro, gene expression patterns, or biological categories based on Gene Ontology. RefEx also provides information about genes with tissue-specific expression, and the relative gene expression values are shown as choropleth maps on 3D human body images from BodyParts3D. Combined with the newly incorporated Functional Annotation of Mammals (FANTOM) dataset, RefEx provides insight regarding the functional interpretation of unfamiliar genes. RefEx is publicly available at http://refex.dbcls.jp/.

  9. RefEx, a reference gene expression dataset as a web tool for the functional analysis of genes

    PubMed Central

    Ono, Hiromasa; Ogasawara, Osamu; Okubo, Kosaku; Bono, Hidemasa

    2017-01-01

    Gene expression data are exponentially accumulating; thus, the functional annotation of such sequence data from metadata is urgently required. However, life scientists have difficulty utilizing the available data due to its sheer magnitude and complicated access. We have developed a web tool for browsing reference gene expression pattern of mammalian tissues and cell lines measured using different methods, which should facilitate the reuse of the precious data archived in several public databases. The web tool is called Reference Expression dataset (RefEx), and RefEx allows users to search by the gene name, various types of IDs, chromosomal regions in genetic maps, gene family based on InterPro, gene expression patterns, or biological categories based on Gene Ontology. RefEx also provides information about genes with tissue-specific expression, and the relative gene expression values are shown as choropleth maps on 3D human body images from BodyParts3D. Combined with the newly incorporated Functional Annotation of Mammals (FANTOM) dataset, RefEx provides insight regarding the functional interpretation of unfamiliar genes. RefEx is publicly available at http://refex.dbcls.jp/. PMID:28850115

  10. Comparative Evaluation of Multiplex PCR and Routine Laboratory Phenotypic Methods for Detection of Carbapenemases among Gram Negative Bacilli.

    PubMed

    Solanki, Rachana; Vanjari, Lavanya; Subramanian, Sreevidya; B, Aparna; E, Nagapriyanka; Lakshmi, Vemu

    2014-12-01

    Carbapenem-resistant pathogens cause infections associated with significant morbidity and mortality. This study evaluates the use of multiplex PCR for rapid detection of carbapenemase genes among carbapenem-resistant Gram-negative bacteria in comparison with existing phenotypic methods, namely the modified Hodge test (MHT), the combined disc test (CDT) and automated methods. A total of 100 carbapenem-resistant clinical isolates [Escherichia coli (25), Klebsiella pneumoniae (35), P. aeruginosa (18) and Acinetobacter baumannii (22)] were screened for the presence of carbapenemases (blaNDM-1, blaVIM, blaIMP and blaKPC genes) by the phenotypic methods (MHT and CDT) and by multiplex PCR. Seventy of the 100 isolates were MHT positive, while 65 isolates were positive by CDT. All the CDT-positive isolates with EDTA and APB were metallo-beta-lactamase (MBL) and K. pneumoniae carbapenemase (KPC) producers, respectively. blaNDM-1 was present as a lone gene in 44 isolates. In 14 isolates the blaNDM-1 gene was present with the blaKPC gene, and in one isolate the blaNDM-1 gene was present with the blaVIM gene. Only one E. coli isolate had a lone blaKPC gene. We did not find the blaIMP gene in any of the isolates. None of the genes could be detected in 35 isolates. Accurate detection of the genes related to carbapenemase production by molecular methods such as multiplex PCR overcomes the limitations of the phenotypic methods and automated systems.

  11. Supertrees Based on the Subtree Prune-and-Regraft Distance

    PubMed Central

    Whidden, Christopher; Zeh, Norbert; Beiko, Robert G.

    2014-01-01

    Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Many supertree approaches use optimality criteria that do not reflect underlying processes, have known biases, and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and > 50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of benchmark datasets simulated under plausible rates of LGT, we show that SPR supertrees are more similar to correct species histories than supertrees based on parsimony or Robinson–Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera. A small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT. [Lateral gene transfer; matrix representation with parsimony; phylogenomics; prokaryotic phylogeny; Robinson–Foulds; subtree prune-and-regraft; supertrees.] PMID:24695589
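
    Of the tree distances named in this abstract, the Robinson-Foulds distance is the simplest to write down, and a sketch of it helps show what a distance-based optimality criterion compares. The code below handles rooted trees encoded as nested tuples of leaf names, which is an encoding assumed here for illustration. It does not implement the SPR distance or the maximum agreement forest methods of the paper.

```python
def clusters(tree):
    """Return (all leaves, set of non-trivial clusters) of a rooted tree.

    A tree is a leaf name (str) or a tuple of subtrees; multifurcations
    are allowed. Each internal node contributes the set of leaves below it.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = visit(tree)
    found.discard(all_leaves)  # the root cluster is shared by every tree on these taxa
    return all_leaves, found


def rooted_rf_distance(tree_a, tree_b):
    """Number of clusters present in exactly one of the two trees."""
    leaves_a, clusters_a = clusters(tree_a)
    leaves_b, clusters_b = clusters(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must be on the same set of taxa")
    return len(clusters_a ^ clusters_b)


# Hypothetical four-taxon trees.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(rooted_rf_distance(balanced, caterpillar))
```

    Moving a single subtree can change many clusters at once, which is one reason a transfer-aware criterion such as SPR can behave differently from Robinson-Foulds when lateral gene transfer is present.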

  12. Absolute quantification of DNA methylation using microfluidic chip-based digital PCR.

    PubMed

    Wu, Zhenhua; Bai, Yanan; Cheng, Zule; Liu, Fangming; Wang, Ping; Yang, Dawei; Li, Gang; Jin, Qinghui; Mao, Hongju; Zhao, Jianlong

    2017-10-15

    Hypermethylation of CpG islands in the promoter region of many tumor suppressor genes downregulates their expression and as a result promotes tumorigenesis. Therefore, detection of DNA methylation status is a convenient diagnostic tool for cancer detection. Here, we report a novel method for the integrative detection of methylation by microfluidic chip-based digital PCR. This method relies on the methylation-sensitive restriction enzyme HpaII, which cleaves the unmethylated DNA strands while keeping the methylated ones intact. After HpaII treatment, the DNA methylation level is determined quantitatively by microfluidic chip-based digital PCR with a lower limit of detection of 0.52%. To validate the applicability of this method, promoter methylation of two tumor suppressor genes (PCDHGB6 and HOXA9) was tested in 10 samples of early stage lung adenocarcinoma and their adjacent non-tumorous tissues. The results obtained for these samples with our method were consistent with those from conventional bisulfite pyrosequencing. Combining high sensitivity and low cost, the microfluidic chip-based digital PCR method might provide a promising alternative for the detection of DNA methylation and early diagnosis of epigenetics-related diseases. Copyright © 2017 Elsevier B.V. All rights reserved.
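
    Absolute quantification by digital PCR usually rests on a Poisson correction of the fraction of positive partitions. The sketch below applies that standard correction and then forms a methylation fraction as the ratio of copies surviving HpaII digestion to copies in an undigested reference. Taking that ratio, and assuming equal input and partition volume in both reactions, is an assumption about the workflow rather than the authors' published calculation.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean number of target copies per partition."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; a saturated chip cannot be quantified")
    # P(partition is negative) = exp(-lambda), so lambda = ln(total / negatives).
    return math.log(total / (total - positive))


def methylation_fraction(pos_digested, total_digested, pos_undigested, total_undigested):
    """Fraction of template that resists HpaII cleavage (assumed to be methylated)."""
    lam_cut = copies_per_partition(pos_digested, total_digested)
    lam_ref = copies_per_partition(pos_undigested, total_undigested)
    if lam_ref == 0:
        raise ValueError("no template detected in the undigested reference")
    return lam_cut / lam_ref


# Hypothetical partition counts, not data from the study.
print(methylation_fraction(120, 10000, 4000, 10000))
```

    Because the correction is nonlinear, the ratio of raw positive counts would underestimate the reference concentration when many partitions hold more than one copy.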

  13. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction.

    PubMed

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H

    2017-01-09

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles

    PubMed Central

    Michailidis, George

    2014-01-01

    Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g., wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network. The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each obtained causal ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network. PMID:24586224

  15. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

    PubMed

    Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

    2014-01-01

    As an abstract mapping of the gene regulations in the cell, gene regulatory network is important to both biological research study and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets as well as to take the robustness to large error or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm which combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under the precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated from the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.
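
    Two building blocks are implied by the name "Huber group LASSO": the Huber loss, which grows quadratically for small residuals and linearly for large ones, and the group soft-thresholding step, which shrinks a whole group of coefficients toward zero together. The sketch below shows both in plain Python. Treating one group as the coefficients of a single candidate edge across all datasets is an assumption made here for illustration, and the code does not reproduce the auxiliary-function and block-descent algorithm described in the abstract.

```python
import math


def huber_loss(residual, delta):
    """Quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def group_soft_threshold(group, lam):
    """Proximal operator of lam * ||group||_2.

    The whole group is set to zero when its Euclidean norm is at most lam;
    otherwise every coefficient is scaled by the same factor.
    """
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        return [0.0 for _ in group]
    scale = 1.0 - lam / norm
    return [scale * v for v in group]


# Hypothetical values: one edge with coefficients from two datasets.
print(huber_loss(3.0, 1.0))
print(group_soft_threshold([3.0, 4.0], 2.5))
```

    Zeroing a group as a unit is what would force an edge to be either present in every dataset or absent from all of them, which matches the stated goal of inferring one shared topology.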

  16. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces the fuzzy support vector machine, a learning algorithm based on a combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that the fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule base inferred from the fuzzy support vector machine helps extract biological knowledge from microarray data. As a new classification model with high generalization power, robustness, and good interpretability, the fuzzy support vector machine seems to be a promising tool for gene expression microarray classification.

  17. A mesh generation and machine learning framework for Drosophila gene expression pattern image analysis

    PubMed Central

    2013-01-01

    Background Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods enhanced by a quantitative analysis based on computational tools we present in this paper would provide promising ways for addressing key scientific questions. Results We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/. Conclusions Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods. PMID:24373308

  18. DNA from uncultured organisms as a source of 2,5-diketo-D-gluconic acid reductases.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eschenfeldt, W. H.; Stols, L.; Rosenbaum, H.

    2001-09-01

    Total DNA of a population of uncultured organisms was extracted from soil samples, and by using PCR methods, the genes encoding two different 2,5-diketo-D-gluconic acid reductases (DKGRs) were recovered. Degenerate PCR primers based on published sequence information gave internal gene fragments homologous to known DKGRs. Nested primers specific for the internal fragments were combined with random primers to amplify flanking gene fragments from the environmental DNA, and two hypothetical full-length genes were predicted from the combined sequences. Based on these predictions, specific primers were used to amplify the two complete genes in single PCRs. These genes were cloned and expressed in Escherichia coli. The purified gene products catalyzed the reduction of 2,5-diketo-D-gluconic acid to 2-keto-L-gulonic acid. Compared to previously described DKGRs isolated from Corynebacterium spp., these environmental reductases possessed some valuable properties. Both exhibited greater than 20-fold-higher kcat/Km values than those previously determined, primarily as a result of better binding of substrate. The Km values for the two new reductases were 57 and 67 µM, versus 2 and 13 mM for the Corynebacterium enzymes. Both environmental DKGRs accepted NADH as well as NADPH as a cosubstrate; other DKGRs and most related aldo-keto reductases use only NADPH. In addition, one of the new reductases was more thermostable than known DKGRs.

  19. Exploring Wound-Healing Genomic Machinery with a Network-Based Approach

    PubMed Central

    Vitali, Francesca; Marini, Simone; Balli, Martina; Grosemans, Hanne; Sampaolesi, Maurilio; Lussier, Yves A.; Cusella De Angelis, Maria Gabriella; Bellazzi, Riccardo

    2017-01-01

    The molecular mechanisms underlying tissue regeneration and wound healing are still poorly understood despite their importance. In this paper we develop a bioinformatics approach, combining biology and network theory to drive experiments for better understanding the genetic underpinnings of wound healing mechanisms and for selecting potential drug targets. We start by selecting literature-relevant genes in murine wound healing, and inferring from them a Protein-Protein Interaction (PPI) network. Then, we analyze the network to rank wound healing-related genes according to their topological properties. Lastly, we perform a procedure for in-silico simulation of a treatment action in a biological pathway. The findings obtained by applying the developed pipeline, including gene expression analysis, confirm that a network-based bioinformatics method is able to prioritize candidate genes for in vitro analysis, thus speeding up the understanding of molecular mechanisms and supporting the discovery of potential drug targets. PMID:28635674

  20. Luciferase reporter assay in Drosophila and mammalian tissue culture cells

    PubMed Central

    Yun, Chi

    2014-01-01

    Luciferase reporter gene assays are one of the most common methods for monitoring gene activity. Because of their sensitivity, dynamic range, and lack of endogenous activity, luciferase assays have been particularly useful for functional genomics in cell-based assays, such as RNAi screening. This unit describes delivery of two luciferase reporters with other nucleic acids (siRNA/dsRNA), measurement of the dual luciferase activities, and analysis of data generated. The systematic query of gene function (RNAi) combined with the advances in luminescent technology has made it possible to design powerful whole genome screens to address diverse and significant biological questions. PMID:24652620

  1. A Hadoop-based method to predict potential effective drug combination.

    PubMed

    Sun, Yifan; Xiong, Yi; Xu, Qian; Wei, Dongqing

    2014-01-01

    Combination drugs that impact multiple targets simultaneously are promising candidates for combating complex diseases due to their improved efficacy and reduced side effects. However, exhaustive screening of all possible drug combinations is extremely time-consuming and impractical. Here, we present a novel Hadoop-based approach to predict drug combinations by taking advantage of the MapReduce programming model, which improves the scalability of the prediction algorithm. By integrating the gene expression data of multiple drugs, we implemented data preprocessing and support vector machine and naïve Bayesian classifiers on Hadoop for prediction of drug combinations. The experimental results suggest that our Hadoop-based model achieves much higher efficiency in the big data processing steps with satisfactory performance. We believe that our proposed approach can help accelerate the prediction of potentially effective drug combinations as the number of possible combinations grows exponentially. The source code and datasets are available upon request.

  2. A Hadoop-Based Method to Predict Potential Effective Drug Combination

    PubMed Central

    Xiong, Yi; Xu, Qian; Wei, Dongqing

    2014-01-01

    Combination drugs that impact multiple targets simultaneously are promising candidates for combating complex diseases due to their improved efficacy and reduced side effects. However, exhaustive screening of all possible drug combinations is extremely time-consuming and impractical. Here, we present a novel Hadoop-based approach to predict drug combinations by taking advantage of the MapReduce programming model, which improves the scalability of the prediction algorithm. By integrating the gene expression data of multiple drugs, we implemented data preprocessing and support vector machine and naïve Bayesian classifiers on Hadoop for prediction of drug combinations. The experimental results suggest that our Hadoop-based model achieves much higher efficiency in the big data processing steps with satisfactory performance. We believe that our proposed approach can help accelerate the prediction of potentially effective drug combinations as the number of possible combinations grows exponentially. The source code and datasets are available upon request. PMID:25147789

  3. Metabolic network prediction through pairwise rational kernels.

    PubMed

    Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian

    2014-09-26

    Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. Errors therefore accumulate when pathways are predicted from incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pairs of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amounts of sequence information such as protein essentiality, natural language processing and machine translation. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then developed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster in comparison with other pairwise kernels. Also, when we use PRKs combined with other simple kernels that include evolutionary information, the accuracy values have been improved, while maintaining lower construction and execution times. The power of using kernels is that almost any sort of data can be represented using kernels. Therefore, completely disparate types of data can be combined to add power to kernel-based machine learning methods. When we compared our proposal using PRKs with other similar kernels, the execution times were decreased, with no compromise of accuracy. We also proved that by combining PRKs with other kernels that include evolutionary information, the accuracy can also be improved. As our proposal can use any type of sequence data, genes do not need to be properly annotated, avoiding accumulation errors because of incorrect previous annotations.
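
    A pairwise kernel scores the similarity between two pairs of entities, as stated above. One widely used construction is the tensor product pairwise kernel, which builds that score from a base kernel on single entities and is symmetric in the order within each pair. The sketch below uses a simple k-mer spectrum kernel as the base kernel so that raw sequences can be compared directly. It is a stand-in for illustration and does not implement the weighted finite-state transducers behind the authors' Pairwise Rational Kernels.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[kmer] for kmer, n in cx.items())


def tensor_product_pairwise_kernel(pair1, pair2, base=spectrum_kernel):
    """K((a, b), (c, d)) = K(a, c) K(b, d) + K(a, d) K(b, c)."""
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Hypothetical sequence pairs; swapping the order inside a pair leaves the value unchanged.
print(tensor_product_pairwise_kernel(("ACGTAC", "TTGACA"), ("TTGACA", "ACGTAC")))
```

    Each pairwise evaluation costs four base-kernel evaluations, and a Gram matrix over n pairs needs on the order of n squared of them, which illustrates why the abstract stresses processing time and storage.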

  4. Precise integration of inducible transcriptional elements (PrIITE) enables absolute control of gene expression.

    PubMed

    Pinto, Rita; Hansen, Lars; Hintze, John; Almeida, Raquel; Larsen, Sylvester; Coskun, Mehmet; Davidsen, Johanne; Mitchelmore, Cathy; David, Leonor; Troelsen, Jesper Thorvald; Bennett, Eric Paul

    2017-07-27

    Tetracycline-based inducible systems provide powerful methods for functional studies where gene expression can be controlled. However, the lack of tight control of the inducible system, leading to leakiness and adverse effects caused by undesirable tetracycline dosage requirements, has proven to be a limitation. Here, we report that the combined use of genome editing tools and last generation Tet-On systems can resolve these issues. Our principle is based on precise integration of inducible transcriptional elements (coined PrIITE) targeted to: (i) exons of an endogenous gene of interest (GOI) and (ii) a safe harbor locus. Using PrIITE cells harboring a GFP reporter or CDX2 transcription factor, we demonstrate discrete inducibility of gene expression with complete abrogation of leakiness. CDX2 PrIITE cells generated by this approach uncovered novel CDX2 downstream effector genes. Our results provide a strategy for characterization of dose-dependent effector functions of essential genes that require absence of endogenous gene expression. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. A new method for enhancer prediction based on deep belief network.

    PubMed

    Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong

    2017-10-16

    Studies have shown that enhancers are significant regulatory elements that play crucial roles in gene expression regulation. Since enhancers act independently of their orientation and distance relative to their target genes, accurately predicting distal enhancers is challenging. In recent years, with the development of high-throughput ChIP-seq technologies, several computational techniques have emerged to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, which is called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.

  6. A Method for Predicting Protein Complexes from Dynamic Weighted Protein-Protein Interaction Networks.

    PubMed

    Liu, Lizhen; Sun, Xiaowu; Song, Wei; Du, Chao

    2018-06-01

    Predicting protein complexes from protein-protein interaction (PPI) networks is of great significance for understanding the structure and function of cells. A protein may interact with different proteins at different times or under different conditions. Existing approaches utilize only static PPI network data, which may lose much temporal biological information. First, this article proposes a novel method that combines gene expression data at different time points with the traditional static PPI network to construct different dynamic subnetworks. Second, to further filter out data noise, semantic similarity based on gene ontology is used as the network weight, together with principal component analysis, which is introduced to handle the weights computed by three traditional methods. Third, after building a dynamic PPI network, a protein complex prediction algorithm based on the "core-attachment" structural feature is applied to detect complexes from each dynamic subnetwork. Finally, the experimental results show that the proposed method performs well at detecting protein complexes from dynamic weighted PPI networks.
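
    The first step described above, combining time-course expression with a static PPI network to obtain one subnetwork per time point, can be sketched as follows. The abstract does not say how a protein is judged to be active at a given time, so the single global expression threshold used here is purely an assumption, as are the function and variable names.

```python
def dynamic_subnetworks(edges, expression, threshold):
    """Build one edge list per time point from a static PPI network.

    edges      -- iterable of (protein_u, protein_v) pairs
    expression -- dict mapping protein -> list of expression values, one per time point
    threshold  -- a protein counts as active at time t if its value is >= threshold
    """
    if not expression:
        return []
    n_times = len(next(iter(expression.values())))
    subnetworks = []
    for t in range(n_times):
        active = {p for p, series in expression.items() if series[t] >= threshold}
        # Keep an interaction only when both partners are active at this time point.
        subnetworks.append([(u, v) for u, v in edges if u in active and v in active])
    return subnetworks


# Hypothetical three-protein network observed at two time points.
static_edges = [("A", "B"), ("B", "C")]
profiles = {"A": [1.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 1.0]}
print(dynamic_subnetworks(static_edges, profiles, threshold=0.5))
```

    Edge weights such as the gene ontology semantic similarity mentioned in the abstract could then be attached to each retained edge before complex detection.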

  7. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

    PubMed

    Tamura, Takeyuki; Akutsu, Tatsuya

    2007-11-30

    Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. The task is to predict which part of a cell a given protein is transported to, given the amino acid sequence of the protein as input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracy. In this paper, we propose a novel and general prediction method that combines techniques for sequence alignment with feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. In fivefold cross-validation tests, the overall accuracy and average MCC were 0.9096 and 0.8655, respectively. We also applied our method to other datasets, including that of WoLF PSORT. Although there is a predictor that uses gene ontology information and yields higher accuracy than ours, our accuracies are higher than those of existing predictors that use only sequence information. Since information such as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly discovered proteins. Furthermore, the idea of combining alignment and amino acid frequency is novel and general, so it may be applied to other problems in bioinformatics. Our method for plant proteins is also implemented as a web system, available at http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.

  8. MATRIX FACTORIZATION-BASED DATA FUSION FOR GENE FUNCTION PREDICTION IN BAKER’S YEAST AND SLIME MOLD

    PubMed Central

    ŽITNIK, MARINKA; ZUPAN, BLAŽ

    2014-01-01

    The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker’s yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps. PMID:24297565
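
    The shared-factor idea behind simultaneous matrix tri-factorization can be written compactly. The notation below is an assumption made for illustration (the abstract gives no symbols), and any constraint or regularization terms are omitted: each relation matrix $R_{ij}$ between object types $i$ and $j$ is approximated with a factor $G_i$ that is reused in every relation involving type $i$.

```latex
% R_{ij}: data matrix relating object types i and j (e.g. genes x GO terms)
% G_i   : latent factor for object type i, shared by all relations involving i
% S_{ij}: relation-specific "backbone" matrix
\min_{G \ge 0,\; S} \; \sum_{(i,j)} \left\lVert R_{ij} - G_i \, S_{ij} \, G_j^{\top} \right\rVert_F^{2}
```

    Because the same $G_i$ appears in every term that involves object type $i$, information from one data source can influence the reconstruction of another, which is one way to read "shares matrix factors between sources".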

  9. Detecting Horizontal Gene Transfer between Closely Related Taxa

    PubMed Central

    Adato, Orit; Ninyo, Noga; Gophna, Uri; Snir, Sagi

    2015-01-01

    Horizontal gene transfer (HGT), the transfer of genetic material between organisms, is crucial for genetic innovation and the evolution of genome architecture. Existing HGT detection algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from ancestral (vertically derived) genes in its recipient genome. Detecting HGT between closely related species or strains is challenging, as the phylogenetic signal is usually weak and the nucleotide composition is normally nearly identical. Nevertheless, detecting HGT between congeneric species or strains is of great importance, especially in clinical microbiology, where understanding the emergence of new virulent and drug-resistant strains is crucial and often time-sensitive. We developed a novel, self-contained technique named Near HGT, based on the synteny index, to measure the divergence of a gene from its native genomic environment and used it to identify candidate HGT events between closely related strains. The method confirms candidate transferred genes based on the constant relative mutability (CRM). Using CRM, the algorithm assigns a confidence score based on “unusual” sequence divergence. A gene exhibiting exceptional deviations according to both synteny and mutability criteria is considered a validated HGT product. We first applied the technique to a set of three E. coli strains and detected several highly probable horizontally acquired genes. We then compared the method to existing HGT detection tools using a larger strain data set. When combined with additional approaches, our new algorithm provides a richer picture and brings us closer to the goal of detecting all newly acquired genes in a particular strain. PMID:26439115
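
    A synteny-style score can be sketched as the number of genes shared between the neighborhoods of the same gene in two genomes. The version below assumes circular genomes represented as ordered lists of gene names and a window of k genes on each side; these representation choices and the function names are illustrative assumptions, not the Near HGT implementation.

```python
def neighborhood(genome, gene, k):
    """Set of genes within k positions of `gene` on a circular genome."""
    n = len(genome)
    i = genome.index(gene)
    return {genome[(i + d) % n] for d in range(-k, k + 1) if d != 0}


def synteny_index(genome_a, genome_b, gene, k):
    """Count the genes common to the k-neighborhoods of `gene` in both genomes."""
    return len(neighborhood(genome_a, gene, k) & neighborhood(genome_b, gene, k))


reference = list("ABCDEFGH")
same_order = list("ABCDEFGH")
# Gene D has been moved away from its original neighbors in this genome.
relocated = ["A", "B", "C", "E", "F", "G", "H", "D"]

conserved_score = synteny_index(reference, same_order, "D", k=2)
relocated_score = synteny_index(reference, relocated, "D", k=2)
```

    A gene whose score is much lower than that of its former neighbors sits in an unusual genomic context, which is the kind of deviation the abstract treats as a candidate signal.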

  10. Antibiotic Combinations That Enable One-Step, Targeted Mutagenesis of Chromosomal Genes.

    PubMed

    Lee, Wonsik; Do, Truc; Zhang, Ge; Kahne, Daniel; Meredith, Timothy C; Walker, Suzanne

    2018-06-08

    Targeted modification of bacterial chromosomes is necessary to understand new drug targets, investigate virulence factors, elucidate cell physiology, and validate results of -omics-based approaches. For some bacteria, reverse genetics remains a major bottleneck to progress in research. Here, we describe a compound-centric strategy that combines new negative selection markers with known positive selection markers to achieve simple, efficient one-step genome engineering of bacterial chromosomes. The method was inspired by the observation that certain nonessential metabolic pathways contain essential late steps, suggesting that antibiotics targeting a late step can be used to select for the absence of genes that control flux into the pathway. Guided by this hypothesis, we have identified antibiotic/counterselectable markers to accelerate reverse engineering of two increasingly antibiotic-resistant pathogens, Staphylococcus aureus and Acinetobacter baumannii. For S. aureus, we used wall teichoic acid biosynthesis inhibitors to select for the absence of tarO and for A. baumannii, we used colistin to select for the absence of lpxC. We have obtained desired gene deletions, gene fusions, and promoter swaps in a single plating step with perfect efficiency. Our method can also be adapted to generate markerless deletions of genes using FLP recombinase. The tools described here will accelerate research on two important pathogens, and the concept we outline can be readily adapted to any organism for which a suitable target pathway can be identified.

  11. Molecular cancer classification using a meta-sample-based regularized robust coding method.

    PubMed

    Wang, Shu-Lin; Sun, Liuchao; Fang, Jianwen

    2014-01-01

    Previous studies have demonstrated that machine-learning-based molecular cancer classification using gene expression profiling (GEP) data is promising for the clinical diagnosis and treatment of cancer. Novel classification methods with high efficiency and prediction accuracy are still needed to deal with the high dimensionality and small sample size of typical GEP data. Recently the sparse representation (SR) method has been successfully applied to cancer classification. Nevertheless, its efficiency needs to be improved when analyzing large-scale GEP data. In this paper we present the meta-sample-based regularized robust coding classification (MRRCC), a novel, effective cancer classification technique that combines the idea of the meta-sample-based cluster method with the regularized robust coding (RRC) method. It assumes that the coding residual and the coding coefficient are respectively independent and identically distributed. Similar to meta-sample-based SR classification (MSRC), MRRCC extracts a set of meta-samples from the training samples, and then encodes a testing sample as a sparse linear combination of these meta-samples. The representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Extensive experiments on publicly available GEP datasets demonstrate that the proposed method is more efficient, while its prediction accuracy is equivalent to that of existing MSRC-based methods and better than that of other state-of-the-art dimension-reduction-based methods.

  12. Module-based construction of plasmids for chromosomal integration of the fission yeast Schizosaccharomyces pombe

    PubMed Central

    Kakui, Yasutaka; Sunaga, Tomonari; Arai, Kunio; Dodgson, James; Ji, Liang; Csikász-Nagy, Attila; Carazo-Salas, Rafael; Sato, Masamitsu

    2015-01-01

    Integration of an external gene into a fission yeast chromosome is useful to investigate the effect of the gene product. An easy way to knock in a gene construct is to use an integration plasmid, which can be targeted and inserted into a chromosome through homologous recombination. Despite the advantage of integration, construction of integration plasmids is energy- and time-consuming, because there is no systematic library of integration plasmids with various promoters, fluorescent protein tags, terminators and selection markers; therefore, researchers are often forced to make appropriate ones through multiple rounds of cloning procedures. Here, we establish materials and methods to easily construct integration plasmids. We introduce a convenient cloning system based on Golden Gate DNA shuffling, which enables the connection of multiple DNA fragments at once: any kind of promoter and terminator, the gene of interest, in combination with any fluorescent protein tag gene and any selection marker. Each of those DNA fragments, called a ‘module’, can be tandemly ligated in the desired order in a single reaction, which yields a circular plasmid in one step. The resulting plasmids can be integrated through standard methods for transformation. Thus, these materials and methods facilitate the construction of knock-in strains, and this will further increase the value of fission yeast as a model organism. PMID:26108218

  13. Snowball: resampling combined with distance-based regression to discover transcriptional consequences of a driver mutation

    PubMed Central

    Xu, Yaomin; Guo, Xingyi; Sun, Jiayang; Zhao, Zhongming

    2015-01-01

    Motivation: Large-scale cancer genomic studies, such as The Cancer Genome Atlas (TCGA), have profiled multidimensional genomic data, including mutation and expression profiles on a variety of cancer cell types, to uncover the molecular mechanism of cancerogenesis. More than a hundred driver mutations have been characterized that confer the advantage of cell growth. However, how driver mutations regulate the transcriptome to affect cellular functions remains largely unexplored. Differential analysis of gene expression relative to a driver mutation on patient samples could provide us with new insights in understanding driver mutation dysregulation in tumor genome and developing personalized treatment strategies. Results: Here, we introduce the Snowball approach as a highly sensitive statistical analysis method to identify transcriptional signatures that are affected by a recurrent driver mutation. Snowball utilizes a resampling-based approach and combines a distance-based regression framework to assign a robust ranking index of genes based on their aggregated association with the presence of the mutation, and further selects the top significant genes for downstream data analyses or experiments. In our application of the Snowball approach to both synthesized and TCGA data, we demonstrated that it outperforms the standard methods and provides more accurate inferences to the functional effects and transcriptional dysregulation of driver mutations. Availability and implementation: R package and source code are available from CRAN at http://cran.r-project.org/web/packages/DESnowball, and also available at http://bioinfo.mc.vanderbilt.edu/DESnowball/. Contact: zhongming.zhao@vanderbilt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25192743

  14. pTRA - A reporter system for monitoring the intracellular dynamics of gene expression.

    PubMed

    Wagner, Sabine G; Ziegler, Martin; Löwe, Hannes; Kremling, Andreas; Pflüger-Grau, Katharina

    2018-01-01

    The presence of standardised tools and methods to measure and represent accurately biological parts and functions is a prerequisite for successful metabolic engineering and crucial to understand and predict the behaviour of synthetic genetic circuits. Many synthetic gene networks are based on transcriptional circuits, thus information on transcriptional and translational activity is important for understanding and fine-tuning the synthetic function. To this end, we have developed a toolkit to analyse systematically the transcriptional and translational activity of a specific synthetic part in vivo. It is based on the plasmid pTRA and allows the assignment of specific transcriptional and translational outputs to the gene(s) of interest (GOI) and to compare different genetic setups. By this, the optimal combination of transcriptional strength and translational activity can be identified. The design is tested in a case study using the gene encoding the fluorescent mCherry protein as GOI. We show the intracellular dynamics of mRNA and protein formation and discuss the potential and shortcomings of the pTRA plasmid.

  15. A transversal approach to predict gene product networks from ontology-based similarity

    PubMed Central

    Chabalier, Julie; Mosser, Jean; Burgun, Anita

    2007-01-01

    Background Interpretation of transcriptomic data is usually made through a "standard" approach which consists in clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to underline functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and data expression. Results The transversal approach presented in this paper consists in computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity. Conclusion Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity, and data expression. PMID:17605807
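
    The Vector Space Model step can be illustrated with a small sketch: each gene product becomes a weighted vector over its annotation terms, and pairs are compared by cosine similarity. The IDF-style weighting used here is an assumption standing in for the paper's weighting scheme, which the abstract does not specify; term labels and function names are hypothetical.

```python
import math


def term_weights(annotations: dict) -> dict:
    """IDF-style weight per term: rarer terms get larger weights (assumed scheme)."""
    n = len(annotations)
    doc_freq = {}
    for terms in annotations.values():
        for term in set(terms):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(1 + n / df) for term, df in doc_freq.items()}


def cosine_similarity(terms_a, terms_b, weights) -> float:
    """Cosine of the angle between two weighted annotation vectors."""
    a, b = set(terms_a), set(terms_b)
    dot = sum(weights[t] ** 2 for t in a & b)
    norm_a = math.sqrt(sum(weights[t] ** 2 for t in a))
    norm_b = math.sqrt(sum(weights[t] ** 2 for t in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(annotations: dict):
    """Return the sorted gene list and the matrix of pairwise similarities."""
    weights = term_weights(annotations)
    genes = sorted(annotations)
    matrix = [[cosine_similarity(annotations[g], annotations[h], weights)
               for h in genes] for g in genes]
    return genes, matrix


# Placeholder term labels; real input would be GO process identifiers.
toy = {"g1": ["t1", "t2"], "g2": ["t1", "t2"], "g3": ["t3"]}
genes, sim = similarity_matrix(toy)
```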

  16. Efficient production of recombinant adeno-associated viral vector, serotype DJ/8, carrying the GFP gene.

    PubMed

    Hashimoto, Haruo; Mizushima, Tomoko; Chijiwa, Tsuyoshi; Nakamura, Masato; Suemizu, Hiroshi

    2017-06-15

    The purpose of this study was to establish an efficient method for the preparation of an adeno-associated viral (AAV) vector, serotype DJ/8, carrying the GFP gene (AAV-DJ/8-GFP). We compared the yields of AAV-DJ/8 vector, which were produced by three different combination methods, consisting of two plasmid DNA transfection methods (lipofectamine and calcium phosphate co-precipitation; CaPi) and two virus DNA purification methods (iodixanol and cesium chloride; CsCl). The results showed that the highest yield of AAV-DJ/8-GFP vector was accomplished with the combination method of lipofectamine transfection and iodixanol purification. The viral protein expression levels and the transduction efficacy in HEK293 and CHO cells were not different among four different combination methods for AAV-DJ/8-GFP vectors. We confirmed that the AAV-DJ/8-GFP vector could transduce human and murine hepatocyte-derived cell lines. These results show that AAV-DJ/8-GFP, purified by the combination of lipofectamine and iodixanol, produces an efficient yield without altering the characteristics of protein expression and AAV gene transduction. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. p53 as the focus of gene therapy: past, present and future.

    PubMed

    Valente, Joana Fa; Queiroz, Joao A; Sousa, Fani

    2018-01-15

    Several gene deviations can be responsible for triggering oncogenic processes. However, mutations in tumour suppressor genes are more often associated with malignant diseases, with p53 being one of the most affected and studied. p53 is implicated in a number of known cellular functions, including DNA damage repair, cell cycle arrest in G1/S and G2/M, and apoptosis, making it an interesting target for cancer treatment. Considering these facts, the development of gene therapy approaches focused on p53 expression and regulation seems to be a promising strategy for cancer therapy. Several studies have shown that transfection of cancer cells with wild-type p53 expressing plasmids could directly drive cells into apoptosis and/or growth arrest, suggesting that a gene therapy approach for cancer treatment can be based on the re-establishment of normal p53 expression levels and function. Up until now, several clinical research studies using viral and non-viral vectors delivering p53 genes, alone or combined with other therapeutic agents, have been completed, and therapies based on the use of this gene are already on the market. This review summarizes the different methods used to deliver and/or target p53, as well as the main therapeutic results obtained with the different strategies applied. Finally, the ongoing approaches are described, with a focus on combinatorial therapeutics to show the increased therapeutic potential of combining gene therapy vectors with chemo- or radiotherapy. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Genomic Selection in Plant Breeding: Methods, Models, and Perspectives.

    PubMed

    Crossa, José; Pérez-Rodríguez, Paulino; Cuevas, Jaime; Montesinos-López, Osval; Jarquín, Diego; de Los Campos, Gustavo; Burgueño, Juan; González-Camacho, Juan M; Pérez-Elizalde, Sergio; Beyene, Yoseph; Dreisigacker, Susanne; Singh, Ravi; Zhang, Xuecai; Gowda, Manje; Roorkiwal, Manish; Rutkoski, Jessica; Varshney, Rajeev K

    2017-11-01

    Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. The effects of shared information on semantic calculations in the gene ontology.

    PubMed

    Bible, Paul W; Sun, Hong-Wei; Morasso, Maria I; Loganantharaj, Rasiah; Wei, Lai

    2017-01-01

    The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).
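
    The substitution of a shared information (SI) value into the classical term similarity measures can be sketched as follows. The formulas are the standard Resnik, Lin, and Jiang-Conrath definitions; the function names and the toy numbers are illustrative and are not taken from GGTK.

```python
import math


def information_content(probability: float) -> float:
    """IC of a term, given the probability of observing it (or a descendant)."""
    return -math.log(probability)


def resnik(shared_information: float) -> float:
    """Resnik similarity is the shared information itself."""
    return shared_information


def lin(ic_a: float, ic_b: float, shared_information: float) -> float:
    """Lin similarity: shared information normalized by the terms' own IC."""
    total = ic_a + ic_b
    # Both terms have IC 0 only at the root; this sketch returns 0.0 in that case.
    return 2 * shared_information / total if total > 0 else 0.0


def jiang_conrath_distance(ic_a: float, ic_b: float, shared_information: float) -> float:
    """Jiang-Conrath distance: IC not explained by the shared information."""
    return ic_a + ic_b - 2 * shared_information


# Toy values: the SI could come from any SI approach (e.g. the most informative
# common ancestor); here it is simply given.
ic_a, ic_b, si = 2.0, 4.0, 1.5
scores = (resnik(si), lin(ic_a, ic_b, si), jiang_conrath_distance(ic_a, ic_b, si))
```

    Swapping in a different SI approach changes only the `shared_information` argument, which is why many distinct gene-to-gene measures arise from a small set of building blocks.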

  20. Chemical Entity Recognition and Resolution to ChEBI

    PubMed Central

    Grego, Tiago; Pesquita, Catia; Bastos, Hugo P.; Couto, Francisco M.

    2012-01-01

    Chemical entities are ubiquitous throughout the biomedical literature, and the development of text-mining systems that can efficiently identify those entities is required. Due to the lack of available corpora and data resources, the community has focused its efforts on the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2–5% for the entity-resolution task, and 15% for combined entity recognition and resolution tasks. PMID:25937941

  1. A community computational challenge to predict the activity of pairs of compounds.

    PubMed

    Bansal, Mukesh; Yang, Jichen; Karan, Charles; Menden, Michael P; Costello, James C; Tang, Hao; Xiao, Guanghua; Li, Yajuan; Allen, Jeffrey; Zhong, Rui; Chen, Beibei; Kim, Minsoo; Wang, Tao; Heiser, Laura M; Realubit, Ronald; Mattioli, Michela; Alvarez, Mariano J; Shen, Yao; Gallahan, Daniel; Singer, Dinah; Saez-Rodriguez, Julio; Xie, Yang; Stolovitzky, Gustavo; Califano, Andrea

    2014-12-01

    Recent therapeutic successes have renewed interest in drug combinations, but experimental screening approaches are costly and often identify only small numbers of synergistic combinations. The DREAM consortium launched an open challenge to foster the development of in silico methods to computationally rank 91 compound pairs, from the most synergistic to the most antagonistic, based on gene-expression profiles of human B cells treated with individual compounds at multiple time points and concentrations. Using scoring metrics based on experimental dose-response curves, we assessed 32 methods (31 community-generated approaches and SynGen), four of which performed significantly better than random guessing. We highlight similarities between the methods. Although the accuracy of predictions was not optimal, we find that computational prediction of compound-pair activity is possible, and that community challenges can be useful to advance the field of in silico compound-synergy prediction.

  2. A subregion-based burden test for simultaneous identification of susceptibility loci and subregions within.

    PubMed

    Zhu, Bin; Mirabello, Lisa; Chatterjee, Nilanjan

    2018-06-22

    In rare variant association studies, aggregating rare and/or low frequency variants, may increase statistical power for detection of the underlying susceptibility gene or region. However, it is unclear which variants, or class of them, in a gene contribute most to the association. We proposed a subregion-based burden test (REBET) to simultaneously select susceptibility genes and identify important underlying subregions. The subregions are predefined by shared common biologic characteristics, such as the protein domain or functional impact. Based on a subset-based approach considering local correlations between combinations of test statistics of subregions, REBET is able to properly control the type I error rate while adjusting for multiple comparisons in a computationally efficient manner. Simulation studies show that REBET can achieve power competitive to alternative methods when rare variants cluster within subregions. In two case studies, REBET is able to identify known disease susceptibility genes, and more importantly pinpoint the unreported most susceptible subregions, which represent protein domains essential for gene function. R package REBET is available at https://dceg.cancer.gov/tools/analysis/rebet. Published 2018. This article is a U.S. Government work and is in the public domain in the USA.

  3. A Method for Gene-Based Pathway Analysis Using Genomewide Association Study Summary Statistics Reveals Nine New Type 1 Diabetes Associations

    PubMed Central

    Evangelou, Marina; Smyth, Deborah J; Fortune, Mary D; Burren, Oliver S; Walker, Neil M; Guo, Hui; Onengut-Gumuscu, Suna; Chen, Wei-Min; Concannon, Patrick; Rich, Stephen S; Todd, John A; Wallace, Chris

    2014-01-01

    Pathway analysis can complement point-wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease-associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene-based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene-based pathway analysis of a type 1 diabetes (T1D) meta-analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty-one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P-values were skewed () with 12 of the 22 SNPs showing . Support, including replication evidence, was obtained for nine T1D associated variants in genes ITGB7 (rs11170466, ), NRP1 (rs722988, ), BAD (rs694739, ), CTSB (rs1296023, ), FYN (rs11964650, ), UBE2G1 (rs9906760, ), MAP3K14 (rs17759555, ), ITGB1 (rs1557150, ), and IL7R (rs1445898, ). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available. PMID:25371288
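
    Selecting pathways at a 5% false discovery rate can be illustrated with the Benjamini-Hochberg step-up procedure. The abstract does not name the FDR method the authors used, so this is a sketch of one common choice rather than their pipeline; the p-values below are invented for the example.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p-value lies under the step-up line alpha * rank / m.
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Invented pathway p-values for illustration only.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(toy_pvalues, alpha=0.05)
```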

  4. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

    PubMed Central

    Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

    2013-01-01

    We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
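
    The aggregation of study-specific summary statistics can be sketched for the simplest case, a burden test that assumes homogeneous effects across studies. Per-variant score statistics and covariance matrices are summed over studies, then collapsed with variant weights into a one-degree-of-freedom chi-square statistic. The function name, the unit weights, and the toy numbers are assumptions for illustration; the heterogeneity-aware and variance component versions described in the abstract are not covered.

```python
import math


def meta_burden_test(scores, covariances, weights):
    """Combine per-study score statistics into a burden test.

    scores      : list over studies of per-variant score statistics
    covariances : list over studies of variant-by-variant covariance matrices
    weights     : per-variant weights
    Returns (statistic, p_value) for a chi-square test with 1 degree of freedom.
    """
    m = len(weights)
    # Sum the score vectors and covariance matrices across studies.
    total_score = [sum(s[j] for s in scores) for j in range(m)]
    total_cov = [[sum(c[i][j] for c in covariances) for j in range(m)] for i in range(m)]
    numerator = sum(w * u for w, u in zip(weights, total_score))
    variance = sum(weights[i] * total_cov[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    statistic = numerator ** 2 / variance
    # Upper tail of chi-square(1): P(Z^2 > q) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


study_scores = [[1.0, 2.0], [0.5, 0.5]]
study_covs = [[[2.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [0.0, 2.0]]]
stat, p = meta_burden_test(study_scores, study_covs, weights=[1.0, 1.0])
```

    Only the null model has to be fitted in each study to obtain these inputs, which matches the abstract's point that unstable per-variant regression coefficients are never needed.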

  5. Comparing genotoxic signatures in cord blood cells from neonates exposed in utero to zidovudine or tenofovir

    PubMed Central

    Vivanti, Alexandre; Soheili, Tayebeh S.; Cuccuini, Wendy; Luce, Sonia; Mandelbrot, Laurent; Lechenadec, Jerome; Cordier, Anne-Gael; Azria, Elie; Soulier, Jean; Cavazzana, Marina; Blanche, Stéphane; André-Schmutz, Isabelle

    2015-01-01

    Objectives: Zidovudine and tenofovir are the two main nucleos(t)ide analogs used to prevent mother-to-child transmission of HIV. In vitro, both drugs bind to and integrate into human DNA and inhibit telomerase. The objective of the present study was to assess the genotoxic effects of either zidovudine or tenofovir-based combination therapies on cord blood cells in newborns exposed in utero. Design: We compared the aneuploid rate and the gene expression profiles in cord blood samples from newborns exposed either to zidovudine or tenofovir-based combination therapies during pregnancy and from unexposed controls (n = 8, 9, and 8, respectively). Methods: The aneuploidy rate was measured on the cord blood T-cell karyotype. Gene expression profiles of cord blood T cells and hematopoietic stem and progenitor cells were determined with microarrays, analyzed in a gene set enrichment analysis and confirmed by real-time quantitative PCRs. Results: Aneuploidy was more frequent in the zidovudine-exposed group (26.3%) than in the tenofovir-exposed group (14.2%) or in controls (13.3%; P < 0.05 for both). The transcription of genes involved in DNA repair, telomere maintenance, nucleotide metabolism, DNA/RNA synthesis, and the cell cycle was deregulated in samples from both the zidovudine and the tenofovir-exposed groups. Conclusion: Although tenofovir has a lower clastogenic impact than zidovudine, gene expression profiling showed that both drugs alter the transcription of DNA repair and telomere maintenance genes. PMID:25513819

  6. Gene-based interaction analysis shows GABAergic genes interacting with parenting in adolescent depressive symptoms.

    PubMed

    Van Assche, Evelien; Moons, Tim; Cinar, Ozan; Viechtbauer, Wolfgang; Oldehinkel, Albertine J; Van Leeuwen, Karla; Verschueren, Karine; Colpin, Hilde; Lambrechts, Diether; Van den Noortgate, Wim; Goossens, Luc; Claes, Stephan; van Winkel, Ruud

    2017-12-01

    Most gene-environment interaction studies (G × E) have focused on single candidate genes. This approach is criticized for its expectations of large effect sizes and occurrence of spurious results. We describe an approach that accounts for the polygenic nature of most psychiatric phenotypes and reduces the risk of false-positive findings. We apply this method focusing on the role of perceived parental support, psychological control, and harsh punishment in depressive symptoms in adolescence. Analyses were conducted on 982 adolescents of Caucasian origin (M age (SD) = 13.78 (.94) years) genotyped for 4,947 SNPs in 263 genes, selected based on a literature survey. The Leuven Adolescent Perceived Parenting Scale (LAPPS) and the Parental Behavior Scale (PBS) were used to assess perceived parental psychological control, harsh punishment, and support. The Center for Epidemiologic Studies Depression Scale (CES-D) was the outcome. We used gene-based testing taking into account linkage disequilibrium to identify genes containing SNPs exhibiting an interaction with environmental factors yielding a p-value per single gene. Significant results at the corrected p-value of p < 1.90 × 10⁻⁴ were examined in an independent replication sample of Dutch adolescents (N = 1354). Two genes showed evidence for interaction with perceived support: GABRR1 (p = 4.62 × 10⁻⁵) and GABRR2 (p = 9.05 × 10⁻⁶). No genes interacted significantly with psychological control or harsh punishment. Gene-based analysis was unable to confirm the interaction of GABRR1 or GABRR2 with support in the replication sample. However, for GABRR2, but not GABRR1, the correlation of the estimates between the two datasets was significant (r(46) = .32; p = .027) and a gene-based analysis of the combined datasets supported GABRR2 × support interaction (p = 1.63 × 10⁻⁴).
We present a gene-based method for gene-environment interactions in a polygenic context and show that genes interact differently with particular aspects of parenting. This accentuates the importance of polygenic approaches and the need to accurately assess environmental exposure in G × E. © 2017 Association for Child and Adolescent Mental Health.
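
    The abstract does not state how the corrected significance threshold was derived. As a worked check under the assumption of a Bonferroni correction at alpha = 0.05 over the 263 genes tested, the arithmetic reproduces the reported value; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Assumption: alpha = 0.05 divided over the 263 genes named in the abstract.
threshold = bonferroni_threshold(0.05, 263)

# Gene-level p-values for perceived support, as reported in the abstract.
gene_p = {"GABRR1": 4.62e-5, "GABRR2": 9.05e-6}
significant = sorted(g for g, p in gene_p.items() if p < threshold)
print(f"{threshold:.3g}", significant)
```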

  7. The construction of cDNA library and the screening of related antigen of ascitic tumor cells of ovarian cancer.

    PubMed

    Hou, Q; Chen, K; Shan, Z

    2015-01-01

    The aim was to construct a cDNA library from the ascitic tumor cells of ovarian cancer that can be used to screen for related antigens for early diagnosis of ovarian cancer and for therapeutic targets of immune treatment. Ascitic tumor cells from patients with ovarian serous cystadenocarcinoma (four cases), ovarian mucinous cystadenocarcinoma (two cases), and ovarian endometrial carcinoma (two cases) were used to construct the cDNA library. To screen the ovarian cancer antigen genes, evaluate the enzyme, and analyze nucleotide sequences, serological analysis of recombinant tumor cDNA expression libraries (SEREX) and suppression subtractive hybridization (SSH) were used. Recombinant expression-based serological mini-arrays (SMARTA) were used to detect autoantibodies against the ovarian cancer antigens in serum from 105 ovarian cancer patients and 105 normal women. After two rounds of serologic screening and glycosides sequencing analysis, 59 candidate ovarian cancer antigen gene fragments were finally identified, corresponding to 50 genes. They were then divided into six categories: (1) genes homologous to known ovarian cancer genes, such as BARD1; (2) genes homologous to genes associated with other tumors, such as TM4SF1; (3) genes expressed in specific tissues, such as ILF3 and FXR1; (4) genes identical to genes for proteins with special functions, such as TIZ and C1D; (5) genes homologous to embryonic genes, such as PKHD1; (6) the remaining unknown genes without homologous sequences in the gene pool, such as OV-189.
SEREX technology combined with the SSH method is an effective research strategy that can screen tumor antigens with high specificity; serum autoantibodies against the recombinant antigens of TM4SF1, C1D, TIZ, BARD1, FXR1, and OV-189 can be regarded as biomarkers for diagnosing ovarian cancer. The combination of multiple antigen detection can improve diagnostic efficiency.

  8. Cancer classification through filtering progressive transductive support vector machine based on gene expression data

    NASA Astrophysics Data System (ADS)

    Lu, Xinguo; Chen, Dan

    2017-08-01

    Traditional supervised classifiers work only with labeled data and neglect the large amount of data that lacks sufficient follow-up information. Consequently, the small sample size limits the design of appropriate classifiers. In this paper, a transductive learning method that combines a filtering strategy with a progressive labeling strategy in a transductive framework is presented. The progressive labeling strategy does not need to consider the distribution of labeled samples to evaluate the distribution of unlabeled samples, and can effectively solve the problem of estimating the proportion of positive and negative samples in the working set. Our experimental results demonstrate that the proposed technique has great potential in cancer prediction based on gene expression.

  9. A detailed view on Model-Based Multifactor Dimensionality Reduction for detecting gene-gene interactions in case-control data in the absence and presence of noise

    PubMed Central

    CATTAERT, TOM; CALLE, M. LUZ; DUDEK, SCOTT M.; MAHACHIE JOHN, JESTINAH M.; VAN LISHOUT, FRANÇOIS; URREA, VICTOR; RITCHIE, MARYLYN D.; VAN STEEN, KRISTEL

    2010-01-01

    SUMMARY Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and computational perspective, even using a relatively small number of genetic and non-genetic exposures. Several data mining methods have been proposed for interaction analysis, among them, the Multifactor Dimensionality Reduction Method (MDR), which has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that is able to unify the best of both non-parametric and parametric worlds, was developed to address some of the remaining concerns that go along with an MDR-analysis. These include the restriction to univariate, dichotomous traits, the absence of flexible ways to adjust for lower-order effects and important confounders, and the difficulty to highlight epistasis effects when too many multi-locus genotype cells are pooled into two new genotype groups. Whereas the true value of MB-MDR can only reveal itself by extensive applications of the method in a variety of real-life scenarios, here we investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. For the considered simulation settings, we show that the power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies. PMID:21158747
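
    The pooling of multi-locus genotype cells into two groups, which the summary names as a limitation of MDR, can be illustrated as follows. This is a simplified sketch of the classic MDR labeling rule only, assuming the usual case:control ratio threshold; it is not MB-MDR, and the function name and toy data are invented for the example.

```python
from collections import Counter

def mdr_label_cells(genotypes, is_case, threshold=None):
    """Label each multi-locus genotype cell as 'high' or 'low' risk.

    genotypes: list of tuples, one multi-locus genotype per subject
    is_case:   list of booleans, True for cases and False for controls
    threshold: case:control ratio cutoff; defaults to the overall ratio
    """
    cases = Counter(g for g, c in zip(genotypes, is_case) if c)
    controls = Counter(g for g, c in zip(genotypes, is_case) if not c)
    if threshold is None:
        # Assumes at least one control in the data set.
        threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        # Multiply instead of dividing so cells with no controls are handled.
        high = cases[cell] > threshold * controls[cell]
        labels[cell] = "high" if high else "low"
    return labels

# Toy two-locus data: cell (1, 1) is enriched for cases, cell (0, 0) is not.
genotypes = [(0, 0)] * 4 + [(1, 1)] * 4
is_case = [True, False, False, False, True, True, True, False]
labels = mdr_label_cells(genotypes, is_case)
print(labels)
```

    Every cell ends up in one of two pooled groups regardless of how weak its evidence is, which is the kind of pooling the summary says can obscure epistasis effects.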

  10. A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records.

    PubMed

    Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter

    2014-09-24

    Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. 
Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.

  11. The combination of quantitative PCR and western blot detecting CP4-EPSPS component in Roundup Ready soy plant tissues and commercial soy-related foodstuffs.

    PubMed

    Xiao, Xiao; Wu, Honghong; Zhou, Xinghu; Xu, Sheng; He, Jian; Shen, Wenbiao; Zhou, Guanghong; Huang, Ming

    2012-06-01

    With the widespread use of Roundup Ready soy (event 40-3-2) (RRS), the comprehensive detection of genetically modified component in foodstuffs is of significant interest, but few protein-based approaches have been found useful in processed foods. In this report, the combination of quantitative PCR (qPCR) and western blot was used to detect cp4-epsps gene and its protein product in different RRS plant tissues and commercial soy-containing foodstuffs. The foods included those of plant origin produced by different processing procedures and also some products containing both meat and plant protein concentrates. The validity of the 2 methods was confirmed first. We also showed that the CP4-EPSPS protein existed in different RRS plant tissues. In certain cases, the results from the western blot and the qPCR were not consistent. To be specific, at least 2 degraded fragments of CP4-EPSPS protein (35.5 and 24.6 kDa) were observed. For dried bean curd crust and deep-fried bean curd, a degraded protein fragment with the size of 24.6 kDa appeared, while cp4-epsps gene could not be traced by qPCR. In contrast, we found a signal of cp4-epsps DNA in 3 foodstuffs, including soy-containing ham cutlet product, meat ball, and sausage by qPCR, while CP4-EPSPS protein could not be detected by western blot in such samples. Our study therefore concluded that the combination of DNA- and protein-based methods would compensate each other, thus resulting in a more comprehensive detection from nucleic acid and protein levels. The combination of quantitative PCR (qPCR) and western blot was used to detect cp4-epsps gene and its protein product in different Roundup Ready soy (event 40-3-2) plant tissues and commercial soy-containing foodstuffs. The foods included those of plant origin produced by different processing procedures and also some products containing a combination of both meat and plant protein concentrates. 
This study indicated that the combination of DNA- and protein-based methods would supplement each other for genetically modified detection from nucleic acid and protein levels. Accordingly, qPCR and western blot could be used in CP4-EPSPS detection in a wide variety of soy-related foodstuffs. © 2012 Institute of Food Technologists®

  12. Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons.

    PubMed

    Smid, Marcel; Coebergh van den Braak, Robert R J; van de Werken, Harmen J G; van Riet, Job; van Galen, Anne; de Weerd, Vanja; van der Vlugt-Daane, Michelle; Bril, Sandra I; Lalmahomed, Zarina S; Kloosterman, Wigard P; Wilting, Saskia M; Foekens, John A; IJzermans, Jan N M; Martens, John W M; Sieuwerts, Anieta M

    2018-06-22

    Current normalization methods for RNA-sequencing data allow either for intersample comparison to identify differentially expressed (DE) genes or for intrasample comparison for the discovery and validation of gene signatures. Most studies on optimization of normalization methods typically use simulated data to validate methodologies. We describe a new method, GeTMM, which allows for both inter- and intrasample analyses with the same normalized data set. We used actual (i.e. not simulated) RNA-seq data from 263 colon cancers (no biological replicates) and used the same read count data to compare GeTMM with the most commonly used normalization methods (i.e. TMM (used by edgeR), RLE (used by DESeq2) and TPM) with respect to distributions, effect of RNA quality, subtype-classification, recurrence score, recall of DE genes and correlation to RT-qPCR data. We observed a clear benefit for GeTMM and TPM with regard to intrasample comparison while GeTMM performed similar to TMM and RLE normalized data in intersample comparisons. Regarding DE genes, recall was found comparable among the normalization methods, while GeTMM showed the lowest number of false-positive DE genes. Remarkably, we observed limited detrimental effects in samples with low RNA quality. We show that GeTMM outperforms established methods with regard to intrasample comparison while performing equivalent with regard to intersample normalization using the same normalized data. These combined properties enhance the general usefulness of RNA-seq but also the comparability to the many array-based gene expression data in the public domain.
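
    The gene-length correction behind intrasample comparisons can be sketched as below. This shows only reads per kilobase (RPK) and TPM; the TMM scaling step that GeTMM adds, as its name suggests, is implemented in edgeR and is not reproduced here. Function names and toy counts are assumptions made for illustration.

```python
def reads_per_kilobase(counts, lengths_bp):
    """Divide each gene's read count by its length in kilobases."""
    return [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: RPK values rescaled to sum to one million."""
    rpk = reads_per_kilobase(counts, lengths_bp)
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]

# Toy sample: raw counts grow with gene length, so expression is in fact equal.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 3000]
values = tpm(counts, lengths_bp)
print(values)
```

    After length correction the three genes receive the same value, whereas the raw counts would rank the longest gene highest.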

  13. Successful recovery of transgenic cowpea (Vigna unguiculata) using the 6-phosphomannose isomerase gene as the selectable marker.

    PubMed

    Bakshi, Souvika; Saha, Bedabrata; Roy, Nand Kishor; Mishra, Sagarika; Panda, Sanjib Kumar; Sahoo, Lingaraj

    2012-06-01

    A new method for obtaining transgenic cowpea was developed using positive selection based on the Escherichia coli 6-phosphomannose isomerase gene as the selectable marker and mannose as the selective agent. Only transformed cells were capable of utilizing mannose as a carbon source. Cotyledonary node explants from 4-day-old in vitro-germinated seedlings of cultivar Pusa Komal were inoculated with Agrobacterium tumefaciens strain EHA105 carrying the vector pNOV2819. Regenerating transformed shoots were selected on medium supplemented with a combination of 20 g/l mannose and 5 g/l sucrose as carbon source. The transformed shoots were rooted on medium devoid of mannose. Transformation efficiency based on PCR analysis of individual putative transformed shoots was 3.6%. Southern blot analysis on five randomly chosen PCR-positive plants confirmed the integration of the pmi transgene. Qualitative reverse transcription (qRT-PCR) analysis demonstrated the expression of pmi in T₀ transgenic plants. Chlorophenol red (CPR) assays confirmed the activity of PMI in transgenic plants, and the gene was transmitted to progeny in a Mendelian fashion. The transformation method presented here for cowpea using mannose selection is efficient and reproducible, and could be used to introduce a desirable gene(s) into cowpea for biotic and abiotic stress tolerance.

  14. Phenotypic H-Antigen Typing by Mass Spectrometry Combined with Genetic Typing of H Antigens, O Antigens, and Toxins by Whole-Genome Sequencing Enhances Identification of Escherichia coli Isolates.

    PubMed

    Cheng, Keding; Chui, Huixia; Domish, Larissa; Sloan, Angela; Hernandez, Drexler; McCorrister, Stuart; Robinson, Alyssia; Walker, Matthew; Peterson, Lorea A M; Majcher, Miles; Ratnam, Sam; Haldane, David J M; Bekal, Sadjia; Wylie, John; Chui, Linda; Tyler, Shaun; Xu, Bianli; Reimer, Aleisha; Nadon, Celine; Knox, J David; Wang, Gehua

    2016-08-01

    Mass spectrometry-based phenotypic H-antigen typing (MS-H) combined with whole-genome-sequencing-based genetic identification of H antigens, O antigens, and toxins (WGS-HOT) was used to type 60 clinical Escherichia coli isolates, 43 of which were previously identified as nonmotile, H type undetermined, or O rough by serotyping or having shown discordant MS-H and serotyping results. Whole-genome sequencing confirmed that MS-H was able to provide more accurate data regarding H antigen expression than serotyping. Further, enhanced and more confident O antigen identification resulted from gene cluster based typing in combination with conventional typing based on the gene pair comprising wzx and wzy and that comprising wzm and wzt. The O antigen was identified in 94.6% of the isolates when the two genetic O typing approaches (gene pair and gene cluster) were used in conjunction, in comparison to 78.6% when the gene pair database was used alone. In addition, 98.2% of the isolates showed the existence of genes for various toxins and/or virulence factors, among which verotoxins (Shiga toxin 1 and/or Shiga toxin 2) were 100% concordant with conventional PCR based testing results. With more applications of mass spectrometry and whole-genome sequencing in clinical microbiology laboratories, this combined phenotypic and genetic typing platform (MS-H plus WGS-HOT) should be ideal for pathogenic E. coli typing. Copyright © 2016 Cheng et al.

  15. Genotyping of K-ras codons 12 and 13 mutations in colorectal cancer by capillary electrophoresis.

    PubMed

    Chen, Yen-Ling; Chang, Ya-Sian; Chang, Jan-Gowth; Wu, Shou-Mei

    2009-06-26

    Point mutations of the K-ras gene located in codons 12 and 13 cause poor responses to the anti-epidermal growth factor receptor (anti-EGFR) therapy of colorectal cancer (CRC) patients. Besides, mutations of K-ras gene have also been proven to play an important role in human tumor progression. We established a simple and effective capillary electrophoresis (CE) method for simultaneous point mutation detection in codons 12 and 13 of K-ras gene. We combined one universal fluorescence-based nonhuman-sequence primer and two fragment-oriented primers in one tube, and performed this two-in-one polymerase chain reaction (PCR). PCR fragments included wild type and seven point mutations at codons 12 and 13 of K-ras gene. The amplicons were analyzed by single-strand conformation polymorphism (SSCP)-CE method. The CE analysis was performed by using a 1× Tris-borate-EDTA (TBE) buffer containing 1.5% (w/v) hydroxyethylcellulose (HEC) (MW 250,000) under reverse polarity at 15 °C and 30 °C. Ninety colorectal cancer patients were blindly genotyped using this developed method. The results showed good agreement with those of DNA sequencing method. The SSCP-CE was feasible for mutation screening of K-ras gene in populations.

  16. Progress and outlook of inorganic nanoparticles for delivery of nucleic acid sequences related to orthopedic pathologies: a review.

    PubMed

    Wagner, Darcy E; Bhaduri, Sarit B

    2012-02-01

    The anticipated growth in the aging population will drastically increase medical needs of society; of which, one of the largest components will undoubtedly be from orthopedic-related pathologies. There are several proposed solutions being investigated to cost-effectively prepare for the future--pharmaceuticals, implant devices, cell and gene therapies, or some combination thereof. Gene therapy is one of the more promising possibilities because it seeks to correct the root of the problem, thereby minimizing treatment duration and cost. Currently, viral vectors have shown the highest efficacies, but immunological concerns remain. Nonviral methods show reduced immune responses but are regarded as less efficient. The nonviral paradigms consist of mechanical and chemical approaches. While organic-based materials have been used more frequently in particle-based methods, inorganic materials capable of delivery have distinct advantages, especially advantageous in orthopedic applications. The inorganic gene therapy field is highly interdisciplinary in nature, and requires assimilation of knowledge across the broad fields of cell biology, biochemistry, molecular genetics, materials science, and clinical medicine. This review provides an overview of the role each area plays in orthopedic gene therapy as well as possible future directions for the field.

  17. Literature-based condition-specific miRNA-mRNA target prediction.

    PubMed

    Oh, Minsik; Rhee, Sungmin; Moon, Ji Hwan; Chae, Heejoon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun

    2017-01-01

    miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based target prediction algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target predictions and produce many false positives; thus, expression-based target prediction algorithms have been developed for condition-specific target predictions. A typical strategy to utilize expression data is to leverage the negative control roles of miRNAs on genes. To control false positives, a stringent cutoff value is typically set, but in this case, these methods tend to reject many true target relationships, i.e., false negatives. To overcome these limitations, additional information should be utilized. The literature is probably the best resource that we can utilize. Recent literature mining systems compile millions of articles with experiments designed for specific biological questions, and the systems provide a function to search for specific information. To utilize the literature information, we used a literature mining system, BEST, that automatically extracts information from the literature in PubMed and that allows the user to perform searches of the literature with any English words. By integrating omics data analysis methods and BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results and the literature information extracted based on the user-specified context. In the pathway enrichment analysis using genes included in the top 200 miRNA-targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In another test on whether prediction methods can re-produce experimentally validated target relationships, Context-MMIA outperformed the four existing target prediction methods. 
In summary, Context-MMIA allows the user to specify a context of the experimental data to predict miRNA targets, and we believe that Context-MMIA is very useful for predicting condition-specific miRNA targets.

  18. Creating genetic resistance to HIV.

    PubMed

    Burnett, John C; Zaia, John A; Rossi, John J

    2012-10-01

    HIV/AIDS remains a chronic and incurable disease, in spite of the notable successes of combination antiretroviral therapy. Gene therapy offers the prospect of creating genetic resistance to HIV that supplants the need for antiviral drugs. In sight of this goal, a variety of anti-HIV genes have reached clinical testing, including gene-editing enzymes, protein-based inhibitors, and RNA-based therapeutics. Combinations of therapeutic genes against viral and host targets are designed to improve the overall antiviral potency and reduce the likelihood of viral resistance. In cell-based therapies, therapeutic genes are expressed in gene modified T lymphocytes or in hematopoietic stem cells that generate an HIV-resistant immune system. Such strategies must promote the selective proliferation of the transplanted cells and the prolonged expression of therapeutic genes. This review focuses on the current advances and limitations in genetic therapies against HIV, including the status of several recent and ongoing clinical studies. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Selective Targeting of CTNNB1-, KRAS- or MYC-Driven Cell Growth by Combinations of Existing Drugs

    PubMed Central

    Uitdehaag, Joost C. M.; de Roos, Jeroen A. D. M.; van Doornmalen, Antoon M.; Prinsen, Martine B. W.; Spijkers-Hagelstein, Jill A. P.; de Vetter, Judith R. F.; de Man, Jos; Buijsman, Rogier C.; Zaman, Guido J. R.

    2015-01-01

    The aim of combination drug treatment in cancer therapy is to improve response rate and to decrease the probability of the development of drug resistance. Preferably, drug combinations are synergistic rather than additive, and, ideally, drug combinations work synergistically only in cancer cells and not in non-malignant cells. We have developed a workflow to identify such targeted synergies, and applied this approach to selectively inhibit the proliferation of cell lines with mutations in genes that are difficult to modulate with small molecules. The approach is based on curve shift analysis, which we demonstrate is a more robust method of determining synergy than combination matrix screening with Bliss-scoring. We show that the MEK inhibitor trametinib is more synergistic in combination with the BRAF inhibitor dabrafenib than with vemurafenib, another BRAF inhibitor. In addition, we show that the combination of MEK and BRAF inhibitors is synergistic in BRAF-mutant melanoma cells, and additive or antagonistic in, respectively, BRAF-wild type melanoma cells and non-malignant fibroblasts. This combination exemplifies that synergistic action of drugs can depend on cancer genotype. Next, we used curve shift analysis to identify new drug combinations that specifically inhibit cancer cell proliferation driven by difficult-to-drug cancer genes. Combination studies were performed with compounds that as single agents showed preference for inhibition of cancer cells with mutations in either the CTNNB1 gene (coding for β-catenin), KRAS, or cancer cells expressing increased copy numbers of MYC. We demonstrate that the Wnt-pathway inhibitor ICG-001 and trametinib acted synergistically in Wnt-pathway-mutant cell lines. The ERBB2 inhibitor TAK-165 was synergistic with trametinib in KRAS-mutant cell lines. 
The EGFR/ERBB2 inhibitor neratinib acted synergistically with the spindle poison docetaxel and with the Aurora kinase inhibitor GSK-1070916 in cell lines with MYC amplification. Our approach can therefore efficiently discover novel drug combinations that selectively target cancer genes. PMID:26018524
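
    For reference, the Bliss scoring that the abstract compares against curve shift analysis rests on a simple independence formula. The sketch below shows only that formula, with made-up inhibition fractions; it does not implement curve shift analysis or the authors' workflow, and the function names are hypothetical.

```python
def bliss_expected(fa, fb):
    """Expected fractional inhibition if two drugs act independently."""
    return fa + fb - fa * fb

def bliss_excess(observed, fa, fb):
    """Observed minus expected inhibition; positive values suggest synergy."""
    return observed - bliss_expected(fa, fb)

# Made-up example: drug A inhibits 40%, drug B 50%, the combination 85%.
expected = bliss_expected(0.4, 0.5)
excess = bliss_excess(0.85, 0.4, 0.5)
print(expected, excess)
```

    An excess near zero corresponds to additivity, and a negative excess to antagonism.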

  20. In-depth resistome analysis by targeted metagenomics.

    PubMed

    Lanza, Val F; Baquero, Fernando; Martínez, José Luís; Ramos-Ruíz, Ricardo; González-Zorn, Bruno; Andremont, Antoine; Sánchez-Valenzuela, Antonio; Ehrlich, Stanislav Dusko; Kennedy, Sean; Ruppé, Etienne; van Schaik, Willem; Willems, Rob J; de la Cruz, Fernando; Coque, Teresa M

    2018-01-15

    Antimicrobial resistance is a major global health challenge. Metagenomics allows analyzing the presence and dynamics of "resistomes" (the ensemble of genes encoding antimicrobial resistance in a given microbiome) in disparate microbial ecosystems. However, the low sensitivity and specificity of available metagenomic methods preclude the detection of minority populations (often present below their detection threshold) and/or the identification of allelic variants that differ in the resulting phenotype. Here, we describe a novel strategy that combines targeted metagenomics using last generation in-solution capture platforms, with novel bioinformatics tools to establish a standardized framework that allows both quantitative and qualitative analyses of resistomes. We developed ResCap, a targeted sequence capture platform based on SeqCapEZ (NimbleGene) technology, which includes probes for 8667 canonical resistance genes (7963 antibiotic resistance genes and 704 genes conferring resistance to metals or biocides), and 2517 relaxase genes (plasmid markers) and 78,600 genes homologous to the previous identified targets (47,806 for antibiotics and 30,794 for biocides or metals). Its performance was compared with metagenomic shotgun sequencing (MSS) for 17 fecal samples (9 humans, 8 swine). ResCap significantly improves MSS to detect "gene abundance" (from 2.0 to 83.2%) and "gene diversity" (26 versus 14.9 genes unequivocally detected per sample per million of reads; the number of reads unequivocally mapped increasing up to 300-fold by using ResCap), which were calculated using novel bioinformatic tools. ResCap also facilitated the analysis of novel genes potentially involved in the resistance to antibiotics, metals, biocides, or any combination thereof. 
ResCap, the first targeted sequence capture, specifically developed to analyze resistomes, greatly enhances the sensitivity and specificity of available metagenomic methods and offers the possibility to analyze genes related to the selection and transfer of antimicrobial resistance (biocides, heavy metals, plasmids). The model opens the possibility to study other complex microbial systems in which minority populations play a relevant role.

  1. Gene Environment Interactions and Predictors of Colorectal Cancer in Family-Based, Multi-Ethnic Groups.

    PubMed

    Shiao, S Pamela K; Grayson, James; Yu, Chong Ho; Wasek, Brandi; Bottiglieri, Teodoro

    2018-02-16

    For the personalization of polygenic/omics-based health care, the purpose of this study was to examine the gene-environment interactions and predictors of colorectal cancer (CRC) by including five key genes in the one-carbon metabolism pathways. In this proof-of-concept study, we included a total of 54 families and 108 participants, 54 CRC cases and 54 matched family friends representing four major racial ethnic groups in southern California (White, Asian, Hispanics, and Black). We used three phases of data analytics, including exploratory, family-based analyses adjusting for the dependence within the family for sharing genetic heritage, the ensemble method, and generalized regression models for predictive modeling with a machine learning validation procedure to validate the results for enhanced prediction and reproducibility. The results revealed that despite the family members sharing genetic heritage, the CRC group had greater combined gene polymorphism rates than the family controls (p < 0.05), on MTHFR C677T, MTR A2756G, MTRR A66G, and DHFR 19 bp except MTHFR A1298C. Four racial groups presented different polymorphism rates for four genes (all p < 0.05) except MTHFR A1298C. Following the ensemble method, the most influential factors were identified, and the best predictive models were generated by using the generalized regression models, with Akaike's information criterion and leave-one-out cross validation methods. Body mass index (BMI) and gender were consistent predictors of CRC for both models when individual genes versus total polymorphism counts were used, and alcohol use was interactive with BMI status. Body mass index status was also interactive with both gender and MTHFR C677T gene polymorphism, and the exposure to environmental pollutants was an additional predictor. These results point to the important roles of environmental and modifiable factors in relation to gene-environment interactions in the prevention of CRC.

  2. An Improved Single-Step Cloning Strategy Simplifies the Agrobacterium tumefaciens-Mediated Transformation (ATMT)-Based Gene-Disruption Method for Verticillium dahliae.

    PubMed

    Wang, Sheng; Xing, Haiying; Hua, Chenlei; Guo, Hui-Shan; Zhang, Jie

    2016-06-01

    The soilborne fungal pathogen Verticillium dahliae infects a broad range of plant species to cause severe diseases. The availability of Verticillium genome sequences has provided opportunities for large-scale investigations of individual gene function in Verticillium strains using Agrobacterium tumefaciens-mediated transformation (ATMT)-based gene-disruption strategies. Traditional ATMT vectors require multiple cloning steps and elaborate characterization procedures to achieve successful gene replacement; thus, these vectors are not suitable for high-throughput ATMT-based gene deletion. Several advancements have been made that either involve simplification of the steps required for gene-deletion vector construction or increase the efficiency of the technique for rapid recombinant characterization. However, an ATMT binary vector that is both simple and efficient is still lacking. Here, we generated a USER-ATMT dual-selection (DS) binary vector, which combines both the advantages of the USER single-step cloning technique and the efficiency of the herpes simplex virus thymidine kinase negative-selection marker. Highly efficient deletion of three different genes in V. dahliae using the USER-ATMT-DS vector enabled verification that this newly-generated vector not only facilitates the cloning process but also simplifies the subsequent identification of fungal homologous recombinants. The results suggest that the USER-ATMT-DS vector is applicable for efficient gene deletion and suitable for large-scale gene deletion in V. dahliae.

  3. CATCh, an Ensemble Classifier for Chimera Detection in 16S rRNA Sequencing Studies

    PubMed Central

    Mysara, Mohamed; Saeys, Yvan; Leys, Natalie; Raes, Jeroen

    2014-01-01

    In ecological studies, microbial diversity is nowadays mostly assessed via the detection of phylogenetic marker genes, such as 16S rRNA. However, PCR amplification of these marker genes produces a significant amount of artificial sequences, often referred to as chimeras. Different algorithms have been developed to remove these chimeras, but efforts to combine different methodologies are limited. Therefore, two machine learning classifiers (reference-based and de novo CATCh) were developed by integrating the output of existing chimera detection tools into a new, more powerful method. When comparing our classifiers with existing tools in either the reference-based or de novo mode, a higher performance of our ensemble method was observed on a wide range of sequencing data, including simulated, 454 pyrosequencing, and Illumina MiSeq data sets. Since our algorithm combines the advantages of different individual chimera detection tools, our approach produces more robust results when challenged with chimeric sequences having a low parent divergence, short length of the chimeric range, and various numbers of parents. Additionally, it could be shown that integrating CATCh in the preprocessing pipeline has a beneficial effect on the quality of the clustering in operational taxonomic units. PMID:25527546

  4. High-throughput gene mapping in Caenorhabditis elegans.

    PubMed

    Swan, Kathryn A; Curtis, Damian E; McKusick, Kathleen B; Voinov, Alexander V; Mapa, Felipa A; Cancilla, Michael R

    2002-07-01

    Positional cloning of mutations in model genetic systems is a powerful method for the identification of targets of medical and agricultural importance. To facilitate the high-throughput mapping of mutations in Caenorhabditis elegans, we have identified a further 9602 putative new single nucleotide polymorphisms (SNPs) between two C. elegans strains, Bristol N2 and the Hawaiian mapping strain CB4856, by sequencing inserts from a CB4856 genomic DNA library and using an informatics pipeline to compare sequences with the canonical N2 genomic sequence. When combined with data from other laboratories, our marker set of 17,189 SNPs provides even coverage of the complete worm genome. To date, we have confirmed >1099 evenly spaced SNPs (one every 91 +/- 56 kb) across the six chromosomes and validated the utility of our SNP marker set and new fluorescence polarization-based genotyping methods for systematic and high-throughput identification of genes in C. elegans by cloning several proprietary genes. We illustrate our approach by recombination mapping and confirmation of the mutation in the cloned gene, dpy-18.

  5. Selection and evaluation of reference genes for expression studies with quantitative PCR in the model fungus Neurospora crassa under different environmental conditions in continuous culture.

    PubMed

    Cusick, Kathleen D; Fitzgerald, Lisa A; Pirlo, Russell K; Cockrell, Allison L; Petersen, Emily R; Biffinger, Justin C

    2014-01-01

    Neurospora crassa has served as a model organism for studying circadian pathways and more recently has gained attention in the biofuel industry due to its enhanced capacity for cellulase production. However, in order to optimize N. crassa for biotechnological applications, metabolic pathways during growth under different environmental conditions must be addressed. Reverse-transcription quantitative PCR (RT-qPCR) is a technique that provides a high-throughput platform from which to measure the expression of a large set of genes over time. The selection of a suitable reference gene is critical for gene expression studies using relative quantification, as this strategy is based on normalization of target gene expression to a reference gene whose expression is stable under the experimental conditions. This study evaluated twelve candidate reference genes for use with N. crassa when grown in continuous culture bioreactors under different light and temperature conditions. Based on combined stability values from the NormFinder and BestKeeper software packages, the following are the most appropriate reference genes under conditions of: (1) light/dark cycling: btl, asl, and vma1; (2) all-dark growth: btl, tbp, vma1, and vma2; (3) temperature flux: btl, vma1, act, and asl; (4) all conditions combined: vma1, vma2, tbp, and btl. Since N. crassa exists as different cell types (uni- or multi-nucleated), expression changes in a subset of the candidate genes were further assessed using absolute quantification. A strong negative correlation was found to exist between ratio and threshold cycle (CT) values, demonstrating that CT changes serve as a reliable reflection of transcript, and not gene copy number, fluctuations. The results of this study identified genes that are appropriate for use as reference genes in RT-qPCR studies with N. crassa and demonstrated that even with the presence of different cell types, relative quantification is an acceptable method for measuring gene expression changes during growth in bioreactors.
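
    The relative quantification described in this abstract, in which target gene expression is normalized to stably expressed reference genes, can be illustrated with a short sketch. The abstract does not state which formula the authors used; the code below assumes the widely used 2^-ΔΔCT calculation with 100% amplification efficiency, and averages the CT values of several reference genes. The function name and all CT values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_test, ct_refs_test, ct_target_ctrl, ct_refs_ctrl):
    """Fold change of a target gene (test vs. control) by the 2^-ddCT method.

    Assumption: every assay doubles the product each cycle (100% efficiency).
    Reference-gene CT values are averaged, which corresponds to taking the
    geometric mean of the underlying reference quantities.
    """
    # Normalize the target to the reference genes within each condition.
    delta_test = ct_target_test - mean(ct_refs_test)
    delta_ctrl = ct_target_ctrl - mean(ct_refs_ctrl)
    # Compare the normalized values between conditions.
    delta_delta = delta_test - delta_ctrl
    return 2.0 ** (-delta_delta)


# Hypothetical CT values: the target crosses threshold two cycles earlier in
# the test condition while the two reference genes are unchanged.
fold = relative_expression(22.0, [18.0, 20.0], 24.0, [18.0, 20.0])
print(f"fold change: {fold:.1f}")
```

    Unstable reference genes would shift the averaged reference CT between conditions and bias the fold change, which is why the reference genes are evaluated for stability first.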

  6. The mechanism and regularity of quenching the effect of bases on fluorophores: the base-quenched probe method.

    PubMed

    Mao, Huihui; Luo, Guanghua; Zhan, Yuxia; Zhang, Jun; Yao, Shuang; Yu, Yang

    2018-04-30

    The base-quenched probe method for detecting single nucleotide polymorphisms (SNPs) relies on real-time PCR and melting-curve analysis, which might require only one pair of primers and one probe. At present, it has been successfully applied to detect SNPs of multiple genes. However, the mechanism of the base-quenched probe method remains unclear. Therefore, we investigated the possible mechanism of fluorescence quenching by DNA bases in aqueous solution using spectroscopic techniques. The results showed that the possible mechanism might be photo-induced electron transfer. We next analyzed electron transfer or transmission between DNA bases and fluorophores. The data suggested that in single-stranded DNA, the electrons of the fluorophore are transferred to the orbital of pyrimidine bases (thymine (T) and cytosine (C)), or that the electron orbitals of the fluorophore are occupied by electrons from purine bases (guanine (G) and adenine (A)), which lead to fluorescence quenching. In addition, the electrons of a fluorophore excited by light can be transmitted along double-stranded DNA, which gives rise to stronger fluorescence quenching. Furthermore, we demonstrated that the quenching efficiency of bases is in the order of G > C ≥ A ≥ T and the capability of electron transmission of base-pairs in double-stranded DNA is in the order of CG ≥ GC > TA ≥ AT (the second letter of each pair, bold and underlined in the original, represents the base on the complementary strand of the probe), and the most common commercial fluorophores including FAM, HEX, TET, JOE, and TAMRA could be influenced by bases and are in line with this mechanism and regularity.

  7. NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis.

    PubMed

    Sun, Duanchen; Liu, Yinliang; Zhang, Xiang-Sun; Wu, Ling-Yun

    2017-09-21

    High-throughput experimental techniques have been dramatically improved and widely applied in the past decades. However, biological interpretation of the high-throughput experimental results, such as differential expression gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in functional enrichment studies. The GO terms identified via current functional enrichment analysis tools often contain direct parent or descendant terms in the GO hierarchical structure. Highly redundant terms make it difficult for users to analyze the underlying biological processes. In this paper, a novel network-based probabilistic generative model, NetGen, was proposed to perform the functional enrichment analysis. An additional protein-protein interaction (PPI) network was explicitly used to assist the identification of significantly enriched GO terms. NetGen achieved superior performance to the existing methods in the simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, several GO terms which were not directly linked with the active gene list for each disease were identified. These terms were closely related to the corresponding diseases when checked against the curated literature. NetGen has been implemented in the R package CopTea, publicly available at GitHub (http://github.com/wulingyun/CopTea/). Our procedure leads to a more reasonable and interpretable result of the functional enrichment analysis. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods, and can help to explore the underlying pathogenesis of complex diseases.

  8. Localization of causal locus in the genome of the brown macroalga Ectocarpus: NGS-based mapping and positional cloning approaches

    PubMed Central

    Billoud, Bernard; Jouanno, Émilie; Nehr, Zofia; Carton, Baptiste; Rolland, Élodie; Chenivesse, Sabine; Charrier, Bénédicte

    2015-01-01

    Mutagenesis is the only process by which unpredicted biological gene function can be identified. Although several macroalgal developmental mutants have been generated, their causal mutations were never identified, because the necessary experimental conditions were not available at the time. Today, progress in macroalgal genomics and judicious choices of suitable genetic models make mutated gene identification possible. This article presents a comparative study of two methods aiming at identifying a genetic locus in the brown alga Ectocarpus siliculosus: positional cloning and Next-Generation Sequencing (NGS)-based mapping. Once the necessary preliminary experimental tools were gathered, we tested both analyses on an Ectocarpus morphogenetic mutant. We show how a narrower localization results from the combination of the two methods. Advantages and drawbacks of these two approaches as well as potential transfer to other macroalgae are discussed. PMID:25745426

  9. Combining multiple decisions: applications to bioinformatics

    NASA Astrophysics Data System (ADS)

    Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.

    2008-01-01

    Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
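
    The error-correcting output coding (ECOC) framework named in this abstract can be sketched in its simplest form: each class is assigned a binary codeword, each bit is predicted by one binary classifier, and the predicted bit string is decoded to the class with the nearest codeword. The sketch below uses plain, unweighted Hamming decoding rather than either of the two reviewed approaches (tuned classifier weights or a probabilistic bit-inversion model). The code matrix and class names are hypothetical.

```python
# Hypothetical code matrix: 3 classes, 6 binary classifiers (one per bit).
# Any two codewords differ in 4 positions, so one wrong bit is still decoded
# to the correct class.
CODEBOOK = {
    "class_A": (1, 1, 1, 0, 0, 0),
    "class_B": (0, 0, 1, 1, 1, 0),
    "class_C": (1, 0, 0, 1, 0, 1),
}


def hamming(u, v):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


# The fifth binary classifier errs on a class_B sample (bit 1 -> 0), yet the
# decoded class is unchanged.
print(decode((0, 0, 1, 1, 0, 0)))
```

    Weighting each classifier, as in the first reviewed approach, would amount to replacing the unit cost per mismatched bit in `hamming` with a per-classifier weight.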

  10. Sampling and Pooling Methods for Capturing Herd Level Antibiotic Resistance in Swine Feces using qPCR and CFU Approaches

    PubMed Central

    Mellerup, Anders; Ståhl, Marie

    2015-01-01

    The aim of this article was to define the sampling level and method combination that captures antibiotic resistance at pig herd level utilizing qPCR antibiotic resistance gene quantification and culture-based quantification of antibiotic resistant coliform indicator bacteria. Fourteen qPCR assays for commonly detected antibiotic resistance genes were developed, and used to quantify antibiotic resistance genes in total DNA from swine fecal samples that were obtained using different sampling and pooling methods. In parallel, the number of antibiotic resistant coliform indicator bacteria was determined in the same swine fecal samples. The results showed that the qPCR assays were capable of detecting differences in antibiotic resistance levels in individual animals that the coliform bacteria colony forming units (CFU) could not. Also, the qPCR assays more accurately quantified antibiotic resistance genes when comparing individual sampling and pooling methods. qPCR on pooled samples was found to be a good representative for the general resistance level in a pig herd compared to the coliform CFU counts. It had significantly reduced relative standard deviations compared to coliform CFU counts in the same samples, and therefore differences in antibiotic resistance levels between samples were more readily detected. To our knowledge, this is the first study to describe sampling and pooling methods for qPCR quantification of antibiotic resistance genes in total DNA extracted from swine feces. PMID:26114765

  11. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a small number of samples. The application of machine learning on microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, which is improved to solve multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the result of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the method of the classifier to reduce computational complexity. While ordinary SVM finds a single optimum hyperplane, the main objective of Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected from multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of normal and MD class with 100% accuracy.

  12. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
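
    The baseline that this abstract contrasts LEGO against, over-representation analysis with Fisher's exact test, treats a pathway as an unweighted gene list. A minimal sketch of that baseline (not of LEGO itself, whose network-based weights are not specified in the abstract) is the one-sided test below, computed as a hypergeometric upper tail. The function name and the example counts are hypothetical.

```python
from math import comb


def ora_p_value(universe, pathway, selected, overlap):
    """One-sided Fisher's exact (hypergeometric) p-value for enrichment.

    universe: number of genes in the background
    pathway:  number of background genes annotated to the pathway
    selected: number of genes in the experimental gene list
    overlap:  number of selected genes that are in the pathway
    Returns P(X >= overlap) when `selected` genes are drawn at random.
    """
    upper = min(pathway, selected)
    # Count the draws that contain i pathway genes, for every i >= overlap.
    favourable = sum(
        comb(pathway, i) * comb(universe - pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(universe, selected)


# Hypothetical example: 20,000 background genes, a 100-gene pathway, and a
# 200-gene list that shares 8 genes with the pathway.
print(f"p = {ora_p_value(20000, 100, 200, 8):.3g}")
```

    Every gene contributes equally to this count, which is the simplification the abstract criticizes.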

  13. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  14. Integration of Steady-State and Temporal Gene Expression Data for the Inference of Gene Regulatory Networks

    PubMed Central

    Wang, Yi Kan; Hurley, Daniel G.; Schnell, Santiago; Print, Cristin G.; Crampin, Edmund J.

    2013-01-01

    We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data. PMID:23967277

  15. Meta-analysis identifies a MECOM gene as a novel predisposing factor of osteoporotic fracture

    PubMed Central

    Hwang, Joo-Yeon; Lee, Seung Hun; Go, Min Jin; Kim, Beom-Jun; Kou, Ikuyo; Ikegawa, Shiro; Guo, Yan; Deng, Hong-Wen; Raychaudhuri, Soumya; Kim, Young Jin; Oh, Ji Hee; Kim, Youngdoe; Moon, Sanghoon; Kim, Dong-Joon; Koo, Heejo; Cha, My-Jung; Lee, Min Hye; Yun, Ji Young; Yoo, Hye-Sook; Kang, Young-Ah; Cho, Eun-Hee; Kim, Sang-Wook; Oh, Ki Won; Kang, Moo II; Son, Ho Young; Kim, Shin-Yoon; Kim, Ghi Su; Han, Bok-Ghee; Cho, Yoon Shin; Cho, Myeong-Chan; Lee, Jong-Young; Koh, Jung-Min

    2014-01-01

    Background Osteoporotic fracture (OF) as a clinical endpoint is a major complication of osteoporosis. To screen for OF susceptibility genes, we performed a genome-wide association study and carried out de novo replication analysis of an East Asian population. Methods Association was tested using a logistic regression analysis. A meta-analysis was performed on the combined results using effect size and standard errors estimated for each study. Results In a combined meta-analysis of a discovery cohort (288 cases and 1139 controls), three hospital based sets in replication stage I (462 cases and 1745 controls), and an independent ethnic group in replication stage II (369 cases and 560 controls), we identified a new locus associated with OF (rs784288 in the MECOM gene) that showed genome-wide significance (p = 3.59×10^−8; OR 1.39). RNA interference revealed that a MECOM knockdown suppresses osteoclastogenesis. Conclusions Our findings provide new insights into the genetic architecture underlying OF in East Asians. PMID:23349225
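
    A meta-analysis that combines per-study effect sizes and standard errors, as described in the Methods of this abstract, is commonly carried out by inverse-variance weighting. The sketch below assumes a fixed-effect model on the log odds ratio scale; the abstract does not say whether a fixed- or random-effects model was used, and the input numbers are hypothetical rather than taken from the study.

```python
import math


def fixed_effect_meta(betas, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    betas:      per-study effect estimates (here, log odds ratios)
    std_errors: their standard errors
    Returns (pooled effect, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


# Hypothetical log odds ratios and standard errors for three cohorts.
beta, se, p = fixed_effect_meta([0.35, 0.30, 0.33], [0.10, 0.09, 0.12])
print(f"pooled OR = {math.exp(beta):.2f}, p = {p:.2e}")
```

    Studies with smaller standard errors receive larger weights, so large cohorts dominate the pooled estimate.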

  16. Large-scale identification of chemically induced mutations in Drosophila melanogaster

    PubMed Central

    Haelterman, Nele A.; Jiang, Lichun; Li, Yumei; Bayat, Vafa; Sandoval, Hector; Ugur, Berrak; Tan, Kai Li; Zhang, Ke; Bei, Danqing; Xiong, Bo; Charng, Wu-Lin; Busby, Theodore; Jawaid, Adeel; David, Gabriela; Jaiswal, Manish; Venken, Koen J.T.; Yamamoto, Shinya

    2014-01-01

    Forward genetic screens using chemical mutagens have been successful in defining the function of thousands of genes in eukaryotic model organisms. The main drawback of this strategy is the time-consuming identification of the molecular lesions causative of the phenotypes of interest. With whole-genome sequencing (WGS), it is now possible to sequence hundreds of strains, but determining which mutations are causative among thousands of polymorphisms remains challenging. We have sequenced 394 mutant strains, generated in a chemical mutagenesis screen, for essential genes on the Drosophila X chromosome and describe strategies to reduce the number of candidate mutations from an average of ∼3500 to 35 single-nucleotide variants per chromosome. By combining WGS with a rough mapping method based on large duplications, we were able to map 274 (∼70%) mutations. We show that these mutations are causative, using small 80-kb duplications that rescue lethality. Hence, our findings demonstrate that combining rough mapping with WGS dramatically expands the toolkit necessary for assigning function to genes. PMID:25258387

  17. An efficient genome-wide association test for multivariate phenotypes based on the Fisher combination function.

    PubMed

    Yang, James J; Li, Jia; Williams, L Keoki; Buu, Anne

    2016-01-05

    In genome-wide association studies (GWAS) for complex diseases, the association between a SNP and each phenotype is usually weak. Combining multiple related phenotypic traits can increase the power of gene search and thus is a practically important area that requires methodology work. This study provides a comprehensive review of existing methods for conducting GWAS on complex diseases with multiple phenotypes including the multivariate analysis of variance (MANOVA), the principal component analysis (PCA), the generalized estimating equations (GEE), the trait-based association test involving the extended Simes procedure (TATES), and the classical Fisher combination test. We propose a new method that relaxes the unrealistic independence assumption of the classical Fisher combination test and is computationally efficient. To demonstrate applications of the proposed method, we also present the results of statistical analysis on the Study of Addiction: Genetics and Environment (SAGE) data. Our simulation study shows that the proposed method has higher power than existing methods while controlling for the type I error rate. The GEE and the classical Fisher combination test, on the other hand, do not control the type I error rate and thus are not recommended. In general, the power of the competing methods decreases as the correlation between phenotypes increases. All the methods tend to have lower power when the multivariate phenotypes come from long tailed distributions. The real data analysis also demonstrates that the proposed method allows us to compare the marginal results with the multivariate results and specify which SNPs are specific to a particular phenotype or contribute to the common construct. The proposed method outperforms existing methods in most settings and also has great applications in GWAS on complex diseases with multiple phenotypes such as the substance abuse disorders.
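
    For orientation, the classical Fisher combination test that this abstract takes as its starting point can be written in a few lines. The sketch below implements only the classical test, which assumes independent p-values; it is not the authors' proposed method, whose correction for correlated phenotypes is not described in the abstract. The function name and the example p-values are hypothetical.

```python
import math


def fisher_combination(p_values):
    """Classical Fisher combination of k independent p-values.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail has the closed form
        P(chi2_{2k} >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    tail = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail


# Hypothetical per-phenotype p-values for one SNP: each signal is weak, but
# the combined evidence is stronger.
print(f"combined p = {fisher_combination([0.04, 0.03, 0.08]):.4f}")
```

    When the phenotypes are correlated, the chi-square reference distribution no longer holds, which is consistent with the abstract's finding that the classical test does not control the type I error rate.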

  18. Occurrence of the structural enterocin A, P, B, L50B genes in enterococci of different origin.

    PubMed

    Strompfová, Viola; Lauková, Andrea; Simonová, Monika; Marcináková, Miroslava

    2008-12-10

    Enterococci are well-known producers of antimicrobial peptides, the bacteriocins (enterocins), and the number of characterized enterocins has increased significantly. Recently, enterocins have attracted great interest for their potential as biopreservatives in food or feed, while research on enterocins as alternative antimicrobials in humans and animals is only at the beginning. The present study provides a survey of the occurrence of enterocin structural genes A, P, B, L50B in a set of 427 strains of Enterococcus faecium (368) and Enterococcus faecalis (59) from different sources (animal isolates, food and feed), performed by PCR. Based on our results, 234 strains possessed one or more enterocin structural gene(s). The genes of enterocin P and enterocin A were the most frequently detected structural genes among the PCR positive strains (170 and 155 strains, respectively). The frequency of enterocin gene occurrence differed among strains according to their origin; the strains from horses and silage showed the highest frequency of enterocin genes. All possible combinations of the tested genes occurred at least twice except the combination of the genes of enterocin B and L50B, which no strain possessed. The gene of enterocin A was exclusively detected among E. faecium strains, while the genes of enterocin P, B, and L50B were detected in strains of both species, E. faecium and E. faecalis. In conclusion, a high frequency and variability of enterocin structural genes exists among enterococci of different origin, which offers a good possibility of finding effective bacteriocin-producing strains for application in veterinary medicine.

  19. A Minimally Invasive Method for Retrieving Single Adherent Cells of Different Types from Cultures

    PubMed Central

    Zeng, Jia; Mohammadreza, Aida; Gao, Weimin; Merza, Saeed; Smith, Dean; Kelbauskas, Laimonas; Meldrum, Deirdre R.

    2014-01-01

    The field of single-cell analysis has gained a significant momentum over the last decade. Separation and isolation of individual cells is an indispensable step in almost all currently available single-cell analysis technologies. However, stress levels introduced by such manipulations remain largely unstudied. We present a method for minimally invasive retrieval of selected individual adherent cells of different types from cell cultures. The method is based on a combination of mechanical (shear flow) force and biochemical (trypsin digestion) treatment. We quantified alterations in the transcription levels of stress response genes in individual cells exposed to varying levels of shear flow and trypsinization. We report optimal temperature, RNA preservation reagents, shear force and trypsinization conditions necessary to minimize changes in the stress-related gene expression levels. The method and experimental findings are broadly applicable and can be used by a broad research community working in the field of single cell analysis. PMID:24957932

  20. Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill

    2016-01-01

    Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no confident markers for the identification of cancer stem cells, and comparative methods examining adherent and sphere cells are widely used to investigate the mechanisms underlying cancer stem cells, because sphere cells have been known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with the three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using the gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets such as cytokine-cytokine receptor interaction that demonstrated significance in both datasets. Our observations demonstrated that among the genes of four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. An additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. Distinct from other meta-analyses, by using gene set analysis, we selected possible markers which can explain the biological mechanisms and suggested network analysis as an additional criterion for selecting candidates. PMID:26870956

  1. Rapid detection of pathological mutations and deletions of the haemoglobin beta gene (HBB) by High Resolution Melting (HRM) analysis and Gene Ratio Analysis Copy Enumeration PCR (GRACE-PCR).

    PubMed

    Turner, Andrew; Sasse, Jurgen; Varadi, Aniko

    2016-10-19

    Inherited disorders of haemoglobin are the world's most common genetic diseases, resulting in significant morbidity and mortality. The large number of mutations associated with the haemoglobin beta gene (HBB) makes gene scanning by High Resolution Melting (HRM) PCR an attractive diagnostic approach. However, existing HRM-PCR assays are not able to detect all common point mutations and have only a very limited ability to detect larger gene rearrangements. The aim of the current study was to develop a HBB assay, which can be used as a screening test in highly heterogeneous populations, for detection of both point mutations and larger gene rearrangements. The assay is based on a combination of conventional HRM-PCR and a novel Gene Ratio Analysis Copy Enumeration (GRACE) PCR method. HRM-PCR was extensively optimised, which included the use of an unlabelled probe and incorporation of universal bases into primers to prevent interference from common non-pathological polymorphisms. GRACE-PCR was employed to determine HBB gene copy numbers relative to a reference gene using melt curve analysis to detect rearrangements in the HBB gene. The performance of the assay was evaluated by analysing 410 samples. A total of 44 distinct pathological genotypes were detected. In comparison with reference methods, the assay has a sensitivity of 100 % and a specificity of 98 %. We have developed an assay that detects both point mutations and larger rearrangements of the HBB gene. This assay is quick, sensitive, specific and cost effective making it suitable as an initial screening test that can be used for highly heterogeneous cohorts.

  2. Logical analysis of diffuse large B-cell lymphomas.

    PubMed

    Alexe, G; Alexe, S; Axelrod, D E; Hammer, P L; Weissmann, D

    2005-07-01

    The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al., which contains the intensity levels of 6817 genes of 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics, optimisation, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify different informative genes than those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that if considered jointly are capable of accurately distinguishing different types of lymphoma. The central concept of LAD is a pattern or combinatorial biomarker, a concept that resembles a rule as used in decision tree methods. LAD is able to exhaustively generate the collection of all those patterns which satisfy certain quality constraints, through a systematic combinatorial process guided by clear optimization criteria. Then, based on a set covering approach, LAD aggregates the collection of patterns into classification models. In addition, LAD is able to use the information provided by large collections of patterns in order to extract subsets of variables, which collectively are able to distinguish between different types of disease. For the differential diagnosis of DLBCL versus FL, a model based on eight significant genes is constructed and shown to have a sensitivity of 94.7% and a specificity of 100% on the test set. For the prognosis of good versus poor outcome among the DLBCL patients, a model is constructed on another set consisting also of eight significant genes, and shown to have a sensitivity of 87.5% and a specificity of 90% on the test set. 
The genes selected by LAD also work well as a basis for other kinds of statistical analysis, indicating their robustness. These two models exhibit accuracies that compare favorably to those in the original study. In addition, the current study also provides a ranking by importance of the genes in the selected significant subsets as well as a library of dozens of combinatorial biomarkers (i.e. pairs or triplets of genes) that can serve as a source of mathematically generated, statistically significant research hypotheses in need of biological explanation.

  3. Parkinson's disease candidate gene prioritization based on expression profile of midbrain dopaminergic neurons

    PubMed Central

    2010-01-01

    Background Parkinson's disease is the second most common neurodegenerative disorder. The pathological hallmark of the disease is degeneration of midbrain dopaminergic neurons. Genetic association studies have linked 13 human chromosomal loci to Parkinson's disease. Identification of gene(s), as part of the etiology of Parkinson's disease, within the large number of genes residing in these loci can be achieved through several approaches, including screening methods, and considering appropriate criteria. Since several of the identified Parkinson's disease genes are expressed in the substantia nigra pars compacta of the midbrain, expression within the neurons of this area could be a suitable criterion to limit the number of candidates and identify PD genes. Methods In this work we have used the combination of findings from six rodent transcriptome analysis studies on the gene expression profile of midbrain dopaminergic neurons and the PARK loci in the OMIM (Online Mendelian Inheritance in Man) database to identify new candidate genes for Parkinson's disease. Results Merging the two datasets, we identified 20 genes within PARK loci, 7 of which are located in an orphan Parkinson's disease locus and one of which had been identified as a disease gene. In addition to identifying a set of candidates for further genetic association studies, these results show that the criterion of expression in midbrain dopaminergic neurons may be used to narrow down the number of genes in PARK loci for such studies. PMID:20716345

  4. AUDIOME: a tiered exome sequencing-based comprehensive gene panel for the diagnosis of heterogeneous nonsyndromic sensorineural hearing loss.

    PubMed

    Guan, Qiaoning; Balciuniene, Jorune; Cao, Kajia; Fan, Zhiqian; Biswas, Sawona; Wilkens, Alisha; Gallo, Daniel J; Bedoukian, Emma; Tarpinian, Jennifer; Jayaraman, Pushkala; Sarmady, Mahdi; Dulik, Matthew; Santani, Avni; Spinner, Nancy; Abou Tayoun, Ahmad N; Krantz, Ian D; Conlin, Laura K; Luo, Minjie

    2018-03-29

    Purpose Hereditary hearing loss is highly heterogeneous. To keep up with rapidly emerging disease-causing genes, we developed the AUDIOME test for nonsyndromic hearing loss (NSHL) using an exome sequencing (ES) platform and targeted analysis for the curated genes. Methods A tiered strategy was implemented for this test. Tier 1 includes combined Sanger and targeted deletion analyses of the two most common NSHL genes and two mitochondrial genes. Nondiagnostic tier 1 cases are subjected to ES and array followed by targeted analysis of the remaining AUDIOME genes. Results ES resulted in good coverage of the selected genes with 98.24% of targeted bases at >15×. A fill-in strategy was developed for the poorly covered regions, which generally fell within GC-rich or highly homologous regions. Prospective testing of 33 patients with NSHL revealed a diagnosis in 11 (33%) and a possible diagnosis in 8 cases (24.2%). Among those, 10 individuals had variants in tier 1 genes. The ES data in the remaining nondiagnostic cases are readily available for further analysis. Conclusion The tiered and ES-based test provides an efficient and cost-effective diagnostic strategy for NSHL, with the potential to reflex to full exome to identify causal changes outside of the AUDIOME test. Genetics in Medicine advance online publication, 29 March 2018; doi:10.1038/gim.2018.48.

  5. Towards β-globin gene-targeting with integrase-defective lentiviral vectors.

    PubMed

    Inanlou, Davoud Nouri; Yakhchali, Bagher; Khanahmad, Hossein; Gardaneh, Mossa; Movassagh, Hesam; Cohan, Reza Ahangari; Ardestani, Mehdi Shafiee; Mahdian, Reza; Zeinali, Sirous

    2010-11-01

    We have developed an integrase-defective lentiviral (LV) vector in combination with a gene-targeting approach for gene therapy of β-thalassemia. The β-globin gene-targeting construct has two homologous stems including sequence upstream and downstream of the β-globin gene, a β-globin gene positioned between hygromycin and neomycin resistant genes and a herpes simplex virus type 1 thymidine kinase (HSVtk) suicide gene. Utilization of integrase-defective LV as a vector for the β-globin gene increased the number of selected clones relative to non-viral methods. This method represents an important step toward the ultimate goal of a clinical gene therapy for β-thalassemia.

  6. Meta-analysis identifies gene-by-environment interactions as demonstrated in a study of 4,965 mice.

    PubMed

    Kang, Eun Yong; Han, Buhm; Furlotte, Nicholas; Joo, Jong Wha J; Shih, Diana; Davis, Richard C; Lusis, Aldons J; Eskin, Eleazar

    2014-01-01

    Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. 
An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study thus explaining the large number of loci discovered in the combined study.
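
    The idea that a gene-by-environment interaction shows up as between-study heterogeneity can be illustrated with the textbook quantities of a random effects meta-analysis. The sketch below is not the authors' method; it assumes each study reports one effect size and standard error for a locus, and uses the standard Cochran's Q statistic and DerSimonian-Laird variance estimate. The function names and the example numbers are hypothetical.

```python
import math


def heterogeneity(effects, std_errors):
    """Return (fixed-effect mean, Cochran's Q, DerSimonian-Laird tau^2)."""
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / w_sum
    # Q measures how far the per-study effects scatter around the pooled effect.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    k = len(effects)
    c = w_sum - sum(w * w for w in weights) / w_sum
    # Between-study variance; truncated at zero when Q is below its expectation.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    return pooled, q, tau2


def random_effects_mean(effects, std_errors, tau2):
    """Return (random-effects mean, its standard error) for a given tau^2."""
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    w_sum = sum(weights)
    mean = sum(w * b for w, b in zip(weights, effects)) / w_sum
    return mean, math.sqrt(1.0 / w_sum)


# Hypothetical locus: the effect is absent in one environment, present in another.
pooled, q, tau2 = heterogeneity([0.0, 1.0], [0.1, 0.1])
mean, se = random_effects_mean([0.0, 1.0], [0.1, 0.1], tau2)
```

    A large Q relative to its degrees of freedom (number of studies minus one) suggests that the effect differs between environments, which is the sense in which heterogeneity is read as interaction here.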

  7. Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice

    PubMed Central

    Joo, Jong Wha J.; Shih, Diana; Davis, Richard C.; Lusis, Aldons J.; Eskin, Eleazar

    2014-01-01

    Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. 
An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study thus explaining the large number of loci discovered in the combined study. PMID:24415945

  8. Correlation between TS, MTHFR, and ERCC1 gene polymorphisms and the efficacy of platinum in combination with pemetrexed first-line chemotherapy in mesothelioma patients.

    PubMed

    Powrózek, Tomasz; Kowalski, Dariusz M; Krawczyk, Paweł; Ramlau, Rodryg; Kucharczyk, Tomasz; Kalinka-Warzocha, Ewa; Knetki-Wróblewska, Magdalena; Winiarczyk, Kinga; Dyszkiewicz, Wojciech; Krzakowski, Maciej; Milanowski, Janusz

    2014-11-01

    The combination of pemetrexed and platinum compound represents the standard regimen for first-line chemotherapy in malignant pleural mesothelioma patients. Pemetrexed is a multitarget antifolate agent that inhibits folate-dependent enzymes (eg, thymidylate synthase [TS]) and thus synthesis of nucleotides and DNA. Expression of TS and folate availability, regulated by gene polymorphisms, have implications for effectiveness of chemotherapy and the outcome of mesothelioma patients. The aim of this retrospective multicenter study was to assess the correlation between TS, 5,10-methylenetetrahydrofolate reductase (MTHFR) and excision repair cross-complementing group 1 (ERCC1) gene polymorphisms and the efficacy of pemetrexed-based first-line chemotherapy of mesothelioma patients. Fifty-nine mesothelioma patients (31 men with a median age of 62 years) treated in first-line chemotherapy with platinum in combination with pemetrexed or pemetrexed monotherapy were enrolled. Genomic DNA was isolated from peripheral blood. Using polymerase chain reaction and high resolution melt methods, the variable number of tandem repeat, the G>C single nucleotide polymorphism (SNP) in these repeats, and 6-base pair (bp) insertion/deletion polymorphism of the TS gene, the SNP of 677C>T in MTHFR, and 19007C>T in the ERCC1 gene were analyzed and correlated with disease control rate, progression-free survival (PFS), and overall survival (OS) of mesothelioma patients. Greater risk of early disease progression (PD), and shortening of PFS and OS were associated with several clinical factors (eg, anemia for early PD and OS), weight loss (for PFS and OS), and previous surgical treatment (for early PD, PFS, and OS). Insertion of 6-bp in both alleles of the TS gene (1494del6) was the only genetic factor that increased the incidence of early progression (P = .028) and shortening of median PFS (P = .06) in patients treated with pemetrexed-based chemotherapy. 
In multivariate analysis, the 1494del6 in the 3' untranslated region (UTR) of the TS gene also had a predictive role for PFS (P = .0185; hazard ratio, 2.3258 for +6/+6 homozygotes) in analyzed mesothelioma patients. Most analyzed polymorphisms in TS, MTHFR, and ERCC1 genes failed to predict outcome in mesothelioma patients treated with pemetrexed-based chemotherapy. However, different variants of 1494del6 in the 3' UTR of the TS gene were associated with differences in disease control rate and PFS of our patients. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Identification of T1D susceptibility genes within the MHC region by combining protein interaction networks and SNP genotyping data

    PubMed Central

    Brorsson, C.; Hansen, N. T.; Lage, K.; Bergholdt, R.; Brunak, S.; Pociot, F.

    2009-01-01

    Aim To develop novel methods for identifying new genes that contribute to the risk of developing type 1 diabetes within the Major Histocompatibility Complex (MHC) region on chromosome 6, independently of the known linkage disequilibrium (LD) between human leucocyte antigen (HLA)-DRB1, -DQA1, -DQB1 genes. Methods We have developed a novel method that combines single nucleotide polymorphism (SNP) genotyping data with protein–protein interaction (ppi) networks to identify disease-associated network modules enriched for proteins encoded from the MHC region. Approximately 2500 SNPs located in the 4 Mb MHC region were analysed in 1000 affected offspring trios generated by the Type 1 Diabetes Genetics Consortium (T1DGC). The most associated SNP in each gene was chosen and genes were mapped to ppi networks for identification of interaction partners. The association testing and resulting interacting protein modules were statistically evaluated using permutation. Results A total of 151 genes could be mapped to nodes within the protein interaction network and their interaction partners were identified. Five protein interaction modules reached statistical significance using this approach. The identified proteins are well known in the pathogenesis of T1D, but the modules also contain additional candidates that have been implicated in β-cell development and diabetic complications. Conclusions The extensive LD within the MHC region makes it important to develop new methods for analysing genotyping data for identification of additional risk genes for T1D. Combining genetic data with knowledge about functional pathways provides new insight into mechanisms underlying T1D. PMID:19143816
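
    The step in which the most associated SNP in each gene is chosen can be sketched as below. This is only an illustration under the assumption that association results are available as (SNP, gene, p-value) triples; the function name and the identifiers in the example are hypothetical, and the network mapping and permutation testing described in the abstract are not covered.

```python
def best_snp_per_gene(associations):
    """Map each gene to the (snp_id, p_value) pair with the smallest p-value."""
    best = {}
    for snp_id, gene, p_value in associations:
        if gene not in best or p_value < best[gene][1]:
            best[gene] = (snp_id, p_value)
    return best


# Hypothetical association results for two genes.
results = [
    ("snp1", "geneA", 0.04),
    ("snp2", "geneA", 0.001),
    ("snp3", "geneB", 0.2),
]
selected = best_snp_per_gene(results)
```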

  10. Efficient IDUA Gene Mutation Detection with Combined Use of dHPLC and Dried Blood Samples

    PubMed Central

    Duarte, Ana Joana; Vieira, Luis

    2013-01-01

    Objectives. Development of a simple mutation-directed method to lower the cost of mutation testing using an easily obtainable biological material. The feasibility of such a method was assessed using a GC-rich amplicon. Design and Methods. A method of denaturing high-performance liquid chromatography (dHPLC) was improved and implemented as a technique for the detection of variants in exon 9 of the IDUA gene. The optimized method was tested in 500 genomic DNA samples obtained from dried blood spots (DBS). Results. With this dHPLC approach it was possible to detect different variants, including the common p.Trp402Ter mutation in the IDUA gene. The high GC content did not interfere with the resolution and reliability of this technique, and discrimination of G-C transversions was also achieved. Conclusion. This PCR-based dHPLC method proved to be a rapid, sensitive, and excellent option for screening numerous samples obtained from DBS. Furthermore, it resulted in the consistent detection of clearly distinguishable profiles of the common p.Trp402Ter IDUA mutation with an advantageous balance of cost and technical requirements. PMID:27335677

  11. Changes in Physiological Parameters after Combined Exercise according to the I/D Polymorphism of hUCP2 Gene in Middle-Aged Obese Females

    PubMed Central

    DUK OH, Sang

    2014-01-01

    Abstract Background The purpose of this study was to determine whether a 45 bp insertion/deletion (I/D) polymorphism in the human uncoupling protein 2 (hUCP2) gene was associated with changes in several cardiovascular risk and physical fitness factors in response to 12 weeks of combined exercise in Korean middle-aged women. The changes in physiological parameters after 12 weeks of combined exercise were compared between the genotype subgroups of the hUCP2 gene to clarify the inter-individual differences in exercise-induced changes according to genetic predisposition. Methods A total of 185 women aged over 40 years living in Seoul, Korea participated in this study and were analyzed before and after 12 weeks of combined exercise, including aerobic exercise and strength training, for body composition, hemodynamic parameters, physical fitness and metabolic variables. The 45 bp I/D polymorphism in the hUCP2 gene was genotyped by polymerase chain reaction (PCR) amplification and agarose gel electrophoresis. Results The 12-week combined exercise program showed significant health-promoting effects on multiple body composition, hemodynamic, physical fitness and metabolic parameters in our participants. The 45 bp I/D polymorphism in the hUCP2 gene was significantly associated with the baseline %body fat of our participants (P < .05). Moreover, this polymorphism was significantly associated with the changes in %body fat and serum triglyceride (TG) level after the 12-week combined exercise program (P < .05). Conclusion Our data suggest that a 45 bp I/D polymorphism in the hUCP2 gene may at least in part contribute to the inter-individual differences in the changes in some clinical and metabolic parameters following combined exercise in middle-aged women. PMID:25909061

  12. Methods for monitoring multiple gene expression

    DOEpatents

    Berka, Randy; Bachkirova, Elena; Rey, Michael

    2013-10-01

    The present invention relates to methods for monitoring differential expression of a plurality of genes in a first filamentous fungal cell relative to expression of the same genes in one or more second filamentous fungal cells using microarrays containing Trichoderma reesei ESTs or SSH clones, or a combination thereof. The present invention also relates to computer readable media and substrates containing such array features for monitoring expression of a plurality of genes in filamentous fungal cells.

  13. Methods for monitoring multiple gene expression

    DOEpatents

    Berka, Randy [Davis, CA]; Bachkirova, Elena [Davis, CA]; Rey, Michael [Davis, CA]

    2012-05-01

    The present invention relates to methods for monitoring differential expression of a plurality of genes in a first filamentous fungal cell relative to expression of the same genes in one or more second filamentous fungal cells using microarrays containing Trichoderma reesei ESTs or SSH clones, or a combination thereof. The present invention also relates to computer readable media and substrates containing such array features for monitoring expression of a plurality of genes in filamentous fungal cells.

  14. Methods for monitoring multiple gene expression

    DOEpatents

    Berka, Randy [Davis, CA]; Bachkirova, Elena [Davis, CA]; Rey, Michael [Davis, CA]

    2008-06-01

    The present invention relates to methods for monitoring differential expression of a plurality of genes in a first filamentous fungal cell relative to expression of the same genes in one or more second filamentous fungal cells using microarrays containing Trichoderma reesei ESTs or SSH clones, or a combination thereof. The present invention also relates to computer readable media and substrates containing such array features for monitoring expression of a plurality of genes in filamentous fungal cells.

  15. GeneBuilder: interactive in silico prediction of gene structure.

    PubMed

    Milanesi, L; D'Angelo, D; Rogozin, I B

    1999-01-01

    Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system, which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in protein and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performance is: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of WebGene at the URL: http://www.itba.mi.cnr.it/webgene and the TRADAT (TRAnscription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
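
    The nucleotide-level figures quoted above follow the conventions commonly used for gene prediction benchmarks, where sensitivity is TP/(TP+FN), "specificity" is TP/(TP+FP), and the correlation coefficient combines all four counts. The sketch below shows how these three numbers can be computed from per-nucleotide coding labels. It is an illustration only, not GeneBuilder code; the function name and the toy sequences are hypothetical.

```python
import math


def nucleotide_accuracy(actual, predicted):
    """Return (sensitivity, specificity, correlation) from 0/1 coding labels."""
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # coding, predicted coding
    fn = sum(1 for a, p in pairs if a and not p)      # coding, missed
    fp = sum(1 for a, p in pairs if not a and p)      # non-coding, predicted coding
    tn = sum(1 for a, p in pairs if not a and not p)  # non-coding, predicted non-coding
    sensitivity = tp / (tp + fn)
    specificity = tp / (tp + fp)  # gene-finding convention, not TN / (TN + FP)
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    correlation = (tp * tn - fn * fp) / denom
    return sensitivity, specificity, correlation


# Toy example: 10 nucleotides, 4 truly coding, 4 predicted coding, 3 in common.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sn, sp, cc = nucleotide_accuracy(actual, predicted)
```

    The sketch assumes both classes occur in the actual and predicted labels; otherwise one of the denominators is zero.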

  16. Genetic doping and health damages.

    PubMed

    Fallahi, Aa; Ravasi, Aa; Farhud, Dd

    2011-01-01

    Genetic doping, or the use of gene transfer technology, will be the newest and a potentially lethal method of doping in the future and will have unpleasant consequences for sports, athletes, and the outcomes of competitions. The World Anti-Doping Agency (WADA) defines genetic doping as "the non-therapeutic use of genes, genetic elements, and/or cells that have the capacity to enhance athletic performance". The purpose of this review is to consider genetic doping, its health damages, and the risks of new genes if delivered to athletes. This review, carried out by examining relevant publications, is primarily based on journals available in GOOGLE, ELSEVIER, and PUBMED in the fields of genetic technology and health, using a combination of keywords (e.g., genetic doping, genes, exercise, performance, athletes), up to July 2010. Several genes are related to sport performance, and their use carries health risks and severe damage such as cancer, autoimmunization, and heart attack.

  17. A Microplate Reader-Based System for Visualizing Transcriptional Activity During in vivo Microbial Interactions in Space and Time.

    PubMed

    Hennessy, Rosanna C; Stougaard, Peter; Olsson, Stefan

    2017-03-21

    Here, we report the development of a microplate reader-based system for visualizing gene expression dynamics in living bacterial cells in response to a fungus in space and real-time. A bacterium expressing the red fluorescent protein mCherry fused to the promoter region of a regulator gene nunF indicating activation of an antifungal secondary metabolite gene cluster was used as a reporter system. Time-lapse image recordings of the reporter red signal and a green signal from fluorescent metabolites combined with microbial growth measurements showed that nunF-regulated gene transcription is switched on when the bacterium enters the deceleration growth phase and upon physical encounter with fungal hyphae. This novel technique enables real-time live imaging of samples by time-series multi-channel automatic recordings using a microplate reader as both an incubator and image recorder of general use to researchers. The technique can aid in deciding when to destructively sample for other methods e.g. transcriptomics and mass spectrometry imaging to study gene expression and metabolites exchanged during the interaction.

  18. A detailed transcript-level probe annotation reveals alternative splicing based microarray platform differences

    PubMed Central

    Lee, Joseph C; Stiles, David; Lu, Jun; Cam, Margaret C

    2007-01-01

    Background Microarrays are a popular tool used in experiments to measure gene expression levels. Improving the reproducibility of microarray results produced by different chips from various manufacturers is important to create comparable and combinable experimental results. Alternative splicing has been cited as a possible cause of differences in expression measurements across platforms, though no study to this point has been conducted to show its influence in cross-platform differences. Results Using probe sequence data, a new microarray probe/transcript annotation was created based on the AceView Aug05 release that allowed for the categorization of genes based on their expression measurements' susceptibility to alternative splicing differences across microarray platforms. Examining gene expression data from multiple platforms in light of the new categorization, genes unsusceptible to alternative splicing differences showed higher signal agreement than those genes most susceptible to alternative splicing differences. The analysis gave rise to a different probe-level visualization method that can highlight probe differences according to transcript specificity. Conclusion The results highlight the need for detailed probe annotation at the transcriptome level. The presence of alternative splicing within a given sample can affect gene expression measurements and is a contributing factor to overall technical differences across platforms. PMID:17708771

  19. Breast cancer evaluation by fluorescent dot detection using combined mathematical morphology and multifractal techniques

    PubMed Central

    2011-01-01

    Background Fluorescence in situ hybridization (FISH) is a very accurate method for measuring HER2 gene copies, as a sign of potential breast cancer. This method requires small tissue samples, and has a high sensitivity to detect abnormalities from a histological section. By using multiple colors, this method allows the detection of multiple targets simultaneously. The target parts in the cells become visible as colored dots. The HER-2 probes are visible as orange stained spots under a fluorescent microscope while probes for centromere 17 (CEP-17), the chromosome on which the gene HER-2/neu is located, are visible as green spots. Methods The conventional analysis involves scoring the ratio of HER-2/neu to CEP 17 dots within each cell nucleus and then averaging the scores over 60 cells. A ratio of 2.0 of HER-2/neu to CEP 17 copy number denotes amplification. Several methods have been proposed for the detection and automated evaluation (dot counting) of FISH signals. In this paper, a combined method based on mathematical morphology (MM) and inverse multifractal (IMF) analysis is proposed. A similar method was recently applied to the detection of microcalcifications in digital mammograms, and was very successful. Results The combined method, using MM top-hat and bottom-hat filters together with the IMF method, was applied to FISH images from the Molecular Biology Lab, Department of Pathology, Wielkopolska Cancer Center, Poznan. Initial results indicate that this method can be applied to FISH images for the evaluation of HER2/neu status. Conclusions Mathematical morphology and a multifractal approach are used for colored dot detection and counting in FISH images. Initial results from clinical cases are promising. Note that the overlapping of colored dots, particularly red/orange dots, needs additional improvements in post-processing. PMID:21489192
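
    The conventional scoring rule described in the Methods (per-nucleus HER-2/neu to CEP 17 ratio, averaged over the scored cells, with 2.0 denoting amplification) can be written as a short sketch once dot counts are available. This does not reproduce the morphology or multifractal dot detection. The function name is hypothetical, and two details are assumptions rather than statements from the abstract: nuclei with no CEP 17 dots are skipped, and a mean ratio exactly equal to the threshold is treated as amplified.

```python
def her2_status(dot_counts, threshold=2.0):
    """Return (mean HER2/CEP17 ratio, amplified flag) from per-nucleus dot counts.

    dot_counts is a sequence of (her2_dots, cep17_dots) pairs, one per nucleus.
    """
    # Assumption: nuclei without any CEP17 signal cannot be scored and are skipped.
    ratios = [her2 / cep17 for her2, cep17 in dot_counts if cep17 > 0]
    if not ratios:
        raise ValueError("no nuclei with CEP17 signals to score")
    mean_ratio = sum(ratios) / len(ratios)
    # Assumption: a mean ratio at or above the threshold counts as amplified.
    return mean_ratio, mean_ratio >= threshold


# Hypothetical counts for three nuclei (a real assessment averages 60 cells).
mean_ratio, amplified = her2_status([(2, 2), (4, 2), (6, 2)])
```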

  20. Classification of a large microarray data set: Algorithm comparison and analysis of drug signatures

    PubMed Central

    Natsoulis, Georges; El Ghaoui, Laurent; Lanckriet, Gert R.G.; Tolley, Alexander M.; Leroy, Fabrice; Dunlea, Shane; Eynon, Barrett P.; Pearson, Cecelia I.; Tugendreich, Stuart; Jarnagin, Kurt

    2005-01-01

    A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists which upon analysis reveal that some of the signature genes have a positive contribution (act as “rewards” for the class-of-interest) while others have a negative contribution (act as “penalties”) to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class. PMID:15867433
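
    The idea of a short, weighted gene list with "reward" and "penalty" genes can be illustrated with a generic linear decision function. This is a minimal sketch, not the authors' trained classifiers: the gene names, weights, and bias below are hypothetical.

```python
def signature_score(weights, bias, expression):
    """Linear decision value: bias plus the weighted sum of expression values.

    `weights` maps gene -> weight; `expression` maps gene -> measured value.
    Genes missing from `expression` contribute nothing (sketch assumption).
    """
    return bias + sum(w * expression.get(gene, 0.0) for gene, w in weights.items())


def split_signature(weights):
    """Separate genes with positive weights (rewards) from negative ones (penalties)."""
    rewards = sorted(g for g, w in weights.items() if w > 0)
    penalties = sorted(g for g, w in weights.items() if w < 0)
    return rewards, penalties


# Hypothetical two-gene signature, for illustration only.
weights = {"geneA": 1.5, "geneB": -0.5}
score = signature_score(weights, bias=-1.0, expression={"geneA": 2.0, "geneB": 1.0})
in_class = score > 0  # a positive score assigns the treatment to the class of interest
```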

  1. Fast ancestral gene order reconstruction of genomes with unequal gene content.

    PubMed

    Feijão, Pedro; Araujo, Eloi

    2016-11-11

    During evolution, genomes are modified by large scale structural events, such as rearrangements, deletions or insertions of large blocks of DNA. Of particular interest, in order to better understand how this type of genomic evolution happens, is the reconstruction of ancestral genomes, given a phylogenetic tree with extant genomes at its leaves. One way of solving this problem is to assume a rearrangement model, such as Double Cut and Join (DCJ), and find a set of ancestral genomes that minimizes the number of events on the input tree. Since this problem is NP-hard for most rearrangement models, exact solutions are practical only for small instances, and heuristics have to be used for larger datasets. This type of approach can be called event-based. Another common approach is based on finding conserved structures between the input genomes, such as adjacencies between genes, possibly also assigning weights that indicate a measure of confidence or probability that this particular structure is present on each ancestral genome, and then finding a set of non conflicting adjacencies that optimize some given function, usually trying to maximize total weight and minimizing character changes in the tree. We call this type of methods homology-based. In previous work, we proposed an ancestral reconstruction method that combines homology- and event-based ideas, using the concept of intermediate genomes, that arise in DCJ rearrangement scenarios. This method showed better rate of correctly reconstructed adjacencies than other methods, while also being faster, since the use of intermediate genomes greatly reduces the search space. Here, we generalize the intermediate genome concept to genomes with unequal gene content, extending our method to account for gene insertions and deletions of any length. 
In many of the simulated datasets, our proposed method had better results than MLGO and MGRA, two state-of-the-art algorithms for ancestral reconstruction with unequal gene content, while running much faster, making it more scalable to larger datasets. Studying ancestral reconstruction problems under a new light, using the concept of intermediate genomes, allows the design of very fast algorithms by greatly reducing the solution search space, while also giving very good results. The algorithms introduced in this paper were implemented in an open-source software called RINGO (ancestral Reconstruction with INtermediate GenOmes), available at https://github.com/pedrofeijao/RINGO.
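
    The homology-based ingredient mentioned above, finding adjacencies conserved between input genomes, can be sketched as a simple count of how many genomes contain each adjacency. This is not the RINGO algorithm; the signed-gene string encoding and the use of a raw count as the weight are assumptions made for illustration.

```python
from collections import Counter


def extremities(gene):
    """Return (left, right) extremities of a signed gene such as '+a' or '-a'.

    A forward gene reads tail -> head; a reversed gene reads head -> tail.
    """
    name = gene[1:]
    if gene[0] == "+":
        return (name, "t"), (name, "h")
    return (name, "h"), (name, "t")


def adjacencies(chromosome):
    """Set of adjacencies (unordered extremity pairs) of a linear chromosome."""
    result = set()
    for left_gene, right_gene in zip(chromosome, chromosome[1:]):
        result.add(frozenset([extremities(left_gene)[1], extremities(right_gene)[0]]))
    return result


def adjacency_support(genomes):
    """Count in how many single-chromosome genomes each adjacency occurs."""
    support = Counter()
    for chromosome in genomes:
        support.update(adjacencies(chromosome))
    return support
```

    Because adjacencies are stored as unordered pairs of extremities, a chromosome and its reverse complement yield the same adjacency set.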

  2. A data science approach to candidate gene selection of pain regarded as a process of learning and neural plasticity.

    PubMed

    Ultsch, Alfred; Kringel, Dario; Kalso, Eija; Mogil, Jeffrey S; Lötsch, Jörn

    2016-12-01

    The increasing availability of "big data" enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 535 genes identified empirically as relevant to pain with the knowledge about the functions of thousands of genes. Starting from an accepted description of chronic pain as displaying systemic features described by the terms "learning" and "neuronal plasticity," a functional genomics analysis proposed that among the functions of the 535 "pain genes," the biological processes "learning or memory" (P = 8.6 × 10) and "nervous system development" (P = 2.4 × 10) are statistically significantly overrepresented as compared with the annotations to these processes expected by chance. After establishing that the hypothesized biological processes were among important functional genomics features of pain, a subset of n = 34 pain genes were found to be annotated with both Gene Ontology terms. Published empirical evidence supporting their involvement in chronic pain was identified for almost all these genes, including 1 gene identified in March 2016 as being involved in pain. By contrast, such evidence was virtually absent in a randomly selected set of 34 other human genes. Hence, the present computational functional genomics-based method can be used for candidate gene selection, providing an alternative to established methods.
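
    Overrepresentation of a Gene Ontology term in a gene set is commonly assessed with a one-sided hypergeometric test. The abstract does not state which test was used, so the following is a sketch under that assumption, with hypothetical parameter names.

```python
from math import comb


def overrepresentation_p(k, population, annotated, sample):
    """P(X >= k) for a hypergeometric draw.

    population: number of genes in the background
    annotated:  background genes carrying the term
    sample:     size of the gene set of interest
    k:          genes in the set that carry the term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, sample)
```

    A small p-value indicates that the term occurs in the gene set more often than expected by chance; correction for multiple testing across terms is not shown.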

  3. Isolation and characterization of NBS-LRR- resistance gene candidates in turmeric (Curcuma longa cv. surama).

    PubMed

    Joshi, R K; Mohanty, S; Subudhi, E; Nayak, S

    2010-09-08

    Turmeric (Curcuma longa), an important asexually reproducing spice crop of the family Zingiberaceae is highly susceptible to bacterial and fungal pathogens. The identification of resistance gene analogs holds great promise for development of resistant turmeric cultivars. Degenerate primers designed based on known resistance genes (R-genes) were used in combinations to elucidate resistance gene analogs from Curcuma longa cultivar surama. The three primers resulted in amplicons with expected sizes of 450-600 bp. The nucleotide sequence of these amplicons was obtained through sequencing; their predicted amino acid sequences compared to each other and to the amino acid sequences of known R-genes revealed significant sequence similarity. The finding of conserved domains, viz., kinase-1a, kinase-2 and hydrophobic motif, provided evidence that the sequences belong to the NBS-LRR class gene family. The presence of tryptophan as the last residue of kinase-2 motif further qualified them to be in the non-TIR-NBS-LRR subfamily of resistance genes. A cluster analysis based on the neighbor-joining method was carried out using Curcuma NBS analogs together with several resistance gene analogs and known R-genes, which classified them into two distinct subclasses, corresponding to clades N3 and N4 of non-TIR-NBS sequences described in plants. The NBS analogs that we isolated can be used as guidelines to eventually isolate numerous R-genes in turmeric.

  4. Disruption of diphthamide synthesis genes and resulting toxin resistance as a robust technology for quantifying and optimizing CRISPR/Cas9-mediated gene editing.

    PubMed

    Killian, Tobias; Dickopf, Steffen; Haas, Alexander K; Kirstenpfad, Claudia; Mayer, Klaus; Brinkmann, Ulrich

    2017-11-13

    We have devised an effective and robust method for the characterization of gene-editing events. The efficacy of editing-mediated mono- and bi-allelic gene inactivation and integration events is quantified based on colony counts. The combination of diphtheria toxin (DT) and puromycin (PM) selection enables analyses of 10,000-100,000 individual cells, assessing hundreds of clones with inactivated genes per experiment. Mono- and bi-allelic gene inactivation is differentiated by DT resistance, which occurs only upon bi-allelic inactivation. PM resistance indicates integration. The robustness and generalizability of the method were demonstrated by quantifying the frequency of gene inactivation and cassette integration under different editing approaches: CRISPR/Cas9-mediated complete inactivation was ~30-50-fold more frequent than cassette integration. Mono-allelic inactivation without integration occurred >100-fold more frequently than integration. Assessment of gRNA length confirmed 20mers to be the most effective length for inactivation, while 16-18mers provided the highest overall integration efficacy. The overall efficacy was ~2-fold higher for CRISPR/Cas9 than for zinc-finger nuclease and was significantly increased upon modulation of non-homologous end joining or homology-directed repair. The frequencies and ratios of editing events were similar for two different DPH genes (independent of the target sequence or chromosomal location), which indicates that the optimization parameters identified with this method can be generalized.

  5. Novel strategies to mine alcoholism-related haplotypes and genes by combining existing knowledge framework.

    PubMed

    Zhang, RuiJie; Li, Xia; Jiang, YongShuai; Liu, GuiYou; Li, ChuanXing; Zhang, Fan; Xiao, Yun; Gong, BinSheng

    2009-02-01

    High-throughput single nucleotide polymorphism detection technology and the existing knowledge provide strong support for mining the disease-related haplotypes and genes. In this study, first, we apply four kinds of haplotype identification methods (Confidence Intervals, Four Gamete Tests, Solid Spine of LD and fusing method of haplotype block) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select the alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and gene databases to locate the blocks and identify the candidate genes. Finally, we annotate gene function using the KEGG, Biocarta, and GO databases. We find 159 haplotype blocks most likely related to alcoholism on chromosomes 1-22, including 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We get 121 alcoholism-related genes and verify their reliability by functional annotation. In summary, by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework, we can not only handle the SNP data easily but also locate the disease-related genes precisely.
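
    Of the block-identification methods listed, the four-gamete test is the simplest to illustrate: for two biallelic SNPs, observing all four two-locus haplotypes is taken as evidence of historical recombination between them. The sketch below assumes phased haplotypes encoded as strings of '0'/'1' and uses plain presence or absence of each gamete; any frequency cut-offs used in practice are not modelled.

```python
from itertools import combinations


def passes_four_gamete_test(haplotypes, i, j):
    """True if fewer than four distinct gametes are seen at sites i and j."""
    observed = {(hap[i], hap[j]) for hap in haplotypes}
    return len(observed) < 4


def incompatible_pairs(haplotypes):
    """All site pairs showing all four gametes (candidate block boundaries)."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if not passes_four_gamete_test(haplotypes, i, j)
    ]
```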

  6. Radiogenetic therapy: strategies to overcome tumor resistance.

    PubMed

    Marples, B; Greco, O; Joiner, M C; Scott, S D

    2003-01-01

    The aim of cancer gene therapy is to selectively kill malignant cells at the tumor site, by exploiting traits specific to cancer cells and/or solid tumors. Strategies that take advantage of biological features common to different tumor types are particularly promising, since they have wide clinical applicability. Much attention has focused on genetic methods that complement radiotherapy, the principal treatment modality, or that exploit hypoxia, the most ubiquitous characteristic of most solid cancers. The goal of this review is to highlight two promising gene therapy methods developed specifically to target the tumor volume that can be readily used in combination with radiotherapy. The first approach uses radiation-responsive gene promoters to control the selective expression of a suicide gene (e.g., herpes simplex virus thymidine kinase) to irradiated tissue only, leading to targeted cell killing in the presence of a prodrug (e.g., ganciclovir). The second method utilizes oxygen-dependent promoters to produce selective therapeutic gene expression and prodrug activation in hypoxic cells, which are refractive to conventional radiotherapy. Further refining of tumor targeting can be achieved by combining radiation and hypoxia responsive elements in chimeric promoters activated by either and dual stimuli. The in vitro and in vivo studies described in this review suggest that the combination of gene therapy and radiotherapy protocols has potential for use in cancer care, particularly in cases currently refractory to treatment as a result of inherent or hypoxia-mediated radioresistance.

  7. Reconsideration of systematic relationships within the order Euplotida (Protista, Ciliophora) using new sequences of the gene coding for small-subunit rRNA and testing the use of combined data sets to construct phylogenies of the Diophrys-complex.

    PubMed

    Yi, Zhenzhen; Song, Weibo; Clamp, John C; Chen, Zigui; Gao, Shan; Zhang, Qianqian

    2009-03-01

    Comprehensive molecular analyses of phylogenetic relationships within euplotid ciliates are relatively rare, and the relationships among some families remain questionable. We performed phylogenetic analyses of the order Euplotida based on new sequences of the gene coding for small-subunit rRNA (SSrRNA) from a variety of taxa across the entire order as well as sequences from some of these taxa of other genes (ITS1-5.8S-ITS2 region and histone H4) that have not been included in previous analyses. Phylogenetic trees based on SSrRNA gene sequences constructed with four different methods had a consistent branching pattern that included the following features: (1) the "typical" euplotids comprised a paraphyletic assemblage composed of two divergent clades (family Uronychiidae and families Euplotidae-Certesiidae-Aspidiscidae-Gastrocirrhidae), (2) in the family Uronychiidae, the genera Uronychia and Paradiophrys formed a clearly outlined, well-supported clade that seemed to be rather divergent from Diophrys and Diophryopsis, suggesting that the Diophrys-complex may have had a longer and more separate evolutionary history than previously supposed, (3) inclusion of 12 new SSrRNA sequences in analyses of Euplotidae revealed two new clades of species within the family and cast additional doubt on the present classification of genera within the family, and (4) the intraspecific divergence among five species of Aspidisca was far greater than those of closely related genera. The ITS1-5.8S-ITS2 coding regions and partial histone H4 genes of six morphospecies in the Diophrys-complex were sequenced along with their SSrRNA genes and used to compare phylogenies constructed from single data sets to those constructed from combined sets. Results indicated that combined analyses could be used to construct more reliable, less ambiguous phylogenies of complex groups like the order Euplotida, because they provide a greater amount and diversity of information.

  8. Quantification of AAV particle titers by infrared fluorescence scanning of coomassie-stained sodium dodecyl sulfate-polyacrylamide gels.

    PubMed

    Kohlbrenner, Erik; Henckaerts, Els; Rapti, Kleopatra; Gordon, Ronald E; Linden, R Michael; Hajjar, Roger J; Weber, Thomas

    2012-06-01

    Adeno-associated virus (AAV)-based vectors have gained increasing attention as gene delivery vehicles in basic and preclinical studies as well as in human gene therapy trials. Especially for the latter two, for both safety and therapeutic efficacy reasons, a detailed characterization of all relevant parameters of the vector preparation is essential. Two important parameters that are routinely used to analyze recombinant AAV vectors are (1) the titer of viral particles containing a (recombinant) viral genome and (2) the purity of the vector preparation, most commonly assessed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) followed by silver staining. An important, third parameter, the titer of total viral particles, that is, the combined titer of both genome-containing and empty viral capsids, is rarely determined. Here, we describe a simple and inexpensive method that allows the simultaneous assessment of both vector purity and the determination of the total viral particle titer. This method, which was validated by comparison with established methods to determine viral particle titers, is based on the fact that Coomassie Brilliant Blue, when bound to proteins, fluoresces in the infrared spectrum. Viral samples are separated by SDS-PAGE followed by Coomassie Brilliant Blue staining and gel analysis with an infrared laser-scanning device. In combination with a protein standard, our method allows the rapid and accurate determination of viral particle titers simultaneously with the assessment of vector purity.

  9. Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks.

    PubMed

    Li, Min; Li, Qi; Ganegoda, Gamage Upeksha; Wang, JianXin; Wu, FangXiang; Pan, Yi

    2014-11-01

    Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With advances in high-throughput techniques, a large number of protein-protein interactions have been produced. Therefore, to address this issue, several methods based on protein interaction networks have been proposed. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm SPGOranker by integrating the semantic similarity of GO annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, for the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
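
    The general idea of shortest-path-based prioritization can be sketched with a breadth-first search on an unweighted interaction network. This is not SPranker's actual scoring function, which the abstract does not specify; ranking candidates by their mean shortest-path length to known disease genes is an assumption chosen for illustration.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from `source` in an unweighted graph.

    `graph` maps each node to an iterable of its neighbours.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def rank_candidates(graph, known_genes, candidates):
    """Order candidates by mean distance to reachable known disease genes."""
    scores = {}
    for candidate in candidates:
        dist = bfs_distances(graph, candidate)
        reachable = [dist[g] for g in known_genes if g in dist]
        scores[candidate] = sum(reachable) / len(reachable) if reachable else float("inf")
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(candidates, key=lambda c: (scores[c], c))
```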

  10. Treatment with docetaxel in combination with Aneustat leads to potent inhibition of metastasis in a patient-derived xenograft model of advanced prostate cancer

    PubMed Central

    Qu, Sifeng; Ci, Xinpei; Xue, Hui; Dong, Xin; Hao, Jun; Lin, Dong; Clermont, Pier-Luc; Wu, Rebecca; Collins, Colin C; Gout, Peter W; Wang, Yuzhuo

    2018-01-01

    Background: Docetaxel used for first-line treatment of advanced prostate cancer (PCa) is only marginally effective. We previously showed, using the LTL-313H subrenal capsule patient-derived metastatic PCa xenograft model, that docetaxel combined with Aneustat (OMN54), a multivalent plant-derived therapeutic, led to marked synergistic tumour growth inhibition. Here, we investigated the effect of docetaxel+Aneustat on metastasis. Methods: C4-2 cells were incubated with docetaxel, Aneustat and docetaxel+Aneustat to assess effects on cell migration. The LTL-313H model, similarly treated, was analysed for effects on lung micro-metastasis and kidney invasion. The LTL-313H gene expression profile was compared with profiles of PCa patients (obtained from Oncomine) and subjected to IPA to determine involvement of cancer driver genes. Results: Docetaxel+Aneustat markedly inhibited C4-2 cell migration and LTL-313H lung micro-metastasis/kidney invasion. Oncomine analysis indicated that treatment with docetaxel+Aneustat was associated with improved patient outcome. The drug combination markedly downregulated expression of cancer driver genes such as FOXM1 (and FOXM1-target genes). FOXM1 overexpression reduced the anti-metastatic activity of docetaxel+Aneustat. Conclusions: Docetaxel+Aneustat can inhibit PCa tissue invasion and metastasis. This activity appears to be based on reduced expression of cancer driver genes such as FOXM1. Use of docetaxel+Aneustat may provide a new, more effective regimen for therapy of metastatic PCa. PMID:29381682

  11. spa typing for epidemiological surveillance of Staphylococcus aureus.

    PubMed

    Hallin, Marie; Friedrich, Alexander W; Struelens, Marc J

    2009-01-01

    The spa typing method is based on sequencing of the polymorphic X region of the protein A gene (spa), present in all strains of Staphylococcus aureus. The X region is constituted of a variable number of 24-bp repeats flanked by well-conserved regions. This single-locus sequence-based typing method combines a number of technical advantages, such as rapidity, reproducibility, and portability. Moreover, due to its repeat structure, the spa locus simultaneously indexes micro- and macrovariations, enabling the use of spa typing in both local and global epidemiological studies. These studies are facilitated by the establishment of standardized spa type nomenclature and Internet shared databases.

  12. Gene expression patterns combined with network analysis identify hub genes associated with bladder cancer.

    PubMed

    Bi, Dongbin; Ning, Hao; Liu, Shuai; Que, Xinxiang; Ding, Kejia

    2015-06-01

    To explore molecular mechanisms of bladder cancer (BC), a network strategy was used to find biomarkers for early detection and diagnosis. The differentially expressed genes (DEGs) between bladder carcinoma patients and normal subjects were screened using the empirical Bayes method of the linear models for microarray data package. Co-expression networks were constructed by differentially co-expressed genes and links. Regulatory impact factors (RIF) metric was used to identify critical transcription factors (TFs). The protein-protein interaction (PPI) networks were constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and clusters were obtained through molecular complex detection (MCODE) algorithm. Centralities analyses for complex networks were performed based on degree, stress and betweenness. Enrichment analyses were performed based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Co-expression networks and TFs (based on expression data of global DEGs and DEGs in different stages and grades) were identified. Hub genes of complex networks, such as UBE2C, ACTA2, FABP4, CKS2, FN1 and TOP2A, were also obtained according to analysis of degree. In gene enrichment analyses of global DEGs, cell adhesion, proteinaceous extracellular matrix and extracellular matrix structural constituent were the top three GO terms. ECM-receptor interaction, focal adhesion, and cell cycle were significant pathways. Our results provide some potential underlying biomarkers of BC. However, further validation is required and deep studies are needed to elucidate the pathogenesis of BC.
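
    Hub identification by degree, as mentioned above, amounts to counting the interaction partners of each node. The sketch below assumes an undirected edge list without duplicate edges or self-loops; the node names used to exercise it are placeholders, not data from the study.

```python
from collections import Counter


def degree_by_node(edges):
    """Degree of every node in an undirected edge list of (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def top_hubs(edges, k=3):
    """The k highest-degree nodes; ties are broken alphabetically."""
    degree = degree_by_node(edges)
    return sorted(degree, key=lambda node: (-degree[node], node))[:k]
```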

  13. Marker-assisted combination of major genes for pathogen resistance in potato.

    PubMed

    Gebhardt, C; Bellin, D; Henselewski, H; Lehmann, W; Schwarzfischer, J; Valkonen, J P T

    2006-05-01

    Closely linked PCR-based markers facilitate the tracing and combining of resistance factors that have been introgressed previously into cultivated potato from different sources. Crosses were performed to combine the Ry(adg) gene for extreme resistance to Potato virus Y (PVY) with the Gro1 gene for resistance to the root cyst nematode Globodera rostochiensis and the Rx1 gene for extreme resistance to Potato virus X (PVX), or with resistance to potato wart (Synchytrium endobioticum). Marker-assisted selection (MAS) using four PCR-based diagnostic assays was applied to 110 F1 hybrids resulting from four 2x by 4x cross-combinations. Thirty tetraploid plants having the appropriate marker combinations were selected and tested for presence of the corresponding resistance traits. All plants tested showed the expected resistant phenotype. Unexpectedly, the plants segregated for additional resistance to pathotypes 1, 2 and 6 of S. endobioticum, which was subsequently shown to be inherited from the PVY resistant parents of the crosses. The selected plants can be used as sources of multiple resistance traits in pedigree breeding and are available from a potato germplasm bank.

  14. Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.

    PubMed Central

    Borodovsky, M; Rudd, K E; Koonin, E V

    1994-01-01

    The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
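
    The percentages quoted above follow directly from the reported counts, as the short check below shows. The variable names are ours; the numbers are those given in the abstract.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


genemark_orfs = 354       # putative expressed ORFs predicted by GeneMark
blast_orfs = 208          # ORFs with significant BLAST similarity
supported_by_both = 182   # ORFs supported by both approaches
conserved_families = 73   # putative genes in ancient conserved families

share_of_genemark = percent(supported_by_both, genemark_orfs)   # 51.4
share_of_blast = percent(supported_by_both, blast_orfs)         # 87.5
share_conserved = percent(conserved_families, genemark_orfs)    # 20.6
```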

  15. Exploring candidate biomarkers for lung and prostate cancers using gene expression and flux variability analysis.

    PubMed

    Asgari, Yazdan; Khosravi, Pegah; Zabihinpour, Zahra; Habibi, Mahnaz

    2018-02-19

    Genome-scale metabolic models have provided valuable resources for exploring changes in metabolism under normal and cancer conditions. However, metabolism itself is strongly linked to gene expression, so integration of gene expression data into metabolic models might improve the detection of genes involved in the control of tumor progression. Herein, we considered gene expression data as extra constraints to enhance the predictive powers of metabolic models. We reconstructed genome-scale metabolic models for lung and prostate, under normal and cancer conditions to detect the major genes associated with critical subsystems during tumor development. Furthermore, we utilized gene expression data in combination with an information theory-based approach to reconstruct co-expression networks of the human lung and prostate in both cohorts. Our results revealed 19 genes as candidate biomarkers for lung and prostate cancer cells. This study also revealed that the development of a complementary approach (integration of gene expression and metabolic profiles) could lead to proposing novel biomarkers and suggesting renovated cancer treatment strategies which have not been possible to detect using either of the methods alone.

  16. Liposome-based drug co-delivery systems in cancer cells.

    PubMed

    Zununi Vahed, Sepideh; Salehi, Roya; Davaran, Soodabeh; Sharifi, Simin

    2017-02-01

    Combination therapy and nanotechnology offer a promising therapeutic method in cancer treatment. By improving drug's pharmacokinetics, nanoparticulate systems increase the drug's therapeutic effects while decreasing its adverse side effects related to high dosage. Liposomes are extensively used as drug delivery systems and several liposomal nanomedicines have been approved for clinical applications. In this regard, liposome-based combination chemotherapy (LCC) opens a novel avenue in drug delivery research and has increasingly become a significant approach in clinical cancer treatment. This review paper focuses on LCC strategies including co-delivery of: two chemotherapeutic drugs, chemotherapeutic agent with anti-cancer metals, and chemotherapeutic agent with gene agents and ligand-targeted liposome for co-delivery of chemotherapeutic agents. Definitely, the multidisciplinary method may help improve the efficacy of cancer therapy. An extensive literature review was performed mainly using PubMed.

  17. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  18. Graph-based semi-supervised learning with genomic data integration using condition-responsive genes applied to phenotype classification.

    PubMed

    Doostparast Torshizi, Abolfazl; Petzold, Linda R

    2018-01-01

    Data integration methods that combine data from different molecular levels such as genome, epigenome, transcriptome, etc., have received a great deal of interest in the past few years. It has been demonstrated that the synergistic effects of different biological data types can boost learning capabilities and lead to a better understanding of the underlying interactions among molecular levels. In this paper we present a graph-based semi-supervised classification algorithm that incorporates latent biological knowledge in the form of biological pathways with gene expression and DNA methylation data. The process of graph construction from biological pathways is based on detecting condition-responsive genes, where 3 sets of genes are finally extracted: all condition responsive genes, high-frequency condition-responsive genes, and P-value-filtered genes. The proposed approach is applied to ovarian cancer data downloaded from the Human Genome Atlas. Extensive numerical experiments demonstrate superior performance of the proposed approach compared to other state-of-the-art algorithms, including the latest graph-based classification techniques. Simulation results demonstrate that integrating various data types enhances classification performance and leads to a better understanding of interrelations between diverse omics data types. The proposed approach outperforms many of the state-of-the-art data integration algorithms.
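
    Graph-based semi-supervised classification can be illustrated with basic label propagation, in which unlabeled samples repeatedly take the mean score of their neighbours while labeled samples stay fixed. This is a generic textbook-style sketch, not the algorithm proposed in the paper; the graph encoding, the +1/-1 label convention, and the fixed iteration count are assumptions.

```python
def propagate_labels(neighbours, labels, iterations=50):
    """Spread +1.0 / -1.0 labels over a graph by repeated neighbour averaging.

    neighbours: dict mapping every node to a list of adjacent nodes
    labels:     dict mapping labeled nodes to +1.0 or -1.0
    Returns a dict of scores; the sign of a score is the predicted class.
    """
    scores = {node: labels.get(node, 0.0) for node in neighbours}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbours.items():
            if node in labels:
                updated[node] = labels[node]          # labeled nodes are clamped
            elif adjacent:
                updated[node] = sum(scores[m] for m in adjacent) / len(adjacent)
            else:
                updated[node] = 0.0                   # isolated node: no evidence
        scores = updated
    return scores
```

    In an integration setting, the edges could be derived from several data types (for example expression and methylation similarity), so that each contributes to how labels spread.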

  19. Comparative study on gene set and pathway topology-based enrichment methods.

    PubMed

    Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim

    2015-10-22

    Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list, disregarding any knowledge of gene or protein interactions. In contrast, the new group of so-called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods; however, their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways; however, they were not conclusively better in the other scenarios. This suggests that a simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps.
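
    To make the "pathway as a simple gene list" idea concrete, the sketch below computes a one-sided hypergeometric over-representation p-value. This is one common gene set approach and is only an assumption here; the abstract does not say which three gene set methods were compared. The function name and the example counts are hypothetical.

```python
from math import comb


def overrepresentation_p(universe_size, pathway_size, de_size, overlap):
    """P(X >= overlap) under the hypergeometric null.

    de_size differentially expressed genes are drawn without replacement
    from universe_size genes, of which pathway_size belong to the pathway.
    """
    max_overlap = min(pathway_size, de_size)
    total = comb(universe_size, de_size)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, de_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 genes, a 100-gene pathway, 500 DE genes, 10 shared.
p_value = overrepresentation_p(20000, 100, 500, 10)
print(f"p = {p_value:.3g}")
```

    Note that this test ignores pathway topology entirely, which is exactly the property the topology-based methods in the study try to improve on.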

  20. Phenome-driven disease genetics prediction toward drug discovery.

    PubMed

    Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

    2015-06-15

    Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved areas under the curve of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.

  1. Application of droplet digital PCR to determine copy number of endogenous genes and transgenes in sugarcane.

    PubMed

    Sun, Yue; Joyce, Priya Aiyar

    2017-11-01

    Droplet digital PCR combined with the low copy ACT allele as endogenous reference gene makes accurate and rapid estimation of gene copy number in Q208 A and Q240 A attainable. Sugarcane is an important cultivated crop with both high polyploidy and aneuploidy in its 10 Gb genome. Without a known copy number reference gene, it is difficult to accurately estimate the copy number of any gene of interest by PCR-based methods in sugarcane. Recently, a new technology known as droplet digital PCR (ddPCR) has been developed, which can measure the absolute amount of the target DNA in a given sample. In this study, we deduced the true copy number of three endogenous genes, actin depolymerizing factor (ADF), adenine phosphoribosyltransferase (APRT) and actin (ACT), in three Australian sugarcane varieties, using ddPCR by comparing the absolute amounts of the above genes with a transgene of known copy number. A single copy of the ACT allele was detected in Q208 A, two copies in Q240 A, but it was absent in Q117. Copy number variation was also observed for both APRT and ADF, and ranged from 9 to 11 in the three tested varieties. Using this newly developed ddPCR method, transgene copy number was successfully determined in 19 transgenic Q208 A and Q240 A events using ACT as the reference endogenous gene. Our study demonstrates that ddPCR can be used for high-throughput genetic analysis and is a quick, accurate and reliable alternative method for gene copy number determination in sugarcane. This discovered ACT allele would be a suitable endogenous reference gene for future gene copy number variation and dosage studies of functional genes in Q208 A and Q240 A.
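
    The comparison of absolute amounts described above reduces to a simple ratio. The sketch below is a minimal illustration under the assumption that both targets are quantified in the same sample; the function name and the concentrations are hypothetical and are not taken from the study.

```python
def estimate_copy_number(target_conc, reference_conc, reference_copies):
    """Estimate gene copy number from ddPCR absolute concentrations.

    target_conc and reference_conc are absolute concentrations (for example
    copies per microlitre) measured in the same sample; reference_copies is
    the known copy number of the reference sequence.
    """
    if reference_conc <= 0:
        raise ValueError("reference concentration must be positive")
    # Scale the concentration ratio by the known copy number of the reference.
    return reference_copies * target_conc / reference_conc


# Hypothetical readings, not data from the study.
print(estimate_copy_number(target_conc=950.0, reference_conc=100.0, reference_copies=1))
```

    In practice the result would be rounded to the nearest integer and reported together with the confidence interval supplied by the ddPCR software.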

  2. Biblio-MetReS for user-friendly mining of genes and biological processes in scientific documents.

    PubMed

    Usie, Anabel; Karathia, Hiren; Teixidó, Ivan; Alves, Rui; Solsona, Francesc

    2014-01-01

    One way to initiate the reconstruction of molecular circuits is by using automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and those methods are typically included by bioinformaticians in pipelines used to mine and curate large literature datasets. Nevertheless, experimental biologists have a limited number of available user-friendly tools that use text-mining for network reconstruction and require no programming skills to use. One of these tools is Biblio-MetReS. Originally, this tool permitted an on-the-fly analysis of documents contained in a number of web-based literature databases to identify co-occurrence of proteins/genes. This approach ensured results that were always up-to-date with the latest live version of the databases. However, this 'up-to-dateness' came at the cost of large execution times. Here we report an evolution of the application Biblio-MetReS that permits constructing co-occurrence networks for genes, GO processes, Pathways, or any combination of the three types of entities and graphically representing those entities. We show that the performance of Biblio-MetReS in identifying gene co-occurrence is at least as good as that of other comparable applications (STRING and iHOP). In addition, we also show that the identification of GO processes is on par with that reported in the latest BioCreAtIvE challenge. Finally, we also report the implementation of a new strategy that combines on-the-fly analysis of new documents with preprocessed information from documents that were encountered in previous analyses. This combination simultaneously decreases program run time and maintains 'up-to-dateness' of the results. http://metres.udl.cat/index.php/downloads, metres.cmb@gmail.com.
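
    A co-occurrence network of the kind described above can be sketched as a count of entity pairs that appear in the same document. The code below is an illustration only and is not the Biblio-MetReS implementation: it uses naive case-insensitive substring matching, and the function name, entity list and documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, entities):
    """Count how many documents mention each pair of entities together."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Naive matching: an entity is "present" if its name is a substring.
        present = sorted(e for e in entities if e.lower() in lowered)
        # Every unordered pair of present entities is one network edge.
        counts.update(combinations(present, 2))
    return counts


docs = [
    "TP53 binds MDM2 in stressed cells.",
    "MDM2 and TP53 regulate apoptosis.",
    "BRCA1 participates in DNA repair.",
]
edges = cooccurrence_counts(docs, ["TP53", "MDM2", "BRCA1"])
for (a, b), n in edges.items():
    print(a, b, n)
```

    Caching the per-document entity lists between runs would correspond loosely to the preprocessing strategy the abstract describes for reducing run time.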

  3. Combination of culture-independent and culture-dependent molecular methods for the determination of bacterial community of iru, a fermented Parkia biglobosa seeds.

    PubMed

    Adewumi, Gbenga A; Oguntoyinbo, Folarin A; Keisam, Santosh; Romi, Wahengbam; Jeyaram, Kumaraswamy

    2012-01-01

    In this study, bacterial composition of iru produced by natural, uncontrolled fermentation of Parkia biglobosa seeds was assessed using culture-independent method in combination with culture-based genotypic typing techniques. PCR-denaturing gradient gel electrophoresis (DGGE) revealed similarity in DNA fragments with the two DNA extraction methods used and confirmed bacterial diversity in the 16 iru samples from different production regions. DNA sequencing of the highly variable V3 region of the 16S rRNA genes obtained from PCR-DGGE identified species related to Bacillus subtilis as consistent bacterial species in the fermented samples, while other major bands were identified as close relatives of Staphylococcus vitulinus, Morganella morganii, B. thuringiensis, S. saprophyticus, Tetragenococcus halophilus, Ureibacillus thermosphaericus, Brevibacillus parabrevis, Salinicoccus jeotgali, Brevibacterium sp. and uncultured bacteria clones. Bacillus species were cultured as potential starter cultures and clonal relationship of different isolates determined using amplified ribosomal DNA restriction analysis (ARDRA) combined with 16S-23S rRNA gene internal transcribed spacer (ITS) PCR amplification, restriction analysis (ITS-PCR-RFLP), and randomly amplified polymorphic DNA (RAPD-PCR). This further discriminated B. subtilis and its variants from food-borne pathogens such as B. cereus and suggested the need for development of controlled fermentation processes and good manufacturing practices (GMP) for iru production to achieve product consistency, safety quality, and improved shelf life.

  4. Importing statistical measures into Artemis enhances gene identification in the Leishmania genome project.

    PubMed

    Aggarwal, Gautam; Worthey, E A; McDonagh, Paul D; Myler, Peter J

    2003-06-07

    Seattle Biomedical Research Institute (SBRI), as part of the Leishmania Genome Network (LGN), is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces. Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODONUSAGE algorithm built into ARTEMIS, shows the importance of combining methods to more accurately annotate the L. major genomic sequence. An improvised and powerful tool for gene prediction has been developed by importing data from widely-used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project, where there is a large proportion of novel genes requiring manual annotation.

  5. Subtracting the sequence bias from partially digested MNase-seq data reveals a general contribution of TFIIS to nucleosome positioning.

    PubMed

    Gutiérrez, Gabriel; Millán-Zambrano, Gonzalo; Medina, Daniel A; Jordán-Pla, Antonio; Pérez-Ortín, José E; Peñate, Xenia; Chávez, Sebastián

    2017-12-07

    TFIIS stimulates RNA cleavage by RNA polymerase II and promotes the resolution of backtracking events. TFIIS acts in the chromatin context, but its contribution to the chromatin landscape has not yet been investigated. Co-transcriptional chromatin alterations include subtle changes in nucleosome positioning, like those expected to be elicited by TFIIS, which are elusive to detect. The most popular method to map nucleosomes involves intensive chromatin digestion by micrococcal nuclease (MNase). Maps based on these exhaustively digested samples miss any MNase-sensitive nucleosomes caused by transcription. In contrast, partial digestion approaches preserve such nucleosomes, but introduce noise due to MNase sequence preferences. A systematic way of correcting this bias for massively parallel sequencing experiments is still missing. To investigate the contribution of TFIIS to the chromatin landscape, we developed a refined nucleosome-mapping method in Saccharomyces cerevisiae. Based on partial MNase digestion and a sequence-bias correction derived from naked DNA cleavage, the refined method efficiently mapped nucleosomes in promoter regions rich in MNase-sensitive structures. The naked DNA correction was also important for mapping gene body nucleosomes, particularly in those genes whose core promoters contain a canonical TATA element. With this improved method, we analyzed the global nucleosomal changes caused by lack of TFIIS. We detected a general increase in nucleosomal fuzziness and more restricted changes in nucleosome occupancy, which concentrated in some gene categories. The TATA-containing genes were preferentially associated with decreased occupancy in gene bodies, whereas the TATA-like genes did so with increased fuzziness. The detected chromatin alterations correlated with functional defects in nascent transcription, as revealed by genomic run-on experiments. 
    The combination of partial MNase digestion and naked DNA correction of the sequence bias is a precise nucleosomal mapping method that does not exclude MNase-sensitive nucleosomes. This method is useful for detecting subtle alterations in nucleosome positioning produced by lack of TFIIS. Their analysis revealed that TFIIS generally contributed to nucleosome positioning in both gene promoters and bodies. The independent effect of lack of TFIIS on nucleosome occupancy and fuzziness supports the existence of alternative chromatin dynamics during transcription elongation.

  6. Integrated Module and Gene-Specific Regulatory Inference Implicates Upstream Signaling Networks

    PubMed Central

    Roy, Sushmita; Lagree, Stephen; Hou, Zhonggang; Thomson, James A.; Stewart, Ron; Gasch, Audrey P.

    2013-01-01

    Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development. PMID:24146602

  7. Application of advanced cytometric and molecular technologies to minimal residual disease monitoring

    NASA Astrophysics Data System (ADS)

    Leary, James F.; He, Feng; Reece, Lisa M.

    2000-04-01

    Minimal residual disease monitoring presents a number of theoretical and practical challenges. Recently it has been possible to meet some of these challenges by combining a number of new advanced biotechnologies. To monitor the number of residual tumor cells requires complex cocktails of molecular probes that collectively provide sensitivities of detection on the order of one residual tumor cell per million total cells. Ultra-high-speed, multiparameter flow cytometry is capable of analyzing cells at rates in excess of 100,000 cells/sec. Residual tumor selection marker cocktails can be optimized by use of receiver operating characteristic analysis. New data minimizing techniques, when combined with multivariate statistical or neural network classifications of tumor cells, can more accurately predict residual tumor cell frequencies. The combination of these techniques can, under at least some circumstances, detect frequencies of tumor cells as low as one cell in a million with an accuracy of over 98 percent correct classification. Detection of mutations in tumor suppressor genes requires isolation of these rare tumor cells and single-cell DNA sequencing. Rare residual tumor cells can be isolated at the single-cell level by high-resolution single-cell sorting. Molecular characterization of tumor suppressor gene mutations can be accomplished using a combination of single-cell polymerase chain reaction amplification of specific gene sequences followed by TA cloning techniques and DNA sequencing. Mutations as small as a single base pair in a tumor suppressor gene of a single sorted tumor cell have been detected using these methods. Using new amplification procedures and DNA microarrays it should be possible to extend the capabilities shown in this paper to screening of multiple DNA mutations in tumor suppressor and other genes on small numbers of sorted metastatic tumor cells.

  8. Therapeutic effects of Euphorbia Pekinensis and Glycyrrhiza glabra on Hepatocellular Carcinoma Ascites Partially Via Regulating the Frk-Arhgdib-Inpp5d-Avpr2-Aqp4 Signal Axis

    NASA Astrophysics Data System (ADS)

    Zhang, Yanqiong; Yan, Chen; Li, Yuting; Mao, Xia; Tao, Weiwei; Tang, Yuping; Lin, Ya; Guo, Qiuyan; Duan, Jingao; Lin, Na

    2017-02-01

    To clarify unknown rationalities of herbaceous compatibility of Euphorbia Pekinensis (DJ) and Glycyrrhiza glabra (GC) acting on hepatocellular carcinoma (HCC) ascites, peritoneum transcriptomics profiling of 15 subjects, including normal control (Con), HCC ascites mouse model (Mod), DJ-alone, DJ/GC-synergy and DJ/GC-antagonism treatment groups, was performed on the OneArray platform, followed by screening of differentially expressed genes (DEGs). DEGs between Mod and Con groups were considered as HCC ascites-related genes, and those among different drug treatment and Mod groups were identified as DJ/GC-combination-related genes. Then, an interaction network of HCC ascites-related gene-DJ/GC combination-related gene-known therapeutic target gene for ascites was constructed. Based on nodes’ degree, closeness, betweenness and k-coreness, the Frk-Arhgdib-Inpp5d-Avpr2-Aqp4 axis, with high network topological importance, was demonstrated to be a candidate target of DJ/GC combination acting on HCC ascites. Importantly, both qPCR and western blot analyses verified these regulatory effects based on HCC ascites mice in vivo and M-1 collecting duct cells in vitro. Collectively, different combination designs of DJ and GC may lead to synergistic or antagonistic effects on HCC ascites partially via regulating the Frk-Arhgdib-Inpp5d-Avpr2-Aqp4 axis, implying that global gene expression profiling combined with network analysis can offer an effective way to understand pharmacological mechanisms of traditional Chinese medicine prescriptions.

  9. Gene identification in the congenital disorders of glycosylation type I by whole-exome sequencing.

    PubMed

    Timal, Sharita; Hoischen, Alexander; Lehle, Ludwig; Adamowicz, Maciej; Huijben, Karin; Sykut-Cegielska, Jolanta; Paprocka, Justyna; Jamroz, Ewa; van Spronsen, Francjan J; Körner, Christian; Gilissen, Christian; Rodenburg, Richard J; Eidhof, Ilse; Van den Heuvel, Lambert; Thiel, Christian; Wevers, Ron A; Morava, Eva; Veltman, Joris; Lefeber, Dirk J

    2012-10-01

    Congenital disorders of glycosylation type I (CDG-I) form a growing group of recessive neurometabolic diseases. Identification of disease genes is compromised by the enormous heterogeneity in clinical symptoms and the large number of potential genes involved. Until now, gene identification included the sequential application of biochemical methods in blood samples and fibroblasts. In genetically unsolved cases, homozygosity mapping has been applied in consanguineous families. Altogether, this time-consuming diagnostic strategy led to the identification of defects in 17 different CDG-I genes. Here, we applied whole-exome sequencing (WES) in combination with the knowledge of the protein N-glycosylation pathway for gene identification in our remaining group of six unsolved CDG-I patients from unrelated non-consanguineous families. Exome variants were prioritized based on a list of 76 potential CDG-I candidate genes, leading to the rapid identification of one known and two novel CDG-I gene defects. These included the first X-linked CDG-I due to a de novo mutation in ALG13, and compound heterozygous mutations in DPAGT1, together the first two steps in dolichol-PP-glycan assembly, and mutations in PGM1 in two cases, involved in nucleotide sugar biosynthesis. The pathogenicity of the mutations was confirmed by showing the deficient activity of the corresponding enzymes in patient fibroblasts. Combined with these results, the gene defect has been identified in 98% of our CDG-I patients. Our results implicate the potential of WES to unravel disease genes in the CDG-I in newly diagnosed singleton families.

  10. Ensemble Feature Learning of Genomic Data Using Support Vector Machine

    PubMed Central

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. A testament to this is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of the RFE algorithm. The rationale is that building ensemble SVM models using randomly drawn bootstrap samples from the training set will produce different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over a random forest-based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters with the selected data. PMID:27304923
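
    The aggregation step, in which rankings from several bootstrap-trained models are merged into one, can be sketched as follows. The abstract does not state how ESVM-RFE aggregates rankings, so mean rank position is used here purely as an assumption; the SVM training itself is omitted, and the function name and gene labels are hypothetical.

```python
def aggregate_rankings(rankings):
    """Merge several best-to-worst feature rankings by mean rank position.

    Each ranking is a list of the same features, ordered from most to least
    important (for example, one ranking per bootstrap-trained model).
    """
    features = rankings[0]
    position_sums = {feature: 0 for feature in features}
    for ranking in rankings:
        if set(ranking) != set(features):
            raise ValueError("all rankings must contain the same features")
        for position, feature in enumerate(ranking):
            position_sums[feature] += position
    # Lower mean position means more important; ties are broken by name.
    return sorted(features, key=lambda f: (position_sums[f] / len(rankings), f))


# Hypothetical rankings from three bootstrap models.
model_rankings = [
    ["g1", "g2", "g3"],
    ["g2", "g1", "g3"],
    ["g1", "g3", "g2"],
]
consensus = aggregate_rankings(model_rankings)
print(consensus, "eliminate:", consensus[-1])
```

    In a backward elimination loop, the lowest-ranked feature of the consensus would be dropped and the models retrained on the remaining features.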

  11. Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data

    PubMed Central

    Müller, Christian; Schillert, Arne; Röthemeier, Caroline; Trégouët, David-Alexandre; Proust, Carole; Binder, Harald; Pfeiffer, Norbert; Beutel, Manfred; Lackner, Karl J.; Schnabel, Renate B.; Tiret, Laurence; Wild, Philipp S.; Blankenberg, Stefan

    2016-01-01

    Technical variation plays an important role in microarray-based gene expression studies, and batch effects explain a large proportion of this noise. It is therefore mandatory to eliminate technical variation while maintaining biological variability. Several strategies have been proposed for the removal of batch effects, although they have not been evaluated in large-scale longitudinal gene expression data. In this study, we aimed at identifying a suitable method for batch effect removal in a large study of microarray-based longitudinal gene expression. Monocytic gene expression was measured in 1092 participants of the Gutenberg Health Study at baseline and 5-year follow up. Replicates of selected samples were measured at both time points to identify technical variability. Deming regression, Passing-Bablok regression, linear mixed models, non-linear models as well as ReplicateRUV and ComBat were applied to eliminate batch effects between replicates. In a second step, quantile normalization prior to batch effect correction was performed for each method. Technical variation between batches was evaluated by principal component analysis. Associations between body mass index and transcriptomes were calculated before and after batch removal. Results from association analyses were compared to evaluate maintenance of biological variability. Quantile normalization, separately performed in each batch, combined with ComBat successfully reduced batch effects and maintained biological variability. ReplicateRUV performed perfectly in the replicate data subset of the study, but failed when applied to all samples. All other methods did not substantially reduce batch effects in the replicate data subset. Quantile normalization plus ComBat appears to be a valuable approach for batch correction in longitudinal gene expression data. PMID:27272489
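
    Quantile normalization, the first of the two steps recommended above, forces every sample to share the same value distribution. The sketch below is a minimal pure-Python version that breaks ties by gene order rather than averaging them; it does not include the ComBat step, and the function name and data are hypothetical.

```python
def quantile_normalize(samples):
    """Quantile-normalize expression values.

    samples is a list of per-sample lists, each holding one value per gene
    in the same gene order. Returns new lists with identical distributions.
    """
    n_genes = len(samples[0])
    if any(len(sample) != n_genes for sample in samples):
        raise ValueError("every sample must have the same number of genes")
    sorted_samples = [sorted(sample) for sample in samples]
    # Reference distribution: mean of the values at each rank across samples.
    rank_means = [
        sum(sample[rank] for sample in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda gene: sample[gene])
        result = [0.0] * n_genes
        for rank, gene in enumerate(order):
            result[gene] = rank_means[rank]  # replace value by its rank mean
        normalized.append(result)
    return normalized


# Hypothetical expression values for three genes in two samples.
batch = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(batch))
```

    Following the abstract, this would be applied separately within each batch before a batch-correction method such as ComBat is run across batches.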

  12. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.

  13. Aberrant DNA methylation of tumor-related genes in oral rinse: a noninvasive method for detection of oral squamous cell carcinoma.

    PubMed

    Nagata, Satoshi; Hamada, Tomofumi; Yamada, Norishige; Yokoyama, Seiya; Kitamoto, Sho; Kanmura, Yuji; Nomura, Masahiro; Kamikawa, Yoshiaki; Yonezawa, Suguru; Sugihara, Kazumasa

    2012-09-01

    The early detection of oral squamous cell carcinoma (OSCC) is important, and a screening test with high sensitivity and specificity is urgently needed. Therefore, in this study, the authors investigated the methylation status of tumor-related genes with the objective of establishing a noninvasive method for the detection of OSCC. Oral rinse samples were obtained from 34 patients with OSCC and from 24 healthy individuals (controls). The methylation status of 13 genes was determined by using methylation-specific polymerase chain reaction analysis and was quantified using a microchip electrophoresis system. Promoter methylation in each participant was screened by receiver operating characteristic analysis, and the utility of each gene's methylation status, alone and in combination with other genes, was evaluated as a tool for oral cancer detection. Eight of the 13 genes had significantly higher levels of DNA methylation in samples from patients with OSCC than in controls. The genes E-cadherin (ECAD), transmembrane protein with epidermal growth factor-like and 2 follistatin-like domains 2 (TMEFF2), retinoic acid receptor beta (RARβ), and O-6 methylguanine DNA methyltransferase (MGMT) had high sensitivity (>75%) and specificity for the detection of oral cancer. OSCC was detected with 100% sensitivity and 87.5% specificity using a combination of ECAD, TMEFF2, RARβ, and MGMT and with 97.1% sensitivity and 91.7% specificity using a combination of ECAD, TMEFF2, and MGMT. The aberrant methylation of a combination of marker genes present in oral rinse samples was used to detect OSCC with >90% sensitivity and specificity. The detection of methylated marker genes from oral rinse samples has great potential for the noninvasive detection of OSCC. Copyright © 2012 American Cancer Society.
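
    The reported panel performance can be checked with the standard definitions of sensitivity and specificity. The confusion-matrix counts below are not stated in the abstract; they are inferred from the cohort sizes (34 patients, 24 controls) and the reported percentages, and should be read as an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of patients correctly detected
    specificity = tn / (tn + fp)  # fraction of controls correctly cleared
    return sensitivity, specificity


# Inferred counts (assumption): four-gene panel, then three-gene panel.
four_gene = sensitivity_specificity(tp=34, fn=0, tn=21, fp=3)
three_gene = sensitivity_specificity(tp=33, fn=1, tn=22, fp=2)
for name, (sens, spec) in [("ECAD+TMEFF2+RARb+MGMT", four_gene),
                           ("ECAD+TMEFF2+MGMT", three_gene)]:
    print(f"{name}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

    Under these inferred counts, dropping one marker trades a single missed patient for one fewer false positive among the controls.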

  14. A reliable combination method to identification and typing of epidemic and endemic clones among clinical isolates of Acinetobacter baumannii.

    PubMed

    Piran, Arezoo; Shahcheraghi, Fereshteh; Solgi, Hamid; Rohani, Mahdi; Badmasti, Farzad

    2017-10-01

    The multi-drug resistant (MDR) Acinetobacter baumannii, an important nosocomial pathogen, has emerged as a global health concern in recent years. In this study, we applied three easier, faster, and cost-effective methods, namely PCR-based open reading frame (ORF) typing, sequence typing of bla OXA-51-like genes, and RAPD-PCR, to the rapid typing of A. baumannii strains. Taken together, the results of ORF typing, PCR-sequencing of bla OXA-51-like genes, and MLST sequence typing revealed a high prevalence (62%, 35/57) of ST2, a successful international clone, which was detected among clinical isolates of multi-drug resistant A. baumannii with ORF pattern B and the bla OXA-66 gene. Only 7% (4/57) of MDR isolates belonged to ST1, with ORF pattern A and the bla OXA-69 gene. Interestingly, we detected singleton ST513 (32%, 18/57), which encoded bla OXA-90 and showed ORF pattern H, as previously isolated in the Middle East. Moreover, our data showed that the RAPD-PCR method can detect divergent strains of the STs. Cl-1, Cl-2, Cl-3, Cl-4, Cl-10, Cl-11, Cl-12, Cl-13 and Cl-14 belonged to ST2, while Cl-6, Cl-7, Cl-8 and Cl-9 belonged to ST513. Only Cl-5 belonged to ST1. It seems that the combination of these methods has more discriminatory power than any method alone and could be effectively applied to the rapid detection of the clonal complex (CC) of A. baumannii strains without performing MLST or PFGE. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Clinical omics analysis of colorectal cancer incorporating copy number aberrations and gene expression data.

    PubMed

    Yoshida, Tsuyoshi; Kobayashi, Takumi; Itoda, Masaya; Muto, Taika; Miyaguchi, Ken; Mogushi, Kaoru; Shoji, Satoshi; Shimokawa, Kazuro; Iida, Satoru; Uetake, Hiroyuki; Ishikawa, Toshiaki; Sugihara, Kenichi; Mizushima, Hiroshi; Tanaka, Hiroshi

    2010-07-29

    Colorectal cancer (CRC) is one of the most frequently occurring cancers in Japan, and thus a wide range of methods have been deployed to study the molecular mechanisms of CRC. In this study, we performed a comprehensive analysis of CRC, incorporating copy number aberration (CNA) and gene expression data. For the last four years, we have been collecting data from CRC cases and organizing the information as an "omics" study by integrating many kinds of analysis into a single comprehensive investigation. In our previous studies, we had experienced difficulty in finding genes related to CRC, as we observed higher noise levels in the expression data than in the data for other cancers. Because chromosomal aberrations are often observed in CRC, here, we have performed a combination of CNA analysis and expression analysis in order to identify some new genes responsible for CRC. This study was performed as part of the Clinical Omics Database Project at Tokyo Medical and Dental University. The purpose of this study was to investigate the mechanism of genetic instability in CRC by this combination of expression analysis and CNA, and to establish a new method for the diagnosis and treatment of CRC. Comprehensive gene expression analysis was performed on 79 CRC cases using an Affymetrix Gene Chip, and comprehensive CNA analysis was performed using an Affymetrix DNA Sty array. To avoid the contamination of cancer tissue with normal cells, laser micro-dissection was performed before DNA/RNA extraction. Data analysis was performed using original software written in the R language. We observed a high percentage of CNA in colorectal cancer, including copy number gains at 7, 8q, 13 and 20q, and copy number losses at 8p, 17p and 18. Gene expression analysis provided many candidates for CRC-related genes, but their association with CRC did not reach the level of statistical significance.
The combination of CNA and gene expression analysis, together with the clinical information, suggested UGT2B28, LOC440995, CXCL6, SULT1B1, RALBP1, TYMS, RAB12, RNMT, ARHGDIB, S100A2, ABHD2, OIT3 and ABHD12 as genes that are possibly associated with CRC. Some of these genes have already been reported as being related to CRC. TYMS has been reported as being associated with resistance to the anti-cancer drug 5-fluorouracil, and we observed a copy number increase for this gene. RALBP1, ARHGDIB and S100A2 have been reported as oncogenes, and we observed copy number increases in each. ARHGDIB has been reported as a metastasis-related gene, and our data also showed copy number increases of this gene in cases with metastasis. The combination of CNA analysis and gene expression analysis was a more effective method for finding genes associated with the clinicopathological classification of CRC than either analysis alone. Using this combination of methods, we were able to detect genes that have already been associated with CRC. We also identified additional candidate genes that may be new markers or targets for this form of cancer.

  16. Subthalamic hGAD65 Gene Therapy and Striatum TH Gene Transfer in a Parkinson’s Disease Rat Model

    PubMed Central

    Zheng, Deyu; Jiang, Xiaohua; Zhao, Junpeng; Duan, Deyi; Zhao, Huanying; Xu, Qunyuan

    2013-01-01

    The aim of the present study is to identify a combination method for applying gene therapy to the treatment of Parkinson’s disease (PD). Here, a PD rat model is used for in vivo gene therapy with a recombinant adeno-associated virus (AAV2) containing a human glutamic acid decarboxylase 65 (rAAV2-hGAD65) gene delivered to the subthalamic nucleus (STN). This is combined with the ex vivo gene delivery of tyrosine hydroxylase (TH) by fibroblasts injected into the striatum. After the treatment, the rotation behavior was improved, with the greatest efficacy in the combination group. The results of immunohistochemistry showed that hGAD65 gene delivery by AAV2 successfully led to phenotypic changes of neurons in the STN. The levels of glutamic acid and GABA in the internal segment of the globus pallidus (GPi) and substantia nigra pars reticulata (SNr) were obviously lower than in the control groups. However, hGAD65 gene transfer did not effectively protect surviving dopaminergic neurons in the SNc and VTA. This study suggests that subthalamic hGAD65 gene therapy combined with TH gene therapy can alleviate symptoms in PD model rats, independent of protecting the DA neurons from death. PMID:23738148

  17. The study on serum and urine of renal interstitial fibrosis rats induced by unilateral ureteral obstruction based on metabonomics and network analysis methods.

    PubMed

    Xiang, Zheng; Sun, Hao; Cai, Xiaojun; Chen, Dahui

    2016-04-01

    Transmission of biological information is a biochemical process of multistep cascade from genes/proteins to metabolites. However, because most metabolites reflect the terminal information of the biochemical process, it is difficult to describe the transmission process of disease information in terms of the metabolomics strategy. In this paper, by incorporating network and metabolomics methods, an integrated approach was proposed to systematically investigate and explain the molecular mechanism of renal interstitial fibrosis. Through analysis of the network, the cascade transmission process of disease information starting from genes/proteins to metabolites was putatively identified and uncovered. The results indicated that renal fibrosis was involved in metabolic pathways of glycerophospholipid metabolism, biosynthesis of unsaturated fatty acids and arachidonic acid metabolism, riboflavin metabolism, tyrosine metabolism, and sphingolipid metabolism. These pathways involve kidney disease genes such as TGF-β1 and P2RX7. Our results showed that combining metabolomics and network analysis can provide new strategies and ideas for the interpretation of pathogenesis of disease with full consideration of "gene-protein-metabolite."

  18. Macrophage mediated PCI enhanced gene-directed enzyme prodrug therapy

    NASA Astrophysics Data System (ADS)

    Christie, Catherine E.; Zamora, Genesis; Kwon, Young J.; Berg, Kristian; Madsen, Steen J.; Hirschberg, Henry

    2015-03-01

    Photochemical internalization (PCI) is a photodynamic therapy-based approach for improving the delivery of macromolecules and genes into the cell cytosol. Prodrug activating gene therapy (suicide gene therapy) employing the transduction of the E. coli cytosine deaminase (CD) gene into tumor cells is a promising method. Expression of this gene within the target cell produces an enzyme that converts the nontoxic prodrug, 5-FC, to the toxic metabolite, 5-fluorouracil (5-FU). 5-FC may be particularly suitable for brain tumors, because it can readily cross the blood-brain barrier (BBB). In addition, the bystander effect, where activated drug is exported from the transfected cancer cells into the tumor microenvironment, plays an important role by inhibiting growth of adjacent tumor cells. Tumor-associated macrophages (TAMs) are frequently found in and around glioblastomas. Monocytes or macrophages (Ma) loaded with drugs, nanoparticles or photosensitizers could therefore be used to target tumors by local synthesis of chemoattractive factors. The basic concept is to use PCI to enhance the ex vivo transfection of a suicide gene into Ma, employing specially designed core/shell NP as the gene carrier.

  19. Combining isothermal rolling circle amplification and electrochemiluminescence for highly sensitive point mutation detection

    NASA Astrophysics Data System (ADS)

    Su, Qiang; Zhou, Xiaoming

    2008-12-01

    Many pathogenic and genetic diseases are associated with changes in the sequence of particular genes. We describe here a rapid and highly efficient assay for the detection of point mutations. This method is a combination of isothermal rolling circle amplification (RCA) and highly sensitive electrochemiluminescence (ECL) detection. In the design, a circular template generated by ligation upon the recognition of a point mutation on DNA targets was amplified isothermally by the Phi29 polymerase using a biotinylated primer. The elongation products were hybridized with tris (bipyridine) ruthenium (TBR)-tagged probes and detected in a magnetic bead based ECL platform, indicating the mutation occurrence. P53 was chosen as a model for the identification of this method. The method allowed sensitive determination of the P53 mutation from wild-type and mutant samples. The main advantage of RCA-ECL is that it can be performed under isothermal conditions and avoids the generation of false-positive results. Furthermore, ECL provides a faster, more sensitive, and economical option to currently available electrophoresis-based methods.

  20. Epigenetic differences in monozygotic twins discordant for major depressive disorder

    PubMed Central

    Malki, K; Koritskaya, E; Harris, F; Bryson, K; Herbster, M; Tosto, M G

    2016-01-01

    Although monozygotic (MZ) twins share the majority of their genetic makeup, they can be phenotypically discordant on several traits and diseases. DNA methylation is an epigenetic mechanism that can be influenced by genetic, environmental and stochastic events and may have an important impact on individual variability. In this study we explored epigenetic differences in peripheral blood samples in three MZ twin studies on major depressive disorder (MDD). Epigenetic data for twin pairs were collected as part of a previous study using 8.1-K-CpG microarrays tagging DNA modification in white blood cells from MZ twins discordant for MDD. Data originated from three geographical regions: UK, Australia and the Netherlands. Ninety-seven MZ pairs (194 individuals) discordant for MDD were included. Different methods to address non independently-and-identically distributed (non-i.i.d.) data were evaluated. Machine-learning methods with feature selection centered on support vector machine and random forest were used to build a classifier to predict cases and controls based on epivariations. The most informative variants were mapped to genes and carried forward for network analysis. A mixture approach using principal component analysis (PCA) and Bayes methods allowed us to combine the three studies and to leverage the increased predictive power provided by the larger sample. A machine-learning algorithm with feature reduction classified affected from non-affected twins above chance levels in an independent training-testing design. Network analysis revealed gene networks centered on the PPAR-γ (NR1C3) and C-MYC gene hubs interacting through the AP-1 (c-Jun) transcription factor. PPAR-γ (NR1C3) is a drug target for pioglitazone, which has been shown to reduce depression symptoms in patients with MDD. Using a data-driven approach we were able to overcome challenges of non-i.i.d. data when combining epigenetic studies from MZ twins discordant for MDD.
Individually, the studies yielded negative results but when combined classification of the disease state from blood epigenome alone was possible. Network analysis revealed genes and gene networks that support the inflammation hypothesis of MDD. PMID:27300265

  1. Epigenetic differences in monozygotic twins discordant for major depressive disorder.

    PubMed

    Malki, K; Koritskaya, E; Harris, F; Bryson, K; Herbster, M; Tosto, M G

    2016-06-14

    Although monozygotic (MZ) twins share the majority of their genetic makeup, they can be phenotypically discordant on several traits and diseases. DNA methylation is an epigenetic mechanism that can be influenced by genetic, environmental and stochastic events and may have an important impact on individual variability. In this study we explored epigenetic differences in peripheral blood samples in three MZ twin studies on major depressive disorder (MDD). Epigenetic data for twin pairs were collected as part of a previous study using 8.1-K-CpG microarrays tagging DNA modification in white blood cells from MZ twins discordant for MDD. Data originated from three geographical regions: UK, Australia and the Netherlands. Ninety-seven MZ pairs (194 individuals) discordant for MDD were included. Different methods to address non independently-and-identically distributed (non-i.i.d.) data were evaluated. Machine-learning methods with feature selection centered on support vector machine and random forest were used to build a classifier to predict cases and controls based on epivariations. The most informative variants were mapped to genes and carried forward for network analysis. A mixture approach using principal component analysis (PCA) and Bayes methods allowed us to combine the three studies and to leverage the increased predictive power provided by the larger sample. A machine-learning algorithm with feature reduction classified affected from non-affected twins above chance levels in an independent training-testing design. Network analysis revealed gene networks centered on the PPAR-γ (NR1C3) and C-MYC gene hubs interacting through the AP-1 (c-Jun) transcription factor. PPAR-γ (NR1C3) is a drug target for pioglitazone, which has been shown to reduce depression symptoms in patients with MDD. Using a data-driven approach we were able to overcome challenges of non-i.i.d. data when combining epigenetic studies from MZ twins discordant for MDD.
Individually, the studies yielded negative results but when combined classification of the disease state from blood epigenome alone was possible. Network analysis revealed genes and gene networks that support the inflammation hypothesis of MDD.

  2. OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster.

    PubMed

    Miles, Alistair; Zhao, Jun; Klyne, Graham; White-Cooper, Helen; Shotton, David

    2010-10-01

    Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavy weight, and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with light-weight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. 
In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.

  3. A Genome-Wide Association Meta-Analysis of Attention-Deficit/Hyperactivity Disorder Symptoms in Population-Based Paediatric Cohorts

    PubMed Central

    Groen-Blokhuis, Maria M.; Pourcain, Beate St.; Greven, Corina U.; Pappa, Irene; Tiesler, Carla M.T.; Ang, Wei; Nolte, Ilja M.; Vilor-Tejedor, Natalia; Bacelis, Jonas; Ebejer, Jane L.; Zhao, Huiying; Davies, Gareth E.; Ehli, Erik A.; Evans, David M.; Fedko, Iryna O.; Guxens, Mònica; Hottenga, Jouke-Jan; Hudziak, James J.; Jugessur, Astanand; Kemp, John P.; Krapohl, Eva; Martin, Nicholas G.; Murcia, Mario; Myhre, Ronny; Ormel, Johan; Ring, Susan M.; Standl, Marie; Stergiakouli, Evie; Stoltenberg, Camilla; Thiering, Elisabeth; Timpson, Nicholas J.; Trzaskowski, Maciej; van der Most, Peter J.; Wang, Carol; Nyholt, Dale R.; Medland, Sarah E.; Neale, Benjamin; Jacobsson, Bo; Sunyer, Jordi; Hartman, Catharina A.; Whitehouse, Andrew J.O.; Pennell, Craig E.; Heinrich, Joachim; Plomin, Robert; Smith, George Davey; Tiemeier, Henning; Posthuma, Danielle; Boomsma, Dorret I.

    2016-01-01

    Objective To elucidate the influence of common genetic variants on childhood attention-deficit/hyperactivity disorder (ADHD) symptoms, to identify genetic variants that explain its high heritability, and to investigate the genetic overlap of ADHD symptom scores with ADHD diagnosis. Method Within the EArly Genetics and Lifecourse Epidemiology (EAGLE) consortium, genome-wide single nucleotide polymorphisms (SNPs) and ADHD symptom scores were available for 17,666 children (< 13 years) from nine population-based cohorts. SNP-based heritability was estimated in data from the three largest cohorts. Meta-analysis based on genome-wide association (GWA) analyses with SNPs was followed by gene-based association tests, and the overlap in results with a meta-analysis in the Psychiatric Genomics Consortium (PGC) case-control ADHD study was investigated. Results SNP-based heritability ranged from 5% to 34%, indicating that variation in common genetic variants influences ADHD symptom scores. The meta-analysis did not detect genome-wide significant SNPs, but three genes, lying close to each other with SNPs in high linkage disequilibrium (LD), showed a gene-wide significant association (p values between 1.46×10^-6 and 2.66×10^-6). One gene, WASL, is involved in neuronal development. Both SNP- and gene-based analyses indicated overlap with the PGC meta-analysis results with the genetic correlation estimated at 0.96. Conclusion The SNP-based heritability for ADHD symptom scores indicates a polygenic architecture and genes involved in neurite outgrowth are possibly involved. Continuous and dichotomous measures of ADHD appear to assess a genetically common phenotype. A next step is to combine data from population-based and case-control cohorts in genetic association studies to increase sample size and improve statistical power for identifying genetic variants. PMID:27663945

  4. Novel method to load multiple genes onto a mammalian artificial chromosome.

    PubMed

    Tóth, Anna; Fodor, Katalin; Praznovszky, Tünde; Tubak, Vilmos; Udvardy, Andor; Hadlaczky, Gyula; Katona, Robert L

    2014-01-01

    Mammalian artificial chromosomes are natural chromosome-based vectors that may carry a vast amount of genetic material in terms of both size and number. They are reasonably stable and segregate well in both mitosis and meiosis. A platform artificial chromosome expression system (ACEs) was earlier described with multiple loading sites for a modified lambda-integrase enzyme. It has been shown that this ACEs is suitable for high-level industrial protein production and the treatment of a mouse model for a devastating human disorder, Krabbe's disease. ACEs-treated mutant mice carrying a therapeutic gene lived more than four times longer than untreated counterparts. This novel gene therapy method is called combined mammalian artificial chromosome-stem cell therapy. At present, this method suffers from the limitation that a new selection marker gene should be present for each therapeutic gene loaded onto the ACEs. Complex diseases require the cooperative action of several genes for treatment, but only a limited number of selection marker genes are available and there is also a risk of serious side-effects caused by the unwanted expression of these marker genes in mammalian cells, organs and organisms. We describe here a novel method to load multiple genes onto the ACEs by using only two selectable marker genes. These markers may be removed from the ACEs before therapeutic application. This novel technology could revolutionize gene therapeutic applications targeting the treatment of complex disorders and cancers. It could also speed up cell therapy by allowing researchers to engineer a chromosome with a predetermined set of genetic factors to differentiate adult stem cells, embryonic stem cells and induced pluripotent stem (iPS) cells into cell types of therapeutic value. It is also a suitable tool for the investigation of complex biochemical pathways in basic science by producing an ACEs with several genes from a signal transduction pathway of interest.

  5. Confident difference criterion: a new Bayesian differentially expressed gene selection algorithm with applications.

    PubMed

    Yu, Fang; Chen, Ming-Hui; Kuo, Lynn; Talbott, Heather; Davis, John S

    2015-08-07

    Recently, Bayesian methods have become more popular for analyzing high dimensional gene expression data, as they allow us to borrow information across different genes and provide powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene selection algorithms for general Bayesian models and name these new methods the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method based on the means and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783-802) is established under the normal-normal-model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to those of several existing methods via several simulations. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. A real dataset is used to further demonstrate the proposed methodology.
In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. The confident difference criterion method proposed in this paper provides a new efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.
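
    The two-step logic described above (evaluate a posterior probability of a standardized mean difference, then declare a gene DE when that probability is large) can be sketched with posterior draws. This is a minimal illustration, not the authors' algorithm: the exact form of the criterion is not given in the abstract, and the threshold `delta`, the cutoff of 0.95, and all function names are assumptions made for the example.

```python
from typing import Sequence


def posterior_prob_difference(
    mu1_draws: Sequence[float],
    mu2_draws: Sequence[float],
    sigma_draws: Sequence[float],
    delta: float,
) -> float:
    """Monte Carlo estimate of P(|mu1 - mu2| / sigma > delta | data).

    Each position across the three sequences is one joint posterior draw
    for a single gene (for example, from an MCMC sampler).
    """
    hits = sum(
        1
        for m1, m2, s in zip(mu1_draws, mu2_draws, sigma_draws)
        if abs(m1 - m2) / s > delta
    )
    return hits / len(mu1_draws)


def is_differentially_expressed(prob: float, cutoff: float = 0.95) -> bool:
    """Declare a gene DE when the posterior probability is large.

    ASSUMPTION: the 0.95 cutoff is a placeholder, not a value from the paper.
    """
    return prob >= cutoff


# Toy posterior draws (hypothetical values, for illustration only).
sigma = [1.0, 1.0, 1.0, 1.0]
gene_a = posterior_prob_difference([2.0, 2.1, 1.9, 2.2], [0.0, 0.1, -0.1, 0.2], sigma, delta=1.0)
gene_b = posterior_prob_difference([0.5, 0.4, 1.6, 0.3], [0.0, 0.0, 0.0, 0.0], sigma, delta=1.0)

print(gene_a, is_differentially_expressed(gene_a))
print(gene_b, is_differentially_expressed(gene_b))
```

    The second method mentioned in the abstract, which also uses differences between variances, would need an additional condition inside the counting step; it is omitted here because the abstract does not specify it.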

  6. An information-gain approach to detecting three-way epistatic interactions in genetic association studies

    PubMed Central

    Hu, Ting; Chen, Yuanzhu; Kiralis, Jeff W; Collins, Ryan L; Wejse, Christian; Sirugo, Giorgio; Williams, Scott M; Moore, Jason H

    2013-01-01

    Background Epistasis has been historically used to describe the phenomenon that the effect of a given gene on a phenotype can be dependent on one or more other genes, and is an essential element for understanding the association between genetic and phenotypic variations. Quantifying epistasis of orders higher than two is very challenging due to both the computational complexity of enumerating all possible combinations in genome-wide data and the lack of efficient and effective methodologies. Objectives In this study, we propose a fast, non-parametric, and model-free measure for three-way epistasis. Methods Such a measure is based on information gain, and is able to separate all lower order effects from pure three-way epistasis. Results Our method was verified on synthetic data and applied to real data from a candidate-gene study of tuberculosis in a West African population. In the tuberculosis data, we found a statistically significant pure three-way epistatic interaction effect that was stronger than any lower-order associations. Conclusion Our study provides a methodological basis for detecting and characterizing high-order gene-gene interactions in genetic association studies. PMID:23396514
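
    The idea of separating lower-order effects from pure three-way epistasis can be illustrated with entropy-based mutual information. The sketch below uses one standard information-theoretic decomposition (joint information minus all pairwise synergies and main effects); whether it matches the paper's exact definition is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_info(xs, ys) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def joint(*columns):
    """Combine several genotype columns into one column of tuples."""
    return list(zip(*columns))


def ig_pair(a, b, d) -> float:
    """Pairwise synergy: information about d beyond the two main effects."""
    return mutual_info(joint(a, b), d) - mutual_info(a, d) - mutual_info(b, d)


def ig_triple(a, b, c, d) -> float:
    """Three-way information gain with all lower-order effects removed."""
    main = mutual_info(a, d) + mutual_info(b, d) + mutual_info(c, d)
    pairs = ig_pair(a, b, d) + ig_pair(a, c, d) + ig_pair(b, c, d)
    return mutual_info(joint(a, b, c), d) - pairs - main


# Synthetic example: the phenotype is the parity of three binary loci,
# so no single locus or pair of loci carries any information about it.
A = [0, 0, 0, 0, 1, 1, 1, 1]
B = [0, 0, 1, 1, 0, 0, 1, 1]
C = [0, 1, 0, 1, 0, 1, 0, 1]
D = [a ^ b ^ c for a, b, c in zip(A, B, C)]

print(ig_triple(A, B, C, D))
```

    In this parity example the main effects and pairwise terms are all zero, so the whole bit of information about the phenotype is attributed to the three-way term.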

  7. Reconstructing gene regulatory networks from knock-out data using Gaussian Noise Model and Pearson Correlation Coefficient.

    PubMed

    Mohamed Salleh, Faridah Hani; Arif, Shereena Mohd; Zainudin, Suhaila; Firdaus-Raih, Mohd

    2015-12-01

    A gene regulatory network (GRN) is a large and complex network consisting of interacting elements that, over time, affect each other's state. The dynamics of complex gene regulatory processes are difficult to understand using intuitive approaches alone. To overcome this problem, we propose an algorithm for inferring the regulatory interactions from knock-out data using a Gaussian model combined with the Pearson Correlation Coefficient (PCC). There are several problems relating to GRN construction that have been outlined in this paper. We demonstrated the ability of our proposed method to predict (1) the presence of regulatory interactions between genes, (2) their directionality and (3) their states (activation or suppression). The algorithm was applied to network sizes of 10 and 50 genes from DREAM3 datasets and network sizes of 10 from DREAM4 datasets. The predicted networks were evaluated based on AUROC and AUPR. We discovered that high false positive values were generated by our GRN prediction methods because the indirect regulations have been wrongly predicted as true relationships. We achieved satisfactory results as the majority of sub-networks achieved AUROC values above 0.5. Copyright © 2015 Elsevier Ltd. All rights reserved.
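
    As a rough illustration of how a Pearson Correlation Coefficient can suggest the state of an interaction (activation versus suppression), the sketch below computes the PCC between two expression profiles and labels the edge by its sign. It does not reproduce the authors' algorithm or their Gaussian model; the threshold of 0.8, the labels, and the toy profiles are assumptions.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def edge_state(r: float, threshold: float = 0.8) -> str:
    """Label an edge from the sign and size of the correlation.

    ASSUMPTION: the threshold is a placeholder chosen for this example.
    """
    if r >= threshold:
        return "activation"
    if r <= -threshold:
        return "suppression"
    return "none"


# Hypothetical expression levels of three genes across four experiments.
regulator = [1.0, 2.0, 3.0, 4.0]
target_up = [2.0, 4.0, 6.0, 8.0]
target_down = [8.0, 6.0, 4.0, 2.0]

print(edge_state(pearson(regulator, target_up)))
print(edge_state(pearson(regulator, target_down)))
```

    Correlation alone is symmetric and cannot separate direct from indirect regulation, which is consistent with the false positives from indirect regulations that the abstract reports.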

  8. Learning to rank-based gene summary extraction.

    PubMed

    Shang, Yue; Hao, Huihui; Wu, Jiajin; Lin, Hongfei

    2014-01-01

    In recent years, the biomedical literature has been growing rapidly. These articles provide a large amount of information about proteins, genes and their interactions. Reading such a huge amount of literature to gain knowledge about a gene is a tedious task for researchers. As a result, it is important for biomedical researchers to gain a quick understanding of the query concept by integrating its relevant resources. In the task of gene summary generation, we regard automatic summarization as a ranking problem and apply the method of learning to rank to solve this problem automatically. This paper uses three features as a basis for sentence selection: gene ontology relevance, topic relevance and TextRank. From there, we obtain the feature weight vector using the learning to rank algorithm, predict the scores of candidate summary sentences, and select the top sentences to generate the summary. ROUGE (a toolkit for automatic evaluation of summarization) was used to evaluate the summarization result, and the experimental results showed that our method outperforms the baseline techniques. According to the experimental results, the combination of three features can improve the performance of summarization. The application of learning to rank can facilitate the further expansion of features for measuring the significance of sentences.
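
    The scoring step described above (a weight vector applied to three sentence features, followed by selection of the top sentences) can be sketched as a weighted sum and a sort. The weights below are hypothetical placeholders; in the paper they are obtained from a learning to rank algorithm that is not specified in the abstract, and the feature values and sentence identifiers are invented for illustration.

```python
from typing import Dict, List, Tuple

# Feature order: (gene ontology relevance, topic relevance, TextRank score).
Features = Tuple[float, float, float]


def score(features: Features, weights: Features) -> float:
    """Linear score: dot product of the feature vector and the weight vector."""
    return sum(f * w for f, w in zip(features, weights))


def top_sentences(
    candidates: Dict[str, Features], weights: Features, k: int
) -> List[str]:
    """Return the ids of the k highest-scoring candidate sentences."""
    ranked = sorted(
        candidates, key=lambda sid: score(candidates[sid], weights), reverse=True
    )
    return ranked[:k]


# ASSUMPTION: weights and feature values are made up for this example.
weights = (0.5, 0.3, 0.2)
candidates = {
    "s1": (0.9, 0.2, 0.5),
    "s2": (0.1, 0.8, 0.4),
    "s3": (0.5, 0.5, 0.9),
}

print(top_sentences(candidates, weights, k=2))
```

    Because the score is a plain weighted sum, further features can be added by extending the feature tuple and the weight vector, which matches the abstract's remark about easy expansion of features.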

  9. Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA

    PubMed Central

    Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest

    2006-01-01

    Background Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841

  10. Accurate prediction of secondary metabolite gene clusters in filamentous fungi.

    PubMed

    Andersen, Mikael R; Nielsen, Jakob B; Klitgaard, Andreas; Petersen, Lene M; Zachariasen, Mia; Hansen, Tilde J; Blicher, Lene H; Gotfredsen, Charlotte H; Larsen, Thomas O; Nielsen, Kristian F; Mortensen, Uffe H

    2013-01-02

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent of the biosynthetic clusters for the 58 synthases active in our set of experimental conditions. A comparison with legacy data shows the method to be accurate in 13 of 16 known clusters and nearly accurate for the remaining 3 clusters. Furthermore, we apply a data clustering approach, which identifies cross-chemistry between physically separate gene clusters (superclusters), and validate this both with legacy data and experimentally by prediction and verification of a supercluster consisting of the synthase AN1242 and the prenyltransferase AN11080, as well as identification of the product compound nidulanin A. We have used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom.

  11. Combined lentiviral and RNAi technologies for the delivery and permanent silencing of the hsp25 gene.

    PubMed

    Kaur, Punit; Nagaraja, Ganachari M; Asea, Alexzander

    2011-01-01

    Elevated heat shock protein 27 (Hsp27) expression has been found in a number of tumors, including breast, prostate, gastric, uterine, ovarian, head and neck, and tumors arising from the nervous system and urinary system, and has been determined to be a predictor of poor clinical outcome. Although the mechanism of action of Hsp27 has been well documented, there are currently no available inhibitors of Hsp27 in clinical trials. RNA interference (RNAi) has the potential to offer more specificity and flexibility than traditional drugs to silence gene expression. Not surprisingly, RNAi has become a major focus for biotechnology and pharmaceutical companies, which are now in the early stages of developing RNAi therapeutics, mostly based on short interfering RNA (siRNAs), to target viral infection, cancer, hypercholesterolemia, cardiovascular disease, macular degeneration, and neurodegenerative diseases. However, the critical issues associated with RNAi as a therapeutic are delivery, specificity, and stability of the RNAi reagents. To date, delivery is considered the biggest hurdle, as the introduction of siRNAs systemically into body fluids can result in their degradation, off-target effects, and immune detection. In this chapter, we discuss a method of combined lentiviral and RNAi-based technology for the delivery and permanent silencing of the hsp25 gene.

  12. Gene expression information improves reliability of receptor status in breast cancer patients

    PubMed Central

    Kenn, Michael; Schlangen, Karin; Castillo-Tong, Dan Cacsire; Singer, Christian F.; Cibena, Michael; Koelbl, Heinz; Schreiner, Wolfgang

    2017-01-01

    Immunohistochemical (IHC) determination of receptor status in breast cancer patients is frequently inaccurate. Since it directs the choice of systemic therapy, it is essential to increase its reliability. We increase the validity of IHC receptor expression by additionally considering gene expression (GE) measurements. Crisp therapeutic decisions are based on IHC estimates, even if they are borderline reliable. We further improve decision quality by a responsibility function, defining a critical domain for gene expression. Refined normalization is devised to file any newly diagnosed patient into existing databases. Our approach renders receptor estimates more reliable by identifying patients with questionable receptor status. The approach is also more efficient since the rate of conclusive samples is increased. We have curated and evaluated gene expression data, together with clinical information, from 2880 breast cancer patients. Combining IHC with gene expression information yields a method that is more reliable and also more efficient than common practice up to now. Several types of possibly suboptimal treatment allocations, based on IHC receptor status alone, are enumerated. A ‘therapy allocation check’ identifies patients who are possibly misclassified. Estrogen: false negative 8%, false positive 6%. Progesterone: false negative 14%, false positive 11%. HER2: false negative 2%, false positive 50%. Possible implications are discussed. We propose an ‘expression look-up-plot’, allowing for a significant potential to improve the quality of precision medicine. Methods are developed and exemplified here for breast cancer patients, but they may readily be transferred to diagnostic data relevant for therapeutic decisions in other fields of oncology. PMID:29100391

  13. Two-Way Gene Interaction From Microarray Data Based on Correlation Methods.

    PubMed

    Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh

    2016-06-01

    Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman's rank correlation coefficient and Blomqvist's measure, and compared them with Pearson's correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson's correlation, Spearman's rank correlation, and Blomqvist's coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist's coefficient was not confirmed by visual methods. Some results of the correlation coefficients are not the same with visualization. The reason may be due to the small number of data.
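
    The two steps described in this record (score every gene pair, then keep the pairs whose score passes a threshold) can be sketched as follows. This is a minimal illustration, not the authors' R/Bioconductor code: the gene names, expression values, and the threshold are hypothetical, comparing the absolute value of the score against the threshold is an assumption, and Blomqvist's coefficient is computed here by ignoring points that lie exactly on a median.

```python
from itertools import combinations
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def blomqvist(x, y):
    """Blomqvist's medial coefficient: quadrant agreement around the medians."""
    mx, my = median(x), median(y)
    signs = [(a - mx) * (b - my) for a, b in zip(x, y)]
    concordant = sum(1 for s in signs if s > 0)
    discordant = sum(1 for s in signs if s < 0)
    total = concordant + discordant  # points on a median are ignored
    return (concordant - discordant) / total if total else 0.0


def coexpression_edges(expression, score, threshold):
    """Step 1: score all gene pairs. Step 2: keep pairs with |score| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        s = score(expression[g1], expression[g2])
        if abs(s) >= threshold:
            edges.append((g1, g2, s))
    return edges


# Hypothetical expression vectors (one value per array) for three genes.
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 3, 4, 1, 2],
}

for name, fn in [("pearson", pearson), ("spearman", spearman), ("blomqvist", blomqvist)]:
    print(name, coexpression_edges(expression, fn, threshold=0.9))
```

    With so few observations the three coefficients can select different edges from the same data, which is consistent with the record's remark that small sample size may explain disagreements between methods.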

  14. Walnut (Juglans).

    PubMed

    Leslie, Charles A; Walawage, Sriema L; Uratsu, Sandra L; McGranahan, Gale; Dandekar, Abhaya M

    2015-01-01

    Walnut species are important nut and timber producers in temperate regions of Europe, Asia, South America, and North America. Trees can be impacted by Phytophthora, crown gall, nematodes, Armillaria, and cherry leaf roll virus; nuts can be severely damaged by codling moth, husk fly, and Xanthomonas blight. The long generation time of walnuts and an absence of identified natural resistance for most of these problems suggest biotechnological approaches to crop improvement. Described here is a somatic embryo-based transformation protocol that has been used to successfully insert horticulturally useful traits into walnut. Selection is based on the combined use of the selectable neomycin phosphotransferase (nptII) gene and the scorable uidA gene. Transformed embryos can be germinated or micropropagated and rooted for plant production. The method described has been used to establish field trials of mature trees.

  15. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.

    PubMed

    Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H

    2017-03-01

    For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, the minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, using the support vector machine as a classifier. The method was applied to four microarray datasets, and the performance was assessed by the leave one out cross-validation method. Comparative performance assessment of the proposed method with other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting a fewer number of genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type. Copyright © 2017. Published by Elsevier Inc.

  16. Construction and applications of exon-trapping gene-targeting vectors with a novel strategy for negative selection.

    PubMed

    Saito, Shinta; Ura, Kiyoe; Kodama, Miho; Adachi, Noritaka

    2015-06-30

    Targeted gene modification by homologous recombination provides a powerful tool for studying gene function in cells and animals. In higher eukaryotes, non-homologous integration of targeting vectors occurs several orders of magnitude more frequently than does targeted integration, making the gene-targeting technology highly inefficient. For this reason, negative-selection strategies have been employed to reduce the number of drug-resistant clones associated with non-homologous vector integration, particularly when artificial nucleases to introduce a DNA break at the target site are unavailable or undesirable. As such, an exon-trap strategy using a promoterless drug-resistance marker gene provides an effective way to counterselect non-homologous integrants. However, constructing exon-trapping targeting vectors has been a time-consuming and complicated process. By virtue of highly efficient att-mediated recombination, we successfully developed a simple and rapid method to construct plasmid-based vectors that allow for exon-trapping gene targeting. These exon-trap vectors were useful in obtaining correctly targeted clones in mouse embryonic stem cells and human HT1080 cells. Most importantly, with the use of a conditionally cytotoxic gene, we further developed a novel strategy for negative selection, thereby enhancing the efficiency of counterselection for non-homologous integration of exon-trap vectors. Our methods will greatly facilitate exon-trapping gene-targeting technologies in mammalian cells, particularly when combined with the novel negative selection strategy.

  17. Developing a Novel Gene-Delivery Vector System Using the Recombinant Fusion Protein of Pseudomonas Exotoxin A and Hyperthermophilic Archaeal Histone HPhA

    PubMed Central

    Zhang, Ling; Feng, Yan; Li, Zehong; Wu, GuangMou; Yue, Yuhuan; Li, Gensong; Cao, Yu; Zhu, Ping

    2015-01-01

    Non-viral gene delivery systems have many advantages and great potential for the future of gene therapy. One inherent obstacle of such approaches is the uptake by endocytosis into vesicular compartments. Receptor-mediated gene delivery holds promise to overcome this obstacle. In this study, we developed a receptor-mediated gene delivery system based on a combination of the Pseudomonas exotoxin A (PE), which has a receptor binding and membrane translocation domain, and the hyperthermophilic archaeal histone (HPhA), which has the DNA binding ability. First, we constructed and expressed the rPE-HPhA fusion protein. We then examined the cytotoxicity and the DNA binding ability of rPE-HPhA. We further assessed the efficiency of transfection of the pEGF-C1 plasmid DNA to CHO cells by the rPE-HPhA system, in comparison to the cationic liposome method. The results showed that the transfection efficiency of rPE-HPhA was higher than that of cationic liposomes. In addition, the rPE-HPhA gene delivery system is non-specific to DNA sequence, topology or targeted cell type. Thus, the rPE-HPhA system can be used for delivering genes of interest into mammalian cells and has great potential to be applied for gene therapy. PMID:26556098

  18. Prioritization of candidate disease genes by topological similarity between disease and protein diffusion profiles.

    PubMed

    Zhu, Jie; Qin, Yufang; Liu, Taigang; Wang, Jun; Zheng, Xiaoqi

    2013-01-01

    Identification of gene-phenotype relationships is a fundamental challenge in human health clinic. Based on the observation that genes causing the same or similar phenotypes tend to correlate with each other in the protein-protein interaction network, a lot of network-based approaches were proposed based on different underlying models. A recent comparative study showed that diffusion-based methods achieve the state-of-the-art predictive performance. In this paper, a new diffusion-based method was proposed to prioritize candidate disease genes. Diffusion profile of a disease was defined as the stationary distribution of candidate genes given a random walk with restart where similarities between phenotypes are incorporated. Then, candidate disease genes are prioritized by comparing their diffusion profiles with that of the disease. Finally, the effectiveness of our method was demonstrated through the leave-one-out cross-validation against control genes from artificial linkage intervals and randomly chosen genes. Comparative study showed that our method achieves improved performance compared to some classical diffusion-based methods. To further illustrate our method, we used our algorithm to predict new causal genes of 16 multifactorial diseases including Prostate cancer and Alzheimer's disease, and the top predictions were in good agreement with literature reports. Our study indicates that integration of multiple information sources, especially the phenotype similarity profile data, and introduction of global similarity measure between disease and gene diffusion profiles are helpful for prioritizing candidate disease genes. Programs and data are available upon request.
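
    The stationary distribution of a random walk with restart, which this record uses as a diffusion profile, can be sketched as below. This shows only the classical random walk with restart on an unweighted network; the phenotype-similarity weighting and the profile comparison described in the record are not reproduced. The toy network, the seed set, and the restart probability of 0.3 are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the change is below tol.

    adjacency maps each node to a list of its neighbours (undirected, so every
    edge must be listed in both directions); W spreads a node's probability
    equally over its neighbours. Assumes every node has at least one neighbour.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-protein chain A - B - C - D with a single seed gene A.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
profile = random_walk_with_restart(network, seeds={"A"})

# Rank nodes by how much probability mass reaches them from the seed.
for node in sorted(profile, key=profile.get, reverse=True):
    print(node, round(profile[node], 4))
```

    Nodes closer to the seed receive more probability mass, which is the intuition behind ranking candidate genes by their proximity to known disease genes in the interaction network.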

  19. NCBI prokaryotic genome annotation pipeline.

    PubMed

    Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James

    2016-08-19

    Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  20. Two-Way Gene Interaction From Microarray Data Based on Correlation Methods

    PubMed Central

    Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh

    2016-01-01

    Background Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. Objectives The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. Materials and Methods In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman’s rank correlation coefficient and Blomqvist’s measure, and compared them with Pearson’s correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson’s correlation, Spearman’s rank correlation, and Blomqvist’s coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Results Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist’s coefficient was not confirmed by visual methods. Conclusions Some results of the correlation coefficients are not the same with visualization. The reason may be due to the small number of data. PMID:27621916

  1. An Optimal Mean Based Block Robust Feature Extraction Method to Identify Colorectal Cancer Genes with Integrated Data.

    PubMed

    Liu, Jian; Cheng, Yuhu; Wang, Xuesong; Zhang, Lin; Liu, Hui

    2017-08-17

    It is urgent to diagnose colorectal cancer in the early stage. Some feature genes which are important to colorectal cancer development have been identified. However, for the early stage of colorectal cancer, less is known about the identity of specific cancer genes that are associated with advanced clinical stage. In this paper, we conducted a feature extraction method named Optimal Mean based Block Robust Feature Extraction method (OMBRFE) to identify feature genes associated with advanced colorectal cancer in clinical stage by using the integrated colorectal cancer data. Firstly, based on the optimal mean and L2,1-norm, a novel feature extraction method called Optimal Mean based Robust Feature Extraction method (OMRFE) is proposed to identify feature genes. Then the OMBRFE method which introduces the block ideology into OMRFE method is put forward to process the colorectal cancer integrated data which includes multiple genomic data: copy number alterations, somatic mutations, methylation expression alteration, as well as gene expression changes. Experimental results demonstrate that the OMBRFE is more effective than previous methods in identifying the feature genes. Moreover, genes identified by OMBRFE are verified to be closely associated with advanced colorectal cancer in clinical stage.

  2. A multi-strategy approach to informative gene identification from gene expression data.

    PubMed

    Liu, Ziying; Phan, Sieu; Famili, Fazel; Pan, Youlian; Lenferink, Anne E G; Cantin, Christiane; Collins, Catherine; O'Connor-McCourt, Maureen D

    2010-02-01

    An unsupervised multi-strategy approach has been developed to identify informative genes from high throughput genomic data. Several statistical methods have been used in the field to identify differentially expressed genes. Since different methods generate different lists of genes, it is very challenging to determine the most reliable gene list and the appropriate method. This paper presents a multi-strategy method, in which a combination of several data analysis techniques is applied to a given dataset and a confidence measure is established to select genes from the gene lists generated by these techniques to form the core of our final selection. The remainder of the genes that form the peripheral region are subject to exclusion or inclusion into the final selection. This paper demonstrates this methodology through its application to an in-house cancer genomics dataset and a public dataset. The results indicate that our method provides a more reliable list of genes, which is validated using biological knowledge, biological experiments, and literature search. We further evaluated our multi-strategy method by consolidating two pairs of independent datasets, each pair for the same disease but generated by different labs using different platforms. The results showed that our method has produced far better results.
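
    One simple way to turn several gene lists into a core and a peripheral region is to count how many techniques select each gene. The sketch below uses that vote fraction as the confidence measure, which is an assumption made for illustration; the record does not state how its confidence measure is defined. The technique names, gene identifiers, and the `core_fraction` parameter are hypothetical.

```python
from collections import Counter


def consensus_selection(gene_lists, core_fraction=1.0):
    """Split genes into core and peripheral sets by the fraction of lists selecting them.

    gene_lists maps a technique name to the genes it reports. A gene is "core"
    when at least core_fraction of the techniques report it; every other
    reported gene is "peripheral" and left for later inclusion or exclusion.
    """
    votes = Counter(g for genes in gene_lists.values() for g in set(genes))
    n_lists = len(gene_lists)
    confidence = {g: count / n_lists for g, count in votes.items()}
    core = sorted(g for g, c in confidence.items() if c >= core_fraction)
    peripheral = sorted(g for g, c in confidence.items() if c < core_fraction)
    return confidence, core, peripheral


# Hypothetical output of three differential-expression techniques.
gene_lists = {
    "t_test": ["G1", "G2", "G3"],
    "fold_change": ["G1", "G3", "G4"],
    "rank_product": ["G1", "G3", "G5"],
}

confidence, core, peripheral = consensus_selection(gene_lists)
print("core:", core)
print("peripheral:", peripheral)
```

    Lowering `core_fraction` (for example to 2/3) moves genes supported by a majority of techniques from the peripheral region into the core.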

  3. Genetics pathway-based imaging approaches in Chinese Han population with Alzheimer's disease risk.

    PubMed

    Bai, Feng; Liao, Wei; Yue, Chunxian; Pu, Mengjia; Shi, Yongmei; Yu, Hui; Yuan, Yonggui; Geng, Leiyu; Zhang, Zhijun

    2016-01-01

    The tau hypothesis has been raised with regard to the pathophysiology of Alzheimer's disease (AD). Mild cognitive impairment (MCI) is associated with a high risk for developing AD. However, no study has directly examined the brain topological alterations based on combined effects of tau protein pathway genes in the MCI population. Forty-three patients with MCI and 30 healthy controls, all Chinese Han, underwent resting-state functional magnetic resonance imaging (fMRI), and a tau protein pathway-based imaging approach (7 candidate genes: 17 SNPs) was used to investigate changes in the topological organisation of brain activation associated with MCI. Impaired regional activation is related to tau protein pathway genes (5/7 candidate genes) in patients with MCI and likely in topologically convergent and divergent functional alterations patterns associated with genes, and combined effects of tau protein pathway genes disrupt the topological architecture of cortico-cerebellar loops. The associations between the loops and behaviours further suggest that tau protein pathway genes do play a significant role in non-episodic memory impairment. Tau pathway-based imaging approaches might strengthen the credibility in imaging genetic associations and generate pathway frameworks that might provide powerful new insights into the neural mechanisms that underlie MCI.

  4. Integration of targeted sequencing and NIPT into clinical practice in a Chinese family with maple syrup urine disease.

    PubMed

    You, Yanqin; Sun, Yan; Li, Xuchao; Li, Yali; Wei, Xiaoming; Chen, Fang; Ge, Huijuan; Lan, Zhangzhang; Zhu, Qian; Tang, Ying; Wang, Shujuan; Gao, Ya; Jiang, Fuman; Song, Jiaping; Shi, Quan; Zhu, Xuan; Mu, Feng; Dong, Wei; Gao, Vince; Jiang, Hui; Yi, Xin; Wang, Wei; Gao, Zhiying

    2014-08-01

    This article demonstrates a prominent noninvasive prenatal approach to assist the clinical diagnosis of a single-gene disorder disease, maple syrup urine disease, using targeted sequencing knowledge from the affected family. The method reported here combines novel mutant discovery in known genes by targeted massively parallel sequencing with noninvasive prenatal testing. By applying this new strategy, we successfully revealed novel mutations in the gene BCKDHA (Ex2_4dup and c.392A>G) in this Chinese family and developed a prenatal haplotype-assisted approach to noninvasively detect the genotype of the fetus (transmitted from both parents). This is the first report of integration of targeted sequencing and noninvasive prenatal testing into clinical practice. Our study has demonstrated that this massively parallel sequencing-based strategy can potentially be used for single-gene disorder diagnosis in the future.

  5. Finding FMR1 mosaicism in Fragile X syndrome

    PubMed Central

    Gonçalves, Thaís Fernandez; dos Santos, Jussara Mendonça; Gonçalves, Andressa Pereira; Tassone, Flora; Mendoza-Morales, Guadalupe; Ribeiro, Márcia Gonçalves; Kahn, Evelyn; Boy, Raquel; Pimentel, Márcia Mattos Gonçalves; Santos-Rebouças, Cíntia Barros

    2016-01-01

    OBJECTIVE Almost all patients with Fragile X Syndrome (FXS) exhibit a CGG repeat expansion (full mutation) in the Fragile X Mental Retardation 1 gene (FMR1). Here, we report five unrelated males with FXS harboring a somatic full mutation/deletion mosaicism. METHODS Mutational profiles were only elucidated by using a combination of molecular approaches (CGG-based PCR, Sanger sequencing, MS-MLPA, Southern blot and mPCR). RESULT Four patients exhibited small deletions encompassing the CGG repeats tract and flanking regions, whereas the remaining one had a larger deletion comprising at least exon 1 and part of intron 1 of the FMR1 gene. The presence of a 2–3 base pair microhomology in proximal and distal non-recurrent breakpoints without scars supports the involvement of the microhomology-mediated break-induced replication (MMBIR) mechanism in three small deletions. CONCLUSION Our data highlight the importance of using different research methods to elucidate atypical FXS mutational profiles, which are clinically indistinguishable and may have been underestimated. PMID:26716517

  6. A combination HIV reporter virus system for measuring post-entry event efficiency and viral outcome in primary CD4+ T cell subsets.

    PubMed

    Tilton, Carisa A; Tabler, Caroline O; Lucera, Mark B; Marek, Samantha L; Haqqani, Aiman A; Tilton, John C

    2014-01-01

    Fusion between the viral membrane of human immunodeficiency virus (HIV) and the host cell marks the end of the HIV entry process and the beginning of a series of post-entry events including uncoating, reverse transcription, integration, and viral gene expression. The efficiency of post-entry events can be modulated by cellular factors including viral restriction factors and can lead to several distinct outcomes: productive, latent, or abortive infection. Understanding host and viral proteins impacting post-entry event efficiency and viral outcome is critical for strategies to reduce HIV infectivity and to optimize transduction of HIV-based gene therapy vectors. Here, we report a combination reporter virus system measuring both membrane fusion and viral promoter-driven gene expression. This system enables precise determination of unstimulated primary CD4+ T cell subsets targeted by HIV, the efficiency of post-entry viral events, and viral outcome and is compatible with high-throughput screening and cell-sorting methods. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Reconstructing Genetic Regulatory Networks Using Two-Step Algorithms with the Differential Equation Models of Neural Networks.

    PubMed

    Chen, Chi-Kan

    2017-07-26

    The identification of genetic regulatory networks (GRNs) provides insights into complex cellular processes. A class of recurrent neural networks (RNNs) captures the dynamics of GRN. Algorithms combining the RNN and machine learning schemes were proposed to reconstruct small-scale GRNs using gene expression time series. We present new GRN reconstruction methods with neural networks. The RNN is extended to a class of recurrent multilayer perceptrons (RMLPs) with latent nodes. Our methods contain two steps: the edge rank assignment step and the network construction step. The former assigns ranks to all possible edges by a recursive procedure based on the estimated weights of wires of RNN/RMLP (RE_RNN/RE_RMLP), and the latter constructs a network consisting of top-ranked edges under which the optimized RNN simulates the gene expression time series. The particle swarm optimization (PSO) is applied to optimize the parameters of RNNs and RMLPs in a two-step algorithm. The proposed RE_RNN-RNN and RE_RMLP-RNN algorithms are tested on synthetic and experimental gene expression time series of small GRNs of about 10 genes. The experimental time series are from the studies of yeast cell cycle regulated genes and E. coli DNA repair genes. The unstable estimation of RNN using experimental time series having limited data points can lead to fairly arbitrary predicted GRNs. Our methods incorporate RNN and RMLP into a two-step structure learning procedure. Results show that the RE_RMLP using the RMLP with a suitable number of latent nodes to reduce the parameter dimension often results in more accurate edge ranks than the RE_RNN using the regularized RNN on short simulated time series. Combining by a weighted majority voting rule the networks derived by the RE_RMLP-RNN using different numbers of latent nodes in step one to infer the GRN, the method performs consistently and outperforms published algorithms for GRN reconstruction on most benchmark time series. The framework of two-step algorithms can potentially incorporate different nonlinear differential equation models to reconstruct the GRN.

  8. Validation of Endogenous Internal Real-Time PCR Controls in Renal Tissues

    PubMed Central

    Cui, Xiangqin; Zhou, Juling; Qiu, Jing; Johnson, Martin R.; Mrug, Michal

    2009-01-01

    Background Endogenous internal controls (‘reference’ or ‘housekeeping’ genes) are widely used in real-time PCR (RT-PCR) analyses. Their use relies on the premise of consistently stable expression across studied experimental conditions. Unfortunately, none of these controls fulfills this premise across a wide range of experimental conditions; consequently, none of them can be recommended for universal use. Methods To determine which endogenous RT-PCR controls are suitable for analyses of renal tissues altered by kidney disease, we studied the expression of 16 commonly used ‘reference genes’ in 7 mildly and 7 severely affected whole kidney tissues from a well-characterized cystic kidney disease model. Expression levels of these 16 genes, determined by TaqMan® RT-PCR analyses and Affymetrix GeneChip® arrays, were normalized and tested for overall variance and equivalence of the means. Results Both statistical approaches and both TaqMan- and GeneChip-based methods converged on 3 out of the 4 top-ranked genes (Ppia, Gapdh and Pgk1) that had the most constant expression levels across the studied phenotypes. Conclusion A combination of the top-ranked genes will provide a suitable endogenous internal control for similar studies of kidney tissues across a wide range of disease severity. PMID:19729889
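
    The screening idea in this record (prefer reference genes whose expression varies little overall and whose mean does not shift between disease groups) can be sketched as below. This is an illustrative ranking only, not the statistical tests used by the authors: the gene names, the normalized expression values, and the choice to sort by standard deviation first and by the spread of group means second are all assumptions.

```python
from statistics import mean, stdev


def rank_reference_genes(values_by_gene, groups):
    """Rank candidate reference genes from most to least stable.

    values_by_gene maps a gene to one normalized expression value per sample;
    groups gives the phenotype label of each sample in the same order.
    Returns (gene, overall standard deviation, spread of group means) tuples.
    """
    ranking = []
    for gene, values in values_by_gene.items():
        by_group = {}
        for value, label in zip(values, groups):
            by_group.setdefault(label, []).append(value)
        group_means = [mean(v) for v in by_group.values()]
        spread = max(group_means) - min(group_means)
        ranking.append((gene, stdev(values), spread))
    return sorted(ranking, key=lambda row: (row[1], row[2]))


# Hypothetical normalized expression values for three candidate genes.
groups = ["mild", "mild", "mild", "severe", "severe", "severe"]
values_by_gene = {
    "refA": [20.0, 20.1, 19.9, 20.0, 20.1, 19.9],
    "refB": [22.0, 22.5, 21.5, 24.0, 24.5, 23.5],
    "refC": [18.0, 18.4, 17.6, 18.2, 18.6, 17.8],
}

for gene, sd, spread in rank_reference_genes(values_by_gene, groups):
    print(f"{gene}: sd={sd:.3f}, group-mean spread={spread:.3f}")
```

    In this toy data the hypothetical gene refB would be a poor control because its mean shifts with disease severity, even though its within-group variation is modest.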

  9. Assessing differential expression in two-color microarrays: a resampling-based empirical Bayes approach.

    PubMed

    Li, Dongmei; Le Pape, Marc A; Parikh, Nisha I; Chen, Will X; Dye, Timothy D

    2013-01-01

    Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the Significance Analysis of Microarrays method fold change criteria are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control between the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.
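
    The resampling ingredient mentioned in this record can be illustrated with an exact two-sample permutation test for a single gene. This sketch covers only the permutation idea; the empirical Bayes moderation and the false discovery rate control of the proposed method are not reproduced. The expression values are hypothetical, and the absolute difference in group means is an assumed test statistic.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    Every way of relabelling the pooled values into groups of the original
    sizes is enumerated, so this is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance so relabellings tied with the observed value count.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical expression values of one gene in two conditions.
control = [1, 2, 3, 4]
treated = [11, 12, 13, 14]
print(permutation_p_value(control, treated))
```

    Because no distributional form is assumed, the p-value depends only on how extreme the observed labelling is among all relabellings, which is why permutation-based approaches tolerate non-normal data.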

  10. Rapid and simple method by combining FTA™ card DNA extraction with two set multiplex PCR for simultaneous detection of non-O157 Shiga toxin-producing Escherichia coli strains and virulence genes in food samples.

    PubMed

    Kim, S A; Park, S H; Lee, S I; Ricke, S C

    2017-12-01

    The aim of this research was to optimize two multiplex polymerase chain reaction (PCR) assays that could simultaneously detect six non-O157 Shiga toxin-producing Escherichia coli (STEC) as well as the three virulence genes. We also investigated the potential of combining the FTA™ card-based DNA extraction with the multiplex PCR assays. Two multiplex PCR assays were optimized using six primer pairs for each non-O157 STEC serogroup and three primer pairs for virulence genes respectively. Each STEC strain specific primer pair only amplified 155, 238, 321, 438, 587 and 750 bp product for O26, O45, O103, O111, O121 and O145 respectively. Three virulence genes were successfully multiplexed: 375 bp for eae, 655 bp for stx1 and 477 bp for stx2. When two multiplex PCR assays were validated with ground beef samples, distinctive bands were also successfully produced. Since the two multiplex PCR examined here can be conducted under the same PCR conditions, the six non-O157 STEC and their virulence genes could be concurrently detected with one run on the thermocycler. In addition, all bands clearly appeared to be amplified by FTA card DNA extraction in the multiplex PCR assay from the ground beef sample, suggesting that an FTA card could be a viable sampling approach for rapid and simple DNA extraction to reduce time and labour and therefore may have practical use for the food industry. Two multiplex polymerase chain reaction (PCR) assays were optimized for discrimination of six non-O157 Shiga toxin-producing Escherichia coli (STEC) and identification of their major virulence genes within a single reaction, simultaneously. This study also determined the successful ability of the FTA™ card as an alternative to commercial DNA extraction method for conducting multiplex STEC PCR assays. 
The FTA™ card combined with multiplex PCR holds promise for the food industry by offering a simple and rapid DNA sample method for reducing time, cost and labour for detection of STEC in food and environmental samples. © 2017 The Society for Applied Microbiology.
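
The amplicon sizes reported above can be read as a lookup table from band size to target. The sketch below shows one way to call targets from observed gel bands; the sizes come from the abstract, while the function name and the 15 bp matching tolerance are hypothetical and would need to be set from the gel resolution actually used.

```python
# Expected amplicon sizes (bp) as reported in the abstract.
SEROGROUP_AMPLICONS = {
    155: "O26", 238: "O45", 321: "O103",
    438: "O111", 587: "O121", 750: "O145",
}
VIRULENCE_AMPLICONS = {375: "eae", 655: "stx1", 477: "stx2"}


def call_bands(observed_bp, expected, tolerance_bp=15):
    """Map each observed band size to the nearest expected target.

    Returns None for a band that is farther than tolerance_bp from every
    expected amplicon. tolerance_bp is an illustrative assumption.
    """
    calls = []
    for band in observed_bp:
        size, target = min(expected.items(), key=lambda kv: abs(kv[0] - band))
        calls.append(target if abs(size - band) <= tolerance_bp else None)
    return calls
```

Keeping the serogroup and virulence tables separate mirrors the two multiplex assays described in the abstract, so each reaction is interpreted against its own set of expected products.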

  11. Computational functional genomics-based approaches in analgesic drug discovery and repurposing.

    PubMed

    Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn

    2018-06-01

    Persistent pain is a major healthcare problem affecting a fifth of adults worldwide with still limited treatment options. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far been performed using knowledge discovery in gene function and drug target-related databases; next-generation sequencing; and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field including a demonstration of the workflow using a novel R library 'dbtORA'.

  12. A phylotranscriptomic backbone of the orb-weaving spider family Araneidae (Arachnida, Araneae) supported by multiple methodological approaches.

    PubMed

    Kallal, Robert J; Fernández, Rosa; Giribet, Gonzalo; Hormiga, Gustavo

    2018-04-07

    The orb-weaving spider family Araneidae is extremely diverse (>3100 spp.) and its members can be charismatic terrestrial arthropods, many of them recognizable by their iconic orbicular snare web, such as the common garden spiders. Despite considerable effort to better understand their backbone relationships based on multiple sources of data (morphological, behavioral and molecular), pervasive low support remains in recent studies. In addition, no overarching phylogeny of araneids is available to date, hampering further comparative work. In this study, we analyze the transcriptomes of 33 taxa, including 19 araneids - 12 of them new to this study - representing most of the core family lineages, to examine the relationships within the family using genomic-scale datasets resulting from various methodological treatments, namely ortholog selection and gene occupancy as a measure of matrix completion. Six matrices were constructed to assess these effects by varying orthology inference method and gene occupancy threshold. Orthology methods used are the benchmarking tool BUSCO and the tree-based method UPhO; three gene occupancy thresholds (45%, 65%, 85%) were used to assess the effect of missing data. Gene tree and species tree-based methods (including multi-species coalescent and concatenation approaches, as well as maximum likelihood and Bayesian inference) were used totalling 17 analytical treatments. The monophyly of Araneidae and the placement of core araneid lineages were supported, together with some previously unsound backbone divergences; these include high support for Zygiellinae as the earliest diverging subfamily (followed by Nephilinae), the placement of Gasteracanthinae as sister group to Cyclosa and close relatives, and close relationships between the Araneus + Neoscona clade and Cyrtophorinae + Argiopinae clade. Incongruences were relegated to short branches in the clade comprising Cyclosa and its close relatives. 
We found congruence between most of the completed analyses, with minimal topological effects from occupancy/missing data and orthology assessment. The resulting number of genes by certain combinations of orthology and occupancy thresholds being analyzed had the greatest effect on the resulting trees, with anomalous outcomes recovered from analysis of lower numbers of genes. Copyright © 2018 Elsevier Inc. All rights reserved.
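
The six matrices described above follow from crossing two orthology methods with three occupancy thresholds. The sketch below enumerates that design and applies an occupancy filter to a toy gene set. It assumes occupancy means the fraction of taxa in which a gene is present and that a gene is kept when it meets or exceeds the threshold; the abstract does not state either detail, and the function names are hypothetical.

```python
from itertools import product

ORTHOLOGY_METHODS = ("BUSCO", "UPhO")
OCCUPANCY_THRESHOLDS = (0.45, 0.65, 0.85)


def matrix_designs():
    """Enumerate every (orthology method, occupancy threshold) combination."""
    return list(product(ORTHOLOGY_METHODS, OCCUPANCY_THRESHOLDS))


def filter_by_occupancy(gene_to_taxa, n_taxa, threshold):
    """Keep genes present in at least `threshold` of the taxa (assumed rule)."""
    return sorted(
        gene for gene, taxa in gene_to_taxa.items()
        if len(taxa) / n_taxa >= threshold
    )
```

Raising the threshold trades the number of retained genes against matrix completeness, which is the trade-off the abstract reports as having the largest effect on the resulting trees.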

  13. The best of both worlds: A combined approach for analyzing microalgal diversity via metabarcoding and morphology-based methods

    PubMed Central

    Kahlert, Maria; Fink, Patrick

    2017-01-01

    An increasing number of studies use next generation sequencing (NGS) to analyze complex communities, but is the method sensitive enough when it comes to identification and quantification of species? We compared NGS with morphology-based identification methods in an analysis of microalgal (periphyton) communities. We conducted a mesocosm experiment in which we allowed two benthic grazer species to feed upon benthic biofilms, which resulted in altered periphyton communities. Morphology-based identification and 454 (Roche) pyrosequencing of the V4 region in the small ribosomal unit (18S) rDNA gene were used to investigate the community change caused by grazing. Both the NGS-based data and the morphology-based method detected a marked shift in the biofilm composition, though the two methods varied strongly in their abilities to detect and quantify specific taxa, and neither method was able to detect all species in the biofilms. For quantitative analysis, we therefore recommend using both metabarcoding and microscopic identification when assessing the community composition of eukaryotic microorganisms. PMID:28234997

  14. Combination of DNA-based and conventional methods to detect human leukocyte antigen polymorphism and its use for paternity testing.

    PubMed

    Keresztury, László; Rajczy, Katalin; Lászik, András; Gyódi, Eva; Pénzes, Mária; Falus, András; Petrányi, Győző G

    2002-03-01

    In cases of disputed paternity, the scientific goal is to promote either the exclusion of a falsely accused man or the affiliation of the alleged father. Until now, in addition to anthropologic characteristics, the determination of genetic markers included human leukocyte antigen gene variants; erythrocyte antigens and serum proteins were also used for that purpose. Recombinant DNA techniques provided a new set of highly variable genetic markers based on DNA nucleotide sequence polymorphism. From the practical standpoint, the application of these techniques to paternity testing provides greater versatility than do conventional genetic marker systems. The use of methods to detect the polymorphism of human leukocyte antigen loci significantly increases the chance of validation of ambiguous results in paternity testing. The outcome of 2384 paternity cases investigated by serologic and/or DNA-based human leukocyte antigen typing was statistically analyzed. Different cases solved by DNA typing are presented involving cases with one or two accused men, exclusions and nonexclusions, and tests of the paternity of a deceased man. The results provide evidence for the advantage of the combined application of various techniques in forensic diagnostics and emphasize the outstanding possibilities of DNA-based assays. Representative examples demonstrate the strength of combined techniques in paternity testing.

  15. Therapeutic effects of Euphorbia Pekinensis and Glycyrrhiza glabra on Hepatocellular Carcinoma Ascites Partially Via Regulating the Frk-Arhgdib-Inpp5d-Avpr2-Aqp4 Signal Axis

    PubMed Central

    Zhang, Yanqiong; Yan, Chen; Li, Yuting; Mao, Xia; Tao, Weiwei; Tang, Yuping; Lin, Ya; Guo, Qiuyan; Duan, Jingao; Lin, Na

    2017-01-01

    To clarify the unknown rationale for the herbal compatibility of Euphorbia Pekinensis (DJ) and Glycyrrhiza glabra (GC) acting on hepatocellular carcinoma (HCC) ascites, peritoneum transcriptomic profiling of 15 subjects, including normal control (Con), HCC ascites mouse model (Mod), DJ-alone, DJ/GC-synergy and DJ/GC-antagonism treatment groups, was performed on the OneArray platform, followed by screening for differentially expressed genes (DEGs). DEGs between the Mod and Con groups were considered HCC ascites-related genes, and those between the different drug treatment groups and the Mod group were identified as DJ/GC-combination-related genes. Then, an interaction network of HCC ascites-related genes, DJ/GC combination-related genes and known therapeutic target genes for ascites was constructed. Based on the nodes' degree, closeness, betweenness and k-coreness, the Frk-Arhgdib-Inpp5d-Avpr2-Aqp4 axis, which has high network topological importance, was demonstrated to be a candidate target of the DJ/GC combination acting on HCC ascites. Importantly, both qPCR and western blot analyses verified these regulatory effects in HCC ascites mice in vivo and in M-1 collecting duct cells in vitro. Collectively, different combination designs of DJ and GC may lead to synergistic or antagonistic effects on HCC ascites partially via regulating the Frk-Arhgdib-Inpp5d-Avpr2-Aqp4 axis, implying that global gene expression profiling combined with network analysis can offer an effective way to understand pharmacological mechanisms of traditional Chinese medicine prescriptions. PMID:28165501

  16. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    PubMed Central

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. 
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
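
The adjusted Rand index used for evaluation above compares a clustering against known classes while correcting for chance agreement. The following is a minimal standard-library sketch of the usual pair-counting formula; the function name and the handling of degenerate inputs are choices made here, not details from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as perfect agreement
    # Pairs of items placed together in both, in the truth, and in the prediction.
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = rows * cols / total_pairs
    maximum = (rows + cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (both - expected) / (maximum - expected)
```

The index is 1 for identical partitions regardless of how clusters are named, about 0 for random assignments, and can be negative when agreement is worse than chance.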

  17. Effective delivery of large genes to the retina by dual AAV vectors

    PubMed Central

    Trapani, Ivana; Colella, Pasqualina; Sommella, Andrea; Iodice, Carolina; Cesi, Giulia; de Simone, Sonia; Marrocco, Elena; Rossi, Settimio; Giunti, Massimo; Palfi, Arpad; Farrar, Gwyneth J; Polishchuk, Roman; Auricchio, Alberto

    2014-01-01

    Retinal gene therapy with adeno-associated viral (AAV) vectors is safe and effective in humans. However, AAV's limited cargo capacity prevents its application to therapies of inherited retinal diseases due to mutations of genes over 5 kb, like Stargardt's disease (STGD) and Usher syndrome type IB (USH1B). Previous methods based on ‘forced’ packaging of large genes into AAV capsids may not be easily translated to the clinic due to the generation of genomes of heterogeneous size, which raise safety concerns. Taking advantage of AAV's ability to concatemerize, we generated dual AAV vectors which reconstitute a large gene by either splicing (trans-splicing), homologous recombination (overlapping), or a combination of the two (hybrid). We found that dual trans-splicing and hybrid vectors efficiently transduce mouse and pig photoreceptors to levels that, albeit lower than those achieved with a single AAV, resulted in significant improvement of the retinal phenotype of mouse models of STGD and USH1B. Thus, dual AAV trans-splicing or hybrid vectors are an attractive strategy for gene therapy of retinal diseases that require delivery of large genes. PMID:24150896

  18. The development of a cisgenic apple plant.

    PubMed

    Vanblaere, Thalia; Szankowski, Iris; Schaart, Jan; Schouten, Henk; Flachowsky, Henryk; Broggini, Giovanni A L; Gessler, Cesare

    2011-07-20

    Cisgenesis represents a step toward a new generation of GM crops. The lack of selectable genes (e.g. antibiotic or herbicide resistance) in the final product and the fact that the inserted gene(s) derive from organisms sexually compatible with the target crop should raise fewer environmental concerns and increase consumer acceptance. Here we report the generation of a cisgenic apple plant by inserting the endogenous apple scab resistance gene HcrVf2 under the control of its own regulatory sequences into the scab susceptible apple cultivar Gala. A previously developed method based on Agrobacterium-mediated transformation combined with a positive and negative selection system and a chemically inducible recombination machinery allowed the generation of apple cv. Gala carrying the scab resistance gene HcrVf2 under its native regulatory sequences and no foreign genes. Three cisgenic lines were chosen for detailed investigation and were shown to carry a single T-DNA insertion and express the target gene HcrVf2. This is the first report of the generation of a true cisgenic plant. Copyright © 2011 Elsevier B.V. All rights reserved.

  19. A multicolor panel of TALE-KRAB based transcriptional repressor vectors enabling knockdown of multiple gene targets

    PubMed Central

    Zhang, Zhonghui; Wu, Elise; Qian, Zhijian; Wu, Wen-Shu

    2014-01-01

    Stable and efficient knockdown of multiple gene targets is highly desirable for dissection of molecular pathways. Because it allows sequence-specific DNA binding, transcription activator-like effector (TALE) offers a new genetic perturbation technique that allows for gene-specific repression. Here, we constructed a multicolor lentiviral TALE-Kruppel-associated box (KRAB) expression vector platform that enables knockdown of multiple gene targets. This platform is fully compatible with the Golden Gate TALEN and TAL Effector Kit 2.0, a widely used and efficient method for TALE assembly. We showed that this multicolor TALE-KRAB vector system when combined together with bone marrow transplantation could quickly knock down c-kit and PU.1 genes in hematopoietic stem and progenitor cells of recipient mice. Furthermore, our data demonstrated that this platform simultaneously knocked down both c-Kit and PU.1 genes in the same primary cell populations. Together, our results suggest that this multicolor TALE-KRAB vector platform is a promising and versatile tool for knockdown of multiple gene targets and could greatly facilitate dissection of molecular pathways. PMID:25475013

  20. A multicolor panel of TALE-KRAB based transcriptional repressor vectors enabling knockdown of multiple gene targets.

    PubMed

    Zhang, Zhonghui; Wu, Elise; Qian, Zhijian; Wu, Wen-Shu

    2014-12-05

    Stable and efficient knockdown of multiple gene targets is highly desirable for dissection of molecular pathways. Because it allows sequence-specific DNA binding, transcription activator-like effector (TALE) offers a new genetic perturbation technique that allows for gene-specific repression. Here, we constructed a multicolor lentiviral TALE-Kruppel-associated box (KRAB) expression vector platform that enables knockdown of multiple gene targets. This platform is fully compatible with the Golden Gate TALEN and TAL Effector Kit 2.0, a widely used and efficient method for TALE assembly. We showed that this multicolor TALE-KRAB vector system when combined together with bone marrow transplantation could quickly knock down c-kit and PU.1 genes in hematopoietic stem and progenitor cells of recipient mice. Furthermore, our data demonstrated that this platform simultaneously knocked down both c-Kit and PU.1 genes in the same primary cell populations. Together, our results suggest that this multicolor TALE-KRAB vector platform is a promising and versatile tool for knockdown of multiple gene targets and could greatly facilitate dissection of molecular pathways.

  1. Blood transcriptomic comparison of individuals with and without autism spectrum disorder: A combined-samples mega-analysis.

    PubMed

    Tylee, Daniel S; Hess, Jonathan L; Quinn, Thomas P; Barve, Rahul; Huang, Hailiang; Zhang-James, Yanli; Chang, Jeffrey; Stamova, Boryana S; Sharp, Frank R; Hertz-Picciotto, Irva; Faraone, Stephen V; Kong, Sek Won; Glatt, Stephen J

    2017-04-01

    Blood-based microarray studies comparing individuals affected with autism spectrum disorder (ASD) and typically developing individuals help characterize differences in circulating immune cell functions and offer potential biomarker signal. We sought to combine the subject-level data from previously published studies by mega-analysis to increase the statistical power. We identified studies that compared ex vivo blood or lymphocytes from ASD-affected individuals and unrelated comparison subjects using Affymetrix or Illumina array platforms. Raw microarray data and clinical meta-data were obtained from seven studies, totaling 626 affected and 447 comparison subjects. Microarray data were processed using uniform methods. Covariate-controlled mixed-effect linear models were used to identify gene transcripts and co-expression network modules that were significantly associated with diagnostic status. Permutation-based gene-set analysis was used to identify functionally related sets of genes that were over- and under-expressed among ASD samples. Our results were consistent with diminished interferon-, EGF-, PDGF-, PI3K-AKT-mTOR-, and RAS-MAPK-signaling cascades, and increased ribosomal translation and NK-cell related activity in ASD. We explored evidence for sex-differences in the ASD-related transcriptomic signature. We also demonstrated that machine-learning classifiers using blood transcriptome data perform with moderate accuracy when data are combined across studies. Comparing our results with those from blood-based studies of protein biomarkers (e.g., cytokines and trophic factors), we propose that ASD may feature decoupling between certain circulating signaling proteins (higher in ASD samples) and the transcriptional cascades which they typically elicit within circulating immune cells (lower in ASD samples). These findings provide insight into ASD-related transcriptional differences in circulating immune cells. © 2016 Wiley Periodicals, Inc. 

  2. Blood Transcriptomic Comparison of Individuals with and without Autism Spectrum Disorder: A Combined-Samples Mega-Analysis

    PubMed Central

    Tylee, Daniel S.; Hess, Jonathan L.; Quinn, Thomas P.; Barve, Rahul; Huang, Hailiang; Zhang-James, Yanli; Chang, Jeffrey; Stamova, Boryana S.; Sharp, Frank R.; Hertz-Picciotto, Irva; Faraone, Stephen V.; Kong, Sek Won; Glatt, Stephen J.

    2017-01-01

    Blood-based microarray studies comparing individuals affected with autism spectrum disorder (ASD) and typically developing individuals help characterize differences in circulating immune cell functions and offer potential biomarker signal. We sought to combine the subject-level data from previously published studies by mega-analysis to increase the statistical power. We identified studies that compared ex-vivo blood or lymphocytes from ASD-affected individuals and unrelated comparison subjects using Affymetrix or Illumina array platforms. Raw microarray data and clinical meta-data were obtained from seven studies, totaling 626 affected and 447 comparison subjects. Microarray data were processed using uniform methods. Covariate-controlled mixed-effect linear models were used to identify gene transcripts and co-expression network modules that were significantly associated with diagnostic status. Permutation-based gene-set analysis was used to identify functionally related sets of genes that were over- and under-expressed among ASD samples. Our results were consistent with diminished interferon-, EGF-, PDGF-, PI3K-AKT-mTOR-, and RAS-MAPK-signaling cascades, and increased ribosomal translation and NK-cell related activity in ASD. We explored evidence for sex-differences in the ASD-related transcriptomic signature. We also demonstrated that machine-learning classifiers using blood transcriptome data perform with moderate accuracy when data are combined across studies. Comparing our results with those from blood-based studies of protein biomarkers (e.g., cytokines and trophic factors), we propose that ASD may feature decoupling between certain circulating signaling proteins (higher in ASD samples) and the transcriptional cascades which they typically elicit within circulating immune cells (lower in ASD samples). These findings provide insight into ASD-related transcriptional differences in circulating immune cells. PMID:27862943

  3. A network medicine approach to quantify distance between hereditary disease modules on the interactome

    NASA Astrophysics Data System (ADS)

    Caniza, Horacio; Romero, Alfonso E.; Paccanaro, Alberto

    2015-12-01

    We introduce a MeSH-based method that accurately quantifies similarity between heritable diseases at molecular level. This method effectively brings together the existing information about diseases that is scattered across the vast corpus of biomedical literature. We prove that sets of MeSH terms provide a highly descriptive representation of heritable disease and that the structure of MeSH provides a natural way of combining individual MeSH vocabularies. We show that our measure can be used effectively in the prediction of candidate disease genes. We developed a web application to query more than 28.5 million relationships between 7,574 hereditary diseases (96% of OMIM) based on our similarity measure.

  4. Selecting and validating reference genes for quantitative real-time PCR in Plutella xylostella (L.).

    PubMed

    You, Yanchun; Xie, Miao; Vasseur, Liette; You, Minsheng

    2018-05-01

    Gene expression analysis provides important clues regarding gene functions, and quantitative real-time PCR (qRT-PCR) is a widely used method in gene expression studies. Reference genes are essential for normalizing and accurately assessing gene expression. In the present study, 16 candidate reference genes (ACTB, CyPA, EF1-α, GAPDH, HSP90, NDPk, RPL13a, RPL18, RPL19, RPL32, RPL4, RPL8, RPS13, RPS4, α-TUB, and β-TUB) from Plutella xylostella were selected to evaluate gene expression stability across different experimental conditions using five statistical algorithms (geNorm, NormFinder, Delta Ct, BestKeeper, and RefFinder). The results suggest that different reference genes or combinations of reference genes are suitable for normalization in gene expression studies of P. xylostella according to the different developmental stages, strains, tissues, and insecticide treatments. Based on the given experimental sets, the most stable reference genes were RPS4 across different developmental stages, RPL8 across different strains and tissues, and EF1-α across different insecticide treatments. A comprehensive and systematic assessment of potential reference genes for gene expression normalization is essential for post-genomic functional research in P. xylostella, a notorious pest with worldwide distribution and a high capacity to adapt and develop resistance to insecticides.

  5. A Simple and Efficient Method for Assembling TALE Protein Based on Plasmid Library

    PubMed Central

    Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying

    2013-01-01

    The DNA-binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were assembled with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate. PMID:23840477
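
The figure of 256 tetramers follows from four possible bases at each of four positions (4^4 = 256). The sketch below enumerates that library and splits a target sequence into consecutive tetramers. The tiling rule (non-overlapping blocks of four, target length a multiple of four) and the function names are assumptions for illustration; the abstract does not describe how targets are mapped onto library members.

```python
from itertools import product

BASES = "ACGT"


def all_tetramers():
    """Every ordered combination of four bases: 4**4 = 256 entries."""
    return ["".join(combo) for combo in product(BASES, repeat=4)]


def split_into_tetramers(target):
    """Split a DNA target into consecutive, non-overlapping tetramers.

    Hypothetical tiling rule: the target length must be a multiple of four.
    """
    target = target.upper()
    if len(target) % 4 != 0:
        raise ValueError("target length must be a multiple of 4")
    if set(target) - set(BASES):
        raise ValueError("target must contain only A, C, G, T")
    return [target[i:i + 4] for i in range(0, len(target), 4)]
```

Because the library is exhaustive, every tetramer produced by the split is guaranteed to be present in it, which is what lets a fixed plasmid library cover arbitrary targets.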

  6. A simple and efficient method for assembling TALE protein based on plasmid library.

    PubMed

    Zhang, Zhiqiang; Li, Duo; Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying

    2013-01-01

    The DNA-binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were assembled with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate.

  7. Combining eicosapentaenoic acid, decosahexaenoic acid and arachidonic acid, using a fully crossed design, affect gene expression and eicosanoid secretion in salmon head kidney cells in vitro.

    PubMed

    Holen, Elisabeth; He, Juyun; Espe, Marit; Chen, Liqiou; Araujo, Pedro

    2015-08-01

    Future feeds for farmed fish are based on untraditional feed ingredients, which will change nutrient profiles compared to traditional feed based on marine ingredients. To understand the impact of oils from different sources on fish health, n-6 and n-3 polyunsaturated fatty acids (PUFAs) were added to salmon head kidney cells, in a fully crossed design, to monitor their individual and combined effects on gene expression. Exposing salmon head kidney cells to single fatty acids, arachidonic acid (AA) or docosahexaenoic acid (DHA), resulted in down-regulation of cell signaling pathway genes and specific fatty acid metabolism genes as well as reduced prostaglandin E2 (PGE2) secretion. Eicosapentaenoic acid (EPA) had no impact on gene transcription in this study, but reduced the cell secretion of PGE2. The combined effect of AA + EPA resulted in up-regulation of eicosanoid pathway genes and the pro-inflammatory cytokine, tumor necrosis factor alpha (TNF-α), Bclx (an inducer of apoptosis) and fatty acid translocase (CD36) as well as increased cell secretion of PGE2 into the media. Adding single fatty acids to salmon head kidney cells decreased inflammation markers in this model. The combination AA + EPA acted differently than the rest of the fatty acid combinations by increasing the inflammation markers in these cells. The concentration of fatty acid used in this experiment did not induce any lipid peroxidation responses. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Combined targeting of lentiviral vectors and positioning of transduced cells by magnetic nanoparticles.

    PubMed

    Hofmann, Andreas; Wenzel, Daniela; Becher, Ulrich M; Freitag, Daniel F; Klein, Alexandra M; Eberbeck, Dietmar; Schulte, Maike; Zimmermann, Katrin; Bergemann, Christian; Gleich, Bernhard; Roell, Wilhelm; Weyh, Thomas; Trahms, Lutz; Nickenig, Georg; Fleischmann, Bernd K; Pfeifer, Alexander

    2009-01-06

    Targeting of viral vectors is a major challenge for in vivo gene delivery, especially after intravascular application. In addition, targeting of the endothelium itself would be of importance for gene-based therapies of vascular disease. Here, we used magnetic nanoparticles (MNPs) to combine cell transduction and positioning in the vascular system under clinically relevant, nonpermissive conditions, including hydrodynamic forces and hypothermia. The use of MNPs enhanced transduction efficiency of endothelial cells and enabled direct endothelial targeting of lentiviral vectors (LVs) by magnetic force, even in perfused vessels. In addition, application of external magnetic fields to mice significantly changed LV/MNP biodistribution in vivo. LV/MNP-transduced cells exhibited superparamagnetic behavior as measured by magnetorelaxometry, and they were efficiently retained by magnetic fields. The magnetic interactions were strong enough to position MNP-containing endothelial cells at the intima of vessels under physiological flow conditions. Importantly, magnetic positioning of MNP-labeled cells was also achieved in vivo in an injury model of the mouse carotid artery. Intravascular gene targeting can be combined with positioning of the transduced cells via nanomagnetic particles, thereby combining gene- and cell-based therapies.

  9. Molecular mapping and breeding with microsatellite markers.

    PubMed

    Lightfoot, David A; Iqbal, Muhammad J

    2013-01-01

    In genetics databases for crop plant species across the world, there are thousands of mapped loci that underlie quantitative traits, oligogenic traits, and simple traits recognized by association mapping in populations. The number of loci will increase as new phenotypes are measured in more diverse genotypes and genetic maps based on saturating numbers of markers are developed. A period of locus reevaluation will decrease the number of important loci as those underlying mega-environmental effects are recognized. A second wave of reevaluation of loci will follow from developmental series analysis, especially for harvest traits like seed yield and composition. Breeding methods to properly use the accurate maps of QTL are being developed. New methods to map, fine map, and isolate the genes underlying the loci will be critical to future advances in crop biotechnology. Microsatellite markers are the most useful tool for breeders. They are codominant, abundant in all genomes, highly polymorphic so useful in many populations, and both economical and technically easy to use. The selective genotyping approaches, including genotype ranking (indexing) based on partial phenotype data combined with favorable allele data and bulked segregation event (segregant) analysis (BSA), will be increasingly important uses for microsatellites. Examples of the methods for developing and using microsatellites derived from genomic sequences are presented for monogenic, oligogenic, and polygenic traits. Examples of successful mapping, fine mapping, and gene isolation are given. When combined with high-throughput methods for genotyping and a genome sequence, the use of association mapping with microsatellite markers will provide critical advances in the analysis of crop traits.

  10. Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification

    PubMed Central

    2012-01-01

    Background Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, a HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network. PMID:22830977
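
    The two aggregation steps named in this abstract, ranking genes by how often they occur in the selected subsets and combining individual classifiers by majority vote, can be sketched as follows. This is an illustration only: the search that produces the subsets (HBSA itself) is not reproduced, and the gene names and labels are hypothetical.

```python
from collections import Counter

def rank_genes_by_frequency(gene_subsets):
    """Rank genes by the number of selected subsets they appear in."""
    counts = Counter(g for subset in gene_subsets for g in set(subset))
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def majority_vote(predictions):
    """Return the most common label among individual classifier outputs.

    With an exact tie, the label that was seen first wins.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical subsets, standing in for the output of the search step.
subsets = [["TP53", "MYC"], ["TP53", "EGFR"], ["TP53", "MYC", "KRAS"]]
ranking = rank_genes_by_frequency(subsets)
label = majority_vote(["tumor", "normal", "tumor"])
```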

  11. [Polymorphism of POU1F1 gene and PRL gene and their combined effects on milk performance traits in Chinese Holstein cattle].

    PubMed

    Jia, Xiang-Jie; Wang, Chang-Fa; Yang, Gui-Wen; Huang, Jin-Ming; Li, Qiu-Ling; Zhong, Ji-Feng

    2011-12-01

    Three novel SNPs were found by DNA sequencing, PCR-RFLP and CRS-PCR methods were used for genotyping in 979 Chinese Holstein cattle. One SNP, G1178C, was identified in exon 2 of POU1F1 gene. Two novel SNPs, A906G and A1134G, were identified in 5'-flanking regulatory region (5'-UTR) of PRL gene. The association between polymorphisms of the two genes and milk performance traits were analyzed with PROC GLM of SAS. The results showed that GC genotype at 1178 locus of POU1F1 gene was advantageous for milk yield, milk protein yield, and milk fat yield. AG genotype at 906 locus was advantageous for milk yield. There was no significant difference between 1134 locus and milk performance traits of 5'-UTR of PRL gene. Analysis of genotype combination effect on milk production traits showed that the effect of combined genotype was not simple sum of single genotypes and the effects of gene pyramiding seemed to be more important in molecular breeding.

  12. Novel RNA hybridization method for the in situ detection of ETV1, ETV4, and ETV5 gene fusions in prostate cancer.

    PubMed

    Kunju, Lakshmi P; Carskadon, Shannon; Siddiqui, Javed; Tomlins, Scott A; Chinnaiyan, Arul M; Palanisamy, Nallasivam

    2014-09-01

    The genetic basis of 50% to 60% of prostate cancer (PCa) is attributable to rearrangements in E26 transformation-specific (ETS) (ERG, ETV1, ETV4, and ETV5), BRAF, and RAF1 genes and overexpression of SPINK1. The development and validation of reliable detection methods are warranted to classify various molecular subtypes of PCa for diagnostic and prognostic purposes. ETS gene rearrangements are typically detected by fluorescence in situ hybridization and reverse-transcription polymerase chain reaction methods. Recently, monoclonal antibodies against ERG have been developed that detect the truncated ERG protein in immunohistochemical assays where staining levels are strongly correlated with ERG rearrangement status by fluorescence in situ hybridization. However, specific antibodies for ETV1, ETV4, and ETV5 are unavailable, challenging their clinical use. We developed a novel RNA in situ hybridization-based assay for the in situ detection of ETV1, ETV4, and ETV5 in formalin-fixed paraffin-embedded tissues from prostate needle biopsies, prostatectomy, and metastatic PCa specimens using RNA probes. Further, with combined RNA in situ hybridization and immunohistochemistry we identified a rare subset of PCa with dual ETS gene rearrangements in collisions of independent tumor foci. The high specificity and sensitivity of RNA in situ hybridization provides an alternate method enabling bright-field in situ detection of ETS gene aberrations in routine clinically available PCa specimens.

  13. Selection of reference genes for quantitative real-time PCR normalization in Panax ginseng at different stages of growth and in different organs.

    PubMed

    Liu, Jing; Wang, Qun; Sun, Minying; Zhu, Linlin; Yang, Michael; Zhao, Yu

    2014-01-01

    Quantitative real-time reverse transcription PCR (qRT-PCR) has become a widely used method for gene expression analysis; however, its data interpretation largely depends on the stability of reference genes. The transcriptomics of Panax ginseng, one of the most popular and traditional ingredients used in Chinese medicines, is increasingly being studied. Furthermore, it is vital to establish a series of reliable reference genes when qRT-PCR is used to assess the gene expression profile of ginseng. In this study, we screened out candidate reference genes for ginseng using gene expression data generated by a high-throughput sequencing platform. Based on the statistical tests, 20 reference genes (10 traditional housekeeping genes and 10 novel genes) were selected. These genes were tested for the normalization of expression levels in five growth stages and three distinct plant organs of ginseng by qPCR. These genes were subsequently ranked and compared according to the stability of their expressions using geNorm, NormFinder, and BestKeeper computational programs. Although the best reference genes were found to vary across different samples, CYP and EF-1α were the most stable genes amongst all samples. GAPDH/30S RPS20, CYP/60S RPL13 and CYP/QCR were the optimum pair of reference genes in the roots, stems, and leaves. CYP/60S RPL13, CYP/eIF-5A, aTUB/V-ATP, eIF-5A/SAR1, and aTUB/pol IIa were the most stably expressed combinations in each of the five developmental stages. Our study serves as a foundation for developing an accurate method of qRT-PCR and will benefit future studies on gene expression profiles of Panax Ginseng.
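
    As an illustration of how stability rankings of this kind are computed, the sketch below implements the pairwise-variation measure M commonly associated with geNorm: for each gene, the standard deviation of its log2 expression ratio to every other candidate across samples, averaged over the other candidates. Lower M means more stable expression. The iterative exclusion of the least stable gene is omitted, and the gene labels and values are hypothetical.

```python
import math
from statistics import mean, stdev

def genorm_m(expression):
    """Compute an M value per gene.

    expression: dict mapping gene -> list of relative expression values
    (linear scale, one value per sample, same sample order for all genes).
    """
    m = {}
    for j in expression:
        variations = []
        for k in expression:
            if k == j:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expression[j], expression[k])]
            variations.append(stdev(log_ratios))
        m[j] = mean(variations)
    return m

# Hypothetical data: A and B co-vary perfectly, C does not.
m_values = genorm_m({"A": [1, 2, 4], "B": [2, 4, 8], "C": [1, 4, 1]})
least_stable = max(m_values, key=m_values.get)
```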

  14. Pathway-GPS and SIGORA: identifying relevant pathways based on the over-representation of their gene-pair signatures

    PubMed Central

    Foroushani, Amir B.K.; Brinkman, Fiona S.L.

    2013-01-01

    Motivation. Predominant pathway analysis approaches treat pathways as collections of individual genes and consider all pathway members as equally informative. As a result, at times spurious and misleading pathways are inappropriately identified as statistically significant, solely due to components that they share with the more relevant pathways. Results. We introduce the concept of Pathway Gene-Pair Signatures (Pathway-GPS) as pairs of genes that, as a combination, are specific to a single pathway. We devised and implemented a novel approach to pathway analysis, Signature Over-representation Analysis (SIGORA), which focuses on the statistically significant enrichment of Pathway-GPS in a user-specified gene list of interest. In a comparative evaluation of several published datasets, SIGORA outperformed traditional methods by delivering biologically more plausible and relevant results. Availability. An efficient implementation of SIGORA, as an R package with precompiled GPS data for several human and mouse pathway repositories is available for download from http://sigora.googlecode.com/svn/. PMID:24432194
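
    The core idea of a gene-pair signature, a pair of genes that co-occurs in exactly one pathway, can be sketched in a few lines. This is a simplified illustration and not the SIGORA implementation: it only counts signature hits in a gene list and leaves out any weighting or significance testing. The pathway and gene names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def gene_pair_signatures(pathways):
    """Map each gene pair that occurs in exactly one pathway to that pathway.

    pathways: dict mapping pathway name -> set of gene identifiers.
    """
    pair_to_pathways = defaultdict(set)
    for name, genes in pathways.items():
        for pair in combinations(sorted(genes), 2):
            pair_to_pathways[pair].add(name)
    return {pair: next(iter(names))
            for pair, names in pair_to_pathways.items() if len(names) == 1}

def signatures_hit(gene_list, signatures):
    """Count, per pathway, the signature pairs fully contained in gene_list."""
    genes = set(gene_list)
    hits = defaultdict(int)
    for (a, b), pathway in signatures.items():
        if a in genes and b in genes:
            hits[pathway] += 1
    return dict(hits)

# Hypothetical pathways sharing genes B and C.
signatures = gene_pair_signatures({"P1": {"A", "B", "C"}, "P2": {"B", "C", "D"}})
```

    In this toy example a gene list containing only the shared genes B and C hits no signature, so neither pathway is credited on the basis of shared components alone.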

  15. Novel Random Mutagenesis Method for Directed Evolution.

    PubMed

    Feng, Hong; Wang, Hai-Yan; Zhao, Hong-Yan

    2017-01-01

    Directed evolution is a powerful strategy for gene mutagenesis, and has been used for protein engineering both in scientific research and in the biotechnology industry. The routine method for directed evolution was developed by Stemmer in 1994 (Stemmer, Proc Natl Acad Sci USA 91, 10747-10751, 1994; Stemmer, Nature 370, 389-391, 1994). Since then, various methods have been introduced, each of which has advantages and limitations depending upon the targeted genes and procedure. In this chapter, a novel alternative directed evolution method which combines mutagenesis PCR with dITP and fragmentation by endonuclease V is described. The kanamycin resistance gene is used as a reporter gene to verify the novel method for directed evolution. This method for directed evolution has been demonstrated to be efficient, reproducible, and easy to manipulate in practice.

  16. Next-generation sequencing identifies a novel compound heterozygous mutation in MYO7A in a Chinese patient with Usher Syndrome 1B.

    PubMed

    Wei, Xiaoming; Sun, Yan; Xie, Jiansheng; Shi, Quan; Qu, Ning; Yang, Guanghui; Cai, Jun; Yang, Yi; Liang, Yu; Wang, Wei; Yi, Xin

    2012-11-20

    Targeted enrichment and next-generation sequencing (NGS) have been employed for detection of genetic diseases. The purpose of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection of hereditary hearing loss, and identify inherited mutations involved in human deafness accurately and economically. To make genetic diagnosis of hereditary hearing loss simple and timesaving, we designed a 0.60 MB array-based chip containing 69 nuclear genes and mitochondrial genome responsible for human deafness and conducted NGS toward ten patients with five known mutations and a Chinese family with hearing loss (never genetically investigated). Ten patients with five known mutations were sequenced using next-generation sequencing to validate the sensitivity of the method. We identified four known mutations in two nuclear deafness causing genes (GJB2 and SLC26A4), one in mitochondrial DNA. We then performed this method to analyze the variants in a Chinese family with hearing loss and identified compound heterozygosity for two novel mutations in gene MYO7A. The compound heterozygosity identified in gene MYO7A causes Usher Syndrome 1B with severe phenotypes. The results support that the combination of enrichment of targeted genes and next-generation sequencing is a valuable molecular diagnostic tool for hereditary deafness and suitable for clinical application. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.

    PubMed

    Cestarelli, Valerio; Fiscon, Giulia; Felici, Giovanni; Bertolazzi, Paola; Weitschek, Emanuel

    2016-03-01

    Nowadays, knowledge extraction methods from Next Generation Sequencing data are highly requested. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and performs again the classification procedure until a stopping criterion is verified. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we validate CAMUR and its models also on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and the relation with a particular cancer are deduced. Availability: dmb.iasi.cnr.it/camur.php. Contact: emanuel@iasi.cnr.it. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
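
    The power-set and elimination steps mentioned in this abstract can be pictured with the sketch below. It is an assumption-laden illustration rather than CAMUR itself: the rule-based classifier and the stopping criterion are left out, and the function names, gene labels and data layout are hypothetical.

```python
from itertools import combinations

def nonempty_subsets(genes):
    """Return all non-empty subsets of a gene collection (the power set minus the empty set)."""
    genes = sorted(genes)
    return [set(c) for r in range(1, len(genes) + 1)
            for c in combinations(genes, r)]

def drop_genes(dataset, genes_to_remove):
    """Return a copy of the data set without the given genes.

    dataset: dict mapping sample id -> dict of gene -> expression value.
    """
    return {sample: {g: v for g, v in features.items()
                     if g not in genes_to_remove}
            for sample, features in dataset.items()}

# Hypothetical genes taken from one round of extracted rules.
candidates = nonempty_subsets({"G1", "G2", "G3"})
reduced = drop_genes({"s1": {"G1": 5.0, "G2": 1.0, "G4": 2.0}}, {"G1", "G2"})
```

    Each subset in `candidates` would be removed in turn before re-running the classifier, which is how additional, equivalent models built on different genes can be elicited.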

  18. Identification of Reference Genes for Normalizing Quantitative Real-Time PCR in Urechis unicinctus

    NASA Astrophysics Data System (ADS)

    Bai, Yajiao; Zhou, Di; Wei, Maokai; Xie, Yueyang; Gao, Beibei; Qin, Zhenkui; Zhang, Zhifeng

    2018-06-01

    The reverse transcription quantitative real-time PCR (RT-qPCR) has become one of the most important techniques of studying gene expression. A set of valid reference genes are essential for the accurate normalization of data. In this study, five candidate genes were analyzed with geNorm, NormFinder, BestKeeper and ΔCt methods to identify the genes stably expressed in echiuran Urechis unicinctus, an important commercial marine benthic worm, under abiotic (sulfide stress) and normal (adult tissues, embryos and larvae at different development stages) conditions. The comprehensive results indicated that the expression of TBP was the most stable at sulfide stress and in developmental process, while the expression of EF-1α was the most stable at sulfide stress and in various tissues. TBP and EF-1α were recommended as a suitable reference gene combination to accurately normalize the expression of target genes at sulfide stress; and EF-1α, TBP and TUB were considered as a potential reference gene combination for normalizing the expression of target genes in different tissues. No suitable gene combination was obtained among these five candidate genes for normalizing the expression of target genes for developmental process of U. unicinctus. Our results provided a valuable support for quantifying gene expression using RT-qPCR in U. unicinctus.

  19. Improving Gene Therapy Efficiency through the Enrichment of Human Hematopoietic Stem Cells.

    PubMed

    Masiuk, Katelyn E; Brown, Devin; Laborada, Jennifer; Hollis, Roger P; Urbinati, Fabrizia; Kohn, Donald B

    2017-09-06

    Lentiviral vector (LV)-based hematopoietic stem cell (HSC) gene therapy is becoming a promising clinical strategy for the treatment of genetic blood diseases. However, the current approach of modifying 1 × 10^8 to 1 × 10^9 CD34+ cells per patient requires large amounts of LV, which is expensive and technically challenging to produce at clinical scale. Modification of bulk CD34+ cells uses LV inefficiently, because the majority of CD34+ cells are short-term progenitors with a limited post-transplant lifespan. Here, we utilized a clinically relevant, immunomagnetic bead (IB)-based method to purify CD34+CD38- cells from human bone marrow (BM) and mobilized peripheral blood (mPB). IB purification of CD34+CD38- cells enriched severe combined immune deficiency (SCID) repopulating cell (SRC) frequency an additional 12-fold beyond standard CD34+ purification and did not affect gene marking of long-term HSCs. Transplant of purified CD34+CD38- cells led to delayed myeloid reconstitution, which could be rescued by the addition of non-transduced CD38+ cells. Importantly, LV modification and transplantation of IB-purified CD34+CD38- cells/non-modified CD38+ cells into immune-deficient mice achieved long-term gene-marked engraftment comparable with modification of bulk CD34+ cells, while utilizing ∼7-fold less LV. Thus, we demonstrate a translatable method to improve the clinical and commercial viability of gene therapy for genetic blood cell diseases. Copyright © 2017 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.

  20. Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology.

    PubMed

    Masino, Aaron J; Dechene, Elizabeth T; Dulik, Matthew C; Wilkens, Alisha; Spinner, Nancy B; Krantz, Ian D; Pennington, Jeffrey W; Robinson, Peter N; White, Peter S

    2014-07-21

    Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging variants that must be individually assessed to establish clear association with patient phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to patient phenotype. The algorithm orders genes by the semantic similarity computed between phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO) and semantic similarity is derived from each term's information content. Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise and imprecision, i.e. phenotypic terms unrelated to the disease and terms less specific than the actual disease terms. We ranked the causative gene against all 2488 HPO annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision with noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment. The disease gene median rank was 22. However, when also considering the patient's exome data and filtering non-exomic and common variants, the median rank improved to 3. Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis.
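
    A minimal sketch of ranking genes by information-content-based semantic similarity is shown below. It assumes a Resnik-style measure (the information content of the most informative common ancestor) and a best-match average over the patient's terms; the abstract does not state which exact formulation was used. The toy ontology, term names and gene annotations are hypothetical and are not HPO data.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms.
PARENTS = {
    "root": [],
    "ear": ["root"],
    "eye": ["root"],
    "hearing_loss": ["ear"],
    "sensorineural_hl": ["hearing_loss"],
    "conductive_hl": ["hearing_loss"],
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(annotations):
    """IC(t) = -log(fraction of genes annotated with t or one of its descendants)."""
    counts = {}
    for terms in annotations.values():
        closure = set().union(*(ancestors(t) for t in terms))
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    return {t: -math.log(c / len(annotations)) for t, c in counts.items()}

def resnik(t1, t2, ic):
    """Similarity = IC of the most informative common ancestor."""
    return max(ic.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))

def rank_genes(patient_terms, annotations, ic):
    """Order genes by the average best match of each patient term."""
    def score(gene_terms):
        return sum(max(resnik(p, g, ic) for g in gene_terms)
                   for p in patient_terms) / len(patient_terms)
    return sorted(annotations, key=lambda gene: -score(annotations[gene]))

# Hypothetical gene annotations.
annotations = {"GENE_A": {"sensorineural_hl"}, "GENE_B": {"eye"},
               "GENE_C": {"conductive_hl"}}
ic = information_content(annotations)
ranking = rank_genes(["sensorineural_hl"], annotations, ic)
```

    Replacing the patient's specific term with a more general one such as "hearing_loss" lowers the achievable similarity for every gene, which mirrors the effect of imprecision described in the abstract.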

  1. Detecting false positive sequence homology: a machine learning approach.

    PubMed

    Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M

    2016-02-24

    Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches, that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Due to only using heuristic filtering based on significance score cutoffs and having no cluster post-processing tools available, these methods can often produce multiple clusters constituting unrelated (non-homologous) sequences. Therefore sequencing data extracted from incomplete genome/transcriptome assemblies originated from low coverage sequencing or produced by de novo processes without a reference genome are susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. Approximately 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on the experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low quality clusters of putative homologous genes recovered by heuristic-based approaches.

  2. ARG1 Is a Novel Bronchodilator Response Gene

    PubMed Central

    Litonjua, Augusto A.; Lasky-Su, Jessica; Schneiter, Kady; Tantisira, Kelan G.; Lazarus, Ross; Klanderman, Barbara; Lima, John J.; Irvin, Charles G.; Peters, Stephen P.; Hanrahan, John P.; Liggett, Stephen B.; Hawkins, Gregory A.; Meyers, Deborah A.; Bleecker, Eugene R.; Lange, Christoph; Weiss, Scott T.

    2008-01-01

    Rationale: Inhaled β-agonists are one of the most widely used classes of drugs for the treatment of asthma. However, a substantial proportion of patients with asthma do not have a favorable response to these drugs, and identifying genetic determinants of drug response may aid in tailoring treatment for individual patients. Objectives: To screen variants in candidate genes in the steroid and β-adrenergic pathways for association with response to inhaled β-agonists. Methods: We genotyped 844 single nucleotide polymorphisms (SNPs) in 111 candidate genes in 209 children and their parents participating in the Childhood Asthma Management Program. We screened the association of these SNPs with acute response to inhaled β-agonists (bronchodilator response [BDR]) using a novel algorithm implemented in a family-based association test that ranked SNPs in order of statistical power. Genes that had SNPs with median power in the highest quartile were then taken for replication analyses in three other asthma cohorts. Measurements and Main Results: We identified 17 genes from the screening algorithm and genotyped 99 SNPs from these genes in a second population of patients with asthma. We then genotyped 63 SNPs from four genes with significant associations with BDR, for replication in a third and fourth population of patients with asthma. Evidence for association from the four asthma cohorts was combined, and SNPs from ARG1 were significantly associated with BDR. SNP rs2781659 survived Bonferroni correction for multiple testing (combined P value = 0.00048, adjusted P value = 0.047). Conclusions: These findings identify ARG1 as a novel gene for acute BDR in both children and adults with asthma. PMID:18617639
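
    For readers unfamiliar with the Bonferroni correction mentioned in this abstract, the arithmetic is a multiplication of the raw P value by the number of tests, capped at 1. The sketch below uses the combined P value reported above, but the number of tests is a hypothetical placeholder, since the abstract does not state the count used for the correction.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

# n_tests = 100 is a hypothetical count, not taken from the study.
adjusted = bonferroni(0.00048, 100)
is_significant = adjusted < 0.05
```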

  3. AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects.

    PubMed

    Zhang, Qingrun; Long, Quan; Ott, Jurg

    2014-06-01

    Identifying gene-gene interaction is a hot topic in genome wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the [Formula: see text] contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH). (2) GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS on AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that interaction between glycosaminoglycans (GAGs) and CFH plays an important role in disease pathology of AMD. By applying AprioriGWAS to Bipolar disorder in WTCCC data, we found that variants without marginal effect show significant interactions. Examples include multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. GRIA1 and GABRB2 are relevant to mental disorders, as supported by multiple lines of evidence.
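
    The search-space reduction that Apriori provides rests on one property: a pattern can only be frequent if all of its sub-patterns are frequent. A minimal sketch of the generic Apriori procedure over genotype "items" follows. It is not AprioriGWAS: the association scoring and conditional permutation test are not reproduced, and the SNP labels, genotypes and support threshold are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every itemset contained in at least min_support transactions.

    transactions: list of sets of items (here, SNP:genotype strings).
    Result: dict mapping frozenset -> support count.
    """
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical genotype patterns observed in four individuals.
cases = [
    {"s1:AA", "s2:AG", "s3:GG"},
    {"s1:AA", "s2:AG"},
    {"s1:AA", "s3:GG"},
    {"s2:AG", "s3:GG"},
]
patterns = apriori(cases, min_support=2)
```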

  4. Improving prokaryotic transposable elements identification using a combination of de novo and profile HMM methods.

    PubMed

    Kamoun, Choumouss; Payen, Thibaut; Hua-Van, Aurélie; Filée, Jonathan

    2013-10-11

    Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes inducing duplication, deletion, rearrangement or lateral gene transfers. Although ISs and MITEs are relatively simple and basic genetic elements, their detection remains a difficult task due to their remarkable sequence diversity. With the advent of high-throughput genome and metagenome sequencing technologies, the development of fast, reliable and sensitive methods of IS and MITE detection has become an important challenge. So far, almost all studies dealing with prokaryotic transposons have used classical BLAST-based detection methods against reference libraries. Here we introduce alternative methods of detection, either taking advantage of the structural properties of the elements (de novo methods) or using an additional library-based method based on profile HMM searches. In this study, we have developed three different workflows dedicated to IS and MITE detection: the first two use de novo methods detecting either repeated sequences or the presence of Inverted Repeats; the third one uses 28 in-house transposase alignment profiles with HMM search methods. We have compared the respective performances of each method using a reference dataset of 30 archaeal and 30 bacterial genomes in addition to simulated and real metagenomes. Compared to a BLAST-based method using ISFinder as library, de novo methods significantly improve IS and MITE detection. For example, in the 30 archaeal genomes, we discovered 30 new elements (+20%) in addition to the 141 multi-copy elements already detected by the BLAST approach. Many of the new elements correspond to ISs belonging to unknown or highly divergent families. The total number of MITEs has even doubled with the discovery of elements displaying very limited sequence similarities with their respective autonomous partners (mainly in the Inverted Repeats of the elements). Concerning metagenomes, with the exception of short-read data (<300 bp) for which both techniques seem equally limited, profile HMM searches considerably improve the detection of transposase-encoding genes (up to +50%) while generating a low level of false positives compared to BLAST-based methods. Compared to classical BLAST-based methods, the sensitivity of the de novo and profile HMM methods developed in this study allows a better and more reliable detection of transposons in prokaryotic genomes and metagenomes. We believe that future studies involving IS and MITE identification in genomic data should combine at least one de novo and one library-based method, with optimal results obtained by running the two de novo methods in addition to a library-based search. For metagenomic data, profile HMM search should be favored; a BLAST-based step is only useful for the final annotation into groups and families.

  5. Linking metabolic network features to phenotypes using sparse group lasso.

    PubMed

    Samal, Satya Swarup; Radulescu, Ovidiu; Weber, Andreas; Fröhlich, Holger

    2017-11-01

    Integration of metabolic networks with '-omics' data has been a subject of recent research in order to better understand the behaviour of such networks with respect to differences between biological and clinical phenotypes. Under the conditions of steady state of the reaction network and the non-negativity of fluxes, metabolic networks can be algebraically decomposed into a set of sub-pathways often referred to as extreme currents (ECs). Our objective is to find the statistical association of such sub-pathways with given clinical outcomes, resulting in a particular instance of a self-contained gene set analysis method. In this direction, we propose a method based on sparse group lasso (SGL) to identify phenotype-associated ECs based on gene expression data. SGL selects a sparse set of feature groups and also introduces sparsity within each group. Features in our model are clusters of ECs, and feature groups are defined based on correlations among these features. We apply our method to metabolic networks from the KEGG database and study the association of network features with prostate cancer (where the outcome is tumor versus normal) as well as glioblastoma multiforme (where the outcome is survival time). In addition, simulations show the superior performance of our method compared to the global test, which is an existing self-contained gene set analysis method. Availability: R code (compatible with version 3.2.5) is available from http://www.abi.bit.uni-bonn.de/index.php?id=17. Contact: samal@combine.rwth-aachen.de or frohlich@bit.uni-bonn.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
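
    The abstract notes that SGL selects a sparse set of groups and is also sparse within each group. A minimal sketch of the commonly used sparse group lasso penalty, which mixes a group-wise L2 term with an element-wise L1 term, may make this concrete. The exact weighting used by the authors is not given in the abstract, so the formula, the function name and the parameters `lam` and `alpha` below are assumptions based on the standard formulation.

```python
import math


def sgl_penalty(beta, groups, lam=1.0, alpha=0.5):
    """Standard sparse group lasso penalty (assumed form, not taken from the paper):

        lam * ((1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1)

    beta   : list of coefficients
    groups : list of index lists, one per feature group
    """
    l1_term = sum(abs(b) for b in beta)
    group_term = 0.0
    for idx in groups:
        l2_norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        group_term += math.sqrt(len(idx)) * l2_norm
    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)


# Two groups of two features; the second group is entirely zero,
# so it contributes nothing to either term.
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
print(sgl_penalty(beta, groups, lam=1.0, alpha=0.5))
```

    With `alpha=1` the penalty reduces to the ordinary lasso, and with `alpha=0` to the group lasso, which is why intermediate values give sparsity at both levels.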

  6. Identification of the transcriptional regulators by expression profiling infected with hepatitis B virus.

    PubMed

    Chai, Xiaoqiang; Han, Yanan; Yang, Jian; Zhao, Xianxian; Liu, Yewang; Hou, Xugang; Tang, Yiheng; Zhao, Shirong; Li, Xiao

    2016-02-01

    The molecular pathogenesis of hepatitis B virus infection in humans is extremely complex and heterogeneous. To date, the molecular information is not clearly defined despite intensive research efforts. Thus, studies aimed at transcription and regulation during virus infection, or combined studies of those already known to be beneficial, are needed. To identify the transcriptional regulators related to hepatitis B virus infection at the gene level, we analyzed gene expression profiles from normal individuals and hepatitis B patients. First, differentially expressed genes were selected, and several of them were validated in an independent set by qRT-PCR. Then differential co-expression analysis was conducted to identify differentially co-expressed links and differentially co-expressed genes. Next, regulatory impact factors were analyzed by mapping the links to regulatory data. To gain further insight into these regulators, co-expression gene modules were identified using a threshold-based hierarchical clustering method. In addition, the regulatory network was constructed using computer software. A total of 137,284 differentially co-expressed links and 780 differentially co-expressed genes were identified. These co-expressed genes were significantly enriched in the inflammatory response. The regulatory impact factor results revealed several crucial regulators related to hepatocellular carcinoma and other high-ranking regulators. Meanwhile, more than one hundred co-expression gene modules were identified using the clustering method. In our study, some important transcriptional regulators were identified using a computational method, which may enhance the understanding of disease mechanisms and lead to improved treatment of hepatitis B. However, further experimental studies are required to confirm these findings. 
Copyright © 2015 Elsevier Masson SAS. All rights reserved.
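
    The notion of a differentially co-expressed link used in this record can be illustrated with a small sketch: compute the Pearson correlation of each gene pair separately in two conditions and keep the pairs whose correlation changes by more than a cutoff. This is a simplified stand-in, not the procedure of the study; the function names, the toy data and the cutoff of 1.0 are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def differential_links(expr_a, expr_b, threshold=1.0):
    """Return (gene1, gene2, delta_r) for pairs whose correlation differs
    between condition A and condition B by at least `threshold`."""
    genes = sorted(expr_a)
    links = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            delta = pearson(expr_a[g], expr_a[h]) - pearson(expr_b[g], expr_b[h])
            if abs(delta) >= threshold:
                links.append((g, h, delta))
    return links


# Toy data: gene B flips from positively to negatively correlated with A and C.
normal = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
patient = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [1, 3, 2, 4]}
for link in differential_links(normal, patient):
    print(link)
```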

  7. Influence networks based on coexpression improve drug target discovery for the development of novel cancer therapeutics.

    PubMed

    Penrod, Nadia M; Moore, Jason H

    2014-02-05

    The demand for novel molecularly targeted drugs will continue to rise as we move forward toward the goal of personalizing cancer treatment to the molecular signature of individual tumors. However, the identification of targets and combinations of targets that can be safely and effectively modulated is one of the greatest challenges facing the drug discovery process. A promising approach is to use biological networks to prioritize targets based on their relative positions to one another, a property that affects their ability to maintain network integrity and propagate information-flow. Here, we introduce influence networks and demonstrate how they can be used to generate influence scores as a network-based metric to rank genes as potential drug targets. We use this approach to prioritize genes as drug target candidates in a set of ER⁺ breast tumor samples collected during the course of neoadjuvant treatment with the aromatase inhibitor letrozole. We show that influential genes, those with high influence scores, tend to be essential and include a higher proportion of essential genes than those prioritized based on their position (i.e. hubs or bottlenecks) within the same network. Additionally, we show that influential genes represent novel biologically relevant drug targets for the treatment of ER⁺ breast cancers. Moreover, we demonstrate that gene influence differs between untreated tumors and residual tumors that have adapted to drug treatment. In this way, influence scores capture the context-dependent functions of genes and present the opportunity to design combination treatment strategies that take advantage of the tumor adaptation process. Influence networks efficiently find essential genes as promising drug targets and combinations of targets to inform the development of molecularly targeted drugs and their use.

  8. Influence networks based on coexpression improve drug target discovery for the development of novel cancer therapeutics

    PubMed Central

    2014-01-01

    Background The demand for novel molecularly targeted drugs will continue to rise as we move forward toward the goal of personalizing cancer treatment to the molecular signature of individual tumors. However, the identification of targets and combinations of targets that can be safely and effectively modulated is one of the greatest challenges facing the drug discovery process. A promising approach is to use biological networks to prioritize targets based on their relative positions to one another, a property that affects their ability to maintain network integrity and propagate information-flow. Here, we introduce influence networks and demonstrate how they can be used to generate influence scores as a network-based metric to rank genes as potential drug targets. Results We use this approach to prioritize genes as drug target candidates in a set of ER⁺ breast tumor samples collected during the course of neoadjuvant treatment with the aromatase inhibitor letrozole. We show that influential genes, those with high influence scores, tend to be essential and include a higher proportion of essential genes than those prioritized based on their position (i.e. hubs or bottlenecks) within the same network. Additionally, we show that influential genes represent novel biologically relevant drug targets for the treatment of ER⁺ breast cancers. Moreover, we demonstrate that gene influence differs between untreated tumors and residual tumors that have adapted to drug treatment. In this way, influence scores capture the context-dependent functions of genes and present the opportunity to design combination treatment strategies that take advantage of the tumor adaptation process. Conclusions Influence networks efficiently find essential genes as promising drug targets and combinations of targets to inform the development of molecularly targeted drugs and their use. PMID:24495353

  9. Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature

    PubMed Central

    Xu, Rong; Li, Li; Wang, QuanQiu

    2013-01-01

    Motivation: Systems approaches to studying phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repurposing. Currently, systematic study of disease phenotypic relationships on a phenome-wide scale is limited because large-scale machine-understandable disease–phenotype relationship knowledge bases are often unavailable. Here, we present an automatic approach to extract disease–manifestation (D-M) pairs (one specific type of disease–phenotype relationship) from the wide body of published biomedical literature. Data and Methods: Our method leverages external knowledge and limits the amount of human effort required. For the text corpus, we used 119 085 682 MEDLINE sentences (21 354 075 citations). First, we used D-M pairs from existing biomedical ontologies as prior knowledge to automatically discover D-M–specific syntactic patterns. We then extracted additional pairs from MEDLINE using the learned patterns. Finally, we analysed correlations between disease manifestations and disease-associated genes and drugs to demonstrate the potential of this newly created knowledge base in disease gene discovery and drug repurposing. Results: In total, we extracted 121 359 unique D-M pairs with a high precision of 0.924. Among the extracted pairs, 120 419 (99.2%) have not been captured in existing structured knowledge sources. We have shown that disease manifestations correlate positively with both disease-associated genes and drug treatments. Conclusions: The main contribution of our study is the creation of a large-scale and accurate D-M phenotype relationship knowledge base. This unique knowledge base, when combined with existing phenotypic, genetic and proteomic datasets, can have profound implications in our deeper understanding of disease etiology and in rapid drug repurposing. Availability: http://nlp.case.edu/public/data/DMPatternUMLS/ Contact: rxx@case.edu PMID:23828786

  10. Comparison of two PCR-based methods and automated DNA sequencing for prop-1 genotyping in Ames dwarf mice.

    PubMed

    Gerstner, Arpad; DeFord, James H; Papaconstantinou, John

    2003-07-25

    Ames dwarfism is caused by a homozygous single nucleotide mutation in the pituitary-specific prop-1 gene, resulting in combined pituitary hormone deficiency, reduced growth and extended lifespan. Thus, these mice serve as an important model system for endocrinological, aging and longevity studies. Because the phenotypes of wild-type and heterozygous mice are indistinguishable, accurate genotyping of these animals is imperative for successful breeding. Here we report a novel, yet simple, approach for prop-1 genotyping using PCR-based allele-specific amplification (PCR-ASA). We also compare this method to other potential genotyping techniques, i.e. PCR-based restriction fragment length polymorphism analysis (PCR-RFLP) and fluorescence automated DNA sequencing. We demonstrate that the single-step PCR-ASA has several advantages over the classical PCR-RFLP because the procedure is simple, less expensive and rapid. To further increase the specificity and sensitivity of the PCR-ASA, we introduced a single-base mismatch at the 3' penultimate position of the mutant primer. Our results also reveal that fluorescence automated DNA sequencing has limitations for detecting a single nucleotide polymorphism in the prop-1 gene, particularly in heterozygotes.

  11. Use of reverse transcription loop-mediated isothermal amplification combined with lateral flow dipstick for an easy and rapid detection of Jembrana disease virus.

    PubMed

    Kusumawati, Asmarani; Tampubolon, Issabellina Dwades; Hendarta, Narendra Yoga; Salasia, Siti Isrina Oktavia; Wanahari, Tenri Ashari; Mappakaya, Basofi Ashari; Hartati, Sri

    2015-09-01

    Jembrana disease virus (JDV) is a viral pathogen that causes Jembrana disease in Bali cattle (Bos javanicus) with a high mortality rate. An easy and rapid diagnostic method is essential to further control this disease. We used reverse transcription loop-mediated isothermal amplification (RT-LAMP) combined with a lateral flow dipstick (LFD), based on the conserved tm subunit of the Jembrana disease virus env gene. The RT-LAMP conditions were optimized by varying the concentrations of MgSO4, betaine and dNTP, the temperature, and the time and duration of the reaction. The sensitivity of the primers for JDV was confirmed. The method was able to detect an env-tm gene dilution containing 2 × 10⁻¹⁵ g of template. By comparison, RT-LAMP/LFD was 100-fold more sensitive than reverse transcription-polymerase chain reaction. The specificity of the primers for JDV was also confirmed using positive and negative controls. This work also showed that virus detection could be performed not only on total RNA extracted from blood but also on various organs using the RT-LAMP/LFD method. The whole process, including the LAMP reaction and the LFD hybridization step, lasts only approximately 75 min. Results can be easily observed with the naked eye without the addition of any chemical or further analysis. The combination of RT-LAMP with LFD makes the method a more suitable diagnostic tool in conditions where sophisticated and expensive equipment is not available for field investigations of Jembrana disease in Bali cattle.

  12. Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network.

    PubMed

    Jiang, Xue; Zhang, Han; Quan, Xiongwen

    2016-01-01

    Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods focus on identifying differentially expressed genes between case samples and a control group. These traditional methods may not fully consider the changes in interactions between genes at different cell states or the dynamics of gene expression levels during disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes in interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression networks (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure gene differential coexpression according to the change in local topological structure between different phase-specific networks. Finally, we conducted a meta-analysis of gene differential coexpression based on the rank-product method. Experimental results on real-world gene expression data sets demonstrated the feasibility and effectiveness of DCGN and its superior performance over other popular disease-related gene selection methods.
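
    The final step mentioned in this record, a meta-analysis based on the rank-product method, can be sketched in a few lines: each gene is ranked under every metric, and its rank product is the geometric mean of those ranks, so genes that rank consistently near the top receive the smallest values. The abstract does not say how DCGN orders scores or handles ties, so the descending order, the tie-breaking by input position and the function names below are assumptions.

```python
import math


def ranks_descending(scores):
    """Rank 1 goes to the largest score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def rank_product(score_lists):
    """score_lists holds one list of per-gene scores per metric or data set.
    Returns the geometric mean of each gene's ranks across the lists."""
    k = len(score_lists)
    all_ranks = [ranks_descending(scores) for scores in score_lists]
    n_genes = len(score_lists[0])
    return [
        math.exp(sum(math.log(r[g]) for r in all_ranks) / k)
        for g in range(n_genes)
    ]


# Three genes scored by two hypothetical differential-coexpression metrics.
metric_1 = [0.9, 0.1, 0.5]
metric_2 = [0.8, 0.2, 0.3]
print(rank_product([metric_1, metric_2]))
```

    In practice the significance of a rank product is usually assessed by permutation, which is omitted here.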

  13. Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased.

    PubMed

    Xi, Zhenxiang; Liu, Liang; Davis, Charles C

    2015-11-01

    The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014). Copyright © 2015 Elsevier Inc. All rights reserved.

  14. [Gene-gene interaction on central obesity in school-aged children in China].

    PubMed

    Fu, L W; Zhang, M X; Wu, L J; Gao, L W; Mi, J

    2017-07-10

    Objective: To investigate the possible contribution of 6 obesity-associated SNPs to central obesity and to examine whether there is an interaction among the 6 SNPs in the cause of central obesity in school-aged children in China. Methods: A total of 3 502 school-aged children who were included in the Beijing Child and Adolescent Metabolic Syndrome (BCAMS) Study were selected, and based on the age- and sex-specific waist circumference (WC) standards in the BCAMS study, 1 196 central obese cases and 2 306 controls were identified. Genomic DNA was extracted from peripheral blood white cells using the salt fractionation method. A total of 6 single nucleotide polymorphisms (FTO rs9939609, MC4R rs17782313, BDNF rs6265, PCSK1 rs6235, SH2B1 rs4788102, and CSK rs1378942) were genotyped by TaqMan allelic discrimination assays with the GeneAmp 7900 sequence detection system (Applied Biosystems, Foster City, CA, USA). A logistic regression model was used to investigate the association between the 6 SNPs and central obesity. Gene-gene interactions among the 6 polymorphic loci were analyzed by using the Generalized Multifactor Dimensionality Reduction (GMDR) method, and then a logistic regression model was constructed to confirm the best combination of loci identified in the GMDR. Results: After adjusting for gender, age, Tanner stage, physical activity and family history of obesity, the FTO rs9939609-A, MC4R rs17782313-C and BDNF rs6265-G alleles were associated with central obesity under an additive genetic model (OR = 1.24, 95% CI: 1.06-1.45, P = 0.008; OR = 1.26, 95% CI: 1.11-1.43, P = 2.98×10⁻⁴; OR = 1.18, 95% CI: 1.06-1.32, P = 0.003). GMDR analysis showed a significant gene-gene interaction between MC4R rs17782313 and BDNF rs6265 (P = 0.001). The best two-locus combination showed a cross-validation consistency of 10/10 and a testing accuracy of 0.539. This interaction showed the maximum consistency and minimum prediction error among all gene-gene interaction models evaluated. 
Moreover, the combination of MC4R rs17782313-C and BDNF rs6265-G was associated with an increased risk of central obesity after adjustment for gender, age, Tanner stage, physical activity and family history of obesity. Conclusions: Our study showed that the FTO rs9939609-A, MC4R rs17782313-C and BDNF rs6265-G alleles were associated with central obesity, and that a statistical interaction between MC4R rs17782313-C and BDNF rs6265-G increased the risk of central obesity in school-aged children in China.

  15. High-Resolution Melt Analysis for Rapid Comparison of Bacterial Community Compositions

    PubMed Central

    Hjelmsø, Mathis Hjort; Hansen, Lars Hestbjerg; Bælum, Jacob; Feld, Louise; Holben, William E.

    2014-01-01

    In the study of bacterial community composition, 16S rRNA gene amplicon sequencing is today among the preferred methods of analysis. The cost of nucleotide sequence analysis, including requisite computational and bioinformatic steps, however, takes up a large part of many research budgets. High-resolution melt (HRM) analysis is the study of the melt behavior of specific PCR products. Here we describe a novel high-throughput approach in which we used HRM analysis targeting the 16S rRNA gene to rapidly screen multiple complex samples for differences in bacterial community composition. We hypothesized that HRM analysis of amplified 16S rRNA genes from a soil ecosystem could be used as a screening tool to identify changes in bacterial community structure. This hypothesis was tested using a soil microcosm setup exposed to a total of six treatments representing different combinations of pesticide and fertilization treatments. The HRM analysis identified a shift in the bacterial community composition in two of the treatments, both including the soil fumigant Basamid GR. These results were confirmed with both denaturing gradient gel electrophoresis (DGGE) analysis and 454-based 16S rRNA gene amplicon sequencing. HRM analysis was shown to be a fast, high-throughput technique that can serve as an effective alternative to gel-based screening methods to monitor microbial community composition. PMID:24610853

  16. [Phylogenetic analysis of closely related Leuconostoc citreum species based on partial housekeeping genes].

    PubMed

    Lv, Qiang; Chen, Ming; Xu, Haiyan; Song, Yuqin; Sun, Zhihong; Dan, Tong; Sun, Tiansong

    2013-07-04

    Using the 16S rRNA, dnaA, murC and pyrG gene sequences, we identified the phylogenetic relationships among closely related Leuconostoc citreum species. Seven Leu. citreum strains originally isolated from sourdough were characterized by PCR amplification of the dnaA, murC and pyrG genes, whose sequences were determined to assess their suitability as phylogenetic markers. Then, we estimated the genetic distances and constructed phylogenetic trees for the 16S rRNA gene and the three housekeeping genes mentioned above, combining our data with published corresponding sequences. Comparison of the phylogenetic trees showed that the topologies of the three housekeeping gene trees were consistent with that of the 16S rRNA gene tree. The homology among closely related Leu. citreum species differed between the dnaA, murC, pyrG and 16S rRNA gene sequences, ranging from 75.5% to 97.2%, 50.2% to 99.7%, 65.0% to 99.8% and 98.5% to 100%, respectively. The phylogenetic relationships inferred from the three housekeeping gene sequences were highly consistent with those from the 16S rRNA gene sequence, while the genetic distances of these housekeeping genes were much higher than those of the 16S rRNA gene. Consequently, the dnaA, murC and pyrG genes are suitable for the classification and identification of closely related Leu. citreum species.

  17. A novel method to identify pathways associated with renal cell carcinoma based on a gene co-expression network

    PubMed Central

    RUAN, XIYUN; LI, HONGYUN; LIU, BO; CHEN, JIE; ZHANG, SHIBAO; SUN, ZEQIANG; LIU, SHUANGQING; SUN, FAHAI; LIU, QINGYONG

    2015-01-01

    The aim of the present study was to develop a novel method for identifying pathways associated with renal cell carcinoma (RCC) based on a gene co-expression network. A framework was established where a co-expression network was derived from the database as well as various co-expression approaches. First, the backbone of the network based on differentially expressed (DE) genes between RCC patients and normal controls was constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database. The differentially co-expressed links were detected by Pearson’s correlation, the empirical Bayesian (EB) approach and Weighted Gene Co-expression Network Analysis (WGCNA). The co-expressed gene pairs were merged by a rank-based algorithm. We obtained 842, 371, 2,883 and 1,595 co-expressed gene pairs from the co-expression networks of the STRING database, Pearson’s correlation, the EB method and WGCNA, respectively. Two hundred and eighty-one differentially co-expressed (DC) gene pairs were obtained from the merged network using this novel method. Pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the network enrichment analysis (NEA) method were performed to verify the feasibility of the merged method. Results of the KEGG and NEA pathway analyses showed that the network was associated with RCC. The suggested method was computationally efficient in identifying pathways associated with RCC and has been identified as a useful complement to traditional co-expression analysis. PMID:26058425

  18. Mosaic analysis of gene function in postnatal mouse brain development by using virus-based Cre recombination.

    PubMed

    Gibson, Daniel A; Ma, Le

    2011-08-01

    Normal brain function relies not only on embryonic development when major neuronal pathways are established, but also on postnatal development when neural circuits are matured and refined. Misregulation at this stage may lead to neurological and psychiatric disorders such as autism and schizophrenia. Many genes have been studied in the prenatal brain and found crucial to many developmental processes. However, their function in the postnatal brain is largely unknown, partly because their deletion in mice often leads to lethality during neonatal development, and partly because their requirement in early development hampers the postnatal analysis. To overcome these obstacles, floxed alleles of these genes are currently being generated in mice. When combined with transgenic alleles that express Cre recombinase in specific cell types, conditional deletion can be achieved to study gene function in the postnatal brain. However, this method requires additional alleles and extra time (3-6 months) to generate the mice with appropriate genotypes, thereby limiting the expansion of the genetic analysis to a large scale in the mouse brain. Here we demonstrate a complementary approach that uses virally-expressed Cre to study these floxed alleles rapidly and systematically in postnatal brain development. By injecting recombinant adeno-associated viruses (rAAVs) encoding Cre into the neonatal brain, we are able to delete the gene of interest in different regions of the brain. By controlling the viral titer and coexpressing a fluorescent protein marker, we can simultaneously achieve mosaic gene inactivation and sparse neuronal labeling. This method bypasses the requirement of many genes in early development, and allows us to study their cell autonomous function in many critical processes in postnatal brain development, including axonal and dendritic growth, branching, and tiling, as well as synapse formation and refinement. 
This method has been used successfully in our own lab (unpublished results) and others, and can be extended to other viruses, such as lentivirus, as well as to the expression of shRNA or dominant active proteins. Furthermore, by combining this technique with electrophysiology as well as recently-developed optical imaging tools, this method provides a new strategy to study how genetic pathways influence neural circuit development and function in mice and rats.

  19. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    PubMed

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combinations of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

    PubMed

    Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray

    2004-01-01

    One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
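
    The two keyword weighting schemes compared in this record can be sketched as follows. The abstract does not give the exact normalization used for either scheme, so this is only an illustration under assumptions: TFIDF is taken as raw term count times log(N / document frequency), and the z-score compares a keyword's frequency for one gene against its frequencies in a background set. The function names and toy data are hypothetical.

```python
import math
import statistics
from collections import Counter


def tfidf_weights(docs):
    """docs maps gene -> list of keyword tokens drawn from its abstracts.
    Returns gene -> {keyword: tf * log(N / df)} (textbook TF-IDF, assumed form)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    weights = {}
    for gene, tokens in docs.items():
        term_freq = Counter(tokens)
        weights[gene] = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return weights


def z_score(freq, background_freqs):
    """Standard score of a keyword frequency against a background set."""
    mean = statistics.mean(background_freqs)
    sd = statistics.pstdev(background_freqs)
    return (freq - mean) / sd if sd > 0 else 0.0


docs = {
    "gene1": ["kinase", "kinase", "cell"],
    "gene2": ["cell", "channel"],
}
print(tfidf_weights(docs))
print(z_score(5, [1, 2, 3]))
```

    Under TF-IDF a keyword shared by every gene (here "cell") receives zero weight, which is one reason such weights can separate functional groups more cleanly.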

  1. Combining genomic and proteomic approaches for epigenetics research

    PubMed Central

    Han, Yumiao; Garcia, Benjamin A

    2014-01-01

    Epigenetics is the study of changes in gene expression or cellular phenotype that do not change the DNA sequence. In this review, current methods, both genomic and proteomic, associated with epigenetics research are discussed. Among them, chromatin immunoprecipitation (ChIP) followed by sequencing and other ChIP-based techniques are powerful techniques for genome-wide profiling of DNA-binding proteins, histone post-translational modifications or nucleosome positions. However, mass spectrometry-based proteomics is increasingly being used in functional biological studies and has proved to be an indispensable tool to characterize histone modifications, as well as DNA–protein and protein–protein interactions. With the development of genomic and proteomic approaches, combination of ChIP and mass spectrometry has the potential to expand our knowledge of epigenetics research to a higher level. PMID:23895656

  2. Multiplex PCR Assay for Differentiation of Helicobacter felis, H. bizzozeronii, and H. salomonis

    PubMed Central

    Baele, M.; Van den Bulck, K.; Decostere, A.; Vandamme, P.; Hänninen, M.-L.; Ducatelle, R.; Haesebrouck, F.

    2004-01-01

    Helicobacter felis, Helicobacter bizzozeronii, and Helicobacter salomonis are frequently found in the gastric mucous membrane of dogs and cats. These large spiral organisms are phylogenetically highly related to each other. Their fastidious nature makes it difficult to cultivate them in vitro, hampering traditional identification methods. We describe here a multiplex PCR test based on the tRNA intergenic spacers and on the urease gene, combined with capillary electrophoresis, that allows discrimination of these three species. In combination with previously described 16S ribosomal DNA-based primers specific for the nonculturable “Candidatus Helicobacter suis,” our procedure was shown to be very useful in determining the species identity of “Helicobacter heilmannii”-like organisms observed in human stomachs and will facilitate research concerning their possible zoonotic importance. PMID:15004062

  3. Discriminative power of Campylobacter phenotypic and genotypic typing methods.

    PubMed

    Duarte, Alexandra; Seliwiorstow, Tomasz; Miller, William G; De Zutter, Lieven; Uyttendaele, Mieke; Dierick, Katelijne; Botteldoorn, Nadine

    2016-06-01

    The aim of this study was to compare different typing methods, individually and combined, for use in the monitoring of Campylobacter in food. Campylobacter jejuni (n=94) and Campylobacter coli (n=52) isolated from different broiler meat carcasses were characterized using multilocus sequence typing (MLST), flagellin gene A restriction fragment length polymorphism typing (flaA-RFLP), antimicrobial resistance profiling (AMRp), the presence/absence of 5 putative virulence genes; and, exclusively for C. jejuni, the determination of lipooligosaccharide (LOS) class. Discriminatory power was calculated by the Simpson's index of diversity (SID) and the congruence was measured by the adjusted Rand index and adjusted Wallace coefficient. MLST was individually the most discriminative typing method for both C. jejuni (SID=0.981) and C. coli (SID=0.957). The most discriminative combination with a SID of 0.992 for both C. jejuni and C. coli was obtained by combining MLST with flaA-RFLP. The combination of MLST with flaA-RFLP is an easy and feasible typing method for short-term monitoring of Campylobacter in broiler meat carcass. Copyright © 2016 Elsevier B.V. All rights reserved.
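
    The discriminatory power reported above can be illustrated with a short sketch of Simpson's index of diversity. It assumes the Hunter-Gaston formulation commonly used for typing methods, SID = 1 - sum(n_j(n_j - 1)) / (N(N - 1)); the abstract does not spell out the formula, and the isolate labels below are hypothetical.

```python
from collections import Counter


def simpson_diversity(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form, assumed here).

    Probability that two isolates drawn without replacement
    belong to different types.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical typing results for six isolates (not data from the study).
mlst_types = ["ST-21", "ST-21", "ST-45", "ST-45", "ST-48", "ST-48"]
fla_types = ["A", "B", "A", "A", "C", "D"]

# Combining two methods: an isolate's type is the pair of both results.
combined_types = list(zip(mlst_types, fla_types))

sid_mlst = simpson_diversity(mlst_types)
sid_combined = simpson_diversity(combined_types)
```

    Because the combined type is the tuple of both individual results, combining methods can only split existing groups further, which is why a combination never scores lower than either method alone.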

  4. Polymorphisms in the methylene tetrahydrofolate reductase gene and their unique combinations are associated with an increased susceptibility to the renal cancers.

    PubMed

    Ajaz, Sadia; Khaliq, Shagufta; Hashmi, Altaf; Naqvi, Syed Ali Anwar; Rizvi, Syed Adib-ul-Hassan; Mehdi, Syed Qasim

    2012-05-01

    Two single nucleotide polymorphisms in the methylene tetrahydrofolate reductase (MTHFR) gene, 677C/T and 1298A/C, encode the thermolabile isoforms of the MTHFR enzyme that adversely affect the folic acid metabolic pathway. In the present study, these polymorphisms were investigated for their associations with the risk and prognosis of the renal cell carcinomas (RCCs) in Pakistani patients. The study included 168 RCC patients and 178 controls. The polymorphisms were analyzed by the polymerase chain reaction-restriction fragment length polymorphism method. Statistical analysis revealed that the C-allele and homozygous C genotype of the MTHFR 1298A/C polymorphism were significantly correlated with the risk of RCCs (odds ratio [OR]=1.60; 95% confidence interval [CI]=1.1-2.34 and OR=3.26; 95% CI=1.27-8.37, respectively). The combined genotype analysis showed that the 677CC+1298CC combination greatly increased the susceptibility to RCCs (OR=8.34; 95% CI=2.7-25.7). The 677CT+1298AA and 677CC+1298CA combinations were also associated with an increased risk of RCC (OR=3.21; 95% CI=1.3-7.8 and OR=2.45; 95% CI=1.3-4.6, respectively). The combined genotype effects were also evident in a semiparametric expectation-maximization-based haplotype analysis. The results presented here indicate that the two MTHFR gene polymorphisms are significantly associated with the risk of RCCs in a cohort of Pakistani patients and may be useful as susceptibility markers in other populations of the world as well.
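
    Odds ratios with 95% confidence intervals, as quoted above, are typically derived from a 2x2 table of cases and controls. The sketch below assumes the standard log-based (Woolf) interval; the abstract does not say which interval method was used, and the counts are made up for illustration rather than taken from the study.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Odds ratio and an approximate 95% CI from a 2x2 table.

    Uses the log-based (Woolf) interval, which is an assumption here.
    All four cells must be positive.
    """
    cells = (case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed)
    if min(cells) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers vs non-carriers among cases and controls.
example_or, example_lower, example_upper = odds_ratio_ci(30, 70, 15, 85)
```

    An interval whose lower bound stays above 1, as in this toy example, is what supports a statement of increased risk for the carrier group.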

  5. A Weighted Multipath Measurement Based on Gene Ontology for Estimating Gene Products Similarity

    PubMed Central

    Liu, Lizhen; Dai, Xuemin; Song, Wei; Lu, Jingli

    2014-01-01

    Many different methods have been proposed for calculating the semantic similarity of term pairs based on gene ontology (GO). Most existing methods are based on information content (IC), and the methods based on IC are used more commonly than those based on the structure of GO. However, most IC-based methods not only fail to handle identical annotations but also show a strong bias toward well-annotated proteins. We propose a new method called weighted multipath measurement (WMM) for estimating the semantic similarity of gene products based on the structure of the GO. We not only considered the contribution of every path between two GO terms but also took the depth of the lowest common ancestors into account. We assigned different weights to different kinds of edges in the GO graph. The similarity values calculated by WMM can be reused because they depend only on the characteristics of GO terms. Experimental results showed that the similarity values obtained by WMM have a higher accuracy. We compared the performance of WMM with that of other methods using GO data and gene annotation datasets for yeast and humans downloaded from the GO database. We found that WMM is more suited for prediction of gene function than most existing IC-based methods and that it can distinguish proteins with identical annotations (two proteins are annotated with the same terms) from each other. PMID:25229994

  6. The Interaction of TXNIP and AFq1 Genes Increases the Susceptibility of Schizophrenia.

    PubMed

    Su, Yousong; Ding, Wenhua; Xing, Mengjuan; Qi, Dake; Li, Zezhi; Cui, Donghong

    2017-08-01

    Although previous studies showed a reduced risk of cancer in patients with schizophrenia, whether patients with schizophrenia possess genetic factors that also contribute to tumor suppression is still unknown. In the present study, based on our previous microarray data, we focused on the tumor suppressor genes TXNIP and AF1q, which are differentially expressed in patients with schizophrenia. A total of 413 patients and 578 healthy controls were recruited. We found no significant differences in genotype, allele, or haplotype frequencies at the selected five single nucleotide polymorphisms (SNPs) (rs2236566 and rs7211 in TXNIP gene; rs10749659, rs2140709, and rs3738481 in AF1q gene) between patients with schizophrenia and controls. However, we found an association between the interaction of TXNIP and AF1q and schizophrenia by using the MDR method followed by traditional statistical analysis. The best gene-gene interaction model identified was a three-locus model TXNIP (rs2236566, rs7211)-AF1q (rs2140709). After traditional statistical analysis, we found the high-risk genotype combination was rs2236566 (GG)-rs7211(CC)-rs2140709(CC) (OR = 1.35 [1.03-1.76]). The low-risk genotype combination was rs2236566 (GT)-rs7211(CC)-rs2140709(CC) (OR = 0.67 [0.49-0.91]). Our findings suggest a statistically significant role of the interaction of TXNIP and AF1q polymorphisms (TXNIP-rs2236566, TXNIP-rs7211, and AF1q-rs2769605) in schizophrenia susceptibility.

  7. A novel approach for dimension reduction of microarray.

    PubMed

    Aziz, Rabia; Verma, C K; Srivastava, Namita

    2017-12-01

    This paper proposes a new hybrid search technique for feature (gene) selection (FS) using Independent Component Analysis (ICA) and Artificial Bee Colony (ABC), called ICA+ABC, to select informative genes based on a Naïve Bayes (NB) algorithm. An important trait of this technique is the optimization of the ICA feature vector using ABC. ICA+ABC is a hybrid search algorithm that combines the benefits of the extraction approach, to reduce the size of the data, and the wrapper approach, to optimize the reduced feature vectors. The technique was evaluated on six standard gene expression classification datasets. Extensive experiments were conducted to compare the performance of ICA+ABC with the results obtained from the recently published Minimum Redundancy Maximum Relevance (mRMR)+ABC algorithm for the NB classifier. To assess how ICA+ABC performs as a feature selection method with the NB classifier, we also compared combinations of ICA with popular filter techniques and with other similar bio-inspired algorithms such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The results show that ICA+ABC has a significant ability to generate small subsets of genes from the ICA feature vector that significantly improve the classification accuracy of the NB classifier compared to other previously suggested methods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower.

    PubMed

    Thorwarth, Patrick; Yousef, Eltohamy A A; Schmid, Karl J

    2018-02-02

    Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding. Copyright © 2018 Thorwarth et al.

  9. Detection of doublecortin domain-containing 2 (DCDC2), a new candidate tumor suppressor gene of hepatocellular carcinoma, by triple combination array analysis

    PubMed Central

    2013-01-01

    Background To detect genes correlated with hepatocellular carcinoma (HCC), we developed a triple combination array consisting of methylation array, gene expression array and single nucleotide polymorphism (SNP) array analysis. Methods A surgical specimen obtained from a 68-year-old female HCC patient was analyzed by triple combination array, which identified doublecortin domain-containing 2 (DCDC2) as a candidate tumor suppressor gene of HCC. Subsequently, samples from 48 HCC patients were evaluated for their DCDC2 methylation and expression status using methylation specific PCR (MSP) and semi-quantitative reverse transcriptase (RT) PCR, respectively. Then, we investigated the relationship between clinicopathological factors and methylation status of DCDC2. Results DCDC2 was revealed to be hypermethylated (methylation value 0.846, range 0–1.0) in cancer tissue, compared with adjacent normal tissue (0.212) by methylation array in the 68-year-old female patient. Expression array showed decreased expression of DCDC2 in cancerous tissue. SNP array showed that the copy number of chromosome 6p22.1, in which DCDC2 resides, was normal. MSP revealed hypermethylation of the promoter region of DCDC2 in 41 of the tumor samples. DCDC2 expression was significantly decreased in the cases with methylation (P = 0.048). Furthermore, the methylated cases revealed worse prognosis for overall survival than unmethylated cases (P = 0.048). Conclusions The present study indicates that triple combination array is an effective method to detect novel genes related to HCC. We propose that DCDC2 is a tumor suppressor gene of HCC. PMID:24034596

  10. Discovery and explanation of drug-drug interactions via text mining.

    PubMed

    Percha, Bethany; Garten, Yael; Altman, Russ B

    2012-01-01

    Drug-drug interactions (DDIs) can occur when two drugs interact with the same gene product. Most available information about gene-drug relationships is contained within the scientific literature, but is dispersed over a large number of publications, with thousands of new publications added each month. In this setting, automated text mining is an attractive solution for identifying gene-drug relationships and aggregating them to predict novel DDIs. In previous work, we have shown that gene-drug interactions can be extracted from Medline abstracts with high fidelity - we extract not only the genes and drugs, but also the type of relationship expressed in individual sentences (e.g. metabolize, inhibit, activate and many others). We normalize these relationships and map them to a standardized ontology. In this work, we hypothesize that we can combine these normalized gene-drug relationships, drawn from a very broad and diverse literature, to infer DDIs. Using a training set of established DDIs, we have trained a random forest classifier to score potential DDIs based on the features of the normalized assertions extracted from the literature that relate two drugs to a gene product. The classifier recognizes the combinations of relationships, drugs and genes that are most associated with the gold standard DDIs, correctly identifying 79.8% of assertions relating interacting drug pairs and 78.9% of assertions relating noninteracting drug pairs. Most significantly, because our text processing method captures the semantics of individual gene-drug relationships, we can construct mechanistic pharmacological explanations for the newly-proposed DDIs. We show how our classifier can be used to explain known DDIs and to uncover new DDIs that have not yet been reported.

  11. Detection of canine distemper virus (CDV) through one step RT-PCR combined with nested PCR.

    PubMed

    Kim, Y H; Cho, K W; Youn, H Y; Yoo, H S; Han, H R

    2001-04-01

    A one-step reverse transcription PCR (RT-PCR) combined with nested PCR was set up to increase efficiency in the diagnosis of canine distemper virus (CDV) infection after development of the nested PCR. Two PCR primer sets were designed based on the sequence of the nucleocapsid gene of the CDV Onderstepoort strain. One-step RT-PCR with the outer primer pair was shown to detect 10^2 PFU/ml. The sensitivity was increased a hundredfold using the one-step RT-PCR combined with the nested PCR. Specificity of the PCR was also confirmed using other related canine viruses and peripheral blood mononuclear cells (PBMC) and body secretions of healthy dogs. Of the 51 blood samples from dogs clinically suspected of CD, 45 samples were positive by one-step RT-PCR combined with nested PCR. However, only 15 samples were identified as positive with a single one-step RT-PCR. Therefore, an approximately 60% increase in the efficiency of the diagnosis was observed with the combined method. These results suggest that one-step RT-PCR combined with nested PCR could be a sensitive, specific, and practical method for diagnosis of CDV infection.
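
    The "approximately 60%" figure above can be checked from the reported counts. The sketch below reads it as the difference in detection rate, in percentage points, between the combined and the single assay; that reading is an interpretation, since the abstract does not define the measure.

```python
def detection_rate(positives, total):
    """Share of samples called positive, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * positives / total


# Counts taken from the abstract above.
TOTAL_SAMPLES = 51
NESTED_POSITIVE = 45   # one-step RT-PCR combined with nested PCR
SINGLE_POSITIVE = 15   # one-step RT-PCR alone

rate_nested = detection_rate(NESTED_POSITIVE, TOTAL_SAMPLES)
rate_single = detection_rate(SINGLE_POSITIVE, TOTAL_SAMPLES)

# Difference in percentage points between the two assays.
gain_points = rate_nested - rate_single

print(f"combined: {rate_nested:.1f}%  single: {rate_single:.1f}%  gain: {gain_points:.1f} points")
```

    Under this reading the combined assay detects about 88.2% of suspected cases against about 29.4% for the single assay, a gain of about 58.8 percentage points.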

  12. The use of a viral 2A sequence for the simultaneous over-expression of both the vgf gene and enhanced green fluorescent protein (eGFP) in vitro and in vivo

    PubMed Central

    Lewis, Jo E.; Brameld, John M.; Hill, Phil; Barrett, Perry; Ebling, Francis J.P.; Jethwa, Preeti H.

    2015-01-01

    Introduction The viral 2A sequence has become an attractive alternative to the traditional internal ribosomal entry site (IRES) for simultaneous over-expression of two genes and in combination with recombinant adeno-associated viruses (rAAV) has been used to manipulate gene expression in vitro. New method To develop a rAAV construct in combination with the viral 2A sequence to allow long-term over-expression of the vgf gene and fluorescent marker gene for tracking of the transfected neurones in vivo. Results Transient transfection of the AAV plasmid containing the vgf gene, viral 2A sequence and eGFP into SH-SY5Y cells resulted in eGFP fluorescence comparable to a commercially available reporter construct. This increase in fluorescent cells was accompanied by an increase in VGF mRNA expression. Infusion of the rAAV vector containing the vgf gene, viral 2A sequence and eGFP resulted in eGFP fluorescence in the hypothalamus of both mice and Siberian hamsters, 32 weeks post infusion. In situ hybridisation confirmed that the location of VGF mRNA expression in the hypothalamus corresponded to the eGFP pattern of fluorescence. Comparison with old method The viral 2A sequence is much smaller than the traditional IRES and therefore allowed over-expression of the vgf gene with fluorescent tracking without compromising viral capacity. Conclusion The use of the viral 2A sequence in the AAV plasmid allowed the simultaneous expression of both genes in vitro. When used in combination with rAAV it resulted in long-term over-expression of both genes at equivalent locations in the hypothalamus of both Siberian hamsters and mice, without any adverse effects. PMID:26300182

  13. Cancer Imaging: Gene Transcription-Based Imaging and Therapeutic Systems

    PubMed Central

    Bhang, Hyo-eun C.; Pomper, Martin G.

    2012-01-01

    Molecular-genetic imaging of cancer is in its infancy. Over the past decade gene reporter systems have been optimized in preclinical models and some have found their way into the clinic. The search is on to find the best combination of gene delivery vehicle and reporter imaging system that can be translated safely and quickly. The goal is to have a combination that can detect a wide variety of cancers with high sensitivity and specificity in a way that rivals the current clinical standard, positron emission tomography with [18F]fluorodeoxyglucose. To do so will require systemic delivery of reporter genes for the detection of micrometastases, and a nontoxic vector, whether viral or based on nanotechnology, to gain widespread acceptance by the oncology community. Merger of molecular-genetic imaging with gene therapy, a strategy that has been employed in the past, will likely be necessary for such imaging to reach widespread clinical use. PMID:22349219

  14. Single Assay for Simultaneous Detection and Differential Identification of Human and Avian Influenza Virus Types, Subtypes, and Emergent Variants

    PubMed Central

    Metzgar, David; Myers, Christopher A.; Russell, Kevin L.; Faix, Dennis; Blair, Patrick J.; Brown, Jason; Vo, Scott; Swayne, David E.; Thomas, Colleen; Stenger, David A.; Lin, Baochuan; Malanoski, Anthony P.; Wang, Zheng; Blaney, Kate M.; Long, Nina C.; Schnur, Joel M.; Saad, Magdi D.; Borsuk, Lisa A.; Lichanska, Agnieszka M.; Lorence, Matthew C.; Weslowski, Brian; Schafer, Klaus O.; Tibbetts, Clark

    2010-01-01

    For more than four decades the cause of most type A influenza virus infections of humans has been attributed to only two viral subtypes, A/H1N1 or A/H3N2. In contrast, avian and other vertebrate species are a reservoir of type A influenza virus genome diversity, hosting strains representing at least 120 of 144 combinations of 16 viral hemagglutinin and 9 viral neuraminidase subtypes. Viral genome segment reassortments and mutations emerging within this reservoir may spawn new influenza virus strains as imminent epidemic or pandemic threats to human health and poultry production. Traditional methods to detect and differentiate influenza virus subtypes are either time-consuming and labor-intensive (culture-based) or remarkably insensitive (antibody-based). Molecular diagnostic assays based upon reverse transcriptase-polymerase chain reaction (RT-PCR) have short assay cycle time, and high analytical sensitivity and specificity. However, none of these diagnostic tests determine viral gene nucleotide sequences to distinguish strains and variants of a detected pathogen from one specimen to the next. Decision-quality, strain- and variant-specific pathogen gene sequence information may be critical for public health, infection control, surveillance, epidemiology, or medical/veterinary treatment planning. The Resequencing Pathogen Microarray (RPM-Flu) is a robust, highly multiplexed and target gene sequencing-based alternative to both traditional culture- or biomarker-based diagnostic tests. RPM-Flu is a single, simultaneous differential diagnostic assay for all subtype combinations of type A influenza viruses and for 30 other viral and bacterial pathogens that may cause influenza-like illness. These other pathogen targets of RPM-Flu may co-infect and compound the morbidity and/or mortality of patients with influenza. 
    The informative specificity of a single RPM-Flu test represents specimen-specific viral gene sequences as determinants of virus type, A/HN subtype, virulence, host-range, and resistance to antiviral agents. PMID:20140251
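
    The figure of 144 subtype combinations quoted above follows from pairing each of the 16 hemagglutinin subtypes with each of the 9 neuraminidase subtypes. A minimal sketch of that enumeration, using the conventional HxNy naming:

```python
from itertools import product

# Subtype counts as stated in the abstract above.
HA_SUBTYPES = range(1, 17)   # H1 .. H16
NA_SUBTYPES = range(1, 10)   # N1 .. N9

# Every hemagglutinin/neuraminidase pairing, e.g. "H1N1", "H3N2".
subtypes = [f"H{h}N{n}" for h, n in product(HA_SUBTYPES, NA_SUBTYPES)]

# The abstract reports strains for at least 120 of these combinations.
observed_fraction = 120 / len(subtypes)
```

    The list has 16 x 9 = 144 entries, so the 120 combinations reported in the animal reservoir cover roughly 83% of the possible subtype space.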

  15. Single assay for simultaneous detection and differential identification of human and avian influenza virus types, subtypes, and emergent variants.

    PubMed

    Metzgar, David; Myers, Christopher A; Russell, Kevin L; Faix, Dennis; Blair, Patrick J; Brown, Jason; Vo, Scott; Swayne, David E; Thomas, Colleen; Stenger, David A; Lin, Baochuan; Malanoski, Anthony P; Wang, Zheng; Blaney, Kate M; Long, Nina C; Schnur, Joel M; Saad, Magdi D; Borsuk, Lisa A; Lichanska, Agnieszka M; Lorence, Matthew C; Weslowski, Brian; Schafer, Klaus O; Tibbetts, Clark

    2010-02-03

    For more than four decades the cause of most type A influenza virus infections of humans has been attributed to only two viral subtypes, A/H1N1 or A/H3N2. In contrast, avian and other vertebrate species are a reservoir of type A influenza virus genome diversity, hosting strains representing at least 120 of 144 combinations of 16 viral hemagglutinin and 9 viral neuraminidase subtypes. Viral genome segment reassortments and mutations emerging within this reservoir may spawn new influenza virus strains as imminent epidemic or pandemic threats to human health and poultry production. Traditional methods to detect and differentiate influenza virus subtypes are either time-consuming and labor-intensive (culture-based) or remarkably insensitive (antibody-based). Molecular diagnostic assays based upon reverse transcriptase-polymerase chain reaction (RT-PCR) have short assay cycle time, and high analytical sensitivity and specificity. However, none of these diagnostic tests determine viral gene nucleotide sequences to distinguish strains and variants of a detected pathogen from one specimen to the next. Decision-quality, strain- and variant-specific pathogen gene sequence information may be critical for public health, infection control, surveillance, epidemiology, or medical/veterinary treatment planning. The Resequencing Pathogen Microarray (RPM-Flu) is a robust, highly multiplexed and target gene sequencing-based alternative to both traditional culture- or biomarker-based diagnostic tests. RPM-Flu is a single, simultaneous differential diagnostic assay for all subtype combinations of type A influenza viruses and for 30 other viral and bacterial pathogens that may cause influenza-like illness. These other pathogen targets of RPM-Flu may co-infect and compound the morbidity and/or mortality of patients with influenza. 
    The informative specificity of a single RPM-Flu test represents specimen-specific viral gene sequences as determinants of virus type, A/HN subtype, virulence, host-range, and resistance to antiviral agents.

  16. CARSVM: a class association rule-based classification framework and its application to gene expression data.

    PubMed

    Kianmehr, Keivan; Alhajj, Reda

    2008-09-01

    In this study, we aim at building a classification framework, namely the CARSVM model, which integrates association rule mining and support vector machine (SVM). The goal is to benefit from the advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that mitigates the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework, instead of using the original training set, a set of rule-based feature vectors, which are generated based on the discriminative ability of class association rules over the training samples, is presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-quality source of discrimination knowledge that can substantially impact the prediction power of SVM and associative classification techniques. They are also easier for users to understand and interpret. We have used four datasets from the UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as a real-world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. Then, we describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model. From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real-world applications with some adjustments.

  17. Genotype-based gene signature of glioma risk.

    PubMed

    Huang, Yen-Tsung; Zhang, Yi; Wu, Zhijin; Michaud, Dominique S

    2017-07-01

    Glioma accounts for 80% of malignant brain tumors, but its etiologic determinants remain elusive. Despite genetic susceptibility loci identified by genome-wide association study (GWAS), the agnostic approach leaves open the possibility that other susceptibility genes remain to be discovered. Here we conduct a gene-centric integrative GWAS (iGWAS) of glioma risk that combines transcriptomics and genetics. We synthesized a brain transcriptomics dataset (n = 354), a GWAS dataset (n = 4203), and an advanced glioma tumor transcriptomic dataset (n = 483) to conduct an iGWAS. Using the expression quantitative trait loci (eQTL) dataset, we built models to predict gene expression for the GWAS data, based on eQTL genotypes. With the predicted gene expression, iGWAS analyses were performed using a novel statistical method. Gene signature risk score was constructed using a penalized logistic regression model. A total of 30527 transcripts were analyzed using the iGWAS approach. Four novel glioma susceptibility genes were identified with internal and external validation, including DRD5 (P = 3.0 × 10^-79), WDR1 (P = 8.4 × 10^-77), NOMO1 (P = 1.3 × 10^-25), and PDXDC1 (P = 8.3 × 10^-24). The genotype-predicted transcription pattern between cases and controls is consistent with that between tumor and its matched normal tissue. The genotype-based 4-gene signature improved the classification between glioma cases and controls based on age, gender, and population stratification, with area under the receiver operating characteristic curve increasing from 0.77 to 0.85 (P = 8.1 × 10^-23). A new genotype-based gene signature of glioma was identified using a novel iGWAS approach, which integrates multiplatform genomic data as well as different genetic association studies. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
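
    The area under the receiver operating characteristic curve cited above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes it by direct pairwise comparison; the scores are hypothetical, and this is a generic illustration rather than the study's evaluation code.

```python
def auc(case_scores, control_scores):
    """AUC as the share of case/control pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores (not data from the study).
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.7, 0.3, 0.2]
toy_auc = auc(toy_cases, toy_controls)
```

    In the toy data, eight of the nine case/control pairs are ranked correctly. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which frames the reported improvement from 0.77 to 0.85.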

  18. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
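
    The contrast drawn above between global and local similarity can be illustrated with a toy example. The sketch is not REGNET and does not build a model tree; it only shows, on hypothetical centered expression values, how a global Pearson correlation can be zero while each half of the range is perfectly linear, which is the situation that piecewise linear models are meant to capture.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical centered expression values: the target responds to the
# magnitude of the regulator, so the relationship is V-shaped.
regulator = [-3, -2, -1, 0, 1, 2, 3]
target = [abs(x) for x in regulator]

global_r = pearson(regulator, target)

# Splitting the range at zero mimics one split of a regression tree.
low = [(x, y) for x, y in zip(regulator, target) if x <= 0]
high = [(x, y) for x, y in zip(regulator, target) if x >= 0]
low_r = pearson([x for x, _ in low], [y for _, y in low])
high_r = pearson([x for x, _ in high], [y for _, y in high])
```

    Here the global coefficient is zero, while the two halves give -1 and +1. A pairwise correlation screen would miss this gene pair entirely, whereas a model with separate linear fits on either side of the split would describe it exactly.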

  19. Recreating a functional ancestral archosaur visual pigment.

    PubMed

    Chang, Belinda S W; Jönsson, Karolina; Kazmi, Manija A; Donoghue, Michael J; Sakmar, Thomas P

    2002-09-01

    The ancestors of the archosaurs, a major branch of the diapsid reptiles, originated more than 240 MYA near the dawn of the Triassic Period. We used maximum likelihood phylogenetic ancestral reconstruction methods and explored different models of evolution for inferring the amino acid sequence of a putative ancestral archosaur visual pigment. Three different types of maximum likelihood models were used: nucleotide-based, amino acid-based, and codon-based models. Where possible, within each type of model, likelihood ratio tests were used to determine which model best fit the data. Ancestral reconstructions of the ancestral archosaur node using the best-fitting models of each type were found to be in agreement, except for three amino acid residues at which one reconstruction differed from the other two. To determine if these ancestral pigments would be functionally active, the corresponding genes were chemically synthesized and then expressed in a mammalian cell line in tissue culture. The expressed artificial genes were all found to bind to 11-cis-retinal to yield stable photoactive pigments with lambda(max) values of about 508 nm, which is slightly redshifted relative to that of extant vertebrate pigments. The ancestral archosaur pigments also activated the retinal G protein transducin, as measured in a fluorescence assay. Our results show that ancestral genes from ancient organisms can be reconstructed de novo and tested for function using a combination of phylogenetic and biochemical methods.

  20. Extracting microRNA-gene relations from biomedical literature using distant supervision

    PubMed Central

    Clarke, Luka A.; Couto, Francisco M.

    2017-01-01

    Many biomedical relation extraction approaches are based on supervised machine learning, requiring an annotated corpus. Distant supervision aims at training a classifier by combining a knowledge base with a corpus, reducing the amount of manual effort necessary. This is particularly useful for biomedicine because many databases and ontologies have been made available for many biological processes, while the availability of annotated corpora is still limited. We studied the extraction of microRNA-gene relations from text. MicroRNA regulation is an important biological process due to its close association with human diseases. The proposed method, IBRel, is based on distantly supervised multi-instance learning. We evaluated IBRel on three datasets, and the results were compared with a co-occurrence approach as well as a supervised machine learning algorithm. While supervised learning outperformed on two of those datasets, IBRel obtained an F-score 28.3 percentage points higher on the dataset for which there was no training set developed specifically. To demonstrate the applicability of IBRel, we used it to extract 27 miRNA-gene relations from recently published papers about cystic fibrosis. Our results demonstrate that our method can be successfully used to extract relations from literature about a biological process without an annotated corpus. The source code and data used in this study are available at https://github.com/AndreLamurias/IBRel. PMID:28263989
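
    The distant supervision idea described in the abstract above can be sketched roughly as follows: sentences that mention a microRNA and a gene together are grouped into one bag per pair, and each bag is labelled by looking the pair up in a knowledge base instead of by manual annotation. This is not the IBRel implementation (which is available at the URL given in the abstract); the function name build_bags, the input layout, and the placeholder identifiers are hypothetical.

```python
from collections import defaultdict


def build_bags(sentences, knowledge_base):
    """Group sentences into bags keyed by (miRNA, gene) and label each bag.

    sentences: iterable of (text, mirna_mentions, gene_mentions) tuples;
               entity mentions are assumed to be recognized beforehand.
    knowledge_base: set of (miRNA, gene) pairs known to be related.
    """
    bags = defaultdict(list)
    for text, mirnas, genes in sentences:
        # Every co-mentioned pair in a sentence is one candidate instance.
        for mirna in mirnas:
            for gene in genes:
                bags[(mirna, gene)].append(text)
    # Distant label: a bag is positive if its pair is in the knowledge base.
    labels = {pair: pair in knowledge_base for pair in bags}
    return dict(bags), labels


# Hypothetical placeholder data, not taken from any real corpus or database.
kb = {("miR-A", "GENE1")}
corpus = [
    ("miR-A represses GENE1 in this model.", ["miR-A"], ["GENE1"]),
    ("miR-A and GENE2 were both measured.", ["miR-A"], ["GENE2"]),
    ("GENE1 levels fell after miR-A transfection.", ["miR-A"], ["GENE1"]),
]
bags, labels = build_bags(corpus, kb)
```

    A multi-instance learner would then be trained on the bags and their labels rather than on individually annotated sentences.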

  1. Extracting microRNA-gene relations from biomedical literature using distant supervision.

    PubMed

    Lamurias, Andre; Clarke, Luka A; Couto, Francisco M

    2017-01-01

    Many biomedical relation extraction approaches are based on supervised machine learning, requiring an annotated corpus. Distant supervision aims at training a classifier by combining a knowledge base with a corpus, reducing the amount of manual effort necessary. This is particularly useful for biomedicine because many databases and ontologies have been made available for many biological processes, while the availability of annotated corpora is still limited. We studied the extraction of microRNA-gene relations from text. MicroRNA regulation is an important biological process due to its close association with human diseases. The proposed method, IBRel, is based on distantly supervised multi-instance learning. We evaluated IBRel on three datasets, and the results were compared with a co-occurrence approach as well as a supervised machine learning algorithm. While supervised learning outperformed on two of those datasets, IBRel obtained an F-score 28.3 percentage points higher on the dataset for which there was no training set developed specifically. To demonstrate the applicability of IBRel, we used it to extract 27 miRNA-gene relations from recently published papers about cystic fibrosis. Our results demonstrate that our method can be successfully used to extract relations from literature about a biological process without an annotated corpus. The source code and data used in this study are available at https://github.com/AndreLamurias/IBRel.

  2. Nested methylation-specific polymerase chain reaction cancer detection method

    DOEpatents

    Belinsky, Steven A [Albuquerque, NM]; Palmisano, William A [Edgewood, NM]

    2007-05-08

    A molecular marker-based method for monitoring and detecting cancer in humans. Aberrant methylation of gene promoters is a marker for cancer risk in humans. A two-stage, or "nested" polymerase chain reaction method is disclosed for detecting methylated DNA sequences at sufficiently high levels of sensitivity to permit cancer screening in biological fluid samples, such as sputum, obtained non-invasively. The method is for detecting the aberrant methylation of the p16 gene, O6-methylguanine-DNA methyltransferase gene, Death-associated protein kinase gene, RAS-associated family 1 gene, or other gene promoters. The method offers a potentially powerful approach to population-based screening for the detection of lung and other cancers.

  3. DiffSLC: A graph centrality method to detect essential proteins of a protein-protein interaction network.

    PubMed

    Mistry, Divya; Wise, Roger P; Dickerson, Julie A

    2017-01-01

    Identification of central genes and proteins in biomolecular networks provides credible candidates for pathway analysis, functional analysis, and essentiality prediction. The DiffSLC centrality measure predicts central and essential genes and proteins using a protein-protein interaction network. Network centrality measures prioritize nodes and edges based on their importance to the network topology. These measures helped identify critical genes and proteins in biomolecular networks. The proposed centrality measure, DiffSLC, combines the number of interactions of a protein and the gene coexpression values of genes from which those proteins were translated, as a weighting factor to bias the identification of essential proteins in a protein interaction network. Potentially essential proteins with low node degree are promoted through eigenvector centrality. Thus, the gene coexpression values are used in conjunction with the eigenvector of the network's adjacency matrix and edge clustering coefficient to improve essentiality prediction. The outcome of this prediction is shown using three variations: (1) inclusion or exclusion of gene co-expression data, (2) impact of different coexpression measures, and (3) impact of different gene expression data sets. For a total of seven networks, DiffSLC is compared to other centrality measures using Saccharomyces cerevisiae protein interaction networks and gene expression data. Comparisons are also performed for the top ranked proteins against the known essential genes from the Saccharomyces Gene Deletion Project, which show that DiffSLC detects more essential proteins and has a higher area under the ROC curve than other compared methods. This makes DiffSLC a stronger alternative to other centrality methods for detecting essential genes using a protein-protein interaction network that obeys centrality-lethality principle. DiffSLC is implemented using the igraph package in R, and networkx package in Python. 
    The Python package can be obtained from git.io/diffslcpy. The R implementation and code to reproduce the analysis are available via git.io/diffslc.
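
    To make the idea of combining interaction counts, coexpression weights and eigenvector centrality more concrete, here is a small, dependency-free Python sketch. It is not the published DiffSLC formula and not the code behind the packages named above: the blend of a coexpression-weighted degree with an unweighted eigenvector centrality, the mixing parameter beta, and all function names are assumptions chosen only to show the general shape of such a score.

```python
def weighted_degree(edges):
    """Sum the coexpression weights over each protein's interactions."""
    degree = {}
    for u, v, w in edges:
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
    return degree


def eigenvector_centrality(edges, iterations=200):
    """Unweighted eigenvector centrality by power iteration.

    Iterating on (A + I) instead of A avoids oscillation on bipartite
    graphs and leaves the leading eigenvector unchanged.
    """
    nodes = sorted({node for u, v, _ in edges for node in (u, v)})
    neighbors = {node: [] for node in nodes}
    for u, v, _ in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        updated = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in nodes}
        largest = max(updated.values())
        score = {n: value / largest for n, value in updated.items()}
    return score


def combined_score(edges, beta=0.5):
    """Blend normalized weighted degree with eigenvector centrality.

    beta is a hypothetical mixing weight, not a parameter from the paper.
    """
    degree = weighted_degree(edges)
    max_degree = max(degree.values())
    eigen = eigenvector_centrality(edges)
    return {n: beta * degree[n] / max_degree + (1 - beta) * eigen[n] for n in degree}


# Hypothetical toy network: (protein, protein, coexpression weight).
toy_edges = [("P1", "P2", 0.9), ("P1", "P3", 0.8), ("P1", "P4", 0.7)]
scores = combined_score(toy_edges)
```

    In this toy star network the hub P1 receives the highest score on both components, so it ranks first for any beta between 0 and 1.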

  4. [Prokaryotic expression of recombinant prochymosin gene and its antiserum preparation].

    PubMed

    Li, Xin-ping; Liu, Huan-huan; Pu, Yan; Zhang, Fu-chun; Li, Yi-jie

    2012-07-01

    To optimize the prochymosin (pCHY) gene codons and express the gene in Escherichia coli (E. coli), and to prepare its antiserum and detect chymosin protein specifically. According to the codon usage bias of E. coli, the prochymosin gene sequence was synthesized based on the conserved sequences of the prochymosin gene from bovine, lamb and camel, and then cloned into the plasmids pET-30a and pcDNA3-AAT-COMP-C3d3 (pcD-ACC), respectively. pET-30a-pCHY was expressed, as the detected antigen, in E. coli BL21(DE3) after IPTG induction. RT-PCR was used to detect prochymosin mRNA expression in the liver of mice injected with pcDNA3-AAT-COMP-pCHY-C3d3 (pACCC) by the hydrodynamics-based transfection method. To prepare the prochymosin antiserum, pACCC and GST-pCHY proteins were used to immunize New Zealand rabbits in accordance with a DNA prime-protein boost strategy. Antibody levels were tested by ELISA. Western blotting showed that the molecular weight of the His-pCHY protein was about 55 000, similar to the expected molecular size. ELISA demonstrated that the titer of the prochymosin antiserum was high. Based on codon optimization, we obtained high-titer prochymosin antiserum through the DNA vaccine vector pcD-ACC combined with a DNA prime-protein boost strategy, similar to that obtained with a protein vaccine.

  5. Detection of type 2 diabetes related modules and genes based on epigenetic networks

    PubMed Central

    2014-01-01

    Background Type 2 diabetes (T2D) is one of the most common chronic metabolic diseases, characterized by insulin resistance and decreased insulin secretion. Genetic variation can explain only part of the heritability of T2D, so new methods are needed to detect the susceptibility genes of the disease. Epigenetics could establish the interface between environmental factors and the pathological mechanism of T2D. Results Based on network theory and by combining epigenetic characteristics with the human interactome, the weighted human DNA methylation network (WMPN) was constructed, and a T2D-related subnetwork (TMSN) was obtained through T2D-related differentially methylated genes. TMSN was found to have a T2D-specific network structure in which genes causing non-fatal metabolic disease were often located in the topological and functional periphery of the network. Combined with chromatin modifications, the weighted chromatin modification network (WCPN) was built, and a T2D-related chromatin modification pattern subnetwork (TCSN) was obtained by the TMSN gene set. TCSN had a densely connected network community, indicating that TMSN and TCSN could represent a collection of T2D-related epigenetic dysregulated sub-pathways. Using the cumulative hypergeometric test, 24 interplay modules of DNA methylation and chromatin modifications were identified. Analysis of gene expression in human T2D islet tissue showed that there were genes whose expression levels varied because of aberrant DNA methylation and/or chromatin modifications, which might affect and promote the development of T2D. Conclusions Here we have detected the potential interplay modules of DNA methylation and chromatin modifications for T2D. The study of T2D epigenetic networks provides a new way of understanding the pathogenic mechanism of T2D caused by epigenetic disorders. PMID:24565181
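
    The cumulative hypergeometric test mentioned in the abstract above can be illustrated with a short Python sketch. The abstract does not give the exact setup, so the sketch assumes the usual upper-tail form, P(X >= k), for the overlap between two gene sets drawn from a common background; the function name and the parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: number of genes in the background
    successes:  genes in the first set (e.g. one module)
    draws:      genes in the second set
    observed:   size of the overlap between the two sets (must be >= 0)
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum the probability mass from the observed overlap up to the maximum.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return tail / total
```

    For example, with a background of 10 genes, a first set of 4, a second set of 3 and an overlap of 2, the tail probability is (36 + 4) / 120, that is one third.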

  6. BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

    PubMed

    Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

    2015-08-18

    Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .

  7. XRCC1 Polymorphism Associated With Late Toxicity After Radiation Therapy in Breast Cancer Patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seibold, Petra; Behrens, Sabine; Schmezer, Peter

    Purpose: To identify single-nucleotide polymorphisms (SNPs) in oxidative stress–related genes associated with risk of late toxicities in breast cancer patients receiving radiation therapy. Methods and Materials: Using a 2-stage design, 305 SNPs in 59 candidate genes were investigated in the discovery phase in 753 breast cancer patients from 2 prospective cohorts from Germany. The 10 most promising SNPs in 4 genes were evaluated in the replication phase in up to 1883 breast cancer patients from 6 cohorts identified through the Radiogenomics Consortium. Outcomes of interest were late skin toxicity and fibrosis of the breast, as well as an overall toxicity score (Standardized Total Average Toxicity). Multivariable logistic and linear regression models were used to assess associations between SNPs and late toxicity. A meta-analysis approach was used to summarize evidence. Results: The association of a genetic variant in the base excision repair gene XRCC1, rs2682585, with normal tissue late radiation toxicity was replicated in all tested studies. In the combined analysis of discovery and replication cohorts, carrying the rare allele was associated with a significantly lower risk of skin toxicities (multivariate odds ratio 0.77, 95% confidence interval 0.61-0.96, P=.02) and a decrease in Standardized Total Average Toxicity scores (−0.08, 95% confidence interval −0.15 to −0.02, P=.016). Conclusions: Using a stage design with replication, we identified a variant allele in the base excision repair gene XRCC1 that could be used in combination with additional variants for developing a test to predict late toxicities after radiation therapy in breast cancer patients.

  8. Thirtyfold multiplex genotyping of the p53 gene using solid phase capturable dideoxynucleotides and mass spectrometry.

    PubMed

    Kim, Sobin; Ulz, Michael E; Nguyen, Tuan; Li, Chi-Ming; Sato, Takaaki; Tycko, Benjamin; Ju, Jingyue

    2004-05-01

    A mass spectrometry (MS) based multiplex genotyping method using solid phase capturable (SPC) dideoxynucleotides and single base extension (SBE), named the SPC-SBE, has been developed for mutation detection. We report here the simultaneous genotyping of 30 potential point mutation sites in exons 5, 7, and 8 of the human p53 gene in one tube using the SPC-SBE method. The 30 mutation sites, including the most frequently mutated p53 codons, were chosen to explore the high multiplexing scope of the SPC-SBE method. Thirty primers specific to each potential mutation site were designed to yield SBE products with sufficient mass differences. This was achieved by tuning the mass of some primers using modified nucleotides. Genomic DNA was amplified by multiplex PCR to produce amplicons of the three p53 exons. The 30 primers were combined with the PCR products and biotinylated dideoxynucleotides for SBE to generate 3'-biotinylated extension DNA products. These products were then captured by streptavidin-coated magnetic beads, while the unextended primers and other components in the reaction were washed away. The pure extension DNA products were subsequently released from the solid phase and analyzed with MS. We simultaneously genotyped 30 potential mutation sites in the p53 gene from Wilms' tumor, head and neck tumor, and colorectal tumor. Both homozygous and heterozygous genotypes were accurately determined with digital resolution. This is the highest level of multiplex genotyping reported thus far using MS, indicating that the approach might be applicable to screening a repertoire of genotypes in candidate genes as potential disease markers.

  9. Retransformation of a male sterile barnase line with the barstar gene as an efficient alternative method to identify male sterile-restorer combinations for heterosis breeding.

    PubMed

    Bisht, Naveen C; Jagannath, Arun; Burma, Pradeep K; Pradhan, Akshay K; Pental, Deepak

    2007-06-01

    We report in this study, an improved method for identifying male sterile-restorer combinations using the barnase-barstar system of pollination control for heterosis breeding in crop plants, as an alternative to the conventional line x tester cross method. In this strategy, a transgenic male sterile barnase line was retransformed with appropriate barstar constructs. Double transformants carrying both the barnase and barstar genes were identified and screened for their male fertility status. Using this strategy, 66-90% of fertile retransformants (restored events) were obtained in Brassica juncea using two different barstar constructs. Restored events were analysed for their pollen viability and copy number of the barstar gene. Around 90% of the restored events showed high pollen viability and approximately 30% contained single copy integrations of the barstar gene. These observations were significantly different from those made in our earlier studies using line (barnase) x tester (barstar) crosses, wherein only two viable male sterile-restorer combinations were identified by screening 88 different cross-combinations. The retransformation strategy not only generated several independent restorers for a given male sterile line from a single transformation experiment but also identified potential restorers in the T(0) generation itself leading to significant savings in time, cost and labour. Single copy restored plants with high pollen viability were selfed to segregate male sterile (barnase) and restorer (barstar) lines in the T(1) progeny which could subsequently be diversified into appropriate combiners for heterosis breeding. This strategy will be particularly useful for crop plants where poor transformation frequencies and/or lengthy transformation protocols are a major limitation.

  10. Retroviral vectors encoding ADA regulatory locus control region provide enhanced T-cell-specific transgene expression

    PubMed Central

    2009-01-01

    Background Murine retroviral vectors have been used in several hundred gene therapy clinical trials, but have fallen out of favor for a number of reasons. One issue is that gene expression from viral or internal promoters is highly variable and essentially unregulated. Moreover, with retroviral vectors, gene expression is usually silenced over time. Mammalian genes, in contrast, are characterized by highly regulated, precise levels of expression in both a temporal and a cell-specific manner. To ascertain whether recapitulation of endogenous adenosine deaminase (ADA) expression can be achieved in a vector construct, we created a new series of Moloney murine leukemia virus (MuLV)-based retroviral vectors that carry human regulatory elements, including combinations of the ADA promoter, the ADA locus control region (LCR), ADA introns and human polyadenylation sequences, in a self-inactivating vector backbone. Methods A MuLV-based retroviral vector with a self-inactivating (SIN) backbone, the phosphoglycerate kinase (PGK) promoter and the enhanced green fluorescent protein (eGFP) as a reporter gene was generated. Subsequent vectors were constructed from this basic vector by deletion or addition of certain elements. The added elements that were assessed are the human ADA promoter, the human ADA LCR, introns 7, 8, and 11 from the human ADA gene, and the human growth hormone polyadenylation signal. Retroviral vector particles were produced by transient three-plasmid transfection of 293T cells. Retroviral vectors encoding eGFP were titered by transducing 293A cells, and then the proportion of GFP-positive cells was determined using fluorescence-activated cell sorting (FACS). Non-T-cell and T-cell lines were transduced at a multiplicity of infection (MOI) of 0.1, and the yield of eGFP transgene expression was evaluated by FACS analysis using mean fluorescent intensity (MFI) detection. Results Vectors that contained the ADA LCR were preferentially expressed in T-cell lines. Further improvements in T-cell specific gene expression were observed with the incorporation of additional cis-regulatory elements, such as a human polyadenylation signal and intron 7 from the human ADA gene. Conclusion These studies suggest that the combination of an authentically regulated ADA gene in a murine retroviral vector, together with additional locus-specific regulatory refinements, will yield a vector with a safer profile and greater efficacy in terms of high-level, therapeutic, regulated gene expression for the treatment of ADA-deficient severe combined immunodeficiency. PMID:20042112

  11. Genomic clocks and evolutionary timescales

    NASA Technical Reports Server (NTRS)

    Blair Hedges, S.; Kumar, Sudhir

    2003-01-01

    For decades, molecular clocks have helped to illuminate the evolutionary timescale of life, but now genomic data pose a challenge for time estimation methods. It is unclear how to integrate data from many genes, each potentially evolving under a different model of substitution and at a different rate. Current methods can be grouped by the way the data are handled (genes considered separately or combined into a 'supergene') and the way gene-specific rate models are applied (global versus local clock). There are advantages and disadvantages to each of these approaches, and the optimal method has not yet emerged. Fortunately, time estimates inferred using many genes or proteins have greater precision and appear to be robust to different approaches.

  12. A powerful score-based test statistic for detecting gene-gene co-association.

    PubMed

    Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun

    2016-01-29

    The genetic variants identified by genome-wide association studies (GWAS) can account for only a small proportion of the total heritability of complex diseases. The existence of gene-gene joint effects, which comprise the main effects and their co-association, is one possible explanation for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, due not only to the traditional interaction under a nearly independent condition but also to the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and the specific disease-associated loci will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS performs better than other existing methods, including single SNP-based and principal component analysis (PCA)-based logistic regression models, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM) and the delta-square (δ²) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.

  13. Molecular phylogeny and evolutionary timescale for the family of mammalian herpesviruses.

    PubMed

    McGeoch, D J; Cook, S; Dolan, A; Jamieson, F E; Telford, E A

    1995-03-31

    A detailed phylogenetic analysis for mammalian members of the family Herpesviridae, based on molecular sequences is reported. Sets of encoded amino acid sequences were collected for eight well conserved genes that are common to mammalian herpesviruses. Phylogenetic trees were inferred from alignments of these sequence sets using both maximum parsimony and distance methods, and evaluated by bootstrap analysis. In all cases the three recognised subfamilies (Alpha-, Beta- and Gammaherpesvirinae), and major sublineages in each subfamily, were clearly distinguished, but within sublineages some finer details of branching were incompletely resolved. Multiple-gene sets were assembled to give a broadly based tree. The root position of the tree was estimated by assuming a constant molecular clock and also by analysis of one herpesviral gene set (that encoding uracil-DNA glycosylase) using cellular homologues as outgroups. Both procedures placed the root between the Alphaherpesvirinae and the other two subfamilies. Substitution rates were calculated for the combined gene sets based on a previous estimate for alphaherpesviral UL27 genes, where the time base had been obtained according to the hypothesis of cospeciation of virus and host lineages. Assuming a constant molecular clock, it was then estimated that the three subfamilies arose approximately 180 to 220 million years ago, that major sublineages within subfamilies were probably generated before the mammalian radiation of 80 to 60 million years ago, and that speciations within sublineages took place in the last 80 million years, probably with a major component of cospeciation with host lineages.

  14. Genome-wide molecular dissection of serotype M3 group A Streptococcus strains causing two epidemics of invasive infections.

    PubMed

    Beres, Stephen B; Sylva, Gail L; Sturdevant, Daniel E; Granville, Chanel N; Liu, Mengyao; Ricklefs, Stacy M; Whitney, Adeline R; Parkins, Larye D; Hoe, Nancy P; Adams, Gerald J; Low, Donald E; DeLeo, Frank R; McGeer, Allison; Musser, James M

    2004-08-10

    Molecular factors that contribute to the emergence of new virulent bacterial subclones and epidemics are poorly understood. We hypothesized that analysis of a population-based strain sample of serotype M3 group A Streptococcus (GAS) recovered from patients with invasive infection by using genome-wide investigative methods would provide new insight into this fundamental infectious disease problem. Serotype M3 GAS strains (n = 255) cultured from patients in Ontario, Canada, over 11 years and representing two distinct infection peaks were studied. Genetic diversity was indexed by pulsed-field gel electrophoresis, DNA-DNA microarray, whole-genome PCR scanning, prophage genotyping, targeted gene sequencing, and single-nucleotide polymorphism genotyping. All variation in gene content was attributable to acquisition or loss of prophages, a molecular process that generated unique combinations of proven or putative virulence genes. Distinct serotype M3 genotypes experienced rapid population expansion and caused infections that differed significantly in character and severity. Molecular genetic analysis, combined with immunologic studies, implicated a 4-aa duplication in the extreme N terminus of M protein as a factor contributing to an epidemic wave of serotype M3 invasive infections. This finding has implications for GAS vaccine research. Genome-wide analysis of population-based strain samples cultured from clinically well defined patients is crucial for understanding the molecular events underlying bacterial epidemics.

  15. Semirational Approach for Ultrahigh Poly(3-hydroxybutyrate) Accumulation in Escherichia coli by Combining One-Step Library Construction and High-Throughput Screening.

    PubMed

    Li, Teng; Ye, Jianwen; Shen, Rui; Zong, Yeqing; Zhao, Xuejin; Lou, Chunbo; Chen, Guo-Qiang

    2016-11-18

    As a product of a multistep enzymatic reaction, accumulation of poly(3-hydroxybutyrate) (PHB) in Escherichia coli (E. coli) can be achieved by overexpression of the PHB synthesis pathway from a native producer involving three genes, phbC, phbA, and phbB. Pathway optimization by adjusting expression levels of the three genes can influence properties of the final product. Here, we report a semirational approach for highly efficient PHB pathway optimization in E. coli based on a phbCAB operon cloned from the native producer Ralstonia eutropha (R. eutropha). Rationally designed ribosomal binding site (RBS) libraries with defined strengths for each of the three genes were constructed based on high or low copy number plasmids in a one-pot reaction by an oligo-linker mediated assembly (OLMA) method. Strains with desired properties were evaluated and selected by three different methodologies, including visual selection, high-throughput screening, and detailed in-depth analysis. Applying this approach, strains accumulating 0%-92% PHB contents in cell dry weight (CDW) were achieved. PHB with various weight-average molecular weights (Mw) of 2.7-6.8 × 10^6 was also efficiently produced in relatively high contents. These results suggest that the semirational approach combining library design, construction, and proper screening is an efficient way to optimize PHB and other multienzyme pathways.

  16. nGASP - the nematode genome annotation assessment project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coghlan, A; Fiedler, T J; McKay, S J

    2008-12-19

    While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
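
    The gene-level sensitivity and specificity quoted above can be illustrated with a minimal sketch. It assumes exact-match scoring (a predicted gene counts only if its exon structure equals a reference model) and the convention common in gene-finding evaluations that specificity means TP / (TP + FP); the exact scoring rules used by nGASP are not given in the abstract, and the gene models below are hypothetical.

```python
def gene_level_accuracy(predicted, reference):
    """Return (sensitivity, specificity) under exact-match scoring.

    Specificity follows the gene-finding convention TP / (TP + FP),
    i.e. the fraction of predicted genes that are correct.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    sensitivity = true_pos / len(reference) if reference else 0.0
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical gene models: (chromosome, strand, ((exon_start, exon_end), ...))
reference = [
    ("I", "+", ((100, 200), (300, 400))),
    ("I", "-", ((900, 1000),)),
    ("II", "+", ((50, 150), (250, 300), (400, 480))),
    ("II", "-", ((1000, 1200),)),
]
predicted = [
    ("I", "+", ((100, 200), (300, 400))),              # exact match
    ("I", "-", ((900, 1010),)),                        # wrong exon boundary
    ("II", "+", ((50, 150), (250, 300), (400, 480))),  # exact match
    ("II", "-", ((1000, 1200),)),                      # exact match
    ("III", "+", ((10, 90),)),                         # no reference gene here
]

sensitivity, specificity = gene_level_accuracy(predicted, reference)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

    Under this scoring a single misplaced exon boundary costs the whole gene, which is one reason gene-level figures tend to be much lower than exon- or base-level ones.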

  17. High-throughput detection and screening of plants modified by gene editing using quantitative real-time polymerase chain reaction.

    PubMed

    Peng, Cheng; Wang, Hua; Xu, Xiaoli; Wang, Xiaofu; Chen, Xiaoyun; Wei, Wei; Lai, Yongmin; Liu, Guoquan; Godwin, Ian Douglas; Li, Jieqin; Zhang, Ling; Xu, Junfeng

    2018-05-15

    Gene editing techniques are becoming powerful tools for modifying target genes in organisms. Although several methods have been developed to detect gene-edited organisms, these techniques are time- and labour-intensive. Meanwhile, few studies have investigated high-throughput detection and screening strategies for plants modified by gene editing. In this study, we developed a simple, sensitive and high-throughput quantitative real-time PCR (qPCR)-based method. The qPCR-based method exploits two differently labelled probes that are placed within one amplicon at the gene editing target site to simultaneously detect the wild-type and a gene-edited mutant. We showed that the qPCR-based method can accurately distinguish CRISPR/Cas9-induced mutants from the wild-type in several different plant species, such as Oryza sativa, Arabidopsis thaliana, Sorghum bicolor, and Zea mays. Moreover, the method can subsequently determine the mutation type by direct sequencing of the qPCR products of mutations due to gene editing. The qPCR-based method is also sufficiently sensitive to distinguish between heterozygous and homozygous mutations in T0 transgenic plants. In a 384-well plate format, the method enabled the simultaneous analysis of up to 128 samples in three replicates without handling the post-polymerase chain reaction (PCR) products. Thus, we propose that our method is an ideal choice for screening plants modified by gene editing from many candidates in T0 transgenic plants, which will be widely used in the area of plant gene editing. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.

  18. Network-based integration of GWAS and gene expression identifies a HOX-centric network associated with serous ovarian cancer risk

    PubMed Central

    Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary 
Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.

    2015-01-01

    Background Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509
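
    As a rough illustration of the co-expression step, the sketch below estimates mutual information between two expression profiles after equal-frequency binning. The binning scheme, the number of bins, and the toy profiles are assumptions made for illustration; the abstract does not say which estimator the authors used.

```python
import math
from collections import Counter


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank.

    Ties are broken by position, which is adequate for a sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=3):
    """Plug-in estimate of mutual information (in bits) between two profiles."""
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum over observed cells of p(a, b) * log2(p(a, b) / (p(a) * p(b))).
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


# Hypothetical expression values across nine tumours.
tf_gene = [1, 2, 3, 4, 5, 6, 7, 8, 9]
coexpressed = [2, 4, 6, 8, 10, 12, 14, 16, 18]  # rises with the TF gene
unrelated = [1, 4, 7, 2, 5, 8, 3, 6, 9]         # bins independent of the TF gene

mi_coexpressed = mutual_information(tf_gene, coexpressed)
mi_unrelated = mutual_information(tf_gene, unrelated)
print(mi_coexpressed, mi_unrelated)
```

    A network for a given TF gene could then be built by keeping the genes whose score exceeds a chosen cutoff; that cutoff would need calibrating, for example by permutation.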

  19. Phenome-driven disease genetics prediction toward drug discovery

    PubMed Central

    Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

    2015-01-01

    Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. Results: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e−4) and 81.3% (P < e−12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn’s disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn’s disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn’s disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. 
Availability and implementation: nlp.case.edu/public/data/DMN Contact: rxx@case.edu PMID:26072493

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belinsky, Steven A; Palmisano, William A

    A molecular marker-based method for monitoring and detecting cancer in humans. Aberrant methylation of gene promoters is a marker for cancer risk in humans. A two-stage, or "nested" polymerase chain reaction method is disclosed for detecting methylated DNA sequences at sufficiently high levels of sensitivity to permit cancer screening in biological fluid samples, such as sputum, obtained non-invasively. The method is for detecting the aberrant methylation of the p16 gene, O6-methylguanine-DNA methyltransferase gene, Death-associated protein kinase gene, RAS-associated family 1 gene, or other gene promoters. The method offers a potentially powerful approach to population-based screening for the detection of lung and other cancers.

  1. Recursive feature selection with significant variables of support vectors.

    PubMed

    Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh

    2012-01-01

    The development of DNA microarrays enables researchers to screen thousands of genes simultaneously and helps identify genes with high and low expression levels in normal and diseased tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to select genes. However, the parameter setting may not be compatible with the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance with two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and is capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
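
    For orientation, the following sketch shows the kind of univariate ranking the abstract contrasts its method with: each gene is scored by the absolute Welch t-statistic between two classes and genes are sorted by that score. This is not SVM-t itself, which applies t-statistics within a support vector machine; the gene names and expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr, labels):
    """Rank genes by |t| between class 1 and class 0, most discriminative first.

    expr maps a gene name to its expression values, one per sample;
    labels holds the class (0 or 1) of each sample in the same order.
    """
    scores = {}
    for gene, values in expr.items():
        class0 = [v for v, lab in zip(values, labels) if lab == 0]
        class1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(welch_t(class1, class0))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: four normal samples (0) followed by four tumour samples (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # large, tight shift
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.0, 2.2, 1.8, 2.1],  # essentially no shift
    "geneC": [1.0, 1.5, 0.5, 1.0, 2.0, 2.5, 1.5, 2.0],  # moderate, noisy shift
}

ranking = rank_genes(expr, labels)
print(ranking)
```

    Choosing how many top-ranked genes to keep is the arbitrary threshold the abstract criticises; recursive methods revisit the ranking after each elimination round instead.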

  2. A Generalized Approach for Measuring Relationships Among Genes.

    PubMed

    Wang, Lijun; Ahsan, Md Asif; Chen, Ming

    2017-07-21

    Several methods for identifying relationships among pairs of genes have been developed. In this article, we present a generalized approach for measuring relationships between any pair of genes, which is based on statistical prediction. We derive two particular versions of the generalized approach, least squares estimation (LSE) and nearest neighbors prediction (NNP). According to mathematical proof, LSE is equivalent to methods based on correlation, and NNP approximates a popular method, the maximal information coefficient (MIC), judging by its performance in simulations and on a real dataset. Moreover, the approach based on statistical prediction can be extended from two-gene relationships to multi-gene relationships. This application would help to identify relationships among multiple genes.
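
    One standard identity consistent with the LSE claim is that, for a straight-line least-squares fit with an intercept, the coefficient of determination equals the squared Pearson correlation. The sketch below checks this numerically on hypothetical expression values; it is not the authors' proof, and their exact formulation may differ.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared_of_fit(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical expression levels of two genes across six samples.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = pearson_r(gene_x, gene_y)
r2 = r_squared_of_fit(gene_x, gene_y)
print(r ** 2, r2)  # the two values agree up to floating-point error
```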

  3. Pathways of topological rank analysis (PoTRA): a novel method to detect pathways involved in hepatocellular carcinoma

    PubMed Central

    Liu, Li; Dinu, Valentin

    2018-01-01

    Complex diseases such as cancer are usually the result of a combination of environmental factors and one or several biological pathways consisting of sets of genes. Each biological pathway exerts its function by delivering signaling through the gene network. Theoretically, a pathway is supposed to have a robust topological structure under normal physiological conditions. However, the pathway’s topological structure could be altered under some pathological condition. It is well known that a normal biological network includes a small number of well-connected hub nodes and a large number of nodes that are non-hubs. In addition, it is reported that the loss of connectivity is a common topological trait of cancer networks, which is an assumption of our method. Hence, from normal to cancer, the process of the network losing connectivity might be the process of disrupting the structure of the network, namely, the number of hub genes might be altered in cancer compared to that in normal or the distribution of topological ranks of genes might be altered. Based on this, we propose a new PageRank-based method called Pathways of Topological Rank Analysis (PoTRA) to detect pathways involved in cancer. We use PageRank to measure the relative topological ranks of genes in each biological pathway, then select hub genes for each pathway, and use Fisher’s exact test to test if the number of hub genes in each pathway is altered from normal to cancer. Alternatively, if the distribution of topological ranks of gene in a pathway is altered between normal and cancer, this pathway might also be involved in cancer. Hence, we use the Kolmogorov–Smirnov test to detect pathways that have an altered distribution of topological ranks of genes between two phenotypes. We apply PoTRA to study hepatocellular carcinoma (HCC) and several subtypes of HCC. 
Very interestingly, we discover that all significant pathways in HCC are cancer-associated generally, while several significant pathways in subtypes of HCC are HCC subtype-associated specifically. In conclusion, PoTRA is a new approach to explore and discover pathways involved in cancer. PoTRA can be used as a complement to other existing methods to broaden our understanding of the biological mechanisms behind cancer at the system-level. PMID:29666752
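
    The hub-counting branch of the procedure described above can be sketched as follows: PageRank scores are computed per network, hubs are counted, and the hub counts in the normal and cancer networks are compared with Fisher's exact test. The hub rule (score above 1.5 times the mean), the damping factor, and the six-gene toy networks are assumptions for illustration only; the abstract does not state how PoTRA defines hubs or builds its networks.

```python
import math


def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph {node: set(neighbours)}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for u in adj[v]:
                    new[u] += share
            else:
                # Isolated node: spread its rank uniformly over all nodes.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank


def count_hubs(adj, factor=1.5):
    """Count nodes whose PageRank exceeds factor * mean rank (illustrative rule)."""
    rank = pagerank(adj)
    mean_rank = 1.0 / len(rank)  # ranks sum to 1
    return sum(r > factor * mean_rank for r in rank.values())


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Toy pathway on six hypothetical genes: a star in "normal", a ring in "cancer".
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
normal = {g: set() for g in genes}
for leaf in genes[1:]:
    normal["g1"].add(leaf)
    normal[leaf].add("g1")
cancer = {g: set() for g in genes}
for i, g in enumerate(genes):
    nxt = genes[(i + 1) % len(genes)]
    cancer[g].add(nxt)
    cancer[nxt].add(g)

hubs_normal, hubs_cancer = count_hubs(normal), count_hubs(cancer)
p_value = fisher_exact_two_sided(
    hubs_normal, len(genes) - hubs_normal,
    hubs_cancer, len(genes) - hubs_cancer,
)
print(hubs_normal, hubs_cancer, p_value)
```

    With only six genes the test has no power, so the p-value here is uninformative; the example only shows how the pieces fit together. The alternative branch, comparing the full PageRank distributions with a Kolmogorov-Smirnov test, is not sketched.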

  4. Pathways of topological rank analysis (PoTRA): a novel method to detect pathways involved in hepatocellular carcinoma.

    PubMed

    Li, Chaoxing; Liu, Li; Dinu, Valentin

    2018-01-01

    Complex diseases such as cancer are usually the result of a combination of environmental factors and one or several biological pathways consisting of sets of genes. Each biological pathway exerts its function by delivering signaling through the gene network. Theoretically, a pathway is supposed to have a robust topological structure under normal physiological conditions. However, the pathway's topological structure could be altered under some pathological condition. It is well known that a normal biological network includes a small number of well-connected hub nodes and a large number of nodes that are non-hubs. In addition, it is reported that the loss of connectivity is a common topological trait of cancer networks, which is an assumption of our method. Hence, from normal to cancer, the process of the network losing connectivity might be the process of disrupting the structure of the network, namely, the number of hub genes might be altered in cancer compared to that in normal or the distribution of topological ranks of genes might be altered. Based on this, we propose a new PageRank-based method called Pathways of Topological Rank Analysis (PoTRA) to detect pathways involved in cancer. We use PageRank to measure the relative topological ranks of genes in each biological pathway, then select hub genes for each pathway, and use Fisher's exact test to test if the number of hub genes in each pathway is altered from normal to cancer. Alternatively, if the distribution of topological ranks of gene in a pathway is altered between normal and cancer, this pathway might also be involved in cancer. Hence, we use the Kolmogorov-Smirnov test to detect pathways that have an altered distribution of topological ranks of genes between two phenotypes. We apply PoTRA to study hepatocellular carcinoma (HCC) and several subtypes of HCC. 
Very interestingly, we discover that all significant pathways in HCC are cancer-associated generally, while several significant pathways in subtypes of HCC are HCC subtype-associated specifically. In conclusion, PoTRA is a new approach to explore and discover pathways involved in cancer. PoTRA can be used as a complement to other existing methods to broaden our understanding of the biological mechanisms behind cancer at the system-level.

  5. A method for gene-based pathway analysis using genomewide association study summary statistics reveals nine new type 1 diabetes associations.

    PubMed

    Evangelou, Marina; Smyth, Deborah J; Fortune, Mary D; Burren, Oliver S; Walker, Neil M; Guo, Hui; Onengut-Gumuscu, Suna; Chen, Wei-Min; Concannon, Patrick; Rich, Stephen S; Todd, John A; Wallace, Chris

    2014-12-01

    Pathway analysis can complement point-wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease-associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene-based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene-based pathway analysis of a type 1 diabetes (T1D) meta-analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty-one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P-values were skewed (P=9.85×10⁻¹¹) with 12 of the 22 SNPs showing P<0.05. Support, including replication evidence, was obtained for nine T1D associated variants in genes ITGB7 (rs11170466, P=7.86×10⁻⁹), NRP1 (rs722988, 4.88×10⁻⁸), BAD (rs694739, 2.37×10⁻⁷), CTSB (rs1296023, 2.79×10⁻⁷), FYN (rs11964650, P=5.60×10⁻⁷), UBE2G1 (rs9906760, 5.08×10⁻⁷), MAP3K14 (rs17759555, 9.67×10⁻⁷), ITGB1 (rs1557150, 1.93×10⁻⁶), and IL7R (rs1445898, 2.76×10⁻⁶). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available. © 2014 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc.
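
    Selecting pathways at a 5% false discovery rate can be illustrated with the Benjamini-Hochberg step-up procedure. This is one common way of controlling the FDR and is assumed here; the abstract does not say which procedure the authors used, and the p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg rule.

    The largest rank k with p_(k) <= k * fdr / m is found, and the k
    smallest p-values are rejected.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical pathway-level p-values, in pathway order.
pathway_pvalues = [0.041, 0.001, 0.039, 0.008, 0.2]
enriched = benjamini_hochberg(pathway_pvalues, fdr=0.05)
print(enriched)  # indices of pathways called enriched
```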

  6. Establishing Clonal Cell Lines with Endothelial-Like Potential from CD9hi, SSEA-1− Cells in Embryonic Stem Cell-Derived Embryoid Bodies

    PubMed Central

    Lian, Qizhou; Yeo, KengSuan; Que, Jianwen; Tan, EileenKhiaWay; Yu, Fenggang; Yin, Yijun; Salto-Tellez, Manuel; Oakley, Reida Menshawe El; Lim, Sai-Kiang

    2006-01-01

    Background Differentiation of embryonic stem cells (ESCs) into specific cell types with minimal risk of teratoma formation could be efficiently directed by first reducing the differentiation potential of ESCs through the generation of clonal, self-renewing lineage-restricted stem cell lines. Efforts to isolate these stem cells are, however, mired in an impasse where the lack of purified lineage-restricted stem cells has hindered the identification of defining markers for these rare stem cells and, in turn, their isolation. Methodology/Principal Findings We describe here a method for the isolation of clonal lineage-restricted cell lines with endothelial potential from ESCs through a combination of empirical and rational evidence-based methods. Using an empirical protocol that we have previously developed to generate embryo-derived RoSH lines with endothelial potential, we first generated E-RoSH lines from mouse ESC-derived embryoid bodies (EBs). Despite originating from different mouse strains, RoSH and E-RoSH lines have similar gene expression profiles (r² = 0.93) while that between E-RoSH and ESCs was 0.83. In silico gene expression analysis predicted that like RoSH cells, E-RoSH cells have an increased propensity to differentiate into vasculature. Unlike their parental ESCs, E-RoSH cells did not form teratomas and differentiated efficiently into endothelial-like cells in vivo and in vitro. Gene expression and FACS analysis revealed that RoSH and E-RoSH cells are CD9hi, SSEA-1− while ESCs are CD9lo, SSEA-1+. Isolation of CD9hi, SSEA-1− cells that constituted 1%–10% of EB-derived cultures generated an E-RoSH-like culture with an identical E-RoSH-like gene expression profile (r² = 0.95) and a propensity to differentiate into endothelial-like cells. 
Conclusions By combining empirical and rational evidence-based methods, we identified definitive selectable surface antigens for the isolation and propagation of lineage-restricted stem cells with endothelial-like potential from mouse ESCs. PMID:17183690

  7. Pathway-based factor analysis of gene expression data produces highly heritable phenotypes that associate with age.

    PubMed

    Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard

    2015-03-09

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P<5.38×10⁻⁵). These phenotypes are more heritable (h²=0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. Copyright © 2015 Brown et al.

  8. Pathway-Based Factor Analysis of Gene Expression Data Produces Highly Heritable Phenotypes That Associate with Age

    PubMed Central

    Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard

    2015-01-01

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 “pathway phenotypes” that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P<5.38×10⁻⁵). These phenotypes are more heritable (h²=0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. PMID:25758824
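
    The Bonferroni threshold quoted above is consistent with dividing a family-wise error rate of 0.05 by the 930 pathway phenotypes tested. The 0.05 level is an assumption, since the abstract gives only the resulting threshold; the short check below reproduces the arithmetic.

```python
n_pathways = 186
phenotypes_per_pathway = 5
n_tests = n_pathways * phenotypes_per_pathway  # 930 pathway phenotypes

alpha = 0.05  # assumed family-wise error rate; not stated in the abstract
threshold = alpha / n_tests

print(f"{n_tests} tests -> per-test threshold {threshold:.2e}")
```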

  9. Combined targeting of lentiviral vectors and positioning of transduced cells by magnetic nanoparticles

    PubMed Central

    Hofmann, Andreas; Wenzel, Daniela; Becher, Ulrich M.; Freitag, Daniel F.; Klein, Alexandra M.; Eberbeck, Dietmar; Schulte, Maike; Zimmermann, Katrin; Bergemann, Christian; Gleich, Bernhard; Roell, Wilhelm; Weyh, Thomas; Trahms, Lutz; Nickenig, Georg; Fleischmann, Bernd K.; Pfeifer, Alexander

    2009-01-01

    Targeting of viral vectors is a major challenge for in vivo gene delivery, especially after intravascular application. In addition, targeting of the endothelium itself would be of importance for gene-based therapies of vascular disease. Here, we used magnetic nanoparticles (MNPs) to combine cell transduction and positioning in the vascular system under clinically relevant, nonpermissive conditions, including hydrodynamic forces and hypothermia. The use of MNPs enhanced transduction efficiency of endothelial cells and enabled direct endothelial targeting of lentiviral vectors (LVs) by magnetic force, even in perfused vessels. In addition, application of external magnetic fields to mice significantly changed LV/MNP biodistribution in vivo. LV/MNP-transduced cells exhibited superparamagnetic behavior as measured by magnetorelaxometry, and they were efficiently retained by magnetic fields. The magnetic interactions were strong enough to position MNP-containing endothelial cells at the intima of vessels under physiological flow conditions. Importantly, magnetic positioning of MNP-labeled cells was also achieved in vivo in an injury model of the mouse carotid artery. Intravascular gene targeting can be combined with positioning of the transduced cells via nanomagnetic particles, thereby combining gene- and cell-based therapies. PMID:19118196

  10. Characterisation of gene delivery using liposomal bubbles and ultrasound

    NASA Astrophysics Data System (ADS)

    Koshima, Risa; Suzuki, Ryo; Oda, Yusuke; Hirata, Keiichi; Nomura, Tetsuya; Negishi, Yoichi; Utoguchi, Naoki; Kudo, Nobuki; Maruyama, Kazuo

    2011-09-01

    The combination of nano/microbubbles and ultrasound is a novel technique for non-viral gene delivery. We have previously developed novel ultrasound-sensitive liposomes (Bubble liposomes) which contain the ultrasound imaging gas perfluoropropane. In this study, Bubble liposomes were compared with cationic lipid (CL)-DNA complexes as potential gene delivery carriers into tumors in vivo. The delivery of genes by Bubble liposomes depended on the intensity of the applied ultrasound. The transfection efficiency plateaued at 0.7 W/cm² ultrasound intensity. Bubble liposomes efficiently transferred genes into cultured cells even when the cells were exposed to ultrasound for only 1 s. In addition, Bubble liposomes were able to introduce the luciferase gene more effectively than CL-DNA complexes into mouse ascites tumor cells. We conclude that the combination of Bubble liposomes and ultrasound is a good method for gene transfer in vivo.

  11. Computational Gene Expression Modeling Identifies Salivary Biomarker Analysis that Predict Oral Feeding Readiness in the Newborn

    PubMed Central

    Maron, Jill L.; Hwang, Jooyeon S.; Pathak, Subash; Ruthazer, Robin; Russell, Ruby L.; Alterovitz, Gil

    2014-01-01

    Objective To combine mathematical modeling of salivary gene expression microarray data and systems biology annotation with RT-qPCR amplification to identify (phase I) and validate (phase II) salivary biomarker analysis for the prediction of oral feeding readiness in preterm infants. Study design Comparative whole transcriptome microarray analysis from 12 preterm newborns pre- and post-oral feeding success was used for computational modeling and systems biology analysis to identify potential salivary transcripts associated with oral feeding success (phase I). Selected gene expression biomarkers (15 from computational modeling; 6 evidence-based; and 3 reference) were evaluated by RT-qPCR amplification on 400 salivary samples from successful (n=200) and unsuccessful (n=200) oral feeders (phase II). Genes, alone and in combination, were evaluated by a multivariate analysis controlling for sex and post-conceptional age (PCA) to determine the probability that newborns achieved successful oral feeding. Results Advancing post-conceptional age (p < 0.001) and female sex (p = 0.05) positively predicted an infant’s ability to feed orally. A combination of five genes, NPY2R (hunger signaling), AMPK (energy homeostasis), PLXNA1 (olfactory neurogenesis), NPHP4 (visual behavior) and WNT3 (facial development), in addition to PCA and sex, demonstrated good accuracy for determining feeding success (AUROC = 0.78). Conclusions We have identified objective and biologically relevant salivary biomarkers that noninvasively assess a newborn’s developing brain, sensory and facial development as they relate to oral feeding success. Understanding the mechanisms that underlie the development of oral feeding readiness through translational and computational methods may improve clinical decision making while decreasing morbidities and health care costs. PMID:25620512
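
    The reported AUROC can be read as the probability that a randomly chosen successful feeder receives a higher model score than a randomly chosen unsuccessful one. The sketch below computes it by direct pairwise comparison; the scores and labels are hypothetical and are not the study's data or model.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Each positive/negative pair counts 1 if the positive scores higher,
    0.5 if the scores tie, and 0 otherwise.
    """
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of feeding success for eight infants.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]  # 1 = successful oral feeder

area = auroc(scores, labels)
print(area)
```

    The pairwise form is quadratic in the number of samples, which is acceptable for a few hundred samples; a rank-based formula would scale better.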

  12. Rapid Differentiation and In Situ Detection of 16 Sourdough Lactobacillus Species by Multiplex PCR

    PubMed Central

    Settanni, Luca; van Sinderen, Douwe; Rossi, Jone; Corsetti, Aldo

    2005-01-01

    A two-step multiplex PCR-based method was designed for the rapid detection of 16 species of lactobacilli known to be commonly present in sourdough. The first step of multiplex PCR was developed with a mixture of group-specific primers, while the second step included three multiplex PCR assays with a mixture of species-specific primers. Primers were derived from sequences that specify the 16S rRNA, the 16S-23S rRNA intergenic spacer region, and part of the 23S rRNA gene. The primer pairs designed were shown to exclusively amplify the targeted rrn operon fragment of the corresponding species. Due to the reliability of simultaneously identifying Lactobacillus plantarum, Lactobacillus pentosus, and Lactobacillus paraplantarum, a previously described multiplex PCR method employing recA gene-derived primers was included in the multiplex PCR system. The combination of a newly developed, quick bacterial DNA extraction method from sourdough and this multiplex PCR assay allows the rapid in situ detection of several sourdough-associated lactobacilli, including the recently described species Lactobacillus rossii, and thus represents a very useful alternative to culture-based methodologies. PMID:15933001

  13. Yeast Phenomics: An Experimental Approach for Modeling Gene Interaction Networks that Buffer Disease

    PubMed Central

    Hartman, John L.; Stisher, Chandler; Outlaw, Darryl A.; Guo, Jingyu; Shah, Najaf A.; Tian, Dehua; Santos, Sean M.; Rodgers, John W.; White, Richard A.

    2015-01-01

    The genome project increased appreciation of genetic complexity underlying disease phenotypes: many genes contribute to each phenotype and each gene contributes to multiple phenotypes. The aspiration of predicting common disease in individuals has evolved from seeking primary loci to marginal risk assignments based on many genes. Genetic interaction, defined as contributions to a phenotype that are dependent upon particular digenic allele combinations, could improve prediction of phenotype from complex genotype, but it is difficult to study in human populations. High throughput, systematic analysis of S. cerevisiae gene knockouts or knockdowns in the context of disease-relevant phenotypic perturbations provides a tractable experimental approach to derive gene interaction networks, in order to deduce by cross-species gene homology how phenotype is buffered against disease-risk genotypes. Yeast gene interaction network analysis to date has revealed biology more complex than previously imagined. This has motivated the development of more powerful yeast cell array phenotyping methods to globally model the role of gene interaction networks in modulating phenotypes (which we call yeast phenomic analysis). The article illustrates yeast phenomic technology, which is applied here to quantify gene X media interaction at higher resolution and supports use of human-like media for future applications of yeast phenomics for modeling human disease. PMID:25668739

  14. Integrating alternative splicing detection into gene prediction.

    PubMed

    Foissac, Sylvain; Schiex, Thomas

    2005-02-10

    Alternative splicing (AS) is now considered a major actor in transcriptome/proteome diversity and it cannot be neglected in the annotation process of a new genome. Despite considerable progress in terms of accuracy in computational gene prediction, the ability to reliably predict AS variants when there is local experimental evidence of it remains an open challenge for gene finders. We have used a new integrative approach that allows the incorporation of AS detection into ab initio gene prediction. This method relies on the analysis of genomically aligned transcript sequences (ESTs and/or cDNAs), and has been implemented in the dynamic programming algorithm of the graph-based gene finder EuGENE. Given a genomic sequence and a set of aligned transcripts, this new version identifies the set of transcripts carrying evidence of alternative splicing events, and provides, in addition to the classical optimal gene prediction, alternative optimal predictions (among those which are consistent with the AS events detected). This allows for multiple annotations of a single gene in such a way that each predicted variant is supported by transcript evidence (but not necessarily with full-length coverage). This automatic combination of experimental data analysis and ab initio gene finding offers an ideal integration of alternatively spliced gene prediction inside a single annotation pipeline.

  15. Enhanced Reliability and Accuracy for Field Deployable Bioforensic Detection and Discrimination of Xylella fastidiosa subsp. pauca, Causal Agent of Citrus Variegated Chlorosis Using Razor Ex Technology and TaqMan Quantitative PCR

    PubMed Central

    Fletcher, Jacqueline; Melcher, Ulrich; Ochoa Corona, Francisco Manuel

    2013-01-01

    A reliable, accurate and rapid multigene-based assay combining real time quantitative PCR (qPCR) and a Razor Ex BioDetection System (Razor Ex) was validated for detection of Xylella fastidiosa subsp. pauca (Xfp, a xylem-limited bacterium that causes citrus variegated chlorosis [CVC]). CVC, which is exotic to the United States, has spread through South and Central America and could significantly impact U.S. citrus if it arrives. A method for early, accurate and sensitive detection of Xfp in plant tissues is needed by plant health officials for inspection of products from quarantined locations, and by extension specialists for detection, identification and management of disease outbreaks and reservoir hosts. Two sets of specific PCR primers and probes, targeting Xfp genes for fimbrillin and the periplasmic iron-binding protein were designed. A third pair of primers targeting the conserved cobalamin synthesis protein gene was designed to detect all possible X. fastidiosa (Xf) strains. All three primer sets detected as little as 1 fg of plasmid DNA carrying X. fastidiosa target sequences and genomic DNA of Xfp at as little as 1 - 10 fg. The use of Razor Ex facilitates a rapid (about 30 min) in-field assay capability for detection of all Xf strains, and for specific detection of Xfp. Combined use of three primer sets targeting different genes increased the assay accuracy and broadened the range of detection. To our knowledge, this is the first report of a field-deployable rapid and reliable bioforensic detection and discrimination method for a bacterial phytopathogen based on multigene targets. PMID:24312333

  16. Enhanced reliability and accuracy for field deployable bioforensic detection and discrimination of Xylella fastidiosa subsp. pauca, causal agent of citrus variegated chlorosis using razor ex technology and TaqMan quantitative PCR.

    PubMed

    Ouyang, Ping; Arif, Mohammad; Fletcher, Jacqueline; Melcher, Ulrich; Ochoa Corona, Francisco Manuel

    2013-01-01

    A reliable, accurate and rapid multigene-based assay combining real time quantitative PCR (qPCR) and a Razor Ex BioDetection System (Razor Ex) was validated for detection of Xylella fastidiosa subsp. pauca (Xfp, a xylem-limited bacterium that causes citrus variegated chlorosis [CVC]). CVC, which is exotic to the United States, has spread through South and Central America and could significantly impact U.S. citrus if it arrives. A method for early, accurate and sensitive detection of Xfp in plant tissues is needed by plant health officials for inspection of products from quarantined locations, and by extension specialists for detection, identification and management of disease outbreaks and reservoir hosts. Two sets of specific PCR primers and probes, targeting Xfp genes for fimbrillin and the periplasmic iron-binding protein were designed. A third pair of primers targeting the conserved cobalamin synthesis protein gene was designed to detect all possible X. fastidiosa (Xf) strains. All three primer sets detected as little as 1 fg of plasmid DNA carrying X. fastidiosa target sequences and genomic DNA of Xfp at as little as 1 - 10 fg. The use of Razor Ex facilitates a rapid (about 30 min) in-field assay capability for detection of all Xf strains, and for specific detection of Xfp. Combined use of three primer sets targeting different genes increased the assay accuracy and broadened the range of detection. To our knowledge, this is the first report of a field-deployable rapid and reliable bioforensic detection and discrimination method for a bacterial phytopathogen based on multigene targets.

  17. A genome-wide association study of corneal astigmatism: The CREAM Consortium

    PubMed Central

    Shah, Rupal L.; Li, Qing; Zhao, Wanting; Tedja, Milly S.; Tideman, J. Willem L.; Khawaja, Anthony P.; Fan, Qiao; Yazar, Seyhan; Williams, Katie M.; Verhoeven, Virginie J.M.; Xie, Jing; Wang, Ya Xing; Hess, Moritz; Nickels, Stefan; Lackner, Karl J.; Pärssinen, Olavi; Wedenoja, Juho; Biino, Ginevra; Concas, Maria Pina; Uitterlinden, André; Rivadeneira, Fernando; Jaddoe, Vincent W.V.; Hysi, Pirro G.; Sim, Xueling; Tan, Nicholas; Tham, Yih-Chung; Sensaki, Sonoko; Hofman, Albert; Vingerling, Johannes R.; Jonas, Jost B.; Mitchell, Paul; Hammond, Christopher J.; Höhn, René; Baird, Paul N.; Wong, Tien-Yin; Cheng, Ching-Yu; Teo, Yik Ying; Mackey, David A.; Williams, Cathy; Saw, Seang-Mei; Klaver, Caroline C.W.; Bailey-Wilson, Joan E.

    2018-01-01

    Purpose To identify genes and genetic markers associated with corneal astigmatism. Methods A meta-analysis of genome-wide association studies (GWASs) of corneal astigmatism undertaken for 14 European ancestry (n=22,250) and 8 Asian ancestry (n=9,120) cohorts was performed by the Consortium for Refractive Error and Myopia. Cases were defined as having >0.75 diopters of corneal astigmatism. Subsequent gene-based and gene-set analyses of the meta-analyzed results of European ancestry cohorts were performed using VEGAS2 and MAGMA software. Additionally, estimates of single nucleotide polymorphism (SNP)-based heritability for corneal and refractive astigmatism and the spherical equivalent were calculated for Europeans using LD score regression. Results The meta-analysis of all cohorts identified a genome-wide significant locus near the platelet-derived growth factor receptor alpha (PDGFRA) gene: top SNP: rs7673984, odds ratio=1.12 (95% CI:1.08–1.16), p=5.55×10⁻⁹. No other genome-wide significant loci were identified in the combined analysis or European/Asian ancestry-specific analyses. Gene-based analysis identified three novel candidate genes for corneal astigmatism in Europeans: claudin-7 (CLDN7), acid phosphatase 2, lysosomal (ACP2), and TNF alpha-induced protein 8 like 3 (TNFAIP8L3). Conclusions In addition to replicating a previously identified genome-wide significant locus for corneal astigmatism near the PDGFRA gene, gene-based analysis identified three novel candidate genes, CLDN7, ACP2, and TNFAIP8L3, that warrant further investigation to understand their role in the pathogenesis of corneal astigmatism. The much lower number of genetic variants and genes demonstrating an association with corneal astigmatism compared to published spherical equivalent GWAS analyses suggests a greater influence of rare genetic variants, non-additive genetic effects, or environmental factors in the development of astigmatism. PMID:29422769

  18. Effects of blood-activating and stasis-removing drugs combined with VEGF gene transfer on angiogenesis in ischemic necrosis of the femoral head.

    PubMed

    Li, Jun-Hui; Wu, Ya-Ling; Ye, Jian-Hong; Ning, Ya-Gong; Yu, Hai-Ying; Peng, Zhong-Jie; Luan, Xiao-Wen

    2009-09-01

    To observe the promoting effects of blood-activating and stasis-removing Chinese drugs combined with vascular endothelial growth factor (VEGF) gene transfer on angiogenesis in ischemic necrosis of the femoral head. Forty Japanese giant-ear rabbits were randomly divided into a control group, a model group, a Chinese drug group, a gene group, and a combined group. After 8 weeks of treatment, the rate of VEGF positive cell expression in the synovium of the femoral head was measured using the immunohistochemical method, and the number of blood vessels in the femoral head was measured by digital subtraction angiography. The rate of VEGF positive cell expression in the model group was significantly lower than that in the Chinese drug group (P < 0.05) and very significantly lower than those in other groups (P < 0.01); but in the combined group it was significantly higher than in the Chinese drug group (P < 0.05). The differences in the number of blood vessels in area A between the model group and other groups were not statistically significant. However, in area B, the number of blood vessels significantly increased in the control group, the gene group and the combined group as compared with the model group (P < 0.05), and in the combined group the number of blood vessels was significantly greater than in the gene group (P < 0.05); but in the Chinese drug group it was not significantly different from that in the model group (P > 0.05). Either the blood-activating and stasis-removing Chinese drugs or VEGF gene transfer can promote the angiogenesis and building of collateral circulation for femoral head ischemic necrosis, and the combined therapy with Chinese drugs and VEGF gene transfer may show a better therapeutic effect. The present study provides an experimental basis for clinical application of the combined therapy with the blood-activating and stasis-removing Chinese drugs and VEGF gene transfer.

  19. Discriminating the reaction types of plant type III polyketide synthases

    PubMed Central

    Shimizu, Yugo; Ogata, Hiroyuki; Goto, Susumu

    2017-01-01

    Abstract Motivation: Functional prediction of paralogs is challenging in bioinformatics because of rapid functional diversification after gene duplication events combined with parallel acquisitions of similar functions by different paralogs. Plant type III polyketide synthases (PKSs), producing various secondary metabolites, represent a paralogous family that has undergone gene duplication and functional alteration. Currently, there is no computational method available for the functional prediction of type III PKSs. Results: We developed a plant type III PKS reaction predictor, pPAP, based on the recently proposed classification of type III PKSs. pPAP combines two kinds of similarity measures: one calculated by profile hidden Markov models (pHMMs) built from functionally and structurally important partial sequence regions, and the other based on mutual information between residue positions. pPAP targets PKSs acting on ring-type starter substrates, and classifies their functions into four reaction types. The pHMM approach discriminated two reaction types with high accuracy (97.5%, 39/40), but its accuracy decreased when discriminating three reaction types (87.8%, 43/49). When combined with a correlation-based approach, all 49 PKSs were correctly discriminated, and pPAP was still highly accurate (91.4%, 64/70) even after adding other reaction types. These results suggest pPAP, which is based on linear discriminant analyses of similarity measures, is effective for plant type III PKS function prediction. Availability and Implementation: pPAP is freely available at ftp://ftp.genome.jp/pub/tools/ppap/ Contact: goto@kuicr.kyoto-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28334262

  20. A robust data-driven approach for gene ontology annotation.

    PubMed

    Li, Yanpeng; Yu, Hong

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, the RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both tasks. © The Author(s) 2014. Published by Oxford University Press.
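
    The ranking idea in this record (cosine similarity combined with GO term frequency) can be illustrated with a minimal sketch. The abstract does not give the actual formula, so the linear blend, the weight `alpha`, the whitespace tokenizer, and the toy frequency counts below are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words token lists."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_go_terms(sentence, go_terms, term_doc_freq, alpha=0.2):
    """Rank GO terms for one evidence sentence.

    Score = (1 - alpha) * cosine(sentence, term name) + alpha * relative frequency.
    This blend is a hypothetical stand-in for the paper's ranking function.
    """
    tokens = sentence.lower().split()
    max_freq = max(term_doc_freq.values(), default=0) or 1
    scored = []
    for go_id, name in go_terms.items():
        sim = cosine(tokens, name.lower().split())
        freq = term_doc_freq.get(go_id, 0) / max_freq
        scored.append((go_id, (1 - alpha) * sim + alpha * freq))
    # Highest score first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy inputs: the frequency counts are invented for the example.
go_terms = {
    "GO:0016301": "kinase activity",
    "GO:0005634": "nucleus",
    "GO:0003677": "DNA binding",
}
term_doc_freq = {"GO:0016301": 3, "GO:0005634": 10}
ranked = rank_go_terms("the protein shows kinase activity in vitro",
                       go_terms, term_doc_freq)
print(ranked)
```

    With a small `alpha`, lexical overlap dominates and frequency mostly reorders terms with similar overlap; a larger `alpha` would let frequent but unrelated terms rise in the ranking.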

  1. Association weight matrix for the genetic dissection of puberty in beef cattle.

    PubMed

    Fortes, Marina R S; Reverter, Antonio; Zhang, Yuandan; Collis, Eliza; Nagaraj, Shivashankar H; Jonsson, Nick N; Prayaga, Kishore C; Barris, Wes; Hawken, Rachel J

    2010-08-03

    We describe a systems biology approach for the genetic dissection of complex traits based on applying gene network theory to the results from genome-wide associations. The associations of single-nucleotide polymorphisms (SNP) that were individually associated with a primary phenotype of interest, age at puberty in our study, were explored across 22 related traits. Genomic regions were surveyed for genes harboring the selected SNP. As a result, an association weight matrix (AWM) was constructed with as many rows as genes and as many columns as traits. Each {i, j} cell value in the AWM corresponds to the z-score normalized additive effect of the ith gene (via its neighboring SNP) on the jth trait. Columnwise, the AWM recovered the genetic correlations estimated via pedigree-based restricted maximum-likelihood methods. Rowwise, a combination of hierarchical clustering, gene network, and pathway analyses identified genetic drivers that would have been missed by standard genome-wide association studies. Finally, the promoter regions of the AWM-predicted targets of three key transcription factors (TFs), estrogen-related receptor gamma (ESRRG), Pal3 motif, bound by a PPAR-gamma homodimer, IR3 sites (PPARG), and Prophet of Pit 1, PROP paired-like homeobox 1 (PROP1), were surveyed to identify binding sites corresponding to those TFs. Applied to our case, the AWM results recapitulate the known biology of puberty, captured experimentally validated binding sites, and identified candidate genes and gene-gene interactions for further investigation.
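
    The matrix described in this record can be sketched in a few lines. This is a minimal illustration, assuming the z-score normalization is applied within each trait (column) and using invented gene names and effect sizes; the abstract does not specify these details.

```python
from statistics import mean, pstdev

def build_awm(effects):
    """Build an association weight matrix (AWM).

    effects: dict mapping gene -> list of additive effects, one per trait.
    Returns dict gene -> list of z-scores, standardized within each trait
    column (the direction of normalization is an assumption).
    """
    genes = list(effects)
    n_traits = len(effects[genes[0]])
    cols = [[effects[g][j] for g in genes] for j in range(n_traits)]
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return {g: [(effects[g][j] - mus[j]) / sds[j] for j in range(n_traits)]
            for g in genes}

def column_correlation(awm, j, k):
    """Pearson correlation between trait columns j and k of the AWM."""
    x = [row[j] for row in awm.values()]
    y = [row[k] for row in awm.values()]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Toy data: four hypothetical genes, three hypothetical traits.
effects = {
    "GENE_A": [0.5, 0.4, -0.1],
    "GENE_B": [-0.2, -0.3, 0.3],
    "GENE_C": [0.1, 0.2, -0.4],
    "GENE_D": [-0.4, -0.3, 0.2],
}
awm = build_awm(effects)
print(column_correlation(awm, 0, 1))  # traits 0 and 1 across genes
print(column_correlation(awm, 0, 2))  # traits 0 and 2 across genes
```

    Reading the matrix columnwise compares traits across genes, which is the view the abstract relates to genetic correlations; reading it rowwise compares genes across traits, which is the input to the clustering and network steps.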

  2. An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    PubMed Central

    Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055

  3. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.

    PubMed

    Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.

  4. Rapid identification of mutations in the IDS gene of Hunter patients: Analysis of mRNA by the protein truncation test

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogervorst, F.B.L.; Tuijn, A.C. van der; Ommen, G.J.B. van

    Hunter syndrome is an X-linked recessive disorder constituting phenotypes ranging from mild to severe. The gene affected in Hunter syndrome is iduronate-2-sulfatase (IDS). The identification of mutations leading to a defective enzyme could be of benefit for the diagnosis and prognosis of patients. At this moment a variety of mutations have been found, including large deletions and base substitutions. We have previously described a method, designated the protein truncation test (PTT), for the detection of mutations leading to premature translation termination. The method combines reverse transcription and PCR (RT-PCR) with in vitro transcription/translation of the products generated. To facilitate a PTT analysis, the forward primer is modified by addition of a T7 promoter sequence and an in-frame protein translation initiation sequence. In our department the method has been successfully applied for DMD and FAP. Here we report on the PTT analysis of 8 Hunter patients, all of them without major gene alterations as determined by Southern analysis. Total RNA was isolated from cultured skin fibroblasts or peripheral blood lymphocytes. PTT analysis revealed 4 novel mutations in the IDS gene: two missense mutations and two frameshift mutations (splice donor site alteration in intron 6 and a 13 bp deletion in exon 9). Furthermore, PTT proved to be a simple method to identify carriers. Currently, we use the generated RT-PCR products of the remaining patients for automated sequence analysis. PTT may be of great value in screening disorders in which affected genes give rise to truncated protein products.

  5. Improving transcriptome construction in non-model organisms: integrating manual and automated gene definition in Emiliania huxleyi.

    PubMed

    Feldmesser, Ester; Rosenwasser, Shilo; Vardi, Assaf; Ben-Dor, Shifra

    2014-02-22

    The advent of Next Generation Sequencing technologies and corresponding bioinformatics tools allows the definition of transcriptomes in non-model organisms. Non-model organisms are of great ecological and biotechnological significance, and consequently the understanding of their unique metabolic pathways is essential. Several methods that integrate de novo assembly with genome-based assembly have been proposed. Yet, there are many open challenges in defining genes, particularly where genomes are not available or incomplete. Despite the large numbers of transcriptome assemblies that have been performed, quality control of the transcript building process, particularly on the protein level, is rarely, if ever, performed. To test and improve the quality of the automated transcriptome reconstruction, we used manually defined and curated genes, several of them experimentally validated. Several approaches to transcript construction were utilized, based on the available data: a draft genome, high quality RNAseq reads, and ESTs. In order to maximize the contribution of the various data, we integrated methods including de novo and genome based assembly, as well as EST clustering. After each step a set of manually curated genes was used for quality assessment of the transcripts. The interplay between the automated pipeline and the quality control indicated which additional processes were required to improve the transcriptome reconstruction. We discovered that E. huxleyi has a very high percentage of non-canonical splice junctions, and relatively high rates of intron retention, which caused unique issues with the currently available tools. While individual tools missed genes and artificially joined overlapping transcripts, combining the results of several tools improved the completeness and quality considerably. The final collection, created from the integration of several quality control and improvement rounds, was compared to the manually defined set both on the DNA and protein levels, and resulted in an improvement of 20% versus any of the read-based approaches alone. To the best of our knowledge, this is the first time that an automated transcript definition is subjected to quality control using manually defined and curated genes and thereafter the process is improved. We recommend using a set of manually curated genes to troubleshoot transcriptome reconstruction.

  6. Multi-label literature classification based on the Gene Ontology graph.

    PubMed

    Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua

    2008-12-08

    The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.

  7. A hybrid network-based method for the detection of disease-related genes

    NASA Astrophysics Data System (ADS)

    Cui, Ying; Cai, Meng; Dai, Yang; Stanley, H. Eugene

    2018-02-01

    Detecting disease-related genes is crucial in disease diagnosis and drug design. The accepted view is that neighbors of a disease-causing gene in a molecular network tend to cause the same or similar diseases, and network-based methods have been recently developed to identify novel hereditary disease-genes in available biomedical networks. Despite the steady increase in the discovery of disease-associated genes, there is still a large fraction of disease genes that remains under the tip of the iceberg. In this paper we exploit the topological properties of the protein-protein interaction (PPI) network to detect disease-related genes. We compute, analyze, and compare the topological properties of disease genes with non-disease genes in PPI networks. We also design an improved random forest classifier based on these network topological features, and a cross-validation test confirms that our method performs better than previous similar studies.
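
    The kind of topological features such a classifier might use can be sketched as follows. The abstract does not list the features or the classifier settings, so the choice of degree and local clustering coefficient, the protein names, and the toy edge list are assumptions; the random forest step itself is omitted.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) interaction pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def topological_features(adj):
    """Per-node degree and local clustering coefficient."""
    feats = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count edges among the node's neighbours
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[node] = {"degree": k, "clustering": clustering}
    return feats

# Toy PPI network with hypothetical protein identifiers.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
features = topological_features(build_graph(edges))
for protein, f in sorted(features.items()):
    print(protein, f)
```

    Each feature dictionary could then serve as one input row for a classifier trained on known disease and non-disease genes.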

  8. [Killing effect of Huaier combined with DC-CIK on nude mice bearing colon cancer HT29 stem cells in vivo].

    PubMed

    Sun, Wen-Wen; Dou, Jin-Xia; Zhang, Lin; Qiao, Li-Kui; Shen, Na; Gao, Wen-Yuan

    2018-01-01

    To compare the therapeutic effects of different treatment methods on the nude mice bearing colon cancer HT29 cells. BALB/c nude mice colon cancer stem cell models were established and randomly divided into the following four groups, with 8 nude mice in each group: blank control group, DC-CIK group, Huaier group, and Huaier combined with DC-CIK group (combined treatment group). The mice in DC-CIK group and combined treatment group received treatment with 1×10⁶ DC-CIK cells by tail vein injection 4 days after the tumor stem cells were inoculated, 2 times a week for three weeks. The mice in Huaier group and combined treatment group received intragastric administration at the dose of 20 g/60 kg body weight, 0.2 mL/time, once a day for a total of three weeks. The mice in control group received equal volume of normal saline. Tumor size and body weight of nude mice were measured every 2 days during treatment for three weeks in each group. After the treatment, the nude mice were sacrificed to measure the tumor weight and the tumor inhibition rate was calculated. The RT-PCR method was used to detect the expression levels of the key genes in the signal pathway. After the end of the treatment, the tumor weight in the Huaier group, DC-CIK group and combined treatment group was significantly lower than that in the control group; the tumor weight in the combined treatment group was significantly lower than that in the Huaier group and DC-CIK group. Among them, the tumor inhibition rate reached 46.77% in the combined treatment group. In respect of changes in expression levels of key genes in the signaling pathway, the mRNA expression levels of key genes PI3KR1 and Akt in PI3K/Akt pathway, key genes Wnt1 and CTTNB1 in Wnt/β-catenin pathway, and key genes Notch1, Notch2, Notch3 in Notch pathway in the combined treatment group were lower than those in DC-CIK group and Huaier group. The Huaier combined with DC-CIK group showed the best therapeutic effect among the different treatment methods for HT29 stem cell colon tumors in nude mice, providing a new idea for clinical treatment of colon cancer. Copyright© by the Chinese Pharmaceutical Association.

  9. Fractal Clustering and Knowledge-driven Validation Assessment for Gene Expression Profiling.

    PubMed

    Wang, Lu-Yong; Balasubramanian, Ammaiappan; Chakraborty, Amit; Comaniciu, Dorin

    2005-01-01

    DNA microarray experiments generate a substantial amount of information about global gene expression. Gene expression profiles can be represented as points in multi-dimensional space. It is essential to identify relevant groups of genes in biomedical research. Clustering is helpful for pattern recognition in gene expression profiles. A number of clustering techniques have been introduced. However, these traditional methods mainly rely on a shape-based assumption or some distance metric to cluster the points in multi-dimensional linear Euclidean space. Their results show poor consistency with the functional annotation of genes in previous validation studies. From a different perspective, we propose a fractal clustering method that clusters genes using the intrinsic (fractal) dimension from modern geometry. This method clusters points in such a way that points in the same cluster are more self-affine among themselves than to the points in other clusters. We assess this method using annotation-based validation assessment for gene clusters. The assessment shows that this method is superior to other traditional methods in identifying functionally related gene groups.

  10. Are "functionally related polymorphisms" of renin-angiotensin-aldosterone system gene polymorphisms associated with hypertension?

    PubMed

    Hahntow, Ines N; Mairuhu, Gideon; van Valkengoed, Irene Gm; Koopmans, Richard P; Michel, Martin C

    2010-06-02

    Genotype-phenotype association studies are typically based upon polymorphisms or haplotypes comprised of multiple polymorphisms within a single gene. It has been proposed that combinations of polymorphisms in distinct genes, which functionally impact the same phenotype, may have stronger phenotype associations than those within a single gene. We have tested this hypothesis using genes encoding components of the renin-angiotensin-aldosterone system and the high blood pressure phenotype. Our analysis is based on 1379 participants of the cross-sectional SUNSET study randomly selected from the population register of Amsterdam. Each subject was genotyped for the angiotensinogen M235T, the angiotensin-converting enzyme insertion/deletion and the angiotensin II type 1 receptor A1166C polymorphism. The phenotype high blood pressure was defined either as a categorical variable comparing hypertension versus normotension as in most previous studies or as a continuous variable using systolic, diastolic and mean blood pressure in a multiple regression analysis with gender, ethnicity, age, body-mass-index and antihypertensive medication as covariates. Genotype-phenotype relationships were explored for each polymorphism in isolation and for double and triple polymorphism combinations. At the single polymorphism level, only the A allele of the angiotensin II type 1 receptor was associated with a high blood pressure phenotype. Using combinations of polymorphisms of two or all three genes did not yield stronger/more consistent associations. We conclude that combinations of physiologically related polymorphisms of multiple genes, at least with regard to the renin-angiotensin-aldosterone system and the hypertensive phenotype, do not necessarily offer additional benefit in analyzing genotype/phenotype associations.

  11. Highly specific gene silencing in a monocot species by artificial microRNAs derived from chimeric miRNA precursors

    DOE PAGES

    Carbonell, Alberto; Fahlgren, Noah; Mitchell, Skyler; ...

    2015-05-20

    Artificial microRNAs (amiRNAs) are used for selective gene silencing in plants. However, current methods to produce amiRNA constructs for silencing transcripts in monocot species are not suitable for simple, cost-effective and large-scale synthesis. Here, a series of expression vectors based on Oryza sativa MIR390 (OsMIR390) precursor was developed for high-throughput cloning and high expression of amiRNAs in monocots. Four different amiRNA sequences designed to target specifically endogenous genes and expressed from OsMIR390-based vectors were validated in transgenic Brachypodium distachyon plants. Surprisingly, amiRNAs accumulated to higher levels and were processed more accurately when expressed from chimeric OsMIR390-based precursors that include distal stem-loop sequences from Arabidopsis thaliana MIR390a (AtMIR390a). In all cases, transgenic plants displayed the predicted phenotypes induced by target gene repression, and accumulated high levels of amiRNAs and low levels of the corresponding target transcripts. Genome-wide transcriptome profiling combined with 5'-RLM-RACE analysis in transgenic plants confirmed that amiRNAs were highly specific. Significance Statement: A series of amiRNA vectors based on Oryza sativa MIR390 (OsMIR390) precursor were developed for simple, cost-effective and large-scale synthesis of amiRNA constructs to silence genes in monocots. Unexpectedly, amiRNAs produced from chimeric OsMIR390-based precursors including Arabidopsis thaliana MIR390a distal stem-loop sequences accumulated elevated levels of highly effective and specific amiRNAs in transgenic Brachypodium distachyon plants.

  12. Synergistic effect of amino acids modified on dendrimer surface in gene delivery.

    PubMed

    Wang, Fei; Wang, Yitong; Wang, Hui; Shao, Naimin; Chen, Yuanyuan; Cheng, Yiyun

    2014-11-01

    Design of an efficient gene vector based on dendrimer remains a great challenge due to the presence of multiple barriers in gene delivery. Single-functionalization on dendrimer cannot overcome all the barriers. In this study, we synthesized a list of single-, dual- and triple-functionalized dendrimers with arginine, phenylalanine and histidine for gene delivery using a one-pot approach. The three amino acids play different roles in gene delivery: arginine is essential in formation of stable complexes, phenylalanine improves cellular uptake efficacy, and histidine increases pH-buffering capacity and minimizes cytotoxicity of the cationic dendrimer. A combination of these amino acids on dendrimer generates a synergistic effect in gene delivery. The dual- and triple-functionalized dendrimers show minimal cytotoxicity on the transfected NIH 3T3 cells. Using this combination strategy, we can obtain triple-functionalized dendrimers with comparable transfection efficacy to several commercial transfection reagents. Such a combination strategy should be applicable to the design of efficient and biocompatible gene vectors for gene delivery. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. Staph ID/R: a Rapid Method for Determining Staphylococcus Species Identity and Detecting the mecA Gene Directly from Positive Blood Culture

    PubMed Central

    Pasko, Chris; Dunn, John; Jaeckel, Heidi; Nieuwlandt, Dan; Weed, Diane; Woodruff, Evelyn; Zheng, Xiaotian

    2012-01-01

    Rapid diagnosis of staphylococcal bacteremia directs appropriate antimicrobial therapy, leading to improved patient outcome. We describe herein a rapid test (<75 min) that can identify the major pathogenic strains of Staphylococcus to the species level as well as the presence or absence of the methicillin resistance determinant gene, mecA. The test, Staph ID/R, combines a rapid isothermal nucleic acid amplification method, helicase-dependent amplification (HDA), with a chip-based array that produces unambiguous visible results. The analytic sensitivity was 1 CFU per reaction for the mecA gene and was 1 to 250 CFU per reaction depending on the staphylococcal species present in the positive blood culture. Staph ID/R has excellent specificity as well, with no cross-reactivity observed. We validated the performance of Staph ID/R by testing 104 frozen clinical positive blood cultures and comparing the results with rpoB gene or 16S rRNA gene sequencing for species identity determinations and mecA gene PCR to confirm mecA gene results. Staph ID/R agreed with mecA gene PCR for all samples and agreed with rpoB/16S rRNA gene sequencing in all cases except for one sample that contained a mixture of two staphylococcal species, one of which Staph ID/R correctly identified, for an overall agreement of 99.0% (P < 0.01). Staph ID/R could potentially be used to positively affect patient management for Staphylococcus-mediated bacteremia. PMID:22170912

  14. An Efficient Test for Gene-Environment Interaction in Generalized Linear Mixed Models with Family Data.

    PubMed

    Mazo Lopera, Mauricio A; Coombes, Brandon J; de Andrade, Mariza

    2017-09-27

    Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma ( PPARG ) gene associated with diabetes.
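
    The stated equivalence between the BLUP of the random SNP effects and the ridge regression estimator can be checked numerically in the simplest setting. The sketch below is an illustration only: it assumes a Gaussian linear mixed model with no fixed effects and known variance components, not the GLMM or the estimation procedure of the paper. Under that assumption the ridge penalty equals the ratio of the residual variance to the random-effect variance. All names and the simulated data are hypothetical.

```python
import numpy as np


def ridge_estimate(X, y, lam):
    """Ridge regression estimate: (X'X + lam * I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def blup_random_effects(X, y, sigma2_u, sigma2_e):
    """BLUP of u in y = X u + e with u ~ N(0, sigma2_u I), e ~ N(0, sigma2_e I)."""
    n = X.shape[0]
    V = sigma2_u * (X @ X.T) + sigma2_e * np.eye(n)  # marginal covariance of y
    return sigma2_u * X.T @ np.linalg.solve(V, y)


# Simulated genotype-like design matrix and phenotype (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
y = X @ rng.normal(scale=0.3, size=8) + rng.normal(size=50)

sigma2_u, sigma2_e = 0.09, 1.0
u_blup = blup_random_effects(X, y, sigma2_u, sigma2_e)
u_ridge = ridge_estimate(X, y, lam=sigma2_e / sigma2_u)
```

    In this special case the two vectors agree to numerical precision, which is why estimating the variance components also yields a ridge penalty without a cross-validation search.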

  15. A Hybrid Approach for CpG Island Detection in the Human Genome.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Da; Chiang, Yi-Cheng; Chuang, Li-Yeh

    2016-01-01

    CpG islands have been demonstrated to influence local chromatin structures and simplify the regulation of gene activity. However, the accurate and rapid determination of CpG islands for whole DNA sequences remains experimentally and computationally challenging. A novel procedure is proposed to detect CpG islands by combining clustering technology with the PSO-based sliding-window method. Clustering technology is used to detect the locations of all possible CpG islands and process the data, thus obviating the need for extensive and unnecessary processing of DNA fragments and improving the efficiency of the sliding-window based particle swarm optimization (PSO) search. This proposed approach, named ClusterPSO, provides versatile and highly sensitive detection of CpG islands in the human genome. In addition, the detection efficiency of ClusterPSO is compared with eight CpG island detection methods in the human genome. In a comparison of detection efficiency for CpG islands in the human genome, covering sensitivity, specificity, accuracy, performance coefficient (PC), and correlation coefficient (CC), ClusterPSO showed superior detection ability among all of the tested methods. Moreover, the combination of clustering technology and the PSO method can successfully overcome their respective drawbacks while maintaining their advantages. Thus, clustering technology could be hybridized with the optimization algorithm method to optimize CpG island detection. The prediction accuracy of ClusterPSO was quite high, indicating that the combination of CpGcluster and PSO has several advantages over CpGcluster and PSO alone. In addition, ClusterPSO significantly reduced implementation time.
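
    For readers unfamiliar with the sliding-window idea that the abstract builds on, a plain window scan can be sketched as follows. This is not ClusterPSO: it contains neither the clustering step nor the particle swarm search. It assumes the commonly cited Gardiner-Garden and Frommer thresholds (window of 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The function names are hypothetical.

```python
def cpg_stats(window):
    """Return (GC content, observed/expected CpG ratio) for one window."""
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")          # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n
    expected = c * g / n              # expected CpG count given the base composition
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_content, obs_exp


def scan_windows(seq, size=200, step=1, min_gc=0.5, min_oe=0.6):
    """Return start positions of windows that pass both thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc, oe = cpg_stats(seq[start:start + size])
        if gc > min_gc and oe > min_oe:
            hits.append(start)
    return hits


# Toy sequence: a CpG-rich stretch followed by an AT-only stretch.
toy_sequence = "CG" * 100 + "AT" * 100
toy_hits = scan_windows(toy_sequence)
```

    Scanning every window of a chromosome in this way is the exhaustive baseline; the abstract's point is that pre-clustering candidate regions lets the search skip most of that work.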

  16. Protein-Protein Interaction Network and Gene Ontology

    NASA Astrophysics Data System (ADS)

    Choi, Yunkyu; Kim, Seok; Yi, Gwan-Su; Park, Jinah

    The evolution of computer technologies makes it possible to access, via the internet, a large amount and various kinds of biological data, such as DNA sequences, proteomics data and the information discovered about them. It is expected that the combination of various data could help researchers find further knowledge about them. The roles of a visualization system are to invoke human abilities to integrate information and to recognize certain patterns in the data. Thus, when the various kinds of data are examined and analyzed manually, an effective visualization system is an essential part. One instance of these integrated visualizations is the combination of protein-protein interaction (PPI) data and Gene Ontology (GO), which could help enhance the analysis of the PPI network. We introduce a simple but comprehensive visualization system that integrates GO and PPI data, in which GO and PPI graphs are visualized side-by-side, and that supports quick reference functions between them. Furthermore, the proposed system provides several interactive visualization methods for efficiently analyzing the PPI network and the GO directed acyclic graph, such as context-based browsing and common ancestor finding.

  17. Quantitative multiplex quantum dot in-situ hybridisation based gene expression profiling in tissue microarrays identifies prognostic genes in acute myeloid leukaemia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tholouli, Eleni; MacDermott, Sarah; Hoyland, Judith

    2012-08-24

    Highlights: ► Development of a quantitative high throughput in situ expression profiling method. ► Application to a tissue microarray of 242 AML bone marrow samples. ► Identification of HOXA4, HOXA9, Meis1 and DNMT3A as prognostic markers in AML. -- Abstract: Measurement and validation of microarray gene signatures in routine clinical samples is problematic and a rate limiting step in translational research. In order to facilitate measurement of microarray identified gene signatures in routine clinical tissue a novel method combining quantum dot based oligonucleotide in situ hybridisation (QD-ISH) and post-hybridisation spectral image analysis was used for multiplex in-situ transcript detection in archival bone marrow trephine samples from patients with acute myeloid leukaemia (AML). Tissue-microarrays were prepared into which white cell pellets were spiked as a standard. Tissue microarrays were made using routinely processed bone marrow trephines from 242 patients with AML. QD-ISH was performed for six candidate prognostic genes using triplex QD-ISH for DNMT1, DNMT3A, DNMT3B, and for HOXA4, HOXA9, Meis1. Scrambled oligonucleotides were used to correct for background staining followed by normalisation of expression against the expression values for the white cell pellet standard. Survival analysis demonstrated that low expression of HOXA4 was associated with poorer overall survival (p = 0.009), whilst high expression of HOXA9 (p < 0.0001), Meis1 (p = 0.005) and DNMT3A (p = 0.04) were associated with early treatment failure. These results demonstrate application of a standardised, quantitative multiplex QD-ISH method for identification of prognostic markers in formalin-fixed paraffin-embedded clinical samples, facilitating measurement of gene expression signatures in routine clinical samples.

  18. MO-DE-207B-03: Improved Cancer Classification Using Patient-Specific Biological Pathway Information Via Gene Expression Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Young, M; Craft, D

    Purpose: To develop an efficient, pathway-based classification system using network biology statistics to assist in patient-specific response predictions to radiation and drug therapies across multiple cancer types. Methods: We developed PICS (Pathway Informed Classification System), a novel two-step cancer classification algorithm. In PICS, a matrix m of mRNA expression values for a patient cohort is collapsed into a matrix p of biological pathways. The entries of p, which we term pathway scores, are obtained from either principal component analysis (PCA), normal tissue centroid (NTC), or gene expression deviation (GED). The pathway score matrix is clustered using both k-means and hierarchical clustering, and a clustering is judged by how well it groups patients into distinct survival classes. The most effective pathway scoring/clustering combination, per clustering p-value, thus generates various ‘signatures’ for conventional and functional cancer classification. Results: PICS successfully regularized large dimension gene data, separated normal and cancerous tissues, and clustered a large patient cohort spanning six cancer types. Furthermore, PICS clustered patient cohorts into distinct, statistically-significant survival groups. For a suboptimally-debulked ovarian cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00127) showed significant improvement over that of a prior gene expression-classified study (p = .0179). For a pancreatic cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00141) showed significant improvement over that of a prior gene expression-classified study (p = .04). Pathway-based classification confirmed biomarkers for the pyrimidine, WNT-signaling, glycerophosphoglycerol, beta-alanine, and pantothenic acid pathways for ovarian cancer. Despite its robust nature, PICS requires significantly less run time than current pathway scoring methods. 
Conclusion: This work validates the PICS method to improve cancer classification using biological pathways. Patients are classified with greater specificity and physiological relevance as compared to current gene-specific approaches. Focus now moves to utilizing PICS for pan-cancer patient-specific treatment response prediction.
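
    The first PICS step, collapsing the gene-level matrix m into the pathway-level matrix p, can be sketched for the PCA option as follows. The abstract does not say how PCA is applied, so this sketch assumes that each pathway score is the patient's projection onto the first principal component of that pathway's member genes. The function name, gene labels and pathway definitions are hypothetical, and the NTC and GED scores and the clustering step are not shown.

```python
import numpy as np


def pca_pathway_scores(m, gene_names, pathways):
    """Collapse a genes x patients matrix into a pathways x patients matrix.

    Assumption: the score of a pathway for a patient is the first
    principal component score computed over that pathway's genes.
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    scores = []
    for genes in pathways.values():
        rows = [index[g] for g in genes if g in index]
        sub = m[rows, :]
        sub = sub - sub.mean(axis=1, keepdims=True)   # center each gene across patients
        _, s, vt = np.linalg.svd(sub, full_matrices=False)
        scores.append(s[0] * vt[0])                   # PC1 score for every patient
    return np.vstack(scores)


# Simulated expression values: 5 genes x 6 patients (illustrative only).
rng = np.random.default_rng(1)
m = rng.normal(size=(5, 6))
gene_names = ["g1", "g2", "g3", "g4", "g5"]
pathways = {"pw_a": ["g1", "g2", "g3"], "pw_b": ["g3", "g4", "g5"]}
p = pca_pathway_scores(m, gene_names, pathways)
```

    The resulting matrix p has one row per pathway and would then be passed to k-means or hierarchical clustering, as the abstract describes.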

  19. Polymorphisms in Genes of Relevance for Oestrogen and Oxytocin Pathways and Risk of Barrett's Oesophagus and Oesophageal Adenocarcinoma: A Pooled Analysis from the BEACON Consortium.

    PubMed

    Lagergren, Katarina; Ek, Weronica E; Levine, David; Chow, Wong-Ho; Bernstein, Leslie; Casson, Alan G; Risch, Harvey A; Shaheen, Nicholas J; Bird, Nigel C; Reid, Brian J; Corley, Douglas A; Hardie, Laura J; Wu, Anna H; Fitzgerald, Rebecca C; Pharoah, Paul; Caldas, Carlos; Romero, Yvonne; Vaughan, Thomas L; MacGregor, Stuart; Whiteman, David; Westberg, Lars; Nyren, Olof; Lagergren, Jesper

    2015-01-01

    The strong male predominance in oesophageal adenocarcinoma (OAC) and Barrett's oesophagus (BO) continues to puzzle. Hormonal influence, e.g. oestrogen or oxytocin, might contribute. This genetic-epidemiological study pooled 14 studies from three continents, Australia, Europe, and North America. Polymorphisms in 3 key genes coding for the oestrogen pathway (receptor alpha (ESR1), receptor beta (ESR2), and aromatase (CYP19A1)), and 3 key genes of the oxytocin pathway (the oxytocin receptor (OXTR), oxytocin protein (OXT), and cyclic ADP ribose hydrolase glycoprotein (CD38)), were analysed using a gene-based approach, versatile gene-based test association study (VEGAS). Among 1508 OAC patients, 2383 BO patients, and 2170 controls, genetic variants within ESR1 were associated with BO in males (p = 0.0058) and an increased risk of OAC and BO combined in males (p = 0.0023). Genetic variants within OXTR were associated with an increased risk of BO in both sexes combined (p = 0.0035) and in males (p = 0.0012). We followed up these suggestive findings in a further smaller data set, but found no replication. There were no significant associations between the other 4 genes studied and risk of OAC, BO, separately or in combination, in males and females combined or in males only. Genetic variants in the oestrogen receptor alpha and the oxytocin receptor may be associated with an increased risk of BO or OAC, but replication in other large samples is needed.

  20. Identification of Linkages between EDCs in Personal Care Products and Breast Cancer through Data Integration Combined with Gene Network Analysis.

    PubMed

    Jeong, Hyeri; Kim, Jongwoon; Kim, Youngjun

    2017-09-30

    Approximately 1000 chemicals have been reported to possibly have endocrine disrupting effects, some of which are used in consumer products, such as personal care products (PCPs) and cosmetics. We conducted data integration combined with gene network analysis to: (i) identify causal molecular mechanisms between endocrine disrupting chemicals (EDCs) used in PCPs and breast cancer; and (ii) screen candidate EDCs associated with breast cancer. Among EDCs used in PCPs, four EDCs having correlation with breast cancer were selected, and we curated 27 common interacting genes between those EDCs and breast cancer to perform the gene network analysis. Based on the gene network analysis, ESR1, TP53, NCOA1, AKT1, and BCL6 were found to be key genes to demonstrate the molecular mechanisms of EDCs in the development of breast cancer. Using GeneMANIA, we additionally predicted 20 genes which could interact with the 27 common genes. In total, 47 genes combining the common and predicted genes were functionally grouped with the gene ontology and KEGG pathway terms. With those genes, we finally screened candidate EDCs for their potential to increase breast cancer risk. This study highlights that our approach can provide insights to understand mechanisms of breast cancer and identify potential EDCs which are in association with breast cancer.

  1. A Novel Method Combining Vitreous Aspiration and Intravitreal AAV2/8 Injection Results in Retina-Wide Transduction in Adult Mice.

    PubMed

    Da Costa, Romain; Röger, Carsten; Segelken, Jasmin; Barben, Maya; Grimm, Christian; Neidhardt, John

    2016-10-01

    Gene therapies to treat eye disorders have been extensively studied in the past 20 years. Frequently, adeno-associated viruses were applied to the subretinal or intravitreal space of the eye to transduce retinal cells with nucleotide sequences of therapeutic potential. In this study we describe a novel intravitreal injection procedure that leads to a reproducible adeno-associated virus (AAV)2/8-mediated transduction of more than 70% of the retina. Prior to a single intravitreal injection of an enhanced green fluorescent protein (GFP)-expressing viral suspension, we performed an aspiration of vitreous tissue from wild-type C57Bl/6J mice. One and one-half microliters of AAV2/8 suspension was injected. Funduscopy, optical coherence tomography (OCT), laser scanning microscopy of retinal flat mounts, cryosections of eye cups, and ERG recordings verified the efficacy and safety of the method. The combination of vitreous aspiration and intravitreal injection resulted in an almost complete transduction of the retina in approximately 60% of the eyes and showed transduced cells in all retinal layers. Photoreceptors and RPE cells were predominantly transduced. Eyes presented with well-preserved retinal morphology. Electroretinographic recordings suggested that the new combination of techniques did not cause significant alterations of the retinal physiology. We show a novel application technique of AAV2/8 to the vitreous of mice that leads to widespread transduction of the retina. The results of this study have implications for virus-based gene therapies and basic science; for example, they might provide an approach to apply gene replacement strategies or clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 in vivo. It may further help to develop similar techniques for larger animal models or humans.

  2. The Prediction of Drug-Disease Correlation Based on Gene Expression Data.

    PubMed

    Cui, Hui; Zhang, Menghuan; Yang, Qingmin; Li, Xiangyi; Liebman, Michael; Yu, Ying; Xie, Lu

    2018-01-01

    The explosive growth of high-throughput experimental methods and resulting data yields both opportunity and challenge for selecting the correct drug to treat both a specific patient and their individual disease. Ideally, it would be useful and efficient if computational approaches could be applied to help achieve optimal drug-patient-disease matching but current efforts have met with limited success. Current approaches have primarily utilized the measurable effect of a specific drug on target tissue or cell lines to identify the potential biological effect of such treatment. While these efforts have met with some level of success, there exists much opportunity for improvement. This specifically follows the observation that, for many diseases in light of actual patient response, there is increasing need for treatment with combinations of drugs rather than single drug therapies. Only a few previous studies have yielded computational approaches for predicting the synergy of drug combinations by analyzing high-throughput molecular datasets. However, these computational approaches focused on the characteristics of the drug itself, without fully accounting for disease factors. Here, we propose an algorithm to specifically predict synergistic effects of drug combinations on various diseases, by integrating the data characteristics of disease-related gene expression profiles with drug-treated gene expression profiles. We have demonstrated utility through its application to transcriptome data, including microarray and RNASeq data, and the drug-disease prediction results were validated using existing publications and drug databases. It is also applicable to other quantitative profiling data such as proteomics data. We also provide an interactive web interface to allow our Prediction of Drug-Disease method to be readily applied to user data. 
While our studies represent a preliminary exploration of this critical problem, we believe that the algorithm can provide the basis for further refinement towards addressing a large clinical need.

  3. Haplotype combination of the bovine INSIG1 gene sequence variants and association with growth traits in Nanyang cattle.

    PubMed

    Sun, Jiajie; Gao, Yuan; Liu, Dong; Ma, Wei; Xue, Jing; Zhang, Chunlei; Lan, Xianyong; Lei, Chuzhao; Chen, Hong

    2012-06-01

    The insulin-induced gene 1 (INSIG1) gene encodes a protein that blocks proteolytic activation of sterol regulatory element binding proteins, which are transcription factors that activate genes that regulate cholesterol, fatty acid, and glucose metabolism. However, similar research for the bovine INSIG1 gene is lacking. Therefore, in this study, polymorphisms of the bovine INSIG1 gene were detected in 643 individuals from four cattle breeds by DNA pooling, forced PCR-RFLP, PCR-SSCP, and DNA sequencing methods. Only 10 novel SNPs were identified, which included four mutations in the coding region and the others in the introns. In Nanyang individuals, seven common haplotypes were identified based on four coding region SNPs. The haplotype GACT, with a frequency of 75.4%, was the most prevalent haplotype, and the SNPs formed two linkage disequilibrium blocks with strong multi-allelic D' (D' = 1). Additionally, association analysis between mutations of the bovine INSIG1 gene and growth traits in Nanyang cattle at 6, 12, 18, and 24 months old was performed, and the results indicated that the polymorphisms were not significantly associated with body mass.
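
    The linkage disequilibrium measure D' cited in the abstract can be illustrated for the simplest case of two biallelic loci. The abstract reports a multi-allelic D', which generalizes this quantity and is not reproduced here. The sketch below uses the standard definition D = p_AB - p_A p_B, scaled by its maximum attainable magnitude. The function name and the frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


# Hypothetical frequencies: complete LD, then linkage equilibrium.
complete_ld = d_prime(p_ab=0.5, p_a=0.5, p_b=0.5)
equilibrium = d_prime(p_ab=0.25, p_a=0.5, p_b=0.5)
```

    A value of 1, as reported for the two blocks, means that at least one of the possible haplotypes was not observed in the sample.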

  4. On the value of nuclear and mitochondrial gene sequences for reconstructing the phylogeny of vanilloid orchids (Vanilloideae, Orchidaceae)

    PubMed Central

    Cameron, Kenneth M.

    2009-01-01

    Background and Aims Most molecular phylogenetic studies of Orchidaceae have relied heavily on DNA sequences from the plastid genome. Nuclear and mitochondrial loci have only been superficially examined for their systematic value. Since 40% of the genera within Vanilloideae are achlorophyllous mycoheterotrophs, this is an ideal group of orchids in which to evaluate non-plastid gene sequences. Methods Phylogenetic reconstructions for Vanilloideae were produced using independent and combined data from the nuclear 18S, 5·8S and 26S rDNA genes and the mitochondrial atpA gene and nad1b-c intron. Key Results These new data indicate placements for genera such as Lecanorchis and Galeola, for which plastid gene sequences have been mostly unavailable. Nuclear and mitochondrial parsimony jackknife trees are congruent with each other and previously published trees based solely on plastid data. Because of high rates of sequence divergence among vanilloid orchids, even the short 5·8S rDNA gene provides impressive levels of resolution and support. Conclusions Orchid systematists are encouraged to sequence nuclear and mitochondrial gene regions along with the growing number of plastid loci available. PMID:19251715

  5. Validation of endogenous internal real-time PCR controls in renal tissues.

    PubMed

    Cui, Xiangqin; Zhou, Juling; Qiu, Jing; Johnson, Martin R; Mrug, Michal

    2009-01-01

    Endogenous internal controls ('reference' or 'housekeeping' genes) are widely used in real-time PCR (RT-PCR) analyses. Their use relies on the premise of consistently stable expression across studied experimental conditions. Unfortunately, none of these controls fulfills this premise across a wide range of experimental conditions; consequently, none of them can be recommended for universal use. To determine which endogenous RT-PCR controls are suitable for analyses of renal tissues altered by kidney disease, we studied the expression of 16 commonly used 'reference genes' in 7 mildly and 7 severely affected whole kidney tissues from a well-characterized cystic kidney disease model. Expression levels of these 16 genes, determined by TaqMan RT-PCR analyses and Affymetrix GeneChip arrays, were normalized and tested for overall variance and equivalence of the means. Both statistical approaches and both TaqMan- and GeneChip-based methods converged on 3 out of the 4 top-ranked genes (Ppia, Gapdh and Pgk1) that had the most constant expression levels across the studied phenotypes. A combination of the top-ranked genes will provide a suitable endogenous internal control for similar studies of kidney tissues across a wide range of disease severity. Copyright 2009 S. Karger AG, Basel.

  6. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite the recent progresses in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. 
These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
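
    The second experimental design above (standard curves for both the reference gene and the transgene) can be illustrated with a minimal Python sketch. It is not the authors' statistical model: it assumes a plain least-squares fit of Ct against log10 template quantity and leaves out the confidence intervals and quality-control tests the abstract calls for. All function names and the synthetic calibration values are hypothetical.

```python
import math


def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the curve slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template quantity from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def copy_number_ratio(ct_transgene, ct_reference, transgene_curve, reference_curve):
    """Estimated transgene quantity divided by estimated reference-gene quantity."""
    q_transgene = quantity_from_ct(ct_transgene, *transgene_curve)
    q_reference = quantity_from_ct(ct_reference, *reference_curve)
    return q_transgene / q_reference


# Synthetic (made-up) dilution series with perfect doubling in every cycle.
PERFECT_SLOPE = -1 / math.log10(2)
LOG10_DILUTIONS = [1, 2, 3, 4, 5]
SYNTHETIC_CTS = [40 + PERFECT_SLOPE * x for x in LOG10_DILUTIONS]
```

    Under these assumptions, a transgene that crosses the threshold one cycle earlier than a single-copy reference gene comes out at roughly twice the reference quantity. A real analysis would also report the uncertainty of that ratio.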

  7. Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues

    PubMed Central

    Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Cai, Yu-Dong

    2017-01-01

    Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein–protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method. PMID:28974058
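
    The searching procedure described above can be sketched in Python as follows. This is a simplified illustration, not the published method: it assumes an unweighted, undirected network stored as an adjacency dictionary, uses breadth-first search to return one shortest path per gene pair, and omits the permutation test and the two screening rules. The gene labels and the toy network are hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path from source to target (BFS), or None if unreachable."""
    if source == target:
        return [source]
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == target:
                # Walk back through the predecessors to rebuild the path.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def candidate_genes(graph, genes_a, genes_b):
    """Collect interior nodes of shortest paths linking the two gene sets."""
    known = set(genes_a) | set(genes_b)
    candidates = set()
    for a in genes_a:
        for b in genes_b:
            path = shortest_path(graph, a, b)
            if path:
                candidates.update(g for g in path[1:-1] if g not in known)
    return candidates


# Hypothetical toy network: A1, A2 stand for one tissue's genes, B1 for another's.
TOY_NETWORK = {
    "A1": ["X"],
    "A2": ["Y"],
    "X": ["A1", "B1"],
    "Y": ["A2", "Z"],
    "Z": ["Y", "B1"],
    "B1": ["X", "Z"],
}
```

    In the toy network, X, Y and Z lie on shortest paths between the two gene sets and are returned as possible co-regeneration candidates. A weighted network would need Dijkstra's algorithm in place of breadth-first search.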

  8. Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues.

    PubMed

    Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong

    2017-10-02

    Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.

  9. Proteomics to study DNA-bound and chromatin-associated gene regulatory complexes

    PubMed Central

    Wierer, Michael; Mann, Matthias

    2016-01-01

    High-resolution mass spectrometry (MS)-based proteomics is a powerful method for the identification of soluble protein complexes and large-scale affinity purification screens can decode entire protein interaction networks. In contrast, protein complexes residing on chromatin have been much more challenging, because they are difficult to purify and often of very low abundance. However, this is changing due to recent methodological and technological advances in proteomics. Proteins interacting with chromatin marks can directly be identified by pulldowns with synthesized histone tails containing posttranslational modifications (PTMs). Similarly, pulldowns with DNA baits harbouring single nucleotide polymorphisms or DNA modifications reveal the impact of those DNA alterations on the recruitment of transcription factors. Accurate quantitation – either isotope-based or label free – unambiguously pinpoints proteins that are significantly enriched over control pulldowns. In addition, protocols that combine classical chromatin immunoprecipitation (ChIP) methods with mass spectrometry (ChIP-MS) target gene regulatory complexes in their in-vivo context. Similar to classical ChIP, cells are crosslinked with formaldehyde and chromatin sheared by sonication or nuclease digested. ChIP-MS baits can be proteins in tagged or endogenous form, histone PTMs, or lncRNAs. Locus-specific ChIP-MS methods would allow direct purification of a single genomic locus and the proteins associated with it. There, loci can be targeted either by artificial DNA-binding sites and corresponding binding proteins or via proteins with sequence specificity such as TAL or nuclease deficient Cas9 in combination with a specific guide RNA. We predict that advances in MS technology will soon make such approaches generally applicable tools in epigenetics. PMID:27402878

  10. A novel artificial intelligence method for weekly dietary menu planning.

    PubMed

    Gaál, B; Vassányi, I; Kozmann, G

    2005-01-01

    Menu planning is an important part of personalized lifestyle counseling. The paper describes the results of an automated menu generator (MenuGene) of the web-based lifestyle counseling system Cordelia that provides personalized advice to prevent cardiovascular diseases. The menu generator uses genetic algorithms to prepare weekly menus for web users. The objectives are derived from personal medical data collected via forms in Cordelia, combined with general nutritional guidelines. The weekly menu is modeled as a multilevel structure. Results show that the genetic algorithm-based method succeeds in planning dietary menus that satisfy strict numerical constraints on every nutritional level (meal, daily basis, weekly basis). The rule-based assessment proved capable of manipulating the mean occurrence of the nutritional components, thus providing a method for adjusting the variety and harmony of the menu plans. By splitting the problem into well-determined sub-problems, weekly menu plans that satisfy nutritional constraints and have well-assorted components can be generated with the same method that is used for daily and meal plan generation.
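
    As a rough illustration of how a genetic algorithm can search for a menu under a numerical constraint, the Python sketch below evolves a single-day menu towards a calorie target. It is not MenuGene: the dish catalogue, the calorie values, the single-level (daily only) structure, and every parameter are assumptions made for the example, and the rule-based assessment mentioned above is not modeled.

```python
import random

# Hypothetical dish catalogue with made-up energy values (kcal).
DISHES = {
    "oatmeal": 300, "salad": 250, "pasta": 650, "soup": 200,
    "fish": 450, "stew": 550, "yogurt": 150, "sandwich": 400,
}


def deviation(menu, target_kcal):
    """Fitness to minimize: distance of the menu's total energy from the target."""
    return abs(sum(DISHES[dish] for dish in menu) - target_kcal)


def plan_menu(target_kcal, meals=3, population_size=30, generations=50, seed=0):
    """Evolve a list of `meals` dishes (meals >= 2) towards the calorie target."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    names = list(DISHES)
    population = [[rng.choice(names) for _ in range(meals)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=lambda menu: deviation(menu, target_kcal))
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, meals)           # one-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < 0.2:                  # occasional mutation
                child[rng.randrange(meals)] = rng.choice(names)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda menu: deviation(menu, target_kcal))
```

    Because the better half of each generation is carried over unchanged, the best deviation never gets worse from one generation to the next. A multilevel version would apply the same loop at the meal, daily, and weekly levels.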

  11. Potential applications of next generation DNA sequencing of 16S rRNA gene amplicons in microbial water quality monitoring

    PubMed Central

    Vierheilig, J.; Savio, D.; Ley, R. E.; Mach, R. L.; Farnleitner, A. H.

    2016-01-01

    The applicability of next generation DNA sequencing (NGS) methods for water quality assessment has so far not been broadly investigated. This study set out to evaluate the potential of an NGS-based approach in a complex catchment with importance for drinking water abstraction. In this multicompartment investigation, total bacterial communities in water, faeces, soil, and sediment samples were investigated by 454 pyrosequencing of bacterial 16S rRNA gene amplicons to assess the capabilities of this NGS method for (i) the development and evaluation of environmental molecular diagnostics, (ii) direct screening of the bulk bacterial communities, and (iii) the detection of faecal pollution in water. Results indicate that NGS methods can highlight potential target populations for diagnostics and will prove useful for the evaluation of existing and the development of novel DNA-based detection methods in the field of water microbiology. The used approach allowed unveiling of dominant bacterial populations but failed to detect populations with low abundances such as faecal indicators in surface waters. In combination with metadata, NGS data will also allow the identification of drivers of bacterial community composition during water treatment and distribution, highlighting the power of this approach for monitoring of bacterial regrowth and contamination in technical systems. PMID:26606090

  12. Finding gene regulatory network candidates using the gene expression knowledge base.

    PubMed

    Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin

    2014-12-10

    Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

  13. Identification of downy mildew resistance gene candidates by positional cloning in maize (Zea mays subsp. mays; Poaceae)

    PubMed Central

    Kim, Jae Yoon; Moon, Jun-Cheol; Kim, Hyo Chul; Shin, Seungho; Song, Kitae; Kim, Kyung-Hee; Lee, Byung-Moo

    2017-01-01

    Premise of the study: Positional cloning in combination with phenotyping is a general approach to identify disease-resistance gene candidates in plants; however, it requires several time-consuming steps including population or fine mapping. Therefore, in the present study, we suggest a new combined strategy to improve the identification of disease-resistance gene candidates. Methods and Results: Downy mildew (DM)–resistant maize was selected from five cultivars using a spreader row technique. Positional cloning and bioinformatics tools were used to identify the DM-resistance quantitative trait locus marker (bnlg1702) and 47 protein-coding gene annotations. Eventually, five DM-resistance gene candidates, including bZIP34, Bak1, and Ppr, were identified by quantitative reverse-transcription PCR (RT-PCR) without fine mapping of the bnlg1702 locus. Conclusions: The combined protocol with the spreader row technique, quantitative trait locus positional cloning, and quantitative RT-PCR was effective for identifying DM-resistance candidate genes. This cloning approach may be applied to other whole-genome-sequenced crops or resistance to other diseases. PMID:28224059

  14. Current Progress in Gene Delivery Technology Based on Chemical Methods and Nano-carriers

    PubMed Central

    Jin, Lian; Zeng, Xin; Liu, Ming; Deng, Yan; He, Nongyue

    2014-01-01

    Gene transfer methods are promising in the field of gene therapy. Current methods for gene transfer include three major groups: viral, physical and chemical methods. This review mainly summarizes the development of several types of chemical methods for gene transfer in vitro and in vivo by means of nano-carriers such as calcium phosphates, lipids, and cationic polymers including chitosan, polyethylenimine, polyamidoamine dendrimers, and poly(lactide-co-glycolide). This review also briefly introduces applications of these chemical methods for gene delivery. PMID:24505233

  15. Parameter estimation methods for gene circuit modeling from time-series mRNA data: a comparative study.

    PubMed

    Fan, Ming; Kuwahara, Hiroyuki; Wang, Xiaolei; Wang, Suojin; Gao, Xin

    2015-11-01

    Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In comparison, online methods and model decomposition-based methods are computationally faster alternatives and are less dependent on the size of the search space. Among other things, our results show that a hybrid approach that augments computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates to the level on par with the best solution obtained from the population-based methods while maintaining high computational speed. These suggest that such hybrid methods can be a promising alternative to the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior knowledge about the underlying regulatory mechanisms makes the size of the parameter search space vastly large. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  16. An ensemble rank learning approach for gene prioritization.

    PubMed

    Lee, Po-Feng; Soo, Von-Wun

    2013-01-01

    Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.
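
    The mean average precision measure used in the evaluation above can be computed as in the following Python sketch. It covers only the metric, not the weighted RankBoost ensemble; the function names and gene labels are hypothetical.

```python
def average_precision(ranked_genes, known_genes):
    """Mean of the precision values at the rank of each known gene."""
    known = set(known_genes)
    if not known:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in known:
            hits += 1
            precision_sum += hits / rank
    # Known genes missing from the ranking contribute zero precision.
    return precision_sum / len(known)


def mean_average_precision(rankings, known_genes):
    """Average the average precision over several ranked lists."""
    if not rankings:
        return 0.0
    return sum(average_precision(r, known_genes) for r in rankings) / len(rankings)
```

    For example, if two known genes appear at ranks 1 and 3 of a list, the average precision is (1/1 + 2/3) / 2, or about 0.83. A prioritization method that places the known genes nearer the top scores closer to 1.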

  17. Gene trap and gene inversion methods for conditional gene inactivation in the mouse

    PubMed Central

    Xin, Hong-Bo; Deng, Ke-Yu; Shui, Bo; Qu, Shimian; Sun, Qi; Lee, Jane; Greene, Kai Su; Wilson, Jason; Yu, Ying; Feldman, Morris; Kotlikoff, Michael I.

    2005-01-01

    Conditional inactivation of individual genes in mice using site-specific recombinases is an extremely powerful method for determining the complex roles of mammalian genes in developmental and tissue-specific contexts, a major goal of post-genomic research. However, the process of generating mice with recombinase recognition sequences placed at specific locations within a gene, while maintaining a functional allele, is time consuming, expensive and technically challenging. We describe a system that combines gene trap and site-specific DNA inversion to generate mouse embryonic stem (ES) cell clones for the rapid production of conditional knockout mice, and the use of this system in an initial gene trap screen. Gene trapping should allow the selection of thousands of ES cell clones with defined insertions that can be used to generate conditional knockout mice, thereby providing extensive parallelism that eliminates the time-consuming steps of targeting vector construction and homologous recombination for each gene. PMID:15659575

  18. Identification of SLC20A1 and SLC15A4 among other genes as potential risk factors for combined pituitary hormone deficiency.

    PubMed

    Simm, Franziska; Griesbeck, Anne; Choukair, Daniela; Weiß, Birgit; Paramasivam, Nagarajan; Klammt, Jürgen; Schlesner, Matthias; Wiemann, Stefan; Martinez, Cristina; Hoffmann, Georg F; Pfäffle, Roland W; Bettendorf, Markus; Rappold, Gudrun A

    2017-10-26

    Purpose: Combined pituitary hormone deficiency (CPHD) is characterized by a malformed or underdeveloped pituitary gland resulting in an impaired pituitary hormone secretion. Several transcription factors have been described in its etiology, but defects in known genes account for only a small proportion of cases. Methods: To identify novel genetic causes for congenital hypopituitarism, we performed exome-sequencing studies on 10 patients with CPHD and their unaffected parents. Two candidate genes were sequenced in further 200 patients. Genotype data of known hypopituitary genes are reviewed. Results: We discovered 51 likely damaging variants in 38 genes; 12 of the 51 variants represent de novo events (24%); 11 of the 38 genes (29%) were present in the E12.5/E14.5 pituitary transcriptome. Targeted sequencing of two candidate genes, SLC20A1 and SLC15A4, of the solute carrier membrane transport protein family in 200 additional patients demonstrated two further variants predicted as damaging. We also found combinations of de novo (SLC20A1/SLC15A4) and transmitted variants (GLI2/LHX3) in the same individuals, leading to the full-blown CPHD phenotype. Conclusion: These data expand the pituitary target genes repertoire for diagnostics and further functional studies. Exome sequencing has identified a combination of rare variants in different genes that might explain incomplete penetrance in CPHD. Genetics in Medicine advance online publication, 26 October 2017; doi:10.1038/gim.2017.165.

  19. Novel polyacrylate-based cationic nanoparticles for survivin siRNA delivery combined with mitoxantrone for treatment of breast cancer.

    PubMed

    Arami, Sanam; Mahdavi, Majid; Rashidi, Mohammad Reza; Fathi, Marziyeh; Hejazi, Mohammad-Saeid; Samadi, Nasser

    2016-11-01

    As a gene delivery method in breast cancer therapy, knocking down the undesired genes in the cancerous cells would be promising. Inhibitor of Apoptosis Protein (IAP) family genes are among the genes responsible for inhibiting apoptosis in cells. Silencing these genes appears to help direct tumor cells to death. An siRNA sequence designed against the survivin anti-apoptotic gene can play this role if carried to the cytoplasm. Here we prepared a positively charged, biocompatible, nano-sized particle made up of a Fe3O4 core covered successively by a polyacrylate (PA) and a polyethyleneimine (PEI) layer, which could successfully deliver the siRNA into MCF-7 cells. The particle structure was verified; its diameter of less than 50 nm, positive charge, and safety towards MCF-7 cells, together with its ability to form nanoplexes with the siRNA strand, qualified it for the biological assays. siRNA delivery was evaluated via flow cytometry. Apoptosis induction was determined by DAPI staining. The efficiency of survivin gene knockdown was evaluated at the mRNA and protein levels using real-time PCR and western blotting. Overall, the Fe3O4-PA-PEI nanoparticles can deliver siRNA effectively into the cytoplasm of MCF-7 breast cancer cells and induce apoptosis. Copyright © 2016 International Alliance for Biological Standardization. Published by Elsevier Ltd. All rights reserved.

  20. Use of the ecf1 gene to detect Shiga toxin-producing Escherichia coli in beef samples.

    PubMed

    Livezey, Kristin W; Groschel, Bettina; Becker, Michael M

    2015-04-01

    Escherichia coli O157:H7 and six serovars (O26, O103, O121, O111, O145, and O45) are frequently implicated in severe clinical illness worldwide. Standard testing methods using stx, eae, and O serogroup-specific gene sequences for detecting the top six non-O157 STEC bear the disadvantage that these genes may reside, independently, in different nonpathogenic organisms, leading to false-positive results. The ecf operon has previously been identified in the large enterohemolysin-encoding plasmid of eae-positive Shiga toxin-producing E. coli (STEC). Here, we explored the utility of the ecf operon as a single marker to detect eae-positive STEC from pure broth and primary meat enrichments. Analysis of 501 E. coli isolates demonstrated a strong correlation (99.6%) between the presence of the ecf1 gene and the combined presence of stx, eae, and ehxA genes. Two large studies were carried out to determine the utility of an ecf1 detection assay to detect non-O157 STEC strains in enriched meat samples in comparison to the results using the U. S. Department of Agriculture Food Safety and Inspection Service (FSIS) method that detects stx and eae genes. In ground beef samples (n = 1,065), the top six non-O157 STEC were detected in 4.0% of samples by an ecf1 detection assay and in 5.0% of samples by the stx- and eae-based method. In contrast, in beef samples composed largely of trim (n = 1,097), the top six non-O157 STEC were detected at 1.1% by both methods. Estimation of false-positive rates among the top six non-O157 STEC revealed a lower rate using the ecf1 detection method (0.5%) than using the eae and stx screening method (1.1%). Additionally, the ecf1 detection assay detected STEC strains associated with severe illness that are not included in the FSIS regulatory definition of adulterant STEC.
