gene prediction program: Topics by Science.gov

Sample records for gene prediction program

FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences.

PubMed

Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

2003-07-01

We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in GC-rich bacterial genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it well suited to gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD can also take protein similarity information into account, both in its prediction and in its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation, and the ability to learn new models for new organisms.
Reverse-engineering the genetic circuitry of a cancer cell with predicted intervention in chronic lymphocytic leukemia.

PubMed

Vallat, Laurent; Kemper, Corey A; Jung, Nicolas; Maumy-Bertrand, Myriam; Bertrand, Frédéric; Meyer, Nicolas; Pocheville, Arnaud; Fisher, John W; Gribben, John G; Bahram, Seiamak

2013-01-08

Cellular behavior is sustained by genetic programs that are progressively disrupted in pathological conditions--notably, cancer. High-throughput gene expression profiling has been used to infer statistical models describing these cellular programs, and development is now needed to guide orientated modulation of these systems. Here we develop a regression-based model to reverse-engineer a temporal genetic program, based on relevant patterns of gene expression after cell stimulation. This method integrates the temporal dimension of biological rewiring of genetic programs and enables the prediction of the effect of targeted gene disruption at the system level. We tested the performance accuracy of this model on synthetic data before reverse-engineering the response of primary cancer cells to a proliferative (protumorigenic) stimulation in a multistate leukemia biological model (i.e., chronic lymphocytic leukemia). To validate the ability of our method to predict the effects of gene modulation on the global program, we performed an intervention experiment on a targeted gene. Comparison of the predicted and observed gene expression changes demonstrates the possibility of predicting the effects of a perturbation in a gene regulatory network, a first step toward an orientated intervention in a cancer cell genetic program.
Multi-gene genetic programming based predictive models for municipal solid waste gasification in a fluidized bed gasifier.

PubMed

Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold

2015-03-01

A multi-gene genetic programming technique is proposed as a new method to predict syngas yield production and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming model is also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

PubMed Central

Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

2001-01-01

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
A comparison of machine learning techniques for survival prediction in breast cancer

PubMed Central

2011-01-01

Background: The ability to accurately classify cancer patients into risk classes, i.e. to predict the outcome of the pathology on an individual basis, is a key ingredient in making therapeutic decisions. In recent years gene expression data have been successfully used to complement the clinical and histological criteria traditionally used in such prediction. Many "gene expression signatures" have been developed, i.e. sets of genes whose expression values in a tumor can be used to predict the outcome of the pathology. Here we investigate the use of several machine learning techniques to classify breast cancer patients using one such signature, the well-established 70-gene signature. Results: We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayered Perceptrons and Random Forests in classifying patients from the NKI breast cancer dataset, and comparably to the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming is able to perform an automatic feature selection. Conclusions: Since the performance of Genetic Programming is likely to be improvable compared to the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data. PMID:21569330
Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data.

PubMed

Chan, Kuang-Lim; Rosli, Rozana; Tatarinova, Tatiana V; Hogan, Michael; Firdaus-Raih, Mohd; Low, Eng-Ti Leslie

2017-01-27

Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed by various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion. We present an automated gene prediction pipeline, Seqping, which uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure). Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.
Economic benefits of using adaptive predictive models of reproductive toxicity in the context of a tiered testing program

EPA Science Inventory

A predictive model of reproductive toxicity, as observed in rat multigeneration reproductive (MGR) studies, was previously developed using high throughput screening (HTS) data from 36 in vitro assays mapped to 8 genes or gene-sets from Phase I of USEPA ToxCast research program, t...
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.

PubMed Central

Borodovsky, M; Rudd, K E; Koonin, E V

1994-01-01

The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
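The percentages quoted in the abstract above follow directly from the reported ORF counts. The short Python check below recomputes them; the variable and function names are illustrative and do not come from the paper.

```python
# Counts as reported in the abstract above.
genemark_orfs = 354      # putative expressed ORFs predicted by GeneMark
blast_orfs = 208         # ORFs with significant BLASTX/TBLASTN similarity
supported_by_both = 182  # ORFs supported by both GeneMark and BLAST
conserved_family = 73    # GeneMark predictions in ancient conserved families


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


share_of_genemark = percent(supported_by_both, genemark_orfs)
share_of_blast = percent(supported_by_both, blast_orfs)
share_conserved = percent(conserved_family, genemark_orfs)

print(share_of_genemark, share_of_blast, share_conserved)
```

The three values match the 51.4%, 87.5% and 20.6% given in the abstract.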
Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder.

PubMed

Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J

2016-02-01

Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
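As a rough illustration of how several gene-based rules can be combined by voting and then scored by sensitivity and specificity, here is a minimal Python sketch. It is not the authors' classifier: the gene names, thresholds and toy samples are hypothetical, and a simple majority vote is assumed.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, float]          # gene name -> expression value
Rule = Callable[[Sample], bool]    # True means "predict recurrence"


def majority_vote(rules: Sequence[Rule], sample: Sample) -> bool:
    """Predict recurrence when more than half of the rules vote for it."""
    votes = sum(1 for rule in rules if rule(sample))
    return votes * 2 > len(rules)


def sensitivity_specificity(predicted: List[bool],
                            actual: List[bool]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical rules over made-up genes; thresholds are arbitrary.
rules: List[Rule] = [
    lambda s: s["GENE_A"] > 1.0,
    lambda s: s["GENE_B"] < 0.5,
    lambda s: s["GENE_A"] + s["GENE_C"] > 2.0,
]

samples: List[Sample] = [
    {"GENE_A": 1.5, "GENE_B": 0.2, "GENE_C": 1.0},
    {"GENE_A": 0.5, "GENE_B": 0.9, "GENE_C": 0.5},
    {"GENE_A": 1.2, "GENE_B": 0.8, "GENE_C": 0.3},
    {"GENE_A": 0.4, "GENE_B": 0.1, "GENE_C": 2.0},
    {"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.1},
]
actual = [True, False, True, False, False]  # toy recurrence labels

predicted = [majority_vote(rules, s) for s in samples]
sens, spec = sensitivity_specificity(predicted, actual)
print(predicted, sens, spec)
```

Scoring the same rules separately on training and held-out samples is what produces paired figures such as the training-set and test-set values quoted in the abstract.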
GeneBuilder: interactive in silico prediction of gene structure.

PubMed

Milanesi, L; D'Angelo, D; Rogozin, I B

1999-01-01

Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performance is: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of WebGene at the URL: http://www.itba.mi.cnr.it/webgene and the TRADAT (TRAnscription Database and Analysis Tools) launcher at the URL: http://www.itba.mi.cnr.it/tradat.
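The nucleotide-level sensitivity, specificity and correlation coefficient quoted above can be illustrated with a small Python sketch. It assumes the convention commonly attributed to Burset and Guigó, in which "specificity" is TP/(TP+FP), which other fields call precision. The function name and the toy label strings are hypothetical.

```python
import math
from typing import Tuple


def nucleotide_accuracy(annotated: str, predicted: str) -> Tuple[float, float, float]:
    """Compare two per-nucleotide label strings ('1' = coding, '0' = non-coding).

    Returns (sensitivity, specificity, correlation coefficient), with
    specificity taken as TP/(TP+FP). Assumes neither string is all '0' or
    all '1'; otherwise a denominator would be zero.
    """
    if len(annotated) != len(predicted):
        raise ValueError("label strings must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(annotated, predicted):
        if a == "1" and p == "1":
            tp += 1
        elif a == "0" and p == "1":
            fp += 1
        elif a == "0" and p == "0":
            tn += 1
        else:
            fn += 1
    sn = tp / (tp + fn)
    sp = tp / (tp + fp)
    cc = (tp * tn - fn * fp) / math.sqrt(
        (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)
    )
    return sn, sp, cc


# Toy example: the predicted coding segment is shifted by one base.
sn, sp, cc = nucleotide_accuracy("0011110000", "0001111000")
print(sn, sp, round(cc, 4))
```

In this toy case three of the four annotated coding bases are recovered and one non-coding base is wrongly called coding, so sensitivity and specificity are both 0.75.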
Compare Gene Calls

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ecale Zhou, Carol L.

2016-07-05

Compare Gene Calls (CGC) is a Python code used for combining and comparing gene calls from any number of gene callers. A gene caller is a computer program that predicts the extents of open reading frames within genomes of biological organisms.
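One simple way to combine and compare gene calls from several callers is to record, for every distinct call, which callers reported it. The Python sketch below illustrates that idea only. It is not the CGC code, and the `GeneCall` type, the caller names and the coordinates are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class GeneCall(NamedTuple):
    start: int    # 1-based start coordinate
    end: int      # inclusive end coordinate
    strand: str   # '+' or '-'


def combine_calls(calls_by_caller: Dict[str, List[GeneCall]]) -> Dict[GeneCall, List[str]]:
    """Map each distinct gene call to the sorted names of callers reporting it."""
    support: Dict[GeneCall, Set[str]] = {}
    for caller, calls in calls_by_caller.items():
        for call in calls:
            support.setdefault(call, set()).add(caller)
    return {call: sorted(names) for call, names in support.items()}


# Hypothetical output of two gene callers on the same contig.
calls = {
    "caller_a": [GeneCall(100, 400, "+"), GeneCall(500, 900, "-")],
    "caller_b": [GeneCall(100, 400, "+"), GeneCall(520, 900, "-")],
}
combined = combine_calls(calls)
shared = [c for c, names in combined.items() if len(names) == len(calls)]
print(shared)
```

Exact coordinate matching is the strictest possible comparison. Here the two calls ending at 900 are treated as different because their start coordinates differ, and a real comparison might choose to group such calls together.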
Functional analysis of rare variants in mismatch repair proteins augments results from computation-based predictive methods

PubMed Central

Arora, Sanjeevani; Huwe, Peter J.; Sikder, Rahmat; Shah, Manali; Browne, Amanda J.; Lesh, Randy; Nicolas, Emmanuelle; Deshpande, Sanat; Hall, Michael J.; Dunbrack, Roland L.; Golemis, Erica A.

2017-01-01

The cancer-predisposing Lynch Syndrome (LS) arises from germline mutations in DNA mismatch repair (MMR) genes, predominantly MLH1, MSH2, MSH6, and PMS2. A major challenge for clinical diagnosis of LS is the frequent identification of variants of uncertain significance (VUS) in these genes, as it is often difficult to determine variant pathogenicity, particularly for missense variants. Generic programs such as SIFT and PolyPhen-2, and MMR gene-specific programs such as PON-MMR and MAPP-MMR, are often used to predict deleterious or neutral effects of VUS in MMR genes. We evaluated the performance of multiple predictive programs in the context of functional biologic data for 15 VUS in MLH1, MSH2, and PMS2. Using cell line models, we characterized VUS predicted to range from neutral to pathogenic on mRNA and protein expression, basal cellular viability, viability following treatment with a panel of DNA-damaging agents, and functionality in DNA damage response (DDR) signaling, benchmarking to wild-type MMR proteins. Our results suggest that the MMR gene-specific classifiers do not always align with the experimental phenotypes related to DDR. Our study highlights the importance of complementary experimental and computational assessment to develop future predictors for the assessment of VUS. PMID:28494185
Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

PubMed

Mørk, Søren; Holmes, Ian

2012-03-01

Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures are best performing in terms of statistical information criteria or prediction performances, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.
Dinucleotide controlled null models for comparative RNA gene prediction.

PubMed

Gesell, Tanja; Washietl, Stefan

2008-05-27

Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. 
SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
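To see why a null model has to preserve dinucleotide content and not just base composition, the Python sketch below compares two sequences with identical mononucleotide counts but different dinucleotide counts. It only illustrates the motivation and is not the SISSIz algorithm; the sequences and the function name are made up.

```python
from collections import Counter


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides in a sequence."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


original = "ACGTACGT"
rearranged = "AACCGGTT"   # same bases, different order

same_composition = Counter(original) == Counter(rearranged)
di_original = dinucleotide_counts(original)
di_rearranged = dinucleotide_counts(rearranged)

print(same_composition)
print(dict(di_original))
print(dict(di_rearranged))
```

A control that shuffles bases freely can therefore change stacking-related statistics even though base composition is untouched, which is the bias that dinucleotide-preserving randomization is meant to avoid.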
Integrating alternative splicing detection into gene prediction.

PubMed

Foissac, Sylvain; Schiex, Thomas

2005-02-10

Alternative splicing (AS) is now considered a major contributor to transcriptome/proteome diversity and it cannot be neglected in the annotation process of a new genome. Despite considerable progress in terms of accuracy in computational gene prediction, the ability to reliably predict AS variants when there is local experimental evidence of them remains an open challenge for gene finders. We have used a new integrative approach that allows AS detection to be incorporated into ab initio gene prediction. This method relies on the analysis of genomically aligned transcript sequences (ESTs and/or cDNAs), and has been implemented in the dynamic programming algorithm of the graph-based gene finder EuGENE. Given a genomic sequence and a set of aligned transcripts, this new version identifies the set of transcripts carrying evidence of alternative splicing events, and provides, in addition to the classical optimal gene prediction, alternative optimal predictions (among those which are consistent with the AS events detected). This allows for multiple annotations of a single gene in such a way that each predicted variant is supported by transcript evidence (but not necessarily with full-length coverage). This automatic combination of experimental data analysis and ab initio gene finding offers an ideal integration of alternatively spliced gene prediction inside a single annotation pipeline.
A genomic lifespan program that reorganises the young adult brain is targeted in schizophrenia.

PubMed

Skene, Nathan G; Roy, Marcia; Grant, Seth Gn

2017-09-12

The genetic mechanisms regulating the brain and behaviour across the lifespan are poorly understood. We found that lifespan transcriptome trajectories describe a calendar of gene regulatory events in the brain of humans and mice. Transcriptome trajectories defined a sequence of gene expression changes in neuronal, glial and endothelial cell-types, which enabled prediction of age from tissue samples. A major lifespan landmark was the peak change in trajectories occurring in humans at 26 years and in mice at 5 months of age. This species-conserved peak was delayed in females and marked a reorganization of expression of synaptic and schizophrenia-susceptibility genes. The lifespan calendar predicted the characteristic age of onset in young adults and sex differences in schizophrenia. We propose that a genomic program generates a lifespan calendar of gene regulation that times age-dependent molecular organization of the brain, and that mutations that interrupt the program in young adults cause schizophrenia.
Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments

PubMed Central

Haas, Brian J; Salzberg, Steven L; Zhu, Wei; Pertea, Mihaela; Allen, Jonathan E; Orvis, Joshua; White, Owen; Buell, C Robin; Wortman, Jennifer R

2008-01-01

EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation. PMID:18190707
ZCURVE 3.0: identify prokaryotic genes with higher accuracy as well as automatically and accurately select essential genes

PubMed Central

Hua, Zhi-Gang; Lin, Yan; Yuan, Ya-Zhou; Yang, De-Chang; Wei, Wen; Guo, Feng-Biao

2015-01-01

In 2003, we developed an ab initio program, ZCURVE 1.0, to find genes in bacterial and archaeal genomes. In this work, we present the updated version (i.e. ZCURVE 3.0). Using 422 prokaryotic genomes, the average accuracy was 93.7% with the updated version, compared with 88.7% with the original version. Such results also demonstrate that ZCURVE 3.0 is comparable with Glimmer 3.02 and may provide complementary predictions to it. In fact, the joint application of the two programs generated better results by correctly finding more annotated genes while also containing fewer false-positive predictions. As the exclusive function, ZCURVE 3.0 contains one post-processing program that can identify essential genes with high accuracy (generally >90%). We hope ZCURVE 3.0 will receive wide use with the web-based running mode. The updated ZCURVE can be freely accessed from http://cefg.uestc.edu.cn/zcurve/ or http://tubic.tju.edu.cn/zcurveb/ without any restrictions. PMID:25977299
Gene and translation initiation site prediction in metagenomic sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John

2012-01-01

Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
ZCURVE 3.0: identify prokaryotic genes with higher accuracy as well as automatically and accurately select essential genes.

PubMed

Hua, Zhi-Gang; Lin, Yan; Yuan, Ya-Zhou; Yang, De-Chang; Wei, Wen; Guo, Feng-Biao

2015-07-01

In 2003, we developed an ab initio program, ZCURVE 1.0, to find genes in bacterial and archaeal genomes. In this work, we present the updated version (i.e. ZCURVE 3.0). Using 422 prokaryotic genomes, the average accuracy was 93.7% with the updated version, compared with 88.7% with the original version. Such results also demonstrate that ZCURVE 3.0 is comparable with Glimmer 3.02 and may provide complementary predictions to it. In fact, the joint application of the two programs generated better results by correctly finding more annotated genes while also containing fewer false-positive predictions. As the exclusive function, ZCURVE 3.0 contains one post-processing program that can identify essential genes with high accuracy (generally >90%). We hope ZCURVE 3.0 will receive wide use with the web-based running mode. The updated ZCURVE can be freely accessed from http://cefg.uestc.edu.cn/zcurve/ or http://tubic.tju.edu.cn/zcurveb/ without any restrictions. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Prediction of exercise-mediated changes in metabolic markers by gene polymorphism.

PubMed

Kahara, Toshio; Takamura, Toshinari; Hayakawa, Tetsuo; Nagai, Yukihiro; Yamaguchi, Hiromi; Katsuki, Tatsuo; Katsuki, Ken-ichi; Katsuki, Michio; Kobayashi, Ken-ichi

2002-08-01

The effects of regular physical exercise on obesity-associated metabolic abnormalities vary for each individual. In this study, we investigated whether genotypes of genes associated with obesity can predict the effects of exercise on changes in metabolic markers in healthy men. Healthy Japanese men (n=106) performed the exercise program at 50% of their maximal heart rate for 20-60 min a day, 2-3 days each week for 3 months. The levels of fasting plasma glucose (FPG) and serum leptin significantly decreased after the exercise program. Polymorphisms of the beta3-adrenergic receptor (beta3AR) and uncoupling protein-1 (UCP-1) genes were analyzed with RFLP methods. In the Trp/Trp genotype of the beta3AR gene, the levels of serum leptin, FPG and fructosamine (FrAm) decreased significantly after the exercise program, but not in the Arg/Arg genotype. In the AG heterozygote and the GG homozygote of the UCP-1 gene, FPG and FrAm levels were significantly reduced, respectively. In conclusion, gene polymorphism of the beta3AR and UCP-1 was found to be associated with the exercise-mediated improvement in glucose tolerance and leptin resistance in healthy Japanese men.
Prediction on the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase based on gene expression programming.

PubMed

Li, Yuqin; You, Guirong; Jia, Baoxiu; Si, Hongzong; Yao, Xiaojun

2014-01-01

Quantitative structure-activity relationships (QSAR) were developed to predict the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase via the heuristic method (HM) and gene expression programming (GEP). The descriptors of 33 pyrrolidine derivatives were calculated by the software CODESSA, which can calculate quantum chemical, topological, geometrical, constitutional, and electrostatic descriptors. HM was also used for the preselection of 5 appropriate molecular descriptors. Linear and nonlinear QSAR models were developed based on HM and GEP separately, and the two prediction models yielded good correlation coefficients (R²) of 0.93 and 0.94. The two QSAR models are useful for predicting the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase during the discovery of new anticancer drugs and provide theoretical information for studying new drugs.
Can Thrifty Gene(s) or Predictive Fetal Programming for Thriftiness Lead to Obesity?

PubMed Central

Baig, Ulfat; Belsare, Prajakta; Watve, Milind; Jog, Maithili

2011-01-01

Obesity and related disorders are thought to have their roots in metabolic “thriftiness” that evolved to combat periodic starvation. The association of low birth weight with obesity in later life caused a shift in the concept from thrifty gene to thrifty phenotype or anticipatory fetal programming. The assumption of thriftiness is implicit in obesity research. We examine here, with the help of a mathematical model, the conditions for evolution of thrifty genes or fetal programming for thriftiness. The model suggests that a thrifty gene cannot exist in a stable polymorphic state in a population. The conditions for evolution of thrifty fetal programming are restricted if the correlation between intrauterine and lifetime conditions is poor. Such a correlation is not observed in natural courses of famine. If there is fetal programming for thriftiness, it could have evolved in anticipation of social factors affecting nutrition that can result in a positive correlation. PMID:21773010
GeneMachine: gene prediction and sequence annotation.

PubMed

Makalowska, I; Ryan, J F; Baxevanis, A D

2001-09-01

A number of free-standing programs have been developed in order to help researchers find potential coding regions and deduce gene structure for long stretches of what is essentially 'anonymous DNA'. As these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to assure that all potential coding regions within a previously-identified critical region are identified. We have developed a gene identification tool called GeneMachine which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed in order to see whether a previously-characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules are used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written into ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and as a graphical viewer. The main feature of GeneMachine is that the process is fully automated; the user is only required to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be made to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. GeneMachine is freely-available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.
Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases

PubMed Central

Ritchie, Marylyn D; White, Bill C; Parker, Joel S; Hahn, Lance W; Moore, Jason H

2003-01-01

Background Appropriate definition of neural network architecture prior to data analysis is crucial for successful data mining. This can be challenging when the underlying model of the data is unknown. The goal of this study was to determine whether optimizing neural network architecture using genetic programming as a machine learning strategy would improve the ability of neural networks to model and detect nonlinear interactions among genes in studies of common human diseases. Results Using simulated data, we show that a genetic programming optimized neural network approach is able to model gene-gene interactions as well as a traditional back propagation neural network. Furthermore, the genetic programming optimized neural network is better than the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present. Conclusion This study suggests that a machine learning strategy for optimizing neural network architecture may be preferable to traditional trial-and-error approaches for the identification and characterization of gene-gene interactions in common, complex human diseases. PMID:12846935
An Improved Method for TAL Effectors DNA-Binding Sites Prediction Reveals Functional Convergence in TAL Repertoires of Xanthomonas oryzae Strains

PubMed Central

Pérez-Quintero, Alvaro L.; Rodriguez-R, Luis M.; Dereeper, Alexis; López, Camilo; Koebnik, Ralf; Szurek, Boris; Cunnac, Sebastien

2013-01-01

Transcription Activators-Like Effectors (TALEs) belong to a family of virulence proteins from the Xanthomonas genus of bacterial plant pathogens that are translocated into the plant cell. In the nucleus, TALEs act as transcription factors inducing the expression of susceptibility genes. A code for TALE-DNA binding specificity and high-resolution three-dimensional structures of TALE-DNA complexes were recently reported. Accurate prediction of TAL Effector Binding Elements (EBEs) is essential to elucidate the biological functions of the many sequenced TALEs as well as for robust design of artificial TALE DNA-binding domains in biotechnological applications. In this work a program with improved EBE prediction performances was developed using an updated specificity matrix and a position weight correction function to account for the matching pattern observed in a validation set of TALE-DNA interactions. To gain a systems perspective on the large TALE repertoires from X. oryzae strains, this program was used to predict rice gene targets for 99 sequenced family members. Integrating predictions and available expression data in a TALE-gene network revealed multiple candidate transcriptional targets for many TALEs as well as several possible instances of functional convergence among TALEs. PMID:23869221
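The scoring idea in this abstract (a specificity matrix combined with a position weight correction) can be illustrated with a minimal sketch. This is not the authors' program: the matrix values, the linear weight function, and the names `SPECIFICITY`, `position_weight`, `score_site`, and `best_sites` are all hypothetical and chosen only to show the mechanics of sliding a weighted log-probability score along a sequence.

```python
import math

# Hypothetical repeat-to-nucleotide specificity matrix (probabilities).
# The values are illustrative only, not the published matrix.
SPECIFICITY = {
    "HD": {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    "NI": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    "NG": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    "NN": {"A": 0.40, "C": 0.05, "G": 0.50, "T": 0.05},
}


def position_weight(i, n):
    """Hypothetical correction: positions near the start of the site weigh more."""
    return 1.0 - 0.5 * i / max(n - 1, 1)


def score_site(repeats, site):
    """Weighted log-probability that `site` is bound by the repeat array."""
    if len(repeats) != len(site):
        raise ValueError("site length must equal the number of repeats")
    n = len(repeats)
    return sum(
        position_weight(i, n) * math.log(SPECIFICITY[rep][base])
        for i, (rep, base) in enumerate(zip(repeats, site))
    )


def best_sites(repeats, sequence, top=3):
    """Score every window of the sequence and return the best (score, offset) pairs."""
    n = len(repeats)
    scored = [
        (score_site(repeats, sequence[i:i + n]), i)
        for i in range(len(sequence) - n + 1)
    ]
    return sorted(scored, reverse=True)[:top]
```

For example, `best_sites(["HD", "NI", "NG"], "GGCATGG", top=1)` returns the window starting at offset 2 (`CAT`), the only window that matches the preferred base of every repeat.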
Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.

Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example, systems biology-oriented genome scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop proper functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and importantly detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provides protein-level experimental confirmation for 44% of predicted protein-coding genes, suggests revisions to 48 genes assigned incorrect translational start sites, and uncovers 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regards to the annotation of mature protein products.
A gene expression biomarker accurately predicts estrogen ...

EPA Pesticide Factsheets

The EPA’s vision for the Endocrine Disruptor Screening Program (EDSP) in the 21st Century (EDSP21) includes utilization of high-throughput screening (HTS) assays coupled with computational modeling to prioritize chemicals with the goal of eventually replacing current Tier 1 screening tests. The ToxCast program currently includes 18 HTS in vitro assays that evaluate the ability of chemicals to modulate estrogen receptor α (ERα), an important endocrine target. We propose microarray-based gene expression profiling as a complementary approach to predict ERα modulation and have developed computational methods to identify ERα modulators in an existing database of whole-genome microarray data. The ERα biomarker consisted of 46 ERα-regulated genes with consistent expression patterns across 7 known ER agonists and 3 known ER antagonists. The biomarker was evaluated as a predictive tool using the fold-change rank-based Running Fisher algorithm by comparison to annotated gene expression data sets from experiments in MCF-7 cells. Using 141 comparisons from chemical- and hormone-treated cells, the biomarker gave a balanced accuracy for prediction of ERα activation or suppression of 94% or 93%, respectively. The biomarker was able to correctly classify 18 out of 21 (86%) OECD ER reference chemicals including “very weak” agonists and replicated predictions based on 18 in vitro ER-associated HTS assays. For 114 chemicals present in both the HTS data and the MCF-7 c
QSAR Study for Carcinogenic Potency of Aromatic Amines Based on GEP and MLPs

PubMed Central

Song, Fucheng; Zhang, Anling; Liang, Hui; Cui, Lianhua; Li, Wenlian; Si, Hongzong; Duan, Yunbo; Zhai, Honglin

2016-01-01

A new analysis strategy was used to classify the carcinogenicity of aromatic amines. Physical-chemical parameters are closely related to the carcinogenicity of compounds. Quantitative structure-activity relationship (QSAR) modeling is a method for predicting the carcinogenicity of aromatic amines that can reveal the relationship between carcinogenicity and physical-chemical parameters. This study used gene expression programming, run in APS software, and multilayer perceptrons, run in Weka software, to predict the carcinogenicity of aromatic amines. Both methods relied on molecular descriptors calculated by CODESSA software, and eight molecular descriptors were selected to build the function equations. The accuracy of gene expression programming was 0.92 in the training set and 0.82 in the test set; the accuracy of the multilayer perceptrons was 0.84 and 0.74, respectively. Gene expression programming therefore outperformed the multilayer perceptrons in both the training set and the test set. QSAR is an efficient method for identifying carcinogenic compounds. PMID:27854309
Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

PubMed

Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

2013-03-15

The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
DIANA-microT web server: elucidating microRNA functions through target prediction.

PubMed

Maragkakis, M; Reczko, M; Simossis, V A; Alexiou, P; Papadopoulos, G L; Dalamagas, T; Giannopoulos, G; Goumas, G; Koukis, E; Kourtis, K; Vergoulis, T; Koziris, N; Sellis, T; Tsanakas, P; Hatzigeorgiou, A G

2009-07-01

Computational microRNA (miRNA) target prediction is one of the key means for deciphering the role of miRNAs in development and disease. Here, we present the DIANA-microT web server as the user interface to the DIANA-microT 3.0 miRNA target prediction algorithm. The web server provides extensive information for predicted miRNA:target gene interactions with a user-friendly interface, providing extensive connectivity to online biological resources. Target gene and miRNA functions may be elucidated through automated bibliographic searches, and functional information is accessible through Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The web server offers links to nomenclature, sequence and protein databases, and users can search for targeted genes using different nomenclatures or functional features, such as a gene's possible involvement in biological pathways. The target prediction algorithm supports parameters calculated individually for each miRNA:target gene interaction and provides a signal-to-noise ratio and a precision score that help in evaluating the significance of the predicted results. Using a set of miRNA targets recently identified through the pSILAC method, the performance of several computational target prediction programs was assessed. In this assessment, DIANA-microT 3.0 achieved the highest ratio of correctly predicted targets over all predicted targets, at 66%. The DIANA-microT web server is freely available at www.microrna.gr/microT.
Ontology-oriented retrieval of putative microRNAs in Vitis vinifera via GrapeMiRNA: a web database of de novo predicted grape microRNAs.

PubMed

Lazzari, Barbara; Caprera, Andrea; Cestaro, Alessandro; Merelli, Ivan; Del Corvo, Marcello; Fontana, Paolo; Milanesi, Luciano; Velasco, Riccardo; Stella, Alessandra

2009-06-29

Two complete genome sequences are available for Vitis vinifera Pinot noir. Based on the sequence and gene predictions produced by the IASMA, we performed an in silico detection of putative microRNA genes and of their targets, and collected the most reliable microRNA predictions in a web database. The application is available at http://www.itb.cnr.it/ptp/grapemirna/. The program FindMiRNA was used to detect putative microRNA genes in the grape genome. A very large number of predictions was retrieved, calling for validation. Nine parameters were calculated and, based on the grape microRNA dataset available at miRBase, thresholds were defined and applied to FindMiRNA predictions having targets in gene exons. In the resulting subset, predictions were ranked according to precursor positions and sequence similarity, and to target identity. To further validate FindMiRNA predictions, comparisons to the Arabidopsis genome, to the grape Genoscope genome, and to the grape EST collection were performed. Results were stored in a MySQL database, and a web interface was prepared to query the database and retrieve predictions of interest. The GrapeMiRNA database encompasses 5,778 microRNA predictions spanning the whole grape genome. Predictions are integrated with information that can be of use in selection procedures. Tools added to the web interface also allow predictions to be inspected according to the gene ontology classes and metabolic pathways of their targets. The GrapeMiRNA database can help in selecting candidate microRNA genes to be validated.
Prediction of atmospheric degradation data for POPs by gene expression programming.

PubMed

Luan, F; Si, H Z; Liu, H T; Wen, Y Y; Zhang, X Y

2008-01-01

Quantitative structure-activity relationship models for the prediction of the mean and the maximum atmospheric degradation half-life values of persistent organic pollutants were developed based on the linear heuristic method (HM) and non-linear gene expression programming (GEP). Molecular descriptors, calculated from the structures alone, were used to represent the characteristics of the compounds. HM was used both to pre-select the whole descriptor sets and to build the linear model. GEP yielded satisfactory prediction results: the square of the correlation coefficient r² was 0.80 and 0.81 for the mean and maximum half-life values of the test set, and the root mean square errors were 0.448 and 0.426, respectively. The results of this work indicate that the GEP is a very promising tool for non-linear approximations.
EGASP: the human ENCODE Genome Annotation Assessment Project

PubMed Central

Guigó, Roderic; Flicek, Paul; Abril, Josep F; Reymond, Alexandre; Lagarde, Julien; Denoeud, France; Antonarakis, Stylianos; Ashburner, Michael; Bajic, Vladimir B; Birney, Ewan; Castelo, Robert; Eyras, Eduardo; Ucla, Catherine; Gingeras, Thomas R; Harrow, Jennifer; Hubbard, Tim; Lewis, Suzanna E; Reese, Martin G

2006-01-01

Background We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusion This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence. PMID:16925836
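The nucleotide-level accuracy figures quoted above can be made concrete with a short sketch. It assumes the convention common in the gene-finding literature, where sensitivity is the fraction of annotated coding nucleotides that are predicted as coding and specificity is the fraction of predicted coding nucleotides that are annotated as coding. The function names and the interval representation (1-based, inclusive) are illustrative assumptions, not part of the EGASP evaluation code.

```python
def coding_positions(intervals):
    """Expand 1-based inclusive (start, end) coding intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    sensitivity = TP / (TP + FN): annotated coding bases that were predicted.
    specificity = TP / (TP + FP): predicted coding bases that are annotated
    (the gene-finding convention, equivalent to precision).
    """
    reference = coding_positions(annotated)
    prediction = coding_positions(predicted)
    true_positives = len(reference & prediction)
    sensitivity = true_positives / len(reference) if reference else 0.0
    specificity = true_positives / len(prediction) if prediction else 0.0
    return sensitivity, specificity
```

With one annotated exon at `(1, 100)` and one predicted exon at `(11, 110)`, 90 bases overlap, so both values are 0.9.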
DNA context represents transcription regulation of the gene in mouse embryonic stem cells

NASA Astrophysics Data System (ADS)

Ha, Misook; Hong, Soondo

2016-04-01

Understanding gene regulatory information in DNA remains a significant challenge in biomedical research. This study presents a computational approach to infer gene regulatory programs from primary DNA sequences. Using DNA around transcription start sites as attributes, our model predicts gene regulation in the gene. We find that H3K27ac around TSS is an informative descriptor of the transcription program in mouse embryonic stem cells. We build a computational model inferring the cell-type-specific H3K27ac signatures in the DNA around TSS. A comparison of embryonic stem cell and liver cell-specific H3K27ac signatures in DNA shows that the H3K27ac signatures in DNA around TSS efficiently distinguish the cell-type specific H3K27ac peaks and the gene regulation. The arrangement of the H3K27ac signatures inferred from the DNA represents the transcription regulation of the gene in mESC. We show that the DNA around transcription start sites is associated with the gene regulatory program by specific interaction with H3K27ac.
DNA context represents transcription regulation of the gene in mouse embryonic stem cells.

PubMed

Ha, Misook; Hong, Soondo

2016-04-14

Understanding gene regulatory information in DNA remains a significant challenge in biomedical research. This study presents a computational approach to infer gene regulatory programs from primary DNA sequences. Using DNA around transcription start sites as attributes, our model predicts gene regulation in the gene. We find that H3K27ac around TSS is an informative descriptor of the transcription program in mouse embryonic stem cells. We build a computational model inferring the cell-type-specific H3K27ac signatures in the DNA around TSS. A comparison of embryonic stem cell and liver cell-specific H3K27ac signatures in DNA shows that the H3K27ac signatures in DNA around TSS efficiently distinguish the cell-type specific H3K27ac peaks and the gene regulation. The arrangement of the H3K27ac signatures inferred from the DNA represents the transcription regulation of the gene in mESC. We show that the DNA around transcription start sites is associated with the gene regulatory program by specific interaction with H3K27ac.
Low-rank regularization for learning gene expression programs.

PubMed

Ye, Guibo; Tang, Mengfan; Cai, Jian-Feng; Nie, Qing; Xie, Xiaohui

2013-01-01

Learning gene expression programs directly from a set of observations is challenging due to the complexity of gene regulation, high noise of experimental measurements, and insufficient number of experimental measurements. Imposing additional constraints with strong and biologically motivated regularizations is critical in developing reliable and effective algorithms for inferring gene expression programs. Here we propose a new form of regularization that constrains the number of independent connectivity patterns between regulators and targets, motivated by the modular design of gene regulatory programs and the belief that the total number of independent regulatory modules should be small. We formulate a multi-target linear regression framework to incorporate this type of regularization, in which the number of independent connectivity patterns is expressed as the rank of the connectivity matrix between regulators and targets. We then generalize the linear framework to nonlinear cases, and prove that the generalized low-rank regularization model is still convex. Efficient algorithms are derived to solve both the linear and nonlinear low-rank regularized problems. Finally, we test the algorithms on three gene expression datasets, and show that the low-rank regularization improves the accuracy of gene expression prediction in these three datasets.
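One standard way to write a low-rank multi-target linear regression of the kind described above is with a nuclear-norm penalty, the usual convex surrogate for matrix rank. The abstract does not give the formula, so the notation below is an assumption: X is the n × p matrix of regulator expression, Y the n × q matrix of target expression, W the p × q connectivity matrix, and λ > 0 the regularization strength.

```latex
\min_{W \in \mathbb{R}^{p \times q}} \;
  \tfrac{1}{2}\,\lVert Y - XW \rVert_F^2 \;+\; \lambda\,\lVert W \rVert_*,
\qquad
\lVert W \rVert_* \;=\; \sum_{k} \sigma_k(W)
```

Here σ_k(W) are the singular values of W. Penalizing their sum pushes many of them to zero, which limits the number of independent connectivity patterns while keeping the problem convex.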
A gene expression biomarker accurately predicts estrogen receptor α modulation in a human gene expression compendium

EPA Science Inventory

The EPA’s vision for the Endocrine Disruptor Screening Program (EDSP) in the 21st Century (EDSP21) includes utilization of high-throughput screening (HTS) assays coupled with computational modeling to prioritize chemicals with the goal of eventually replacing current Tier 1...
Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization.

PubMed

Jung, Sang-Kyu; McDonald, Karen

2011-08-16

Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net.
Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization

PubMed Central

2011-01-01

Background Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. Results The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Conclusion Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net. PMID:21846353

Integrative Genomic Analyses Yields Cell Cycle Regulatory Programs with Prognostic Value

PubMed Central

Cheng, Chao; Lou, Shaoke; Andrews, Erik H.; Ung, Matthew H.; Varn, Frederick S.

2016-01-01

Liposarcoma is the second most common form of sarcoma and has been categorized into four molecular subtypes, which are associated with differential prognosis of patients. However, the transcriptional regulatory programs associated with distinct histological and molecular subtypes of liposarcoma have not been investigated. This study uses integrative analyses to systematically define the transcriptional regulatory programs associated with liposarcoma. Likewise, computational methods are used to identify regulatory programs associated with different liposarcoma subtypes as well as programs that are predictive of prognosis. Further analysis of curated gene sets was used to identify prognostic gene signatures. The integration of data from a variety of sources including gene expression profiles, transcription factor (TF) binding data from ChIP-seq experiments, curated gene sets, and clinical information of patients indicated discrete regulatory programs (e.g., controlled by E2F1 and E2F4) with significantly different regulatory activity in one or multiple subtypes of liposarcoma with respect to normal adipose tissue. These programs were also shown to be prognostic: in liposarcoma patients, higher E2F4 or E2F1 activity was associated with unfavorable prognosis. A total of 259 gene sets were significantly associated with patient survival in liposarcoma, among which >50% are involved in cell cycle and proliferation. PMID:26856934
Predicting BRCA1 and BRCA2 gene mutation carriers: comparison of LAMBDA, BRCAPRO, Myriad II, and modified Couch models.

PubMed

Lindor, Noralane M; Lindor, Rachel A; Apicella, Carmel; Dowty, James G; Ashley, Amanda; Hunt, Katherine; Mincey, Betty A; Wilson, Marcia; Smith, M Cathie; Hopper, John L

2007-01-01

Models have been developed to predict the probability that a person carries a detectable germline mutation in the BRCA1 or BRCA2 genes. Their relative performance in a clinical setting is unclear. To compare the performance characteristics of four BRCA1/BRCA2 gene mutation prediction models: LAMBDA, based on a checklist and scores developed from data on Ashkenazi Jewish (AJ) women; BRCAPRO, a Bayesian computer program; modified Couch tables based on regression analyses; and Myriad II tables collated by Myriad Genetics Laboratories. Family cancer history data were analyzed from 200 probands from the Mayo Clinic Familial Cancer Program, in a multispecialty tertiary care group practice. All probands had clinical testing for BRCA1 and BRCA2 mutations conducted in a single laboratory. For each model, performance was assessed by the area under the receiver operator characteristic curve (ROC) and by tests of accuracy and dispersion. Cases "missed" by one or more models (model predicted less than 10% probability of mutation when a mutation was actually found) were compared across models. All models gave similar areas under the ROC curve of 0.71 to 0.76. All models except LAMBDA substantially under-predicted the numbers of carriers. All models were too dispersed. In terms of ranking, all prediction models performed reasonably well with similar performance characteristics. Model predictions were widely discrepant for some families. Review of cancer family histories by an experienced clinician continues to be vital to ensure that critical elements are not missed and that the most appropriate risk prediction figures are provided.
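The two evaluation ideas in this abstract, area under the ROC curve and cases "missed" at a 10% predicted probability, can be sketched in a few lines. The sketch assumes each model outputs one carrier probability per proband and that labels are 1 for a confirmed mutation and 0 otherwise. The function names are hypothetical and the code is not taken from any of the compared models.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen carrier receives a higher
    score than a randomly chosen non-carrier; ties count as one half.
    """
    carriers = [s for s, y in zip(scores, labels) if y == 1]
    non_carriers = [s for s, y in zip(scores, labels) if y == 0]
    if not carriers or not non_carriers:
        raise ValueError("need at least one carrier and one non-carrier")
    wins = 0.0
    for c in carriers:
        for n in non_carriers:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(carriers) * len(non_carriers))


def missed_cases(scores, labels, threshold=0.10):
    """Indices of true carriers whose predicted probability is below the threshold."""
    return [
        i for i, (s, y) in enumerate(zip(scores, labels))
        if y == 1 and s < threshold
    ]
```

Because the AUC depends only on how probands are ranked, a model can rank well (AUC of 0.71 to 0.76 in the study) and still under-predict the absolute number of carriers, which is why the abstract reports both kinds of measure.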
Peak flood estimation using gene expression programming

NASA Astrophysics Data System (ADS)

Zorn, Conrad R.; Shamseldin, Asaad Y.

2015-12-01

As a case study for the Auckland Region of New Zealand, this paper investigates the potential use of gene-expression programming (GEP) in predicting specific return period events in comparison to the established and widely used Regional Flood Estimation (RFE) method. Initially calibrated to 14 gauged sites, the GEP-derived model was further validated against 10- and 100-year flood events, with relative errors of 29% and 18%, respectively. By comparison, the RFE method gave errors of 48% and 44% for the same flood events. While the effectiveness of GEP in predicting specific return period events is made apparent, it is argued that the derived equations should be used in conjunction with the existing methodologies rather than as a replacement.
Gene-expression programming for flip-bucket spillway scour.

PubMed

Guven, Aytac; Azamathulla, H Md

2012-01-01

During the last two decades, researchers have noticed that the use of soft computing techniques as an alternative to conventional statistical methods based on controlled laboratory or field data gave significantly better results. Gene-expression programming (GEP), which is an extension to genetic programming (GP), has recently attracted the attention of researchers in the prediction of hydraulic data. This study presents GEP as an alternative tool in the prediction of scour downstream of a flip-bucket spillway. Actual field measurements were used to develop GEP models. The proposed GEP models are compared with the earlier conventional GP results of others (Azamathulla et al. 2008b; RMSE = 2.347, δ = 0.377, R = 0.842) and those of commonly used regression-based formulae. The predictions of the GEP models were observed to be in good agreement with measured ones, and considerably better than conventional GP and the regression-based formulae. The results are tabulated in terms of statistical error measures (GEP1; RMSE = 1.596, δ = 0.109, R = 0.917) and illustrated via scatter plots.
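Two of the error measures quoted in this abstract, RMSE and the correlation coefficient R, have standard definitions that can be sketched as follows. The abstract does not define δ, so it is left out here. The function names and sample values are hypothetical and are not the field measurements used in the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical scour depths (illustration only).
measured = [1.0, 2.0, 3.0, 4.0]
modelled = [1.0, 2.0, 3.0, 5.0]
example_rmse = rmse(measured, modelled)
example_r = pearson_r(measured, modelled)
```

A lower RMSE together with a higher R, as reported for GEP1 relative to the earlier GP results, indicates both smaller errors and a tighter linear relationship between predicted and measured values.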
MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

PubMed

Zhu, Huaiqiu; Hu, Gang-Qing; Yang, Yi-Fan; Wang, Jin; She, Zhen-Su

2007-03-16

Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, the lack of a comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure before making the prediction of genes. Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.
Prediction of essential proteins based on gene expression programming.

PubMed

Zhong, Jiancheng; Wang, Jianxin; Peng, Wei; Zhang, Zhen; Pan, Yi

2013-01-01

Essential proteins are indispensable for cell survival. Identifying essential proteins is very important for improving our understanding of how a cell works. There are various types of features related to the essentiality of proteins. Many methods have been proposed to combine some of them to predict essential proteins. However, it is still a big challenge to design an effective method that predicts them by integrating different features and that explains how the selected features determine the essentiality of a protein. Gene expression programming (GEP) is a learning algorithm that learns relationships between variables in sets of data and then builds models to explain these relationships. In this work, we propose a GEP-based method to predict essential proteins by combining some biological features and topological features. We carry out experiments on S. cerevisiae data. The experimental results show that our method achieves better prediction performance than methods using individual features. Moreover, our method outperforms some machine learning methods and performs as well as a method obtained by combining the outputs of eight machine learning methods. The accuracy of predicting essential proteins can be improved by using the GEP method to combine some topological features and biological features.
iTAK: A program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators and protein kinases

USDA-ARS's Scientific Manuscript database

Transcription factors (TFs) are proteins that regulate the expression of target genes by binding to specific elements in their regulatory regions. Transcriptional regulators (TRs) also regulate the expression of target genes; however, they operate indirectly via interaction with the basal transcript...
Thermodynamic Constraints Improve Metabolic Networks.

PubMed

Krumholz, Elias W; Libourel, Igor G L

2017-08-08

In pursuit of establishing a realistic metabolic phenotypic space, the reversibility of reactions is thermodynamically constrained in modern metabolic networks. The reversibility constraints follow from heuristic thermodynamic poise approximations that take anticipated cellular metabolite concentration ranges into account. Because constraints reduce the feasible space, draft metabolic network reconstructions may need more extensive reconciliation, and a larger number of genes may become essential. Notwithstanding ubiquitous application, the effect of reversibility constraints on the predictive capabilities of metabolic networks has not been investigated in detail. Instead, work has focused on the implementation and validation of the thermodynamic poise calculation itself. With the advance of fast linear programming-based network reconciliation, the effects of reversibility constraints on network reconciliation and gene essentiality predictions have become feasible and are the subject of this study. Networks with thermodynamically informed reversibility constraints outperformed gene essentiality predictions compared to networks that were constrained with randomly shuffled constraints. Unconstrained networks predicted gene essentiality as accurately as thermodynamically constrained networks, but predicted substantially fewer essential genes. Networks that were reconciled with sequence similarity data and strongly enforced reversibility constraints outperformed all other networks. We conclude that metabolic network analysis confirmed the validity of the thermodynamic constraints, and that thermodynamic poise information is actionable during network reconciliation. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Molecular Structure-Based Large-Scale Prediction of Chemical-Induced Gene Expression Changes.

PubMed

Liu, Ruifeng; AbdulHameed, Mohamed Diwan M; Wallqvist, Anders

2017-09-25

The quantitative structure-activity relationship (QSAR) approach has been used to model a wide range of chemical-induced biological responses. However, it had not been utilized to model chemical-induced genomewide gene expression changes until very recently, owing to the complexity of training and evaluating a very large number of models. To address this issue, we examined the performance of a variable nearest neighbor (v-NN) method that uses information on near neighbors conforming to the principle that similar structures have similar activities. Using a data set of gene expression signatures of 13 150 compounds derived from cell-based measurements in the NIH Library of Integrated Network-based Cellular Signatures program, we were able to make predictions for 62% of the compounds in a 10-fold cross validation test, with a correlation coefficient of 0.61 between the predicted and experimentally derived signatures, a reproducibility rivaling that of high-throughput gene expression measurements. To evaluate the utility of the predicted gene expression signatures, we compared the predicted and experimentally derived signatures in their ability to identify drugs known to cause specific liver, kidney, and heart injuries. Overall, the predicted and experimentally derived signatures had similar receiver operating characteristics, whose areas under the curve ranged from 0.71 to 0.77 and 0.70 to 0.73, respectively, across the three organ injury models. However, detailed analyses of enrichment curves indicate that signatures predicted from multiple near neighbors outperformed those derived from experiments, suggesting that averaging information from near neighbors may help improve the signal from gene expression measurements. Our results demonstrate that the v-NN method can serve as a practical approach for modeling large-scale, genomewide, chemical-induced gene expression changes.
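The idea of predicting only from sufficiently near neighbors, and declining to predict otherwise, can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the Tanimoto distance on fingerprint bit sets, the Gaussian-style weight, the threshold `d0`, the smoothing factor `h`, and the scalar target (instead of a full expression signature) are all assumed for the example.

```python
import math


def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of fingerprint bits (assumed metric)."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def vnn_predict(query, training, d0=0.6, h=0.3):
    """Distance-weighted average over training compounds within distance d0.

    Returns None when no neighbor qualifies, which is why such a method
    covers only part of a data set. d0 and h are hypothetical settings.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for features, value in training:
        d = tanimoto_distance(query, features)
        if d <= d0:
            w = math.exp(-((d / h) ** 2))  # closer neighbors count more
            weighted_sum += w * value
            weight_total += w
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Hypothetical compounds: (fingerprint bits, measured response).
training_set = [({1, 2, 3}, 1.0), ({1, 2, 4}, 3.0), ({7, 8, 9}, 10.0)]
near_prediction = vnn_predict({1, 2, 3}, training_set)
far_prediction = vnn_predict({20, 21}, training_set)
```

Because the number of qualifying neighbors varies per query, coverage below 100% (62% in the abstract) is an expected consequence of the distance cutoff rather than a failure of the method.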
A Novel Method to Predict Highly Expressed Genes Based on Radius Clustering and Relative Synonymous Codon Usage.

PubMed

Tran, Tuan-Anh; Vo, Nam Tri; Nguyen, Hoang Duc; Pham, Bao The

2015-12-01

Recombinant proteins play an important role in many aspects of life and have generated a huge income, notably in the industrial enzyme business. A gene is introduced into a vector and expressed in a host organism (for example, E. coli) to obtain a high productivity of the target protein. However, genes transferred from particular organisms are not usually compatible with the host's expression system for various reasons, for example, codon usage bias, GC content, repetitive sequences, and secondary structure. The solution is to develop programs that design an optimized nucleotide sequence from a peptide sequence using properties of highly expressed genes (HEGs) of the host organism. Existing HEG data determined by practical and computer-based methods are not adequate for qualifying and quantifying. Therefore, the demand for a new HEG prediction method is critical. We proposed a new method for predicting HEGs and criteria to evaluate gene optimization. Codon usage bias was weighted by amplifying the difference between HEGs and non-highly expressed genes (non-HEGs). The number of predicted HEGs is 5% of the genome. In comparison with Puigbò's method, the result is twice as good in kernel ratio and kernel sensitivity. Concerning transcription/translation factor proteins (TF), the proposed method gives low TF sensitivity, while Puigbò's method gives moderate sensitivity. In summary, the results indicated that the proposed method can be a good option for predicting optimized genes for particular organisms, and we generated an HEG database for further research in gene design.
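Relative synonymous codon usage (RSCU), named in the title of this record, is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below shows only this standard calculation, not the paper's radius clustering or weighting scheme. For brevity the codon table covers just two amino acids, and the sequence is invented.

```python
from collections import Counter

# Partial synonymous-codon table, for illustration only.
SYNONYMOUS = {
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Lys": ["AAA", "AAG"],
}


def codon_counts(cds):
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(cds, families=SYNONYMOUS):
    """RSCU per codon: observed count / mean count over its synonymous family."""
    counts = codon_counts(cds)
    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(codons)
        for c in codons:
            result[c] = counts[c] / expected
    return result


# Hypothetical coding fragment: three CTG, one TTA, one AAA, one AAG.
example_cds = "".join(["CTG", "CTG", "CTG", "TTA", "AAA", "AAG"])
example_rscu = rscu(example_cds)
```

An RSCU above 1 marks a codon used more often than expected under equal usage, which is the kind of bias a host-specific optimization would try to match.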
Sexy gene conversions: locating gene conversions on the X-chromosome.

PubMed

Lawson, Mark J; Zhang, Liqing

2009-08-01

Gene conversion can have a profound impact on both the short- and long-term evolution of genes and genomes. Here, we examined the gene families that are located on the X-chromosomes of human (Homo sapiens), chimpanzee (Pan troglodytes), mouse (Mus musculus) and rat (Rattus norvegicus) for evidence of gene conversion. We identified seven gene families (WD repeat protein family, Ferritin Heavy Chain family, RAS-related Protein RAB-40 family, Diphosphoinositol polyphosphate phosphohydrolase family, Transcription Elongation Factor A family, LDOC1-related family, Zinc Finger Protein ZIC, and GLI family) that show evidence of gene conversion. Through phylogenetic analyses and synteny evidence, we show that gene conversion has played an important role in the evolution of these gene families and that gene conversion has occurred independently in both primates and rodents. Comparing the results with those of two gene conversion prediction programs (GENECONV and Partimatrix), we found that both GENECONV and Partimatrix have very high false negative rates (i.e. failed to predict gene conversions), which leads to many undetected gene conversions. The combination of phylogenetic analyses with physical synteny evidence exhibits high resolution in the detection of gene conversions.
Using the ToxMiner Database for Identifying Disease-Gene Associations in the ToxCast Dataset

EPA Science Inventory

The US EPA ToxCast program is using in vitro, high-throughput screening (HTS) to profile and model the bioactivity of environmental chemicals. The main goal of the ToxCast program is to generate predictive signatures of toxicity that ultimately provide rapid and cost-effective me...
Application of the ToxMiner Database: Network Analysis Linking the ToxCast Chemicals to Known Disease-Gene Associations

EPA Science Inventory

The US EPA ToxCast program is using in vitro HTS (High-Throughput Screening) methods to profile and model bioactivity of environmental chemicals. The main goals of the ToxCast program are to generate predictive signatures of toxicity, and ultimately provide rapid and cost-effecti...
Web application for automatic prediction of gene translation elongation efficiency.

PubMed

Sokolov, Vladimir; Zuraev, Bulat; Lashin, Sergei; Matushkin, Yury

2015-09-03

Expression efficiency is one of the major characteristics describing genes in various modern investigations. Expression efficiency of genes is regulated at various stages: transcription, translation, posttranslational protein modification and others. In this study, a special EloE (Elongation Efficiency) web application is described. EloE sorts an organism's genes in descending order of their theoretical rate of the elongation stage of translation, based on the analysis of their nucleotide sequences. The theoretical data obtained correlate significantly with available experimental data on gene expression in various organisms. In addition, the program identifies preferential codons in the organism's genes and defines the distribution of potential secondary-structure energy in the 5´ and 3´ regions of mRNA. EloE can be useful for preliminary estimation of translation elongation efficiency for genes for which experimental data are not yet available. Some results can be used, for instance, in other programs modeling artificial genetic structures in genetic engineering experiments.
Advances and Challenges in Genomic Selection for Disease Resistance.

PubMed

Poland, Jesse; Rutkoski, Jessica

2016-08-04

Breeding for disease resistance is a central focus of plant breeding programs, as any successful variety must have the complete package of high yield, disease resistance, agronomic performance, and end-use quality. With the need to accelerate the development of improved varieties, genomics-assisted breeding is becoming an important tool in breeding programs. With marker-assisted selection, there has been success in breeding for disease resistance; however, much of this work and research has focused on identifying, mapping, and selecting for major resistance genes that tend to be highly effective but vulnerable to breakdown with rapid changes in pathogen races. In contrast, breeding for minor-gene quantitative resistance tends to produce more durable varieties but is a more challenging breeding objective. As the genetic architecture of resistance shifts from single major R genes to a diffused architecture of many minor genes, the best approach for molecular breeding will shift from marker-assisted selection to genomic selection. Genomics-assisted breeding for quantitative resistance will therefore necessitate whole-genome prediction models and selection methodology as implemented for classical complex traits such as yield. Here, we examine multiple case studies testing whole-genome prediction models and genomic selection for disease resistance. In general, whole-genome models for disease resistance can produce prediction accuracy suitable for application in breeding. These models also largely outperform multiple linear regression as would be applied in marker-assisted selection. With the implementation of genomic selection for yield and other agronomic traits, whole-genome marker profiles will be available for the entire set of breeding lines, enabling genomic selection for disease at no additional direct cost. In this context, the scope of implementing genomics selection for disease resistance, and specifically for quantitative resistance and quarantined pathogens, becomes a tractable and powerful approach in breeding programs.
Protocorms and Protocorm-Like Bodies Are Molecularly Distinct from Zygotic Embryonic Tissues in Phalaenopsis aphrodite

PubMed Central

Chen, Jhun-Chen; Wei, Miao-Ju

2016-01-01

The distinct reproductive program of orchids provides a unique evolutionary model with pollination-triggered ovule development and megasporogenesis, a modified embryogenesis program resulting in seeds with immature embryos, and mycorrhiza-induced seed germination. However, the molecular mechanisms that have evolved to establish these unparalleled developmental programs are largely unclear. Here, we conducted comparative studies of genome-wide gene expression of various reproductive tissues and captured the molecular events associated with distinct reproductive programs in Phalaenopsis aphrodite. Importantly, our data provide evidence to demonstrate that protocorm-like body (PLB) regeneration (the clonal regeneration practice used in the orchid industry) does not follow the embryogenesis program. Instead, we propose that SHOOT MERISTEMLESS, a class I KNOTTED-LIKE HOMEOBOX gene, is likely to play a role in PLB regeneration. Our studies challenge the current understanding of the embryonic identity of PLBs. Taken together, the data obtained establish a fundamental framework for orchid reproductive development and provide a valuable new resource to enable the prediction of gene regulatory networks that is required for specialized developmental programs of orchid species. PMID:27338813
A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress

PubMed Central

2018-01-01

Financial distress prediction is an important and challenging research topic in the financial field. Many methods exist for predicting firm bankruptcy and financial crisis, including artificial intelligence and traditional statistical methods, and past studies have shown that artificial intelligence methods give better prediction results than traditional statistical methods. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed a nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages, including the following: (i) the proposed model differs from previous models, which lack the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress as references for investors and decision makers. The results show that the proposed method is better than the listed classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies. PMID:29765399
A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress.

PubMed

Cheng, Ching-Hsue; Chan, Chia-Pang; Yang, Jun-He

2018-01-01

Financial distress prediction is an important and challenging research topic in the financial field. Many methods exist for predicting firm bankruptcy and financial crisis, including artificial intelligence and traditional statistical methods, and past studies have shown that artificial intelligence methods give better prediction results than traditional statistical methods. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed a nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages, including the following: (i) the proposed model differs from previous models, which lack the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress as references for investors and decision makers. The results show that the proposed method is better than the listed classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.
Prioritization of candidate disease genes by topological similarity between disease and protein diffusion profiles.

PubMed

Zhu, Jie; Qin, Yufang; Liu, Taigang; Wang, Jun; Zheng, Xiaoqi

2013-01-01

Identification of gene-phenotype relationships is a fundamental challenge in human health. Based on the observation that genes causing the same or similar phenotypes tend to correlate with each other in the protein-protein interaction network, many network-based approaches have been proposed based on different underlying models. A recent comparative study showed that diffusion-based methods achieve state-of-the-art predictive performance. In this paper, a new diffusion-based method was proposed to prioritize candidate disease genes. The diffusion profile of a disease was defined as the stationary distribution of candidate genes given a random walk with restart in which similarities between phenotypes are incorporated. Then, candidate disease genes are prioritized by comparing their diffusion profiles with that of the disease. Finally, the effectiveness of our method was demonstrated through leave-one-out cross-validation against control genes from artificial linkage intervals and randomly chosen genes. A comparative study showed that our method achieves improved performance compared to some classical diffusion-based methods. To further illustrate our method, we used our algorithm to predict new causal genes of 16 multifactorial diseases, including prostate cancer and Alzheimer's disease, and the top predictions were consistent with literature reports. Our study indicates that integration of multiple information sources, especially the phenotype similarity profile data, and introduction of a global similarity measure between disease and gene diffusion profiles are helpful for prioritizing candidate disease genes. Programs and data are available upon request.
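A random walk with restart, the diffusion step this abstract builds on, can be sketched in its generic textbook form: iterate p ← (1 − r)·W·p + r·p0 until the vector stops changing, where W is the column-normalized adjacency matrix and p0 concentrates on the seed nodes. The restart probability, the tolerance, the toy network, and the function names are assumptions for illustration. The paper's phenotype-similarity weighting and its profile comparison step are not reproduced here.

```python
def column_normalize(adj):
    """Divide each column by its sum so every column is a probability vector."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities for a walker that returns to the
    seed nodes with probability `restart` at every step."""
    n = len(adj)
    w = column_normalize(adj)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-protein chain 0 - 1 - 2 - 3, seeded at protein 0.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
profile = random_walk_with_restart(chain, seeds=[0])
```

In this toy chain the resulting profile decays with distance from the seed, which is the property that lets diffusion scores serve as a network-wide proximity measure.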
TnpPred: A Web Service for the Robust Prediction of Prokaryotic Transposases

PubMed Central

Riadi, Gonzalo; Medina-Moenne, Cristobal; Holmes, David S.

2012-01-01

Transposases (Tnps) are enzymes that participate in the movement of insertion sequences (ISs) within and between genomes. Genes that encode Tnps are amongst the most abundant and widely distributed genes in nature. However, they are difficult to predict bioinformatically and given the increasing availability of prokaryotic genomes and metagenomes, it is incumbent to develop rapid, high quality automatic annotation of ISs. This need prompted us to develop a web service, termed TnpPred for Tnp discovery. It provides better sensitivity and specificity for Tnp predictions than given by currently available programs as determined by ROC analysis. TnpPred should be useful for improving genome annotation. The TnpPred web service is freely available for noncommercial use. PMID:23251097

Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.

PubMed

Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal

2002-12-15

We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M. truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M. truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M. truncatula. A searchable database has been built and can be accessed through a public interface.
Transcriptional Network Analysis in Muscle Reveals AP-1 as a Partner of PGC-1α in the Regulation of the Hypoxic Gene Program

PubMed Central

Baresic, Mario; Salatino, Silvia; Kupr, Barbara

2014-01-01

Skeletal muscle tissue shows an extraordinary cellular plasticity, but the underlying molecular mechanisms are still poorly understood. Here, we use a combination of experimental and computational approaches to unravel the complex transcriptional network of muscle cell plasticity centered on the peroxisome proliferator-activated receptor γ coactivator 1α (PGC-1α), a regulatory nexus in endurance training adaptation. By integrating data on genome-wide binding of PGC-1α and gene expression upon PGC-1α overexpression with comprehensive computational prediction of transcription factor binding sites (TFBSs), we uncover a hitherto-underestimated number of transcription factor partners involved in mediating PGC-1α action. In particular, principal component analysis of TFBSs at PGC-1α binding regions predicts that, besides the well-known role of the estrogen-related receptor α (ERRα), the activator protein 1 complex (AP-1) plays a major role in regulating the PGC-1α-controlled gene program of the hypoxia response. Our findings thus reveal the complex transcriptional network of muscle cell plasticity controlled by PGC-1α. PMID:24912679
Biological Networks for Predicting Chemical Hepatocarcinogenicity Using Gene Expression Data from Treated Mice and Relevance across Human and Rat Species

PubMed Central

Thomas, Reuben; Thomas, Russell S.; Auerbach, Scott S.; Portier, Christopher J.

2013-01-01

Background: Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. Objectives: To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Methods: Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Results: Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Conclusions: Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species. PMID:23737943
Biological networks for predicting chemical hepatocarcinogenicity using gene expression data from treated mice and relevance across human and rat species.

PubMed

Thomas, Reuben; Thomas, Russell S; Auerbach, Scott S; Portier, Christopher J

2013-01-01

Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species.
Gene expression programming approach for the estimation of moisture ratio in herbal plants drying with vacuum heat pump dryer

NASA Astrophysics Data System (ADS)

Dikmen, Erkan; Ayaz, Mahir; Gül, Doğan; Şahin, Arzu Şencan

2017-07-01

The determination of drying behavior of herbal plants is a complex process. In this study, a gene expression programming (GEP) model was used to determine the drying behavior of herbal plants such as fresh sweet basil, parsley and dill leaves. Time and drying temperature are the input parameters for the estimation of the moisture ratio of herbal plants. The results of the GEP model are compared with experimental drying data. Statistical values such as mean absolute percentage error, root-mean-squared error and R-square are used to quantify the difference between the values predicted by the GEP model and the values actually observed in the experimental study. It was found that the results of the GEP model and the experimental study are in moderately good agreement. The results have shown that the GEP model can be considered an efficient modelling technique for the prediction of the moisture ratio of herbal plants.
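The three agreement statistics named above can be sketched as follows. This assumes the usual textbook definitions (R-square taken as the coefficient of determination); the abstract does not state which variants the authors used, and the moisture-ratio values below are hypothetical.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-squared error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical moisture ratios at four drying times (not data from the study).
observed = [1.0, 0.8, 0.5, 0.25]
predicted = [1.0, 0.75, 0.55, 0.25]
print(mape(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```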
Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

NASA Astrophysics Data System (ADS)

Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua

2015-12-01

Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and the Funding from the State Key Laboratory of Bioelectronics of Southeast University.
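For reference, the Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the counts are hypothetical, and the convention of returning 0 when the denominator vanishes is an assumption rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: coding genes as positives, non-coding sequences as negatives.
print(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10))
```

Unlike plain accuracy, the coefficient stays informative when the two classes differ greatly in size, which is one reason it is commonly reported alongside accuracy.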
An unsupervised classification scheme for improving predictions of prokaryotic TIS.

PubMed

Tech, Maike; Meinicke, Peter

2006-03-09

Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation of prokaryotic TIS. However, inherent difficulties of these approaches arise from the considerable variation of TIS characteristics across different species. Therefore prior assumptions about the properties of prokaryotic gene starts may cause suboptimal predictions for newly sequenced genomes with TIS signals differing from those of well-investigated genomes. We introduce a clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation. As compared with other methods for improving predictions of gene starts in bacterial genomes, our approach is not based on any specific assumptions about prokaryotic TIS. Despite the generality of the underlying algorithm, the prediction rate of our method is competitive on experimentally verified test data from E. coli and B. subtilis. Regarding genomes with high G+C content, in contrast to some previously proposed methods, our algorithm also provides good performance on P. aeruginosa, B. pseudomallei and R. solanacearum. On reliable test data we showed that our method provides good results in post-processing the predictions of the widely-used program GLIMMER. The underlying clustering algorithm is robust with respect to variations in the initial TIS annotation and does not require specific assumptions about prokaryotic gene starts. These features are particularly useful on genomes with high G+C content. The algorithm has been implemented in the tool "TICO" (TIs COrrector) which is publicly available from our web site.
Application of gene expression programming and neural networks to predict adverse events of radical hysterectomy in cervical cancer patients.

PubMed

Kusy, Maciej; Obrzut, Bogdan; Kluska, Jacek

2013-12-01

The aim of this article was to compare the gene expression programming (GEP) method with three types of neural networks in the prediction of adverse events of radical hysterectomy in cervical cancer patients. One hundred and seven patients treated by radical hysterectomy were analyzed. Each record representing a single patient consisted of 10 parameters. The occurrence and lack of perioperative complications imposed a two-class classification problem. In the simulations, the GEP algorithm was compared to a multilayer perceptron (MLP), a radial basis function neural network, and a probabilistic neural network. The generalization ability of the models was assessed on the basis of their accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). The GEP classifier provided the best results in the prediction of adverse events, with an accuracy of 71.96%. Comparable but slightly worse outcomes were obtained using MLP, i.e., 71.87%. For each of the measured indices (accuracy, sensitivity, specificity, and AUROC), the standard deviation was the smallest for the models generated by the GEP classifier.
A web application for automatic prediction of gene translation elongation efficiency.

PubMed

Sokolov, Vladimir S; Zuraev, Bulat S; Lashin, Sergei A; Matushkin, Yury G

2015-03-01

Expression efficiency is one of the major characteristics describing genes in various modern investigations. Expression efficiency of genes is regulated at various stages: transcription, translation, posttranslational protein modification and others. In this study, a special EloE (Elongation Efficiency) web application is described. EloE sorts an organism's genes in descending order of their theoretical rate of the elongation stage of translation, based on the analysis of their nucleotide sequences. The theoretical data obtained correlate significantly with available experimental data on gene expression in various organisms. In addition, the program identifies preferential codons in an organism's genes and determines the distribution of potential secondary structure energy in the 5' and 3' regions of mRNA. EloE can be useful for preliminary estimation of translation elongation efficiency for genes for which experimental data are not yet available. Some results can be used, for instance, in other programs modeling artificial genetic structures in genetic engineering experiments. The EloE web application is available at http://www-bionet.sscc.ru:7780/EloE.
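As a rough illustration of what identifying preferential codons can involve, the sketch below counts codon usage across a set of coding sequences and reports the most frequent synonymous codon for each amino acid. This is a simple frequency-based stand-in, not the EloE algorithm (which the abstract does not describe); the function name and the toy sequences are hypothetical. The codon table is the standard genetic code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def preferred_codons(coding_sequences):
    """Return {amino acid: most frequent codon} over in-frame coding sequences."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        # Walk the sequence in frame, ignoring any trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases such as N
                counts[codon] += 1
    best = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        if aa == "*":
            continue  # stop codons are not reported
        if aa not in best or n > counts[best[aa]]:
            best[aa] = codon
    return best


# Hypothetical toy sequences, for illustration only.
genes = ["ATGGCTGCTGCCAAATAA", "ATGGCTAAAAAGAAA"]
print(preferred_codons(genes))
```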
A gene regulatory network model for floral transition of the shoot apex in maize and its dynamic modeling.

PubMed

Dong, Zhanshan; Danilevskaya, Olga; Abadie, Tabare; Messina, Carlos; Coles, Nathan; Cooper, Mark

2012-01-01

The transition from the vegetative to reproductive development is a critical event in the plant life cycle. The accurate prediction of flowering time in elite germplasm is important for decisions in maize breeding programs and best agronomic practices. The understanding of the genetic control of flowering time in maize has significantly advanced in the past decade. Through comparative genomics, mutant analysis, genetic analysis and QTL cloning, and transgenic approaches, more than 30 flowering time candidate genes in maize have been revealed and the relationships among these genes have been partially uncovered. Based on the knowledge of the flowering time candidate genes, a conceptual gene regulatory network model for the genetic control of flowering time in maize is proposed. To demonstrate the potential of the proposed gene regulatory network model, a first attempt was made to develop a dynamic gene network model to predict flowering time of maize genotypes varying for specific genes. The dynamic gene network model is composed of four genes and was built on the basis of gene expression dynamics of the two late flowering id1 and dlf1 mutants, the early flowering landrace Gaspe Flint and the temperate inbred B73. The model was evaluated against the phenotypic data of the id1 dlf1 double mutant and the ZMM4 overexpressed transgenic lines. The model provides a working example that leverages knowledge from model organisms for the utilization of maize genomic information to predict a whole plant trait phenotype, flowering time, of maize genotypes.
Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.

PubMed

Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

2016-01-01

For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.
Transcriptomic profiling as a screening tool to detect trenbolone treatment in beef cattle.

PubMed

Pegolo, S; Cannizzo, F T; Biolatti, B; Castagnaro, M; Bargelloni, L

2014-06-01

The effects of steroid hormone implants containing trenbolone alone (Finaplix-H), combined with 17β-oestradiol (17β-E; Revalor-H), or with 17β-E and dexamethasone (Revalor-H plus dexamethasone per os) on the bovine muscle transcriptome were examined by DNA-microarray. Overall, large sets of genes were shown to be modulated by the different growth promoters (GPs) and the regulated pathways and biological processes were mostly shared among the treatment groups. Using the Prediction Analysis of Microarray program, GP-treated animals were accurately identified by a small number of predictive genes. A meta-analysis approach was also carried out for the Revalor group to potentially increase the robustness of class prediction analysis. After data pre-processing, a high level of accuracy (90%) was obtained in the classification of samples, using 105 predictive gene markers. Transcriptomics could thus help in the identification of indirect biomarkers for anabolic treatment in beef cattle to be applied for the screening of muscle samples collected after slaughtering. Copyright © 2014 Elsevier Ltd. All rights reserved.
miRWalk--database: prediction of possible miRNA binding sites by "walking" the genes of three genomes.

PubMed

Dweep, Harsh; Sticht, Carsten; Pandey, Priyanka; Gretz, Norbert

2011-10-01

MicroRNAs are small, non-coding RNA molecules that can complementarily bind to the mRNA 3'-UTR region to regulate the gene expression by transcriptional repression or induction of mRNA degradation. Increasing evidence suggests a new mechanism by which miRNAs may regulate target gene expression by binding in promoter and amino acid coding regions. Most of the existing databases on miRNAs are restricted to mRNA 3'-UTR region. To address this issue, we present miRWalk, a comprehensive database on miRNAs, which hosts predicted as well as validated miRNA binding sites, information on all known genes of human, mouse and rat. All mRNAs, mitochondrial genes and 10 kb upstream flanking regions of all known genes of human, mouse and rat were analyzed by using a newly developed algorithm named 'miRWalk' as well as with eight already established programs for putative miRNA binding sites. An automated and extensive text-mining search was performed on PubMed database to extract validated information on miRNAs. Combined information was put into a MySQL database. miRWalk presents predicted and validated information on miRNA-target interaction. Such a resource enables researchers to validate new targets of miRNA not only on 3'-UTR, but also on the other regions of all known genes. The 'Validated Target module' is updated every month and the 'Predicted Target module' is updated every 6 months. miRWalk is freely available at http://mirwalk.uni-hd.de/. Copyright © 2011 Elsevier Inc. All rights reserved.
Integrated analyses of microRNAs demonstrate their widespread influence on gene expression in high-grade serous ovarian carcinoma.

PubMed

Creighton, Chad J; Hernandez-Herrera, Anadulce; Jacobsen, Anders; Levine, Douglas A; Mankoo, Parminder; Schultz, Nikolaus; Du, Ying; Zhang, Yiqun; Larsson, Erik; Sheridan, Robert; Xiao, Weimin; Spellman, Paul T; Getz, Gad; Wheeler, David A; Perou, Charles M; Gibbs, Richard A; Sander, Chris; Hayes, D Neil; Gunaratne, Preethi H

2012-01-01

The Cancer Genome Atlas (TCGA) Network recently comprehensively catalogued the molecular aberrations in 487 high-grade serous ovarian cancers, with much remaining to be elucidated regarding the microRNAs (miRNAs). Here, using TCGA ovarian data, we surveyed the miRNAs, in the context of their predicted gene targets. Integration of miRNA and gene patterns yielded evidence that proximal pairs of miRNAs are processed from polycistronic primary transcripts, and that intronic miRNAs and their host gene mRNAs derive from common transcripts. Patterns of miRNA expression revealed multiple tumor subtypes and a set of 34 miRNAs predictive of overall patient survival. In a global analysis, miRNA:mRNA pairs anti-correlated in expression across tumors showed a higher frequency of in silico predicted target sites in the mRNA 3'-untranslated region (with less frequency observed for coding sequence and 5'-untranslated regions). The miR-29 family and predicted target genes were among the most strongly anti-correlated miRNA:mRNA pairs; over-expression of miR-29a in vitro repressed several anti-correlated genes (including DNMT3A and DNMT3B) and substantially decreased ovarian cancer cell viability. This study establishes miRNAs as having a widespread impact on gene expression programs in ovarian cancer, further strengthening our understanding of miRNA biology as it applies to human cancer. As with gene transcripts, miRNAs exhibit high diversity reflecting the genomic heterogeneity within a clinically homogeneous disease population. Putative miRNA:mRNA interactions, as identified using integrative analysis, can be validated. TCGA data are a valuable resource for the identification of novel tumor suppressive miRNAs in ovarian as well as other cancers.
Prediction of cancer class with majority voting genetic programming classifier using gene expression data.

PubMed

Paul, Topon Kumar; Iba, Hitoshi

2009-01-01

To better understand different types of cancers and to find possible biomarkers for diseases, many researchers have recently been analyzing gene expression data using various machine learning techniques. However, because the number of training samples is very small compared to the huge number of genes, and because of class imbalance, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples to determine their labels with a majority voting technique. By performing experiments on four different public cancer data sets, including multiclass data sets, we have found that the test accuracies of MVGPC are better than those of other methods, including AdaBoost with GP. Moreover, some of the more frequently occurring genes in the classification rules are known to be associated with the types of cancers being studied in this paper.
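The voting step described above can be sketched as follows. The rules here are hand-written stand-ins for rules that would be evolved by genetic programming; the gene names, thresholds and the alphabetical tie-break are all assumptions made for illustration, not details from the paper.

```python
from collections import Counter


def majority_vote(rules, sample):
    """Apply every rule to the sample and return the most frequent label.

    Ties are broken alphabetically (an assumed convention).
    """
    votes = Counter(rule(sample) for rule in rules)
    top = max(votes.values())
    return sorted(label for label, n in votes.items() if n == top)[0]


# Hypothetical rules over expression values; in MVGPC these would be evolved by GP.
rules = [
    lambda s: "tumor" if s["geneA"] > 2.0 else "normal",
    lambda s: "tumor" if s["geneB"] - s["geneC"] > 0 else "normal",
    lambda s: "tumor" if s["geneA"] * s["geneC"] > 3.0 else "normal",
]

sample = {"geneA": 2.5, "geneB": 1.0, "geneC": 1.5}
print(majority_vote(rules, sample))
```

Using an odd number of rules avoids ties in the two-class case; with more classes a tie-breaking convention such as the one assumed here is still needed.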
Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

PubMed

Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun

2014-01-01

The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
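To make the idea of learning on a graph with both labeled and unlabeled nodes concrete, the sketch below uses a generic label-propagation scheme: labeled nodes are held fixed and each unlabeled node repeatedly takes the weighted average of its neighbours' scores. This is a textbook technique swapped in for illustration, not the authors' algorithm (which was implemented in C++ and is not specified in the abstract); the graph, weights and labels are hypothetical.

```python
def propagate_labels(weights, labels, iterations=200):
    """Graph-based semi-supervised scoring by iterative neighbour averaging.

    weights: symmetric adjacency matrix (list of lists) of non-negative edge weights.
    labels:  +1.0 / -1.0 for labeled nodes, None for unlabeled nodes.
    Returns one score per node; the sign gives the predicted class.
    """
    n = len(weights)
    scores = [0.0 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        new_scores = scores[:]
        for i in range(n):
            if labels[i] is not None:
                continue  # labeled nodes stay clamped to their known label
            total = sum(weights[i])
            if total > 0:
                new_scores[i] = sum(w * s for w, s in zip(weights[i], scores)) / total
        scores = new_scores
    return scores


# Hypothetical chain of four samples: 0 - 1 - 2 - 3, with only the ends labeled
# (+1 = recurrence, -1 = no recurrence).
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
labels = [1.0, None, None, -1.0]
print(propagate_labels(weights, labels))
```

In this toy graph the unlabeled node next to the positive sample ends up with a positive score and the one next to the negative sample with a negative score, which is the smoothness behaviour that graph regularization is meant to enforce.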
RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.

PubMed

Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G

2017-01-01

Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/.
The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

PubMed

Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

2008-01-01

The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity.
Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
Final Report: The DNA Files: Unraveling the mysteries of genetics, January 1, 1998-March 31, 1999

DOE Office of Scientific and Technical Information (OSTI.GOV)

Scott, Bari

1999-05-01

The DNA Files is an award-winning radio documentary series on genetics created by SoundVision Productions. The DNA Files was hosted by John Hockenberry and was presented in documentary and discussion format. The programs covered a range of topics from prenatal and predictive gene testing, gene therapy, and commercialization of genetic information to new evolutionary genetic evidence, transgenic vegetables and use of DNA in forensics.
CTCF counter-regulates cardiomyocyte development and maturation programs in the embryonic heart.

PubMed

Gomez-Velazquez, Melisa; Badia-Careaga, Claudio; Lechuga-Vieco, Ana Victoria; Nieto-Arellano, Rocio; Tena, Juan J; Rollan, Isabel; Alvarez, Alba; Torroja, Carlos; Caceres, Eva F; Roy, Anna R; Galjart, Niels; Delgado-Olguin, Paul; Sanchez-Cabo, Fatima; Enriquez, Jose Antonio; Gomez-Skarmeta, Jose Luis; Manzanares, Miguel

2017-08-01

Cardiac progenitors are specified early in development and progressively differentiate and mature into fully functional cardiomyocytes. This process is controlled by an extensively studied transcriptional program. However, the regulatory events coordinating the progression of such program from development to maturation are largely unknown. Here, we show that the genome organizer CTCF is essential for cardiogenesis and that it mediates genomic interactions to coordinate cardiomyocyte differentiation and maturation in the developing heart. Inactivation of Ctcf in cardiac progenitor cells and their derivatives in vivo during development caused severe cardiac defects and death at embryonic day 12.5. Genome wide expression analysis in Ctcf mutant hearts revealed that genes controlling mitochondrial function and protein production, required for cardiomyocyte maturation, were upregulated. However, mitochondria from mutant cardiomyocytes do not mature properly. In contrast, multiple development regulatory genes near predicted heart enhancers, including genes in the IrxA cluster, were downregulated in Ctcf mutants, suggesting that CTCF promotes cardiomyocyte differentiation by facilitating enhancer-promoter interactions. Accordingly, loss of CTCF disrupts gene expression and chromatin interactions as shown by chromatin conformation capture followed by deep sequencing. Furthermore, CRISPR-mediated deletion of an intergenic CTCF site within the IrxA cluster alters gene expression in the developing heart. Thus, CTCF mediates local regulatory interactions to coordinate transcriptional programs controlling transitions in morphology and function during heart development.

CTCF counter-regulates cardiomyocyte development and maturation programs in the embryonic heart

PubMed Central

Gomez-Velazquez, Melisa; Badia-Careaga, Claudio; Lechuga-Vieco, Ana Victoria; Nieto-Arellano, Rocio; Rollan, Isabel; Alvarez, Alba; Torroja, Carlos; Caceres, Eva F.; Roy, Anna R.; Galjart, Niels; Sanchez-Cabo, Fatima; Enriquez, Jose Antonio; Gomez-Skarmeta, Jose Luis

2017-01-01

Cardiac progenitors are specified early in development and progressively differentiate and mature into fully functional cardiomyocytes. This process is controlled by an extensively studied transcriptional program. However, the regulatory events coordinating the progression of such program from development to maturation are largely unknown. Here, we show that the genome organizer CTCF is essential for cardiogenesis and that it mediates genomic interactions to coordinate cardiomyocyte differentiation and maturation in the developing heart. Inactivation of Ctcf in cardiac progenitor cells and their derivatives in vivo during development caused severe cardiac defects and death at embryonic day 12.5. Genome wide expression analysis in Ctcf mutant hearts revealed that genes controlling mitochondrial function and protein production, required for cardiomyocyte maturation, were upregulated. However, mitochondria from mutant cardiomyocytes do not mature properly. In contrast, multiple development regulatory genes near predicted heart enhancers, including genes in the IrxA cluster, were downregulated in Ctcf mutants, suggesting that CTCF promotes cardiomyocyte differentiation by facilitating enhancer-promoter interactions. Accordingly, loss of CTCF disrupts gene expression and chromatin interactions as shown by chromatin conformation capture followed by deep sequencing. Furthermore, CRISPR-mediated deletion of an intergenic CTCF site within the IrxA cluster alters gene expression in the developing heart. Thus, CTCF mediates local regulatory interactions to coordinate transcriptional programs controlling transitions in morphology and function during heart development. PMID:28846746
The GP problem: quantifying gene-to-phenotype relationships.

PubMed

Cooper, Mark; Chapman, Scott C; Podlich, Dean W; Hammer, Graeme L

2002-01-01

In this paper we refer to the gene-to-phenotype modeling challenge as the GP problem. Integrating information across levels of organization within a genotype-environment system is a major challenge in computational biology. However, resolving the GP problem is a fundamental requirement if we are to understand and predict phenotypes given knowledge of the genome and model dynamic properties of biological systems. Organisms are consequences of this integration, and it is a major property of biological systems that underlies the responses we observe. We discuss the E(NK) model as a framework for investigation of the GP problem and the prediction of system properties at different levels of organization. We apply this quantitative framework to an investigation of the processes involved in genetic improvement of plants for agriculture. In our analysis, N genes determine the genetic variation for a set of traits that are responsible for plant adaptation to E environment-types within a target population of environments. The N genes can interact in epistatic NK gene-networks through the way that they influence plant growth and development processes within a dynamic crop growth model. We use a sorghum crop growth model, available within the APSIM agricultural production systems simulation model, to integrate the gene-environment interactions that occur during growth and development and to predict genotype-to-phenotype relationships for a given E(NK) model. Directional selection is then applied to the population of genotypes, based on their predicted phenotypes, to simulate the dynamic aspects of genetic improvement by a plant-breeding program. The outcomes of the simulated breeding are evaluated across cycles of selection in terms of the changes in allele frequencies for the N genes and the genotypic and phenotypic values of the populations of genotypes.
MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit

PubMed Central

Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R.; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer

2012-01-01

MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/. PMID:23082188
Artificial Intelligence Tools for Scaling Up of High Shear Wet Granulation Process.

PubMed

Landin, Mariana

2017-01-01

The results presented in this article demonstrate the potential of artificial intelligence tools for predicting the endpoint of the granulation process in high-speed mixer granulators of different scales from 25 L to 600 L. The combination of neurofuzzy logic and gene expression programming technologies allowed the modeling of the impeller power as a function of operation conditions and wet granule properties, establishing the critical variables that affect the response and obtaining a unique experimental polynomial equation (transparent model) of high predictability (R² > 86.78%) for equipment of all sizes. Gene expression programming allowed the modeling of the granulation process for granulators of similar and dissimilar geometries and can be improved by implementing additional characteristics of the process, such as composition variables or operation parameters (e.g., batch size, chopper speed). The principles and the methodology proposed here can be applied to understand and control manufacturing processes, using any other granulation equipment, including continuous granulation processes. Copyright © 2016 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
Forecasting Caspian Sea level changes using satellite altimetry data (June 1992-December 2013) based on evolutionary support vector regression algorithms and gene expression programming

NASA Astrophysics Data System (ADS)

Imani, Moslem; You, Rey-Jer; Kuo, Chung-Yen

2014-10-01

Sea level forecasting at various time intervals is of great importance in water supply management. Evolutionary artificial intelligence (AI) approaches have been accepted as an appropriate tool for modeling complex nonlinear phenomena in water bodies. In this study, we investigated the ability of two AI techniques, support vector machine (SVM), which is mathematically well-founded and provides new insights into function approximation, and gene expression programming (GEP), to forecast Caspian Sea level anomalies using satellite altimetry observations from June 1992 to December 2013. SVM demonstrates the best performance in predicting Caspian Sea level anomalies, given the minimum root mean square error (RMSE = 0.035) and maximum coefficient of determination (R² = 0.96) during the prediction periods. A comparison between the proposed AI approaches and the cascade correlation neural network (CCNN) model also shows the superiority of the GEP and SVM models over the CCNN.
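The two error measures quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions (RMSE as the square root of the mean squared residual, and the coefficient of determination as 1 - SS_res/SS_tot); the abstract does not say which variant of R² the authors used, and the numbers below are made up for illustration, not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical sea level anomalies in metres (illustrative only, not study data)
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

print(f"RMSE = {rmse(observed, predicted):.4f}")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

A lower RMSE together with a higher R² is the basis on which one model would be ranked above another in a comparison of this kind.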
MOCAT: a metagenomics assembly and gene prediction toolkit.

PubMed

Kultima, Jens Roat; Sunagawa, Shinichi; Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer

2012-01-01

MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.
An integrated approach to reconstructing genome-scale transcriptional regulatory networks

DOE PAGES

Imam, Saheed; Noguera, Daniel R.; Donohue, Timothy J.; ...

2015-02-27

Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.
[The application of gene expression programming in the diagnosis of heart disease].

PubMed

Dai, Wenbin; Zhang, Yuntao; Gao, Xingyu

2009-02-01

GEP (Gene expression programming) is a new genetic algorithm, and it has been proved to be excellent in function finding. In this paper, for the purpose of setting up a diagnostic model, GEP is used to deal with the data of heart disease. Eight variables, Sex, Chest pain, Blood pressure, Angina, Peak, Slope, Colored vessels and Thal, are picked out of thirteen variables to form a classification function. This function is used to predict a forecasting set of 100 samples, and the accuracy is 87%. Other algorithms such as SVM (Support vector machine) are applied to the same data and the forecasting results show that GEP is better than the other algorithms.
GIANT API: an application programming interface for functional genomics

PubMed Central

Roberts, Andrew M.; Wong, Aaron K.; Fisk, Ian; Troyanskaya, Olga G.

2016-01-01

GIANT API provides biomedical researchers programmatic access to tissue-specific and global networks in humans and model organisms, and associated tools, which includes functional re-prioritization of existing genome-wide association study (GWAS) data. Using tissue-specific interaction networks, researchers are able to predict relationships between genes specific to a tissue or cell lineage, identify the changing roles of genes across tissues and uncover disease-gene associations. Additionally, GIANT API enables computational tools like NetWAS, which leverages tissue-specific networks for re-prioritization of GWAS results. The web services covered by the API include 144 tissue-specific functional gene networks in human, global functional networks for human and six common model organisms and the NetWAS method. GIANT API conforms to the REST architecture, which makes it stateless, cacheable and highly scalable. It can be used by a diverse range of clients including web browsers, command terminals, programming languages and standalone apps for data analysis and visualization. The API is freely available for use at http://giant-api.princeton.edu. PMID:27098035
Effects of Metabolic Programming on Juvenile Play Behavior and Gene Expression in the Prefrontal Cortex of Rats.

PubMed

Hehar, Harleen; Ma, Irene; Mychasiuk, Richelle

2016-01-01

Early developmental processes, such as metabolic programming, can provide cues to an organism, which allow it to make modifications that are predicted to be beneficial for survival. Similarly, social play has a multifaceted role in promoting survival and fitness of animals. Play is a complex behavior that is greatly influenced by motivational and reward circuits, as well as the energy reserves and metabolism of an organism. This study examined the association between metabolic programming and juvenile play behavior in an effort to further elucidate the consequences that early adaptations have on developmental trajectories. The study also examined changes in expression of four genes (Drd2, IGF1, Opa1, and OxyR) in the prefrontal cortex known to play significant roles in reward, bioenergetics, and social-emotional functioning. Using four distinct variations in developmental programming (high-fat diet, caloric restriction, exercise, or high-fat diet combined with exercise), we found that dietary programming (high-fat diet vs. caloric restriction) had the greatest impact on play behavior and gene expression. However, exercise also induced changes in both measures. This study demonstrates that metabolic programming can alter neural circuits and bioenergetics involved in play behavior, thus providing new insights into mechanisms that allow programming to influence the evolutionary success of an organism. © 2016 S. Karger AG, Basel.
Validation of predictive models for germline mutations in DNA mismatch repair genes in colorectal cancer.

PubMed

Monzon, Jose G; Cremin, Carol; Armstrong, Linlea; Nuk, Jennifer; Young, Sean; Horsman, Doug E; Garbutt, Kristy; Bajdik, Chris D; Gill, Sharlene

2010-02-15

Lynch syndrome is defined by the presence of germline mutations in mismatch repair (MMR) genes. Several models have been recently devised that predict mutation carrier status (Myriad Genetics, Wijnen, Barnetson, PREMM and MMRpro models). Families at moderate-high risk for harboring a Lynch-associated mutation, referred to the BC Cancer Agency (BCCA) Hereditary Cancer Program (HCP), underwent mutation analysis, immunohistochemistry and/or microsatellite testing. Seventy-two tested cases were included. Twenty-five patients were mutation positive (34.7%) and 47 were mutation negative (65.3%). Nineteen of 43 patients who were both microsatellite stable and normal on immunohistochemistry for MLH1 and MSH2 were also genotyped for mutations in these genes; all 19 were negative for MMR gene mutations. Model-derived probabilities of harboring an MMR gene mutation in the proband were calculated and compared to observed results. The areas under the ROC curves were 0.75 (95% CI: 0.63-0.87), 0.86 (0.7-0.96), 0.89 (0.82-0.97), 0.89 (0.81-0.98) and 0.93 (0.86-0.99) for the Myriad, Barnetson, Wijnen, MMRpro and PREMM models, respectively. The Amsterdam II criteria had a sensitivity and specificity of 0.76 and 0.74, respectively, in this cohort. The PREMM model demonstrated the best performance for predicting carrier status based on the positive likelihood ratios at the >10%, >20% and >30% probability thresholds. In this referred cohort, the PREMM model had the most favorable concordance index and predictive performance for carrier status based on the positive LR. These prediction models (PREMM, MMRpro and Wijnen) may soon replace the Amsterdam II and revised Bethesda criteria as a prescreening tool for Lynch mutations.
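As a worked example of the likelihood-ratio arithmetic referred to in this abstract, the sketch below applies the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity to the sensitivity and specificity reported for the Amsterdam II criteria. The function names are illustrative. The abstract reports positive likelihood ratios for the prediction models at probability thresholds, not for Amsterdam II, so the resulting values are only a demonstration of the formula and are not results from the study.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)

def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity

# Sensitivity and specificity quoted in the abstract for the Amsterdam II criteria
lr_pos = positive_likelihood_ratio(0.76, 0.74)
lr_neg = negative_likelihood_ratio(0.76, 0.74)

# Cohort composition quoted in the abstract: 25 of 72 tested cases were mutation positive
mutation_positive_fraction = 25 / 72

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"mutation positive: {mutation_positive_fraction:.1%}")
```

The last line reproduces the 34.7% figure given in the abstract from the raw counts.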
Twenty-four signature genes predict the prognosis of oral squamous cell carcinoma with high accuracy and repeatability

PubMed Central

Gao, Jianyong; Tian, Gang; Han, Xu; Zhu, Qiang

2018-01-01

Oral squamous cell carcinoma (OSCC) is the sixth most common type of cancer worldwide, with poor prognosis. The present study aimed to identify gene signatures that could classify OSCC and predict prognosis in different stages. A training data set (GSE41613) and two validation data sets (GSE42743 and GSE26549) were acquired from the online Gene Expression Omnibus database. In the training data set, patients were classified based on the tumor-node-metastasis staging system, and subsequently grouped into low stage (L) or high stage (H). Signature genes between L and H stages were selected by disparity index analysis, and classification was performed by the expression of these signature genes. The established classification was compared with the L and H classification, and fivefold cross validation was used to evaluate the stability. Enrichment analysis for the signature genes was implemented by the Database for Annotation, Visualization and Integrated Discovery. Two validation data sets were used to determine the precision of classification. Survival analysis was conducted following each classification using the package ‘survival’ in R software. A set of 24 signature genes was identified based on the classification model with the Fi value of 0.47, which was used to distinguish OSCC samples in two different stages. Overall survival of patients in the H stage was higher than those in the L stage. Signature genes were primarily enriched in ‘ether lipid metabolism’ pathway and biological processes such as ‘positive regulation of adaptive immune response’ and ‘apoptotic cell clearance’. The results provided a novel 24-gene set that may be used as biomarkers to predict OSCC prognosis with high accuracy, which may be used to determine an appropriate treatment program for patients with OSCC in addition to the traditional evaluation index. PMID:29257303
Androgen-induced programs for prostate epithelial growth and invasion arise in embryogenesis and are reactivated in cancer

PubMed Central

Schaeffer, EM; Marchionni, L; Huang, Z; Simons, B; Blackman, A; Yu, W; Parmigiani, G; Berman, DM

2008-01-01

Cancer cells differentiate along specific lineages that largely determine their clinical and biologic behavior. Distinct cancer phenotypes from different cells and organs likely result from unique gene expression repertoires established in the embryo and maintained after malignant transformation. We used comprehensive gene expression analysis to examine this concept in the prostate, an organ with a tractable developmental program and a high propensity for cancer. We focused on gene expression in the murine prostate rudiment at three time points during the first 48 h of exposure to androgen, which initiates proliferation and invasion of prostate epithelial buds into surrounding urogenital sinus mesenchyme. Here, we show that androgen exposure regulates genes previously implicated in prostate carcinogenesis comprising pathways for the phosphatase and tensin homolog (PTEN), fibroblast growth factor (FGF)/mitogen-activated protein kinase (MAPK), and Wnt signaling along with cellular programs regulating such ‘hallmarks’ of cancer as angiogenesis, apoptosis, migration and proliferation. We found statistically significant evidence for novel androgen-induced gene regulation events that establish and/or maintain prostate cell fate. These include modulation of gene expression through microRNAs, expression of specific transcription factors, and regulation of their predicted targets. By querying public gene expression databases from other tissues, we found that rather than generally characterizing androgen exposure or epithelial budding, the early prostate development program more closely resembles the program for human prostate cancer. Most importantly, early androgen-regulated genes and functional themes associated with prostate development were highly enriched in contrasts between increasingly lethal forms of prostate cancer, confirming a ‘reactivation’ of embryonic pathways for proliferation and invasion in prostate cancer progression. 
Among the genes with the most significant links to the development and cancer, we highlight coordinate induction of the transcription factor Sox9 and suppression of the proapoptotic phospholipid-binding protein Annexin A1 that link early prostate development to early prostate carcinogenesis. These results credential early prostate development as a reliable and valid model system for the investigation of genes and pathways that drive prostate cancer. PMID:18794802
Translational systems pharmacology‐based predictive assessment of drug‐induced cardiomyopathy

PubMed Central

Messinis, Dimitris E.; Melas, Ioannis N.; Hur, Junguk; Varshney, Navya; Alexopoulos, Leonidas G.

2018-01-01

Drug‐induced cardiomyopathy contributes to drug attrition. We compared two pipelines of predictive modeling: (1) applying elastic net (EN) to differentially expressed genes (DEGs) of drugs; (2) applying integer linear programming (ILP) to construct each drug's signaling pathway starting from its targets to downstream proteins, to transcription factors, and to its DEGs in human cardiomyocytes, and then subjecting the genes/proteins in the drugs' signaling networks to EN regression. We classified 31 drugs with availability of DEGs into 13 toxic and 18 nontoxic drugs based on a clinical cardiomyopathy incidence cutoff of 0.1%. The ILP‐augmented modeling increased prediction accuracy from 79% to 88% (sensitivity: 88%; specificity: 89%) under leave‐one‐out cross validation. The ILP‐constructed signaling networks of drugs were better predictors than DEGs. Per literature, the microRNAs that reportedly regulate expression of our six top predictors are of diagnostic value for natural heart failure or doxorubicin‐induced cardiomyopathy. This translational predictive modeling might uncover potential biomarkers. PMID:29341478
Data mining and pathway analysis of glucose-6-phosphate dehydrogenase with natural language processing.

PubMed

Chen, Long; Zhang, Chunhua; Wang, Yanling; Li, Yuqian; Han, Qiaoqiao; Yang, Huixin; Zhu, Yuechun

2017-08-01

Human glucose-6-phosphate dehydrogenase (G6PD) is a crucial enzyme in the pentose phosphate pathway, and serves an important role in biosynthesis and the redox balance. G6PD deficiency is a major cause of neonatal jaundice and acute hemolytic anemia, and recently, G6PD has been associated with diseases including inflammation and cancer. The aim of the present study was to conduct a search of the National Center for Biotechnology Information PubMed library for articles discussing G6PD. Genes that were identified to be associated with G6PD were recorded, and the frequency at which each gene appeared was calculated. Gene ontology (GO), pathway and network analyses were then performed. A total of 98 G6PD‑associated genes and 33 microRNAs (miRNAs) that potentially regulate G6PD were identified. The 98 G6PD‑associated genes were then sub‑classified into three functional groups by GO analysis, followed by analysis of function, pathway, network, and disease association. Out of the 47 signaling pathways identified, seven were significantly correlated with G6PD‑associated genes. At least two out of four independent programs identified the 33 miRNAs that were predicted to target G6PD. miR‑1207‑5P, miR‑1 and miR‑125a‑5p were predicted by all four software programs to target G6PD. The results of the present study revealed that dysregulation of G6PD was associated with cancer, autoimmune diseases, and oxidative stress‑induced disorders. These results revealed the potential roles of G6PD‑regulated signaling and metabolic pathways in the etiology of these diseases.
Data mining and pathway analysis of glucose-6-phosphate dehydrogenase with natural language processing

PubMed Central

Chen, Long; Zhang, Chunhua; Wang, Yanling; Li, Yuqian; Han, Qiaoqiao; Yang, Huixin; Zhu, Yuechun

2017-01-01

Human glucose-6-phosphate dehydrogenase (G6PD) is a crucial enzyme in the pentose phosphate pathway, and serves an important role in biosynthesis and the redox balance. G6PD deficiency is a major cause of neonatal jaundice and acute hemolytic anemia, and recently, G6PD has been associated with diseases including inflammation and cancer. The aim of the present study was to conduct a search of the National Center for Biotechnology Information PubMed library for articles discussing G6PD. Genes that were identified to be associated with G6PD were recorded, and the frequency at which each gene appeared was calculated. Gene ontology (GO), pathway and network analyses were then performed. A total of 98 G6PD-associated genes and 33 microRNAs (miRNAs) that potentially regulate G6PD were identified. The 98 G6PD-associated genes were then sub-classified into three functional groups by GO analysis, followed by analysis of function, pathway, network, and disease association. Out of the 47 signaling pathways identified, seven were significantly correlated with G6PD-associated genes. At least two out of four independent programs identified the 33 miRNAs that were predicted to target G6PD. miR-1207-5P, miR-1 and miR-125a-5p were predicted by all four software programs to target G6PD. The results of the present study revealed that dysregulation of G6PD was associated with cancer, autoimmune diseases, and oxidative stress-induced disorders. These results revealed the potential roles of G6PD-regulated signaling and metabolic pathways in the etiology of these diseases. PMID:28627690
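The "at least two out of four programs" selection rule described in this abstract can be sketched as a simple vote count. This is a minimal illustration, not the authors' pipeline: the program names and the extra placeholder miRNAs are hypothetical, and only the three miRNAs named in the abstract as predicted by all four programs are taken from the source.

```python
from collections import Counter

def consensus_targets(predictions, min_programs):
    """Return miRNAs predicted by at least `min_programs` of the given tools."""
    # Count each miRNA once per program, even if a program lists it twice.
    counts = Counter(m for hits in predictions.values() for m in set(hits))
    return sorted(m for m, c in counts.items() if c >= min_programs)

# Hypothetical program names; "miR-example-A/B" are placeholders, not real predictions.
predictions = {
    "program_1": ["miR-1207-5P", "miR-1", "miR-125a-5p", "miR-example-A"],
    "program_2": ["miR-1207-5P", "miR-1", "miR-125a-5p"],
    "program_3": ["miR-1207-5P", "miR-1", "miR-125a-5p", "miR-example-A"],
    "program_4": ["miR-1207-5P", "miR-1", "miR-125a-5p", "miR-example-B"],
}

at_least_two = consensus_targets(predictions, min_programs=2)
all_four = consensus_targets(predictions, min_programs=4)
print(at_least_two)
print(all_four)
```

Raising `min_programs` trades sensitivity for agreement: the placeholder seen by two programs passes the looser rule but not the strict one, and the placeholder seen by a single program passes neither.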
In silico predicted reproductive endocrine transcriptional regulatory networks during zebrafish (Danio rerio) development.

PubMed

Hala, D

2017-03-21

The interconnected topology of transcriptional regulatory networks (TRNs) readily lends to mathematical (or in silico) representation and analysis as a stoichiometric matrix. Such a matrix can be 'solved' using the mathematical method of extreme pathway (ExPa) analysis, which identifies uniquely activated genes subject to transcription factor (TF) availability. In this manuscript, in silico multi-tissue TRN models of brain, liver and gonad were used to study reproductive endocrine developmental programming in zebrafish (Danio rerio) from 0.25h post fertilization (hpf; zygote) to 90 days post fertilization (dpf; adult life stage). First, properties of TRN models were studied by sequentially activating all genes in multi-tissue models. This analysis showed the brain to exhibit lowest proportion of co-regulated genes (19%) relative to liver (23%) and gonad (32%). This was surprising given that the brain comprised 75% and 25% more TFs than liver and gonad respectively. Such 'hierarchy' of co-regulatory capability (brain
Virulence strategies for infecting phagocytes deduced from the in vivo transcriptional program of Legionella pneumophila.

PubMed

Brüggemann, Holger; Hagman, Arne; Jules, Matthieu; Sismeiro, Odile; Dillies, Marie-Agnès; Gouyette, Catherine; Kunst, Frank; Steinert, Michael; Heuner, Klaus; Coppée, Jean-Yves; Buchrieser, Carmen

2006-08-01

Adaptation to the host environment and exploitation of host cell functions are critical to the success of intracellular pathogens. Here, insight into these virulence mechanisms was obtained for the first time from the transcriptional program of the human pathogen Legionella pneumophila during infection of its natural host, Acanthamoeba castellanii. The biphasic life cycle of L. pneumophila was reflected by a major shift in gene expression from replicative to transmissive phase, concerning nearly half of the genes predicted in the genome. However, three different L. pneumophila strains showed similar in vivo gene expression patterns, indicating that common regulatory mechanisms govern the Legionella life cycle, despite the plasticity of its genome. During the replicative phase, in addition to components of aerobic metabolism and amino acid catabolism, the Entner-Doudoroff pathway, an NADPH-producing mechanism used for sugar and/or gluconate assimilation, was expressed, suggesting for the first time that intracellular L. pneumophila may also scavenge host carbohydrates as nutrients and not only proteins. Identification of genes upregulated only in vivo but not in vitro may explain the higher virulence of in vivo grown L. pneumophila. Late in the life cycle, L. pneumophila upregulates genes predicted to promote transmission and manipulation of a new host cell, thereby priming it for the next attack. These include substrates of the Dot/Icm secretion system, other factors associated previously with invasion and virulence, the motility and the type IV pilus machineries, and > 90 proteins not characterized so far. Analysis of a fliA (sigma28) deletion mutant identified genes coregulated with the flagellar regulon, including GGDEF/EAL regulators and factors that promote host cell entry and survival.
Bringing the fathead minnow (Pimephales promelas) into the ...

EPA Pesticide Factsheets

The fathead minnow (Pimephales promelas) is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. Throughout this time, a lot of knowledge has been gained about the fathead minnow’s biological responses to various xenobiotics. However, despite its importance as a model organism, the fathead minnow still has few publicly available gene sequences. Recently, Burns et al. (2015; Environ. Toxicol. Chem. 35:212) described the sequencing and de-novo assembly of the fathead minnow genome. Two draft genome assemblies are now publicly available on the GenBank database. However, on their own the draft assemblies remain of limited use to researchers who are primarily interested in the functional units of the genome, i.e. the genes. In the present study, an annotation pipeline, consisting of gene prediction, evidence alignment, and data synthesis, was applied to the fathead minnow SOAPdenovo assembly. Ab initio gene prediction was performed using AUGUSTUS, which provided a starting point of 43,345 gene predictions. Fathead minnow Expressed Sequence Tags (ESTs) and zebrafish protein-coding sequences (CDSs) were then aligned to the assembly using the corresponding spliced alignment methods of the program Exonerate. Of the over 240,000 EST alignments, 73% were successfully aligned with 90% or greater sequence identity and query coverage. Similarly, 39% of nearly 45,000 zebrafish co
Integrated Analyses of microRNAs Demonstrate Their Widespread Influence on Gene Expression in High-Grade Serous Ovarian Carcinoma

PubMed Central

Levine, Douglas A.; Mankoo, Parminder; Schultz, Nikolaus; Du, Ying; Zhang, Yiqun; Larsson, Erik; Sheridan, Robert; Xiao, Weimin; Spellman, Paul T.; Getz, Gad; Wheeler, David A.; Perou, Charles M.; Gibbs, Richard A.; Sander, Chris; Hayes, D. Neil; Gunaratne, Preethi H.

2012-01-01

Background: The Cancer Genome Atlas (TCGA) Network recently comprehensively catalogued the molecular aberrations in 487 high-grade serous ovarian cancers, with much remaining to be elucidated regarding the microRNAs (miRNAs). Here, using TCGA ovarian data, we surveyed the miRNAs, in the context of their predicted gene targets. Methods and Results: Integration of miRNA and gene patterns yielded evidence that proximal pairs of miRNAs are processed from polycistronic primary transcripts, and that intronic miRNAs and their host gene mRNAs derive from common transcripts. Patterns of miRNA expression revealed multiple tumor subtypes and a set of 34 miRNAs predictive of overall patient survival. In a global analysis, miRNA:mRNA pairs anti-correlated in expression across tumors showed a higher frequency of in silico predicted target sites in the mRNA 3′-untranslated region (with less frequency observed for coding sequence and 5′-untranslated regions). The miR-29 family and predicted target genes were among the most strongly anti-correlated miRNA:mRNA pairs; over-expression of miR-29a in vitro repressed several anti-correlated genes (including DNMT3A and DNMT3B) and substantially decreased ovarian cancer cell viability. Conclusions: This study establishes miRNAs as having a widespread impact on gene expression programs in ovarian cancer, further strengthening our understanding of miRNA biology as it applies to human cancer. As with gene transcripts, miRNAs exhibit high diversity reflecting the genomic heterogeneity within a clinically homogeneous disease population. Putative miRNA:mRNA interactions, as identified using integrative analysis, can be validated. TCGA data are a valuable resource for the identification of novel tumor suppressive miRNAs in ovarian as well as other cancers. PMID:22479643

Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework.

PubMed

Yang, Lingjian; Ainali, Chrysanthi; Tsoka, Sophia; Papageorgiou, Lazaros G

2014-12-05

Applying machine learning methods on microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches where expression of genes were treated independently suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in literature are either unsupervised or applied on two-class datasets, there is good scope to address such limitations by proposing novel methodologies. A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of expression of its constituent genes. Gene weights are determined by the optimisation model, in a way that the resulting pathway activity has the optimal discriminative power with regards to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. 
Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
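The composite feature described in this abstract, pathway activity as a weighted linear summation of the expression of a pathway's constituent genes, can be sketched as follows. In the paper the gene weights are determined by a mathematical programming model; here they are fixed by hand purely for illustration, and all gene names, sample names and values are hypothetical.

```python
def pathway_activity(expression, weights):
    """Weighted linear summation of gene expression for one sample."""
    return sum(w * expression[gene] for gene, w in weights.items())

def participating_genes(weights):
    """Genes with a non-zero weight, i.e. those that contribute to the activity."""
    return [gene for gene, w in weights.items() if w != 0.0]

# Hypothetical weights for one pathway (an optimisation model would choose these).
weights = {"geneA": 0.5, "geneB": -0.25, "geneC": 0.0}

# Hypothetical expression values for two samples.
samples = {
    "sample_1": {"geneA": 2.0, "geneB": 4.0, "geneC": 9.0},
    "sample_2": {"geneA": 4.0, "geneB": 2.0, "geneC": 1.0},
}

activities = {name: pathway_activity(expr, weights) for name, expr in samples.items()}
print(activities)
print(participating_genes(weights))
```

Each sample is reduced from one value per gene to one value per pathway, which is the low-dimensional profile on which classification is then performed. A cap on the number of participating genes corresponds to limiting how many weights may be non-zero.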
ConsPred: a rule-based (re-)annotation framework for prokaryotic genomes.

PubMed

Weinmaier, Thomas; Platzer, Alexander; Frank, Jeroen; Hellinger, Hans-Jörg; Tischler, Patrick; Rattei, Thomas

2016-11-01

The rapidly growing number of available prokaryotic genome sequences requires fully automated and high-quality software solutions for their initial and re-annotation. Here we present ConsPred, a prokaryotic genome annotation framework that performs intrinsic gene predictions, homology searches, predictions of non-coding genes as well as CRISPR repeats and integrates all evidence into a consensus annotation. ConsPred achieves comprehensive, high-quality annotations based on rules and priorities, similar to decision-making in manual curation and avoids conflicting predictions. Parameters controlling the annotation process are configurable by the user. ConsPred has been used in the institutions of the authors for longer than 5 years and can easily be extended and adapted to specific needs. The ConsPred algorithm for producing a consensus from the varying scores of multiple gene prediction programs approaches manual curation in accuracy. Its rule-based approach for choosing final predictions avoids overriding previous manual curations. ConsPred is implemented in Java, Perl and Shell and is freely available under the Creative Commons license as a stand-alone in-house pipeline or as an Amazon Machine Image for cloud computing, see https://sourceforge.net/projects/conspred/. Contact: thomas.rattei@univie.ac.at. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
No evidence for the use of DIR, D–D fusions, chromosome 15 open reading frames or VH replacement in the peripheral repertoire was found on application of an improved algorithm, JointML, to 6329 human immunoglobulin H rearrangements

PubMed Central

Ohm-Laursen, Line; Nielsen, Morten; Larsen, Stine R; Barington, Torben

2006-01-01

Antibody diversity is created by imprecise joining of the variability (V), diversity (D) and joining (J) gene segments of the heavy and light chain loci. Analysis of rearrangements is complicated by somatic hypermutations and uncertainty concerning the sources of gene segments and the precise way in which they recombine. It has been suggested that D genes with irregular recombination signal sequences (DIR) and chromosome 15 open reading frames (OR15) can replace conventional D genes, that two D genes or inverted D genes may be used and that the repertoire can be further diversified by heavy chain V gene (VH) replacement. Safe conclusions require large, well-defined sequence samples and algorithms minimizing stochastic assignment of segments. Two computer programs were developed for analysis of heavy chain joints. JointHMM is a profile hidden Markov model, while JointML is a maximum-likelihood-based method taking the lengths of the joint and the mutational status of the VH gene into account. The programs were applied to a set of 6329 clonally unrelated rearrangements. A conventional D gene was found in 80% of unmutated sequences and 64% of mutated sequences, while D-gene assignment was kept below 5% in artificial (randomly permutated) rearrangements. No evidence for the use of DIR, OR15, multiple D genes or VH replacements was found, while inverted D genes were used in less than 1‰ of the sequences. JointML was shown to have a higher predictive performance for D-gene assignment in mutated and unmutated sequences than four other publicly available programs. An online version 1·0 of JointML is available at http://www.cbs.dtu.dk/services/VDJsolver. PMID:17005006
Enhanced sensitivity of CpG island search and primer design based on predicted CpG island position.

PubMed

Park, Hyun-Chul; Ahn, Eu-Ree; Jung, Ju Yeon; Park, Ji-Hye; Lee, Jee Won; Lim, Si-Keun; Kim, Won

2018-05-01

DNA methylation has important biological roles, such as gene expression regulation, as well as practical applications in forensics, such as in body fluid identification and age estimation. DNA methylation often occurs in the CpG site, and methylation within the CpG islands affects various cellular functions and is related to tissue-specific identification. Several programs have been developed to identify CpG islands; however, the size, location, and number of predicted CpG islands are not identical due to different search algorithms. In addition, they only provide structural information for predicted CpG islands without experimental information, such as primer design. We developed an analysis pipeline package, CpGPNP, to integrate CpG island prediction and primer design. CpGPNP predicts CpG islands more accurately and sensitively than other programs, and designs primers easily based on the predicted CpG island locations. The primer design function included standard, bisulfite, and methylation-specific PCR to identify the methylation of particular CpG sites. In this study, we performed CpG island prediction on all chromosomes and compared CpG island search performance of CpGPNP with other CpG island prediction programs. In addition, we compared the position of primers designed for a specific region within the predicted CpG island using other bisulfite PCR primer programs. The primers designed by CpGPNP were used to experimentally verify the amplification of the target region of markers for body fluid identification and age estimation. CpGPNP is freely available at http://forensicdna.kr/cpgpnp/. Copyright © 2018 Elsevier B.V. All rights reserved.
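As an illustration of what a CpG island search evaluates, the sketch below computes the GC content and the observed/expected CpG ratio for a single sequence window. The thresholds are the widely cited Gardiner-Garden and Frommer criteria and are used here only as an assumption: the abstract does not state which criteria CpGPNP applies, and the function names are hypothetical.

```python
def cpg_window_stats(window):
    """Return (gc_content, obs_exp_cpg) for one DNA window."""
    s = window.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Observed/expected CpG ratio: (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(window, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Assumed thresholds (Gardiner-Garden and Frommer style), not CpGPNP's."""
    gc_content, obs_exp = cpg_window_stats(window)
    return len(window) >= min_len and gc_content >= min_gc and obs_exp >= min_obs_exp
```

Different programs slide and merge such windows differently, which is one reason the size, location, and number of predicted islands vary between tools.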
Complete nucleotide sequence and annotation of the temperate corynephage ϕ16 genome.

PubMed

Lobanova, Juliya S; Gak, Evgueni R; Andreeva, Irina G; Rybak, Konstantin V; Krylov, Alexander A; Mashko, Sergey V

2017-08-01

The complete genome of ϕ16, a temperate corynephage from Corynebacterium glutamicum ATCC 21792, was sequenced and annotated (GenBank: KY250482). The electron microscopy study of ϕ16 virion confirmed that it belongs to the family Siphoviridae. The ϕ16 genome consists of a linear double-stranded DNA molecule of 58,200 bp (G+C = 52.2%) with protruding cohesive 3'-ends of 14 nt. Four major structural proteins were separated by SDS-PAGE and identified by peptide mass fingerprinting technique. Using bioinformatics analysis, 101 putative ORFs and 5 tRNA genes were predicted. Only 27 putative gene products could be assigned to known biological functions. The ϕ16 genome was divided into functional modules. Seven putative promoters and eight putative unidirectional intrinsic terminators were predicted. One site of putative «-1» programmed ribosomal frameshifting was proposed in the phage tail assembly genome region. C. glutamicum genetic tools could be broadened by exploiting the known integrase gene (gp33) and the newly identified excisionase gene (gp47), participating in site-specific recombination between ϕ16-attP/attB.
Identifying Functionally Linked Gene Modules Within Biological Pathways Assessed by ToxCast In Vitro Assays

EPA Science Inventory

The US EPA ToxCast program is using in vitro high-throughput screening assays to profile the bioactivity of environmental chemicals, with the ultimate goal of predicting in vivo toxicity. We hypothesize that in modeling toxicity it will be more constructive to understand the pert...
Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.

PubMed

Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying

2013-05-01

Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized method. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.
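The notion of an inversion used above (a stretch of nucleotides followed closely by its inverse complementary sequence) can be sketched as a brute-force scan. This is a minimal illustration, not the authors' centered or optimized cutting method; the function names, the fixed stem length, and the restriction to Watson-Crick pairs (no G-U wobble) are assumptions.

```python
# Watson-Crick complement table for RNA (assumption: no G-U wobble pairs)
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Return the inverse complementary sequence of an RNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inversions(seq, stem_len, max_gap):
    """List (left_start, right_start, gap) for every stretch of stem_len
    nucleotides that is followed, within max_gap nucleotides, by its
    reverse complement."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for gap in range(max_gap + 1):
            j = i + stem_len + gap
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, j, gap))
    return hits
```

Each starting position is scanned independently of the others, which is the property that lets the search be distributed across workers in a MapReduce setting.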
Predictive genomics DNA profiling for athletic performance.

PubMed

Kambouris, Marios; Ntalouka, Foteini; Ziogas, Georgios; Maffulli, Nicola

2012-12-01

Genes control biological processes such as muscle, cartilage and bone formation, muscle energy production and metabolism (mitochondriogenesis, lactic acid removal), blood and tissue oxygenation (erythropoiesis, angiogenesis, vasodilatation), all essential in sport and athletic performance. DNA sequence variations in such genes confer genetic advantages that can be exploited, or genetic 'barriers' that could be overcome to achieve optimal athletic performance. Predictive Genomic DNA Profiling for athletic performance reveals genetic variations that may be associated with better suitability for endurance, strength and speed sports, vulnerability to sports-related injuries and individualized nutritional requirements. Knowledge of genetic 'suitability' in respect to endurance capacity or strength and speed would lead to appropriate sport and athletic activity selection. Knowledge of genetic advantages and barriers would 'direct' an individualized training program, nutritional plan and nutritional supplementation to achieving optimal performance, overcoming 'barriers' that result from intense exercise and pressure under competition with minimum waste of time and energy and avoidance of health risks (hypertension, cardiovascular disease, inflammation, and musculoskeletal injuries) related to exercise, training and competition. Predictive Genomics DNA profiling for Athletics and Sports performance is developing into a tool for athletic activity and sport selection and for the formulation of individualized and personalized training and nutritional programs to optimize health and performance for the athlete. Human DNA sequences are patentable in some countries, while in others DNA testing methodologies (unless proprietary) are non-patentable. On the other hand, gene and variant selection, genotype interpretation and the risk and suitability assigning algorithms based on the specific Genomic variants used are amenable to patent protection.
Assessment of the predictive accuracy of five in silico prediction tools, alone or in combination, and two metaservers to classify long QT syndrome gene mutations.

PubMed

Leong, Ivone U S; Stuckey, Alexander; Lai, Daniel; Skinner, Jonathan R; Love, Donald R

2015-05-13

Long QT syndrome (LQTS) is an autosomal dominant condition predisposing to sudden death from malignant arrhythmia. Genetic testing identifies many missense single nucleotide variants of uncertain pathogenicity. Establishing genetic pathogenicity is an essential prerequisite to family cascade screening. Many laboratories use in silico prediction tools, either alone or in combination, or metaservers, in order to predict pathogenicity; however, their accuracy in the context of LQTS is unknown. We evaluated the accuracy of five in silico programs and two metaservers in the analysis of LQTS 1-3 gene variants. The in silico tools SIFT, PolyPhen-2, PROVEAN, SNPs&GO and SNAP, either alone or in all possible combinations, and the metaservers Meta-SNP and PredictSNP, were tested on 312 KCNQ1, KCNH2 and SCN5A gene variants that have previously been characterised by either in vitro or co-segregation studies as either "pathogenic" (283) or "benign" (29). The accuracy, sensitivity, specificity and Matthews Correlation Coefficient (MCC) were calculated to determine the best combination of in silico tools for each LQTS gene, and when all genes are combined. The best combination of in silico tools for KCNQ1 is PROVEAN, SNPs&GO and SIFT (accuracy 92.7%, sensitivity 93.1%, specificity 100% and MCC 0.70). The best combination of in silico tools for KCNH2 is SIFT and PROVEAN or PROVEAN, SNPs&GO and SIFT. Both combinations have the same scores for accuracy (91.1%), sensitivity (91.5%), specificity (87.5%) and MCC (0.62). In the case of SCN5A, SNAP and PROVEAN provided the best combination (accuracy 81.4%, sensitivity 86.9%, specificity 50.0%, and MCC 0.32). When all three LQT genes are combined, SIFT, PROVEAN and SNAP is the combination with the best performance (accuracy 82.7%, sensitivity 83.0%, specificity 80.0%, and MCC 0.44). 
Both metaservers performed better than the single in silico tools; however, they did not perform better than the best performing combination of in silico tools. The combination of in silico tools with the best performance is gene-dependent. The in silico tools reported here may have some value in assessing variants in the KCNQ1 and KCNH2 genes, but caution should be taken when the analysis is applied to SCN5A gene variants.
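The four measures reported above follow directly from a confusion matrix of tool calls against the reference classification. The sketch below shows the standard definitions; the function name is hypothetical and the counts in the usage note are illustrative only, not taken from the study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Matthews Correlation
    Coefficient from confusion-matrix counts, where 'positive' means
    a variant called pathogenic."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # pathogenic variants correctly called
    specificity = tn / (tn + fp)   # benign variants correctly called
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc
```

For example, `classification_metrics(tp=240, fn=43, tn=22, fp=7)` evaluates a made-up result on 283 pathogenic and 29 benign variants. With classes this unbalanced, accuracy and sensitivity can stay high while MCC remains modest, which is why the abstract reports MCC alongside them.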
A Grammatical Approach to RNA-RNA Interaction Prediction

NASA Astrophysics Data System (ADS)

Kato, Yuki; Akutsu, Tatsuya; Seki, Hiroyuki

2007-11-01

Much attention has been paid to two interacting RNA molecules involved in post-transcriptional control of gene expression. Although there have been a few studies on RNA-RNA interaction prediction based on dynamic programming algorithm, no grammar-based approach has been proposed. The purpose of this paper is to provide a new modeling for RNA-RNA interaction based on multiple context-free grammar (MCFG). We present a polynomial time parsing algorithm for finding the most likely derivation tree for the stochastic version of MCFG, which is applicable to RNA joint secondary structure prediction including kissing hairpin loops. Also, elementary tests on RNA-RNA interaction prediction have shown that the proposed method is comparable to Alkan et al.'s method.
GIANT API: an application programming interface for functional genomics.

PubMed

Roberts, Andrew M; Wong, Aaron K; Fisk, Ian; Troyanskaya, Olga G

2016-07-08

GIANT API provides biomedical researchers programmatic access to tissue-specific and global networks in humans and model organisms, and associated tools, which includes functional re-prioritization of existing genome-wide association study (GWAS) data. Using tissue-specific interaction networks, researchers are able to predict relationships between genes specific to a tissue or cell lineage, identify the changing roles of genes across tissues and uncover disease-gene associations. Additionally, GIANT API enables computational tools like NetWAS, which leverages tissue-specific networks for re-prioritization of GWAS results. The web services covered by the API include 144 tissue-specific functional gene networks in human, global functional networks for human and six common model organisms and the NetWAS method. GIANT API conforms to the REST architecture, which makes it stateless, cacheable and highly scalable. It can be used by a diverse range of clients including web browsers, command terminals, programming languages and standalone apps for data analysis and visualization. The API is freely available for use at http://giant-api.princeton.edu. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays

PubMed Central

Boerner, Susan; McGinnis, Karen M.

2012-01-01

Background: Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants. Methodology/Principal Findings: To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci. Conclusions/Significance: Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204
The genome sequence of the colonial chordate, Botryllus schlosseri

PubMed Central

Voskoboynik, Ayelet; Neff, Norma F; Sahoo, Debashis; Newman, Aaron M; Pushkarev, Dmitry; Koh, Winston; Passarelli, Benedetto; Fan, H Christina; Mantalas, Gary L; Palmeri, Karla J; Ishizuka, Katherine J; Gissi, Carmela; Griggio, Francesca; Ben-Shlomo, Rachel; Corey, Daniel M; Penland, Lolita; White, Richard A; Weissman, Irving L; Quake, Stephen R

2013-01-01

Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration. DOI: http://dx.doi.org/10.7554/eLife.00569.001 PMID:23840927
Mindfulness-Based Stress Reduction training reduces loneliness and pro-inflammatory gene expression in older adults: a small randomized controlled trial.

PubMed

Creswell, J David; Irwin, Michael R; Burklund, Lisa J; Lieberman, Matthew D; Arevalo, Jesusa M G; Ma, Jeffrey; Breen, Elizabeth Crabb; Cole, Steven W

2012-10-01

Lonely older adults have increased expression of pro-inflammatory genes as well as increased risk for morbidity and mortality. Previous behavioral treatments have attempted to reduce loneliness and its concomitant health risks, but have had limited success. The present study tested whether the 8-week Mindfulness-Based Stress Reduction (MBSR) program (compared to a Wait-List control group) reduces loneliness and downregulates loneliness-related pro-inflammatory gene expression in older adults (N = 40). Consistent with study predictions, mixed effect linear models indicated that the MBSR program reduced loneliness, compared to small increases in loneliness in the control group (treatment condition × time interaction: F(1,35) = 7.86, p = .008). Moreover, at baseline, there was an association between reported loneliness and upregulated pro-inflammatory NF-κB-related gene expression in circulating leukocytes, and MBSR downregulated this NF-κB-associated gene expression profile at post-treatment. Finally, there was a trend for MBSR to reduce C Reactive Protein (treatment condition × time interaction: F(1,33) = 3.39, p = .075). This work provides an initial indication that MBSR may be a novel treatment approach for reducing loneliness and related pro-inflammatory gene expression in older adults. Copyright © 2012 Elsevier Inc. All rights reserved.
Mindfulness-Based Stress Reduction Training Reduces Loneliness and Pro-Inflammatory Gene Expression in Older Adults: A Small Randomized Controlled Trial

PubMed Central

Creswell, J. David; Irwin, Michael R.; Burklund, Lisa J.; Lieberman, Matthew D.; Arevalo, Jesusa M. G.; Ma, Jeffrey; Breen, Elizabeth Crabb; Cole, Steven W.

2013-01-01

Lonely older adults have increased expression of pro-inflammatory genes as well as increased risk for morbidity and mortality. Previous behavioral treatments have attempted to reduce loneliness and its concomitant health risks, but have had limited success. The present study tested whether the 8-week Mindfulness-Based Stress Reduction (MBSR) program (compared to a Wait-List control group) reduces loneliness and downregulates loneliness-related pro-inflammatory gene expression in older adults (N=40). Consistent with study predictions, mixed effect linear models indicated that the MBSR program reduced loneliness, compared to small increases in loneliness in the control group (treatment condition × time interaction: F(1,35)=7.86, p=.008). Moreover, at baseline, there was an association between reported loneliness and upregulated pro-inflammatory NF-κB-related gene expression in circulating leukocytes, and MBSR downregulated this NF-κB-associated gene expression profile at post-treatment. Finally, there was a trend for MBSR to reduce C Reactive Protein (treatment condition × time interaction: F(1,33)=3.39, p=.075). This work provides an initial indication that MBSR may be a novel treatment approach for reducing loneliness and related pro-inflammatory gene expression in older adults. PMID:22820409
Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM)

PubMed Central

Skinnider, Michael A.; Dejong, Chris A.; Rees, Philip N.; Johnston, Chad W.; Li, Haoxin; Webster, Andrew L. H.; Wyatt, Morgan A.; Magarvey, Nathan A.

2015-01-01

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/. PMID:26442528
Binary Classification using Decision Tree based Genetic Programming and Its Application to Analysis of Bio-mass Data

NASA Astrophysics Data System (ADS)

To, Cuong; Pham, Tuan D.

2010-01-01

In machine learning, pattern recognition may be the most popular task. Identification of "similar" patterns is also very important in biology: first, it is useful for predicting patterns associated with disease, for example cancer tissue (normal or tumor); second, similarity or dissimilarity of kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process; third, similar genes (proteins) share similar functions. In this paper, we present an algorithm that uses genetic programming to create decision trees for binary classification problems. The algorithm was applied to five real biological databases. Based on the results of comparisons with well-known methods, we see that the algorithm is outstanding in most cases.
RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants.

PubMed

Li, Pingchuan; Quan, Xiande; Jia, Gaofeng; Xiao, Jin; Cloutier, Sylvie; You, Frank M

2016-11-02

Resistance gene analogs (RGAs), such as NBS-encoding proteins, receptor-like protein kinases (RLKs) and receptor-like proteins (RLPs), are potential R-genes that contain specific conserved domains and motifs. Thus, RGAs can be predicted based on their conserved structural features using bioinformatics tools. Computer programs have been developed for the identification of individual domains and motifs from the protein sequences of RGAs but none offer a systematic assessment of the different types of RGAs. A user-friendly and efficient pipeline is needed for large-scale genome-wide RGA predictions of the growing number of sequenced plant genomes. An integrative pipeline, named RGAugury, was developed to automate RGA prediction. The pipeline first identifies RGA-related protein domains and motifs, namely nucleotide binding site (NB-ARC), leucine rich repeat (LRR), transmembrane (TM), serine/threonine and tyrosine kinase (STTK), lysin motif (LysM), coiled-coil (CC) and Toll/Interleukin-1 receptor (TIR). RGA candidates are identified and classified into four major families based on the presence of combinations of these RGA domains and motifs: NBS-encoding, TM-CC, and membrane associated RLP and RLK. All time-consuming analyses of the pipeline are parallelized to improve performance. The pipeline was evaluated using the well-annotated Arabidopsis genome. A total of 98.5, 85.2, and 100% of the reported NBS-encoding genes, membrane associated RLPs and RLKs were validated, respectively. The pipeline was also successfully applied to predict RGAs for 50 sequenced plant genomes. A user-friendly web interface was implemented to ease command line operations, facilitate visualization and simplify result management for multiple datasets. RGAugury is an efficient, integrative bioinformatics tool for large scale genome-wide identification of RGAs. It is freely available at Bitbucket: https://bitbucket.org/yaanlpc/rgaugury.
Identification of functional elements and regulatory circuits by Drosophila modENCODE

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.

2010-12-22

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology.
The functions of ~40% of the protein and nonprotein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1). Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length protein-coding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3' untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi) mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.
Microarray-based cancer prediction using soft computing approach.

PubMed

Wang, Xiaosheng; Gotoh, Osamu

2009-05-26

One of the difficulties in using gene expression profiles to predict cancer is how to effectively select a few informative genes to construct accurate prediction models from thousands or ten thousands of genes. We screen highly discriminative genes and gene pairs to create simple prediction models involved in single genes or gene pairs on the basis of soft computing approach and rough set theory. Accurate cancerous prediction is obtained when we apply the simple prediction models for four cancerous gene expression datasets: CNS tumor, colon tumor, lung cancer and DLBCL. Some genes closely correlated with the pathogenesis of specific or general cancers are identified. In contrast with other models, our models are simple, effective and robust. Meanwhile, our models are interpretable for they are based on decision rules. Our results demonstrate that very simple models may perform well on cancerous molecular prediction and important gene markers of cancer can be detected if the gene selection approach is chosen reasonably.

Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

PubMed Central

2012-01-01

Background: The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results: In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions: The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261
Gene expression models for prediction of longitudinal dispersion coefficient in streams

NASA Astrophysics Data System (ADS)

Sattar, Ahmed M. A.; Gharabaghi, Bahram

2015-05-01

Longitudinal dispersion is the key hydrologic process that governs transport of pollutants in natural streams. It is critical for spill action centers to be able to predict the pollutant travel time and break-through curves accurately following accidental spills in urban streams. This study presents a novel gene expression model for longitudinal dispersion developed using 150 published data sets of geometric and hydraulic parameters in natural streams in the United States, Canada, Europe, and New Zealand. The training and testing of the model were accomplished using randomly-selected 67% (100 data sets) and 33% (50 data sets) of the data sets, respectively. Gene expression programming (GEP) is used to develop empirical relations between the longitudinal dispersion coefficient and various control variables, including the Froude number which reflects the effect of reach slope, aspect ratio, and the bed material roughness on the dispersion coefficient. Two GEP models have been developed, and the prediction uncertainties of the developed GEP models are quantified and compared with those of existing models, showing improved prediction accuracy in favor of GEP models. Finally, a parametric analysis is performed for further verification of the developed GEP models. The main reason for the higher accuracy of the GEP models compared to the existing regression models is that exponents of the key variables (aspect ratio and bed material roughness) are not constants but a function of the Froude number. The proposed relations are both simple and accurate and can be effectively used to predict the longitudinal dispersion coefficients in natural streams.
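
The random training/testing partition described above (100 of 150 data sets for training, 50 for testing) can be sketched as follows. This is only an illustration of the split; the function name and the seed are assumptions, and the GEP model itself is not reproduced here.

```python
import random


def split_datasets(n_total=150, n_train=100, seed=0):
    """Randomly partition data set indices into training and testing groups."""
    indices = list(range(n_total))
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_datasets()
```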
Directed random walks and constraint programming reveal active pathways in hepatocyte growth factor signaling.

PubMed

Kittas, Aristotelis; Delobelle, Aurélien; Schmitt, Sabrina; Breuhahn, Kai; Guziolowski, Carito; Grabe, Niels

2016-01-01

An effective means to analyze mRNA expression data is to take advantage of established knowledge from pathway databases, using methods such as pathway-enrichment analyses. However, pathway databases are not case-specific and expression data could be used to infer gene-regulation patterns in the context of specific pathways. In addition, canonical pathways may not always describe the signaling mechanisms properly, because interactions can frequently occur between genes in different pathways. Relatively few methods have been proposed to date for generating and analyzing such networks, preserving the causality between gene interactions and reasoning over the qualitative logic of regulatory effects. We present an algorithm (MCWalk) integrated with a logic programming approach, to discover subgraphs in large-scale signaling networks by random walks in a fully automated pipeline. As an exemplary application, we uncover the signal transduction mechanisms in a gene interaction network describing hepatocyte growth factor-stimulated cell migration and proliferation from gene-expression measured with microarray and RT-qPCR using in-house perturbation experiments in a keratinocyte-fibroblast co-culture. The resulting subgraphs illustrate possible associations of hepatocyte growth factor receptor c-Met nodes, differentially expressed genes and cellular states. Using perturbation experiments and Answer Set programming, we are able to select those which are more consistent with the experimental data. We discover key regulator nodes by measuring the frequency with which they are traversed when connecting signaling between receptors and significantly regulated genes and predict their expression-shift consistently with the measured data. The Java implementation of MCWalk is publicly available under the MIT license at: https://bitbucket.org/akittas/biosubg. © 2015 FEBS.
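
The notion of ranking nodes by how often random walks traverse them on the way from a receptor to a regulated gene can be illustrated with a short sketch. This is not the MCWalk algorithm or its Java implementation; it is a plain uniform random walk on a toy directed graph, and every node name, parameter and default below is hypothetical.

```python
import random
from collections import Counter


def traversal_frequency(graph, source, targets, n_walks=2000, max_steps=20, seed=0):
    """Fraction of successful walks (source -> any target) passing through each node."""
    rng = random.Random(seed)
    counts = Counter()
    successes = 0
    for _ in range(n_walks):
        node, path = source, [source]
        for _ in range(max_steps):
            successors = graph.get(node, [])
            if not successors:
                break  # dead end: this walk is discarded
            node = rng.choice(successors)
            path.append(node)
            if node in targets:
                successes += 1
                counts.update(set(path[1:-1]))  # count intermediate nodes only
                break
    if not successes:
        return {}
    return {n: c / successes for n, c in counts.items()}


# Hypothetical toy network: a receptor, intermediates A/B/C and one target gene.
toy_graph = {
    "receptor": ["A", "B"],
    "A": ["C"],
    "B": ["C", "dead_end"],
    "C": ["target_gene"],
}
freq = traversal_frequency(toy_graph, "receptor", {"target_gene"})
```

In this toy network every successful walk must pass through C, so C receives the highest frequency and would be flagged as the candidate key regulator.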
Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana.

PubMed

Hansen, Bjoern Oest; Meyer, Etienne H; Ferrari, Camilla; Vaid, Neha; Movahedi, Sara; Vandepoele, Klaas; Nikoloski, Zoran; Mutwil, Marek

2018-03-01

Recent advances in gene function prediction rely on ensemble approaches that integrate results from multiple inference methods to produce superior predictions. Yet, these developments remain largely unexplored in plants. We have explored and compared two methods to integrate 10 gene co-function networks for Arabidopsis thaliana and demonstrate how the integration of these networks produces more accurate gene function predictions for a larger fraction of genes with unknown function. These predictions were used to identify genes involved in mitochondrial complex I formation, and for five of them, we confirmed the predictions experimentally. The ensemble predictions are provided as a user-friendly online database, EnsembleNet. The methods presented here demonstrate that ensemble gene function prediction is a powerful method to boost prediction performance, whereas the EnsembleNet database provides a cutting-edge community tool to guide experimentalists. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
GeMS: an advanced software package for designing synthetic genes.

PubMed

Jayaraj, Sebastian; Reid, Ralph; Santi, Daniel V

2005-01-01

A user-friendly, advanced software package for gene design is described. The software comprises an integrated suite of programs (also provided as stand-alone tools) that automatically performs the following tasks in gene design: restriction site prediction, codon optimization for any expression host, restriction site inclusion and exclusion, separation of long sequences into synthesizable fragments, T(m) and stem-loop determinations, optimal oligonucleotide component design and design verification/error-checking. The output is a complete design report and a list of optimized oligonucleotides to be prepared for subsequent gene synthesis. The user interface accommodates both inexperienced and experienced users. For inexperienced users, explanatory notes are provided such that detailed instructions are not necessary; for experienced users, a streamlined interface is provided without such notes. The software has been extensively tested in the design and successful synthesis of over 400 kb of genes, many of which exceeded 5 kb in length.
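
Two of the tasks listed above, locating restriction sites and estimating T(m), can be sketched in a few lines. The abstract does not say which methods GeMS uses, so the sketch below substitutes the simple Wallace rule (2 °C per A/T, 4 °C per G/C, meant for short oligonucleotides) and a plain substring search; the function names and sequences are illustrative assumptions.

```python
def wallace_tm(oligo):
    """Rough melting temperature in °C by the Wallace rule (short oligos only)."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def find_sites(sequence, site):
    """Return 0-based start positions of a recognition site, overlaps included."""
    s, site = sequence.upper(), site.upper()
    positions, start = [], 0
    while True:
        i = s.find(site, start)
        if i == -1:
            return positions
        positions.append(i)
        start = i + 1


ECORI = "GAATTC"  # EcoRI recognition sequence
tm = wallace_tm("ATGCATGC")
sites = find_sites("AAGAATTCTTGAATTC", ECORI)
```

A design tool would use such positions either to exclude unwanted sites through synonymous codon changes or to confirm that a required site is present.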
DOE Office of Scientific and Technical Information (OSTI.GOV)

Hamaji, Takashi; Lopez, David; Pellegrini, Matteo

Upon fertilization Chlamydomonas reinhardtii zygotes undergo a program of differentiation into a diploid zygospore that is accompanied by transcription of hundreds of zygote-specific genes. We identified a distinct sequence motif we term a zygotic response element (ZYRE) that is highly enriched in promoter regions of C. reinhardtii early zygotic genes. A luciferase reporter assay was used to show that native ZYRE motifs within the promoter of zygotic gene ZYS3 or intron of zygotic gene DMT4 are necessary for zygotic induction. A synthetic luciferase reporter with a minimal promoter was used to show that ZYRE motifs introduced upstream are sufficient to confer zygotic upregulation, and that ZYRE-controlled zygotic transcription is dependent on the homeodomain transcription factor GSP1. Furthermore, we predict that ZYRE motifs will correspond to binding sites for the homeodomain proteins GSP1-GSM1 that heterodimerize and activate zygotic gene expression in early zygotes.
Integrating genomics and proteomics data to predict drug effects using binary linear programming.

PubMed

Ji, Zhiwei; Su, Jing; Liu, Chenglin; Wang, Hongyan; Huang, Deshuang; Zhou, Xiaobo

2014-01-01

The Library of Integrated Network-Based Cellular Signatures (LINCS) project aims to create a network-based understanding of biology by cataloging changes in gene expression and signal transduction that occur when cells are exposed to a variety of perturbations. It is helpful for understanding cell pathways and facilitating drug discovery. Here, we developed a novel approach to infer cell-specific pathways and identify a compound's effects using gene expression and phosphoproteomics data under treatments with different compounds. Gene expression data were employed to infer potential targets of compounds and create a generic pathway map. Binary linear programming (BLP) was then developed to optimize the generic pathway topology based on the mid-stage signaling response of phosphorylation. To demonstrate effectiveness of this approach, we built a generic pathway map for the MCF7 breast cancer cell line and inferred the cell-specific pathways by BLP. The first group of 11 compounds was utilized to optimize the generic pathways, and then 4 compounds were used to identify effects based on the inferred cell-specific pathways. Cross-validation indicated that the cell-specific pathways reliably predicted a compound's effects. Finally, we applied BLP to re-optimize the cell-specific pathways to predict the effects of 4 compounds (trichostatin A, MS-275, staurosporine, and digoxigenin) according to compound-induced topological alterations. Trichostatin A and MS-275 (both HDAC inhibitors) inhibited the downstream pathway of HDAC1 and caused cell growth arrest via activation of p53 and p21; the effects of digoxigenin were totally opposite. Staurosporine blocked the cell cycle via p53 and p21, but also promoted cell growth via activated HDAC1 and its downstream pathway. Our approach was also applied to the PC3 prostate cancer cell line, and the cross-validation analysis showed very good accuracy in predicting effects of 4 compounds. 
In summary, our computational model can be used to elucidate potential mechanisms of a compound's efficacy.
MIRNA-DISTILLER: A Stand-Alone Application to Compile microRNA Data from Databases.

PubMed

Rieger, Jessica K; Bodan, Denis A; Zanger, Ulrich M

2011-01-01

MicroRNAs (miRNA) are small non-coding RNA molecules of ∼22 nucleotides which regulate large numbers of genes by binding to seed sequences at the 3'-untranslated region of target gene transcripts. The target mRNA is then usually degraded or its translation is inhibited, resulting in posttranscriptional downregulation of gene expression at the mRNA and/or protein level. Due to the bioinformatic difficulties in predicting functional miRNA binding sites, several publicly available databases have been developed that predict miRNA binding sites based on different algorithms. The parallel use of different databases is currently indispensable, but highly inconvenient and time consuming, especially when working with numerous genes of interest. We have therefore developed a new stand-alone program, termed MIRNA-DISTILLER, which compiles miRNA data for given target genes from public databases. Currently implemented are TargetScan, microCosm, and miRDB, which may be queried independently, pairwise, or together to calculate the respective intersections. Data are stored locally for application of further analysis tools including freely definable biological parameter filters, customized output-lists for both miRNAs and target genes, and various graphical facilities. The software, a data example file and a tutorial are freely available at http://www.ikp-stuttgart.de/content/language1/html/10415.asp.
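
Querying the databases "independently, pairwise, or together" amounts to computing set intersections over every combination of sources. The sketch below shows that idea only; it does not query any real database, and the function name and miRNA identifiers are made up for illustration.

```python
from itertools import combinations


def intersect_predictions(predictions):
    """Map each non-empty combination of database names to the miRNAs they share."""
    result = {}
    names = sorted(predictions)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            shared = set.intersection(*(predictions[n] for n in combo))
            result[frozenset(combo)] = shared
    return result


# Made-up miRNA hits for one hypothetical target gene.
hits = {
    "TargetScan": {"miR-a", "miR-b", "miR-c"},
    "microCosm": {"miR-b", "miR-c", "miR-d"},
    "miRDB": {"miR-c", "miR-e"},
}
shared = intersect_predictions(hits)
```

With three sources this yields seven result sets: three single-database lists, three pairwise intersections and one three-way intersection.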
MIRNA-DISTILLER: A Stand-Alone Application to Compile microRNA Data from Databases

PubMed Central

Rieger, Jessica K.; Bodan, Denis A.; Zanger, Ulrich M.

2011-01-01

MicroRNAs (miRNA) are small non-coding RNA molecules of ∼22 nucleotides which regulate large numbers of genes by binding to seed sequences at the 3′-untranslated region of target gene transcripts. The target mRNA is then usually degraded or its translation is inhibited, resulting in posttranscriptional downregulation of gene expression at the mRNA and/or protein level. Due to the bioinformatic difficulties in predicting functional miRNA binding sites, several publicly available databases have been developed that predict miRNA binding sites based on different algorithms. The parallel use of different databases is currently indispensable, but highly inconvenient and time consuming, especially when working with numerous genes of interest. We have therefore developed a new stand-alone program, termed MIRNA-DISTILLER, which compiles miRNA data for given target genes from public databases. Currently implemented are TargetScan, microCosm, and miRDB, which may be queried independently, pairwise, or together to calculate the respective intersections. Data are stored locally for application of further analysis tools including freely definable biological parameter filters, customized output-lists for both miRNAs and target genes, and various graphical facilities. The software, a data example file and a tutorial are freely available at http://www.ikp-stuttgart.de/content/language1/html/10415.asp PMID:22303335
Prediction of gene expression with cis-SNPs using mixed models and regularization methods.

PubMed

Zeng, Ping; Zhou, Xiang; Huang, Shuiping

2017-05-11

It has been shown that gene expression in human tissues is heritable, thus predicting gene expression using only SNPs becomes possible. The prediction of gene expression can offer important implications on the genetic architecture of individual functional associated SNPs and further interpretations of the molecular basis underlying human diseases. We compared three types of methods for predicting gene expression using only cis-SNPs, including the polygenic model, i.e. linear mixed model (LMM), two sparse models, i.e. Lasso and elastic net (ENET), and the hybrid of LMM and sparse model, i.e. Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods have very different assumptions of underlying genetic architectures. These methods were evaluated using simulations under various scenarios, and were applied to the Geuvadis gene expression data. The simulations showed that these four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) behaved best when their respective modeling assumptions were satisfied, but BSLMM had a robust performance across a range of scenarios. According to R² of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across the chromosomes, and also did not see any clear relationship between the proportion of the predictive genes and the proportion of genes in each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R² ≥ 0.30) may have sparse genetic architectures since Lasso, ENET and BSLMM outperformed LMM for these genes; and this observation was validated in another gene expression data set. We further showed that the predictive genes were enriched in approximately independent LD blocks.
Gene expression can be predicted with only cis-SNPs using well-developed prediction models and these predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation for identified SNPs in GWASs.
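
The R² cut-offs quoted above (0.05 for predictive genes, 0.30 for highly predictive genes) can be applied as in the following sketch. It assumes R² is computed as one minus the ratio of residual to total sum of squares; the abstract does not state the exact definition, and the function names and toy numbers are illustrative.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def classify_gene(r2, predictive=0.05, highly_predictive=0.30):
    """Label a gene using the thresholds quoted in the abstract."""
    if r2 >= highly_predictive:
        return "highly predictive"
    if r2 >= predictive:
        return "predictive"
    return "not predictive"


# Made-up observed and predicted expression values for one hypothetical gene.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
label = classify_gene(r2)
```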
Nucleotide polymorphisms in a pine ortholog of the Arabidopsis degrading enzyme cellulase KORRIGAN are associated with early growth performance in Pinus pinaster.

PubMed

Cabezas, José Antonio; González-Martínez, Santiago C; Collada, Carmen; Guevara, María Angeles; Boury, Christophe; de María, Nuria; Eveno, Emmanuelle; Aranda, Ismael; Garnier-Géré, Pauline H; Brach, Jean; Alía, Ricardo; Plomion, Christophe; Cervera, María Teresa

2015-09-01

We have carried out a candidate-gene-based association genetic study in Pinus pinaster Aiton and evaluated the predictive performance for genetic merit gain of the most significantly associated genes and single nucleotide polymorphisms (SNPs). We used a second generation 384-SNP array enriched with candidate genes for growth and wood properties to genotype mother trees collected in 20 natural populations covering most of the European distribution of the species. Phenotypic data for total height, polycyclism, root-collar diameter and biomass were obtained from a replicated provenance-progeny trial located in two sites with contrasting environments (Atlantic vs Mediterranean climate). General linear models identified strong associations between growth traits (total height and polycyclism) and four SNPs from the korrigan candidate gene, after multiple testing corrections using false discovery rate. The combined genomic breeding value predictions assessed for the four associated korrigan SNPs by ridge regression-best linear unbiased prediction (RR-BLUP) and cross-validation accounted for up to 8 and 15% of the phenotypic variance for height and polycyclic growth, respectively, and did not improve adding SNPs from other growth-related candidate genes. For root-collar diameter and total biomass, they accounted for 1.6 and 1.1% of the phenotypic variance, respectively, but increased to 15 and 4.1% when other SNPs from lp3.1, lp3.3 and cad were included in RR-BLUP models. These results point towards a desirable integration of candidate-gene studies as a means to pre-select relevant markers, and aid genomic selection in maritime pine breeding programs. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Genome-wide computational identification of microRNAs and their targets in the deep-branching eukaryote Giardia lamblia.

PubMed

Zhang, Yan-Qiong; Chen, Dong-Liang; Tian, Hai-Feng; Zhang, Bao-Hong; Wen, Jian-Fan

2009-10-01

Using a combined computational program, we identified 50 potential microRNAs (miRNAs) in Giardia lamblia, one of the most primitive unicellular eukaryotes. These miRNAs are unique to G. lamblia and no homologues have been found in other organisms; miRNAs, currently known in other species, were not found in G. lamblia. This suggests that miRNA biogenesis and miRNA-mediated gene regulation pathway may evolve independently, especially in evolutionarily distant lineages. A majority (43) of the predicted miRNAs are located at one single locus; however, some miRNAs have two or more copies in the genome. Among the 58 miRNA genes, 28 are located in the intergenic regions whereas 30 are present in the anti-sense strands of the protein-coding sequences. Five predicted miRNAs are expressed in G. lamblia trophozoite cells evidenced by expressed sequence tags or RT-PCR. Thirty-seven identified miRNAs may target 50 protein-coding genes, including seven variant-specific surface proteins (VSPs). Our findings provide a clue that miRNA-mediated gene regulation may exist in the early stage of eukaryotic evolution, suggesting that it is an important regulation system ubiquitous in eukaryotes.
In silico identification of genetic variants in glucocerebrosidase (GBA) gene involved in Gaucher's disease using multiple software tools.

PubMed

Manickam, Madhumathi; Ravanan, Palaniyandi; Singh, Pratibha; Talwar, Priti

2014-01-01

Gaucher's disease (GD) is an autosomal recessive disorder caused by the deficiency of glucocerebrosidase, a lysosomal enzyme that catalyses the hydrolysis of the glycolipid glucocerebroside to ceramide and glucose. Polymorphisms in the GBA gene have been associated with the development of Gaucher's disease. We hypothesize that predicting SNPs with multiple state-of-the-art software tools will increase confidence in the identification of SNPs involved in GD. Enzyme replacement therapy is the only option for GD. Our goal is to use several state-of-the-art SNP algorithms to predict harmful SNPs through comparative studies. In this study seven different algorithms (SIFT, MutPred, nsSNP Analyzer, PANTHER, PMUT, PROVEAN, and SNPs&GO) were used to predict the harmful polymorphisms. Among the seven programs, SIFT found 47 nsSNPs to be deleterious and MutPred found 46 nsSNPs to be harmful. nsSNP Analyzer classified 43 of the 47 nsSNPs as disease-causing, PANTHER classified 32 of 47 as highly deleterious, PMUT classified 22 of 47 as pathological mutations, the PROVEAN server predicted 44 of 47 to be deleterious, and SNPs&GO classified all 47 as disease-related mutations. Twenty-two nsSNPs were predicted by all seven algorithms: F251L, C342G, W312C, P415R, R463C, D127V, A309V, G46E, G202E, P391L, Y363C, Y205C, W378C, I402T, S366R, F397S, Y418C, P401L, G195E, W184R, R48W, and T43R.
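
The consensus step, keeping only variants flagged by every tool, can be sketched as a vote count. The tool names and variant labels below are placeholders, not the study's data, and the function names are assumptions.

```python
from collections import Counter


def vote_counts(calls):
    """Count how many tools flag each variant as harmful."""
    return Counter(v for variants in calls.values() for v in variants)


def unanimous(calls):
    """Variants flagged by every tool in `calls`."""
    votes = vote_counts(calls)
    return {v for v, n in votes.items() if n == len(calls)}


# Placeholder calls from three hypothetical tools on four hypothetical variants.
calls = {
    "tool_1": {"V1", "V2", "V3", "V4"},
    "tool_2": {"V1", "V2", "V3"},
    "tool_3": {"V1", "V3", "V4"},
}
consensus = unanimous(calls)
```

Keeping the per-variant vote counts, rather than only the intersection, also makes it possible to relax the rule later, for example to "at least six of seven tools".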
A comparative analysis of soft computing techniques for gene prediction.

PubMed

Goel, Neelam; Singh, Shailendra; Aseri, Trilok Chand

2013-07-01

The rapid growth of genomic sequence data for both human and nonhuman species has made analyzing these sequences, especially predicting genes in them, very important, and this analysis is currently the focus of many research efforts. Besides its scientific interest in the molecular biology and genomics community, gene prediction is of considerable importance in human health and medicine. A variety of gene prediction techniques have been developed for eukaryotes over the past few years. This article reviews and analyzes the application of certain soft computing techniques in gene prediction. First, the problem of gene prediction and its challenges are described. These are followed by different soft computing techniques along with their application to gene prediction. In addition, a comparative analysis of different soft computing techniques for gene prediction is given. Finally some limitations of the current research activities and future research directions are provided. Copyright © 2013 Elsevier Inc. All rights reserved.
A Web interface generator for molecular biology programs in Unix.

PubMed

Letondal, C

2001-01-01

Almost all users encounter problems using sequence analysis programs. Not only are they difficult to learn because of their parameters, syntax and semantics, but many are different. That is why we have developed a Web interface generator for more than 150 molecular biology command-line driven programs, including: phylogeny, gene prediction, alignment, RNA, DNA and protein analysis, motif discovery, structure analysis and database searching programs. The generator uses XML as a high-level description language of the legacy software parameters. Its aim is to provide users with the equivalent of a basic Unix environment, with program combination, customization and basic scripting through macro registration. The program has been used for three years by about 15000 users throughout the world; it has recently been installed on other sites and evaluated as a standard user interface for EMBOSS programs.
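
The idea of describing a command-line program's parameters in XML and generating an invocation from that description can be sketched as follows. The XML element and attribute names, the program name "toyalign" and its flags are all invented for this example; they are not the generator's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter description; not the real schema used by the generator.
DESCRIPTION = """
<program name="toyalign">
  <parameter name="input" flag="-in" type="file" required="true"/>
  <parameter name="gap_open" flag="-gapopen" type="float" default="10.0"/>
  <parameter name="verbose" flag="-v" type="switch"/>
</program>
"""


def build_command(xml_text, values):
    """Turn an XML parameter description plus user values into an argv list."""
    root = ET.fromstring(xml_text)
    argv = [root.get("name")]
    for param in root.findall("parameter"):
        name = param.get("name")
        value = values.get(name, param.get("default"))
        if param.get("type") == "switch":
            if value:
                argv.append(param.get("flag"))  # switches take no argument
        elif value is not None:
            argv.extend([param.get("flag"), str(value)])
        elif param.get("required") == "true":
            raise ValueError(f"missing required parameter: {name}")
    return argv


command = build_command(DESCRIPTION, {"input": "seqs.fasta", "verbose": True})
```

The same description could drive a generated Web form: each parameter element would become one input field, with the type attribute choosing the widget.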
Seed maturation associated transcriptional programs and regulatory networks underlying genotypic difference in seed dormancy and size/weight in wheat (Triticum aestivum L.).

PubMed

Yamasaki, Yuji; Gao, Feng; Jordan, Mark C; Ayele, Belay T

2017-09-16

Maturation forms one of the critical seed developmental phases and is characterized mainly by programmed cell death, dormancy and desiccation; however, the transcriptional programs and regulatory networks underlying acquisition of dormancy and deposition of storage reserves during the maturation phase of seed development are poorly understood in wheat. The present study performed comparative spatiotemporal transcriptomic analysis of seed maturation in two wheat genotypes with contrasting seed weight/size and dormancy phenotype. The embryo and endosperm tissues of maturing seeds appeared to exhibit genotype-specific temporal shifts in gene expression profile that might contribute to the seed phenotypic variations. Functional annotations of gene clusters suggest that the two tissues exhibit distinct but genotypically overlapping molecular functions. Motif enrichment predicts genotypically distinct abscisic acid (ABA) and gibberellin (GA) regulated transcriptional networks contribute to the contrasting seed weight/size and dormancy phenotypes between the two genotypes. While other ABA responsive element (ABRE) motifs are enriched in both genotypes, the prevalence of G-box-like motif specifically in tissues of the dormant genotype suggests distinct ABA mediated transcriptional mechanisms control the establishment of dormancy during seed maturation. In agreement with this, the bZIP transcription factors that co-express with ABRE enriched embryonic genes differ with genotype. The enrichment of SITEIIATCYTC motif specifically in embryo clusters of maturing seeds irrespective of genotype predicts a tissue specific role for the respective TCP transcription factors with no or minimal contribution to the variations in seed dormancy.
The results of this study advance our understanding of the seed maturation associated molecular mechanisms underlying variation in dormancy and weight/size in wheat seeds, which is a critical step towards the designing of molecular strategies for enhancing seed yield and quality.
Prediction of epigenetically regulated genes in breast cancer cell lines.

PubMed

Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen; Nautiyal, Shivani; Flaucher, Diane; Carlton, Victoria E H; Moorhead, Martin; Lu, Yontao; Gray, Joe W; Faham, Malek; Spellman, Paul; Parvin, Bahram

2010-06-04

Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis. Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation. 
Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
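
Quantifying significance through permutation analysis can be illustrated with a small sketch. The pipeline above ranks associations with logistic regression; the sketch swaps in a plain Pearson correlation as the test statistic to stay short, so it shows the permutation idea rather than the published method. The function names, the number of permutations and the toy values are assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(methylation, expression, n_perm=2000, seed=0):
    """One-sided p-value for a negative methylation-expression correlation."""
    rng = random.Random(seed)
    observed = pearson(methylation, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between cell lines
        if pearson(methylation, shuffled) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up values across eight hypothetical cell lines.
methylation = [0.10, 0.20, 0.30, 0.50, 0.60, 0.80, 0.90, 0.95]
expression = [9.0, 8.5, 8.7, 6.0, 5.5, 3.0, 2.5, 2.0]
r, p = permutation_p_value(methylation, expression)
```

A small p-value here means that few random re-pairings of cell lines produce a correlation as negative as the observed one, which is the sense in which a gene would be called a candidate for epigenetic silencing.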
A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina.

PubMed

Bidard, Frédérique; Imbeaud, Sandrine; Reymond, Nancie; Lespinet, Olivier; Silar, Philippe; Clavé, Corinne; Delacroix, Hervé; Berteaux-Lecellier, Véronique; Debuchy, Robert

2010-06-18

The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Probe design strategy should cope with the limited accuracy of de novo gene prediction programs and with annotation updating. We present a novel in silico procedure which addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify optimal probes in the in silico outcome. We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to coding sequence end and intron position. This latter criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a sub-set of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied for the development of a gene expression Agilent platform for the filamentous fungus Podospora anserina and the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS. A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis.
WGSSAT: A High-Throughput Computational Pipeline for Mining and Annotation of SSR Markers From Whole Genomes.

PubMed

Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, Naresh S; Jena, Joy K; Kushwaha, Basdeo

2018-03-16

Mining and characterization of Simple Sequence Repeat (SSR) markers from whole genomes provide valuable information about biological significance of SSR distribution and also facilitate development of markers for genetic analysis. Whole genome sequencing (WGS)-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline developed using Java Netbeans and Perl scripts which simplifies the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, noncoding RNA (ncRNA), core genes, repeats and SSRs from whole genomes followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic, and core gene region) along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes, and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139 057 SSR, out of which 113 703 SSR primer pairs were uniquely amplified in silico onto a T. rubripes (fugu) genome. Out of 113 703 mined SSRs, 81 463 were from coding region (including 4286 exonic and 77 177 intronic), 7 from RNA, 267 from core genes of fugu, whereas 105 641 SSR and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT is tested under Ubuntu Linux. The source code, documentation, user manual, example dataset and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr.
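
The core SSR mining step, scanning a FASTA sequence for perfect tandem repeats, can be sketched with a regular expression. This is not WGSSAT's Perl implementation; the function names, the unit-length range (2 to 6 bases), the minimum of four repeats and the toy sequence are all assumptions for illustration.

```python
import re


def read_fasta(text):
    """Parse FASTA-formatted text into {record_name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for perfect tandem repeats."""
    # Lazy unit match so the shortest repeating unit is reported.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    results = []
    for m in pattern.finditer(sequence.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        results.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return results


toy_fasta = ">toy\nGGCACACACACACC\nAGTAGTAGTAGTGG\n"
ssrs = find_ssrs(read_fasta(toy_fasta)["toy"])
```

A full pipeline would then take the flanking sequence around each reported position as input for primer design.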
Fusion Genes Predict Prostate Cancer Recurrence

DTIC Science & Technology

2017-10-01

We will develop a training program centered on genomics and cell culturing methods to train new investigators to carry out research in benign urologic... Sponsoring/monitoring agency: U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702-5012. Distribution statement: Approved for Public Release; Distribution...

HuMiChip: Development of a Functional Gene Array for the Study of Human Microbiomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tu, Q.; Deng, Ye; Lin, Lu

Microbiomes play very important roles in terms of nutrition, health and disease by interacting with their hosts. Based on sequence data currently available in public domains, we have developed a functional gene array, the human and mouse microbiota array (HMM-Chip), to monitor both organismal and functional gene profiles of normal microbiota in human and mouse hosts. First, seed sequences were identified from KEGG databases, and used to construct a seed database (seedDB) containing 136 gene families in 19 metabolic pathways closely related to human and mouse microbiomes. Second, a mother database (motherDB) was constructed with 81 genomes of bacterial strains, 54 from gut and 27 from oral environments, and 16 metagenomes, and used for selection of genes and probe design. Gene prediction was performed by Glimmer3 for bacterial genomes, and by the Metagene program for metagenomes. In total, 228,240 and 801,599 genes were identified for bacterial genomes and metagenomes, respectively. Then the motherDB was searched against the seedDB using the HMMer program, and gene sequences in the motherDB that were highly homologous with seed sequences in the seedDB were used for probe design by the CommOligo software. Probes with different degrees of specificity, including gene-specific, inclusive and exclusive group-specific probes, were selected. All candidate probes were checked against the motherDB and NCBI databases for specificity. Finally, 7,763 probes covering 91.2% (12,601 out of 13,814) of HMMer-confirmed sequences from 75 bacterial genomes and 16 metagenomes were selected. The HMM-Chip is able to detect the diversity and abundance of functional genes, the gene expression of microbial communities, and potentially, the interactions of microorganisms and their hosts.
Identification and characterization of a cis-regulatory element for zygotic gene expression in Chlamydomonas reinhardtii

DOE PAGES

Hamaji, Takashi; Lopez, David; Pellegrini, Matteo; ...

2016-03-26

Upon fertilization Chlamydomonas reinhardtii zygotes undergo a program of differentiation into a diploid zygospore that is accompanied by transcription of hundreds of zygote-specific genes. We identified a distinct sequence motif we term a zygotic response element (ZYRE) that is highly enriched in promoter regions of C. reinhardtii early zygotic genes. A luciferase reporter assay was used to show that native ZYRE motifs within the promoter of zygotic gene ZYS3 or intron of zygotic gene DMT4 are necessary for zygotic induction. A synthetic luciferase reporter with a minimal promoter was used to show that ZYRE motifs introduced upstream are sufficient to confer zygotic upregulation, and that ZYRE-controlled zygotic transcription is dependent on the homeodomain transcription factor GSP1. Furthermore, we predict that ZYRE motifs will correspond to binding sites for the homeodomain proteins GSP1-GSM1 that heterodimerize and activate zygotic gene expression in early zygotes.
Outcome-Driven Cluster Analysis with Application to Microarray Data.

PubMed

Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A

2015-01-01

One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.
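The random effects model described in this record can be written out as a short sketch. The notation is an assumption introduced here for illustration (the abstract gives no symbols): $x_{ij}$ is the expression of gene $j$ in observation $i$, $c(j)$ is the cluster of gene $j$, $u_{ik}$ is the random effect for observation $i$ and cluster $k$, and $y_i$ is the outcome. Any intercept or residual term in the outcome equation is omitted because the abstract does not mention one.

```latex
\begin{align}
  x_{ij} &= u_{i,c(j)} + \varepsilon_{ij},
    & \varepsilon_{ij} &\ \text{independent error terms}, \\
  y_i &= \sum_{k=1}^{K} \beta_k \, u_{ik},
    & K &\ \text{the number of clusters}.
\end{align}
```

Under this reading, genes in the same cluster share the term $u_{ik}$, which makes them correlated with each other, and they influence the outcome only through the common coefficient $\beta_k$.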
Predicting the activity of drugs for a group of imidazopyridine anticoccidial compounds.

PubMed

Si, Hongzong; Lian, Ning; Yuan, Shuping; Fu, Aiping; Duan, Yun-Bo; Zhang, Kejun; Yao, Xiaojun

2009-10-01

Gene expression programming (GEP) is a novel machine learning technique. Here, GEP is used to build a nonlinear quantitative structure-activity relationship (QSAR) model for predicting the IC50 of imidazopyridine anticoccidial compounds. The model is based on descriptors calculated from the molecular structure. Four descriptors are selected from the descriptor pool by the heuristic method (HM) to build a multivariable linear model. The GEP method produced a nonlinear quantitative model with a correlation coefficient and a mean error of 0.96 and 0.24 for the training set, and 0.91 and 0.52 for the test set, respectively. The GEP predictions are in good agreement with the experimental values.
Integrative Analysis Reveals Regulatory Programs in Endometriosis

PubMed Central

Yang, Huan; Kang, Kai; Cheng, Chao; Mamillapalli, Ramanaiah; Taylor, Hugh S.

2015-01-01

Endometriosis is a common gynecological disease found in approximately 10% of reproductive-age women. Gene expression analysis has been performed to explore alterations in gene expression associated with endometriosis; however, the underlying transcription factors (TFs) governing such expression changes have not been investigated in a systematic way. In this study, we propose a method to integrate gene expression with TF binding data and protein–protein interactions to construct an integrated regulatory network (IRN) for endometriosis. The IRN has shown that the most regulated gene in endometriosis is RUNX1, which is targeted by 14 of 26 TFs also involved in endometriosis. Using 2 published cohorts, GSE7305 (Hover, n = 20) and GSE7307 (Roth, n = 36) from the Gene Expression Omnibus database, we identified a network of TFs, which bind to target genes that are differentially expressed in endometriosis. Enrichment analysis based on the hypergeometric distribution allowed us to predict the TFs involved in endometriosis (n = 40). This included known TFs such as androgen receptor (AR) and critical factors in the pathology of endometriosis, estrogen receptor α, and estrogen receptor β. We also identified several new ones from which we selected FOXA2 and TFAP2C, and their regulation was confirmed by quantitative real-time polymerase chain reaction and immunohistochemistry (IHC). Further, our analysis revealed that the function of AR and p53 in endometriosis is regulated by posttranscriptional changes and not by differential gene expression. Our integrative analysis provides new insights into the regulatory programs involved in endometriosis. PMID:26134036
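The enrichment analysis based on the hypergeometric distribution mentioned in this record can be sketched as an upper-tail test. This is a generic illustration, not the authors' code; the function name and argument names are assumptions. Here `N` is the number of genes in the background, `K` the number of targets of one TF, `n` the number of differentially expressed genes, and `k` the number of differentially expressed genes that are also targets.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size, K: successes in the population,
    n: number of draws, k: observed successes among the draws.
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total

if __name__ == "__main__":
    # Toy numbers only: 10 genes, 4 TF targets, 3 DE genes, 2 of them targets.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

A small P-value suggests that the TF's targets are over-represented among the differentially expressed genes. When many TFs are tested, a multiple-testing correction would normally follow; that step is not shown.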
Monthly reservoir inflow forecasting using a new hybrid SARIMA genetic programming approach

NASA Astrophysics Data System (ADS)

Moeeni, Hamid; Bonakdari, Hossein; Ebtehaj, Isa

2017-03-01

Forecasting reservoir inflow is one of the most important components of water resources and hydroelectric systems operation management. Seasonal autoregressive integrated moving average (SARIMA) models have been frequently used for predicting river flow. SARIMA models are linear and do not consider the random component of statistical data. To overcome this shortcoming, monthly inflow is predicted in this study based on a combination of seasonal autoregressive integrated moving average (SARIMA) and gene expression programming (GEP) models, which is a new hybrid method (SARIMA-GEP). To this end, a four-step process is employed. First, the monthly inflow datasets are pre-processed. Second, the datasets are modelled linearly with SARIMA. Third, the non-linearity of the residual series caused by linear modelling is evaluated. Fourth, after the non-linearity is confirmed, the residuals are modelled using a gene expression programming (GEP) method. The proposed hybrid model is employed to predict the monthly inflow to the Jamishan Dam in west Iran. Thirty years' worth of site measurements of monthly reservoir dam inflow with extreme seasonal variations are used. The results of this hybrid model (SARIMA-GEP) are compared with SARIMA, GEP, artificial neural network (ANN) and SARIMA-ANN models. The results indicate that the SARIMA-GEP model (R² = 78.8, VAF = 78.8, RMSE = 0.89, MAPE = 43.4, CRM = 0.053) outperforms SARIMA and GEP, and that SARIMA-ANN (R² = 68.3, VAF = 66.4, RMSE = 1.12, MAPE = 56.6, CRM = 0.032) displays better performance than the SARIMA and ANN models. A comparison of the two hybrid models indicates the superiority of SARIMA-GEP over the SARIMA-ANN model.
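Several of the error measures quoted in this record can be computed with a short sketch. The definitions below are commonly used textbook forms and are assumptions here, since the abstract does not state the exact formulas: RMSE is the root mean square error, MAPE the mean absolute percentage error, CRM the coefficient of residual mass (sum of observed minus sum of predicted, divided by sum of observed), and VAF the variance accounted for, in percent. R² is not computed.

```python
from math import sqrt

def _variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def forecast_metrics(obs, pred):
    """Return RMSE, MAPE (%), CRM and VAF (%) for observed vs predicted values.

    Assumes equal-length sequences and non-zero observations (MAPE divides
    by each observed value).
    """
    n = len(obs)
    err = [o - p for o, p in zip(obs, pred)]
    rmse = sqrt(sum(e * e for e in err) / n)
    mape = 100.0 * sum(abs(e / o) for e, o in zip(err, obs)) / n
    crm = (sum(obs) - sum(pred)) / sum(obs)
    vaf = 100.0 * (1.0 - _variance(err) / _variance(obs))
    return {"RMSE": rmse, "MAPE": mape, "CRM": crm, "VAF": vaf}

if __name__ == "__main__":
    # Toy inflow values, not data from the study.
    print(forecast_metrics([10, 20, 30, 40], [12, 18, 33, 37]))
```

Under these definitions a positive CRM indicates that the model underestimates total inflow and a negative CRM that it overestimates it.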
Development and validation of a gene expression-based signature to predict distant metastasis in locoregionally advanced nasopharyngeal carcinoma: a retrospective, multicentre, cohort study.

PubMed

Tang, Xin-Ran; Li, Ying-Qin; Liang, Shao-Bo; Jiang, Wei; Liu, Fang; Ge, Wen-Xiu; Tang, Ling-Long; Mao, Yan-Ping; He, Qing-Mei; Yang, Xiao-Jing; Zhang, Yuan; Wen, Xin; Zhang, Jian; Wang, Ya-Qin; Zhang, Pan-Pan; Sun, Ying; Yun, Jing-Ping; Zeng, Jing; Li, Li; Liu, Li-Zhi; Liu, Na; Ma, Jun

2018-03-01

Gene expression patterns can be used as prognostic biomarkers in various types of cancers. We aimed to identify a gene expression pattern for individual distant metastatic risk assessment in patients with locoregionally advanced nasopharyngeal carcinoma. In this multicentre, retrospective, cohort analysis, we included 937 patients with locoregionally advanced nasopharyngeal carcinoma from three Chinese hospitals: the Sun Yat-sen University Cancer Center (Guangzhou, China), the Affiliated Hospital of Guilin Medical University (Guilin, China), and the First People's Hospital of Foshan (Foshan, China). Using microarray analysis, we profiled mRNA gene expression between 24 paired locoregionally advanced nasopharyngeal carcinoma tumours from patients at Sun Yat-sen University Cancer Center with or without distant metastasis after radical treatment. Differentially expressed genes were examined using digital expression profiling in a training cohort (Guangzhou training cohort; n=410) to build a gene classifier using a penalised regression model. We validated the prognostic accuracy of this gene classifier in an internal validation cohort (Guangzhou internal validation cohort, n=204) and two external independent cohorts (Guilin cohort, n=165; Foshan cohort, n=158). The primary endpoint was distant metastasis-free survival. Secondary endpoints were disease-free survival and overall survival. We identified 137 differentially expressed genes between metastatic and non-metastatic locoregionally advanced nasopharyngeal carcinoma tissues. A distant metastasis gene signature for locoregionally advanced nasopharyngeal carcinoma (DMGN) that consisted of 13 genes was generated to classify patients into high-risk and low-risk groups in the training cohort. 
Patients with high-risk scores in the training cohort had shorter distant metastasis-free survival (hazard ratio [HR] 4·93, 95% CI 2·99-8·16; p<0·0001), disease-free survival (HR 3·51, 2·43-5·07; p<0·0001), and overall survival (HR 3·22, 2·18-4·76; p<0·0001) than patients with low-risk scores. The prognostic accuracy of DMGN was validated in the internal and external cohorts. Furthermore, among patients with low-risk scores in the combined training and internal cohorts, concurrent chemotherapy improved distant metastasis-free survival compared with those patients who did not receive concurrent chemotherapy (HR 0·40, 95% CI 0·19-0·83; p=0·011), whereas patients with high-risk scores did not benefit from concurrent chemotherapy (HR 1·03, 0·71-1·50; p=0·876). This was also validated in the two external cohorts combined. We developed a nomogram based on the DMGN and other variables that predicted an individual's risk of distant metastasis, which was strengthened by adding Epstein-Barr virus DNA status. The DMGN is a reliable prognostic tool for distant metastasis in patients with locoregionally advanced nasopharyngeal carcinoma and might be able to predict which patients benefit from concurrent chemotherapy. It has the potential to guide treatment decisions for patients at different risk of distant metastasis. Funding: The National Natural Science Foundation of China, the National Science & Technology Pillar Program during the Twelfth Five-year Plan Period, the Natural Science Foundation of Guang Dong Province, the National Key Research and Development Program of China, the Innovation Team Development Plan of the Ministry of Education, the Health & Medical Collaborative Innovation Project of Guangzhou City, China, and the Program of Introducing Talents of Discipline to Universities. Copyright © 2018 Elsevier Ltd. All rights reserved.
Musashi2 sustains the mixed-lineage leukemia–driven stem cell regulatory program

PubMed Central

Park, Sun-Mi; Gönen, Mithat; Vu, Ly; Minuesa, Gerard; Tivnan, Patrick; Barlowe, Trevor S.; Taggart, James; Lu, Yuheng; Deering, Raquel P.; Hacohen, Nir; Figueroa, Maria E.; Paietta, Elisabeth; Fernandez, Hugo F.; Tallman, Martin S.; Melnick, Ari; Levine, Ross; Leslie, Christina; Lengner, Christopher J.; Kharas, Michael G.

2015-01-01

Leukemia stem cells (LSCs) are found in most aggressive myeloid diseases and contribute to therapeutic resistance. Leukemia cells exhibit a dysregulated developmental program as the result of genetic and epigenetic alterations. Overexpression of the RNA-binding protein Musashi2 (MSI2) has been previously shown to predict poor survival in leukemia. Here, we demonstrated that conditional deletion of Msi2 in the hematopoietic compartment results in delayed leukemogenesis, reduced disease burden, and a loss of LSC function in a murine leukemia model. Gene expression profiling of these Msi2-deficient animals revealed a loss of the hematopoietic/leukemic stem cell self-renewal program and an increase in the differentiation program. In acute myeloid leukemia patients, the presence of a gene signature that was similar to that observed in Msi2-deficient murine LSCs correlated with improved survival. We determined that MSI2 directly maintains the mixed-lineage leukemia (MLL) self-renewal program by interacting with and retaining efficient translation of Hoxa9, Myc, and Ikzf2 mRNAs. Moreover, depletion of MLL target Ikzf2 in LSCs reduced colony formation, decreased proliferation, and increased apoptosis. Our data provide evidence that MSI2 controls efficient translation of the oncogenic LSC self-renewal program and suggest MSI2 as a potential therapeutic target for myeloid leukemia. PMID:25664853
Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM).

PubMed

Skinnider, Michael A; Dejong, Chris A; Rees, Philip N; Johnston, Chad W; Li, Haoxin; Webster, Andrew L H; Wyatt, Morgan A; Magarvey, Nathan A

2015-11-16

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Initial experience with GeneXpert MTB/RIF assay in the Arkansas Tuberculosis Control Program.

PubMed

Patil, Naveen; Saba, Hamida; Marco, Asween; Samant, Rohan; Mukasa, Leonard

2014-01-01

Mycobacterium tuberculosis remains one of the most significant causes of death from an infectious agent. Rapid and accurate diagnosis of pulmonary and extra-pulmonary tuberculosis (TB) is still a great challenge. The GeneXpert MTB/RIF assay is a novel integrated diagnostic system for the diagnosis of tuberculosis and rapid detection of Rifampin (RIF) resistance in clinical specimens. In 2012, the Arkansas Tuberculosis Control Program introduced the GeneXpert MTB/RIF assay to replace the labour-intensive Mycobacterium Tuberculosis Direct (MTD) assay, with the aim of diagnosing TB within two hours and simultaneously detecting RIF resistance. The objectives of this report are to describe the procedure used to introduce the GeneXpert MTB/RIF assay in the Arkansas Tuberculosis Control Program; to characterise the current gap in rapid M. tuberculosis diagnosis in Arkansas; to assess factors that predict acid fast bacilli (AFB) smear-negative but culture-positive cases in Arkansas; and to illustrate, with two case reports, the role of the GeneXpert MTB/RIF assay in reducing the time to confirmation of M. tuberculosis diagnosis in the first year of implementation. Between June 2012 and June 2013, all AFB sputum smear-positive cases and any others, on request by the physician, had the GeneXpert MTB/RIF assay performed as well as traditional M. tuberculosis culture and susceptibilities using Mycobacteria Growth Indicator Tube (MGIT) 960 and Löwenstein-Jensen (LJ) slants. Surveillance data for January 2009-June 2013 were analysed to characterise sputum smear-negative but culture-positive cases. Seventy-one TB cases were reported from June 2012-June 2013. The GeneXpert MTB/RIF assay identified all culture-positive cases as well as three cases that were negative on culture. Also, this rapid assay identified all six smear-negative but M. tuberculosis culture-positive cases; two of these cases are described as case reports. 
GeneXpert MTB/RIF assay has made rapid TB diagnosis possible, with tremendous potential in determining isolation of TB suspects on one hand, and quickly ruling out TB whenever suspected.
Evolutionary origins of the endocannabinoid system.

PubMed

McPartland, John M; Matias, Isabel; Di Marzo, Vincenzo; Glass, Michelle

2006-03-29

Endocannabinoid system evolution was estimated by searching for functional orthologs in the genomes of twelve phylogenetically diverse organisms: Homo sapiens, Mus musculus, Takifugu rubripes, Ciona intestinalis, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Arabidopsis thaliana, Plasmodium falciparum, Tetrahymena thermophila, Archaeoglobus fulgidus, and Mycobacterium tuberculosis. Sequences similar to human endocannabinoid exon sequences were derived from filtered BLAST searches, and subjected to phylogenetic testing with ClustalX and tree building programs. Monophyletic clades that agreed with broader phylogenetic evidence (i.e., gene trees displaying topographical congruence with species trees) were considered orthologs. The capacity of orthologs to function as endocannabinoid proteins was predicted with pattern profilers (Pfam, Prosite, TMHMM, and pSORT), and by examining queried sequences for amino acid motifs known to serve critical roles in endocannabinoid protein function (obtained from a database of site-directed mutagenesis studies). This novel transfer of functional information onto gene trees enabled us to better predict the functional origins of the endocannabinoid system. Within this limited number of twelve organisms, the endocannabinoid genes exhibited heterogeneous evolutionary trajectories, with functional orthologs limited to mammals (TRPV1 and GPR55), or vertebrates (CB2 and DAGLbeta), or chordates (MAGL and COX2), or animals (DAGLalpha and CB1-like receptors), or opisthokonta (animals and fungi, NAPE-PLD), or eukaryotes (FAAH). Our methods identified fewer orthologs than did automated annotation systems, such as HomoloGene. Phylogenetic profiles, nonorthologous gene displacement, functional convergence, and coevolution are discussed.
Combinatorial therapy discovery using mixed integer linear programming.

PubMed

Pang, Kaifang; Wan, Ying-Wooi; Choi, William T; Donehower, Lawrence A; Sun, Jingchun; Pant, Dhruv; Liu, Zhandong

2014-05-15

Combinatorial therapies play increasingly important roles in combating complex diseases. Owing to the huge cost associated with experimental methods in identifying optimal drug combinations, computational approaches can provide a guide to limit the search space and reduce cost. However, few computational approaches have been developed for this purpose, and thus there is a great need for new algorithms for drug combination prediction. Here we propose to formulate the optimal combinatorial therapy problem as two complementary mathematical problems, Balanced Target Set Cover (BTSC) and Minimum Off-Target Set Cover (MOTSC). Given a disease gene set, BTSC seeks a balanced solution that maximizes the coverage of the disease genes and minimizes the off-target hits at the same time. MOTSC seeks full coverage of the disease gene set while minimizing the off-target set. In simulations, both BTSC and MOTSC ran much faster than exhaustive search with the same accuracy. When applied to real disease gene sets, our algorithms not only identified known drug combinations, but also predicted novel drug combinations that are worth further testing. In addition, we developed a web-based tool to allow users to iteratively search for optimal drug combinations given a user-defined gene set. Our tool is freely available for noncommercial use at http://www.drug.liuzlab.org/. Contact: zhandong.liu@bcm.edu. Supplementary data are available at Bioinformatics online.
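The MOTSC objective described in this record can be made concrete with a toy sketch. It uses plain exhaustive search, which the abstract names only as the slower baseline; it is not the authors' mixed integer linear programming formulation. The function name, the tie-breaking rule (fewest drugs among equally good solutions) and the drug and gene names are hypothetical.

```python
from itertools import combinations

def motsc_exhaustive(drug_targets, disease_genes):
    """Find a drug set covering every disease gene with the fewest off-targets.

    drug_targets: dict mapping drug name -> set of target genes.
    disease_genes: iterable of genes that must all be covered.
    Returns (drugs, off_target_genes), or None if full coverage is impossible.
    Ties on off-target count are broken by the smaller number of drugs.
    """
    disease = set(disease_genes)
    drugs = sorted(drug_targets)
    best = None
    for r in range(1, len(drugs) + 1):
        for combo in combinations(drugs, r):
            hit = set().union(*(drug_targets[d] for d in combo))
            if not disease <= hit:
                continue  # some disease gene is left uncovered
            off = hit - disease
            key = (len(off), len(combo))
            if best is None or key < best[0]:
                best = (key, combo, off)
    return None if best is None else (best[1], best[2])

if __name__ == "__main__":
    # Hypothetical drugs and genes, for illustration only.
    targets = {
        "drugA": {"g1", "g2", "x1", "x2"},
        "drugB": {"g1", "x1"},
        "drugC": {"g2", "g3"},
        "drugD": {"g3", "x3"},
    }
    print(motsc_exhaustive(targets, {"g1", "g2", "g3"}))
```

The search visits every subset of drugs, so its cost grows exponentially with the number of drugs. That growth is the reason an integer programming formulation is attractive for realistic drug libraries.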
Neuronal Susceptibility to GRIM in Drosophila melanogaster Measures the Rate of Genetic Changes that Scale to Lifespan

PubMed Central

Bedoukian, Matthew A.; Rodriguez, Sarah M.; Cohen, Matthew B.; Duncan Smith, Stuart V.; Park, Jennifer

2009-01-01

Gene expression in Drosophila melanogaster changes significantly throughout life and some of these changes can be delayed by lowering ambient temperature and also by dietary restriction. These two interventions are known to slow the rate of aging as well as the accumulation of damage. It is unknown, however, whether gene expression changes that occur during development and early adult life make an animal more vulnerable to death. Here we develop a method capable of measuring the rate of programmed genetic changes during young adult life in Drosophila melanogaster and show that these changes can be delayed or accelerated in a manner that is predictive of longevity. We show that temperature shifts and dietary restriction, which slow the rate of aging in Drosophila melanogaster, extend the window of neuronal susceptibility to GRIM over-expression in a way that scales to lifespan. We propose that this susceptibility can be used to test compounds and genetic manipulations that alter the onset of senescence by changing the programmed timing of gene expression that correlates and may be causal to aging. PMID:19428445
A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

PubMed

Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

2015-01-01

Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.
A Hybrid Approach of Gene Sets and Single Genes for the Prediction of Survival Risks with Gene Expression Data

PubMed Central

Seok, Junhee; Davis, Ronald W.; Xiao, Wenzhong

2015-01-01

Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn’t been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge. PMID:25933378
Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services: an example with ChIP-chip data.

PubMed

Sand, Olivier; Thomas-Chollier, Morgane; Vervisch, Eric; van Helden, Jacques

2008-01-01

This protocol shows how to access the Regulatory Sequence Analysis Tools (RSAT) via a programmatic interface in order to automate the analysis of multiple data sets. We describe the steps for writing a Perl client that connects to the RSAT Web services and implements a workflow to discover putative cis-acting elements in promoters of gene clusters. In the presented example, we apply this workflow to lists of transcription factor target genes resulting from ChIP-chip experiments. For each factor, the protocol predicts the binding motifs by detecting significantly overrepresented hexanucleotides in the target promoters and generates a feature map that displays the positions of putative binding sites along the promoter sequences. This protocol is addressed to bioinformaticians and biologists with programming skills (notions of Perl). Running time is approximately 6 min on the example data set.
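The idea of detecting over-represented hexanucleotides in promoters can be sketched locally. The protocol itself uses a Perl client that calls the RSAT Web services; the sketch below does not call RSAT and is not its algorithm. It counts hexamers on one strand and scores each with a binomial upper-tail probability against a background frequency. The function names, the uniform default background and the `alpha` cut-off are assumptions, and no multiple-testing correction is applied.

```python
from collections import Counter
from math import comb

def count_kmers(seqs, k=6):
    """Count all k-mers made only of A, C, G, T in the given sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def binomial_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(x, n + 1))

def overrepresented(seqs, background_freq=None, k=6, alpha=1e-4):
    """Return (kmer, count, p_value) tuples with p_value < alpha, best first.

    background_freq: optional dict kmer -> expected frequency; k-mers not
    listed fall back to a uniform 0.25**k (an assumption of this sketch).
    """
    background_freq = background_freq or {}
    counts = count_kmers(seqs, k)
    total = sum(counts.values())
    results = []
    for kmer, c in counts.items():
        p = background_freq.get(kmer, 0.25 ** k)
        pval = binomial_tail(total, c, p)
        if pval < alpha:
            results.append((kmer, c, pval))
    return sorted(results, key=lambda r: r[2])

if __name__ == "__main__":
    # Toy promoter fragments sharing the hexamer CACGTG.
    promoters = ["TTCACGTGAA", "GGCACGTGCC", "AACACGTGTT"]
    for kmer, count, pval in overrepresented(promoters):
        print(kmer, count, pval)
```

In practice the background frequencies would come from a calibrated model, for example all promoters of the same organism, rather than from the uniform fallback used here.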
Feed-forward transcriptional programming by nuclear receptors: regulatory principles and therapeutic implications.

PubMed

Sasse, Sarah K; Gerber, Anthony N

2015-01-01

Nuclear receptors (NRs) are widely targeted to treat a range of human diseases. Feed-forward loops are an ancient mechanism through which single cell organisms organize transcriptional programming and modulate gene expression dynamics, but they have not been systematically studied as a regulatory paradigm for NR-mediated transcriptional responses. Here, we provide an overview of the basic properties of feed-forward loops as predicted by mathematical models and validated experimentally in single cell organisms. We review existing evidence implicating feed-forward loops as important in controlling clinically relevant transcriptional responses to estrogens, progestins, and glucocorticoids, among other NR ligands. We propose that feed-forward transcriptional circuits are a major mechanism through which NRs integrate signals, exert temporal control over gene regulation, and compartmentalize client transcriptomes into discrete subunits. Implications for the design and function of novel selective NR ligands are discussed. Copyright © 2014 Elsevier Inc. All rights reserved.
Synthetic mixed-signal computation in living cells

PubMed Central

Rubens, Jacob R.; Selvaggio, Gianluca; Lu, Timothy K.

2016-01-01

Living cells implement complex computations on the continuous environmental signals that they encounter. These computations involve both analogue- and digital-like processing of signals to give rise to complex developmental programs, context-dependent behaviours and homeostatic activities. In contrast to natural biological systems, synthetic biological systems have largely focused on either digital or analogue computation separately. Here we integrate analogue and digital computation to implement complex hybrid synthetic genetic programs in living cells. We present a framework for building comparator gene circuits to digitize analogue inputs based on different thresholds. We then demonstrate that comparators can be predictably composed together to build band-pass filters, ternary logic systems and multi-level analogue-to-digital converters. In addition, we interface these analogue-to-digital circuits with other digital gene circuits to enable concentration-dependent logic. We expect that this hybrid computational paradigm will enable new industrial, diagnostic and therapeutic applications with engineered cells. PMID:27255669
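The composition of comparators into band-pass filters and multi-level analogue-to-digital converters can be illustrated with an idealized Boolean sketch. It abstracts away all biology: a comparator is treated as a sharp threshold, whereas real gene circuits have graded responses. Function names and threshold values are assumptions made for illustration.

```python
def comparator(x, threshold):
    """Idealized comparator: True when the analogue input reaches the threshold."""
    return x >= threshold

def band_pass(x, low, high):
    """True only for inputs in [low, high): above the low comparator's
    threshold and below the high comparator's threshold."""
    return comparator(x, low) and not comparator(x, high)

def adc_level(x, thresholds):
    """Multi-level digitization: the number of thresholds the input has reached.

    Two thresholds give three output levels (0, 1, 2), i.e. a ternary output.
    """
    return sum(comparator(x, t) for t in thresholds)

if __name__ == "__main__":
    for conc in (1, 5, 9):  # hypothetical inducer concentrations
        print(conc, band_pass(conc, 2, 8), adc_level(conc, [2, 8]))
```

Running the loop shows the middle concentration passing the band-pass filter while the low and high concentrations do not, and the digitized level stepping through 0, 1 and 2.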
Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.

PubMed

Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J

2007-09-21

Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. 
Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
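A weighted-voting combination of classifiers can be sketched as follows. The abstract does not say how the weights were chosen, so this example assumes each classifier outputs a score in [0, 1] and is weighted by a hypothetical cross-validated accuracy. The names `weighted_vote` and `predict_stress_response` are illustrative only.

```python
def weighted_vote(scores, weights):
    """Combine per-classifier scores into one weighted average."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed per classifier score")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def predict_stress_response(scores, weights, cutoff=0.5):
    """Call a gene stress-responsive when the combined score reaches the cutoff."""
    return weighted_vote(scores, weights) >= cutoff


if __name__ == "__main__":
    # Hypothetical scores from three base classifiers for one gene,
    # and hypothetical weights (e.g. cross-validated accuracies).
    scores = [0.9, 0.4, 0.7]
    weights = [0.8, 0.6, 0.6]
    print(weighted_vote(scores, weights), predict_stress_response(scores, weights))
```

Classifiers that perform poorly in cross-validation receive less influence, which is one way a combined predictor can be more robust than any single base method.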
TargetCompare: A web interface to compare simultaneous miRNAs targets

PubMed Central

Moreira, Fabiano Cordeiro; Dustan, Bruno; Hamoy, Igor G; Ribeiro-dos-Santos, André M; dos Santos, Ândrea Ribeiro

2014-01-01

MicroRNAs (miRNAs) are small non-coding nucleotide sequences between 17 and 25 nucleotides in length that primarily function in the regulation of gene expression. A single miRNA has thousands of predicted targets in a complex regulatory cell signaling network. Therefore, it is of interest to study multiple target genes simultaneously. Hence, we describe a web tool (developed using the Java programming language and a MySQL database server) to analyse multiple targets of pre-selected miRNAs. We cross-validated the tool on the eight most highly expressed miRNAs in the antrum region of the stomach. This helped to identify 43 potential genes that are targets of at least six of these miRNAs. The developed tool aims to reduce randomness and increase the chance of selecting strong candidate target genes and miRNAs that play important roles in the studied tissue. Availability: http://lghm.ufpa.br/targetcompare PMID:25352731

Modeling phenotypic metabolic adaptations of Mycobacterium tuberculosis H37Rv under hypoxia.

PubMed

Fang, Xin; Wallqvist, Anders; Reifman, Jaques

2012-01-01

The ability to adapt to different conditions is key for Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), to successfully infect human hosts. Adaptations allow the organism to evade the host immune responses during acute infections and persist for an extended period of time during the latent infectious stage. In latently infected individuals, estimated to include one-third of the human population, the organism exists in a variety of metabolic states, which impedes the development of a simple strategy for controlling or eradicating this disease. Direct knowledge of the metabolic states of M. tuberculosis in patients would aid in the management of the disease as well as in forming the basis for developing new drugs and designing more efficacious drug cocktails. Here, we propose an in silico approach to create state-specific models based on readily available gene expression data. The coupling of differential gene expression data with a metabolic network model allowed us to characterize the metabolic adaptations of M. tuberculosis H37Rv to hypoxia. Given the microarray data for the alterations in gene expression, our model predicted reduced oxygen uptake, ATP production changes, and a global change from an oxidative to a reductive tricarboxylic acid (TCA) program. Alterations in the biomass composition indicated an increase in the cell wall metabolites required for cell-wall growth, as well as heightened accumulation of triacylglycerol in preparation for a low-nutrient, low metabolic activity life style. In contrast, the gene expression program in the deletion mutant of dosR, which encodes the immediate hypoxic response regulator, failed to adapt to low-oxygen stress. Our predictions were compatible with recent experimental observations of M. tuberculosis activity under hypoxic and anaerobic conditions. 
Importantly, alterations in the flow and accumulation of a particular metabolite were not necessarily directly linked to differential gene expression of the enzymes catalyzing the related metabolic reactions.
Deep Sequencing Reveals the Effect of MeJA on Scutellarin Biosynthesis in Erigeron breviscapus

PubMed Central

Xiao, Ying; Zhang, Feng; Chen, Jun-feng; Ji, Qian; Tan, He-Xin; Huang, Xin; Feng, Hao; Huang, Bao-Kang; Chen, Wan-Sheng; Zhang, Lei

2015-01-01

Background Erigeron breviscapus, a well-known traditional Chinese medicinal herb, is broadly used in the treatment of cerebrovascular disease. Scutellarin, a flavonoid, is considered the material basis of the pharmaceutical activities of E. breviscapus. A stable and high content of scutellarin is critical for the quality and efficiency of E. breviscapus in clinical use. Therefore, understanding the molecular mechanism of scutellarin biosynthesis is crucial for metabolic engineering to increase the content of the active compound. However, virtually no genetic studies of scutellarin biosynthesis in E. breviscapus are available yet. Results Using Illumina sequencing technology, we obtained over three billion bases of high-quality sequence data and conducted de novo assembly and annotation without prior genome information. A total of 182,527 unigenes (mean length = 738 bp) were found, of which 63,059 were functionally annotated with a cut-off E-value of 10^-5. Next, a total of 238 (200 up-regulated and 38 down-regulated) and 513 (375 up-regulated and 138 down-regulated) differentially expressed genes were identified at different time points after methyl jasmonate (MeJA) treatment. These genes fell into the ‘metabolic process’ and ‘cellular process’ categories of the GO database, suggesting that MeJA-induced signal pathway activity in the plant mainly led to re-programming of metabolism and cell activity. In addition, 13 predicted genes that might participate in the metabolism of flavonoids were found by two co-expression analyses in E. breviscapus. Conclusions Our study is the first to provide a transcriptome sequence resource for E. breviscapus plants after MeJA treatment, and it reveals transcriptome re-programming upon elicitation. As a result, several putative unknown genes involved in the metabolism of flavonoids were predicted. 
These data provide a valuable resource for genetic and genomic studies of special flavonoid metabolism and for further metabolic engineering in E. breviscapus. PMID:26656917
A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina

PubMed Central

2010-01-01

Background The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Probe design strategy should cope with the limited accuracy of de novo gene prediction programs, and annotation up-dating. We present a novel in silico procedure which addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify optimal probes in the in silico outcome. Findings We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to coding sequence end and intron position. This latter criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a sub-set of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied for the development of a gene expression Agilent platform for the filamentous fungus Podospora anserina and the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS. Conclusions A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis. PMID:20565839
Suicide and the selfish gene.

PubMed

Satora, Leszek

2005-01-01

The application of an evolutionary perspective to human behaviour generates philosophical, political and scientific controversy. Modern human symbolic consciousness is not the culmination of the long trend that natural selection would predict. New archaeological data suggest that anatomical and behavioural innovation has been episodic and rare, separated by long periods of stagnation. The new behavioural mode and the new skeletal structure of modern humans arose as an incidental exaptation. Additionally, the genetic basis of dysfunction connected with suicidal behaviour, and the growing suicide statistics among teenagers, contradict the theory that our behaviour is programmed in any detail by selfish genes. In that case, genetically determined suicidal behaviour should be rapidly eliminated by natural selection.
Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics

PubMed Central

del Val, Coral; Rivas, Elena; Torres-Quesada, Omar; Toro, Nicolás; Jiménez-Zurdo, José I

2007-01-01

Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related α-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5′-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of α-proteobacteria with their eukaryotic hosts. PMID:17971083
Automated Discovery of Functional Generality of Human Gene Expression Programs

PubMed Central

Gerber, Georg K; Dowell, Robin D; Jaakkola, Tommi S; Gifford, David K

2007-01-01

An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. 
We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high generality programs may maintain common physiological responses that go awry in disease states. Further, our method is multipurpose, and can be applied readily to novel compendia of biological data. PMID:17696603
Prediction of epigenetically regulated genes in breast cancer cell lines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen

Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis. Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation. 
Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
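The permutation step mentioned in this abstract can be sketched in a simplified form. The example below is not the authors' pipeline. It swaps in a plain Pearson correlation for the logistic-regression ranking and estimates a one-sided p-value for a negative methylation-expression association by shuffling the expression values across cell lines. All data values and function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # assumes neither input is constant


def permutation_pvalue(methylation, expression, n_perm=2000, seed=0):
    """One-sided p-value: how often a shuffled pairing is at least as negative."""
    rng = random.Random(seed)
    observed = pearson(methylation, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(methylation, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical methylation levels and expression values for 8 cell lines.
    meth = [0.10, 0.20, 0.30, 0.50, 0.60, 0.80, 0.90, 0.95]
    expr = [9.0, 8.5, 8.7, 6.0, 5.5, 3.0, 2.5, 2.0]
    print(permutation_pvalue(meth, expr))
```

Shuffling breaks the pairing between cell lines while preserving each variable's distribution, so the resulting p-value does not rely on a parametric null model.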
Evidence That Up-Regulation of MicroRNA-29 Contributes to Postnatal Body Growth Deceleration

PubMed Central

Kamran, Fariha; Andrade, Anenisia C.; Nella, Aikaterini A.; Clokie, Samuel J.; Rezvani, Geoffrey; Nilsson, Ola; Baron, Jeffrey

2015-01-01

Body growth is rapid in infancy but subsequently slows and eventually ceases due to a progressive decline in cell proliferation that occurs simultaneously in multiple organs. We previously showed that this decline in proliferation is driven in part by postnatal down-regulation of a large set of growth-promoting genes in multiple organs. We hypothesized that this growth-limiting genetic program is orchestrated by microRNAs (miRNAs). Bioinformatic analysis identified target sequences of the miR-29 family of miRNAs to be overrepresented in age–down-regulated genes. Concomitantly, expression microarray analysis in mouse kidney and lung showed that all members of the miR-29 family, miR-29a, -b, and -c, were strongly up-regulated from 1 to 6 weeks of age. Real-time PCR confirmed that miR-29a, -b, and -c were up-regulated with age in liver, kidney, lung, and heart, and their expression levels were higher in hepatocytes isolated from 5-week-old mice than in hepatocytes from embryonic mouse liver at embryonic day 16.5. We next focused on 3 predicted miR-29 target genes (Igf1, Imp1, and Mest), all of which are growth-promoting. A 3′-untranslated region containing the predicted target sequences from each gene was placed individually in a luciferase reporter construct. Transfection of miR-29 mimics suppressed luciferase gene activity for all 3 genes, and this suppression was diminished by mutating the target sequences, suggesting that these genes are indeed regulated by miR-29. Taken together, the findings suggest that up-regulation of miR-29 during juvenile life drives the down-regulation of multiple growth-promoting genes, thus contributing to physiological slowing and eventual cessation of body growth. PMID:25866874
Evidence That Up-Regulation of MicroRNA-29 Contributes to Postnatal Body Growth Deceleration.

PubMed

Kamran, Fariha; Andrade, Anenisia C; Nella, Aikaterini A; Clokie, Samuel J; Rezvani, Geoffrey; Nilsson, Ola; Baron, Jeffrey; Lui, Julian C

2015-06-01

Body growth is rapid in infancy but subsequently slows and eventually ceases due to a progressive decline in cell proliferation that occurs simultaneously in multiple organs. We previously showed that this decline in proliferation is driven in part by postnatal down-regulation of a large set of growth-promoting genes in multiple organs. We hypothesized that this growth-limiting genetic program is orchestrated by microRNAs (miRNAs). Bioinformatic analysis identified target sequences of the miR-29 family of miRNAs to be overrepresented in age-down-regulated genes. Concomitantly, expression microarray analysis in mouse kidney and lung showed that all members of the miR-29 family, miR-29a, -b, and -c, were strongly up-regulated from 1 to 6 weeks of age. Real-time PCR confirmed that miR-29a, -b, and -c were up-regulated with age in liver, kidney, lung, and heart, and their expression levels were higher in hepatocytes isolated from 5-week-old mice than in hepatocytes from embryonic mouse liver at embryonic day 16.5. We next focused on 3 predicted miR-29 target genes (Igf1, Imp1, and Mest), all of which are growth-promoting. A 3'-untranslated region containing the predicted target sequences from each gene was placed individually in a luciferase reporter construct. Transfection of miR-29 mimics suppressed luciferase gene activity for all 3 genes, and this suppression was diminished by mutating the target sequences, suggesting that these genes are indeed regulated by miR-29. Taken together, the findings suggest that up-regulation of miR-29 during juvenile life drives the down-regulation of multiple growth-promoting genes, thus contributing to physiological slowing and eventual cessation of body growth.
Prediction of chemo-response in serous ovarian cancer.

PubMed

Gonzalez Bosquet, Jesus; Newtson, Andreea M; Chung, Rebecca K; Thiel, Kristina W; Ginader, Timothy; Goodheart, Michael J; Leslie, Kimberly K; Smith, Brian J

2016-10-19

Nearly one-third of serous ovarian cancer (OVCA) patients will not respond to initial treatment with surgery and chemotherapy and die within one year of diagnosis. If patients who are unlikely to respond to current standard therapy can be identified up front, enhanced tumor analyses and treatment regimens could potentially be offered. Using the Cancer Genome Atlas (TCGA) serous OVCA database, we previously identified a robust molecular signature of 422 genes associated with chemo-response. Our objective was to test whether this signature is an accurate and sensitive predictor of chemo-response in serous OVCA. We first constructed prediction models to predict chemo-response using our previously described 422-gene signature that was associated with response to treatment in serous OVCA. Performance of all prediction models was measured with the area under the curve (AUC, a measure of the model's accuracy) and its respective confidence interval (CI). To optimize the prediction process, we determined which elements of the signature most contributed to chemo-response prediction. All prediction models were replicated and validated using six publicly available independent gene expression datasets. The 422-gene signature prediction models predicted chemo-response with AUCs of ~70%. Optimization of prediction models identified the 34 most important genes in chemo-response prediction. These 34-gene models had improved performance, with AUCs approaching 80%. Both 422-gene and 34-gene prediction models were replicated and validated in six independent datasets. These prediction models serve as the foundation for the future development and implementation of a diagnostic tool to predict response to chemotherapy for serous OVCA patients.
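For readers unfamiliar with the AUC metric used in this abstract, the short sketch below computes it directly from its rank interpretation: the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, with ties counted as one half. The scores and labels are made up for illustration and have no connection to the study data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical model scores; label 1 = responder, 0 = non-responder.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of roughly 70% and 80% in context.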
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically

PubMed Central

Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel

2015-01-01

Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequence predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effectors evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082
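The idea of treating each repeat as a single unit can be sketched as follows. This is only an illustration of the general approach, not DisTAL's actual scoring scheme, which the abstract does not describe. Each distinct repeat string is mapped to an integer code, and two effectors are then compared by a plain edit distance over the coded sequences. For brevity the toy data use two-letter RVD labels in place of full repeat sequences.

```python
def encode_repeats(effectors):
    """Map each distinct repeat string to an integer code shared across effectors."""
    codes = {}
    return [[codes.setdefault(r, len(codes)) for r in repeats] for repeats in effectors]


def edit_distance(a, b):
    """Levenshtein distance between two coded-repeat sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # drop a repeat from a
                            curr[j - 1] + 1,          # insert a repeat from b
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical effectors; RVD labels stand in for whole repeats.
    tal_a = ["HD", "NI", "NG", "NN"]
    tal_b = ["HD", "NI", "NN"]
    tal_c = ["HD", "NG", "NG", "NN"]
    coded = encode_repeats([tal_a, tal_b, tal_c])
    print([[edit_distance(p, q) for q in coded] for p in coded])
```

A pairwise distance matrix of this kind is the usual input to a distance-based tree-building method such as neighbor joining.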
Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

PubMed Central

Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

2011-01-01

We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs were validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875
Changing the Game: Using Integrative Genomics to Probe Virulence Mechanisms of the Stem Rust Pathogen Puccinia graminis f. sp. tritici.

PubMed

Figueroa, Melania; Upadhyaya, Narayana M; Sperschneider, Jana; Park, Robert F; Szabo, Les J; Steffenson, Brian; Ellis, Jeff G; Dodds, Peter N

2016-01-01

The recent resurgence of wheat stem rust caused by new virulent races of Puccinia graminis f. sp. tritici (Pgt) poses a threat to food security. These concerns have catalyzed an extensive global effort toward controlling this disease. Substantial research and breeding programs target the identification and introduction of new stem rust resistance (Sr) genes in cultivars for genetic protection against the disease. Such resistance genes typically encode immune receptor proteins that recognize specific components of the pathogen, known as avirulence (Avr) proteins. A significant drawback to deploying cultivars with single Sr genes is that they are often overcome by evolution of the pathogen to escape recognition through alterations in Avr genes. Thus, a key element in achieving durable rust control is the deployment of multiple effective Sr genes in combination, either through conventional breeding or transgenic approaches, to minimize the risk of resistance breakdown. In this situation, evolution of pathogen virulence would require changes in multiple Avr genes in order to bypass recognition. However, choosing the optimal Sr gene combinations to deploy is a challenge that requires detailed knowledge of the pathogen Avr genes with which they interact and the virulence phenotypes of Pgt existing in nature. Identifying specific Avr genes from Pgt will provide screening tools to enhance pathogen virulence monitoring, assess heterozygosity and propensity for mutation in pathogen populations, and confirm individual Sr gene functions in crop varieties carrying multiple effective resistance genes. Toward this goal, much progress has been made in assembling a high quality reference genome sequence for Pgt, as well as a Pan-genome encompassing variation between multiple field isolates with diverse virulence spectra. 
In turn this has allowed prediction of Pgt effector gene candidates based on known features of Avr genes in other plant pathogens, including the related flax rust fungus. Upregulation of gene expression in haustoria and evidence for diversifying selection are two useful parameters to identify candidate Avr genes. Recently, we have also applied machine learning approaches to agnostically predict candidate effectors. Here, we review progress in stem rust pathogenomics and approaches currently underway to identify Avr genes recognized by wheat Sr genes.
SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

PubMed Central

2014-01-01

Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894
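The accuracy figures quoted in this abstract can be made concrete with a small sketch. It assumes the convention common in the gene-finding literature, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP), i.e. what is elsewhere called precision. The abstract does not state which definition or matching level SnowyOwl used. Here a predicted gene counts as correct only if its coordinates and strand match a reference gene exactly, and all coordinates are hypothetical.

```python
def gene_level_accuracy(predicted, reference):
    """Return (sensitivity, specificity) for exact-match gene predictions."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference)   # fraction of real genes found
    specificity = true_positives / len(predicted)   # fraction of predictions that are real
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical genes as (start, end, strand) tuples.
    reference = [(100, 900, "+"), (1500, 2400, "-"), (3000, 3800, "+"), (5000, 6200, "+")]
    predicted = [(100, 900, "+"), (1500, 2400, "-"), (3000, 3750, "+"),
                 (5000, 6200, "+"), (7000, 7600, "-")]
    print(gene_level_accuracy(predicted, reference))
```

Running a predictor many times with varied parameters tends to raise sensitivity at the cost of extra false positives, which is why a later selection step based on homology and RNA-Seq agreement matters for specificity.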
Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison

PubMed Central

Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S.; Sinha, Saurabh

2011-01-01

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, ‘enhancers’), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for ‘motif-blind’ CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to ‘supervise’ the search. We propose a new statistical method, based on ‘Interpolated Markov Models’, for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. PMID:21821659
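The enrichment assessment mentioned above builds on the standard hypergeometric test. As a rough illustration only, the sketch below computes the ordinary upper-tail hypergeometric p-value; it does not include the authors' extension for variable intergenic lengths, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, successes, draws, observed):
    """Return P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes considered
    successes:  genes in the population with the expected expression pattern
    draws:      genes neighboring the predicted CRMs
    observed:   neighboring genes that show the expected pattern
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass from the observed overlap up to the maximum possible overlap.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the study.
    print(hypergeom_enrichment_pvalue(1000, 50, 20, 5))
```

A small p-value here would suggest that the genes near predicted CRMs carry the expected pattern more often than chance alone would explain, under the assumption that every gene is equally likely to be drawn.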
Aggregating Data for Computational Toxicology Applications ...

EPA Pesticide Factsheets

Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built usi
Isolation and characterization of new highly polymorphic DNA markers from the Huntington disease region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Weber, B.; Hedrick, A.; Andrew, S.

1992-02-01

The defect causing Huntington disease (HD) has been mapped to 4p16.3, distal to the DNA marker D4S10. Subsequently, additional polymorphic markers closer to the HD gene have been isolated, which has led to the establishment of predictive testing programs for individuals at risk for HD. Approximately 17% of persons presenting to the Canadian collaborative study for predictive testing for HD have not received any modification of risk, in part because of limited informativeness of currently available DNA markers. Therefore, more highly polymorphic DNA markers are needed, which will further increase the accuracy and availability of predictive testing, specifically for families with complex or incomplete pedigree structures. In addition, new markers are urgently needed in order to refine the breakpoints in the few known recombinant HD chromosomes, which could allow a more accurate localization of the HD gene within 4p16.3 and, therefore, accelerate the cloning of the disease gene. In this study, the authors present the identification and characterization of nine new polymorphic DNA markers, including three markers which detect highly informative multiallelic VNTR-like polymorphisms with PIC values of up to 0.84. These markers have been isolated from a cloned region of DNA which has been previously mapped approximately 1,000 kb from the 4p telomere.
Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.

PubMed

Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian

2018-02-23

Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. From the experimental results, HPHash again significantly improves the prediction performance. The code for HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.
CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts.

PubMed

Testa, Alison C; Hane, James K; Ellwood, Simon R; Oliver, Richard P

2015-03-11

The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. 
Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available ( https://sourceforge.net/projects/codingquarry/ ), and suitable for incorporation into genome annotation pipelines.

The grapevine expression atlas reveals a deep transcriptome shift driving the entire plant into a maturation program.

PubMed

Fasoli, Marianna; Dal Santo, Silvia; Zenoni, Sara; Tornielli, Giovanni Battista; Farina, Lorenzo; Zamboni, Anita; Porceddu, Andrea; Venturini, Luca; Bicego, Manuele; Murino, Vittorio; Ferrarini, Alberto; Delledonne, Massimo; Pezzotti, Mario

2012-09-01

We developed a genome-wide transcriptomic atlas of grapevine (Vitis vinifera) based on 54 samples representing green and woody tissues and organs at different developmental stages as well as specialized tissues such as pollen and senescent leaves. Together, these samples expressed ∼91% of the predicted grapevine genes. Pollen and senescent leaves had unique transcriptomes reflecting their specialized functions and physiological status. However, microarray and RNA-seq analysis grouped all the other samples into two major classes based on maturity rather than organ identity, namely, the vegetative/green and mature/woody categories. This division represents a fundamental transcriptomic reprogramming during the maturation process and was highlighted by three statistical approaches identifying the transcriptional relationships among samples (correlation analysis), putative biomarkers (O2PLS-DA approach), and sets of strongly and consistently expressed genes that define groups (topics) of similar samples (biclustering analysis). Gene coexpression analysis indicated that the mature/woody developmental program results from the reiterative coactivation of pathways that are largely inactive in vegetative/green tissues, often involving the coregulation of clusters of neighboring genes and global regulation based on codon preference. This global transcriptomic reprogramming during maturation has not been observed in herbaceous annual species and may be a defining characteristic of perennial woody plants.
Analysis of membrane protein genes in a Brazilian isolate of Anaplasma marginale.

PubMed

G Junior, Daniel S; Araújo, Flábio R; Almeida Junior, Nalvo F; Adi, Said S; Cheung, Luciana M; Fragoso, Stenio P; Ramos, Carlos A N; Oliveira, Renato Henrique M de; Santos, Caroline S; Bacanelli, Gisele; Soares, Cleber O; Rosinha, Grácia M S; Fonseca, Adivaldo H

2010-11-01

The sequencing of the complete genome of Anaplasma marginale has enabled the identification of several genes that encode membrane proteins, thereby increasing the chances of identifying candidate immunogens. Little is known regarding the genetic variability of genes that encode membrane proteins in A. marginale isolates. The aim of the present study was to determine the degree of conservation of the predicted amino acid sequences of OMP1, OMP4, OMP5, OMP7, OMP8, OMP10, OMP14, OMP15, SODb, OPAG1, OPAG3, VirB3, VirB9-1, PepA, EF-Tu and AM854 proteins in a Brazilian isolate of A. marginale compared to other isolates. Hence, primers were used to amplify these genes: omp1, omp4, omp5, omp7, omp8, omp10, omp14, omp15, sodb, opag1, opag3, virb3, VirB9-1, pepA, ef-tu and am854. After polymerase chain reaction amplification, the products were cloned and sequenced using the Sanger method and the predicted amino acid sequences were multi-aligned using the CLUSTALW and MEGA 4 programs, comparing the predicted sequences between the Brazilian, Saint Maries, Florida and A. marginale centrale isolates. With the exception of outer membrane protein (OMP) 7, all proteins exhibited 92-100% homology to the other A. marginale isolates. However, only OMP1, OMP5, EF-Tu, VirB3, SODb and VirB9-1 were selected as potential immunogens capable of promoting cross-protection between isolates due to the high degree of homology (over 72%) also found with A. (centrale) marginale.
Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM

PubMed Central

Hood, Heather M.; Ocasio, Linda R.; Sachs, Matthew S.; Galagan, James E.

2013-01-01

The filamentous fungus Neurospora crassa played a central role in the development of twentieth-century genetics, biochemistry and molecular biology, and continues to serve as a model organism for eukaryotic biology. Here, we have reconstructed a genome-scale model of its metabolism. This model consists of 836 metabolic genes, 257 pathways, 6 cellular compartments, and is supported by extensive manual curation of 491 literature citations. To aid our reconstruction, we developed three optimization-based algorithms, which together comprise Fast Automated Reconstruction of Metabolism (FARM). These algorithms are: LInear MEtabolite Dilution Flux Balance Analysis (limed-FBA), which predicts flux while linearly accounting for metabolite dilution; One-step functional Pruning (OnePrune), which removes blocked reactions with a single compact linear program; and Consistent Reproduction Of growth/no-growth Phenotype (CROP), which reconciles differences between in silico and experimental gene essentiality faster than previous approaches. Against an independent test set of more than 300 essential/non-essential genes that were not used to train the model, the model displays 93% sensitivity and specificity. We also used the model to simulate the biochemical genetics experiments originally performed on Neurospora by comprehensively predicting nutrient rescue of essential genes and synthetic lethal interactions, and we provide detailed pathway-based mechanistic explanations of our predictions. Our model provides a reliable computational framework for the integration and interpretation of ongoing experimental efforts in Neurospora, and we anticipate that our methods will substantially reduce the manual effort required to develop high-quality genome-scale metabolic models for other organisms. PMID:23935467
Gene regulatory network inference from multifactorial perturbation data using both regression and correlation analyses.

PubMed

Xiong, Jie; Zhou, Tong

2012-01-01

An important problem in systems biology is to reconstruct gene regulatory networks (GRNs) from experimental data and other a priori information. The DREAM project offers several types of experimental data, such as knockout data, knockdown data, and time series data. Among them, multifactorial perturbation data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice. In this article, a new algorithm is presented for the inference of GRNs using the DREAM4 multifactorial perturbation data. The GRN inference problem among [Formula: see text] genes is decomposed into [Formula: see text] different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed by using the sum of squared residuals and the Pearson correlation coefficient. Then these weights are normalized to reflect effort differences of regulating distinct genes. By appropriately choosing the parameters of the power law, we construct a 0-1 integer programming problem. By solving this problem, direct regulation genes for an arbitrary gene can be estimated. The normalized weight of a gene is then modified on the basis of the estimation results about the existence of direct regulations to it. These normalized and modified weights are used in queuing the possibility of the existence of a corresponding direct regulation. Computation results with the DREAM4 In Silico Size 100 Multifactorial subchallenge show that estimation performances of the suggested algorithm can even outperform the best team. Using the real data provided by the DREAM5 Network Inference Challenge, estimation performances can be ranked third. 
Furthermore, the high precision of the obtained most reliable predictions shows the suggested algorithm may be helpful in guiding biological experiment designs.
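To make the per-pair regression step concrete, here is a minimal sketch that fits a target gene's expression from a single candidate regulator and returns the two quantities the abstract names: the sum of squared residuals and the Pearson correlation coefficient. How the authors combine these into weights, the power-law parameters, and the 0-1 integer program are not specified in the abstract and are not reproduced here; the function name is hypothetical.

```python
from math import sqrt


def fit_single_regulator(regulator, target):
    """Ordinary least squares of target on one regulator.

    Returns (sum_of_squared_residuals, pearson_r). Both inputs are equal-length
    sequences of expression values across perturbation experiments.
    Raises ZeroDivisionError if either gene has zero variance.
    """
    n = len(regulator)
    mean_x = sum(regulator) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in regulator)
    syy = sum((y - mean_y) ** 2 for y in target)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regulator, target))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals of the fitted line; a small sum suggests the regulator explains the target well.
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(regulator, target))
    pearson_r = sxy / sqrt(sxx * syy)
    return ssr, pearson_r


if __name__ == "__main__":
    # Toy expression values, purely illustrative.
    print(fit_single_regulator([0.1, 0.4, 0.5, 0.9], [0.2, 0.5, 0.7, 1.0]))
```

Repeating this for every candidate regulator of every target gene would give the table of per-pair statistics from which weights of the kind described above could be derived.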
Genome-wide targeted prediction of ABA responsive genes in rice based on over-represented cis-motif in co-expressed genes.

PubMed

Lenka, Sangram K; Lohia, Bikash; Kumar, Abhay; Chinnusamy, Viswanathan; Bansal, Kailash C

2009-02-01

Abscisic acid (ABA), the popular plant stress hormone, plays a key role in the regulation of a subset of stress-responsive genes. These genes respond to ABA through specific transcription factors which bind to cis-regulatory elements present in their promoters. We discovered that the CGMCACGTGB motif, which contains the ABA Responsive Element (ABRE) core (ACGT), is over-represented among the promoters of ABA-responsive co-expressed genes in rice. A targeted gene prediction strategy using this motif led to the identification of 402 protein-coding genes potentially regulated by an ABA-dependent molecular genetic network. RT-PCR analysis of 45 arbitrarily chosen genes from the predicted 402 genes confirmed 80% accuracy of our prediction. Plant Gene Ontology (GO) analysis of ABA-responsive genes showed enrichment of signal transduction and stress-related genes among diverse functional categories.
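The motif CGMCACGTGB is written in IUPAC nucleotide codes (M = A or C; B = C, G or T). As an illustration of how such a degenerate motif can be searched for in promoter sequences, the following sketch translates it into a regular expression and reports match positions. It scans the forward strand only and is not the authors' pipeline; the helper names and the example sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex that also finds overlapping matches."""
    core = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead lets finditer report overlapping occurrences.
    return re.compile("(?=(" + core + "))")


def find_motif(sequence, motif):
    """Return 0-based start positions of the motif on the forward strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    promoter = "TTCGACACGTGCAA"  # made-up sequence for illustration
    print(find_motif(promoter, "CGMCACGTGB"))
```

A fuller scan would also search the reverse complement of each promoter, since transcription factor binding sites can occur on either strand.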
Hox-C9 activates the intrinsic pathway of apoptosis and is associated with spontaneous regression in neuroblastoma.

PubMed

Kocak, H; Ackermann, S; Hero, B; Kahlert, Y; Oberthuer, A; Juraeva, D; Roels, F; Theissen, J; Westermann, F; Deubzer, H; Ehemann, V; Brors, B; Odenthal, M; Berthold, F; Fischer, M

2013-04-11

Neuroblastoma is an embryonal malignancy of the sympathetic nervous system. Spontaneous regression and differentiation of neuroblastoma is observed in a subset of patients, and has been suggested to represent delayed activation of physiologic molecular programs of fetal neuroblasts. Homeobox genes constitute an important family of transcription factors, which play a fundamental role in morphogenesis and cell differentiation during embryogenesis. In this study, we demonstrate that expression of the majority of the human HOX class I homeobox genes is significantly associated with clinical covariates in neuroblastoma using microarray expression data of 649 primary tumors. Moreover, a HOX gene expression-based classifier predicted neuroblastoma patient outcome independently of age, stage and MYCN amplification status. Among all HOX genes, HOXC9 expression was most prominently associated with favorable prognostic markers. Most notably, elevated HOXC9 expression was significantly associated with spontaneous regression in infant neuroblastoma. Re-expression of HOXC9 in three neuroblastoma cell lines led to a significant reduction in cell viability, and abrogated tumor growth almost completely in neuroblastoma xenografts. Neuroblastoma growth arrest was related to the induction of programmed cell death, as indicated by an increase in the sub-G1 fraction and translocation of phosphatidylserine to the outer membrane. Programmed cell death was associated with the release of cytochrome c from the mitochondria into the cytosol and activation of the intrinsic cascade of caspases, indicating that HOXC9 re-expression triggers the intrinsic apoptotic pathway. Collectively, our results show a strong prognostic impact of HOX gene expression in neuroblastoma, and may point towards a role of Hox-C9 in neuroblastoma spontaneous regression.
Exploring information transmission in gene networks using stochastic simulation and machine learning

NASA Astrophysics Data System (ADS)

Park, Kyemyung; Prüstel, Thorsten; Lu, Yong; Narayanan, Manikandan; Martins, Andrew; Tsang, John

How gene regulatory networks operate robustly despite environmental fluctuations and biochemical noise is a fundamental question in biology. Mathematically, the stochastic dynamics of a gene regulatory network can be modeled using the chemical master equation (CME), but nonlinearity and other challenges render analytical solutions of CMEs difficult to attain. While approaches of approximation and stochastic simulation have been devised for simple models, obtaining a more global picture of a system's behaviors in high-dimensional parameter space without simplifying the system substantially remains a major challenge. Here we present a new framework for understanding and predicting the behaviors of gene regulatory networks in the context of information transmission among genes. Our approach uses stochastic simulation of the network followed by machine learning of the mapping between model parameters and network phenotypes such as information transmission behavior. We also devised ways to visualize high-dimensional phase spaces in intuitive and informative manners. We applied our approach to several gene regulatory circuit motifs, including both feedback and feedforward loops, to reveal underexplored aspects of their operational behaviors. This work is supported by the Intramural Program of NIAID/NIH.
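The abstract does not say which stochastic simulation algorithm was used. One common choice for sampling trajectories of a chemical master equation is Gillespie's direct method, sketched below for the simplest possible case: a single gene product with constant production and first-order degradation. The model, rate names, and parameter values are assumptions chosen for illustration, not the circuits studied in the work.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Simulate one trajectory of a birth-death process with Gillespie's direct method.

    k_prod: production propensity (molecules per unit time)
    k_deg:  per-molecule degradation rate
    n0:     initial molecule count
    t_end:  simulation stops before the first event that would exceed this time
    Returns (times, counts) as parallel lists, including the initial state.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod
        a_deg = k_deg * n
        a_total = a_prod + a_deg
        if a_total == 0:
            break  # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=5.0, k_deg=0.5, n0=0, t_end=20.0)
    print(len(times), counts[-1])
```

In a framework like the one described, many such trajectories generated across sampled parameter sets would provide the training data for a model that maps parameters to network phenotypes.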
Extensive complementarity between gene function prediction methods.

PubMed

Vidulin, Vedrana; Šmuc, Tomislav; Supek, Fran

2016-12-01

The number of sequenced genomes rises steadily but we still lack knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions. Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse genomic AFP methods display a striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on a single most-confident prediction per gene/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods or by 11 additional bits/gene upon integration, thereby providing a highly-ranking predictor on the Critical Assessment of Function Annotation 2 community benchmark. Availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them. The individual and integrated GO predictions for the complete set of genes are available from http://gorbi.irb.hr/. Contact: fran.supek@irb.hr. Supplementary information: Supplementary materials are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
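The integration strategy described above, keeping the single most-confident prediction per gene/function pair, can be sketched in a few lines. This is an illustrative reading of that one sentence, assuming the confidence scores of different methods are already on a comparable scale; the data layout, method names, and identifiers are hypothetical.

```python
def integrate_most_confident(predictions_by_method):
    """Keep, for each (gene, GO term) pair, the highest-confidence prediction.

    predictions_by_method: {method_name: {(gene, go_term): confidence}}
    Returns {(gene, go_term): (confidence, method_name)}.
    """
    best = {}
    for method, predictions in predictions_by_method.items():
        for pair, confidence in predictions.items():
            # No agreement between methods is required; one confident call is enough.
            if pair not in best or confidence > best[pair][0]:
                best[pair] = (confidence, method)
    return best


if __name__ == "__main__":
    # Made-up method names, genes, and scores.
    example = {
        "phyletic_profiles": {("geneA", "GO:0006355"): 0.62},
        "gene_neighborhood": {("geneA", "GO:0006355"): 0.91,
                              ("geneB", "GO:0008152"): 0.40},
    }
    print(integrate_most_confident(example))
```

Taking the maximum rather than requiring consensus fits the reported observation that different methods tend to cover non-overlapping sets of genes and functions.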
Reflections on the Anopheles gambiae genome sequence, transgenic mosquitoes and the prospect for controlling malaria and other vector borne diseases.

PubMed

Tabachnick, Walter J

2003-09-01

The completion of the Anopheles gambiae Giles genome sequencing project is a milestone toward developing more effective strategies for reducing the impact of malaria and other vector borne diseases. The successes in developing transgenic approaches using mosquitoes have provided another essential new tool for further progress in basic vector genetics and the goal of disease control. The use of transgenic approaches to develop refractory mosquitoes is also possible. The ability to use genome sequence to identify genes, and transgenic approaches to construct refractory mosquitoes, has provided the opportunity that, with the future development of an appropriate genetic drive system, refractory transgenes can be released into vector populations, leading to nontransmitting mosquitoes: An. gambiae populations incapable of transmitting malaria. This compelling strategy will be very difficult to achieve and will require a broad, substantial research program for success. The fundamental information that is required on genome structure, gene function and environmental effects on genetic expression is largely unknown. The ability to predict gene effects on phenotype is rudimentary, particularly in natural populations. As a result, the release of a refractory transgene into natural mosquito populations is imprecise and there is little ability to predict unintended consequences. The new genetic tools at hand provide opportunities to address an array of important issues, many of which can have immediate impact on the effectiveness of a host of strategies to control vector borne disease. Transgenic release approaches represent only one strategy that should be pursued. A balanced research program is required.
The Association of CD81 Polymorphisms with Alloimmunization in Sickle Cell Disease

PubMed Central

Tatari-Calderone, Zohreh; Tamouza, Ryad; Le Bouder, Gama P.; Dewan, Ramita; Luban, Naomi L. C.; Lasserre, Jacqueline; Maury, Jacqueline; Lionnet, François; Krishnamoorthy, Rajagopal; Girot, Robert

2013-01-01

The goal of the present work was to identify candidate genetic markers predictive of alloimmunization in sickle cell disease (SCD). Red blood cell (RBC) transfusion is indicated for acute treatment, prevention, and abrogation of some complications of SCD. A well-known consequence of multiple RBC transfusions is alloimmunization. Given that a subset of SCD patients develop multiple RBC allo-/autoantibodies, while others do not in a similar multiple transfusional setting, we investigated a possible genetic basis for alloimmunization. Biomarkers that predict susceptibility to alloimmunization could identify patients at risk before the onset of a transfusion program and thus may have important implications for clinical management. In addition, such markers could shed light on the mechanism(s) underlying alloimmunization. We genotyped 27 single nucleotide polymorphisms (SNPs) in the CD81, CHRNA10, and ARHG genes in two groups of SCD patients. One group (35) of patients developed alloantibodies, and another (40) had no alloantibodies despite having received multiple transfusions. Two SNPs in the CD81 gene, which encodes a molecule involved in the signal modulation of B lymphocytes, show a strong association with alloimmunization. If confirmed in prospective studies with larger cohorts, the two SNPs identified in this retrospective study could serve as predictive biomarkers for alloimmunization. PMID:23762099
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.

PubMed

Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E

2018-04-25

Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Gene Expression Profiling Predicts the Development of Oral Cancer

PubMed Central

Saintigny, Pierre; Zhang, Li; Fan, You-Hong; El-Naggar, Adel K.; Papadimitrakopoulou, Vali; Feng, Lei; Lee, J. Jack; Kim, Edward S.; Hong, Waun Ki; Mao, Li

2011-01-01

Patients with oral preneoplastic lesions (OPL) have a high risk of developing oral cancer. Although certain risk factors such as smoking status and histology are known, our ability to predict oral cancer risk remains poor. The study objective was to determine the value of gene expression profiling in predicting oral cancer development. Gene expression profiles were measured in 86 of 162 OPL patients who were enrolled in a clinical chemoprevention trial that used the incidence of oral cancer development as a prespecified endpoint. The median follow-up time was 6.08 years and 35 of the 86 patients developed oral cancer over the course. Gene expression profiles were associated with oral cancer-free survival and used to develop multivariate predictive models for oral cancer prediction. We developed a 29-transcript predictive model which showed marked improvement in terms of prediction accuracy (with 8% predicting error rate) over the models using previously known clinico-pathological risk factors. Based on the gene expression profile data, we also identified 2182 transcripts significantly associated with oral cancer risk associated genes (P-value<0.01, single variate Cox proportional hazards model). Functional pathway analysis revealed proteasome machinery, MYC, and ribosome components as the top gene sets associated with oral cancer risk. In multiple independent datasets, the expression profiles of the genes can differentiate head and neck cancer from normal mucosa. Our results show that gene expression profiles may improve the prediction of oral cancer risk in OPL patients and the significant genes identified may serve as potential targets for oral cancer chemoprevention. PMID:21292635
Systematic Characterization and Prediction of Human Hypertension Genes.

PubMed

Li, Yan-Hui; Zhang, Gai-Gai; Wang, Nanping

2017-02-01

Hypertension is a major cardiovascular risk factor and accounts for a large part of cardiovascular mortality. In this work, we analyzed the properties of hypertension genes and found that when compared with genes not yet known to be involved in hypertension regulation, known hypertension genes display distinguishing features: (1) hypertension genes tend to be located at network center; (2) hypertension genes tend to interact with each other; and (3) hypertension genes tend to enrich in certain biological processes and show certain phenotypes. Based on these features, we developed a machine-learning algorithm to predict new hypertension genes. One hundred and seventy-seven candidates were predicted with a posterior probability >0.9. Evidence supporting 17 of the predictions has been found. © 2016 American Heart Association, Inc.
A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

PubMed Central

Peña-Castillo, Lourdes; Tasan, Murat; Myers, Chad L; Lee, Hyunju; Joshi, Trupti; Zhang, Chao; Guan, Yuanfang; Leone, Michele; Pagnani, Andrea; Kim, Wan Kyu; Krumpelman, Chase; Tian, Weidong; Obozinski, Guillaume; Qi, Yanjun; Mostafavi, Sara; Lin, Guan Ning; Berriz, Gabriel F; Gibbons, Francis D; Lanckriet, Gert; Qiu, Jian; Grant, Charles; Barutcuoglu, Zafer; Hill, David P; Warde-Farley, David; Grouios, Chris; Ray, Debajyoti; Blake, Judith A; Deng, Minghua; Jordan, Michael I; Noble, William S; Morris, Quaid; Klein-Seetharaman, Judith; Bar-Joseph, Ziv; Chen, Ting; Sun, Fengzhu; Troyanskaya, Olga G; Marcotte, Edward M; Xu, Dong; Hughes, Timothy R; Roth, Frederick P

2008-01-01

Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated. Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%. Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized. PMID:18613946
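The abstract above reports precision at a fixed recall rate. As a minimal sketch of how such a figure can be computed for one GO term, the function below ranks genes by classifier score and returns the precision at the shortest ranked list whose recall reaches the target. The function name, the toy scores, and the labels are illustrative assumptions and are not taken from the study.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision of the shortest top-ranked list whose recall reaches target_recall.

    scores: classifier scores, higher means more confident (hypothetical values).
    labels: 1 if the gene truly carries the annotation, else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive labels")
    # Rank genes from the most to the least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        if tp / total_pos >= target_recall:
            return tp / k
    return tp / len(ranked)


# Toy data: ten genes, five of which are true positives.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
print(precision_at_recall(toy_scores, toy_labels, 0.2))
print(precision_at_recall(toy_scores, toy_labels, 0.6))
```

Averaging this quantity over GO terms would give a summary comparable in kind to the one quoted, although the exact evaluation protocol of the study may differ.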
Predicting essential genes for identifying potential drug targets in Aspergillus fumigatus.

PubMed

Lu, Yao; Deng, Jingyuan; Rhodes, Judith C; Lu, Hui; Lu, Long Jason

2014-06-01

Aspergillus fumigatus (Af) is a ubiquitous and opportunistic pathogen capable of causing acute, invasive pulmonary disease in susceptible hosts. Despite current therapeutic options, mortality associated with invasive Af infections remains unacceptably high, increasing 357% since 1980. Therefore, there is an urgent need for the development of novel therapeutic strategies, including more efficacious drugs acting on new targets. Thus, as noted in a recent review, "the identification of essential genes in fungi represents a crucial step in the development of new antifungal drugs". Expanding the target space by rapidly identifying new essential genes has thus been described as "the most important task of genomics-based target validation". In previous research, we were the first to show that essential gene annotation can be reliably transferred between four distantly related prokaryotic species. In this study, we extend our machine learning approach to the much more complex eukaryotic fungal species. A compendium of essential genes is predicted in Af by transferring known essential gene annotations from another filamentous fungus, Neurospora crassa. This approach predicts essential genes by integrating diverse types of intrinsic and context-dependent genomic features encoded in microbial genomes. The predicted essential datasets contained 1674 genes. We validated our results by comparing our predictions with known essential genes in Af, comparing our predictions with those predicted by homology mapping, and testing conditionally expressed alleles. We applied several layers of filters and selected a set of potential drug targets from the predicted essential genes. Finally, we have conducted wet lab knockout experiments to verify our predictions, which further validates the accuracy and wide applicability of the machine learning approach.
The approach presented here significantly extended our ability to predict essential genes beyond orthologs and made it possible to predict an inventory of essential genes in Eukaryotic fungal species, amongst which a preferred subset of suitable drug targets may be selected. By selecting the best new targets, we believe that resultant drugs would exhibit an unparalleled clinical impact against a naive pathogen population. Additional benefits that a compendium of essential genes can provide are important information on cell function and evolutionary biology. Furthermore, mapping essential genes to pathways may also reveal critical check points in the pathogen's metabolism. Finally, this approach is highly reproducible and portable, and can be easily applied to predict essential genes in many more pathogenic microbes, especially those unculturable. Copyright © 2014 Elsevier Ltd. All rights reserved.
Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network

PubMed Central

Hwang, Sohyun; Rhee, Seung Y; Marcotte, Edward M; Lee, Insuk

2012-01-01

AraNet is a functional gene network for the reference plant Arabidopsis and has been constructed in order to identify new genes associated with plant traits. It is highly predictive for diverse biological pathways and can be used to prioritize genes for functional screens. Moreover, AraNet provides a web-based tool with which plant biologists can efficiently discover novel functions of Arabidopsis genes (http://www.functionalnet.org/aranet/). This protocol explains how to conduct network-based prediction of gene functions using AraNet and how to interpret the prediction results. Functional discovery in plant biology is facilitated by combining candidate prioritization by AraNet with focused experimental tests. PMID:21886106
Changes in Gene Expression Predicting Local Control in Cervical Cancer: Results from Radiation Therapy Oncology Group 0128

PubMed Central

Weidhaas, Joanne B.; Li, Shu-Xia; Winter, Kathryn; Ryu, Janice; Jhingran, Anuja; Miller, Bridgette; Dicker, Adam P.; Gaffney, David

2009-01-01

Purpose: To evaluate the potential of gene expression signatures to predict response to treatment in locally advanced cervical cancer treated with definitive chemotherapy and radiation. Experimental Design: Tissue biopsies were collected from patients participating in Radiation Therapy Oncology Group (RTOG) 0128, a phase II trial evaluating the benefit of celecoxib in addition to cisplatin chemotherapy and radiation for locally advanced cervical cancer. Gene expression profiling was done and signatures of pretreatment, mid-treatment (before the first implant), and “changed” gene expression patterns between pre- and mid-treatment samples were determined. The ability of the gene signatures to predict local control versus local failure was evaluated. A two-group t test was done to identify the initial gene set separating these end points. Supervised classification methods were used to enrich the gene sets. The results were further validated by leave-one-out and 2-fold cross-validation. Results: Twenty-two patients had suitable material from pretreatment samples for analysis, and 13 paired pre- and mid-treatment samples were obtained. The changed gene expression signatures between the pre- and mid-treatment biopsies predicted response to treatment, separating patients with local failures from those who achieved local control with a seven-gene signature. The in-sample prediction rate, leave-one-out prediction rate, and 2-fold prediction rate are 100% for this seven-gene signature. This signature was enriched for cell cycle genes. Conclusions: Changed gene expression signatures during therapy in cervical cancer can predict outcome as measured by local control. After further validation, such findings could be applied to direct additional therapy for cervical cancer patients treated with chemotherapy and radiation. PMID:19509178
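The validation scheme described above (a two-group t test for gene selection, a supervised classifier, and leave-one-out cross-validation) can be sketched as follows. This is an illustration under stated assumptions and not the trial's analysis code: the nearest-centroid classifier, the Welch form of the t statistic, the function names, and the toy expression matrix are all hypothetical choices. Gene selection is repeated inside every fold so that the held-out sample never influences which genes are chosen; the abstract does not say whether the study did this.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-group t statistic (Welch form) for one gene."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(X, y, n_genes):
    """Indices of the n_genes with the largest |t| between classes 0 and 1."""
    scored = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scored.append((abs(welch_t(a, b)), g))
    return [g for _, g in sorted(scored, reverse=True)[:n_genes]]


def nearest_centroid(X, y, genes, sample):
    """Predict the class whose centroid over the selected genes is closest."""
    best_label, best_dist = None, float("inf")
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        dist = sum((sample[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = lab, dist
    return best_label


def loo_accuracy(X, y, n_genes):
    """Leave-one-out accuracy with gene selection redone in each fold."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_train, y_train, n_genes)
        hits += nearest_centroid(X_train, y_train, genes, X[i]) == y[i]
    return hits / len(X)


# Toy matrix: 6 samples x 3 genes; only gene 0 separates the two outcomes.
X = [[1.0, 5.0, 2.0], [1.2, 4.0, 3.0], [0.9, 6.0, 2.5],
     [3.0, 5.5, 2.2], [3.1, 4.5, 2.8], [2.9, 5.0, 2.4]]
y = [0, 0, 0, 1, 1, 1]  # 0 = local control, 1 = local failure (illustrative)
print(loo_accuracy(X, y, n_genes=1))
```

With very small cohorts, such as the 13 paired samples reported, leave-one-out estimates have high variance, which is consistent with the abstract's call for further validation.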
Integration of biological data by kernels on graph nodes allows prediction of new genes involved in mitotic chromosome condensation

PubMed Central

Hériché, Jean-Karim; Lees, Jon G.; Morilla, Ian; Walter, Thomas; Petrova, Boryana; Roberti, M. Julia; Hossain, M. Julius; Adler, Priit; Fernández, José M.; Krallinger, Martin; Haering, Christian H.; Vilo, Jaak; Valencia, Alfonso; Ranea, Juan A.; Orengo, Christine; Ellenberg, Jan

2014-01-01

The advent of genome-wide RNA interference (RNAi)–based screens puts us in the position to identify genes for all functions human cells carry out. However, for many functions, assay complexity and cost make genome-scale knockdown experiments impossible. Methods to predict genes required for cell functions are therefore needed to focus RNAi screens from the whole genome on the most likely candidates. Although different bioinformatics tools for gene function prediction exist, they lack experimental validation and are therefore rarely used by experimentalists. To address this, we developed an effective computational gene selection strategy that represents public data about genes as graphs and then analyzes these graphs using kernels on graph nodes to predict functional relationships. To demonstrate its performance, we predicted human genes required for a poorly understood cellular function—mitotic chromosome condensation—and experimentally validated the top 100 candidates with a focused RNAi screen by automated microscopy. Quantitative analysis of the images demonstrated that the candidates were indeed strongly enriched in condensation genes, including the discovery of several new factors. By combining bioinformatics prediction with experimental validation, our study shows that kernels on graph nodes are powerful tools to integrate public biological data and predict genes involved in cellular functions of interest. PMID:24943848
Genome-wide identification of galactinol synthase (GolS) genes in Solanum lycopersicum and Brachypodium distachyon.

PubMed

Filiz, Ertugrul; Ozyigit, Ibrahim Ilker; Vatansever, Recep

2015-10-01

GolS genes stand as potential candidate genes for molecular breeding and/or engineering programs for improving abiotic stress tolerance in plant species. In this study, a total of six galactinol synthase (GolS) genes/proteins were retrieved for Solanum lycopersicum and Brachypodium distachyon. GolS protein sequences were identified to include the glyco_transf_8 (PF01501) domain structure, and to have similar molecular weights (36.40-39.59 kDa) and amino acid lengths (318-347 aa) with a slightly acidic pI (5.35-6.40). The sub-cellular location was mainly predicted as cytoplasmic. S. lycopersicum genes were located on chr 1 and 2 and included one segmental duplication, while genes of B. distachyon were only on chr 1 with one tandem duplication. GolS sequences were found to have well-conserved motif structures. Cis-acting analysis was performed for three abiotic stress responsive elements, including ABA responsive element (ABRE), dehydration and cold responsive elements (DRE/CRT), and low-temperature responsive element (LTRE). ABRE elements were found in all GolS genes except SlGolS4; DRE/CRT was not detected in any GolS genes; and LTRE elements were found in the SlGolS1 and BdGolS1 genes. AU analysis in UTR and ORF regions indicated that SlGolS and BdGolS mRNAs may have a short half-life. SlGolS3 and SlGolS4 genes may generate more stable transcripts since they included the AATTAAA motif for polyadenylation signal POLASIG2. Secondary structures of SlGolS proteins were better conserved than those of BdGolS. Some structural divergences were detected in 3D structures, and predicted binding sites exhibited various patterns in GolS proteins. Copyright © 2015 Elsevier Ltd. All rights reserved.
Integrative pathway analysis of a genome-wide association study of V̇o2max response to exercise training

PubMed Central

Vivar, Juan C.; Sarzynski, Mark A.; Sung, Yun Ju; Timmons, James A.; Bouchard, Claude; Rankinen, Tuomo

2013-01-01

We previously reported the findings from a genome-wide association study of the response of maximal oxygen uptake (V̇o2max) to an exercise program. Here we follow up on these results to generate hypotheses on genes, pathways, and systems involved in the ability to respond to exercise training. A systems biology approach can help us better establish a comprehensive physiological description of what underlies V̇o2max trainability. The primary material for this exploration was the individual single-nucleotide polymorphism (SNP), SNP-gene mapping, and statistical significance levels. We aimed to generate novel hypotheses through analyses that go beyond statistical association of single-locus markers. This was accomplished through three complementary approaches: 1) building de novo evidence of gene candidacy through informatics-driven literature mining; 2) aggregating evidence from statistical associations to link variant enrichment in biological pathways to V̇o2max trainability; and 3) predicting possible consequences of variants residing in the pathways of interest. We started with candidate gene prioritization followed by pathway analysis focused on overrepresentation analysis and gene set enrichment analysis. Subsequently, leads were followed using in silico analysis of predicted SNP functions. Pathways related to cellular energetics (pantothenate and CoA biosynthesis; PPAR signaling) and immune functions (complement and coagulation cascades) had the highest levels of SNP burden. In particular, long-chain fatty acid transport and fatty acid oxidation genes and sequence variants were found to influence differences in V̇o2max trainability. Together, these methods allow for the hypothesis-driven ranking and prioritization of genes and pathways for future experimental testing and validation. PMID:23990238

DMRT gene cluster analysis in the platypus: new insights into genomic organization and regulatory regions.

PubMed

El-Mogharbel, Nisrine; Wakefield, Matthew; Deakin, Janine E; Tsend-Ayush, Enkhjargal; Grützner, Frank; Alsop, Amber; Ezaz, Tariq; Marshall Graves, Jennifer A

2007-01-01

We isolated and characterized a cluster of platypus DMRT genes and compared their arrangement, location, and sequence across vertebrates. The DMRT gene cluster on human 9p24.3 harbors, in order, DMRT1, DMRT3, and DMRT2, which share a DM domain. DMRT1 is highly conserved and involved in sexual development in vertebrates, and deletions in this region cause sex reversal in humans. Sequence comparisons of DMRT genes between species have been valuable in identifying exons, control regions, and conserved nongenic regions (CNGs). The addition of platypus sequences is expected to be particularly valuable, since monotremes fill a gap in the vertebrate genome coverage. We therefore isolated and fully sequenced platypus BAC clones containing DMRT3 and DMRT2 as well as DMRT1 and then generated multispecies alignments and ran prediction programs followed by experimental verification to annotate this gene cluster. We found that the three genes have 58-66% identity to their human orthologues, lie in the same order as in other vertebrates, and colocate on 1 of the 10 platypus sex chromosomes, X5. We also predict that optimal annotation of the newly sequenced platypus genome will be challenging. The analysis of platypus sequence revealed differences in structure and sequence of the DMRT gene cluster. Multispecies comparison was particularly effective for detecting CNGs, revealing several novel potential regulatory regions within DMRT3 and DMRT2 as well as DMRT1. RT-PCR indicated that platypus DMRT1 and DMRT3 are expressed specifically in the adult testis (and not ovary), but DMRT2 has a wider expression profile, as it does for other mammals. The platypus DMRT1 expression pattern, and its location on an X chromosome, suggests an involvement in monotreme sexual development.
BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

PubMed

Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario

2016-03-01

Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when the latter used RNA-Seq as the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/. Contact: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Epigenetic Networks Regulate the Transcriptional Program in Memory and Terminally Differentiated CD8+ T Cells.

PubMed

Rodriguez, Ramon M; Suarez-Alvarez, Beatriz; Lavín, José L; Mosén-Ansorena, David; Baragaño Raneros, Aroa; Márquez-Kisinousky, Leonardo; Aransay, Ana M; Lopez-Larrea, Carlos

2017-01-15

Epigenetic mechanisms play a critical role during differentiation of T cells by contributing to the formation of stable and heritable transcriptional patterns. To better understand the mechanisms of memory maintenance in CD8+ T cells, we performed genome-wide analysis of DNA methylation, histone marking (acetylated lysine 9 in histone H3 and trimethylated lysine 9 in histone), and gene-expression profiles in naive, effector memory (EM), and terminally differentiated EM (TEMRA) cells. Our results indicate that DNA demethylation and histone acetylation are coordinated to generate the transcriptional program associated with memory cells. Conversely, EM and TEMRA cells share a very similar epigenetic landscape. Nonetheless, the TEMRA transcriptional program predicts an innate immunity phenotype associated with genes never reported in these cells, including several mediators of NK cell activation (VAV3 and LYN) and a large array of NK receptors (e.g., KIR2DL3, KIR2DL4, KIR2DL1, KIR3DL1, KIR2DS5). In addition, we identified up to 161 genes that encode transcriptional regulators, some of unknown function in CD8+ T cells, and that were differentially expressed in the course of differentiation. Overall, these results provide new insights into the regulatory networks involved in memory CD8+ T cell maintenance and T cell terminal differentiation. Copyright © 2017 by The American Association of Immunologists, Inc.
Large-Scale Sequencing of Two Regions in Human Chromosome 7q22: Analysis of 650 kb of Genomic Sequence around the EPO and CUTL1 Loci Reveals 17 Genes

PubMed Central

Glöckner, Gernot; Scherer, Stephen; Schattevoy, Ruben; Boright, Andrew; Weber, Jacqueline; Tsui, Lap-Chee; Rosenthal, André

1998-01-01

We have sequenced and annotated two genomic regions located in the Giemsa negative band q22 of human chromosome 7. The first region, defined by the erythropoietin (EPO) locus, is 228 kb in length and contains 13 genes. Whereas 3 genes (GNB2, EPO, PCOLCE) were known previously on the mRNA level, we have been able to identify 10 novel genes using a newly developed automatic annotation tool, RUMMAGE-DP, which comprises >26 different programs mainly for exon prediction, homology searches, and compositional and repeat analysis. For precise annotation we have also resequenced ESTs mapped to the region and assembled them to build large cDNAs. In addition, we have investigated the differential splicing of genes. Using these tools we annotated 4 of the 10 genes as a zonadhesin, a transferrin homolog, a nucleoporin-like gene, and an actin gene. Two genes showed weak similarity to an insulin-like receptor and a neuronal protein with a leucine-rich amino-terminal domain. Four predicted genes (CDS1–CDS4) that have been confirmed on the mRNA level showed no similarity to known proteins, and a potential function could not be assigned. The second region in 7q22, defined by the CUTL1 (CCAAT displacement protein and its splice variant) locus, is 416 kb in length and contains three known genes (PMSL12, APS, and CUTL1) and a novel gene (CDS5). The CUTL1 locus, consisting of two splice variants (CDP and CASP), occupies >300 kb. Based on the G,C profile an isochore switch can be defined between the CUTL1 gene and the APS and PMSL12 genes. [Clones 37G3, 164c7, and 235f8 are deposited in GenBank under accession no. AF053356; clone 123e15, accession no. AF024533; 186d2, accession no. AF024534; 46f6, accession no. AF006752; 50h2, accession no. AF047825; and 76h2, accession no. AF030453] PMID:9799793
Innovative Research Design Exploring the Effects of Physical Activity and Genetics on Cognitive Performance in Community-Based Older Adults

PubMed Central

Etnier, Jennifer L.; Labban, Jeffrey D.; Karper, William B.; Wideman, Laurie; Piepmeier, Aaron T.; Shih, Chia-Hao; Castellano, Michael; Williams, Lauren M.; Park, Se-Yun; Henrich, Vincent C.; Dudley, William N.; Rulison, Kelli L.

2015-01-01

Physical activity is predictive of better cognitive performance and lower risk of Alzheimer’s disease (AD). The apolipoprotein E gene (APOE) is a susceptibility gene for AD with the e4 allele being associated with a greater risk of AD. Cross-sectional and prospective research shows that physical activity is predictive of better cognitive performance for those at greater genetic risk for AD. However, the moderating role of APOE on the effects of a physical activity intervention on cognitive performance has not been examined. The purpose of this manuscript is to justify the need for such research and to describe the design, methods, and recruitment tactics used in the conduct of a study designed to provide insight as to the extent to which cognitive benefits resulting from an 8-month physical activity program are differentiated by ApoE e4 status. The effectiveness of the recruitment strategies and the feasibility of recruiting ApoE e4 carriers are discussed. PMID:25594264
Innovative Research Design Exploring the Effects of Physical Activity and Genetics on Cognitive Performance in Community-Based Older Adults.

PubMed

Etnier, Jennifer L; Labban, Jeffrey D; Karper, William B; Wideman, Laurie; Piepmeier, Aaron T; Shih, Chia-Hao; Castellano, Michael; Williams, Lauren M; Park, Se-Yun; Henrich, Vincent C; Dudley, William N; Rulison, Kelli L

2015-10-01

Physical activity is predictive of better cognitive performance and lower risk of Alzheimer's disease (AD). The apolipoprotein E gene (APOE) is a susceptibility gene for AD with the e4 allele being associated with a greater risk of AD. Cross-sectional and prospective research shows that physical activity is predictive of better cognitive performance for those at greater genetic risk for AD. However, the moderating role of APOE on the effects of a physical activity intervention on cognitive performance has not been examined. The purpose of this manuscript is to justify the need for such research and to describe the design, methods, and recruitment tactics used in the conduct of a study designed to provide insight as to the extent to which cognitive benefits resulting from an 8-month physical activity program are differentiated by APOE e4 status. The effectiveness of the recruitment strategies and the feasibility of recruiting APOE e4 carriers are discussed.
An efficient platform for genetic selection and screening of gene switches in Escherichia coli

PubMed Central

Muranaka, Norihito; Sharma, Vandana; Nomura, Yoko; Yokobayashi, Yohei

2009-01-01

Engineered gene switches and circuits that can sense various biochemical and physical signals, perform computation, and produce predictable outputs are expected to greatly advance our ability to program complex cellular behaviors. However, rational design of gene switches and circuits that function in living cells is challenging due to the complex intracellular milieu. Consequently, most successful designs of gene switches and circuits have relied, to some extent, on high-throughput screening and/or selection from combinatorial libraries of gene switch and circuit variants. In this study, we describe a generic and efficient platform for selection and screening of gene switches and circuits in Escherichia coli from large libraries. The single-gene dual selection marker tetA was translationally fused to green fluorescent protein (gfpuv) via a flexible peptide linker and used as a dual selection and screening marker for laboratory evolution of gene switches. Single-cycle (sequential positive and negative selections) enrichment efficiencies of >7000 were observed in mock selections of model libraries containing functional riboswitches in liquid culture. The technique was applied to optimize various parameters affecting the selection outcome, and to isolate novel thiamine pyrophosphate riboswitches from a complex library. Artificial riboswitches with excellent characteristics were isolated that exhibit up to 58-fold activation as measured by fluorescent reporter gene assay. PMID:19190095
MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach.

PubMed

Abduallah, Yasser; Turki, Turki; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel; Wang, Jason T L

2017-01-01

Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understanding the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, sifting through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising, as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy.
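To illustrate how an information-theoretic score can be organized as map and reduce steps, the sketch below computes pairwise mutual information between discretized expression profiles in a single Python process. It is not the authors' algorithm and does not use Hadoop: the binarization around each gene's mean, the function names, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations
from math import log2


def discretize(series):
    """Binarize one gene's time series around its mean (an assumed scheme)."""
    m = sum(series) / len(series)
    return [1 if v > m else 0 for v in series]


def map_phase(expr):
    """Emit ((gene_a, gene_b), (state_a, state_b)) for every time point."""
    states = {gene: discretize(series) for gene, series in expr.items()}
    for a, b in combinations(sorted(states), 2):
        for sa, sb in zip(states[a], states[b]):
            yield (a, b), (sa, sb)


def reduce_phase(pairs):
    """Group values by gene pair and compute mutual information in bits."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    result = {}
    for key, values in grouped.items():
        n = len(values)
        joint, count_a, count_b = defaultdict(int), defaultdict(int), defaultdict(int)
        for sa, sb in values:
            joint[(sa, sb)] += 1
            count_a[sa] += 1
            count_b[sb] += 1
        # I(A;B) = sum over observed states of p(a,b) * log2(p(a,b) / (p(a) p(b)))
        result[key] = sum(
            (c / n) * log2(c * n / (count_a[sa] * count_b[sb]))
            for (sa, sb), c in joint.items()
        )
    return result


# Toy time series: g1 and g2 rise together, g3 oscillates independently.
expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 4, 1, 4]}
mi = reduce_phase(map_phase(expr))
print(mi)
```

Gene pairs whose score exceeds a chosen threshold would become candidate edges. In a real MapReduce job the grouping by key is performed by the framework's shuffle step rather than by an in-memory dictionary.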
Developing a predictive tropospheric ozone model for Tabriz

NASA Astrophysics Data System (ADS)

Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi

2013-04-01

Predictive ozone models are becoming indispensable tools by providing a capability for pollution alerts to serve people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, by using the following five modeling strategies: three regression-type methods: Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models: Nonlinear Local Prediction (NLP) to implement chaos theory and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type modeling strategies explain the data in terms of temperature, solar radiation, dew point temperature, and wind speed, by regressing present ozone values to their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for the MLR, ANN, and GEP models are modest, but those produced by NLP and ARIMA are promising for establishing a forecasting capability.
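Regressing present values on past values can be illustrated with the simplest autoregressive case. The sketch below fits a first-order model, x_t = c + phi * x_(t-1), by ordinary least squares. It is only an AR(1) illustration, which corresponds to ARIMA(1,0,0) without differencing or moving-average terms, and not the model fitted in the study; the function names and the toy series are assumptions.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_(t-1); returns (c, phi)."""
    past, present = series[:-1], series[1:]
    mean_past = sum(past) / len(past)
    mean_present = sum(present) / len(present)
    sxx = sum((p - mean_past) ** 2 for p in past)
    sxy = sum((p - mean_past) * (q - mean_present) for p, q in zip(past, present))
    phi = sxy / sxx
    c = mean_present - phi * mean_past
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Toy series generated exactly by x_t = 2 + 0.5 * x_(t-1), starting at 10.
toy = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(toy)
print(c, phi, forecast_next(toy, c, phi))
```

On real hourly ozone data the fit would not be exact, and model order, differencing, and seasonal terms would need to be chosen from the data.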
The perimenopausal aging transition in the female rat brain: decline in bioenergetic systems and synaptic plasticity.

PubMed

Yin, Fei; Yao, Jia; Sancheti, Harsh; Feng, Tao; Melcangi, Roberto C; Morgan, Todd E; Finch, Caleb E; Pike, Christian J; Mack, Wendy J; Cadenas, Enrique; Brinton, Roberta D

2015-07-01

The perimenopause is an aging transition unique to the female that leads to reproductive senescence which can be characterized by multiple neurological symptoms. To better understand potential underlying mechanisms of neurological symptoms of perimenopause, the present study determined genomic, biochemical, brain metabolic, and electrophysiological transformations that occur during this transition using a rat model recapitulating fundamental characteristics of the human perimenopause. Gene expression analyses indicated two distinct aging programs: chronological and endocrine. A critical period emerged during the endocrine transition from regular to irregular cycling characterized by decline in bioenergetic gene expression, confirmed by deficits in fluorodeoxyglucose-positron emission tomography (FDG-PET) brain metabolism, mitochondrial function, and long-term potentiation. Bioinformatic analysis predicted insulin/insulin-like growth factor 1 and adenosine monophosphate-activated protein kinase/peroxisome proliferator-activated receptor gamma coactivator 1 alpha (AMPK/PGC1α) signaling pathways as upstream regulators. Onset of acyclicity was accompanied by a rise in genes required for fatty acid metabolism, inflammation, and mitochondrial function. Subsequent chronological aging resulted in decline of genes required for mitochondrial function and β-amyloid degradation. Emergence of glucose hypometabolism and impaired synaptic function in brain provide plausible mechanisms of neurological symptoms of perimenopause and may be predictive of later-life vulnerability to hypometabolic conditions such as Alzheimer's. Copyright © 2015 Elsevier Inc. All rights reserved.
Identification of novel mutations in HFE, HFE2, TfR2, and SLC40A1 genes in Chinese patients affected by hereditary hemochromatosis.

PubMed

Wang, Yongwei; Du, Yali; Liu, Gang; Guo, Shanshan; Hou, Bo; Jiang, Xianyong; Han, Bing; Chang, Yanzhong; Nie, Guangjun

2017-04-01

Hereditary hemochromatosis (HH) is a group of inherited iron-overload disorders associated with pathogenic defects in the genes encoding hemochromatosis (HFE), hemojuvelin (HJV/HFE2), hepcidin (HAMP), transferrin receptor 2 (TfR2), and ferroportin (FPN1/SLC40A1) proteins, and the clinical features are well described. However, there have been only a few detailed reports of HH in Chinese populations. Thus, there is insufficient patient information for population-based analyses in Chinese populations or comparative studies among different ethnic groups. In the current work, we describe eight Chinese cases of hereditary hemochromatosis. Gene sequencing results revealed eight mutations (five novel mutations) in HFE, HFE2, TfR2, and SLC40A1 genes in these Chinese HH patients. In addition, we used Polymorphism Phenotyping v2 (Polyphen), Sorting Intolerant From Tolerant (SIFT), and a sequence alignment program to predict the molecular consequences of missense mutations.
An Evolutionarily Conserved DOF-CONSTANS Module Controls Plant Photoperiodic Signaling

PubMed Central

2015-01-01

The response to daylength is a crucial process that evolved very early in plant evolution, entitling the early green eukaryote to predict seasonal variability and attune its physiological responses to the environment. The photoperiod responses evolved into the complex signaling pathways that govern the angiosperm floral transition today. The Chlamydomonas reinhardtii DNA-Binding with One Finger (CrDOF) gene controls transcription in a photoperiod-dependent manner, and its misexpression influences algal growth and viability. In short days, CrDOF enhances CrCO expression, a homolog of plant CONSTANS (CO), by direct binding to its promoter, while it reduces the expression of cell division genes in long days independently of CrCO. In Arabidopsis (Arabidopsis thaliana), transgenic plants overexpressing CrDOF show floral delay and reduced expression of the photoperiodic genes CO and FLOWERING LOCUS T. The conservation of the DOF-CO module during plant evolution could be an important clue to understanding diversification by the inheritance of conserved gene toolkits in key developmental programs. PMID:25897001
A Predictive Model of the Oxygen and Heme Regulatory Network in Yeast

PubMed Central

Kundaje, Anshul; Xin, Xiantong; Lan, Changgui; Lianoglou, Steve; Zhou, Mei; Zhang, Li; Leslie, Christina

2008-01-01

Deciphering gene regulatory mechanisms through the analysis of high-throughput expression data is a challenging computational problem. Previous computational studies have used large expression datasets in order to resolve fine patterns of coexpression, producing clusters or modules of potentially coregulated genes. These methods typically examine promoter sequence information, such as DNA motifs or transcription factor occupancy data, in a separate step after clustering. We needed an alternative and more integrative approach to study the oxygen regulatory network in Saccharomyces cerevisiae using a small dataset of perturbation experiments. Mechanisms of oxygen sensing and regulation underlie many physiological and pathological processes, and only a handful of oxygen regulators have been identified in previous studies. We used a new machine learning algorithm called MEDUSA to uncover detailed information about the oxygen regulatory network using genome-wide expression changes in response to perturbations in the levels of oxygen, heme, Hap1, and Co2+. MEDUSA integrates mRNA expression, promoter sequence, and ChIP-chip occupancy data to learn a model that accurately predicts the differential expression of target genes in held-out data. We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. This network includes both known oxygen and heme regulators, such as Hap1, Mga2, Hap4, and Upc2, as well as many new candidate regulators. MEDUSA also identified many DNA motifs that are consistent with previous experimentally identified transcription factor binding sites. Because MEDUSA's regulatory program associates regulators to target genes through their promoter sequences, we directly tested the predicted regulators for OLE1, a gene specifically induced under hypoxia, by experimental analysis of the activity of its promoter. 
In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. MEDUSA can reveal important information from a small dataset and generate testable hypotheses for further experimental analysis. Supplemental data are included. PMID:19008939
Evaluation of liquefaction potential of soil based on standard penetration test using multi-gene genetic programming model

NASA Astrophysics Data System (ADS)

Muduli, Pradyut; Das, Sarat

2014-06-01

This paper discusses the evaluation of liquefaction potential of soil based on standard penetration test (SPT) dataset using evolutionary artificial intelligence technique, multi-gene genetic programming (MGGP). The liquefaction classification accuracy (94.19%) of the developed liquefaction index (LI) model is found to be better than that of available artificial neural network (ANN) model (88.37%) and at par with the available support vector machine (SVM) model (94.19%) on the basis of the testing data. Further, an empirical equation is presented using MGGP to approximate the unknown limit state function representing the cyclic resistance ratio (CRR) of soil based on developed LI model. Using an independent database of 227 cases, the overall rates of successful prediction of occurrence of liquefaction and non-liquefaction are found to be 87, 86, and 84% by the developed MGGP based model, available ANN and the statistical models, respectively, on the basis of calculated factor of safety (F_s) against the liquefaction occurrence.
Genomic models with genotype × environment interaction for predicting hybrid performance: an application in maize hybrids.

PubMed

Acosta-Pech, Rocío; Crossa, José; de Los Campos, Gustavo; Teyssèdre, Simon; Claustres, Bruno; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino

2017-07-01

A new genomic model that incorporates genotype × environment interaction gave increased prediction accuracy of untested hybrid response for traits such as percent starch content, percent dry matter content and silage yield of maize hybrids. The prediction of hybrid performance (HP) is very important in agricultural breeding programs. In plant breeding, multi-environment trials play an important role in the selection of important traits, such as stability across environments, grain yield and pest resistance. Environmental conditions modulate gene expression causing genotype × environment interaction (G × E), such that the estimated genetic correlations of the performance of individual lines across environments summarize the joint action of genes and environmental conditions. This article proposes a genomic statistical model that incorporates G × E for general and specific combining ability for predicting the performance of hybrids in environments. The proposed model can also be applied to any other hybrid species with distinct parental pools. In this study, we evaluated the predictive ability of two HP prediction models using a cross-validation approach applied in extensive maize hybrid data, comprising 2724 hybrids derived from 507 dent lines and 24 flint lines, which were evaluated for three traits in 58 environments over 12 years; analyses were performed for each year. On average, genomic models that include the interaction of general and specific combining ability with environments have greater predictive ability than genomic models without interaction with environments (ranging from 12 to 22%, depending on the trait). We concluded that including G × E in the prediction of untested maize hybrids increases the accuracy of genomic models.
Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation

PubMed Central

Doench, John G.; Hartenian, Ella; Graham, Daniel B.; Tothova, Zuzana; Hegde, Mudra; Smith, Ian; Sullender, Meagan; Ebert, Benjamin L.; Xavier, Ramnik J.; Root, David E.

2014-01-01

Components of the prokaryotic clustered regularly interspersed palindromic repeat (CRISPR) loci have recently been repurposed for use in mammalian cells1–6. The Cas9 protein can be programmed with a single guide RNA (sgRNA) to generate site-specific DNA breaks, but there are few known rules governing on-target efficacy of this system7,8. We created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. We discovered sequence features that improved activity, including a further optimization of the proto-spacer adjacent motif (PAM) of Streptococcus pyogenes Cas9. The results from 1,841 sgRNAs were used to construct a predictive model of sgRNA activity to improve sgRNA design for gene editing and genetic screens. We provide an online tool for the design of highly active sgRNAs for any gene of interest. PMID:25184501
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

PubMed Central

Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

2006-01-01

The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943
CORE_TF: a user-friendly interface to identify evolutionary conserved transcription factor binding sites in sets of co-regulated genes

PubMed Central

Hestand, Matthew S; van Galen, Michiel; Villerius, Michel P; van Ommen, Gert-Jan B; den Dunnen, Johan T; 't Hoen, Peter AC

2008-01-01

Background The identification of transcription factor binding sites is difficult since they are only a small number of nucleotides in size, resulting in large numbers of false positives and false negatives in current approaches. Computational methods to reduce false positives are to look for over-representation of transcription factor binding sites in a set of similarly regulated promoters or to look for conservation in orthologous promoter alignments. Results We have developed a novel tool, "CORE_TF" (Conserved and Over-REpresented Transcription Factor binding sites) that identifies common transcription factor binding sites in promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for position weight matrices from the TRANSFAC® database that are over-represented in an experimental set compared to a random set of promoters and identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated with expression and chromatin-immunoprecipitation on microarray data. We also implement and demonstrate the importance of matching the random set of promoters to the experimental promoters by GC content, which is a unique feature of our tool. Conclusion The program CORE_TF is accessible in a user friendly web interface at . It provides a table of over-represented transcription factor binding sites in the user's input genes' promoters and a graphical view of evolutionary conserved transcription factor binding sites. In our test data sets it successfully predicts target transcription factors and their binding sites. PMID:19036135
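The GC-content matching step described in the CORE_TF abstract can be illustrated with a minimal sketch. This is not the CORE_TF implementation: the function names `gc_content` and `gc_matched_background`, the greedy nearest-neighbour matching rule, and the toy sequences are all assumptions made for illustration. The idea is only that each experimental promoter is paired with an unused background promoter of similar GC content, so that over-representation is not driven by base composition alone.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_matched_background(experimental, pool):
    """Greedily pick, for each experimental promoter, the unused pool
    sequence whose GC content is closest (hypothetical matching rule).

    Assumes the pool holds at least as many sequences as the experimental set.
    """
    remaining = list(pool)
    matched = []
    for seq in experimental:
        target = gc_content(seq)
        best = min(remaining, key=lambda s: abs(gc_content(s) - target))
        remaining.remove(best)  # each background sequence is used once
        matched.append(best)
    return matched


# Toy example with made-up sequences
experimental = ["GGCC", "ATAT"]
pool = ["AATT", "GCGC", "ATGC"]
print(gc_matched_background(experimental, pool))
```

A real tool would likely match within GC bins and sample many background promoters per experimental promoter; the greedy one-to-one pairing here is only the simplest variant.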
Hidden state prediction: a modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses.

PubMed

Zaneveld, Jesse R R; Thurber, Rebecca L V

2014-01-01

Complex symbioses between animal or plant hosts and their associated microbiotas can involve thousands of species and millions of genes. Because of the number of interacting partners, it is often impractical to study all organisms or genes in these host-microbe symbioses individually. Yet new phylogenetic predictive methods can use the wealth of accumulated data on diverse model organisms to make inferences into the properties of less well-studied species and gene families. Predictive functional profiling methods use evolutionary models based on the properties of studied relatives to put bounds on the likely characteristics of an organism or gene that has not yet been studied in detail. These techniques have been applied to predict diverse features of host-associated microbial communities ranging from the enzymatic function of uncharacterized genes to the gene content of uncultured microorganisms. We consider these phylogenetically informed predictive techniques from disparate fields as examples of a general class of algorithms for Hidden State Prediction (HSP), and argue that HSP methods have broad value in predicting organismal traits in a variety of contexts, including the study of complex host-microbe symbioses.
A hadoop-based method to predict potential effective drug combination.

PubMed

Sun, Yifan; Xiong, Yi; Xu, Qian; Wei, Dongqing

2014-01-01

Combination drugs that impact multiple targets simultaneously are promising candidates for combating complex diseases due to their improved efficacy and reduced side effects. However, exhaustive screening of all possible drug combinations is extremely time-consuming and impractical. Here, we present a novel Hadoop-based approach to predict drug combinations by taking advantage of the MapReduce programming model, which leads to an improvement of scalability of the prediction algorithm. By integrating the gene expression data of multiple drugs, we constructed data preprocessing and the support vector machines and naïve Bayesian classifiers on Hadoop for prediction of drug combinations. The experimental results suggest that our Hadoop-based model achieves much higher efficiency in the big data processing steps with satisfactory performance. We believed that our proposed approach can help accelerate the prediction of potential effective drugs with the increasing of the combination number at an exponential rate in future. The source code and datasets are available upon request.

A Hadoop-Based Method to Predict Potential Effective Drug Combination

PubMed Central

Xiong, Yi; Xu, Qian; Wei, Dongqing

2014-01-01

Combination drugs that impact multiple targets simultaneously are promising candidates for combating complex diseases due to their improved efficacy and reduced side effects. However, exhaustive screening of all possible drug combinations is extremely time-consuming and impractical. Here, we present a novel Hadoop-based approach to predict drug combinations by taking advantage of the MapReduce programming model, which leads to an improvement of scalability of the prediction algorithm. By integrating the gene expression data of multiple drugs, we constructed data preprocessing and the support vector machines and naïve Bayesian classifiers on Hadoop for prediction of drug combinations. The experimental results suggest that our Hadoop-based model achieves much higher efficiency in the big data processing steps with satisfactory performance. We believed that our proposed approach can help accelerate the prediction of potential effective drugs with the increasing of the combination number at an exponential rate in future. The source code and datasets are available upon request. PMID:25147789
An efficient model for auxiliary diagnosis of hepatocellular carcinoma based on gene expression programming.

PubMed

Zhang, Li; Chen, Jiasheng; Gao, Chunming; Liu, Chuanmiao; Xu, Kuihua

2018-03-16

Hepatocellular carcinoma (HCC) is a leading cause of cancer-related death worldwide. The early diagnosis of HCC is greatly helpful to achieve long-term disease-free survival. However, HCC is usually difficult to diagnose at an early stage. The aim of this study was to create the prediction model to diagnose HCC based on gene expression programming (GEP). GEP is an evolutionary algorithm and a domain-independent problem-solving technique. Clinical data show that six serum biomarkers, including gamma-glutamyl transferase, C-reactive protein, carcinoembryonic antigen, alpha-fetoprotein, carbohydrate antigen 153, and carbohydrate antigen 199, are related to HCC characteristics. In this study, the prediction of HCC was made based on these six biomarkers (195 HCC patients and 215 non-HCC controls) by setting up optimal joint models with GEP. The GEP model discriminated 353 out of 410 subjects, representing a determination coefficient of 86.28% (283/328) and 85.37% (70/82) for training and test sets, respectively. Compared to the results from the support vector machine, the artificial neural network, and the multilayer perceptron, GEP showed a better outcome. The results suggested that GEP modeling was a promising and excellent tool in diagnosis of hepatocellular carcinoma, and it could be widely used in HCC auxiliary diagnosis.
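The percentages quoted in the abstract above can be checked directly from the reported counts. The short sketch below only redoes that arithmetic; the helper name `percent_correct` is hypothetical, and reading the figures as the share of correctly discriminated subjects is an interpretation of the abstract, not something it states in those words.

```python
def percent_correct(correct, total):
    """Share of correctly discriminated subjects, in percent (2 decimals)."""
    return round(100.0 * correct / total, 2)


# Counts taken from the abstract: 283/328 (training), 70/82 (test)
training = percent_correct(283, 328)
test_set = percent_correct(70, 82)
overall = percent_correct(283 + 70, 328 + 82)  # 353 of 410 subjects

print(training, test_set, overall)
```

The training and test values reproduce the 86.28% and 85.37% given in the abstract, and the pooled 353/410 comes to about 86.1%.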
Genomics of NSCLC patients both affirm PD-L1 expression and predict their clinical responses to anti-PD-1 immunotherapy.

PubMed

Brogden, Kim A; Parashar, Deepak; Hallier, Andrea R; Braun, Terry; Qian, Fang; Rizvi, Naiyer A; Bossler, Aaron D; Milhem, Mohammed M; Chan, Timothy A; Abbasi, Taher; Vali, Shireen

2018-02-27

Programmed Death Ligand 1 (PD-L1) is a co-stimulatory and immune checkpoint protein. PD-L1 expression in non-small cell lung cancers (NSCLC) is a hallmark of adaptive resistance and its expression is often used to predict the outcome of Programmed Death 1 (PD-1) and PD-L1 immunotherapy treatments. However, clinical benefits do not occur in all patients and new approaches are needed to assist in selecting patients for PD-1 or PD-L1 immunotherapies. Here, we hypothesized that patient tumor cell genomics influenced cell signaling and expression of PD-L1, chemokines, and immunosuppressive molecules and these profiles could be used to predict patient clinical responses. We used a recent dataset from NSCLC patients treated with pembrolizumab. Deleterious gene mutational profiles in patient exomes were identified and annotated into a cancer network to create NSCLC patient-specific predictive computational simulation models. Validation checks were performed on the cancer network, simulation model predictions, and PD-1 match rates between patient-specific predicted and clinical responses. Expression profiles of these 24 chemokines and immunosuppressive molecules were used to identify patients who would or would not respond to PD-1 immunotherapy. PD-L1 expression alone was not sufficient to predict which patients would or would not respond to PD-1 immunotherapy. Adding chemokine and immunosuppressive molecule expression profiles allowed patient models to achieve a greater than 85.0% predictive correlation among predicted and reported patient clinical responses. Our results suggested that chemokine and immunosuppressive molecule expression profiles can be used to accurately predict clinical responses thus differentiating among patients who would and would not benefit from PD-1 or PD-L1 immunotherapies.
Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction.

PubMed

Stojanova, Daniela; Ceci, Michelangelo; Malerba, Donato; Dzeroski, Saso

2013-09-26

Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
Prediction of missing common genes for disease pairs using network based module separation on incomplete human interactome.

PubMed

Akram, Pakeeza; Liao, Li

2017-12-06

Identification of common genes associated with comorbid diseases can be critical in understanding their pathobiological mechanism. This work presents a novel method to predict missing common genes associated with a disease pair. Searching for missing common genes is formulated as an optimization problem to minimize network based module separation from two subgraphs produced by mapping genes associated with disease onto the interactome. Using cross validation on more than 600 disease pairs, our method achieves significantly higher average receiver operating characteristic ROC Score of 0.95 compared to a baseline ROC score 0.60 using randomized data. Missing common genes prediction is aimed to complete gene set associated with comorbid disease for better understanding of biological intervention. It will also be useful for gene targeted therapeutics related to comorbid diseases. This method can be further considered for prediction of missing edges to complete the subgraph associated with disease pair.
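For readers unfamiliar with the ROC score used in the abstract above, the following is a minimal sketch of how an area under the ROC curve can be computed from ranked predictions. It is not the authors' evaluation code: the function name `roc_auc` and the toy scores are hypothetical. It uses the standard pairwise interpretation, in which the score is the fraction of (positive, negative) pairs where the positive item receives the higher score, with ties counted as half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores, higher meaning "more likely positive"
    labels: 1 for true positives (e.g. held-out common genes), 0 otherwise
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores: 3 of the 4 pairs are ordered correctly
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the abstract's 0.95 and 0.60 should be read.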
Progress and challenges in bioinformatics approaches for enhancer identification

PubMed Central

Kleftogiannis, Dimitrios; Kalnis, Panos

2016-01-01

Enhancers are cis-acting DNA elements that play critical roles in distal regulation of gene expression. Identifying enhancers is an important step for understanding distinct gene expression programs that may reflect normal and pathogenic cellular conditions. Experimental identification of enhancers is constrained by the set of conditions used in the experiment. This requires multiple experiments to identify enhancers, as they can be active under specific cellular conditions but not in different cell types/tissues or cellular states. This has opened prospects for computational prediction methods that can be used for high-throughput identification of putative enhancers to complement experimental approaches. Potential functions and properties of predicted enhancers have been catalogued and summarized in several enhancer-oriented databases. Because the current methods for the computational prediction of enhancers produce significantly different enhancer predictions, it will be beneficial for the research community to have an overview of the strategies and solutions developed in this field. In this review, we focus on the identification and analysis of enhancers by bioinformatics approaches. First, we describe a general framework for computational identification of enhancers, present relevant data types and discuss possible computational solutions. Next, we cover over 30 existing computational enhancer identification methods that were developed since 2000. Our review highlights advantages, limitations and potentials, while suggesting pragmatic guidelines for development of more efficient computational enhancer prediction methods. Finally, we discuss challenges and open problems of this topic, which require further consideration. PMID:26634919
Characterization of the Biosynthetic Genes for 10,11-Dehydrocurvularin, a Heat Shock Response-Modulating Anticancer Fungal Polyketide from Aspergillus terreus

PubMed Central

Xu, Yuquan; Espinosa-Artiles, Patricia; Schubert, Vivien; Xu, Ya-ming; Zhang, Wei; Lin, Min; Gunatilaka, A. A. Leslie; Süssmuth, Roderich

2013-01-01

10,11-Dehydrocurvularin is a prevalent fungal phytotoxin with heat shock response and immune-modulatory activities. It features a dihydroxyphenylacetic acid lactone polyketide framework with structural similarities to resorcylic acid lactones like radicicol or zearalenone. A genomic locus was identified from the dehydrocurvularin producer strain Aspergillus terreus AH-02-30-F7 to reveal genes encoding a pair of iterative polyketide synthases (A. terreus CURS1 [AtCURS1] and AtCURS2) that are predicted to collaborate in the biosynthesis of 10,11-dehydrocurvularin. Additional genes in this locus encode putative proteins that may be involved in the export of the compound from the cell and in the transcriptional regulation of the cluster. 10,11-Dehydrocurvularin biosynthesis was reconstituted in Saccharomyces cerevisiae by heterologous expression of the polyketide synthases. Bioinformatic analysis of the highly reducing polyketide synthase AtCURS1 and the nonreducing polyketide synthase AtCURS2 highlights crucial biosynthetic programming differences compared to similar synthases involved in resorcylic acid lactone biosynthesis. These differences lead to the synthesis of a predicted tetraketide starter unit that forms part of the 12-membered lactone ring of dehydrocurvularin, as opposed to the penta- or hexaketide starters in the 14-membered rings of resorcylic acid lactones. Tetraketide N-acetylcysteamine thioester analogues of the starter unit were shown to support the biosynthesis of dehydrocurvularin and its analogues, with yeast expressing AtCURS2 alone. Differential programming of the product template domain of the nonreducing polyketide synthase AtCURS2 results in an aldol condensation with a different regiospecificity than that of resorcylic acid lactones, yielding the dihydroxyphenylacetic acid scaffold characterized by an S-type cyclization pattern atypical for fungal polyketides. PMID:23335766
Development of GP and GEP models to estimate an environmental issue induced by blasting operation.

PubMed

Faradonbeh, Roohollah Shirani; Hasanipanah, Mahdi; Amnieh, Hassan Bakhshandeh; Armaghani, Danial Jahed; Monjezi, Masoud

2018-05-21

Air overpressure (AOp) is one of the most adverse effects induced by blasting in the surface mines and civil projects. So, proper evaluation and estimation of the AOp is important for minimizing the environmental problems resulting from blasting. The main aim of this study is to estimate AOp produced by blasting operation in Miduk copper mine, Iran, developing two artificial intelligence models, i.e., genetic programming (GP) and gene expression programming (GEP). Then, the accuracy of the GP and GEP models has been compared to multiple linear regression (MLR) and three empirical models. For this purpose, 92 blasting events were investigated, and subsequently, the AOp values were carefully measured. Moreover, in each operation, the values of maximum charge per delay and distance from blast points, as two effective parameters on the AOp, were measured. The prediction performance of the models was then checked in terms of variance accounted for (VAF), coefficient of determination (CoD), and root mean square error (RMSE). Finally, it was found that the GEP with VAF of 94.12%, CoD of 0.941, and RMSE of 0.06 is a more precise model than other predictive models for the AOp prediction in the Miduk copper mine, and it can be introduced as a new powerful tool for estimating the AOp resulting from blasting.
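The three performance indices named in the abstract above can be sketched as follows. The abstract does not give formulas, so the definitions here are the commonly used ones and should be treated as assumptions: VAF as 100 × (1 − var(y − ŷ) / var(y)), CoD as 1 − SS_res / SS_tot, and RMSE as the root of the mean squared residual. The function names and toy data are hypothetical and are not values from the study.

```python
import math
from statistics import pvariance


def rmse(measured, predicted):
    """Root mean square error."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def cod(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def vaf(measured, predicted):
    """Variance accounted for, in percent (assumed common definition)."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    return 100.0 * (1.0 - pvariance(residuals) / pvariance(measured))


# Toy data (made up): predictions are the measurements shifted by +0.5
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(vaf(measured, predicted), cod(measured, predicted), rmse(measured, predicted))
```

Under these definitions a constant offset leaves VAF at 100% while lowering CoD and raising RMSE, which is one reason the three indices are usually reported together.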
Compilation of mRNA Polyadenylation Signals in Arabidopsis Revealed a New Signal Element and Potential Secondary Structures

PubMed Central

Loke, Johnny C.; Stahlberg, Eric A.; Strenski, David G.; Haas, Brian J.; Wood, Paul Chris; Li, Qingshun Quinn

2005-01-01

Using a novel program, SignalSleuth, and a database containing authenticated polyadenylation [poly(A)] sites, we analyzed the composition of mRNA poly(A) signals in Arabidopsis (Arabidopsis thaliana), and reevaluated previously described cis-elements within the 3′-untranslated (UTR) regions, including near upstream elements and far upstream elements. As predicted, there are absences of high-consensus signal patterns. The AAUAAA signal topped the near upstream elements patterns and was found within the predicted location to only approximately 10% of 3′-UTRs. More importantly, we identified a new set, named cleavage elements, of poly(A) signals flanking both sides of the cleavage site. These cis-elements were not previously revealed by conventional mutagenesis and are contemplated as a cluster of signals for cleavage site recognition. Moreover, a single-nucleotide profile scan on the 3′-UTR regions unveiled a distinct arrangement of alternate stretches of U and A nucleotides, which led to a prediction of the formation of secondary structures. Using an RNA secondary structure prediction program, mFold, we identified three main types of secondary structures on the sequences analyzed. Surprisingly, these observed secondary structures were all interrupted in previously constructed mutations in these regions. These results will enable us to revise the current model of plant poly(A) signals and to develop tools to predict 3′-ends for gene annotation. PMID:15965016
Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

PubMed Central

Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

2012-01-01

Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606
Regional Heritability Mapping Provides Insights into Dry Matter Content in African White and Yellow Cassava Populations.

PubMed

Okeke, Uche Godfrey; Akdemir, Deniz; Rabbi, Ismail; Kulakow, Peter; Jannink, Jean-Luc

2018-03-01

The HarvestPlus program for cassava (Manihot esculenta Crantz) fortifies cassava with β-carotene by breeding for carotene-rich tubers (yellow cassava). However, a negative correlation between yellowness and dry matter (DM) content has been identified. We investigated the genetic control of DM in white and yellow cassava. We used regional heritability mapping (RHM) to associate DM with genomic segments in both subpopulations. Significant segments were subjected to candidate gene analysis and candidates were validated with prediction accuracies. The RHM procedure was validated via a simulation approach and revealed significant hits for white cassava on chromosomes 1, 4, 5, 10, 17, and 18, whereas hits for the yellow were on chromosome 1. Candidate gene analysis revealed genes in the carbohydrate biosynthesis pathway including plant serine-threonine protein kinases (SnRKs), UDP (uridine diphosphate)-glycosyltransferases, UDP-sugar transporters, invertases, pectinases, and regulons. Validation using 1252 unique identifiers from the SnRK gene family genome-wide recovered 50% of the predictive accuracy of whole-genome single nucleotide polymorphisms for DM, whereas validation using 53 likely genes (extracted from the literature) from significant segments recovered 32%. Genes including an acid invertase, a neutral or alkaline invertase, and a glucose-6-phosphate isomerase were validated on the basis of an a priori list for the cassava starch pathway, and also a fructose-biphosphate aldolase from the Calvin cycle pathway. The power of the RHM procedure was estimated as 47% when the causal quantitative trait loci generated 10% of the phenotypic variance (sample size = 451). Cassava DM genetics are complex and RHM may be useful for complex traits. Copyright © 2018 Crop Science Society of America.
Glucokinase gene mutations (MODY 2) in Asian Indians.

PubMed

Kanthimathi, Sekar; Jahnavi, Suresh; Balamurugan, Kandasamy; Ranjani, Harish; Sonya, Jagadesan; Goswami, Soumik; Chowdhury, Subhankar; Mohan, Viswanathan; Radha, Venkatesan

2014-03-01

Heterozygous inactivating mutations in the glucokinase (GCK) gene cause a hyperglycemic condition termed maturity-onset diabetes of the young (MODY) 2 or GCK-MODY. This is characterized by mild, stable, usually asymptomatic, fasting hyperglycemia that rarely requires pharmacological intervention. The aim of the present study was to screen for GCK gene mutations in Asian Indian subjects with mild hyperglycemia. Of the 1,517 children and adolescents of the population-based ORANGE study in Chennai, India, 49 were found to have hyperglycemia. These children, along with the six patients referred to our center with mild hyperglycemia, were screened for MODY 2 mutations. The GCK gene was bidirectionally sequenced using BigDye® Terminator v3.1 (Applied Biosystems, Foster City, CA) chemistry. In silico predictions of pathogenicity were carried out using the online tools SIFT, Polyphen-2, and I-Mutant 2.0. Direct sequencing of the GCK gene in the patients referred to our center revealed one novel mutation, Thr206Ala (c.616A>G), in exon 6 and one previously described mutation, Met251Thr (c.752T>C), in exon 7. In silico analysis predicted the novel mutation to be pathogenic. The highly conserved nature and critical location of the residue Thr206, along with the clinical course, suggest that Thr206Ala is a MODY 2 mutation. However, we did not find any MODY 2 mutations in the 49 children selected from the population-based study. Hence the prevalence of GCK mutations in Chennai is <1:1,517. This is the first study of MODY 2 mutations from India and confirms the importance of considering GCK gene mutation screening in patients with mild early-onset hyperglycemia who are negative for β-cell antibodies.
Complex phenotype of dyskeratosis congenita and mood dysregulation with novel homozygous RTEL1 and TPH1 variants.

PubMed

Ungar, Rachel A; Giri, Neelam; Pao, Maryland; Khincha, Payal P; Zhou, Weiyin; Alter, Blanche P; Savage, Sharon A

2018-06-01

Dyskeratosis congenita (DC) is an inherited bone marrow failure syndrome caused by germline mutations in telomere biology genes. Patients have extremely short telomeres for their age and a complex phenotype including oral leukoplakia, abnormal skin pigmentation, and dysplastic nails in addition to bone marrow failure, pulmonary fibrosis, stenosis of the esophagus, lacrimal ducts and urethra, developmental anomalies, and high risk of cancer. We evaluated a patient with features of DC, mood dysregulation, diabetes, and lack of pubertal development. Family history was not available but genome-wide genotyping was consistent with consanguinity. Whole exome sequencing identified 82 variants of interest in 80 genes based on the following criteria: homozygous, <0.1% minor allele frequency in public and in-house databases, nonsynonymous, and predicted deleterious by multiple in silico prediction programs. Six genes were identified as likely contributory to the clinical presentation. DC is likely due to homozygous splice site variants in regulator of telomere elongation helicase 1, a known DC and telomere biology gene. A homozygous, missense variant in tryptophan hydroxylase 1 may be clinically important as this gene encodes the rate-limiting step in serotonin biosynthesis, a biologic pathway connected with mood disorders. Four additional genes (SCN4A, LRP4, GDAP1L1, and SPTBN5) had rare, missense homozygous variants that we speculate may contribute to portions of the clinical phenotype. This case illustrates the value of conducting detailed clinical and genomic evaluations on rare patients in order to identify new areas of research into the functional consequences of rare variants and their contribution to human disease. © 2018 Wiley Periodicals, Inc.
Four-gene Pan-African Blood Signature Predicts Progression to Tuberculosis.

PubMed

Suliman, Sara; Thompson, Ethan; Sutherland, Jayne; Weiner Rd, January; Ota, Martin O C; Shankar, Smitha; Penn-Nicholson, Adam; Thiel, Bonnie; Erasmus, Mzwandile; Maertzdorf, Jeroen; Duffy, Fergal J; Hill, Philip C; Hughes, E Jane; Stanley, Kim; Downing, Katrina; Fisher, Michelle L; Valvo, Joe; Parida, Shreemanta K; van der Spuy, Gian; Tromp, Gerard; Adetifa, Ifedayo M O; Donkor, Simon; Howe, Rawleigh; Mayanja-Kizza, Harriet; Boom, W Henry; Dockrell, Hazel; Ottenhoff, Tom H M; Hatherill, Mark; Aderem, Alan; Hanekom, Willem A; Scriba, Thomas J; Kaufmann, Stefan He; Zak, Daniel E; Walzl, Gerhard

2018-04-06

Contacts of tuberculosis (TB) patients constitute an important target population for preventative measures as they are at high risk of infection with Mycobacterium tuberculosis and progression to disease. We investigated biosignatures with predictive ability for incident tuberculosis. In a case-control study nested within the Grand Challenges 6-74 longitudinal HIV-negative African cohort of exposed household contacts, we employed RNA sequencing, polymerase chain reaction (PCR) and the Pair Ratio algorithm in a training/test set approach. Overall, 79 progressors, who developed tuberculosis between 3 and 24 months following exposure, and 328 matched non-progressors, who remained healthy during 24 months of follow-up, were investigated. A four-transcript signature (RISK4), derived from samples in a South African and Gambian training set, predicted progression up to two years before onset of disease in blinded test set samples from South Africa, The Gambia and Ethiopia with little population-associated variability and also validated on an external cohort of South African adolescents with latent Mycobacterium tuberculosis infection. By contrast, published diagnostic or prognostic tuberculosis signatures predicted on samples from some but not all 3 countries, indicating site-specific variability. Post-hoc meta-analysis identified a single gene pair, C1QC/TRAV27, that would consistently predict TB progression in household contacts from multiple African sites but not in infected adolescents without known recent exposure events. Collectively, we developed a simple whole blood-based PCR test to predict tuberculosis in household contacts from diverse African populations, with potential for implementation in national TB contact investigation programs.
Predictive Genes in Adjacent Normal Tissue Are Preferentially Altered by sCNV during Tumorigenesis in Liver Cancer and May Be Rate Limiting

PubMed Central

Lamb, John R.; Zhang, Chunsheng; Xie, Tao; Wang, Kai; Zhang, Bin; Hao, Ke; Chudin, Eugene; Fraser, Hunter B.; Millstein, Joshua; Ferguson, Mark; Suver, Christine; Ivanovska, Irena; Scott, Martin; Philippar, Ulrike; Bansal, Dimple; Zhang, Zhan; Burchard, Julja; Smith, Ryan; Greenawalt, Danielle; Cleary, Michele; Derry, Jonathan; Loboda, Andrey; Watters, James; Poon, Ronnie T. P.; Fan, Sheung T.; Yeung, Chun; Lee, Nikki P. Y.; Guinney, Justin; Molony, Cliona; Emilsson, Valur; Buser-Doepner, Carolyn; Zhu, Jun; Friend, Stephen; Mao, Mao; Shaw, Peter M.; Dai, Hongyue; Luk, John M.; Schadt, Eric E.

2011-01-01

Background: In hepatocellular carcinoma (HCC), genes predictive of survival have been found in both adjacent normal (AN) and tumor (TU) tissues. The relationships between these two sets of predictive genes and the general process of tumorigenesis and disease progression remain unclear. Methodology/Principal Findings: Here we have investigated HCC tumorigenesis by comparing gene expression, DNA copy number variation and survival using ∼250 AN and TU samples representing, respectively, the pre-cancer state and the result of tumorigenesis. Genes that participate in tumorigenesis were defined using a gene-gene correlation meta-analysis procedure that compared AN versus TU tissues. Genes predictive of survival in AN (AN-survival genes) were found to be enriched in the differential gene-gene correlation gene set, indicating that they directly participate in the process of tumorigenesis. Additionally, the AN-survival genes were mostly not predictive after tumorigenesis in TU tissue, and this transition was associated with and could largely be explained by the effect of somatic DNA copy number variation (sCNV) in cis and in trans. The data were consistent with the variance of AN-survival genes being rate-limiting steps in tumorigenesis, and this was confirmed using a treatment that promotes HCC tumorigenesis and that selectively altered AN-survival genes and genes differentially correlated between AN and TU. Conclusions/Significance: This suggests that the process of tumor evolution involves rate-limiting steps related to the background from which the tumor evolved, where these were frequently predictive of clinical outcome. Additionally, treatments that alter the likelihood of tumorigenesis occurring may act by altering AN-survival genes, suggesting that the process can be manipulated. Further, sCNV explains a substantial fraction of tumor-specific expression and may therefore be a causal driver of tumor evolution in HCC and perhaps many solid tumor types. PMID:21750698
Sequence-based Network Completion Reveals the Integrality of Missing Reactions in Metabolic Networks

PubMed Central

Krumholz, Elias W.; Libourel, Igor G. L.

2015-01-01

Genome-scale metabolic models are central in connecting genotypes to metabolic phenotypes. However, even for well studied organisms, such as Escherichia coli, draft networks do not contain a complete biochemical network. Missing reactions are referred to as gaps. These gaps need to be filled to enable functional analysis, and gap-filling choices influence model predictions. To investigate whether functional networks existed where all gap-filling reactions were supported by sequence similarity to annotated enzymes, four draft networks were supplemented with all reactions from the Model SEED database for which minimal sequence similarity was found in their genomes. Quadratic programming revealed that the number of reactions that could partake in a gap-filling solution was vast: 3,270 in the case of E. coli, where 72% of the metabolites in the draft network could connect a gap-filling solution. Nonetheless, no network could be completed without the inclusion of orphaned enzymes, suggesting that parts of the biochemistry integral to biomass precursor formation are uncharacterized. However, many gap-filling reactions were well determined, and the resulting networks showed improved prediction of gene essentiality compared with networks generated through canonical gap filling. In addition, gene essentiality predictions that were sensitive to poorly determined gap-filling reactions were of poor quality, suggesting that damage to the network structure resulting from the inclusion of erroneous gap-filling reactions may be predictable. PMID:26041773
A signature inferred from Drosophila mitotic genes predicts survival of breast cancer patients.

PubMed

Damasco, Christian; Lembo, Antonio; Somma, Maria Patrizia; Gatti, Maurizio; Di Cunto, Ferdinando; Provero, Paolo

2011-02-28

The classification of breast cancer patients into risk groups provides a powerful tool for the identification of patients who will benefit from aggressive systemic therapy. The analysis of microarray data has generated several gene expression signatures that improve diagnosis and allow risk assessment. There is also evidence that cell proliferation-related genes have a high predictive power within these signatures. We thus constructed a gene expression signature (the DM signature) using the human orthologues of 108 Drosophila melanogaster genes required for either the maintenance of chromosome integrity (36 genes) or mitotic division (72 genes). The DM signature has minimal overlap with the extant signatures and is highly predictive of survival in 5 large breast cancer datasets. In addition, we show that the DM signature outperforms many widely used breast cancer signatures in predictive power, and performs comparably to other proliferation-based signatures. For most genes of the DM signature, an increased expression is negatively correlated with patient survival. The genes that provide the highest contribution to the predictive power of the DM signature are those involved in cytokinesis. This finding highlights cytokinesis as an important marker in breast cancer prognosis and as a possible target for antimitotic therapies.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud

PubMed Central

Merchant, Nirav

2016-01-01

Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

PubMed

Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

2016-04-01

Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.
Inductive matrix completion for predicting gene-disease associations.

PubMed

Natarajan, Nagarajan; Dhillon, Inderjit S

2014-06-15

Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies; for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies; for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best) that has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes that are previously not linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease. © The Author 2014. Published by Oxford University Press.
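The inductive property described in this abstract can be illustrated with a small scoring sketch. It assumes the usual bilinear form of inductive matrix completion, in which a gene-disease score is computed from the gene's feature vector x, the disease's feature vector y and two learned factor matrices W and H as x^T W H^T y; the abstract itself does not spell out this formula. The matrices and feature vectors below are hypothetical, and the step that learns W and H from known associations is omitted.

```python
def project(factor, features):
    """Map a feature vector into latent space: returns factor^T @ features."""
    n_latent = len(factor[0])
    return [
        sum(factor[i][k] * features[i] for i in range(len(features)))
        for k in range(n_latent)
    ]

def imc_score(gene_features, W, H, disease_features):
    """Bilinear score x^T W H^T y (assumed form of the model)."""
    gene_latent = project(W, gene_features)
    disease_latent = project(H, disease_features)
    return sum(g * d for g, d in zip(gene_latent, disease_latent))

# Hypothetical, hand-picked factors: 3 gene features and 2 disease features,
# both mapped to 2 latent dimensions. In practice these would be learned.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = [[1.0, 0.0], [0.0, 2.0]]

gene = [1.0, 0.0, 1.0]
known_disease = [0.0, 1.0]
unseen_disease = [1.0, 1.0]  # no known associations, only features

print(imc_score(gene, W, H, known_disease))
print(imc_score(gene, W, H, unseen_disease))
```

Because the score depends only on feature vectors, a disease that was absent during training can still be ranked against every gene, which is the contrast with transductive matrix completion drawn in the abstract.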

CisMapper: predicting regulatory interactions from transcription factor ChIP-seq data

PubMed Central

O'Connor, Timothy; Bodén, Mikael

2017-01-01

Identifying the genomic regions and regulatory factors that control the transcription of genes is an important, unsolved problem. The current method of choice predicts transcription factor (TF) binding sites using chromatin immunoprecipitation followed by sequencing (ChIP-seq), and then links the binding sites to putative target genes solely on the basis of the genomic distance between them. Evidence from chromatin conformation capture experiments shows that this approach is inadequate due to long-distance regulation via chromatin looping. We present CisMapper, which predicts the regulatory targets of a TF using the correlation between a histone mark at the TF's bound sites and the expression of each gene across a panel of tissues. Using both chromatin conformation capture and differential expression data, we show that CisMapper is more accurate at predicting the target genes of a TF than the distance-based approaches currently used, and is particularly advantageous for predicting the long-range regulatory interactions typical of tissue-specific gene expression. CisMapper also predicts which TF binding sites regulate a given gene more accurately than using genomic distance. Unlike distance-based methods, CisMapper can predict which transcription start site of a gene is regulated by a particular binding site of the TF. PMID:28204599
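The core idea of linking a binding site to genes by correlation across tissues, rather than by genomic distance, can be sketched as follows. This is an illustration of the general idea only and not the CisMapper implementation: the use of Pearson correlation, the ranking by signed correlation, the function names and all numbers are assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def rank_targets(site_signal, expression_by_gene):
    """Rank candidate genes by correlation with the histone signal at one site."""
    scores = {
        gene: pearson(site_signal, expression)
        for gene, expression in expression_by_gene.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical histone-mark signal at one TF-bound site in four tissues,
# and hypothetical expression of three candidate genes in the same tissues.
site_signal = [1.0, 2.0, 3.0, 4.0]
expression_by_gene = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}

for gene, score in rank_targets(site_signal, expression_by_gene):
    print(gene, round(score, 3))
```

Nothing in this ranking depends on how far the gene lies from the site, which is why a correlation-based link can pick up the long-range interactions that a nearest-gene rule misses.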
Combining Gene Signatures Improves Prediction of Breast Cancer Survival

PubMed Central

Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian

2011-01-01

Background: Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings: To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online tool for survival prediction. Conclusion: Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. The presented methodology is broadly applicable to breast cancer risk assessment using any newly identified gene set. PMID:21423775
BAC and RNA sequencing reveal the brown planthopper resistance gene BPH15 in a recombination cold spot that mediates a unique defense mechanism.

PubMed

Lv, Wentang; Du, Ba; Shangguan, Xinxin; Zhao, Yan; Pan, Yufang; Zhu, Lili; He, Yuqing; He, Guangcun

2014-08-11

Brown planthopper (BPH, Nilaparvata lugens Stål) is the most destructive phloem-feeding insect pest of rice (Oryza sativa). The BPH-resistance gene BPH15 has proved effective in controlling the pest and is widely applied in rice breeding programs. Nevertheless, the molecular mechanism of the resistance remains unclear. In this study, we narrowed down the position of BPH15 on chromosome 4 and investigated the transcriptome of BPH15 rice after BPH attack. We analyzed 13,000 BC2F2 plants from a cross between the susceptible rice TN1 and the recombinant inbred line RI93, which carries the BPH15 gene from the original resistant donor B5. BPH15 was mapped to a 0.0269 cM region on chromosome 4, which is 210 kb in the reference genome of Nipponbare. Sequencing bacterial artificial chromosome (BAC) clones that span the BPH15 region revealed that the physical size of the BPH15 region in resistant rice B5 is 580 kb, much bigger than the corresponding region in the reference genome of Nipponbare. There were 87 predicted genes in the BPH15 region in resistant rice. The expression profiles of the predicted genes were analyzed. Four jacalin-related lectin protein genes and one LRR protein gene were found to be constitutively expressed in the resistant parent and were considered candidate genes of BPH15. The transcriptomes of the resistant BPH15 introgression line and the susceptible recipient line were analyzed using high-throughput RNA sequencing. In total, 2,914 differentially expressed genes (DEGs) were identified. BPH-responsive transcript profiles were distinct between resistant and susceptible plants and between the early stage (6 h after infestation, HAI) and late stage (48 HAI). The key defense mechanism was related to jasmonate signaling, ethylene signaling, receptor kinase, MAPK cascades, Ca2+ signaling, PR genes, transcription factors, and protein posttranslational modifications. Our work combined BAC and RNA sequencing to identify candidate genes of BPH15 and revealed the resistance mechanism that it mediated. These results increase our understanding of plant-insect interactions and can be used to protect against this destructive agricultural pest.
A Transcriptional Program for Arbuscule Degeneration during AM Symbiosis Is Regulated by MYB1.

PubMed

Floss, Daniela S; Gomez, S Karen; Park, Hee-Jin; MacLean, Allyson M; Müller, Lena M; Bhattarai, Kishor K; Lévesque-Tremblay, Veronique; Maldonado-Mendoza, Ignacio E; Harrison, Maria J

2017-04-24

During the endosymbiosis formed between plants and arbuscular mycorrhizal (AM) fungi, the root cortical cells are colonized by branched hyphae called arbuscules, which function in nutrient exchange with the plant [1]. Despite their positive function, arbuscules are ephemeral structures, and their development is followed by a degeneration phase, in which the arbuscule and surrounding periarbuscular membrane and matrix gradually disappear from the root cell [2, 3]. Currently, the root cell's role in this process and the underlying regulatory mechanisms are unknown. Here, by using a Medicago truncatula pt4 mutant in which arbuscules degenerate prematurely [4], we identified arbuscule degeneration-associated genes, of which 38% are predicted to encode secreted hydrolases, suggesting a role in disassembly of the arbuscule and interface. Through RNAi and analysis of an insertion mutant, we identified a symbiosis-specific MYB-like transcription factor (MYB1) that suppresses arbuscule degeneration in mtpt4. In myb1, expression of several degeneration-associated genes is reduced. Conversely, in roots constitutively overexpressing MYB1, expression of degeneration-associated genes is increased and subsequent development of symbiosis is impaired. MYB1-regulated gene expression is enhanced by DELLA proteins and is dependent on NSP1 [5], but not NSP2 [6]. Furthermore, MYB1 interacts with DELLA and NSP1. Our data identify a transcriptional program for arbuscule degeneration and reveal that its regulators include MYB1 in association with two transcriptional regulators, NSP1 and DELLA, both of which function in preceding phases of the symbiosis. We propose that the combinatorial use of transcription factors enables the sequential expression of transcriptional programs for arbuscule development and degeneration. Copyright © 2017 Elsevier Ltd. All rights reserved.
ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences.

PubMed

Bonizzoni, Paola; Rizzi, Raffaella; Pesole, Graziano

2005-10-05

Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems--hence the need to develop novel strategies. We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid limitations of splice site prediction (mainly, over-predictions) due to independent single EST alignments to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It is distinguished from other methods based on BLAST-like tools by the incorporation of entirely new ad hoc procedures for accurate and computationally efficient transcript alignment, and it adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires a lower computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at http://aspic.algo.disco.unimib.it/aspic-devel/.
04-ERD-052-Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loots, G G; Ovcharenko, I; Collette, N

2007-02-26

Generating the sequence of the human genome represents a colossal achievement for science and mankind. The technical use of the human genome project information holds great promise to cure disease, prevent bioterror threats, and learn about human origins. Yet converting the sequence data into biologically meaningful information has not been immediately obvious, and we are still in the preliminary stages of understanding how the genome is organized, what the functional building blocks are, and how these sequences mediate complex biological processes. The overarching goal of this program was to develop novel methods and high throughput strategies for determining the functions of 'anonymous' human genes that are evolutionarily deeply conserved in other vertebrates. We coupled analytical tool development and computational predictions regarding gene function with novel high throughput experimental strategies and tested biological predictions in the laboratory. The tools required for comparative genomic data-mining are fundamentally the same whether they are applied to scientific studies of related microbes or the search for functions of novel human genes. For this reason the tools, conceptual framework and the coupled informatics-experimental biology paradigm we developed in this LDRD have many potential scientific applications relevant to LLNL multidisciplinary research in bio-defense, bioengineering, bionanosciences and microbial and environmental genomics.
Gene expression signature in urine for diagnosing and assessing aggressiveness of bladder urothelial carcinoma.

PubMed

Mengual, Lourdes; Burset, Moisès; Ribal, María José; Ars, Elisabet; Marín-Aguilera, Mercedes; Fernández, Manuel; Ingelmo-Torres, Mercedes; Villavicencio, Humberto; Alcaraz, Antonio

2010-05-01

To develop an accurate and noninvasive method for bladder cancer diagnosis and prediction of disease aggressiveness based on the gene expression patterns of urine samples. Gene expression patterns of 341 urine samples from bladder urothelial cell carcinoma (UCC) patients and 235 controls were analyzed via TaqMan Arrays. In a first phase of the study, three consecutive gene selection steps were done to identify a gene set expression signature to detect and stratify UCC in urine. Subsequently, those genes more informative for UCC diagnosis and prediction of tumor aggressiveness were combined to obtain a classification system of bladder cancer samples. In a second phase, the obtained gene set signature was evaluated in a routine clinical scenario analyzing only voided urine samples. We have identified a 12+2 gene expression signature for UCC diagnosis and prediction of tumor aggressiveness on urine samples. Overall, this gene set panel had 98% sensitivity (SN) and 99% specificity (SP) in discriminating between UCC and control samples and 79% SN and 92% SP in predicting tumor aggressiveness. The translation of the model to the clinically applicable format corroborates that the 12+2 gene set panel described maintains a high accuracy for UCC diagnosis (SN = 89% and SP = 95%) and tumor aggressiveness prediction (SN = 79% and SP = 91%) in voided urine samples. The 12+2 gene expression signature described in urine is able to identify patients suffering from UCC and predict tumor aggressiveness. We show that a panel of molecular markers may improve the schedule for diagnosis and follow-up in UCC patients. Copyright 2010 AACR.
Intersection of toxicogenomics and high throughput screening in the Tox21 program: an NIEHS perspective.

PubMed

Merrick, B Alex; Paules, Richard S; Tice, Raymond R

Humans are exposed to thousands of chemicals with inadequate toxicological data. Advances in computational toxicology, robotic high throughput screening (HTS), and genome-wide expression have been integrated into the Tox21 program to better predict the toxicological effects of chemicals. Tox21 is a collaboration among US government agencies initiated in 2008 that aims to shift chemical hazard assessment from traditional animal toxicology to target-specific, mechanism-based, biological observations using in vitro assays and lower organism models. HTS uses biocomputational methods for probing thousands of chemicals in in vitro assays for gene-pathway response patterns predictive of adverse human health outcomes. In 1999, NIEHS began exploring the application of toxicogenomics to toxicology and recent advances in NextGen sequencing should greatly enhance the biological content obtained from HTS platforms. We foresee an intersection of new technologies in toxicogenomics and HTS as an innovative development in Tox21. Tox21 goals, priorities, progress, and challenges will be reviewed.
Combat Wound Initiative program.

PubMed

Stojadinovic, Alexander; Elster, Eric; Potter, Benjamin K; Davis, Thomas A; Tadaki, Doug K; Brown, Trevor S; Ahlers, Stephen; Attinger, Christopher E; Andersen, Romney C; Burris, David; Centeno, Jose; Champion, Hunter; Crumbley, David R; Denobile, John; Duga, Michael; Dunne, James R; Eberhardt, John; Ennis, William J; Forsberg, Jonathan A; Hawksworth, Jason; Helling, Thomas S; Lazarus, Gerald S; Milner, Stephen M; Mullick, Florabel G; Owner, Christopher R; Pasquina, Paul F; Patel, Chirag R; Peoples, George E; Nissan, Aviram; Ring, Michael; Sandberg, Glenn D; Schaden, Wolfgang; Schultz, Gregory S; Scofield, Tom; Shawen, Scott B; Sheppard, Forest R; Stannard, James P; Weina, Peter J; Zenilman, Jonathan M

2010-07-01

The Combat Wound Initiative (CWI) program is a collaborative, multidisciplinary, and interservice public-private partnership that provides personalized, state-of-the-art, and complex wound care via targeted clinical and translational research. The CWI uses a bench-to-bedside approach to translational research, including the rapid development of a human extracorporeal shock wave therapy (ESWT) study in complex wounds after establishing the potential efficacy, biologic mechanisms, and safety of this treatment modality in a murine model. Additional clinical trials include the prospective use of clinical data, serum and wound biomarkers, and wound gene expression profiles to predict wound healing/failure and additional clinical patient outcomes following combat-related trauma. These clinical research data are analyzed using machine-based learning algorithms to develop predictive treatment models to guide clinical decision-making. Future CWI directions include additional clinical trials and study centers and the refinement and deployment of our genetically driven, personalized medicine initiative to provide patient-specific care across multiple medical disciplines, with an emphasis on combat casualty care.
Building predictive gene signatures through simultaneous assessment of transcription factor activation and gene expression.

EPA Science Inventory

Building predictive gene signatures through simultaneous assessment of transcription factor activation and gene expression Exposure to many drugs and environmentally-relevant chemicals can cause adverse outcomes. These adverse outcomes, such as cancer, have been linked to mol...
Hidden state prediction: a modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses

PubMed Central

Zaneveld, Jesse R. R.; Thurber, Rebecca L. V.

2014-01-01

Complex symbioses between animal or plant hosts and their associated microbiotas can involve thousands of species and millions of genes. Because of the number of interacting partners, it is often impractical to study all organisms or genes in these host-microbe symbioses individually. Yet new phylogenetic predictive methods can use the wealth of accumulated data on diverse model organisms to make inferences into the properties of less well-studied species and gene families. Predictive functional profiling methods use evolutionary models based on the properties of studied relatives to put bounds on the likely characteristics of an organism or gene that has not yet been studied in detail. These techniques have been applied to predict diverse features of host-associated microbial communities ranging from the enzymatic function of uncharacterized genes to the gene content of uncultured microorganisms. We consider these phylogenetically informed predictive techniques from disparate fields as examples of a general class of algorithms for Hidden State Prediction (HSP), and argue that HSP methods have broad value in predicting organismal traits in a variety of contexts, including the study of complex host-microbe symbioses. PMID:25202302
An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

PubMed

Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

Investigating essential genes is important for understanding the minimal gene set of a cell and for discovering potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning was introduced to predict essential genes. We focused on 25 bacteria with characterized essential genes. The predictions yielded a highest area under the receiver operating characteristic (ROC) curve (AUC) of 0.9716 in a tenfold cross-validation test. Proper features were utilized to construct models to make predictions in distantly related bacteria. The accuracy of predictions was evaluated by the consistency between the predictions and the known essential genes of the target species. A highest AUC of 0.9552 and an average AUC of 0.8314 were achieved when making predictions across organisms. An independent dataset from Synechococcus elongatus, which was released recently, was obtained for further assessment of the performance of our model. The AUC score of the predictions is 0.7855, which is higher than that of other methods. This research shows that features obtained by homology mapping alone can achieve results as good as, or even better than, those obtained with integrated features. The work also indicates that a machine learning-based method can assign more effective weight coefficients than an empirical formula based on biological knowledge.
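The AUC values quoted in this record can be read as the probability that a randomly chosen essential gene receives a higher score than a randomly chosen non-essential gene. A minimal Python sketch of that rank-based calculation is given below; the function name, labels and scores are invented for illustration and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical essentiality scores for six genes (1 = essential, 0 = not).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of genes, which is acceptable for a sketch; a sort-based rank computation would be the usual choice for genome-scale inputs.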
eMBI: Boosting Gene Expression-based Clustering for Cancer Subtypes.

PubMed

Chang, Zheng; Wang, Zhenjia; Ashby, Cody; Zhou, Chuan; Li, Guojun; Zhang, Shuzhong; Huang, Xiuzhen

2014-01-01

Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise to provide specific and efficient treatments for patients of different subtypes. Matrix factorization provides a solution by finding checkerboard patterns in the matrices of gene expression data. In the context of gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) was proposed; however, it still suffers from several problems when applied to cancer gene expression data analysis. In this study, we developed many effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient at identifying cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements in comparison with MBI, in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, the performance of eMBI is much better than that of another widely used matrix factorization method called nonnegative matrix factorization (NMF) and that of hierarchical clustering, which is often the first choice of clinical analysts in practice.
eMBI: Boosting Gene Expression-based Clustering for Cancer Subtypes

PubMed Central

Chang, Zheng; Wang, Zhenjia; Ashby, Cody; Zhou, Chuan; Li, Guojun; Zhang, Shuzhong; Huang, Xiuzhen

2014-01-01

Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise to provide specific and efficient treatments for patients of different subtypes. Matrix factorization provides a solution by finding checkerboard patterns in the matrices of gene expression data. In the context of gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) was proposed; however, it still suffers from several problems when applied to cancer gene expression data analysis. In this study, we developed many effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient at identifying cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements in comparison with MBI, in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, the performance of eMBI is much better than that of another widely used matrix factorization method called nonnegative matrix factorization (NMF) and that of hierarchical clustering, which is often the first choice of clinical analysts in practice. PMID:25374455
Training set selection for the prediction of essential genes.

PubMed

Cheng, Jian; Xu, Zhao; Wu, Wenwu; Zhao, Li; Li, Xiangchen; Liu, Yanlin; Tao, Shiheng

2014-01-01

Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly-studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of the integrated training sets with the four criteria and with random selection. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.
Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison.

PubMed

Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S; Sinha, Saurabh

2011-12-01

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, 'enhancers'), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for 'motif-blind' CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to 'supervise' the search. We propose a new statistical method, based on 'Interpolated Markov Models', for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. © The Author(s) 2011. Published by Oxford University Press.
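For orientation, the standard hypergeometric test of gene set enrichment that this record takes as its starting point can be sketched in a few lines of Python. This is only the classical test, not the authors' extension for variable intergenic lengths, and the function and parameter names are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, successes, draws, observed):
    """Upper-tail p-value P(X >= observed) for a hypergeometric X.

    population: total number of genes considered
    successes:  genes in the population with the expected expression pattern
    draws:      genes neighboring the predicted CRMs
    observed:   neighboring genes that show the expected pattern
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 5 with the pattern, 5 drawn, all 5 drawn have it.
p_value = hypergeom_enrichment_pvalue(10, 5, 5, 5)
```

Exact integer binomials keep the sketch free of third-party dependencies; for large genomes a library routine working in log space would be more practical.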
nGASP - the nematode genome annotation assessment project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Coghlan, A; Fiedler, T J; McKay, S J

2008-12-19

While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
A new computational strategy for predicting essential genes.

PubMed

Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng

2013-12-21

Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
Intelligent Data Fusion for Wide-Area Assessment of UXO Contamination

DTIC Science & Technology

2008-02-29

Development Program (SERDP). The authors thank the SERDP staff and team members for their assistance, particularly Dr. Herb Nelson and Dr. Dan Steinhurst...Fusion and Integration for Intelligent Systems, Taipei, Taiwan , R.O.C., Aug., 1999. 4. B.J. Johnson, T.G. Moore, B.J. Blejer, C.F. Lee, T.P. Opar, S...gene-expression data using Dempster-Shafer Theory of evidence to predict breast cancer tumors,” Bioinformation 1(5), 170-5, (2006) 21. Dr. Herb H. Nelson, personal communication (2007)
nGASP--the nematode genome annotation assessment project.

PubMed

Coghlan, Avril; Fiedler, Tristan J; McKay, Sheldon J; Flicek, Paul; Harris, Todd W; Blasiar, Darin; Stein, Lincoln D

2008-12-19

While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders. This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.

Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

PubMed

Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

2012-07-15

Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by their neighboring positions within the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into evolutionary aspects of acquiring certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
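The core quantity in this record, the Pearson correlation between the expression profiles of adjacent genes, can be illustrated with a short Python sketch. It covers only the neighbor-correlation step; the comparison against randomly selected gene pairs and the FDR control described in the abstract are left out, and the gene names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def neighbor_correlations(expression, gene_order):
    """Correlation for each pair of adjacent genes along a chromosome.

    expression: dict mapping gene name -> list of expression values,
                one value per dataset
    gene_order: gene names sorted by chromosomal position
    """
    return [
        (g1, g2, pearson(expression[g1], expression[g2]))
        for g1, g2 in zip(gene_order, gene_order[1:])
    ]


# Hypothetical profiles for three adjacent genes across four datasets.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
pairs = neighbor_correlations(expression, ["geneA", "geneB", "geneC"])
```

Runs of consecutive pairs with high correlation would be the candidates for operon-like clusters; deciding what counts as "high" is where the random-pair comparison and FDR threshold come in.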
MuPeXI: prediction of neo-epitopes from tumor sequencing data.

PubMed

Bjerregaard, Anne-Mette; Nielsen, Morten; Hadrup, Sine Reker; Szallasi, Zoltan; Eklund, Aron Charles

2017-09-01

Personalization of immunotherapies such as cancer vaccines and adoptive T cell therapy depends on identification of patient-specific neo-epitopes that can be specifically targeted. MuPeXI, the mutant peptide extractor and informer, is a program to identify tumor-specific peptides and assess their potential to be neo-epitopes. The program input is a file with somatic mutation calls, a list of HLA types, and optionally a gene expression profile. The output is a table with all tumor-specific peptides derived from nucleotide substitutions, insertions, and deletions, along with comprehensive annotation, including HLA binding and similarity to normal peptides. The peptides are sorted according to a priority score which is intended to roughly predict immunogenicity. We applied MuPeXI to three tumors for which predicted MHC-binding peptides had been screened for T cell reactivity, and found that MuPeXI was able to prioritize immunogenic peptides with an area under the curve of 0.63. Compared to other available tools, MuPeXI provides more information and is easier to use. MuPeXI is available as stand-alone software and as a web server at http://www.cbs.dtu.dk/services/MuPeXI .
A stepwise model to predict monthly streamflow

NASA Astrophysics Data System (ADS)

Mahmood Al-Juboori, Anas; Guven, Aytac

2016-12-01

In this study, a stepwise model empowered with genetic programming is developed to predict the monthly flows of Hurman River in Turkey and Diyalah and Lesser Zab Rivers in Iraq. The model divides the monthly flow data into twelve intervals representing the number of months in a year. The flow of month t is considered a function of the antecedent month's flow (t - 1), and it is predicted by multiplying the antecedent monthly flow by a constant value called K. The optimum value of K is obtained by a stepwise procedure which employs Gene Expression Programming (GEP) and Nonlinear Generalized Reduced Gradient Optimization (NGRGO) as an alternative to the traditional nonlinear regression technique. The degree of determination and root mean squared error are used to evaluate the performance of the proposed models. The results of the proposed model are compared with the conventional Markovian and Auto Regressive Integrated Moving Average (ARIMA) models based on observed monthly flow data. The comparison results based on five different statistical measures show that the proposed stepwise model performed better than the Markovian model and the ARIMA model. The R² values of the proposed model range between 0.81 and 0.92 for the three rivers in this study.
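The model structure described in this record, flow(t) = K × flow(t - 1) with a separate K for each calendar month, can be sketched in Python as follows. As a loudly stated assumption, the sketch estimates each K by closed-form least squares rather than by the GEP and NGRGO procedures the authors use, and the flow series is synthetic.

```python
def fit_monthly_k(flows, start_month=1):
    """Estimate one multiplier K per calendar month.

    flows:       monthly flows in chronological order
    start_month: calendar month (1-12) of flows[0]

    For each month m, K minimizes the sum of squared errors of
    flow(t) = K * flow(t - 1) over all t falling in month m.
    This least-squares fit is a stand-in, not the GEP/NGRGO search.
    """
    num = {m: 0.0 for m in range(1, 13)}
    den = {m: 0.0 for m in range(1, 13)}
    for i in range(1, len(flows)):
        month = (start_month - 1 + i) % 12 + 1  # calendar month of flows[i]
        num[month] += flows[i - 1] * flows[i]
        den[month] += flows[i - 1] ** 2
    return {m: num[m] / den[m] for m in range(1, 13) if den[m] > 0}


def predict_flow(previous_flow, month, k_by_month):
    """Predict the flow of `month` from the preceding month's flow."""
    return k_by_month[month] * previous_flow


# Synthetic two-year series starting in January, alternating 10 and 20.
flows = [10.0, 20.0] * 12
k_by_month = fit_monthly_k(flows, start_month=1)
february_forecast = predict_flow(10.0, 2, k_by_month)
```

With this toy series the even-numbered months come out with K = 2 and the odd-numbered months with K = 0.5, which makes the per-month structure of the model easy to see.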
Testing the predictive value of peripheral gene expression for nonremission following citalopram treatment for major depression.

PubMed

Guilloux, Jean-Philippe; Bassi, Sabrina; Ding, Ying; Walsh, Chris; Turecki, Gustavo; Tseng, George; Cyranowski, Jill M; Sibille, Etienne

2015-02-01

Major depressive disorder (MDD) in general, and anxious-depression in particular, are characterized by poor rates of remission with first-line treatments, contributing to the chronic illness burden suffered by many patients. Prospective research is needed to identify the biomarkers predicting nonremission prior to treatment initiation. We collected blood samples from a discovery cohort of 34 adult MDD patients with co-occurring anxiety and 33 matched, nondepressed controls at baseline and after 12 weeks (of citalopram plus psychotherapy treatment for the depressed cohort). Samples were processed on gene arrays and group differences in gene expression were investigated. Exploratory analyses suggest that at pretreatment baseline, nonremitting patients differ from controls with gene function and transcription factor analyses potentially related to elevated inflammation and immune activation. In a second phase, we applied an unbiased machine learning prediction model and corrected for model-selection bias. Results show that baseline gene expression predicted nonremission with 79.4% corrected accuracy with a 13-gene model. The same gene-only model predicted nonremission after 8 weeks of citalopram treatment with 76% corrected accuracy in an independent validation cohort of 63 MDD patients treated with citalopram at another institution. Together, these results demonstrate the potential, but also the limitations, of baseline peripheral blood-based gene expression to predict nonremission after citalopram treatment. These results not only support their use in future prediction tools but also suggest that increased accuracy may be obtained with the inclusion of additional predictors (eg, genetics and clinical scales).
Prediction and Testing of Biological Networks Underlying Intestinal Cancer

PubMed Central

Mariadason, John M.; Wang, Donghai; Augenlicht, Leonard H.; Chance, Mark R.

2010-01-01

Colorectal cancer progresses through an accumulation of somatic mutations, some of which reside in so-called “driver” genes that provide a growth advantage to the tumor. To identify points of intersection between driver gene pathways, we implemented a network analysis framework using protein interactions to predict likely connections – both precedented and novel – between key driver genes in cancer. We applied the framework to find significant connections between two genes, Apc and Cdkn1a (p21), known to be synergistic in tumorigenesis in mouse models. We then assessed the functional coherence of the resulting Apc-Cdkn1a network by engineering in vivo single node perturbations of the network: mouse models mutated individually at Apc (Apc1638N+/−) or Cdkn1a (Cdkn1a−/−), followed by measurements of protein and gene expression changes in intestinal epithelial tissue. We hypothesized that if the predicted network is biologically coherent (functional), then the predicted nodes should associate more specifically with dysregulated genes and proteins than stochastically selected genes and proteins. The predicted Apc-Cdkn1a network was significantly perturbed at the mRNA-level by both single gene knockouts, and the predictions were also strongly supported based on physical proximity and mRNA coexpression of proteomic targets. These results support the functional coherence of the proposed Apc-Cdkn1a network and also demonstrate how network-based predictions can be statistically tested using high-throughput biological data. PMID:20824133
Cross-organism learning method to discover new gene functionalities.

PubMed

Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro

2016-04-01

Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In recent years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors, and only some of them represent highly reliable human-curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organism learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organism learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single-organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum).
Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Use of gene-expression programming to estimate Manning’s roughness coefficient for high gradient streams

USGS Publications Warehouse

Azamathulla, H. Md.; Jarrett, Robert D.

2013-01-01

Manning’s roughness coefficient (n) has been widely used in the estimation of flood discharges or depths of flow in natural channels. Therefore, the selection of appropriate Manning’s n values is of paramount importance for hydraulic engineers and hydrologists and requires considerable experience, although extensive guidelines are available. Generally, the largest source of error in post-flood estimates (termed indirect measurements) is due to estimates of Manning’s n values, particularly when there has been minimal field verification of flow resistance. This emphasizes the need to improve methods for estimating n values. The objective of this study was to develop a soft computing model for estimating Manning’s n values using 75 discharge measurements on 21 high gradient streams in Colorado, USA. The data are from high gradient (S > 0.002 m/m), cobble- and boulder-bed streams for within bank flows. This study presents Gene-Expression Programming (GEP), an extension of Genetic Programming (GP), as an improved approach to estimate Manning’s roughness coefficient for high gradient streams. This study used field data to assess the potential of GEP to estimate Manning’s n values. GEP is a search technique that automatically simplifies genetic programs during an evolutionary process (or evolves) to obtain the most robust computer program (e.g., simplify mathematical expressions, decision trees, polynomial constructs, and logical expressions). Field measurements collected by Jarrett (J Hydraulic Eng ASCE 110: 1519–1539, 1984) were used to train the GEP network and evolve programs. The developed network and evolved programs were validated by using observations that were not involved in training.
GEP and ANN-RBF (artificial neural network-radial basis function) models were found to be substantially more effective (e.g., R² for testing/validation of GEP and ANN-RBF is 0.745 and 0.65, respectively) than Jarrett’s (J Hydraulic Eng ASCE 110: 1519–1539, 1984) equation (R² for testing/validation equals 0.58) in predicting Manning’s n.
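For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of how Manning's n can be back-calculated from a single discharge measurement using Manning's equation in SI units, Q = (1/n) A R^(2/3) S^(1/2). The function name and the input values are illustrative assumptions and are not taken from the study's data.

```python
def manning_n(discharge, area, hydraulic_radius, slope):
    """Back-calculate Manning's n (SI units) from a measured discharge.

    Manning's equation: Q = (1/n) * A * R**(2/3) * S**(1/2)
      discharge        Q, in m^3/s
      area             A, cross-sectional flow area in m^2
      hydraulic_radius R, flow area divided by wetted perimeter, in m
      slope            S, energy (friction) slope in m/m
    """
    if min(discharge, area, hydraulic_radius, slope) <= 0:
        raise ValueError("all inputs must be positive")
    return area * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5 / discharge


# Hypothetical measurement, chosen only to make the arithmetic easy to follow.
n_value = manning_n(discharge=10.0, area=8.0, hydraulic_radius=1.0, slope=0.01)
print(round(n_value, 3))
```

A data-driven model such as GEP would then be trained to predict values like `n_value` from channel properties, so that n can be estimated where no discharge measurement exists.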
Forest gene conservation programs in Alberta, Canada

Treesearch

Jodie Krakowski

2017-01-01

Provincial tree improvement programs in Alberta began in 1976. Early gene conservation focused on ex situ measures such as seed and clone banking, and research trials of commercial species with tree improvement programs. The gene conservation program now encompasses representative and unique populations of all native tree species in situ. The ex situ program aims to...
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods.

PubMed

Notaro, Marco; Schubach, Max; Robinson, Peter N; Valentini, Giorgio

2017-10-12

The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even though for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover, most of the methods proposed in the literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, which consists of a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO term predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.
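The two-step idea (flat scores first, hierarchical combination second) can be illustrated with a simple top-down correction that forces each term's score to be no larger than the scores of its parents, which is one way to satisfy the true path rule. This is a minimal sketch under stated assumptions: the term names, scores and the function `top_down_correction` are hypothetical, and the code is not claimed to reproduce the authors' algorithms or their R package.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def top_down_correction(flat_scores, parents):
    """Make flat per-term scores consistent with a DAG of terms.

    flat_scores: dict mapping term -> score in [0, 1] from any flat classifier
    parents:     dict mapping term -> iterable of its parent terms
    Returns a dict in which no term scores higher than any of its parents.
    """
    # static_order() yields every parent before its children.
    order = TopologicalSorter(parents).static_order()
    corrected = {}
    for term in order:
        score = flat_scores[term]
        for parent in parents.get(term, ()):
            score = min(score, corrected[parent])
        corrected[term] = score
    return corrected


# Hypothetical four-term ontology: C has two parents, A and B.
parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"]}
flat = {"root": 0.9, "A": 0.4, "B": 0.7, "C": 0.8}
consistent = top_down_correction(flat, parents)
print(consistent)
```

In this toy case the flat score of C (0.8) exceeds that of its parent A (0.4), which would be inconsistent; the correction lowers C to 0.4 while leaving the other terms unchanged.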
Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

PubMed

Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

2017-11-24

Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.
Gene and transcript abundances of bacterial type III secretion systems from the rumen microbiome are correlated with methane yield in sheep.

PubMed

Kamke, Janine; Soni, Priya; Li, Yang; Ganesh, Siva; Kelly, William J; Leahy, Sinead C; Shi, Weibing; Froula, Jeff; Rubin, Edward M; Attwood, Graeme T

2017-08-08

Ruminants are important contributors to global methane emissions via microbial fermentation in their reticulo-rumens. This study is part of a larger program, characterising the rumen microbiomes of sheep which vary naturally in methane yield (g CH4/kg DM/day) and aims to define differences in microbial communities, and in gene and transcript abundances that can explain the animal methane phenotype. Rumen microbiome metagenomic and metatranscriptomic data were analysed by Gene Set Enrichment, sparse partial least squares regression and the Wilcoxon Rank Sum test to estimate correlations between specific KEGG bacterial pathways/genes and high methane yield in sheep. KEGG genes enriched in high methane yield sheep were reassembled from raw reads and existing contigs and analysed by MEGAN to predict their phylogenetic origin. Protein coding sequences from Succinivibrio dextrinosolvens strains were analysed using Effective DB to predict bacterial type III secreted proteins. The effect of S. dextrinosolvens strain H5 growth on methane formation by rumen methanogens was explored using co-cultures. Detailed analysis of the rumen microbiomes of high methane yield sheep shows that gene and transcript abundances of bacterial type III secretion system genes are positively correlated with methane yield in sheep. Most of the bacterial type III secretion system genes could not be assigned to a particular bacterial group, but several genes were affiliated with the genus Succinivibrio, and searches of bacterial genome sequences found that strains of S. dextrinosolvens were part of a small group of rumen bacteria that encode this type of secretion system. In co-culture experiments, S. dextrinosolvens strain H5 showed a growth-enhancing effect on a methanogen belonging to the order Methanomassiliicoccales, and inhibition of a representative of the Methanobrevibacter gottschalkii clade.
This is the first report of bacterial type III secretion system genes being associated with high methane emissions in ruminants, and identifies these secretion systems as potential new targets for methane mitigation research. The effects of S. dextrinosolvens on the growth of rumen methanogens in co-cultures indicate that bacteria-methanogen interactions are important modulators of methane production in ruminant animals.
Probability-based collaborative filtering model for predicting gene-disease associations.

PubMed

Zeng, Xiangxiang; Ding, Ningxiang; Rodríguez-Patón, Alfonso; Zou, Quan

2017-12-28

Accurately predicting pathogenic human genes has been challenging in recent research. Considering extensive gene-disease data verified by biological experiments, we can apply computational methods to perform accurate predictions with reduced time and expenses. We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, containing data on humans and on nonhuman species, are integrated in our model. Firstly, on the basis of a typical latent factorization model, we propose model I with an average heterogeneous regularization. Secondly, we develop modified model II with personal heterogeneous regularization to enhance the accuracy of the aforementioned model. In this model, vector space similarity or Pearson correlation coefficient metrics and data on related species are also used. We compared the results of PCFM with the results of four state-of-the-art approaches. The results show that PCFM performs better than other advanced approaches. The PCFM model can be leveraged for predictions of disease genes, especially for new human genes or diseases with no known relationships.
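As background for the "typical latent factorization model" on which PCFM builds, the sketch below fits a plain regularized latent factor model to a tiny gene-disease matrix by stochastic gradient descent. It deliberately omits the heterogeneous regularization and cross-species data described in the abstract; the function names, hyperparameters and toy associations are all assumptions made for illustration.

```python
import random


def predict(gene_factors, disease_factors, gene, disease):
    """Association score = dot product of the two latent vectors."""
    return sum(a * b for a, b in zip(gene_factors[gene], disease_factors[disease]))


def factorize(observed, n_genes, n_diseases, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Fit latent vectors to observed (gene, disease) -> 0/1 labels by SGD.

    Minimizes squared error plus an L2 penalty (plain regularization only).
    """
    rng = random.Random(seed)
    # Small positive random initial values (an arbitrary illustrative choice).
    G = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_genes)]
    D = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_diseases)]
    for _ in range(epochs):
        for (g, d), label in observed.items():
            err = label - predict(G, D, g, d)
            for k in range(rank):
                gk, dk = G[g][k], D[d][k]
                G[g][k] += lr * (err * dk - reg * gk)
                D[d][k] += lr * (err * gk - reg * dk)
    return G, D


# Hypothetical data: genes 0 and 1 share diseases 0 and 1; gene 2 has disease 2.
# The pair (gene 2, disease 1) is left unobserved and can be scored afterwards.
observed = {
    (0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1, (2, 2): 1,
    (0, 2): 0, (1, 2): 0, (2, 0): 0,
}
G, D = factorize(observed, n_genes=3, n_diseases=3)
print(predict(G, D, 0, 0), predict(G, D, 0, 2), predict(G, D, 2, 1))
```

Scores for unobserved pairs, such as `predict(G, D, 2, 1)`, are what a collaborative filtering approach would rank to propose candidate gene-disease associations.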
Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

PubMed

Huang, Ying; Chen, Shi-Yi; Deng, Feilong

2016-01-01

In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; different combinations of these are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
Fetal alcohol spectrum disorders: gene-environment interactions, predictive biomarkers, and the relationship between structural alterations in the brain and functional outcomes.

PubMed

Reynolds, James N; Weinberg, Joanne; Clarren, Sterling; Beaulieu, Christian; Rasmussen, Carmen; Kobor, Michael; Dube, Marie-Pierre; Goldowitz, Daniel

2011-03-01

Prenatal alcohol exposure is a major, preventable cause of behavioral and cognitive deficits in children. Despite extensive research, a unique neurobehavioral profile for children affected by prenatal alcohol exposure remains elusive. A fundamental question that must be addressed is how genetic and environmental factors interact with gestational alcohol exposure to produce neurobehavioral and neurobiological deficits in children. The core objective of the NeuroDevNet team in fetal alcohol spectrum disorders is to create an integrated research program of basic and clinical investigations that will (1) identify genetic and epigenetic modifications that may be predictive of the neurobehavioral and neurobiological dysfunctions in offspring induced by gestational alcohol exposure and (2) determine the relationship between structural alterations in the brain induced by gestational alcohol exposure and functional outcomes in offspring. The overarching hypothesis to be tested is that neurobehavioral and neurobiological dysfunctions induced by gestational alcohol exposure are correlated with the genetic background of the affected child and/or epigenetic modifications in gene expression. The identification of genetic and/or epigenetic markers that are predictive of the severity of behavioral and cognitive deficits in children affected by gestational alcohol exposure will have a profound impact on our ability to identify children at risk. Copyright © 2011 Elsevier Inc. All rights reserved.
Fetal Alcohol Spectrum Disorders: Gene-Environment Interactions, Predictive Biomarkers, and the Relationship Between Structural Alterations in the Brain and Functional Outcomes

PubMed Central

Reynolds, James N.; Weinberg, Joanne; Clarren, Sterling; Beaulieu, Christian; Rasmussen, Carmen; Kobor, Michael; Dube, Marie-Pierre; Goldowitz, Daniel

2016-01-01

Prenatal alcohol exposure is a major, preventable cause of behavioral and cognitive deficits in children. Despite extensive research, a unique neurobehavioral profile for children affected by prenatal alcohol exposure remains elusive. A fundamental question that must be addressed is how genetic and environmental factors interact with gestational alcohol exposure to produce neurobehavioral and neurobiological deficits in children. The core objective of the NeuroDevNet team in fetal alcohol spectrum disorders is to create an integrated research program of basic and clinical investigations that will (1) identify genetic and epigenetic modifications that may be predictive of the neurobehavioral and neurobiological dysfunctions in offspring induced by gestational alcohol exposure and (2) determine the relationship between structural alterations in the brain induced by gestational alcohol exposure and functional outcomes in offspring. The overarching hypothesis to be tested is that neurobehavioral and neurobiological dysfunctions induced by gestational alcohol exposure are correlated with the genetic background of the affected child and/or epigenetic modifications in gene expression. The identification of genetic and/or epigenetic markers that are predictive of the severity of behavioral and cognitive deficits in children affected by gestational alcohol exposure will have a profound impact on our ability to identify children at risk. PMID:21575841
GC²NMF: A Novel Matrix Factorization Framework for Gene-Phenotype Association Prediction.

PubMed

Zhang, Yaogong; Liu, Jiahui; Liu, Xiaohu; Hong, Yuxiang; Fan, Xin; Huang, Yalou; Wang, Yuan; Xie, Maoqiang

2018-04-24

Gene-phenotype association prediction can be applied to reveal the inherited basis of human diseases and facilitate drug development. Gene-phenotype associations are related to complex biological processes and influenced by various factors, such as the relationships between phenotypes and those among genes. However, owing to the sparseness of curated gene-phenotype associations and the lack of integrated analysis of the joint effect of multiple factors, existing applications are limited in prediction accuracy and in their ability to detect potential gene-phenotype associations. In this paper, we propose a novel method that exploits a weighted graph constraint learned from hierarchical structures of phenotype data and group prior information among genes while inheriting the advantages of Non-negative Matrix Factorization (NMF), called Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC²NMF). Specifically, first we introduce the depth of parent-child relationships between two adjacent phenotypes in hierarchical phenotypic data as a weighted graph constraint for a better phenotype understanding. Second, we utilize intra-group correlation among genes in a gene group as a group constraint for gene understanding. Such information provides us with the intuition that genes in a group probably result in similar phenotypes. The model not only allows us to achieve high prediction performance, but also helps us to learn interpretable representations of genes and phenotypes simultaneously to facilitate future biological analysis. Experimental results on biological gene-phenotype association datasets of mouse and human demonstrate that GC²NMF can obtain superior prediction accuracy and good understandability for biological explanation over other state-of-the-art methods.
Incorrectly predicted genes in rice?

PubMed

Cruveiller, Stéphane; Jabbari, Kamel; Clay, Oliver; Bernardi, Giorgio

2004-05-26

Between one third and one half of the proposed rice genes appear to have no homologs in other species, including Arabidopsis. Compositional considerations, and a comparison of curated rice sequences with ex novo predictions, suggest that many or most of the putative genes without homologs may be false positive predictions, i.e., sequences that are never translated into functional proteins in vivo.
Ensemble positive unlabeled learning for disease gene identification.

PubMed

Yang, Peng; Li, Xiaoli; Chua, Hon-Nian; Kwoh, Chee-Keong; Ng, See-Kiong

2014-01-01

An increasing number of genes have been experimentally confirmed in recent years as causative genes for various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data, and a single machine learning predictor is prone to bias caused by inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method EPU is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. Through integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms to achieve more accurate and robust disease gene predictions. In the future, our EPU method provides an effective framework to integrate additional biological and computational resources for better disease gene predictions.
Prediction of gene-phenotype associations in humans, mice, and plants using phenologs.

PubMed

Woods, John O; Singh-Blom, Ulf Martin; Laurent, Jon M; McGary, Kriston L; Marcotte, Edward M

2013-06-21

Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and may be used to predict additional candidate disease genes. In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data--from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans--establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically-induced seizures in mouse, solely based on orthologous phenotypes from E. coli. We also explore the prediction of plant gene-phenotype associations, as for the Arabidopsis response to vernalization phenotype. We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.
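To make the notion of "orthologous phenotypes" concrete, the sketch below scores how surprising the overlap is between the gene sets of two phenotypes from different species, once both are restricted to genes with orthologs in the other species. It uses a hypergeometric tail probability, which is one common way to score such an overlap; the abstract does not state the test used, so the choice of statistic, the function name and the counts are assumptions for illustration only.

```python
from math import comb


def overlap_p_value(n_orthologs, n_pheno_a, n_pheno_b, n_shared):
    """Probability of seeing at least n_shared common genes by chance.

    n_orthologs: genes with an ortholog in both species (the background)
    n_pheno_a:   background genes linked to the phenotype in species A
    n_pheno_b:   background genes linked to the phenotype in species B
    n_shared:    genes linked to both phenotypes (via orthology)
    """
    upper = min(n_pheno_a, n_pheno_b)
    total = comb(n_orthologs, n_pheno_b)
    tail = sum(
        comb(n_pheno_a, k) * comb(n_orthologs - n_pheno_a, n_pheno_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 shared orthologs, 3 genes per phenotype, all 3 shared.
p = overlap_p_value(n_orthologs=10, n_pheno_a=3, n_pheno_b=3, n_shared=3)
print(p)
```

A small probability suggests the two phenotypes are candidate phenologs, in which case the genes linked to one phenotype but not yet to the other become candidate predictions.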
DNA methylation mediates the impact of exposure to prenatal maternal stress on BMI and central adiposity in children at age 13½ years: Project Ice Storm

PubMed Central

Cao-Lei, Lei; Dancause, Kelsey N; Elgbeili, Guillaume; Massart, Renaud; Szyf, Moshe; Liu, Aihua; Laplante, David P; King, Suzanne

2015-01-01

Prenatal maternal stress (PNMS) in animals and humans predicts obesity and metabolic dysfunction in the offspring. Epigenetic modification of gene function is considered one possible mechanism by which PNMS results in poor outcomes in offspring. Our goal was to determine the role of maternal objective exposure and subjective distress on child BMI and central adiposity at 13½ years of age, and to test the hypothesis that DNA methylation mediates the effect of PNMS on growth. Mothers were pregnant during the January 1998 Quebec ice storm. We assessed their objective exposure and subjective distress in June 1998. At age 13½ their children were weighed and measured (n = 66); a subsample provided blood samples for epigenetic studies (n = 31). Objective and subjective PNMS correlated with central adiposity (waist-to-height ratio); only objective PNMS predicted body mass index (BMI). Bootstrapping analyses showed that the methylation level of genes from established Type-1 and -2 diabetes mellitus pathways showed significant mediation of the effect of objective PNMS on both central adiposity and BMI. However, the negative mediating effects indicate that, although greater objective PNMS predicts greater BMI and adiposity, this effect is dampened by the effects of objective PNMS on DNA methylation, suggesting a protective role of the selected genes from Type-1 and -2 diabetes mellitus pathways. We provide data supporting that DNA methylation is a potential mechanism involved in the long-term adaptation and programming of the genome in response to early adverse environmental factors. PMID:26098974

Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

PubMed Central

Menon, Rajasree; Wen, Yuchen; Omenn, Gilbert S.; Kretzler, Matthias; Guan, Yuanfang

2013-01-01

Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions. PMID:24244129
Biological interpretation of genome-wide association studies using predicted gene functions.

PubMed

Pers, Tune H; Karjalainen, Juha M; Chan, Yingleong; Westra, Harm-Jan; Wood, Andrew R; Yang, Jian; Lui, Julian C; Vedantam, Sailaja; Gustafsson, Stefan; Esko, Tonu; Frayling, Tim; Speliotes, Elizabeth K; Boehnke, Michael; Raychaudhuri, Soumya; Fehrmann, Rudolf S N; Hirschhorn, Joel N; Franke, Lude

2015-01-19

The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes.
Biomine: predicting links between biological entities using network models of heterogeneous databases.

PubMed

Eronen, Lauri; Toivonen, Hannu

2012-06-06

Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with a focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
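The abstract does not say which proximity measures were evaluated. As a minimal sketch of one plausible measure on a weighted graph, the Python below scores a pair of nodes by the highest product of edge weights along any connecting path. The function name, the toy graph, and its weights are invented for illustration and are not taken from Biomine.

```python
import heapq
import math


def best_path_probability(graph, source, target):
    """Return the highest product of edge weights over any path source -> target.

    graph maps a node to a list of (neighbor, weight) pairs with 0 < weight <= 1.
    Maximizing a product of weights is the same as minimizing the sum of
    -log(weight), so Dijkstra's algorithm applies.
    """
    best_cost = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return math.exp(-cost)
        if cost > best_cost.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost - math.log(weight)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return 0.0  # target not reachable


# Hypothetical toy graph: node names and weights are assumptions.
toy_graph = {
    "geneA": [("proteinA", 0.9), ("diseaseX", 0.2)],
    "proteinA": [("proteinB", 0.8)],
    "proteinB": [("diseaseX", 0.5)],
}
score = best_path_probability(toy_graph, "geneA", "diseaseX")
```

Here the indirect path through the two proteins (0.9 × 0.8 × 0.5) outscores the direct but weak edge (0.2), which is the kind of cross-database connection the abstract describes.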
Gene array analysis reveals a common Runx transcriptional program controlling cell adhesion and survival

PubMed Central

Wotton, Sandy; Terry, Anne; Kilbey, Anna; Jenkins, Alma; Herzyk, Pawel; Cameron, Ewan; Neil, James C.

2008-01-01

The Runx genes play divergent roles in development and cancer, where they can act either as oncogenes or tumour suppressors. We compared the effects of ectopic Runx expression in established fibroblasts, where all three genes produce an indistinguishable phenotype entailing epithelioid morphology and increased cell survival under stress conditions. Gene array analysis revealed a strongly overlapping transcriptional signature, with no examples of opposing regulation of the same target gene. A common set of 50 highly regulated genes was identified after further filtering on regulation by inducible RUNX1-ER. This set revealed a strong bias towards genes with annotated roles in cancer and development, and a preponderance of targets encoding extracellular or surface proteins, reflecting the marked effects of Runx on cell adhesion. Furthermore, in silico prediction of resistance to glucocorticoid growth inhibition was confirmed in fibroblasts and lymphoid cells expressing ectopic Runx. The effects of fibroblast expression of common RUNX1 fusion oncoproteins (RUNX1-ETO, TEL-RUNX1, CBFB-MYH11) were also tested. While two direct Runx activation target genes were repressed (Ncam1, Rgc32), the fusion proteins appeared to disrupt regulation of down-regulated targets (Cebpd, Id2, Rgs2) rather than impose constitutive repression. These results elucidate the oncogenic potential of the Runx family and reveal novel targets for therapeutic inhibition. PMID:18560354
End-To-End Risk Assessment: From Genes and Protein to Acceptable Radiation Risks for Mars Exploration

NASA Technical Reports Server (NTRS)

Cucinotta, Francis A.; Schimmerling, Walter

2000-01-01

The human exploration of Mars will impose unavoidable health risks from galactic cosmic rays (GCR) and possibly solar particle events (SPE). It is the goal of NASA's Space Radiation Health Program to develop the capability to predict health risks with significant accuracy to ensure that risks are well below acceptable levels and to allow for mitigation approaches to be effective at reasonable costs. End-to-End risk assessment is the approach being followed to understand proton and heavy ion damage at the molecular, cellular, and tissue levels in order to predict the probability of the major health risk including cancer, neurological disorders, hereditary effects, cataracts, and acute radiation sickness and to develop countermeasures for mitigating risks.
Prospects and Potential Uses of Genomic Prediction of Key Performance Traits in Tetraploid Potato

PubMed Central

Stich, Benjamin; Van Inghelandt, Delphine

2018-01-01

Genomic prediction is a routine tool in breeding programs of most major animal and plant species. However, its usefulness for potato breeding has not yet been evaluated in detail. The objectives of this study were to (i) examine the prospects of genomic prediction of key performance traits in a diversity panel of tetraploid potato modeling additive, dominance, and epistatic effects, (ii) investigate the effects of size and make up of training set, number of test environments and molecular markers on prediction accuracy, and (iii) assess the effect of including markers from candidate genes on the prediction accuracy. With genomic best linear unbiased prediction (GBLUP), BayesA, BayesCπ, and Bayesian LASSO, four different prediction methods were used for genomic prediction of relative area under disease progress curve after a Phytophthora infestans infection, plant maturity, maturity corrected resistance, tuber starch content, tuber starch yield (TSY), and tuber yield (TY) of 184 tetraploid potato clones or subsets thereof genotyped with the SolCAP 8.3k SNP array. The cross-validated prediction accuracies with GBLUP and the three Bayesian approaches for the six evaluated traits ranged from about 0.5 to about 0.8. For traits with a high expected genetic complexity, such as TSY and TY, we observed an 8% higher prediction accuracy using a model with additive and dominance effects compared with a model with additive effects only. Our results suggest that for oligogenic traits in general and when diagnostic markers are available in particular, the use of Bayesian methods for genomic prediction is highly recommended and that the diagnostic markers should be modeled as fixed effects. The evaluation of the relative performance of genomic prediction vs. phenotypic selection indicated that the former is superior, assuming cycle lengths and selection intensities that are possible to realize in commercial potato breeding programs. PMID:29563919
TRANSAT-- method for detecting the conserved helices of functional RNA structures, including transient, pseudo-knotted and alternative structures.

PubMed

Wiebe, Nicholas J P; Meyer, Irmtraud M

2010-06-24

The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequences not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way.
This has the disadvantage of not being able to predict features as function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment.
Building gene expression signatures indicative of transcription factor activation to predict AOP modulation

EPA Science Inventory

Adverse outcome pathways (AOPs) are a framework for predicting quantitative relationships between molecular initiatin...
Predicting features of breast cancer with gene expression patterns.

PubMed

Lu, Xuesong; Lu, Xin; Wang, Zhigang C; Iglehart, J Dirk; Zhang, Xuegong; Richardson, Andrea L

2008-03-01

Data from gene expression arrays hold an enormous amount of biological information. We sought to determine if global gene expression in primary breast cancers contained information about biologic, histologic, and anatomic features of the disease in individual patients. Microarray data from the tumors of 129 patients were analyzed for the ability to predict biomarkers [estrogen receptor (ER) and HER2], histologic features [grade and lymphatic-vascular invasion (LVI)], and stage parameters (tumor size and lymph node metastasis). Multiple statistical predictors were used and the prediction accuracy was determined by cross-validation error rate; multidimensional scaling (MDS) allowed visualization of the predicted states under study. Models built from gene expression data accurately predict ER and HER2 status, and divide tumor grade into high-grade and low-grade clusters; intermediate-grade tumors are not a unique group. In contrast, gene expression data is inaccurate at predicting tumor size, lymph node status or LVI. The best model for prediction of nodal status included tumor size, LVI status and pathologically defined tumor subtype (based on combinations of ER, HER2, and grade); the addition of microarray-based prediction to this model failed to improve the prediction accuracy. Global gene expression supports a binary division of ER, HER2, and grade, clearly separating tumors into two categories; intermediate values for these bio-indicators do not define intermediate tumor subsets. Results are consistent with a model of regional metastasis that depends on inherent biologic differences in metastatic propensity between breast cancer subtypes, upon which time and chance then operate.
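The abstract reports prediction accuracy as a cross-validation error rate. The sketch below shows how such an error rate can be computed in Python, assuming a single expression feature and a nearest-centroid classifier. Both choices, along with the toy values and labels, are assumptions for illustration and are not the predictors used in the study.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists that partition n samples into k folds."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose mean training value is closest to x."""
    centroids = {}
    for label in set(train_y):
        values = [v for v, l in zip(train_x, train_y) if l == label]
        centroids[label] = sum(values) / len(values)
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def cv_error_rate(xs, ys, k):
    """Fraction of samples misclassified when each is predicted from the other folds."""
    errors = 0
    for train, test in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
                errors += 1
    return errors / len(xs)


# Invented toy data: one expression value per tumor and a binary marker status.
expression = [0.2, 0.4, 0.3, 0.5, 2.1, 2.4, 1.9, 2.2, 0.1, 2.0]
status = ["neg", "neg", "neg", "neg", "pos", "pos", "pos", "pos", "neg", "pos"]
error = cv_error_rate(expression, status, k=5)
```

Because every sample is scored by a model that never saw it during training, the resulting error rate estimates performance on new patients rather than fit to the training data.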
Adipose Gene Expression Prior to Weight Loss Can Differentiate and Weakly Predict Dietary Responders

PubMed Central

Mutch, David M.; Temanni, M. Ramzi; Henegar, Corneliu; Combes, Florence; Pelloux, Véronique; Holst, Claus; Sørensen, Thorkild I. A.; Astrup, Arne; Martinez, J. Alfredo; Saris, Wim H. M.; Viguerie, Nathalie; Langin, Dominique; Zucker, Jean-Daniel; Clément, Karine

2007-01-01

Background: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. Methodology/Principal Findings: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Several statistical tests revealed that the gene expression profiles of responders (8–12 kg weight loss) could always be differentiated from non-responders (<4 kg weight loss). We also assessed whether this differentiation was sufficient for prediction. Using a bottom-up (i.e. black-box) approach, standard class prediction algorithms were able to predict dietary responders with up to 61.1%±8.1% accuracy. Using a top-down approach (i.e. using differentially expressed genes to build a classifier) improved prediction accuracy to 80.9%±2.2%. Conclusion: Adipose gene expression profiling prior to the consumption of a low-fat diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition. PMID:18094752
A radiosensitivity gene signature and PD-L1 status predict clinical outcome of patients with invasive breast carcinoma in The Cancer Genome Atlas (TCGA) dataset.

PubMed

Jang, Bum-Sup; Kim, In Ah

2017-09-01

We investigated the link between the radiosensitivity gene signature and programmed cell death ligand 1 (PD-L1) status and clinical outcome in order to identify a group of patients that would possibly receive clinical benefit of radiotherapy (RT) combined with anti-PD1/PD-L1 therapy. We validated the identified gene signature related to radiosensitivity and analyzed the PD-L1 status of invasive breast cancer in The Cancer Genome Atlas (TCGA) dataset. To validate the gene signature, 1045 patients were selected and divided into two clusters using a consensus clustering algorithm based on their radiosensitive (RS) or radioresistant (RR) designation according to their prognosis. Patients were also stratified as PD-L1-high or PD-L1-low based on the median value of CD274 mRNA expression level as a surrogate of PD-L1. Patients assigned to the RS group had better recurrence-free survival (RFS) than patients in the RR group by univariate analysis (HR 0.45, 95% CI 0.25-0.81, p=0.008) only when treated with RT. The RS group was independently associated with the PD-L1-high group, and CD274 mRNA expression was significantly higher in the RS group (p<0.001) than the RR group. In the PD-L1-high group, the RS group was associated with better RFS compared to the RR group (HR 0.37, 95% CI 0.16-0.87, p=0.022) in multivariate analysis. The level of PD-L1 expression may represent the immunogenicity of tumors, and thus, we speculated that the PD-L1-high group had more immunogenic tumors, which could be more sensitive to radiation-induced immunologic cell death. We first evaluated the predictive value of the radiosensitivity gene signature and described a relationship between this radiosensitivity gene signature and PD-L1.
The radiosensitivity gene signature and PD-L1 status were important factors for prediction of the clinical outcome of RT in patients with invasive breast cancer and may be used for selecting patients who will benefit from RT combined with anti-PD1/PDL1 therapy. Copyright © 2017 Elsevier B.V. All rights reserved.
A mathematical model of in vivo bovine blastocyst developmental to gestational Day 15.

PubMed

Shorten, P R; Donnison, M; McDonald, R M; Meier, S; Ledgard, A M; Berg, D

2018-06-20

Bovine embryo growth involves a complex interaction between the developing embryo and the growth-promoting potential of the uterine environment. We have previously established links between embryonic factors (embryo stage, embryo gene expression), maternal factors (progesterone, body condition score), and embryonic growth to 8 d after bulk transfer of Day 7 in vitro-produced blastocysts. In this study we recovered blastocysts on Days 7 and 15 after artificial insemination to test the hypothesis that in vivo and in vitro embryos follow a similar growth program. We conducted our study using 4 commercial farms and repeated our study over 2 yr (2014, 2015), with data available from 2 of the 4 farms in the second year. Morphological and gene expression measurements (196 candidate genes) were taken on the Day 7 embryos, and the progesterone concentration of the cows was measured throughout the reproductive cycle as a reflection of the state of the uterine environment. These data were also used to assess the interaction between the uterine environment and the developing embryo and to examine how well Day 7 embryo stage can be predicted from the Day 7 gene expression profile. Progesterone was not a strong predictor of in vivo embryo growth to Day 15. This contrasts with a range of Day 7 embryo transfer studies which demonstrated that progesterone is a very good predictor of embryo growth to Day 15. Our analysis demonstrates that in vivo embryos are 3 times less sensitive to progesterone than in vitro-transferred embryos (up to Day 15). This highlights that caution must be applied when extrapolating the results of in vitro embryo transfer studies to the in vivo situation. The similar variance in measured and predicted (based on Day 15 length) Day 7 embryo stage indicates low stochastic perturbations for in vivo embryo growth (large stochastic growth effects would generate a significantly larger standard deviation in measured embryo length on Day 15).
We also identified that Day 7 embryo stage could be predicted based on the Day 7 gene expression profile (58% overall success rate for classification of 5 embryo stages). Our analysis also associated genes with each developmental stage and demonstrates the high level of temporal regulation of genes that occurs during early embryonic development. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Application of hidden Markov models to biological data mining: a case study

NASA Astrophysics Data System (ADS)

Yin, Michael M.; Wang, Jason T.

2000-04-01

In this paper we present an example of biological data mining: the detection of splicing junction acceptors in eukaryotic genes. Identification or prediction of transcribed sequences from within genomic DNA has been a major rate-limiting step in the pursuit of genes. Programs currently available are far from being powerful enough to elucidate the gene structure completely. Here we develop a hidden Markov model (HMM) to represent the degeneracy features of splicing junction acceptor sites in eukaryotic genes. The HMM system is fully trained using an expectation maximization (EM) algorithm and the system performance is evaluated using the 10-way cross-validation method. Experimental results show that our HMM system can correctly classify more than 94% of the candidate sequences (including true and false acceptor sites) into the right categories. About 90% of the true acceptor sites and 96% of the false acceptor sites in the test data are classified correctly. These results are very promising considering that only the local information in DNA is used. The proposed model will be a very important component of an effective and accurate gene structure detection system currently being developed in our lab.
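To make the HMM scoring step concrete, the following Python sketch computes the probability of a short DNA sequence under a toy two-state HMM using the forward algorithm. The state names and all probabilities are invented for illustration; the abstract does not give the topology or parameters of the authors' model, and the EM training step is not shown.

```python
import itertools


def forward_probability(seq, start, trans, emit):
    """Probability of a non-empty sequence under an HMM (forward algorithm).

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][x]  : probability that state s emits symbol x
    """
    states = list(start)
    # alpha[s] = P(symbols seen so far, current state = s)
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: a pyrimidine-rich "site" state and a uniform background.
start = {"site": 0.5, "background": 0.5}
trans = {
    "site": {"site": 0.7, "background": 0.3},
    "background": {"site": 0.2, "background": 0.8},
}
emit = {
    "site": {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

p = forward_probability("CTAG", start, trans, emit)

# Sanity check: probabilities of all length-3 sequences should sum to 1.
total = sum(
    forward_probability("".join(kmer), start, trans, emit)
    for kmer in itertools.product("ACGT", repeat=3)
)
```

A classifier of the kind described could compare such a probability against that of a background model and call a candidate a true acceptor site when the ratio exceeds a threshold; that decision rule is an assumption here rather than something stated in the abstract.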
Post-transcriptional bursting in genes regulated by small RNA molecules

NASA Astrophysics Data System (ADS)

Rodrigo, Guillermo

2018-03-01

Gene expression programs in living cells are highly dynamic due to spatiotemporal molecular signaling and inherent biochemical stochasticity. Here we study a mechanism based on molecule-to-molecule variability at the RNA level for the generation of bursts of protein production, which can lead to heterogeneity in a cell population. We develop a mathematical framework to show numerically and analytically that genes regulated post transcriptionally by small RNA molecules can exhibit such bursts due to different states of translation activity (on or off), mostly revealed in a regime of few molecules. We exploit this framework to compare transcriptional and post-transcriptional bursting and also to illustrate how to tune the resulting protein distribution with additional post-transcriptional regulations. Moreover, because RNA-RNA interactions are predictable with an energy model, we define the kinetic constants of on-off switching as functions of the two characteristic free-energy differences of the system, activation and formation, with a nonequilibrium scheme. Overall, post-transcriptional bursting represents a distinctive principle linking gene regulation to gene expression noise, which highlights the importance of the RNA layer beyond the simple information transfer paradigm and significantly contributes to the understanding of the intracellular processes from a first-principles perspective.
Structure and expression of genes for a class of cysteine-rich proteins of the cuticle layers of differentiating wool and hair follicles

PubMed Central

1990-01-01

The major histological components of the hair follicle are the hair cortex and cuticle. The hair cuticle cells encase and protect the cortex and undergo a different developmental program to that of the cortex. We report the molecular characterization of a set of evolutionarily conserved hair genes which are transcribed in the hair cuticle late in follicle development. Two genes were isolated and characterized, one expressed in the human follicle and one in the sheep follicle. Each gene encodes a small protein of 16 kD, containing greater than 50 cysteine residues, ranging from 31 to 36 mol% cysteine. Their high cysteine content and in vitro expression data identify them as ultra-high-sulfur (UHS) keratin proteins. The predicted proteins are composed almost entirely of cysteine-rich and glycine-rich repeats. Genomic blots reveal that the UHS keratin proteins are encoded by related multigene families in both the human and sheep genomes. Tissue in situ hybridization demonstrates that the expression of both genes is localized to the hair fiber cuticle and occurs at a late stage in fiber morphogenesis. PMID:1703541
Integration of a splicing regulatory network within the meiotic gene expression program of Saccharomyces cerevisiae

PubMed Central

Munding, Elizabeth M.; Igel, A. Haller; Shiue, Lily; Dorighi, Kristel M.; Treviño, Lisa R.; Ares, Manuel

2010-01-01

Splicing regulatory networks are essential components of eukaryotic gene expression programs, yet little is known about how they are integrated with transcriptional regulatory networks into coherent gene expression programs. Here we define the MER1 splicing regulatory network and examine its role in the gene expression program during meiosis in budding yeast. Mer1p splicing factor promotes splicing of just four pre-mRNAs. All four Mer1p-responsive genes also require Nam8p for splicing activation by Mer1p; however, other genes require Nam8p but not Mer1p, exposing an overlapping meiotic splicing network controlled by Nam8p. MER1 mRNA and three of the four Mer1p substrate pre-mRNAs are induced by the transcriptional regulator Ume6p. This unusual arrangement delays expression of Mer1p-responsive genes relative to other genes under Ume6p control. Products of Mer1p-responsive genes are required for initiating and completing recombination and for activation of Ndt80p, the activator of the transcriptional network required for subsequent steps in the program. Thus, the MER1 splicing regulatory network mediates the dependent relationship between the UME6 and NDT80 transcriptional regulatory networks in the meiotic gene expression program. This study reveals how splicing regulatory networks can be interlaced with transcriptional regulatory networks in eukaryotic gene expression programs. PMID:21123654
Parkinson's Disease Gene Therapy: Success by Design Meets Failure by Efficacy

PubMed Central

Bartus, Raymond T; Weinberg, Marc S; Samulski, R. Jude

2014-01-01

Over the past decade, nine gene therapy clinical trials for Parkinson's disease (PD) have been initiated and completed. Starting with considerable optimism at the initiation of each trial, none of the programs has yet borne sufficiently robust clinical efficacy or found a clear path toward regulatory approval. Despite the immediately disappointing nature of the efficacy outcomes in these trials, the clinical data garnered from the individual studies nonetheless represent tangible and significant progress for the gene therapy field. Collectively, the clinical trials demonstrate that we have overcome the major safety hurdles previously suppressing central nervous system (CNS) gene therapy, for none produced any evidence of untoward risk or harm after administration of various vector-delivery systems. More importantly, these studies also demonstrated controlled, highly persistent generation of biologically active proteins targeted to structures deep in the human brain. Therefore, a renewed, focused emphasis must be placed on advancing clinical efficacy by improving clinical trial design, patient selection and outcome measures, developing more predictive animal models to support clinical testing, carefully performing retrospective analyses, and most importantly moving forward—beyond our past limits. PMID:24356252
An Evolutionarily Conserved DOF-CONSTANS Module Controls Plant Photoperiodic Signaling.

PubMed

Lucas-Reina, Eva; Romero-Campero, Francisco J; Romero, José M; Valverde, Federico

2015-06-01

The response to daylength is a crucial process that evolved very early in plant evolution, enabling the early green eukaryote to predict seasonal variability and attune its physiological responses to the environment. The photoperiod responses evolved into the complex signaling pathways that govern the angiosperm floral transition today. The Chlamydomonas reinhardtii DNA-Binding with One Finger (CrDOF) gene controls transcription in a photoperiod-dependent manner, and its misexpression influences algal growth and viability. In short days, CrDOF enhances CrCO expression, a homolog of plant CONSTANS (CO), by direct binding to its promoter, while it reduces the expression of cell division genes in long days independently of CrCO. In Arabidopsis (Arabidopsis thaliana), transgenic plants overexpressing CrDOF show floral delay and reduced expression of the photoperiodic genes CO and FLOWERING LOCUS T. The conservation of the DOF-CO module during plant evolution could be an important clue to understanding diversification by the inheritance of conserved gene toolkits in key developmental programs. © 2015 American Society of Plant Biologists. All Rights Reserved.
Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences.

PubMed

Burden, S; Lin, Y-X; Zhang, R

2005-03-01

Although a great deal of research has been undertaken in the area of promoter prediction, prediction techniques are still not fully developed. Many algorithms tend to exhibit poor specificity, generating many false positives, or poor sensitivity. The neural network prediction program NNPP2.2 is one such example. To improve the NNPP2.2 prediction technique, the distance between the transcription start site (TSS) associated with the promoter and the translation start site (TLS) of the subsequent gene coding region has been studied for Escherichia coli K12 bacteria. An empirical probability distribution that is consistent for all E. coli promoters has been established. This information is combined with the results from NNPP2.2 to create a new technique called TLS-NNPP, which improves the specificity of promoter prediction. The technique is shown to be effective using E. coli DNA sequences; however, it is applicable to any organism for which a set of promoters has been experimentally defined. The data used in this project and the prediction results for the tested sequences can be obtained from http://www.uow.edu.au/~yanxia/E_Coli_paper/SBurden_Results.xls alh98@uow.edu.au.
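The abstract does not state how the TSS-TLS distance distribution is combined with the NNPP2.2 output. The Python sketch below illustrates the general idea under two loudly labeled assumptions: the empirical distribution is a simple binned histogram, and the combined score is the product of the promoter score and the distance probability. The distances, bin width, and scores are invented.

```python
from collections import Counter


def empirical_distribution(distances, bin_width):
    """Binned empirical probability of TSS-to-TLS distances (bin index -> probability)."""
    counts = Counter(d // bin_width for d in distances)
    n = len(distances)
    return {bin_index: count / n for bin_index, count in counts.items()}


def rescored(promoter_score, distance, dist_prob, bin_width):
    """Assumed combination rule: weight the promoter score by the distance probability."""
    return promoter_score * dist_prob.get(distance // bin_width, 0.0)


# Invented distances (in bp) standing in for experimentally defined promoters.
known_distances = [20, 35, 38, 42, 55, 61, 90, 140, 33, 47]
bin_width = 50
dist_prob = empirical_distribution(known_distances, bin_width)

# Two candidates with the same raw score but different distances to the next TLS.
near = rescored(0.9, 40, dist_prob, bin_width)
far = rescored(0.9, 400, dist_prob, bin_width)
```

Under this sketch a candidate far from any translation start site is down-weighted even when its raw score is high, which is one way such distance information could reduce false positives.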

Applicability of a gene expression based prediction method to SD and Wistar rats: an example of CARCINOscreen®.

PubMed

Matsumoto, Hiroshi; Saito, Fumiyo; Takeyoshi, Masahiro

2015-12-01

Recently, the development of several gene expression-based prediction methods has been attempted in the field of toxicology. CARCINOscreen® is a gene expression-based screening method to predict carcinogenicity of chemicals which target the liver with high accuracy. In this study, we investigated the applicability of the gene expression-based screening method to SD and Wistar rats by using CARCINOscreen®, originally developed with F344 rats, with two carcinogens, 2,4-diaminotoluene and thioacetamide, and two non-carcinogens, 2,6-diaminotoluene and sodium benzoate. After the 28-day repeated dose test was conducted with each chemical in SD and Wistar rats, microarray analysis was performed using total RNA extracted from each liver. Obtained gene expression data were applied to CARCINOscreen®. Predictive scores obtained by the CARCINOscreen® for known carcinogens were > 2 in all strains of rats, while non-carcinogens gave prediction scores below 0.5. These results suggested that the gene expression-based screening method, CARCINOscreen®, can be applied to SD and Wistar rats, widely used strains in toxicological studies, by setting an appropriate boundary line of prediction score to classify the chemicals into carcinogens and non-carcinogens.
A Generalized Approach for Measuring Relationships Among Genes.

PubMed

Wang, Lijun; Ahsan, Md Asif; Chen, Ming

2017-07-21

Several methods for identifying relationships among pairs of genes have been developed. In this article, we present a generalized approach for measuring relationships between any pairs of genes, which is based on statistical prediction. We derive two particular versions of the generalized approach, least squares estimation (LSE) and nearest neighbors prediction (NNP). According to mathematical proof, LSE is equivalent to the methods based on correlation; and NNP approximates a popular method called the maximal information coefficient (MIC), according to its performance in simulations and on a real dataset. Moreover, the approach based on statistical prediction can be extended from two-gene relationships to multi-gene relationships. This application would help to identify relationships among multiple genes.
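The stated equivalence between least squares prediction and correlation can be checked numerically: for a simple linear fit of one gene on another, the fraction of variance explained (R²) equals the squared Pearson correlation. The Python sketch below illustrates that standard identity on invented expression values. It is not the authors' implementation, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lse_r_squared(x, y):
    """R^2 of the least squares line predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Invented expression levels for two genes across six samples.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

r = pearson_r(gene_x, gene_y)
r2 = lse_r_squared(gene_x, gene_y)
```

The two quantities agree up to floating-point error, so ranking gene pairs by how well one predicts the other with a linear model gives the same ordering as ranking by absolute correlation.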
Systems Biology-Based Identification of Mycobacterium tuberculosis Persistence Genes in Mouse Lungs

PubMed Central

Dutta, Noton K.; Bandyopadhyay, Nirmalya; Veeramani, Balaji; Lamichhane, Gyanu; Karakousis, Petros C.; Bader, Joel S.

2014-01-01

ABSTRACT Identifying Mycobacterium tuberculosis persistence genes is important for developing novel drugs to shorten the duration of tuberculosis (TB) treatment. We developed computational algorithms that predict M. tuberculosis genes required for long-term survival in mouse lungs. As the input, we used high-throughput M. tuberculosis mutant library screen data, mycobacterial global transcriptional profiles in mice and macrophages, and functional interaction networks. We selected 57 unique, genetically defined mutants (18 previously tested and 39 untested) to assess the predictive power of this approach in the murine model of TB infection. We observed a 6-fold enrichment in the predicted set of M. tuberculosis genes required for persistence in mouse lungs relative to randomly selected mutant pools. Our results also allowed us to reclassify several genes as required for M. tuberculosis persistence in vivo. Finally, the new results implicated additional high-priority candidate genes for testing. Experimental validation of computational predictions demonstrates the power of this systems biology approach for elucidating M. tuberculosis persistence genes. PMID:24549847
Analysis of optimality in natural and perturbed metabolic networks

PubMed Central

Segrè, Daniel; Vitkup, Dennis; Church, George M.

2002-01-01

An important goal of whole-cell computational modeling is to integrate detailed biochemical information with biological intuition to produce testable predictions. Based on the premise that prokaryotes such as Escherichia coli have maximized their growth performance over the course of evolution, flux balance analysis (FBA) predicts metabolic flux distributions at steady state by using linear programming. Corroborating earlier results, we show that recent intracellular flux data for wild-type E. coli JM101 display excellent agreement with FBA predictions. Although the assumption of optimality for a wild-type bacterium is justifiable, the same argument may not be valid for genetically engineered knockouts or other bacterial strains that were not exposed to long-term evolutionary pressure. We address this point by introducing the method of minimization of metabolic adjustment (MOMA), whereby we test the hypothesis that knockout metabolic fluxes undergo a minimal redistribution with respect to the flux configuration of the wild type. MOMA employs quadratic programming to identify the point in flux space that is closest to the wild-type point while remaining compatible with the gene deletion constraint. Comparing MOMA and FBA predictions to experimental flux data for E. coli pyruvate kinase mutant PB25, we find that MOMA displays a significantly higher correlation than FBA. Our method is further supported by experimental data for E. coli knockout growth rates. It can therefore be used for predicting the behavior of perturbed metabolic networks, whose growth performance is in general suboptimal. MOMA and its possible future extensions may be useful in understanding the evolutionary optimization of metabolism. PMID:12415116
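The idea of finding the flux vector closest to the wild type under a deletion constraint can be sketched on a toy network. The sketch below is a simplification under stated assumptions, not the published MOMA implementation: it keeps only equality constraints (steady state plus the knocked-out flux set to zero) and ignores flux bounds, so the quadratic program reduces to a Euclidean projection with a closed-form solution. The three-reaction network, the wild-type fluxes and the function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def closest_flux(wild_type, constraints):
    """Minimize ||v - wild_type||^2 subject to constraints @ v = 0.

    Closed form: v = w - A^T (A A^T)^-1 (A w). Assumes the rows of
    `constraints` are linearly independent and that there are no bounds.
    """
    a = constraints
    a_w = [sum(c * w for c, w in zip(row, wild_type)) for row in a]
    a_at = [[sum(p * q for p, q in zip(r1, r2)) for r2 in a] for r1 in a]
    lam = solve_linear(a_at, a_w)
    return [
        w - sum(lam[i] * a[i][j] for i in range(len(a)))
        for j, w in enumerate(wild_type)
    ]


# Toy network (hypothetical): uptake v1 produces metabolite A, which is
# consumed by two parallel routes v2 and v3.
steady_state = [1.0, -1.0, -1.0]   # v1 - v2 - v3 = 0
knockout_v3 = [0.0, 0.0, 1.0]      # deletion constraint: v3 = 0
wild_type_flux = [10.0, 8.0, 2.0]  # assumed wild-type flux distribution

mutant_flux = closest_flux(wild_type_flux, [steady_state, knockout_v3])
print(mutant_flux)
```

In this toy case the projection gives fluxes of about 9, 9 and 0: the knockout reroutes as little as possible rather than jumping to a new growth optimum. A realistic model would add lower and upper flux bounds, which requires a general quadratic programming solver instead of the closed form used here.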
Aggregating Data for Computational Toxicology Applications: The U.S. Environmental Protection Agency (EPA) Aggregated Computational Toxicology Resource (ACToR) System

PubMed Central

Judson, Richard S.; Martin, Matthew T.; Egeghy, Peter; Gangwal, Sumit; Reif, David M.; Kothiya, Parth; Wolf, Maritja; Cathey, Tommy; Transue, Thomas; Smith, Doris; Vail, James; Frame, Alicia; Mosher, Shad; Cohen Hubal, Elaine A.; Richard, Ann M.

2012-01-01

Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases. PMID:22408426
Neighborhood and Family Environment of Expectant Mothers May Influence Prenatal Programming of Adult Cancer Risk: Discussion and an Illustrative Biomarker Example

PubMed Central

King, Katherine E.; Kane, Jennifer B.; Scarbrough, Peter; Hoyo, Cathrine; Murphy, Susan K.

2016-01-01

Objectives: Childhood stressors including physical abuse predict adult cancer risk. Prior research portrays this finding as indirect through coping behaviors including adult smoking or through increased toxic exposures during childhood. Little is known about potential direct causal mechanisms between early-life stressors and adult cancer. Because prenatal conditions can affect gene expression by altering DNA methylation with implications for adult health, we hypothesize that maternal stress may program methylation of cancer-linked genes during gametogenesis. Methods: To illustrate, we relate maternal social resources to methylation at the imprinted MEG3 differentially methylated regulatory region linked to multiple cancer types. Mothers (n=489) in a diverse birth cohort (Durham, North Carolina) provided their newborn’s umbilical cord blood and completed a questionnaire. Results: Newborns of currently-married mothers show lower (−0.321 SD, p<0.05) methylation vs. newborns of never-married mothers, who did not differ from those whose mothers are cohabiting and others (adjusted for demographics). MEG3 DNA methylation levels are also lower when maternal grandmothers co-reside before pregnancy (−0.314 SD, p<0.05). A 1-SD increase in prenatal neighborhood disadvantage also predicts higher methylation (−0.137 SD, p<0.05). Conclusions: Maternal social resources may result in differential methylation of MEG3, which demonstrates a potential partial mechanism priming socially disadvantaged newborns for later risk of some cancers. PMID:27050035
Biological interpretation of genome-wide association studies using predicted gene functions

PubMed Central

Pers, Tune H.; Karjalainen, Juha M.; Chan, Yingleong; Westra, Harm-Jan; Wood, Andrew R.; Yang, Jian; Lui, Julian C.; Vedantam, Sailaja; Gustafsson, Stefan; Esko, Tonu; Frayling, Tim; Speliotes, Elizabeth K.; Boehnke, Michael; Raychaudhuri, Soumya; Fehrmann, Rudolf S.N.; Hirschhorn, Joel N.; Franke, Lude

2015-01-01

The main challenge for gaining biological insights from genetic associations is identifying which genes and pathways explain the associations. Here we present DEPICT, an integrative tool that employs predicted gene functions to systematically prioritize the most likely causal genes at associated loci, highlight enriched pathways and identify tissues/cell types where genes from associated loci are highly expressed. DEPICT is not limited to genes with established functions and prioritizes relevant gene sets for many phenotypes. PMID:25597830
A Simple Measure of the Dynamics of Segmented Genomes: An Application to Influenza

NASA Astrophysics Data System (ADS)

Aris-Brosou, Stéphane

The severity of influenza epidemics, which can potentially become a pandemic, has been very difficult to predict. However, past efforts were focusing on gene-by-gene approaches, while it is acknowledged that the whole genome dynamics contribute to the severity of an epidemic. Here, putting this rationale into action, I describe a simple measure of the amount of reassortment that affects influenza at a genomic scale during a particular year. The analysis of 530 complete genomes of the H1N1 subtype, sampled over eleven years, shows that the proposed measure explains 58% of the variance in the prevalence of H1 influenza in the US population. The proposed measure, denoted nRF, could therefore improve influenza surveillance programs at a minimal cost.
Creating and validating cis-regulatory maps of tissue-specific gene expression regulation

PubMed Central

O'Connor, Timothy R.; Bailey, Timothy L.

2014-01-01

Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules–CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for ‘other’ tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a ‘nearest neighbor’ heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps. PMID:25200088
The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

PubMed

Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

2014-06-01

With rapidly changing technology, the prediction of candidate genes has become an indispensable task in biological research in recent years. Empirical methods for candidate gene prioritization, which help to explore the potential pathways between genetic determinants and complex diseases, are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The wide availability of protein interaction data, coupled with gene annotation, eases the accurate determination of disease-specific candidate genes. In our work, we prioritized candidate genes for cervix-related cancer by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph theoretical centrality measures and gene ontology. Using human protein interaction data, cervical cancer gene sets and ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the predicted candidate genes was corroborated through a literature survey. In addition, the presence of drugs for these candidates was checked in the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which suggests that they may serve as potential drug targets for cervical cancer.
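As a rough illustration of ranking genes by a graph theoretical centrality measure, the sketch below computes degree centrality on a small undirected interaction graph. It is a minimal example, not the pipeline used in the study: the abstract does not say which centrality measures were applied, and the protein labels P1 to P5 and their interactions are hypothetical placeholders.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph.

    A node's score is its number of distinct neighbors divided by
    (number of nodes - 1), so scores fall between 0 and 1.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical protein-protein interactions (placeholder labels).
interactions = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]

scores = degree_centrality(interactions)
ranked = sorted(scores, key=lambda node: (-scores[node], node))
print(ranked)
```

Nodes near the top of such a ranking are the hubs of the network; in a prioritization setting they would then be filtered further, for example by shared ontology terms with known disease genes.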
Targeted next generation sequencing of the entire vitamin D receptor gene reveals polymorphisms correlated with vitamin D deficiency among older Filipino women with and without fragility fracture.

PubMed

Zumaraga, Mark Pretzel; Medina, Paul Julius; Recto, Juan Miguel; Abrahan, Lauro; Azurin, Edelyn; Tanchoco, Celeste C; Jimeno, Cecilia A; Palmes-Saloma, Cynthia

2017-03-01

This study aimed to discover genetic variants in the entire 101-kb vitamin D receptor (VDR) gene associated with vitamin D deficiency in a group of postmenopausal Filipino women using a targeted next generation sequencing (TNGS) approach in a case-control study design. A total of 50 women with and without osteoporotic fracture seen at the Philippine Orthopedic Center were included. Blood samples were collected for determination of serum vitamin D, calcium, phosphorus, glucose, blood urea nitrogen, creatinine, aspartate aminotransferase, alanine aminotransferase and as the primary source for targeted VDR gene sequencing using the Ion Torrent Personal Genome Machine. Variant calling was based on the GATK best practice workflow, and variants were annotated using the ANNOVAR tool. A total of 1496 unique variants in the whole 101-kb VDR gene were identified. Novel sequence variations not registered in the dbSNP database were found among cases and controls at a rate of 23.1% and 16.6% of total discovered variants, respectively. One disease-associated enhancer showed a statistically significant association with low serum 25-hydroxy vitamin D levels (Pearson chi-square P-value=0.009). The transcription factor binding site prediction program PROMO predicted the disruption of three transcription factor binding sites in this enhancer region. These findings show the power of TNGS in identifying sequence variations in a very large gene, and the results obtained in this study greatly expand the catalog of known VDR sequence variants that may represent an important clue in the emergence of vitamin D deficiency. Such information will also provide the additional guidance necessary toward personalized nutritional advice to reach sufficient vitamin D status. Copyright © 2016 Elsevier Inc. All rights reserved.
Targeted resequencing of candidate genes reveals novel variants associated with severe Behçet's uveitis.

PubMed

Kim, Sang Jin; Lee, Seungbok; Park, Changho; Seo, Jeong-Sun; Kim, Jong-Il; Yu, Hyeong Gon

2013-10-18

Behçet's disease (BD) is a chronic systemic inflammatory disorder characterized by four major manifestations: recurrent uveitis, oral and genital ulcers and skin lesions. To identify pathogenic variants associated with severe Behçet's uveitis, we used targeted and massively parallel sequencing methods to explore the genetic diversity of target regions. A solution-based target enrichment kit was designed to capture whole-exonic regions of 132 candidate genes. Using a multiplexing strategy, 32 samples from patients with a severe type of Behçet's uveitis were sequenced with a Genome Analyzer IIx. We compared the frequency of each variant with that of 59 normal Korean controls, and selected five rare and eight common single-nucleotide variants as the candidates for a replication study. The selected variants were genotyped in 61 cases and 320 controls and, as a result, two rare and seven common variants showed significant associations with severe Behçet's uveitis (P<0.05). Some of these, including rs199955684 in KIR3DL3, rs1801133 in MTHFR, rs1051790 in MICA and rs1051456 in KIR2DL4, were predicted to be damaging by either the PolyPhen-2 or SIFT prediction program. Variants on FCGR3A (rs396991) and ICAM1 (rs5498) have been previously reported as susceptibility loci of this disease, and those on IFNAR1, MTHFR and MICA also replicated the previous reports at the gene level. The KIR3DL3 and KIR2DL4 genes are novel susceptibility genes that have not been reported in association with BD. In conclusion, this study showed that target enrichment and next-generation sequencing technologies can provide valuable information on the genetic predisposition for Behçet's uveitis.
Use of mutation spectra analysis software.

PubMed

Rogozin, I; Kondrashov, F; Glazko, G

2001-02-01

The study and comparison of mutation(al) spectra is an important problem in molecular biology, because these spectra often reflect important features of mutations and their fixation. Such features include the interaction of DNA with various mutagens, the function of repair/replication enzymes, and properties of target proteins. It is known that mutability varies significantly along nucleotide sequences, such that mutations often concentrate at certain positions, called "hotspots," in a sequence. In this paper, we discuss in detail two approaches for mutation spectra analysis: the comparison of mutation spectra with the HG-PUBL program (FTP: sunsite.unc.edu/pub/academic/biology/dna-mutations/hyperg) and hotspot prediction with the CLUSTERM program (www.itba.mi.cnr.it/webmutation; ftp.bionet.nsc.ru/pub/biology/dbms/clusterm.zip). Several other approaches for mutational spectra analysis, such as the analysis of a target protein structure, hotspot context revealing, multiple spectra comparisons, as well as a number of mutation databases are briefly described. Mutation spectra in the lacI gene of E. coli and the human p53 gene are used to illustrate various difficulties of such analysis. Copyright 2001 Wiley-Liss, Inc.
Machine Learning–Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis

PubMed Central

Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng

2014-01-01

Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA), and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. mlDNA first used an ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing–based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress–related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes. PMID:24520154
Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data.

PubMed

Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha

2016-01-01

Background/Aim. Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was to identify the sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT.
PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources.

PubMed

Kahanda, Indika; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa

2015-01-01

The human phenotype ontology (HPO) was recently developed as a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein coding genes have HPO annotations. But, researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In this work we demonstrate the performance advantage of the structured SVM approach which was shown to be highly effective for Gene Ontology term prediction in comparison to several baseline methods. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large scale literature mining data.
A deep auto-encoder model for gene expression prediction.

PubMed

Xie, Rui; Wen, Jia; Quitadamo, Andrew; Cheng, Jianlin; Shi, Xinghua

2017-11-17

Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely based on genotypes align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
Sequence-based Network Completion Reveals the Integrality of Missing Reactions in Metabolic Networks.

PubMed

Krumholz, Elias W; Libourel, Igor G L

2015-07-31

Genome-scale metabolic models are central in connecting genotypes to metabolic phenotypes. However, even for well studied organisms, such as Escherichia coli, draft networks do not contain a complete biochemical network. Missing reactions are referred to as gaps. These gaps need to be filled to enable functional analysis, and gap-filling choices influence model predictions. To investigate whether functional networks existed where all gap-filling reactions were supported by sequence similarity to annotated enzymes, four draft networks were supplemented with all reactions from the Model SEED database for which minimal sequence similarity was found in their genomes. Quadratic programming revealed that the number of reactions that could partake in a gap-filling solution was vast: 3,270 in the case of E. coli, where 72% of the metabolites in the draft network could connect a gap-filling solution. Nonetheless, no network could be completed without the inclusion of orphaned enzymes, suggesting that parts of the biochemistry integral to biomass precursor formation are uncharacterized. However, many gap-filling reactions were well determined, and the resulting networks showed improved prediction of gene essentiality compared with networks generated through canonical gap filling. In addition, gene essentiality predictions that were sensitive to poorly determined gap-filling reactions were of poor quality, suggesting that damage to the network structure resulting from the inclusion of erroneous gap-filling reactions may be predictable. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Do estrogen receptor alpha polymorphisms have any impact on the outcome in an ART program?

PubMed

Anagnostou, Elli; Malamas, Fotodotis; Mavrogianni, Despina; Dinopoulou, Vasiliki; Drakakis, Peter; Kallianidis, Konstantinos; Loutradis, Dimitris

2013-04-01

To investigate two of the most studied estrogen receptor alpha polymorphisms (PvuII and XbaI) in combination, in order to evaluate their impact on an ART program outcome. 203 normally ovulating women who underwent IVF or ICSI treatment were genotyped for PvuII and XbaI polymorphisms in ESR1 intron 1 using Real-Time PCR. The relationship between the presence of polymorphic alleles and the ovulation induction parameters and outcome was examined. Women were divided into two groups according to the number of polymorphic alleles they carried (0-2 versus 3-4 polymorphic alleles). The presence of 3 or more polymorphic alleles was associated with significantly lower E2 levels on the day of hCG administration and a significantly lower rate of good quality embryos. There is an association between ESR1 polymorphisms and some ART parameters such as the level of E2 on the day of hCG administration and the quality of the embryos. These results underline the importance of ESR1 as a candidate gene for the prediction of ovarian response to IVF/ICSI protocols. Future research concerning several more genes is necessary for a better evaluation of patients before entering an IVF/ICSI program.

Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

PubMed Central

Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

2012-01-01

The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
Hi-C Chromatin Interaction Networks Predict Co-expression in the Mouse Cortex

PubMed Central

Hulsman, Marc; Lelieveldt, Boudewijn P. F.; de Ridder, Jeroen; Reinders, Marcel

2015-01-01

The three dimensional conformation of the genome in the cell nucleus influences important biological processes such as gene expression regulation. Recent studies have shown a strong correlation between chromatin interactions and gene co-expression. However, predicting gene co-expression from frequent long-range chromatin interactions remains challenging. We address this by characterizing the topology of the cortical chromatin interaction network using scale-aware topological measures. We demonstrate that based on these characterizations it is possible to accurately predict spatial co-expression between genes in the mouse cortex. Consistent with previous findings, we find that the chromatin interaction profile of a gene-pair is a good predictor of their spatial co-expression. However, the accuracy of the prediction can be substantially improved when chromatin interactions are described using scale-aware topological measures of the multi-resolution chromatin interaction network. We conclude that, for co-expression prediction, it is necessary to take into account different levels of chromatin interactions ranging from direct interaction between genes (i.e. small-scale) to chromatin compartment interactions (i.e. large-scale). PMID:25965262
DroSpeGe: rapid access database for new Drosophila species genomes.

PubMed

Gilbert, Donald G

2007-01-01

The Drosophila species comparative genome database DroSpeGe (http://insects.eugenes.org/DroSpeGe/) has provided genome researchers with rapid, usable access to 12 new and old Drosophila genomes since its inception in 2004. Scientists can use, with minimal computing expertise, the wealth of new genome information for developing new insights into insect evolution. New genome assemblies provided by several sequencing centers have been annotated with known model organism gene homologies and gene predictions to provide basic comparative data. TeraGrid supplies the shared cyberinfrastructure for the primary computations. This genome database includes homologies to Drosophila melanogaster and eight other eukaryote model genomes, and gene predictions from several groups. BLAST searches of the newest assemblies are integrated with genome maps. GBrowse maps provide detailed views of cross-species aligned genomes. BioMart provides for data mining of annotations and sequences. Common chromosome maps identify major synteny among species. Potential gain and loss of genes is suggested by Gene Ontology groupings for genes of the new species. Summaries of essential genome statistics include sizes, genes found and predicted, homology among genomes, phylogenetic trees of species and comparisons of several gene predictions for sensitivity and specificity in finding new and known genes.
Complete genome sequence analysis of the fish pathogen Flavobacterium columnare provides insights into antibiotic resistance and pathogenicity related genes.

PubMed

Zhang, Yulei; Zhao, Lijuan; Chen, Wenjie; Huang, Yunmao; Yang, Ling; Sarathbabu, V; Wu, Zaohe; Li, Jun; Nie, Pin; Lin, Li

2017-10-01

We analyzed the complete genome sequence of a highly virulent Flavobacterium columnare strain, Pf1, isolated in our laboratory. The complete genome consists of a 3,171,081 bp circular DNA with 2784 predicted protein-coding genes. Among these, 286 genes were predicted to be antibiotic resistance genes, including 32 RND-type efflux pump related genes associated with the export of aminoglycosides, indicating inducible aminoglycoside resistance in F. columnare. In addition, 328 genes were predicted to be pathogenicity related genes, which could be classified as virulence factors, gliding motility proteins, adhesins, and many putative secreted proteases. These genes are probably involved in the colonization, invasion and destruction of fish tissues during F. columnare infection. The complete genome sequence provides a basis for explaining the interactions between F. columnare and the infected fish. The predicted antibiotic resistance and pathogenicity related genes will shed new light on the development of more efficient preventive strategies against infection by F. columnare, a major fish pathogen worldwide. Copyright © 2017 Elsevier Ltd. All rights reserved.
Gene expression patterns in formalin-fixed, paraffin-embedded core biopsies predict docetaxel chemosensitivity in breast cancer patients.

PubMed

Chang, Jenny C; Makris, Andreas; Gutierrez, M Carolina; Hilsenbeck, Susan G; Hackett, James R; Jeong, Jennie; Liu, Mei-Lan; Baker, Joffre; Clark-Langone, Kim; Baehner, Frederick L; Sexton, Krsytal; Mohsin, Syed; Gray, Tara; Alvarez, Laura; Chamness, Gary C; Osborne, C Kent; Shak, Steven

2008-03-01

Previously, we had identified gene expression patterns that predicted response to neoadjuvant docetaxel. Other studies have validated that a high Recurrence Score (RS) by the 21-gene RT-PCR assay is predictive of worse prognosis but better response to chemotherapy. We investigated whether tumor expression of these 21 genes and other candidate genes can predict response to docetaxel. Core biopsies from 97 patients were obtained before treatment with neoadjuvant docetaxel (4 cycles, 100 mg/m² every 3 weeks). Three 10-µm FFPE sections were submitted for quantitative RT-PCR assays of 192 genes that were selected from our previous work and the literature. Of the 97 patients, 81 (84%) had sufficient invasive cancer, 80 (82%) had sufficient RNA for the qRT-PCR assay, and 72 (74%) had clinical response data. Mean age was 48.5 years, and the median tumor size was 6 cm. Clinical complete responses (CR) were observed in 12 (17%), partial responses in 41 (57%), stable disease in 17 (24%), and progressive disease in 2 patients (3%). A significant relationship (P<0.05) between gene expression and CR was observed for 14 genes, including CYBA. CR was associated with lower expression of the ER gene group and higher expression of the proliferation gene group from the 21-gene assay. Of note, CR was more likely with a high RS (P=0.008). We have established molecular profiles of sensitivity to docetaxel. RT-PCR technology provides a potential platform for a predictive test of docetaxel chemosensitivity using small amounts of routinely processed material.
Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning.

PubMed

He, Zhili; Zhang, Ping; Wu, Linwei; Rocha, Andrea M; Tu, Qichao; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D; Wu, Liyou; Yang, Yunfeng; Elias, Dwayne A; Watson, David B; Adams, Michael W W; Fields, Matthew W; Alm, Eric J; Hazen, Terry C; Adams, Paul D; Arkin, Adam P; Zhou, Jizhong

2018-02-20

Contamination from anthropogenic activities has significantly impacted Earth's biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly ( P < 0.05) as uranium or nitrate increased, and their changes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. IMPORTANCE Disentangling the relationships between biodiversity and ecosystem functioning is an important but poorly understood topic in ecology. Predicting ecosystem functioning on the basis of biodiversity is even more difficult, particularly with microbial biomarkers. As an exploratory effort, this study used key microbial functional genes as biomarkers to provide predictive understanding of environmental contamination and ecosystem functioning. 
The results indicated that the overall functional gene richness/diversity decreased as uranium increased in groundwater, while specific key microbial guilds increased significantly as uranium or nitrate increased. These key microbial functional genes could be used to successfully predict environmental contamination and ecosystem functioning. This study represents a significant advance in using functional gene markers to predict the spatial distribution of environmental contaminants and ecosystem functioning toward predictive microbial ecology, which is an ultimate goal of microbial ecology. Copyright © 2018 He et al.
Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data

PubMed Central

2013-01-01

Background Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. Results In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. Conclusions A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression. PMID:23369200
Predicting neuroblastoma using developmental signals and a logic-based model.

PubMed

Kasemeier-Kulesa, Jennifer C; Schnell, Santiago; Woolley, Thomas; Spengler, Jennifer A; Morrison, Jason A; McKinney, Mary C; Pushel, Irina; Wolfe, Lauren A; Kulesa, Paul M

2018-07-01

Genomic information from human patient samples of pediatric neuroblastoma cancers and known outcomes have led to specific gene lists put forward as high risk for disease progression. However, the reliance on gene expression correlations rather than mechanistic insight has shown limited potential and suggests a critical need for molecular network models that better predict neuroblastoma progression. In this study, we construct and simulate a molecular network of developmental genes and downstream signals in a 6-gene input logic model that predicts a favorable/unfavorable outcome based on the outcome of the four cell states including cell differentiation, proliferation, apoptosis, and angiogenesis. We simulate the mis-expression of the tyrosine receptor kinases, trkA and trkB, two prognostic indicators of neuroblastoma, and find differences in the number and probability distribution of steady state outcomes. We validate the mechanistic model assumptions using RNAseq of the SHSY5Y human neuroblastoma cell line to define the input states and confirm the predicted outcome with antibody staining. Lastly, we apply input gene signatures from 77 published human patient samples and show that our model makes more accurate disease outcome predictions for early stage disease than any current neuroblastoma gene list. These findings highlight the predictive strength of a logic-based model based on developmental genes and offer a better understanding of the molecular network interactions during neuroblastoma disease progression. Copyright © 2018. Published by Elsevier B.V.
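
As an illustration of what "steady state outcomes" means in a logic-based model, the following is a minimal sketch that enumerates the fixed points of a small Boolean network. The four nodes and their update rules are hypothetical placeholders chosen for this example; they are not the 6-gene input model described in the abstract, and the synchronous update scheme is an assumption.

```python
from itertools import product

# Hypothetical toy rules (NOT the published 6-gene model): each node's next
# value is a Boolean function of the current state of the network.
RULES = {
    "trkA": lambda s: s["trkA"],  # input node, holds its value
    "trkB": lambda s: s["trkB"],  # input node, holds its value
    "differentiation": lambda s: s["trkA"] and not s["trkB"],
    "proliferation": lambda s: s["trkB"] and not s["differentiation"],
}


def step(state):
    """Synchronously update every node from the current state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def steady_states():
    """Return every state that is unchanged by one synchronous update."""
    nodes = list(RULES)
    fixed = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if step(state) == state:
            fixed.append(state)
    return fixed
```

Calling `steady_states()` lists the fixed points for every combination of the two input nodes. Simulating mis-expression would correspond to pinning an input to a chosen value and comparing the resulting sets of steady states.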
Genetic predictors of antipsychotic response to lurasidone identified in a genome wide association study and by schizophrenia risk genes.

PubMed

Li, Jiang; Yoshikawa, Akane; Brennan, Mark D; Ramsey, Timothy L; Meltzer, Herbert Y

2018-02-01

Biomarkers that predict response to atypical antipsychotic drugs (AAPDs) can improve their benefit/risk ratio. We sought to identify common variants in genes that predict response to lurasidone, an AAPD, by associating genome-wide association study (GWAS) data with changes (Δ) in Positive And Negative Syndrome Scale (PANSS) scores from two 6-week randomized, placebo-controlled trials of lurasidone in schizophrenia (SCZ) patients. We also included SCZ risk SNPs identified by the Psychiatric Genomics Consortium using a polygenic risk analysis. The top genomic loci, with uncorrected p<10⁻⁴, include: 1) synaptic adhesion (PTPRD, LRRC4C, NRXN1, IL1RAPL1, SLITRK1) and scaffolding (MAGI1, MAGI2, NBEA) genes, both essential for synaptic function; 2) other synaptic plasticity-related genes (NRG1/3 and KALRN); 3) the neuron-specific RNA splicing regulator, RBFOX1; and 4) ion channel genes (e.g. KCNA10, KCNAB1, KCNK9 and CACNA2D3). Some genes predicted response for patients of both European and African ancestry. We replicated some SNPs reported to predict response to other atypical APDs in other GWAS. Although none of the biomarkers reached genome-wide significance, many of the genes and associated pathways have previously been linked to SCZ. Two polygenic modeling approaches, GCTA-GREML and PLINK-Polygenic Risk Score, demonstrated that some risk genes related to neurodevelopment, synaptic biology, immune response, and histones also contributed to prediction of response. The top hits predicting response to lurasidone did not predict improvement with placebo. This is the first evidence from clinical trials that SCZ risk SNPs are related to clinical response to an AAPD. These results need to be replicated in an independent sample. Copyright © 2017. Published by Elsevier B.V.
Computational Predictions Provide Insights into the Biology of TAL Effector Target Sites

PubMed Central

Grau, Jan; Wolf, Annett; Reschke, Maik; Bonas, Ulla; Posch, Stefan; Boch, Jens

2013-01-01

Transcription activator-like (TAL) effectors are injected into host plant cells by Xanthomonas bacteria to function as transcriptional activators for the benefit of the pathogen. The DNA binding domain of TAL effectors is composed of conserved amino acid repeat structures containing repeat-variable diresidues (RVDs) that determine DNA binding specificity. In this paper, we present TALgetter, a new approach for predicting TAL effector target sites based on a statistical model. In contrast to previous approaches, the parameters of TALgetter are estimated from training data computationally. We demonstrate that TALgetter successfully predicts known TAL effector target sites and often yields a greater number of predictions that are consistent with up-regulation in gene expression microarrays than an existing approach, Target Finder of the TALE-NT suite. We study the binding specificities estimated by TALgetter and confirm that different RVDs differ in their importance for transcriptional activation. In subsequent studies, the predictions of TALgetter indicate a previously unreported positional preference of TAL effector target sites relative to the transcription start site. In addition, several TAL effectors are predicted to bind to the TATA-box, which might constitute one general mode of transcriptional activation by TAL effectors. Scrutinizing the predicted target sites of TALgetter, we propose several novel TAL effector virulence targets in rice and sweet orange. TAL-mediated induction of the candidates is supported by gene expression microarrays. Validity of these targets is also supported by functional analogy to known TAL effector targets, by an over-representation of TAL effector targets with similar function, or by a biological function related to pathogen infection. Hence, these predicted TAL effector virulence targets are promising candidates for studying the virulence function of TAL effectors. TALgetter is implemented as part of the open-source Java library Jstacs, and is freely available as a web-application and a command line program. PMID:23526890
Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification.

PubMed

Shimoni, Yishai

2018-02-01

One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes. PMID:29470520
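
The comparison against random gene sets described above can be sketched as an empirical p-value: score the candidate signature, score many random sets of the same size, and report the fraction that do at least as well. The scoring function here (absolute Pearson correlation between mean expression and survival time) is a simplifying assumption that ignores censoring; it stands in for whatever survival statistic a real analysis would use and is not the procedure from the study. All function and parameter names are illustrative.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def set_score(expr, genes, survival):
    """|correlation| between per-patient mean expression of `genes` and survival."""
    n_patients = len(survival)
    profile = [mean(expr[g][i] for g in genes) for i in range(n_patients)]
    return abs(pearson(profile, survival))


def empirical_p(expr, signature, survival, n_random=1000, seed=0):
    """Fraction of same-sized random gene sets scoring >= the signature."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = set_score(expr, signature, survival)
    all_genes = sorted(expr)
    hits = sum(
        set_score(expr, rng.sample(all_genes, len(signature)), survival) >= observed
        for _ in range(n_random)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_random + 1)
```

Here `expr` maps each gene name to a list of expression values, one per patient, and `survival` is a list of survival times in the same patient order. If many random sets score as well as the signature, the resulting p-value is large, which is the pattern the abstract reports for most cancer types.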
MicroRNAs-1614-3p gene seed region polymorphisms and association analysis with chicken production traits.

PubMed

Li, Hong; Sun, Gui-Rong; Tian, Ya-Dong; Han, Rui-Li; Li, Guo-Xi; Kang, Xiang-Tao

2013-05-01

In the present study, a total of 860 chickens from a Gushi-Anka F2 resource population were used to evaluate the genetic effect of the gga-miR-1614-3p gene. A novel, silent, single nucleotide polymorphism (SNP, +5 C>T) was detected in the gga-miR-1614-3p gene seed region by AvaII polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) and sequencing of PCR products. Associations between the SNP and chicken growth, meat quality and carcass traits were then tested. The results showed that the SNP was significantly associated with breast muscle shear force, leg muscle water loss rate, wing weight, liver weight and heart weight (p<0.05), and highly significantly associated with abdominal fat weight (p<0.01). According to predictions from the M-fold program, the variant alters the secondary structure and free energy of gga-miR-1614.
FunSimMat: a comprehensive functional similarity database

PubMed Central

Schlicker, Andreas; Albrecht, Mario

2008-01-01

Functional similarity based on Gene Ontology (GO) annotation is used in diverse applications like gene clustering, gene expression data analysis, protein interaction prediction and evaluation. However, there exists no comprehensive resource of functional similarity values although such a database would facilitate the use of functional similarity measures in different applications. Here, we describe FunSimMat (Functional Similarity Matrix, http://funsimmat.bioinf.mpi-inf.mpg.de/), a large new database that provides several different semantic similarity measures for GO terms. It offers various precomputed functional similarity values for proteins contained in UniProtKB and for protein families in Pfam and SMART. The web interface allows users to efficiently perform both semantic similarity searches with GO terms and functional similarity searches with proteins or protein families. All results can be downloaded in tab-delimited files for use with other tools. An additional XML–RPC interface gives automatic online access to FunSimMat for programs and remote services. PMID:17932054
Selection of suitable reference genes for gene expression studies in Staphylococcus capitis during growth under erythromycin stress.

PubMed

Cui, Bintao; Smooker, Peter M; Rouch, Duncan A; Deighton, Margaret A

2016-08-01

Accurate and reproducible measurement of gene transcription requires appropriate reference genes, which are stably expressed under different experimental conditions to provide normalization. Staphylococcus capitis is a human pathogen that produces biofilm under stress, such as imposed by antimicrobial agents. In this study, a set of five commonly used staphylococcal reference genes (gyrB, sodA, recA, tuf and rpoB) were systematically evaluated in two clinical isolates of Staphylococcus capitis (S. capitis subspecies urealyticus and capitis, respectively) under erythromycin stress in mid-log and stationary phases. Two public software programs (geNorm and NormFinder) and two manual calculation methods, reference residue normalization (RRN) and relative quantitative (RQ), were applied. The potential reference genes selected by the four algorithms were further validated by comparing the expression of a well-studied biofilm gene (icaA) with phenotypic biofilm formation in S. capitis under four different experimental conditions. The four methods differed considerably in their ability to predict the most suitable reference gene or gene combination for comparing icaA expression under different conditions. Under the conditions used here, the RQ method provided better selection of reference genes than the other three algorithms; however, this finding needs to be confirmed with a larger number of isolates. This study reinforces the need to assess the stability of reference genes for analysis of target gene expression under different conditions and the use of more than one algorithm in such studies. Although this work was conducted using a specific human pathogen, it emphasizes the importance of selecting suitable reference genes for accurate normalization of gene expression more generally.
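
One of the algorithms named above, geNorm, ranks candidate reference genes by a stability measure M. The sketch below follows the commonly cited definition: for each gene, M is the mean, over all other candidates, of the standard deviation across samples of the log2 expression ratio between the two genes. It is an illustration written for this listing, not the geNorm software itself, and it assumes the inputs are relative quantities on a linear scale (not raw Cq values).

```python
from math import log2
from statistics import mean, stdev


def genorm_m(quantities):
    """Compute a geNorm-style stability value M for each candidate gene.

    quantities: dict mapping gene name -> list of relative quantities,
    one per sample, all positive and in the same sample order.
    Lower M means more stable expression relative to the other candidates.
    """
    genes = list(quantities)
    m_values = {}
    for j in genes:
        pairwise_variation = []
        for k in genes:
            if k == j:
                continue
            # log2 ratio of gene j to gene k in every sample
            ratios = [log2(a / b) for a, b in zip(quantities[j], quantities[k])]
            pairwise_variation.append(stdev(ratios))
        m_values[j] = mean(pairwise_variation)
    return m_values
```

Two genes whose ratio is constant across samples contribute zero pairwise variation to each other, so a gene that drifts independently of the rest ends up with the highest M. At least two genes and two samples are needed for the standard deviation to be defined.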
Comparative transcriptional profiling-based identification of raphanusanin-inducible genes

PubMed Central

2010-01-01

Background Raphanusanin (Ra) is a light-induced growth inhibitor involved in the inhibition of hypocotyl growth in response to unilateral blue-light illumination in radish seedlings. Knowledge of the roles of Ra still remains elusive. To understand the roles of Ra and its functional coupling to light signalling, we constructed the Ra-induced gene library using the Suppression Subtractive Hybridisation (SSH) technique and present a comparative investigation of gene regulation in radish seedlings in response to short-term Ra and blue-light exposure. Results The predicted gene ontology (GO) term revealed that 55% of the clones in the Ra-induced gene library were associated with genes involved in common defence mechanisms, including thirty four genes homologous to Arabidopsis genes implicated in R-gene-triggered resistance in the programmed cell death (PCD) pathway. Overall, the library was enriched with transporters, hydrolases, protein kinases, and signal transducers. The transcriptome analysis revealed that, among the fifty genes from various functional categories selected from 88 independent genes of the Ra-induced library, 44 genes were up-regulated and 4 were down-regulated. The comparative analysis showed that, among the transcriptional profiles of 33 highly Ra-inducible genes, 25 ESTs were commonly regulated by different intensities and duration of blue-light irradiation. The transcriptional profiles, coupled with the transcriptional regulation of early blue light, have provided the functional roles of many genes expected to be involved in the light-mediated defence mechanism. Conclusions This study is the first comprehensive survey of transcriptional regulation in response to Ra. The results described herein suggest a link between Ra and cellular defence and light signalling, and thereby contribute to further our understanding of how Ra is involved in light-mediated mechanisms of plant defence. PMID:20553608
Massive Collection of Full-Length Complementary DNA Clones and Microarray Analyses: Keys to Rice Transcriptome Analysis

NASA Astrophysics Data System (ADS)

Kikuchi, Shoshi

2009-02-01

Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
Induced PTF1a expression in pancreatic ductal adenocarcinoma cells activates acinar gene networks, reduces tumorigenic properties, and sensitizes cells to gemcitabine treatment.

PubMed

Jakubison, Brad L; Schweickert, Patrick G; Moser, Sarah E; Yang, Yi; Gao, Hongyu; Scully, Kathleen; Itkin-Ansari, Pamela; Liu, Yunlong; Konieczny, Stephen F

2018-05-02

Pancreatic acinar cells synthesize, package, and secrete digestive enzymes into the duodenum to aid in nutrient absorption and meet metabolic demands. When exposed to cellular stresses and insults, acinar cells undergo a dedifferentiation process termed acinar-ductal metaplasia (ADM). ADM lesions with oncogenic mutations eventually give rise to pancreatic ductal adenocarcinoma (PDAC). In healthy pancreata, the basic helix-loop-helix (bHLH) factors MIST1 and PTF1a coordinate an acinar-specific transcription network that maintains the highly developed differentiation status of the cells, protecting the pancreas from undergoing a transformative process. However, when MIST1 and PTF1a gene expression is silenced, cells are more prone to progress to PDAC. In this study, we tested whether induced MIST1 or PTF1a expression in PDAC cells could (i) re-establish the transcriptional program of differentiated acinar cells and (ii) simultaneously reduce tumor cell properties. As predicted, PTF1a induced gene expression of digestive enzymes and acinar-specific transcription factors, while MIST1 induced gene expression of vesicle trafficking molecules as well as activation of unfolded protein response components, all of which are essential to handle the high protein production load that is characteristic of acinar cells. Importantly, induction of PTF1a in PDAC also influenced cancer-associated properties, leading to a decrease in cell proliferation, cancer stem cell numbers, and repression of key ATP-binding cassette efflux transporters resulting in heightened sensitivity to gemcitabine. Thus, activation of pancreatic bHLH transcription factors rescues the acinar gene program and decreases tumorigenic properties in pancreatic cancer cells, offering unique opportunities to develop novel therapeutic intervention strategies for this deadly disease. © 2018 The Authors. Published by FEBS Press and John Wiley & Sons Ltd.
Dissection of Symbiosis and Organ Development by Integrated Transcriptome Analysis of Lotus japonicus Mutant and Wild-Type Plants

PubMed Central

Høgslund, Niels; Radutoiu, Simona; Krusell, Lene; Voroshilova, Vera; Hannah, Matthew A.; Goffard, Nicolas; Sanchez, Diego H.; Lippold, Felix; Ott, Thomas; Sato, Shusei; Tabata, Satoshi; Liboriussen, Poul; Lohmann, Gitte V.; Schauser, Leif; Weiller, Georg F.; Udvardi, Michael K.; Stougaard, Jens

2009-01-01

Genetic analyses of plant symbiotic mutants have led to the identification of key genes involved in Rhizobium-legume communication as well as in development and function of nitrogen fixing root nodules. However, the impact of these genes in coordinating the transcriptional programs of nodule development has been examined only in limited, isolated studies. Here, we present an integrated genome-wide analysis of transcriptome landscapes in Lotus japonicus wild-type and symbiotic mutant plants. Encompassing five different organs, five stages of the sequentially developed determinate Lotus root nodules, and eight mutants impaired at different stages of the symbiotic interaction, our data set integrates an unprecedented combination of organ- or tissue-specific profiles with mutant transcript profiles. In total, 38 different conditions sampled under the same well-defined growth regimes were included. This comprehensive analysis unravelled new and unexpected patterns of transcriptional regulation during symbiosis and organ development. Contrary to expectations, none of the previously characterized nodulins were among the 37 genes specifically expressed in nodules. Another surprise was the extensive transcriptional response in whole root compared to the susceptible root zone where the cellular response is most pronounced. A large number of transcripts predicted to encode transcriptional regulators, receptors and proteins involved in signal transduction, as well as many genes with unknown function, were found to be regulated during nodule organogenesis and rhizobial infection. Combining wild type and mutant profiles of these transcripts demonstrates the activation of a complex genetic program that delineates symbiotic nitrogen fixation. 
The complete data set was organized into an indexed expression directory that is accessible from a resource database, and here we present selected examples of biological questions that can be addressed with this comprehensive and powerful gene expression data set. PMID:19662091
Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem

PubMed Central

Lim, Hansaim; Gray, Paul; Xie, Lei; Poleksic, Aleksandar

2016-01-01

The conventional one-drug-one-gene approach has had limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design. PMID:27958331

Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem.

PubMed

Lim, Hansaim; Gray, Paul; Xie, Lei; Poleksic, Aleksandar

2016-12-13

Conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.
Prediction of cardioembolic, arterial and lacunar causes of cryptogenic stroke by gene expression and infarct location

PubMed Central

Jickling, Glen C; Stamova, Boryana; Ander, Bradley P; Zhan, Xinhua; Liu, Dazhi; Sison, Shara-Mae; Verro, Piero; Sharp, Frank R

2012-01-01

Background and Purpose: The cause of ischemic stroke remains unclear, or cryptogenic, in as many as 35% of stroke patients. Not knowing the cause of stroke restricts optimal implementation of prevention therapy and limits stroke research. We demonstrate how gene expression profiles in blood can be used in conjunction with a measure of infarct location on neuroimaging to predict a probable cause in cryptogenic stroke. Methods: The cause of cryptogenic stroke was predicted using previously described profiles of differentially expressed genes characteristic of patients with cardioembolic, arterial and lacunar stroke. RNA was isolated from peripheral blood of 131 cryptogenic strokes and compared to profiles derived from 149 strokes of known cause. Each sample was run on Affymetrix U133 Plus2.0 microarrays. Cause of cryptogenic stroke was predicted using gene expression in blood and infarct location. Results: Cryptogenic strokes were predicted to be 58% cardioembolic, 18% arterial, 12% lacunar and 12% unclear etiology. Cryptogenic stroke of predicted cardioembolic etiology had more prior myocardial infarction and higher CHA2DS2-VASc scores compared to stroke of predicted arterial etiology. Predicted lacunar strokes had higher systolic and diastolic blood pressures and lower NIHSS compared to predicted arterial and cardioembolic strokes. Cryptogenic strokes of unclear predicted etiology were less likely to have a prior TIA or ischemic stroke. Conclusions: Gene expression in conjunction with a measure of infarct location can predict a probable cause in cryptogenic strokes. Predicted groups require further evaluation to determine whether relevant clinical, imaging, or therapeutic differences exist for each group. PMID:22627989
Sex differences in prenatal epigenetic programming of stress pathways.

PubMed

Bale, Tracy L

2011-07-01

Maternal stress experience is associated with neurodevelopmental disorders including schizophrenia and autism. Recent studies have examined mechanisms by which changes in the maternal milieu may be transmitted to the developing embryo and potentially translated into programming of the epigenome. Animal models of prenatal stress have identified important sex- and temporal-specific effects on offspring stress responsivity. As dysregulation of stress pathways is a common feature in most neuropsychiatric diseases, molecular and epigenetic analyses at the maternal-embryo interface, especially in the placenta, may provide unique insight into identifying much-needed predictive biomarkers. In addition, as most neurodevelopmental disorders present with a sex bias, examination of sex differences in the inheritance of phenotypic outcomes may pinpoint gene targets and specific windows of vulnerability in neurodevelopment, which have been disrupted. This review discusses the association and possible contributing mechanisms of prenatal stress in programming offspring stress pathway dysregulation and the importance of sex.
The G-Box Transcriptional Regulatory Code in Arabidopsis

PubMed Central

Shepherd, Samuel J.K.; Brestovitsky, Anna; Dickinson, Patrick; Biswas, Surojit

2017-01-01

Plants have significantly more transcription factor (TF) families than animals and fungi, and plant TF families tend to contain more genes; these expansions are linked to adaptation to environmental stressors. Many TF family members bind to similar or identical sequence motifs, such as G-boxes (CACGTG), so it is difficult to predict regulatory relationships. We determined that the flanking sequences near G-boxes help determine in vitro specificity but that this is insufficient to predict the transcription pattern of genes near G-boxes. Therefore, we constructed a gene regulatory network that identifies the set of bZIPs and bHLHs that are most predictive of the expression of genes downstream of perfect G-boxes. This network accurately predicts transcriptional patterns and reconstructs known regulatory subnetworks. Finally, we present Ara-BOX-cis (araboxcis.org), a Web site that provides interactive visualizations of the G-box regulatory network, a useful resource for generating predictions for gene regulatory relations. PMID:28864470
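To make the motif concrete, the following minimal Python sketch scans a DNA string for perfect G-boxes (CACGTG) and reports each hit together with its flanking bases. The function name `find_gboxes`, the `flank` parameter, and the example sequence are illustrative assumptions; none of them comes from Ara-BOX-cis or the authors' pipeline.

```python
GBOX = "CACGTG"  # palindromic: its reverse complement is also CACGTG


def find_gboxes(seq, flank=2):
    """Return (position, left_flank, right_flank) for each perfect G-box.

    Positions are 0-based. Because CACGTG is its own reverse complement,
    scanning one strand is enough to find sites on both strands.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(GBOX)
    while start != -1:
        left = seq[max(0, start - flank):start]
        right = seq[start + len(GBOX):start + len(GBOX) + flank]
        hits.append((start, left, right))
        start = seq.find(GBOX, start + 1)
    return hits


# Hypothetical promoter fragment containing two G-boxes.
example = "ttGCCACGTGGCaaaTACACGTGTA"
print(find_gboxes(example))
```

Keeping the flanks alongside each hit reflects the abstract's point that flanking sequence contributes to in vitro specificity, although the flank length chosen here is arbitrary.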
Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold.

PubMed

Zitnik, Marinka; Zupan, Blaž

2014-01-01

The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker's yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps.
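The idea of simultaneous tri-factorization with shared factors can be written schematically as below. This is only a sketch of the general form, assuming each relation matrix $R_{ij}$ links object types $i$ and $j$; the symbols $G_i$ and $S_{ij}$ and the non-negativity constraint are notation chosen here, and any additional constraint or regularization terms used by the authors are omitted.

```latex
\min_{G_i \ge 0,\; S_{ij}} \;\; \sum_{(i,j)} \left\lVert R_{ij} - G_i \, S_{ij} \, G_j^{\top} \right\rVert_F^{2}
```

Because the same factor $G_i$ appears in every term that involves object type $i$, each data source constrains the factors used to reconstruct the others, which is how the sources are fused.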
In vitro perturbations of targets in cancer hallmark processes predict rodent chemical carcinogenesis.

PubMed

Kleinstreuer, Nicole C; Dix, David J; Houck, Keith A; Kavlock, Robert J; Knudsen, Thomas B; Martin, Matthew T; Paul, Katie B; Reif, David M; Crofton, Kevin M; Hamilton, Kerry; Hunter, Ronald; Shah, Imran; Judson, Richard S

2013-01-01

Thousands of untested chemicals in the environment require efficient characterization of carcinogenic potential in humans. A proposed solution is rapid testing of chemicals using in vitro high-throughput screening (HTS) assays for targets in pathways linked to disease processes to build models for priority setting and further testing. We describe a model for predicting rodent carcinogenicity based on HTS data from 292 chemicals tested in 672 assays mapping to 455 genes. All data come from the EPA ToxCast project. The model was trained on a subset of 232 chemicals with in vivo rodent carcinogenicity data in the Toxicity Reference Database (ToxRefDB). Individual HTS assays strongly associated with rodent cancers in ToxRefDB were linked to genes, pathways, and hallmark processes documented to be involved in tumor biology and cancer progression. Rodent liver cancer endpoints were linked to well-documented pathways such as peroxisome proliferator-activated receptor signaling and TP53 and novel targets such as PDE5A and PLAUR. Cancer hallmark genes associated with rodent thyroid tumors were found to be linked to human thyroid tumors and autoimmune thyroid disease. A model was developed in which these genes/pathways function as hypothetical enhancers or promoters of rat thyroid tumors, acting secondary to the key initiating event of thyroid hormone disruption. A simple scoring function was generated to identify chemicals with significant in vitro evidence that was predictive of in vivo carcinogenicity in different rat tissues and organs. This scoring function was applied to an external test set of 33 compounds with carcinogenicity classifications from the EPA's Office of Pesticide Programs and successfully (p = 0.024) differentiated between chemicals classified as "possible"/"probable"/"likely" carcinogens and those designated as "not likely" or with "evidence of noncarcinogenicity." This model represents a chemical carcinogenicity prioritization tool supporting targeted testing and functional validation of cancer pathways.
Evaluating High-Throughput Ab Initio Gene Finders to Discover Proteins Encoded in Eukaryotic Pathogen Genomes Missed by Laboratory Techniques

PubMed Central

Goodswen, Stephen J.; Kennedy, Paul J.; Ellis, John T.

2012-01-01

Next generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen’s genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that have presently missed capture by laboratory techniques. An aim here is to identify possible patterns of prediction inaccuracies for gene finders as a whole irrespective of the target pathogen. All currently available ab initio gene finders are considered in the evaluation but only four fulfil high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen and consequently the evaluation results are inextricably linked to the availability and quality of the data. The pathogen, Toxoplasma gondii, is used to illustrate the evaluation methods. The results support current opinion that predicted exons by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results reveal some patterns of inaccuracy that are common to all gene finders and these inaccuracies may provide a focus area for future gene finder developers. PMID:23226328
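One common way to quantify how accurately a gene finder predicts exons is exact-match sensitivity and precision at the exon level. The sketch below illustrates that calculation; it is not necessarily the metric used in the paper, and the function name and coordinates are made up for illustration.

```python
def exon_accuracy(predicted, reference):
    """Exact-match exon sensitivity and precision.

    Each exon is a (start, end) tuple; an exon counts as correct only
    when both boundaries match a reference exon exactly.
    """
    pred, ref = set(predicted), set(reference)
    correct = len(pred & ref)
    sensitivity = correct / len(ref) if ref else 0.0
    precision = correct / len(pred) if pred else 0.0
    return sensitivity, precision


# Hypothetical coordinates for one gene: one boundary is off by 10 bases
# and one predicted exon has no counterpart in the reference.
reference = [(100, 250), (400, 520), (700, 910)]
predicted = [(100, 250), (400, 530), (700, 910), (1200, 1300)]
print(exon_accuracy(predicted, reference))
```

Exact matching is deliberately strict: a single misplaced splice site makes the whole exon count as wrong, which is one reason exon-level scores tend to be much lower than nucleotide-level scores.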
Frequencies and expression levels of programmed death ligand 1 (PD-L1) in circulating tumor RNA (ctRNA) in various cancer types.

PubMed

Ishiba, Toshiyuki; Hoffmann, Andreas-Claudius; Usher, Joshua; Elshimali, Yahya; Sturdevant, Todd; Dang, Mai; Jaimes, Yolanda; Tyagi, Rama; Gonzales, Ronald; Grino, Mary; Pinski, Jacek K; Barzi, Afsaneh; Raez, Luis E; Eberhardt, Wilfried E; Theegarten, Dirk; Lenz, Heinz-Josef; Uetake, Hiroyuki; Danenberg, Peter V; Danenberg, Kathleen

2018-06-07

Precision medicine and prediction of therapeutic response requires monitoring potential biomarkers before and after treatment. Liquid biopsies provide noninvasive prognostic markers such as circulating tumor DNA and RNA. Circulating tumor RNA (ctRNA) in blood is also used to identify mutations in genes of interest, but additionally, provides information about relative expression levels of important genes. In this study, we analyzed PD-L1 expression in ctRNA isolated from various cancer types. Tumors inhibit antitumor response by modulating the immune checkpoint proteins programmed death ligand 1 (PD-L1) and its cognate receptor PD1. The expression of these genes has been implicated in evasion of immune response and resistance to targeted therapies. Blood samples were collected from gastric (GC), colorectal (CRC), lung (NSCLC), breast (BC), prostate cancer (PC) patients, and a healthy control group. ctRNA was purified from fractionated plasma, and following reverse transcription, levels of PD-L1 expression were analyzed using qPCR. PD-L1 expression was detected in the plasma ctRNA of all cancer types at varying frequencies but no PD-L1 mRNA was detected in cancer-free individuals. The frequencies of PD-L1 expression were significantly different among the various cancer types but the median relative PD-L1 expression values were not significantly different. In 12 cases where plasma and tumor tissue were available from the same patients, there was a high degree of concordance between expression of PD-L1 protein in tumor tissues and PD-L1 gene expression in plasma, and both methods were equally predictive of response to nivolumab. PD-L1 mRNA can be detected and quantitated in ctRNA of cancer patients. These results pave the way for further studies aimed at determining whether monitoring the levels of PD-L1 mRNA in blood can identify patients who are most likely to benefit from the conventional treatment. Copyright © 2018 Elsevier Inc. All rights reserved.
Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

PubMed

Chen, Lei; Zhang, Yu-Hang; Wang, ShaoPeng; Zhang, YunHua; Huang, Tao; Cai, Yu-Dong

2017-01-01

Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, uncovering the links between core functions or pathways and these essential genes will help us obtain deeper insight into their key roles. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, we report several genes that our prediction model predicts to be essential. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
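The Matthews correlation coefficient quoted above is computed from the four cells of a binary confusion matrix. The following short Python sketch uses the standard formula; the function name is chosen here, and returning 0 when a marginal is empty is a common convention rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction). Returns 0.0 if any marginal is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


print(matthews_corrcoef(tp=10, tn=10, fp=0, fn=0))
```

Unlike plain accuracy, the coefficient stays informative when essential and non-essential genes are present in very different numbers, which is why it is often reported for this kind of classifier.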
Center of Excellence for Individuation of Therapy for Breast Cancer

DTIC Science & Technology

2012-03-01

Sledge, B. Leyland-Jones (2011) Gene copy number and expression of TYMP and TYMS are predictive of outcome in breast cancer patients treated with... Gene copy number and expression of TYMP and TYMS are predictive of outcome in breast cancer patients treated with capecitabine. R. Audet, C...determine if a specific gene expression signature could be used as predictive marker for treatment outcome . Results summary for Cohort A: doxorubicin
DeSigN: connecting gene expression with therapeutics for drug repurposing and development.

PubMed

Lee, Bernard Kok Bang; Tiong, Kai Hung; Chang, Jit Kang; Liew, Chee Sun; Abdul Rahman, Zainal Ariff; Tan, Aik Choon; Khang, Tsung Fei; Cheong, Sok Ching

2017-01-25

The drug discovery and development pipeline is a long and arduous process that inevitably hampers rapid drug development. Therefore, strategies to improve the efficiency of drug development are urgently needed to enable effective drugs to enter the clinic. Precision medicine has demonstrated that genetic features of cancer cells can be used for predicting drug response, and emerging evidence suggests that gene-drug connections could be predicted more accurately by exploring the cumulative effects of many genes simultaneously. We developed DeSigN, a web-based tool for predicting drug efficacy against cancer cell lines using gene expression patterns. The algorithm correlates phenotype-specific gene signatures derived from differentially expressed genes with pre-defined gene expression profiles associated with drug response data (IC50) from 140 drugs. DeSigN successfully predicted the right drug sensitivity outcome in four published GEO studies. Additionally, it predicted bosutinib, a Src/Abl kinase inhibitor, as a sensitive inhibitor for oral squamous cell carcinoma (OSCC) cell lines. In vitro validation of bosutinib in OSCC cell lines demonstrated that these cell lines were indeed sensitive to bosutinib, with IC50 values of 0.8-1.2 μM. As further confirmation, we demonstrated experimentally that bosutinib has anti-proliferative activity in OSCC cell lines, demonstrating that DeSigN was able to robustly predict drugs that could be beneficial for tumour control. DeSigN is a robust method that is useful for the identification of candidate drugs using an input gene signature obtained from gene expression analysis. This user-friendly platform could be used to identify drugs with unanticipated efficacy against cancer cell lines of interest, and therefore could be used for the repurposing of drugs, thus improving the efficiency of drug development.
NoGOA: predicting noisy GO annotations using evidences and sparse representation.

PubMed

Yu, Guoxian; Lu, Chang; Wang, Jun

2017-07-21

Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Owing to limited resources, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, identifying noisy ones is an important yet seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Secondly, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived at different time points, and then weights entries of the association matrix via estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .
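The step of propagating weights from direct annotations to their ancestors in the GO hierarchy can be sketched as a traversal of the ontology graph. The Python below is an illustration only: the function name, the toy term names, and in particular the rule that an ancestor keeps the largest weight reaching it are assumptions made here, not details taken from NoGOA.

```python
def propagate_to_ancestors(direct_weights, parents):
    """Spread weights of directly annotated terms to all their ancestors.

    direct_weights: {term: weight} for one gene's direct annotations.
    parents: {term: [parent terms]} describing the ontology DAG.
    An ancestor reached from several annotated terms keeps the largest
    weight (an assumption made for this sketch).
    """
    weights = dict(direct_weights)
    stack = list(direct_weights.items())
    while stack:
        term, w = stack.pop()
        for parent in parents.get(term, []):
            if weights.get(parent, 0.0) < w:
                weights[parent] = w
                stack.append((parent, w))
    return weights


# Toy ontology with made-up term names: root <- A <- B, and root <- C.
parents = {"B": ["A"], "A": ["root"], "C": ["root"]}
direct = {"B": 0.9, "C": 0.4}
print(propagate_to_ancestors(direct, parents))
```

Propagation of this kind follows from the ontology's structure: a gene annotated with a specific term is implicitly annotated with every more general ancestor of that term.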
Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

PubMed

Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

2009-02-04

Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate among a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and a predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). A total of 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, for 13 (6%) of these loci, either there was no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison with the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
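Searching spectra against a reversed (decoy) database allows the false discovery rate to be estimated from the number of decoy hits that pass a score threshold. The sketch below shows only this generic target-decoy estimate; it does not implement Average Peptide Scoring, and the function name, scores, and thresholds are invented for illustration.

```python
def decoy_fdr(matches, threshold):
    """Estimate FDR among peptide matches scoring at or above threshold.

    matches: list of (score, is_decoy) pairs, where is_decoy marks hits
    to the reversed database. FDR is estimated as decoys / targets.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Made-up search results: (score, hit came from the reversed database?).
matches = [(9.1, False), (8.7, False), (8.2, True), (7.9, False),
           (7.5, False), (6.0, True), (5.5, False)]
print(decoy_fdr(matches, threshold=7.0))
```

Raising the threshold trades identifications for confidence: in this toy data, moving the cutoff from 7.0 to 8.5 removes the only passing decoy but also halves the number of accepted target peptides.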
A peripheral blood transcriptomic signature predicts autoantibody development in infants at risk of type 1 diabetes.

PubMed

Mehdi, Ahmed M; Hamilton-Williams, Emma E; Cristino, Alexandre; Ziegler, Anette; Bonifacio, Ezio; Le Cao, Kim-Anh; Harris, Mark; Thomas, Ranjeny

2018-03-08

Autoimmune-mediated destruction of pancreatic islet β cells results in type 1 diabetes (T1D). Serum islet autoantibodies usually develop in genetically susceptible individuals in early childhood before T1D onset, with multiple islet autoantibodies predicting diabetes development. However, most at-risk children remain islet-antibody negative, and no test currently identifies those likely to seroconvert. We sought a genomic signature predicting seroconversion risk by integrating longitudinal peripheral blood gene expression profiles collected in high-risk children included in the BABYDIET and DIPP cohorts, of whom 50 seroconverted. Subjects were followed for 10 years to determine time of seroconversion. Any cohort effect and the time of seroconversion were corrected to uncover genes differentially expressed (DE) in seroconverting children. Gene expression signatures associated with seroconversion were evident during the first year of life, with 67 DE genes identified in seroconverting children relative to those remaining antibody negative. These genes contribute to T cell-, DC-, and B cell-related immune responses. Near-birth expression of ADCY9, PTCH1, MEX3B, IL15RA, ZNF714, TENM1, and PLEKHA5, along with HLA risk score predicted seroconversion (AUC 0.85). The ubiquitin-proteasome pathway linked DE genes and T1D susceptibility genes. Therefore, a gene expression signature in infancy predicts risk of seroconversion. Ubiquitination may play a mechanistic role in diabetes progression.
A peripheral blood transcriptomic signature predicts autoantibody development in infants at risk of type 1 diabetes

PubMed Central

Mehdi, Ahmed M.; Hamilton-Williams, Emma E.; Cristino, Alexandre; Ziegler, Anette; Harris, Mark

2018-01-01

Autoimmune-mediated destruction of pancreatic islet β cells results in type 1 diabetes (T1D). Serum islet autoantibodies usually develop in genetically susceptible individuals in early childhood before T1D onset, with multiple islet autoantibodies predicting diabetes development. However, most at-risk children remain islet-antibody negative, and no test currently identifies those likely to seroconvert. We sought a genomic signature predicting seroconversion risk by integrating longitudinal peripheral blood gene expression profiles collected in high-risk children included in the BABYDIET and DIPP cohorts, of whom 50 seroconverted. Subjects were followed for 10 years to determine time of seroconversion. Any cohort effect and the time of seroconversion were corrected to uncover genes differentially expressed (DE) in seroconverting children. Gene expression signatures associated with seroconversion were evident during the first year of life, with 67 DE genes identified in seroconverting children relative to those remaining antibody negative. These genes contribute to T cell–, DC-, and B cell–related immune responses. Near-birth expression of ADCY9, PTCH1, MEX3B, IL15RA, ZNF714, TENM1, and PLEKHA5, along with HLA risk score predicted seroconversion (AUC 0.85). The ubiquitin-proteasome pathway linked DE genes and T1D susceptibility genes. Therefore, a gene expression signature in infancy predicts risk of seroconversion. Ubiquitination may play a mechanistic role in diabetes progression. PMID:29515040
Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences

PubMed Central

Huynen, Martijn; Snel, Berend; Lathe, Warren; Bork, Peer

2000-01-01

Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes. PMID:10958638
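The remark that the categories overlap explains why the individual coverages (37%, 6%, 8% and 11%) sum to more than the combined 50%. The Python sketch below illustrates this with set unions; the gene identifiers are entirely hypothetical and were chosen only so that the toy numbers mirror the percentages in the abstract.

```python
def coverage(gene_sets, total_genes):
    """Fraction of genes covered by each method and by their union."""
    per_method = {name: len(genes) / total_genes
                  for name, genes in gene_sets.items()}
    union = set().union(*gene_sets.values())
    return per_method, len(union) / total_genes


# Hypothetical genome of 100 genes, numbered 0-99; the ranges overlap.
toy = {
    "gene_order": set(range(0, 37)),
    "fusion": set(range(33, 39)),
    "operon_cooccurrence": set(range(37, 45)),
    "phylogenetic_profile": set(range(39, 50)),
}
per_method, combined = coverage(toy, total_genes=100)
print(per_method, combined)
```

Summing the per-method fractions counts any gene detected by two methods twice, whereas the union counts it once, so the combined coverage is always at most the sum.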
Prediction of Response to Therapy and Clinical Outcome through a Pilot Study of Complete Genetic Assessment of Ovarian Cancer

DTIC Science & Technology

2015-12-01

Oncology program supported by this grant consented patients to 11-104. OncoPanel is a cancer genomic assay that detects somatic mutations, copy number...KMT2D, EP300, FANCD2 Sertoli Leydig cell DICER1 Copy number variants: In addition, 219 patients were analyzed for copy-number variations ( CNV ) in...OncoPanel genes. >12,000 total CNV were reported in the cohort (Figure 2). Single- copy deletions (n=5558) and copy-number gains (low amplification) (n
Operating Comfort Prediction Model of Human-Machine Interface Layout for Cabin Based on GEP.

PubMed

Deng, Li; Wang, Guohua; Chen, Bo

2015-01-01

To address the evaluation and decision-making problem in human-machine interface layout design for cabins, an operating comfort prediction model based on GEP (Gene Expression Programming) is proposed, with operating comfort used to evaluate layout schemes. The operating posture of the upper limb is described by joint angles, which are taken as independent variables to establish the comfort model of operating posture. Factor analysis is adopted to reduce the variable dimension: the model's input variables are reduced from 16 joint angles to 4 comfort impact factors, and the output variable is the operating comfort score. A Chinese virtual human body model is built in CATIA software and used to simulate and evaluate the operators' operating comfort. With 22 groups of evaluation data as training and validation samples, the GEP algorithm is used to obtain the best-fitting function between the joint angles and the operating comfort, so that operating comfort can be predicted quantitatively. The operating comfort prediction results for the human-machine interface layout of a driller control room show that the GEP-based prediction model is fast and efficient, predicts well, and can improve design efficiency.
Operating Comfort Prediction Model of Human-Machine Interface Layout for Cabin Based on GEP

PubMed Central

Wang, Guohua; Chen, Bo

2015-01-01

To address the evaluation and decision-making problem in human-machine interface layout design for cabins, an operating comfort prediction model based on GEP (Gene Expression Programming) is proposed, with operating comfort used to evaluate layout schemes. The operating posture of the upper limb is described by joint angles, which are taken as independent variables to establish the comfort model of operating posture. Factor analysis is adopted to reduce the variable dimension: the model's input variables are reduced from 16 joint angles to 4 comfort impact factors, and the output variable is the operating comfort score. A Chinese virtual human body model is built in CATIA software and used to simulate and evaluate the operators' operating comfort. With 22 groups of evaluation data as training and validation samples, the GEP algorithm is used to obtain the best-fitting function between the joint angles and the operating comfort, so that operating comfort can be predicted quantitatively. The operating comfort prediction results for the human-machine interface layout of a driller control room show that the GEP-based prediction model is fast and efficient, predicts well, and can improve design efficiency. PMID:26448740
Gene-Gene and Gene-Environment Interactions in Ulcerative Colitis

PubMed Central

Wang, Ming-Hsi; Fiocchi, Claudio; Zhu, Xiaofeng; Ripke, Stephan; Kamboh, M. Ilyas; Rebert, Nancy; Duerr, Richard H.; Achkar, Jean-Paul

2014-01-01

Genome-wide association studies (GWAS) have identified at least 133 ulcerative colitis (UC) associated loci. The role of genetic factors in clinical practice is not clearly defined. The relevance of genetic variants to disease pathogenesis is still uncertain because gene-gene and gene-environment interactions have not been characterized. We examined the predictive value of combining the 133 UC risk loci with genetic interactions in an ongoing inflammatory bowel disease (IBD) GWAS. The Wellcome Trust Case-Control Consortium (WTCCC) IBD GWAS was used as a replication cohort. We applied logic regression (LR), a novel adaptive regression methodology, to search for high order interactions. Exploratory genotype correlations with UC sub-phenotypes (extent of disease, need of surgery, age of onset, extra-intestinal manifestations and primary sclerosing cholangitis (PSC)) were conducted. The combination of 133 UC loci yielded good UC risk predictability (area under the curve [AUC] of 0.86). A higher cumulative allele score predicted higher UC risk. Through LR, several lines of evidence for genetic interactions were identified and successfully replicated in the WTCCC cohort. The genetic interactions combined with the gene-smoking interaction significantly improved predictability in the model (AUC, from 0.86 to 0.89, P=3.26E-05). Explained UC variance increased from 37% to 42% after adding the interaction terms. A within-case analysis found a suggestive genetic association with PSC. Our study demonstrates that the LR methodology allows the identification and replication of high order genetic interactions in UC GWAS datasets. UC risk can be predicted by the 133 loci, and the prediction is improved by adding gene-gene and gene-environment interactions. PMID:24241240
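
The record above reports that a higher cumulative allele score predicted higher UC risk and summarizes predictability as an AUC. A minimal sketch of those two quantities follows. It assumes an unweighted score (a plain count of risk alleles per person) and uses invented toy genotypes; the abstract does not say whether the study weighted the loci, and the function names are illustrative only.

```python
def allele_score(genotype):
    """Unweighted cumulative score: total risk alleles (0, 1 or 2 per locus)."""
    return sum(genotype)


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented toy genotypes over three loci; not data from the study.
cases = [[2, 1, 2], [1, 1, 2], [2, 2, 1]]
controls = [[0, 1, 1], [1, 1, 2], [0, 0, 1]]

case_scores = [allele_score(g) for g in cases]
control_scores = [allele_score(g) for g in controls]
toy_auc = auc(case_scores, control_scores)
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation of cases from controls, which is the scale on which the reported change from 0.86 to 0.89 should be read.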

Phylogenomic detection and functional prediction of genes potentially important for plant meiosis.

PubMed

Zhang, Luoyan; Kong, Hongzhi; Ma, Hong; Yang, Ji

2018-02-15

Meiosis is a specialized type of cell division necessary for sexual reproduction in eukaryotes. A better understanding of the cytological procedures of meiosis has been achieved by comprehensive cytogenetic studies in plants, while the genetic mechanisms regulating meiotic progression remain incompletely understood. The increasing accumulation of complete genome sequences and large-scale gene expression datasets has provided a powerful resource for phylogenomic inference and unsupervised identification of genes involved in plant meiosis. By integrating sequence homology and expression data, 164, 131, 124 and 162 genes potentially important for meiosis were identified in the genomes of Arabidopsis thaliana, Oryza sativa, Selaginella moellendorffii and Pogonatum aloides, respectively. The predicted genes were assigned to 45 meiotic GO terms, and their functions were related to different processes occurring during meiosis in various organisms. Most of the predicted meiotic genes underwent lineage-specific duplication events during plant evolution, with about 30% of the predicted genes retaining only a single copy in higher plant genomes. The results of this study provided clues to design experiments for better functional characterization of meiotic genes in plants, promoting the phylogenomic approach to the evolutionary dynamics of the plant meiotic machineries. Copyright © 2017 Elsevier B.V. All rights reserved.
Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes

PubMed Central

Roy, Janine; Aust, Daniela; Knösel, Thomas; Rümmele, Petra; Jahnke, Beatrix; Hentrich, Vera; Rückert, Felix; Niedergethmann, Marco; Weichert, Wilko; Bahra, Marcus; Schlitt, Hans J.; Settmacher, Utz; Friess, Helmut; Büchler, Markus; Saeger, Hans-Detlev; Schroeder, Michael; Pilarsky, Christian; Grützmann, Robert

2012-01-01

Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice. PMID:22615549
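
The ranking idea in the record above, scoring genes from both expression and network information in a PageRank-like manner, can be sketched as a personalized PageRank. This is an illustration under stated assumptions, not the authors' algorithm: the damping value, the use of a per-gene relevance number as the restart distribution, and the toy gene names are all hypothetical.

```python
def rank_genes(edges, relevance, damping=0.85, iterations=100):
    """Personalized PageRank on an undirected gene network.

    edges: iterable of (gene, gene) pairs.
    relevance: dict gene -> non-negative weight (e.g. an expression-based
    association with outcome); it sets the restart distribution.
    Assumes every gene has at least one neighbour; isolated genes would
    leak score mass in this simplified version.
    """
    genes = sorted(relevance)
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    total = sum(relevance.values())
    prior = {g: relevance[g] / total for g in genes}
    score = dict(prior)

    for _ in range(iterations):
        updated = {}
        for g in genes:
            # Each neighbour passes on an equal share of its current score.
            incoming = sum(score[n] / len(neighbours[n]) for n in neighbours[g])
            updated[g] = (1 - damping) * prior[g] + damping * incoming
        score = updated

    ranking = sorted(genes, key=score.get, reverse=True)
    return ranking, score


# Hypothetical toy network: geneA is a hub connected to three other genes.
toy_edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD")]
toy_relevance = {"geneA": 1.0, "geneB": 1.0, "geneC": 1.0, "geneD": 1.0}
toy_ranking, toy_score = rank_genes(toy_edges, toy_relevance)
```

With equal relevance for every gene, the hub comes out on top purely through its network position; unequal relevance values shift the ranking toward genes that are both well connected and associated with outcome.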
Improvement of experimental testing and network training conditions with genome-wide microarrays for more accurate predictions of drug gene targets

PubMed Central

2014-01-01

Background Genome-wide microarrays have been useful for predicting chemical-genetic interactions at the gene level. However, interpreting genome-wide microarray results can be overwhelming due to the vast output of gene expression data combined with off-target transcriptional responses many times induced by a drug treatment. This study demonstrates how experimental and computational methods can interact with each other, to arrive at more accurate predictions of drug-induced perturbations. We present a two-stage strategy that links microarray experimental testing and network training conditions to predict gene perturbations for a drug with a known mechanism of action in a well-studied organism. Results S. cerevisiae cells were treated with the antifungal, fluconazole, and expression profiling was conducted under different biological conditions using Affymetrix genome-wide microarrays. Transcripts were filtered with a formal network-based method, sparse simultaneous equation models and Lasso regression (SSEM-Lasso), under different network training conditions. Gene expression results were evaluated using both gene set and single gene target analyses, and the drug’s transcriptional effects were narrowed first by pathway and then by individual genes. Variables included: (i) Testing conditions – exposure time and concentration and (ii) Network training conditions – training compendium modifications. Two analyses of SSEM-Lasso output – gene set and single gene – were conducted to gain a better understanding of how SSEM-Lasso predicts perturbation targets. Conclusions This study demonstrates that genome-wide microarrays can be optimized using a two-stage strategy for a more in-depth understanding of how a cell manifests biological reactions to a drug treatment at the transcription level. Additionally, a more detailed understanding of how the statistical model, SSEM-Lasso, propagates perturbations through a network of gene regulatory interactions is achieved. PMID:24444313
SeqTU: A web server for identification of bacterial transcription units

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Xin; Chou, Wen-Chi; Ma, Qin

A transcription unit (TU) consists of K ≥ 1 consecutive genes on the same strand of a bacterial genome that are transcribed into a single mRNA molecule under certain conditions. Their identification is an essential step in elucidation of transcriptional regulatory networks. We have recently developed a machine-learning method to accurately identify TUs from RNA-seq data, based on two features of the assembled RNA reads: the continuity and stability of RNA-seq coverage across a genomic region. While good performance was achieved by the method on Escherichia coli and Clostridium thermocellum, substantial work is needed to make the program generally applicable to all bacteria, knowing that the program requires organism specific information. A web server, named SeqTU, was developed to automatically identify TUs with given RNA-seq data of any bacterium using a machine-learning approach. The server consists of a number of utility tools, in addition to TU identification, such as data preparation, data quality check and RNA-read mapping. SeqTU provides a user-friendly interface and automated prediction of TUs from given RNA-seq data. Furthermore, the predicted TUs are displayed intuitively using HTML format along with a graphic visualization of the prediction.
SeqTU: A web server for identification of bacterial transcription units

DOE PAGES

Chen, Xin; Chou, Wen-Chi; Ma, Qin; ...

2017-03-07

A transcription unit (TU) consists of K ≥ 1 consecutive genes on the same strand of a bacterial genome that are transcribed into a single mRNA molecule under certain conditions. Their identification is an essential step in elucidation of transcriptional regulatory networks. We have recently developed a machine-learning method to accurately identify TUs from RNA-seq data, based on two features of the assembled RNA reads: the continuity and stability of RNA-seq coverage across a genomic region. While good performance was achieved by the method on Escherichia coli and Clostridium thermocellum, substantial work is needed to make the program generally applicable to all bacteria, knowing that the program requires organism specific information. A web server, named SeqTU, was developed to automatically identify TUs with given RNA-seq data of any bacterium using a machine-learning approach. The server consists of a number of utility tools, in addition to TU identification, such as data preparation, data quality check and RNA-read mapping. SeqTU provides a user-friendly interface and automated prediction of TUs from given RNA-seq data. Furthermore, the predicted TUs are displayed intuitively using HTML format along with a graphic visualization of the prediction.
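
The TU definition above (K ≥ 1 consecutive same-strand genes transcribed together) can be made concrete with a small grouping routine. Note the substitution: SeqTU is described as using a machine-learning method on coverage continuity and stability, whereas this sketch swaps in a plain coverage threshold on the intergenic gap. The threshold value, the input layout, and the function name are assumptions made for illustration.

```python
def group_transcription_units(genes, gap_coverage, min_coverage=10.0):
    """Group genes into candidate transcription units.

    genes: list of (name, strand) tuples in genomic order.
    gap_coverage: gap_coverage[i] is the mean RNA-seq coverage of the
    region between genes[i] and genes[i + 1].
    Two neighbouring genes share a unit when they lie on the same strand
    and the gap between them is covered at or above min_coverage.
    """
    if not genes:
        return []
    units = [[genes[0][0]]]
    for i in range(1, len(genes)):
        same_strand = genes[i][1] == genes[i - 1][1]
        continuous = gap_coverage[i - 1] >= min_coverage
        if same_strand and continuous:
            units[-1].append(genes[i][0])  # extend the current unit
        else:
            units.append([genes[i][0]])    # start a new unit
    return units


# Hypothetical five-gene region; names and coverage values are invented.
toy_genes = [("a", "+"), ("b", "+"), ("c", "-"), ("d", "-"), ("e", "-")]
toy_gaps = [25.0, 40.0, 3.0, 18.0]
toy_units = group_transcription_units(toy_genes, toy_gaps)
```

Single-gene units fall out naturally, matching the K ≥ 1 part of the definition: a gene whose neighbours are on the other strand or separated by an uncovered gap forms a unit of its own.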
Designing Tyrosinase siRNAs by Multiple Prediction Algorithms and Evaluation of Their Anti-Melanogenic Effects.

PubMed

Kwon, Ok-Seon; Kwon, Soo-Jung; Kim, Jin Sang; Lee, Gunbong; Maeng, Han-Joo; Lee, Jeongmi; Hwang, Gwi Seo; Cha, Hyuk-Jin; Chun, Kwang-Hoon

2018-05-01

Melanin is a pigment produced from tyrosine in melanocytes. Although melanin has a protective role against UVB radiation-induced damage, it is also associated with the development of melanoma and darker skin tone. Tyrosinase is a key enzyme in melanin synthesis, which regulates the rate-limiting step during conversion of tyrosine into DOPA and dopaquinone. To develop effective RNA interference therapeutics, we designed a melanin siRNA pool by applying multiple prediction programs to reduce human tyrosinase levels. First, 272 siRNAs passed the target accessibility evaluation using the RNAxs program. Then we selected 34 siRNA sequences with ΔG ≥-34.6 kcal/mol, i-Score value ≥65, and siRNA scales score ≤30. siRNAs were designed as 19-bp RNA duplexes with an asymmetric 3' overhang at the 3' end of the antisense strand. We tested if these siRNAs effectively reduced tyrosinase gene expression using qRT-PCR and found that 17 siRNA sequences were more effective than commercially available siRNA. Three siRNAs further tested showed an effective visual color change in MNT-1 human cells without cytotoxic effects, indicating these sequences are anti-melanogenic. Our study revealed that human tyrosinase siRNAs could be efficiently designed using multiple prediction algorithms.
Designing Tyrosinase siRNAs by Multiple Prediction Algorithms and Evaluation of Their Anti-Melanogenic Effects

PubMed Central

Kwon, Ok-Seon; Kwon, Soo-Jung; Kim, Jin Sang; Lee, Gunbong; Maeng, Han-Joo; Lee, Jeongmi; Hwang, Gwi Seo; Cha, Hyuk-Jin; Chun, Kwang-Hoon

2018-01-01

Melanin is a pigment produced from tyrosine in melanocytes. Although melanin has a protective role against UVB radiation-induced damage, it is also associated with the development of melanoma and darker skin tone. Tyrosinase is a key enzyme in melanin synthesis, which regulates the rate-limiting step during conversion of tyrosine into DOPA and dopaquinone. To develop effective RNA interference therapeutics, we designed a melanin siRNA pool by applying multiple prediction programs to reduce human tyrosinase levels. First, 272 siRNAs passed the target accessibility evaluation using the RNAxs program. Then we selected 34 siRNA sequences with ΔG ≥−34.6 kcal/mol, i-Score value ≥65, and siRNA scales score ≤30. siRNAs were designed as 19-bp RNA duplexes with an asymmetric 3′ overhang at the 3′ end of the antisense strand. We tested if these siRNAs effectively reduced tyrosinase gene expression using qRT-PCR and found that 17 siRNA sequences were more effective than commercially available siRNA. Three siRNAs further tested showed an effective visual color change in MNT-1 human cells without cytotoxic effects, indicating these sequences are anti-melanogenic. Our study revealed that human tyrosinase siRNAs could be efficiently designed using multiple prediction algorithms. PMID:29223142
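
The selection step in the record above applies three cutoffs together: ΔG ≥ −34.6 kcal/mol, i-Score ≥ 65, and siRNA scales score ≤ 30. A minimal sketch of that filter is shown below. The cutoffs come from the abstract; the record layout, field names, and candidate values are hypothetical, and the upstream scoring tools themselves are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical identifier for an siRNA sequence
    delta_g: float       # ΔG in kcal/mol
    i_score: float       # i-Score value
    scales_score: float  # siRNA scales score


def passes_filters(candidate):
    """Apply the three thresholds quoted in the abstract."""
    return (
        candidate.delta_g >= -34.6
        and candidate.i_score >= 65
        and candidate.scales_score <= 30
    )


def select(candidates):
    """Return the names of candidates that pass every threshold."""
    return [c.name for c in candidates if passes_filters(c)]


# Invented candidates, one failing each criterion in turn.
toy_candidates = [
    Candidate("si-1", -33.0, 70.0, 25.0),  # passes all three
    Candidate("si-2", -36.0, 70.0, 25.0),  # ΔG too low
    Candidate("si-3", -33.0, 60.0, 25.0),  # i-Score too low
    Candidate("si-4", -33.0, 70.0, 35.0),  # scales score too high
]
toy_selected = select(toy_candidates)
```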
A systems biology model of the regulatory network in Populus leaves reveals interacting regulators and conserved regulation

PubMed Central

2011-01-01

Background Green plant leaves have always fascinated biologists as hosts for photosynthesis and providers of basic energy to many food webs. Today, comprehensive databases of gene expression data enable us to apply increasingly more advanced computational methods for reverse-engineering the regulatory network of leaves, and to begin to understand the gene interactions underlying complex emergent properties related to stress-response and development. These new systems biology methods are now also being applied to organisms such as Populus, a woody perennial tree, in order to understand the specific characteristics of these species. Results We present a systems biology model of the regulatory network of Populus leaves. The network is reverse-engineered from promoter information and expression profiles of leaf-specific genes measured over a large set of conditions related to stress and development. The network model incorporates interactions between regulators, such as synergistic and competitive relationships, by evaluating increasingly more complex regulatory mechanisms, and is therefore able to identify new regulators of leaf development not found by traditional genomics methods based on pair-wise expression similarity. The approach is shown to explain available gene function information and to provide robust prediction of expression levels in new data. We also use the predictive capability of the model to identify condition-specific regulation as well as conserved regulation between Populus and Arabidopsis. Conclusions We outline a computationally inferred model of the regulatory network of Populus leaves, and show how treating genes as interacting, rather than individual, entities identifies new regulators compared to traditional genomics analysis. Although systems biology models should be used with care considering the complexity of regulatory programs and the limitations of current genomics data, methods describing interactions can provide hypotheses about the underlying cause of emergent properties and are needed if we are to identify target genes other than those constituting the "low hanging fruit" of genomic analysis. PMID:21232107
Landscape genetics as a tool for conservation planning: predicting the effects of landscape change on gene flow.

PubMed

van Strien, Maarten J; Keller, Daniela; Holderegger, Rolf; Ghazoul, Jaboury; Kienast, Felix; Bolliger, Janine

2014-03-01

For conservation managers, it is important to know whether landscape changes lead to increasing or decreasing gene flow. Although the discipline of landscape genetics assesses the influence of landscape elements on gene flow, no studies have yet used landscape-genetic models to predict gene flow resulting from landscape change. A species that has already been severely affected by landscape change is the large marsh grasshopper (Stethophyma grossum), which inhabits moist areas in fragmented agricultural landscapes in Switzerland. From transects drawn between all population pairs within maximum dispersal distance (< 3 km), we calculated several measures of landscape composition as well as some measures of habitat configuration. Additionally, a complete sampling of all populations in our study area allowed incorporating measures of population topology. These measures together with the landscape metrics formed the predictor variables in linear models with gene flow as response variable (F(ST) and mean pairwise assignment probability). With a modified leave-one-out cross-validation approach, we selected the model with the highest predictive accuracy. With this model, we predicted gene flow under several landscape-change scenarios, which simulated construction, rezoning or restoration projects, and the establishment of a new population. For some landscape-change scenarios, significant increase or decrease in gene flow was predicted, while for others little change was forecast. Furthermore, we found that the measures of population topology strongly increase model fit in landscape genetic analysis. This study demonstrates the use of predictive landscape-genetic models in conservation and landscape planning.
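
The model-selection step in the record above relies on leave-one-out cross-validation to pick the model with the highest predictive accuracy. The sketch below shows the standard procedure for a one-predictor linear model. It is not the modified variant the authors mention, whose details are not given in the abstract, and the predictor names and numbers are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mse(xs, ys):
    """Mean squared prediction error with each observation held out once."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predicted = slope * xs[i] + intercept
        errors.append((ys[i] - predicted) ** 2)
    return sum(errors) / len(errors)


# Invented data: a gene-flow response and two hypothetical landscape predictors.
gene_flow = [1.0, 3.0, 5.0, 7.0, 9.0]
predictors = {
    "distance": [0.0, 1.0, 2.0, 3.0, 4.0],      # perfectly linear with the response
    "forest_cover": [2.0, 0.0, 3.0, 1.0, 4.0],  # only loosely related
}
best_predictor = min(predictors, key=lambda k: loocv_mse(predictors[k], gene_flow))
```

Scoring on held-out observations rewards models that generalize rather than models that merely fit the sampled population pairs, which is what matters when the selected model is then applied to landscape-change scenarios.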
Cryptic tRNAs in chaetognath mitochondrial genomes.

PubMed

Barthélémy, Roxane-Marie; Seligmann, Hervé

2016-06-01

The chaetognaths constitute a small and enigmatic phylum of small marine invertebrates. Both nuclear and mitochondrial genomes have numerous peculiarities, some phylum-specific. Until recently, their mitogenomes seemed to contain only one tRNA gene (trnMet), but a recent study found two and four tRNA genes in two chaetognath mitogenomes. Moreover, two apparently conspecific mitogenomes have different tRNA gene numbers (one and two). Reanalyses of the five available complete chaetognath mitogenomes with the tRNAscan-SE and ARWEN software suggest numerous additional tRNA genes of different types. Their total number never reaches the 22 found in most other invertebrates using that genetic code. Predicted error compensation between codon-anticodon mismatch and tRNA misacylation suggests translational activity by tRNAs predicted solely according to secondary structure for tRNAs predicted by tRNAscan-SE, not ARWEN. Numbers of predicted stop-suppressor (antitermination) tRNAs coevolve with predicted overlapping, frameshifted protein coding genes including stop codons. Sequence alignments in secondary structure prediction with non-chaetognath tRNAs suggest that the most likely functional tRNAs are in intergenic regions, as regular mt-tRNAs. Because intergenic regions are usually short, tRNA sequences generally overlap partially with flanking genes. Some tRNA pairs seem templated by sense-antisense strands. Moreover, 16S rRNA genes, but not 12S rRNAs, appear as tRNA nurseries, as previously suggested for multifunctional ribosomal-like protogenomes. Copyright © 2016 Elsevier Ltd. All rights reserved.
The role of ITPA and ribavirin transporter genes polymorphisms in prediction of ribavirin-induced anaemia in chronic hepatitis C Egyptian patients.

PubMed

El Desoky, Ehab S; Abdelhafez, Alaa T; Cusato, Jessica; Kamel, Sherif I; Hussein, Abeer Mr; De Nicolo, Amedeo; Di Perri, Giovanni; D'Avolio, Antonio

2017-09-01

Few data are available concerning the roles of polymorphisms of the inosine triphosphatase (ITPA) gene and ribavirin (RBV) transporter genes in the prediction of RBV-induced anaemia among Egyptians with chronic hepatitis C (CHC). Genotyping of three ITPA gene variants and two variants of RBV transporter genes was performed in 123 patients under pegylated interferon-α/ribavirin treatment. Baseline haemoglobin and ITPA rs1127354 CA/AA were found to be predictors of anaemia at 4, 8 and 12 weeks of RBV therapy. In addition, ITPA rs7270101 AC/CC and age predicted anaemia after 12 weeks of therapy. In conclusion, the ITPA variant rs1127354C>A significantly predicts RBV-induced anaemia during the first 3 months of treatment, and its assessment is recommended before RBV administration. © 2017 John Wiley & Sons Australia, Ltd.
Predictable transcriptome evolution in the convergent and complex bioluminescent organs of squid

PubMed Central

Pankey, M. Sabrina; Minin, Vladimir N.; Imholte, Greg C.; Suchard, Marc A.; Oakley, Todd H.

2014-01-01

Despite contingency in life’s history, the similarity of evolutionarily convergent traits may represent predictable solutions to common conditions. However, the extent to which overall gene expression levels (transcriptomes) underlying convergent traits are themselves convergent remains largely unexplored. Here, we show strong statistical support for convergent evolutionary origins and massively parallel evolution of the entire transcriptomes in symbiotic bioluminescent organs (bacterial photophores) from two divergent squid species. The gene expression similarities are so strong that regression models of one species’ photophore can predict organ identity of a distantly related photophore from gene expression levels alone. Our results point to widespread parallel changes in gene expression evolution associated with convergent origins of complex organs. Therefore, predictable solutions may drive not only the evolution of novel, complex organs but also the evolution of overall gene expression levels that underlie them. PMID:25336755
Identifying gnostic predictors of the vaccine response.

PubMed

Haining, W Nicholas; Pulendran, Bali

2012-06-01

Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis of their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of vaccine prediction. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response. Copyright © 2012 Elsevier Ltd. All rights reserved.
Common variants in genes encoding adiponectin (ADIPOQ) and its receptors (ADIPOR1/2), adiponectin concentrations, and diabetes incidence in the Diabetes Prevention Program

PubMed Central

Mather, K. J.; Christophi, C. A.; Jablonski, K. A.; Knowler, W. C.; Goldberg, R. B.; Kahn, S. E.; Spector, T.; Dastani, Z.; Waterworth, D.; Richards, J. B.; Funahashi, T.; Pi-Sunyer, F. X.; Pollin, T. I.; Florez, J. C.; Franks, P. W.

2012-01-01

Aims Baseline adiponectin concentrations predict incident Type 2 diabetes mellitus in the Diabetes Prevention Program. We tested the hypothesis that common variants in the genes encoding adiponectin (ADIPOQ) and its receptors (ADIPOR1, ADIPOR2) would associate with circulating adiponectin concentrations and/or with diabetes incidence in the Diabetes Prevention Program population. Methods Seventy-seven tagging single-nucleotide polymorphisms (SNPs) in ADIPOQ (24), ADIPOR1 (22) and ADIPOR2 (31) were genotyped. Associations of SNPs with baseline adiponectin concentrations were evaluated using linear modelling. Associations of SNPs with diabetes incidence were evaluated using Cox proportional hazards modelling. Results Thirteen of 24 ADIPOQ SNPs were significantly associated with baseline adiponectin concentrations. Multivariable analysis including these 13 SNPs revealed strong independent contributions from rs17366568, rs1648707, rs17373414 and rs1403696 with adiponectin concentrations. However, no ADIPOQ SNPs were directly associated with diabetes incidence. Two ADIPOR1 SNPs (rs1342387 and rs12733285) were associated with ~18% increased diabetes incidence for carriers of the minor allele without differences across treatment groups, and without any relationship with adiponectin concentrations. Conclusions ADIPOQ SNPs are significantly associated with adiponectin concentrations in the Diabetes Prevention Program cohort. This observation extends prior observations from unselected populations of European descent into a broader multi-ethnic population, and confirms the relevance of these variants in an obese/dysglycaemic population. Despite the robust relationship between adiponectin concentrations and diabetes risk in this cohort, variants in ADIPOQ that relate to adiponectin concentrations do not relate to diabetes risk in this population. ADIPOR1 variants exerted significant effects on diabetes risk distinct from any effect of adiponectin concentrations. 
[Clinical Trials Registry Nos; NCT 00004992 (Diabetes Prevention Program) and NCT 00038727 (Diabetes Prevention Program Outcomes Study)] PMID:22443353
Predicting Gene Structures from Multiple RT-PCR Tests

NASA Astrophysics Data System (ADS)

Kováč, Jakub; Vinař, Tomáš; Brejová, Broňa

It has been demonstrated that the use of additional information such as ESTs and protein homology can significantly improve accuracy of gene prediction. However, many sources of external information are still being omitted from consideration. Here, we investigate the use of product lengths from RT-PCR experiments in gene finding. We present hardness results and practical algorithms for several variants of the problem and apply our methods to a real RT-PCR data set in the Drosophila genome. We conclude that the use of RT-PCR data can improve the sensitivity of gene prediction and locate novel splicing variants.
Genome-Wide Comparative Gene Family Classification

PubMed Central

Frech, Christian; Chen, Nansheng

2010-01-01

Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221
Bioinformatics analysis of the predicted polyprenol reductase genes in higher plants

NASA Astrophysics Data System (ADS)

Basyuni, M.; Wati, R.

2018-03-01

The present study evaluates bioinformatics methods to analyze twenty-four predicted polyprenol reductase genes from higher plants in GenBank and to predict their structure, composition, similarity, subcellular localization, and phylogeny. The physicochemical properties of plant polyprenol showed diversity among the observed genes. The percentages of secondary structure elements in plant polyprenol genes followed the order α helix > random coil > extended chain structure. The values for chloroplast transit peptide, but not signal peptide, were too low, indicating that few plant polyprenol reductase genes carry a chloroplast transit peptide. The likelihood of a potential transit peptide varied among the plant polyprenol reductases, suggesting the importance of understanding the variety of peptide components of plant polyprenol genes. To clarify this finding, a phylogenetic tree was drawn. The tree shows several branches, suggesting that plant polyprenol reductase genes group into divergent clusters.
HGPEC: a Cytoscape app for prediction of novel disease-gene and disease-disease associations and evidence collection based on a random walk on heterogeneous network.

PubMed

Le, Duc-Hau; Pham, Van-Huy

2017-06-15

Finding gene-disease and disease-disease associations plays an important role in the biomedical area, and many prioritization methods have been proposed for this goal. Among them, approaches based on a heterogeneous network of genes and diseases are considered state-of-the-art; they achieve high prediction performance and can be used for diseases with or without a known molecular basis. Here, we developed a Cytoscape app, namely HGPEC, based on a random walk with restart algorithm on a heterogeneous network of genes and diseases. This app can prioritize candidate genes and diseases by employing a heterogeneous network consisting of a network of genes/proteins and a phenotypic disease similarity network. Based on the rankings, novel disease-gene and disease-disease associations can be identified. These associations can be supported with network- and rank-based visualization as well as evidence and annotations from biomedical data. A case study on prediction of novel breast cancer-associated genes and diseases shows the abilities of HGPEC. In addition, we showed that HGPEC outperforms other tools for prioritization of candidate disease genes. Taken together, our app is expected to effectively predict novel disease-gene and disease-disease associations and to support network- and rank-based visualization as well as biomedical evidence for such associations.
Molecular Diagnostics in Colorectal Carcinoma: Advances and Applications for 2018.

PubMed

Bhalla, Amarpreet; Zulfiqar, Muhammad; Bluth, Martin H

2018-06-01

The molecular pathogenesis and classification of colorectal carcinoma are based on the traditional adenoma-carcinoma sequence, serrated polyp pathway, and microsatellite instability (MSI). The genetic basis for hereditary nonpolyposis colorectal cancer is the detection of mutations in the MLH1, MSH2, MSH6, PMS2, and EPCAM genes. Genetic testing for Lynch syndrome includes MSI testing, methylator phenotype testing, BRAF mutation testing, and molecular testing for germline mutations in MMR genes. Molecular markers with predictive and prognostic implications include quantitative multigene reverse transcriptase polymerase chain reaction assay and KRAS and BRAF mutation analysis. Mismatch repair-deficient tumors have higher rates of programmed death-ligand 1 expression. Cell-free DNA analysis in fluids is proving beneficial for diagnosis and prognosis in these disease states towards effective patient management. Copyright © 2018 Elsevier Inc. All rights reserved.
Genomic Selection in Plant Breeding: Methods, Models, and Perspectives.

PubMed

Crossa, José; Pérez-Rodríguez, Paulino; Cuevas, Jaime; Montesinos-López, Osval; Jarquín, Diego; de Los Campos, Gustavo; Burgueño, Juan; González-Camacho, Juan M; Pérez-Elizalde, Sergio; Beyene, Yoseph; Dreisigacker, Susanne; Singh, Ravi; Zhang, Xuecai; Gowda, Manje; Roorkiwal, Manish; Rutkoski, Jessica; Varshney, Rajeev K

2017-11-01

Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.

Network regularised Cox regression and multiplex network models to predict disease comorbidities and survival of cancer.

PubMed

Xu, Haoming; Moni, Mohammad Ali; Liò, Pietro

2015-12-01

In cancer genomics, gene expression levels provide important molecular signatures for all types of cancer, and this could be very useful for predicting the survival of cancer patients. However, the main challenge of gene expression data analysis is high dimensionality: microarray data are characterised by a small number of samples and a large number of genes. To overcome this problem, a variety of penalised Cox proportional hazard models have been proposed. We introduce a novel network regularised Cox proportional hazard model and a novel multiplex network model to measure the disease comorbidities and to predict survival of the cancer patient. Our methods are applied to analyse seven microarray cancer gene expression datasets: breast cancer, ovarian cancer, lung cancer, liver cancer, renal cancer and osteosarcoma. Firstly, we applied a principal component analysis to reduce the dimensionality of the original gene expression data. Secondly, we applied a network regularised Cox regression model on the reduced gene expression datasets. By using a normalised mutual information method and a multiplex network model, we predict the comorbidities for liver cancer based on the integration of a diverse set of omics and clinical data, and we find the diseasome associations (disease-gene association) among different cancers based on the identified common significant genes. Finally, we evaluated the precision of the approach with respect to the accuracy of survival prediction using ROC curves. We report that colon cancer, liver cancer and renal cancer share the CXCL5 gene, and breast cancer, ovarian cancer and renal cancer share the CCND2 gene. Our methods are useful for predicting patient survival and disease comorbidities more accurately and helpful for improving the care of patients with comorbidity. Software in Matlab and R is available on our GitHub page: https://github.com/ssnhcom/NetworkRegularisedCox.git. Copyright © 2015. Published by Elsevier Ltd.
In Silico Prediction and Validation of Gfap as an miR-3099 Target in Mouse Brain.

PubMed

Abidin, Shahidee Zainal; Leong, Jia-Wen; Mahmoudi, Marzieh; Nordin, Norshariza; Abdullah, Syahril; Cheah, Pike-See; Ling, King-Hwa

2017-08-01

MicroRNAs are small non-coding RNAs that play crucial roles in the regulation of gene expression and protein synthesis during brain development. MiR-3099 is highly expressed throughout embryogenesis, especially in the developing central nervous system. Moreover, miR-3099 is also expressed at a higher level in differentiating neurons in vitro, suggesting that it is a potential regulator during neuronal cell development. This study aimed to predict the target genes of miR-3099 via in-silico analysis using four independent prediction algorithms (miRDB, miRanda, TargetScan, and DIANA-micro-T-CDS) with emphasis on target genes related to brain development and function. Based on the analysis, a total of 3,174 miR-3099 target genes were predicted. Those predicted by at least three algorithms (324 genes) were subjected to DAVID bioinformatics analysis to understand their overall functional themes and representation. The analysis revealed that nearly 70% of the target genes were expressed in the nervous system and a significant proportion were associated with transcriptional regulation and protein ubiquitination mechanisms. Comparison of in situ hybridization (ISH) expression patterns of miR-3099 in both published and in-house-generated ISH sections with the ISH sections of target genes from the Allen Brain Atlas identified 7 target genes (Dnmt3a, Gabpa, Gfap, Itga4, Lxn, Smad7, and Tbx18) having expression patterns complementary to miR-3099 in the developing and adult mouse brain samples. Of these, we validated Gfap as a direct downstream target of miR-3099 using the luciferase reporter gene system. In conclusion, we report the successful prediction and validation of Gfap as an miR-3099 target gene using a combination of bioinformatics resources with enrichment of annotations based on functional ontologies and a spatio-temporal expression dataset.
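The step of keeping only targets predicted by at least three of the four algorithms amounts to a simple vote count. A minimal sketch follows; the gene identifiers and per-tool lists are made-up placeholders, not the study's predictions, and the function name is hypothetical.

```python
from collections import Counter


def consensus_targets(predictions, min_tools=3):
    """Return genes predicted by at least `min_tools` of the tools.

    predictions : dict mapping tool name -> iterable of predicted gene IDs
    """
    # set() guards against a tool listing the same gene more than once.
    votes = Counter(g for genes in predictions.values() for g in set(genes))
    return sorted(g for g, n in votes.items() if n >= min_tools)


# Placeholder predictions, for illustration only.
example = {
    "miRDB": ["geneA", "geneB", "geneC"],
    "miRanda": ["geneA", "geneB", "geneD"],
    "TargetScan": ["geneA", "geneC", "geneD"],
    "DIANA-micro-T-CDS": ["geneA", "geneB"],
}
shortlist = consensus_targets(example, min_tools=3)
```

Raising `min_tools` trades sensitivity for confidence: requiring all four tools would keep only `geneA` in this example.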
Phenome-driven disease genetics prediction toward drug discovery.

PubMed

Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

2015-06-15

Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.
An integrative approach to ortholog prediction for disease-focused and other functional studies.

PubMed

Hu, Yanhui; Flockhart, Ian; Vinayagam, Arunachalam; Bergwitz, Clemens; Berger, Bonnie; Perrimon, Norbert; Mohr, Stephanie E

2011-08-31

Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

PubMed Central

Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

2009-01-01

Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method. PMID:19193216
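Reverse (decoy) database searching is commonly used to estimate a false discovery rate: the number of decoy hits above a score cutoff approximates the number of false target hits. The sketch below shows that generic target-decoy estimate only; it does not reproduce Average Peptide Scoring, and the function name, the input layout and the 5% default are assumptions.

```python
def score_cutoff_at_fdr(matches, max_fdr=0.05):
    """Find the lowest score cutoff whose estimated FDR stays within max_fdr.

    matches : list of (score, is_decoy) pairs, one per peptide-spectrum match;
              is_decoy is True for hits against the reversed database.
    Returns the cutoff score, or None if no cutoff satisfies max_fdr.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among hits scoring at or above `score`.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Made-up scores for illustration: four target hits and one decoy hit.
demo = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
```

With the made-up scores above, a 10% limit stops just before the first decoy, while a 25% limit admits every hit.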
PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION.

PubMed

Elati, Mohamed; Nicolle, Rémy; Junier, Ivan; Fernández, David; Fekih, Rim; Font, Julio; Képès, François

2013-02-01

Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.
Clinical Value of Prognosis Gene Expression Signatures in Colorectal Cancer: A Systematic Review

PubMed Central

Cordero, David; Riccadonna, Samantha; Solé, Xavier; Crous-Bou, Marta; Guinó, Elisabet; Sanjuan, Xavier; Biondo, Sebastiano; Soriano, Antonio; Jurman, Giuseppe; Capella, Gabriel; Furlanello, Cesare; Moreno, Victor

2012-01-01

Introduction The traditional staging system is inadequate to identify those patients with stage II colorectal cancer (CRC) at high risk of recurrence or with stage III CRC at low risk. A number of gene expression signatures to predict CRC prognosis have been proposed, but none is routinely used in the clinic. The aim of this work was to assess the prediction ability and potential clinical usefulness of these signatures in a series of independent datasets. Methods A literature review identified 31 gene expression signatures that used gene expression data to predict prognosis in CRC tissue. The search was based on the PubMed database and was restricted to papers published from January 2004 to December 2011. Eleven CRC gene expression datasets with outcome information were identified and downloaded from public repositories. Random Forest classifier was used to build predictors from the gene lists. Matthews correlation coefficient was chosen as a measure of classification accuracy and its associated p-value was used to assess association with prognosis. For clinical usefulness evaluation, positive and negative post-tests probabilities were computed in stage II and III samples. Results Five gene signatures showed significant association with prognosis and provided reasonable prediction accuracy in their own training datasets. Nevertheless, all signatures showed low reproducibility in independent data. Stratified analyses by stage or microsatellite instability status showed significant association but limited discrimination ability, especially in stage II tumors. From a clinical perspective, the most predictive signatures showed a minor but significant improvement over the classical staging system. Conclusions The published signatures show low prediction accuracy but moderate clinical usefulness. Although gene expression data may inform prognosis, better strategies for signature validation are needed to encourage their widespread use in the clinic. PMID:23145004
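The two evaluation quantities named in the Methods, the Matthews correlation coefficient and the post-test probabilities, follow from standard formulas. The sketch below assumes a binary outcome (for example recurrence versus no recurrence); the function names and the numbers in the usage lines are illustrative and are not taken from the review.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal total is 0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def post_test_probabilities(sensitivity, specificity, prevalence):
    """Probability of the event after a positive and after a negative test."""
    pos = (prevalence * sensitivity) / (
        prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    )
    neg = (prevalence * (1 - sensitivity)) / (
        prevalence * (1 - sensitivity) + (1 - prevalence) * specificity
    )
    return pos, neg


# Illustrative values only.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
pos, neg = post_test_probabilities(sensitivity=0.8, specificity=0.9, prevalence=0.2)
```

A signature is clinically informative when the positive post-test probability sits well above the pre-test prevalence and the negative one well below it.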
Stroma-associated master regulators of molecular subtypes predict patient prognosis in ovarian cancer.

PubMed

Zhang, Shengzhe; Jing, Ying; Zhang, Meiying; Zhang, Zhenfeng; Ma, Pengfei; Peng, Huixin; Shi, Kaixuan; Gao, Wei-Qiang; Zhuang, Guanglei

2015-11-04

High-grade serous ovarian carcinoma (HGS-OvCa) has the lowest survival rate among all gynecologic cancers and is hallmarked by a high degree of heterogeneity. The Cancer Genome Atlas network has described a gene expression-based molecular classification of HGS-OvCa into Differentiated, Mesenchymal, Immunoreactive and Proliferative subtypes. However, the biological underpinnings and regulatory mechanisms underlying the distinct molecular subtypes are largely unknown. Here we showed that tumor-infiltrating stromal cells significantly contributed to the assignments of Mesenchymal and Immunoreactive clusters. Using reverse engineering and an unbiased interrogation of subtype regulatory networks, we identified the transcriptional modules containing master regulators that drive gene expression of Mesenchymal and Immunoreactive HGS-OvCa. Mesenchymal master regulators were associated with poor prognosis, while Immunoreactive master regulators positively correlated with overall survival. Meta-analysis of 749 HGS-OvCa expression profiles confirmed that master regulators as a prognostic signature were able to predict patient outcome. Our data unraveled master regulatory programs of HGS-OvCa subtypes with prognostic and potentially therapeutic relevance, and suggested that the unique transcriptional and clinical characteristics of ovarian Mesenchymal and Immunoreactive subtypes could be, at least partially, ascribed to tumor microenvironment.
Matrix factorization-based data fusion for gene function prediction in baker’s yeast and slime mold

PubMed Central

Žitnik, Marinka; Zupan, Blaž

2014-01-01

The development of effective methods for the characterization of gene functions that are able to combine diverse data sources in a sound and easily-extendible way is an important goal in computational biology. We have previously developed a general matrix factorization-based data fusion approach for gene function prediction. In this manuscript, we show that this data fusion approach can be applied to gene function prediction and that it can fuse various heterogeneous data sources, such as gene expression profiles, known protein annotations, interaction and literature data. The fusion is achieved by simultaneous matrix tri-factorization that shares matrix factors between sources. We demonstrate the effectiveness of the approach by evaluating its performance on predicting ontological annotations in slime mold D. discoideum and on recognizing proteins of baker’s yeast S. cerevisiae that participate in the ribosome or are located in the cell membrane. Our approach achieves predictive performance comparable to that of the state-of-the-art kernel-based data fusion, but requires fewer data preprocessing steps. PMID:24297565
Molecular cloning and cold shock induced overexpression of the DNA encoding phor sensor domain from Mycobacterium tuberculosis as a target molecule for novel anti-tubercular drugs

NASA Astrophysics Data System (ADS)

Langi, Gladys Emmanuella Putri; Moeis, Maelita R.; Ihsanawati; Giri-Rachman, Ernawati Arifin

2014-03-01

Mycobacterium tuberculosis (Mtb), the sole cause of Tuberculosis (TB), is still a major global problem. The discovery of new anti-tubercular drugs is needed to face the increasing TB cases, especially to prevent the increase of cases with resistant Mtb. A potential novel drug target is the Mtb PhoR sensor domain protein, which is the histidine kinase extracellular domain for receiving environmental signals. This protein is the initial part of the two-component system PhoR-PhoP regulating 114 genes related to the virulence of Mtb. In this study, the gene encoding the PhoR sensor domain (SensPhoR) was subcloned from pGEM-T SensPhoR from the previous study (Suwanto, 2012) to pColdII. The construct pColdII SensPhoR was confirmed through restriction analysis and sequencing. Using the construct, SensPhoR was overexpressed at 15°C using Escherichia coli BL21 (DE3). Low temperature was chosen because, according to the solubility prediction program of recombinant proteins from The University of Oklahoma, the PhoR sensor domain has a chance of 79.8% to be expressed as insoluble proteins in Escherichia coli's (E. coli) cytoplasm. This prediction is also supported by other similar programs: PROSO and PROSO II. The SDS PAGE result indicated that the PhoR sensor domain recombinant protein was overexpressed. For future studies, this protein will be purified and used for structure analysis which can be used to find potential drugs through rational drug design.
Array data extractor (ADE): a LabVIEW program to extract and merge gene array data.

PubMed

Kurtenbach, Stefan; Kurtenbach, Sarah; Zoidl, Georg

2013-12-01

Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies persists. Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of a beta 1 adrenergic receptor and further indicate novel research targets. Although existing software allows for complex data analyses, the LabVIEW based program presented here, "Array Data Extractor (ADE)", provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need of advanced programming knowledge.
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.

PubMed

Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P

2015-04-23

With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
An approach for reduction of false predictions in reverse engineering of gene regulatory networks.

PubMed

Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar

2018-05-14

A gene regulatory network discloses the regulatory interactions amongst genes, at a particular condition of the human body. The accurate reconstruction of such networks from time-series genetic expression data using computational tools offers a stiff challenge for contemporary computer scientists. This is crucial to facilitate the understanding of the proper functioning of a living organism. Unfortunately, the computational methods produce many false predictions along with the correct predictions, which is unwanted. Investigations in the domain focus on the identification of as many correct regulations as possible in the reverse engineering of gene regulatory networks to make it more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we have proposed a novel scheme to decrease the number of false predictions by suitably combining several metaheuristic techniques. We have implemented the same using a dataset ensemble approach (i.e. combining multiple datasets) also. We have employed the proposed methodology on real-world experimental datasets of the SOS DNA Repair network of Escherichia coli and the IMRA network of Saccharomyces cerevisiae. Subsequently, we have experimented upon somewhat larger, in silico networks, namely, DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we have used four datasets in each experiment. The obtained results are encouraging enough as the proposed methodology can reduce the number of false predictions significantly, without using any supplementary prior biological information for larger gene regulatory networks. It is also observed that if a small amount of prior biological information is incorporated here, the results improve further w.r.t. the prediction of true positives. 
Copyright © 2018 Elsevier Ltd. All rights reserved.
Gene function prediction with gene interaction networks: a context graph kernel approach.

PubMed

Li, Xin; Chen, Hsinchun; Li, Jiexun; Zhang, Zhu

2010-01-01

Predicting gene functions is a challenge for biologists in the postgenomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels.
G protein polymorphisms do not predict weight loss and improvement of hypertension in severely obese patients.

PubMed

Potoczna, Natascha; Wertli, Maria; Steffen, Rudolph; Ricklin, Thomas; Lentes, Klaus-Ulrich; Horber, Fritz F

2004-11-01

Both the gene encoding the alpha subunit of G stimulatory proteins (GNAS1) and the beta3 subunit gene (GNB3) of G proteins are associated with obesity and/or hypertension. Moreover, the TT/TC825 polymorphism of GNB3 predicts greater weight loss than the CC825 polymorphism in obese patients (mean body mass index, 35 kg/m2) undergoing a structured nonpharmacologic weight loss program. Gastric banding enforces a low-calorie diet by diminishing the need for volitional adherence. It is unknown whether these polymorphisms predict the variable weight loss in patients after bariatric surgery. Three hundred and four severely obese patients (mean ± SEM age, 42 ± 1 years; 245 women and 59 men; mean ± SEM body mass index, 43.9 ± 0.3 kg/m2) followed prospectively for at least 3 years after surgery were genotyped for the GNB3 C825T, G814A, and GNAS1 T393C polymorphisms. All analyses were performed blinded to the phenotypic characteristics of the study group. Frequencies of polymorphisms were comparable to those previously published. No polymorphism studied predicted 3-year weight loss or was associated with high blood pressure in severely obese patients after gastric banding. Multivariate analysis of potentially confounding factors such as reoperation rate or use of sibutramine or orlistat revealed similar results (P > 0.1). Regardless of the mechanism(s) involved for these discordant findings, GNB3 C825T, G814A, and GNAS1 T393C polymorphisms do not seem to be reliable predictors of long-term weight loss.
A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

PubMed

Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

2018-04-16

Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. 
Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families such as the EXPANSIN like genes. Finally, this high quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analysis.
Long-distance gene flow and adaptation of forest trees to rapid climate change

PubMed Central

Kremer, Antoine; Ronce, Ophélie; Robledo-Arnuncio, Juan J; Guillaume, Frédéric; Bohrer, Gil; Nathan, Ran; Bridle, Jon R; Gomulkiewicz, Richard; Klein, Etienne K; Ritland, Kermit; Kuparinen, Anna; Gerber, Sophie; Schueler, Silvio

2012-01-01

Forest trees are the dominant species in many parts of the world and predicting how they might respond to climate change is a vital global concern. Trees are capable of long-distance gene flow, which can promote adaptive evolution in novel environments by increasing genetic variation for fitness. It is unclear, however, if this can compensate for maladaptive effects of gene flow and for the long-generation times of trees. We critically review data on the extent of long-distance gene flow and summarise theory that allows us to predict evolutionary responses of trees to climate change. Estimates of long-distance gene flow based both on direct observations and on genetic methods provide evidence that genes can move over spatial scales larger than habitat shifts predicted under climate change within one generation. Both theoretical and empirical data suggest that the positive effects of gene flow on adaptation may dominate in many instances. The balance of positive to negative consequences of gene flow may, however, differ for leading edge, core and rear sections of forest distributions. We propose future experimental and theoretical research that would better integrate dispersal biology with evolutionary quantitative genetics and improve predictions of tree responses to climate change. PMID:22372546
Long-distance gene flow and adaptation of forest trees to rapid climate change.

PubMed

Kremer, Antoine; Ronce, Ophélie; Robledo-Arnuncio, Juan J; Guillaume, Frédéric; Bohrer, Gil; Nathan, Ran; Bridle, Jon R; Gomulkiewicz, Richard; Klein, Etienne K; Ritland, Kermit; Kuparinen, Anna; Gerber, Sophie; Schueler, Silvio

2012-04-01

Forest trees are the dominant species in many parts of the world and predicting how they might respond to climate change is a vital global concern. Trees are capable of long-distance gene flow, which can promote adaptive evolution in novel environments by increasing genetic variation for fitness. It is unclear, however, if this can compensate for maladaptive effects of gene flow and for the long-generation times of trees. We critically review data on the extent of long-distance gene flow and summarise theory that allows us to predict evolutionary responses of trees to climate change. Estimates of long-distance gene flow based both on direct observations and on genetic methods provide evidence that genes can move over spatial scales larger than habitat shifts predicted under climate change within one generation. Both theoretical and empirical data suggest that the positive effects of gene flow on adaptation may dominate in many instances. The balance of positive to negative consequences of gene flow may, however, differ for leading edge, core and rear sections of forest distributions. We propose future experimental and theoretical research that would better integrate dispersal biology with evolutionary quantitative genetics and improve predictions of tree responses to climate change. © 2012 Blackwell Publishing Ltd/CNRS.
Transcription Factor Binding Profiles Reveal Cyclic Expression of Human Protein-coding Genes and Non-coding RNAs

PubMed Central

Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.

2013-01-01

The cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite their wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle from non-cell cycle genes. Our results show that both the trans- TF features and the cis- motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175
Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower.

PubMed

Thorwarth, Patrick; Yousef, Eltohamy A A; Schmid, Karl J

2018-02-02

Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding. Copyright © 2018 Thorwarth et al.

Kassiopeia: a database and web application for the analysis of mutually exclusive exomes of eukaryotes

PubMed Central

2014-01-01

Background: Alternative splicing is an important process in higher eukaryotes that allows several transcripts to be obtained from one gene. A specific case of alternative splicing is mutually exclusive splicing, in which exactly one exon out of a cluster of neighbouring exons is spliced into the mature transcript. Recently, a new algorithm for the prediction of these exons has been developed based on the preconditions that the exons of the cluster have similar lengths, sequence homology, and conserved splice sites, and that they are translated in the same reading frame. Description: In this contribution we introduce Kassiopeia, a database and web application for the generation, storage, and presentation of genome-wide analyses of mutually exclusive exomes. Currently, Kassiopeia provides access to the mutually exclusive exomes of twelve Drosophila species, the thale cress Arabidopsis thaliana, the nematode Caenorhabditis elegans, and human. Mutually exclusive spliced exons (MXEs) were predicted based on gene reconstructions from Scipio. Based on the standard prediction values, with which 83.5% of the annotated MXEs of Drosophila melanogaster were reconstructed, the exomes contain surprisingly more MXEs than previously supposed and identified. The user can search Kassiopeia using BLAST or browse the genes of each species, optionally adjusting the parameters used for the prediction to reveal more divergent or only very similar exon candidates. Conclusions: We developed a pipeline to predict MXEs in the genomes of several model organisms and a web interface, Kassiopeia, for their visualization. For each gene Kassiopeia provides a comprehensive gene structure scheme, the sequences and predicted secondary structures of the MXEs, and, if available, further evidence for MXE candidates from cDNA/EST data, predictions of MXEs in homologous genes of closely related species, and RNA secondary structure predictions. Kassiopeia can be accessed at http://www.motorprotein.de/kassiopeia. PMID:24507667
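The preconditions named in the Kassiopeia abstract (similar exon lengths, sequence homology, same reading frame) can be sketched as a simple pairwise filter. This is an illustrative Python sketch only, not the published algorithm: the Exon class, the is_mxe_candidate function, the thresholds max_len_diff and min_similarity, and the use of difflib as a stand-in for a real alignment score are all assumptions, and the splice-site conservation check is left out.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Exon:
    """Hypothetical exon record; field names are illustrative."""
    start: int     # genomic start, 0-based inclusive
    end: int       # genomic end, exclusive
    phase: int     # reading-frame phase: 0, 1 or 2
    protein: str   # translated amino-acid sequence

    @property
    def length(self) -> int:
        return self.end - self.start


def is_mxe_candidate(a: Exon, b: Exon,
                     max_len_diff: int = 20,
                     min_similarity: float = 0.3) -> bool:
    """Return True if two neighbouring exons pass the three simplified checks.

    Thresholds are assumed values, not the tool's defaults.
    """
    # Same reading frame: equal phase and a length difference divisible by 3,
    # so swapping one exon for the other does not shift the downstream frame.
    same_frame = a.phase == b.phase and (a.length - b.length) % 3 == 0
    # Similar lengths.
    similar_length = abs(a.length - b.length) <= max_len_diff
    # Crude sequence similarity (stand-in for a proper alignment score).
    similarity = SequenceMatcher(None, a.protein, b.protein).ratio()
    return same_frame and similar_length and similarity >= min_similarity
```

Loosening max_len_diff or min_similarity in this sketch corresponds to the abstract's remark that users can adjust the prediction parameters to reveal more divergent exon candidates.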
Human microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application

PubMed Central

Roubelakis, Maria G; Zotos, Pantelis; Papachristoudis, Georgios; Michalopoulos, Ioannis; Pappa, Kalliopi I; Anagnou, Nicholas P; Kossida, Sophia

2009-01-01

Background: microRNAs (miRNAs) are single-stranded RNA molecules of about 20–23 nucleotides in length found in a wide variety of organisms. miRNAs regulate gene expression by interacting with target mRNAs at specific sites in order to induce cleavage of the message or inhibit translation. Predicting or verifying mRNA targets of specific miRNAs is a difficult process of great importance. Results: GOmir is a novel stand-alone application consisting of two separate tools: JTarget and TAGGO. JTarget integrates miRNA target prediction and functional analysis by combining the predicted target genes from the TargetScan, miRanda, RNAhybrid and PicTar computational tools as well as the experimentally supported targets from TarBase, and also provides a full gene description and functional analysis for each target gene. The TAGGO application, on the other hand, is designed to automatically group gene ontology annotations, taking advantage of the Gene Ontology (GO), in order to extract the main attributes of sets of proteins. GOmir represents a new tool incorporating two separate Java applications integrated into one stand-alone Java application. Conclusion: GOmir (by using up to five different databases) introduces miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. GOmir can be freely downloaded from BRFAA. PMID:19534746
Human microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application.

PubMed

Roubelakis, Maria G; Zotos, Pantelis; Papachristoudis, Georgios; Michalopoulos, Ioannis; Pappa, Kalliopi I; Anagnou, Nicholas P; Kossida, Sophia

2009-06-16

microRNAs (miRNAs) are single-stranded RNA molecules of about 20-23 nucleotides in length found in a wide variety of organisms. miRNAs regulate gene expression by interacting with target mRNAs at specific sites in order to induce cleavage of the message or inhibit translation. Predicting or verifying mRNA targets of specific miRNAs is a difficult process of great importance. GOmir is a novel stand-alone application consisting of two separate tools: JTarget and TAGGO. JTarget integrates miRNA target prediction and functional analysis by combining the predicted target genes from the TargetScan, miRanda, RNAhybrid and PicTar computational tools as well as the experimentally supported targets from TarBase, and also provides a full gene description and functional analysis for each target gene. The TAGGO application, on the other hand, is designed to automatically group gene ontology annotations, taking advantage of the Gene Ontology (GO), in order to extract the main attributes of sets of proteins. GOmir represents a new tool incorporating two separate Java applications integrated into one stand-alone Java application. GOmir (by using up to five different databases) introduces miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. GOmir can be freely downloaded from BRFAA.
Meta-analysis of gene expression profiles associated with histological classification and survival in 829 ovarian cancer samples.

PubMed

Fekete, Tibor; Rásó, Erzsébet; Pete, Imre; Tegze, Bálint; Liko, István; Munkácsy, Gyöngyi; Sipos, Norbert; Rigó, János; Györffy, Balázs

2012-07-01

Transcriptomic analysis of global gene expression in ovarian carcinoma can identify dysregulated genes capable of serving as molecular markers for histology subtypes and survival. The aim of our study was to validate previous candidate signatures in an independent setting and to identify single genes capable of serving as biomarkers for ovarian cancer progression. As several datasets are available in the GEO today, we were able to perform a true meta-analysis. First, 829 samples (11 datasets) were downloaded, and the predictive power of 16 previously published gene sets was assessed. Of these, eight were capable of discriminating histology subtypes, and none was capable of predicting survival. To overcome the differences in previous studies, we used the 829 samples to identify new predictors. Then, we collected 64 ovarian cancer samples (median relapse-free survival 24.5 months) and performed TaqMan Real Time Polymerase Chain Reaction (RT-PCR) analysis for the best 40 genes associated with histology subtypes and survival. Over 90% of subtype-associated genes were confirmed. Overall survival was effectively predicted by hormone receptors (PGR and ESR2) and by TSPAN8. Relapse-free survival was predicted by MAPT and SNCG. In summary, we successfully validated several gene sets in a meta-analysis in large datasets of ovarian samples. Additionally, several individual genes identified were validated in a clinical cohort. Copyright © 2011 UICC.
Prediction of gene expression in embryonic structures of Drosophila melanogaster.

PubMed

Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis

2007-07-01

Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.
Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster

PubMed Central

Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis

2007-01-01

Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms. PMID:17658945
Identifying metabolic enzymes with multiple types of association evidence

PubMed Central

Kharchenko, Peter; Chen, Lifeng; Freund, Yoav; Vitkup, Dennis; Church, George M

2006-01-01

Background: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. Results: We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of the cases. Conclusion: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities. PMID:16571130
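The idea of combining several evidence types and then asking whether the correct gene falls within the top candidates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a plain weighted sum stands in for whatever combination scheme the authors used (the abstract does not say), and the function names, evidence keys, gene names and scores are all hypothetical.

```python
from typing import Dict, List


def rank_candidates(evidence: Dict[str, Dict[str, float]],
                    weights: Dict[str, float]) -> List[str]:
    """Rank candidate genes by a weighted sum of their evidence scores.

    evidence maps gene -> {evidence_type: score}; weights maps
    evidence_type -> weight. Both are illustrative inputs.
    """
    combined = {
        gene: sum(weights[kind] * score for kind, score in scores.items())
        for gene, scores in evidence.items()
    }
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)


def in_top_k(ranking: List[str], true_gene: str, k: int = 10) -> bool:
    """True if the known enzyme-encoding gene is among the top k candidates."""
    return true_gene in ranking[:k]


# Toy example with made-up scores for three hypothetical candidates.
evidence = {
    "geneA": {"chrom_cluster": 0.9, "phylo_profile": 0.2, "coexpression": 0.4},
    "geneB": {"chrom_cluster": 0.1, "phylo_profile": 0.8, "coexpression": 0.3},
    "geneC": {"chrom_cluster": 0.5, "phylo_profile": 0.5, "coexpression": 0.9},
}
weights = {"chrom_cluster": 1.0, "phylo_profile": 1.0, "coexpression": 1.0}
ranking = rank_candidates(evidence, weights)
```

Averaging in_top_k over many known enzyme-gene pairs would give a figure of the same kind as the abstract's "within the top 10" percentage, although the numbers reported there come from the authors' own method and data.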
A Gene Expression Profile of BRCAness That Predicts for Responsiveness to Platinum and PARP Inhibitors

DTIC Science & Technology

2015-10-01

1 Award Number: W81XWH-10-1-0585 TITLE: A Gene Expression Profile of BRCAness That Predicts for Responsiveness to Platinum and PARP Inhibitors...TITLE AND SUBTITLE A Gene Expression Profile of BRCAness That Predicts for Responsiveness to Platinum and PARP Inhibitors 5a. CONTRACT NUMBER W81XWH...BRCAlike, i.e. not HR deficient and are resistant to PARPis but are sensitive to platinum. These tumors exhibit alterations in another DNA repair
Reverse engineering model structures for soil and ecosystem respiration: the potential of gene expression programming

NASA Astrophysics Data System (ADS)

Ilie, Iulia; Dittrich, Peter; Carvalhais, Nuno; Jung, Martin; Heinemeyer, Andreas; Migliavacca, Mirco; Morison, James I. L.; Sippel, Sebastian; Subke, Jens-Arne; Wilkinson, Matthew; Mahecha, Miguel D.

2017-09-01

Accurate model representation of land-atmosphere carbon fluxes is essential for climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented by a steadily evolving body of mechanistic theory, provides the main basis for developing such models. The strongly increasing availability of measurements may facilitate new ways of identifying suitable model structures using machine learning. Here, we explore the potential of gene expression programming (GEP) to derive relevant model formulations based solely on the signals present in data by automatically applying various mathematical transformations to potential predictors and repeatedly evolving the resulting model structures. In contrast to most other machine learning regression techniques, the GEP approach generates readable models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (random forests, support vector machines, artificial neural networks, and kernel ridge regressions). Based on real observations we explore the responses of the different components of terrestrial respiration at an oak forest in south-eastern England. We find that the GEP-retrieved models are often better in prediction than some established respiration models. Based on their structures, we find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics. We noticed that the GEP models are only partly portable across respiration components, the identification of a general terrestrial respiration model possibly prevented by equifinality issues. 
Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing more traditional modelling approaches.
Large-Scale Bi-Level Strain Design Approaches and Mixed-Integer Programming Solution Techniques

PubMed Central

Kim, Joonhoon; Reed, Jennifer L.; Maravelias, Christos T.

2011-01-01

The use of computational models in metabolic engineering has been increasing as more genome-scale metabolic models and computational approaches become available. Various computational approaches have been developed to predict how genetic perturbations affect metabolic behavior at a systems level, and have been successfully used to engineer microbial strains with improved primary or secondary metabolite production. However, identification of metabolic engineering strategies involving a large number of perturbations is currently limited by computational resources due to the size of genome-scale models and the combinatorial nature of the problem. In this study, we present (i) two new bi-level strain design approaches using mixed-integer programming (MIP), and (ii) general solution techniques that improve the performance of MIP-based bi-level approaches. The first approach (SimOptStrain) simultaneously considers gene deletion and non-native reaction addition, while the second approach (BiMOMA) uses minimization of metabolic adjustment to predict knockout behavior in a MIP-based bi-level problem for the first time. Our general MIP solution techniques significantly reduced the CPU times needed to find optimal strategies when applied to an existing strain design approach (OptORF) (e.g., from ∼10 days to ∼5 minutes for metabolic engineering strategies with 4 gene deletions), and identified strategies for producing compounds where previous studies could not (e.g., malate and serine). Additionally, we found novel strategies using SimOptStrain with higher predicted production levels (for succinate and glycerol) than could have been found using an existing approach that considers network additions and deletions in sequential steps rather than simultaneously. Finally, using BiMOMA we found novel strategies involving large numbers of modifications (for pyruvate and glutamate), which sequential search and genetic algorithms were unable to find. 
The approaches and solution techniques developed here will facilitate the strain design process and extend the scope of its application to metabolic engineering. PMID:21949695
Large-scale bi-level strain design approaches and mixed-integer programming solution techniques.

PubMed

Kim, Joonhoon; Reed, Jennifer L; Maravelias, Christos T

2011-01-01

The use of computational models in metabolic engineering has been increasing as more genome-scale metabolic models and computational approaches become available. Various computational approaches have been developed to predict how genetic perturbations affect metabolic behavior at a systems level, and have been successfully used to engineer microbial strains with improved primary or secondary metabolite production. However, identification of metabolic engineering strategies involving a large number of perturbations is currently limited by computational resources due to the size of genome-scale models and the combinatorial nature of the problem. In this study, we present (i) two new bi-level strain design approaches using mixed-integer programming (MIP), and (ii) general solution techniques that improve the performance of MIP-based bi-level approaches. The first approach (SimOptStrain) simultaneously considers gene deletion and non-native reaction addition, while the second approach (BiMOMA) uses minimization of metabolic adjustment to predict knockout behavior in a MIP-based bi-level problem for the first time. Our general MIP solution techniques significantly reduced the CPU times needed to find optimal strategies when applied to an existing strain design approach (OptORF) (e.g., from ∼10 days to ∼5 minutes for metabolic engineering strategies with 4 gene deletions), and identified strategies for producing compounds where previous studies could not (e.g., malate and serine). Additionally, we found novel strategies using SimOptStrain with higher predicted production levels (for succinate and glycerol) than could have been found using an existing approach that considers network additions and deletions in sequential steps rather than simultaneously. Finally, using BiMOMA we found novel strategies involving large numbers of modifications (for pyruvate and glutamate), which sequential search and genetic algorithms were unable to find. 
The approaches and solution techniques developed here will facilitate the strain design process and extend the scope of its application to metabolic engineering.
Construction of ontology augmented networks for protein complex prediction.

PubMed

Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

2013-01-01

Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
Regional variations in the diversity and predicted metabolic potential of benthic prokaryotes in coastal northern Zhejiang, East China Sea

PubMed Central

Wang, Kai; Ye, Xiansen; Zhang, Huajun; Chen, Heping; Zhang, Demin; Liu, Lian

2016-01-01

Knowledge about the drivers of benthic prokaryotic diversity and metabolic potential in interconnected coastal sediments at regional scales is limited. We collected surface sediments across six zones covering ~200 km in coastal northern Zhejiang, East China Sea and combined 16S rRNA gene sequencing, community-level metabolic prediction, and sediment physicochemical measurements to investigate variations in prokaryotic diversity and metabolic gene composition with geographic distance and under local environmental conditions. Geographic distance was the most influential factor in prokaryotic β-diversity compared with major environmental drivers, including temperature, sediment texture, acid-volatile sulfide, and water depth, but a large unexplained variation in community composition suggested the potential effects of unmeasured abiotic/biotic factors and stochastic processes. Moreover, prokaryotic assemblages showed a biogeographic provincialism across the zones. The predicted metabolic gene composition shifted in a similar way to the taxonomic composition. Acid-volatile sulfide was strongly correlated with variation in metabolic gene composition. Enrichments in the relative abundance of sulfate-reducing bacteria and of genes relevant to dissimilatory sulfate reduction were observed and predicted, respectively, in the Yushan area. These results provide insights into the relative importance of geographic distance and environmental condition in driving benthic prokaryotic diversity in coastal areas and predict specific biogeochemically-relevant genes for future studies. PMID:27917954
Reranking candidate gene models with cross-species comparison for improved gene prediction

PubMed Central

Liu, Qian; Crammer, Koby; Pereira, Fernando CN; Roos, David S

2008-01-01

Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models. PMID:18854050
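The reranking step described above, in which a lower-scoring candidate can overtake the gene finder's top model when it agrees better with an ortholog, can be illustrated with a short Python sketch. The GeneModel class, the rerank function, the single ortholog_similarity feature and the additive scoring rule with a weight parameter are all assumptions made for illustration; the abstract does not specify the actual scoring rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneModel:
    """Hypothetical candidate gene model at one locus."""
    name: str
    finder_score: float         # score assigned by the original gene finder
    ortholog_similarity: float  # similarity to a putative ortholog, in [0, 1]


def rerank(candidates: List[GeneModel], weight: float = 1.0) -> List[GeneModel]:
    """Reorder the K best candidates by finder score plus weighted similarity.

    With weight=0 the original gene-finder ordering is recovered.
    """
    return sorted(
        candidates,
        key=lambda m: m.finder_score + weight * m.ortholog_similarity,
        reverse=True,
    )


# Toy locus: m1 wins on finder score alone, m2 wins after reranking.
candidates = [
    GeneModel("m1", finder_score=0.9, ortholog_similarity=0.1),
    GeneModel("m2", finder_score=0.8, ortholog_similarity=0.7),
]
best = rerank(candidates)[0]
```

Because the function only needs a list of scored candidates, further evidence (for example splice site agreement) could be added as extra fields and terms, which mirrors the abstract's point that the method needs only a ranked source of candidate gene models.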
Multiple fuzzy neural network system for outcome prediction and classification of 220 lymphoma patients on the basis of molecular profiling.

PubMed

Ando, Tatsuya; Suguro, Miyuki; Kobayashi, Takeshi; Seto, Masao; Honda, Hiroyuki

2003-10-01

A fuzzy neural network (FNN) using gene expression profile data can select combinations of genes from thousands of genes, and is applicable to predict outcome for cancer patients after chemotherapy. However, wide clinical heterogeneity reduces the accuracy of prediction. To overcome this problem, we have proposed an FNN system based on majoritarian decision using multiple noninferior models. We used transcriptional profiling data, which were obtained from "Lymphochip" DNA microarrays (http://llmpp.nih.gov/DLBCL), reported by Rosenwald (N Engl J Med 2002; 346: 1937-47). When the data were analyzed by our FNN system, accuracy (73.4%) of outcome prediction using only 1 FNN model with 4 genes was higher than that (68.5%) of the Cox model using 17 genes. Higher accuracy (91%) was obtained when an FNN system with 9 noninferior models, consisting of 35 independent genes, was used. The genes selected by the system included genes that are informative in the prognosis of Diffuse large B-cell lymphoma (DLBCL), such as genes showing an expression pattern similar to that of CD10 and BCL-6 or similar to that of IRF-4 and BCL-4. We classified 220 DLBCL patients into 5 groups using the prediction results of 9 FNN models. These groups may correspond to DLBCL subtypes. In group A containing half of the 220 patients, patients with poor outcome were found to satisfy 2 rules, i.e., high expression of MAX dimerization with high expression of unknown A (LC_26146), or high expression of MAX dimerization with low expression of unknown B (LC_33144). The present paper is the first to describe the multiple noninferior FNN modeling system. This system is a powerful tool for predicting outcome and classifying patients, and is applicable to other heterogeneous diseases.
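The "majoritarian decision" over multiple models can be sketched as a simple majority vote. This is only an illustration of the voting idea, not the authors' FNN system: the `majority_vote` function and the label strings are hypothetical, and each model's prediction is assumed to be already available as a label.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of models.

    If several labels tie, the one encountered first is returned.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of 9 models for one patient.
per_model = ["poor", "good", "poor", "poor", "good",
             "poor", "poor", "good", "poor"]
print(majority_vote(per_model))
```

Using an odd number of models with two outcome classes, as in the 9-model system mentioned above, avoids ties.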
Transcriptional Changes in the Transition from Vegetative Cells to Asexual Development in the Model Fungus Aspergillus nidulans

PubMed Central

Garzia, Aitor; Etxebeste, Oier; Rodríguez-Romero, Julio; Fischer, Reinhard; Espeso, Eduardo A.

2013-01-01

Morphogenesis encompasses programmed changes in gene expression that lead to the development of specialized cell types. In the model fungus Aspergillus nidulans, asexual development involves the formation of characteristic cell types, collectively known as the conidiophore. With the aim of determining the transcriptional changes that occur upon induction of asexual development, we have applied massive mRNA sequencing to compare the expression pattern of 19-h-old submerged vegetative cells (hyphae) with that of similar hyphae after exposure to the air for 5 h. We found that the expression of 2,222 (20.3%) of the predicted 10,943 A. nidulans transcripts was significantly modified after air exposure, 2,035 being downregulated and 187 upregulated. The activation during this transition of genes that belong specifically to the asexual developmental pathway was confirmed. Another remarkable quantitative change occurred in the expression of genes involved in carbon or nitrogen primary metabolism. Genes participating in polar growth or sexual development were transcriptionally repressed, as were those belonging to the HogA/SakA stress response mitogen-activated protein (MAP) kinase pathway. We also identified significant expression changes in several genes purportedly involved in redox balance, transmembrane transport, secondary metabolite production, or transcriptional regulation, mainly binuclear-zinc cluster transcription factors. Genes coding for these four activities were usually grouped in metabolic clusters, which may bring regulatory implications for the induction of asexual development. These results provide a blueprint for further stage-specific gene expression studies during conidiophore development. PMID:23264642
Electronic Health Record Design and Implementation for Pharmacogenomics: a Local Perspective

PubMed Central

Peterson, Josh F.; Bowton, Erica; Field, Julie R.; Beller, Marc; Mitchell, Jennifer; Schildcrout, Jonathan; Gregg, William; Johnson, Kevin; Jirjis, Jim N; Roden, Dan M.; Pulley, Jill M.; Denny, Josh C.

2014-01-01

Purpose The design of electronic health records (EHR) to translate genomic medicine into clinical care is crucial to successful introduction of new genomic services, yet there are few published guides to implementation. Methods The design, implemented features, and evolution of a locally developed EHR that supports a large pharmacogenomics program at a tertiary care academic medical center were tracked over a 4-year development period. Results Developers and program staff created EHR mechanisms for ordering a pharmacogenomics panel in advance of clinical need (preemptive genotyping) and in response to a specific drug indication. Genetic data from panel-based genotyping were sequestered from the EHR until drug-gene interactions (DGIs) met evidentiary standards and were deemed clinically actionable. A service to translate genotype to predicted drug response phenotype populated a summary of DGIs, triggered inpatient and outpatient clinical decision support, updated laboratory records, and created gene results within online personal health records. Conclusion The design of a locally developed EHR supporting pharmacogenomics has generalizable utility. The challenge of representing genomic data in a comprehensible and clinically actionable format is discussed along with reflection on the scalability of the model to larger sets of genomic data. PMID:24009000
Cosmetics-triggered percutaneous remote control of transgene expression in mice.

PubMed

Wang, Hui; Ye, Haifeng; Xie, Mingqi; Daoud El-Baba, Marie; Fussenegger, Martin

2015-08-18

Synthetic biology has significantly advanced the rational design of trigger-inducible gene switches that program cellular behavior in a reliable and predictable manner. Capitalizing on genetic componentry, including the repressor PmeR and its cognate operator OPmeR, that has evolved in Pseudomonas syringae pathovar tomato DC3000 to sense and resist plant-defence metabolites of the paraben class, we have designed a set of inducible and repressible mammalian transcription-control devices that could dose-dependently fine-tune transgene expression in mammalian cells and mice in response to paraben derivatives. With an over 60-years track record as licensed preservatives in the cosmetics industry, paraben derivatives have become a commonplace ingredient of most skin-care products including shower gels, cleansing toners and hand creams. As parabens can rapidly reach the bloodstream of mice following topical application, we used this feature to percutaneously program transgene expression of subcutaneous designer cell implants using off-the-shelf commercial paraben-containing skin-care cosmetics. The combination of non-invasive, transdermal and orthogonal trigger-inducible remote control of transgene expression may provide novel opportunities for dynamic interventions in future gene and cell-based therapies. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Cosmetics-triggered percutaneous remote control of transgene expression in mice

PubMed Central

Wang, Hui; Ye, Haifeng; Xie, Mingqi; Daoud El-Baba, Marie; Fussenegger, Martin

2015-01-01

Synthetic biology has significantly advanced the rational design of trigger-inducible gene switches that program cellular behavior in a reliable and predictable manner. Capitalizing on genetic componentry, including the repressor PmeR and its cognate operator OPmeR, that has evolved in Pseudomonas syringae pathovar tomato DC3000 to sense and resist plant-defence metabolites of the paraben class, we have designed a set of inducible and repressible mammalian transcription-control devices that could dose-dependently fine-tune transgene expression in mammalian cells and mice in response to paraben derivatives. With an over 60-years track record as licensed preservatives in the cosmetics industry, paraben derivatives have become a commonplace ingredient of most skin-care products including shower gels, cleansing toners and hand creams. As parabens can rapidly reach the bloodstream of mice following topical application, we used this feature to percutaneously program transgene expression of subcutaneous designer cell implants using off-the-shelf commercial paraben-containing skin-care cosmetics. The combination of non-invasive, transdermal and orthogonal trigger-inducible remote control of transgene expression may provide novel opportunities for dynamic interventions in future gene and cell-based therapies. PMID:25943548
Therapygenetics: 5-HTTLPR genotype predicts the response to exposure therapy for agoraphobia.

PubMed

Knuts, Inge; Esquivel, Gabriel; Kenis, Gunter; Overbeek, Thea; Leibold, Nicole; Goossens, Lies; Schruers, Koen

2014-08-01

This study was intended to assess the extent to which the low-expression allele of the serotonin transporter gene promoter predicts better response to exposure-based behavior therapy in patients with panic disorder with agoraphobia (PDA). Ninety-nine patients with PDA underwent a 1-week in vivo exposure-based behavior therapy program and provided saliva samples to extract genomic DNA and classify individuals according to four allelic forms (SA, SG, LA, LG) of the 5-HTT-linked polymorphic region (5-HTTLPR). We determined whether the 5-HTTLPR genotype predicted change in avoidance behavior in PDA following treatment. After controlling for pre-treatment avoidance behavior, the 5-HTTLPR low-expression genotypes showed a more favorable response to exposure therapy two weeks following treatment, compared to the other patients. This study suggests a genetic contribution to treatment outcome following behavior therapy and implicates the serotonergic system in response to exposure-based treatments in PDA. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.

Epigenetic Alteration by DNA Methylation of ESR1, MYOD1 and hTERT Gene Promoters is Useful for Prediction of Response in Patients of Locally Advanced Invasive Cervical Carcinoma Treated by Chemoradiation.

PubMed

Sood, S; Patel, F D; Ghosh, S; Arora, A; Dhaliwal, L K; Srinivasan, R

2015-12-01

Locally advanced invasive cervical cancer [International Federation of Gynecology and Obstetrics (FIGO) IIB/III] is treated by chemoradiation. The response to treatment is variable within a given FIGO stage. Therefore, the aim of the present study was to evaluate the gene promoter methylation profile and corresponding transcript expression of a panel of six genes to identify genes which could predict the response of patients treated by chemoradiation. In total, 100 patients with invasive cervical cancer in FIGO stage IIB/III who underwent chemoradiation treatment were evaluated. Ten patients developed systemic metastases during therapy and were excluded. On the basis of patient follow-up, 69 patients were chemoradiation-sensitive, whereas 21 were chemoradiation-resistant. Gene promoter methylation and gene expression were determined by TaqMan assay and quantitative real-time PCR, respectively, in tissue samples. The methylation frequency of ESR1, BRCA1, RASSF1A, MLH1, MYOD1 and hTERT genes ranged from 40 to 70%. Univariate and hierarchical cluster analysis revealed that gene promoter methylation of MYOD1, ESR1 and hTERT could predict chemoradiation response. A pattern of unmethylated MYOD1, unmethylated ESR1 and methylated hTERT promoter as well as lower ESR1 transcript levels predicted chemoradiation resistance. Methylation profiling of a panel of three genes that includes MYOD1, ESR1 and hTERT may be useful to predict the response of invasive cervical carcinoma patients treated with standard chemoradiation therapy. Copyright © 2015 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
Predictive model for inflammation grades of chronic hepatitis B: Large-scale analysis of clinical parameters and gene expressions.

PubMed

Zhou, Weichen; Ma, Yanyun; Zhang, Jun; Hu, Jingyi; Zhang, Menghan; Wang, Yi; Li, Yi; Wu, Lijun; Pan, Yida; Zhang, Yitong; Zhang, Xiaonan; Zhang, Xinxin; Zhang, Zhanqing; Zhang, Jiming; Li, Hai; Lu, Lungen; Jin, Li; Wang, Jiucun; Yuan, Zhenghong; Liu, Jie

2017-11-01

Liver biopsy is the gold standard to assess pathological features (e.g. inflammation grades) for hepatitis B virus-infected patients although it is invasive and traumatic; meanwhile, several gene profiles of chronic hepatitis B (CHB) have been separately described in relatively small hepatitis B virus (HBV)-infected samples. We aimed to analyse correlations among inflammation grades, gene expressions and clinical parameters (serum alanine amino transaminase, aspartate amino transaminase and HBV-DNA) in large-scale CHB samples and to predict inflammation grades by using clinical parameters and/or gene expressions. We analysed gene expressions with three clinical parameters in 122 CHB samples by an improved regression model. Principal component analysis and machine-learning methods including Random Forest, K-nearest neighbour and support vector machine were used for analysis and further diagnosis models. Six normal samples were used to validate the predictive model. Significant genes related to clinical parameters were found to be enriched in the immune system, interferon-stimulated, regulation of cytokine production and anti-apoptosis categories, among others. A panel of these genes with clinical parameters can effectively predict binary classifications of inflammation grade (area under the ROC curve [AUC]: 0.88, 95% confidence interval [CI]: 0.77-0.93), validated by normal samples. A panel with only clinical parameters was also valuable (AUC: 0.78, 95% CI: 0.65-0.86), indicating that a liquid biopsy method for detecting the pathology of CHB is possible. This is the first study to systematically elucidate the relationships among gene expressions, clinical parameters and pathological inflammation grades in CHB, and to build models predicting inflammation grades by gene expressions and/or clinical parameters as well. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
A multigene predictor of metastatic outcome in early stage hormone receptor-negative and triple-negative breast cancer

PubMed Central

2010-01-01

Introduction Various multigene predictors of breast cancer clinical outcome have been commercialized, but proved to be prognostic only for hormone receptor (HR) subsets overexpressing estrogen or progesterone receptors. Hormone receptor negative (HRneg) breast cancers, particularly those lacking HER2/ErbB2 overexpression and known as triple-negative (Tneg) cases, are heterogeneous and generally aggressive breast cancer subsets in need of prognostic subclassification, since most early stage HRneg and Tneg breast cancer patients are cured with conservative treatment yet invariably receive aggressive adjuvant chemotherapy. Methods An unbiased search for genes predictive of distant metastatic relapse was undertaken using a training cohort of 199 node-negative, adjuvant treatment naïve HRneg (including 154 Tneg) breast cancer cases curated from three public microarray datasets. Prognostic gene candidates were subsequently validated using a different cohort of 75 node-negative, adjuvant naïve HRneg cases curated from three additional datasets. The HRneg/Tneg gene signature was prognostically compared with eight other previously reported gene signatures, and evaluated for cancer network associations by two commercial pathway analysis programs. Results A novel set of 14 prognostic gene candidates was identified as outcome predictors: CXCL13, CLIC5, RGS4, RPS28, RFX7, EXOC7, HAPLN1, ZNF3, SSX3, HRBL, PRRG3, ABO, PRTN3, MATN1. A composite HRneg/Tneg gene signature index proved more accurate than any individual candidate gene or other reported multigene predictors in identifying cases likely to remain free of metastatic relapse. Significant positive correlations between the HRneg/Tneg index and three independent immune-related signatures (STAT1, IFN, and IR) were observed, as were consistent negative associations between the three immune-related signatures and five other proliferation module-containing signatures (MS-14, ONCO-RS, GGI, CSR/wound and NKI-70). Network analysis identified 8 genes within the HRneg/Tneg signature as being functionally linked to immune/inflammatory chemokine regulation. Conclusions A multigene HRneg/Tneg signature linked to immune/inflammatory cytokine regulation was identified from pooled expression microarray data and shown to be superior to other reported gene signatures in predicting the metastatic outcome of early stage and conservatively managed HRneg and Tneg breast cancer. Further validation of this prognostic signature may lead to new therapeutic insights and spare many newly diagnosed breast cancer patients the need for aggressive adjuvant chemotherapy. PMID:20946665
Clustering gene expression data based on predicted differential effects of GV interaction.

PubMed

Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

2005-02-01

Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce biased interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.
Gene expression analysis predicts insect venom anaphylaxis in indolent systemic mastocytosis.

PubMed

Niedoszytko, M; Bruinenberg, M; van Doormaal, J J; de Monchy, J G R; Nedoszytko, B; Koppelman, G H; Nawijn, M C; Wijmenga, C; Jassem, E; Elberink, J N G Oude

2011-05-01

Anaphylaxis to insect venom (Hymenoptera) is most severe in patients with mastocytosis and may even lead to death. However, not all patients with mastocytosis suffer from anaphylaxis. The aim of the study was to analyze differences in gene expression between patients with indolent systemic mastocytosis (ISM) and a history of insect venom anaphylaxis (IVA) compared to those patients without a history of anaphylaxis, and to determine the predictive use of gene expression profiling. Whole-genome gene expression analysis was performed in peripheral blood cells. Twenty-two adults with ISM were included: 12 with a history of IVA and 10 without a history of anaphylaxis of any kind. Significant differences in single gene expression corrected for multiple testing were found for 104 transcripts (P < 0.05). Gene ontology analysis revealed that the differentially expressed genes were involved in pathways responsible for the development of cancer and focal and cell adhesion, suggesting that the expression of genes related to the differentiation state of cells is higher in patients with a history of anaphylaxis. Based on the gene expression profiles, a naïve Bayes prediction model was built identifying patients with IVA. In ISM, gene expression profiles are different between patients with a history of IVA and those without. These findings might reflect a more pronounced mast cell dysfunction in patients without a history of anaphylaxis. Gene expression profiling might be a useful tool to predict the risk of anaphylaxis to insect venom in patients with ISM. Prospective studies are needed to substantiate any conclusions. © 2010 John Wiley & Sons A/S.
Predicting Gene Expression Level from Relative Codon Usage Bias: An Application to Escherichia coli Genome

PubMed Central

Roymondal, Uttam; Das, Shibsankar; Sahoo, Satyabrata

2009-01-01

We present an expression measure of a gene, devised to predict the level of gene expression from relative codon bias (RCB). There are a number of measures currently in use that quantify codon usage in genes. Based on the hypothesis that gene expressivity and codon composition are strongly correlated, RCB has been defined to provide an intuitively meaningful measure of the extent of the codon preference in a gene. We outline a simple approach to assess the strength of RCB (RCBS) in genes as a guide to their likely expression levels and illustrate this with an analysis of the Escherichia coli (E. coli) genome. Our efforts to quantitatively predict gene expression levels in E. coli met with a high level of success. Surprisingly, we observe a strong correlation between RCBS and protein length, indicating natural selection in favour of shorter genes being expressed at a higher level. The agreement of our result with high protein abundances, microarray data and radioactive data demonstrates that the genomic expression profile available in our method can be applied in a meaningful way to the study of cell physiology and also for more detailed studies of particular genes of interest. PMID:19131380
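The abstract does not state the formula behind RCBS, so the following is only a sketch under an assumed formulation: for each codon xyz, the relative codon bias is taken as d = (f(xyz) - f1(x) f2(y) f3(z)) / (f1(x) f2(y) f3(z)), where f is the codon frequency within the gene and f1, f2, f3 are the base frequencies at the three codon positions, and the score is the geometric mean of (1 + d) over all codons of the gene, minus 1. The function name `rcbs` and the toy sequences are hypothetical.

```python
from collections import Counter
from math import exp, log


def rcbs(cds):
    """Relative codon bias strength of one coding sequence (assumed formula).

    Trailing bases that do not fill a complete codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("sequence must contain at least one codon")

    codon_count = Counter(codons)
    # Base counts at codon positions 1, 2 and 3 within this gene.
    position_count = [Counter(c[p] for c in codons) for p in range(3)]

    log_sum = 0.0
    for c in codons:
        observed = codon_count[c] / n
        expected = 1.0
        for p in range(3):
            expected *= position_count[p][c[p]] / n
        # 1 + d simplifies to observed / expected.
        log_sum += log(observed / expected)

    # Geometric mean of (1 + d) over all codons, minus 1.
    return exp(log_sum / n) - 1


print(rcbs("AAATTT"))
```

Under this formulation a gene whose codon usage matches the product of its positional base frequencies scores 0, and stronger codon preference gives a larger score.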
LitMiner and WikiGene: identifying problem-related key players of gene regulation using publication abstracts.

PubMed

Maier, Holger; Döhr, Stefanie; Grote, Korbinian; O'Keeffe, Sean; Werner, Thomas; Hrabé de Angelis, Martin; Schneider, Ralf

2005-07-01

The LitMiner software is a literature data-mining tool that facilitates the identification of major gene regulation key players related to a user-defined field of interest in PubMed abstracts. The prediction of gene-regulatory relationships is based on co-occurrence analysis of key terms within the abstracts. LitMiner predicts relationships between key terms from the biomedical domain in four categories (genes, chemical compounds, diseases and tissues). Owing to the limitations (no direction, unverified automatic prediction) of the co-occurrence approach, the primary data in the LitMiner database represent postulated basic gene-gene relationships. The usefulness of the LitMiner system has been demonstrated recently in a study that reconstructed disease-related regulatory networks by promoter modelling that was initiated by a LitMiner generated primary gene list. To overcome the limitations and to verify and improve the data, we developed WikiGene, a Wiki-based curation tool that allows revision of the data by expert users over the Internet. LitMiner (http://andromeda.gsf.de/litminer) and WikiGene (http://andromeda.gsf.de/wiki) can be used unrestricted with any Internet browser.
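Co-occurrence analysis of key terms, as described above, can be sketched in a few lines. This is not LitMiner's implementation: the `cooccurrence` function, the naive case-insensitive substring matching, and the toy abstracts and term names are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(abstracts, terms):
    """Count, for each pair of terms, the abstracts that mention both.

    Matching is a naive case-insensitive substring test; a real system
    would need tokenisation and synonym handling.
    """
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        present = sorted(t for t in terms if t.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Toy data with placeholder gene names.
abstracts = [
    "geneA regulates geneB in liver.",
    "geneB represses geneA.",
    "geneC is expressed in liver.",
]
terms = ["geneA", "geneB", "geneC", "liver"]
pairs = cooccurrence(abstracts, terms)
print(pairs.most_common(1))
```

As the abstract notes, such counts carry no direction and are unverified, so a high count only postulates a relationship between two terms.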
Analysis of selected genes associated with cardiomyopathy by next-generation sequencing.

PubMed

Szabadosova, Viktoria; Boronova, Iveta; Ferenc, Peter; Tothova, Iveta; Bernasovska, Jarmila; Zigova, Michaela; Kmec, Jan; Bernasovsky, Ivan

2018-02-01

As the leading cause of congestive heart failure, cardiomyopathy represents a heterogeneous group of heart muscle disorders. Despite considerable progress being made in the genetic diagnosis of cardiomyopathy by detection of the mutations in the most prevalent cardiomyopathy genes, the cause remains unsolved in many patients. High-throughput mutation screening in the disease genes for cardiomyopathy is now possible through target enrichment followed by next-generation sequencing. The aim of the study was to analyze a panel of genes associated with dilated or hypertrophic cardiomyopathy based on previously published results in order to identify the subjects at risk. Next-generation sequencing on the Illumina HiSeq 2500 platform was used to detect sequence variants in 16 individuals diagnosed with dilated or hypertrophic cardiomyopathy. Detected variants were filtered and the functional impact of amino acid changes was predicted by computational programs. DNA samples of the 16 patients were analyzed by whole exome sequencing. We identified six nonsynonymous variants that were predicted to be pathogenic by all prediction programs used: rs3744998 (EPG5), rs11551768 (MGME1), rs148374985 (MURC), rs78461695 (PLEC), rs17158558 (RET) and rs2295190 (SYNE1). Two of the analyzed sequence variants had a minor allele frequency (MAF) < 0.01: rs148374985 (MURC), rs34580776 (MYBPC3). Our data support the potential role of the detected variants in pathogenesis of dilated or hypertrophic cardiomyopathy; however, the possibility that these variants might not be true disease-causing variants but are susceptibility alleles that require additional mutations or injury to cause the clinical phenotype of disease must be considered. © 2017 Wiley Periodicals, Inc.
Chronic and Acute Stress, Gender, and Serotonin Transporter Gene-Environment Interactions Predicting Depression Symptoms in Youth

ERIC Educational Resources Information Center

Hammen, Constance; Brennan, Patricia A.; Keenan-Miller, Danielle; Hazel, Nicholas A.; Najman, Jake M.

2010-01-01

Background: Many recent studies of serotonin transporter gene by environment effects predicting depression have used stress assessments with undefined or poor psychometric methods, possibly contributing to wide variation in findings. The present study attempted to distinguish between effects of acute and chronic stress to predict depressive…
A network approach to predict pathogenic genes for Fusarium graminearum.

PubMed

Liu, Xiaoping; Tang, Wei-Hua; Zhao, Xing-Ming; Chen, Luonan

2010-10-04

Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), which is a destructive disease on wheat and barley, thereby causing huge economic loss and health problems to humans by contaminating foods. Identifying pathogenic genes can shed light on pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in the lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into the plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as seed genes, which implies that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, including G protein coupled receptor pathway and MAPK signaling pathway, which are known to be related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other pathogenic fungi, which demonstrates the effectiveness of the proposed method. The results presented in this paper not only can provide guidelines for future experimental verification, but also shed light on the pathogenesis of the destructive fungus F. graminearum.
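The seed-based selection described in this abstract can be sketched in a deliberately simplified form. The sketch assumes that candidates are the direct interaction partners of seed genes that are also differentially expressed; the published method identifies a subnetwork and is likely more elaborate. The `candidate_genes` function and all gene identifiers are hypothetical.

```python
def candidate_genes(ppi_edges, seeds, diff_expressed):
    """Return non-seed genes that interact directly with a seed gene
    and are differentially expressed.

    Restricting the search to direct neighbours is a simplifying assumption.
    """
    # Build an undirected adjacency map from the interaction pairs.
    neighbours = {}
    for a, b in ppi_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    candidates = set()
    for seed in seeds:
        for gene in neighbours.get(seed, ()):
            if gene not in seeds and gene in diff_expressed:
                candidates.add(gene)
    return candidates


# Toy interaction network with placeholder identifiers.
edges = [("s1", "g1"), ("s1", "g2"), ("g2", "g3"), ("s2", "g3"), ("s1", "s2")]
seeds = {"s1", "s2"}
diff_expressed = {"g1", "g3", "g4"}
print(sorted(candidate_genes(edges, seeds, diff_expressed)))
```

Here `g2` is dropped because it is not differentially expressed, and `g4` is dropped because it has no interaction with a seed gene.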
Genetic Predictors of ≥5% Weight Loss by Multidisciplinary Advice to Severely Obese Subjects

PubMed Central

Aller, Erik E.J.G.; Mariman, Edwin C.M.; Bouwman, Freek G.; van Baak, Marleen A.

2017-01-01

Background Weight loss success is determined by genetic factors, which may differ according to treatment strategy. Methods From a multidisciplinary obesity treatment program involving dietary advice, psychological counseling, and increased physical activity, 587 subjects (68% female; 46.1 ± 12.4 years; BMI 39.9 ± 6.3) were recruited. At baseline, a blood sample was drawn for DNA isolation. Genotypes were determined for 30 polymorphisms in 25 candidate genes. The association between genotypes and weight loss was assessed after 3 months (short-term) and after 12 months of treatment (long-term). Weight loss was categorized as ≥5% or <5% of initial weight. Results The G/G genotype of PLIN1 (rs2289487) and PLIN1 (rs2304795), the T/T genotype of PLIN1 (rs1052700), and the C/C genotype of MMP2 predicted ≥5% weight loss in the first 3 months. The C/G-G/G genotype of PPARγ (rs1801282) and the T/C genotype of TIMP4 (rs3755724) predicted ≥5% weight loss after 12 months. Subjects with the combination of PPARγ (rs1801282) C/G-G/G and TIMP4 (rs3755724) T/C lost even more weight. Conclusion Polymorphisms in genes related to regulation of fat storage and structural adaptation of the adipocytes are predictors for weight loss success with different genes being relevant for short-term and long-term weight loss success. PMID:28578327
A genetic network that suppresses genome rearrangements in Saccharomyces cerevisiae and contains defects in cancers

PubMed Central

Putnam, Christopher D.; Srivatsan, Anjana; Nene, Rahul V.; Martinez, Sandra L.; Clotfelter, Sarah P.; Bell, Sara N.; Somach, Steven B.; E.S. de Souza, Jorge; Fonseca, André F.; de Souza, Sandro J.; Kolodner, Richard D.

2016-01-01

Gross chromosomal rearrangements (GCRs) play an important role in human diseases, including cancer. The identity of all Genome Instability Suppressing (GIS) genes is not currently known. Here multiple Saccharomyces cerevisiae GCR assays and query mutations were crossed into arrays of mutants to identify progeny with increased GCR rates. One hundred eighty two GIS genes were identified that suppressed GCR formation. Another 438 cooperatively acting GIS genes were identified that were not GIS genes, but suppressed the increased genome instability caused by individual query mutations. Analysis of TCGA data using the human genes predicted to act in GIS pathways revealed that a minimum of 93% of ovarian and 66% of colorectal cancer cases had defects affecting one or more predicted GIS gene. These defects included loss-of-function mutations, copy-number changes associated with reduced expression, and silencing. In contrast, acute myeloid leukaemia cases did not appear to have defects affecting the predicted GIS genes. PMID:27071721
Gene Expression Profiling of Peripheral Blood From Kidney Transplant Recipients for the Early Detection of Digestive System Cancer.

PubMed

Kusaka, M; Okamoto, M; Takenaka, M; Sasaki, H; Fukami, N; Kataoka, K; Ito, T; Kenmochi, T; Hoshinaga, K; Shiroki, R

2017-06-01

Kidney transplant recipients are at increased risk of developing cancer in comparison with the general population. To effectively manage post-transplantation malignancies, it is essential to proactively monitor patients. A long-term intensive screening program was associated with a reduced incidence of cancer after transplantation. This study evaluated the usefulness of the gene expression profiling of peripheral blood samples obtained from kidney transplant patients and adopted a screening test for detecting cancer of the digestive system (gastric, colon, pancreas, and biliary tract). Nineteen patients were included in this study and a total of 53 gene expression screening tests were performed. The gene expression profiles of blood-delivered total RNA and whole genome human gene expression profiles were obtained. We investigated the expression levels of 2665 genes associated with digestive cancers and counted the number of genes in which expression was altered. A hierarchical clustering analysis was also performed. The final prediction of the cancer possibility was determined according to an algorithm. The number of genes in which expression was altered was significantly increased in the kidney transplant recipients in comparison with the general population (1091 ± 63 vs 823 ± 94; P = .0024). The number of genes with altered expression decreased after the induction of mechanistic target of rapamycin (mTOR) inhibitor (1484 ± 227 vs 883 ± 154; P = .0439). No cases of possible digestive cancer were detected in this study period. The gene expression profiling of peripheral blood samples may be a useful and noninvasive diagnostic tool that allows for the early detection of cancer of the digestive system. Copyright © 2017 Elsevier Inc. All rights reserved.
Genome and transcriptome sequencing characterises the gene space of Macadamia integrifolia (Proteaceae).

PubMed

Nock, Catherine J; Baten, Abdul; Barkla, Bronwyn J; Furtado, Agnelo; Henry, Robert J; King, Graham J

2016-11-17

The large Gondwanan plant family Proteaceae is an early-diverging eudicot lineage renowned for its morphological, taxonomic and ecological diversity. Macadamia is the most economically important Proteaceae crop and represents an ancient rainforest-restricted lineage. The family is a focus for studies of adaptive radiation due to remarkable species diversification in Mediterranean-climate biodiversity hotspots, and numerous evolutionary transitions between biomes. Despite a long history of research, comparative analyses in the Proteaceae and macadamia breeding programs are restricted by a paucity of genetic information. To address this, we sequenced the genome and transcriptome of the widely grown Macadamia integrifolia cultivar 741. Over 95 gigabases of DNA and RNA-seq sequence data were de novo assembled and annotated. The draft assembly has a total length of 518 Mb and spans approximately 79% of the estimated genome size. Following annotation, 35,337 protein-coding genes were predicted, of which over 90% were expressed in at least one of the leaf, shoot or flower tissues examined. Gene family comparisons with five other eudicot species revealed 13,689 clusters containing macadamia genes and 1005 macadamia-specific clusters, and provided evidence for lineage-specific expansion of gene families involved in pathogen recognition, plant defense and monoterpene synthesis. Cyanogenesis is an important defense strategy in the Proteaceae, and a detailed analysis of macadamia gene homologues potentially involved in cyanogenic glycoside biosynthesis revealed several highly expressed candidate genes. The gene space of macadamia provides a foundation for comparative genomics, gene discovery and the acceleration of molecular-assisted breeding.
This study presents the first available genomic resources for the large basal eudicot family Proteaceae, access to most macadamia genes and opportunities to uncover the genetic basis of traits of importance for adaptation and crop improvement.
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes

DOE PAGES

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...

2014-10-02

Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui

Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
Estimation of mechanical properties of nanomaterials using artificial intelligence methods

NASA Astrophysics Data System (ADS)

Vijayaraghavan, V.; Garg, A.; Wong, C. H.; Tai, K.

2014-09-01

Computational modeling tools such as molecular dynamics (MD), ab initio, finite element modeling or continuum mechanics models have been extensively applied to study the properties of carbon nanotubes (CNTs) based on given input variables such as temperature, geometry and defects. Artificial intelligence techniques can be used to further complement the application of numerical methods in characterizing the properties of CNTs. In this paper, we introduce the application of multi-gene genetic programming (MGGP) and support vector regression to formulate the mathematical relationship between the compressive strength of CNTs and input variables such as temperature and diameter. The predictions of compressive strength of CNTs made by these models are compared to those generated using MD simulations. The results indicate that the MGGP method can be deployed as a powerful method for predicting the compressive strength of carbon nanotubes.
Advancing Translational Research Through the NHLBI Gene Therapy Resource Program (GTRP)

PubMed Central

Benson, Janet; Cornetta, Kenneth; Diggins, Margaret; Johnston, Julie C.; Sepelak, Susan; Wang, Gensheng; Wilson, James M.; Wright, J. Fraser; Skarlatos, Sonia I.

2013-01-01

Abstract Translational research is a lengthy, complex, and necessary endeavor in order to bring basic science discoveries to clinical fruition. The NIH offers several programs to support translational research including an important resource established specifically for gene therapy researchers—the National Heart, Lung, and Blood Institute (NHLBI) Gene Therapy Resource Program (GTRP). This paper reviews the core components of the GTRP and describes how the GTRP provides researchers with resources that are critical to advancing investigational gene therapy products into clinical testing. PMID:23692378
A polynomial based model for cell fate prediction in human diseases.

PubMed

Ma, Lichun; Zheng, Jie

2017-12-21

Cell fate regulation directly affects tissue homeostasis and human health. Research on cell fate decision sheds light on key regulators, facilitates understanding the mechanisms, and suggests novel strategies to treat human diseases that are related to abnormal cell development. In this study, we proposed a polynomial based model to predict cell fate. This model was derived from Taylor series. As a case study, gene expression data of pancreatic cells were adopted to test and verify the model. As numerous features (genes) are available, we employed two kinds of feature selection methods, i.e. correlation based and apoptosis pathway based. Then polynomials of different degrees were used to refine the cell fate prediction function. 10-fold cross-validation was carried out to evaluate the performance of our model. In addition, we analyzed the stability of the resultant cell fate prediction model by evaluating the ranges of the parameters, as well as assessing the variances of the predicted values at randomly selected points. Results show that, under both gene selection methods considered, the prediction accuracies of polynomials of different degrees differ little. Interestingly, the linear polynomial (degree 1 polynomial) is more stable than the others. Comparing the linear polynomials based on the two gene selection methods shows that, although the accuracy of the linear polynomial that uses correlation analysis outcomes is a little higher (86.62%), the one restricted to genes of the apoptosis pathway is much more stable. Considering both the prediction accuracy and the stability of polynomial models of different degrees, the linear model is a preferred choice for cell fate prediction with gene expression data of pancreatic cells. The presented cell fate prediction model can be extended to other cells, which may be important for basic research as well as clinical study of cell development related diseases.
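The evaluation scheme mentioned in this abstract, a degree-1 model scored by 10-fold cross-validation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the single feature, the 0.5 decision threshold, the interleaved fold assignment and the toy data are not taken from the paper, which used selected gene sets from pancreatic cell expression data.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k interleaved folds."""
    return [list(range(start, n_samples, k)) for start in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_accuracy(xs, labels, k=10):
    """Hold out each fold once, fit on the rest, and score the held-out samples."""
    correct = 0
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [labels[i] for i in train_idx])
        for i in test_idx:
            # Hypothetical decision rule: fitted value of at least 0.5 means fate 1.
            predicted = 1 if slope * xs[i] + intercept >= 0.5 else 0
            correct += int(predicted == labels[i])
    return correct / len(xs)


# Toy data (made up): one expression feature, two well-separated cell fates.
xs = [i / 10 for i in range(10)] + [2 + i / 10 for i in range(10)]
labels = [0] * 10 + [1] * 10
accuracy = cross_validated_accuracy(xs, labels, k=10)
print(accuracy)
```

Higher-degree polynomials would replace `fit_line` with a fit over powers of the features; the cross-validation loop itself would stay the same.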
Pediatric acute lymphoblastic leukemia.

PubMed

Carroll, William L; Bhojwani, Deepa; Min, Dong-Joon; Raetz, Elizabeth; Relling, Mary; Davies, Stella; Downing, James R; Willman, Cheryl L; Reed, John C

2003-01-01

The outcome for children with acute lymphoblastic leukemia (ALL) has improved dramatically with current therapy resulting in an event free survival exceeding 75% for most patients. However significant challenges remain including developing better methods to predict which patients can be cured with less toxic treatment and which ones will benefit from augmented therapy. In addition, 25% of patients fail therapy and novel treatments that are focused on undermining specifically the leukemic process are needed urgently. In Section I, Dr. Carroll reviews current approaches to risk classification and proposes a system that incorporates well-established clinical parameters, genetic lesions of the blast as well as early response parameters. He then provides an overview of emerging technologies in genomics and proteomics and how they might lead to more rational, biologically based classification systems. In Section II, Drs. Mary Relling and Stella Davies describe emerging findings that relate to host features that influence outcome, the role of inherited germline variation. They highlight technical breakthroughs in assessing germline differences among patients. Polymorphisms of drug metabolizing genes have been shown to influence toxicity and the best example is the gene thiopurine methyltransferase (TPMT) a key enzyme in the metabolism of 6-mercaptopurine. Polymorphisms are associated with decreased activity that is also associated with increased toxicity. The role of polymorphisms in other genes whose products play an important role in drug metabolism as well as cytokine genes are discussed. In Sections III and IV, Drs. James Downing and Cheryl Willman review their findings using gene expression profiling to classify ALL. Both authors outline challenges in applying this methodology to analysis of clinical samples. Dr. Willman describes her laboratory's examination of infant leukemia and precursor B-ALL where unsupervised approaches have led to the identification of inherent biologic groups not predicted by conventional morphologic, immunophenotypic and cytogenetic variables. Dr. Downing describes his results from a pediatric ALL expression database using over 327 diagnostic samples, with 80% of the dataset consisting of samples from patients treated on a single institutional protocol. Seven distinct leukemia subtypes were identified representing known leukemia subtypes including: BCR-ABL, E2A-PBX1, TEL-AML1, rearrangements in the MLL gene, hyperdiploid karyotype (i.e., > 50 chromosomes), and T-ALL as well as a new leukemia subtype. A subset of genes have been identified whose expression appears to be predictive of outcome but independent verification is needed before this type of analysis can be integrated into treatment assignment. Chemotherapeutic agents kill cancer cells by activating apoptosis, or programmed cell death. In Section V, Dr. John Reed describes major apoptotic pathways and the specific role of key proteins in this response. The expression level of some of these proteins, such as BCL2, BAX, and caspase 3, has been shown to be predictive of ultimate outcome in hematopoietic tumors. New therapeutic approaches that modulate the apoptotic pathway are now available and Dr. Reed highlights those that may be applicable to the treatment of childhood ALL.

Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities

PubMed Central

2011-01-01

Background Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models that do not include the TFA layer, thus missing a key regulatory element. Results This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network, including the predicted new regulatory links, shows promising biological significance. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress.
In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by using our reconstructed network. Conclusions The GTRNetwork algorithm introduces the hidden layer TFA into classic relevance score-based gene regulatory network reconstruction processes. Integrating the TFA biological information with regulatory network reconstruction algorithms significantly improves detection of new links and reduces the rate of false positives. The application of GTRNetwork on E. coli gene transcriptome data gives a set of potential regulatory links with promising biological significance for isobutanol stress and other conditions. PMID:21668997
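The core idea described above, scoring regulator-target pairs against transcription factor activities rather than TF expression, can be sketched as follows. This is a simplified illustration and not the GTRNetwork implementation: Pearson correlation stands in for whichever relevance score function the authors selected, the TFA profiles are assumed to have been estimated already, and all names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def relevance_links(tf_activity, expression, threshold=0.9):
    """Keep TF -> gene pairs whose absolute relevance score reaches the threshold."""
    links = []
    for tf, activity in tf_activity.items():
        for gene, profile in expression.items():
            score = pearson(activity, profile)
            if abs(score) >= threshold:
                links.append((tf, gene, round(score, 3)))
    return links


# Made-up profiles across four conditions.
tf_activity = {"tfA": [0.1, 0.4, 0.9, 1.3]}
expression = {
    "gene1": [1.0, 1.6, 2.6, 3.4],  # tracks the tfA activity closely
    "gene2": [2.0, 1.9, 2.1, 2.0],  # essentially unrelated to tfA
}
links = relevance_links(tf_activity, expression)
print(links)
```

Raising the threshold trades recall for fewer false positives, which is the tension the abstract says the TFA layer helps with.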
Network propagation in the cytoscape cyberinfrastructure.

PubMed

Carlin, Daniel E; Demchak, Barry; Pratt, Dexter; Sage, Eric; Ideker, Trey

2017-10-01

Network propagation is an important and widely used algorithm in systems biology, with applications in protein function prediction, disease gene prioritization, and patient stratification. However, up to this point it has required significant expertise to run. Here we extend the popular network analysis program Cytoscape to perform network propagation as an integrated function. Such integration greatly increases the access to network propagation by putting it in the hands of biologists and linking it to the many other types of network analysis and visualization available through Cytoscape. We demonstrate the power and utility of the algorithm by identifying mutations conferring resistance to Vemurafenib.
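For readers unfamiliar with the technique, one common formulation of network propagation is a random walk with restart, sketched below. The abstract does not state which variant the Cytoscape integration uses, so the update rule, the restart probability and the input format here are assumptions chosen only to illustrate the idea.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
               (hypothetical input format).
    seeds:     nodes carrying the initial signal, e.g. known disease genes.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)
    for _ in range(max_iter):
        updated = {}
        for n in nodes:
            # Each neighbour passes on an equal share of its current score.
            inflow = sum(score[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = (1 - restart) * inflow + restart * start[n]
        change = sum(abs(updated[n] - score[n]) for n in nodes)
        score = updated
        if change < tol:
            break
    return score


# Toy path graph A - B - C - D with the signal seeded at A.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = propagate(graph, seeds={"A"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Nodes closer to the seed end up with higher scores, which is what makes the ranking useful for tasks such as disease gene prioritization.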
Gene Expression Patterns during the Early Stages of Chemically Induced Larval Metamorphosis and Settlement of the Coral Acropora millepora

PubMed Central

Siboni, Nachshon; Abrego, David; Motti, Cherie A.; Tebben, Jan; Harder, Tilmann

2014-01-01

The morphogenetic transition of motile coral larvae into sessile primary polyps is triggered and genetically programmed upon exposure to environmental biomaterials, such as crustose coralline algae (CCA) and bacterial biofilms. Although the specific chemical cues that trigger coral larval morphogenesis are poorly understood there is much more information available on the genes that play a role in this early life phase. Putative chemical cues from natural biomaterials yielded defined chemical samples that triggered different morphogenetic outcomes: an extract derived from a CCA-associated Pseudoalteromonas bacterium that induced metamorphosis, characterized by non-attached metamorphosed juveniles; and two fractions of the CCA Hydrolithon onkodes (Heydrich) that induced settlement, characterized by attached metamorphosed juveniles. In an effort to distinguish the genes involved in these two morphogenetic transitions, competent larvae of the coral Acropora millepora were exposed to these predictable cues and the expression profiles of 47 coral genes of interest (GOI) were investigated after only 1 hour of exposure using multiplex RT–qPCR. Thirty-two GOI were differentially expressed, indicating a putative role during the early regulation of morphogenesis. The most striking differences were observed for immunity-related genes, hypothesized to be involved in cell recognition and adhesion, and for fluorescent protein genes. Principal component analysis of gene expression profiles resulted in separation between the different morphogenetic cues and exposure times, and not only identified those genes involved in the early response but also those which influenced downstream biological changes leading to larval metamorphosis or settlement. PMID:24632854
Novel candidate genes of the PARK7 interactome as mediators of apoptosis and acetylation in multiple sclerosis: An in silico analysis.

PubMed

Vavougios, George D; Zarogiannis, Sotirios G; Krogfelt, Karen Angeliki; Gourgoulianis, Konstantinos; Mitsikostas, Dimos Dimitrios; Hadjigeorgiou, Georgios

2018-01-01

Currently, only 4 studies have explored the potential role of PARK7's dysregulation in MS pathophysiology. No study has yet evaluated the potential role of the PARK7 interactome in MS. The aim of our study was to assess the differential expression of PARK7 mRNA in peripheral blood mononuclear cells (PBMCs) donated from MS versus healthy patients using data mining techniques. The PARK7 interactome data from the GDS3920 profile were scrutinized for differentially expressed genes (DEGs); Gene Enrichment Analysis (GEA) was used to detect significantly enriched biological functions. 27 differentially expressed genes in the MS dataset were detected; 12 of these (NDUFA4, UBA2, TDP2, NPM1, NDUFS3, SUMO1, PIAS2, KIAA0101, RBBP4, NONO, RBBP7 and HSPA4) are reported for the first time in MS. Stepwise Linear Discriminant Function Analysis constructed a predictive model (Wilks' λ = 0.176, χ² = 45.204, p = 1.5275e-10) with 2 variables (TIDP2, RBBP4) that achieved 96.6% accuracy when discriminating between patients and controls. Gene Enrichment Analysis revealed that induction and regulation of programmed / intrinsic cell death represented the most salient Gene Ontology annotations. Cross-validation on systemic lupus erythematosus and ischemic stroke datasets revealed that these functions are unique to the MS dataset. Based on our results, novel potential target genes are revealed; these differentially expressed genes regulate epigenetic and apoptotic pathways that may further elucidate underlying mechanisms of autoreactivity in MS. Copyright © 2017 Elsevier B.V. All rights reserved.
Prophinder: a computational tool for prophage prediction in prokaryotic genomes.

PubMed

Lima-Mendez, Gipsi; Van Helden, Jacques; Toussaint, Ariane; Leplae, Raphaël

2008-03-15

Prophinder is a prophage prediction tool coupled with a prediction database, a web server and web service. Predicted prophages will help to fill the gaps in the current sparse phage sequence space, which should cover an estimated 100 million species. Systematic and reliable predictions will enable further studies of the contribution of prophages to the bacteriophage gene pool and a better understanding of gene shuffling between prophages and phages infecting the same host. Software is available at http://aclame.ulb.ac.be/prophinder
Transcriptional Coupling of Neighboring Genes and Gene Expression Noise: Evidence that Gene Orientation and Noncoding Transcripts Are Modulators of Noise

PubMed Central

Wang, Guang-Zhong; Lercher, Martin J.; Hurst, Laurence D.

2011-01-01

Abstract How is noise in gene expression modulated? Do mechanisms of noise control impact genome organization? In yeast, the expression of one gene can affect that of a very close neighbor. As the effect is highly regionalized, we hypothesize that genes in different orientations will have differing degrees of coupled expression and, in turn, different noise levels. Divergently organized gene pairs, in particular those with bidirectional promoters, have close promoters, maximizing the likelihood that expression of one gene affects the neighbor. With more distant promoters, the same is less likely to hold for gene pairs in nondivergent orientation. Stochastic models suggest that coupled chromatin dynamics will typically result in low abundance-corrected noise (ACN). Transcription of noncoding RNA (ncRNA) from a bidirectional promoter, we thus hypothesize to be a noise-reduction, expression-priming, mechanism. The hypothesis correctly predicts that protein-coding genes with a bidirectional promoter, including those with a ncRNA partner, have lower ACN than other genes and divergent gene pairs uniquely have correlated ACN. Moreover, as predicted, ACN increases with the distance between promoters. The model also correctly predicts ncRNA transcripts to be often divergently transcribed from genes that a priori would be under selection for low noise (essential genes, protein complex genes) and that the latter genes should commonly reside in divergent orientation. Likewise, that genes with bidirectional promoters are rare subtelomerically, cluster together, and are enriched in essential gene clusters is expected and observed. We conclude that gene orientation and transcription of ncRNAs are candidate modulators of noise. PMID:21402863
Array data extractor (ADE): a LabVIEW program to extract and merge gene array data

PubMed Central

2013-01-01

Background Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies persists. Findings Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of a beta 1 adrenergic receptor and further indicate novel research targets. Conclusions Although existing software allows for complex data analyses, the LabVIEW based program presented here, “Array Data Extractor (ADE)”, provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need of advanced programming knowledge. PMID:24289243
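The extract-and-merge task described in this abstract can be pictured with a few lines of code. The sketch below is written in Python rather than LabVIEW and is not the ADE program; the dictionary input format, the function name and every value are hypothetical placeholders.

```python
def extract_and_merge(datasets, genes):
    """Build one row per requested gene with its value in every dataset.

    datasets: dict mapping study name -> {gene symbol: normalized expression}
              (hypothetical in-memory stand-in for normalized array files).
    genes:    gene symbols of interest.
    A gene missing from a study is reported as None instead of being dropped.
    """
    studies = sorted(datasets)
    return {
        gene: {study: datasets[study].get(gene) for study in studies}
        for gene in genes
    }


# Placeholder values, not real measurements.
datasets = {
    "study_A": {"GENE1": 1.8, "GENE2": -0.3, "GENE3": 0.1},
    "study_B": {"GENE1": 1.2, "GENE3": -0.4},
}
merged = extract_and_merge(datasets, ["GENE1", "GENE2"])
for gene, row in merged.items():
    print(gene, row)
```

Keeping missing genes as explicit `None` entries makes gaps between array platforms visible in the merged table.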
[The value of 5-HTT gene polymorphism for the assessment and prediction of male adolescence violence].

PubMed

Yu, Yue; Liu, Xiang; Yang, Zhen-xing; Qiu, Chang-jian; Ma, Xiao-hong

2012-08-01

To establish an adolescent violent crime prediction model, and to assess the value of serotonin transporter (5-HTT) gene polymorphism for the assessment and prediction of violent crime. Investigative tools were used to analyze the difference in personality dimensions, social support, coping styles, aggressiveness, impulsivity, and family condition scale between 223 adolescents with violent behavior and 148 adolescents without violent behavior. The distribution of 5-HTT gene polymorphisms (5-HTTLPR and 5-HTTVNTR) was compared between the two groups. The role of 5-HTT gene polymorphism in adolescent personality, impulsion and aggression scale was also analyzed. Stepwise logistic regression was used to establish a predictive model for adolescent violent crime. Significant differences were found between the violence group and the control group on multiple dimensions of psychology and environment scales. However, no statistical difference was found with regard to the 5-HTT genotypes and alleles between adolescents with violent behaviors and normal controls. The rate of prediction accuracy was not significantly improved when 5-HTT gene polymorphism was taken into the model. The violent crime of adolescents was closely related to social and environmental factors. No association was found between 5-HTT polymorphisms and adolescent violent criminal behavior.
Context-sensitive network-based disease genetics prediction and its implications in drug discovery

PubMed Central

Chen, Yang; Xu, Rong

2017-01-01

Abstract Motivation: Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities, and therefore ignore the specific context in which two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction, and investigated the translational potential of the predicted genes in drug discovery. Results: We constructed CSNs by directly connecting diseases with associated phenotypes. Here, we constructed two CSNs using different data sources; the two networks contain 26 790 and 13 822 nodes respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built Similarity-Based disease Networks (SBN) using the same disease phenotype data. In a de novo cross validation for 3324 diseases, the CSN-based approach significantly increased the average rank from top 12.6 to top 8.8% for all tested genes compared with the SBN-based approach (p
Cloning and characterization of a mouse gene with homology to the human von Hippel-Lindau disease tumor suppressor gene: implications for the potential organization of the human von Hippel-Lindau disease gene.

PubMed

Gao, J; Naglich, J G; Laidlaw, J; Whaley, J M; Seizinger, B R; Kley, N

1995-02-15

The human von Hippel-Lindau disease (VHL) gene has recently been identified and, based on the nucleotide sequence of a partial cDNA clone, has been predicted to encode a novel protein with as yet unknown functions [F. Latif et al., Science (Washington DC), 260: 1317-1320, 1993]. The length of the encoded protein and the characteristics of the cellular expressed protein are as yet unclear. Here we report the cloning and characterization of a mouse gene (mVHLh1) that is widely expressed in different mouse tissues and shares high homology with the human VHL gene. It predicts a protein 181 residues long (and/or 162 amino acids, considering a potential alternative start codon), which across a core region of approximately 140 residues displays a high degree of sequence identity (98%) to the predicted human VHL protein. High stringency DNA and RNA hybridization experiments and protein expression analyses indicate that this gene is the most highly VHL-related mouse gene, suggesting that it represents the mouse VHL gene homologue rather than a related gene sharing a conserved functional domain. These findings provide new insights into the potential organization of the VHL gene and nature of its encoded protein.
Comparative analysis of protein interactome networks prioritizes candidate genes with cancer signatures.

PubMed

Li, Yongsheng; Sahni, Nidhi; Yi, Song

2016-11-29

Comprehensive understanding of human cancer mechanisms requires the identification of a thorough list of cancer-associated genes, which could serve as biomarkers for diagnoses and therapies in various types of cancer. Although substantial progress has been made in functional studies to uncover genes involved in cancer, these efforts are often time-consuming and costly. Therefore, it remains challenging to comprehensively identify cancer candidate genes. Network-based methods have accelerated this process through the analysis of complex molecular interactions in the cell. However, the extent to which various interactome networks can contribute to prediction of candidate genes responsible for cancer is still enigmatic. In this study, we evaluated different human protein-protein interactome networks and compared their application to cancer gene prioritization. Our results indicate that network analyses can increase the power to identify novel cancer genes. In particular, such predictive power can be enhanced with the use of unbiased systematic protein interaction maps for cancer gene prioritization. Functional analysis reveals that the top ranked genes from network predictions co-occur often with cancer-related terms in literature, and further, these candidate genes are indeed frequently mutated across cancers. Finally, our study suggests that integrating interactome networks with other omics datasets could provide novel insights into cancer-associated genes and underlying molecular mechanisms.
Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes.

PubMed

Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan

2017-10-03

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data for a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease genes distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes

PubMed Central

Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan

2017-01-01

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes. PMID:29108274
Predictive computation of genomic logic processing functions in embryonic development

PubMed Central

Peter, Isabelle S.; Faure, Emmanuel; Davidson, Eric H.

2012-01-01

Gene regulatory networks (GRNs) control the dynamic spatial patterns of regulatory gene expression in development. Thus, in principle, GRN models may provide system-level, causal explanations of developmental process. To test this assertion, we have transformed a relatively well-established GRN model into a predictive, dynamic Boolean computational model. This Boolean model computes spatial and temporal gene expression according to the regulatory logic and gene interactions specified in a GRN model for embryonic development in the sea urchin. Additional information input into the model included the progressive embryonic geometry and gene expression kinetics. The resulting model predicted gene expression patterns for a large number of individual regulatory genes each hour up to gastrulation (30 h) in four different spatial domains of the embryo. Direct comparison with experimental observations showed that the model predictively computed these patterns with remarkable spatial and temporal accuracy. In addition, we used this model to carry out in silico perturbations of regulatory functions and of embryonic spatial organization. The model computationally reproduced the altered developmental functions observed experimentally. Two major conclusions are that the starting GRN model contains sufficiently complete regulatory information to permit explanation of a complex developmental process of gene expression solely in terms of genomic regulatory code, and that the Boolean model provides a tool with which to test in silico regulatory circuitry and developmental perturbations. PMID:22927416
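The kind of Boolean computation described in the preceding abstract can be illustrated with a toy example. The sketch below is not the authors' sea urchin model: the three genes, their logic rules, and the function names are hypothetical, chosen only to show how a synchronous Boolean update and an in silico knockout might be expressed.

```python
from typing import Callable, Dict, FrozenSet, List

State = Dict[str, bool]

# Hypothetical three-gene circuit; gene names and rules are illustrative only.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: s["A"],                 # constant input, keeps its value
    "B": lambda s: s["A"] and not s["C"],  # activated by A, repressed by C
    "C": lambda s: s["B"],                 # activated by B
}

def step(state: State, knockouts: FrozenSet[str] = frozenset()) -> State:
    """One synchronous update: every gene reads the previous time step."""
    return {
        gene: False if gene in knockouts else rule(state)
        for gene, rule in RULES.items()
    }

def simulate(initial: State, hours: int,
             knockouts: FrozenSet[str] = frozenset()) -> List[State]:
    """Return the list of states from time 0 to `hours`, inclusive."""
    trajectory = [dict(initial)]
    for _ in range(hours):
        trajectory.append(step(trajectory[-1], knockouts))
    return trajectory

if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    for t, s in enumerate(simulate(start, 4)):
        print(t, s)
    # In silico perturbation: force gene B off at every step.
    for t, s in enumerate(simulate(start, 4, knockouts=frozenset({"B"}))):
        print("B knockout", t, s)
```

Comparing the unperturbed and knockout trajectories mirrors, in miniature, the idea of checking predicted expression patterns against perturbation experiments.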
Identification and Functional Analysis of the Nocardithiocin Gene Cluster in Nocardia pseudobrasiliensis

PubMed Central

Sakai, Kanae; Komaki, Hisayuki; Gonoi, Tohru

2015-01-01

Nocardithiocin is a thiopeptide compound isolated from the opportunistic pathogen Nocardia pseudobrasiliensis. It shows a strong activity against acid-fast bacteria and is also active against rifampicin-resistant Mycobacterium tuberculosis. Here, we report the identification of the nocardithiocin gene cluster in N. pseudobrasiliensis IFM 0761 based on conserved thiopeptide biosynthesis gene sequence and the whole genome sequence. The predicted gene cluster was confirmed by gene disruption and complementation. As expected, strains containing the disrupted gene did not produce nocardithiocin while gene complementation restored nocardithiocin production in these strains. The predicted cluster was further analyzed using RNA-seq which showed that the nocardithiocin gene cluster contains 12 genes within a 15.2-kb region. This finding will promote the improvement of nocardithiocin productivity and its derivatives production. PMID:26588225
Identification of rare paired box 3 variant in strabismus by whole exome sequencing

PubMed Central

Gong, Hui-Min; Wang, Jing; Xu, Jing; Zhou, Zhan-Yu; Li, Jing-Wen; Chen, Shu-Fang

2017-01-01

AIM To identify the potentially pathogenic gene variants that contribute to the etiology of strabismus. METHODS A Chinese pedigree with strabismus was collected and the exomes of two affected individuals were sequenced using next-generation sequencing technology. The resulting variants from exome sequencing were filtered by subsequent bioinformatics methods and the candidate mutation was verified as heterozygous in the affected proposita and her mother by Sanger sequencing. RESULTS Whole exome sequencing and filtering identified a nonsynonymous mutation c.434G>T transition in paired box 3 (PAX3) in the two affected individuals, which was predicted to be deleterious by more than 4 bioinformatics programs. This altered amino acid residue was located in the conserved PAX domain of PAX3. This gene encodes a member of the PAX family of transcription factors, which play critical roles during fetal development. Mutations in PAX3 were associated with Waardenburg syndrome with strabismus. CONCLUSION Our results report that the c.434G>T mutation (p.R145L) in PAX3 may contribute to strabismus, expanding our understanding of the causally relevant genes for this disorder. PMID:28861346
Polycomb-like 2 Associates with PRC2 and Regulates Transcriptional Networks during Mouse Embryonic Stem Cell Self-Renewal and Differentiation

PubMed Central

Walker, Emily; Chang, Wing Y.; Hunkapiller, Julie; Cagney, Gerard; Garcha, Kamal; Torchia, Joseph; Krogan, Nevan J.; Reiter, Jeremy F.; Stanford, William L.

2010-01-01

Summary Polycomb group (PcG) proteins are conserved epigenetic transcriptional repressors that control numerous developmental gene expression programs and have recently been implicated in modulating embryonic stem cell (ESC) fate. We identified the PcG protein PCL2 (polycomb-like 2) in a genome-wide screen for regulators of self-renewal and pluripotency and predicted that it would play an important role in mouse ESC fate determination. Using multiple biochemical strategies, we provide evidence that PCL2 is a Polycomb Repressive Complex 2 (PRC2)-associated protein in mouse ESCs. Knockdown of Pcl2 in ESCs resulted in heightened self-renewal characteristics, defects in differentiation and altered patterns of histone methylation. Integration of global gene expression and promoter occupancy analyses allowed us to identify PCL2 and PRC2 transcriptional targets and draft regulatory networks. We describe the role of PCL2 in both modulating transcription of ESC self-renewal genes in undifferentiated ESCs as well as developmental regulators during early commitment and differentiation. PMID:20144788
Genetic analysis of prolactin gene in Pakistani cattle.

PubMed

Uddin, Raza Mohy; Babar, Masroor Ellahi; Nadeem, Asif; Hussain, Tanveer; Ahmad, Shakil; Munir, Sadia; Mehboob, Riffat; Ahmad, Fridoon Jawad

2013-10-01

Prolactin (PRL) is a polypeptide hormone, secreted mainly by the anterior pituitary gland. It is involved in many endocrine activities. The key functions of PRL are related to reproduction and lactation in mammals. To ascertain the presence of polymorphisms in the bovine PRL gene (bPRL), the bPRL gene was sequenced. Five mutations were identified in the exonic region and eleven in associated intronic regions in 100 cattle from four Pakistani cattle breeds. The haplotype of predicted amino acid changes represents a common alteration at codon 222 from R-Arginine into K-Lysine in all four breeds. Significant statistical variations were observed in the distribution of single nucleotide polymorphisms (SNPs) in various cattle populations. However, on the basis of the present study, an association of these SNPs with milk performance traits in four Pakistani cow breeds cannot be truly replicated, but the SNPs can at least be effective DNA markers for some of the breeds studied. Linkage analysis between these SNPs on larger populations can be useful for the association with milk production traits. Furthermore, the present study may be used for marker-assisted selection and management in cattle breeding programs in local cattle breeds.
Identification of rare paired box 3 variant in strabismus by whole exome sequencing.

PubMed

Gong, Hui-Min; Wang, Jing; Xu, Jing; Zhou, Zhan-Yu; Li, Jing-Wen; Chen, Shu-Fang

2017-01-01

To identify the potentially pathogenic gene variants that contribute to the etiology of strabismus. A Chinese pedigree with strabismus was collected and the exomes of two affected individuals were sequenced using next-generation sequencing technology. The resulting variants from exome sequencing were filtered by subsequent bioinformatics methods and the candidate mutation was verified as heterozygous in the affected proposita and her mother by Sanger sequencing. Whole exome sequencing and filtering identified a nonsynonymous mutation c.434G>T transition in paired box 3 (PAX3) in the two affected individuals, which was predicted to be deleterious by more than 4 bioinformatics programs. This altered amino acid residue was located in the conserved PAX domain of PAX3. This gene encodes a member of the PAX family of transcription factors, which play critical roles during fetal development. Mutations in PAX3 were associated with Waardenburg syndrome with strabismus. Our results report that the c.434G>T mutation (p.R145L) in PAX3 may contribute to strabismus, expanding our understanding of the causally relevant genes for this disorder.
Transcriptomics of shading-induced and NAA-induced abscission in apple (Malus domestica) reveals a shared pathway involving reduced photosynthesis, alterations in carbohydrate transport and signaling and hormone crosstalk

PubMed Central

2011-01-01

Background Naphthaleneacetic acid (NAA), a synthetic auxin analogue, is widely used as an effective thinner in apple orchards. When applied shortly after fruit set, some fruit abscise leading to improved fruit size and quality. However, the thinning results of NAA are inconsistent and difficult to predict, sometimes leading to excess fruit drop or insufficient thinning which are costly to growers. This unpredictability reflects our incomplete understanding of the mode of action of NAA in promoting fruit abscission. Results Here we compared NAA-induced fruit drop with that caused by shading via gene expression profiling performed on the fruit abscission zone (FAZ), sampled 1, 3, and 5 d after treatment. More than 700 genes with significant changes in transcript abundance were identified from NAA-treated FAZ. Combining results from both treatments, we found that genes associated with photosynthesis, cell cycle and membrane/cellular trafficking were downregulated. On the other hand, there was up-regulation of genes related to ABA, ethylene biosynthesis and signaling, cell wall degradation and programmed cell death. While the differentially expressed gene sets for NAA and shading treatments shared only 25% identity, NAA and shading showed substantial similarity with respect to the classes of genes identified. Specifically, photosynthesis, carbon utilization, ABA and ethylene pathways were affected in both NAA- and shading-induced young fruit abscission. Moreover, we found that NAA, similar to shading, directly interfered with leaf photosynthesis by repressing photosystem II (PSII) efficiency within 10 minutes of treatment, suggesting that NAA and shading induced some of the same early responses due to reduced photosynthesis, which concurred with changes in hormone signaling pathways and triggered fruit abscission. 
Conclusions This study provides an extensive transcriptome study and a good platform for further investigation of possible regulatory genes involved in the induction of young fruit abscission in apple, which will enable us to better understand the mechanism of fruit thinning and facilitate the selection of potential chemicals for the thinning programs in apple. PMID:22003957

Transcriptomics of shading-induced and NAA-induced abscission in apple (Malus domestica) reveals a shared pathway involving reduced photosynthesis, alterations in carbohydrate transport and signaling and hormone crosstalk.

PubMed

Zhu, Hong; Dardick, Chris D; Beers, Eric P; Callanhan, Ann M; Xia, Rui; Yuan, Rongcai

2011-10-17

Naphthaleneacetic acid (NAA), a synthetic auxin analogue, is widely used as an effective thinner in apple orchards. When applied shortly after fruit set, some fruit abscise leading to improved fruit size and quality. However, the thinning results of NAA are inconsistent and difficult to predict, sometimes leading to excess fruit drop or insufficient thinning which are costly to growers. This unpredictability reflects our incomplete understanding of the mode of action of NAA in promoting fruit abscission. Here we compared NAA-induced fruit drop with that caused by shading via gene expression profiling performed on the fruit abscission zone (FAZ), sampled 1, 3, and 5 d after treatment. More than 700 genes with significant changes in transcript abundance were identified from NAA-treated FAZ. Combining results from both treatments, we found that genes associated with photosynthesis, cell cycle and membrane/cellular trafficking were downregulated. On the other hand, there was up-regulation of genes related to ABA, ethylene biosynthesis and signaling, cell wall degradation and programmed cell death. While the differentially expressed gene sets for NAA and shading treatments shared only 25% identity, NAA and shading showed substantial similarity with respect to the classes of genes identified. Specifically, photosynthesis, carbon utilization, ABA and ethylene pathways were affected in both NAA- and shading-induced young fruit abscission. Moreover, we found that NAA, similar to shading, directly interfered with leaf photosynthesis by repressing photosystem II (PSII) efficiency within 10 minutes of treatment, suggesting that NAA and shading induced some of the same early responses due to reduced photosynthesis, which concurred with changes in hormone signaling pathways and triggered fruit abscission. 
This study provides an extensive transcriptome study and a good platform for further investigation of possible regulatory genes involved in the induction of young fruit abscission in apple, which will enable us to better understand the mechanism of fruit thinning and facilitate the selection of potential chemicals for the thinning programs in apple.
A gene expression signature associated with survival in metastatic melanoma

PubMed Central

Mandruzzato, Susanna; Callegaro, Andrea; Turcatel, Gianluca; Francescato, Samuela; Montesco, Maria C; Chiarion-Sileni, Vanna; Mocellin, Simone; Rossi, Carlo R; Bicciato, Silvio; Wang, Ena; Marincola, Francesco M; Zanovello, Paola

2006-01-01

Background Current clinical and histopathological criteria used to define the prognosis of melanoma patients are inadequate for accurate prediction of clinical outcome. We investigated whether genome screening by means of high-throughput gene microarray might provide clinically useful information on patient survival. Methods Forty-three tumor tissues from 38 patients with stage III and stage IV melanoma were profiled with a 17,500 element cDNA microarray. Expression data were analyzed using significance analysis of microarrays (SAM) to identify genes associated with patient survival, and supervised principal components (SPC) to determine survival prediction. Results SAM analysis revealed a set of 80 probes, corresponding to 70 genes, associated with survival, i.e. 45 probes characterizing longer and 35 shorter survival times, respectively. These transcripts were included in a survival prediction model designed using SPC and cross-validation which allowed identifying 30 predicting probes out of the 80 associated with survival. Conclusion The longer-survival group of genes included those expressed in immune cells, both innate and acquired, confirming the interplay between immunological mechanisms and the natural history of melanoma. Genes linked to immune cells were totally lacking in the poor-survival group, which was instead associated with a number of genes related to highly proliferative and invasive tumor cells. PMID:17129373
Robust diagnosis of non-Hodgkin lymphoma phenotypes validated on gene expression data from different laboratories.

PubMed

Bhanot, Gyan; Alexe, Gabriela; Levine, Arnold J; Stolovitzky, Gustavo

2005-01-01

A major challenge in cancer diagnosis from microarray data is the need for robust, accurate classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers, is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.
A novel essential domain perspective for exploring gene essentiality.

PubMed

Lu, Yao; Lu, Yulan; Deng, Jingyuan; Peng, Hai; Lu, Hui; Lu, Long Jason

2015-09-15

Genes with indispensable functions are identified as essential; however, the traditional gene-level studies of essentiality have several limitations. In this study, we characterized gene essentiality from a new perspective of protein domains, the independent structural or functional units of a polypeptide chain. To identify such essential domains, we have developed an Expectation-Maximization (EM) algorithm-based Essential Domain Prediction (EDP) Model. With simulated datasets, the model provided convergent results given different initial values and offered accurate predictions even with noise. We then applied the EDP model to six microbial species and predicted 1879 domains to be essential in at least one species, ranging 10-23% in each species. The predicted essential domains were more conserved than either non-essential domains or essential genes. Comparing essential domains in prokaryotes and eukaryotes revealed an evolutionary distance consistent with that inferred from ribosomal RNA. When utilizing these essential domains to reproduce the annotation of essential genes, we received accurate results that suggest protein domains are more basic units for the essentiality of genes. Furthermore, we presented several examples to illustrate how the combination of essential and non-essential domains can lead to genes with divergent essentiality. In summary, we have described the first systematic analysis on gene essentiality on the level of domains.
Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

DOE PAGES

Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...

2015-03-12

The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
Sensitivity analysis of gene ranking methods in phenotype prediction.

PubMed

deAndrés-Galiana, Enrique J; Fernández-Martínez, Juan L; Sonis, Stephen T

2016-12-01

It has become clear that noise generated during the assay and analytical processes has the ability to disrupt accurate interpretation of genomic studies. Not only does such noise impact the scientific validity and costs of studies, but when assessed in the context of clinically translatable indications such as phenotype prediction, it can lead to inaccurate conclusions that could ultimately impact patients. We applied a sequence of ranking methods to damp noise associated with microarray outputs, and then tested the utility of the approach in three disease indications using publicly available datasets. This study was performed in three phases. We first theoretically analyzed the effect of noise in phenotype prediction problems showing that it can be expressed as a modeling error that partially falsifies the pathways. Secondly, via synthetic modeling, we performed the sensitivity analysis for the main gene ranking methods to different types of noise. Finally, we studied the predictive accuracy of the gene lists provided by these ranking methods in synthetic data and in three different datasets related to cancer, rare and neurodegenerative diseases to better understand the translational aspects of our findings. In the case of synthetic modeling, we showed that Fisher's Ratio (FR) was the most robust gene ranking method in terms of precision for all the types of noise at different levels. Significance Analysis of Microarrays (SAM) provided slightly lower performance and the rest of the methods (fold change, entropy and maximum percentile distance) were much less precise and accurate. The predictive accuracy of the smallest set of high discriminatory probes was similar for all the methods in the case of Gaussian and Log-Gaussian noise. In the case of class assignment noise, the predictive accuracy of SAM and FR is higher.
Finally, for real datasets (Chronic Lymphocytic Leukemia, Inclusion Body Myositis and Amyotrophic Lateral Sclerosis) we found that FR and SAM provided the highest predictive accuracies with the smallest number of genes. Biological pathways were found with an expanded list of genes whose discriminatory power has been established via FR. We have shown that noise in expression data and class assignment partially falsifies the sets of discriminatory probes in phenotype prediction problems. FR and SAM better exploit the principle of parsimony and are able to find subsets with a smaller number of highly discriminatory genes. The predictive accuracy and the precision are two different metrics to select the important genes, since in the presence of noise the most predictive genes do not completely coincide with those that are related to the phenotype. Based on the synthetic results, FR and SAM are recommended to unravel the biological pathways that are involved in the disease development.
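As a rough illustration of gene ranking by Fisher's Ratio, the sketch below scores each gene by the squared difference of class means divided by the sum of class variances, a commonly used form of the ratio. The abstract does not give the exact formula the authors used, so this definition, the function names, and the toy expression values are all assumptions.

```python
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple

def fisher_ratio(class1: Sequence[float], class2: Sequence[float]) -> float:
    """Assumed form: (mean1 - mean2)^2 / (var1 + var2), with sample variances."""
    numerator = (mean(class1) - mean(class2)) ** 2
    denominator = variance(class1) + variance(class2)
    if denominator == 0:
        # Both classes are constant: separable if the means differ at all.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator

def rank_genes(expression: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> List[str]:
    """Return gene names sorted from most to least discriminatory."""
    scores = {gene: fisher_ratio(c1, c2) for gene, (c1, c2) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # Hypothetical expression values for two phenotype classes.
    toy = {
        "geneX": ([1.0, 1.2, 0.8], [3.0, 3.2, 2.8]),  # well separated, low spread
        "geneY": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),  # overlapping classes
    }
    print(rank_genes(toy))
```

A gene scores highly when its class means are far apart relative to the within-class spread, which is one way to read the parsimony argument in the abstract above.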
Evaluation of a 30-gene paclitaxel, fluorouracil, doxorubicin and cyclophosphamide chemotherapy response predictor in a multicenter randomized trial in breast cancer

PubMed Central

Tabchy, Adel; Valero, Vicente; Vidaurre, Tatiana; Lluch, Ana; Gomez, Henry; Martin, Miguel; Qi, Yuan; Barajas-Figueroa, Luis Javier; Souchon, Eduardo; Coutant, Charles; Doimi, Franco D; Ibrahim, Nuhad K; Gong, Yun; Hortobagyi, Gabriel N; Hess, Kenneth R; Symmans, W Fraser; Pusztai, Lajos

2010-01-01

Purpose We examined in a prospective, randomized, international clinical trial the performance of a previously defined 30-gene predictor (DLDA-30) of pathologic complete response (pCR) to preoperative weekly paclitaxel and fluorouracil, doxorubicin, cyclophosphamide (T/FAC) chemotherapy, and assessed if DLDA-30 also predicts increased sensitivity to FAC-only chemotherapy. We compared the pCR rates after T/FAC versus FAC×6 preoperative chemotherapy. We also performed an exploratory analysis to identify novel candidate genes that differentially predict response in the two treatment arms. Experimental Design 273 patients were randomly assigned to receive either weekly paclitaxel × 12 followed by FAC × 4 (T/FAC, n=138), or FAC × 6 (n=135) neoadjuvant chemotherapy. All patients underwent a pretreatment FNA biopsy of the tumor for gene expression profiling and treatment response prediction. Results The pCR rates were 19% and 9% in the T/FAC and FAC arms, respectively (p<0.05). In the T/FAC arm, the positive predictive value (PPV) of the genomic predictor was 38% (95%CI:21–56%), the negative predictive value (NPV) 88% (CI:77–95%) and the AUC 0.711. In the FAC arm, the PPV was 9% (CI:1–29%) and the AUC 0.584. This suggests that the genomic predictor may have regimen-specificity. Its performance was similar to a clinical variable-based predictor nomogram. Conclusions Gene expression profiling for prospective response prediction was feasible in this international trial. The 30-gene predictor can identify patients with greater than average sensitivity to T/FAC chemotherapy. However, it captured molecular equivalents of clinical phenotype. Next generation predictive markers will need to be developed separately for different molecular subsets of breast cancers. PMID:20829329
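For readers less familiar with the metrics reported above, the following sketch shows how positive and negative predictive values follow from a 2×2 table of predicted versus observed response. The counts are hypothetical and are not taken from the trial; the function name is likewise an illustration.

```python
from typing import Optional, Tuple

def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[Optional[float], Optional[float]]:
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): fraction of predicted responders who respond.
    NPV = TN / (TN + FN): fraction of predicted non-responders who do not.
    A value is None when its denominator is zero.
    """
    ppv = tp / (tp + fp) if (tp + fp) > 0 else None
    npv = tn / (tn + fn) if (tn + fn) > 0 else None
    return ppv, npv

if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the trial's data).
    ppv, npv = predictive_values(tp=8, fp=12, tn=70, fn=10)
    print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Because both values depend on how common the outcome is in the cohort, a predictor can show a high NPV alongside a modest PPV when responses are rare, as with the pCR rates reported above.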
Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization.

PubMed

Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru

2007-01-01

The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).
The Breast Cancer to Bone (B2B) Metastases Research Program: a multi-disciplinary investigation of bone metastases from breast cancer.

PubMed

Brockton, Nigel T; Gill, Stephanie J; Laborge, Stephanie L; Paterson, Alexander H G; Cook, Linda S; Vogel, Hans J; Shemanko, Carrie S; Hanley, David A; Magliocco, Anthony M; Friedenreich, Christine M

2015-07-10

Bone is the most common site of breast cancer distant metastasis, affecting 50-70% of patients who develop metastatic disease. Despite decades of informative research, the effective prevention, prediction and treatment of these lesions remain elusive. The Breast Cancer to Bone (B2B) Metastases Research Program consists of a prospective cohort of incident breast cancer patients and four sub-projects that are investigating priority areas in breast cancer bone metastases. These include the impact of lifestyle factors and inflammation on risk of bone metastases, the gene expression features of the primary tumour, the potential role for metabolomics in early detection of bone metastatic disease and the signalling pathways that drive the metastatic lesions in the bone. The B2B Research Program is enrolling a prospective cohort of 600 newly diagnosed, incident, stage I-IIIc breast cancer survivors in Alberta, Canada over a five-year period. At baseline, pre-treatment/surgery blood samples are collected and detailed epidemiologic data are collected by in-person interview and self-administered questionnaires. Additional self-administered questionnaires and blood samples are completed at specified follow-up intervals (24, 48 and 72 months). Vital status is obtained prior to each follow-up through record linkages with the Alberta Cancer Registry. Recurrences are identified through medical chart abstractions. Each of the four projects applies specific methods and analyses to assess the impact of serum vitamin D and cytokine concentrations, tumour transcript and protein expression, serum metabolomic profiles and in vitro cell signalling on breast cancer bone metastases.
The B2B Research Program will address key issues in breast cancer bone metastases including the association between lifestyle factors (particularly a comprehensive assessment of vitamin D status), inflammation and bone metastases, the significance of primary tumour gene expression in tissue tropism, the potential of metabolomic profiles for risk assessment and early detection and the signalling pathways controlling the metastatic tumour microenvironment. There is substantial synergy between the four projects and it is hoped that this integrated program of research will advance our understanding of key aspects of bone metastases from breast cancer to improve the prevention, prediction, detection, and treatment of these lesions.
Genome-wide identification, evolutionary and expression analysis of the aspartic protease gene superfamily in grape

PubMed Central

2013-01-01

Background Aspartic proteases (APs) are a large family of proteolytic enzymes found in almost all organisms. In plants, they are involved in many biological processes, such as senescence, stress responses, programmed cell death, and reproduction. Prior to the present study, no grape AP gene(s) had been reported, and research on APs in woody species was very limited. Results In this study, a total of 50 AP genes (VvAP) were identified in the grape genome, among which 30 contained the complete ASP domain. Synteny analysis within grape indicated that segmental and tandem duplication events contributed to the expansion of the grape AP family. Additional analysis between grape and Arabidopsis demonstrated that several grape AP genes were found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the divergence of grape and Arabidopsis. Phylogenetic relationships of the 30 VvAPs with the complete ASP domain and their Arabidopsis orthologs, as well as their gene and protein features, were analyzed and their cellular localization was predicted. Moreover, expression profiles of VvAP genes in six different tissues were determined, and their transcript abundance under various stresses and hormone treatments was measured. Twenty-seven VvAP genes were expressed in at least one of the six tissues examined; nineteen VvAPs responded to at least one abiotic stress, 12 VvAPs responded to powdery mildew infection, and most of the VvAPs responded to SA and ABA treatments. Furthermore, integrated synteny and phylogenetic analysis identified orthologous AP genes between grape and Arabidopsis, providing a unique starting point for investigating the function of grape AP genes. Conclusions The genome-wide identification, evolutionary and expression analyses of grape AP genes provide a framework for future analysis of AP genes in defining their roles during stress response.
Integrated synteny and phylogenetic analyses provide novel insight into the functions of less well-studied genes using information from their better understood orthologs. PMID:23945092
The role of gene-gene interaction in the prediction of criminal behavior.

PubMed

Boutwell, Brian B; Menard, Scott; Barnes, J C; Beaver, Kevin M; Armstrong, Todd A; Boisvert, Danielle

2014-04-01

A host of research has examined the possibility that environmental risk factors might condition the influence of genes on various outcomes. Less research, however, has been aimed at exploring the possibility that genetic factors might interact to impact the emergence of human traits. Even fewer studies exist examining the interaction of genes in the prediction of behavioral outcomes. The current study expands this body of research by testing the interaction between genes involved in neural transmission. Our findings suggest that certain dopamine genes interact to increase the odds of criminogenic outcomes in a national sample of Americans. Copyright © 2014 Elsevier Inc. All rights reserved.
A novel gene expression-based prognostic scoring system to predict survival in gastric cancer

DOE PAGES

Wang, Pin; Wang, Yunshan; Hang, Bo; ...

2016-07-11

Analysis of gene expression patterns in gastric cancer (GC) can help to identify a comprehensive panel of gene biomarkers for predicting clinical outcomes and to discover potential new therapeutic targets. Here, a multi-step bioinformatics analytic approach was developed to establish a novel prognostic scoring system for GC. We first identified 276 genes that were robustly differentially expressed between normal and GC tissues, of which, 249 were found to be significantly associated with overall survival (OS) by univariate Cox regression analysis. The biological functions of 249 genes are related to cell cycle, RNA/ncRNA process, acetylation and extracellular matrix organization. A network was generated for view of the gene expression architecture of 249 genes in 265 GCs. Finally, we applied a canonical discriminant analysis approach to identify a 53-gene signature and a prognostic scoring system was established based on a canonical discriminant function of 53 genes. The prognostic scores strongly predicted patients with GC to have either a poor or good OS. Our study raises the prospect that the practicality of GC patient prognosis can be assessed by this prognostic scoring system.
Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function

PubMed Central

Tian, Weidong; Zhang, Lan V; Taşan, Murat; Gibbons, Francis D; King, Oliver D; Park, Julie; Wunderlich, Zeba; Cherry, J Michael; Roth, Frederick P

2008-01-01

Background: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships. Results: We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships. Conclusion: Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions. PMID:18613951
Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.

PubMed

Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias

2015-06-25

Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
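The random division into training and validation sets described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the sample IDs, the 50/50 split ratio, the fixed seed, and the function names `split_cohort` and `accuracy` are all assumptions made for the example; the abstract does not state the actual split proportions or evaluation metric.

```python
import random


def split_cohort(sample_ids, train_fraction=0.5, seed=0):
    """Randomly partition sample IDs into training and validation sets.

    train_fraction and seed are illustrative assumptions, not values
    taken from the study.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, observed):
    """Fraction of samples whose predicted endpoint equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical IDs standing in for the 498 primary neuroblastomas.
samples = [f"NB{i:03d}" for i in range(1, 499)]
train, valid = split_cohort(samples)
```

A model would be fitted on `train` only and scored on `valid`, so that the reported accuracy reflects samples the model has not seen.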
Phenome-driven disease genetics prediction toward drug discovery

PubMed Central

Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

2015-01-01

Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. Results: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e−4) and 81.3% (P < e−12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn’s disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn’s disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn’s disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. 
Availability and implementation: nlp.case.edu/public/data/DMN Contact: rxx@case.edu PMID:26072493
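The area under the curve reported above is the area under the ROC curve, which can be computed as the probability that a randomly chosen true disease gene is scored above a randomly chosen non-disease gene. The sketch below shows that calculation under stated assumptions: the function name `auc` and the toy scores and labels are hypothetical and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): two true genes, three non-genes.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)  # 5 of 6 pairs ranked correctly
```

The pairwise form is quadratic in the number of genes; it is used here only because it makes the definition explicit.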
Brain white matter structure and COMT gene are linked to second-language learning in adults

PubMed Central

Mamiya, Ping C.; Richards, Todd L.; Coe, Bradley P.; Eichler, Evan E.; Kuhl, Patricia K.

2016-01-01

Adult human brains retain the capacity to undergo tissue reorganization during second-language learning. Brain-imaging studies show a relationship between neuroanatomical properties and learning for adults exposed to a second language. However, the role of genetic factors in this relationship has not been investigated. The goal of the current study was twofold: (i) to characterize the relationship between brain white matter fiber-tract properties and second-language immersion using diffusion tensor imaging, and (ii) to determine whether polymorphisms in the catechol-O-methyltransferase (COMT) gene affect the relationship. We recruited incoming Chinese students enrolled in the University of Washington and scanned their brains one time. We measured the diffusion properties of the white matter fiber tracts and correlated them with the number of days each student had been in the immersion program at the time of the brain scan. We found that higher numbers of days in the English immersion program correlated with higher fractional anisotropy and lower radial diffusivity in the right superior longitudinal fasciculus. We show that fractional anisotropy declined once the subjects finished the immersion program. The relationship between brain white matter fiber-tract properties and immersion varied in subjects with different COMT genotypes. Subjects with the Methionine (Met)/Valine (Val) and Val/Val genotypes showed higher fractional anisotropy and lower radial diffusivity during immersion, which reversed immediately after immersion ended, whereas those with the Met/Met genotype did not show these relationships. Statistical modeling revealed that subjects’ grades in the language immersion program were best predicted by fractional anisotropy and COMT genotype. PMID:27298360
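The correlation between days in the immersion program and fractional anisotropy can be illustrated with a Pearson correlation coefficient. This is only a sketch: the abstract does not name the correlation statistic used, and the `days` and `fa` values below are invented for illustration, not measurements from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values: days in immersion vs. fractional anisotropy.
days = [10, 20, 30, 40, 50]
fa = [0.40, 0.42, 0.43, 0.46, 0.47]
r = pearson_r(days, fa)
```

A positive `r` corresponds to the direction reported in the abstract: more days in immersion, higher fractional anisotropy.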
Brain white matter structure and COMT gene are linked to second-language learning in adults.

PubMed

Mamiya, Ping C; Richards, Todd L; Coe, Bradley P; Eichler, Evan E; Kuhl, Patricia K

2016-06-28

Adult human brains retain the capacity to undergo tissue reorganization during second-language learning. Brain-imaging studies show a relationship between neuroanatomical properties and learning for adults exposed to a second language. However, the role of genetic factors in this relationship has not been investigated. The goal of the current study was twofold: (i) to characterize the relationship between brain white matter fiber-tract properties and second-language immersion using diffusion tensor imaging, and (ii) to determine whether polymorphisms in the catechol-O-methyltransferase (COMT) gene affect the relationship. We recruited incoming Chinese students enrolled in the University of Washington and scanned their brains one time. We measured the diffusion properties of the white matter fiber tracts and correlated them with the number of days each student had been in the immersion program at the time of the brain scan. We found that higher numbers of days in the English immersion program correlated with higher fractional anisotropy and lower radial diffusivity in the right superior longitudinal fasciculus. We show that fractional anisotropy declined once the subjects finished the immersion program. The relationship between brain white matter fiber-tract properties and immersion varied in subjects with different COMT genotypes. Subjects with the Methionine (Met)/Valine (Val) and Val/Val genotypes showed higher fractional anisotropy and lower radial diffusivity during immersion, which reversed immediately after immersion ended, whereas those with the Met/Met genotype did not show these relationships. Statistical modeling revealed that subjects' grades in the language immersion program were best predicted by fractional anisotropy and COMT genotype.
Identification of human microRNA targets from isolated argonaute protein complexes.

PubMed

Beitzinger, Michaela; Peters, Lasse; Zhu, Jia Yun; Kremmer, Elisabeth; Meister, Gunter

2007-06-01

MicroRNAs (miRNAs) constitute a class of small non-coding RNAs that regulate gene expression on the level of translation and/or mRNA stability. Mammalian miRNAs associate with members of the Argonaute (Ago) protein family and bind to partially complementary sequences in the 3' untranslated region (UTR) of specific target mRNAs. Computer algorithms based on factors such as free binding energy or sequence conservation have been used to predict miRNA target mRNAs. Based on such predictions, up to one third of all mammalian mRNAs seem to be under miRNA regulation. However, due to the low degree of complementarity between the miRNA and its target, such computer programs are often imprecise and therefore not very reliable. Here we report the first biochemical identification approach of miRNA targets from human cells. Using highly specific monoclonal antibodies against members of the Ago protein family, we co-immunoprecipitate Ago-bound mRNAs and identify them by cloning. Interestingly, most of the identified targets are also predicted by different computer programs. Moreover, we randomly analyzed six different target candidates and were able to experimentally validate five as miRNA targets. Our data clearly indicate that miRNA targets can be experimentally identified from Ago complexes and therefore provide a new tool to directly analyze miRNA function.
The Significance of the PD-L1 Expression in Non-Small-Cell Lung Cancer: Trenchant Double Swords as Predictive and Prognostic Markers.

PubMed

Takada, Kazuki; Toyokawa, Gouji; Shoji, Fumihiro; Okamoto, Tatsuro; Maehara, Yoshihiko

2018-03-01

Lung cancer is the leading cause of death due to cancer worldwide. Surgery, chemotherapy, and radiotherapy have been the standard treatment for lung cancer, and targeted molecular therapy has greatly improved the clinical course of patients with non-small-cell lung cancer (NSCLC) harboring driver mutations, such as in epidermal growth factor receptor and anaplastic lymphoma kinase genes. Despite advances in such therapies, the prognosis of patients with NSCLC without driver oncogene mutations remains poor. Immunotherapy targeting programmed cell death-1 (PD-1) and programmed cell death-ligand 1 (PD-L1) has recently been shown to improve the survival in advanced NSCLC. The PD-L1 expression on the surface of tumor cells has emerged as a potential biomarker for predicting responses to immunotherapy and prognosis after surgery in NSCLC. However, the utility of PD-L1 expression as a predictive and prognostic biomarker remains controversial because of the existence of various PD-L1 antibodies, scoring systems, and positivity cutoffs. In this review, we summarize the data from representative clinical trials of PD-1/PD-L1 immune checkpoint inhibitors in NSCLC and previous reports on the association between PD-L1 expression and clinical outcomes in patients with NSCLC. Furthermore, we discuss the future perspectives of immunotherapy and immune checkpoint factors. Copyright © 2017 Elsevier Inc. All rights reserved.

The AGT Gene M235T Polymorphism and Response of Power-Related Variables to Aerobic Training.

PubMed

Aleksandra, Zarębska; Zbigniew, Jastrzębski; Waldemar, Moska; Agata, Leońska-Duniec; Mariusz, Kaczmarczyk; Marek, Sawczuk; Agnieszka, Maciejewska-Skrendo; Piotr, Żmijewski; Krzysztof, Ficek; Grzegorz, Trybek; Ewelina, Lulińska-Kuklik; Semenova, Ekaterina A; Ahmetov, Ildus I; Paweł, Cięszczyk

2016-12-01

The C allele of the M235T (rs699) polymorphism of the AGT gene correlates with higher levels of angiotensin II and has been associated with power and strength sport performance. The aim of the study was to investigate whether or not selected power-related variables and their response to a 12-week program of aerobic dance training are modulated by the AGT M235T genotype in healthy participants. Two hundred and one Polish Caucasian women aged 21 ± 1 years met the inclusion criteria and were included in the study. All women completed a 12-week program of low and high impact aerobics. Wingate peak power and total work capacity, 5 m, 10 m, and 30 m running times and jump height and jump power were determined before and after the training programme. All power-related variables improved significantly in response to aerobic dance training. We found a significant association between the M235T polymorphism and jump-based variables (squat jump (SJ) height, p = 0.005; SJ power, p = 0.015; countermovement jump height, p = 0.025; average of 10 countermovement jumps with arm swing (ACMJ) height, p = 0.001; ACMJ power, p = 0.035). Specifically, greater improvements were observed in the C allele carriers in comparison with TT homozygotes. In conclusion, aerobic dance, one of the most commonly practiced adult fitness activities in the world, provides sufficient training stimuli for augmenting the explosive strength necessary to increase vertical jump performance. The AGT gene M235T polymorphism seems to be not only a candidate gene variant for power/strength related phenotypes, but also a genetic marker for predicting response to training.
Functional analysis and transcriptional output of the Göttingen minipig genome.

PubMed

Heckel, Tobias; Schmucki, Roland; Berrera, Marco; Ringshandl, Stephan; Badi, Laura; Steiner, Guido; Ravon, Morgane; Küng, Erich; Kuhn, Bernd; Kratochwil, Nicole A; Schmitt, Georg; Kiialainen, Anna; Nowaczyk, Corinne; Daff, Hamina; Khan, Azinwi Phina; Lekolool, Isaac; Pelle, Roger; Okoth, Edward; Bishop, Richard; Daubenberger, Claudia; Ebeling, Martin; Certa, Ulrich

2015-11-14

In the past decade the Göttingen minipig has gained increasing recognition as an animal model in pharmaceutical and safety research because it recapitulates many aspects of human physiology and metabolism. Genome-based comparison of drug targets together with quantitative tissue expression analysis allows rational prediction of pharmacology and cross-reactivity of human drugs in animal models, thereby reducing drug attrition, which is an important challenge in the process of drug development. Here we present a new chromosome-level version of the Göttingen minipig genome together with a comparative transcriptional analysis of tissues with pharmaceutical relevance as a basis for translational research. We relied on mapping and assembly of WGS (whole-genome-shotgun sequencing) derived reads to the reference genome of the Duroc pig and predict 19,228 human orthologous protein-coding genes. Genome-based prediction of the sequence of human drug targets enables the prediction of drug cross-reactivity based on conservation of binding sites. We further support the finding that the genome of Sus scrofa contains about ten times fewer pseudogenized genes than other vertebrates. Among the functional human orthologs of these minipig pseudogenes we found HEPN1, a putative tumor suppressor gene. The genomes of Sus scrofa, the Tibetan boar, the African Bushpig, and the Warthog show sequence conservation of all inactivating HEPN1 mutations, suggesting disruption before the evolutionary split of these pig species. We identify 133 Sus scrofa specific, conserved long non-coding RNAs (lncRNAs) in the minipig genome and show that these transcripts are highly conserved in the African pigs and the Tibetan boar, suggesting functional significance. Using a new minipig specific microarray we show high conservation of gene expression signatures in 13 tissues with biomedical relevance between humans and adult minipigs.
We underline this relationship for minipig and human liver where we could demonstrate similar expression levels for most phase I drug-metabolizing enzymes. Higher expression levels and metabolic activities were found for FMO1, AKR/CRs and for phase II drug metabolizing enzymes in minipig as compared to human. The variability of gene expression in equivalent human and minipig tissues is considerably higher in minipig organs, which is important for study design in case a human target belongs to this variable category in the minipig. The first analysis of gene expression in multiple tissues during development from young to adult shows that the majority of transcriptional programs are concluded four weeks after birth. This finding is in line with the advanced state of human postnatal organ development at comparable age categories and further supports the minipig as a model for pediatric drug safety studies. Genome-based assessment of sequence conservation combined with gene expression data in several tissues improves the translational value of the minipig for human drug development. The genome and gene expression data presented here are important resources for researchers using the minipig as a model for biomedical research or commercial breeding. The potential impact of our data for comparative genomics, translational research, and experimental medicine is discussed.
Cooperative gene regulation by microRNA pairs and their identification using a computational workflow

PubMed Central

Schmitz, Ulf; Lai, Xin; Winter, Felix; Wolkenhauer, Olaf; Vera, Julio; Gupta, Shailendra K.

2014-01-01

MicroRNAs (miRNAs) are an integral part of gene regulation at the post-transcriptional level. Recently, it has been shown that pairs of miRNAs can repress the translation of a target mRNA in a cooperative manner, which leads to an enhanced effectiveness and specificity in target repression. However, it remains unclear which miRNA pairs can synergize and which genes are targets of cooperative miRNA regulation. In this paper, we present a computational workflow for the prediction and analysis of cooperating miRNAs and their mutual target genes, which we refer to as RNA triplexes. The workflow integrates methods of miRNA target prediction, triplex structure analysis, molecular dynamics simulations and mathematical modeling for a reliable prediction of functional RNA triplexes and target repression efficiency. In a case study we analyzed the human genome and identified several thousand targets of cooperative gene regulation. Our results suggest that miRNA cooperativity is a frequent mechanism for an enhanced target repression by pairs of miRNAs facilitating distinctive and fine-tuned target gene expression patterns. Human RNA triplexes predicted and characterized in this study are organized in a web resource at www.sbi.uni-rostock.de/triplexrna/. PMID:24875477
Explaining the disease phenotype of intergenic SNP through predicted long range regulation

PubMed Central

Chen, Jingqi; Tian, Weidong

2016-01-01

Thousands of disease-associated SNPs (daSNPs) are located in intergenic regions (IGR), making it difficult to understand their association with disease phenotypes. Recent analysis found that non-coding daSNPs were frequently located in or proximal to regulatory elements, inspiring us to try to explain the disease phenotypes of IGR daSNPs through nearby regulatory sequences. Hence, after locating the nearest distal regulatory element (DRE) to a given IGR daSNP, we applied a computational method named INTREPID to predict the target genes regulated by the DRE, and then investigated their functional relevance to the IGR daSNP's disease phenotypes. 36.8% of all IGR daSNP-disease phenotype associations investigated were possibly explainable through the predicted target genes, which were enriched with, were functionally relevant to, or consisted of the corresponding disease genes. This proportion could be further increased to 60.5% if the LD SNPs of daSNPs were also considered. Furthermore, the predicted SNP-target gene pairs were enriched with known eQTL/mQTL SNP-gene relationships. Overall, IGR daSNPs likely contribute to disease phenotypes by interfering with the regulatory function of their nearby DREs and causing abnormal expression of disease genes. PMID:27280978
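The first step described above, locating the nearest DRE to an intergenic SNP, can be sketched as a nearest-neighbour lookup on sorted genomic coordinates. This is an illustration under assumptions, not the authors' implementation: each DRE is represented by a single midpoint on the same chromosome as the SNP, and the function name `nearest_element` and the coordinates are hypothetical. INTREPID itself is not reproduced here.

```python
from bisect import bisect_left


def nearest_element(snp_pos, element_midpoints):
    """Return the element midpoint closest to snp_pos.

    element_midpoints must be sorted ascending and lie on the same
    chromosome as the SNP. On an exact tie the upstream (smaller)
    coordinate is returned.
    """
    if not element_midpoints:
        raise ValueError("no regulatory elements supplied")
    i = bisect_left(element_midpoints, snp_pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = element_midpoints[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda m: abs(m - snp_pos))


# Hypothetical DRE midpoints (base-pair coordinates on one chromosome).
dre_midpoints = [1000, 5000, 9000]
closest = nearest_element(5600, dre_midpoints)
```

Binary search keeps each lookup logarithmic in the number of elements, which matters when thousands of SNPs are queried against a genome-wide element catalogue.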
Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

PubMed

Held, Elizabeth; Cape, Joshua; Tintle, Nathan

2016-01-01

Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Multiple biomarkers in molecular oncology. II. Molecular diagnostics applications in breast cancer management.

PubMed

Malinowski, Douglas P

2007-05-01

In recent years, the application of genomic and proteomic technologies to the problem of breast cancer prognosis and the prediction of therapy response has begun to yield encouraging results. Independent studies employing transcriptional profiling of primary breast cancer specimens using DNA microarrays have identified gene expression profiles that correlate with clinical outcome in primary breast biopsy specimens. Recent advances in microarray technology have demonstrated reproducibility, making clinical applications more achievable. In this regard, one such DNA microarray device based upon a 70-gene expression signature was recently cleared by the US FDA for application to breast cancer prognosis. These DNA microarrays often employ at least 70 gene targets for transcriptional profiling and prognostic assessment in breast cancer. The use of PCR-based methods utilizing a small subset of genes has recently demonstrated the ability to predict the clinical outcome in early-stage breast cancer. Furthermore, protein-based immunohistochemistry methods have progressed from using gene clusters and gene expression profiling to smaller subsets of expressed proteins to predict prognosis in early-stage breast cancer. Beyond prognostic applications, DNA microarray-based transcriptional profiling has demonstrated the ability to predict response to chemotherapy in early-stage breast cancer patients. In this review, recent advances in the use of multiple markers for prognosis of disease recurrence in early-stage breast cancer and the prediction of therapy response will be discussed.
Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lan, Yemin; Rosen, Gail; Hershberg, Ruth

The 16S rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16S rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16S rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that the percent identity of 16S rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16S rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.
Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains

DOE PAGES

Lan, Yemin; Rosen, Gail; Hershberg, Ruth

2016-05-03

The 16S rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16S rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16S rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that the percent identity of 16S rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16S rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.
Breeding and Genetics Symposium: networks and pathways to guide genomic selection.

PubMed

Snelling, W M; Cushman, R A; Keele, J W; Maltecca, C; Thomas, M G; Fortes, M R S; Reverter, A

2013-02-01

Many traits affecting profitability and sustainability of meat, milk, and fiber production are polygenic, with no single gene having an overwhelming influence on observed variation. No knowledge of the specific genes controlling these traits has been needed to make substantial improvement through selection. Significant gains have been made through phenotypic selection enhanced by pedigree relationships and continually improving statistical methodology. Genomic selection, recently enabled by assays for dense SNP located throughout the genome, promises to increase selection accuracy and accelerate genetic improvement by emphasizing the SNP most strongly correlated to phenotype although the genes and sequence variants affecting phenotype remain largely unknown. These genomic predictions theoretically rely on linkage disequilibrium (LD) between genotyped SNP and unknown functional variants, but familial linkage may increase effectiveness when predicting individuals related to those in the training data. Genomic selection with functional SNP genotypes should be less reliant on LD patterns shared by training and target populations, possibly allowing robust prediction across unrelated populations. Although the specific variants causing polygenic variation may never be known with certainty, a number of tools and resources can be used to identify those most likely to affect phenotype. Associations of dense SNP genotypes with phenotype provide a 1-dimensional approach for identifying genes affecting specific traits; in contrast, associations with multiple traits allow defining networks of genes interacting to affect correlated traits. Such networks are especially compelling when corroborated by existing functional annotation and established molecular pathways. The SNP occurring within network genes, obtained from public databases or derived from genome and transcriptome sequences, may be classified according to expected effects on gene products. 
As illustrated by functionally informed genomic predictions being more accurate than naive whole-genome predictions of beef tenderness, coupling evidence from livestock genotypes, phenotypes, gene expression, and genomic variants with existing knowledge of gene functions and interactions may provide greater insight into the genes and genomic mechanisms affecting polygenic traits and facilitate functional genomic selection for economically important traits.
GeneBee-net: Internet-based server for analyzing biopolymers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brodsky, L.I.; Ivanov, V.V.; Nikolaev, V.K.

This work describes a network server for searching databanks of biopolymer structures and performing other biocomputing procedures; it is available via direct Internet connection. Basic server procedures are dedicated to homology (similarity) search of sequence and 3D structure of proteins. The homologies found could be used to build multiple alignments, predict protein and RNA secondary structure, and construct phylogenetic trees. In addition to traditional methods of sequence similarity search, the authors propose "non-matrix" (correlational) search. An analogous approach is used to identify regions of similar tertiary structure of proteins. Algorithm concepts and usage examples are presented for new methods. Service logic is based upon interaction of a client program and server procedures. The client program allows the compilation of queries and the processing of results of an analysis.
Prediction of regulatory gene pairs using dynamic time warping and gene ontology.

PubMed

Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K

2014-01-01

Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.
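As a rough illustration of the dynamic time warping step mentioned in this abstract, the sketch below computes the classic DTW distance between two expression time series. It is a minimal sketch only: the abstract does not define the authors' DTW-GO distance, so the Gene Ontology term and the KNN imputation are not modelled here, and the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Plain dynamic time warping distance between two numeric sequences.

    Hypothetical helper for illustration; uses |x - y| as the local cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two toy expression profiles with the same shape but a stretched plateau.
profile_x = [1.0, 2.0, 3.0]
profile_y = [1.0, 2.0, 2.0, 3.0]
shifted_distance = dtw_distance(profile_x, profile_y)
```

Because DTW may align one time point with several, the stretched profile above is at distance zero from the original, which is the property that makes the measure attractive for time series sampled at different rates.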
GenePRIMP: A Gene Prediction Improvement Pipeline For Prokaryotic Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kyrpides, Nikos C.; Ivanova, Natalia N.; Pati, Amrita

2010-07-08

GenePRIMP (Gene Prediction Improvement Pipeline, http://geneprimp.jgi-psf.org) is a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missing genes, and split genes. We show that manual curation of gene models using the anomaly reports generated by GenePRIMP improves their quality and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome sequencing and annotation technologies. Keywords in context: Gene model, Quality Control, Translation start sites, Automatic correction. Hardware requirements: PC, MAC; Operating System: UNIX/LINUX; Compiler/Version: Perl 5.8.5 or higher; Special requirements: NCBI Blast and nr installation; File Types: Source Code, Executable module(s), Sample problem input data; installation instructions other; programmer documentation. Location/transmission: http://geneprimp.jgi-psf.org/gp.tar.gz
Annotation of gene function in citrus using gene expression information and co-expression networks

PubMed Central

2014-01-01

Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. 
Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provide opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for the gene co-expression analysis in citrus. PMID:25023870
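The guide-gene, "guilt-by-association" idea described in this abstract can be sketched as follows: correlate a guide gene's expression profile with every other gene and keep the strongly co-expressed ones. This is a minimal sketch under stated assumptions. The abstract does not say which similarity measure or cut-off NICCE uses, so Pearson correlation and the 0.9 threshold are assumptions, and the gene names and values are invented toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_neighbours(profiles, guide, threshold=0.9):
    """Genes whose profile correlates with the guide gene at or above threshold."""
    reference = profiles[guide]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != guide and pearson(reference, profile) >= threshold
    )


# Hypothetical expression values across four conditions.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],  # tracks geneA closely
    "geneC": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with geneA
    "geneD": [1.0, 3.0, 2.0, 4.0],  # loosely correlated (r = 0.8)
}
strict = coexpression_neighbours(toy_profiles, "geneA")
relaxed = coexpression_neighbours(toy_profiles, "geneA", threshold=0.75)
```

Lowering the threshold admits more neighbours for the guide gene, so the cut-off trades sensitivity against the specificity of any function inferred from the neighbourhood.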
MethPrimer: designing primers for methylation PCRs.

PubMed

Li, Long-Cheng; Dahiya, Rajvir

2002-11-01

DNA methylation is an epigenetic mechanism of gene regulation. Bisulfite-conversion-based PCR methods, such as bisulfite sequencing PCR (BSP) and methylation specific PCR (MSP), remain the most commonly used techniques for methylation mapping. Existing primer design programs developed for standard PCR cannot handle primer design for bisulfite-conversion-based PCRs due to changes in DNA sequence context caused by bisulfite treatment and many special constraints both on the primers and the region to be amplified for such experiments. Therefore, the present study was designed to develop a program for such applications. MethPrimer, based on Primer 3, is a program for designing PCR primers for methylation mapping. It first takes a DNA sequence as its input and searches the sequence for potential CpG islands. Primers are then picked around the predicted CpG islands or around regions specified by users. MethPrimer can design primers for BSP and MSP. Results of primer selection are delivered through a web browser in text and in graphic view.
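To make the sequence-context change concrete, the sketch below performs an in silico bisulfite conversion (unmethylated C read as T) and computes two statistics commonly used when screening for CpG islands: GC fraction and the observed/expected CpG ratio. It is an illustrative sketch, not MethPrimer's code. The abstract does not state the program's CpG island criteria, so the statistics chosen here, the function names, and the simplifying assumption that every CpG cytosine is either fully methylated or fully unmethylated are all assumptions.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """Simulate bisulfite treatment on one strand.

    Unmethylated cytosines are read as T after PCR. If methylated_cpg is
    True, cytosines in a CpG context are treated as methylated and kept.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (in_cpg and methylated_cpg):
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c_count, g_count = seq.count("C"), seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg_count * n) / (c_count * g_count) if c_count and g_count else 0.0
    return gc_fraction, obs_exp


template = "ACGTCCGA"
methylated_read = bisulfite_convert(template)                          # CpG cytosines kept
unmethylated_read = bisulfite_convert(template, methylated_cpg=False)  # every C converted
```

The two converted reads differ only at CpG positions, which is the difference that methylation-specific primers are designed to discriminate.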
Comparative Evaluation of Two Serial Gene Expression Experiments | Division of Cancer Prevention

Cancer.gov

Stuart G. Baker, 2014. Introduction: This program fits biologically relevant response curves in the comparative analysis of two gene expression experiments that involve the same genes under different scenarios, with at least 12 responses. The program outputs gene pairs with biologically relevant response curve shapes, including flat, linear, sigmoid, hockey stick, impulse, and step.
General statistics of stochastic process of gene expression in eukaryotic cells.

PubMed Central

Kuznetsov, V A; Knott, G D; Bonner, R F

2002-01-01

Thousands of genes are expressed at such very low levels (≤1 copy per cell) that global gene expression analysis of rarer transcripts remains problematic. Ambiguity in identification of rarer transcripts creates considerable uncertainty in fundamental questions such as the total number of genes expressed in an organism and the biological significance of rarer transcripts. Knowing the distribution of the true number of genes expressed at each level and the corresponding gene expression level probability function (GELPF) could help resolve these uncertainties. We found that all observed large-scale gene expression data sets in yeast, mouse, and human cells follow a Pareto-like distribution model skewed by many low-abundance transcripts. A novel stochastic model of the gene expression process predicts the universality of the GELPF both across different cell types within a multicellular organism and across different organisms. This model allows us to predict the frequency distribution of all gene expression levels within a single cell and to estimate the number of expressed genes in a single cell and in a population of cells. A random "basal" transcription mechanism for protein-coding genes in all or almost all eukaryotic cell types is predicted. This fundamental mechanism might enhance the expression of rarely expressed genes and, thus, provide a basic level of phenotypic diversity, adaptability, and random monoallelic expression in cell populations. PMID:12136033
Selection of Valid Reference Genes for Reverse Transcription Quantitative PCR Analysis in Heliconius numata (Lepidoptera: Nymphalidae)

PubMed Central

Chouteau, Mathieu; Whibley, Annabel; Joron, Mathieu; Llaurens, Violaine

2016-01-01

Identifying the genetic basis of adaptive variation is challenging in non-model organisms, and quantitative real-time PCR is a useful tool for validating predictions regarding the expression of candidate genes. However, comparing expression levels in different conditions requires rigorous experimental design and statistical analyses. Here, we focused on the neotropical passion-vine butterflies Heliconius, non-model species studied in evolutionary biology for their adaptive variation in wing color patterns involved in mimicry and in the signaling of their toxicity to predators. We aimed at selecting stable reference genes to be used for normalization of gene expression data in RT-qPCR analyses from developing wing discs according to the minimal guidelines described in Minimum Information for publication of Quantitative Real-Time PCR Experiments (MIQE). To design internal RT-qPCR controls, we studied the stability of expression of nine candidate reference genes (actin, annexin, eF1α, FK506BP, PolyABP, PolyUBQ, RpL3, RPS3A, and tubulin) at two developmental stages (prepupal and pupal) using three widely used programs (GeNorm, NormFinder and BestKeeper). Results showed that, despite differences in statistical methods, genes RpL3, eF1α, polyABP, and annexin were stably expressed in wing discs in late larval and pupal stages of Heliconius numata. This combination of genes may be used as a reference for a reliable study of differential expression in wings, for instance, for genes involved in important phenotypic variation, such as wing color pattern variation. Through this example, we provide general useful technical recommendations as well as relevant statistical strategies for evolutionary biologists aiming to identify candidate genes involved in adaptive variation in non-model organisms. PMID:27271971
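For readers unfamiliar with how reference-gene stability is scored, the sketch below computes a geNorm-style stability measure M: for each candidate gene, the average over all other candidates of the standard deviation of their log2 expression ratios across samples. Lower M suggests more stable expression. This is a minimal sketch of the measure as commonly described, not the GeNorm program itself, and the abstract does not describe the computation. The function name `stability_m`, the gene labels, and the toy quantities are hypothetical.

```python
import math
import statistics


def stability_m(expr):
    """geNorm-style stability measure per gene.

    expr maps gene name -> list of positive relative quantities,
    with samples in the same order for every gene.
    """
    genes = list(expr)
    scores = {}
    for gene_j in genes:
        pairwise_sd = []
        for gene_k in genes:
            if gene_k == gene_j:
                continue
            log_ratios = [
                math.log2(a / b) for a, b in zip(expr[gene_j], expr[gene_k])
            ]
            # Pairwise variation: spread of the log ratio across samples.
            pairwise_sd.append(statistics.stdev(log_ratios))
        scores[gene_j] = sum(pairwise_sd) / len(pairwise_sd)
    return scores


# Hypothetical quantities for three candidates over three samples.
toy_expr = {
    "ref1": [1.0, 2.0, 4.0],
    "ref2": [2.0, 4.0, 8.0],  # constant ratio to ref1 across samples
    "ref3": [1.0, 1.0, 8.0],  # ratio to the others varies
}
m_values = stability_m(toy_expr)
least_stable = max(m_values, key=m_values.get)
```

In this toy data the two genes with a constant ratio to each other score lower (more stable) than the third, which mirrors how candidates are ranked before choosing a normalization set.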
Testing the recent theories for the origin of the hermaphrodite flower by comparison of the transcriptomes of gymnosperms and angiosperms.

PubMed

Tavares, Raquel; Cagnon, Mathilde; Negrutiu, Ioan; Mouchiroud, Dominique

2010-08-03

Different theories for the origin of the angiosperm hermaphrodite flower make different predictions concerning the overlap between the genes expressed in the male and female cones of gymnosperms and the genes expressed in the hermaphrodite flower of angiosperms. The Mostly Male (MM) theory predicts that, of genes expressed primarily in male versus female gymnosperm cones, an excess of male orthologs will be expressed in flowers, excluding ovules, while Out Of Male (OOM) and Out Of Female (OOF) theories predict no such excess. In this paper, we tested these predictions by comparing the transcriptomes of three gymnosperms (Ginkgo biloba, Welwitschia mirabilis and Zamia fisheri) and two angiosperms (Arabidopsis thaliana and Oryza sativa), using EST data. We found that the proportion of orthologous genes expressed in the reproductive organs of the gymnosperms and in the angiosperms flower is significantly higher than the proportion of orthologous genes expressed in the reproductive organs of the gymnosperms and in the angiosperms vegetative tissues, which shows that the approach is correct. However, we detected no significant differences between the proportion of gymnosperm orthologous genes expressed in the male cone and in the angiosperms flower and the proportion of gymnosperm orthologous genes expressed in the female cone and in the angiosperms flower. These results do not support the MM theory prediction of an excess of male gymnosperm genes expressed in the hermaphrodite flower of the angiosperms and seem to support the OOM/OOF theories. However, other explanations can be given for the 1:1 ratio that we found. More abundant and more specific (namely carpel and ovule) expression data should be produced in order to further test these theories.
Testing the recent theories for the origin of the hermaphrodite flower by comparison of the transcriptomes of gymnosperms and angiosperms

PubMed Central

2010-01-01

Background Different theories for the origin of the angiosperm hermaphrodite flower make different predictions concerning the overlap between the genes expressed in the male and female cones of gymnosperms and the genes expressed in the hermaphrodite flower of angiosperms. The Mostly Male (MM) theory predicts that, of genes expressed primarily in male versus female gymnosperm cones, an excess of male orthologs will be expressed in flowers, excluding ovules, while Out Of Male (OOM) and Out Of Female (OOF) theories predict no such excess. Results In this paper, we tested these predictions by comparing the transcriptomes of three gymnosperms (Ginkgo biloba, Welwitschia mirabilis and Zamia fisheri) and two angiosperms (Arabidopsis thaliana and Oryza sativa), using EST data. We found that the proportion of orthologous genes expressed in the reproductive organs of the gymnosperms and in the angiosperms flower is significantly higher than the proportion of orthologous genes expressed in the reproductive organs of the gymnosperms and in the angiosperms vegetative tissues, which shows that the approach is correct. However, we detected no significant differences between the proportion of gymnosperm orthologous genes expressed in the male cone and in the angiosperms flower and the proportion of gymnosperm orthologous genes expressed in the female cone and in the angiosperms flower. Conclusions These results do not support the MM theory prediction of an excess of male gymnosperm genes expressed in the hermaphrodite flower of the angiosperms and seem to support the OOM/OOF theories. However, other explanations can be given for the 1:1 ratio that we found. More abundant and more specific (namely carpel and ovule) expression data should be produced in order to further test these theories. PMID:20682074
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, X; Zhou, Z; Thomas, K

Purpose: The goal of this work is to investigate the use of contrast enhanced computed tomographic (CT) features for the prediction of mutations of BAP1, PBRM1, and VHL genes in renal cell carcinoma (RCC). Methods: For this study, we used two patient databases with renal cell carcinoma (RCC). The first one consisted of 33 patients from our institution (UT Southwestern Medical Center, UTSW). The second one consisted of 24 patients from the Cancer Imaging Archive (TCIA), where each patient is connected by a unique identifier to the tissue samples from the Cancer Genome Atlas (TCGA). From the contrast enhanced CT image of each patient, tumor contour was first delineated by a physician. Geometry, intensity, and texture features were extracted from the delineated tumor. Based on UTSW dataset, we completed feature selection and trained a support vector machine (SVM) classifier to predict mutations of BAP1, PBRM1 and VHL genes. We then used TCIA-TCGA dataset to validate the predictive model built upon UTSW dataset. Results: The prediction accuracy of gene expression of TCIA-TCGA patients was 0.83 (20 of 24), 0.83 (20 of 24), and 0.75 (18 of 24) for BAP1, PBRM1, and VHL respectively. For BAP1 gene, texture feature was the most prominent feature type. For PBRM1 gene, intensity feature was the most prominent. For VHL gene, geometry, intensity, and texture features were all important. Conclusion: Using our feature selection strategy and models, we achieved predictive accuracy over 0.75 for all three genes under the condition of using patient data from one institution for training and data from other institutions for testing. These results suggest that radiogenomics can be used to aid in prognosis and used as convenient surrogates for expensive and time consuming gene assay procedures.
