Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal
Gao, Jianjiong; Aksoy, Bülent Arman; Dogrusoz, Ugur; Dresdner, Gideon; Gross, Benjamin; Sumer, S. Onur; Sun, Yichao; Jacobsen, Anders; Sinha, Rileen; Larsson, Erik; Cerami, Ethan; Sander, Chris; Schultz, Nikolaus
2014-01-01
The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and programmatic access for software. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics. PMID:23550210
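Survival comparisons of the kind listed above (for example, patients with versus without alterations in a queried gene) are commonly plotted as Kaplan-Meier curves. The sketch below is a generic product-limit estimator for illustration only; it is not cBioPortal's code, and the function name `kaplan_meier` and its inputs (follow-up times plus a boolean event flag, with `False` meaning censored) are assumptions.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate; returns (time, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this follow-up time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Subjects with this follow-up time (events or censored) leave the risk set afterwards.
        at_risk -= removed
    return curve
```

For follow-up times `[1, 2, 2, 3, 4]` with events `[True, True, False, True, False]`, the estimate steps down to 0.8, 0.6 and 0.3 at times 1, 2 and 3 (up to floating-point rounding). A subject censored at a given time is still counted in the risk set at that time, which is the usual convention.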
Feltus, F Alex
2014-06-01
Understanding the control of any trait optimally requires the detection of causal genes, gene interactions, and mechanisms of action to discover and model the biochemical pathways underlying the expressed phenotype. Functional genomics techniques, including RNA expression profiling via microarray and high-throughput DNA sequencing, allow for the precise genome localization of biological information. Powerful genetic approaches, including quantitative trait locus (QTL) and genome-wide association study mapping, link phenotype with genome positions, yet genetics is less precise in localizing the relevant mechanistic information encoded in DNA. The coupling of salient functional genomic signals with genetically mapped positions is an appealing approach to discover meaningful gene-phenotype relationships. Techniques used to define this genetic-genomic convergence comprise the field of systems genetics. This short review addresses an application of systems genetics in which RNA profiles are associated with genetically mapped genome positions, either of individual genes (eQTL mapping) or of gene sets (co-expression network modules). Both approaches can be applied for knowledge-independent selection of candidate genes (and possible control mechanisms) underlying complex traits, where multiple, likely unlinked, genomic regions might control specific complex traits. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
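In its simplest single-marker form, eQTL mapping regresses a transcript's abundance on the genotype at each mapped marker. The sketch below assumes additive dosage coding (0, 1 or 2 copies of one allele), no covariates and no significance testing; the helper names are hypothetical. Real eQTL studies add covariates and multiple-testing control, and a module-based analysis could substitute a module summary (for example, an eigengene) for the single transcript.

```python
from typing import Dict, List, Tuple


def slope_and_r(dosage: List[int], expression: List[float]) -> Tuple[float, float]:
    """Least-squares slope and Pearson r of expression on additive genotype dosage."""
    n = len(dosage)
    mean_g = sum(dosage) / n
    mean_e = sum(expression) / n
    s_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosage, expression))
    s_gg = sum((g - mean_g) ** 2 for g in dosage)
    s_ee = sum((e - mean_e) ** 2 for e in expression)
    if s_ee == 0:
        return 0.0, 0.0  # constant expression: no association measurable
    return s_ge / s_gg, s_ge / (s_gg * s_ee) ** 0.5


def scan_markers(genotypes: Dict[str, List[int]], expression: List[float]):
    """Test one transcript against every marker and rank markers by |r|."""
    results = {}
    for marker, dosage in genotypes.items():
        if len(set(dosage)) < 2:
            continue  # monomorphic marker: slope undefined, skip it
        results[marker] = slope_and_r(dosage, expression)
    return sorted(results.items(), key=lambda item: abs(item[1][1]), reverse=True)
```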
Lee, Kang-Hoon; Lee, Young-Kwan; Kwon, Deug-Nam; Chiu, Sophia; Chew, Victoria; Rah, Hyungchul; Kujawski, Gregory; Melhem, Ramzi; Hsu, Karen; Chung, Cecilia; Greenhalgh, David G; Cho, Kiho
2011-06-01
Approximately 2% of the human genome is reported to be occupied by genes. Various forms of repetitive elements (REs), both characterized and uncharacterized, are presumed to make up the vast majority of the rest of the human genome and of the genomes of other species. In conjunction with a comprehensive annotation of genes, information regarding components of genome biology, such as gene polymorphisms, non-coding RNAs, and certain REs, is found in human genome databases. However, the genome-wide profile of unique RE arrangements formed by different groups of REs has not yet been fully characterized. In this study, the entire human genome was subjected to an unbiased RE survey to establish a whole-genome profile of REs and their arrangements. Because of the limit on query size in the bl2seq alignment program (National Center for Biotechnology Information [NCBI]) used for the RE survey, the entire NCBI reference human genome was fragmented into 6206 units of 0.5 million nucleotides. A number of RE arrangements with varying complexities and patterns were identified throughout the genome. Each chromosome had a unique profile of RE arrangements and density, and high levels of RE density were measured near the centromere regions. Subsequently, 175 complex RE arrangements, selected from throughout the genome, were subjected to a comparison analysis using five different human genome sequences. Interestingly, three of the five human genome databases shared exactly the same arrangement patterns and sequences for all 175 RE arrangement regions (a total of 12,765,625 nucleotides). The findings from this study demonstrate that a substantial fraction of REs in the human genome are clustered into various forms of ordered structures. Further investigations are needed to examine whether some of these ordered RE arrangements contribute to human pathobiology as functional genome units. Copyright © 2011 Elsevier Inc. All rights reserved.
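The fragmentation step can be sketched as consecutive, non-overlapping windows of 500,000 nucleotides. This is an assumption about the procedure: the abstract does not say whether the authors' units overlapped, and the helper names are hypothetical. As a simple arithmetic check, 6206 units of 500,000 nucleotides cover at most 3,103,000,000 nucleotides, roughly 3.1 billion.

```python
from typing import Iterator, Tuple


def fragment(sequence: str, unit: int = 500_000) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start offset, fragment) for consecutive, non-overlapping windows."""
    for start in range(0, len(sequence), unit):
        yield start, sequence[start:start + unit]


def n_fragments(length: int, unit: int = 500_000) -> int:
    """Number of fragments needed; the last one may be shorter than `unit`."""
    return -(-length // unit)  # integer ceiling division
```

One design trade-off: non-overlapping windows can split an RE arrangement that straddles a boundary; an overlapping variant (stepping by less than the window size) would avoid that at the cost of redundant queries.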
Transforming the practice of medicine using genomics
Ginsburg, Geoffrey S.; McCarthy, Jeanette J.
2009-01-01
Recent studies have demonstrated the use of genomic data, particularly gene expression signatures, as clinical prognostic factors in complex diseases. Such studies herald the future of genomic medicine and the opportunity for personalized prognosis in a variety of clinical contexts that use genome-scale molecular information. Several key areas represent logical and critical next steps in the use of complex genomic profiling data toward the goal of personalized medicine. First, analyses should be geared toward the development of molecular profiles that predict future events, such as major clinical events or the response, resistance, or adverse reaction to therapy. Second, these profiles must move into actual clinical practice by forming the basis for the next generation of clinical trials that will employ these methodologies to stratify patients. Lastly, formidable challenges remain in translating genomic technologies into clinical medicine, and these will need to be addressed: professional and public education, health outcomes research, reimbursement, regulatory oversight, and privacy protection. PMID:22461094
O'Brien, M.A.; Costin, B.N.; Miles, M.F.
2014-01-01
Postgenomic studies of the function of genes and their role in disease have now become an area of intense study, since efforts to define the raw sequence material of the genome have largely been completed. The use of whole-genome approaches such as microarray expression profiling and, more recently, RNA-sequence analysis of transcript abundance has allowed an unprecedented look at the workings of the genome. However, the accurate derivation of such high-throughput data and their analysis in terms of biological function have been critical to truly leveraging the postgenomic revolution. This chapter will describe an approach that focuses on the use of gene networks to both organize and interpret genomic expression data. Such networks, derived from statistical analysis of large genomic datasets and the application of multiple bioinformatics data resources, potentially allow the identification of key control elements for networks associated with human disease, and thus may lead to the derivation of novel therapeutic approaches. However, as discussed in this chapter, such networks cannot be leveraged without a thorough understanding of the technical and statistical factors influencing the derivation of genomic expression data. Thus, while the catch phrase may be “it's the network … stupid,” an understanding of factors extending from RNA isolation to genomic profiling technique, multivariate statistics, and bioinformatics is critical to defining fully useful gene networks for the study of complex biology. PMID:23195313
The Cancer Genome Atlas Pan-Cancer Analysis Project
Weinstein, John N.; Collisson, Eric A.; Mills, Gordon B.; Shaw, Kenna M.; Ozenberger, Brad A.; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M.
2014-01-01
Cancer can take hundreds of different forms depending on the location, cell of origin and spectrum of genomic alterations that promote oncogenesis and affect therapeutic response. Although many genomic events with direct phenotypic impact have been identified, much of the complex molecular landscape remains incompletely charted for most cancer lineages. For that reason, The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumours to discover molecular aberrations at the DNA, RNA, protein, and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences, and emergent themes across tumour lineages. The Pan-Cancer initiative compares the first twelve tumour types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumour types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile. PMID:24071849
Genomics DNA Profiling in Elite Professional Soccer Players: A Pilot Study
Kambouris, M; Del Buono, A; Maffulli, N
2014-01-01
Functional variants in exonic regions have been associated with the development of cardiovascular disease, diabetes, and cancer. Athletic performance can be considered a multifactorial complex phenotype. Genomic DNA was extracted from buccal swabs of seven soccer players from the Fulham football team, and single nucleotide polymorphism (SNP) genotyping was undertaken. To achieve optimal athletic performance, predictive genomic DNA profiling for sports performance can be used to aid sport selection and the design of personalized training and nutrition programs. Predictive DNA profiling may help detect athletes with potential or overt injuries, support the screening and selection of future athletes, and help athletes make full use of their potential and improve their performance in sports. The aim of this study is to describe the specific genomic variants that an athlete carries, in order to determine which measures should be taken to maximize the athlete’s potential. PMID:24809029
Inferring genome-wide interplay landscape between DNA methylation and transcriptional regulation.
Tang, Binhua; Wang, Xin
2015-01-01
DNA methylation and transcriptional regulation play important roles in cancer cell development and differentiation processes. Based on the currently available cell line profiling information from the ENCODE Consortium, we propose a Bayesian inference model to infer and construct a genome-wide interaction landscape between DNA methylation and transcriptional regulation, which sheds light on the underlying complex functional mechanisms important in human cancer and disease. For the first time, we select all the currently available cell lines (≥20) and transcription factors (≥80) with profiling information from the ENCODE Consortium portal. Through the integration of these genome-wide profiling sources, our genome-wide analysis detects multiple functional loci of interest and indicates that DNA methylation is cell- and region-specific, owing to its interplay with transcriptional regulatory activities. We validate our analysis results with the corresponding RNA-sequencing data for the detected genomic loci. Our results provide novel and meaningful insights into the interplay between transcriptional regulation and gene expression for studies of human cancer and disease.
An ensemble model of competitive multi-factor binding of the genome
Wasson, Todd; Hartemink, Alexander J.
2009-01-01
Hundreds of different factors adorn the eukaryotic genome, binding to it in large number. These DNA binding factors (DBFs) include nucleosomes, transcription factors (TFs), and other proteins and protein complexes, such as the origin recognition complex (ORC). DBFs compete with one another for binding along the genome, yet many current models of genome binding do not consider different types of DBFs together simultaneously. Additionally, binding is a stochastic process that results in a continuum of binding probabilities at any position along the genome, but many current models tend to consider positions as being either binding sites or not. Here, we present a model that allows a multitude of DBFs, each at different concentrations, to compete with one another for binding sites along the genome. The result is an “occupancy profile,” a probabilistic description of the DNA occupancy of each factor at each position. We implement our model efficiently as the software package COMPETE. We demonstrate genome-wide and at specific loci how modeling nucleosome binding alters TF binding, and vice versa, and illustrate how factor concentration influences binding occupancy. Binding cooperativity between nearby TFs arises implicitly via mutual competition with nucleosomes. Our method applies not only to TFs, but also recapitulates known occupancy profiles of a well-studied replication origin with and without ORC binding. Importantly, the sequence preferences our model takes as input are derived from in vitro experiments. This ensures that the calculated occupancy profiles are the result of the forces of competition represented explicitly in our model and the inherent sequence affinities of the constituent DBFs. PMID:19720867
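The competitive idea can be illustrated with a toy statistical-mechanics model on a single sequence: factors occupy non-overlapping footprints, each placement contributes a weight (for example, concentration times sequence affinity), and forward and backward sums over all configurations give each factor's probability of binding at each position. This is a minimal sketch under those assumptions, not the COMPETE implementation; the `Factor` class and its `weight` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Factor:
    name: str
    width: int                      # footprint in base pairs
    weight: Callable[[int], float]  # statistical weight of binding with left edge at i


def start_probabilities(n: int, factors: List[Factor]) -> Dict[str, List[float]]:
    """P(factor's left edge is at position i) under mutually exclusive binding."""
    # fwd[i]: total weight of all non-overlapping configurations on positions [0, i)
    fwd = [1.0] + [0.0] * n
    for i in range(1, n + 1):
        fwd[i] = fwd[i - 1]  # position i-1 left unbound (weight 1)
        for f in factors:
            if i >= f.width:
                fwd[i] += fwd[i - f.width] * f.weight(i - f.width)
    # bwd[i]: total weight of all configurations on positions [i, n)
    bwd = [0.0] * n + [1.0]
    for i in range(n - 1, -1, -1):
        bwd[i] = bwd[i + 1]
        for f in factors:
            if i + f.width <= n:
                bwd[i] += f.weight(i) * bwd[i + f.width]
    z = fwd[n]  # partition function (equals bwd[0])
    return {
        f.name: [
            fwd[i] * f.weight(i) * bwd[i + f.width] / z if i + f.width <= n else 0.0
            for i in range(n)
        ]
        for f in factors
    }


def occupancy_profile(n: int, factors: List[Factor]) -> Dict[str, List[float]]:
    """P(position j is covered by each factor): sum over starts whose footprint includes j."""
    starts = start_probabilities(n, factors)
    return {
        f.name: [
            sum(starts[f.name][s] for s in range(max(0, j - f.width + 1), j + 1))
            for j in range(n)
        ]
        for f in factors
    }
```

Because footprints cannot overlap, a nucleosome placement excludes TF binding beneath it, so the factors' occupancies at any position sum to at most 1; changing one factor's weights shifts the others' occupancy through the shared partition function.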
ReprDB and panDB: minimalist databases with maximal microbial representation.
Zhou, Wei; Gay, Nicole; Oh, Julia
2018-01-18
Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible with various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.
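The iterative idea behind panDB (add a strain, keep only sequence not already represented, repeat) can be sketched as below. This swaps the authors' alignment-based algorithm for exact k-mer matching, purely for illustration; the function names, the default `k` and `min_len` values, and the rule for calling a region novel are all assumptions.

```python
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_segments(genome: str, known: Set[str], k: int, min_len: int) -> List[str]:
    """Maximal runs of `genome` not covered by any k-mer already in `known`."""
    covered = [False] * len(genome)
    for i in range(len(genome) - k + 1):
        if genome[i:i + k] in known:
            for j in range(i, i + k):
                covered[j] = True
    segments, start = [], None
    for pos, is_covered in enumerate(covered + [True]):  # sentinel closes a final run
        if not is_covered and start is None:
            start = pos
        elif is_covered and start is not None:
            if pos - start >= min_len:
                segments.append(genome[start:pos])
            start = None
    return segments


def build_pan_reference(genomes: List[str], k: int = 21, min_len: int = 100) -> List[str]:
    """Iteratively keep only sequence not already represented by earlier strains."""
    pan: List[str] = []
    known: Set[str] = set()
    for genome in genomes:
        new = novel_segments(genome, known, k, min_len)
        pan.extend(new)
        for seg in new:
            known |= kmers(seg, k)
    return pan
```

With `k=4` and `min_len=3`, the strains `ACGTACGTTTGCA`, `ACGTACGTTTGCAGGGCCC` and a repeat of the first yield `['ACGTACGTTTGCA', 'GGGCCC']`: the third strain contributes nothing new.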
Nielsen, H Bjørn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska; Rasmussen, Simon; Li, Junhua; Sunagawa, Shinichi; Plichta, Damian R; Gautier, Laurent; Pedersen, Anders G; Le Chatelier, Emmanuelle; Pelletier, Eric; Bonde, Ida; Nielsen, Trine; Manichanh, Chaysavanh; Arumugam, Manimozhiyan; Batto, Jean-Michel; Quintanilha Dos Santos, Marcelo B; Blom, Nikolaj; Borruel, Natalia; Burgdorf, Kristoffer S; Boumezbeur, Fouad; Casellas, Francesc; Doré, Joël; Dworzynski, Piotr; Guarner, Francisco; Hansen, Torben; Hildebrand, Falk; Kaas, Rolf S; Kennedy, Sean; Kristiansen, Karsten; Kultima, Jens Roat; Léonard, Pierre; Levenez, Florence; Lund, Ole; Moumen, Bouziane; Le Paslier, Denis; Pons, Nicolas; Pedersen, Oluf; Prifti, Edi; Qin, Junjie; Raes, Jeroen; Sørensen, Søren; Tap, Julien; Tims, Sebastian; Ussery, David W; Yamada, Takuji; Renault, Pierre; Sicheritz-Ponten, Thomas; Bork, Peer; Wang, Jun; Brunak, Søren; Ehrlich, S Dusko
2014-08-01
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
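Binning co-abundant genes means grouping genes whose abundances rise and fall together across samples. The sketch below uses a simple greedy rule: each unassigned gene seeds a group and recruits every unassigned gene whose Pearson correlation with it reaches a threshold. This is a simplified stand-in, not the authors' clustering procedure, and the 0.9 threshold and function names are assumptions.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return sxy / math.sqrt(sxx * syy)


def co_abundance_groups(profiles: Dict[str, List[float]], threshold: float = 0.9) -> List[List[str]]:
    """Greedy binning of genes by correlated abundance across samples."""
    unassigned = list(profiles)
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group, remaining = [seed], []
        for gene in unassigned:
            if pearson(profiles[seed], profiles[gene]) >= threshold:
                group.append(gene)
            else:
                remaining.append(gene)
        unassigned = remaining
        groups.append(group)
    return groups
```

Profiles `[1, 2, 3, 4]` and `[2, 4, 6, 8.1]` land in one group, while an anti-correlated pair forms a separate group. Unlike proper clustering, the result depends on input order.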
A sequence-based survey of the complex structural organization of tumor genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav
2008-04-03
The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.
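End Sequencing Profiling rests on a simple test: the two end sequences of a cloned tumor fragment, once mapped to the reference, should sit on the same chromosome, face each other, and lie about one clone length apart. The sketch below flags pairs that break these expectations; the inward-facing (`+` then `-`) convention, the span bounds and the labels are assumptions, not the authors' exact criteria.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    clone_id: str
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str


def classify_pair(pair: EndPair, min_span: int, max_span: int) -> str:
    """Label a mapped clone end pair relative to the expected clone-length range."""
    if pair.chrom_a != pair.chrom_b:
        return "inter-chromosomal"  # e.g. consistent with a translocation
    # Order the two ends left to right along the reference.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos_a, pair.strand_a), (pair.pos_b, pair.strand_b)]
    )
    if (left_strand, right_strand) != ("+", "-"):
        return "orientation"  # e.g. consistent with an inversion
    span = right_pos - left_pos
    if span > max_span:
        return "too_long"   # ends farther apart in the reference: e.g. deletion in the tumor
    if span < min_span:
        return "too_short"  # ends closer in the reference: e.g. insertion in the tumor
    return "concordant"
```

Only discordant pairs are candidate breakpoints; in practice several independent clones supporting the same discordant placement would be needed before calling a rearrangement.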
Regulatory variation: an emerging vantage point for cancer biology.
Li, Luolan; Lorzadeh, Alireza; Hirst, Martin
2014-01-01
Transcriptional regulation involves complex and interdependent interactions between the noncoding and coding regions of the genome and the proteins that interact with and modify them. Genetic variation/mutation in coding and noncoding regions of the genome can drive aberrant transcription and disease. Although noncoding DNA accounts for nearly 98% of the genome, comparatively little is known about the contribution of noncoding DNA elements to disease. Genome-wide association studies of complex human diseases, including cancer, have revealed enrichment for variants in the noncoding genome. A striking finding of recent cancer genome re-sequencing efforts has been the previously underappreciated frequency of mutations in epigenetic modifiers across a wide range of cancer types. Taken together, these results point to the importance of dysregulated transcriptional control in the genesis of cancer. Powered by recent technological advances in functional genomic profiling, exploration of normal and transformed regulatory networks will provide novel insight into the initiation and progression of cancer and open new windows to future prognostic and diagnostic tools. © 2013 Wiley Periodicals, Inc.
Thomas, David; Finan, Chris; Newport, Melanie J; Jones, Susan
2015-10-01
The complexity of DNA can be quantified using estimates of entropy. Variation in DNA complexity is expected between the promoters of genes with different transcriptional mechanisms, namely housekeeping (HK) and tissue-specific (TS) genes. The former are transcribed constitutively to maintain general cellular functions, and the latter are transcribed in restricted tissues and cell types for specific molecular events. It is known that promoter features in the human genome are related to tissue specificity, but this has been difficult to quantify on a genomic scale. If entropy effectively quantifies DNA complexity, calculating the entropies of HK and TS gene promoters as profiles may reveal significant differences. Entropy profiles were calculated for a total dataset of 12,003 human gene promoters and for 501 HK and 587 TS human gene promoters. The mean profiles show that TS promoters have significantly lower entropy (p < 2.2e-16) than HK gene promoters. The entropy distributions for the 3 datasets show that promoter entropies could be used to identify novel HK genes. Functional features comprise DNA sequence patterns that are non-random and hence have lower entropies. The lower entropy of TS gene promoters can be explained by a higher density of positive and negative regulatory elements, required for genes with complex spatial and temporal expression. Copyright © 2015 Elsevier Ltd. All rights reserved.
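A minimal entropy profile computes the Shannon entropy of nucleotide composition in a sliding window along each aligned promoter, then averages the profiles position by position within a gene set. The abstract does not specify the entropy definition or window size, so mononucleotide entropy, the 50-bp default window and the function names are assumptions.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the nucleotide composition of `seq`."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def entropy_profile(promoter: str, window: int = 50, step: int = 1) -> List[float]:
    """Entropy of each sliding window along a promoter sequence."""
    seq = promoter.upper()
    return [shannon_entropy(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def mean_profile(profiles: List[List[float]]) -> List[float]:
    """Position-wise mean of equal-length profiles (e.g. all HK or all TS promoters)."""
    return [sum(col) / len(col) for col in zip(*profiles)]
```

A low-complexity window such as `AAAA` scores 0 bits, while `ACGT` reaches the 2-bit maximum for four nucleotides; comparing `mean_profile` curves for two gene sets mirrors the HK versus TS comparison described above.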
Genome-Wide Expression Profiling of Complex Regional Pain Syndrome
Jin, Eun-Heui; Zhang, Enji; Ko, Youngkwon; Sim, Woo Seog; Moon, Dong Eon; Yoon, Keon Jung; Hong, Jang Hee; Lee, Won Hyung
2013-01-01
Complex regional pain syndrome (CRPS) is a chronic, progressive, and devastating pain syndrome characterized by spontaneous pain, hyperalgesia, allodynia, altered skin temperature, and motor dysfunction. Although previous gene expression profiling studies have been conducted in animal pain models, genome-wide expression profiling in the whole blood of CRPS patients has not yet been reported. Here, we identified certain pain-related genes through genome-wide expression profiling in the blood of CRPS patients. We found that 80 genes were differentially expressed between 4 CRPS patients (2 CRPS I and 2 CRPS II) and 5 controls (cut-off: 1.5-fold change and p < 0.05). Most of those genes were associated with signal transduction, developmental processes, cell structure and motility, and immunity and defense. The expression levels of major histocompatibility complex class I A subtype (HLA-A29.1), matrix metalloproteinase 9 (MMP9), alanine aminopeptidase N (ANPEP), L-histidine decarboxylase (HDC), granulocyte colony-stimulating factor 3 receptor (G-CSF3R), and signal transducer and activator of transcription 3 (STAT3) genes selected from the microarray were confirmed in 24 CRPS patients and 18 controls by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). We focused on the MMP9 gene, which by qRT-PCR showed a statistically significant difference in expression between CRPS patients and controls, with the highest relative fold change (4.0 ± 1.23 times, p = 1.4 × 10^-4). The up-regulation of the MMP9 gene in the blood may be related to pain progression in CRPS patients. Our findings offer a valuable contribution to the understanding of differential gene expression in CRPS and may help in understanding the pathophysiology of CRPS pain progression. PMID:24244504
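The microarray cut-off above (at least 1.5-fold change and p < 0.05) amounts to a two-condition filter. The sketch assumes group means on a linear expression scale and p-values computed beforehand; the `GeneResult` record and function name are hypothetical, gene names in any example are placeholders, and fold change is tested in both directions (up- or down-regulation).

```python
import math
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    gene: str
    mean_patients: float  # linear-scale expression, patient group
    mean_controls: float  # linear-scale expression, control group
    p_value: float


def differentially_expressed(
    results: List[GeneResult], fold_cutoff: float = 1.5, alpha: float = 0.05
) -> List[str]:
    """Genes with |fold change| >= fold_cutoff (either direction) and p < alpha."""
    threshold = math.log2(fold_cutoff)
    hits = []
    for r in results:
        log2_fc = math.log2(r.mean_patients / r.mean_controls)
        if abs(log2_fc) >= threshold and r.p_value < alpha:
            hits.append(r.gene)
    return hits
```

Working on the log2 scale makes a 1.5-fold increase and a 1.5-fold decrease symmetric around zero, so a single absolute-value test covers both directions.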
Decomposing genomic variance using information from GWA, GWE and eQTL analysis.
Ehsani, A; Janss, L; Pomp, D; Sørensen, P
2016-04-01
A commonly used procedure in genome-wide association (GWA), genome-wide expression (GWE) and expression quantitative trait locus (eQTL) analyses is based on a bottom-up experimental approach that attempts to individually associate molecular variants with complex traits. Top-down modeling of the entire set of genomic data and partitioning of the overall variance into subcomponents may provide further insight into the genetic basis of complex traits. To test this approach, we performed a whole-genome variance components analysis and partitioned the genomic variance using information from GWA, GWE and eQTL analyses of growth-related traits in a mouse F2 population. We characterized the mouse trait genetic architecture by ordering single nucleotide polymorphisms (SNPs) based on their P-values and studying the areas under the curve (AUCs). The observed traits were found to have a genomic variance profile that differed significantly from that expected of a trait under an infinitesimal model. This was particularly true for body weight and body fat, whose AUCs were much higher than that of glucose. In addition, SNPs with a high degree of trait-specific regulatory potential (SNPs associated with a subset of transcripts that were significantly associated with a specific trait) explained a larger proportion of the genomic variance than did SNPs with high overall regulatory potential (SNPs associated with transcripts using traditional eQTL analysis). We introduced AUC measures of genomic variance profiles that can be used to quantify the relative importance of SNPs as well as the degree of deviation of a trait's inheritance from an infinitesimal model. The shape of the curve aids global understanding of traits: the steeper the left-hand side of the curve, the fewer the SNPs controlling most of the phenotypic variance. © 2015 Stichting International Foundation for Animal Genetics.
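One reading of the AUC measure: rank SNPs by P-value, accumulate the fraction of genomic variance they explain, and integrate the resulting curve with the trapezoid rule. Per-SNP variance shares are taken as given (a hypothetical input), and the authors' exact construction may differ. When every SNP explains the same share, the curve is the diagonal and the AUC is 0.5, a natural reference for an infinitesimal-like trait; a steep left-hand side pushes the AUC toward 1.

```python
from typing import List, Tuple


def variance_curve(snps: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """snps: (p_value, variance_share) pairs. Returns points
    (fraction of SNPs included, cumulative fraction of variance explained),
    with SNPs added in order of increasing P-value."""
    ordered = sorted(snps, key=lambda s: s[0])
    total = sum(v for _, v in ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    cumulative = 0.0
    for i, (_, share) in enumerate(ordered, start=1):
        cumulative += share
        points.append((i / n, cumulative / total))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoid-rule area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```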
Best, Megan; Newson, Ainsley J; Meiser, Bettina; Juraskova, Ilona; Goldstein, David; Tucker, Kathy; Ballinger, Mandy L; Hess, Dominique; Schlub, Timothy E; Biesecker, Barbara; Vines, Richard; Vines, Kate; Thomas, David; Young, Mary-Anne; Savard, Jacqueline; Jacobs, Chris; Butow, Phyllis
2018-04-05
Genomic sequencing in cancer (both tumour and germline), and development of therapies targeted to tumour genetic status, hold great promise for improvement of patient outcomes. However, the imminent introduction of genomics into clinical practice calls for better understanding of how patients value, experience, and cope with this novel technology and its often complex results. Here we describe a protocol for a novel mixed-methods, prospective study (PiGeOn) that aims to examine patients' psychosocial, cognitive, affective and behavioural responses to tumour genomic profiling and to integrate a parallel critical ethical analysis of returning results. This is a cohort sub-study of a parent tumour genomic profiling programme enrolling patients with advanced cancer. One thousand patients will be recruited for the parent study in Sydney, Australia from 2016 to 2019. They will be asked to complete surveys at baseline, three, and five months. Primary outcomes are: knowledge, preferences, attitudes and values. A purposively sampled subset of patients will be asked to participate in three semi-structured interviews (one at each time point) to provide deeper data interpretation. Relevant ethical themes will be critically analysed to iteratively develop or refine normative ethical concepts or frameworks currently used in the return of genetic information. This will be the first Australian study to collect longitudinal data on cancer patients' experience of tumour genomic profiling. Findings will be used to inform ongoing ethical debates on issues such as how to effectively obtain informed consent for genomic profiling and the return of results, distinguish between research and clinical practice, and manage patient expectations. The combination of quantitative and qualitative methods will provide comprehensive and critical data on how patients cope with 'actionable' and 'non-actionable' results.
This information is needed to ensure that when tumour genomic profiling becomes part of routine clinical care, ethical considerations are embedded, and patients are adequately prepared and supported during and after receiving results. Trial registration: not required for this sub-study; the parent trial is registered as ACTRN12616000908437.
Single-Cell Genomics Unravels Brain Cell-Type Complexity.
Guillaumet-Adkins, Amy; Heyn, Holger
2017-01-01
The brain is the most complex tissue in terms of the cell types it comprises, to the extent that it is still poorly understood. Single-cell genome and transcriptome profiling make it possible to disentangle neuronal heterogeneity, enabling the categorization of individual neurons into groups with similar molecular signatures. Herein, we review the current state of knowledge in single-cell neurogenomics. We describe how these state-of-the-art technologies have advanced the molecular understanding of the cellular architecture of the mammalian nervous system in health and in disease, from the discovery of unrecognized cell types to the validation of known ones.
Moqtaderi, Zarmik; Wang, Jie; Raha, Debasish; White, Robert J; Snyder, Michael; Weng, Zhiping; Struhl, Kevin
2010-05-01
Genome-wide occupancy profiles of five components of the RNA polymerase III (Pol III) machinery in human cells identified the expected tRNA and noncoding RNA targets and revealed many additional Pol III-associated loci, mostly near short interspersed elements (SINEs). Several genes are targets of an alternative transcription factor IIIB (TFIIIB) containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike nonexpressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC (extra TFIIIC) loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner.
Moqtaderi, Zarmik; Wang, Jie; Raha, Debasish; White, Robert J.; Snyder, Michael; Weng, Zhiping; Struhl, Kevin
2012-01-01
Genome-wide occupancy profiles of five components of the RNA Polymerase III (Pol III) machinery in human cells identified the expected tRNA and non-coding RNA targets and revealed many additional Pol III-associated loci, mostly near SINEs. Several genes are targets of an alternative TFIIIB containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike non-expressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC (extra TFIIIC) loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner. PMID:20418883
MiR-191 Regulates Primary Human Fibroblast Proliferation and Directly Targets Multiple Oncogenes
Polioudakis, Damon; Abell, Nathan S.; Iyer, Vishwanath R.
2015-01-01
miRNAs play a central role in numerous pathologies, including multiple cancer types. miR-191 has predominantly been studied as an oncogene, but the role of miR-191 in the proliferation of primary cells is not well characterized, and the miR-191 targetome has not been experimentally profiled. Here we used RNA-induced silencing complex (RISC) immunoprecipitations together with gene expression profiling to construct a genome-wide miR-191 target profile. We show that miR-191 represses proliferation in primary human fibroblasts, identify multiple proto-oncogenes as novel miR-191 targets, including CDK9, NOTCH2, and RPS6KA3, and present evidence that miR-191 extensively mediates target expression through coding sequence (CDS) pairing. Our results provide a comprehensive genome-wide miR-191 target profile and demonstrate miR-191’s regulation of primary human fibroblast proliferation. PMID:25992613
Discovering Hematopoietic Mechanisms Through Genome-Wide Analysis of GATA Factor Chromatin Occupancy
Fujiwara, Tohru; O'Geen, Henriette; Keles, Sunduz; Blahnik, Kimberly; Linnemann, Amelia K.; Kang, Yoon-A; Choi, Kyunghee; Farnham, Peggy J.; Bresnick, Emery H.
2009-01-01
GATA factors interact with simple DNA motifs (WGATAR) to regulate critical processes, including hematopoiesis, but very few WGATAR motifs are occupied in genomes. Given the rudimentary knowledge of mechanisms underlying this restriction, and how GATA factors establish genetic networks, we used ChIP-seq to define GATA-1 and GATA-2 occupancy genome-wide in erythroid cells. Coupled with genetic complementation analysis and transcriptional profiling, these studies revealed a rich collection of targets containing a characteristic binding motif of greater complexity than WGATAR. GATA factors occupied loci encoding multiple components of the Scl/TAL1 complex, a master regulator of hematopoiesis and leukemogenic target. Mechanistic analyses provided evidence for cross-regulatory and autoregulatory interactions among components of this complex, including GATA-2 induction of the hematopoietic corepressor ETO-2 and an ETO-2 negative autoregulatory loop. These results establish fundamental principles underlying GATA factor mechanisms in chromatin and illustrate a complex network of considerable importance for the control of hematopoiesis. PMID:19941826
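The WGATAR consensus uses IUPAC codes (W = A or T, R = A or G). The sketch below finds every WGATAR instance on both strands of a sequence, which illustrates why the abstract stresses that the motif is common but rarely occupied: simple motif matching alone over-predicts binding. The function name and example sequences are hypothetical.

```python
# Minimal sketch: scan a DNA sequence for the GATA consensus WGATAR on both strands.
import re

# Lookaheads allow overlapping matches to be reported.
FORWARD = re.compile(r"(?=([AT]GATA[AG]))")  # WGATAR on the + strand
REVERSE = re.compile(r"(?=([CT]TATC[AT]))")  # YTATCW = WGATAR on the - strand


def find_wgatar(seq: str) -> list[tuple[int, str]]:
    """Return sorted (0-based position, strand) pairs for WGATAR matches."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(hits)


print(find_wgatar("CCAGATAAGG"))  # [(2, '+')]
print(find_wgatar("TTATCA"))      # [(0, '-')]
```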
Molecular and Genomic Alterations in Glioblastoma Multiforme.
Crespo, Ines; Vital, Ana Louisa; Gonzalez-Tablas, María; Patino, María del Carmen; Otero, Alvaro; Lopes, María Celeste; de Oliveira, Catarina; Domingues, Patricia; Orfao, Alberto; Tabernero, Maria Dolores
2015-07-01
In recent years, important advances have been achieved in the understanding of the molecular biology of glioblastoma multiforme (GBM); thus, complex genetic alterations and genomic profiles, which recurrently involve multiple signaling pathways, have been defined, leading to the first molecular/genetic classification of the disease. In this regard, different genetic alterations and genetic pathways appear to distinguish primary (e.g., EGFR amplification) versus secondary (e.g., IDH1/2 or TP53 mutation) GBM. Such genetic alterations target distinct combinations of the growth factor receptor-ras signaling pathways, as well as the phosphatidylinositol 3-kinase/phosphatase and tensin homolog/AKT, retinoblastoma/cyclin-dependent kinase (CDK) N2A-p16(INK4A), and TP53/mouse double minute (MDM) 2/MDM4/CDKN2A-p14(ARF) pathways, in cells that present features associated with key stages of normal neurogenesis and (normal) central nervous system cell types. This translates into well-defined genomic profiles that have been recently classified by The Cancer Genome Atlas Consortium into four subtypes: classical, mesenchymal, proneural, and neural GBM. Herein, we review the most relevant genetic alterations of primary versus secondary GBM, the specific signaling pathways involved, and the overall genomic profile of this genetically heterogeneous group of malignant tumors. Copyright © 2015 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.
Functional genomics efforts face tradeoffs between number of perturbations examined and complexity of phenotypes measured. We bridge this gap with Perturb-seq, which combines droplet-based single-cell RNA-seq with a strategy for barcoding CRISPR-mediated perturbations, allowing many perturbations to be profiled in pooled format. We applied Perturb-seq to dissect the mammalian unfolded protein response (UPR) using single and combinatorial CRISPR perturbations. Two genome-scale CRISPR interference (CRISPRi) screens identified genes whose repression perturbs ER homeostasis.
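In a pooled Perturb-seq experiment, each cell must be linked to the CRISPR perturbation(s) it received via the captured guide barcodes. The sketch below shows one simple assignment rule: call every guide that has enough UMIs and a large enough share of the cell's guide UMIs, then group cells by their called guide set (which naturally accommodates combinatorial perturbations). The thresholds, function names, and data layout are hypothetical and are not taken from the Perturb-seq study.

```python
# Minimal sketch: assign cells to perturbations from guide-barcode UMI counts.
# Thresholds (min_umis, min_fraction) and data layout are hypothetical.
from collections import defaultdict


def call_guides(umi_counts, min_umis=3, min_fraction=0.2):
    """umi_counts: {cell: {guide: UMI count}} -> {cell: frozenset of called guides}."""
    calls = {}
    for cell, counts in umi_counts.items():
        total = sum(counts.values())
        if total == 0:
            calls[cell] = frozenset()  # no guide evidence for this cell
            continue
        calls[cell] = frozenset(
            guide for guide, n in counts.items()
            if n >= min_umis and n / total >= min_fraction
        )
    return calls


def cells_by_perturbation(calls):
    """Group cells by their called guide set; cells with no call are dropped."""
    groups = defaultdict(list)
    for cell, guides in calls.items():
        if guides:
            groups[guides].append(cell)
    return dict(groups)


calls = call_guides({
    "c1": {"sgA": 10, "sgB": 1},  # single perturbation
    "c2": {"sgA": 5, "sgC": 6},   # combinatorial perturbation
    "c3": {"sgB": 2},             # too few UMIs to call
})
print(cells_by_perturbation(calls))
```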
Methyl-CpG island-associated genome signature tags
Dunn, John J
2014-05-20
Disclosed is a method for analyzing the organismic complexity of a sample through analysis of the nucleic acid in the sample. In the disclosed method, through a series of steps, including digestion with a type II restriction enzyme, ligation of capture adapters and linkers and digestion with a type IIS restriction enzyme, genome signature tags are produced. The sequences of a statistically significant number of the signature tags are determined and the sequences are used to identify and quantify the organisms in the sample. Various embodiments of the invention described herein include methods for using single point genome signature tags to analyze the related families present in a sample, methods for analyzing sequences associated with hyper- and hypo-methylated CpG islands, methods for visualizing organismic complexity change in a sampling location over time and methods for generating the genome signature tag profile of a sample of fragmented DNA.
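The core idea of a genome signature tag, a short fixed-length sequence read off next to a restriction site and then matched to known organisms, can be illustrated in silico. The sketch below is only an approximation of the wet-lab procedure described in the patent: the recognition sequence "CATG", the tag length, the single-strand scan, and the reference tag table are all hypothetical placeholders.

```python
# Minimal in-silico sketch: extract fixed-length tags adjacent to a recognition site
# and tally them against a reference table of tag -> organism.
# The site, tag length and reference table are hypothetical placeholders.
from collections import Counter


def extract_tags(seq, site="CATG", tag_len=17):
    """Return the tag_len bases immediately 3' of each site occurrence (forward strand only)."""
    seq = seq.upper()
    tags = []
    pos = seq.find(site)
    while pos != -1:
        start = pos + len(site)
        tag = seq[start:start + tag_len]
        if len(tag) == tag_len:  # skip tags truncated by the fragment end
            tags.append(tag)
        pos = seq.find(site, pos + 1)
    return tags


def tag_profile(fragments, reference, site="CATG", tag_len=17):
    """Count reference-matching tags per organism; unmatched tags are ignored."""
    counts = Counter()
    for fragment in fragments:
        for tag in extract_tags(fragment, site, tag_len):
            organism = reference.get(tag)
            if organism is not None:
                counts[organism] += 1
    return counts


reference = {"GGTT": "orgX", "AAAA": "orgY"}  # hypothetical tag table
print(tag_profile(["AACATGGGTTAACATGCC", "CATGAAAA"], reference, tag_len=4))
```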
2014-01-01
Background: Polycomb group proteins form multicomponent complexes that are important for establishing lineage-specific patterns of gene expression. Mammalian cells encode multiple permutations of the prototypic Polycomb repressive complex 1 (PRC1) with little evidence for functional specialization. An aim of this study is to determine whether the multiple orthologs that are co-expressed in human fibroblasts act on different target genes and whether their genomic location changes during cellular senescence. Results: Deep sequencing of chromatin immunoprecipitated with antibodies against CBX6, CBX7, CBX8, RING1 and RING2 reveals that the orthologs co-localize at multiple sites. PCR-based validation at representative loci suggests that a further six PRC1 proteins have similar binding patterns. Importantly, sequential chromatin immunoprecipitation with antibodies against different orthologs implies that multiple variants of PRC1 associate with the same DNA. At many loci, the binding profiles have a distinctive architecture that is preserved in two different types of fibroblast. Conversely, there are several hundred loci at which PRC1 binding is cell type-specific and, contrary to expectations, the presence of PRC1 does not necessarily equate with transcriptional silencing. Interestingly, the PRC1 binding profiles are preserved in senescent cells despite changes in gene expression. Conclusions: The multiple permutations of PRC1 in human fibroblasts congregate at common rather than specific sites in the genome and with overlapping but distinctive binding profiles in different fibroblasts. The data imply that the effects of PRC1 complexes on gene expression are more subtle than simply repressing the loci at which they bind. PMID:24485159
Genetic and molecular alterations in pancreatic cancer: implications for personalized medicine.
Fang, Yantian; Yao, Qizhi; Chen, Zongyou; Xiang, Jianbin; Fisher, William E; Gibbs, Richard A; Chen, Changyi
2013-10-31
Recent advances in human genomics and biotechnologies have had profound impacts on medical research and clinical practice. Individual genomic information, including DNA sequences and gene expression profiles, can be used for the prediction, prevention, diagnosis, and treatment of many complex diseases. Personalized medicine attempts to tailor medical care to individual patients by incorporating their genomic information. In the case of pancreatic cancer, the fourth leading cause of cancer death in the United States, alterations in many genes, as well as molecular profiles in blood, pancreatic tissue, and pancreatic juice, have recently been found to be closely associated with the tumorigenesis or prognosis of the cancer. This review aims to summarize recent advances regarding important genes, proteins, and microRNAs that play a critical role in the pathogenesis of pancreatic cancer, and to discuss their implications for personalized medicine in pancreatic cancer.
Huang, Lulin; Cheng, Tingcai; Xu, Pingzhen; Fang, Ting; Xia, Qingyou
2012-01-01
Transcription factors are present in all living organisms and play vital roles in a wide range of biological processes. Studies of transcription factors will help reveal the complex regulatory mechanisms of organisms. So far, hundreds of domains have been identified that show transcription factor activity. Here, 281 reported transcription factor domains were used as seeds to search for transcription factors in the genomes of Bombyx mori L. (Lepidoptera: Bombycidae) and four other model insects. Overall, 666 transcription factors, including 36 basal factors and 630 other factors, were identified in the B. mori genome, which accounted for 4.56% of its genome. The expression profiles of the silkworm transcription factors were investigated across multiple tissues, developmental stages, and the two sexes, and in response to oral infection by pathogens and direct bacterial injection. Together, these data provide rich clues for revealing the transcriptional regulation mechanisms of silkworm organ differentiation, growth and development, sexual dimorphism, and response to pathogen infection. PMID:22943524
Lohmann, Ingrid
2012-01-01
In multi-cellular organisms, spatiotemporal activity of cis-regulatory DNA elements depends on their occupancy by different transcription factors (TFs). In recent years, genome-wide ChIP-on-Chip, ChIP-Seq and DamID assays have been extensively used to unravel the combinatorial interaction of TFs with cis-regulatory modules (CRMs) in the genome. Even though genome-wide binding profiles are increasingly becoming available for different TFs, single TF binding profiles are in most cases not sufficient for dissecting complex regulatory networks. Thus, potent computational tools detecting statistically significant and biologically relevant TF-motif co-occurrences in genome-wide datasets are essential for analyzing context-dependent transcriptional regulation. We have developed COPS (Co-Occurrence Pattern Search), a new bioinformatics tool based on a combination of association rules and Markov chain models, which detects co-occurring TF binding sites (BSs) on genomic regions of interest. COPS scans DNA sequences for frequent motif patterns using a Frequent-Pattern tree based data mining approach, which makes the software efficient with respect to both data structure and implementation speed, in particular when mining large datasets. Since transcriptional gene regulation very often relies on the formation of regulatory protein complexes mediated by closely adjoining TF binding sites on CRMs, COPS additionally detects preferred short distances between co-occurring TF motifs. The performance of our software with respect to biological significance was evaluated using three published datasets containing genomic regions that are independently bound by several TFs involved in a defined biological process. In sum, COPS is a fast, efficient and user-friendly tool for mining statistically and biologically significant TFBS co-occurrences and therefore allows the identification of TFs that combinatorially regulate gene expression. PMID:23272209
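The association-rule quantities behind motif co-occurrence mining can be sketched naively. The code below counts, for each motif pair, how many regions contain both (support), the rule confidence, and how often the two motifs lie within a short distance of each other. This is a brute-force pairwise count for illustration only: it does not use COPS's Frequent-Pattern tree or its Markov-chain background model, and the function name, `max_distance` default, and motif labels are hypothetical.

```python
# Naive sketch of motif-pair co-occurrence statistics (not COPS's FP-tree algorithm).
# Input layout, max_distance default and motif names are hypothetical.
from collections import defaultdict
from itertools import combinations


def cooccurrence(regions, max_distance=50):
    """regions: list of regions, each a list of (motif, position) hits.

    Returns {(a, b): stats} for motif pairs (a < b alphabetically), where
    support = fraction of regions containing both motifs,
    confidence = P(b present | a present), and
    close_fraction = share of co-occurring regions with a pair within max_distance.
    """
    single = defaultdict(int)
    pair = defaultdict(int)
    close = defaultdict(int)
    for hits in regions:
        motifs = {m for m, _ in hits}
        for m in motifs:
            single[m] += 1
        for a, b in combinations(sorted(motifs), 2):
            pair[(a, b)] += 1
            pos_a = [p for m, p in hits if m == a]
            pos_b = [p for m, p in hits if m == b]
            if min(abs(x - y) for x in pos_a for y in pos_b) <= max_distance:
                close[(a, b)] += 1
    n = len(regions)
    return {
        (a, b): {
            "support": count / n,
            "confidence": count / single[a],
            "close_fraction": close[(a, b)] / count,
        }
        for (a, b), count in pair.items()
    }


regions = [
    [("GATA", 10), ("EBOX", 30)],   # both motifs, 20 bp apart
    [("GATA", 5), ("EBOX", 200)],   # both motifs, far apart
    [("GATA", 1)],
    [("EBOX", 7)],
]
print(cooccurrence(regions)[("EBOX", "GATA")])
```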
Wang, Edwin; Zaman, Naif; Mcgee, Shauna; Milanese, Jean-Sébastien; Masoudi-Nejad, Ali; O'Connor-McCourt, Maureen
2015-02-01
Tumor genome sequencing documents thousands of DNA mutations and other genomic alterations. At present, these data cannot be analyzed adequately to aid in the understanding of tumorigenesis and its evolution. Moreover, we have little insight into how to use these data to predict clinical phenotypes and tumor progression to better design patient treatment. To meet these challenges, we discuss a cancer hallmark network framework for modeling genome sequencing data to predict cancer clonal evolution and associated clinical phenotypes. The framework includes: (1) cancer hallmarks that can be represented by a few molecular/signaling networks. 'Network operational signatures', which represent gene regulatory logics/strengths, make it possible to quantify state transitions and measures of hallmark traits. Thus, sets of genomic alterations that are associated with network operational signatures could be linked to the state/measure of hallmark traits. The network operational signature transforms genotypic data (i.e., genomic alterations) into regulatory phenotypic profiles (i.e., regulatory logics/strengths), and then into cellular phenotypic profiles (i.e., hallmark traits), which lead to clinical phenotypic profiles (i.e., a collection of hallmark traits). Furthermore, the framework considers regulatory logics of the hallmark networks under tumor evolutionary dynamics and therefore also includes: (2) a self-promoting positive feedback loop that is dominated by a genomic instability network and a cell survival/proliferation network is the main driver of tumor clonal evolution. 
The surrounding tumor stroma and the host immune system shape the evolutionary paths; (3) cell motility initiating metastasis is a byproduct of the above self-promoting loop activity during tumorigenesis; (4) an emerging hallmark network that triggers genome duplication dominates a feed-forward loop, which in turn could act as a rate-limiting step for tumor formation; (5) mutations and other genomic alterations have specific patterns and tissue-specificity, which are driven by aging and other cancer-inducing agents. This framework represents the logics of complex cancer biology as a myriad of phenotypic complexities governed by a limited set of underlying organizing principles. It therefore adds to our understanding of tumor evolution and tumorigenesis and may prove useful for predicting tumors' evolutionary paths and clinical phenotypes. Strategies for using this framework in conjunction with genome sequencing data in an attempt to predict personalized drug targets, drug resistance, and metastasis for cancer patients, as well as cancer risks for healthy individuals, are discussed. Accurate prediction of cancer clonal evolution and clinical phenotypes will have substantial impact on timely diagnosis, personalized treatment and personalized prevention of cancer. Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Societal challenges of precision medicine: Bringing order to chaos.
Salgado, Roberto; Moore, Helen; Martens, John W M; Lively, Tracy; Malik, Shakun; McDermott, Ultan; Michiels, Stefan; Moscow, Jeffrey A; Tejpar, Sabine; McKee, Tawnya; Lacombe, Denis
2017-10-01
The increasing number of drugs targeting specific proteins implicated in tumourigenesis and the commercial promotion of relatively affordable genome-wide analyses has led to an increasing expectation among patients with cancer that they can now receive effective personalised treatment based on the often complex genomic signature of their tumour. For such approaches to work in routine practice, the development of correspondingly complex biomarker assays through an appropriate and rigorous regulatory framework will be required. It is becoming increasingly evident that a re-engineering of clinical research is necessary so that regulatory considerations and procedures facilitate the efficient translation of these required biomarker assays from the discovery setting through to clinical application. This article discusses the practical requirements and challenges of developing such new precision medicine strategies, based on leveraging complex genomic profiles, as discussed at the Innovation and Biomarkers in Cancer Drug Development meeting (8th-9th September 2016, Brussels, Belgium). Copyright © 2017 Elsevier Ltd. All rights reserved.
A fungal mock community control for amplicon sequencing experiments
USDA-ARS's Scientific Manuscript database
The field of microbial ecology has been profoundly advanced by the ability to profile the composition of complex microbial communities by means of high throughput amplicon sequencing of marker genes amplified directly from environmental genomic DNA extracts. However, it has become increasingly clear...
Each cell counts: Hematopoiesis and immunity research in the era of single cell genomics.
Jaitin, Diego Adhemar; Keren-Shaul, Hadas; Elefant, Naama; Amit, Ido
2015-02-01
Hematopoiesis and immunity are mediated through complex interactions between multiple cell types and states. This complexity is currently addressed following a reductionist approach of characterizing cell types by a small number of cell surface molecular features and gross functions. While the introduction of global transcriptional profiling technologies enabled a more comprehensive view, heterogeneity within sampled populations remained unaddressed, obscuring the true picture of hematopoiesis and immune system function. A critical mass of technological advances in molecular biology and genomics has enabled genome-wide measurements of single cells - the fundamental unit of immunity. These new advances are expected to boost detection of less frequent cell types and fuzzy intermediate cell states, greatly expanding the resolution of current available classifications. This new era of single-cell genomics in immunology research holds great promise for further understanding of the mechanisms and circuits regulating hematopoiesis and immunity in both health and disease. In the near future, the accuracy of single-cell genomics will ultimately enable precise diagnostics and treatment of multiple hematopoietic and immune related diseases. Copyright © 2015 Elsevier Ltd. All rights reserved.
Cancer vulnerabilities unveiled by genomic loss
Nijhawan, Deepak; Zack, Travis I.; Ren, Yin; Strickland, Matthew R.; Lamothe, Rebecca; Schumacher, Steven E.; Tsherniak, Aviad; Besche, Henrike C.; Rosenbluh, Joseph; Shehata, Shyemaa; Cowley, Glenn S.; Weir, Barbara A.; Goldberg, Alfred L.; Mesirov, Jill P.; Root, David E.; Bhatia, Sangeeta N.; Beroukhim, Rameen; Hahn, William C.
2012-01-01
Due to genome instability, most cancers exhibit loss of regions containing tumor suppressor genes and collateral loss of other genes. To identify cancer-specific vulnerabilities that are the result of copy-number losses, we performed integrated analyses of genome-wide copy-number and RNAi profiles and identified 56 genes for which gene suppression specifically inhibited the proliferation of cells harboring partial copy-number loss of that gene. These CYCLOPS (Copy-number alterations Yielding Cancer Liabilities Owing to Partial losS) genes are enriched for spliceosome, proteasome and ribosome components. One CYCLOPS gene, PSMC2, encodes an essential member of the 19S proteasome. Normal cells express excess PSMC2, which resides in a complex with PSMC1, PSMD2, and PSMD5 and acts as a reservoir protecting cells from PSMC2 suppression. Cells harboring partial PSMC2 copy-number loss lack this complex and die after PSMC2 suppression. These observations define a distinct class of cancer-specific liabilities resulting from genome instability. PMID:22901813
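The integrated copy-number/RNAi analysis asks, for each gene, whether cell lines with partial loss of that gene are more sensitive to its suppression than lines without loss. A minimal sketch of that comparison is a Welch's t statistic between the two groups of cell lines. The copy-number thresholds, the convention that a lower RNAi score means stronger proliferation loss, and all cell-line names and values are hypothetical; the study's actual statistical procedure is not described in this abstract.

```python
# Minimal sketch: compare RNAi dependency between cell lines with partial copy-number
# loss of a gene and copy-neutral lines, using Welch's t statistic.
# Thresholds, score convention and data are hypothetical.
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def loss_dependency_t(dependency, log2_copy, loss_range=(-1.0, -0.3), neutral_min=-0.3):
    """dependency: {line: RNAi score, lower = stronger loss of proliferation}.
    log2_copy: {line: log2 copy ratio of the gene}.
    Strongly negative t suggests lines with partial loss depend more on the gene."""
    loss = [s for line, s in dependency.items() if loss_range[0] <= log2_copy[line] < loss_range[1]]
    neutral = [s for line, s in dependency.items() if log2_copy[line] >= neutral_min]
    return welch_t(loss, neutral)


dependency = {"L1": -2.0, "L2": -1.8, "L3": -2.2, "N1": -0.5, "N2": -0.3, "N3": -0.4, "N4": -0.6}
log2_copy = {"L1": -0.6, "L2": -0.5, "L3": -0.7, "N1": 0.0, "N2": 0.1, "N3": -0.1, "N4": 0.05}
print(round(loss_dependency_t(dependency, log2_copy), 2))
```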
Gao, Chen; Wang, Yibin
2014-01-01
With the advancement of transcriptome profiling by micro-arrays and high-throughput RNA-sequencing, transcriptome complexity and its dynamics are revealed at different levels in cardiovascular development and diseases. In this review, we will highlight the recent progress in our knowledge of cardiovascular transcriptome complexity contributed by RNA splicing, RNA editing and noncoding RNAs. The emerging importance of many of these previously under-explored aspects of gene regulation in cardiovascular development and pathology will be discussed.
Spectroscopic and Statistical Techniques for Information Recovery in Metabonomics and Metabolomics
NASA Astrophysics Data System (ADS)
Lindon, John C.; Nicholson, Jeremy K.
2008-07-01
Methods for generating and interpreting metabolic profiles based on nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and chemometric analysis methods are summarized and the relative strengths and weaknesses of NMR and chromatography-coupled MS approaches are discussed. Given that all data sets measured to date only probe subsets of complex metabolic profiles, we describe recent developments for enhanced information recovery from the resulting complex data sets, including integration of NMR- and MS-based metabonomic results and combination of metabonomic data with data from proteomics, transcriptomics, and genomics. We summarize the breadth of applications, highlight some current activities, discuss the issues relating to metabonomics, and identify future trends.
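Principal component analysis is a standard chemometric first step for exploring NMR- or MS-derived metabolic profiles. The sketch below computes PCA scores, loadings, and explained-variance ratios via a singular value decomposition of the mean-centred data matrix. It assumes a samples-by-variables matrix (for example, binned spectral intensities) and uses NumPy; the toy matrix is hypothetical, and scaling choices common in metabonomics (unit-variance or Pareto scaling) are left out for brevity.

```python
# Minimal PCA sketch for a samples x variables metabolic-profile matrix (toy data).
import numpy as np


def pca_scores(X, n_components=2):
    """Return (scores, loadings, explained-variance ratios) from mean-centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates on the PCs
    explained = S ** 2 / np.sum(S ** 2)                # share of total variance per PC
    return scores, Vt[:n_components], explained[:n_components]


X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]  # hypothetical intensities, 3 samples x 2 bins
scores, loadings, explained = pca_scores(X)
print(explained)  # all variance lies on the first component
```

The sign of each component is arbitrary in SVD, so scores and loadings may flip sign between implementations.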
Revisiting the Evolution of Mycobacterium bovis
Mostowy, Serge; Inwald, Jackie; Gordon, Steve; Martin, Carlos; Warren, Rob; Kremer, Kristin; Cousins, Debby; Behr, Marcel A.
2005-01-01
Although careful consideration has been given to the genetic characterization of tubercle bacillus isolates causing disease in humans, those causing disease predominantly among wild and domesticated mammals have received less attention. In contrast to Mycobacterium tuberculosis, whose host range is largely specific to humans, M. bovis and “M. bovis-like” organisms infect a broad range of animal species beyond their most prominent host, cattle. To determine whether strains of variable genomic content are associated with distinct distributions of disease, the DNA contents of M. bovis or M. bovis-like isolates from a variety of hosts were investigated via Affymetrix GeneChip. Consistent with previous genomic analysis of the M. tuberculosis complex (MTC), large sequence polymorphisms of putative diagnostic and biological consequence were able to unambiguously distinguish interrogated isolates. The distribution of deleted regions indicates organisms genomically removed from M. bovis and also points to structured genomic variability within M. bovis. Certain genomic profiles spanned a variety of hosts but were clustered by geography, while others associated primarily with host type. In contrast to the prevailing assumption that M. bovis has broad host capacity, genomic profiles suggest that distinct MTC lineages differentially infect a variety of mammals. From this, a phylogenetic stratification of genotypes offers a predictive framework upon which to base future genetic and phenotypic studies of the MTC. PMID:16159772
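Grouping isolates by their pattern of deleted genomic regions, then tallying hosts within each group, is one simple way to see whether a genomic profile tracks host type. The sketch below assumes deletion calls have already been made from the array data; the region names, isolate identifiers, and hosts are hypothetical, and a parallel tally by geography would follow the same pattern.

```python
# Minimal sketch: group isolates by deletion profile and count hosts per profile.
# Region names, isolates and hosts are hypothetical.
from collections import Counter, defaultdict


def group_by_profile(isolates):
    """isolates: {isolate_id: (set of deleted regions, host)} -> {profile: Counter of hosts}."""
    groups = defaultdict(list)
    for isolate, (deleted, host) in isolates.items():
        groups[tuple(sorted(deleted))].append(host)  # sorted tuple = hashable profile key
    return {profile: Counter(hosts) for profile, hosts in groups.items()}


isolates = {
    "iso1": ({"RDa", "RDb"}, "cattle"),
    "iso2": ({"RDa", "RDb"}, "badger"),
    "iso3": ({"RDa"}, "seal"),
    "iso4": ({"RDa"}, "seal"),
}
for profile, hosts in group_by_profile(isolates).items():
    print(profile, dict(hosts))
```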
Diversity Arrays Technology (DArT) for whole-genome profiling of barley
Wenzl, Peter; Carling, Jason; Kudrna, David; Jaccoud, Damian; Huttner, Eric; Kleinhofs, Andris; Kilian, Andrzej
2004-01-01
Diversity Arrays Technology (DArT) can detect and type DNA variation at several hundred genomic loci in parallel without relying on sequence information. Here we show that it can be effectively applied to genetic mapping and diversity analyses of barley, a species with a 5,000-Mbp genome. We tested several complexity reduction methods and selected two that generated the most polymorphic genomic representations. Arrays containing individual fragments from these representations generated DArT fingerprints with a genotype call rate of 98.0% and a scoring reproducibility of at least 99.8%. The fingerprints grouped barley lines according to known genetic relationships. To validate the Mendelian behavior of DArT markers, we constructed a genetic map for a cross between cultivars Steptoe and Morex. Nearly all polymorphic array features could be incorporated into one of seven linkage groups (98.8%). The resulting map comprised ≈385 unique DArT markers and spanned 1,137 centimorgans. A comparison with the restriction fragment length polymorphism-based framework map indicated that the quality of the DArT map was equivalent, if not superior, to that of the framework map. These results highlight the potential of DArT as a generic technique for genome profiling in the context of molecular breeding and genomics. PMID:15192146
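The abstract reports a genotype call rate of 98.0% and a scoring reproducibility of at least 99.8%. One plausible way to compute such metrics from binary DArT presence/absence scores is sketched below. The data layout (rows = samples, columns = markers, `None` = no call) and the replicate-concordance definition are assumptions, not the authors' stated procedure.

```python
# Minimal sketch: call rate and replicate reproducibility for binary marker scores.
# Layout and definitions are assumptions; None marks a failed call.


def call_rate(calls):
    """Fraction of sample x marker cells that received a 0/1 call."""
    flat = [c for row in calls for c in row]
    return sum(c is not None for c in flat) / len(flat)


def reproducibility(rep1, rep2):
    """Concordance between two technical replicates, over markers called in both."""
    pairs = [(a, b) for a, b in zip(rep1, rep2) if a is not None and b is not None]
    return sum(a == b for a, b in pairs) / len(pairs)


calls = [[1, 0, None, 1], [0, 0, 1, 1]]   # hypothetical 2 samples x 4 markers
print(call_rate(calls))                    # 0.875
print(reproducibility([1, 0, 1, None, 0], [1, 0, 0, 1, 0]))  # 0.75
```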
Weitzel, Jeffrey N.; Blazer, Kathleen R.; MacDonald, Deborah J.; Culver, Julie O.; Offit, Kenneth
2012-01-01
Scientific and technologic advances are revolutionizing our approach to genetic cancer risk assessment, cancer screening and prevention, and targeted therapy, fulfilling the promise of personalized medicine. In this monograph we review the evolution of scientific discovery in cancer genetics and genomics, and describe current approaches, benefits and barriers to the translation of this information to the practice of preventive medicine. Summaries of known hereditary cancer syndromes and highly penetrant genes are provided and contrasted with recently discovered genomic variants associated with modest increases in cancer risk. We describe the scope of knowledge, tools, and expertise required for the translation of complex genetic and genomic test information into clinical practice. The challenges of genomic counseling include the need for genetics and genomics professional education and multidisciplinary team training, the need for evidence-based information regarding the clinical utility of testing for genomic variants, the potential dangers posed by premature marketing of first-generation genomic profiles, and the need for new clinical models to improve access to and responsible communication of complex disease-risk information. We conclude that given the experiences and lessons learned in the genetics era, the multidisciplinary model of genetic cancer risk assessment and management will serve as a solid foundation to support the integration of personalized genomic information into the practice of cancer medicine. PMID:21858794
The emerging genomics and systems biology research lead to systems genomics studies.
Yang, Mary Qu; Yoshigoe, Kenji; Yang, William; Tong, Weida; Qin, Xiang; Dunker, A; Chen, Zhongxue; Arabnia, Hamid R; Liu, Jun S; Niemierko, Andrzej; Yang, Jack Y
2014-01-01
Synergistically integrating multi-layer genomic data at the systems level not only can lead to deeper insights into the molecular mechanisms related to disease initiation and progression, but also can guide pathway-based biomarker and drug target identification. With the advent of high-throughput next-generation sequencing technologies, sequencing both DNA and RNA has generated multi-layer genomic data that can provide DNA polymorphism, non-coding RNA, messenger RNA, gene expression, isoform and alternative splicing information. Systems biology, on the other hand, studies complex biological systems, particularly through the systematic study of complex molecular interactions within specific cells or organisms. Genomics and molecular systems biology can be merged into the study of genomic profiles and implicated biological functions at the cellular or organism level. This emerging field can be referred to as systems genomics or genomic systems biology. The Mid-South Bioinformatics Centre (MBC) and the Joint Bioinformatics Ph.D. Program of the University of Arkansas at Little Rock and the University of Arkansas for Medical Sciences are particularly interested in promoting education and research advancement in this emerging field. Building on past investigations and research outcomes, MBC is further using differential gene and isoform/exon expression from RNA-seq and co-regulation from ChIP-seq specific to different phenotypes, in combination with protein-protein and protein-DNA interactions, to construct high-level gene networks for an integrative genome-phenome investigation at the systems biology level.
Ray Meta: scalable de novo metagenome assembly and profiling
2012-01-01
Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net. PMID:23259615
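Ray Communities profiles microbiomes using uniquely-colored k-mers, that is, k-mers that occur in exactly one reference genome. A toy, single-machine sketch of that idea follows. It is not Ray's distributed implementation, it ignores reverse complements, and the genome names, sequences, and k are hypothetical. The raw hit counts it returns would still need some normalization (for example, by each genome's number of unique k-mers) before being read as relative abundances.

```python
# Toy sketch of profiling reads with uniquely-colored k-mers (not Ray's implementation).
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield all overlapping k-mers of seq (forward strand only)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def unique_kmer_colors(genomes, k):
    """Map each k-mer found in exactly one genome to that genome's name."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq.upper(), k):
            colors[km].add(name)
    return {km: next(iter(c)) for km, c in colors.items() if len(c) == 1}


def profile_reads(reads, unique, k):
    """Count read k-mers that hit a uniquely-colored k-mer, per genome."""
    counts = Counter()
    for read in reads:
        for km in kmers(read.upper(), k):
            genome = unique.get(km)
            if genome is not None:
                counts[genome] += 1
    return counts


genomes = {"g1": "ACGTAC", "g2": "ACGTTT"}  # hypothetical references
unique = unique_kmer_colors(genomes, k=3)    # shared k-mers ACG, CGT are dropped
print(profile_reads(["CGTAC", "GTTT"], unique, k=3))
```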
He, Awen; Wang, Wenyu; Prakash, N Tejo; Tinkov, Alexey A; Skalny, Anatoly V; Wen, Yan; Hao, Jingcan; Guo, Xiong; Zhang, Feng
2018-03-01
Chemical elements are closely related to human health. Extensive genomic profile data of complex diseases offer us a good opportunity to systemically investigate the relationships between elements and complex diseases/traits. In this study, we applied a gene set enrichment analysis (GSEA) approach to detect associations between elements and complex diseases/traits by integrating element-gene interaction datasets with genome-wide association study (GWAS) data of complex diseases/traits. To illustrate the performance of GSEA, the element-gene interaction datasets of 24 elements were extracted from the comparative toxicogenomics database (CTD). GWAS summary datasets of 24 complex diseases or traits were downloaded from the dbGaP or GEFOS websites. We observed significant associations between 7 elements and 13 complex diseases or traits (all false discovery rate (FDR) < 0.05), including reported relationships such as aluminum vs. Alzheimer's disease (FDR = 0.042), calcium vs. bone mineral density (FDR = 0.031), and magnesium vs. systemic lupus erythematosus (FDR = 0.012), as well as novel associations, such as nickel vs. hypertriglyceridemia (FDR = 0.002) and bipolar disorder (FDR = 0.027). Our results are consistent with previous biological studies, supporting the good performance of GSEA. Our results based on the GSEA framework provide novel clues for discovering causal relationships between elements and complex diseases. © 2017 WILEY PERIODICALS, INC.
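To make the GSEA step concrete, the sketch below computes a weighted running-sum enrichment score in the style of the original GSEA method (weight exponent p = 1): genes are ranked by a gene-level association score, and the running sum steps up at genes in the set and down elsewhere. Treating GWAS-derived gene scores as the ranking metric and an element's interacting genes as the gene set is an assumption about how the pieces fit; the study's exact gene-scoring procedure is not described here, and the reported FDR values would additionally require permutation testing and multiple-testing correction. Gene names and scores are hypothetical.

```python
# Minimal sketch of a weighted GSEA running-sum enrichment score (p = 1 by default).
# Gene names and scores are hypothetical; no permutation testing is performed.


def enrichment_score(scores, gene_set, p=1.0):
    """scores: {gene: association score}; gene_set: genes interacting with an element.

    Returns the signed maximum deviation of the running sum from zero."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    norm = sum(abs(s) ** p for (_, s), h in zip(ranked, hits) if h)
    running, best = 0.0, 0.0
    for (_, s), h in zip(ranked, hits):
        running += abs(s) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


scores = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
print(enrichment_score(scores, {"A", "B"}))  # set at the top of the ranking: ES near +1
print(enrichment_score(scores, {"D", "E"}))  # set at the bottom: ES near -1
```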
Borziak, Kirill; Posner, Mareike G; Upadhyay, Abhishek; Danson, Michael J; Bagby, Stefan; Dorus, Steve
2014-01-01
Metagenomic analyses have advanced our understanding of ecological microbial diversity, but to what extent can metagenomic data be used to predict the metabolic capacity of difficult-to-study organisms and their abiotic environmental interactions? We tackle this question, using a comparative genomic approach, by considering the molecular basis of aerobiosis within archaea. Lipoylation, the covalent attachment of lipoic acid to 2-oxoacid dehydrogenase multienzyme complexes (OADHCs), is essential for metabolism in aerobic bacteria and eukarya. Lipoylation is catalysed either by lipoate protein ligase (LplA), which in archaea is typically encoded by two genes (LplA-N and LplA-C), or by a lipoyl(octanoyl) transferase (LipB or LipM) plus a lipoic acid synthetase (LipA). Does the genomic presence of lipoylation and OADHC genes across archaea from diverse habitats correlate with aerobiosis? First, analyses of 11,826 biotin protein ligase (BPL)-LplA-LipB transferase family members and 147 archaeal genomes identified 85 species with lipoylation capabilities and provided support for multiple ancestral acquisitions of lipoylation pathways during archaeal evolution. Second, with the exception of the Sulfolobales order, the majority of species possessing lipoylation systems exclusively retain LplA, or either LipB or LipM, consistent with archaeal genome streamlining. Third, obligate anaerobic archaea display widespread loss of lipoylation and OADHC genes. Conversely, a high level of correspondence is observed between aerobiosis and the presence of LplA/LipB/LipM, LipA and OADHC E2, consistent with the role of lipoylation in aerobic metabolism. This correspondence between OADHC lipoylation capacity and aerobiosis indicates that genomic pathway profiling in archaea is informative and that well characterized pathways may be predictive in relation to abiotic conditions in difficult-to-study extremophiles. 
Given the highly variable retention of gene repertoires across the archaea, the extension of comparative genomic pathway profiling to broader metabolic and homeostasis networks should be useful in revealing characteristics from metagenomic datasets related to adaptations to diverse environments.
Di, Li-Jun; Byun, Jung S; Wong, Madeline M; Wakano, Clay; Taylor, Tara; Bilke, Sven; Baek, Songjoon; Hunter, Kent; Yang, Howard; Lee, Maxwell; Zvosec, Cecilia; Khramtsova, Galina; Cheng, Fan; Perou, Charles M; Miller, C Ryan; Raab, Rachel; Olopade, Olufunmilayo I; Gardner, Kevin
2013-01-01
The C-terminal binding protein (CtBP) is a NADH-dependent transcriptional repressor that links carbohydrate metabolism to epigenetic regulation by recruiting diverse histone-modifying complexes to chromatin. Here global profiling of CtBP in breast cancer cells reveals that it drives epithelial-to-mesenchymal transition, stem cell pathways and genome instability. CtBP expression induces mesenchymal and stem cell-like features, whereas CtBP depletion or caloric restriction reverses gene repression and increases DNA repair. Multiple members of the CtBP-targeted gene network are selectively downregulated in aggressive breast cancer subtypes. Differential expression of CtBP-targeted genes predicts poor clinical outcome in breast cancer patients, and elevated levels of CtBP in patient tumours predict shorter median survival. Finally, both CtBP promoter targeting and gene repression can be reversed by small molecule inhibition. These findings define broad roles for CtBP in breast cancer biology and suggest novel chromatin-based strategies for pharmacologic and metabolic intervention in cancer.
Bastarrachea, Raúl A.; Gallegos-Cabriales, Esther C.; Nava-González, Edna J.; Haack, Karin; Voruganti, V. Saroja; Charlesworth, Jac; Laviada-Molina, Hugo A.; Veloz-Garza, Rosa A.; Cardenas-Villarreal, Velia Margarita; Valdovinos-Chavez, Salvador B.; Gomez-Aguilar, Patricia; Meléndez, Guillermo; López-Alvarenga, Juan Carlos; Göring, Harald H. H.; Cole, Shelley A.; Blangero, John; Comuzzie, Anthony G.; Kent, Jack W.
2012-01-01
Whole-transcriptome expression profiling provides novel phenotypes for analysis of complex traits. Gene expression measurements reflect quantitative variation in transcript-specific messenger RNA levels and represent phenotypes lying close to the action of genes. Understanding the genetic basis of gene expression will provide insight into the processes that connect genotype to clinically significant traits, a central tenet of systems biology. Synchronous in vivo expression profiles of lymphocytes, muscle, and subcutaneous fat were obtained from healthy Mexican men. Most genes were expressed at detectable levels in multiple tissues, and RNA levels were correlated between tissue types. A subset of transcripts with high reliability of expression across tissues (estimated by intraclass correlation coefficients) was enriched for cis-regulated genes, suggesting that proximal sequence variants may influence expression similarly in different cellular environments. This integrative global gene expression profiling approach is proving extremely useful for identifying genes and pathways that contribute to complex clinical traits. Clearly, the coincidence of clinical trait quantitative trait loci and expression quantitative trait loci can help in the prioritization of positional candidate genes. Such data will be crucial for the formal integration of positional and transcriptomic information known as genetical genomics. PMID:22797999
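The abstract estimates cross-tissue reliability with intraclass correlation coefficients but does not say which ICC form was used. As a minimal sketch, assuming the one-way random-effects ICC(1,1), the hypothetical helper `icc_oneway` below scores one transcript measured in several tissues (columns) for several subjects (rows):

```python
from typing import Sequence


def icc_oneway(matrix: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: one-way random-effects ICC(1,1).

    matrix[i][j] = expression of one transcript in subject i, tissue j.
    Values near 1 mean subjects rank consistently across tissues.
    """
    n = len(matrix)      # subjects (targets)
    k = len(matrix[0])   # tissues (repeated measurements per subject)
    grand = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    # Between-subject mean square
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-subject mean square: tissue-to-tissue scatter around each subject mean
    ss_within = sum((x - m) ** 2 for row, m in zip(matrix, row_means) for x in row)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

A transcript whose level differs between men but is identical across lymphocytes, muscle and fat gets an ICC of 1; one whose between-tissue scatter swamps between-person differences drops toward or below 0.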
Menzel, Ralph; Swain, Suresh C; Hoess, Sebastian; Claus, Evelyn; Menzel, Stefanie; Steinberg, Christian EW; Reifferscheid, Georg; Stürzenbaum, Stephen R
2009-01-01
Background Traditionally, toxicity of river sediments is assessed using whole sediment tests with benthic organisms. The challenge, however, is the differentiation between multiple effects caused by complex contaminant mixtures and the unspecific toxicity endpoints such as survival, growth or reproduction. The use of gene expression profiling facilitates the identification of transcriptional changes at the molecular level that are specific to the bio-available fraction of pollutants. Results In this pilot study, we exposed the nematode Caenorhabditis elegans to three sediments of German rivers with varying (low, medium and high) levels of heavy metal and organic contamination. Besides chemical analysis, three standard bioassays were performed: reproduction of C. elegans, genotoxicity (Comet assay) and endocrine disruption (YES test). Gene expression was profiled using a whole genome DNA-microarray approach to identify overrepresented functional gene categories and derived cellular processes. Disaccharide and glycogen metabolism were found to be affected, whereas further functional pathways, such as oxidative phosphorylation, ribosome biogenesis, metabolism of xenobiotics, aging and several developmental processes were found to be differentially regulated only in response to the most contaminated sediment. Conclusion This study demonstrates how ecotoxicogenomics can identify transcriptional responses in complex mixture scenarios to distinguish different samples of river sediments. PMID:19366437
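The abstract mentions identifying overrepresented functional gene categories without naming the statistical test. A common choice, assumed here, is a one-sided hypergeometric test; the hypothetical helper `hypergeom_enrichment_p` below computes it exactly with integer binomial coefficients:

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Hypothetical helper: P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes on the array
    K: genes annotated to the functional category
    n: differentially expressed (DE) genes
    k: DE genes that fall in the category
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability of observing k or more category genes among the DE set
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

In practice the p-values for all tested categories would then be corrected for multiple testing.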
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grigoriev, Igor
The JGI Fungal Genomics Program aims to scale up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Combining new sequencing technologies and comparative genomics tools, JGI is now leading the world in fungal genome sequencing and analysis. Over 120 sequenced fungal genomes with analytical tools are available via MycoCosm (www.jgi.doe.gov/fungi), a web portal for fungal biologists. Our model of interacting with user communities, unique among sequencing centers, helps organize these communities, improves genome annotation and analysis work, and facilitates new larger-scale genomic projects. This resulted in 20 high-profile papers published in 2011 alone, contributing to the Genomics Encyclopedia of Fungi, which targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts). Our next grand challenges include larger-scale exploration of fungal diversity (1000 fungal genomes), developing molecular tools for DOE-relevant model organisms, and analysis of complex systems and metagenomes.
Nikiforuk, Aidan M; Leung, Anders; Cook, Bradley W M; Court, Deborah A; Kobasa, Darwyn; Theriault, Steven S
2016-10-01
Viral infectious clone systems serve as robust platforms to study viral gene or replicative function by reverse genetics, formulate vaccines and adapt a wild-type virus to an animal host. Since the development of the first viral infectious clone system for the poliovirus, novel strategies of viral genome construction have allowed for the assembly of viral genomes across the identified viral families. However, the molecular profiles of some viruses make their genomes more difficult to construct than others. Two factors that affect the difficulty of infectious clone construction are genome length and genome complexity. This work examines the available strategies for overcoming the obstacles of assembling the long and complex RNA genomes of coronaviruses and reports one-step construction of an infectious clone system for the Middle East Respiratory Syndrome coronavirus (MERS-CoV) by homologous recombination in S. cerevisiae. Future use of this methodology will shorten the time between emergence of a novel viral pathogen and construction of an infectious clone system. Completion of a viral infectious clone system facilitates further study of a virus's biology, improvement of diagnostic tests, vaccine production and the screening of antiviral compounds. Crown Copyright © 2016. Published by Elsevier B.V. All rights reserved.
Profiling protein function with small molecule microarrays
Winssinger, Nicolas; Ficarro, Scott; Schultz, Peter G.; Harris, Jennifer L.
2002-01-01
The regulation of protein function through posttranslational modification, local environment, and protein–protein interaction is critical to cellular function. The ability to analyze on a genome-wide scale protein functional activity rather than changes in protein abundance or structure would provide important new insights into complex biological processes. Herein, we report the application of a spatially addressable small molecule microarray to an activity-based profile of proteases in crude cell lysates. The potential of this small molecule-based profiling technology is demonstrated by the detection of caspase activation upon induction of apoptosis, characterization of the activated caspase, and inhibition of the caspase-executed apoptotic phenotype using the small molecule inhibitor identified in the microarray-based profile. PMID:12167675
Childhood Acute Lymphoblastic Leukemia: Integrating Genomics into Therapy
Tasian, Sarah K; Loh, Mignon L; Hunger, Stephen P
2015-01-01
Acute lymphoblastic leukemia (ALL), the most common malignancy of childhood, is a genetically complex entity that remains a major cause of childhood cancer-related mortality. Major advances in genomic and epigenomic profiling during the past decade have appreciably enhanced knowledge of the biology of de novo and relapsed ALL and have facilitated more precise risk stratification of patients. These achievements have also provided critical insights regarding potentially targetable lesions for development of new therapeutic approaches in the era of precision medicine. This review delineates the current genetic landscape of childhood ALL with emphasis upon patient outcomes with contemporary treatment regimens, as well as therapeutic implications of newly identified genomic alterations in specific subsets of ALL. PMID:26194091
Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali
2016-01-01
Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterial interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005
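To illustrate how whole genome sequencing can split a MIRU-24 cluster, the sketch below counts pairwise SNP differences between aligned core-genome sequences and joins isolates whose distance is at or below a cut-off (single linkage, via union-find). The helpers `snp_distance` and `snp_clusters` and the threshold value are hypothetical; the abstract does not state which SNP cut-off or clustering rule was used.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Hypothetical helper: differing positions in two aligned sequences, skipping 'N'."""
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def snp_clusters(seqs: dict[str, str], threshold: int) -> list[set[str]]:
    """Hypothetical helper: single-linkage clusters of isolates within `threshold` SNPs."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)      # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Isolates sharing a MIRU-24 pattern but landing in separate SNP clusters are the "distinct SNP profiles" that argue against recent transmission.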
Global Identification and Characterization of Transcriptionally Active Regions in the Rice Genome
Stolc, Viktor; Deng, Wei; He, Hang; Korbel, Jan; Chen, Xuewei; Tongprasit, Waraporn; Ronald, Pamela; Chen, Runsheng; Gerstein, Mark; Wang Deng, Xing
2007-01-01
Genome tiling microarray studies have consistently documented rich transcriptional activity beyond the annotated genes. However, systematic characterization and transcriptional profiling of the putative novel transcripts on the genome scale are still lacking. We report here the identification of 25,352 and 27,744 transcriptionally active regions (TARs) not encoded by annotated exons in the rice (Oryza sativa) subspecies japonica and indica, respectively. The non-exonic TARs account for approximately two thirds of the total TARs detected by tiling arrays and represent transcripts likely conserved between japonica and indica. Transcription of 21,018 (83%) japonica non-exonic TARs was verified through expression profiling in 10 tissue types using a re-array in which annotated genes and TARs were each represented by five independent probes. Subsequent analyses indicate that about 80% of the japonica TARs that were not assigned to annotated exons can be assigned to various putatively functional or structural elements of the rice genome, including splice variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. These results provide a systematic characterization of non-exonic transcripts in rice and thus expand the current view of the complexity and dynamics of the rice transcriptome. PMID:17372628
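The abstract does not describe how TARs were called from the tiling-array signal. One common approach, assumed here, marks probes above an intensity threshold and merges positive probes separated by no more than a maximum gap, keeping runs with a minimum number of probes. The helper `call_tars` and its parameters are hypothetical:

```python
def call_tars(positions, signals, threshold, max_gap, min_run):
    """Hypothetical helper: merge above-threshold tiling probes into TARs.

    positions: sorted genomic start coordinates of probes
    signals:   normalized intensities in the same order
    max_gap:   largest distance (bp) between positive probes within one TAR
    min_run:   minimum number of positive probes a TAR must contain
    Returns (start, end) tuples built from probe start coordinates.
    """
    tars = []
    run = []
    for pos, sig in zip(positions, signals):
        if sig < threshold:
            continue  # negative probes do not break a run by themselves
        if run and pos - run[-1] > max_gap:
            if len(run) >= min_run:
                tars.append((run[0], run[-1]))
            run = []
        run.append(pos)
    if len(run) >= min_run:  # flush the final run
        tars.append((run[0], run[-1]))
    return tars
```

TARs called this way would then be intersected with exon annotation to separate exonic from non-exonic regions.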
Hsu, Yi-Hsiang; Zillikens, M Carola; Wilson, Scott G; Farber, Charles R; Demissie, Serkalem; Soranzo, Nicole; Bianchi, Estelle N; Grundberg, Elin; Liang, Liming; Richards, J Brent; Estrada, Karol; Zhou, Yanhua; van Nas, Atila; Moffatt, Miriam F; Zhai, Guangju; Hofman, Albert; van Meurs, Joyce B; Pols, Huibert A P; Price, Roger I; Nilsson, Olle; Pastinen, Tomi; Cupples, L Adrienne; Lusis, Aldons J; Schadt, Eric E; Ferrari, Serge; Uitterlinden, André G; Rivadeneira, Fernando; Spector, Timothy D; Karasik, David; Kiel, Douglas P
2010-06-10
Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far only explain a small proportion of the heritability for complex traits. Due to the modest genetic effect size and inadequate power, true association signals may not be revealed based on a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6 × 10^-8), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near the TNFRSF11B/OPG gene. We also prioritized 16 candidate genes with suggestive genome-wide significance based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 of these 3 genes (GPR177, p = 2.6 × 10^-13; SOX6, p = 6.4 × 10^-10) have been successfully replicated in a large-scale meta-analysis of BMD, whereas none of the non-prioritized candidates associated with BMD were replicated. Our results support our prioritization strategy. 
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation.
2012-01-01
Background The biphasic life cycle with pelagic larva and benthic adult stages is widely observed in the animal kingdom, including the Porifera (sponges), which are the earliest branching metazoans. The demosponge, Amphimedon queenslandica, undergoes metamorphosis from a free-swimming larva into a sessile adult that bears no morphological resemblance to other animals. While the genome of A. queenslandica contains an extensive repertoire of genes very similar to that of complex bilaterians, it is as yet unclear how this is drawn upon to coordinate changing morphological features and ecological demands throughout the sponge life cycle. Results To identify genome-wide events that accompany the pelagobenthic transition in A. queenslandica, we compared global gene expression profiles at four key developmental stages by sequencing the poly(A) transcriptome using SOLiD technology. Large-scale changes in transcription were observed as sponge larvae settled on the benthos and began metamorphosis. Although previous systematics suggest that the only clear homology between Porifera and other animals is in the embryonic and larval stages, we observed extensive use of genes involved in metazoan-associated cellular processes throughout the sponge life cycle. Sponge-specific transcripts are not over-represented in the morphologically distinct adult; rather, many genes that encode typical metazoan features, such as cell adhesion and immunity, are upregulated. Our analysis further revealed gene families with candidate roles in competence, settlement, and metamorphosis in the sponge, including transcription factors, G-protein coupled receptors and other signaling molecules. Conclusions This first genome-wide study of the developmental transcriptome in an early branching metazoan highlights major transcriptional events that accompany the pelagobenthic transition and points to a network of regulatory mechanisms that coordinate changes in morphology with shifting environmental demands. 
Metazoan developmental and structural gene orthologs are well-integrated into the expression profiles at every stage of sponge development, including the adult. The utilization of genes involved in metazoan-associated processes throughout sponge development emphasizes the potential of the genome of the last common ancestor of animals to generate phenotypic complexity. PMID:22646746
Ferrarini, Alberto; Forcato, Claudio; Buson, Genny; Tononi, Paola; Del Monaco, Valentina; Terracciano, Mario; Bolognesi, Chiara; Fontana, Francesca; Medoro, Gianni; Neves, Rui; Möhlendick, Birte; Rihawi, Karim; Ardizzoni, Andrea; Sumanasuriya, Semini; Flohr, Penny; Lambros, Maryou; de Bono, Johann; Stoecklein, Nikolas H; Manaresi, Nicolò
2018-01-01
Chromosomal instability and associated chromosomal aberrations are hallmarks of cancer and play a critical role in disease progression and development of resistance to drugs. Single-cell genome analysis has gained interest in recent years as a source of biomarkers for targeted-therapy selection and drug resistance, and several methods have been developed to amplify the genomic DNA and to produce libraries suitable for Whole Genome Sequencing (WGS). However, most protocols require several enzymatic and cleanup steps, thus increasing the complexity and length of protocols, while robustness and speed are key factors for clinical applications. To tackle this issue, we developed a single-tube, single-step, streamlined protocol, exploiting the ligation-mediated PCR (LM-PCR) Whole Genome Amplification (WGA) method, for low-pass genome sequencing with the Ion Torrent™ platform and copy number alteration (CNA) calling from single cells. The method was evaluated on single cells isolated from 6 aberrant cell lines of the NCI-H series. In addition, to demonstrate the feasibility of the workflow on clinical samples, we analyzed single circulating tumor cells (CTCs) and white blood cells (WBCs) isolated from the blood of patients affected by prostate cancer or lung adenocarcinoma. The results show that the developed workflow generates data that accurately represent whole genome absolute copy number profiles of single cells and allows calling of alterations at resolutions down to 100 Kbp with as few as 200,000 reads. The presented data demonstrate the feasibility of the Ampli1™ WGA-based low-pass workflow for detection of CNAs in single tumor cells, which would be of particular interest for genome-driven targeted therapy selection and for monitoring of disease progression.
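Low-pass CNA calling typically counts reads in fixed-size genomic bins and scales each bin against a baseline. At 100 Kbp resolution a ~3.1 Gbp human genome has roughly 31,000 bins, so 200,000 reads average only about 6 to 7 reads per bin, which is why real pipelines segment neighbouring bins before calling. The hypothetical helper below shows only the final scaling step, assuming bins already corrected for GC and mappability bias and a median bin at baseline ploidy; it is not the Ampli1 workflow itself.

```python
from statistics import median


def copy_number_from_bins(counts: list[int], ploidy: int = 2) -> list[int]:
    """Hypothetical helper: integer copy number per bin from read counts.

    Assumes equal-size bins, bias-corrected counts, and that the median bin
    sits at the baseline ploidy (a non-zero median is required).
    """
    baseline = median(counts)
    return [round(ploidy * c / baseline) for c in counts]
```

In a full pipeline the counts would usually be segmented first (for example by circular binary segmentation) and segment means, not single noisy bins, would be rounded.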
Unmasking molecular profiles of bladder cancer.
Piao, Xuan-Mei; Byun, Young Joon; Kim, Wun-Jae; Kim, Jayoung
2018-03-01
Precision medicine is designed to tailor treatments for individual patients by factoring in each person's specific biology and mechanism of disease. This paradigm shift from a "one size fits all" approach to "personalized and precision care" requires multiple layers of molecular profiling of biomarkers for accurate diagnosis and prediction of treatment responses. Intensive studies are also being performed to understand the complex and dynamic molecular profiles of bladder cancer. These efforts involve examining bladder cancer mechanisms at multiple levels, including the genome, epigenome, transcriptome, proteome, lipidome and metabolome. The aim of this short review is to outline the current technologies being used to investigate molecular profiles and to discuss biomarker candidates that have been investigated as possible diagnostic and prognostic indicators of bladder cancer.
CRISPR-cas loci profiling of Cronobacter sakazakii pathovars.
Ogrodzki, Pauline; Forsythe, Stephen James
2016-12-01
Cronobacter sakazakii sequence types 1, 4, 8 and 12 are associated with outbreaks of neonatal meningitis and necrotizing enterocolitis infections. However, clonality results in strains that are indistinguishable using conventional methods. This study investigated the use of clustered regularly interspaced short palindromic repeats (CRISPR)-cas loci profiling for epidemiological investigations. Seventy whole genomes of C. sakazakii strains from four clonal complexes, widely distributed temporally, geographically and by source of origin, were profiled. All strains encoded the same type I-E subtype CRISPR-cas system with a total of 12 different CRISPR spacer arrays. This study demonstrated the greater discriminatory power of CRISPR spacer array profiling compared with multilocus sequence typing, which will be of use in source attribution during Cronobacter outbreak investigations.
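The abstract does not name the metric behind "greater discriminatory power". A standard one in molecular typing, assumed here, is the Hunter-Gaston discriminatory index (Simpson's index of diversity): the probability that two strains picked at random without replacement receive different types. The helper `discriminatory_index` is hypothetical:

```python
from collections import Counter


def discriminatory_index(types: list[str]) -> float:
    """Hypothetical helper: Hunter-Gaston discriminatory index.

    types: one typing result per strain (e.g. a sequence type or a
    CRISPR spacer-array profile). Returns a value in [0, 1]; higher means
    the typing method separates strains better.
    """
    n = len(types)
    same = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same / (n * (n - 1))
```

Applying it to the same strain set typed once by MLST and once by spacer-array profile would express the comparison as two numbers on the same scale.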
Strategies to Explore Functional Genomics Data Sets in NCBI’s GEO Database
Wilhite, Stephen E.; Barrett, Tanya
2012-01-01
The Gene Expression Omnibus (GEO) database is a major repository that stores high-throughput functional genomics data sets that are generated using both microarray-based and sequence-based technologies. Data sets are submitted to GEO primarily by researchers who are publishing their results in journals that require original data to be made freely available for review and analysis. In addition to serving as a public archive for these data, GEO has a suite of tools that allow users to identify, analyze and visualize data relevant to their specific interests. These tools include sample comparison applications, gene expression profile charts, data set clusters, genome browser tracks, and a powerful search engine that enables users to construct complex queries. PMID:22130872
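Besides the web search engine described above, GEO records can be queried programmatically through NCBI's E-utilities, where the GEO DataSets database is addressed as `db=gds`. The sketch below only builds an ESearch URL (no network call); the helper name `geo_esearch_url` is hypothetical, and the query uses the standard `[Organism]` Entrez field tag as an example.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_esearch_url(term: str, retmax: int = 20) -> str:
    """Hypothetical helper: ESearch URL against GEO DataSets (db=gds)."""
    # urlencode percent-encodes quotes and brackets and turns spaces into '+'
    return ESEARCH + "?" + urlencode({"db": "gds", "term": term, "retmax": retmax})


url = geo_esearch_url('"breast cancer" AND "Homo sapiens"[Organism]')
```

Fetching such a URL (for example with `urllib.request.urlopen`) returns XML listing the matching record UIDs, which can then be passed to ESummary for details.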
Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.
2018-01-01
In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888
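Taxonomic profilers output relative abundances per taxon at each rank, and benchmarks compare them against the gold-standard composition. As an assumption about how such a comparison can be scored, the hypothetical helper below computes the L1 norm between two profiles at one rank, a common distance for this purpose; it does not reproduce the full CAMI evaluation.

```python
def l1_distance(truth: dict[str, float], predicted: dict[str, float]) -> float:
    """Hypothetical helper: L1 norm between two relative-abundance profiles.

    Each profile maps taxon -> abundance at a single rank. Both are
    renormalized to sum to 1, so the result lies in [0, 2]:
    0 = identical profiles, 2 = no taxa in common.
    """
    t_sum = sum(truth.values())
    p_sum = sum(predicted.values())
    taxa = set(truth) | set(predicted)
    return sum(abs(truth.get(x, 0.0) / t_sum - predicted.get(x, 0.0) / p_sum) for x in taxa)
```

Evaluating this separately at each rank would reflect the pattern reported above: small distances at high ranks, growing below the family level.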
HuH-7 reference genome profile: complex karyotype composed of massive loss of heterozygosity.
Kasai, Fumio; Hirayama, Noriko; Ozawa, Midori; Satoh, Motonobu; Kohara, Arihiro
2018-05-17
Human cell lines represent a valuable resource as in vitro experimental models. A hepatoma cell line, HuH-7 (JCRB0403), has been used extensively in various research fields and a number of studies using this line have been published continuously since it was established in 1982. However, an accurate genome profile that can serve as a reliable reference has not been available. In this study, we performed M-FISH, SNP microarray and amplicon sequencing to characterize the cell line. Single cell analysis of metaphases revealed a high level of heterogeneity with a mode of 60 chromosomes. Cytogenetic results demonstrated chromosome abnormalities involving every chromosome in addition to a massive loss of heterozygosity, which accounts for 55.3% of the genome, consistent with the homozygous variants seen in the sequence analysis. We provide empirical data that the HuH-7 cell line is composed of highly heterogeneous cell populations, suggesting that besides cell line authentication, the quality of cell lines needs to be taken into consideration in the future use of tumor cell lines.
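A genome-wide LOH percentage is the total length of LOH segments divided by the genome length, with overlapping calls counted once. The hypothetical helper `loh_fraction` sketches that arithmetic, assuming LOH segments are supplied as half-open (chromosome, start, end) intervals from a SNP-array segmentation; the abstract does not describe the calling procedure.

```python
def loh_fraction(segments: list[tuple[str, int, int]], genome_length: int) -> float:
    """Hypothetical helper: fraction of the genome covered by LOH segments.

    segments: (chromosome, start, end) half-open intervals called as LOH.
    Overlapping or touching intervals on the same chromosome are merged first.
    """
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for chrom, start, end in segments:
        by_chrom.setdefault(chrom, []).append((start, end))

    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:              # overlap: extend current interval
                cur_end = max(cur_end, end)
            else:                             # gap: close current interval
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_length
```

Multiplying the result by 100 gives a percentage directly comparable to the 55.3% reported for HuH-7.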
Analysis of Ribosome Stalling and Translation Elongation Dynamics by Deep Learning.
Zhang, Sai; Hu, Hailin; Zhou, Jingtian; He, Xuan; Jiang, Tao; Zeng, Jianyang
2017-09-27
Ribosome stalling is manifested by the local accumulation of ribosomes at specific codon positions of mRNAs. Here, we present ROSE, a deep learning framework to analyze high-throughput ribosome profiling data and estimate the probability of a ribosome stalling event occurring at each genomic location. Extensive validation tests on independent data demonstrated that ROSE possessed higher prediction accuracy than conventional prediction models, with an increase in the area under the receiver operating characteristic curve by up to 18.4%. In addition, genome-wide statistical analyses showed that ROSE predictions can be well correlated with diverse putative regulatory factors of ribosome stalling. Moreover, the genome-wide ribosome stalling landscapes of both human and yeast computed by ROSE recovered the functional interplays between ribosome stalling and cotranslational events in protein biogenesis, including protein targeting by the signal recognition particles and protein secondary structure formation. Overall, our study provides a novel method to complement the ribosome profiling techniques and further decipher the complex regulatory mechanisms underlying translation elongation dynamics encoded in the mRNA sequence. Copyright © 2017 Elsevier Inc. All rights reserved.
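The accuracy comparison above is stated as area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen positive site (here, a stalling site) receives a higher score than a randomly chosen negative site, with ties counted as one half. The hypothetical helper below computes it directly from that Mann-Whitney definition; the quadratic pair loop is for clarity, and a rank-based version would be used on genome-scale data.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Hypothetical helper: ROC AUC via the Mann-Whitney U statistic.

    labels: 1 for positive (stalling) sites, 0 for negative sites.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties contribute half a win
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of stalling from non-stalling positions.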
Pereira, Bernard; Chin, Suet-Feung; Rueda, Oscar M.; Vollan, Hans-Kristian Moen; Provenzano, Elena; Bardwell, Helen A.; Pugh, Michelle; Jones, Linda; Russell, Roslin; Sammut, Stephen-John; Tsui, Dana W. Y.; Liu, Bin; Dawson, Sarah-Jane; Abraham, Jean; Northen, Helen; Peden, John F.; Mukherjee, Abhik; Turashvili, Gulisa; Green, Andrew R.; McKinney, Steve; Oloumi, Arusha; Shah, Sohrab; Rosenfeld, Nitzan; Murphy, Leigh; Bentley, David R.; Ellis, Ian O.; Purushotham, Arnie; Pinder, Sarah E.; Børresen-Dale, Anne-Lise; Earl, Helena M.; Pharoah, Paul D.; Ross, Mark T.; Aparicio, Samuel; Caldas, Carlos
2016-01-01
The genomic landscape of breast cancer is complex, and inter- and intra-tumour heterogeneity are important challenges in treating the disease. In this study, we sequence 173 genes in 2,433 primary breast tumours that have copy number aberration (CNA), gene expression and long-term clinical follow-up data. We identify 40 mutation-driver (Mut-driver) genes, and determine associations between mutations, driver CNA profiles, clinical-pathological parameters and survival. We assess the clonal states of Mut-driver mutations, and estimate levels of intra-tumour heterogeneity using mutant-allele fractions. Associations between PIK3CA mutations and reduced survival are identified in three subgroups of ER-positive cancer (defined by amplification of 17q23, 11q13–14 or 8q24). High levels of intra-tumour heterogeneity are in general associated with a worse outcome, but highly aggressive tumours with 11q13–14 amplification have low levels of intra-tumour heterogeneity. These results emphasize the importance of genome-based stratification of breast cancer, and have important implications for designing therapeutic strategies. PMID:27161491
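The abstract estimates intra-tumour heterogeneity from mutant-allele fractions without naming the summary statistic. One widely used measure, assumed here, is the MATH (mutant-allele tumour heterogeneity) score: 100 times the median absolute deviation of the allele fractions divided by their median, with the MAD scaled by 1.4826. The helper `math_score` is hypothetical:

```python
from statistics import median


def math_score(mafs: list[float]) -> float:
    """Hypothetical helper: MATH score = 100 * MAD / median of mutant-allele fractions.

    mafs: mutant-allele fractions of the mutations detected in one tumour.
    MAD is scaled by 1.4826 so it estimates the standard deviation
    for normally distributed data.
    """
    med = median(mafs)
    mad = 1.4826 * median(abs(x - med) for x in mafs)
    return 100 * mad / med
```

A tumour whose mutations all sit at similar allele fractions (a dominant clone) scores low; widely spread fractions, suggesting several subclones, score high.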
Nutritional metabolomics: Progress in addressing complexity in diet and health
Jones, Dean P.; Park, Youngja; Ziegler, Thomas R.
2013-01-01
Nutritional metabolomics is rapidly maturing to use small molecule chemical profiling to support integration of diet and nutrition in complex biosystems research. These developments are critical to facilitate transition of nutritional sciences from population-based to individual-based criteria for nutritional research, assessment and management. This review addresses progress in making these approaches manageable for nutrition research. Important conceptual developments concerning the exposome, predictive health and complex pathobiology serve to emphasize the central role of diet and nutrition in integrated biosystems models of health and disease. Improved analytic tools and databases for targeted and non-targeted metabolic profiling, along with bioinformatics, pathway mapping and computational modeling, are now used for nutrition research on diet, metabolism, microbiome and health associations. These new developments enable metabolome-wide association studies (MWAS) and provide a foundation for nutritional metabolomics, along with genomics, epigenomics and health phenotyping, to support integrated models required for personalized diet and nutrition forecasting. PMID:22540256
GEMINI: a computationally-efficient search engine for large gene expression datasets.
DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick
2016-02-24
Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number or meta-data tag. However, in this search paradigm the form of the query (a text-based string) is mismatched with the form of the target (a genomic profile). To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an O(log n) expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that, in practice on genomic data, its query time scales as the logarithm of the number of records. In a database with 10⁵ samples, GEMINI identifies the nearest neighbor in 0.05 sec, compared with a brute-force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.
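A vantage-point tree can be sketched in a few dozen lines. The following is a minimal illustration, not GEMINI's implementation: it assumes Euclidean distance between expression profiles (the abstract does not name the metric), picks vantage points at random, splits each node at the median distance, and uses the triangle inequality to skip subtrees that cannot contain a closer profile. The names `build_vp_tree` and `nearest_neighbor` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Profile = Sequence[float]


def euclidean(a: Profile, b: Profile) -> float:
    """Euclidean distance between two equal-length expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPNode:
    """One vantage point plus a split radius and two child subtrees."""

    def __init__(self, point_id: int, radius: float,
                 inside: Optional["VPNode"], outside: Optional["VPNode"]) -> None:
        self.point_id = point_id  # index of the vantage point in the database
        self.radius = radius      # median distance from the vantage point to the rest
        self.inside = inside      # subtree of points at distance <= radius
        self.outside = outside    # subtree of points at distance > radius


def build_vp_tree(db: List[Profile], ids: List[int],
                  rng: random.Random) -> Optional[VPNode]:
    if not ids:
        return None
    vp = ids[rng.randrange(len(ids))]  # random vantage point
    rest = [i for i in ids if i != vp]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    dist = {i: euclidean(db[vp], db[i]) for i in rest}
    radius = sorted(dist.values())[len(rest) // 2]  # median split
    inside = [i for i in rest if dist[i] <= radius]
    outside = [i for i in rest if dist[i] > radius]
    return VPNode(vp, radius, build_vp_tree(db, inside, rng),
                  build_vp_tree(db, outside, rng))


def nearest_neighbor(root: Optional[VPNode], db: List[Profile],
                     query: Profile) -> Tuple[int, float]:
    best_id, best_dist = -1, math.inf

    def visit(node: Optional[VPNode]) -> None:
        nonlocal best_id, best_dist
        if node is None:
            return
        d = euclidean(query, db[node.point_id])
        if d < best_dist:
            best_id, best_dist = node.point_id, d
        # Descend into the side containing the query first; visit the other
        # side only if the current best ball crosses the split radius.
        if d <= node.radius:
            visit(node.inside)
            if d + best_dist > node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - best_dist <= node.radius:
                visit(node.inside)

    visit(root)
    return best_id, best_dist
```

For example, `root = build_vp_tree(db, list(range(len(db))), random.Random(0))` followed by `nearest_neighbor(root, db, query)` returns the index and distance of the closest stored profile. The median split keeps the tree roughly balanced, which is what makes logarithmic query times plausible; how many subtrees can actually be pruned depends on the data.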
Miklós, István
2009-01-01
Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746
Advances in epigenetics and epigenomics for neurodegenerative diseases.
Qureshi, Irfan A; Mehler, Mark F
2011-10-01
In the post-genomic era, epigenetic factors (literally, those that are "over" or "above" genetic ones and responsible for controlling the expression and function of genes) have emerged as important mediators of development and aging; gene-gene and gene-environment interactions; and the pathophysiology of complex disease states. Here, we provide a brief overview of the major epigenetic mechanisms (i.e., DNA methylation, histone modifications and chromatin remodeling, and non-coding RNA regulation). We highlight the nearly ubiquitous profiles of epigenetic dysregulation that have been found in Alzheimer's and other neurodegenerative diseases. We also review innovative methods and technologies that enable the characterization of individual epigenetic modifications and more widespread epigenomic states at high resolution. We conclude that, together with complementary genetic, genomic, and related approaches, interrogating epigenetic and epigenomic profiles in neurodegenerative diseases represents an important and increasingly practical strategy for advancing our understanding, diagnosis, and treatment of these disorders.
Advances in Epigenetics and Epigenomics for Neurodegenerative Diseases
Qureshi, Irfan A.
2015-01-01
In the post-genomic era, epigenetic factors (literally, those that are “over” or “above” genetic ones and responsible for controlling the expression and function of genes) have emerged as important mediators of development and aging; gene-gene and gene-environment interactions; and the pathophysiology of complex disease states. Here, we provide a brief overview of the major epigenetic mechanisms (i.e., DNA methylation, histone modifications and chromatin remodeling, and non-coding RNA regulation). We highlight the nearly ubiquitous profiles of epigenetic dysregulation that have been found in Alzheimer’s and other neurodegenerative diseases. We also review innovative methods and technologies that enable the characterization of individual epigenetic modifications and more widespread epigenomic states at high resolution. We conclude that, together with complementary genetic, genomic, and related approaches, interrogating epigenetic and epigenomic profiles in neurodegenerative diseases represents an important and increasingly practical strategy for advancing our understanding, diagnosis, and treatment of these disorders. PMID:21671162
Effector profiles distinguish formae speciales of Fusarium oxysporum.
van Dam, Peter; Fokkens, Like; Schmidt, Sarah M; Linmans, Jasper H J; Kistler, H Corby; Ma, Li-Jun; Rep, Martijn
2016-11-01
Formae speciales (ff.spp.) of the fungus Fusarium oxysporum are often polyphyletic within the species complex, making it impossible to identify them on the basis of conserved genes. However, sequences that determine host-specific pathogenicity may be expected to be similar between strains within the same forma specialis. Whole genome sequencing was performed on strains from five different ff.spp. (cucumerinum, niveum, melonis, radicis-cucumerinum and lycopersici). In each genome, genes for putative effectors were identified based on small size, secretion signal, and vicinity to a "miniature impala" transposable element. The candidate effector genes of all genomes were collected and the presence/absence patterns in each individual genome were clustered. Members of the same forma specialis turned out to group together, with cucurbit-infecting strains forming a supercluster separate from other ff.spp. Moreover, strains from different clonal lineages within the same forma specialis harbour identical effector gene sequences, supporting horizontal transfer of genetic material. These data offer new insight into the genetic basis of host specificity in the F. oxysporum species complex and show that (putative) effectors can be used to predict host specificity in F. oxysporum. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
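Clustering genomes by effector presence/absence can be illustrated with a small sketch. Here each strain is represented as the set of candidate effector genes it carries, pairwise dissimilarity is the Jaccard distance, and clusters are merged by average linkage until no pair is closer than a cutoff. This choice of distance and linkage is an assumption for illustration; the abstract does not state which clustering method the authors used, and the strain and gene names below are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_effectors(profiles: Dict[str, Set[str]],
                         cutoff: float) -> List[FrozenSet[str]]:
    """Average-linkage agglomerative clustering of strains (keys) by the
    Jaccard distance between their effector sets; stops when the closest
    pair of clusters is at least `cutoff` apart."""
    clusters: List[FrozenSet[str]] = [frozenset([name]) for name in profiles]

    def linkage(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Mean distance over all cross-cluster strain pairs.
        total = sum(jaccard_distance(profiles[x], profiles[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best >= cutoff:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

For instance, with hypothetical strains `lyc_1: {e1, e2, e3}`, `lyc_2: {e1, e2, e3, e4}`, `mel_1: {e7, e8, e9}` and `mel_2: {e7, e8}`, a cutoff of 0.5 yields two clusters, one per putative forma specialis, because the strains share no effectors across groups.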
Applicability of SCAR markers to food genomics: olive oil traceability.
Pafundo, Simona; Agrimonti, Caterina; Maestri, Elena; Marmiroli, Nelson
2007-07-25
DNA analysis with molecular markers has opened a shortcut toward a genomic understanding of complex organisms. The availability of micro-DNA extraction methods, coupled with selective amplification of the smallest extracted fragments with molecular markers, could equally bring a breakthrough in food genomics: the identification of original components in food. Amplified fragment length polymorphisms (AFLPs) have been instrumental in plant genomics because they may allow rapid and reliable analysis of multiple, potentially polymorphic sites. Nevertheless, their direct application to the analysis of DNA extracted from food matrixes is complicated by the low quality of the extracted DNA: its high degradation and the presence of inhibitors of enzymatic reactions. The conversion of an AFLP fragment to a robust and specific single-locus PCR-based marker could therefore extend the use of molecular markers to large-scale analysis of complex agro-food matrixes. The present study reports the development of sequence-characterized amplified regions (SCARs) from AFLP profiles of monovarietal olive oils analyzed on agarose gel; one of these markers was used to identify differences among 56 olive cultivars. All of the developed markers were purposefully amplified in olive oils for application to olive oil traceability.
Metabolic pathway profiling of mitochondrial respiratory chain mutants in C. elegans
Falk, MJ; Zhang, Z; Rosenjack; Nissim; Daikhin, E; Nissim; Sedensky, MM; Yudkoff, M; Morgan, PG
2008-01-01
C. elegans affords a model of primary mitochondrial dysfunction that provides insight into cellular adaptations which accompany mutations in nuclear genes that encode mitochondrial proteins. To this end, we characterized genome-wide expression profiles of C. elegans strains with mutations in nuclear-encoded subunits of respiratory chain complexes. Our goal was to detect concordant changes among clusters of genes that comprise defined metabolic pathways. Results indicate that respiratory chain mutants significantly upregulate a variety of basic cellular metabolic pathways involved in carbohydrate, amino acid, and fatty acid metabolism, as well as cellular defense pathways such as the metabolism of P450 and glutathione. To further confirm and extend the expression analysis findings, whole-worm free amino acid levels were quantified in C. elegans mitochondrial mutants for subunits of complexes I, II, and III. Significant differences were seen for 13 of 16 amino acid levels in complex I mutants compared with controls, as well as overarching similarities among the profiles of complex I, II, and III mutants compared with controls. The specific pattern of amino acid alterations observed provides novel evidence that an increase in glutamate-linked transamination reactions, caused by the failure of NAD+-dependent oxidation of ketoacids, occurs in primary mitochondrial respiratory chain mutants. Recognition of consistent alterations among patterns of nuclear gene expression for multiple biochemical pathways and in quantitative amino acid profiles in a translational genetic model of mitochondrial dysfunction allows insight into the complex pathogenesis underlying primary mitochondrial disease. Such knowledge may enable the development of a metabolomic profiling diagnostic tool applicable to human mitochondrial disease. PMID:18178500
Single-cell genomic profiling of acute myeloid leukemia for clinical use: A pilot study
Yan, Benedict; Hu, Yongli; Ban, Kenneth H.K.; Tiang, Zenia; Ng, Christopher; Lee, Joanne; Tan, Wilson; Chiu, Lily; Tan, Tin Wee; Seah, Elaine; Ng, Chin Hin; Chng, Wee-Joo; Foo, Roger
2017-01-01
Although bulk high-throughput genomic profiling studies have led to a significant increase in the understanding of cancer biology, there is increasing awareness that bulk profiling approaches do not completely elucidate tumor heterogeneity. Single-cell genomic profiling can resolve tumor heterogeneity, and may improve clinical diagnosis through the identification and characterization of putative subclonal populations. In the present study, the challenges associated with a single-cell genomics profiling workflow for clinical diagnostics were investigated. Single-cell RNA-sequencing (RNA-seq) was performed on 20 cells from an acute myeloid leukemia bone marrow sample. Putative blasts were identified based on their gene expression profiles, and principal component analysis was performed to identify outlier cells. Variant calling was performed on the single-cell RNA-seq data. The present pilot study demonstrates a proof of concept for clinical single-cell genomic profiling. The recognized limitations include significant stochastic RNA loss and the relatively low throughput of the current proposed platform. Although the results of the present study are promising, further technological advances and protocol optimization are necessary for single-cell genomic profiling to be clinically viable. PMID:28454300
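Flagging outlier cells with principal component analysis can be sketched without external libraries. The code below is an assumed, simplified procedure rather than the study's pipeline: it centers a cells-by-genes matrix, finds the first principal component by power iteration on XᵀX, and flags cells whose PC1 score lies more than `z_cutoff` standard deviations from the mean. The cutoff of 3 and the function names are illustrative choices, and real single-cell data would normally be normalized and log-transformed first.

```python
import math
from typing import List


def first_pc_scores(matrix: List[List[float]], iters: int = 200) -> List[float]:
    """Project each row (cell) onto the first principal component of the
    column-centred matrix, found by power iteration on X^T X."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    x = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]
    v = [1.0 + 0.01 * j for j in range(n_cols)]  # arbitrary non-zero start vector
    for _ in range(iters):
        xv = [sum(r[j] * v[j] for j in range(n_cols)) for r in x]                  # X v
        w = [sum(x[i][j] * xv[i] for i in range(n_rows)) for j in range(n_cols)]   # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variance at all: every score will be zero
            break
        v = [c / norm for c in w]
    return [sum(r[j] * v[j] for j in range(n_cols)) for r in x]


def pc1_outliers(matrix: List[List[float]], z_cutoff: float = 3.0) -> List[int]:
    """Indices of cells whose PC1 score is more than z_cutoff SDs from the mean."""
    scores = first_pc_scores(matrix)
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if sd == 0.0:
        return []
    return [i for i, s in enumerate(scores) if abs(s - mean) / sd > z_cutoff]
```

With only 20 cells the largest attainable z-score (population SD) is about 4.25, so a cutoff of 3 can still flag a single strongly aberrant cell; with more outliers, a robust spread estimate such as the median absolute deviation would be a safer choice.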
Quantitative phenotyping via deep barcode sequencing.
Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey
2009-10-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
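The core of a barcode-sequencing readout is counting tags per strain and comparing abundances between conditions. The sketch below is an assumed simplification, not the published Bar-seq analysis routine: it supposes that each read begins with a known barcode of fixed length (real reads also carry common priming sequences, and barcodes that deviate from expectation need handling), normalizes counts to library size with a pseudocount, and reports a log2 ratio per strain. The barcode sequences, strain names and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def count_barcodes(reads: Iterable[str], barcodes: Dict[str, str],
                   prefix_len: int) -> Counter:
    """Tally reads whose first `prefix_len` bases exactly match a known barcode."""
    counts: Counter = Counter()
    for read in reads:
        strain = barcodes.get(read[:prefix_len])
        if strain is not None:  # unmatched reads are ignored
            counts[strain] += 1
    return counts


def log2_fitness(control: Counter, treated: Counter,
                 pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of library-size-normalised abundance (treated / control) per strain."""
    strains = set(control) | set(treated)
    c_total = sum(control.values()) + pseudocount * len(strains)
    t_total = sum(treated.values()) + pseudocount * len(strains)
    return {
        s: math.log2(((treated[s] + pseudocount) / t_total) /
                     ((control[s] + pseudocount) / c_total))
        for s in strains
    }
```

A strain whose deletion sensitizes it to a drug is depleted in the treated pool and therefore receives a negative score; the pseudocount keeps strains that drop to zero reads from producing infinite ratios.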
A collaborative exercise on DNA methylation based body fluid typing.
Jung, Sang-Eun; Cho, Sohee; Antunes, Joana; Gomes, Iva; Uchimoto, Mari L; Oh, Yu Na; Di Giacomo, Lisa; Schneider, Peter M; Park, Min Sun; van der Meer, Dieudonne; Williams, Graham; McCord, Bruce; Ahn, Hee-Jung; Choi, Dong Ho; Lee, Yang Han; Lee, Soong Deok; Lee, Hwan Young
2016-10-01
A collaborative exercise on DNA methylation-based body fluid identification was conducted by seven laboratories. For this project, a multiplex methylation SNaPshot reaction composed of seven CpG markers was used for the identification of four body fluids: blood, saliva, semen, and vaginal fluid. A total of 30 specimens were prepared and distributed to participating laboratories after thorough testing. The required experiments included four increasingly complex tasks: (1) capillary electrophoresis (CE) of a purified single-base extension reaction product, (2) multiplex PCR and multiplex single-base extension reaction of bisulfite-modified DNA, (3) bisulfite conversion of genomic DNA, and (4) extraction of genomic DNA from body fluid samples. In tasks 2, 3 and 4, one or more mixtures were analyzed, and specimens containing both known and unknown body fluid sources were used. Six of the laboratories generated consistent body fluid typing results for specimens of bisulfite-converted DNA and genomic DNA. One laboratory failed to set up appropriate conditions for capillary analysis of reference single-base extension products. In general, variation between laboratories in the values obtained for DNA methylation analysis increased with the complexity of the required experiments. However, all laboratories concurred on the interpretation of the DNA methylation profiles produced. Although interpretational guidelines for DNA methylation-based body fluid identification have yet to be established, this study supports the addition of DNA methylation profiling to forensic body fluid typing. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Ficklin, Stephen P.; Feltus, Frank Alex
2013-01-01
Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, together covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e., network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait, which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits.
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance. PMID:23874666
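Whether a co-expression module overlaps a trait more than expected by chance is commonly assessed with a hypergeometric (one-sided Fisher) test. The abstract does not state which statistic GeneNet Engine uses, so the function below is an assumed illustration: given the size of a gene universe, a module, and the set of genes lying in trait-associated regions (QTLs or SNP-tagged loci), it returns the probability of observing at least the given overlap by random draws. The function name and all sizes are hypothetical.

```python
from math import comb


def overlap_p_value(universe: int, module_size: int, trait_genes: int,
                    overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap), where X counts how many
    of `module_size` genes drawn at random from `universe` genes fall among the
    `trait_genes` genes located in trait-associated regions."""
    if not (0 <= module_size <= universe and 0 <= trait_genes <= universe):
        raise ValueError("sizes must satisfy 0 <= size <= universe")
    upper = min(module_size, trait_genes)
    # comb(n, k) is 0 when k > n, so impossible configurations contribute nothing.
    tail = sum(comb(trait_genes, k) * comb(universe - trait_genes, module_size - k)
               for k in range(max(overlap, 0), upper + 1))
    return tail / comb(universe, module_size)
```

For instance, a hypothetical 10-gene module sharing 8 genes with a 20-gene trait set drawn from a 10,000-gene universe gives a p-value far below 10⁻⁶, whereas an overlap of 0 always gives exactly 1.0. Exact integer arithmetic avoids underflow in the binomial coefficients; when many modules are tested against many traits, the p-values would also need multiple-testing correction.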
Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis
Patch, Ann-Marie; Bailey, Peter; Newell, Felicity; Holmes, Oliver; Fink, J. Lynn; Quinn, Michael C.J.; Tang, Yue Hang; Lampe, Guy; Quek, Kelly; Loffler, Kelly A.; Manning, Suzanne; Idrisoglu, Senel; Miller, David; Xu, Qinying; Waddell, Nick; Wilson, Peter J.; Bruxner, Timothy J.C.; Christ, Angelika N.; Harliwong, Ivon; Nourse, Craig; Nourbakhsh, Ehsan; Anderson, Matthew; Kazakoff, Stephen; Leonard, Conrad; Wood, Scott; Simpson, Peter T.; Reid, Lynne E.; Krause, Lutz; Hussey, Damian J.; Watson, David I.; Lord, Reginald V.; Nancarrow, Derek; Phillips, Wayne A.; Gotley, David; Smithers, B. Mark; Whiteman, David C.; Hayward, Nicholas K.; Campbell, Peter J.; Pearson, John V.; Grimmond, Sean M.; Barbour, Andrew P.
2015-01-01
Oesophageal adenocarcinoma (EAC) incidence is rapidly increasing in Western countries. A better understanding of EAC underpins efforts to improve early detection and treatment outcomes. While large EAC exome sequencing efforts to date have found recurrent loss-of-function mutations, oncogenic driving events have been underrepresented. Here we use a combination of whole-genome sequencing (WGS) and single-nucleotide polymorphism-array profiling to show that genomic catastrophes are frequent in EAC, with almost a third (32%, n = 40/123) undergoing chromothriptic events. WGS of 22 EAC cases show that catastrophes may lead to oncogene amplification through chromothripsis-derived double-minute chromosome formation (MYC and MDM2) or breakage-fusion-bridge (KRAS, MDM2 and RFC3). Telomere shortening is more prominent in EACs bearing localized complex rearrangements. Mutational signature analysis also confirms that extreme genomic instability in EAC can be driven by somatic BRCA2 mutations. These findings suggest that genomic catastrophes have a significant role in the malignant transformation of EAC. PMID:25351503
Phylo_dCor: distance correlation as a novel metric for phylogenetic profiling.
Sferra, Gabriella; Fratini, Federica; Ponzi, Marta; Pizzi, Elisabetta
2017-09-05
Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at a whole-genome level in both Prokaryotes and Eukaryotes, and for this reason it is considered one of the most promising methods. Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods that use mutual information or Pearson's correlation as measures of profile similarity. In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
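The distance correlation of two phylogenetic profiles (for example, presence/absence vectors across genomes) can be computed directly from its definition: double-center the matrix of pairwise distances within each profile, then compare the two matrices. The sketch below implements the standard sample (V-statistic) estimator of Székely and colleagues in plain Python. Whether Phylo-dCor uses this estimator or a bias-corrected variant is not stated in the abstract, and this O(n²) version makes no attempt at the parallelization the authors describe.

```python
import math
from typing import List, Sequence


def _centered_distances(x: Sequence[float]) -> List[List[float]]:
    """Double-centred matrix of pairwise absolute differences."""
    n = len(x)
    d = [[abs(x[i] - x[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation in [0, 1]; 0 when either profile is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    if dvar_x == 0.0 or dvar_y == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))
```

Unlike Pearson's correlation, distance correlation also detects non-monotonic dependence: for `x = [-2, -1, 0, 1, 2]` and `y = [v * v for v in x]`, Pearson's r is 0, while `distance_correlation(x, y)` is about 0.52.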
Genome-scale approaches to the epigenetics of common human disease
2011-01-01
Traditionally, the pathology of human disease has been focused on microscopic examination of affected tissues, chemical and biochemical analysis of biopsy samples, other available samples of convenience, such as blood, and noninvasive or invasive imaging of varying complexity, in order to classify disease and illuminate its mechanistic basis. The molecular age has complemented this armamentarium with gene expression arrays and selective analysis of individual genes. However, we are entering a new era of epigenomic profiling, i.e., genome-scale analysis of cell-heritable nonsequence genetic change, such as DNA methylation. The epigenome offers access to stable measurements of cellular state and to biobanked material for large-scale epidemiological studies. Some of these genome-scale technologies are beginning to be applied to create the new field of epigenetic epidemiology. PMID:19844740
The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling
Wray, Naomi R.; Yang, Jian; Goddard, Michael E.; Visscher, Peter M.
2010-01-01
Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operating characteristic (ROC) curve is a well-established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy, available as an online calculator, to estimate the proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk). PMID:20195508
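Under the liability-threshold model, the maximum AUC can be computed from prevalence and heritability alone. The function below is our own sketch of that calculation, not code from the paper: it assumes a standard-normal liability with threshold T set by prevalence K, a predictor explaining a proportion h2 of liability variance (setting h2 equal to the heritability gives the maximum), and normally distributed predictor values in cases and controls whose means and variances come from truncated-normal moments. Results should be checked against the published equation and online calculator before being relied on; the name `max_auc` is hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal liability


def max_auc(prevalence: float, h2: float) -> float:
    """AUC of a risk predictor explaining a proportion `h2` of liability
    variance, under a liability-threshold model (normal approximation)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie in [0, 1]")
    if h2 == 0.0:
        return 0.5  # an uninformative predictor classifies at chance
    t = _STD_NORMAL.inv_cdf(1.0 - prevalence)  # liability threshold T
    z = _STD_NORMAL.pdf(t)                     # normal density at T
    i = z / prevalence                         # mean liability of cases
    v = -z / (1.0 - prevalence)                # mean liability of controls
    # Predictor means are h2*i and h2*v; variances shrink by selection on liability.
    var_cases = h2 * (1.0 - h2 * i * (i - t))
    var_controls = h2 * (1.0 - h2 * v * (v - t))
    return _STD_NORMAL.cdf((i - v) * h2 / math.sqrt(var_cases + var_controls))
```

For a disease with 1% prevalence, this sketch gives about 0.98 for `max_auc(0.01, 0.8)` and about 0.87 for `max_auc(0.01, 0.3)`, illustrating that the achievable AUC is capped by heritability even for a perfect genetic predictor.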
Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H
2018-01-05
Due to the predominant use of short-read sequencing to date, most bacterial genome sequences reported in recent years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.
Baran, Richard; Ivanova, Natalia N.; Jose, Nick; Garcia-Pichel, Ferran; Kyrpides, Nikos C.; Gugger, Muriel; Northen, Trent R.
2013-01-01
Mass spectrometry-based metabolomics has become a powerful tool for the detection of metabolites in complex biological systems and for the identification of novel metabolites. We previously identified a number of unexpected metabolites in the cyanobacterium Synechococcus sp. PCC 7002, such as histidine betaine, its derivatives and several unusual oligosaccharides. To test for the presence of these compounds and to assess the diversity of small polar metabolites in other cyanobacteria, we profiled cell extracts of nine strains representing much of the morphological and evolutionary diversification of this phylum. Spectral features in raw metabolite profiles obtained by normal-phase liquid chromatography coupled to mass spectrometry (MS) were manually curated so that chemical formulae of metabolites could be assigned. For putative identification, retention times and MS/MS spectra were cross-referenced with those of standards or available spectral library records. Overall, we detected 264 distinct metabolites. As expected, these included different betaines and oligosaccharides, as well as additional unidentified metabolites with chemical formulae not present in databases of metabolism. Some of these metabolites were detected in only a single strain, whereas others were present in more than one. Genomic interrogation of the strains revealed that, in general, the presence of a given metabolite corresponded well with the presence of its biosynthetic genes, where these are known. Our results show the potential of combining metabolite profiling and genomics for the identification of novel biosynthetic genes. PMID:24084783
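Assigning a chemical formula to a spectral feature rests on comparing the observed m/z with the theoretical mass of candidate formulae. The sketch below assumes singly protonated [M+H]+ ions and a 5 ppm tolerance (the abstract specifies neither the adduct nor the tolerance) and supports only C, H, N, O, P and S. The function names are hypothetical, and practical formula assignment also draws on isotope patterns and chemical constraints.

```python
import re
from typing import Dict

# Monoisotopic masses (unified atomic mass units) of the most abundant isotopes.
MONOISOTOPIC: Dict[str, float] = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON_MASS = 1.007276467  # added for singly protonated [M+H]+ ions


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a simple formula such as 'C6H12O6' into element counts."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"cannot parse formula {formula!r}")
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"unsupported element {element!r}")
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def protonated_mz(formula: str) -> float:
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    counts = parse_formula(formula)
    return sum(MONOISOTOPIC[el] * n for el, n in counts.items()) + PROTON_MASS


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def formula_matches(observed_mz: float, formula: str, tol_ppm: float = 5.0) -> bool:
    return abs(ppm_error(observed_mz, protonated_mz(formula))) <= tol_ppm
```

For example, a hexose (C6H12O6) gives a theoretical [M+H]+ of about 181.07066, so an observed feature at 181.0710 matches within 5 ppm, while one at 181.0800 does not.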
Inaugural Genomics Automation Congress and the coming deluge of sequencing data.
Creighton, Chad J
2010-10-01
Presentations at Select Biosciences's first 'Genomics Automation Congress' (Boston, MA, USA) in 2010 focused on next-generation sequencing and the platforms and methodology around them. The meeting provided an overview of sequencing technologies, both new and emerging. Speakers shared their recent work on applying sequencing to profile cells for various levels of biomolecular complexity, including DNA sequences, DNA copy, DNA methylation, mRNA and microRNA. With sequencing time and costs continuing to drop dramatically, a virtual explosion of very large sequencing datasets is at hand, which will probably present challenges and opportunities for high-level data analysis and interpretation, as well as for information technology infrastructure.
Demissie, Serkalem; Soranzo, Nicole; Bianchi, Estelle N.; Grundberg, Elin; Liang, Liming; Richards, J. Brent; Estrada, Karol; Zhou, Yanhua; van Nas, Atila; Moffatt, Miriam F.; Zhai, Guangju; Hofman, Albert; van Meurs, Joyce B.; Pols, Huibert A. P.; Price, Roger I.; Nilsson, Olle; Pastinen, Tomi; Cupples, L. Adrienne; Lusis, Aldons J.; Schadt, Eric E.; Ferrari, Serge; Uitterlinden, André G.
2010-01-01
Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far explain only a small proportion of the heritability for complex traits. Because of modest genetic effect sizes and inadequate power, true association signals may not be revealed at a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6×10−8), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near the TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 of these 3 genes (GPR177, p = 2.6×10−13; SOX6, p = 6.4×10−10) have been successfully replicated in a large-scale meta-analysis of BMD, whereas none of the non-prioritized candidates associated with BMD were replicated. Our results support our prioritization strategy. 
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation. PMID:20548944
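The prioritization idea (rescuing suggestive hits only when skeletal expression data support them) can be illustrated with a minimal Python sketch. It assumes the conventional genome-wide threshold of 5×10−8, a hypothetical suggestive cutoff of 1×10−5, and a hypothetical boolean flag standing in for supporting expression evidence; it is not the authors' pipeline, and the gene names and p-values are placeholders rather than study results.

```python
from dataclasses import dataclass

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold
SUGGESTIVE = 1e-5   # hypothetical suggestive threshold (assumption)


@dataclass
class Hit:
    gene: str
    p_value: float
    skeletal_support: bool  # hypothetical flag: supported by skeletal expression profiling


def prioritize(hits):
    """Split GWAS hits into genome-wide significant, prioritized suggestive, and other suggestive genes."""
    significant, prioritized, other = [], [], []
    for hit in sorted(hits, key=lambda h: h.p_value):
        if hit.p_value < GENOME_WIDE:
            significant.append(hit.gene)
        elif hit.p_value < SUGGESTIVE:
            # Suggestive hits are prioritized only when expression data support a skeletal role.
            (prioritized if hit.skeletal_support else other).append(hit.gene)
    return significant, prioritized, other


hits = [Hit("GENE_A", 1e-9, False), Hit("GENE_B", 1e-6, True),
        Hit("GENE_C", 2e-6, False), Hit("GENE_D", 0.01, True)]
print(prioritize(hits))
```

Hits weaker than the suggestive cutoff, such as GENE_D, are dropped regardless of expression support.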
Partial DNA-guided Cas9 enables genome editing with reduced off-target activity
Yin, Hao; Song, Chun-Qing; Suresh, Sneha; Kwan, Suet-Yan; Wu, Qiongqiong; Walsh, Stephen; Ding, Junmei; Bogorad, Roman L; Zhu, Lihua Julie; Wolfe, Scot A; Koteliansky, Victor; Xue, Wen; Langer, Robert; Anderson, Daniel G
2018-01-01
CRISPR–Cas9 is a versatile RNA-guided genome editing tool. Here we demonstrate that partial replacement of RNA nucleotides with DNA nucleotides in CRISPR RNA (crRNA) enables efficient gene editing in human cells. This strategy of partial DNA replacement retains on-target activity when used with both crRNA and sgRNA, as well as with multiple guide sequences. Partial DNA replacement also works for crRNA of Cpf1, another CRISPR system. Through focused analysis of off-target cleavage, measurement of mismatch tolerance and genome-wide profiling of off-target sites, we find that partial DNA replacement in the guide sequence significantly reduces off-target genome editing. Guided by the structure of the Cas9–sgRNA complex, we show that the majority of the 3′ end of crRNA can be replaced with DNA nucleotides, and that 5′- and 3′-DNA-replaced crRNA enables efficient genome editing. Cas9 guided by a DNA–RNA chimera may provide a generalized strategy to reduce both cost and off-target genome editing in human cells. PMID:29377001
Brody, Thomas; Yavatkar, Amarendra S; Park, Dong Sun; Kuzin, Alexander; Ross, Jermaine; Odenwald, Ward F
2017-06-01
Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest. We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then view either individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses, and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome. 
EvoPrinter allows for the assessment of lineage complexity within Flavivirus or Filovirus outbreaks, the identification of recombinant strains, the highlighting of sequences that have undergone host cell A-to-I editing, and the identification of unique input and database SNPs within highly conserved sequences. EvoPrinter's ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.
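The superimposition of many one-on-one alignments onto a single input genome can be sketched as a per-position identity count. This is a simplified illustration, not EvoPrinter's algorithm: it assumes gap-free, pre-aligned sequences of equal length, and the function names and toy sequences are hypothetical.

```python
def superimpose(query, aligned_strains):
    """For each query position, count how many pre-aligned strains carry the same base."""
    for strain in aligned_strains:
        if len(strain) != len(query):
            raise ValueError("sketch assumes gap-free alignments of equal length")
    return [sum(1 for s in aligned_strains if s[i] == base) for i, base in enumerate(query)]


def classify_positions(query, aligned_strains):
    """Return positions conserved in every strain and positions where the query base is unique."""
    counts = superimpose(query, aligned_strains)
    n = len(aligned_strains)
    conserved = [i for i, c in enumerate(counts) if c == n]
    unique_to_query = [i for i, c in enumerate(counts) if c == 0]  # candidate unique input SNPs
    return conserved, unique_to_query


strains = ["ACGA", "ACGA", "ACTA"]
print(superimpose("ACGT", strains), classify_positions("ACGT", strains))
```

Positions with intermediate counts (here position 2) are the shared-identity SNPs that could be used to group strains into lineages.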
2010-01-01
Background The identification of non-coding transcripts in human, mouse, and Escherichia coli has revealed their widespread occurrence and functional importance in both eukaryotic and prokaryotic life. In prokaryotes, studies have shown that non-coding transcripts participate in a broad range of cellular functions such as gene regulation, stress responses and virulence. However, very little is known about non-coding transcripts in Streptococcus pneumoniae (pneumococcus), an obligate human respiratory pathogen responsible for significant worldwide morbidity and mortality. Tiling microarrays enable genome-wide mRNA profiling as well as identification of novel transcripts at high resolution. Results Here, we describe a high-resolution transcription map of the S. pneumoniae clinical isolate TIGR4 using genomic tiling arrays. Our results indicate that approximately 66% of the genome is expressed under our experimental conditions. We identified a total of 50 non-coding small RNAs (sRNAs) from the intergenic regions, of which 36 had no predicted function. Half of the identified sRNA sequences were found to be unique to the S. pneumoniae genome. We identified eight overrepresented sequence motifs among sRNA sequences that correspond to sRNAs in different functional categories. Tiling arrays also identified approximately 202 operon structures in the genome. Conclusions In summary, the pneumococcal operon structures and novel sRNAs identified in this study enhance our understanding of the complexity and extent of the pneumococcal 'expressed' genome. Furthermore, the results of this study open up new avenues of research for understanding the complex RNA regulatory network governing S. pneumoniae physiology and virulence. PMID:20525227
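One simple way to turn tiling-array signal into candidate intergenic sRNAs is to call runs of consecutive probes above a threshold and keep runs that do not overlap annotated genes. The sketch below is a generic illustration, not the study's method; the threshold, minimum run length, probe spacing and gene coordinates are hypothetical.

```python
def call_segments(signal, threshold, min_probes):
    """Return (start, end) probe-index runs (end exclusive) with signal >= threshold for >= min_probes probes."""
    segments, start = [], None
    for i, value in enumerate(signal):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_probes:
                segments.append((start, i))
            start = None
    if start is not None and len(signal) - start >= min_probes:
        segments.append((start, len(signal)))
    return segments


def candidate_srnas(signal, probe_pos, genes, threshold=2.0, min_probes=3):
    """Keep transcribed segments (genomic start, end) that lie entirely outside annotated genes."""
    candidates = []
    for s, e in call_segments(signal, threshold, min_probes):
        lo, hi = probe_pos[s], probe_pos[e - 1]
        if all(hi < g_start or lo > g_end for g_start, g_end in genes):
            candidates.append((lo, hi))
    return candidates


signal = [0, 3, 3, 3, 0, 0, 3, 3, 0, 3, 3, 3, 3]
probe_pos = [i * 50 for i in range(len(signal))]  # hypothetical 50-bp probe spacing
print(candidate_srnas(signal, probe_pos, genes=[(0, 200)]))
```

The short two-probe run is discarded by `min_probes`, and the first run is excluded because it falls inside the annotated gene.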
Movassaghi, Masoud; Shabihkhani, Maryam; Hojat, Seyed A; Williams, Ryan R; Chung, Lawrance K; Im, Kyuseok; Lucey, Gregory M; Wei, Bowen; Mareninov, Sergey; Wang, Michael W; Ng, Denise W; Tashjian, Randy S; Magaki, Shino; Perez-Rosendahl, Mari; Yang, Isaac; Khanlou, Negar; Vinters, Harry V; Liau, Linda M; Nghiemphu, Phioanh L; Lai, Albert; Cloughesy, Timothy F; Yong, William H
2017-08-01
Commercial targeted genomic profiling with next generation sequencing using formalin-fixed paraffin-embedded (FFPE) tissue has recently entered clinical use for diagnosis and for guiding therapy. However, there are limited independent data regarding the accuracy or robustness of commercial genomic profiling in gliomas. As part of patient care, FFPE samples of gliomas from 71 patients were submitted for targeted genomic profiling to one commonly used commercial vendor, Foundation Medicine. Genomic alterations were determined for the following grades or groups of gliomas: Grade I/II, Grade III, primary glioblastomas (GBMs), recurrent primary GBMs, and secondary GBMs. In addition, FFPE samples from the same patients were independently assessed with conventional methods such as immunohistochemistry (IHC), quantitative real-time PCR (qRT-PCR), or fluorescence in situ hybridization (FISH) for three genetic alterations: IDH1 mutations, EGFR amplification, and EGFRvIII expression. A total of 100 altered genes were detected by the targeted genomic profiling assay. The number of different genomic alterations differed significantly among the five groups of gliomas and was consistent with the literature. CDKN2A/B, TP53, and TERT were the most common genomic alterations in primary GBMs, whereas IDH1, TP53, and PIK3CA were the most common in secondary GBMs. Targeted genomic profiling demonstrated 92.3%-100% concordance with conventional methods. The targeted genomic profiling report provided an average of 5.5 drugs and listed an average of 8.4 clinical trials for the 71 glioma patients studied, but only a third of the trials were appropriate for glioma patients. In this limited comparison study, this commercial next generation sequencing-based targeted genomic profiling showed a high concordance rate with conventional methods for the 3 genetic alterations and identified mutations expected for the type of glioma. 
While it may not be feasible to validate a commercial genomic profiling assay exhaustively and independently, examination of a few markers provides some reassurance of its robustness. Although potential targeted drugs are recommended based on genetic alterations, most targeted therapies have so far failed in glioblastomas, so the usefulness of such recommendations will increase with the development of novel and efficacious drugs. Copyright © 2017. Published by Elsevier Inc.
BioStar models of clinical and genomic data for biomedical data warehouse design
Wang, Liangjiang; Ramanathan, Murali
2008-01-01
Biomedical research is now generating large amounts of data, ranging from clinical test results to microarray gene expression profiles. The scale and complexity of these datasets give rise to substantial challenges in data management and analysis. It is highly desirable to apply data warehousing and online analytical processing (OLAP) technologies to biomedical data integration and mining. The major difficulty probably lies in capturing and modelling diverse biological objects and their complex relationships. This paper describes multidimensional data modelling for biomedical data warehouse design. Since conventional models such as the star schema appear to be insufficient for modelling clinical and genomic data, we develop a new model called the BioStar schema. The new model can capture the rich semantics of biomedical data and provide greater extensibility for the fast evolution of biological research methodologies. PMID:18048122
Development and mapping of DArT markers within the Festuca - Lolium complex
Kopecký, David; Bartoš, Jan; Lukaszewski, Adam J; Baird, James H; Černoch, Vladimír; Kölliker, Roland; Rognli, Odd Arne; Blois, Helene; Caig, Vanessa; Lübberstedt, Thomas; Studer, Bruno; Shaw, Paul; Doležel, Jaroslav; Kilian, Andrzej
2009-01-01
Background Grasses are among the most important and widely cultivated plants on Earth. They provide high quality fodder for livestock, are used for turf and amenity purposes, and play a fundamental role in environmental protection. Among cultivated grasses, species within the Festuca-Lolium complex predominate, especially in temperate regions. To facilitate high-throughput genome profiling and genetic mapping within the complex, we have developed a Diversity Arrays Technology (DArT) array for five grass species: F. pratensis, F. arundinacea, F. glaucescens, L. perenne and L. multiflorum. Results The DArTFest array contains 7680 probes derived from methyl-filtered genomic representations. In a first marker discovery experiment performed on 40 genotypes from each species (with the exception of F. glaucescens, for which only 7 genotypes were used), we identified 3884 polymorphic markers. The number of DArT markers identified in each genotype varied from 821 to 1852. To test the usefulness of the DArTFest array for physical mapping, DArT markers were assigned to each of the seven chromosomes of F. pratensis using single chromosome substitution lines, while recombinants of F. pratensis chromosome 3 were used to allocate the markers to seven chromosome bins. Conclusion The resources developed in this project will facilitate the development of genetic maps in Festuca and Lolium, the analysis of genetic diversity, and the monitoring of the genomic constitution of Festuca × Lolium hybrids. They will also enable marker-assisted selection for multiple traits or for specific genome regions. PMID:19832973
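The two steps described above, calling polymorphic markers from presence/absence scores and assigning markers to chromosomes with single chromosome substitution lines, can be sketched as follows. The sketch assumes binary (0/1) DArT scores and that a marker is scored present in a substitution line only when it derives from the substituted F. pratensis chromosome; marker and line names are hypothetical.

```python
def polymorphic_markers(scores):
    """scores: marker -> list of 0/1 calls across genotypes. Polymorphic = both states observed."""
    return sorted(m for m, calls in scores.items() if 0 in calls and 1 in calls)


def assign_chromosomes(presence, line_chromosome):
    """presence: marker -> set of substitution lines in which the marker is present.
    line_chromosome: line name -> F. pratensis chromosome carried by that line.
    A marker is assigned only when all its positive lines point to a single chromosome."""
    assignments = {}
    for marker, lines in presence.items():
        chromosomes = {line_chromosome[line] for line in lines}
        if len(chromosomes) == 1:
            assignments[marker] = chromosomes.pop()
    return assignments


scores = {"m_a": [0, 1, 1], "m_b": [1, 1, 1]}
presence = {"m1": {"L3"}, "m2": {"L1", "L5"}, "m3": set()}
line_chromosome = {"L1": 1, "L3": 3, "L5": 5}
print(polymorphic_markers(scores), assign_chromosomes(presence, line_chromosome))
```

Markers present in lines carrying different chromosomes (m2) or in none (m3) are left unassigned rather than forced onto a chromosome.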
Chatterjee, Sumantra; Sivakamasundari, V; Yap, Sook Peng; Kraus, Petra; Kumar, Vibhor; Xing, Xing; Lim, Siew Lan; Sng, Joel; Prabhakar, Shyam; Lufkin, Thomas
2014-12-05
Vertebrate organogenesis is a highly complex process involving sequential cascades of transcription factor activation or repression. Interestingly, a single developmental control gene can occasionally be essential for the morphogenesis and differentiation of tissues and organs arising from vastly disparate embryological lineages. Here we elucidated the role of the mammalian homeobox gene Bapx1 during the embryogenesis of five distinct organs at E12.5 (vertebral column, spleen, gut, forelimb and hindlimb) using expression profiling of sorted wildtype and mutant cells combined with genome-wide binding site analysis. Furthermore, we analyzed the development of the vertebral column at the molecular level by combining transcriptional profiling and genome-wide binding data for Bapx1 with similarly generated data sets for Sox9 to assemble a detailed gene regulatory network revealing genes not previously reported to be controlled by either of these two transcription factors. The gene regulatory network appears to control cell fate decisions and morphogenesis in the vertebral column along with the prevention of premature chondrocyte differentiation, thus providing a detailed molecular view of vertebral column development.
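A common way to combine expression profiling of mutant cells with genome-wide binding data is to call a gene a putative direct target when it is both bound by the transcription factor and differentially expressed in the mutant. The sketch below illustrates that generic intersection for two factors; it is an assumption about how such a network could be assembled, not the authors' exact procedure, and the gene identifiers are hypothetical.

```python
def regulatory_edges(tf_data):
    """tf_data: TF name -> (genes differentially expressed in the TF mutant, genes with a TF binding site).
    Returns (TF, gene) edges for genes that are both bound and differentially expressed."""
    edges = set()
    for tf, (de_genes, bound_genes) in tf_data.items():
        for gene in de_genes & bound_genes:
            edges.add((tf, gene))
    return edges


def shared_targets(edges, tf_a, tf_b):
    """Genes that both transcription factors appear to regulate directly."""
    targets_a = {g for t, g in edges if t == tf_a}
    targets_b = {g for t, g in edges if t == tf_b}
    return targets_a & targets_b


tf_data = {
    "Bapx1": ({"g1", "g2", "g3"}, {"g2", "g3", "g4"}),
    "Sox9": ({"g3", "g5"}, {"g3", "g5", "g6"}),
}
edges = regulatory_edges(tf_data)
print(sorted(edges), shared_targets(edges, "Bapx1", "Sox9"))
```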
Optimization of cDNA-AFLP experiments using genomic sequence data.
Kivioja, Teemu; Arvas, Mikko; Saloheimo, Markku; Penttilä, Merja; Ukkonen, Esko
2005-06-01
cDNA amplified fragment length polymorphism (cDNA-AFLP) is one of the few genome-wide expression profiling methods capable of finding genes that have not yet been cloned or even predicted from sequence but that have interesting expression patterns under the studied conditions. In cDNA-AFLP, a complex cDNA mixture is divided into small subsets using restriction enzymes and selective PCR. A large cDNA-AFLP experiment can require a substantial amount of resources, such as hundreds of PCR amplifications and gel electrophoresis runs, followed by manual cutting of a large number of bands from the gels. Our aim was to test whether this workload can be reduced by rational design of the experiment. We used the available genomic sequence information to optimize cDNA-AFLP experiments beforehand so that as many transcripts as possible could be profiled with a given amount of resources. Optimization of the selection of both restriction enzymes and selective primers for cDNA-AFLP experiments has not been performed previously. The in silico tests performed suggest that substantial amounts of resources can be saved by optimizing cDNA-AFLP experiments.
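The abstract does not spell out the optimization algorithm. As a hedged illustration only, the sketch below assumes each transcript yields one fragment starting at the 3′-most recognition site of the chosen enzyme, that a selective primer amplifies that fragment when its k selective nucleotides match the bases just downstream of the site, and that reactions are chosen greedily (a set-cover heuristic that may differ from the authors' method). The sequences are toy examples; the two sites are the EcoRI and BamHI recognition sequences.

```python
from itertools import product


def fragment_tag(seq, site, k):
    """k nucleotides following the 3'-most occurrence of the restriction site, or None."""
    pos = seq.rfind(site)
    if pos == -1 or pos + len(site) + k > len(seq):
        return None
    start = pos + len(site)
    return seq[start:start + k]


def coverage(transcripts, site, k):
    """Map each selective k-mer to the set of transcript IDs it would amplify."""
    cov = {"".join(p): set() for p in product("ACGT", repeat=k)}
    for tid, seq in transcripts.items():
        tag = fragment_tag(seq, site, k)
        if tag in cov:
            cov[tag].add(tid)
    return cov


def greedy_reactions(transcripts, sites, k, budget):
    """Pick up to `budget` (site, selective k-mer) reactions, each adding the most new transcripts."""
    options = {(site, kmer): ids for site in sites for kmer, ids in coverage(transcripts, site, k).items()}
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(options, key=lambda o: len(options[o] - covered), default=None)
        if best is None or not options[best] - covered:
            break  # no remaining reaction profiles anything new
        chosen.append(best)
        covered |= options.pop(best)
    return chosen, covered


transcripts = {"t1": "AAGAATTCACGT", "t2": "GGGAATTCACTT", "t3": "CCGGATCCGGAA"}
print(greedy_reactions(transcripts, ["GAATTC", "GGATCC"], k=2, budget=2))
```

Because each reaction costs one PCR and one gel lane, the `budget` argument stands in for the fixed amount of resources mentioned in the abstract.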
Identification of copy number variants in whole-genome data using Reference Coverage Profiles
Glusman, Gustavo; Severson, Alissa; Dhankani, Varsha; Robinson, Max; Farrah, Terry; Mauldin, Denise E.; Stittrich, Anna B.; Ament, Seth A.; Roach, Jared C.; Brunkow, Mary E.; Bodian, Dale L.; Vockley, Joseph G.; Shmulevich, Ilya; Niederhuber, John E.; Hood, Leroy
2015-01-01
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation. PMID:25741365
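The normalization-then-segmentation idea can be sketched in a few dozen lines: divide per-bin sample depth by a reference profile, rescale so the median bin is 1, and decode copy-number states with a Viterbi pass over a small hidden Markov model. This is a toy illustration, not the published RCP tools; the three states, Gaussian emissions, sigma and switching probability are all hypothetical choices.

```python
import math
from statistics import median

STATES = ("del", "normal", "dup")
# Expected copy ratio relative to diploid: hemizygous deletion, two copies, one extra copy.
EXPECTED_RATIO = {"del": 0.5, "normal": 1.0, "dup": 1.5}


def normalize(coverage, reference_profile):
    """Per-bin ratio of sample depth to reference depth, scaled so the median bin is 1."""
    raw = [c / r for c, r in zip(coverage, reference_profile)]
    scale = median(raw)
    return [x / scale for x in raw]


def log_emission(x, state, sigma=0.15):
    """Gaussian log-likelihood of a normalized ratio under a copy-number state (sigma is hypothetical)."""
    mu = EXPECTED_RATIO[state]
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def viterbi(ratios, p_stay=0.99):
    """Most likely state path through the bins under a simple 3-state HMM."""
    p_switch = (1 - p_stay) / (len(STATES) - 1)
    log_t = {a: {b: math.log(p_stay if a == b else p_switch) for b in STATES} for a in STATES}
    score = {s: math.log(1 / len(STATES)) + log_emission(ratios[0], s) for s in STATES}
    backpointers = []
    for x in ratios[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda a: prev[a] + log_t[a][s])
            score[s] = prev[best] + log_t[best][s] + log_emission(x, s)
            ptr[s] = best
        backpointers.append(ptr)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):  # trace the best path back to the first bin
        state = ptr[state]
        path.append(state)
    return path[::-1]


reference = [100] * 10
sample = [200, 200, 200, 100, 100, 100, 100, 200, 200, 200]
path = viterbi(normalize(sample, reference))
print(path)
```

The switching penalty keeps single noisy bins from being called as CNVs, while a run of consistently halved bins is decoded as a hemizygous deletion.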
Tapping the promise of genomics in species with complex, nonmodel genomes.
Hirsch, Candice N; Buell, C Robin
2013-01-01
Genomics is enabling a renaissance in all disciplines of plant biology. However, many plant genomes are complex and remain recalcitrant to current genomic technologies. The complexities of these nonmodel plant genomes are attributable to gene and genome duplication, heterozygosity, ploidy, and/or repetitive sequences. Methods are available to simplify the genome and reduce these barriers, including inbreeding and genome reduction, making these species amenable to current sequencing and assembly methods. Some, but not all, of the complexities in nonmodel genomes can be bypassed by sequencing the transcriptome rather than the genome. Additionally, comparative genomics approaches, which leverage phylogenetic relatedness, can aid in the interpretation of complex genomes. Although there are limitations in accessing complex nonmodel plant genomes using current sequencing technologies, genome manipulation and resourceful analyses can allow access to even the most recalcitrant plant genomes.
TEA: the epigenome platform for Arabidopsis methylome study.
Su, Sheng-Yao; Chen, Shu-Hwa; Lu, I-Hsuan; Chiang, Yih-Shien; Wang, Yu-Bin; Chen, Pao-Yang; Lin, Chung-Yen
2016-12-22
Bisulfite sequencing (BS-seq) has become a standard technology to profile genome-wide DNA methylation at single-base resolution. It allows researchers to conduct genome-wide cytosine methylation analyses on issues such as genomic imprinting, transcriptional regulation, and cellular development and differentiation. A single BS-seq dataset is resolved into many features according to sequence context, making methylome data analysis and visualization a complex task. We developed a streamlined platform, TEA, for analyzing and visualizing data from whole-genome BS-seq (WGBS) experiments conducted in the model plant Arabidopsis thaliana. To capture the essence of genome methylation levels while running efficiently online, we introduce a straightforward method for measuring methylation in each sequence context on a per-gene basis. The method is scripted in Java to process BS-seq mapping results. Through a simple data uploading process, the TEA server deploys a web-based platform for in-depth analysis by linking data to an updated Arabidopsis annotation database and toolkits. TEA is an intuitive and efficient online platform for analyzing the Arabidopsis genomic DNA methylation landscape. It provides several ways to help users exploit WGBS data. TEA is freely accessible for academic users at: http://tea.iis.sinica.edu.tw .
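A per-gene, per-context methylation measure can be sketched as a weighted methylation level: for each gene and sequence context (in plants typically CG, CHG and CHH), sum methylated read counts over all covered cytosines and divide by total read counts. This is a plausible reading of "measuring methylation in each sequence context by gene", not TEA's documented formula, and the input layout (one chromosome, inclusive gene coordinates) is an assumption.

```python
from collections import defaultdict


def gene_methylation(cytosines, genes):
    """cytosines: iterable of (position, context, methylated_reads, total_reads) on one chromosome.
    genes: gene name -> (start, end), inclusive.
    Returns gene -> context -> weighted methylation level (sum methylated / sum total)."""
    meth = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for pos, context, methylated, reads in cytosines:
        for gene, (start, end) in genes.items():
            if start <= pos <= end:
                meth[gene][context] += methylated
                total[gene][context] += reads
    return {g: {c: meth[g][c] / total[g][c] for c in total[g] if total[g][c] > 0} for g in total}


cytosines = [(10, "CG", 8, 10), (12, "CG", 2, 10), (15, "CHH", 1, 10), (50, "CG", 5, 5)]
genes = {"geneA": (1, 20), "geneB": (40, 60)}
print(gene_methylation(cytosines, genes))
```

Weighting by read counts lets well-covered cytosines contribute more than sparsely covered ones; the nested loop is quadratic and would need an interval index for genome-scale data.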
Characterizing polymorphic inversions in human genomes by single-cell sequencing
Sanders, Ashley D.; Hills, Mark; Porubský, David; Guryev, Victor; Falconer, Ester; Lansdorp, Peter M.
2016-01-01
Identifying genomic features that differ between individuals and cells can help uncover the functional variants that drive phenotypes and disease susceptibilities. For this, single-cell studies are paramount, as it becomes increasingly clear that the contribution of rare but functional cellular subpopulations is important for disease prognosis, management, and progression. Until now, studying these associations has been challenged by our inability to map structural rearrangements accurately and comprehensively. To overcome this, we coupled single-cell sequencing of DNA template strands (Strand-seq) with custom analysis software to rapidly discover, map, and genotype genomic rearrangements at high resolution. This allowed us to explore the distribution and frequency of inversions in a heterogeneous cell population, identify several polymorphic domains in complex regions of the genome, and locate rare alleles in the reference assembly. We then mapped the entire genomic complement of inversions within two unrelated individuals to characterize their distinct inversion profiles and built a nonredundant global reference of structural rearrangements in the human genome. The work described here provides a powerful new framework to study structural variation and genomic heterogeneity in single-cell samples, whether from individuals for population studies or tissue types for biomarker discovery. PMID:27472961
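In Strand-seq, each read reports which template strand (Watson or Crick) it was sequenced from, so in a single cell a chromosome appears as WW, CC or WC. A segment whose strand state differs from the rest of the chromosome, for example a CC stretch inside an otherwise WW chromosome, is a candidate inversion. The sketch below bins reads, labels each bin's state and reports such segments; it is a simplified illustration rather than the authors' software, the thresholds are hypothetical, and other events (such as sister chromatid exchanges) also change strand state, so it only flags candidates.

```python
from collections import Counter


def bin_states(reads, bin_size, n_bins, hi=0.8, lo=0.2):
    """reads: (position, strand) pairs with strand 'W' or 'C'. Returns 'WW', 'CC', 'WC' or None per bin."""
    watson, crick = [0] * n_bins, [0] * n_bins
    for pos, strand in reads:
        b = pos // bin_size
        if 0 <= b < n_bins:
            if strand == "W":
                watson[b] += 1
            else:
                crick[b] += 1
    states = []
    for w, c in zip(watson, crick):
        if w + c == 0:
            states.append(None)  # empty bin
        else:
            frac = w / (w + c)
            states.append("WW" if frac >= hi else "CC" if frac <= lo else "WC")
    return states


def candidate_inversions(states):
    """(first_bin, last_bin, state) runs whose state differs from the chromosome-wide majority.
    Empty bins are treated as the majority state."""
    observed = [s for s in states if s is not None]
    if not observed:
        return []
    majority = Counter(observed).most_common(1)[0][0]
    runs, current = [], None  # current = (start_bin, state)
    for i, s in enumerate(states):
        s = majority if s is None else s
        if current and s != current[1]:
            runs.append((current[0], i - 1, current[1]))
            current = None
        if current is None and s != majority:
            current = (i, s)
    if current:
        runs.append((current[0], len(states) - 1, current[1]))
    return runs


reads = [(10, "W"), (20, "W"), (150, "C"), (160, "C"), (250, "W"), (260, "C")]
print(bin_states(reads, 100, 3), candidate_inversions(["WW", "WW", "CC", "CC", "WW", "WC", "WW"]))
```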
Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary
2016-08-01
Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screenshots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources, and their annotation profiles were compared using in-house tools and other publicly available enrichment tools. The results show that the cardiovascular disease genes imported from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. Copyright © 2016 the American Physiological Society.
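Many enrichment tools score an annotation term with a one-sided hypergeometric test: given N annotated genes, K of which carry the term, how surprising is it that k of the n genes in a query set carry it? The sketch below shows that generic test; it is not RGD's in-house tool, and the counts in the usage example are made up.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N total genes, K with the term, n genes in the query set)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: all 5 query genes carry a term held by 5 of 10 annotated genes.
print(enrichment_p(k=5, n=5, K=5, N=10))
```

Comparing such p-values (after multiple-testing correction) between gene groups is one way to contrast the annotation profiles of genes from different data sources.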
Lim, Su Jun; Boyle, Patrick J.; Chinen, Madoka; Dale, Ryan K.; Lei, Elissa P.
2013-01-01
Chromatin insulators are functionally conserved DNA–protein complexes situated throughout the genome that organize independent transcriptional domains. Previous work implicated RNA as an important cofactor in chromatin insulator activity, although the precise mechanisms are not yet understood. Here we identify the exosome, the highly conserved major cellular 3′ to 5′ RNA degradation machinery, as a physical interactor of CP190-dependent chromatin insulator complexes in Drosophila. Genome-wide profiling of exosome by ChIP-seq in two different embryonic cell lines reveals extensive and specific overlap with the CP190, BEAF-32 and CTCF insulator proteins. Colocalization occurs mainly at promoters but also at boundary elements such as Mcp, Fab-8, scs and scs′ (the last of which overlaps with a promoter). Surprisingly, exosome associates primarily with promoters but not gene bodies of active genes, arguing against simple cotranscriptional recruitment to RNA substrates. Similar to insulator proteins, exosome is also significantly enriched at divergently transcribed promoters. Directed ChIP of exosome in cell lines depleted of insulator proteins shows that CTCF is required specifically for exosome association at Mcp and Fab-8 but not at other sites, suggesting that alternative mechanisms must also contribute to exosome chromatin recruitment. Taken together, our results reveal a novel positive relationship between exosome and chromatin insulators throughout the genome. PMID:23358822
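Overlap between two ChIP-seq peak sets, such as exosome and an insulator protein, is often summarized as the fraction of peaks in one set that overlap at least one peak in the other. The sketch below assumes BED-like half-open (chrom, start, end) intervals and uses a simple linear scan; the coordinates are hypothetical, and real analyses would use an interval index and a background model to judge significance.

```python
def overlapping_peaks(peaks_a, peaks_b):
    """Peaks in peaks_a (chrom, start, end; half-open) overlapping at least one peak in peaks_b."""
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, start, end in peaks_a:
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            hits.append((chrom, start, end))
    return hits


def overlap_fraction(peaks_a, peaks_b):
    """Share of peaks_a that overlap peaks_b (0.0 for an empty peaks_a)."""
    return len(overlapping_peaks(peaks_a, peaks_b)) / len(peaks_a) if peaks_a else 0.0


exosome = [("2L", 100, 200), ("2L", 500, 600), ("3R", 50, 80)]
cp190 = [("2L", 150, 300), ("3R", 80, 100)]
print(overlapping_peaks(exosome, cp190), overlap_fraction(exosome, cp190))
```

With half-open intervals, peaks that merely touch (3R at position 80) are not counted as overlapping.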
2007-05-01
Genomic and Expression Profiling of Benign and Malignant Nerve Sheath Tumors in Neurofibromatosis Patients. Annual report, Award Number DAMD17-03-1-0297; dates covered: 1 May 2006 – 30 Apr 2007. Principal Investigator: Matt van de Rijn, M.D., Ph.D.; Torsten...
How to interpret methylation sensitive amplified polymorphism (MSAP) profiles?
Fulneček, Jaroslav; Kovařík, Aleš
2014-01-06
DNA methylation plays a key role in development, contributes to genome stability, and may also respond to external factors supporting adaptation and evolution. To connect different types of stimuli with particular biological processes, identifying genome regions with altered 5-methylcytosine distribution at a genome-wide scale is important. Many researchers use the simple, reliable, and relatively inexpensive Methylation Sensitive Amplified Polymorphism (MSAP) method, which is particularly useful in studies of epigenetic variation. However, electrophoretic patterns produced by the method are rather difficult to interpret, particularly when the MspI and HpaII isoschizomers are used, because these enzymes are methylation-sensitive and any C within the CCGG recognition motif can be methylated in plant DNA. Here, we evaluate MSAP patterns with respect to current knowledge of the enzyme activities and the level and distribution of 5-methylcytosine in plant and vertebrate genomes. We discuss potential caveats related to complex MSAP patterns and provide clues regarding how to interpret them. We further show that adding a combined HpaII + MspI digestion would assist in interpreting the most controversial MSAP pattern, represented by a signal in the HpaII but not in the MspI profile. We recommend a modification of the MSAP protocol that definitively discerns between putative hemimethylated mCCGG and internal CmCGG sites. We believe that our view and this simple improvement will assist in correct interpretation of MSAP data.
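For reference, the conventional four-type scoring that many MSAP studies apply to band presence in the HpaII and MspI profiles can be written as a small lookup. This is a sketch of the commonly used scheme, not the authors' recommendation; the abstract above argues that the HpaII-only pattern (type II here) is exactly the ambiguous case that needs an extra combined HpaII + MspI digestion to resolve.

```python
def score_msap(hpaii_band, mspi_band):
    """Conventional MSAP interpretation of band presence/absence for one CCGG locus."""
    if hpaii_band and mspi_band:
        return "I: unmethylated CCGG"
    if hpaii_band and not mspi_band:
        # The controversial pattern discussed above; conventionally read as hemimethylation.
        return "II: hemimethylated external C (hemi-mCCGG)"
    if mspi_band and not hpaii_band:
        return "III: methylated internal C (CmCGG)"
    return "IV: uninformative (no band in either profile)"


for pattern in [(True, True), (True, False), (False, True), (False, False)]:
    print(pattern, score_msap(*pattern))
```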
Multi-region and single-cell sequencing reveal variable genomic heterogeneity in rectal cancer.
Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian
2017-11-23
Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed at the single-cell level. We found that major SCNAs were early events in cancer development and were stably inherited. Single-cell sequencing revealed mutations and SCNAs that were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in two patients. The mutational scenarios and SCNA profiles of the two treatment-naïve patients of the same molecular subtype were quite different. Our results suggest that each tumor possesses its own architecture, which may result in different diagnoses, prognoses, and drug responses. Remarkable ITH exists in the two patients we studied, providing a preliminary impression of ITH in rectal cancer.
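Multi-region mutation calls are often summarized by classifying each mutation as trunk (found in every region), branch (several regions) or private (one region), with the non-trunk share as a crude heterogeneity index. This terminology and index are common in ITH studies but are an assumption here, not necessarily the measures used by the authors; region and mutation IDs are hypothetical.

```python
from collections import Counter


def classify_mutations(region_calls):
    """region_calls: region name -> set of mutation IDs. Returns mutation -> 'trunk' / 'branch' / 'private'."""
    n_regions = len(region_calls)
    counts = Counter(m for muts in region_calls.values() for m in muts)
    labels = {}
    for mutation, n in counts.items():
        if n == n_regions:
            labels[mutation] = "trunk"
        elif n == 1:
            labels[mutation] = "private"
        else:
            labels[mutation] = "branch"
    return labels


def heterogeneity_fraction(region_calls):
    """Share of mutations that are not shared by all regions."""
    labels = classify_mutations(region_calls)
    return sum(1 for v in labels.values() if v != "trunk") / len(labels) if labels else 0.0


regions = {"R1": {"m1", "m2", "m3"}, "R2": {"m1", "m2"}, "R3": {"m1", "m4"}}
print(classify_mutations(regions), heterogeneity_fraction(regions))
```

The same functions apply unchanged when the "regions" are single cells, which is one way to see why the apparent ITH grows at single-cell resolution.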
Genetic Profile of Adenoid Cystic Carcinomas (ACC) with High-Grade Transformation versus Solid Type
Costa, Ana Flávia; Altemani, Albina; Vékony, Hedy; Bloemena, Elisabeth; Fresno, Florentino; Suárez, Carlos; Llorente, José Luis; Hermsen, Mario
2010-01-01
Background: ACC can occasionally undergo dedifferentiation, also referred to as high-grade transformation (ACC-HGT). However, ACC-HGT can also undergo transformation to adenocarcinomas that are not poorly differentiated. ACC-HGT is generally considered to be an aggressive variant of ACC, even more so than solid ACC. This study aimed to describe the genetic changes of ACC-HGT in relation to clinico-pathological features and to compare the results to solid ACC. Methods: Genome-wide DNA copy number changes were analyzed by microarray CGH in six ACC-HGT, four with transformation into moderately differentiated adenocarcinoma (MDA) and two into poorly differentiated carcinoma (PDC), and in five solid ACC. In addition, the Ki-67 index and p53 immunopositivity were assessed. Results: ACC-HGT carried fewer copy number changes than solid ACC. Two ACC-HGT cases harboured a breakpoint at 6q23, near the cMYB oncogene. The complexity of the genomic profile concurred with the clinical course of the patient. Among the ACC-HGT, p53 positivity significantly increased from the conventional to the transformed (both MDA and PDC) component. Conclusion: ACC-HGT may not necessarily reflect a more advanced stage of tumor progression, but rather a transformation to another histological form in which the poorly differentiated forms (PDC) present a genetic complexity similar to that of solid ACC. PMID:20978318
Genetic profile of adenoid cystic carcinomas (ACC) with high-grade transformation versus solid type.
Costa, Ana Flávia; Altemani, Albina; Vékony, Hedy; Bloemena, Elisabeth; Fresno, Florentino; Suárez, Carlos; Llorente, José Luis; Hermsen, Mario
2011-08-01
ACC can occasionally undergo dedifferentiation, also referred to as high-grade transformation (ACC-HGT). However, ACC-HGT can also undergo transformation to adenocarcinomas that are not poorly differentiated. ACC-HGT is generally considered to be an aggressive variant of ACC, even more so than solid ACC. This study aimed to describe the genetic changes of ACC-HGT in relation to clinico-pathological features and to compare the results to solid ACC. Genome-wide DNA copy number changes were analyzed by microarray CGH in six ACC-HGT, four with transformation into moderately differentiated adenocarcinoma (MDA) and two into poorly differentiated carcinoma (PDC), and in five solid ACC. In addition, the Ki67 index and p53 immunopositivity were assessed. ACC-HGT carried fewer copy number changes than solid ACC. Two ACC-HGT cases harboured a breakpoint at 6q23, near the cMYB oncogene. The complexity of the genomic profile concurred with the clinical course of the patient. Among the ACC-HGT, p53 positivity significantly increased from the conventional to the transformed (both MDA and PDC) component. ACC-HGT may not necessarily reflect a more advanced stage of tumor progression, but rather a transformation to another histological form in which the poorly differentiated forms (PDC) present a genetic complexity similar to that of solid ACC.
Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale
Michel, Audrey M; Baranov, Pavel V
2013-01-01
Ribosome profiling, or ribo-seq, is a new technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. It is based on deep sequencing of ribosome-protected mRNA fragments, which allows ribosome density to be measured along all RNA molecules present in the cell. At the same time, the high resolution of this technique allows detailed analysis of ribosome density on individual RNAs. Since its invention, ribosome profiling has been used in a range of studies in both prokaryotic and eukaryotic organisms. Several studies have adapted and refined the original protocol to study specific aspects of translation. Ribosome profiling of initiating ribosomes has been used to map sites of translation initiation. These studies revealed a surprisingly complex organization of translation initiation sites in eukaryotes. Multiple initiation sites are responsible for the generation of N-terminally extended and truncated isoforms of known proteins, as well as for the translation of numerous open reading frames (ORFs) upstream of protein-coding ORFs. Ribosome profiling of elongating ribosomes has been used to measure differential gene expression at the level of translation, to identify novel protein-coding genes, and to study ribosome pausing. It has also provided data for developing quantitative models of translation. Although only a dozen or so ribosome profiling datasets have been published so far, they have already dramatically changed our understanding of translational control and have led to new hypotheses regarding the origin of protein-coding genes. © 2013 John Wiley & Sons, Ltd. PMID:23696005
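Measuring gene expression "at the level of translation" is commonly done by dividing ribosome footprint density by mRNA-seq density for the same gene, a ratio often called translation efficiency. The sketch below assumes RPKM normalization for both libraries; that convention is an assumption, not something the review prescribes.

```python
def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return count / (length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def translation_efficiency(ribo_count, mrna_count, cds_length_bp, ribo_total, mrna_total):
    """Ribosome footprint density divided by mRNA density for one gene.

    Using the same CDS length for both libraries means length cancels out,
    leaving a ratio of library-size-normalized counts.
    """
    ribo_density = rpkm(ribo_count, cds_length_bp, ribo_total)
    mrna_density = rpkm(mrna_count, cds_length_bp, mrna_total)
    return ribo_density / mrna_density


if __name__ == "__main__":
    # Toy numbers: twice as many footprints as mRNA reads at equal depth.
    print(translation_efficiency(500, 250, 1_000, 10_000_000, 10_000_000))
```

Comparing this ratio between conditions is one way to separate translational from transcriptional changes.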
Macqueen, Daniel J; Kristjánsson, Bjarni K; Johnston, Ian A
2010-06-01
Metazoan akirin genes regulate innate immunity, myogenesis, and carcinogenesis. Invertebrates typically have one family member, while most tetrapod and teleost vertebrates have one to three. We demonstrate an expanded repertoire of eight family members in the genomes of four salmonid fishes, owing to paralog preservation after three tetraploidization events. Retention of paralogs secondarily lost in other teleosts may be related to functional diversification and posttranslational regulation. We hypothesized that salmonid akirins would be transcriptionally regulated in fast-twitch skeletal muscle during activation of conserved pathways governing catabolism and growth. The in vivo nutritional state of Arctic charr (Salvelinus alpinus L.) was experimentally manipulated, and transcript levels for akirin family members and 26 other genes were measured by quantitative real-time PCR (qPCR), allowing a similarity network of expression profiles to be established. In fasted muscle, one class of akirins was upregulated, with one family member showing high coexpression with catabolic genes encoding the NF-kappaB p65 subunit, E2 ubiquitin-conjugating enzymes, E3 ubiquitin ligases, and IGF-I receptors. Another class of akirins was upregulated with subsequent feeding and was coexpressed with 14-3-3 protein genes. The expression profiles of akirins showed no similarity to those of IGF hormone or binding protein genes. The level of phylogenetic relatedness of akirin family members was not a strong predictor of transcriptional responses to nutritional state or of differences in transcript abundance, indicating a complex pattern of regulatory evolution. The salmonid akirins epitomize the complexity linking the genome to the physiological phenotypes of vertebrates with a history of tetraploidization.
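The "similarity network of expression profiles" can be pictured as genes joined by an edge when their profiles across conditions are strongly correlated. This is a minimal sketch assuming Pearson correlation and a hypothetical cutoff of 0.9; the abstract does not state which similarity measure or threshold was used, and the gene names below are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as unconnected
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair whose correlation is >= threshold."""
    edges = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        r = pearson(p1, p2)
        if r >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


if __name__ == "__main__":
    # Hypothetical profiles across four sampling points.
    toy = {
        "akirin_1": [1.0, 2.0, 3.0, 4.0],
        "gene_A": [2.0, 4.0, 6.0, 8.0],
        "gene_B": [4.0, 3.0, 2.0, 1.0],
    }
    print(coexpression_edges(toy))
```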
The Network Organization of Cancer-associated Protein Complexes in Human Tissues
Zhao, Jing; Lee, Sang Hoon; Huss, Mikael; Holme, Petter
2013-01-01
Differential gene expression profiles for detecting disease genes have been studied intensively in systems biology. However, many biological functions of proteins depend on their ability to form complexes by physically binding to one another. In other words, the functional units are often protein complexes rather than individual proteins. We therefore seek to replace the perspective of disease-related genes with that of disease-related complexes, using data on 39 human solid-tissue cancers and their corresponding normal tissues as an example. To obtain the differential abundance levels of protein complexes, we apply an optimization algorithm to genome-wide differential expression data. From the differential abundance of complexes, we extract tissue- and cancer-selective complexes and investigate their relevance to cancer. The method is supported by the clustering tendency of bipartite cancer-complex relationships, and it offers a more concrete and realistic approach to disease-related proteomics. PMID:23567845
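The abstract does not describe the optimization algorithm itself. As a purely hypothetical formulation, one could ask for complex-level changes x_k such that each gene's observed change is approximated by the sum of x_k over the complexes containing that gene, and minimize the squared error by gradient descent. The sketch below illustrates that idea only; it is not the authors' method.

```python
def estimate_complex_changes(gene_changes, complexes, lr=0.1, n_iter=5000):
    """Hypothetical least-squares estimate of per-complex differential abundance.

    gene_changes: dict gene -> observed differential expression (e.g. log2 fold change)
    complexes: dict complex -> set of member genes
    Minimizes 0.5 * sum_g (d_g - sum_{k containing g} x_k)**2 by gradient descent.
    lr must be small relative to complex overlap, or the iteration can diverge.
    """
    names = sorted(complexes)
    x = {k: 0.0 for k in names}
    for _ in range(n_iter):
        # Residual of each gene under the current complex estimates.
        residual = {}
        for gene, observed in gene_changes.items():
            predicted = sum(x[k] for k in names if gene in complexes[k])
            residual[gene] = predicted - observed
        # Gradient for complex k is the sum of its members' residuals.
        for k in names:
            grad = sum(residual[g] for g in complexes[k] if g in residual)
            x[k] -= lr * grad
    return x


if __name__ == "__main__":
    # Invented toy data: g2 belongs to both complexes.
    genes = {"g1": 1.0, "g2": 3.0, "g3": 2.0}
    cplx = {"C1": {"g1", "g2"}, "C2": {"g2", "g3"}}
    print(estimate_complex_changes(genes, cplx))
```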
Martínez-Núñez, Mario Alberto; Poot-Hernandez, Augusto Cesar; Rodríguez-Vázquez, Katya; Perez-Rueda, Ernesto
2013-01-01
In this work, the content of enzymes and DNA-binding transcription factors (TFs) in 794 non-redundant prokaryotic genomes was evaluated. The identification of enzymes was based on annotations deposited in the KEGG database as well as in databases of functional domains (COG and PFAM) and structural domains (Superfamily). To identify TFs, hidden Markov profiles were constructed based on well-known transcriptional regulatory families. These analyses yielded diverse results; for example, the rate at which the number of detected enzymes increases declines with genome size. In contrast, for TFs this rate increased with genome complexity. This inverse relationship shapes the diversity of metabolic and regulatory networks and affects the availability of enzymes and TFs. Furthermore, the derivatives of the enzyme and TF curves were found to intersect at 9,659 genes; beyond this point, regulatory complexity grows faster than metabolic complexity. In addition, TFs have a low number of duplications, in contrast to the apparently high number of duplications associated with enzymes. Despite the greater number of duplicated enzymes than TFs, the rate at which duplicates appear is higher for TFs. We also found a lower proportion of enzymes in archaeal genomes (22%) than in bacterial genomes (27%). This lower proportion might be compensated for by the interconnection of metabolic pathways in Archaea. A similar proportion was also found for archaeal TFs, for which the formation of regulatory complexes has been proposed. Finally, an enrichment of multifunctional enzymes in Bacteria, as a mechanism of ecological adaptation, was detected. PMID:23922780
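The "intersection of the derivatives" can be made concrete under a modelling assumption the abstract does not state: suppose enzyme and TF counts each follow a power law in genome size n, E(n) = a_E n^{b_E} and T(n) = a_T n^{b_T}. Setting the derivatives equal gives n* = (a_E b_E / (a_T b_T))^{1/(b_T - b_E)}. The sketch fits both laws on log-log axes and computes n*; the synthetic exponents are invented and do not reproduce the reported 9,659 genes.

```python
import math


def fit_power_law(genome_sizes, counts):
    """Fit counts ~ a * size**b by least squares on log-log axes; returns (a, b)."""
    xs = [math.log(s) for s in genome_sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


def derivative_crossing(a1, b1, a2, b2):
    """Genome size n at which a1*b1*n**(b1-1) equals a2*b2*n**(b2-1)."""
    return (a1 * b1 / (a2 * b2)) ** (1.0 / (b2 - b1))


if __name__ == "__main__":
    # Synthetic, noise-free data with invented parameters (not the study's values).
    sizes = [500, 1_000, 2_000, 4_000, 8_000]
    enzymes = [0.9 * s ** 0.8 for s in sizes]  # sublinear: marginal gain shrinks
    tfs = [0.002 * s ** 1.4 for s in sizes]    # superlinear: marginal gain grows
    a_e, b_e = fit_power_law(sizes, enzymes)
    a_t, b_t = fit_power_law(sizes, tfs)
    print(round(derivative_crossing(a_e, b_e, a_t, b_t)))
```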
Kraggerud, Sigrid Marie; Hoei-Hansen, Christina E.; Alagaratnam, Sharmini; Skotheim, Rolf I.; Abeler, Vera M.
2013-01-01
This review focuses on the molecular characteristics and development of rare malignant ovarian germ cell tumors (mOGCTs). We provide an overview of the genomic aberrations assessed by ploidy, cytogenetic banding, and comparative genomic hybridization. We summarize and discuss the transcriptome profiles of mRNA and microRNA (miRNA), and biomarkers (DNA methylation, gene mutation, individual protein expression) for each mOGCT histological subtype. Parallels between the origin of mOGCTs and that of their male counterpart, testicular GCT (TGCT), are discussed from the perspective of germ cell development, endocrinological influences, and pathogenesis, as is the GCT origin in patients with disorders of sex development. Integrated molecular profiles of the 3 main histological subtypes, dysgerminoma (DG), yolk sac tumor (YST), and immature teratoma (IT), are presented. DGs show genomic aberrations comparable to TGCT. In contrast, the genome profiles of YST and IT are different both from each other and from DG/TGCT. Differences between DG and YST are underlined by their miRNA/mRNA expression patterns, suggesting preferential involvement of the WNT/β-catenin and TGF-β/bone morphogenetic protein signaling pathways among YSTs. Characteristic protein expression patterns are observed in DG, YST and IT. We propose that mOGCTs develop through different developmental pathways, including one that is likely shared with TGCT and involves insufficient sexual differentiation of the germ cell niche. The molecular features of mOGCTs underline their similarity to pluripotent precursor cells (primordial germ cells, PGCs) and other stem cells. This similarity, combined with the process of ovary development, explains why mOGCTs present so early in life, and with greater histological complexity than most somatic solid tumors. PMID:23575763
Quantitative phenotyping via deep barcode sequencing
Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey
2009-01-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
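The core of Bar-seq is counting how often each strain's barcode appears in sequencing reads and comparing those counts between conditions. The sketch assumes, hypothetically, that every read begins with a fixed-length barcode and that strain fitness is scored as a library-size-normalized log2 ratio with a pseudocount; the actual read layout and scoring in the paper may differ.

```python
import math
from collections import Counter


def count_barcodes(reads, barcode_to_strain, barcode_len=20):
    """Tally reads per strain, assuming (hypothetically) each read starts with its barcode."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read[:barcode_len])
        if strain is not None:  # reads with unrecognized barcodes are ignored
            counts[strain] += 1
    return counts


def log2_fitness(control_counts, treatment_counts, pseudocount=1.0):
    """log2(treatment / control) abundance per strain, normalized to library size."""
    strains = set(control_counts) | set(treatment_counts)
    control_total = sum(control_counts.values())
    treatment_total = sum(treatment_counts.values())
    scores = {}
    for strain in strains:
        control_freq = (control_counts.get(strain, 0) + pseudocount) / control_total
        treatment_freq = (treatment_counts.get(strain, 0) + pseudocount) / treatment_total
        scores[strain] = math.log2(treatment_freq / control_freq)
    return scores
```

In a chemogenomic screen, a strongly negative score marks a deletion strain that is depleted under drug treatment, which points to a candidate drug-sensitive gene.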
Genomic analysis and clinical management of adolescent cutaneous melanoma.
Rabbie, Roy; Rashid, Mamunur; Arance, Ana M; Sánchez, Marcelo; Tell-Marti, Gemma; Potrony, Miriam; Conill, Carles; van Doorn, Remco; Dentro, Stefan; Gruis, Nelleke A; Corrie, Pippa; Iyer, Vivek; Robles-Espinoza, Carla Daniela; Puig-Butille, Joan A; Puig, Susana; Adams, David J
2017-05-01
Melanoma in young children is rare; however, its incidence in adolescents and young adults is rising. We describe the clinical course of a 15-year-old female diagnosed with AJCC stage IB non-ulcerated primary melanoma, who died from metastatic disease 4 years after diagnosis despite three lines of modern systemic therapy. We also present the complete genomic profile of her tumour and compare this to a further series of 13 adolescent melanomas and 275 adult cutaneous melanomas. A somatic BRAF V600E mutation was observed, together with a high mutational load equivalent to that found in adult melanoma and composed primarily of C>T mutations. A germline genomic analysis alongside a series of 23 children and adolescents with melanoma revealed no mutations in known germline melanoma-predisposing genes. Adolescent melanomas appear to have genomes that are as complex as those arising in adulthood, and their clinical course can, as with adults, be unpredictable. © 2017 The Authors. Pigment Cell & Melanoma Research published by John Wiley & Sons Ltd.
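Statements such as "composed primarily of C>T mutations" rely on collapsing each single-nucleotide variant onto the pyrimidine reference base, so that G>A on one strand is counted as C>T on the other. A minimal sketch of that bookkeeping, assuming variants are given as (ref, alt) base pairs:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrimidine_class(ref, alt):
    """Collapse an SNV onto the pyrimidine reference strand (e.g. G>A becomes C>T)."""
    if ref in ("G", "A"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def fraction_c_to_t(snvs):
    """Fraction of (ref, alt) SNVs that fall in the C>T class after strand collapsing."""
    classes = [pyrimidine_class(ref, alt) for ref, alt in snvs]
    return classes.count("C>T") / len(classes) if classes else 0.0


if __name__ == "__main__":
    # Invented toy variants.
    print(fraction_c_to_t([("C", "T"), ("G", "A"), ("T", "G"), ("C", "A")]))
```

This six-class scheme ignores the flanking bases; trinucleotide-context signatures extend it to 96 classes.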
Manno, Mariano Torres; Zuljan, Federico; Alarcón, Sergio; Esteban, Luis; Blancato, Victor; Espariz, Martín; Magni, Christian
2018-06-23
Lactococcus lactis strains constitute one of the most important starter cultures for cheese production. In this study, a genome-wide analysis was performed including 68 available genomes of L. lactis group strains, showing the existence of two species (L. lactis and L. cremoris) and two biovars (L. lactis biovar. diacetylactis and L. cremoris biovar. lactis). The proposed classification scheme revealed coherence among phenotypic (through in silico and in vivo bacterial function profiling), phylogenomic (through maximum likelihood trees) and genomic (using overall genome sequence-based parameters) approaches. Strain biodiversity of the industrial biovar. diacetylactis was also analyzed, showing that it comprises at least three variants, with the CC1 clonal complex being the only one distributed worldwide. These findings and methodologies will help improve the selection of L. lactis group strains for industrial use and facilitate the interpretation of previous or future research studies on this diverse group of bacteria. Copyright © 2018. Published by Elsevier B.V.
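The abstract does not name its "overall genome sequence-based parameters" (average nucleotide identity is a common choice for species delineation). As a simpler, swapped-in illustration of an alignment-free relatedness measure, the sketch computes the Jaccard similarity of k-mer sets between two sequences. It is not the study's method, it ignores reverse complements, and the default k=21 is an arbitrary assumption.

```python
def kmers(sequence, k):
    """Set of all length-k substrings (forward strand only, for simplicity)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_jaccard(seq_a, seq_b, k=21):
    """Shared k-mers divided by all distinct k-mers in either sequence."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy sequences with a small k so the overlap is visible.
    print(kmer_jaccard("ACGTACGT", "ACGTTT", k=3))
```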
Nayduch, Dana; Lee, Matthew B; Saski, Christopher A
2014-01-01
Unlike for other important vectors such as mosquitoes and sandflies, genetic and genomic tools are lacking for Culicoides biting midges, despite the fact that they vector a large number of arboviruses and other pathogens affecting humans and domestic animals worldwide. In North America, female Culicoides sonorensis midges are important vectors of bluetongue virus (BTV) and epizootic hemorrhagic disease virus (EHDV), orbiviruses that cause significant disease in livestock and wildlife. Libraries of tissue-specific transcripts expressed in response to feeding and oral orbivirus challenge in C. sonorensis have previously been reported, but extensive genome-wide expression profiling in the midge has not. Here, we used deep sequencing technologies to construct the first adult female C. sonorensis reference transcriptome, and used genome-wide expression profiling to elucidate the genetic response to blood and sucrose feeding over time. The adult female midge unigene set consists of 19,041 genes, of which fewer than 7% are differentially expressed during the course of a sucrose meal, while up to 52% of the genes respond significantly in blood-fed midges, indicating that hematophagy induces complex physiological processes. Many genes that were differentially expressed during blood feeding were associated with digestion (e.g., proteases, lipases), hematophagy (e.g., salivary proteins), and vitellogenesis, revealing many major metabolic and biological factors underlying these critical processes. Additionally, key genes in the vitellogenesis pathway were identified, providing the first glimpse into the molecular basis of anautogeny in C. sonorensis. This is the first extensive transcriptome for this genus; it will serve as a framework for future expression studies and RNAi, and it provides a rich dataset contributing to the ultimate goal of informing a reference genome assembly and annotation.
Moreover, this study will serve as a foundation for subsequent genome-wide expression analyses during early orbivirus infection and for dissecting the molecular mechanisms behind vector competence in midges.
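Percentages such as "fewer than 7%" or "up to 52%" of genes responding significantly depend on how per-gene p-values are corrected for multiple testing. The abstract does not name the procedure; the sketch assumes the Benjamini-Hochberg false discovery rate adjustment and a hypothetical 0.05 cutoff.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fraction_significant(pvalues, alpha=0.05):
    """Fraction of genes whose BH-adjusted p-value is at or below alpha."""
    if not pvalues:
        return 0.0
    adjusted = benjamini_hochberg(pvalues)
    return sum(1 for q in adjusted if q <= alpha) / len(pvalues)


if __name__ == "__main__":
    # Invented p-values for four genes.
    print(fraction_significant([0.01, 0.04, 0.03, 0.5]))
```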
Røe, Oluf Dimitri; Anderssen, Endre; Helge, Eli; Pettersen, Caroline Hild; Olsen, Karina Standahl; Sandeck, Helmut; Haaverstad, Rune; Lundgren, Steinar; Larsson, Erik
2009-01-01
Background: Malignant pleural mesothelioma is considered an almost incurable tumour with increasing incidence worldwide. It usually develops in the parietal pleura, from mesothelial lining or submesothelial cells, and subsequently invades the visceral pleura. Chromosomal and genomic aberrations of mesothelioma are diverse and heterogeneous. Genome-wide profiling of mesothelioma versus parietal and visceral normal pleural tissue could thus reveal novel genes and pathways explaining its aggressive phenotype. Methodology and Principal Findings: Well-characterised tissue from five mesothelioma patients and normal parietal and visceral pleural samples from six non-cancer patients were profiled on an Affymetrix oligoarray of 38,500 genes. Testing the lists of differentially expressed genes for overrepresentation in KEGG PATHWAYS (Kyoto Encyclopedia of Genes and Genomes) and GO (Gene Ontology) terms revealed large differences in expression between visceral and parietal pleura, and both tissues differed from mesothelioma. Cell growth and intrinsic resistance in tumour versus parietal pleura were reflected in highly overexpressed cell cycle, mitosis, replication, DNA repair and anti-apoptosis genes. Several genes of the “salvage pathway” that recycles nucleobases were overexpressed, among them TYMS, encoding thymidylate synthase, the main target of the antifolate drug pemetrexed, which is active in mesothelioma. Circadian rhythm genes were expressed in a pattern favouring tumour growth. The locally invasive, non-metastatic phenotype of mesothelioma could partly be due to overexpression of the known metastasis suppressors NME1 and NME2. Down-regulation of several tumour suppressor genes could contribute to mesothelioma progression. Genes involved in cell communication were down-regulated, indicating that mesothelioma may shield itself from the immune system. Similarly, in non-cancer parietal versus visceral pleura, signal transduction, soluble transporter and adhesion genes were down-regulated.
This could represent a genetic basis for the propensity of the parietal pleura to develop mesothelioma. Conclusions: A genome-wide microarray approach using complex human tissue samples revealed novel expression patterns, reflecting some important features of mesothelioma biology that should be explored further. PMID:19662092
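Overrepresentation testing of a gene list against KEGG or GO categories is often done with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The abstract does not say which test was used, so this is an assumed illustration: with N genes on the array, K of them in a pathway, and n differentially expressed genes of which k are in the pathway, the p-value is P(X ≥ k).

```python
from math import comb


def hypergeometric_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


if __name__ == "__main__":
    # Toy example: all 3 pathway genes fall in a list of 3 drawn from 10.
    print(hypergeometric_enrichment_p(3, 3, 3, 10))
```

Across many pathways, these p-values would normally be corrected for multiple testing before interpretation.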
Beird, Hannah C.; Wu, Chia-Chin; Ingram, Davis R.; Wang, Wei-Lien; Alimohamed, Asrar; Gumbs, Curtis; Little, Latasha; Song, Xingzhi; Feig, Barry W.; Roland, Christina L.; Zhang, Jianhua; Benjamin, Robert S.; Hwu, Patrick; Lazar, Alexander J.; Futreal, P. Andrew; Somaiah, Neeta
2018-01-01
Well-differentiated (WD) liposarcoma is a low-grade mesenchymal tumor with features of mature adipocytes and a high propensity for local recurrence. Often, patients with WD liposarcoma present with or later progress to a higher-grade nonlipogenic form known as dedifferentiated (DD) liposarcoma. These DD tumors behave more aggressively and can metastasize. Both WD and DD liposarcomas harbor neochromosomes formed from amplifications and rearrangements of Chr 12q that encode oncogenes (MDM2, CDK4, and YEATS2) and adipocytic differentiation factors (HMGA2 and CPM). However, the genomic changes associated with progression from WD to DD have not been well defined. Therefore, we selected patients with matched WD and DD tumors for extensive genomic profiling in order to understand their clonal relationships and to delineate any defining alterations for each entity. Exome and transcriptome sequencing were performed for 17 patients with both WD and DD diagnoses. Somatic point and copy-number alterations were integrated with transcriptional analyses to determine subtype-associated genomic features and pathways. On average, only 8.3% of somatic mutations in WD liposarcoma were shared with their cognate DD component. DD tumors had higher numbers of somatic copy-number losses, amplifications involving Chr 12q, and fusion transcripts than WD tumors. HMGA2 and CPM rearrangements occurred more frequently in DD components. The shared somatic mutations indicate a clonal origin for matched WD and DD tumors and show early divergence with ongoing genomic instability due to continual generation and selection of neochromosomes. Stochastic generation and subsequent expression of fusion transcripts from the neochromosome that involve adipogenesis genes such as HMGA2 and CPM may influence the differentiation state of the subsequent tumor. PMID:29610390
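A figure like "on average, only 8.3% of somatic mutations in WD liposarcoma were shared with their cognate DD component" can be computed by intersecting mutation sets per patient. The sketch assumes mutations are keyed as (chromosome, position, ref, alt) tuples and that the average is a per-patient mean; whether the study averaged per patient or pooled mutations is not stated.

```python
def shared_fraction(wd_mutations, dd_mutations):
    """Fraction of WD somatic mutations also present in the matched DD component."""
    wd, dd = set(wd_mutations), set(dd_mutations)
    return len(wd & dd) / len(wd) if wd else 0.0


def mean_shared_fraction(patients):
    """Per-patient mean of shared fractions; patients is a list of (wd, dd) pairs."""
    fractions = [shared_fraction(wd, dd) for wd, dd in patients]
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Invented mutations for one patient.
    wd = [("12", 100, "C", "T"), ("1", 5, "G", "A"), ("3", 7, "A", "G"), ("5", 9, "T", "C")]
    dd = [("12", 100, "C", "T"), ("7", 2, "C", "A")]
    print(shared_fraction(wd, dd))
```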
Global Genetic Response in a Cancer Cell: Self-Organized Coherent Expression Dynamics
Tsuchiya, Masa; Hashimoto, Midori; Takenaka, Yoshiko; Motoike, Ikuko N.; Yoshikawa, Kenichi
2014-01-01
Understanding the basic mechanism of the spatio-temporal self-control of genome-wide gene expression engaged with the complex epigenetic molecular assembly is one of the major challenges in current biological science. In this study, the genome-wide dynamical profile of gene expression was analyzed for MCF-7 breast cancer cells induced by two distinct ErbB receptor ligands: epidermal growth factor (EGF) and heregulin (HRG), which drive cell proliferation and differentiation, respectively. We focused on elucidating how global genetic responses emerge and on deciphering the underlying principle for dynamic self-control of genome-wide gene expression. Whole-genome mRNA expression was classified into about a hundred groups according to the root mean square fluctuation (rmsf). These expression groups showed characteristic time-dependent correlations, indicating the existence of collective behaviors in the ensemble of genes with respect to mRNA expression and to temporal changes in expression. All-or-none responses were observed for HRG and EGF (biphasic statistics) at around 10–20 min. The emergence of time-dependent collective behaviors of expression occurred through bifurcation of a coherent expression state (CES). In the ensemble of mRNA expression, the self-organized CESs reveal distinct characteristic expression domains for biphasic statistics, which notably exhibit criticality in the expression profile as a route for genomic transition. In time-dependent changes in the expression domains, the dynamics of CES reveal that the temporal development of the characteristic domains behaves as an autonomous bistable switch, which exhibits dynamic criticality (the temporal development of criticality) in the genome-wide coherent expression dynamics. Elucidating the biophysical origin of such critical behavior is expected to shed light on the underlying mechanism that controls the whole genome. PMID:24831017
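The grouping step can be sketched as computing, for each gene, the root mean square fluctuation of its expression over the time points, rmsf = sqrt(mean((x_t - mean(x))^2)), then sorting genes by rmsf and splitting them into groups. The equal-size, descending-rmsf grouping below is an assumption for illustration; the abstract only says about a hundred groups were formed according to rmsf.

```python
import math


def rmsf(series):
    """Root mean square fluctuation of one gene's expression time series."""
    mean = sum(series) / len(series)
    return math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))


def group_by_rmsf(expression, n_groups):
    """Sort genes by rmsf (descending) and split into n_groups of near-equal size.

    expression: dict gene -> list of expression values over time points.
    """
    ranked = sorted(expression, key=lambda g: rmsf(expression[g]), reverse=True)
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


if __name__ == "__main__":
    # Invented four-point time series.
    toy = {"g1": [1, 1, 1, 1], "g2": [0, 2, 0, 2], "g3": [0, 4, 0, 4], "g4": [0, 6, 0, 6], "g5": [0, 8, 0, 8]}
    print(group_by_rmsf(toy, 2))
```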
Identifying gene networks underlying the neurobiology of ethanol and alcoholism.
Wolen, Aaron R; Miles, Michael F
2012-01-01
For complex disorders such as alcoholism, identifying the genes linked to these diseases and their specific roles is difficult. Traditional genetic approaches, such as genetic association studies (including genome-wide association studies) and analyses of quantitative trait loci (QTLs) in both humans and laboratory animals, have already helped identify some candidate genes. However, because of technical obstacles, such as the small impact of any individual gene, these approaches have only limited effectiveness in identifying specific genes that contribute to complex diseases. The emerging field of systems biology, which allows for analyses of entire gene networks, may help researchers better elucidate the genetic basis of alcoholism, both in humans and in animal models. Such networks can be identified using approaches such as high-throughput molecular profiling (e.g., through microarray-based gene expression analyses) or strategies referred to as genetical genomics, such as the mapping of expression QTLs (eQTLs). Characterization of gene networks can shed light on the biological pathways underlying complex traits and provide the functional context for identifying those genes that contribute to disease development.
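At its simplest, mapping an expression QTL means regressing a gene's expression level on the genotype at a marker, coded as allele dosage (0, 1, 2), across individuals. The single-marker ordinary least squares sketch below illustrates the idea only; real eQTL scans add covariates, test many markers, and set significance thresholds, for example by permutation.

```python
def eqtl_slope(genotypes, expression):
    """OLS slope and R^2 of expression on allele dosage at one marker."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    if sgg == 0:
        raise ValueError("marker is monomorphic in this sample")
    sge = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    see = sum((e - mean_e) ** 2 for e in expression)
    slope = sge / sgg  # change in expression per additional allele copy
    r_squared = sge ** 2 / (sgg * see) if see else 0.0
    return slope, r_squared


if __name__ == "__main__":
    # Invented data: each extra allele raises expression by about 1 unit.
    print(eqtl_slope([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]))
```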
Ovenden, Ben; Milgate, Andrew; Wade, Len J; Rebetzke, Greg J; Holland, James B
2018-05-31
Abiotic stress tolerance traits are often complex and recalcitrant targets for conventional breeding improvement in many crop species. This study evaluated the potential of genomic selection to predict water-soluble carbohydrate concentration (WSCC), an important drought tolerance trait, in wheat under field conditions. A panel of 358 varieties and breeding lines constrained for maturity was evaluated under rainfed and irrigated treatments across two locations and two years. Whole-genome marker profiles and factor analytic mixed models were used to generate genomic estimated breeding values (GEBVs) for specific environments and environment groups. Additive genetic variance was smaller than residual genetic variance for WSCC, such that genotypic values were dominated by residual genetic effects rather than additive breeding values. As a result, GEBVs were not accurate predictors of genotypic values of the extant lines, but GEBVs should be reliable selection criteria to choose parents for intermating to produce new populations. The accuracy of GEBVs for untested lines was sufficient to increase predicted genetic gain from genomic selection per unit time compared to phenotypic selection if the breeding cycle is reduced by half by the use of GEBVs in off-season generations. Further, genomic prediction accuracy depended on having phenotypic data from environments with strong correlations with target production environments to build prediction models. By combining high-density marker genotypes, stress-managed field evaluations, and mixed models that simultaneously model covariances among genotypes and covariances of complex trait performance between pairs of environments, we were able to train models with good accuracy to facilitate genetic gain from genomic selection. Copyright © 2018 Ovenden et al.
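The claim about gain "per unit time" follows from the breeder's equation expressed per year, ΔG/t = i · r · σ_A / L, where i is selection intensity, r is selection accuracy, σ_A is the additive genetic standard deviation and L is the cycle length in years. Halving L means genomic selection matches phenotypic selection's yearly gain with only half its accuracy. The numbers below are hypothetical, not the study's estimates.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Breeder's equation per unit time: i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / cycle_years


def breakeven_accuracy(pheno_accuracy, pheno_cycle_years, genomic_cycle_years):
    """Genomic accuracy needed to equal phenotypic gain per year at equal i and sigma_A."""
    return pheno_accuracy * genomic_cycle_years / pheno_cycle_years


if __name__ == "__main__":
    # Hypothetical: phenotypic r = 0.6 over 4-year cycles vs genomic r = 0.4 over 2-year cycles.
    print(genetic_gain_per_year(1.76, 0.6, 1.0, 4.0))
    print(genetic_gain_per_year(1.76, 0.4, 1.0, 2.0))
    print(breakeven_accuracy(0.6, 4.0, 2.0))
```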
Childhood Acute Myeloid Leukaemia
Rubnitz, Jeffrey E.; Inaba, Hiroto
2012-01-01
Although acute myeloid leukaemia (AML) has long been recognized for its morphological and cytogenetic heterogeneity, recent high-resolution genomic profiling has demonstrated a complexity even greater than previously imagined. This complexity can be seen in the number and diversity of genetic alterations, epigenetic modifications, and characteristics of the leukaemic stem cells. The broad range of abnormalities across different AML subtypes suggests that improvements in clinical outcome will require the development of targeted therapies for each subtype of disease and the design of novel clinical trials to test these strategies. It is highly unlikely that further gains in long-term survival rates will be possible by mere intensification of conventional chemotherapy. In this review, we summarize recent studies that provide new insight into the genetics and biology of AML, discuss risk stratification and therapy for this disease, and profile some of the therapeutic agents currently under investigation. PMID:22966788
PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.
Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay
2015-12-01
A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e., the sum of the genetic repertoire) of a microbial species is crucial for understanding the dynamics of molecular evolution, in which virulence evolution is of major interest. Here we present PanCoreGen, a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of existing pan-genomic analysis tools and develops an integrated annotation structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Analyzing an example set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars, Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics studies. Copyright © 2015 Elsevier Inc. All rights reserved.
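Once genes from each isolate have been clustered into shared families (the hard, homology-based step that tools like PanCoreGen handle), the pan-genome, core genome and group-specific sets reduce to set operations. The sketch assumes that clustering is already done and uses invented family IDs; it does not reproduce PanCoreGen's own algorithms.

```python
def pan_core(genomes):
    """genomes: dict genome name -> set of gene-family IDs.

    Returns (pan, core, accessory): families in any genome, in every genome,
    and in some but not all genomes.
    """
    sets = list(genomes.values())
    pan = set().union(*sets)
    core = set.intersection(*sets) if sets else set()
    return pan, core, pan - core


def group_specific(genomes, group):
    """Families present in every genome of `group` and absent from all other genomes."""
    inside = set.intersection(*(genomes[g] for g in group))
    outside = set().union(*(fams for name, fams in genomes.items() if name not in group))
    return inside - outside


if __name__ == "__main__":
    # Invented isolates and gene families.
    toy = {"s1": {"a", "b", "c", "x"}, "s2": {"a", "b", "x"}, "s3": {"a", "b", "y"}}
    print(pan_core(toy))
    print(group_specific(toy, {"s1", "s2"}))
```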
Advantages and Pitfalls of Mass Spectrometry Based Metabolome Profiling in Systems Biology.
Aretz, Ina; Meierhofer, David
2016-04-27
Mass spectrometry-based metabolome profiling has become the method of choice in systems biology approaches and aims to enhance biological understanding of complex biological systems. Genomics, transcriptomics, and proteomics are well-established technologies and are commonly used by many scientists. In comparison, metabolomics is an emerging field and has not reached the same throughput, routine use, and coverage as other omics technologies. Nevertheless, substantial improvements have been achieved in recent years. Integrated data derived from multi-omics approaches will provide a deeper understanding of entire biological systems. Metabolome profiling is mainly hampered by the diversity of metabolites, the variation of metabolite concentrations over several orders of magnitude, and biological data interpretation. Thus, multiple approaches are required to cover most metabolites. No software tool is yet capable of comprehensively translating all the data into a biologically meaningful context. In this review, we discuss the advantages of metabolome profiling and the main obstacles limiting progress in systems biology. PMID:27128910
Expression of the G72/G30 gene in transgenic mice induces behavioral changes
Cheng, Lijun; Hattori, Eiji; Nakajima, Akira; Woehrle, Nancy S.; Opal, Mark D.; Zhang, Chunling; Grennan, Kay; Dulawa, Stephanie C.; Tang, Ya-Ping; Gershon, Elliot S.; Liu, Chunyu
2012-01-01
The G72/G30 gene complex is a candidate gene for schizophrenia and bipolar disorder. However, G72 and G30 mRNAs are expressed at very low levels in human brain, with only rare splicing forms observed. We report here G72/G30 expression profiles and behavioral changes in a G72/G30 transgenic mouse model. A human BAC clone containing the G72/G30 genomic region was used to establish the transgenic mouse model, on which gene expression studies, Western blot and behavioral tests were performed. Relative to their minimal expression in humans, G72 and G30 mRNAs were highly expressed in the transgenic mice, and had a more complex splicing pattern. The highest G72 transcript levels were found in testis, followed by cerebral cortex, with very low or undetectable levels in other tissues. No LG72 (the long putative isoform of G72) protein was detected in the transgenic mice. Whole-genome expression profiling identified 361 genes differentially expressed in transgenic mice compared to wild-type, including genes previously implicated in neurological and psychological disorders. Relative to wild-type mice, the transgenic mice exhibited fewer stereotypic movements in the open field test, higher baseline startle responses in the course of the prepulse inhibition test, and lower hedonic responses in the sucrose preference test. The transcriptome profile changes and multiple mouse behavioral effects suggest that the G72 gene may play a role in modulating behaviors relevant to psychiatric disorders. PMID:23337943
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hug, Laura A.; Thomas, Brian C.; Sharon, Itai
Nitrogen, sulfur and carbon fluxes in the terrestrial subsurface are determined by the intersecting activities of microbial community members, yet the organisms responsible are largely unknown. Metagenomic methods can identify organisms and functions, but genome recovery is often precluded by data complexity. To address this limitation, we developed subsampling assembly methods to reconstruct high-quality draft genomes from complex samples. Here, we applied these methods to evaluate the interlinked roles of the most abundant organisms in biogeochemical cycling in the aquifer sediment. Community proteomics confirmed these activities. The eight most abundant organisms belong to novel lineages, and two represent phyla with no previously sequenced genome. Four organisms are predicted to fix carbon via the Calvin-Benson-Bassham, Wood-Ljungdahl or 3-hydroxypropionate/4-hydroxybutyrate pathways. The profiled organisms are involved in the network of denitrification, dissimilatory nitrate reduction to ammonia, ammonia oxidation and sulfate reduction/oxidation, and require substrates supplied by other community members. An ammonium-oxidizing Thaumarchaeote is the most abundant community member, despite low ammonium concentrations in the groundwater. Finally, this organism likely benefits from two other relatively abundant organisms capable of producing ammonium from nitrate, which is abundant in the groundwater. Overall, dominant members of the microbial community are interconnected through exchange of geochemical resources.
Stevenson, Clare E. M.; Assaad, Aoun; Chandra, Govind; Le, Tung B. K.; Greive, Sandra J.; Bibb, Mervyn J.; Lawson, David M.
2013-01-01
Consistent with their complex lifestyles and rich secondary metabolite profiles, the genomes of streptomycetes encode a plethora of transcription factors, the vast majority of which are uncharacterized. Herein, we use Surface Plasmon Resonance (SPR) to identify and delineate putative operator sites for SCO3205, a MarR family transcriptional regulator from Streptomyces coelicolor that is well represented in sequenced actinomycete genomes. In particular, we use a novel SPR footprinting approach that exploits indirect ligand capture to vastly extend the lifetime of a standard streptavidin SPR chip. We define two operator sites upstream of sco3205 and a pseudopalindromic consensus sequence derived from these enables further potential operator sites to be identified in the S. coelicolor genome. We evaluate each of these through SPR and test the importance of the conserved bases within the consensus sequence. Informed by these results, we determine the crystal structure of a SCO3205-DNA complex at 2.8 Å resolution, enabling molecular level rationalization of the SPR data. Taken together, our observations support a DNA recognition mechanism involving both direct and indirect sequence readout. PMID:23748564
The Somatic Genomic Landscape of Glioblastoma
Brennan, Cameron W.; Verhaak, Roel G.W.; McKenna, Aaron; Campos, Benito; Noushmehr, Houtan; Salama, Sofie R.; Zheng, Siyuan; Chakravarty, Debyani; Sanborn, J. Zachary; Berman, Samuel H.; Beroukhim, Rameen; Bernard, Brady; Wu, Chang-Jiun; Genovese, Giannicola; Shmulevich, Ilya; Barnholtz-Sloan, Jill; Zou, Lihua; Vegesna, Rahulsimham; Shukla, Sachet A.; Ciriello, Giovanni; Yung, WK; Zhang, Wei; Sougnez, Carrie; Mikkelsen, Tom; Aldape, Kenneth; Bigner, Darell D.; Van Meir, Erwin G.; Prados, Michael; Sloan, Andrew; Black, Keith L.; Eschbacher, Jennifer; Finocchiaro, Gaetano; Friedman, William; Andrews, David W.; Guha, Abhijit; Iacocca, Mary; O’Neill, Brian P.; Foltz, Greg; Myers, Jerome; Weisenberger, Daniel J.; Penny, Robert; Kucherlapati, Raju; Perou, Charles M.; Hayes, D. Neil; Gibbs, Richard; Marra, Marco; Mills, Gordon B.; Lander, Eric; Spellman, Paul; Wilson, Richard; Sander, Chris; Weinstein, John; Meyerson, Matthew; Gabriel, Stacey; Laird, Peter W.; Haussler, David; Getz, Gad; Chin, Lynda
2013-01-01
We describe the landscape of somatic genomic alterations based on multi-dimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer. PMID:24120142
The somatic genomic landscape of glioblastoma.
Brennan, Cameron W; Verhaak, Roel G W; McKenna, Aaron; Campos, Benito; Noushmehr, Houtan; Salama, Sofie R; Zheng, Siyuan; Chakravarty, Debyani; Sanborn, J Zachary; Berman, Samuel H; Beroukhim, Rameen; Bernard, Brady; Wu, Chang-Jiun; Genovese, Giannicola; Shmulevich, Ilya; Barnholtz-Sloan, Jill; Zou, Lihua; Vegesna, Rahulsimham; Shukla, Sachet A; Ciriello, Giovanni; Yung, W K; Zhang, Wei; Sougnez, Carrie; Mikkelsen, Tom; Aldape, Kenneth; Bigner, Darell D; Van Meir, Erwin G; Prados, Michael; Sloan, Andrew; Black, Keith L; Eschbacher, Jennifer; Finocchiaro, Gaetano; Friedman, William; Andrews, David W; Guha, Abhijit; Iacocca, Mary; O'Neill, Brian P; Foltz, Greg; Myers, Jerome; Weisenberger, Daniel J; Penny, Robert; Kucherlapati, Raju; Perou, Charles M; Hayes, D Neil; Gibbs, Richard; Marra, Marco; Mills, Gordon B; Lander, Eric; Spellman, Paul; Wilson, Richard; Sander, Chris; Weinstein, John; Meyerson, Matthew; Gabriel, Stacey; Laird, Peter W; Haussler, David; Getz, Gad; Chin, Lynda
2013-10-10
We describe the landscape of somatic genomic alterations based on multidimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer. Copyright © 2013 Elsevier Inc. All rights reserved.
Bowman, Megan J.; Park, Wonkeun; Bauer, Philip J.; Udall, Joshua A.; Page, Justin T.; Raney, Joshua; Scheffler, Brian E.; Jones, Don. C.; Campbell, B. Todd
2013-01-01
An RNA-seq experiment was performed using field-grown, well-watered and naturally rain-fed cotton plants to identify transcripts differentially expressed under water-deficit stress. Our work constitutes the first application of the newly published diploid D5 Gossypium raimondii sequence to RNA-seq transcriptome analysis of tetraploid AD1 upland cotton. A total of 1,530 transcripts were differentially expressed between well-watered and water-deficit-stressed root tissues, in patterns that confirm the accuracy of this technique for future studies in cotton genomics. Additionally, putative sequence-based genome localization of differentially expressed transcripts detected A2-genome-specific gene expression under water-deficit stress. These data will facilitate efforts to understand the complex responses governing transcriptomic regulatory mechanisms and to identify candidate genes that may benefit applied plant breeding programs. PMID:24324815
Classification of Phylogenetic Profiles for Protein Function Prediction: An SVM Approach
NASA Astrophysics Data System (ADS)
Kotaru, Appala Raju; Joshi, Ramesh C.
Predicting the function of an uncharacterized protein is a major challenge in the post-genomic era because of the problem's complexity and scale. Knowledge of protein function is a crucial link in the development of new drugs, better crops, and even biochemicals such as biofuels. Recently, numerous high-throughput procedures have been developed to investigate the mechanisms by which a protein accomplishes its function, and phylogenetic profiling is one of them. A phylogenetic profile represents a protein by encoding its evolutionary history, typically as the presence or absence of homologs across a set of genomes. In this paper, we propose a method for classifying phylogenetic profiles with a supervised machine learning method, support vector machine (SVM) classification with a radial basis function (RBF) kernel, to identify functionally linked proteins. We experimentally evaluated the performance of the classifier with linear and polynomial kernels and compared the results with the existing tree kernel. In our study, we used proteins of the budding yeast Saccharomyces cerevisiae genome. We generated phylogenetic profiles for 2465 yeast genes and used the functional annotations available in the MIPS database. Our experiments show that the RBF kernel performs similarly to the polynomial kernel in some functional classes, that together these two kernels outperform the linear and tree kernels, and that overall the RBF kernel outperformed the polynomial, linear, and tree kernels. These results suggest that it is feasible to use an SVM classifier with an RBF kernel to predict gene function from phylogenetic profiles.
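To make the kernel concrete, here is a minimal sketch, assuming binary presence/absence profiles over a handful of reference genomes. The genome names, homolog hit sets, and the `gamma` value are hypothetical and not taken from the study. The code only builds the profiles and the RBF Gram matrix K(x, y) = exp(-gamma * ||x - y||^2), which a kernel SVM (for example scikit-learn's `SVC(kernel="precomputed")`) could then be trained on; it does not reproduce the authors' training or evaluation.

```python
import math
from typing import List, Sequence, Set

# Hypothetical reference genomes; real profiles span many sequenced genomes.
REFERENCE_GENOMES = ["E_coli", "B_subtilis", "H_sapiens", "A_thaliana"]


def phylogenetic_profile(homolog_hits: Set[str], genomes: Sequence[str]) -> List[int]:
    """Encode a protein as 1/0 for presence/absence of a homolog in each genome."""
    return [1 if genome in homolog_hits else 0 for genome in genomes]


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same genomes")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(profiles: Sequence[Sequence[float]], gamma: float = 0.5) -> List[List[float]]:
    """Pairwise kernel matrix, the input a kernel SVM is trained on."""
    return [[rbf_kernel(p, q, gamma) for q in profiles] for p in profiles]


if __name__ == "__main__":
    p1 = phylogenetic_profile({"E_coli", "B_subtilis"}, REFERENCE_GENOMES)
    p2 = phylogenetic_profile({"E_coli", "B_subtilis", "H_sapiens"}, REFERENCE_GENOMES)
    for row in gram_matrix([p1, p2]):
        print([round(v, 3) for v in row])
```

Profiles with similar presence/absence patterns get kernel values close to 1, which is what lets the SVM group functionally linked proteins; `gamma` controls how quickly similarity decays with each genome in which the two profiles differ.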
Integrated Genomic and Network-Based Analyses of Complex Diseases and Human Disease Network.
Al-Harazi, Olfat; Al Insaif, Sadiq; Al-Ajlan, Monirah A; Kaya, Namik; Dzimiri, Nduna; Colak, Dilek
2016-06-20
A disease phenotype generally reflects various pathobiological processes that interact in a complex network. The highly interconnected nature of the human protein interaction network (interactome) indicates that, at the molecular level, it is difficult to consider diseases as being independent of one another. Recently, genome-wide molecular measurements, data mining and bioinformatics approaches have provided the means to explore human diseases from a molecular basis. The exploration of diseases and a system of disease relationships based on the integration of genome-wide molecular data with the human interactome could offer a powerful perspective for understanding the molecular architecture of diseases. Recently, subnetwork markers have proven to be more robust and reliable than individual biomarker genes selected based on gene expression profiles alone, and achieve higher accuracy in disease classification. We have applied one of these methodologies to idiopathic dilated cardiomyopathy (IDCM) data that we have generated using a microarray and identified significant subnetworks associated with the disease. In this paper, we review the recent endeavours in this direction, and summarize the existing methodologies and computational tools for network-based analysis of complex diseases and molecular relationships among apparently different disorders and human disease network. We also discuss the future research trends and topics of this promising field. Copyright © 2015 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.
Meinicke, Peter
2009-09-02
Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. Estimating profiles by assigning sequences to functional categories is computationally expensive because it requires comparing all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence-specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. With respect to the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. Compared with current state-of-the-art HMMs, the runtime measurements show a considerable speed-up of about four orders of magnitude. For an average-size prokaryotic genome, computing a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time, the UFO web server makes it possible to get a quick overview of the functional inventory of newly sequenced organisms. The genome-scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.
The Assembly Pathway of Mitochondrial Respiratory Chain Complex I.
Guerrero-Castillo, Sergio; Baertling, Fabian; Kownatzki, Daniel; Wessels, Hans J; Arnold, Susanne; Brandt, Ulrich; Nijtmans, Leo
2017-01-10
Mitochondrial complex I is the largest integral membrane enzyme of the respiratory chain and consists of 44 different subunits encoded in the mitochondrial and nuclear genome. Its biosynthesis is a highly complicated and multifaceted process involving at least 14 additional assembly factors. How these subunits assemble into a functional complex I and where the assembly factors come into play is largely unknown. Here, we applied a dynamic complexome profiling approach to elucidate the assembly of human mitochondrial complex I and its further incorporation into respiratory chain supercomplexes. We delineate the stepwise incorporation of all but one subunit into a series of distinct assembly intermediates and their association with known and putative assembly factors, which had not been implicated in this process before. The resulting detailed and comprehensive model of complex I assembly is fully consistent with recent structural data and the remarkable modular architecture of this multiprotein complex. Copyright © 2017 Elsevier Inc. All rights reserved.
Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S.; Allard, Marc W.; Brown, Eric W.; Strain, Errol A.
2017-01-01
A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. 
The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents. PMID:28166293
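The abstract reports that isolates within each outbreak cluster differed by up to 29 SNPs. As a hedged illustration of what a pairwise SNP distance means (not the study's SNP-based WGS workflow, which placed isolates into clusters and branches), the sketch below counts differences between aligned SNP strings, ignoring ambiguous `N` calls, and groups isolates by single linkage at a chosen threshold. The isolate names, sequences, and threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned SNP strings differ, skipping 'N' calls."""
    if len(a) != len(b):
        raise ValueError("SNP strings must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def single_linkage_clusters(seqs: Dict[str, str], threshold: int) -> List[List[str]]:
    """Join isolates whose SNP distance is <= threshold (union-find, single linkage)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    isolates = {"food_1": "AAAA", "food_2": "AAAT", "clinical_1": "AATT", "other": "GGGG"}
    print(single_linkage_clusters(isolates, threshold=1))
```

Single linkage chains isolates together, so two members of one cluster can differ by more than the threshold (here `food_1` and `clinical_1` differ by 2), which mirrors why a cluster is described by its maximum within-cluster distance.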
How to interpret Methylation Sensitive Amplified Polymorphism (MSAP) profiles?
2014-01-01
Background DNA methylation plays a key role in development, contributes to genome stability, and may also respond to external factors supporting adaptation and evolution. To connect different types of stimuli with particular biological processes, identifying genome regions with altered 5-methylcytosine distribution at a genome-wide scale is important. Many researchers use the simple, reliable, and relatively inexpensive Methylation Sensitive Amplified Polymorphism (MSAP) method, which is particularly useful in studies of epigenetic variation. However, the electrophoretic patterns produced by the method are rather difficult to interpret, particularly when the MspI and HpaII isoschizomers are used, because these enzymes are methylation-sensitive and any C within the CCGG recognition motif can be methylated in plant DNA. Results Here, we evaluate MSAP patterns with respect to current knowledge of the enzyme activities and the level and distribution of 5-methylcytosine in plant and vertebrate genomes. We discuss potential caveats related to complex MSAP patterns and provide clues regarding how to interpret them. We further show that adding a combined HpaII + MspI digestion would assist in the interpretation of the most controversial MSAP pattern, represented by a signal in the HpaII but not in the MspI profile. Conclusions We recommend a modification of the MSAP protocol that unambiguously discriminates between putative hemimethylated mCCGG and internal CmCGG sites. We believe that our view and this simple improvement will assist in correct interpretation of MSAP data. PMID:24393618
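For orientation, here is a minimal sketch of the conventional four-pattern reading of MSAP bands that the authors re-examine. The type labels and their interpretations are assumptions drawn from common MSAP practice, not the paper's revised scheme; the abstract argues in particular that the HpaII-only pattern (type III below) is ambiguous and proposes an extra combined HpaII + MspI digest to resolve it.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Conventional MSAP reading, keyed by (band in HpaII lane, band in MspI lane).
# Assumed textbook interpretation; the abstract argues (True, False) is ambiguous.
CONVENTIONAL_MSAP_TYPES: Dict[Tuple[bool, bool], str] = {
    (True, True): "I",     # both enzymes cut: CCGG read as unmethylated
    (False, True): "II",   # only MspI cuts: read as internal C methylated (CmCGG)
    (True, False): "III",  # only HpaII cuts: read as hemimethylated external C (hemi-mCCGG)
    (False, False): "IV",  # neither cuts: external C methylated or site absent; uninformative
}


def classify_locus(hpaii_band: bool, mspi_band: bool) -> str:
    """Return the conventional MSAP type for one scored fragment."""
    return CONVENTIONAL_MSAP_TYPES[(hpaii_band, mspi_band)]


def tally(loci: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count how many scored fragments fall into each conventional type."""
    return Counter(classify_locus(h, m) for h, m in loci)


if __name__ == "__main__":
    scored = [(True, True), (False, True), (True, False), (True, True)]
    print(tally(scored))
```

Keeping the reading in a single lookup table makes it easy to swap in a revised interpretation (for example, one informed by an additional combined-digest lane) without touching the tallying code.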
Comparing Patterns of Natural Selection across Species Using Selective Signatures
Shapiro, B. Jesse; Alm, Eric J
2008-01-01
Comparing gene expression profiles over many different conditions has led to insights that were not obvious from single experiments. In the same way, comparing patterns of natural selection across a set of ecologically distinct species may extend what can be learned from individual genome-wide surveys. Toward this end, we show how variation in protein evolutionary rates, after correcting for genome-wide effects such as mutation rate and demographic factors, can be used to estimate the level and types of natural selection acting on genes across different species. We identify unusually rapidly and slowly evolving genes, relative to empirically derived genome-wide and gene family-specific background rates for 744 core protein families in 30 γ-proteobacterial species. We describe the pattern of fast or slow evolution across species as the “selective signature” of a gene. Selective signatures represent a profile of selection across species that is predictive of gene function: pairs of genes with correlated selective signatures are more likely to share the same cellular function, and genes in the same pathway can evolve in concert. For example, glycolysis and phenylalanine metabolism genes evolve rapidly in Idiomarina loihiensis, mirroring an ecological shift in carbon source from sugars to amino acids. In a broader context, our results suggest that the genomic landscape is organized into functional modules even at the level of natural selection, and thus it may be easier than expected to understand the complex evolutionary pressures on a cell. PMID:18266472
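A minimal sketch of the two ideas in this abstract, under stated assumptions: a gene's "selective signature" is approximated here as a per-species z-score of its evolutionary rate against that species' background rates, and two genes are compared by the Pearson correlation of their signatures. The z-score is a hypothetical stand-in for the authors' correction for mutation rate, demography, and gene-family background, not their actual estimator; the species names and rates are invented.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def selective_signature(gene_rates: Dict[str, float],
                        background: Dict[str, Sequence[float]]) -> List[float]:
    """Per-species z-score of one gene's rate against that species' background.

    Positive values mean faster than background, negative values slower.
    Species are processed in sorted order so signatures line up across genes.
    """
    signature = []
    for species in sorted(gene_rates):
        bg = background[species]
        mu, sd = mean(bg), pstdev(bg)
        signature.append((gene_rates[species] - mu) / sd if sd > 0 else 0.0)
    return signature


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two signatures of equal length."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat signature carries no correlation information
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical background rates of all genes in each of three species.
    background = {"sp_A": [0.1, 0.2, 0.3], "sp_B": [0.2, 0.4, 0.6], "sp_C": [0.1, 0.2, 0.3]}
    gene_1 = {"sp_A": 0.3, "sp_B": 0.2, "sp_C": 0.2}
    gene_2 = {"sp_A": 0.3, "sp_B": 0.3, "sp_C": 0.2}
    s1 = selective_signature(gene_1, background)
    s2 = selective_signature(gene_2, background)
    print(round(pearson(s1, s2), 3))
```

A high correlation between two genes' signatures is the kind of signal the abstract links to shared cellular function.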
Dimitrova, N; Nagaraj, A B; Razi, A; Singh, S; Kamalakaran, S; Banerjee, N; Joseph, P; Mankovich, A; Mittal, P; DiFeo, A; Varadan, V
2017-04-27
Characterizing the complex interplay of cellular processes in cancer would enable the discovery of key mechanisms underlying its development and progression. Published approaches to decipher driver mechanisms do not explicitly model tissue-specific changes in pathway networks and the regulatory disruptions related to genomic aberrations in cancers. We therefore developed InFlo, a novel systems biology approach for characterizing complex biological processes using a unique multidimensional framework integrating transcriptomic, genomic and/or epigenomic profiles for any given cancer sample. We show that InFlo robustly characterizes tissue-specific differences in activities of signalling networks on a genome scale using unique probabilistic models of molecular interactions on a per-sample basis. Using large-scale multi-omics cancer datasets, we show that InFlo exhibits higher sensitivity and specificity in detecting pathway networks associated with specific disease states when compared to published pathway network modelling approaches. Furthermore, InFlo's ability to infer the activity of unmeasured signalling network components was also validated using orthogonal gene expression signatures. We then evaluated multi-omics profiles of primary high-grade serous ovarian cancer tumours (N=357) to delineate mechanisms underlying resistance to frontline platinum-based chemotherapy. InFlo was the only algorithm to identify hyperactivation of the cAMP-CREB1 axis as a key mechanism associated with resistance to platinum-based therapy, a finding that we subsequently experimentally validated. We confirmed that inhibition of CREB1 phosphorylation potently sensitized resistant cells to platinum therapy and was effective in killing ovarian cancer stem cells that contribute to both platinum-resistance and tumour recurrence. 
Thus, we propose InFlo as a scalable, widely applicable, and robust integrative network-modelling framework for the discovery of evidence-based biomarkers and therapeutic targets.
Lee, Hayan; Schatz, Michael C
2012-08-15
Genome resequencing and short-read mapping are two of the primary tools of genomics and are used for many important applications. The current state of the art in mapping uses quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position, and thus measures the overall composition of the genome itself. We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found that discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including known clinically relevant variations in these regions. The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net
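To illustrate what a per-position mappability score measures, here is a deliberately simplified, unweighted sketch, assuming error-free reads of fixed length k taken from the forward strand only. The score at a position is the percentage of k-length reads covering it whose sequence occurs exactly once in the genome. The published GMS is a weighted probability, so this is an approximation of the idea, not the Genome Mappability Analyzer's algorithm; the function names are hypothetical.

```python
from collections import Counter
from typing import List


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every k-length substring (read sequence) on the forward strand."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def simple_mappability(genome: str, k: int) -> List[float]:
    """Percent of k-length reads covering each position that map uniquely."""
    counts = kmer_counts(genome, k)
    n_starts = len(genome) - k + 1
    unique = [counts[genome[i:i + k]] == 1 for i in range(max(n_starts, 0))]
    scores = []
    for pos in range(len(genome)):
        first = max(0, pos - k + 1)      # leftmost read start covering pos
        last = min(pos, n_starts - 1)    # rightmost read start covering pos
        covering = unique[first:last + 1]
        scores.append(100.0 * sum(covering) / len(covering) if covering else 0.0)
    return scores


if __name__ == "__main__":
    # The repeated "ACGT" lowers the score of positions it covers.
    print(simple_mappability("ACGTACGTTT", k=4))
```

Positions inside a repeat longer than the read length score 0 in this sketch, which is the intuition behind regions where, as the abstract notes, extra coverage cannot rescue variant discovery.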
Genomic Heterogeneity as a Barrier to Precision Medicine in Gastroesophageal Adenocarcinoma.
Pectasides, Eirini; Stachler, Matthew D; Derks, Sarah; Liu, Yang; Maron, Steven; Islam, Mirazul; Alpert, Lindsay; Kwak, Heewon; Kindler, Hedy; Polite, Blase; Sharma, Manish R; Allen, Kenisha; O'Day, Emily; Lomnicki, Samantha; Maranto, Melissa; Kanteti, Rajani; Fitzpatrick, Carrie; Weber, Christopher; Setia, Namrata; Xiao, Shu-Yuan; Hart, John; Nagy, Rebecca J; Kim, Kyoung-Mee; Choi, Min-Gew; Min, Byung-Hoon; Nason, Katie S; O'Keefe, Lea; Watanabe, Masayuki; Baba, Hideo; Lanman, Rick; Agoston, Agoston T; Oh, David J; Dunford, Andrew; Thorner, Aaron R; Ducar, Matthew D; Wollison, Bruce M; Coleman, Haley A; Ji, Yuan; Posner, Mitchell C; Roggin, Kevin; Turaga, Kiran; Chang, Paul; Hogarth, Kyle; Siddiqui, Uzma; Gelrud, Andres; Ha, Gavin; Freeman, Samuel S; Rhoades, Justin; Reed, Sarah; Gydush, Greg; Rotem, Denisse; Davison, Jon; Imamura, Yu; Adalsteinsson, Viktor; Lee, Jeeyun; Bass, Adam J; Catenacci, Daniel V
2018-01-01
Gastroesophageal adenocarcinoma (GEA) is a lethal disease where targeted therapies, even when guided by genomic biomarkers, have had limited efficacy. A potential reason for the failure of such therapies is that genomic profiling results could commonly differ between the primary and metastatic tumors. To evaluate genomic heterogeneity, we sequenced paired primary GEA and synchronous metastatic lesions across multiple cohorts, finding extensive differences in genomic alterations, including discrepancies in potentially clinically relevant alterations. Multiregion sequencing showed significant discrepancy within the primary tumor (PT) and between the PT and disseminated disease, with oncogene amplification profiles commonly discordant. In addition, a pilot analysis of cell-free DNA (cfDNA) sequencing demonstrated the feasibility of detecting genomic amplifications not detected in PT sampling. Lastly, we profiled paired primary tumors, metastatic tumors, and cfDNA from patients enrolled in the personalized antibodies for GEA (PANGEA) trial of targeted therapies in GEA and found that genomic biomarkers were recurrently discrepant between the PT and untreated metastases. Divergent primary and metastatic tissue profiling led to treatment reassignment in 32% (9/28) of patients. In discordant primary and metastatic lesions, we found 87.5% concordance for targetable alterations in metastatic tissue and cfDNA, suggesting the potential for cfDNA profiling to enhance selection of therapy. Significance: We demonstrate frequent baseline heterogeneity in targetable genomic alterations in GEA, indicating that current tissue sampling practices for biomarker testing do not effectively guide precision medicine in this disease and that routine profiling of metastatic lesions and/or cfDNA should be systematically evaluated. Cancer Discov; 8(1); 37-48. ©2017 AACR. See related commentary by Sundar and Tan, p. 14. See related article by Janjigian et al., p. 49. This article is highlighted in the In This Issue feature, p. 1. ©2017 American Association for Cancer Research.
Adamson, Britt; Norman, Thomas M.; Jost, Marco; Cho, Min Y.; Nuñez, James K.; Chen, Yuwen; Villalta, Jacqueline E.; Gilbert, Luke A.; Horlbeck, Max A.; Hein, Marco Y.; Pak, Ryan A.; Gray, Andrew N.; Gross, Carol A.; Dixit, Atray; Parnas, Oren; Regev, Aviv; Weissman, Jonathan S.
2016-01-01
SUMMARY Functional genomics efforts face tradeoffs between number of perturbations examined and complexity of phenotypes measured. We bridge this gap with Perturb-seq, which combines droplet-based single-cell RNA-seq with a strategy for barcoding CRISPR-mediated perturbations, allowing many perturbations to be profiled in pooled format. We applied Perturb-seq to dissect the mammalian unfolded protein response (UPR) using single and combinatorial CRISPR perturbations. Two genome-scale CRISPR interference (CRISPRi) screens identified genes whose repression perturbs ER homeostasis. Subjecting ~100 hits to Perturb-seq enabled high-precision functional clustering of genes. Single-cell analyses decoupled the three UPR branches, revealed bifurcated UPR branch activation among cells subject to the same perturbation, and uncovered differential activation of the branches across hits, including an isolated feedback loop between the translocon and IRE1α. These studies provide insight into how the three sensors of ER homeostasis monitor distinct types of stress and highlight the ability of Perturb-seq to dissect complex cellular responses. PMID:27984733
Stepping into the omics era: Opportunities and challenges for biomaterials science and engineering.
Groen, Nathalie; Guvendiren, Murat; Rabitz, Herschel; Welsh, William J; Kohn, Joachim; de Boer, Jan
2016-04-01
The research paradigm in biomaterials science and engineering is evolving from low-throughput, iterative experimental designs toward high-throughput experimental designs for materials optimization and the evaluation of materials properties. Computational science plays an important role in this transition. With the emergence of the omics approach in the biomaterials field, referred to as materiomics, high-throughput approaches hold the promise of tackling the complexity of materials and understanding correlations between material properties and their effects on complex biological systems. The intrinsic complexity of biological systems is an important factor that is often oversimplified when characterizing biological responses to materials and establishing property-activity relationships. Indeed, in vitro tests designed to predict the in vivo performance of a given biomaterial are largely lacking, because we are not able to capture the biological complexity of whole tissues in an in vitro model. In this opinion paper, we explain how we arrived at the view that converging genomics and materiomics into a new field would significantly accelerate the development of new and improved medical devices. The use of computational modeling to correlate high-throughput gene expression profiling with high-throughput combinatorial material design strategies would add power to the analysis of biological effects induced by material properties. We believe that this extra layer of complexity on top of high-throughput material experimentation is necessary to tackle the biological complexity and further advance the biomaterials field. Copyright © 2016. Published by Elsevier Ltd.
Mukunthan, B; Nagaveni, N
2014-01-01
In genetic engineering, the conventional techniques and algorithms that forensic scientists use to identify individuals from their DNA profiles involve complex computational steps and mathematical formulae, and locating mutations in a genomic sequence in the laboratory remains a demanding task. This novel approach can address problems that lack an algorithmic solution or whose known solutions are too complex to be practical. Combining bioinformatics with neural network techniques yields an efficient DNA pattern analysis algorithm with high prediction accuracy.
Assigning protein functions by comparative genome analysis protein phylogenetic profiles
Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.
2003-05-13
A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism that are fused into a single protein chain, AB. A trans-genome comparison of sequences can reveal these AB sequences, which are termed Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
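The phylogenetic-profile method can be illustrated with a small sketch. It assumes homolog detection has already been done, so each genome is just a set of protein (family) identifiers; proteins whose presence/absence vectors agree, up to a hypothetical `max_diff` mismatching genomes (Hamming distance), are reported as candidate functional links. The genome contents below are invented for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def phylogenetic_profile(protein: str, genomes: Dict[str, Set[str]]) -> Tuple[int, ...]:
    """1 if the genome contains a homolog of `protein`, else 0 (genomes in sorted order)."""
    return tuple(int(protein in genomes[name]) for name in sorted(genomes))

def linked_pairs(proteins: List[str], genomes: Dict[str, Set[str]],
                 max_diff: int = 0) -> List[Tuple[str, str]]:
    """Pairs of proteins whose profiles differ in at most `max_diff` genomes."""
    profiles = {p: phylogenetic_profile(p, genomes) for p in proteins}
    pairs = []
    for a, b in combinations(proteins, 2):
        hamming = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if hamming <= max_diff:
            pairs.append((a, b))
    return pairs

# Hypothetical homolog content of four genomes.
genomes = {
    "g1": {"P1", "P2", "P3"},
    "g2": {"P1", "P2"},
    "g3": {"P3"},
    "g4": {"P1", "P2", "P3"},
}
print(linked_pairs(["P1", "P2", "P3"], genomes))
```

Allowing `max_diff > 0` tolerates a few discordant genomes (for example, from missed homologs) at the cost of more spurious links.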
2008-09-01
community representation. 12 survey a complex microbial community. Community DNA or rRNA extracted from a sample may require amplification before ... restricted to cultivated clades, since not only do many clades have sufficient database representation due to 16S environmental surveys, but such ... well developed for standard and comprehensive surveys. Depending on the population being targeted and the identification method, FCM can be a
Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael
2016-07-19
Finding highly relevant articles in biomedical databases is challenging, not only because it is often difficult to express a user's underlying intention accurately through keywords, but also because a keyword-based query normally returns a long list of hits, many of which the user does not want. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employs association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) the user selected as feedback. The top concepts are used to form a k-profile for the next round of search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries of any level of complexity. A prototype of BiomedSearch was built and evaluated in a preliminary study using the Genomics data from the TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experimental results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively use relevance feedback to improve the performance of biomedical literature search.
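The idea of turning relevance feedback into a top-k concept profile can be sketched as follows. The abstract does not give the weighted interest measure, so this sketch swaps in a TF-IDF-style score and omits the query term: a concept scores highly when it appears in many of the user's feedback articles but is rare in the whole collection. Documents are assumed to be already mapped to sets of concept IDs (e.g. via UMLS); the function name and concept IDs are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set

def k_profile(feedback_docs: List[Set[str]], corpus: List[Set[str]], k: int) -> List[str]:
    """Top-k concepts from user-selected feedback articles.

    Score = (fraction of feedback docs containing the concept) * log((N + 1) / (df + 1)),
    a TF-IDF-style stand-in for the paper's weighted interest measure.
    """
    n_docs = len(corpus)
    doc_freq = Counter(c for doc in corpus for c in doc)
    feedback_freq = Counter(c for doc in feedback_docs for c in doc)
    scores: Dict[str, float] = {}
    for concept, count in feedback_freq.items():
        df = doc_freq.get(concept, 0) + 1  # +1 smoothing avoids division by zero
        scores[concept] = (count / len(feedback_docs)) * math.log((n_docs + 1) / df)
    # Descending score; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]

corpus = [{"C1", "C2"}, {"C2", "C3"}, {"C2", "C4"}, {"C1", "C5"}]
feedback = [{"C1", "C2"}, {"C1", "C5"}]
print(k_profile(feedback, corpus, k=2))
```

Here the ubiquitous concept `C2` is down-weighted even though the user selected an article containing it, which is the behaviour one would want from a feedback profile.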
Automated deconvolution of structured mixtures from heterogeneous tumor genomic data
Roman, Theodore; Xie, Lu
2017-01-01
With increasing appreciation for the extent and importance of intratumor heterogeneity, much attention in cancer research has focused on profiling heterogeneity at the level of a single patient. Although true single-cell genomic technologies are rapidly improving, they remain too noisy and costly at present for population-level studies. Bulk sequencing remains the standard for population-scale tumor genomics, creating a need for computational tools to separate the contributions of multiple tumor clones and assorted stromal and infiltrating cell populations to pooled genomic data. However, all such methods are limited to coarse approximations of only a few cell subpopulations. In prior work, we demonstrated the feasibility of improving cell type deconvolution by taking advantage of substructure in genomic mixtures via a strategy called simplicial complex unmixing. We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data, as well as the ability to process quantitative RNA expression data and heterogeneous combinations of RNA and CNV data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We further demonstrate their effectiveness in identifying mixture substructure in real breast cancer CNV data from The Cancer Genome Atlas (TCGA). Source code is available at https://github.com/tedroman/WSCUnmix PMID:29059177
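The linear mixing model underlying this kind of deconvolution can be sketched in a few lines. This is not simplicial complex unmixing: it assumes the component profiles (here, hypothetical copy-number profiles of normal cells and two tumor clones) are already known and only estimates their mixing fractions, constrained to be non-negative and to sum to one, by projected gradient descent. Learning the components and their substructure, as WSCUnmix does, is the harder problem the paper addresses.

```python
from typing import List, Sequence

def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        running += ui
        t = (running - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def estimate_fractions(bulk: Sequence[float], components: Sequence[Sequence[float]],
                       iters: int = 5000) -> List[float]:
    """Minimize 0.5 * ||sum_j w_j * components[j] - bulk||^2 over the simplex."""
    k, n = len(components), len(bulk)
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so 1 / that value is a safe step size.
    step = 1.0 / sum(x * x for comp in components for x in comp)
    w = [1.0 / k] * k
    for _ in range(iters):
        resid = [sum(w[j] * components[j][i] for j in range(k)) - bulk[i] for i in range(n)]
        grad = [sum(resid[i] * components[j][i] for i in range(n)) for j in range(k)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(k)])
    return w

# Hypothetical copy-number profiles over six genomic bins.
normal = [2, 2, 2, 2, 2, 2]
clone_b = [2, 3, 3, 2, 1, 2]
clone_c = [4, 2, 2, 2, 2, 1]
true_w = [0.5, 0.3, 0.2]
bulk = [sum(w * c[i] for w, c in zip(true_w, (normal, clone_b, clone_c))) for i in range(6)]
print([round(x, 3) for x in estimate_fractions(bulk, [normal, clone_b, clone_c])])
```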
A genome-wide SNP scan accelerates trait-regulatory genomic loci identification in chickpea
Kujur, Alice; Bajaj, Deepak; Upadhyaya, Hari D.; Das, Shouvik; Ranjan, Rajeev; Shree, Tanima; Saxena, Maneesha S.; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C.L.L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.
2015-01-01
We identified 44,844 high-quality SNPs by sequencing 92 diverse chickpea accessions belonging to a seed and pod trait-specific association panel, using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays. A GWAS (genome-wide association study) in an association panel of 211 accessions, including the 92 sequenced accessions, identified 22 major genomic loci showing significant association (explaining 23–47% of phenotypic variation) with pod and seed number per plant and 100-seed weight. Eighteen trait-regulatory major genomic loci underlying 13 robust QTLs were validated and mapped on an intra-specific genetic linkage map by QTL mapping. A combinatorial approach of GWAS, QTL mapping, gene haplotype-specific LD mapping, and transcript profiling uncovered one superior haplotype and favourable natural allelic variants in the upstream regulatory region of a CesA-type cellulose synthase gene (Ca_Kabuli_CesA3) regulating high pod and seed number per plant (explaining 47% of phenotypic variation) in chickpea. The superior haplotype was associated with up-regulated Ca_Kabuli_CesA3 transcript expression in the pollen and pod of a high pod/seed number accession, resulting in higher cellulose accumulation for normal pollen and pollen tube growth. This rapid combinatorial approach based on genome-wide SNP genotyping has the potential to dissect complex quantitative agronomic traits and delineate trait-regulatory genomic loci (candidate genes) for genetic enhancement in crop plants, including chickpea. PMID:26058368
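The "percentage of phenotypic variation explained" by a locus is commonly reported as an R² statistic. A minimal sketch for a single marker, assuming genotypes coded as allele dosage (0/1/2) and a simple linear regression of phenotype on genotype; association models for structured panels typically add population-structure and kinship terms, which are omitted here. The example data are invented.

```python
from statistics import mean
from typing import Sequence

def variance_explained(genotypes: Sequence[float], phenotypes: Sequence[float]) -> float:
    """R^2 of a simple linear regression of phenotype on genotype dosage."""
    gx, py = mean(genotypes), mean(phenotypes)
    sxy = sum((g - gx) * (p - py) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - gx) ** 2 for g in genotypes)
    syy = sum((p - py) ** 2 for p in phenotypes)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical marker dosages and pod numbers for six accessions.
dosage = [0, 0, 1, 1, 2, 2]
pods_per_plant = [10, 12, 14, 16, 18, 20]
print(f"{100 * variance_explained(dosage, pods_per_plant):.1f}% of phenotypic variation")
```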
LeBlanc, Chantal; Lee, Tae-Jin; Mulvaney, Patrick; Allen, George C.; Martienssen, Robert A.; Thompson, William F.
2017-01-01
All plants and animals must replicate their DNA, using a regulated process to ensure that their genomes are completely and accurately replicated. DNA replication timing programs have been extensively studied in yeast and animal systems, but much less is known about the replication programs of plants. We report a novel adaptation of the “Repli-seq” assay for use in intact maize (Zea mays) root tips, which include several different cell lineages, and present whole-genome replication timing profiles from cells in early, mid, and late S phase of the mitotic cell cycle. Maize root tips have a complex replication timing program, including regions of distinct early, mid, and late S replication that each constitute between 20 and 24% of the genome, as well as other loci corresponding to ∼32% of the genome that exhibit replication activity in two different time windows. Analyses of genomic, transcriptional, and chromatin features of the euchromatic portion of the maize genome provide evidence for a gradient of early replicating, open chromatin that transitions gradually to less open and less transcriptionally active chromatin replicating in mid S phase. Our genome-level analysis also demonstrated that the centromere core replicates in mid S, before heavily compacted classical heterochromatin, including pericentromeres and knobs, which replicate during late S phase. PMID:28842533
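Classifying a genomic bin's replication timing from the three sorted S-phase fractions can be sketched as follows, assuming each bin already has a normalized signal (e.g. read enrichment) for the early, mid, and late fractions. A bin is placed in every window whose signal reaches a hypothetical fraction (`second_peak_ratio`) of its maximum, so bins active in two windows are reported as such; the threshold is illustrative and is not the study's criterion.

```python
from typing import Dict, List

PHASES = ("early", "mid", "late")

def classify_bin(signal: Dict[str, float], second_peak_ratio: float = 0.7) -> List[str]:
    """Report every S-phase window whose signal is at least `second_peak_ratio` of the maximum.

    `signal` maps "early", "mid" and "late" to a normalized signal for one genomic bin.
    """
    top = max(signal[p] for p in PHASES)
    return [p for p in PHASES if signal[p] >= second_peak_ratio * top]

print(classify_bin({"early": 0.8, "mid": 0.1, "late": 0.1}))    # single-window bin
print(classify_bin({"early": 0.45, "mid": 0.4, "late": 0.15}))  # two-window bin
```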
Montesanto, Alberto; Geracitano, Silvana; Garasto, Sabrina; Fusco, Sergio; Lattanzio, Fabrizia; Passarino, Giuseppe; Corsonello, Andrea
2016-01-01
Before the last decade, attempts to identify the genetic factors involved in susceptibility to age-related complex diseases such as cardiovascular disease, diabetes, and cancer had very limited success. Recently, two important advances have provided new opportunities to improve our knowledge in this field. First, the concept has emerged of studying the molecular mechanisms underlying the age-related decline of the organism (such as cellular senescence), rather than the genetics of single disorders. Second, advances in DNA technology have uncovered an incredible number of common susceptibility variants for several complex traits. Despite this progress, translating these discoveries into clinical practice has been very difficult. To date, several efforts to translate genomics into medicine are under way, seeking the best way for genomic discoveries to improve our understanding of fundamental issues in the prediction and prevention of some complex diseases. The successful strategy seems to be testing multiple susceptibility variants simultaneously, in combination with traditional risk factors. Indeed, such an approach showed that genetic factors substantially improve the prediction of complex diseases, especially coronary heart disease and prostate cancer, making appropriate behavioural and medical interventions possible. In the future, the identification of new genetic variants and their inclusion in current risk profile models will probably improve the discriminatory power of these models for other complex diseases such as type 2 diabetes mellitus and breast cancer. On the other hand, for traits with low heritability, this improvement will probably be negligible, which will call for further research on the role of traditional and newly discovered non-genetic risk factors.
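Combining multiple susceptibility variants with traditional risk factors is often done with a logistic model that includes a weighted genetic risk score (the sum of risk-allele counts times per-allele log-odds). A minimal sketch, in which every coefficient (per-allele log-odds, clinical coefficients, intercept) and the `age_decades` variable are hypothetical inputs rather than published estimates:

```python
import math
from typing import Dict, Sequence

def combined_risk(risk_allele_counts: Sequence[int], log_odds_per_allele: Sequence[float],
                  clinical: Dict[str, float], clinical_coefs: Dict[str, float],
                  intercept: float) -> float:
    """Predicted probability from a logistic model with a genetic risk score term."""
    genetic_score = sum(c * b for c, b in zip(risk_allele_counts, log_odds_per_allele))
    clinical_term = sum(clinical_coefs[name] * value for name, value in clinical.items())
    linear = intercept + genetic_score + clinical_term
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical individual: two variants (2 and 1 risk alleles) plus age in decades.
p = combined_risk([2, 1], [0.1, 0.2], {"age_decades": 6.0}, {"age_decades": 0.05}, -1.0)
print(round(p, 3))
```

Comparing the discrimination of this model with and without the genetic term is how the added predictive value of genotypes is usually assessed.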
Spiked GBS: A unified, open platform for single marker genotyping and whole-genome profiling
USDA-ARS's Scientific Manuscript database
In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of t...
Albrecht-Buehler, Guenter
2007-09-01
In genome duplexes that exceed 100 kb the frequency distributions of their trinucleotides (triplet profiles) are the same in both strands. This remarkable symmetry, sometimes called Chargaff's second parity rule, is not the result of base pairing, but can be explained as the result of countless inversions and inverted transpositions that occurred throughout evolution (G. Albrecht-Buehler, 2006, Proc. Natl. Acad. Sci. USA 103, 17828-17833). Furthermore, comparing the triplet profiles of genomes from a large number of different taxa and species revealed that they were not only strand-symmetrical, but even surprisingly similar to one another (majority profile; G. Albrecht-Buehler, 2007, Genomics 89, 596-601). The present article proposes that the same inversion/transposition mechanism(s) that created the strand symmetry may also explain the existence of the majority profile. Thus they may be key factors in the creation of an almost universal "format" in which genome sequences are written. One may speculate that this universality of genome format may facilitate horizontal gene transfer and, thus, accelerate evolution.
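Strand symmetry of triplet profiles can be checked directly, because the triplet profile of the complementary strand is the profile of the reverse complement. A minimal sketch, assuming an uppercase sequence of at least three bases containing only A, C, G, and T, and using a simple sum of absolute differences as a hypothetical asymmetry measure; under the second parity rule this value should be close to 0 for sequences longer than about 100 kb.

```python
from collections import Counter
from itertools import product
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """The complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def triplet_profile(seq: str) -> Dict[str, float]:
    """Relative frequency of each of the 64 trinucleotides (overlapping windows)."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {"".join(t): counts["".join(t)] / total for t in product("ACGT", repeat=3)}

def strand_asymmetry(seq: str) -> float:
    """Sum of absolute differences between the triplet profiles of the two strands."""
    fwd, rev = triplet_profile(seq), triplet_profile(reverse_complement(seq))
    return sum(abs(fwd[t] - rev[t]) for t in fwd)

print(strand_asymmetry("ACGT"), strand_asymmetry("AAAA"))
```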
Managing the genomic revolution in cancer diagnostics.
Nguyen, Doreen; Gocke, Christopher D
2017-08-01
Molecular tumor profiling is now a routine part of patient care, revealing targetable genomic alterations and molecularly distinct tumor subtypes with therapeutic and prognostic implications. The widespread adoption of next-generation sequencing technologies has greatly facilitated clinical implementation of genomic data and opened the door for high-throughput multigene-targeted sequencing. Herein, we discuss the variability of cancer genetic profiling currently offered by clinical laboratories, the challenges of applying rapidly evolving medical knowledge to individual patients, and the need for more standardized population-based molecular profiling.
Absence of Complex I Implicates Rearrangement of the Respiratory Chain in European Mistletoe.
Senkler, Jennifer; Rugen, Nils; Eubel, Holger; Hegermann, Jan; Braun, Hans-Peter
2018-05-21
The mitochondrial oxidative phosphorylation (OXPHOS) system, which is based on the presence of five protein complexes, is at the very center of cellular ATP production. Complexes I to IV are components of the respiratory electron transport chain that drives proton translocation across the inner mitochondrial membrane. The resulting proton gradient is used by complex V (the ATP synthase complex) for the phosphorylation of ADP. The occurrence of complexes I to V is highly conserved in eukaryotes, with exceptions restricted to unicellular parasites that take up energy-rich compounds from their hosts. Here we present biochemical evidence that the European mistletoe (Viscum album), an obligate semi-parasite living on branches of trees, has a highly unusual OXPHOS system. V. album mitochondria completely lack complex I and have greatly reduced amounts of complexes II and V. At the same time, complexes III and IV form remarkably stable respiratory supercomplexes. Furthermore, complexome profiling revealed the presence of 150 kDa complexes that include type II NAD(P)H dehydrogenases and an alternative oxidase. Although the absence of complex I genes from the mitochondrial genomes of mistletoe species has recently been reported, this is the first biochemical proof that these genes have not been transferred to the nuclear genome and that this respiratory complex is indeed not assembled. As a consequence, the whole respiratory chain is remodeled. Our results demonstrate that, in the context of parasitism, multicellular life can cope with the lack of one of the OXPHOS complexes, and they give new insights into the life strategy of mistletoe species. Copyright © 2018 Elsevier Ltd. All rights reserved.
i-ADHoRe 2.0: an improved tool to detect degenerated genomic homology using genomic profiles.
Simillion, Cedric; Janssens, Koen; Sterck, Lieven; Van de Peer, Yves
2008-01-01
i-ADHoRe is a software tool that combines gene content and gene order information of homologous genomic segments into profiles to detect highly degenerated homology relations within and between genomes. Besides a significant increase in performance, the new version offers several optimizations to the algorithm, most importantly to the profile alignment routine. As a result, the annotations of multiple genomes, or parts thereof, can be fed into the program simultaneously, after which it reports all regions of homology, both within and between genomes. The i-ADHoRe 2.0 package contains the C++ source code for the main program as well as various Perl scripts and a fully documented Perl API to facilitate post-processing. The software runs on any Linux- or UNIX-based platform. The package is freely available for academic users and can be downloaded from http://bioinformatics.psb.ugent.be/
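At the heart of homology detection between segments is collinearity: homologous gene pairs whose positions increase together in both segments. i-ADHoRe's actual procedure (including its profile alignment) is more elaborate; the sketch below only finds the longest strictly collinear chain among given anchor pairs, using the standard O(n log n) longest-increasing-subsequence technique. Anchor positions are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

def longest_collinear_chain(anchors: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Longest chain of homologous gene pairs (i, j) strictly increasing in both i and j.

    `anchors` holds gene positions on segment A (i) and segment B (j).
    """
    # Sort by i ascending and j descending so two pairs sharing i cannot both be chained.
    ordered = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails: List[int] = []     # smallest ending j for a chain of each length
    tail_idx: List[int] = []  # index into `ordered` of that chain's last pair
    parent = [-1] * len(ordered)
    for k, (_, j) in enumerate(ordered):
        pos = bisect_left(tails, j)
        if pos == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[pos] = j
            tail_idx[pos] = k
        parent[k] = tail_idx[pos - 1] if pos > 0 else -1
    # Walk back from the end of the longest chain.
    chain = []
    k = tail_idx[-1] if tail_idx else -1
    while k != -1:
        chain.append(ordered[k])
        k = parent[k]
    return chain[::-1]

# Hypothetical anchors: (position in segment A, position in segment B).
print(longest_collinear_chain([(1, 1), (2, 5), (3, 2), (4, 3), (5, 4), (6, 0)]))
```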
The transformative potential of an integrative approach to pregnancy.
Eidem, Haley R; McGary, Kriston L; Capra, John A; Abbot, Patrick; Rokas, Antonis
2017-09-01
Complex traits typically involve diverse biological pathways and are shaped by numerous genetic and environmental factors. Pregnancy-associated traits and pathologies are further complicated by extensive communication across multiple tissues in two individuals; by interactions between two genomes (maternal and fetal) that obscure causal variants and lead to genetic conflict; and by rapid evolution of pregnancy-associated traits across mammals and in the human lineage. Given the multifaceted complexity of human pregnancy, integrative approaches that synthesize diverse data types and analyses hold tremendous promise for identifying the genetic architecture and environmental influences underlying pregnancy-associated traits and pathologies. We review current research that addresses the extreme complexities of traits and pathologies associated with human pregnancy. We find that successful efforts to address the many complexities of pregnancy-associated traits and pathologies often harness the power of many diverse types of data, including genome-wide association studies, evolutionary analyses, multi-tissue transcriptomic profiles, and environmental conditions. We propose that understanding of pregnancy and its pathologies will be accelerated by computational platforms that provide easy access to integrated data and analyses. By simplifying the integration of diverse data, such platforms will provide a comprehensive synthesis that transcends many of the inherent challenges present in studies of pregnancy. Copyright © 2017 Elsevier Ltd. All rights reserved.
Peña-Llopis, Samuel; Brugarolas, James
2014-01-01
Genomic technologies have revolutionized our understanding of complex Mendelian diseases and cancer. Solid tumors present several challenges for genomic analyses, such as tumor heterogeneity and tumor contamination with surrounding stroma and infiltrating lymphocytes. We developed a protocol to (i) select tissues of high cellular purity on the basis of histological analyses of immediately flanking sections and (ii) simultaneously extract genomic DNA (gDNA), messenger RNA (mRNA), noncoding RNA (ncRNA; enriched in microRNA (miRNA)) and protein from the same tissues. After tissue selection, about 12–16 extractions of DNA/RNA/protein can be obtained per day. Compared with other similar approaches, this fast and reliable methodology allowed us to identify mutations in tumors with remarkable sensitivity and to perform integrative analyses of whole-genome and exome data sets, DNA copy numbers (by single-nucleotide polymorphism (SNP) arrays), gene expression data (by transcriptome profiling and quantitative PCR (qPCR)) and protein levels (by western blotting and immunohistochemical analysis) from the same samples. Although we focused on renal cell carcinoma, this protocol may be adapted with minor changes to any human or animal tissue to obtain high-quality and high-yield nucleic acids and proteins. PMID:24136348
designGG: an R-package and web tool for the optimal design of genetical genomics experiments.
Li, Yang; Swertz, Morris A; Vera, Gonzalo; Fu, Jingyuan; Breitling, Rainer; Jansen, Ritsert C
2009-06-18
High-dimensional biomolecular profiling of genetically different individuals in one or more environmental conditions is an increasingly popular strategy for exploring the functioning of complex biological systems. The optimal design of such genetical genomics experiments in a cost-efficient and effective way is not trivial. This paper presents designGG, an R package for designing optimal genetical genomics experiments. A web implementation for designGG is available at http://gbic.biol.rug.nl/designGG. All software, including source code and documentation, is freely available. DesignGG allows users to intelligently select and allocate individuals to experimental units and conditions such as drug treatment. The user can maximize the power and resolution of detecting genetic, environmental and interaction effects in a genome-wide or local mode by giving more weight to genome regions of special interest, such as previously detected phenotypic quantitative trait loci. This will help to achieve high power and more accurate estimates of the effects of interesting factors, and thus yield a more reliable biological interpretation of data. DesignGG is applicable to linkage analysis of experimental crosses, e.g. recombinant inbred lines, as well as to association analysis of natural populations.
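The idea of allocating individuals to conditions so that genotypes are balanced at markers of interest can be illustrated with a toy criterion. designGG's own optimality criterion and search strategy are not reproduced here; this sketch assumes two conditions of equal size, homozygous recombinant inbred lines with genotypes coded 0/1, a hypothetical weighted imbalance score (larger weights for genome regions of special interest), and exhaustive search, which is only feasible for small numbers of individuals.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def imbalance(group: Sequence[str], genotypes: Dict[str, Sequence[int]],
              weights: Sequence[float]) -> float:
    """Weighted sum over markers of |#allele-0 - #allele-1| within one condition group."""
    total = 0.0
    for m, weight in enumerate(weights):
        n_a = sum(1 for ind in group if genotypes[ind][m] == 0)
        n_b = len(group) - n_a
        total += weight * abs(n_a - n_b)
    return total

def best_two_condition_split(individuals: List[str], genotypes: Dict[str, Sequence[int]],
                             weights: Sequence[float]) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Exhaustively split individuals into two equal groups minimizing total imbalance."""
    half = len(individuals) // 2
    best = None
    for group1 in combinations(individuals, half):
        group2 = tuple(i for i in individuals if i not in group1)
        score = imbalance(group1, genotypes, weights) + imbalance(group2, genotypes, weights)
        if best is None or score < best[0]:
            best = (score, group1, group2)
    return best[1], best[2]

# Hypothetical RIL genotypes at two markers of equal interest.
rils = {"RIL1": [0, 0], "RIL2": [0, 1], "RIL3": [1, 0], "RIL4": [1, 1]}
print(best_two_condition_split(list(rils), rils, [1.0, 1.0]))
```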
Comprehensive genomic profiles of small cell lung cancer
George, Julie; Lim, Jing Shan; Jang, Se Jin; Cun, Yupeng; Ozretić, Luka; Kong, Gu; Leenders, Frauke; Lu, Xin; Fernández-Cuesta, Lynnette; Bosco, Graziella; Müller, Christian; Dahmen, Ilona; Jahchan, Nadine S.; Park, Kwon-Sik; Yang, Dian; Karnezis, Anthony N.; Vaka, Dedeepya; Torres, Angela; Wang, Maia Segura; Korbel, Jan O.; Menon, Roopika; Chun, Sung-Min; Kim, Deokhoon; Wilkerson, Matt; Hayes, Neil; Engelmann, David; Pützer, Brigitte; Bos, Marc; Michels, Sebastian; Vlasic, Ignacija; Seidel, Danila; Pinther, Berit; Schaub, Philipp; Becker, Christian; Altmüller, Janine; Yokota, Jun; Kohno, Takashi; Iwakawa, Reika; Tsuta, Koji; Noguchi, Masayuki; Muley, Thomas; Hoffmann, Hans; Schnabel, Philipp A.; Petersen, Iver; Chen, Yuan; Soltermann, Alex; Tischler, Verena; Choi, Chang-min; Kim, Yong-Hee; Massion, Pierre P.; Zou, Yong; Jovanovic, Dragana; Kontic, Milica; Wright, Gavin M.; Russell, Prudence A.; Solomon, Benjamin; Koch, Ina; Lindner, Michael; Muscarella, Lucia A.; la Torre, Annamaria; Field, John K.; Jakopovic, Marko; Knezevic, Jelena; Castaños-Vélez, Esmeralda; Roz, Luca; Pastorino, Ugo; Brustugun, Odd-Terje; Lund-Iversen, Marius; Thunnissen, Erik; Köhler, Jens; Schuler, Martin; Botling, Johan; Sandelin, Martin; Sanchez-Cespedes, Montserrat; Salvesen, Helga B.; Achter, Viktor; Lang, Ulrich; Bogus, Magdalena; Schneider, Peter M.; Zander, Thomas; Ansén, Sascha; Hallek, Michael; Wolf, Jürgen; Vingron, Martin; Yatabe, Yasushi; Travis, William D.; Nürnberg, Peter; Reinhardt, Christian; Perner, Sven; Heukamp, Lukas; Büttner, Reinhard; Haas, Stefan A.; Brambilla, Elisabeth; Peifer, Martin; Sage, Julien; Thomas, Roman K.
2016-01-01
We have sequenced the genomes of 110 small cell lung cancers (SCLC), one of the deadliest human cancers. In nearly all the tumours analysed we found bi-allelic inactivation of TP53 and RB1, sometimes by complex genomic rearrangements. Two tumours with wild-type RB1 had evidence of chromothripsis leading to overexpression of cyclin D1 (encoded by the CCND1 gene), revealing an alternative mechanism of Rb1 deregulation. Thus, loss of the tumour suppressors TP53 and RB1 is obligatory in SCLC. We discovered somatic genomic rearrangements of TP73 that create an oncogenic version of this gene, TP73Δex2/3. In rare cases, SCLC tumours exhibited kinase gene mutations, providing a possible therapeutic opportunity for individual patients. Finally, we observed inactivating mutations in NOTCH family genes in 25% of human SCLC. Accordingly, activation of Notch signalling in a pre-clinical SCLC mouse model strikingly reduced the number of tumours and extended the survival of the mutant mice. Furthermore, neuroendocrine gene expression was abrogated by Notch activity in SCLC cells. This first comprehensive study of somatic genome alterations in SCLC uncovers several key biological processes and identifies candidate therapeutic targets in this highly lethal form of cancer. PMID:26168399
Bajaj, Deepak; Das, Shouvik; Upadhyaya, Hari D.; Ranjan, Rajeev; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. Laxmipathi; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.
2015-01-01
The study identified 9045 high-quality SNPs by employing both genome-wide GBS- and candidate gene-based SNP genotyping assays in 172 accessions, including 93 cultivated (desi and kabuli) and 79 wild chickpea accessions. GWAS in a structured population of 93 sequenced accessions detected 15 major genomic loci exhibiting significant association with seed coat color. Five seed color-associated major genomic loci underlying robust QTLs mapped on a high-density intra-specific genetic linkage map were validated by QTL mapping. Integrating association and QTL mapping with gene haplotype-specific LD mapping and transcript profiling identified novel allelic variants (non-synonymous SNPs) and haplotypes in a MATE secondary transporter gene regulating light/yellow brown (LB/YB) versus beige (BE) seed coat color differentiation in chickpea. Down-regulation and decreased transcript expression of the beige seed coat color-associated MATE gene haplotype correlated with reduced accumulation of proanthocyanidins, which contribute to coloration/pigmentation, in the mature seed coats of beige accessions compared with light/yellow brown seed-colored desi and kabuli accessions. This seed color-regulating MATE gene showed strong purifying selection pressure, primarily in LB/YB seed-colored desi and wild Cicer reticulatum accessions, compared with the BE seed-colored kabuli accessions. The functionally relevant molecular tags identified have the potential to decipher the complex transcriptional regulatory function of genes controlling seed coat coloration and to help explain the selective sweep-based evolutionary pattern of the seed color trait in cultivated and wild accessions during chickpea domestication. The genome-wide integrated approach employed will expedite marker-assisted genetic enhancement for developing cultivars with desirable seed coat color types in chickpea. PMID:26635822
g:Profiler-a web server for functional interpretation of gene lists (2016 update).
Reimand, Jüri; Arak, Tambet; Adler, Priit; Kolberg, Liis; Reisberg, Sulev; Peterson, Hedi; Vilo, Jaak
2016-07-08
Functional enrichment analysis is a key step in interpreting gene lists discovered in diverse high-throughput experiments. g:Profiler studies flat and ranked gene lists and finds statistically significant Gene Ontology terms, pathways and other gene function related terms. Translation of hundreds of gene identifiers is another core feature of g:Profiler. Since its first publication in 2007, our web server has become a popular tool of choice among basic and translational researchers. Timeliness is a major advantage of g:Profiler as genome and pathway information is synchronized with the Ensembl database in quarterly updates. g:Profiler supports 213 species including mammals and other vertebrates, plants, insects and fungi. The 2016 update of g:Profiler introduces several novel features. We have added further functional datasets to interpret gene lists, including transcription factor binding site predictions, Mendelian disease annotations, information about protein expression and complexes and gene mappings of human genetic polymorphisms. Besides the interactive web interface, g:Profiler can be accessed in computational pipelines using our R package, Python interface and BioJS component. g:Profiler is freely available at http://biit.cs.ut.ee/gprofiler/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
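Enrichment tools of this kind test each functional term for over-representation in the query list, typically with a one-sided hypergeometric test. A minimal sketch of that p-value using only the standard library; the function name is hypothetical, and g:Profiler additionally corrects for multiple testing across terms (by default with its own g:SCS method), which is not shown.

```python
from math import comb

def enrichment_p_value(n_genome: int, n_term: int, n_query: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(X >= n_overlap).

    n_genome: annotated genes in the background; n_term: genes annotated to the term;
    n_query: genes in the query list; n_overlap: query genes annotated to the term.
    """
    denom = comb(n_genome, n_query)
    upper = min(n_term, n_query)
    return sum(comb(n_term, k) * comb(n_genome - n_term, n_query - k)
               for k in range(n_overlap, upper + 1)) / denom

# Toy example: 10 background genes, 4 annotated to the term, 3 queried, 2 overlapping.
print(enrichment_p_value(10, 4, 3, 2))
```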
How genome complexity can explain the difficulty of aligning reads to genomes.
Phan, Vinhthuy; Gao, Shanshan; Tran, Quang; Vo, Nam S
2015-01-01
Although it is frequently observed that aligning short reads to genomes becomes harder if the genomes contain complex repeat patterns, there has not been much effort to quantify the relationship between genome complexity and the difficulty of short-read alignment. Existing measures of sequence complexity seem unsuitable for understanding and quantifying this relationship. We investigated several measures of complexity and found that length-sensitive measures had the highest correlation with alignment accuracy. In particular, the rate of distinct substrings of length k, where k is similar to the read length, correlated very highly with alignment performance in terms of precision and recall. We showed how to compute this measure efficiently in linear time, making it practical for quickly estimating the difficulty of alignment for new genomes without having to align reads to them first. We showed how the length-sensitive measures could provide additional information for choosing aligners that would align consistently accurately on new genomes. We formally established a connection between genome complexity and the accuracy of short-read aligners. The relationship between genome complexity and alignment accuracy provides additional useful information for selecting suitable aligners for new genomes. Further, this work suggests that the complexity of genomes should sometimes be thought of in terms of specific computational problems, such as the alignment of short reads to genomes.
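The distinct-substring rate is straightforward to compute. This sketch uses a hash set of all length-k windows, which is simpler than, but not as efficient as, the linear-time method the authors describe (slicing makes it roughly O(nk)). Values near 1 indicate few repeats at that length; lower values indicate more windows that a read of length about k could match ambiguously.

```python
def distinct_kmer_rate(genome: str, k: int) -> float:
    """Fraction of the length-k windows of `genome` that are distinct substrings."""
    total = len(genome) - k + 1
    if total <= 0:
        raise ValueError("k must not exceed the genome length")
    return len({genome[i:i + k] for i in range(total)}) / total

print(distinct_kmer_rate("ACGTACGT", 4))  # 5 windows, 4 distinct
```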
Accurate read-based metagenome characterization using a hierarchical suite of unique signatures
Freitas, Tracey Allen K.; Li, Po-E; Scholz, Matthew B.; Chain, Patrick S. G.
2015-01-01
A major challenge in the field of shotgun metagenomics is the accurate identification of organisms present within a microbial community, based on classification of short sequence reads. Although existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here, we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly and consistently smaller FDR than any other available method. Our algorithm, Genomic Origins Through Taxonomic CHAllenge (GOTTCHA), circumvents false positives by using a series of non-redundant signature databases. GOTTCHA was tested and validated on 20 synthetic and mock datasets ranging in community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools. PMID:25765641
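The core idea of signature-based profiling (keep only sequence fragments unique to one reference, then count reads that hit them) can be sketched with k-mers. This is a simplification: GOTTCHA builds hierarchical, non-redundant signature databases across taxonomic levels, whereas the sketch works at a single level with hypothetical toy genomes and drops any read that matches no signature or the signatures of more than one taxon.

```python
from collections import Counter
from typing import Dict, Iterable, Set

def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def unique_signatures(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """k-mers found in exactly one reference genome, grouped by that genome."""
    per_ref = {name: kmers(seq, k) for name, seq in references.items()}
    occurrences = Counter(km for kms in per_ref.values() for km in kms)
    return {name: {km for km in kms if occurrences[km] == 1} for name, kms in per_ref.items()}

def profile_reads(reads: Iterable[str], signatures: Dict[str, Set[str]], k: int) -> Counter:
    """Count reads hitting the signatures of exactly one taxon; other reads are dropped."""
    hits: Counter = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        matched = [name for name, sig in signatures.items() if read_kmers & sig]
        if len(matched) == 1:
            hits[matched[0]] += 1
    return hits

refs = {"taxonA": "AACCGGTT", "taxonB": "AACCTTGG"}
sigs = unique_signatures(refs, k=4)
print(profile_reads(["ACCGG", "AACCT", "AACC", "GGGG"], sigs, k=4))
```

The shared k-mer `AACC` is excluded from both signature sets, so a read consisting only of shared sequence is not assigned, which is the mechanism that keeps false positives down.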
Interplay of heritage and habitat in the distribution of bacterial signal transduction systems.
Galperin, Michael Y; Higdon, Roger; Kolker, Eugene
2010-04-01
Comparative analysis of the complete genome sequences from a variety of poorly studied organisms aims to predict ecological and behavioral properties of these organisms and to help characterize their habitats. This task requires finding appropriate descriptors that could be correlated with the core traits of each system and would allow meaningful comparisons. Using relatively simple bacterial models, first attempts have been made to introduce suitable metrics to describe the complexity of an organism's signaling machinery, including the "bacterial IQ" score. Here, we use an updated census of prokaryotic signal transduction systems to improve this parameter and evaluate its consistency within selected bacterial phyla. We also introduce a more elaborate descriptor, a set of profiles of relative abundance of members of each family of signal transduction proteins encoded in each genome. We show that these family profiles are well conserved within each genus and are often consistent within families of bacteria. Thus, they reflect evolutionary relationships between organisms as well as individual adaptations of each organism to its specific ecological niche.
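A relative-abundance family profile can be computed per genome and then compared between genomes. The sketch below assumes each genome has already been reduced to counts of signaling proteins per family; the family labels and counts are hypothetical, and the L1 distance is an illustrative choice rather than the comparison used in the paper.

```python
def family_profile(family_counts: dict[str, int]) -> dict[str, float]:
    """Relative abundance of each signal transduction family in one genome."""
    total = sum(family_counts.values())
    if total == 0:
        raise ValueError("genome has no classified signaling proteins")
    return {family: n / total for family, n in family_counts.items()}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """L1 distance over the union of families (a missing family counts as 0)."""
    families = set(p) | set(q)
    return sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in families)


# Hypothetical family counts for two genomes.
genome_1 = family_profile({"histidine_kinase": 10, "response_regulator": 10})
genome_2 = family_profile({"histidine_kinase": 6, "response_regulator": 6, "chemoreceptor": 12})
print(genome_1)                              # {'histidine_kinase': 0.5, 'response_regulator': 0.5}
print(profile_distance(genome_1, genome_2))  # 0.25 + 0.25 + 0.5 -> 1.0
```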
Yasui, Yasuo; Hirakawa, Hideki; Oikawa, Tetsuo; Toyoshima, Masami; Matsuzaki, Chiaki; Ueno, Mariko; Mizuno, Nobuyuki; Nagatoshi, Yukari; Imamura, Tomohiro; Miyago, Manami; Tanaka, Kojiro; Mise, Kazuyuki; Tanaka, Tsutomu; Mizukoshi, Hiroharu; Mori, Masashi; Fujita, Yasunari
2016-01-01
Chenopodium quinoa Willd. (quinoa) originated in the Andean region of South America and is a pseudocereal crop of the Amaranthaceae family. Quinoa is emerging as an important crop with the potential to contribute to food security worldwide and is considered an optimal food source for astronauts, owing to its outstanding nutritional profile and ability to tolerate stressful environments. Furthermore, plant pathologists use quinoa as a representative diagnostic host to identify virus species. However, molecular analysis of quinoa is limited by its genetic heterogeneity due to outcrossing and by its genome complexity derived from allotetraploidy. To overcome these obstacles, we established the inbred standard quinoa accession Kd, which enables rigorous molecular analysis, and present the draft genome sequence of Kd, obtained using an optimized combination of high-throughput sequencing on the Illumina HiSeq 2500 and PacBio RS II sequencers. The de novo genome assembly contained about 25,000 scaffolds totaling 1 Gbp, with an N50 length of 86 kbp. Based on these data, we constructed the free-access Quinoa Genome DataBase (QGDB). These findings provide insights into the mechanisms underlying agronomically important traits of quinoa and the effect of allotetraploidy on genome evolution. PMID:27458999
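N50, as reported for the Kd assembly, is the largest length L such that scaffolds of length L or longer together contain at least half of the total assembled bases. A minimal sketch of the standard computation follows; the input lengths are toy values, not data from the quinoa assembly.

```python
def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer comparison avoids rounding total / 2
            return length
    raise AssertionError("unreachable for non-empty, non-negative input")


# Total = 20; 6 covers 6 bases (< 10), 6 + 5 covers 11 (>= 10), so N50 = 5.
print(n50([2, 3, 4, 5, 6]))  # -> 5
```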
Grigoryev, Yevgeniy A.; Kurian, Sunil M.; Avnur, Zafi; Borie, Dominic; Deng, Jun; Campbell, Daniel; Sung, Joanna; Nikolcheva, Tania; Quinn, Anthony; Schulman, Howard; Peng, Stanford L.; Schaffer, Randolph; Fisher, Jonathan; Mondala, Tony; Head, Steven; Flechner, Stuart M.; Kantor, Aaron B.; Marsh, Christopher; Salomon, Daniel R.
2010-01-01
A major challenge for the field of transplantation is the lack of understanding of genomic and molecular drivers of early post-transplant immunity. The early immune response creates a complex milieu that determines the course of ensuing immune events and the ultimate outcome of the transplant. The objective of the current study was to mechanistically deconvolute the early immune response by purifying and profiling the constituent cell subsets of the peripheral blood. We employed genome-wide profiling of whole blood and of purified CD4 and CD8 T cells, B cells and monocytes, in tandem with high-throughput laser-scanning cytometry, in 10 kidney transplant recipients sampled serially before transplantation and at 1, 2, 4, 8 and 12 weeks post-transplant. Cytometry confirmed early cell subset depletion by antibody induction and immunosuppression. Multiple markers revealed the activation and proliferative expansion of CD45RO+CD62L− effector memory CD4/CD8 T cells as well as progressive activation of monocytes and B cells. Next, we mechanistically deconvoluted early post-transplant immunity by serial monitoring of whole blood using DNA microarrays. Parallel analysis of cell subset-specific gene expression revealed a unique spectrum of time-dependent changes and functional pathways. Gene expression profiling results were validated with 157 different probesets matching all 65 antigens detected by cytometry. Thus, serial blood cell monitoring reflects the profound changes in blood cell composition and immune activation early post-transplant. Each cell subset reveals distinct pathways and functional programs. These changes illuminate a complex, early phase of immunity and inflammation that includes activation and proliferative expansion of the memory effector and regulatory cells that may determine the phenotype and outcome of the kidney transplant. PMID:20976225
Grigoryev, Yevgeniy A; Kurian, Sunil M; Avnur, Zafi; Borie, Dominic; Deng, Jun; Campbell, Daniel; Sung, Joanna; Nikolcheva, Tania; Quinn, Anthony; Schulman, Howard; Peng, Stanford L; Schaffer, Randolph; Fisher, Jonathan; Mondala, Tony; Head, Steven; Flechner, Stuart M; Kantor, Aaron B; Marsh, Christopher; Salomon, Daniel R
2010-10-14
A major challenge for the field of transplantation is the lack of understanding of genomic and molecular drivers of early post-transplant immunity. The early immune response creates a complex milieu that determines the course of ensuing immune events and the ultimate outcome of the transplant. The objective of the current study was to mechanistically deconvolute the early immune response by purifying and profiling the constituent cell subsets of the peripheral blood. We employed genome-wide profiling of whole blood and of purified CD4 and CD8 T cells, B cells and monocytes, in tandem with high-throughput laser-scanning cytometry, in 10 kidney transplant recipients sampled serially before transplantation and at 1, 2, 4, 8 and 12 weeks post-transplant. Cytometry confirmed early cell subset depletion by antibody induction and immunosuppression. Multiple markers revealed the activation and proliferative expansion of CD45RO+CD62L− effector memory CD4/CD8 T cells as well as progressive activation of monocytes and B cells. Next, we mechanistically deconvoluted early post-transplant immunity by serial monitoring of whole blood using DNA microarrays. Parallel analysis of cell subset-specific gene expression revealed a unique spectrum of time-dependent changes and functional pathways. Gene expression profiling results were validated with 157 different probesets matching all 65 antigens detected by cytometry. Thus, serial blood cell monitoring reflects the profound changes in blood cell composition and immune activation early post-transplant. Each cell subset reveals distinct pathways and functional programs. These changes illuminate a complex, early phase of immunity and inflammation that includes activation and proliferative expansion of the memory effector and regulatory cells that may determine the phenotype and outcome of the kidney transplant.
Kabani, Sarah; Fenn, Katelyn; Ross, Alan; Ivens, Al; Smith, Terry K; Ghazal, Peter; Matthews, Keith
2009-01-01
Background: Trypanosomes undergo extensive developmental changes during their complex life cycle. Crucial among these is the transition between slender and stumpy bloodstream forms and, thereafter, the differentiation from stumpy to tsetse-midgut procyclic forms. These developmental events are highly regulated, temporally reproducible and accompanied by expression changes mediated almost exclusively at the post-transcriptional level. Results: In this study we have examined, by whole-genome microarray analysis, the mRNA abundance of genes in slender and stumpy forms of T. brucei AnTat1.1 cells, and also during their synchronous differentiation to procyclic forms. In total, five biological replicates, representing the differentiation of matched parasite populations derived from five individual mouse infections, were assayed, with RNA harvested at key biological time points during their synchronous differentiation to procyclic forms. Importantly, the biological context of these mRNA profiles was established by assaying the coincident cellular events in each population (surface antigen exchange, morphological restructuring, cell cycle re-entry), thereby linking the observed gene expression changes to the well-established framework of trypanosome differentiation. Conclusion: Using stringent statistical analysis and validation of the derived profiles against experimentally predicted gene expression and phenotypic changes, we have established the profile of regulated gene expression during these important life-cycle transitions. The highly synchronous nature of differentiation between stumpy and procyclic forms also means that these mRNA profiles directly reflect the changes in mRNA abundance within individual cells during this well-characterised developmental transition. PMID:19747379
Evaluating cell lines as tumour models by comparison of genomic profiles
Domcke, Silvia; Sinha, Rileen; Levine, Douglas A.; Sander, Chris; Schultz, Nikolaus
2013-01-01
Cancer cell lines are frequently used as in vitro tumour models. Recent molecular profiles of hundreds of cell lines from the Cancer Cell Line Encyclopedia and thousands of tumour samples from The Cancer Genome Atlas now allow a systematic genomic comparison of cell lines and tumours. Here we analyse a panel of 47 ovarian cancer cell lines and identify those that have the highest genetic similarity to ovarian tumours. Our comparison of copy-number changes, mutations and mRNA expression profiles reveals pronounced differences in molecular profiles between commonly used ovarian cancer cell lines and high-grade serous ovarian cancer tumour samples. We identify several rarely used cell lines that more closely resemble cognate tumour profiles than commonly used cell lines, and we propose these lines as the most suitable models of ovarian cancer. Our results indicate that the gap between cell lines and tumours can be bridged by genomically informed choices of cell line models for all tumour types. PMID:23839242
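One simple way to rank cell lines by genomic similarity to a tumour type is to compare sets of altered genes. The sketch below uses a Jaccard index between each cell line's altered genes and a set of characteristic tumour alterations; the gene sets and cell line names are hypothetical, and this is not the scoring scheme used in the study, which combined copy-number, mutation and expression data.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_cell_lines(cell_lines: dict[str, set[str]],
                    tumour_alterations: set[str]) -> list[tuple[str, float]]:
    """Rank cell lines, best first, by overlap of altered genes with the tumour set."""
    scores = {name: jaccard(genes, tumour_alterations) for name, genes in cell_lines.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative gene sets only.
tumour_set = {"TP53", "MYC", "CCNE1"}
lines = {"line_A": {"TP53", "MYC"}, "line_B": {"KRAS", "BRAF"}}
print(rank_cell_lines(lines, tumour_set))  # line_A (2 of 3 genes shared) ranks above line_B
```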
Translating genomic discoveries to the clinic in pediatric oncology.
Glade Bender, Julia; Verma, Anupam; Schiffman, Joshua D
2015-02-01
This article describes recent advances in the identification of targetable genomic alterations in pediatric cancers, along with the progress and associated challenges in translating these findings into therapeutic benefit. Each field within pediatric cancer has rapidly and comprehensively begun to define genomic targets in tumors that could improve the clinical outcome of patients, including hematologic malignancies (leukemia and lymphoma), solid malignancies (neuroblastoma, rhabdomyosarcoma, Ewing sarcoma, and osteosarcoma), and brain tumors (gliomas, ependymomas, and medulloblastomas). Although each tumor has specific and sometimes overlapping genomic targets, the clinical translation of these targets into new targeted trials and precision medicine protocols is still in its infancy. The first clinical tumor profiling studies in pediatric oncology have demonstrated feasibility and patient enthusiasm for the personalized medicine paradigm, but have yet to demonstrate clinical utility. Complexities influencing implementation include rapidly evolving sequencing technologies, tumor heterogeneity, and lack of access to targeted therapies. The return of incidental findings from the germline also remains a challenge, with evolving policy statements and accepted standards. The translation of genomic discoveries to the clinic in pediatric oncology continues to move forward at a brisk pace. Early adoption of genomics for tumor classification, risk stratification, and initial trials of targeted therapeutic agents has led to powerful results. As our experience grows in the integration of genomic and clinical medicine, the outcome for children with cancer should continue to improve.
Hug, Laura A.; Thomas, Brian C.; Sharon, Itai; ...
2015-07-22
Nitrogen, sulfur and carbon fluxes in the terrestrial subsurface are determined by the intersecting activities of microbial community members, yet the organisms responsible are largely unknown. Metagenomic methods can identify organisms and functions, but genome recovery is often precluded by data complexity. To address this limitation, we developed subsampling assembly methods to reconstruct high-quality draft genomes from complex samples. Here, we applied these methods to evaluate the interlinked roles of the most abundant organisms in biogeochemical cycling in the aquifer sediment. Community proteomics confirmed these activities. The eight most abundant organisms belong to novel lineages, and two represent phyla with no previously sequenced genome. Four organisms are predicted to fix carbon via the Calvin-Benson-Bassham, Wood-Ljungdahl or 3-hydroxypropionate/4-hydroxybutyrate pathways. The profiled organisms are involved in the network of denitrification, dissimilatory nitrate reduction to ammonia, ammonia oxidation and sulfate reduction/oxidation, and require substrates supplied by other community members. An ammonium-oxidizing Thaumarchaeote is the most abundant community member, despite low ammonium concentrations in the groundwater. This organism likely benefits from two other relatively abundant organisms capable of producing ammonium from nitrate, which is abundant in the groundwater. Overall, dominant members of the microbial community are interconnected through exchange of geochemical resources.
The Cancer Genome Atlas Pan-Cancer analysis project.
Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M
2013-10-01
The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.
Challenges and Opportunities in Genome-Wide Environmental Interaction (GWEI) studies
Aschard, Hugues; Lutz, Sharon; Maus, Bärbel; Duell, Eric J.; Fingerlin, Tasha; Chatterjee, Nilanjan; Kraft, Peter; Van Steen, Kristel
2012-01-01
Interest in gene-environment interaction studies has grown substantially with advances in molecular genetics techniques. In practice, it has become possible to investigate the role of environmental factors in disease risk and hence their role as genetic effect modifiers. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors for disease. Several rationales exist for setting up gene-environment interaction studies, and the technical challenges related to these studies, when the number of environmental or genetic risk factors is relatively small, have been described before. In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment, which brings a whole range of new challenges and opportunities. Despite continuing efforts to develop efficient methods and optimal bioinformatics infrastructures to deal with the available wealth of data, the challenge remains how best to present and analyze Genome-Wide Environmental Interaction (GWEI) studies involving multiple genetic and environmental factors. Because GWEI studies are performed at the intersection of statistical genetics, bioinformatics and epidemiology, they usually face problems similar to those of Genome-Wide Association gene-gene Interaction (GWAI) studies. However, additional complexities need to be considered that are typical of large-scale epidemiological studies, but that also arise from “joining” two heterogeneous types of data to explain complex disease trait variation or for prediction purposes. PMID:22760307
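For a single SNP and a single binary exposure, a common way to quantify multiplicative gene-environment interaction in a case-control design is the ratio OR_GE / (OR_G × OR_E), where each odds ratio is taken relative to unexposed non-carriers; a GWEI scan repeats such a test across many markers. The sketch below is a textbook illustration with hypothetical counts, not a method from this review, and it omits significance testing and covariate adjustment.

```python
def odds_ratio_vs_reference(cases, controls, g, e):
    """Odds ratio of stratum (g, e) relative to the unexposed non-carrier stratum (0, 0)."""
    odds = cases[(g, e)] / controls[(g, e)]
    reference_odds = cases[(0, 0)] / controls[(0, 0)]
    return odds / reference_odds


def interaction_odds_ratio(cases, controls):
    """OR_GE / (OR_G * OR_E); 1.0 means no departure from a multiplicative model."""
    or_g = odds_ratio_vs_reference(cases, controls, 1, 0)
    or_e = odds_ratio_vs_reference(cases, controls, 0, 1)
    or_ge = odds_ratio_vs_reference(cases, controls, 1, 1)
    return or_ge / (or_g * or_e)


# Hypothetical counts keyed by (carrier, exposed).
cases = {(0, 0): 10, (1, 0): 20, (0, 1): 20, (1, 1): 80}
controls = {(0, 0): 100, (1, 0): 100, (0, 1): 100, (1, 1): 100}
print(interaction_odds_ratio(cases, controls))  # OR_G = OR_E = 2, OR_GE = 8 -> 2.0
```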
de Groot, Reinoud; Lüthi, Joel; Lindsay, Helen; Holtackers, René; Pelkmans, Lucas
2018-01-23
High-content imaging using automated microscopy and computer vision allows multivariate profiling of single-cell phenotypes. Here, we present methods for the application of the CRISPR-Cas9 system in large-scale, image-based, gene perturbation experiments. We show that CRISPR-Cas9-mediated gene perturbation can be achieved in human tissue culture cells in a timeframe that is compatible with image-based phenotyping. We developed a pipeline to construct a large-scale arrayed library of 2,281 sequence-verified CRISPR-Cas9 targeting plasmids and profiled this library for genes affecting cellular morphology and the subcellular localization of components of the nuclear pore complex (NPC). We conceived a machine-learning method that harnesses genetic heterogeneity to score gene perturbations and identify phenotypically perturbed cells for in-depth characterization of gene perturbation effects. This approach enables genome-scale image-based multivariate gene perturbation profiling using CRISPR-Cas9. © 2018 The Authors. Published under the terms of the CC BY 4.0 license.
Bioinformatics/biostatistics: microarray analysis.
Eichler, Gabriel S
2012-01-01
The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several classic bioinformatic approaches. The highlighted approaches can be applied to nearly any kind of high-dimensional genomic, proteomic, or metabolomic experiment. Technologies reviewed in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner, Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).
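Over-representation tools in the GoMiner style ask whether a category (for example a GO term) appears more often among changed genes than expected by chance, commonly using a one-sided hypergeometric (Fisher's exact) test. A minimal sketch follows, with toy gene counts and no multiple-testing correction; the function name and parameters are illustrative, not GoMiner's interface.

```python
from math import comb


def enrichment_p_value(overlap: int, total_genes: int,
                       category_genes: int, changed_genes: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    total_genes: genes measured (N); category_genes: genes annotated to the
    category (K); changed_genes: genes in the changed list (n).
    """
    upper = min(category_genes, changed_genes)
    hits = sum(
        comb(category_genes, i) * comb(total_genes - category_genes, changed_genes - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(total_genes, changed_genes)


# 10 genes measured, 5 in the category, 5 changed, all 5 changed genes in the category.
print(enrichment_p_value(5, 10, 5, 5))  # 1 / 252, about 0.004
```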
Emerging Applications of Metabolomic and Genomic Profiling in Diabetic Clinical Medicine
McKillop, Aine M.; Flatt, Peter R.
2011-01-01
Clinical and epidemiological metabolomics provides a unique opportunity to look at genotype-phenotype relationships as well as the body's responses to environmental and lifestyle factors. Fundamentally, it provides information on the universal outcome of influencing factors on disease states and has great potential in the early diagnosis, therapy monitoring, and understanding of the pathogenesis of disease. Diseases such as diabetes, which involve a complex set of interactions between genetic and environmental factors, produce changes in the body's biochemical profile, thereby providing potential markers for diagnosis and initiation of therapies. There is clearly a need to discover new ways to aid diagnosis and assessment of glycemic status to help reduce diabetes complications and improve quality of life. Many factors, including peptides, proteins, metabolites, nucleic acids, and polymorphisms, have been proposed as putative biomarkers for diabetes. Metabolomics is an approach used to identify and assess metabolic characteristics, changes, and phenotypes in response to influencing factors, such as environment, diet, lifestyle, and pathophysiological states. Identifying disease biomarkers with high specificity and sensitivity using metabolomics has become increasingly feasible because of advances in analytical and information technologies. Likewise, the emergence of high-throughput genotyping technologies and genome-wide association studies has prompted the search for genetic markers of diabetes predisposition or susceptibility. In this review, we consider the application of key metabolomic and genomic methodologies in diabetes and summarize the established, new, and emerging metabolomic and genomic biomarkers for the disease. We conclude by summarizing future insights into the search for improved biomarkers for diabetes research and human diagnostics. PMID:22110171
Eke, Iris; Makinde, Adeola Y; Aryankalayil, Molykutty J; Ahmed, Mansoor M; Coleman, C Norman
2016-11-01
New technologies enabling the analysis of various molecules, including DNA, RNA, proteins and small metabolites, can aid in understanding the complex molecular processes in cancer cells. In particular, for the use of novel targeted therapeutics, elucidation of the mechanisms leading to cell death or survival is crucial to eliminate tumor resistance and optimize therapeutic efficacy. While some techniques, such as genomic analysis for identifying specific gene mutations or epigenetic testing of promoter methylation, are already in clinical use, other "omics-based" assays are still evolving. Here, we provide an overview of the current status of molecular profiling methods, including promising research strategies, as well as possible challenges, and their emerging role in radiation oncology. Published by Elsevier Ireland Ltd.
OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software.
Kriventseva, Evgenia V; Tegenfeldt, Fredrik; Petty, Tom J; Waterhouse, Robert M; Simão, Felipe A; Pozdnyakov, Igor A; Ioannidis, Panagiotis; Zdobnov, Evgeny M
2015-01-01
Orthology, refining the concept of homology, is the cornerstone of evolutionary comparative studies. With the ever-increasing availability of genomic data, inference of orthology has become instrumental for generating hypotheses about gene functions crucial to many studies. This update of the OrthoDB hierarchical catalog of orthologs (http://www.orthodb.org) covers 3027 complete genomes, including the most comprehensive set of 87 arthropods, 61 vertebrates, 227 fungi and 2627 bacteria (sampling the most complete and representative genomes from over 11,000 available). In addition to the most extensive integration of functional annotations from UniProt, InterPro, GO, OMIM, model organism phenotypes and COG functional categories, OrthoDB uniquely provides evolutionary annotations including rates of ortholog sequence divergence, copy-number profiles, sibling groups and gene architectures. We re-designed the entirety of the OrthoDB website from the underlying technology to the user interface, enabling the user to specify species of interest and to select the relevant orthology level by the NCBI taxonomy. The text searches allow use of complex logic with various identifiers of genes, proteins, domains, ontologies or annotation keywords and phrases. Gene copy-number profiles can also be queried. This release comes with the freely available underlying ortholog clustering pipeline (http://www.orthodb.org/software). © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Arpeggio: harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures
Stanton, Kelly Patrick; Parisi, Fabio; Strino, Francesco; Rabin, Neta; Asp, Patrik; Kluger, Yuval
2013-01-01
Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein–chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/. PMID:23873955
Arpeggio: harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures.
Stanton, Kelly Patrick; Parisi, Fabio; Strino, Francesco; Rabin, Neta; Asp, Patrik; Kluger, Yuval
2013-09-01
Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein-chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/.
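The abstract says Arpeggio profiles come from harmonic deconvolution of the autocorrelation of ChIP-seq signals. The sketch below shows only the first ingredients, under assumptions: a binned coverage signal, its normalized autocorrelation, and a plain discrete Fourier power spectrum of the mean-centered signal (by the Wiener-Khinchin theorem, the power spectrum is the Fourier transform of the autocorrelation). It does not reproduce Arpeggio's deconvolution, the toy signal is hypothetical, and the naive O(n²) DFT is for illustration only.

```python
import cmath


def autocorrelation(signal: list[float], max_lag: int) -> list[float]:
    """Normalized autocorrelation of a mean-centered signal for lags 0..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance_sum = sum(c * c for c in centered)
    if variance_sum == 0:
        raise ValueError("constant signal has no autocorrelation structure")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance_sum
        for lag in range(max_lag + 1)
    ]


def power_spectrum(signal: list[float]) -> list[float]:
    """Squared DFT magnitude of the mean-centered signal, frequency indices 0..n//2."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(centered))) ** 2
        for f in range(n // 2 + 1)
    ]


# Toy binned coverage with a period of 4 bins (e.g. regularly spaced binding).
coverage = [1.0, 2.0, 1.0, 0.0] * 8
acf = autocorrelation(coverage, max_lag=4)
spectrum = power_spectrum(coverage)
print(acf)  # high at lags 0 and 4, strongly negative at lag 2
print(max(range(len(spectrum)), key=spectrum.__getitem__))  # dominant frequency index 32 / 4 = 8
```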
Advances and Challenges in Genomic Selection for Disease Resistance.
Poland, Jesse; Rutkoski, Jessica
2016-08-04
Breeding for disease resistance is a central focus of plant breeding programs, as any successful variety must have the complete package of high yield, disease resistance, agronomic performance, and end-use quality. With the need to accelerate the development of improved varieties, genomics-assisted breeding is becoming an important tool in breeding programs. With marker-assisted selection, there has been success in breeding for disease resistance; however, much of this work and research has focused on identifying, mapping, and selecting for major resistance genes that tend to be highly effective but vulnerable to breakdown with rapid changes in pathogen races. In contrast, breeding for minor-gene quantitative resistance tends to produce more durable varieties but is a more challenging breeding objective. As the genetic architecture of resistance shifts from single major R genes to a diffuse architecture of many minor genes, the best approach for molecular breeding will shift from marker-assisted selection to genomic selection. Genomics-assisted breeding for quantitative resistance will therefore necessitate whole-genome prediction models and selection methodology as implemented for classical complex traits such as yield. Here, we examine multiple case studies testing whole-genome prediction models and genomic selection for disease resistance. In general, whole-genome models for disease resistance can produce prediction accuracy suitable for application in breeding. These models also largely outperform multiple linear regression as would be applied in marker-assisted selection. With the implementation of genomic selection for yield and other agronomic traits, whole-genome marker profiles will be available for the entire set of breeding lines, enabling genomic selection for disease at no additional direct cost. In this context, implementing genomic selection for disease resistance, specifically for quantitative resistance and quarantined pathogens, becomes a tractable and powerful approach in breeding programs.
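Whole-genome prediction models estimate small effects for all markers jointly instead of selecting a few major genes. Ridge regression (the idea behind RR-BLUP) is one widely used example. The sketch below fits marker effects with a fixed penalty on hypothetical marker codes and phenotypes; it is illustrative only, since the case studies in this review compare several models and a real analysis would tune the penalty.

```python
def ridge_marker_effects(X: list[list[float]], y: list[float], lam: float):
    """Fit y ~ mu + sum_j beta_j * (x_j - mean_j) with an L2 penalty lam on beta.

    Solves (Xc^T Xc + lam * I) beta = Xc^T (y - mu) by Gauss-Jordan elimination.
    Returns (mu, column_means, beta).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n, p = len(X), len(X[0])
    mu = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - mu for v in y]
    # Augmented normal-equation matrix [Xc^T Xc + lam*I | Xc^T yc].
    A = [
        [sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0) for b in range(p)]
        + [sum(Xc[i][a] * yc[i] for i in range(n))]
        for a in range(p)
    ]
    for col in range(p):
        # Partial pivoting; the matrix is positive definite, so the pivot is nonzero.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    beta = [A[r][p] for r in range(p)]
    return mu, col_means, beta


def predict(model, x: list[float]) -> float:
    """Predicted genetic value for one line's marker codes."""
    mu, col_means, beta = model
    return mu + sum(b * (xj - m) for b, xj, m in zip(beta, x, col_means))


markers = [[1, 0], [0, 1], [1, 1], [0, 0]]  # hypothetical 0/1 marker codes
phenotypes = [2.0, 1.0, 3.0, 0.0]           # hypothetical disease scores
model = ridge_marker_effects(markers, phenotypes, lam=1e-6)
print(predict(model, [1, 1]))  # close to 3.0 with a tiny penalty
```

A larger `lam` shrinks all marker effects toward zero, which is what lets such models handle many more markers than lines.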
Prediction of human population responses to toxic compounds by a collaborative competition.
Eduati, Federica; Mangravite, Lara M; Wang, Tao; Tang, Hao; Bare, J Christopher; Huang, Ruili; Norman, Thea; Kellen, Mike; Menden, Michael P; Yang, Jichen; Zhan, Xiaowei; Zhong, Rui; Xiao, Guanghua; Xia, Menghang; Abdo, Nour; Kosyk, Oksana; Friend, Stephen; Dearry, Allen; Simeonov, Anton; Tice, Raymond R; Rusyn, Ivan; Wright, Fred A; Stolovitzky, Gustavo; Xie, Yang; Saez-Rodriguez, Julio
2015-09-01
The ability to computationally predict the effects of toxic compounds on humans could help address the deficiencies of current chemical safety testing. Here, we report the results from a community-based DREAM challenge to predict toxicities of environmental compounds with potential adverse health effects for human populations. We measured the cytotoxicity of 156 compounds in 884 lymphoblastoid cell lines for which genotype and transcriptional data are available as part of the Tox21 1000 Genomes Project. The challenge participants developed algorithms to predict interindividual variability of toxic response from genomic profiles, and population-level cytotoxicity from structural attributes of the compounds. In total, 179 submitted predictions were evaluated against an experimental data set to which participants were blinded. Individual cytotoxicity predictions were better than random, with modest correlations (Pearson's r < 0.28), consistent with complex trait genomic prediction. In contrast, predictions of population-level response to different compounds were more accurate (r < 0.66). The results highlight the possibility of predicting health risks associated with unknown compounds, although risk estimation accuracy remains suboptimal.
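The accuracies above are Pearson correlations between predicted and measured responses. A minimal sketch of scoring and ranking blinded submissions this way follows; submission names and values are hypothetical, and the full challenge scoring may have involved additional metrics not described in the abstract.

```python
from math import sqrt


def pearson_r(predicted: list[float], observed: list[float]) -> float:
    """Pearson correlation between predictions and measurements."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sp = sqrt(sum((p - mp) ** 2 for p in predicted))
    so = sqrt(sum((o - mo) ** 2 for o in observed))
    return cov / (sp * so)


def rank_submissions(submissions: dict[str, list[float]],
                     observed: list[float]) -> list[tuple[str, float]]:
    """Score each submission against the held-out measurements, best first."""
    scored = [(name, pearson_r(pred, observed)) for name, pred in submissions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


measured = [1.0, 2.0, 3.0, 4.0]  # hypothetical held-out cytotoxicity values
entries = {"team_x": [1.1, 1.9, 3.2, 3.8], "team_y": [4.0, 3.0, 2.0, 1.0]}
print(rank_submissions(entries, measured))  # team_x near +1, team_y at -1
```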
Dreger, Dayna L; Rimbault, Maud; Davis, Brian W; Bhatnagar, Adrienne; Parker, Heidi G; Ostrander, Elaine A
2016-12-01
In the decade following publication of the draft genome sequence of the domestic dog, extraordinary advances with application to several fields have been credited to the canine genetic system. Taking advantage of closed breeding populations and the subsequent selection for aesthetic and behavioral characteristics, researchers have leveraged the dog as an effective natural model for the study of complex traits, such as disease susceptibility, behavior and morphology, generating unique contributions to human health and biology. When designing genetic studies using purebred dogs, it is essential to consider the unique demography of each population, including estimation of effective population size and timing of population bottlenecks. The analytical design of genome-wide association studies (GWAS) and of whole-genome sequence (WGS) experiments is inseparable from demographic data. We have performed a comprehensive study of genomic homozygosity, using high-depth WGS data for 90 individuals and Illumina HD SNP data from 800 individuals representing 80 breeds. These data were coupled with extensive pedigree data analyses for 11 breeds, which together allowed us to compute breed structure, demography, and molecular measures of genome diversity. Our comparative analyses characterize the extent, formation and implication of breed-specific diversity as it relates to population structure. These data demonstrate the relationship between breed-specific genome dynamics and population architecture, and provide important considerations influencing the technological and cohort design of association and other genomic studies. © 2016. Published by The Company of Biologists Ltd.
Dreger, Dayna L.; Rimbault, Maud; Davis, Brian W.; Bhatnagar, Adrienne; Parker, Heidi G.
2016-01-01
ABSTRACT In the decade following publication of the draft genome sequence of the domestic dog, extraordinary advances with application to several fields have been credited to the canine genetic system. Taking advantage of closed breeding populations and the subsequent selection for aesthetic and behavioral characteristics, researchers have leveraged the dog as an effective natural model for the study of complex traits, such as disease susceptibility, behavior and morphology, generating unique contributions to human health and biology. When designing genetic studies using purebred dogs, it is essential to consider the unique demography of each population, including estimation of effective population size and timing of population bottlenecks. The analytical design approach for genome-wide association studies (GWAS) and analysis of whole-genome sequence (WGS) experiments are inextricable from demographic data. We have performed a comprehensive study of genomic homozygosity, using high-depth WGS data for 90 individuals, and Illumina HD SNP data from 800 individuals representing 80 breeds. These data were coupled with extensive pedigree data analyses for 11 breeds that, together, allowed us to compute breed structure, demography, and molecular measures of genome diversity. Our comparative analyses characterize the extent, formation and implication of breed-specific diversity as it relates to population structure. These data demonstrate the relationship between breed-specific genome dynamics and population architecture, and provide important considerations influencing the technological and cohort design of association and other genomic studies. PMID:27874836
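Studies like this one commonly summarize genomic homozygosity as runs of homozygosity (ROH). The sketch below is a deliberately simplified scan. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) along one chromosome, and that the minimum run length is a hypothetical number of SNPs. Dedicated tools also tolerate occasional heterozygous or missing calls and work in physical distance, and both refinements are omitted here.

```python
def runs_of_homozygosity(genotypes, min_snps):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    min_snps consecutive homozygous calls.

    Genotypes are alternate-allele counts: 0 and 2 are homozygous, 1 is heterozygous.
    """
    if min_snps < 1:
        raise ValueError("min_snps must be at least 1")
    runs = []
    start = None  # index where the current homozygous stretch began
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current stretch; keep it if long enough.
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # Close a stretch that extends to the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs


# Hypothetical genotype calls along one chromosome.
calls = [0, 2, 2, 1, 0, 0, 0, 0, 1, 2]
print(runs_of_homozygosity(calls, min_snps=3))  # prints [(0, 3), (4, 8)]
```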
A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis
Down, Thomas A.; Rakyan, Vardhman K.; Turner, Daniel J.; Flicek, Paul; Li, Heng; Kulesha, Eugene; Gräf, Stefan; Johnson, Nathan; Herrero, Javier; Tomazou, Eleni M.; Thorne, Natalie P.; Bäckdahl, Liselotte; Herberth, Marlis; Howe, Kevin L.; Jackson, David K.; Miretti, Marcos M.; Marioni, John C.; Birney, Ewan; Hubbard, Tim J. P.; Durbin, Richard; Tavaré, Simon; Beck, Stephan
2009-01-01
DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation. PMID:18612301
Kullback Leibler divergence in complete bacterial and phage genomes
Akhter, Sajia; Aziz, Ramy K.; Kashef, Mona T.; Ibrahim, Eslam S.; Bailey, Barbara; Edwards, Robert A.
2017-01-01
The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses. PMID:29204318
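Below is a minimal sketch of the divergence computation described above. It rests on several assumptions that are not details taken from the study:

- each genome is given as a list of protein sequences;
- composition is measured over the 20 standard amino acids;
- divergence is taken as D_KL(genome || mean composition), in bits.

The divergence direction, the log base, and the skipping of non-standard residues are all choices made for this illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(proteins):
    """Relative frequency of each standard amino acid across a genome's proteins."""
    counts = Counter(aa for seq in proteins for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def kl_divergence(p, q):
    """D_KL(p || q) in bits; residues absent from p contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_from_mean(genomes):
    """Divergence of each genome's composition from the mean composition of all genomes."""
    comps = {name: composition(prots) for name, prots in genomes.items()}
    n = len(comps)
    mean = [sum(c[i] for c in comps.values()) / n for i in range(len(AMINO_ACIDS))]
    # Any residue present in a genome also has a nonzero mean, so qi > 0 wherever pi > 0.
    return {name: kl_divergence(c, mean) for name, c in comps.items()}


# Hypothetical toy proteomes; real input would be all predicted proteins per genome.
toy = {
    "genome_a": ["MKTAYIAKQR", "LLEEGVKA"],
    "genome_b": ["MKKLLPTAAAGLLLL"],
    "genome_c": ["MNNNKKKIIIFFF"],
}
for name, d in sorted(kl_from_mean(toy).items()):
    print(name, round(d, 3))
```

Genomes with the most skewed composition relative to the set mean receive the largest divergence.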
Daware, Anurag; Das, Sweta; Srivastava, Rishi; Badoni, Saurabh; Singh, Ashok K.; Agarwal, Pinky; Parida, Swarup K.; Tyagi, Akhilesh K.
2016-01-01
Development and use of genome-wide informative simple sequence repeat (SSR) markers and novel integrated genomic strategies are vital for driving genomics-assisted breeding applications and for efficiently dissecting quantitative trait loci (QTLs) underlying complex traits in rice. The present study developed 6244 genome-wide informative SSR markers exhibiting in silico fragment length polymorphism based on repeat-unit variations among genomic sequences of 11 indica, japonica, aus, and wild rice accessions. These markers were mapped on diverse coding and non-coding sequence components of known cloned/candidate genes annotated from 12 chromosomes. They showed much higher amplification (97%) and polymorphic potential (88%), along with a wider genetic/functional diversity level (16–74%, mean 53%), especially among accessions belonging to the indica cultivar group, suggesting their utility in large-scale genomics-assisted breeding applications in rice. A high-density genetic linkage map (IR 64 × Sonasal) anchored by 3791 SSR markers was generated, spanning a total map length of 2060 cM with an average inter-marker distance of 0.54 cM. This reference genetic map identified six major genomic regions harboring robust QTLs (explaining 31% of combined phenotypic variation, with LOD scores of 5.7–8.7) governing grain weight on six rice chromosomes. One strong major grain weight QTL region (OsqGW5.1) was narrowed down by integrating traditional QTL mapping with high-resolution, QTL region-specific, integrated SSR and single nucleotide polymorphism marker-based QTL-seq analysis and differential expression profiling. This led us to delineate two natural allelic variants in two known cis-regulatory elements (RAV1AAT and CARGCW8GAT) of glycosyl hydrolase and serine carboxypeptidase genes exhibiting pronounced seed-specific differential regulation in low (Sonasal) and high (IR 64) grain weight mapping parental accessions. 
Our genome-wide SSR marker resource (polymorphic within/between diverse cultivar groups) and integrated genomic strategy can efficiently scan functionally relevant potential molecular tags (markers, candidate genes and alleles) regulating complex agronomic traits (grain weight) and expedite marker-assisted genetic enhancement in rice. PMID:27833617
Evolution of biological complexity
Adami, Christoph; Ofria, Charles; Collier, Travis C.
2000-01-01
To make a case for or against a trend in the evolution of complexity in biological evolution, complexity needs to be both rigorously defined and measurable. A recent information-theoretic (but intuitively evident) definition identifies genomic complexity with the amount of information a sequence stores about its environment. We investigate the evolution of genomic complexity in populations of digital organisms and monitor in detail the evolutionary transitions that increase complexity. We show that, because natural selection forces genomes to behave as a natural “Maxwell Demon,” within a fixed environment, genomic complexity is forced to increase. PMID:10781045
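One way to make "information a sequence stores about its environment" measurable is to subtract per-site entropy from sequence length: C = L - (H_1 + ... + H_L). Here H_i is the Shannon entropy at site i across an aligned population, with logarithms taken to the base of the alphabet size. Below is a minimal sketch that assumes a population of equal-length sequences. This plug-in estimate illustrates the idea and is not necessarily the exact estimator used in the study.

```python
import math
from collections import Counter


def genomic_complexity(population, alphabet_size):
    """Estimate C = L - sum_i H_i for an aligned population of equal-length sequences.

    H_i is the Shannon entropy of site i with logarithms taken to base alphabet_size,
    so a fully conserved site adds 1 to C and a uniformly random site adds 0.
    """
    if not population:
        raise ValueError("population is empty")
    length = len(population[0])
    if any(len(seq) != length for seq in population):
        raise ValueError("sequences must be aligned to equal length")
    n = len(population)
    total_entropy = 0.0
    for i in range(length):
        counts = Counter(seq[i] for seq in population)
        for c in counts.values():
            p = c / n
            total_entropy -= p * math.log(p, alphabet_size)
    return length - total_entropy


# Hypothetical DNA population (alphabet size 4): site 1 is conserved, site 2 varies fully.
print(genomic_complexity(["AA", "AC", "AG", "AT"], alphabet_size=4))
```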
Morales-Cruz, Abraham; Allenbeck, Gabrielle; Figueroa-Balderas, Rosa; Ashworth, Vanessa E; Lawrence, Daniel P; Travadon, Renaud; Smith, Rhonda J; Baumgartner, Kendra; Rolshausen, Philippe E; Cantu, Dario
2018-02-01
Grapevines, like other perennial crops, are affected by so-called 'trunk diseases', which damage the trunk and other woody tissues. Mature grapevines typically contract more than one trunk disease and often multiple grapevine trunk pathogens (GTPs) are recovered from infected tissues. The co-existence of different GTP species in complex and dynamic microbial communities complicates the study of the molecular mechanisms underlying disease development, especially under vineyard conditions. The objective of this study was to develop and optimize a community-level transcriptomics (i.e. metatranscriptomics) approach that could monitor simultaneously the virulence activities of multiple GTPs in planta. The availability of annotated genomes for the most relevant co-infecting GTPs in diseased grapevine wood provided the unprecedented opportunity to generate a multi-species reference for the mapping and quantification of DNA and RNA sequencing reads. We first evaluated popular sequence read mappers using permutations of multiple simulated datasets. Alignment parameters of the selected mapper were optimized to increase the specificity and sensitivity for its application to metagenomics and metatranscriptomics analyses. Initial testing on grapevine wood experimentally inoculated with individual GTPs confirmed the validity of the method. Using naturally infected field samples expressing a variety of trunk disease symptoms, we show that our approach provides quantitative assessments of species composition, as well as genome-wide transcriptional profiling of potential virulence factors, namely cell wall degradation, secondary metabolism and nutrient uptake for all co-infecting GTPs. © 2017 BSPP AND JOHN WILEY & SONS LTD.
Genome-wide association mapping of leaf metabolic profiles for dissecting complex traits in maize.
Riedelsheimer, Christian; Lisec, Jan; Czedik-Eysenberg, Angelika; Sulpice, Ronan; Flis, Anna; Grieder, Christoph; Altmann, Thomas; Stitt, Mark; Willmitzer, Lothar; Melchinger, Albrecht E
2012-06-05
The diversity of metabolites found in plants is far greater than in most other organisms. Metabolic profiling techniques, which measure many of these compounds simultaneously, have enabled investigation of the regulation of metabolic networks and have proved useful for predicting important agronomic traits. However, little is known about the genetic basis of metabolites in crops such as maize. Here, a set of 289 diverse maize inbred lines was genotyped with 56,110 SNPs and assayed for 118 biochemical compounds in the leaves of young plants, as well as for agronomic traits of mature plants in field trials. Metabolite concentrations had an average repeatability of 0.73 and showed a correlation pattern that largely reflected their functional grouping. Genome-wide association mapping with correction for population structure and cryptic relatedness identified strong SNP associations for 26 distinct metabolites, explaining up to 32.0% of the observed genetic variance. On nine chromosomes, we detected 15 distinct SNP-metabolite associations, each of which explained more than 15% of the genetic variance. For lignin precursors, including p-coumaric acid and caffeic acid, we found strong associations (P values to ) with a region on chromosome 9 harboring cinnamoyl-CoA reductase, a key enzyme in monolignol synthesis and a target for improving the quality of lignocellulosic biomass by genetic engineering approaches. Moreover, lignin precursors correlated significantly with lignin content, plant height, and dry matter yield, suggesting that metabolites represent promising connecting links for narrowing the genotype-phenotype gap of complex agronomic traits.
Genetic Comparison of B. Anthracis and its Close Relatives Using AFLP and PCR Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jackson, P.J.; Hill, K.K.; Laker, M.T.
1999-02-01
Amplified Fragment Length Polymorphism (AFLP) analysis allows a rapid, relatively simple analysis of a large portion of a microbial genome, providing information about the species and its phylogenetic relationship to other microbes (Vos et al., 1995). The method simply surveys the genome for length and sequence polymorphisms. The pattern identified can be used for comparison to the genomes of other species. Unlike other methods, it does not rely on analysis of a single genetic locus that may bias the interpretation of results, and it does not require any prior knowledge of the targeted organism. Moreover, a standard set of reagents can be applied to any species without using species-specific information or molecular probes. The authors are using AFLPs to rapidly identify different bacterial species. A comparison of AFLP profiles generated from a large battery of B. anthracis strains shows very little variability among different isolates (Keim et al., 1997). By contrast, there is a significant difference between AFLP profiles generated for any B. anthracis strain and even the most closely related Bacillus species. Sufficient variability is apparent among all known microbial species to allow phylogenetic analysis based on large numbers of genetically unlinked loci. These striking differences among AFLP profiles allow unambiguous identification of previously identified species and phylogenetic placement of newly characterized isolates relative to known species based on a large number of independent genetic loci. Data generated thus far show that the method provides phylogenetic analyses that are consistent with other widely accepted phylogenetic methods. However, AFLP analysis provides a more detailed analysis of the targets and samples a much larger portion of the genome. Consequently, it provides an inexpensive, rapid means of characterizing microbial isolates to further differentiate among strains and closely related microbial species. 
Such information cannot be rapidly generated by other means. AFLP sample analysis quickly generates a very large amount of molecular information about microbial genomes. However, this information cannot be analyzed rapidly using manual methods. The authors are developing a large archive of electronic AFLP signatures that is being used to identify isolates collected from medical, veterinary, forensic and environmental samples. They are also developing the computational packages necessary to rapidly and unambiguously analyze the AFLP profiles and conduct a phylogenetic comparison of these data relative to information already in the database. They will use this archive and the associated algorithms to determine the species identity of previously uncharacterized isolates and place them phylogenetically relative to other microbes based on their AFLP signatures. This study provides significant new information about microbes with environmental, veterinary and medical significance. This information can be used in further studies to understand the relationships among these species and the factors that distinguish them from one another. It should also allow identification of unique factors that contribute to important microbial traits including pathogenicity and virulence. They are also using AFLP data to identify, isolate and sequence DNA fragments that are unique to particular microbial species and strains. The fragment patterns and sequence information provide insights into the complexity and organization of bacterial genomes relative to one another. They also provide the information necessary for development of species-specific PCR primers that can be used to interrogate complex samples for the presence of B. anthracis, other microbial pathogens or their remnants.
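An isolate's AFLP signature can be matched against an archive using a band-sharing coefficient. Below is a minimal sketch that assumes each profile has already been reduced to a set of binned fragment sizes (in bp). It uses the Dice (Nei-Li) coefficient, a common choice for presence/absence band data. The archive entries and names are hypothetical, and the abstract does not specify the authors' own scoring or phylogenetic methods.

```python
def dice_similarity(bands_a, bands_b):
    """Dice (Nei-Li) band-sharing similarity: 2 * shared / (bands in a + bands in b)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def best_match(query, archive):
    """Return (name, similarity) for the archived profile most similar to the query."""
    if not archive:
        raise ValueError("archive is empty")
    return max(
        ((name, dice_similarity(query, bands)) for name, bands in archive.items()),
        key=lambda item: item[1],
    )


# Hypothetical archive of binned fragment sizes (bp) for reference isolates.
archive = {
    "reference_1": [102, 155, 210, 340, 415],
    "reference_2": [98, 160, 222, 300],
}
print(best_match([102, 155, 210, 340, 420], archive))  # prints ('reference_1', 0.8)
```

The same pairwise similarities could also feed a distance-based tree method for phylogenetic placement. That step is not shown here.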
Cai, Binghuang; Li, Biao; Kiga, Nikki; Thusberg, Janita; Bergquist, Timothy; Chen, Yun-Ching; Niknafs, Noushin; Carter, Hannah; Tokheim, Collin; Beleva-Guthrie, Violeta; Douville, Christopher; Bhattacharya, Rohit; Yeo, Hui Ting Grace; Fan, Jean; Sengupta, Sohini; Kim, Dewey; Cline, Melissa; Turner, Tychele; Diekhans, Mark; Zaucha, Jan; Pal, Lipika R; Cao, Chen; Yu, Chen-Hsin; Yin, Yizhou; Carraro, Marco; Giollo, Manuel; Ferrari, Carlo; Leonardi, Emanuela; Tosatto, Silvio C E; Bobe, Jason; Ball, Madeleine; Hoskins, Roger A; Repo, Susanna; Church, George; Brenner, Steven E; Moult, John; Gough, Julian; Stanke, Mario; Karchin, Rachel; Mooney, Sean D
2017-09-01
The advent of next-generation sequencing has dramatically decreased the cost of whole-genome sequencing and increased the feasibility of its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features. © 2017 Wiley Periodicals, Inc.
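ROC AUC, one of the assessment metrics listed above, equals the probability that a randomly chosen individual with the trait receives a higher prediction score than one without it, with ties counted as one half. Below is a minimal sketch of that rank-based formulation. It assumes binary trait labels and real-valued submission scores, and the data are hypothetical. The double loop is quadratic in sample size, which is adequate for challenge-sized cohorts.

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank (Mann-Whitney) formulation; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical submission scores for six genomes and whether each has the trait.
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
has_trait = [True, False, True, False, True, False]
print(roc_auc(scores, has_trait))
```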
Rice-Map: a new-generation rice genome browser.
Wang, Jun; Kong, Lei; Zhao, Shuqi; Zhang, He; Tang, Liang; Li, Zhe; Gu, Xiaocheng; Luo, Jingchu; Gao, Ge
2011-03-30
The concurrent release of rice genome sequences for two subspecies (Oryza sativa L. ssp. japonica and Oryza sativa L. ssp. indica) facilitates rice studies at the whole genome level. Since the advent of high-throughput analysis, huge amounts of functional genomics data have been delivered rapidly, making an integrated online genome browser indispensable for scientists to visualize and analyze these data. Based on next-generation web technologies and high-throughput experimental data, we have developed Rice-Map, a novel genome browser for researchers to navigate, analyze and annotate the rice genome interactively. More than one hundred annotation tracks (81 for japonica and 82 for indica) have been compiled and loaded into Rice-Map. These pre-computed annotations cover gene models, transcript evidence, expression profiling, epigenetic modifications, inter-species and intra-species homologies, genetic markers and other genomic features. In addition to these pre-computed tracks, registered users can interactively add comments and research notes to Rice-Map as User-Defined Annotation entries. By smoothly scrolling, dragging and zooming, users can browse various genomic features simultaneously at multiple scales. On-the-fly analysis of selected entries can be performed through dedicated bioinformatic analysis platforms such as WebLab and Galaxy. Furthermore, a BioMart-powered data warehouse "Rice Mart" is offered for advanced users to fetch bulk datasets based on complex criteria. Rice-Map delivers abundant up-to-date japonica and indica annotations, providing a valuable resource for both computational and bench biologists. Rice-Map is publicly accessible at http://www.ricemap.org/, with all data available for free download.
FunCoup 3.0: database of genome-wide functional coupling networks
Schmitt, Thomas; Ogris, Christoph; Sonnhammer, Erik L. L.
2014-01-01
We present an update of the FunCoup database (http://FunCoup.sbc.su.se) of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in understanding the higher-level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidence and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction. PMID:24185702
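Bayesian integration of several evidence types is often done in naive-Bayes fashion. Each piece of evidence contributes a likelihood ratio, and the log-ratios are added to the prior log-odds. The sketch below illustrates only that general scheme. The optional per-evidence weights are a hypothetical stand-in for downweighting redundant evidence and are not FunCoup's actual regularization procedure.

```python
import math


def posterior_coupling(likelihood_ratios, prior, weights=None):
    """Posterior probability of a functional coupling from independent evidence.

    Each likelihood ratio is P(evidence | coupled) / P(evidence | not coupled).
    Under the naive independence assumption, weighted log-ratios add to the
    prior log-odds. The weights are a hypothetical knob for downweighting
    redundant evidence.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(likelihood_ratios)
    if len(weights) != len(likelihood_ratios):
        raise ValueError("one weight per likelihood ratio is required")
    log_odds = math.log(prior / (1 - prior))
    for lr, w in zip(likelihood_ratios, weights):
        if lr <= 0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += w * math.log(lr)
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1 + z)


# Hypothetical gene pair: co-expression LR 4.0, shared-domain LR 2.5, low prior.
print(round(posterior_coupling([4.0, 2.5], prior=0.01), 4))
```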
Information theory applications for biological sequence analysis.
Vinga, Susana
2014-05-01
Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
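Block entropy, mentioned above as a global sequence measure, is the Shannon entropy of the distribution of overlapping words of length k. Below is a minimal plug-in sketch that reports entropy in bits. This naive estimator is biased downward when k is large relative to the sequence length, because many possible words are never observed.

```python
import math
from collections import Counter


def block_entropy(seq, k):
    """Shannon entropy, in bits, of the overlapping k-length blocks in seq."""
    if not 1 <= k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    blocks = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())


# Block entropy of a hypothetical periodic sequence for increasing block lengths.
for k in (1, 2, 3):
    print(k, round(block_entropy("ACGTACGTACGTACGT", k), 3))
```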
Background: Gliomas are diverse neoplasms with multiple molecular subtypes. How tumor-initiating mutations relate to molecular subtypes as these tumors evolve during malignant progression remains unclear. Methods: We used genetically engineered mouse models, histopathology, genetic lineage tracing, expression profiling, and copy number analyses to examine how genomic tumor diversity evolves during the course of malignant progression from low- to high-grade disease.
Chromosome organization in simple and complex unicellular organisms.
O'Sullivan, Justin M
2011-01-01
The genomes of unicellular organisms form complex 3-dimensional structures. This spatial organization is hypothesized to have a significant role in genomic function. Spatial organization is not limited solely to the three-dimensional folding of the chromosome(s) in genomes but also includes genome positioning, and the folding and compartmentalization of any additional genetic material (e.g. episomes) present within complex genomes. In this comment, I will highlight similarities in the spatial organization of eukaryotic and prokaryotic unicellular genomes.
Muley, Vijaykumar Yogesh; Ranjan, Akash
2012-01-01
Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand, a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in the literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. 
Moreover, such a sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.
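Phylogenetic profiling, one of the genomic context methods compared above, represents each query protein by the presence or absence of an ortholog in every reference genome. The choice of reference genomes therefore directly determines the profiles being compared. Below is a minimal sketch that assumes presence/absence vectors have already been computed. It scores each pair by the fraction of reference genomes in which the two profiles agree. The study evaluated several method variants, and this simple matching score is only one illustrative choice. The gene names are hypothetical.

```python
from itertools import combinations


def profile_similarity(a, b):
    """Fraction of reference genomes in which two presence/absence profiles agree."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and cover the same reference genomes")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_pairs(profiles):
    """Score every pair of query proteins and return pairs, most similar first."""
    scored = [
        (p, q, profile_similarity(profiles[p], profiles[q]))
        for p, q in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Hypothetical profiles over six reference genomes (1 = ortholog present).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}
for p, q, s in rank_pairs(profiles):
    print(p, q, round(s, 2))
```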
Shoham, Dany
2011-01-01
Based on a wealth of recent findings, in conjunction with earliest chronologies pertaining to evolutionary emergences of ancestral RNA viruses, ducks, Influenzavirus A (assumingly within ducks), and hominids, as well as to the initial domestication of mallard duck (Anas platyrhynchos), jungle fowl (Gallus gallus), wild turkey (Meleagris gallopavo), wild boar (Sus scrofa), and wild horse (Equus ferus), presumed genesis modes of primordial pandemic influenza strains have multidisciplinarily been configured. The virological fundamentality of domestication and farming of those various avian and mammalian species has thereby been demonstrated and broadly elucidated, within distinctive coevolutionary paradigms. The mentioned viral genesis modes were then analyzed, compatibly with common denominators and flexibility that mark the geographic profile of the last 18 pandemic strains, which reputedly emerged since 1510, the antigenic profile of the last 10 pandemic strains since 1847, and the genomic profile of the last 5 pandemic strains since 1918, until present. Related ecophylogenetic and biogeographic aspects have been enlightened, alongside with the crucial role of spatial virus gene dissemination by avian hosts. A fairly coherent picture of primary and late evolutionary and genomic courses of pandemic strains has thus been attained, tentatively. Specific patterns underlying complexes prone to generate past and future pandemic strains from viral reservoir in animals are consequentially derived. PMID:23074663
Yasui, Yasuo; Hirakawa, Hideki; Oikawa, Tetsuo; Toyoshima, Masami; Matsuzaki, Chiaki; Ueno, Mariko; Mizuno, Nobuyuki; Nagatoshi, Yukari; Imamura, Tomohiro; Miyago, Manami; Tanaka, Kojiro; Mise, Kazuyuki; Tanaka, Tsutomu; Mizukoshi, Hiroharu; Mori, Masashi; Fujita, Yasunari
2016-12-01
Chenopodium quinoa Willd. (quinoa) originated from the Andean region of South America, and is a pseudocereal crop of the Amaranthaceae family. Quinoa is emerging as an important crop with the potential to contribute to food security worldwide and is considered to be an optimal food source for astronauts, due to its outstanding nutritional profile and ability to tolerate stressful environments. Furthermore, plant pathologists use quinoa as a representative diagnostic host to identify virus species. However, molecular analysis of quinoa is limited by its genetic heterogeneity due to outcrossing and its genome complexity derived from allotetraploidy. To overcome these obstacles, we established the inbred standard quinoa accession Kd, which enables rigorous molecular analysis, and present its draft genome sequence, obtained using an optimized combination of high-throughput next-generation sequencing on the Illumina HiSeq 2500 and PacBio RS II sequencers. The de novo genome assembly contained 25k scaffolds totaling 1 Gbp, with an N50 length of 86 kbp. Based on these data, we constructed the free-access Quinoa Genome DataBase (QGDB). Thus, these findings provide insights into the mechanisms underlying agronomically important traits of quinoa and the effect of allotetraploidy on genome evolution. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
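N50, reported above for the Kd assembly, is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. Below is a minimal sketch using hypothetical scaffold lengths.

```python
def n50(lengths):
    """Length at which, walking from the longest sequence down, the running
    total of bases first reaches at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in kbp (total 280, so half is 140).
print(n50([100, 80, 50, 30, 20]))  # prints 80
```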
Harris, R. Alan; Wang, Ting; Coarfa, Cristian; Nagarajan, Raman P.; Hong, Chibo; Downey, Sara L.; Johnson, Brett E.; Fouse, Shaun D.; Delaney, Allen; Zhao, Yongjun; Olshen, Adam; Ballinger, Tracy; Zhou, Xin; Forsberg, Kevin J.; Gu, Junchen; Echipare, Lorigail; O’Geen, Henriette; Lister, Ryan; Pelizzola, Mattia; Xi, Yuanxin; Epstein, Charles B.; Bernstein, Bradley E.; Hawkins, R. David; Ren, Bing; Chung, Wen-Yu; Gu, Hongcang; Bock, Christoph; Gnirke, Andreas; Zhang, Michael Q.; Haussler, David; Ecker, Joseph; Li, Wei; Farnham, Peggy J.; Waterland, Robert A.; Meissner, Alexander; Marra, Marco A.; Hirst, Martin; Milosavljevic, Aleksandar; Costello, Joseph F.
2010-01-01
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their genome-wide CpG coverage (including coverage in transposons), resolution, cost, and concordance, as well as the relationship of concordance to CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, the two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile, along with histone methylation, RNA, and SNP profiles derived from the sequence reads, allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions as well as new loci with monoallelic epigenetic marks and monoallelic expression. PMID:20852635
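Concordance of binary methylation calls can be read as the fraction of sites, assessed by every method being compared, on which all methods make the same methylated/unmethylated call. The Python sketch below implements that one plausible definition; it is an assumption for illustration, not necessarily the exact metric used in the study, and the site keys and calls are hypothetical.

```python
def binary_concordance(*call_sets):
    """Fraction of shared sites on which every method gives the same binary call.

    Each argument is a dict mapping a site identifier, e.g. (chrom, pos),
    to True (methylated) or False (unmethylated). Only sites assessed by
    all methods are compared.
    """
    if len(call_sets) < 2:
        raise ValueError("need calls from at least two methods")
    shared = set(call_sets[0]).intersection(*call_sets[1:])
    if not shared:
        raise ValueError("no sites were assessed by every method")
    # A site is concordant when the set of calls across methods has one value.
    agree = sum(len({calls[site] for calls in call_sets}) == 1 for site in shared)
    return agree / len(shared)


# Hypothetical calls from two enrichment methods.
method_a = {("chr1", 10): True, ("chr1", 20): False, ("chr1", 30): True, ("chr2", 5): True}
method_b = {("chr1", 10): True, ("chr1", 20): True, ("chr1", 30): True, ("chr3", 1): False}
print(binary_concordance(method_a, method_b))  # 3 shared sites, 2 agree
```

Passing more than two dicts restricts the comparison to sites covered by all methods, mirroring the "regions assessed by all four methods" comparison in spirit.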
Jordan, Daniel M; Do, Ron
2018-04-11
While sequence-based genetic tests have long been available for specific loci, especially for Mendelian disease, the rapidly falling costs of genome-wide genotyping arrays, whole-exome sequencing, and whole-genome sequencing are moving us toward a future where full genomic information might inform the prognosis and treatment of a variety of diseases, including complex disease. Similarly, the availability of large populations with full genomic information has enabled new insights about the etiology and genetic architecture of complex disease. Insights from the latest generation of genomic studies suggest that our categorization of diseases as complex may conceal a wide spectrum of genetic architectures and causal mechanisms that ranges from Mendelian forms of complex disease to complex regulatory structures underlying Mendelian disease. Here, we review these insights, along with advances in the prediction of disease risk and outcomes from full genomic information. Expected final online publication date for the Annual Review of Genomics and Human Genetics Volume 19 is August 31, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Dhir, Mashaal; Choudry, Haroon A; Holtzman, Matthew P; Pingpank, James F; Ahrendt, Steven A; Zureikat, Amer H; Hogg, Melissa E; Bartlett, David L; Zeh, Herbert J; Singhi, Aatur D; Bahary, Nathan
2017-01-01
The impact of genomic profiling on the outcomes of patients with advanced gastrointestinal (GI) malignancies remains unknown. The primary objectives of the study were to investigate the clinical benefit of genomic-guided therapy, defined as complete response (CR), partial response (PR), or stable disease (SD) at 3 months, and its impact on progression-free survival (PFS) in patients with advanced GI malignancies. Clinical and genomic data for all consecutive GI tumor samples sequenced by FoundationOne from April 2013 to April 2016 were obtained and analyzed. A total of 101 samples from 97 patients were analyzed. Ninety-eight samples from 95 patients could be amplified, making this approach feasible in 97% of the samples. After removing duplicates, 95 samples from 95 patients were included in the further analysis. Median time from specimen collection to reporting was 11 days. Genomic alteration-guided treatment recommendations were considered new and clinically relevant in 38% (36/95) of the patients. Rapid decline in functional status was noted in 25% (9/36) of these patients, who therefore could not receive genomic-guided therapy. Genomic-guided therapy was used in 13 patients (13.7%), and 7 patients (7.4%) experienced clinical benefit (6 PR and 1 SD). Among these seven patients, median PFS was 10 months, with some ongoing durable responses. Genomic profiling-guided therapy can lead to clinical benefit in a subset of patients with advanced GI malignancies. Attempting genomic profiling earlier in the course of treatment, before functional decline, may allow more patients to benefit from these therapies. © 2016 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.
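The clinical-benefit figure above follows directly from the study's definition (CR, PR, or SD at 3 months) applied to the 95-patient cohort. The Python sketch below reproduces that arithmetic. The six "PD" entries are hypothetical placeholders for the treated patients without benefit, whose actual outcomes the abstract does not report.

```python
# Response categories counted as clinical benefit under the study definition.
BENEFIT = frozenset({"CR", "PR", "SD"})


def clinical_benefit(responses, cohort_size):
    """Return (number with clinical benefit, percentage of the whole cohort)."""
    n_benefit = sum(1 for r in responses if r in BENEFIT)
    return n_benefit, round(100 * n_benefit / cohort_size, 1)


# 13 treated patients: 6 PR and 1 SD as reported; "PD" is a hypothetical
# placeholder for the 6 patients who did not benefit.
treated = ["PR"] * 6 + ["SD"] + ["PD"] * 6
print(clinical_benefit(treated, 95))        # (7, 7.4)
print(round(100 * len(treated) / 95, 1))    # 13.7, the treated fraction
```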
Looping and clustering model for the organization of protein-DNA complexes on the bacterial genome
NASA Astrophysics Data System (ADS)
Walter, Jean-Charles; Walliser, Nils-Ole; David, Gabriel; Dorignac, Jérôme; Geniet, Frédéric; Palmeri, John; Parmeggiani, Andrea; Wingreen, Ned S.; Broedersz, Chase P.
2018-03-01
The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA, to which ParB binds specifically, and spread from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the looping and clustering model, which employs a statistical physics approach to describe protein-DNA complexes. The looping and clustering model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and the loop closure entropy of this protein-DNA cluster on the one hand, and the positional entropy for placing loops within the cluster on the other. Indeed, we show that the protein interaction strength determines the ‘tightness’ of the loopy protein-DNA complex. Thus, our model provides a theoretical framework for quantitatively computing the binding profiles of ParB-like proteins around a cognate (parS) binding site.
Digestive tumor bank protocol: from surgical specimens to genomic studies of digestive cancers.
Popescu, I; Stroescu, C; Dumitrascu, T; Herlea, V; Paslaru, Liliana; Lazar, V; Boissin, H; Taieb, J; Horeanga, Ionela
2006-01-01
Cancer is a complex polygenic and multifactorial disease, resulting from successive dynamic changes in the genome of somatic cells and from the accumulation of molecular alterations in both tumor cells and host cells. For the majority of cancers, including many malignancies of the gastrointestinal tract, our current means of diagnosing and treating these tumors are grossly insufficient. In recent years, the development of several genomic and gene expression profiling methods, such as comparative genomic hybridization (CGH), differential display, serial analysis of gene expression (SAGE), and DNA arrays, together with the sequencing of the human genome, has provided an opportunity to monitor and investigate the complete cascade of molecular events leading to tumor development and progression. Given the central role played by surgeons in the current management of patients with solid cancers, it is of paramount importance for them to know the principles underlying these laboratory tools so that they can critically assess the results these technologies produce. In this article, we describe the scientific partnership between Fundeni Clinical Institute Bucharest, Romania, and RNtech Company, Paris, France, for the development of a center of biological resources (Biobank), as well as the standardized protocol for working with the biological samples, the ongoing projects, and future perspectives.
Real-World Evidence In Support Of Precision Medicine: Clinico-Genomic Cancer Data As A Case Study.
Agarwala, Vineeta; Khozin, Sean; Singal, Gaurav; O'Connell, Claire; Kuk, Deborah; Li, Gerald; Gossai, Anala; Miller, Vincent; Abernethy, Amy P
2018-05-01
The majority of US adult cancer patients today are diagnosed and treated outside the context of any clinical trial (that is, in the real world). Although these patients are not part of a research study, their clinical data are still recorded. Indeed, data captured in electronic health records form an ever-growing, rich digital repository of longitudinal patient experiences, treatments, and outcomes. Likewise, genomic data from tumor molecular profiling are increasingly guiding oncology care. Linking real-world clinical and genomic data, as well as information from other co-occurring data sets, could create study populations that provide generalizable evidence for precision medicine interventions. However, the infrastructure required to link such composite data, ensure its quality, and learn from it rapidly is complex. We outline the challenges and describe a novel approach to building a real-world clinico-genomic database of patients with cancer. This work represents a case study in how data collected during routine patient care can inform precision medicine efforts for the population at large. We suggest that health policies can promote innovation by defining appropriate uses of real-world evidence, establishing data standards, and incentivizing data sharing.
Gorini, Giorgio; Bell, Richard L.; Mayfield, R. Dayne
2016-01-01
Summary Alcohol abuse and dependence are multifaceted disorders with neurobiological, psychological, and environmental components. Research on other complex neuropsychiatric diseases suggests that genetically influenced intermediate characteristics affect the risk for heavy alcohol consumption and its consequences. Diverse therapeutic interventions can be developed through the identification of reliable biomarkers for this disorder and new pharmacological targets for its treatment. Advances in the fields of genomics and proteomics offer a number of possible targets for the development of new therapeutic approaches. This brain-focused review highlights studies identifying neurobiological systems associated with these targets and possible pharmacotherapies, summarizing evidence from clinically relevant animal and human studies and sketching the improvements in, and challenges facing, the fields of proteomics and genomics. Concluding thoughts on using results from these profiling technologies for medication development are also presented. PMID:21199775
Integrative Clinical Genomics of Metastatic Cancer
Robinson, Dan R.; Wu, Yi-Mi; Lonigro, Robert J.; Vats, Pankaj; Cobain, Erin; Everett, Jessica; Cao, Xuhong; Rabban, Erica; Kumar-Sinha, Chandan; Raymond, Victoria; Schuetze, Scott; Alva, Ajjai; Siddiqui, Javed; Chugh, Rashmi; Worden, Francis; Zalupski, Mark M.; Innis, Jeffrey; Mody, Rajen J.; Tomlins, Scott A.; Lucas, David; Baker, Laurence H.; Ramnath, Nithya; Schott, Ann F.; Hayes, Daniel F.; Vijai, Joseph; Offit, Kenneth; Stoffel, Elena M.; Roberts, J. Scott; Smith, David C.; Kunju, Lakshmi P.; Talpaz, Moshe; Cieslik, Marcin; Chinnaiyan, Arul M.
2017-01-01
SUMMARY Metastasis is the primary cause of cancer-related deaths. While The Cancer Genome Atlas (TCGA) has sequenced primary tumor types obtained from surgical resections, much less comprehensive molecular analysis is available for clinically acquired metastatic cancers. Here, we perform whole-exome and transcriptome sequencing of 500 adult patients with metastatic solid tumors of diverse lineage and biopsy site. The most frequently somatically altered genes in metastatic cancer included TP53, CDKN2A, PTEN, PIK3CA, and RB1. Putative pathogenic germline variants were present in 12.2% of cases, of which 75% were related to defects in DNA repair. RNA sequencing complemented DNA sequencing for the identification of gene fusions, pathway activation, and immune profiling. Integrative sequence analysis provides a clinically relevant, multi-dimensional view of the complex molecular landscape and microenvironment of metastatic cancers. PMID:28783718
Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu
2017-12-07
Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
GWIPS-viz: development of a ribo-seq genome browser
Michel, Audrey M.; Fox, Gearoid; M. Kiran, Anmol; De Bo, Christof; O’Connor, Patrick B. F.; Heaphy, Stephen M.; Mullan, James P. A.; Donohue, Claire A.; Higgins, Desmond G.; Baranov, Pavel V.
2014-01-01
We describe the development of GWIPS-viz (http://gwips.ucc.ie), an online genome browser for viewing ribosome profiling data. Ribosome profiling (ribo-seq) is a recently developed technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. It is based on the deep sequencing of ribosome-protected messenger RNA (mRNA) fragments, which allows the ribosome density along all mRNA transcripts present in the cell to be quantified. Since its inception, ribo-seq has been carried out in a number of eukaryotic and prokaryotic organisms. Owing to the increasing interest in ribo-seq, there is a clear need for a dedicated ribo-seq genome browser. GWIPS-viz is based on The University of California Santa Cruz (UCSC) Genome Browser. Ribo-seq tracks, coupled with mRNA-seq tracks, are currently available for several genomes: human, mouse, zebrafish, nematode, yeast, bacteria (Escherichia coli K12, Bacillus subtilis), human cytomegalovirus and bacteriophage lambda. Our objective is to continue incorporating published ribo-seq data sets so that the wider community can readily view ribosome profiling information from multiple studies without the need to carry out computational processing. PMID:24185699
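Quantifying ribosome density from ribo-seq typically means assigning each ribosome-protected fragment to a single position, often an estimated P-site, and counting reads per nucleotide. The Python sketch below shows that idea for one transcript. The length-to-offset table (e.g. 12 nt for 28-nt footprints) is an assumption supplied by the caller, and this is not the GWIPS-viz processing pipeline.

```python
def ribosome_density(read_starts, read_lengths, transcript_length, offsets):
    """Per-nucleotide footprint counts along one transcript.

    read_starts: 0-based transcript coordinate of each footprint's 5' end.
    read_lengths: length of each footprint, in the same order.
    offsets: dict mapping footprint length -> assumed P-site offset from the 5' end.
    Reads with no offset for their length, or whose P-site falls outside
    the transcript, are skipped.
    """
    density = [0] * transcript_length
    for start, length in zip(read_starts, read_lengths):
        offset = offsets.get(length)
        if offset is None:
            continue
        p_site = start + offset
        if 0 <= p_site < transcript_length:
            density[p_site] += 1
    return density


# Hypothetical offsets and reads; the 30-nt read and the out-of-range read are dropped.
profile = ribosome_density([0, 3, 3, 50], [28, 28, 30, 28], 20, {28: 12, 29: 12})
print(profile[12], profile[15], sum(profile))  # 1 1 2
```

A browser track such as those described above would hold profiles like this per genomic position, shown alongside an mRNA-seq coverage track.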
Wang, Edwin; Zou, Jinfeng; Zaman, Naif; Beitel, Lenore K; Trifiro, Mark; Paliouras, Miltiadis
2013-08-01
Recent tumor genome sequencing has confirmed that a single tumor often consists of multiple cell subpopulations (clones) that bear different but related genetic profiles, such as mutation and copy number variation profiles. Thus far, cancer functional studies have treated each tumor as a single entity. With advances in genome sequencing and computational analysis, we are able to quantify and computationally dissect clones from tumors, and then conduct clone-based analysis. Emerging technologies such as single-cell genome sequencing and RNA-Seq could profile tumor clones. Thus, we should reconsider how to conduct cancer systems biology studies in the genome sequencing era. We will outline new directions for conducting cancer systems biology by considering that genome sequencing technology can be used for dissecting, quantifying, and genetically characterizing clones from tumors. Topics discussed in Part 1 of this review include computational quantification of tumor subpopulations; clone-based network modeling; cancer hallmark-based networks and their high-order rewiring principles; and the principles of cell survival networks of fast-growing clones. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
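One common building block for computationally quantifying tumor subpopulations is converting a mutation's variant allele frequency (VAF) into the fraction of cancer cells that carry it. The Python sketch below uses the simplest textbook case and is an assumption, not the review's method: a heterozygous mutation in a diploid, copy-number-neutral region, where the expected VAF equals purity × CCF / 2. Real clone inference also models copy number and clusters mutations by their estimated fractions.

```python
def cancer_cell_fraction(vaf, purity):
    """Estimate the cancer cell fraction (CCF) carrying a mutation.

    Assumes a heterozygous mutation in a diploid, copy-number-neutral
    region, so that expected VAF = purity * CCF / 2. Estimates above 1
    (e.g. from sampling noise) are capped at 1.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return min(1.0, 2 * vaf / purity)


# Hypothetical sample with 60% tumor purity:
print(cancer_cell_fraction(0.30, 0.6))  # clonal mutation: about 1.0
print(cancer_cell_fraction(0.15, 0.6))  # subclonal: about 0.5 of cancer cells
```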
Kersting, Anna R.; Bornberg-Bauer, Erich; Moore, Andrew D.; Grath, Sonja
2012-01-01
Plant genomes are generally very large, mostly paleopolyploid, and have numerous gene duplicates and complex genomic features such as repeats and transposable elements. Many of these features have been hypothesized to enable plants, which cannot easily escape environmental challenges, to rapidly adapt. Another mechanism, which has recently been well described as a major facilitator of rapid adaptation in bacteria, animals, and fungi but not yet in plants, is the modular rearrangement of protein-coding genes. Owing to the high precision of profile-based methods, rearrangements can be well captured at the protein level by characterizing the emergence, loss, and rearrangement of protein domains, which are the structural, functional, and evolutionary building blocks of proteins. Here, we study the dynamics of domain rearrangements and explore their adaptive benefit in 27 plant and 3 algal genomes. We use a phylogenomic approach with which we can explain the formation of 88% of all arrangements by single-step events, such as fusion, fission, and terminal loss of domains. We find that many domains are lost along every lineage, but at least 500 domains are novel, that is, they are unique to green plants and emerged relatively recently. These novel domains duplicate and rearrange more readily within their genomes than ancient domains and are disproportionately involved in stress response and developmental innovations. Novel domains more often affect regulatory proteins and show a higher degree of structural disorder than ancient domains. Whereas a relatively large and well-conserved core set of single-domain proteins exists, long multi-domain arrangements tend to be species-specific. We find that duplicated genes are more often involved in rearrangements. Although fission events typically affect metabolic proteins, fusion events often create new signaling proteins essential for environmental sensing. Taken together, the high volatility of single domains and complex arrangements in plant genomes demonstrates the importance of modularity for the environmental adaptability of plants. PMID:22250127
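The three single-step events named above can be illustrated by treating a domain arrangement as an ordered tuple of domain names (N- to C-terminus). The Python sketch below uses deliberately simplified rules that are assumptions, not the authors' phylogenomic pipeline: fusion if the new arrangement is two ancestral arrangements joined end to end; fission if it is a terminal part of a longer ancestral arrangement whose remainder also survives; terminal loss if that remainder is not observed. The domain names are hypothetical examples.

```python
from itertools import product


def classify_event(new, ancestors, descendants):
    """Classify how arrangement `new` could arise in one step (simplified rules).

    new: tuple of domain names, N- to C-terminus.
    ancestors: set of arrangements present before the event.
    descendants: set of arrangements present after it (includes `new`).
    Returns "fusion", "fission", "terminal loss", or None if no single step fits.
    """
    # Fusion: two ancestral arrangements joined end to end.
    for a, b in product(ancestors, repeat=2):
        if a + b == new:
            return "fusion"
    for anc in ancestors:
        if len(anc) <= len(new):
            continue
        # `new` must be the N-terminal or C-terminal part of a longer ancestor.
        if anc[:len(new)] == new:
            rest = anc[len(new):]
        elif anc[-len(new):] == new:
            rest = anc[:-len(new)]
        else:
            continue
        # Fission keeps both parts; terminal loss discards the remainder.
        return "fission" if rest in descendants else "terminal loss"
    return None


print(classify_event(("PK", "SH2"), {("PK",), ("SH2",)}, {("PK", "SH2")}))  # fusion
print(classify_event(("SH2", "SH3"), {("PK", "SH2", "SH3")}, {("PK",), ("SH2", "SH3")}))  # fission
```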
Ecophysiology of Freshwater Verrucomicrobia Inferred from Metagenome-Assembled Genomes
He, Shaomei; Stevens, Sarah L. R.; Chan, Leong-Keat; Bertilsson, Stefan; Glavina del Rio, Tijana; Tringe, Susannah G.; Malmstrom, Rex R.
2017-01-01
ABSTRACT Microbes are critical in carbon and nutrient cycling in freshwater ecosystems. Members of the Verrucomicrobia are ubiquitous in such systems, and yet their roles and ecophysiology are not well understood. In this study, we recovered 19 Verrucomicrobia draft genomes by sequencing 184 time-series metagenomes from a eutrophic lake and a humic bog that differ in carbon source and nutrient availabilities. These genomes span four of the seven previously defined Verrucomicrobia subdivisions and greatly expand knowledge of the genomic diversity of freshwater Verrucomicrobia. Genome analysis revealed their potential role as (poly)saccharide degraders in freshwater, uncovered interesting genomic features for this lifestyle, and suggested their adaptation to nutrient availabilities in their environments. Verrucomicrobia populations differ significantly between the two lakes in glycoside hydrolase gene abundance and functional profiles, reflecting the autochthonous and terrestrially derived allochthonous carbon sources of the two ecosystems, respectively. Interestingly, a number of genomes recovered from the bog contained gene clusters that potentially encode a novel porin-multiheme cytochrome c complex and might be involved in extracellular electron transfer in the anoxic humus-rich environment. Notably, most epilimnion genomes have large numbers of so-called “Planctomycete-specific” cytochrome c-encoding genes, which exhibited distribution patterns nearly opposite to those seen with glycoside hydrolase genes, probably associated with the different levels of environmental oxygen availability and carbohydrate complexity between lakes/layers. Overall, the recovered genomes represent a major step toward understanding the role, ecophysiology, and distribution of Verrucomicrobia in freshwater. IMPORTANCE Freshwater Verrucomicrobia spp. are cosmopolitan in lakes and rivers, and yet their roles and ecophysiology are not well understood, as cultured freshwater Verrucomicrobia spp. are restricted to one subdivision of this phylum. Here, we greatly expanded the known genomic diversity of this freshwater lineage by recovering 19 Verrucomicrobia draft genomes from 184 metagenomes collected from a eutrophic lake and a humic bog across multiple years. Most of these genomes represent the first freshwater representatives of several Verrucomicrobia subdivisions. Genomic analysis revealed Verrucomicrobia to be potential (poly)saccharide degraders and suggested their adaptation to carbon sources of different origins in the two contrasting ecosystems. We identified putative extracellular electron transfer genes and so-called “Planctomycete-specific” cytochrome c-encoding genes and identified their distinct distribution patterns between the lakes/layers. Overall, our analysis greatly advances the understanding of the function, ecophysiology, and distribution of freshwater Verrucomicrobia, while highlighting their potential role in freshwater carbon cycling. PMID:28959738
Loong, Herbert H; Raymond, Victoria M; Shiotsu, Yukimasa; Chua, Daniel T T; Teo, Peter M L; Yung, Tony; Skrzypczak, Stan; Lanman, Richard B; Mok, Tony S K
2018-05-07
Genomic profiling of cell-free circulating tumor DNA (ctDNA) is a potential alternative to repeat invasive biopsy in patients with advanced cancer. We report the first real-world cohort of comprehensive genomic assessments of patients with non-small-cell lung cancer (NSCLC) in a Chinese population. We performed a retrospective analysis of patients with advanced or metastatic NSCLC whose physician requested ctDNA-based genomic profiling using the Guardant360 platform from January 2016 to June 2017. Guardant360 detects all 4 major types of genomic alterations (point mutations, insertion-deletion alterations, fusions, and amplifications) in 73 genes. Genomic profiling was performed in 76 patients from Hong Kong during the 18-month study period (median age, 59.5 years; 41 men and 35 women). The histologic types included adenocarcinoma (n = 10), NSCLC, not otherwise specified (n = 58), and squamous cell carcinoma (n = 8). In the combined group of adenocarcinoma and NSCLC, not otherwise specified, 62 of the 68 patients (91%) had variants identified (range, 1-12; median, 3), of whom 26 (42%) had ≥ 1 of the 7 National Comprehensive Cancer Network-recommended lung adenocarcinoma genomic targets. Concurrent driver and resistance mutations were identified in 6 of 13 patients with EGFR driver mutations and in 3 of 5 patients with EML4-ALK fusions. All 8 patients with squamous cell carcinoma had variants identified (range, 1-20; median, 6), including FGFR1 amplification and ERBB2 (HER2) amplification. PIK3CA amplification occurred either alone or in combination with FGFR1 or ERBB2 (HER2) amplification. Genomic profiling using ctDNA analysis detected alterations in most patients with advanced-stage NSCLC, identifying targetable aberrations and resistance mechanisms. This approach has demonstrated its feasibility in Asia. Copyright © 2018 Elsevier Inc. All rights reserved.
Annotation and Classification of CRISPR-Cas Systems
Makarova, Kira S.; Koonin, Eugene V.
2018-01-01
The CRISPR-Cas system (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) is a prokaryotic adaptive immune system that is represented in most archaea and many bacteria. Among the currently known prokaryotic defense systems, the CRISPR-Cas genomic loci show unprecedented complexity and diversity. A classification of CRISPR-Cas variants that captures their evolutionary relationships to the maximum possible extent is essential for the comparative genomic and functional characterization of this theoretically and practically important system of adaptive immunity. To this end, a multipronged approach has been developed that combines phylogenetic analysis of the conserved Cas proteins with comparison of gene repertoires and arrangements in CRISPR-Cas loci. This approach led to the current classification of CRISPR-Cas systems into three distinct types and ten subtypes, for each of which signature genes have been identified. Comparative genomic analysis of the CRISPR-Cas systems in new archaeal and bacterial genomes performed in the 3 years since the development of this classification makes it clear that new types and subtypes of CRISPR-Cas need to be introduced. Moreover, this classification system captures only part of the complexity of CRISPR-Cas organization and evolution, due to the intrinsic modularity and evolutionary mobility of these immunity systems, resulting in numerous recombinant variants. In addition, most cas genes evolve rapidly, complicating the family assignment for many Cas proteins and the use of family profiles for the recognition of CRISPR-Cas subtype signatures. Further progress in the comparative analysis of CRISPR-Cas systems requires integration of the most sensitive sequence comparison tools, protein structure comparison, and refined approaches for comparison of gene neighborhoods. PMID:25981466
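In the three-type classification described above, each type is recognized by a signature gene: cas3 for type I, cas9 for type II, and cas10 for type III. The Python sketch below assigns types to a locus from its annotated gene names. It is a toy: real assignment relies on sensitive profile searches rather than name matching, the input gene lists are hypothetical, and subtypes and later-defined types are not handled. Returning a list lets a recombinant locus carrying more than one signature report all of its matching types.

```python
# Signature genes of the three CRISPR-Cas types in the classification above.
SIGNATURES = {"cas3": "I", "cas9": "II", "cas10": "III"}


def crispr_cas_types(locus_genes):
    """Return the sorted CRISPR-Cas types whose signature gene occurs in a locus.

    locus_genes: iterable of gene names from an upstream annotation step
    (assumed input); matching is case-insensitive.
    """
    genes = {g.lower() for g in locus_genes}
    return sorted(t for sig, t in SIGNATURES.items() if sig in genes)


print(crispr_cas_types(["cas1", "cas2", "cas9", "csn2"]))  # ['II']
print(crispr_cas_types(["Cas3", "cas10"]))  # ['I', 'III'], a possible hybrid locus
```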
Annotation and Classification of CRISPR-Cas Systems.
Makarova, Kira S; Koonin, Eugene V
2015-01-01
The CRISPR-Cas system (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) is a prokaryotic adaptive immune system that is represented in most archaea and many bacteria. Among the currently known prokaryotic defense systems, the CRISPR-Cas genomic loci show unprecedented complexity and diversity. A classification of CRISPR-Cas variants that captures their evolutionary relationships to the maximum possible extent is essential for the comparative genomic and functional characterization of this theoretically and practically important system of adaptive immunity. To this end, a multipronged approach has been developed that combines phylogenetic analysis of the conserved Cas proteins with comparison of gene repertoires and arrangements in CRISPR-Cas loci. This approach led to the current classification of CRISPR-Cas systems into three distinct types and ten subtypes, for each of which signature genes have been identified. Comparative genomic analysis of the CRISPR-Cas systems in new archaeal and bacterial genomes performed in the 3 years since the development of this classification makes it clear that new types and subtypes of CRISPR-Cas need to be introduced. Moreover, this classification system captures only part of the complexity of CRISPR-Cas organization and evolution, due to the intrinsic modularity and evolutionary mobility of these immunity systems, resulting in numerous recombinant variants. In addition, most cas genes evolve rapidly, complicating the family assignment for many Cas proteins and the use of family profiles for the recognition of CRISPR-Cas subtype signatures. Further progress in the comparative analysis of CRISPR-Cas systems requires integration of the most sensitive sequence comparison tools, protein structure comparison, and refined approaches for comparison of gene neighborhoods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ortega, Davi R.; Zhulin, Igor B.; Punta, Marco
Escherichia coli and Salmonella enterica are models for many experiments in molecular biology, including chemotaxis, and most of the results obtained with one organism have been generalized to the other. While most components of the chemotaxis pathway are strongly conserved between the two species, Salmonella genomes contain some chemoreceptors and an additional protein, CheV, that are not found in E. coli. The role of CheV was examined in the distantly related species Bacillus subtilis and Helicobacter pylori, but its role in bacterial chemotaxis is still not well understood. We tested the hypothesis that in enterobacteria CheV functions as an additional adaptor linking the CheA kinase to certain types of chemoreceptors that cannot be effectively accommodated by the universal adaptor CheW. Phylogenetic profiling, genomic context, and comparative protein sequence analyses suggested that CheV interacts with specific domains of CheA and with chemoreceptors from an orthologous group exemplified by the Salmonella McpC protein. Structural consideration of the conservation patterns suggests that CheV and CheW share the same binding spot on the chemoreceptor structure, but have some affinity bias towards chemoreceptors from different orthologous groups. Finally, published experimental results and data newly obtained via comparative genomics support the idea that CheV functions as a "phosphate sink", possibly to offset the over-stimulation of the kinase by certain types of chemoreceptors. Altogether, our results strongly suggest that CheV is an additional adaptor for accommodating specific chemoreceptors within the chemotaxis signaling complex.
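Phylogenetic profiling, one of the analyses cited above, compares the patterns of presence and absence of gene families across genomes; families that tend to co-occur are candidates for functional interaction. The Python sketch below scores profile similarity with the Jaccard index, which is one simple choice among several and is assumed here rather than taken from the study. The genome names and gene contents are hypothetical.

```python
def profile_similarity(genomes, family_a, family_b):
    """Jaccard similarity of the phylogenetic profiles of two gene families.

    genomes: dict mapping genome name -> set of gene families present.
    Returns |genomes with both| / |genomes with either| (0.0 if neither occurs).
    """
    has_a = {g for g, fams in genomes.items() if family_a in fams}
    has_b = {g for g, fams in genomes.items() if family_b in fams}
    union = has_a | has_b
    return len(has_a & has_b) / len(union) if union else 0.0


# Hypothetical genomes: cheV co-occurs with an McpC-like receptor more tightly
# than the universal adaptor cheW does.
genomes = {
    "g1": {"cheV", "mcpC", "cheW"},
    "g2": {"cheW"},
    "g3": {"cheV", "mcpC", "cheW"},
    "g4": {"cheV", "cheW"},
}
print(profile_similarity(genomes, "cheV", "mcpC"))  # 2 of 3 genomes
print(profile_similarity(genomes, "cheW", "mcpC"))  # 2 of 4 genomes
```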
Jiang, Jinjin; Wang, Yue; Zhu, Bao; Fang, Tingting; Fang, Yujie; Wang, Youping
2015-01-27
Brassica includes many successfully cultivated crop species of polyploid origin, arising either from ancestral genome triplication or from hybridization between two diploid progenitors, and displaying complex repetitive sequences and transposons. The triangle of U, which consists of three diploids and three amphidiploids, is well suited to the analysis of complicated genomes after polyploidization. Next-generation sequencing enables the transcriptome profiling of polyploids on a global scale. We examined the gene expression patterns of three diploids (Brassica rapa, B. nigra, and B. oleracea) and three amphidiploids (B. napus, B. juncea, and B. carinata) via digital gene expression analysis. The libraries each yielded between 5.7 and 6.1 million raw reads, and the clean tags of each library were mapped to 18,547-21,995 genes of the B. rapa genome. The unambiguously tag-mapped genes in the libraries were compared. Moreover, differentially expressed genes (DEGs) were examined among the diploids as well as between diploids and amphidiploids. Gene Ontology analysis was performed to functionally categorize these DEGs into different classes. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis assigned these DEGs to approximately 120 pathways, among which the metabolic pathway, biosynthesis of secondary metabolites, and the peroxisomal pathway were enriched. The non-additive genes in Brassica amphidiploids were analyzed, and the results indicated that orthologous genes in polyploids are frequently expressed in a non-additive pattern. Methyltransferase genes showed differential expression patterns among Brassica species. Our results provide an understanding of the transcriptome complexity of natural Brassica species. The gene expression changes in diploids and allopolyploids may help elucidate the morphological and physiological differences among Brassica species.
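Non-additive expression in an allopolyploid is conventionally judged against the mid-parent value (MPV), the mean expression of the two diploid progenitors (for example, B. rapa and B. oleracea for B. napus): additive genes match the MPV, non-additive genes deviate from it. The Python sketch below flags deviation with a simple fold-change cutoff. The two-fold threshold, the pseudocount, and the normalized expression values are assumptions, and a real analysis would use a statistical test on replicate libraries.

```python
import math


def is_nonadditive(polyploid, parent1, parent2, min_log2_fc=1.0, pseudocount=1.0):
    """Flag a gene whose allopolyploid expression departs from the mid-parent value.

    Expression values are assumed to be library-size-normalized counts.
    A gene is called non-additive when |log2((polyploid + pc) / (MPV + pc))|
    is at least `min_log2_fc` (default: two-fold).
    """
    mpv = (parent1 + parent2) / 2  # mid-parent value
    log2_fc = math.log2((polyploid + pseudocount) / (mpv + pseudocount))
    return abs(log2_fc) >= min_log2_fc


# Hypothetical normalized counts (polyploid, diploid parent 1, diploid parent 2).
print(is_nonadditive(100, 100, 100))  # False: matches the MPV
print(is_nonadditive(50, 0, 100))     # False: equals the MPV of 50
print(is_nonadditive(300, 100, 100))  # True: about 3-fold above the MPV
```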
Ortega, Davi R.; Zhulin, Igor B.; Punta, Marco
2016-02-04
Escherichia coli and Salmonella enterica are models for many experiments in molecular biology, including chemotaxis, and most of the results obtained with one organism have been generalized to the other. While most components of the chemotaxis pathway are strongly conserved between the two species, Salmonella genomes contain some chemoreceptors and an additional protein, CheV, that are not found in E. coli. The role of CheV has been examined in the distantly related species Bacillus subtilis and Helicobacter pylori, but its role in bacterial chemotaxis is still not well understood. We tested the hypothesis that in enterobacteria CheV functions as an additional adaptor linking the CheA kinase to certain types of chemoreceptors that cannot be effectively accommodated by the universal adaptor CheW. Phylogenetic profiling, genomic context and comparative protein sequence analyses suggested that CheV interacts with specific domains of CheA and chemoreceptors from an orthologous group exemplified by the Salmonella McpC protein. Structural consideration of the conservation patterns suggests that CheV and CheW share the same binding spot on the chemoreceptor structure, but have some affinity bias towards chemoreceptors from different orthologous groups. Finally, published experimental results and data newly obtained via comparative genomics support the idea that CheV functions as a "phosphate sink", possibly to offset over-stimulation of the kinase by certain types of chemoreceptors. Altogether, our results strongly suggest that CheV is an additional adaptor for accommodating specific chemoreceptors within the chemotaxis signaling complex.
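Phylogenetic profiling rests on a simple idea: genes that are consistently present or absent together across genomes are candidates for functional interaction. A minimal sketch, assuming presence/absence data per genome and Jaccard similarity as the comparison score (other scores, such as mutual information, are also used); the genome contents below are invented for illustration and are not the study's data.

```python
def phylogenetic_profiles(genomes, genes):
    """Build a 0/1 presence vector per gene across genomes (sorted by name)."""
    names = sorted(genomes)
    return {g: tuple(int(g in genomes[n]) for n in names) for g in genes}


def jaccard(a, b):
    """Jaccard similarity of two 0/1 profiles; 0.0 if both are all zeros."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Invented toy genomes: which gene families each genome contains
genomes = {
    "genome_1": {"cheA", "cheW", "cheV", "mcpC"},
    "genome_2": {"cheA", "cheW"},
    "genome_3": {"cheA", "cheW", "cheV", "mcpC"},
    "genome_4": {"cheA", "cheW", "tsr"},
}
profiles = phylogenetic_profiles(genomes, ["cheV", "mcpC", "tsr"])
print(jaccard(profiles["cheV"], profiles["mcpC"]))  # 1.0: always co-occur
print(jaccard(profiles["cheV"], profiles["tsr"]))   # 0.0: never co-occur
```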
Soh, Jung; Gordon, Paul MK; Taschuk, Morgan L; Dong, Anguo; Ah-Seng, Andrew C; Turinsky, Andrei L; Sensen, Christoph W
2008-01-01
Background The Bluejay genome browser has been developed over several years to address the challenges posed by the ever-increasing number of data types as well as the increasing volume of data in genome research. Beginning with a browser capable of rendering views of XML-based genomic information and providing scalable vector graphics output, we have now completed version 1.0 of the system with many additional features. Our development efforts were guided by our observation that biologists who use both gene expression profiling and comparative genomics gain functional insights above and beyond those provided by traditional per-gene analyses. Results Bluejay 1.0 is a genome viewer integrating genome annotation with: (i) gene expression information; and (ii) comparative analysis with an unlimited number of other genomes in the same view. This allows the biologist to see a gene not just in the context of its genome, but also in the context of its regulation and its evolution. Bluejay now has rich provision for personalization by users: (i) numerous display customization features; (ii) the availability of waypoints for marking multiple points of interest on a genome and subsequently utilizing them; and (iii) the ability to take user relevance feedback on annotated genes or textual items to offer personalized recommendations. Bluejay 1.0 also embeds the Seahawk browser for the Moby protocol, enabling users to seamlessly invoke hundreds of Web Services on genomic data of interest without any hard-coding. Conclusion Bluejay offers a unique set of customizable genome-browsing features, with the goal of allowing biologists to quickly focus on, analyze, compare, and retrieve related information on the parts of the genomic data they are most interested in. We expect these capabilities of Bluejay to benefit the many biologists who want to answer complex questions using the information available from completely sequenced genomes. PMID:18940007
Janssens, A Cecile J W; Gwinn, Marta; Bradley, Linda A; Oostra, Ben A; van Duijn, Cornelia M; Khoury, Muin J
2008-03-01
Predictive genomic profiling used to produce personalized nutrition and other lifestyle health recommendations is currently offered directly to consumers. By examining previous meta-analyses and HuGE reviews, we assessed the scientific evidence supporting the purported gene-disease associations for genes included in genomic profiles offered online. We identified seven companies that offer predictive genomic profiling. We searched PubMed for meta-analyses and HuGE reviews of studies of gene-disease associations published from 2000 through June 2007 in which the genotypes of people with a disease were compared with those of a healthy or general-population control group. The seven companies tested at least 69 different polymorphisms in 56 genes. Of the 56 genes tested, 24 (43%) were not reviewed in meta-analyses. For the remaining 32 genes, we found 260 meta-analyses that examined 160 unique polymorphism-disease associations, of which only 60 (38%) were found to be statistically significant. Even the 60 significant associations, which involved 29 different polymorphisms and 28 different diseases, were generally modest, with synthetic odds ratios ranging from 0.54 to 0.88 for protective variants and from 1.04 to 3.2 for risk variants. Furthermore, genes in cardiogenomic profiles were more frequently associated with noncardiovascular diseases than with cardiovascular diseases, and though two of the five genes of the osteogenomic profiles did show significant associations with disease, the associations were not with bone diseases. There is insufficient scientific evidence to conclude that genomic profiles are useful in measuring genetic risk for common diseases or in developing personalized diet and lifestyle recommendations for disease prevention.
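The gene-disease associations above are summarized as odds ratios from case-control comparisons. As a worked illustration only, the sketch below computes a single study's odds ratio and Woolf (log-based) 95% confidence interval from a 2x2 genotype table; the counts are hypothetical, and a meta-analysis would further pool such estimates across studies.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with Woolf's log-based confidence interval.

    a = cases carrying the risk genotype, b = cases without it,
    c = controls carrying it,             d = controls without it.
    Returns (OR, lower, upper); z=1.96 gives a 95% interval.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts: 120/500 cases and 100/500 controls carry the genotype
or_, lo, hi = odds_ratio(120, 380, 100, 400)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

For this made-up table the odds ratio is about 1.26 and its interval includes 1, the kind of modest effect size the review describes.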
Ortiz, Michael V; Kobos, Rachel; Walsh, Michael; Slotkin, Emily K; Roberts, Stephen; Berger, Michael F; Hameed, Meera; Solit, David; Ladanyi, Marc; Shukla, Neerav; Kentsis, Alex
2016-08-01
Pediatric oncologists have begun to leverage tumor genetic profiling to match patients with targeted therapies. At the Memorial Sloan Kettering Cancer Center (MSKCC), we developed the Pediatric Molecular Tumor Board (PMTB) to track, integrate, and interpret clinical genomic profiling and potential targeted therapeutic recommendations. This retrospective case series includes all patients reviewed by the MSKCC PMTB from July 2014 to June 2015. Cases were submitted by treating oncologists, and potential treatment recommendations were based upon the modified guidelines of the Oxford Centre for Evidence-Based Medicine. There were 41 presentations of 39 individual patients during the study period. Gliomas, acute myeloid leukemia, and neuroblastoma were the most commonly reviewed cases. Thirty-nine (87%) of the 45 molecular sequencing profiles utilized hybrid-capture targeted genome sequencing. In 30 (73%) of the 41 presentations, the PMTB provided therapeutic recommendations, of which 19 (46%) were implemented. Twenty-one (70%) of the recommendations involved targeted therapies. Three (14%) targeted therapy recommendations had published evidence to support the proposed recommendations (evidence levels 1-2), eight (36%) recommendations had preclinical evidence (level 3), and 11 (50%) recommendations were based upon hypothetical biological rationales (level 4). The MSKCC PMTB enabled a clinically relevant interpretation of genomic profiling. Effective use of clinical genomics is anticipated to require new and improved tools to ascribe pathogenic significance and therapeutic actionability. The development of specific rule-driven clinical protocols will be needed for the incorporation and evaluation of genomic and molecular profiling in interventional prospective clinical trials. © 2016 Wiley Periodicals, Inc.
Gao, Hui; Zhao, Chunyan
2018-01-01
Chromatin immunoprecipitation (ChIP) has become the most effective and widely used tool to study the interactions between specific proteins, or modified forms of proteins, and a genomic DNA region. Combined with genome-wide profiling technologies such as microarray hybridization (ChIP-on-chip) or massively parallel sequencing (ChIP-seq), ChIP can provide a genome-wide map of in vivo protein-DNA interactions in various organisms. Here, we describe a ChIP-on-chip protocol that uses tiling microarrays to obtain a genome-wide profile of ChIPed DNA.
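Tiling-array ChIP-on-chip data are commonly summarized as the log2 ratio of ChIP to input signal at each probe, then scanned for runs of enrichment. The sketch below is a generic illustration under assumed parameters (a 5-probe moving average and a log2 cutoff of 1), not the protocol's own analysis; real pipelines also normalize arrays and estimate significance.

```python
import numpy as np


def enriched_regions(chip, input_, window=5, cutoff=1.0):
    """Find runs of probes whose smoothed log2(ChIP/input) is >= cutoff.

    chip, input_: probe intensities ordered by genomic position.
    Returns (start, end) probe-index pairs, end exclusive.
    """
    ratio = np.log2(np.asarray(chip, float) / np.asarray(input_, float))
    kernel = np.ones(window) / window
    smooth = np.convolve(ratio, kernel, mode="same")  # centered moving average
    above = smooth >= cutoff
    regions, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                      # a run of enrichment begins
        elif not flag and start is not None:
            regions.append((start, i))     # the run ends before probe i
            start = None
    if start is not None:
        regions.append((start, len(above)))
    return regions


# Hypothetical array: 20 probes, 8-fold ChIP enrichment at probes 8-12
chip = [100.0] * 20
chip[8:13] = [800.0] * 5
print(enriched_regions(chip, [100.0] * 20))
```

Smoothing before thresholding favors contiguous enrichment spanning several probes over isolated high probes, at the cost of slightly widening the called region.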
Genome-wide alterations of the DNA replication program during tumor progression
NASA Astrophysics Data System (ADS)
Arneodo, A.; Goldar, A.; Argoul, F.; Hyrien, O.; Audit, B.
2016-08-01
Oncogenic stress is a major driving force in the early stages of cancer development. Recent experimental findings reveal that, in precancerous lesions and cancers, activated oncogenes may induce stalling and dissociation of DNA replication forks, resulting in DNA damage. Replication timing is emerging as an important epigenetic feature that recapitulates several genomic, epigenetic and functional specificities of even closely related cell types. There is increasing evidence that chromosome rearrangements, the hallmark of many cancer genomes, are intimately associated with the DNA replication program and that epigenetic replication timing changes often precede chromosomal rearrangements. The recent development of a novel methodology to map replication fork polarity using deep sequencing of Okazaki fragments has provided new and complementary genome-wide replication profiling data. We review the results of a wavelet-based multi-scale analysis of genomic and epigenetic data, including replication profiles along human chromosomes. These results provide new insight into the spatio-temporal replication program and its dynamics during differentiation. Here our goal is to bring to cancer research the experimental protocols and computational methodologies for replication program profiling, as well as the modeling of the spatio-temporal replication program. To illustrate our purpose, we report very preliminary results obtained for chronic myelogenous leukemia, the archetypal model of cancer. Finally, we discuss promising perspectives on using genome-wide DNA replication profiling as a novel, efficient tool for cancer diagnosis, prognosis and personalized treatment.
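Okazaki fragment sequencing is usually summarized as replication fork directionality (RFD) per genomic bin: the normalized difference between reads attributed to rightward- and leftward-moving forks. A minimal sketch, assuming reads have already been binned and assigned to a fork direction (which strand corresponds to which direction is a convention not stated here); the counts are hypothetical.

```python
import numpy as np


def replication_fork_directionality(right_reads, left_reads):
    """Per-bin RFD = (R - L) / (R + L).

    +1: all forks in the bin move rightward; -1: all leftward; 0: balanced.
    Bins with no reads are returned as NaN.
    """
    r = np.asarray(right_reads, float)
    l = np.asarray(left_reads, float)
    total = r + l
    rfd = np.full_like(total, np.nan)
    mask = total > 0
    rfd[mask] = (r[mask] - l[mask]) / total[mask]
    return rfd


# Hypothetical read counts in four consecutive bins
print(replication_fork_directionality([10, 5, 0, 0], [0, 5, 10, 0]))
```

Read along a chromosome, a shift from negative to positive RFD marks a zone where forks diverge (initiation), and a shift from positive to negative marks where they converge (termination).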
An efficient approach to BAC based assembly of complex genomes.
Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David
2016-01-01
There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly relied on whole genome shotgun assembly, which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired-read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost-effective and scalable, with applications for complete genome sequencing in large and complex genomes.
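Reads from indexed BAC pools are separated computationally by each read's index sequence before assembly. The sketch below shows generic index demultiplexing with a one-mismatch tolerance; the pool names, index sequences, and tolerance are hypothetical and do not describe the authors' custom pipeline.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_to_pool(read_index, pool_indices, max_mismatch=1):
    """Return the pool whose index is within max_mismatch of read_index.

    Returns None when no pool matches or when more than one pool matches,
    so ambiguous reads are discarded rather than misassigned.
    """
    hits = [
        pool
        for pool, idx in pool_indices.items()
        if len(idx) == len(read_index) and hamming(idx, read_index) <= max_mismatch
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical index sequences for two BAC pools
pools = {"pool_A": "ACGTAC", "pool_B": "TTGCAA"}
for idx in ["ACGTAC", "ACGTAG", "GGGGGG"]:
    print(idx, assign_to_pool(idx, pools))
```

Index sets are normally designed so that any two indices differ at several positions, which is what makes a one-mismatch tolerance safe.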
Knowledge-driven genomic interactions: an application in ovarian cancer.
Kim, Dokyoon; Li, Ruowang; Dudek, Scott M; Frase, Alex T; Pendergrass, Sarah A; Ritchie, Marylyn D
2014-01-01
Effective prediction of cancer clinical outcomes, and understanding of the mechanisms of various types of cancer, have been pursued using molecular data such as gene expression profiles, an approach that promises better diagnostics and support for further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, single-gene expression outcome prediction is limited for cancer evaluation, since genes do not act in isolation but rather interact with other genes in complex signaling or regulatory networks. In addition, since pathways are likely to cooperate, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner. Thus, we propose a novel approach for identifying knowledge-driven genomic interactions and applying it to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). In order to demonstrate the utility of the proposed approach, an ovarian cancer dataset from The Cancer Genome Atlas (TCGA) was used for predicting clinical stage as a pilot project. We identified knowledge-driven genomic interactions associated with cancer stage not only from single knowledge bases, such as sources of pathway-pathway interaction, but also across different sets of knowledge bases, such as pathway-protein family interactions, by integrating different types of information. Notably, an integration model from different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models built with gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge.
The success of the pilot study we have presented herein will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding the underlying tumorigenesis and progression in ovarian cancer through a global view of interactions within and between different biological knowledge sources has the potential to provide more effective screening strategies and therapeutic targets for many types of cancer.
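Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, so a classifier cannot score well simply by favoring the larger class. A minimal sketch for a binary outcome (for example, hypothetically, early versus late stage); the labels below are made up.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical labels: 4 positives, 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(balanced_accuracy(y_true, y_pred))  # (3/4 + 4/6) / 2
```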
Dong, Yanhan; Li, Ying; Zhao, Miaomiao; Jing, Maofeng; Liu, Xinyu; Liu, Muxing; Guo, Xianxian; Zhang, Xing; Chen, Yue; Liu, Yongfeng; Liu, Yanhong; Ye, Wenwu; Zhang, Haifeng; Wang, Yuanchao; Zheng, Xiaobo; Wang, Ping; Zhang, Zhengguang
2015-01-01
Genome dynamics of pathogenic organisms are driven by pathogen and host co-evolution, in which pathogen genomes are shaped to overcome stresses imposed by hosts with various genetic backgrounds through the generation of a variety of isolates. This same principle applies to the rice blast pathogen Magnaporthe oryzae and its rice host; however, genetic variations among different isolates of M. oryzae remain largely unknown, particularly at the genome and transcriptome levels. Here, we applied genomic and transcriptomic analytical tools to investigate M. oryzae isolate 98-06, which is the most aggressive in infection of susceptible rice cultivars. A total of 1.4 Mb of genomic sequence unique to isolate 98-06 was found in comparison to reference strain 70-15. Genome-wide expression profiling revealed the presence of two critical expression patterns of M. oryzae based on 64 known pathogenicity-related (PaR) genes. In addition, 134 candidate effectors with various segregation patterns were identified. Five tested proteins could suppress BAX-mediated programmed cell death in Nicotiana benthamiana leaves. Characterization of isolate-specific effector candidates Iug6 and Iug9 and PaR candidate Iug18 revealed that they have a role in fungal propagation and pathogenicity. Moreover, Iug6 and Iug9 are located exclusively in the biotrophic interfacial complex (BIC), and their overexpression leads to suppression of defense-related gene expression in rice, suggesting that they might participate in biotrophy by inhibiting the salicylic acid (SA) and ethylene (ET) pathways within the host. Thus, our studies identify novel effector and PaR proteins involved in pathogenicity of the highly aggressive M. oryzae field isolate 98-06, and reveal molecular and genomic dynamics in the evolution of M. oryzae and rice host interactions. PMID:25837042
Smith, Barbara A.; Imamura, Hideo; Sanders, Mandy; Svobodova, Milena; Volf, Petr; Berriman, Matthew; Cotton, James A.; Smith, Deborah F.
2014-01-01
Although asexual reproduction via clonal propagation has been proposed as the principal reproductive mechanism across parasitic protozoa of the Leishmania genus, sexual recombination has long been suspected, based on hybrid marker profiles detected in field isolates from different geographical locations. The recent experimental demonstration of a sexual cycle in Leishmania within sand flies has confirmed the occurrence of hybridisation, but knowledge of the parasite life cycle in the wild remains limited. Here, we use whole genome sequencing to investigate the frequency of sexual reproduction in Leishmania, by sequencing the genomes of 11 Leishmania infantum isolates from sand flies and 1 patient isolate from a focus of cutaneous leishmaniasis in the Çukurova province of southeast Turkey. This is the first genome-wide examination of a vector-isolated population of Leishmania parasites. A genome-wide pattern of patchy heterozygosity and SNP density was observed both within individual strains and across the whole group. Comparisons with other Leishmania donovani complex genome sequences suggest that these isolates are derived from a single cross of two diverse strains with subsequent recombination within the population. This interpretation is supported by a statistical model of the genomic variability for each strain compared to the L. infantum reference genome strain, as well as by genome-wide scans for recombination within the population. Further analysis of these heterozygous blocks indicates that the two parents were phylogenetically distinct. Patterns of linkage disequilibrium indicate that this population reproduced primarily clonally following the original hybridisation event, but that some recombination also occurred. This observation allowed us to estimate the relative rates of sexual and asexual reproduction within this population, to our knowledge the first quantitative estimate of these events during the Leishmania life cycle. PMID:24453988
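Linkage disequilibrium between two SNPs is often summarized as r^2. When genotypes are unphased, a common approximation is the squared Pearson correlation of allele dosages (0, 1, 2) across isolates. The sketch assumes diploid dosage coding and is not the study's exact LD method; the genotypes are hypothetical.

```python
import numpy as np


def genotype_r2(g1, g2):
    """Squared Pearson correlation of allele dosages at two SNPs.

    g1, g2: alternate-allele counts (0, 1 or 2) per isolate, same order.
    Approximates LD r^2 for unphased data; a monomorphic SNP yields NaN.
    """
    g1 = np.asarray(g1, float)
    g2 = np.asarray(g2, float)
    r = np.corrcoef(g1, g2)[0, 1]
    return r * r


# Hypothetical dosages for six isolates at three SNPs
snp_a = [0, 1, 2, 0, 1, 2]
snp_b = [2, 1, 0, 2, 1, 0]   # perfectly (negatively) associated with snp_a
snp_c = [0, 0, 1, 1, 2, 2]
print(genotype_r2(snp_a, snp_b), genotype_r2(snp_a, snp_c))
```

High r^2 between distant SNPs, as expected under predominantly clonal reproduction, decays only when recombination shuffles alleles among lineages.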
Angstadt, Andrea Y; Motsinger-Reif, Alison; Thomas, Rachael; Kisseberth, William C; Guillermo Couto, C; Duval, Dawn L; Nielsen, Dahlia M; Modiano, Jaime F; Breen, Matthew
2011-11-01
Osteosarcoma (OS) is the most commonly diagnosed malignant bone tumor in humans and dogs, characterized in both species by extremely complex karyotypes exhibiting high frequencies of genomic imbalance. Evaluation of genomic signatures in human OS using array comparative genomic hybridization (aCGH) has assisted in uncovering genetic mechanisms that result in disease phenotype. Previous low-resolution (10-20 Mb) aCGH analysis of canine OS identified a wide range of recurrent DNA copy number aberrations, indicating extensive genomic instability. In this study, we profiled 123 canine OS tumors by 1 Mb-resolution aCGH to generate a dataset for direct comparison with current data for human OS, concluding that several high-frequency aberrations in canine and human OS are orthologous. To ensure complete coverage of gene annotation, we identified the human RefSeq genes that map to these orthologous aberrant dog regions and found several candidate genes warranting evaluation for OS involvement. Specifically, subsequent FISH and qRT-PCR analysis of RUNX2, TUSC3, and PTEN indicated that expression levels correlated with genomic copy number status, showcasing RUNX2 as an OS-associated gene and TUSC3 as a possible tumor suppressor candidate. Together these data demonstrate the ability of genomic comparative oncology to identify genetic aberrations which may be important for OS progression. Large-scale screening of genomic imbalance in canine OS further validates the use of the dog as a suitable model for human cancers, supporting the idea that dysregulation discovered in canine cancers will provide an avenue for complementary study in human counterparts. Copyright © 2011 Wiley-Liss, Inc.
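Recurrent copy number aberrations are typically reported as the fraction of tumors in which a region is gained or lost. A minimal sketch, assuming segmented aCGH log2 ratios arranged as tumors by regions; the +/-0.2 calling thresholds and the values are placeholders, not the study's criteria or data.

```python
import numpy as np


def aberration_frequencies(log2_ratios, gain=0.2, loss=-0.2):
    """Per-region fraction of tumors called gained and lost.

    log2_ratios: (tumors x regions) matrix of aCGH log2 ratios.
    Thresholds are placeholders, not a published calling rule.
    """
    x = np.asarray(log2_ratios, float)
    return (x >= gain).mean(axis=0), (x <= loss).mean(axis=0)


# Hypothetical data: 4 tumors x 3 genomic regions
data = [
    [0.50, -0.5, 0.0],
    [0.30, 0.0, 0.0],
    [0.00, -0.4, 0.1],
    [0.25, 0.0, 0.0],
]
gained, lost = aberration_frequencies(data)
print(gained, lost)
```

Regions gained or lost in a large fraction of tumors are the natural candidates for cross-species comparison with orthologous human regions.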
Global Quantitative Modeling of Chromatin Factor Interactions
Zhou, Jian; Troyanskaya, Olga G.
2014-01-01
Chromatin is the driver of gene regulation, yet understanding the molecular interactions underlying chromatin factor combinatorial patterns (or the “chromatin codes”) remains a fundamental challenge in chromatin biology. Here we developed a global modeling framework that leverages chromatin profiling data to produce a systems-level view of the macromolecular complex of chromatin. Our model utilizes maximum entropy modeling with regularization-based structure learning to statistically dissect dependencies between chromatin factors and produce an accurate probability distribution of chromatin code. Our unsupervised quantitative model, trained on genome-wide chromatin profiles of 73 histone marks and chromatin proteins from modENCODE, enabled various data-driven inferences about chromatin profiles and interactions. We provided a highly accurate predictor of chromatin factor pairwise interactions validated by known experimental evidence, and for the first time enabled higher-order interaction prediction. Our predictions can thus help guide future experimental studies. The model can also serve as an inference engine for predicting unknown chromatin profiles: we demonstrated that with this approach we can leverage data from well-characterized cell types to help understand less-studied cell types or conditions. PMID:24675896
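To make the idea of a maximum entropy model over chromatin factors concrete: a pairwise model over binary factor states (for example, mark present or absent at a genomic bin) has the form P(x) proportional to exp(sum_i h_i x_i + sum_{i<j} J_ij x_i x_j), and fitting it matches the observed single-factor frequencies and pairwise co-occurrences. The sketch below fits such a model by exact enumeration, with an L1 penalty on the couplings J applied through proximal gradient steps. It is only feasible for a handful of factors (2^n states) and is an illustrative stand-in, not the authors' structure-learning procedure; the learning rate, penalty, iteration count, and toy data are assumptions.

```python
import itertools

import numpy as np


def fit_pairwise_maxent(X, l1=0.01, lr=0.2, n_iter=2000):
    """Fit P(x) ~ exp(h.x + sum_{i<j} J_ij x_i x_j) to binary data X.

    X: (samples, n) array of 0/1 values. Returns (h, J) with J symmetric
    and zero on the diagonal. Uses exact enumeration of all 2^n states.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    m1_data = X.mean(axis=0)              # observed factor frequencies
    m2_data = X.T @ X / X.shape[0]        # observed pairwise co-occurrences
    h = np.zeros(n)
    J = np.zeros((n, n))
    for _ in range(n_iter):
        # 0.5 * x^T J x equals sum_{i<j} J_ij x_i x_j for symmetric, zero-diagonal J
        logits = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        m1_model = p @ states
        m2_model = states.T @ (states * p[:, None])
        # gradient ascent on the log-likelihood (moment matching)
        h += lr * (m1_data - m1_model)
        J += lr * (m2_data - m2_model)
        # proximal step for the L1 penalty: soft-threshold the couplings
        J = np.sign(J) * np.maximum(np.abs(J) - lr * l1, 0.0)
        np.fill_diagonal(J, 0.0)
    return h, J


# Toy data: factor 1 copies factor 0 90% of the time; factors 2, 3 independent
rng = np.random.default_rng(0)
n_bins = 3000
x0 = rng.integers(0, 2, n_bins)
x1 = np.where(rng.random(n_bins) < 0.1, 1 - x0, x0)
X = np.column_stack([x0, x1, rng.integers(0, 2, n_bins), rng.integers(0, 2, n_bins)])
h, J = fit_pairwise_maxent(X)
print(np.round(J, 2))
```

A large positive J between two factors indicates a direct dependency that persists after accounting for all other factors, which is what distinguishes this approach from simple pairwise co-occurrence.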
2011-01-01
Background Green plant leaves have always fascinated biologists as hosts for photosynthesis and providers of basic energy to many food webs. Today, comprehensive databases of gene expression data enable us to apply increasingly advanced computational methods for reverse-engineering the regulatory network of leaves, and to begin to understand the gene interactions underlying complex emergent properties related to stress response and development. These new systems biology methods are now also being applied to organisms such as Populus, a woody perennial tree, in order to understand the specific characteristics of these species. Results We present a systems biology model of the regulatory network of Populus leaves. The network is reverse-engineered from promoter information and expression profiles of leaf-specific genes measured over a large set of conditions related to stress and development. The network model incorporates interactions between regulators, such as synergistic and competitive relationships, by evaluating increasingly complex regulatory mechanisms, and is therefore able to identify new regulators of leaf development not found by traditional genomics methods based on pair-wise expression similarity. The approach is shown to explain available gene function information and to provide robust prediction of expression levels in new data. We also use the predictive capability of the model to identify condition-specific regulation as well as conserved regulation between Populus and Arabidopsis. Conclusions We outline a computationally inferred model of the regulatory network of Populus leaves, and show how treating genes as interacting, rather than individual, entities identifies new regulators compared to traditional genomics analysis.
Although systems biology models should be used with care considering the complexity of regulatory programs and the limitations of current genomics data, methods describing interactions can provide hypotheses about the underlying cause of emergent properties and are needed if we are to identify target genes other than those constituting the "low hanging fruit" of genomic analysis. PMID:21232107
Stojnic, Robert; Fu, Audrey Qiuyan; Adryan, Boris
2012-01-01
Inferring the combinatorial regulatory code of transcription factors (TFs) from genome-wide TF binding profiles is challenging. A major reason is that TF binding profiles significantly overlap and are therefore highly correlated. Clustered occurrence of multiple TFs at genomic sites may arise from chromatin accessibility and local cooperation between TFs, or binding sites may simply appear clustered if the profiles are generated from diverse cell populations. Overlaps in TF binding profiles may also result from measurements taken at closely related time intervals. It is thus of great interest to distinguish TFs that directly regulate gene expression from those that are indirectly associated with gene expression. Graphical models, in particular Bayesian networks, provide a powerful mathematical framework to infer different types of dependencies. However, existing methods do not perform well when the features (here: TF binding profiles) are highly correlated, when their association with the biological outcome is weak, and when the sample size is small. Here, we develop novel computational methods, the Neighbourhood Consistent PC (NCPC) algorithms, which deal with these scenarios much more effectively than existing methods do. We further present a novel graphical representation, the Direct Dependence Graph (DDGraph), to better display the complex interactions among variables. NCPC and DDGraph can also be applied to other problems involving highly correlated biological features. Both methods are implemented in the R package ddgraph, available as part of Bioconductor (http://bioconductor.org/packages/2.11/bioc/html/ddgraph.html). Applied to real data, our method identified TFs that specify different classes of cis-regulatory modules (CRMs) in Drosophila mesoderm differentiation.
Our analysis also found depletion of the early transcription factor Twist binding at the CRMs regulating expression in visceral and somatic muscle cells at later stages, which suggests a CRM-specific repression mechanism that so far has not been characterised for this class of mesodermal CRMs. PMID:23144600
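As the name indicates, NCPC builds on the PC algorithm, which removes an edge between two variables when they are conditionally independent given some set of other variables. As a generic illustration of that building block (not the NCPC method and not the ddgraph package), the sketch below tests conditional independence with a partial correlation and Fisher's z transform. The Gaussian assumption is ours; the study's binary TF binding profiles would call for a different test.

```python
from math import erf, log, sqrt

import numpy as np


def partial_corr(x, y, Z):
    """Correlation of x and y after regressing both on the columns in Z."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(x)
    A = np.column_stack([np.ones(n)] + [np.asarray(z, float) for z in Z])
    rx = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]  # residuals of x given Z
    ry = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]  # residuals of y given Z
    return np.corrcoef(rx, ry)[0, 1]


def ci_test_pvalue(x, y, Z):
    """Two-sided Fisher-z p-value for zero partial correlation of x, y given Z."""
    n = len(x)
    r = partial_corr(x, y, Z)
    r = min(max(r, -0.999999), 0.999999)  # avoid infinite z when |r| = 1
    z = 0.5 * log((1 + r) / (1 - r)) * sqrt(n - len(Z) - 3)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))


# Simulated example: x and y are correlated only through a common driver z
rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = z + 0.3 * rng.normal(size=500)
y = z + 0.3 * rng.normal(size=500)
print(ci_test_pvalue(x, y, []))   # marginal test: strongly dependent
print(partial_corr(x, y, [z]))    # near zero once z is conditioned on
```

In a PC-style search, the x-y edge would be removed once the conditioning set {z} renders them independent, which is how direct associations are separated from indirect ones.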
Mojib, Nazia; Amad, Maan; Thimma, Manjula; Aldanondo, Naroa; Kumaran, Mande; Irigoien, Xabier
2014-06-01
The tropical oligotrophic oceanic areas are characterized by high water transparency and annual solar radiation. Under these conditions, a large number of phylogenetically diverse mesozooplankton species living in the surface waters (neuston) are found to be blue pigmented. In the present study, we focused on understanding the metabolic and genetic basis of the observed functional equivalence of the blue phenotype between blue-pigmented organisms from the phylum Arthropoda, subclass Copepoda (Acartia fossae), and the phylum Chordata, class Appendicularia (Oikopleura dioica), in the Red Sea. Previous studies have shown that carotenoid-protein complexes are responsible for blue coloration in crustaceans. Therefore, we performed carotenoid metabolic profiling using both targeted and nontargeted (high-resolution mass spectrometry) approaches in four different blue-pigmented genera of copepods and one blue-pigmented species of appendicularia. Astaxanthin was found to be the principal carotenoid in all the species. The pathway analysis showed that all the species can synthesize astaxanthin from β-carotene, ingested from dietary sources, via 3-hydroxyechinenone, canthaxanthin, zeaxanthin, adonirubin or adonixanthin. Further, using the de novo assembled transcriptome of blue A. fossae (subclass Copepoda), we identified highly expressed homologous β-carotene hydroxylase enzymes and putative carotenoid-binding proteins responsible for astaxanthin formation and the blue phenotype. In blue O. dioica (class Appendicularia), corresponding putative genes were identified from the reference genome. Collectively, our data provide molecular evidence for the bioconversion and accumulation of blue astaxanthin-protein complexes underpinning the observed ecological functional equivalence and adaptive convergence among neustonic mesozooplankton. © 2014 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Genomic survey, expression profile and co-expression network analysis of OsWD40 family in rice
2012-01-01
Background WD40 proteins represent a large family in eukaryotes and are involved in a broad spectrum of crucial functions. Systematic characterization and co-expression analysis of OsWD40 genes enable us to understand the networks of the WD40 proteins and their biological processes and gene functions in rice. Results In this study, we identify and analyze 200 potential OsWD40 genes in rice, describing their gene structures, genome localizations, and the evolutionary relationship of each member. Expression profiles covering the whole life cycle in rice have revealed that OsWD40 transcripts accumulated differentially during vegetative and reproductive development and were preferentially up- or down-regulated in different tissues. Under phytohormone treatments, 25 OsWD40 genes were differentially expressed in response to one or more of the phytohormones NAA, KT, or GA3 in rice seedlings. We also used a combined analysis of expression correlation and Gene Ontology annotation to infer the biological role of the OsWD40 genes in rice. The results suggested that OsWD40 genes may perform their diverse functions through complex networks, and these predictions can help in understanding their biological pathways. The analysis also revealed that OsWD40 genes might interact with each other to take part in metabolic pathways, suggesting a more complex feedback network. Conclusions All of these analyses suggest that the functions of OsWD40 genes are diverse, and they provide useful references for selecting candidate genes for further functional studies. PMID:22429805
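A co-expression network of the kind used above can be sketched by correlating expression profiles across samples and keeping gene pairs above a cutoff. The gene names, expression values, and |r| >= 0.8 cutoff below are hypothetical; the study additionally combined expression correlation with Gene Ontology annotation.

```python
import numpy as np


def coexpression_edges(expr, genes, r_cutoff=0.8):
    """Return (gene_i, gene_j, r) for pairs with |Pearson r| >= r_cutoff.

    expr: (genes x samples) expression matrix, rows in the order of `genes`.
    """
    r = np.corrcoef(np.asarray(expr, float))  # gene-by-gene correlation matrix
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(r[i, j]) >= r_cutoff:
                edges.append((genes[i], genes[j], round(float(r[i, j]), 3)))
    return edges


# Hypothetical expression of three genes over five samples
expr = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [3, 1, 4, 1, 5],
]
print(coexpression_edges(expr, ["WD_a", "WD_b", "WD_c"]))
```

Using the absolute value keeps strongly anti-correlated pairs as edges too; a signed network would instead keep only positive correlations.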
BioVLAB-MMIA: a cloud environment for microRNA and mRNA integrated analysis (MMIA) on Amazon EC2.
Lee, Hyungro; Yang, Youngik; Chae, Heejoon; Nam, Seungyoon; Choi, Donghoon; Tangchaisin, Patanachai; Herath, Chathura; Marru, Suresh; Nephew, Kenneth P; Kim, Sun
2012-09-01
MicroRNAs, by regulating the expression of hundreds of target genes, play critical roles in developmental biology and the etiology of numerous diseases, including cancer. As a vast amount of microRNA expression profile data is now publicly available, the integration of microRNA expression data sets with gene expression profiles is a key problem in life science research. However, genome-wide microRNA-mRNA (gene) integration currently requires sophisticated, high-end informatics tools and significant expertise in bioinformatics and computer science to carry out the complex integration analysis. In addition, increased computing infrastructure capabilities are essential in order to accommodate large data sets. In this study, we have extended the BioVLAB cloud workbench to develop an environment for the integrated analysis of microRNA and mRNA expression data, named BioVLAB-MMIA. The workbench facilitates computations on the Amazon EC2 and S3 resources orchestrated by the XBaya Workflow Suite. The advantages of BioVLAB-MMIA over the web-based MMIA system include: 1) it is readily expanded as new computational tools become available; 2) it is easily modified by re-configuring graphic icons in the workflow; 3) on-demand cloud computing resources can be used on an "as needed" basis; 4) distributed orchestration supports complex and long-running workflows asynchronously. We believe that BioVLAB-MMIA will be an easy-to-use computing environment for researchers who plan to perform genome-wide microRNA-mRNA (gene) integrated analysis tasks.
Chemical genomic profiling via barcode sequencing to predict compound mode of action
Piotrowski, Jeff S.; Simpkins, Scott W.; Li, Sheena C.; Deshpande, Raamesh; McIlwain, Sean; Ong, Irene; Myers, Chad L.; Boone, Charlie; Andersen, Raymond J.
2015-01-01
Chemical genomics is an unbiased, whole-cell approach to characterizing novel compounds to determine mode of action and cellular target. Our version of this technique is built upon barcoded deletion mutants of Saccharomyces cerevisiae and has been adapted to a high-throughput methodology using next-generation sequencing. Here we describe the steps to generate a chemical genomic profile from a compound of interest, and how to use this information to predict molecular mechanism and targets of bioactive compounds. PMID:25618354
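In a pooled, barcoded deletion screen, a strain whose barcode becomes relatively rarer in the compound-treated pool than in the control pool is sensitive to the compound. The sketch below scores each strain as a log2 ratio of barcode shares; it is a generic illustration under assumed inputs (strain names, read counts, and a 0.5 pseudocount are hypothetical), not the authors' published pipeline.

```python
import math

def barcode_fitness_scores(control_counts, treated_counts, pseudocount=0.5):
    """Log2 ratio of each strain's barcode share in treated vs. control reads.

    control_counts / treated_counts: dict strain -> barcode read count.
    Negative scores mean the strain was depleted under the compound.
    The pseudocount (an assumption) avoids taking the log of zero.
    """
    strains = set(control_counts) | set(treated_counts)
    control_total = sum(control_counts.values())
    treated_total = sum(treated_counts.values())
    scores = {}
    for strain in strains:
        control_share = (control_counts.get(strain, 0) + pseudocount) / control_total
        treated_share = (treated_counts.get(strain, 0) + pseudocount) / treated_total
        scores[strain] = math.log2(treated_share / control_share)
    return scores

def most_sensitive(scores, n=5):
    """Strains ordered from most to least depleted under treatment."""
    return sorted(scores, key=scores.get)[:n]

scores = barcode_fitness_scores({"deltaA": 100, "deltaB": 100},
                                {"deltaA": 10, "deltaB": 100})
print(most_sensitive(scores, n=1))  # ['deltaA']
```

Normalizing by each pool's total read count keeps differences in sequencing depth between the two pools from masquerading as fitness effects.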
Staying alive in adversity: transcriptome dynamics in the stress-resistant dauer larva.
Holt, Suzan J
2006-10-01
In response to food depletion and overcrowding, the soil nematode Caenorhabditis elegans can arrest development and form an alternate third larval stage called the dauer. Though nonfeeding, the dauer larva is long lived and stress resistant. Metabolic and transcription rates are lowered but the transcriptome of the dauer is complex. In this study, distribution analysis of transcript profiles generated by Serial Analysis of Gene Expression (SAGE) in dauer larvae and in mixed developmental stages is presented. An inverse relationship was observed between frequency and abundance/copy number of SAGE tag types (transcripts) in both profiles. In the dauer profile, a relatively greater proportion of highly abundant transcripts was counterbalanced by a smaller fraction of low to moderately abundant transcripts. Comparisons of abundant tag counts between the two profiles revealed relative enrichment in the dauer profile of transcripts with predicted or known involvement in ribosome biogenesis and protein synthesis, membrane transport, and immune responses. Translation-coupled mRNA decay is proposed as part of an immune-like stress response in the dauer larva. An influence of genomic region on transcript level may reflect the coordination of transcription and mRNA turnover.
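The "inverse relationship between frequency and abundance" describes the tag-frequency distribution of a SAGE library: many tag types are seen only once or a few times, while few tag types reach high copy numbers. A minimal sketch of how that distribution is tabulated, using made-up tag labels:

```python
from collections import Counter

def tag_frequency_distribution(tags):
    """Map copy number -> number of distinct tag types observed at that copy number."""
    copies_per_type = Counter(tags)                            # tag type -> copy number
    types_per_copy_number = Counter(copies_per_type.values())  # copy number -> type count
    return dict(sorted(types_per_copy_number.items()))

# Hypothetical library: one abundant tag, one moderate tag, three singletons.
tags = ["tagA"] * 5 + ["tagB", "tagC", "tagD"] + ["tagE"] * 2
print(tag_frequency_distribution(tags))  # {1: 3, 2: 1, 5: 1}
```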
mRNA expression profiling of laser microbeam microdissected cells from slender embryonic structures.
Scheidl, Stefan J; Nilsson, Sven; Kalén, Mattias; Hellström, Mats; Takemoto, Minoru; Håkansson, Joakim; Lindahl, Per
2002-03-01
Microarray hybridization has rapidly evolved as an important tool for genomic studies and studies of gene regulation at the transcriptome level. Expression profiles from homogeneous samples such as yeast and mammalian cell cultures are currently extending our understanding of biology, whereas analyses of multicellular organisms are more difficult because of tissue complexity. The combination of laser microdissection, RNA amplification, and microarray hybridization has the potential to provide expression profiles from selected populations of cells in vivo. In this article, we present and evaluate an experimental procedure for global gene expression analysis of slender embryonic structures using laser microbeam microdissection and laser pressure catapulting. As a proof of principle, expression profiles from 1000 cells in the mouse embryonic (E9.5) dorsal aorta were generated and compared with profiles for captured mesenchymal cells located one cell diameter further away from the aortic lumen. A number of genes were overexpressed in the aorta, including 11 previously known markers for blood vessels. Among the blood vessel markers were endoglin, tie-2, PDGFB, and integrin-beta1, which are important regulators of blood vessel formation. This demonstrates that microarray analysis of laser microbeam microdissected cells is sufficiently sensitive for identifying genes with regulative functions.
Arenas, Miguel
2015-04-01
NGS technologies enable fast and inexpensive generation of genomic data. Nevertheless, ancestral genome inference is not straightforward, owing to complex evolutionary processes acting on this material, such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Recently, models of genome evolution that accommodate such complex genomic events have begun to emerge. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
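Approximate Bayesian computation replaces an intractable likelihood with simulation: draw a parameter from its prior, simulate data, and keep the parameter only if the simulated summary statistic matches the observed one. The rejection sampler below is a deliberately tiny sketch; the "simulator" (a Poisson count of rearrangement events along a branch), the uniform prior on [0, 5], and all function names are hypothetical stand-ins for a real genome-rearrangement simulator.

```python
import math
import random

def simulate_rearrangements(rate, branch_length, rng):
    """Toy simulator: number of rearrangement events ~ Poisson(rate * branch_length).
    Uses Knuth's multiplication method, adequate for small means."""
    threshold = math.exp(-rate * branch_length)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def abc_rejection(observed, branch_length, n_draws=20000, tolerance=0,
                  prior_max=5.0, seed=1):
    """Rejection ABC: accepted rates approximate the posterior of the event rate."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, prior_max)                   # draw from the prior
        simulated = simulate_rearrangements(rate, branch_length, rng)
        if abs(simulated - observed) <= tolerance:           # compare summary statistics
            accepted.append(rate)
    return accepted

accepted = abc_rejection(observed=6, branch_length=2.0)
posterior_mean = sum(accepted) / len(accepted)
print(len(accepted), round(posterior_mean, 2))
```

With a single discrete summary statistic an exact match (tolerance 0) is feasible; realistic analyses use several summary statistics and a nonzero distance tolerance, which is where most of ABC's pitfalls arise.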
Li, LiQi; Jothi, Raja; Cui, Kairong; Lee, Jan Y; Cohen, Tsadok; Gorivodsky, Marat; Tzchori, Itai; Zhao, Yangu; Hayes, Sandra M; Bresnick, Emery H; Zhao, Keji; Westphal, Heiner; Love, Paul E
2011-02-01
The nuclear adaptor Ldb1 functions as a core component of multiprotein transcription complexes that regulate differentiation in diverse cell types. In the hematopoietic lineage, Ldb1 forms a complex with the non-DNA-binding adaptor Lmo2 and the transcription factors E2A, Scl and GATA-1 (or GATA-2). Here we demonstrate a critical and continuous requirement for Ldb1 in the maintenance of both fetal and adult mouse hematopoietic stem cells (HSCs). Deletion of Ldb1 in hematopoietic progenitors resulted in the downregulation of many transcripts required for HSC maintenance. Genome-wide profiling by chromatin immunoprecipitation followed by sequencing (ChIP-Seq) identified Ldb1 complex-binding sites at highly conserved regions in the promoters of genes involved in HSC maintenance. Our results identify a central role for Ldb1 in regulating the transcriptional program responsible for the maintenance of HSCs.
Determining protein function and interaction from genome analysis
Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.
2004-08-03
A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are termed Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
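The phylogenetic-profile idea can be made concrete: encode each protein as a presence/absence vector over the same ordered list of genomes, then treat proteins with identical (or nearly identical) vectors as candidate functional partners. The sketch below assumes homolog presence has already been determined; protein and genome names are hypothetical, and the Hamming-distance cutoff is an illustrative choice.

```python
def phylogenetic_profiles(presence, genomes):
    """presence: dict protein -> set of genomes containing a homolog.
    Returns dict protein -> tuple of 1 (present) / 0 (absent), ordered as `genomes`."""
    return {protein: tuple(1 if g in found else 0 for g in genomes)
            for protein, found in presence.items()}

def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

def linked_pairs(profiles, max_distance=0):
    """Protein pairs whose profiles differ in at most `max_distance` genomes."""
    names = sorted(profiles)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if hamming(profiles[a], profiles[b]) <= max_distance]

genomes = ["genome1", "genome2", "genome3", "genome4"]
presence = {"protA": {"genome1", "genome3"},
            "protB": {"genome1", "genome3"},
            "protC": {"genome2", "genome4"}}
profiles = phylogenetic_profiles(presence, genomes)
print(linked_pairs(profiles))  # [('protA', 'protB')]
```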
Network-assisted target identification for haploinsufficiency and homozygous profiling screens
Wang, Sheng
2017-01-01
Chemical genomic screens have recently emerged as a systematic approach to drug discovery on a genome-wide scale. Drug target identification and elucidation of the mechanism of action (MoA) of hits from these noisy high-throughput screens remain difficult. Here, we present GIT (Genetic Interaction Network-Assisted Target Identification), a network analysis method for drug target identification in haploinsufficiency profiling (HIP) and homozygous profiling (HOP) screens. In addition to the drug-induced phenotypic fitness defect of a gene's deletion, GIT incorporates the fitness defects of the gene's neighbors in the genetic interaction network. On three genome-scale yeast chemical genomic screens, GIT substantially outperforms previous scoring methods for target identification in both HIP and HOP assays. Finally, we show that by combining HIP and HOP assays, GIT further boosts target identification and reveals potential drug mechanisms of action. PMID:28574983
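The core idea, that a gene's own fitness defect becomes more convincing when its genetic-interaction neighbors are also affected, can be sketched as a neighbor-smoothed score. The formula below is a hypothetical illustration (own defect plus `alpha` times a weighted mean of neighbor defects); it is not GIT's published scoring function, and the gene names, weights, and `alpha` are assumptions.

```python
def network_assisted_scores(fitness_defect, gi_network, alpha=0.5):
    """Hypothetical neighbor-aware target score (not GIT's published formula).

    fitness_defect: dict gene -> drug-induced fitness defect of its deletion
                    (larger = more sensitive).
    gi_network:     dict gene -> dict neighbor -> genetic interaction weight.
    score(g) = FD(g) + alpha * (weighted mean of the neighbors' FD).
    """
    scores = {}
    for gene, fd in fitness_defect.items():
        neighbors = gi_network.get(gene, {})
        total_weight = sum(abs(w) for w in neighbors.values())
        if total_weight > 0:
            neighbor_term = sum(w * fitness_defect.get(n, 0.0)
                                for n, w in neighbors.items()) / total_weight
        else:
            neighbor_term = 0.0  # isolated gene: fall back to its own defect
        scores[gene] = fd + alpha * neighbor_term
    return scores

fd = {"geneT": 2.0, "geneA": 2.0, "geneB": 1.5, "geneC": 0.0}
network = {"geneT": {"geneB": 1.0}, "geneA": {"geneC": 1.0}}
scores = network_assisted_scores(fd, network)
print(max(scores, key=scores.get))  # geneT
```

Here geneT and geneA have the same direct fitness defect, but only geneT's neighbor is also sensitive, so the network term breaks the tie in geneT's favor.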
Leveraging non-targeted metabolite profiling via statistical genomics
USDA-ARS's Scientific Manuscript database
One of the challenges of systems biology is to integrate multiple sources of data in order to build a cohesive view of the system of study. Here we describe the mass spectrometry based profiling of maize kernels, a model system for genomic studies and a cornerstone of the agroeconomy. Using a networ...
USDA-ARS's Scientific Manuscript database
Using whole-genome bisulfite sequencing (WGBS), we profiled the DNA methylome of cattle sperm through comparison with three bovine somatic tissues (mammary gland, brain and blood). Large differences between them were observed in the methylation patterns of global CpGs, pericentromeric satellites, p...
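In WGBS, unmethylated cytosines are converted by bisulfite and read as T, while methylated cytosines remain C, so the methylation level of a CpG is commonly summarized as the fraction of covering reads that report C. A minimal sketch of that summary, assuming per-CpG read counts are already available; the 5-read coverage cutoff and the positions are hypothetical.

```python
def methylation_levels(calls, min_coverage=5):
    """calls: dict CpG position -> (methylated_reads, unmethylated_reads).
    Returns position -> methylated / (methylated + unmethylated),
    skipping CpGs covered by fewer than `min_coverage` reads."""
    levels = {}
    for pos, (methylated, unmethylated) in calls.items():
        depth = methylated + unmethylated
        if depth >= min_coverage:
            levels[pos] = methylated / depth
    return levels

def mean_methylation(levels):
    """Average methylation level over the retained CpGs."""
    return sum(levels.values()) / len(levels) if levels else float("nan")

levels = methylation_levels({100: (9, 1), 150: (2, 8), 200: (1, 1)})
print(sorted(levels))                     # [100, 150]
print(round(mean_methylation(levels), 2))  # 0.55
```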
Reconstruction of Tissue-Specific Metabolic Networks Using CORDA
Schultz, André; Qutub, Amina A.
2016-01-01
Human metabolism involves thousands of reactions and metabolites. To interpret this complexity, computational modeling has become an essential experimental tool. One of the most popular techniques for studying human metabolism as a whole is genome-scale modeling. A key challenge in applying genome-scale modeling is identifying critical metabolic reactions across diverse human tissues. Here we introduce a novel algorithm called Cost Optimization Reaction Dependency Assessment (CORDA) to build genome-scale models in a tissue-specific manner. CORDA is more computationally efficient, shows better agreement with experimental data, and displays better model functionality and capacity when compared to previous algorithms. CORDA also returns reaction associations that can greatly assist any manual curation performed after the automated reconstruction process. Using CORDA, we developed a library of 76 healthy and 20 cancer tissue-specific reconstructions. These reconstructions identified which metabolic pathways are shared across diverse human tissues. Moreover, we identified changes in reactions and pathways that are differentially included and present different capacity profiles in cancer compared to healthy tissues, including up-regulation of folate metabolism, down-regulation of thiamine metabolism, and tight regulation of oxidative phosphorylation. PMID:26942765
Vitamin D receptor signaling and its therapeutic implications: Genome-wide and structural view.
Carlberg, Carsten; Molnár, Ferdinand
2015-05-01
Vitamin D3 is one of the few natural compounds that has, via its metabolite 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) and the transcription factor vitamin D receptor (VDR), a direct effect on gene regulation. To apply the therapeutic and disease-preventing potential of 1,25(OH)2D3 and its synthetic analogs efficiently, the key steps in vitamin D signaling need to be understood. These are the different types of molecular interactions with the VDR, such as (i) the complex formation of VDR with genomic DNA, (ii) the interaction of VDR with its partner transcription factors, (iii) the binding of 1,25(OH)2D3 or its synthetic analogs within the ligand-binding pocket of the VDR, and (iv) the resulting conformational change on the surface of the VDR, leading to a change in the protein-protein interaction profile of the receptor with other proteins. This review will present the latest genome-wide insight into vitamin D signaling, and will discuss its therapeutic implications.
Peptide biomarkers used for the selective breeding of a complex polygenic trait in honey bees.
Guarna, M Marta; Hoover, Shelley E; Huxter, Elizabeth; Higo, Heather; Moon, Kyung-Mee; Domanski, Dominik; Bixby, Miriam E F; Melathopoulos, Andony P; Ibrahim, Abdullah; Peirson, Michael; Desai, Suresh; Micholson, Derek; White, Rick; Borchers, Christoph H; Currie, Robert W; Pernal, Stephen F; Foster, Leonard J
2017-08-21
We present a novel way to select for highly polygenic traits. For millennia, humans have used observable phenotypes to selectively breed stronger or more productive livestock and crops. Selection on genotype, using single-nucleotide polymorphisms (SNPs) and genome profiling, is also now applied broadly in livestock breeding programs; however, selection on protein/peptide or mRNA expression markers has not yet been proven useful. Here we demonstrate the utility of protein markers to select for disease-resistant hygienic behavior in the European honey bee (Apis mellifera L.). Robust, mechanistically-linked protein expression markers, by integrating cis- and trans- effects from many genomic loci, may overcome limitations of genomic markers to allow for selection. After three generations of selection, the resulting marker-selected stock outperformed an unselected benchmark stock in terms of hygienic behavior, and had improved survival when challenged with a bacterial disease or a parasitic mite, similar to bees selected using a phenotype-based assessment for this trait. This is the first demonstration of the efficacy of protein markers for industrial selective breeding in any agricultural species, plant or animal.
Kinetic theory approach to modeling of cellular repair mechanisms under genome stress.
Qi, Jinpeng; Ding, Yongsheng; Zhu, Ying; Wu, Yizhi
2011-01-01
Under acute perturbations from the external environment, a normal cell can trigger a cellular self-defense mechanism in response to genome stress. To further investigate the kinetics of the cellular self-repair process at the single-cell level, a model of DNA damage generation and repair under acute ionizing radiation (IR) is proposed using the mathematical framework of the kinetic theory of active particles (KTAP). First, we illustrate the profile of the Cellular Repair System (CRS), constituted by two sub-populations, each of which is made up of active particles with different discrete states. Then, we implement the mathematical framework of the cellular self-repair mechanism and illustrate the dynamic processes of Double Strand Break (DSB) and Repair Protein (RP) generation, DSB-protein complex (DSBC) synthesis, and toxin accumulation. Finally, we roughly analyze the capability of the cellular self-repair mechanism, the cellular activity of transferring DNA damage, and genome stability, especially the different fates of a cell before and after the time threshold of IR perturbation that it can maximally tolerate under different IR perturbation conditions.
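The processes named above (DSB generation during IR, repair-protein binding to form DSB-protein complexes, and complex resolution) can be illustrated with a much simpler mass-action ODE model. This is a swapped-in simplification, not the KTAP framework: it tracks only population averages, omits toxin accumulation, and every rate constant below is a hypothetical value chosen for illustration. It is integrated with forward Euler steps.

```python
def simulate_repair(dose_rate, ir_duration, t_end=50.0, dt=0.01,
                    k_bind=0.1, k_repair=0.5, p_synth=1.0, p_decay=0.1):
    """Mass-action sketch of DSB repair (hypothetical rates, not the KTAP model).

    d: free DSBs, p: free repair protein, c: DSB-protein complexes.
      dd/dt = IR generation - k_bind*d*p
      dp/dt = p_synth - p_decay*p - k_bind*d*p + k_repair*c
      dc/dt = k_bind*d*p - k_repair*c   (repair removes the DSB, frees the protein)
    Returns a list of (t, d, p, c) after each Euler step.
    """
    d, p, c = 0.0, p_synth / p_decay, 0.0  # no damage, protein at steady state
    trajectory = []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        generation = dose_rate if t < ir_duration else 0.0  # IR only for a while
        binding = k_bind * d * p
        repair = k_repair * c
        d += (generation - binding) * dt
        p += (p_synth - p_decay * p - binding + repair) * dt
        c += (binding - repair) * dt
        trajectory.append((t + dt, d, p, c))
    return trajectory

trajectory = simulate_repair(dose_rate=2.0, ir_duration=5.0)
peak_dsb = max(step[1] for step in trajectory)
print(round(peak_dsb, 2), trajectory[-1][1])
```

Lengthening `ir_duration` or raising `dose_rate` relative to `p_synth` depletes free repair protein; that is the kind of tolerance threshold the abstract analyzes in far more detail.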
Li, Dongming; Palanca, Ana Marie S; Won, So Youn; Gao, Lei; Feng, Ying; Vashisht, Ajay A; Liu, Li; Zhao, Yuanyuan; Liu, Xigang; Wu, Xiuyun; Li, Shaofang; Le, Brandon; Kim, Yun Ju; Yang, Guodong; Li, Shengben; Liu, Jinyuan; Wohlschlegel, James A; Guo, Hongwei; Mo, Beixin; Chen, Xuemei; Law, Julie A
2017-01-01
DNA methylation is associated with gene silencing in eukaryotic organisms. Although pathways controlling the establishment, maintenance and removal of DNA methylation are known, relatively little is understood about how DNA methylation influences gene expression. Here we identified a METHYL-CpG-BINDING DOMAIN 7 (MBD7) complex in Arabidopsis thaliana that suppresses the transcriptional silencing of two LUCIFERASE (LUC) reporters via a mechanism that is largely downstream of DNA methylation. Although mutations in components of the MBD7 complex resulted in modest increases in DNA methylation concomitant with decreased LUC expression, we found that these hyper-methylation and gene expression phenotypes can be genetically uncoupled. This finding, along with genome-wide profiling experiments showing minimal changes in DNA methylation upon disruption of the MBD7 complex, places the MBD7 complex amongst a small number of factors acting downstream of DNA methylation. This complex, however, is unique as it functions to suppress, rather than enforce, DNA methylation-mediated gene silencing. DOI: http://dx.doi.org/10.7554/eLife.19893.001 PMID:28452714
Approaches to integrating germline and tumor genomic data in cancer research
Feigelson, Heather Spencer; Goddard, Katrina A.B.; Hollombe, Celine; Tingle, Sharna R.; Gillanders, Elizabeth M.; Mechanic, Leah E.; Nelson, Stefanie A.
2014-01-01
Cancer is characterized by a diversity of genetic and epigenetic alterations occurring in both the germline and somatic (tumor) genomes. Hundreds of germline variants associated with cancer risk have been identified, and large amounts of data identifying mutations in the tumor genome that participate in tumorigenesis have been generated. Increasingly, these two genomes are being explored jointly to better understand how cancer risk alleles contribute to carcinogenesis and whether they influence development of specific tumor types or mutation profiles. To understand how data from germline risk studies and tumor genome profiling are being integrated, we reviewed 160 articles describing research that incorporated data from both genomes, published between January 2009 and December 2012, and summarized the current state of the field. We identified three principal types of research questions being addressed using these data: (i) use of tumor data to determine the putative function of germline risk variants; (ii) identification and analysis of relationships between host genetic background and particular tumor mutations or types; and (iii) use of tumor molecular profiling data to reduce genetic heterogeneity or refine phenotypes for germline association studies. We also found descriptive studies that compared germline and tumor genomic variation in a gene or gene family, and papers describing research methods, data sources, or analytical tools. We identified a large set of tools and data resources that can be used to analyze and integrate data from both genomes. Finally, we discuss opportunities and challenges for cancer research that integrates germline and tumor genomics data. PMID:25115441
Mahajan, Prashant; Kuppermann, Nathan; Suarez, Nicolas; Mejias, Asuncion; Casper, Charlie; Dean, J Michael; Ramilo, Octavio
2015-01-01
To develop the infrastructure and demonstrate the feasibility of conducting microarray-based RNA transcriptional profile analyses for the diagnosis of serious bacterial infections in febrile infants 60 days and younger in a multicenter pediatric emergency research network. We designed a prospective multicenter cohort study with the aim of enrolling more than 4000 febrile infants 60 days and younger. To ensure success of conducting complex genomic studies in emergency department (ED) settings, we established an infrastructure within the Pediatric Emergency Care Applied Research Network, including 21 sites, to evaluate RNA transcriptional profiles in young febrile infants. We developed a comprehensive manual of operations and trained site investigators to obtain and process blood samples for RNA extraction and genomic analyses. We created standard operating procedures for blood sample collection, processing, storage, shipping, and analyses. We planned to prospectively identify, enroll, and collect 1 mL blood samples for genomic analyses from eligible patients to identify logistical issues with study procedures. Finally, we planned to batch blood samples, determine RNA quantity and quality at the central microarray laboratory, and organize data analysis with the Pediatric Emergency Care Applied Research Network data coordinating center. Below we report on the establishment of the infrastructure and its feasibility in the first year, based on the enrollment of a limited number of patients. We successfully established the infrastructure at 21 EDs. Over the first 5 months we enrolled 79% (74 of 94) of eligible febrile infants. We were able to obtain and ship 1 mL of blood from 74% (55 of 74) of enrolled participants, with at least 1 sample per participating ED. The 55 samples were shipped and evaluated at the microarray laboratory, and 95% (52 of 55) of blood samples were of adequate quality and contained sufficient RNA for expression analysis. 
It is possible to create a robust infrastructure to conduct genomic studies in young febrile infants in the context of a multicenter pediatric ED research setting. The sufficient quantity and high quality of RNA obtained suggests that whole blood transcriptional profile analysis for the diagnostic evaluation of young febrile infants can be successfully performed in this setting.
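The reported percentages follow directly from the counts, and chaining the steps gives the overall yield of usable RNA samples per eligible infant. That last figure (52 of 94) is derived here from the reported counts rather than stated in the abstract.

```python
def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number, as in the abstract."""
    return round(100 * numerator / denominator)

funnel = [
    ("eligible infants enrolled", 74, 94),
    ("enrolled infants with 1 mL blood shipped", 55, 74),
    ("shipped samples with adequate RNA", 52, 55),
    ("eligible infants yielding usable RNA (derived)", 52, 94),
]
for label, num, den in funnel:
    print(f"{label}: {num}/{den} = {percent(num, den)}%")
# prints 79%, 74%, 95%, and 55% for the four steps
```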
Desai, Meeta; Efstratiou, Androulla; George, Robert; Stanley, John
1999-01-01
We have used fluorescent amplified-fragment length polymorphism (FAFLP) analysis to subtype clinical isolates of Streptococcus pyogenes serotype M1. Established typing methods define most M1 isolates as members of a clone that has a worldwide distribution and that is strongly associated with invasive diseases. FAFLP analysis simultaneously sampled 90 to 120 loci throughout the M1 genome. Its discriminatory power, precision, and reproducibility were compared with those of other molecular typing methods. Irrespective of disease symptomatology or geographic origin, the majority of the clinical M1 isolates shared a single ribotype, pulsed-field gel electrophoresis macrorestriction profile, and emm1 gene sequence. Nonetheless, among these isolates, FAFLP analysis could differentiate 17 distinct profiles, including seven multi-isolate groups. The FAFLP profiles of M1 isolates reproducibly exhibited between 1 and more than 20 amplified fragment differences. The high discriminatory power of genotyping by FAFLP analysis revealed genetic microheterogeneity and differentiated otherwise “identical” M1 isolates as members of a clone complex. PMID:10325352
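The abstract does not say which measure of discriminatory power was used. A widely used one for typing methods is the Hunter-Gaston (Simpson's) index of diversity, the probability that two isolates drawn at random without replacement are assigned different types. The sketch below applies it to hypothetical type assignments for ten isolates: a method that puts every isolate in one type scores 0, while a method that splits them scores higher.

```python
from collections import Counter

def discriminatory_index(type_assignments):
    """Hunter-Gaston index: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of isolates assigned to type j and N the total."""
    n = len(type_assignments)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_assignments).values()
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))

single_type = ["R1"] * 10                                   # e.g. one shared ribotype
split_types = ["F1"] * 3 + ["F2"] * 2 + ["F3", "F4", "F5", "F6", "F7"]
print(discriminatory_index(single_type))           # 0.0
print(round(discriminatory_index(split_types), 3))  # 0.911
```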
Learning directed acyclic graphs from large-scale genomics data.
Nikolay, Fabio; Pesavento, Marius; Kritikos, George; Typas, Nassos
2017-09-20
In this paper, we consider the problem of learning the genetic interaction map, i.e., the topology of a directed acyclic graph (DAG) of genetic interactions from noisy double-knockout (DK) data. Based on a set of well-established biological interaction models, we detect and classify the interactions between genes. We propose a novel linear integer optimization program called the Genetic-Interactions-Detector (GENIE) to identify the complex biological dependencies among genes and to compute the DAG topology that matches the DK measurements best. Furthermore, we extend the GENIE program by incorporating genetic interaction profile (GI-profile) data to further enhance the detection performance. In addition, we propose a sequential scalability technique for large sets of genes under study, in order to provide statistically significant results for real measurement data. Finally, we show via numeric simulations that the GENIE program and the GI-profile data extended GENIE (GI-GENIE) program clearly outperform the conventional techniques and present real data results for our proposed sequential scalability technique.
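Two ingredients of this problem can be shown in isolation: scoring a double-knockout measurement against a reference interaction model, and checking that a candidate interaction topology is acyclic. The sketch uses the multiplicative model (epsilon = W_ab - W_a * W_b) and Kahn's topological-sort algorithm. It is not the GENIE integer program; the 0.05 neutrality tolerance and gene names are hypothetical.

```python
from collections import defaultdict, deque

def epistasis(w_a, w_b, w_ab):
    """Multiplicative-model score: observed minus expected double-knockout fitness."""
    return w_ab - w_a * w_b

def classify(eps, tol=0.05):
    """Label an interaction score; `tol` is an assumed noise tolerance."""
    if eps < -tol:
        return "aggravating"
    if eps > tol:
        return "alleviating"
    return "neutral"

def is_dag(edges):
    """Kahn's algorithm: True if the directed edges (u, v) contain no cycle."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(nodes)  # every node sorted -> no cycle

print(classify(epistasis(0.8, 0.5, 0.1)))                  # aggravating
print(is_dag([("geneA", "geneB"), ("geneB", "geneC")]))  # True
print(is_dag([("geneA", "geneB"), ("geneB", "geneA")]))  # False
```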
Almeida, Mathieu; Hébert, Agnès; Abraham, Anne-Laure; Rasmussen, Simon; Monnet, Christophe; Pons, Nicolas; Delbès, Céline; Loux, Valentin; Batto, Jean-Michel; Leonard, Pierre; Kennedy, Sean; Ehrlich, Stanislas Dusko; Pop, Mihai; Montel, Marie-Christine; Irlinger, Françoise; Renault, Pierre
2014-12-13
Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high-throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned. We built a reference genome catalog suitable for short-read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products belonging to 137 different species and 67 genera, and succeeded in reconstructing the draft genomes of 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, which were still missing from public databases. To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses was composed of microorganisms newly sequenced in our study. Our study provides data that, combined with publicly available genome references, represent the most expansive catalog to date of cheese-associated bacteria. Using this extended dairy catalog, we revealed the presence in traditional cheese of dominant microorganisms that were not deliberately inoculated, mainly Gram-negative species such as Pseudoalteromonas haloplanktis or Psychrobacter immobilis, that may contribute to the characteristics of cheese produced through traditional methods.
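Once reads have been aligned to a reference catalog, a simple composition estimate divides each genome's read count by its length (larger genomes attract more reads at equal cell abundance) and rescales the results to sum to 1. The sketch below shows only that normalization under assumed inputs; species names, counts, and genome lengths are hypothetical, and real pipelines also handle multi-mapping reads and unmapped reads.

```python
def relative_abundance(read_counts, genome_lengths):
    """Length-normalized relative abundance of each reference genome.

    read_counts:    dict species -> number of reads assigned to its genome.
    genome_lengths: dict species -> genome length in bp.
    Reads per bp are rescaled so the detected species sum to 1.
    """
    density = {sp: n / genome_lengths[sp] for sp, n in read_counts.items() if n > 0}
    total = sum(density.values())
    return {sp: d / total for sp, d in density.items()}

abundance = relative_abundance(
    {"speciesA": 1000, "speciesB": 1000, "speciesC": 0},
    {"speciesA": 2_000_000, "speciesB": 4_000_000, "speciesC": 3_000_000},
)
print({sp: round(f, 3) for sp, f in abundance.items()})
# {'speciesA': 0.667, 'speciesB': 0.333}
```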
Diversity arrays technology: a generic genome profiling technology on open platforms.
Kilian, Andrzej; Wenzl, Peter; Huttner, Eric; Carling, Jason; Xia, Ling; Blois, Hélène; Caig, Vanessa; Heller-Uszynska, Katarzyna; Jaccoud, Damian; Hopper, Colleen; Aschenbrenner-Kilian, Malgorzata; Evers, Margaret; Peng, Kaiman; Cayla, Cyril; Hok, Puthick; Uszynski, Grzegorz
2012-01-01
In the last 20 years, we have observed an exponential growth of DNA sequence data and a similar increase in the volume of DNA polymorphism data generated by numerous molecular marker technologies. Most of the investment, and therefore progress, was concentrated on the human genome and the genomes of selected model species. Diversity Arrays Technology (DArT), developed over a decade ago, was among the first "democratizing" genotyping technologies, as its performance was primarily driven by the level of DNA sequence variation in the species rather than by the level of financial investment. Compared with other high-throughput genotyping technologies, DArT also proved more robust to differences in genome size and ploidy level among the approximately 60 organisms for which it has been developed to date. The success of DArT in a number of organisms, including a wide range of "orphan crops," can be attributed to the simplicity of its underlying concepts: DArT combines genome complexity reduction methods enriching for genic regions with a highly parallel assay readout on a number of "open-access" microarray platforms. The quantitative nature of the assay enabled a number of applications in which allelic frequencies can be estimated from DArT arrays. A typical DArT assay tests tens of thousands of genomic loci for polymorphism, with the final number of markers reported (hundreds to thousands) reflecting the level of DNA sequence variation in the tested loci. Detailed DArT methods, protocols, and a range of their application examples as well as DArT's evolution path are presented.
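Because the DArT readout is quantitative, the signal from a pooled sample can in principle be converted into an allele frequency. The sketch below assumes, purely for illustration, that the hybridization signal scales linearly between a reference sample lacking the hybridizing allele (frequency 0) and one carrying it (frequency 1); the abstract does not describe the actual calibration, and all names and values are hypothetical.

```python
def allele_frequency(signal, signal_absent, signal_present):
    """Estimate the frequency of the hybridizing allele in a pooled sample by
    linear interpolation between two reference signals (assumed linear response),
    clipped to the valid range [0, 1]."""
    if signal_present == signal_absent:
        raise ValueError("reference signals must differ")
    fraction = (signal - signal_absent) / (signal_present - signal_absent)
    return min(1.0, max(0.0, fraction))

print(allele_frequency(0.5, signal_absent=0.25, signal_present=0.75))  # 0.5
print(allele_frequency(1.2, signal_absent=0.25, signal_present=0.75))  # 1.0
```

Clipping guards against measurement noise pushing estimates outside the valid frequency range; a nonlinear calibration curve would replace the interpolation step if the signal saturates.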
ATRX Directs Binding of PRC2 to Xist RNA and Polycomb Targets
Sarma, Kavitha; Cifuentes-Rojas, Catherine; Ergun, Ayla; del Rosario, Amanda; Jeon, Yesu; White, Forest; Sadreyev, Ruslan; Lee, Jeannie T.
2015-01-01
X chromosome inactivation (XCI) depends on the long noncoding RNA Xist and its recruitment of Polycomb Repressive Complex 2 (PRC2). PRC2 is also targeted to other sites throughout the genome to effect transcriptional repression. Using XCI as a model, we apply an unbiased proteomics approach to isolate Xist and PRC2 regulators and identify ATRX. ATRX unexpectedly functions as a high-affinity RNA-binding protein that directly interacts with RepA/Xist RNA to promote loading of PRC2 in vivo. Without ATRX, PRC2 can neither load onto Xist RNA nor spread in cis along the X chromosome. Moreover, epigenomic profiling reveals that genome-wide targeting of PRC2 depends on ATRX, as loss of ATRX leads to spatial redistribution of PRC2 and derepression of Polycomb responsive genes. Thus, ATRX is a required specificity determinant for PRC2 targeting and function. PMID:25417162
Hotspots of aberrant enhancer activity punctuate the colorectal cancer epigenome
Cohen, Andrea J.; Saiakhova, Alina; Corradin, Olivia; Luppino, Jennifer M.; Lovrenert, Katreya; Bartels, Cynthia F.; Morrow, James J.; Mack, Stephen C.; Dhillon, Gursimran; Beard, Lydia; Myeroff, Lois; Kalady, Matthew F.; Willis, Joseph; Bradner, James E.; Keri, Ruth A.; Berger, Nathan A.; Pruett-Miller, Shondra M.; Markowitz, Sanford D.; Scacheri, Peter C.
2017-01-01
In addition to mutations in genes, aberrant enhancer element activity at non-coding regions of the genome is a key driver of tumorigenesis. Here, we perform epigenomic enhancer profiling of a cohort of more than forty genetically diverse human colorectal cancer (CRC) specimens. Using normal colonic crypt epithelium as a comparator, we identify enhancers with recurrently gained or lost activity across CRC specimens. Of the enhancers highly recurrently activated in CRC, most are constituents of super enhancers, are occupied by AP-1 and cohesin complex members, and originate from primed chromatin. Many activate known oncogenes, and CRC growth can be mitigated through pharmacologic inhibition or genome editing of these loci. Nearly half of all GWAS CRC risk loci co-localize to recurrently activated enhancers. These findings indicate that the CRC epigenome is defined by highly recurrent epigenetic alterations at enhancers which activate a common, aberrant transcriptional programme critical for CRC growth and survival. PMID:28169291
Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter
2014-09-24
Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. 
Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
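One reading of a random-set scoring model for network-based prioritization: score a candidate gene by the mean phenotype-relevance score of its interaction partners, and express that mean as a z-score against random gene sets of the same size drawn without replacement from all scored genes. For a universe of N scores with mean mu and population variance sigma^2, a random set of size m has mean mu and variance (sigma^2 / m) * (N - m) / (N - 1). This sketch is not necessarily the authors' exact model; gene names and scores are hypothetical.

```python
import math

def random_set_z(set_scores, all_scores):
    """z-score of the mean of `set_scores` against random sets of the same size
    drawn without replacement from `all_scores` (analytical moments)."""
    n_total, m = len(all_scores), len(set_scores)
    if not 0 < m < n_total:
        raise ValueError("set size must be between 1 and N - 1")
    mu = sum(all_scores) / n_total
    var_pop = sum((x - mu) ** 2 for x in all_scores) / n_total
    var_mean = var_pop / m * (n_total - m) / (n_total - 1)  # finite-population correction
    if var_mean == 0:
        raise ValueError("all scores are identical")
    return (sum(set_scores) / m - mu) / math.sqrt(var_mean)

def prioritize(phenotype_score, ppi):
    """Rank candidate genes by the random-set z-score of their partners' scores."""
    universe = list(phenotype_score.values())
    ranked = {}
    for gene, partners in ppi.items():
        partner_scores = [phenotype_score[p] for p in partners if p in phenotype_score]
        if 0 < len(partner_scores) < len(universe):
            ranked[gene] = random_set_z(partner_scores, universe)
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)

scores = {"g1": 0.0, "g2": 0.0, "g3": 0.0, "g4": 1.0, "g5": 1.0, "g6": 0.0}
ranking = prioritize(scores, {"geneX": ["g4", "g5"], "geneY": ["g1", "g2"]})
print(ranking[0][0])  # geneX
```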
Comparative analysis of field-isolate and monkey-adapted Plasmodium vivax genomes.
Chan, Ernest R; Barnwell, John W; Zimmerman, Peter A; Serre, David
2015-03-01
Significant insights into the biology of Plasmodium vivax have been gained from the ability to successfully adapt human infections to non-human primates. P. vivax strains grown in monkeys serve as a renewable source of parasites for in vitro and ex vivo experimental studies and functional assays, or for studying in vivo the relapse characteristics, mosquito species compatibilities, drug susceptibility profiles or immune responses towards potential vaccine candidates. Despite the importance of these studies, little is known as to how adaptation to a different host species may influence the genome of P. vivax. In addition, it is unclear whether these monkey-adapted strains consist of a single clonal population of parasites or if they retain the multiclonal complexity commonly observed in field isolates. Here we compare the genome sequences of seven P. vivax strains adapted to New World monkeys with those of six human clinical isolates collected directly in the field. We show that the adaptation of P. vivax parasites to monkey hosts, and their subsequent propagation, did not result in significant modifications of their genome sequence and that these monkey-adapted strains recapitulate the genomic diversity of field isolates. Our analyses also reveal that these strains are not always genetically homogeneous and should be analyzed cautiously. Overall, our study provides a framework to better leverage this important research material and fully utilize this resource for improving our understanding of P. vivax biology. PMID:25768941
Carlson, Hanqian L; Quinn, Jeffrey J; Yang, Yul W; Thornburg, Chelsea K; Chang, Howard Y; Stadler, H Scott
2015-12-01
Gene expression profiling in E11 mouse embryos identified high expression of the long noncoding RNA (lncRNA) LncRNA-HIT in the undifferentiated limb mesenchyme, gut, and developing genital tubercle. In the limb mesenchyme, LncRNA-HIT was found to be retained in the nucleus, forming a complex with p100 and CBP. Analysis of the genome-wide distribution of LncRNA-HIT-p100/CBP complexes by ChIRP-seq revealed LncRNA-HIT-associated peaks at multiple loci in the murine genome. Ontological analysis of the genes contacted by LncRNA-HIT-p100/CBP complexes indicates a primary role for these loci in chondrogenic differentiation. Functional analysis using siRNA-mediated reductions in LncRNA-HIT or p100 transcripts revealed a significant decrease in expression of many of the LncRNA-HIT-associated loci. LncRNA-HIT siRNA treatments also impaired the ability of the limb mesenchyme to form cartilage, reducing mesenchymal cell condensation and the formation of cartilage nodules. Mechanistically, the LncRNA-HIT siRNA treatments affected pro-chondrogenic gene expression by reducing H3K27ac or p100 activity, confirming that LncRNA-HIT is essential for chondrogenic differentiation in the limb mesenchyme. Taken together, these findings reveal a fundamental epigenetic mechanism functioning during early limb development, in which LncRNA-HIT and its associated proteins promote the expression of multiple genes whose products are necessary for the formation of cartilage.
Carlson, Hanqian L.; Quinn, Jeffrey J.; Yang, Yul W.; Thornburg, Chelsea K.; Chang, Howard Y.; Stadler, H. Scott
2015-01-01
Gene expression profiling in E11 mouse embryos identified high expression of the long noncoding RNA (lncRNA) LncRNA-HIT in the undifferentiated limb mesenchyme, gut, and developing genital tubercle. In the limb mesenchyme, LncRNA-HIT was found to be retained in the nucleus, forming a complex with p100 and CBP. Analysis of the genome-wide distribution of LncRNA-HIT-p100/CBP complexes by ChIRP-seq revealed LncRNA-HIT-associated peaks at multiple loci in the murine genome. Ontological analysis of the genes contacted by LncRNA-HIT-p100/CBP complexes indicates a primary role for these loci in chondrogenic differentiation. Functional analysis using siRNA-mediated reductions in LncRNA-HIT or p100 transcripts revealed a significant decrease in expression of many of the LncRNA-HIT-associated loci. LncRNA-HIT siRNA treatments also impaired the ability of the limb mesenchyme to form cartilage, reducing mesenchymal cell condensation and the formation of cartilage nodules. Mechanistically, the LncRNA-HIT siRNA treatments affected pro-chondrogenic gene expression by reducing H3K27ac or p100 activity, confirming that LncRNA-HIT is essential for chondrogenic differentiation in the limb mesenchyme. Taken together, these findings reveal a fundamental epigenetic mechanism functioning during early limb development, in which LncRNA-HIT and its associated proteins promote the expression of multiple genes whose products are necessary for the formation of cartilage. PMID:26633036
Strand-specific transcriptome profiling with directly labeled RNA on genomic tiling microarrays
2011-01-01
Background With their low manufacturing cost, high spot density, and flexible probe design, genomic tiling microarrays are ideal for comprehensive transcriptome studies. Typically, transcriptome profiling using microarrays involves reverse transcription, which converts RNA to cDNA. The cDNA is then labeled and hybridized to the probes on the arrays, so the RNA signals are detected only indirectly. Reverse transcription is known to generate artifactual cDNA, in particular through second-strand cDNA synthesis, which leads to false discovery of antisense RNA. To address this issue, we have developed an effective method using directly labeled RNA, thus bypassing cDNA generation. This paper describes this method and its application to the mapping of transcriptome profiles. Results RNA extracted from laboratory cultures of Porphyromonas gingivalis was fluorescently labeled with an alkylation reagent and hybridized directly to probes on genomic tiling microarrays specifically designed for this periodontal pathogen. The generated transcriptome profile was strand-specific and produced signals close to background level in most antisense regions of the genome. In contrast, high levels of signal were detected in the antisense regions when the hybridization was done with cDNA. Five antisense areas were tested with independent strand-specific RT-PCR, and no or negligible amplification was detected, indicating that the strong antisense cDNA signals were experimental artifacts. Conclusions An efficient method was developed for mapping transcriptome profiles specific to both coding strands of a bacterial genome. This method chemically labels extracted RNA and uses it directly in microarray hybridization. The generated transcriptome profile was free of cDNA artifactual signals. 
In addition, this method requires fewer processing steps and is potentially more sensitive in detecting small amounts of RNA than conventional end-labeling methods, because more fluorescent molecules are incorporated per RNA fragment. PMID:21235785
An, Yu; Duan, Wenyuan; Huang, Guoying; Chen, Xiaoli; Li, Li; Nie, Chenxia; Hou, Jia; Gui, Yonghao; Wu, Yiming; Zhang, Feng; Shen, Yiping; Wu, Bailin; Wang, Hongyan
2016-01-08
Ventricular septal defects (VSDs) constitute the most prevalent congenital heart disease (CHD) and occur either in isolation (isolated VSD) or in combination with other cardiac defects (complex VSD). Copy number variation (CNV) has been highlighted as a possible contributing factor to the etiology of many congenital diseases. However, little is known concerning the involvement of CNVs in either isolated or complex VSDs. We analyzed 154 unrelated Chinese individuals with VSD by chromosomal microarray analysis. The subjects were recruited from four hospitals across China. Each case underwent clinical assessment to classify the VSD as isolated or complex. The CNVs detected were categorized into syndrome-related CNVs, recurrent CNVs and rare CNVs. Genes encompassed by the CNVs were analyzed using enrichment and pathway analysis. Among 154 probands, we identified 29 rare CNVs in 26 VSD patients (16.9 %, 26/154) and 8 syndrome-related CNVs in 8 VSD patients (5.2 %, 8/154). Twelve of the 29 rare CNVs detected (41.3 %) were recurrently reported in the DECIPHER or ISCA databases as associated with either VSD or general heart disease. Fifteen genes (5 %, 15/285) within CNVs were associated with a broad spectrum of complicated CHD. Among these 15 genes, 7 were annotated with "abnormal interventricular septum morphology" in the MGI (Mouse Genome Informatics) database, and nine were associated with cardiovascular system development (GO:0072538). We also found that these VSD-related candidate genes are enriched in chromatin binding and transcription regulation, biological processes that underlie heart development. Our study demonstrates the potential clinical diagnostic utility of genomic imbalance profiling in VSD patients. Additionally, gene enrichment and pathway analysis helped us to implicate VSD-related candidate genes.
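The abstract does not name the enrichment test used. As an illustrative sketch only, a one-sided hypergeometric test is a common way to ask whether CNV-encompassed genes are over-represented in an annotation such as a GO term or an MGI phenotype. The function name and argument layout below are hypothetical.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).
    N: background genes, K: background genes carrying the annotation,
    n: genes inside the CNVs, k: CNV genes carrying the annotation."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, `enrichment_p_value(3, 3, 3, 10)` returns 1/120. That is the chance that a random set of three genes drawn from ten contains all three annotated genes.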
Evolution of Genome Size and Complexity in Pinus
Morse, Alison M.; Peterson, Daniel G.; Islam-Faridi, M. Nurul; Smith, Katherine E.; Magbanua, Zenaida; Garcia, Saul A.; Kubisiak, Thomas L.; Amerson, Henry V.; Carlson, John E.; Nelson, C. Dana; Davis, John M.
2009-01-01
Background Genome evolution in the gymnosperm lineage of seed plants has given rise to many of the most complex and largest plant genomes; however, the elements involved are poorly understood. Methodology/Principal Findings Gymny is a previously undescribed retrotransposon family in Pinus that is related to Athila elements in Arabidopsis. Gymny elements are dispersed throughout the modern Pinus genome and occupy a physical space at least the size of the Arabidopsis thaliana genome. In contrast to previously described retroelements in Pinus, the Gymny family was amplified or introduced after the divergence of pine and spruce (Picea). If retrotransposon expansions are responsible for genome size differences within the Pinaceae, as they are in angiosperms, then they have yet to be identified. In contrast, molecular divergence of Gymny retrotransposons together with other families of retrotransposons can account for the large genome complexity of pines along with protein-coding genic DNA, as revealed by massively parallel DNA sequence analysis of Cot-fractionated genomic DNA. Conclusions/Significance Most of the enormous genome complexity of pines can be explained by divergence of retrotransposons; however, the elements responsible for genome size variation are yet to be identified. Genomic resources for Pinus including those reported here should assist in further defining whether and how the roles of retrotransposons differ in the evolution of angiosperm and gymnosperm genomes. PMID:19194510
Li, Ping; Li, Xuan; Gu, Qing; Lou, Xiu-yu; Zhang, Xiao-mei; Song, Da-feng; Zhang, Chen
2016-01-01
Objective: In previous studies, Lactobacillus plantarum ZJ316 showed probiotic properties, such as antimicrobial activity against various pathogens and the capacity to significantly improve pig growth and pork quality. The purpose of this study was to reveal the genes potentially related to its genetic adaptation and probiotic profiles based on comparative genomic analysis. Methods: The genome sequence of L. plantarum ZJ316 was compared with those of eight L. plantarum strains deposited in GenBank. BLASTN, Mauve, and MUMmer programs were used for genome alignment and comparison. CRISPRFinder was applied for searching the clustered regularly interspaced short palindromic repeats (CRISPRs). Results: We identified genes that encode proteins related to genetic adaptation and probiotic profiles, including carbohydrate transport and metabolism, proteolytic enzyme systems and amino acid biosynthesis, CRISPR adaptive immunity, stress responses, bile salt resistance, ability to adhere to the host intestinal wall, exopolysaccharide (EPS) biosynthesis, and bacteriocin biosynthesis. Conclusions: Comparative characterization of the L. plantarum ZJ316 genome provided the genetic basis for further elucidating the functional mechanisms of its probiotic properties. ZJ316 could be considered a potential probiotic candidate. PMID:27487802
Li, Ping; Li, Xuan; Gu, Qing; Lou, Xiu-Yu; Zhang, Xiao-Mei; Song, Da-Feng; Zhang, Chen
2016-08-01
In previous studies, Lactobacillus plantarum ZJ316 showed probiotic properties, such as antimicrobial activity against various pathogens and the capacity to significantly improve pig growth and pork quality. The purpose of this study was to reveal the genes potentially related to its genetic adaptation and probiotic profiles based on comparative genomic analysis. The genome sequence of L. plantarum ZJ316 was compared with those of eight L. plantarum strains deposited in GenBank. BLASTN, Mauve, and MUMmer programs were used for genome alignment and comparison. CRISPRFinder was applied for searching the clustered regularly interspaced short palindromic repeats (CRISPRs). We identified genes that encode proteins related to genetic adaptation and probiotic profiles, including carbohydrate transport and metabolism, proteolytic enzyme systems and amino acid biosynthesis, CRISPR adaptive immunity, stress responses, bile salt resistance, ability to adhere to the host intestinal wall, exopolysaccharide (EPS) biosynthesis, and bacteriocin biosynthesis. Comparative characterization of the L. plantarum ZJ316 genome provided the genetic basis for further elucidating the functional mechanisms of its probiotic properties. ZJ316 could be considered a potential probiotic candidate.
Symonová, Radka; Majtánová, Zuzana; Arias-Rodriguez, Lenin; Mořkovský, Libor; Kořínková, Tereza; Cavin, Lionel; Pokorná, Martina Johnson; Doležálková, Marie; Flajšhans, Martin; Normandeau, Eric; Ráb, Petr; Meyer, Axel; Bernatchez, Louis
2017-11-01
Genomic GC content can vary locally, and GC-rich regions are usually associated with increased DNA thermostability in thermophilic prokaryotes and warm-blooded eukaryotes. Among vertebrates, fish and amphibians appear to possess a distinctly less heterogeneous AT/GC organization in their genomes, whereas cytogenetically detectable GC heterogeneity has so far been documented only in mammals and birds. The subject of our study is the gar, an ancient "living fossil" of a basal ray-finned fish lineage, known from the Cretaceous period. We carried out cytogenomic analysis in two gar genera (Atractosteus and Lepisosteus), uncovering a GC chromosomal pattern uncharacteristic of fish. Bioinformatic analysis of the spotted gar (Lepisosteus oculatus) confirmed GC compartmentalization in the GC profiles of its linkage groups. This indicates a rather mammalian mode of compositional organization on gar chromosomes. Gars are thus the only analyzed extant ray-finned fishes with a GC-compartmentalized genome. Since gars are cold-blooded anamniotes, our results contradict the generally accepted hypothesis that the phylogenomic onset of GC compartmentalization occurred near the origin of amniotes. Ecophysiological findings of other authors indicate a metabolic similarity of gars with mammals. We hypothesize that gars might have undergone convergent evolution with the tetrapod lineages leading to mammals on both metabolic and genomic levels. Their metabolic adaptations might have left footprints in their compositional genome evolution, as proposed by the metabolic rate hypothesis. The genome organization described here in gars sheds new light on the compositional genome evolution in vertebrates generally and contributes to a better understanding of the complexities of the mechanisms involved in this process. © 2016 Wiley Periodicals, Inc.
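The abstract does not describe how the GC profiles of the linkage groups were computed. A minimal sketch makes two assumptions. First, GC fraction is measured in fixed sliding windows with ambiguous bases (N) ignored. Second, the standard deviation across windows serves as a crude heterogeneity measure; this is not necessarily the authors' statistic. Window and step sizes are hypothetical parameters.

```python
import math
from statistics import pstdev

def gc_profile(seq, window, step):
    """GC fraction in sliding windows; N and other ambiguity codes are ignored."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        called = sum(win.count(base) for base in "ACGT")
        gc = win.count("G") + win.count("C")
        profile.append((start, gc / called if called else math.nan))
    return profile

def gc_heterogeneity(profile):
    """Spread of window GC values; a larger spread hints at compartmentalization."""
    values = [gc for _, gc in profile if not math.isnan(gc)]
    return pstdev(values) if len(values) > 1 else 0.0
```

On a real linkage group, one would use windows of many kilobases rather than the few bases shown in toy inputs such as `gc_profile("GGCCAATT", 4, 4)`.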
Comprehensive Genomic Profiling of Esthesioneuroblastoma Reveals Additional Treatment Options.
Gay, Laurie M; Kim, Sungeun; Fedorchak, Kyle; Kundranda, Madappa; Odia, Yazmin; Nangia, Chaitali; Battiste, James; Colon-Otero, Gerardo; Powell, Steven; Russell, Jeffery; Elvin, Julia A; Vergilio, Jo-Anne; Suh, James; Ali, Siraj M; Stephens, Philip J; Miller, Vincent A; Ross, Jeffrey S
2017-07-01
Esthesioneuroblastoma (ENB), also known as olfactory neuroblastoma, is a rare malignant neoplasm of the olfactory mucosa. Despite surgical resection combined with radiotherapy and adjuvant chemotherapy, ENB often relapses with rapid progression. Current multimodality, nontargeted therapy for relapsed ENB is of limited clinical benefit. We queried whether comprehensive genomic profiling (CGP) of relapsed or refractory ENB can uncover genomic alterations (GA) that could identify potential targeted therapies for these patients. CGP was performed on formalin-fixed, paraffin-embedded sections from 41 consecutive clinical cases of ENB using a hybrid-capture, adaptor ligation-based next-generation sequencing assay to a mean coverage depth of 593X. The results were analyzed for base substitutions, insertions and deletions, select rearrangements, and copy number changes (amplifications and homozygous deletions). Clinically relevant GA (CRGA) were defined as GA linked to drugs on the market or under evaluation in clinical trials. A total of 28 ENBs harbored GA, with a mean of 1.5 GA per sample. Approximately half of the ENBs (21, 51%) featured at least one CRGA, with an average of 1 CRGA per sample. The most commonly altered gene was TP53 (17%), with GA in PIK3CA, NF1, CDKN2A, and CDKN2C occurring in 7% of samples. We report comprehensive genomic profiles for 41 ENB tumors. CGP revealed potential new therapeutic targets, including targetable GA in the mTOR, CDK and growth factor signaling pathways, highlighting the clinical value of genomic profiling in ENB. Comprehensive genomic profiling of 41 relapsed or refractory ENBs reveals recurrent alterations or classes of mutation, including amplification of tyrosine kinases encoded on chromosome 5q and mutations affecting genes in the mTOR/PI3K pathway. Approximately half of the ENBs (21, 51%) featured at least one clinically relevant genomic alteration (CRGA), with an average of 1 CRGA per sample. 
The most commonly altered gene was TP53 (17%), and alterations in PIK3CA, NF1, CDKN2A, or CDKN2C were identified in 7% of samples. Responses to treatment with the kinase inhibitors sunitinib, everolimus, and pazopanib are presented in conjunction with tumor genomics. © AlphaMed Press 2017.
Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.
2002-10-15
A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
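The phylogenetic-profile method described above can be sketched in Python. This is an illustration under stated assumptions, not the patented system. Homolog detection is taken as given through a hypothetical presence map. Profiles are encoded as 0/1 tuples, and families whose profiles are identical or nearly identical (small Hamming distance) are reported as candidate functional links. Genome and family names are hypothetical.

```python
def phylogenetic_profiles(presence, genomes):
    """presence: protein family -> set of genomes with a detected homolog.
    Returns family -> 0/1 tuple ordered like `genomes`."""
    return {fam: tuple(int(g in hits) for g in genomes)
            for fam, hits in presence.items()}

def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

def linked_families(profiles, max_distance=0):
    """Pairs of families whose presence/absence patterns differ in at most
    max_distance genomes (0 = identical profiles)."""
    fams = sorted(profiles)
    return [(a, b) for i, a in enumerate(fams) for b in fams[i + 1:]
            if hamming(profiles[a], profiles[b]) <= max_distance]

genomes = ["g1", "g2", "g3", "g4"]
presence = {"famA": {"g1", "g3"}, "famB": {"g1", "g3"}, "famC": {"g2", "g4"}}
profiles = phylogenetic_profiles(presence, genomes)
```

With this toy input, `linked_families(profiles)` returns `[("famA", "famB")]`, because both families are present in exactly the same genomes.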
USDA-ARS's Scientific Manuscript database
The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...
Ortega, Victor E.; Meyers, Deborah A.
2014-01-01
Pharmacogenetics is being used to develop personalized therapies specific to individuals from different ethnic or racial groups. Pharmacogenetic studies to date have been primarily performed in trial cohorts consisting of non-Hispanic whites of European descent. A “bottleneck” or collapse of genetic diversity associated with the first human colonization of Europe during the Upper Paleolithic period, followed by the recent mixing of African, European, and Native American ancestries, has resulted in different ethnic groups with varying degrees of genetic diversity. Differences in genetic ancestry may introduce genetic variation that has the potential to alter the therapeutic efficacy of commonly used asthma therapies, for example β2-adrenergic receptor agonists (beta agonists). Pharmacogenetic studies of admixed ethnic groups have been limited to small candidate gene association studies, the best example being studies of ADRB2, the gene coding for the receptor target of beta agonist therapy. Large consortium-based sequencing studies are using next-generation whole-genome sequencing to provide a diverse genome map of different admixed populations, which can be used for future pharmacogenetic studies. These studies will include candidate gene studies, genome-wide association studies, and whole-genome admixture-based approaches that account for ancestral genetic structure, complex haplotypes, gene-gene interactions, and rare variants to detect and replicate novel pharmacogenetic loci. PMID:24369795
Pyne, Michael E; Liu, Xuejia; Moo-Young, Murray; Chung, Duane A; Chou, C Perry
2016-09-19
Clostridium pasteurianum is emerging as a prospective host for the production of biofuels and chemicals, and has recently been shown to directly consume electric current. Despite this growing biotechnological appeal, the organism's genetics and central metabolism remain poorly understood. Here we present a concurrent genome sequence for the C. pasteurianum type strain and provide extensive genomic analysis of the organism's defence mechanisms and central fermentative metabolism. Next-generation genome sequencing produced reads corresponding to spontaneous excision of a novel phage, designated φ6013, which could be induced using mitomycin C and detected using PCR and transmission electron microscopy. Methylome analysis of sequencing reads provided a near-complete view of the organism's restriction-modification systems. We also identified the chief C. pasteurianum Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) locus, which was found to exemplify a Type I-B system. Finally, we show that C. pasteurianum possesses a highly complex fermentative metabolism in which the metabolic pathways used by the cell are governed by the degree of reduction of the substrate. Four distinct fermentation profiles, ranging from exclusively acidogenic to predominantly alcohologenic, were observed through redox consideration of the substrate. A detailed discussion of the organism's central metabolism within the context of metabolic engineering is provided.
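The abstract does not define the redox measure it applies to substrates. A minimal sketch assumes the conventional degree of reduction per carbon atom, with C = +4, H = +1, and O = -2, and supports only C/H/O formulas. The substrates and products used below (glucose, glycerol, ethanol) are illustrative and are not taken from the abstract.

```python
import re

# Conventional per-atom weights for the degree of reduction (assumption:
# C/H/O compounds only; nitrogen-containing substrates are not handled).
WEIGHTS = {"C": 4, "H": 1, "O": -2}

def degree_of_reduction(formula):
    """Available electrons per carbon atom for a formula such as 'C6H12O6'."""
    counts = {element: 0 for element in WEIGHTS}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in counts:
            raise ValueError(f"unsupported element: {element}")
        counts[element] += int(number) if number else 1
    if counts["C"] == 0:
        raise ValueError("formula must contain carbon")
    electrons = sum(WEIGHTS[e] * n for e, n in counts.items())
    return electrons / counts["C"]
```

Under this convention, glucose scores 4.0 and glycerol about 4.67, while ethanol scores 6.0. This fits the general redox-balance reasoning that more reduced substrates favour more reduced products such as alcohols.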
Zhu, Zhou; Ihle, Nathan T; Rejto, Paul A; Zarrinkar, Patrick P
2016-06-13
Genome-scale functional genomic screens across large cell line panels provide a rich resource for discovering tumor vulnerabilities that can lead to the next generation of targeted therapies. Analyses of these data have typically focused on identifying genes whose knockdown enhances response in various pre-defined genetic contexts, an approach limited by biological complexity as well as by the incompleteness of our knowledge. We therefore introduce a complementary data-mining strategy to identify genes with exceptional sensitivity in subsets, or outlier groups, of cell lines, allowing an unbiased analysis without any a priori assumption about the underlying biology of dependency. Genes with outlier features are strongly and specifically enriched for those known to be associated with cancer and relevant biological processes, even though no a priori knowledge is used to drive the analysis. Identification of exceptional responders (outliers) may lead not only to new candidates for therapeutic intervention but also to tumor indications and response biomarkers for companion precision medicine strategies. Several tumor suppressors have an outlier sensitivity pattern, supporting and generalizing the notion that tumor suppressors can play context-dependent oncogenic roles. The novel application of outlier analysis described here demonstrates a systematic and data-driven analytical strategy to decipher large-scale functional genomic data for oncology target and precision medicine discoveries.
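The abstract does not specify the outlier statistic. One hedged way to flag exceptional sensitivity in a subset of cell lines is to compute a robust z-score (median and MAD) for each gene. The sketch then keeps genes where only a minority of lines fall far below the median. The threshold, minimum number of lines, maximum fraction, and gene names are all hypothetical parameters.

```python
from statistics import median

def robust_z(values):
    """Robust z-scores using the median and the scaled median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation for normal data
    if scale == 0:
        return [0.0] * len(values)
    return [(v - med) / scale for v in values]

def outlier_genes(scores, threshold=3.0, min_lines=2, max_fraction=0.3):
    """scores: gene -> per-cell-line knockdown effects (more negative = more sensitive).
    Returns gene -> indices of outlier-sensitive cell lines, keeping only genes
    where a minority subset (not the whole panel) is exceptionally sensitive."""
    result = {}
    for gene, values in scores.items():
        z = robust_z(values)
        hits = [i for i, zi in enumerate(z) if zi <= -threshold]
        if min_lines <= len(hits) <= max_fraction * len(values):
            result[gene] = hits
    return result

scores = {
    "GENE1": [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, -3.0, -2.8],
    "GENE2": [-1.0] * 10,
}
outliers = outlier_genes(scores)
```

GENE1 is flagged because only its last two cell lines are strongly sensitive. GENE2 has a uniform response across the panel, so it is not an outlier gene under this definition.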
Functional Profiling Using the Saccharomyces Genome Deletion Project Collections.
Nislow, Corey; Wong, Lai Hong; Lee, Amy Huei-Yi; Giaever, Guri
2016-09-01
The ability to measure and quantify the fitness of an entire organism requires considerably more complex approaches than simply using traditional "omic" methods that examine, for example, the abundance of RNA transcripts, proteins, or metabolites. The yeast deletion collections represent the only systematic, comprehensive set of null alleles for any organism in which such fitness measurements can be assayed. Generated by the Saccharomyces Genome Deletion Project, these collections allow the systematic and parallel analysis of gene functions using any measurable phenotype. The unique 20-bp molecular barcodes engineered into the genome of each deletion strain facilitate the massively parallel analysis of individual fitness. Here, we present functional genomic protocols for use with the yeast deletion collections. We describe how to maintain, propagate, and store the deletion collections and how to perform growth fitness assays on single and parallel screening platforms. Phenotypic fitness analyses of the yeast mutants, described in brief here, provide important insights into biological functions, mechanisms of drug action, and response to environmental stresses. It is important to bear in mind that the specific assays described in this protocol represent some of the many ways in which these collections can be assayed, and in this description particular attention is paid to maximizing throughput using growth as the phenotypic measure. © 2016 Cold Spring Harbor Laboratory Press.
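A minimal sketch of a barcode-based fitness calculation follows. It assumes barcode read counts are available for each deletion strain in a control pool and a treated pool. The log2 ratio of normalized abundances, the pseudocount, and the minimum control count are illustrative choices, not the protocol's prescribed analysis. The strain names are hypothetical.

```python
import math

def barcode_fitness(control, treated, pseudocount=0.5, min_control=20):
    """control, treated: strain -> barcode read count in each pool.
    Returns strain -> log2(treated abundance / control abundance);
    negative values suggest the deletion strain grew worse under treatment."""
    total_control = sum(control.values())
    total_treated = sum(treated.values())
    fitness = {}
    for strain, c in control.items():
        if c < min_control:  # too few reads to estimate abundance reliably
            continue
        t = treated.get(strain, 0)
        freq_control = (c + pseudocount) / total_control
        freq_treated = (t + pseudocount) / total_treated
        fitness[strain] = math.log2(freq_treated / freq_control)
    return fitness

control = {"strainA": 100, "strainB": 100, "strainC": 5}
treated = {"strainA": 150, "strainB": 50, "strainC": 5}
fitness = barcode_fitness(control, treated)
```

Normalizing by total reads in each pool corrects for differences in sequencing depth between pools. The pseudocount keeps strains that drop out entirely under treatment from producing log2(0).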
Mwacharo, Joram M; Kim, Eui-Soo; Elbeltagy, Ahmed R; Aboul-Naga, Adel M; Rischkowsky, Barbara A; Rothschild, Max F
2017-12-15
African indigenous sheep are classified as fat-tail, thin-tail and fat-rump hair sheep. The fat-tail sheep are well adapted to dryland environments, but little is known about their genome profiles. We analyzed patterns of genomic variation by genotyping 394 individuals from five populations of fat-tail sheep from a desert environment in Egypt with the Ovine SNP50K microarray. Comparative inferences with other East African and western Asian fat-tail sheep and with European sheep reveal at least two phylogeographically distinct genepools of fat-tail sheep in Africa that differ from the European genepool, suggesting separate evolutionary and breeding histories. We identified 24 candidate selection sweep regions, spanning 172 potentially novel and known genes, which are enriched with genes underpinning dryland adaptation physiology. In particular, we found selection sweeps spanning genes and/or pathways associated with metabolism; response to stress, ultraviolet radiation, oxidative stress and DNA damage repair; activation of immune response; regulation of reproduction, organ function and development, body size and morphology, skin and hair pigmentation, and keratinization. Our findings provide insights into the complexity of genome architecture underlying dryland stress adaptation in fat-tail sheep and showcase the indigenous stocks as appropriate genotypes for adaptation planning to sustain livestock production and human livelihoods under future climates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk
2014-10-09
Phylogenetic studies have provided detailed knowledge of the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. In total, 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. GO term enrichment analysis differentiated the functional profiles of selected archaeal taxa. A total of 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. In conclusion, our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. This function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.
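The abstract reports parsimony scores and retention indices but does not give the algorithm. This sketch assumes Fitch parsimony applied to one binary presence/absence character (a single metabolic pathway) on a rooted binary tree written as nested tuples; the tree and genome names are hypothetical. The retention index uses RI = (g - s) / (g - m). In that formula, s is the observed number of changes and m is the minimum possible. The maximum possible on a star tree is g.

```python
from collections import Counter

def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples of leaf names.
    states: leaf -> 0/1 (pathway absent/present). Returns (state set, changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, states):
    return fitch(tree, states)[1]

def retention_index(tree, states):
    counts = Counter(states.values())
    m = len(counts) - 1                              # minimum possible changes
    g = sum(counts.values()) - max(counts.values())  # changes on a star tree
    if g == m:
        return None  # parsimony-uninformative character
    s = parsimony_score(tree, states)
    return (g - s) / (g - m)

tree = (("g1", "g2"), ("g3", "g4"))
clade_specific = {"g1": 1, "g2": 1, "g3": 0, "g4": 0}
scattered = {"g1": 1, "g2": 0, "g3": 1, "g4": 0}
```

The clade-specific pathway needs one change (RI = 1.0), while the scattered one needs two (RI = 0.0). This mirrors the contrast drawn above between clade-specific functions with low parsimony scores and sparsely dispersed functions with high scores.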
USDA-ARS's Scientific Manuscript database
Modern biological analyses are often assisted by recent technologies making the sequencing of complex genomes both technically possible and feasible. We recently sequenced the tomato genome that, like many eukaryotic genomes, is large and complex. Current sequencing technologies allow the developmen...
Publication Abstract: Philadelphia chromosome-like acute lymphoblastic leukemia (Ph-like ALL) is characterized by a gene-expression profile similar to that of BCR-ABL1-positive ALL, alterations of lymphoid transcription factor genes, and a poor outcome. The frequency and spectrum of genetic alterations in Ph-like ALL and its responsiveness to tyrosine kinase inhibition are undefined, especially in adolescents and adults. We performed genomic profiling of 1725 patients with precursor B-cell ALL and detailed genomic analysis of 154 patients with Ph-like ALL.
2014-01-01
Background Cis-regulatory modules (CRMs), the DNA sequences required for regulating gene expression, play a central role in research on transcriptional regulation in metazoan species. The systematic understanding of CRMs still relies mainly on computational methods, because experimental methods are time-consuming and small in scale. However, the accuracy and reliability of different CRM prediction tools remain unclear. Without comparative cross-analysis of the results and combined consideration of additional experimental information, there is no easy way to assess the confidence of the predicted CRMs. This limits the genome-wide understanding of CRMs. Description It is known that transcription factor binding and epigenetic profiles tend to determine the functions of CRMs in gene transcriptional regulation. Thus, integrating genome-wide epigenetic profiles with systematically predicted CRMs can greatly help researchers evaluate the prediction confidence and possible transcriptional regulatory functions of these potential CRMs. However, these data are still scattered across the literature. Here we performed a computational genome-wide screen for potential CRMs using different prediction tools and constructed a pioneering database, cisMEP (cis-regulatory module epigenetic profile database), to integrate these computationally identified CRMs with genomic epigenetic profile data. cisMEP collects literature-curated TFBS location data and nine types of epigenetic data for assessing the confidence of these potential CRMs and deciphering their possible functions. Conclusions cisMEP aims to provide a user-friendly interface for researchers to assess the confidence of different potential CRMs and to understand the functions of CRMs through experimentally identified epigenetic profiles. 
The deposited potential CRMs and experimental epigenetic profiles for confidence assessment provide experimentally testable hypotheses for the molecular mechanisms of metazoan gene regulation. We believe that the information deposited in cisMEP will greatly facilitate the comparative usage of different CRM prediction tools and will help biologists to study the modular regulatory mechanisms between different TFs and their target genes. PMID:25521507
NetF-producing Clostridium perfringens: Clonality and plasmid pathogenicity loci analysis.
Mehdizadeh Gohari, Iman; Kropinski, Andrew M; Weese, Scott J; Whitehead, Ashley E; Parreira, Valeria R; Boerlin, Patrick; Prescott, John F
2017-04-01
Clostridium perfringens is an important cause of foal necrotizing enteritis and canine acute hemorrhagic diarrhea. A major virulence determinant of the strains associated with these diseases appears to be a beta-sheet pore-forming toxin, NetF, encoded within a pathogenicity locus (NetF locus) on a large tcp-conjugative plasmid. Strains producing NetF also produce the putative toxin NetE, encoded within the same pathogenicity locus, as well as CPE enterotoxin and CPB2 on a second plasmid, and sometimes the putative toxin NetG within a pathogenicity locus (NetG locus) on a separate large conjugative plasmid. Previously sequenced genomes of two netF-positive C. perfringens strains showed that both shared three similar plasmids: the NetF/NetE and CPE/CPB2 toxin-encoding plasmids mentioned above and a putative bacteriocin-encoding plasmid. The main purpose of this study was to determine whether all NetF-producing strains share this common plasmid profile and whether their distinct NetF and CPE pathogenicity loci are conserved. To answer this question, 15 equine and 15 canine netF-positive isolates of C. perfringens were sequenced using Illumina HiSeq 2000 technology. In addition, the clonal relationships among the NetF-producing strains were evaluated by core genome multilocus sequence typing (cgMLST). The data showed that all NetF-producing strains have a common plasmid profile and that the defined pathogenicity loci on the plasmids are conserved in all these strains. cgMLST analysis showed that the NetF-producing C. perfringens strains belong to two distinct clonal complexes. The pNetG plasmid was absent from isolates of one of the clonal complexes, and there were minor but consistent differences in the NetF/NetE and CPE/CPB2 plasmids between the two clonal complexes. Copyright © 2017 Elsevier B.V. All rights reserved.
Holm, Karolina; Staaf, Johan; Lauss, Martin; Aine, Mattias; Lindgren, David; Bendahl, Pär-Ola; Vallon-Christersson, Johan; Barkardottir, Rosa Bjork; Höglund, Mattias; Borg, Åke; Jönsson, Göran; Ringnér, Markus
2016-02-29
Aberrant DNA methylation is frequently observed in breast cancer. However, the relationship between methylation patterns and the heterogeneity of breast cancer has not been comprehensively characterized. Whole-genome DNA methylation analysis using Illumina Infinium HumanMethylation450 BeadChip arrays was performed on 188 human breast tumors. Unsupervised bootstrap consensus clustering was performed to identify DNA methylation epigenetic subgroups (epitypes). The Cancer Genome Atlas data, including methylation profiles of 669 human breast tumors, was used for validation. The identified epitypes were characterized by integration with publicly available genome-wide data, including gene expression levels, DNA copy numbers, whole-exome sequencing data, and chromatin states. We identified seven breast cancer epitypes. One epitype was distinctly associated with basal-like tumors and with BRCA1 mutations, one epitype contained a subset of ERBB2-amplified tumors characterized by multiple additional amplifications and the most complex genomes, and one epitype displayed a methylation profile similar to normal epithelial cells. Luminal tumors were stratified into the remaining four epitypes, with differences in promoter hypermethylation, global hypomethylation, proliferative rates, and genomic instability. Specific hyper- and hypomethylation across the basal-like epitype was rare. However, we observed that the candidate genomic instability drivers BRCA1 and HORMAD1 displayed aberrant methylation linked to gene expression levels in some basal-like tumors. Hypomethylation in luminal tumors was associated with DNA repeats and subtelomeric regions. We observed two dominant patterns of aberrant methylation in breast cancer. One pattern, constitutively methylated in both basal-like and luminal breast cancer, was linked to genes with promoters in a Polycomb-repressed state in normal epithelial cells and displayed no correlation with gene expression levels. 
The second pattern correlated with gene expression levels and was associated with methylation in luminal tumors and genes with active promoters in normal epithelial cells. Our results suggest that hypermethylation patterns across basal-like breast cancer may have limited influence on tumor progression and instead reflect the repressed chromatin state of the tissue of origin. On the contrary, hypermethylation patterns specific to luminal breast cancer influence gene expression, may contribute to tumor progression, and may present an actionable epigenetic alteration in a subset of luminal breast cancers.
Host genetic variation impacts microbiome composition across human body sites.
Blekhman, Ran; Goodrich, Julia K; Huang, Katherine; Sun, Qi; Bukowski, Robert; Bell, Jordana T; Spector, Timothy D; Keinan, Alon; Ley, Ruth E; Gevers, Dirk; Clark, Andrew G
2015-09-15
The composition of bacteria in and on the human body varies widely across human individuals, and has been associated with multiple health conditions. While microbial communities are influenced by environmental factors, some degree of genetic influence of the host on the microbiome is also expected. This study is part of an expanding effort to comprehensively profile the interactions between human genetic variation and the composition of this microbial ecosystem on a genome- and microbiome-wide scale. Here, we jointly analyze the composition of the human microbiome and host genetic variation. By mining the shotgun metagenomic data from the Human Microbiome Project for host DNA reads, we gathered information on host genetic variation for 93 individuals for whom bacterial abundance data are also available. Using this dataset, we identify significant associations between host genetic variation and microbiome composition in 10 of the 15 body sites tested. These associations are driven by host genetic variation in immunity-related pathways, and are especially enriched in host genes that have been previously associated with microbiome-related complex diseases, such as inflammatory bowel disease and obesity-related disorders. Lastly, we show that host genomic regions associated with the microbiome have high levels of genetic differentiation among human populations, possibly indicating host genomic adaptation to environment-specific microbiomes. Our results highlight the role of host genetic variation in shaping the composition of the human microbiome, and provide a starting point toward understanding the complex interaction between human genetics and the microbiome in the context of human evolution and disease.
Aberrant expression of long noncoding RNAs in autistic brain.
Ziats, Mark N; Rennert, Owen M
2013-03-01
The autism spectrum disorders (ASD) have a significant hereditary component, but the implicated genetic loci are heterogeneous and complex. Consequently, there is a gap in understanding how diverse genomic aberrations all result in one clinical ASD phenotype. Gene expression studies from autism brain tissue have demonstrated that aberrantly expressed protein-coding genes may converge onto common molecular pathways, potentially reconciling the strong heritability and shared clinical phenotypes with the genomic heterogeneity of the disorder. However, the regulation of gene expression is extremely complex and governed by many mechanisms, including noncoding RNAs. Yet no study in ASD brain tissue has assessed changes in regulatory long noncoding RNAs (lncRNAs), which represent a large proportion of the human transcriptome and actively modulate mRNA expression. To assess whether aberrant expression of lncRNAs may play a role in the molecular pathogenesis of ASD, we profiled over 33,000 annotated lncRNAs and 30,000 mRNA transcripts from postmortem brain tissue of autistic and control prefrontal cortex and cerebellum by microarray. We detected over 200 differentially expressed lncRNAs in ASD, which were enriched for genomic regions containing genes related to neurodevelopment and psychiatric disease. Additionally, comparison of differences in mRNA expression between prefrontal cortex and cerebellum within individual donors showed that ASD brains had more transcriptional homogeneity. Moreover, this was also true of the lncRNA transcriptome. Our results suggest that further investigation of lncRNA expression in autistic brain may help elucidate the molecular pathogenesis of this disorder.
Wagner, Bridget K.; Clemons, Paul A.
2009-01-01
Discovering small-molecule modulators for thousands of gene products requires multiple stages of biological testing, specificity evaluation, and chemical optimization. Many cellular profiling methods, including cellular sensitivity, gene-expression, and cellular imaging, have emerged as methods to assess the functional consequences of biological perturbations. Cellular profiling methods applied to small-molecule science provide opportunities to use complex phenotypic information to prioritize and optimize small-molecule structures simultaneously against multiple biological endpoints. As throughput increases and cost decreases for such technologies, we see an emerging paradigm of using more information earlier in probe- and drug-discovery efforts. Moreover, increasing access to public datasets makes possible the construction of “virtual” profiles of small-molecule performance, even when multiplexed measurements were not performed or when multidimensional profiling was not the original intent. We review some key conceptual advances in small-molecule phenotypic profiling, emphasizing connections to other information, such as protein-binding measurements, genetic perturbations, and cell states. We argue that to maximally leverage these measurements in probe and drug discovery requires a fundamental connection to synthetic chemistry, allowing the consequences of synthetic decisions to be described in terms of changes in small-molecule profiles. Mining such data in the context of chemical structure and synthesis strategies can inform decisions about chemistry procurement and library development, leading to optimal small-molecule screening collections. PMID:19825513
Ahlstrom, Christina A; Bonnedahl, Jonas; Woksepp, Hanna; Hernandez, Jorge; Olsen, Björn; Ramey, Andrew M
2018-05-09
Antimicrobial resistance (AMR) in bacterial pathogens threatens global health, though the spread of AMR bacteria and AMR genes between humans, animals, and the environment is still largely unknown. Here, we investigated the role of wild birds in the epidemiology of AMR Escherichia coli. Using next-generation sequencing, we characterized cephalosporin-resistant E. coli cultured from sympatric gulls and bald eagles inhabiting a landfill habitat in Alaska to identify genetic determinants conferring AMR, explore potential transmission pathways of AMR bacteria and genes at this site, and investigate how their genetic diversity compares to isolates reported in other taxa. We found genetically diverse E. coli isolates with sequence types previously associated with human infections and resistance genes of clinical importance, including blaCTX-M and blaCMY. Identical resistance profiles were observed in genetically unrelated E. coli isolates from both gulls and bald eagles. Conversely, isolates with indistinguishable core-genomes were found to have different resistance profiles. Our findings support complex epidemiological interactions including bacterial strain sharing between gulls and bald eagles and horizontal gene transfer among E. coli harboured by birds. Results suggest that landfills may serve as a source for AMR acquisition and/or maintenance, including bacterial sequence types and AMR genes relevant to human health.
Gaponova, Anna V.; Deneka, Alexander Y.; Beck, Tim N.; Liu, Hanqing; Andrianov, Gregory; Nikonova, Anna S.; Nicolas, Emmanuelle; Einarson, Margret B.; Golemis, Erica A.; Serebriiskii, Ilya G.
2017-01-01
Ovarian, head and neck, and other cancers are commonly treated with cisplatin and other DNA damaging cytotoxic agents. Altered DNA damage response (DDR) contributes to resistance of these tumors to chemotherapies, some targeted therapies, and radiation. DDR involves multiple protein complexes and signaling pathways, some of which are evolutionarily ancient and involve protein orthologs conserved from yeast to humans. To identify new regulators of cisplatin-resistance in human tumors, we integrated high throughput and curated datasets describing yeast genes that regulate sensitivity to cisplatin and/or ionizing radiation. Next, we clustered highly validated genes based on chemogenomic profiling, and then mapped orthologs of these genes in expanded genomic networks for multiple metazoans, including humans. This approach identified an enriched candidate set of genes involved in the regulation of resistance to radiation and/or cisplatin in humans. Direct functional assessment of selected candidate genes using RNA interference confirmed their activity in influencing cisplatin resistance, degree of γH2AX focus formation and ATR phosphorylation, in ovarian and head and neck cancer cell lines, suggesting impaired DDR signaling as the driving mechanism. This work enlarges the set of genes that may contribute to chemotherapy resistance and provides a new contextual resource for interpreting next generation sequencing (NGS) genomic profiling of tumors. PMID:27863405
Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L
2015-12-04
Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.
Lindgren, Emma; Hägg, Sara; Giordano, Fosco; Björkegren, Johan; Ström, Lena
2014-01-01
Genome integrity is fundamental for cell survival and cell cycle progression. Important mechanisms for keeping the genome intact are proper sister chromatid segregation, correct gene regulation and efficient repair of damaged DNA. Cohesin and its DNA loader, the Scc2/4 complex, have been implicated in all these cellular actions. The gene regulation role has been described in several organisms. In yeast, it has been suggested that proteins in the cohesin network affect transcription through their role as insulators. More recently, data have emerged indicating direct roles in gene regulation in yeast as well. Here we extend these studies by investigating whether the cohesin loader Scc2 is involved in regulation of gene expression. We performed global gene expression profiling in the absence and presence of DNA damage, in wild-type and Scc2-deficient cells arrested in G2/M, a stage at which Scc2 is known to be important for DNA double-strand break repair and formation of damage-induced cohesion. We found that inactivation of Scc2 distorted not only the DNA damage-specific transcriptional response but also the overall transcription profile. Interestingly, these alterations did not correlate with changes in cohesin binding. PMID:25483075
Microplate-based platform for combined chromatin and DNA methylation immunoprecipitation assays
2011-01-01
Background The processes that make up expression of a given gene are far more complex than previously thought, presenting unprecedented conceptual and mechanistic challenges that require the development of new tools. Chromatin structure, which is regulated by DNA methylation and histone modification, is at the center of gene regulation. Immunoprecipitation of chromatin (ChIP) and of methylated DNA (MeDIP) represent major achievements in this area; they allow researchers to probe chromatin modifications as well as specific protein-DNA interactions in vivo and to estimate the density of proteins at specific sites genome-wide. Although DNA methylation is a critical component of chromatin structure, it has often been studied independently of other chromatin events and transcription. Results To allow simultaneous measurement of DNA methylation and other genomic processes, we developed and validated a simple, easy-to-use, high-throughput microplate-based platform for analysis of DNA methylation. Compared with traditional bead-based MeDIP, microplate MeDIP was more sensitive and had lower non-specific binding. We integrated the MeDIP method with a microplate ChIP assay to create the Matrix ChIP-MeDIP platform, which allows DNA methylation and histone marks to be measured at the same time. We illustrated several applications of this platform to relate DNA methylation to chromatin and transcription events at selected genes in cultured cells, human cancer and a model of diabetic kidney disease. Conclusion The high-throughput capacity of Matrix ChIP-MeDIP to profile tens, and potentially hundreds, of different genomic events at the same time as DNA methylation makes it a powerful platform for exploring complex genomic mechanisms at selected genes in cultured cells and whole tissues. 
In this regard, Matrix ChIP-MeDIP should be useful to complement genome-wide studies, where rich chromatin and transcription database resources provide a fruitful foundation for pursuing mechanistic, functional and diagnostic information at genes of interest in health and disease. PMID:22098709
Crop epigenetics and the molecular hardware of genotype × environment interactions
King, Graham J.
2015-01-01
Crop plants encounter thermal environments which fluctuate on a diurnal and seasonal basis. Future climate resilient cultivars will need to respond to thermal profiles reflecting more variable conditions, and harness plasticity that involves regulation of epigenetic processes and complex genomic regulatory networks. Compartmentalization within plant cells insulates the genomic central processing unit within the interphase nucleus. This review addresses the properties of the chromatin hardware in which the genome is embedded, focusing on the biophysical and thermodynamic properties of DNA, histones and nucleosomes. It explores the consequences of thermal and ionic variation on the biophysical behavior of epigenetic marks such as DNA cytosine methylation (5mC), and histone variants such as H2A.Z, and how these contribute to maintenance of chromatin integrity in the nucleus, while enabling specific subsets of genes to be regulated. Information is drawn from theoretical molecular in vitro studies as well as model and crop plants and incorporates recent insights into the role epigenetic processes play in mediating between environmental signals and genomic regulation. A preliminary speculative framework is outlined, based on the evidence of what appears to be a cohesive set of interactions at molecular, biophysical and electrostatic level between the various components contributing to chromatin conformation and dynamics. It proposes that within plant nuclei, general and localized ionic homeostasis plays an important role in maintaining chromatin conformation, whilst maintaining complex genomic regulation that involves specific patterns of epigenetic marks. More generally, reversible changes in DNA methylation appear to be consistent with the ability of nuclear chromatin to manage variation in external ionic and temperature environment. 
Whilst tentative, this framework provides scope to develop experimental approaches to understand in greater detail the internal environment of plant nuclei. It is hoped that this will generate a deeper understanding of the molecular mechanisms underlying genotype × environment interactions that may be beneficial for long-term improvement of crop performance in less predictable climates. PMID:26594221
Guo, Min; Yang, Ruifu; Huang, Chen; Liao, Qiwen; Fan, Guangyi; Sun, Chenghang; Lee, Simon Ming-Yuen
2017-04-04
The nuclear envelope is considered a key classification marker that distinguishes prokaryotes from eukaryotes. However, this marker does not apply to the family Planctomycetaceae, whose intracellular space is divided by lipidic intracytoplasmic membranes (ICMs). A nuclear localization signal (NLS) is a short stretch of amino acid sequence that directs the transport of proteins from the cytoplasm into the nucleus, and it is also associated with the development of the nuclear envelope. We investigated NLS motifs in Planctomycetaceae genomes to demonstrate a potential molecular transition in the development of the intracellular membrane system. In this study, we identified NLS-like motifs that have the same amino acid compositions as experimentally identified NLSs in the genomes of 11 representative species of the family Planctomycetaceae. A total of 15 NLS types and 170 NLS-bearing proteins were detected in the 11 strains. To determine the molecular transformation, we compared NLS-bearing protein abundances in the 11 representative Planctomycetaceae genomes with those in the genomes of 16 taxonomically varied microorganisms: nine bacteria, two archaea and five fungi. In the 27 strains, 29 NLS types and 1101 NLS-bearing proteins were identified, and principal component analysis of their NLS-bearing protein abundance profiles showed a significant transitional gradient from bacteria to Planctomycetaceae to fungi. We then clustered the 993 non-redundant NLS-bearing proteins into 181 families and annotated the metabolic pathways in which they are involved. Next, we aligned the ten types of NLS motifs from the 13 families containing NLS-bearing proteins among bacteria, Planctomycetaceae or fungi, considering their diversity, length and origin. Based on the complexity of the 10 types of NLS-like motifs in the 13 NLS-bearing protein families, a transition towards increased complexity from non-planctomycete bacteria to Planctomycetaceae to archaea and fungi was detected. 
The results of this study reveal that Planctomycetaceae separates slightly from the members of non-planctomycete bacteria but still has substantial differences from fungi, based on the NLS-like motifs and NLS-bearing protein analysis.
Starch and starch hydrolysates are favorable carbon sources for bifidobacteria in the human gut.
Liu, Songling; Ren, Fazheng; Zhao, Liang; Jiang, Lu; Hao, Yanling; Jin, Junhua; Zhang, Ming; Guo, Huiyuan; Lei, Xingen; Sun, Erna; Liu, Hongna
2015-03-01
Bifidobacteria are key commensals in the human gut, and their abundance is associated with the health of their hosts. Although they are dominant in the infant gut, their numbers are lower in the adult gut. Changes in diet are considered the main reason for this difference. Large amounts of whole-genome sequence data for bifidobacteria make it possible to elucidate the genetic basis of their adaptation to the nutrient environment. Among the nutrients in the human gut, starch is a highly fermentable substrate and can exert beneficial effects by increasing bifidobacteria and/or being fermented to short chain fatty acids. To determine the potential substrate preference of bifidobacteria, we compared the glycoside hydrolase (GH) profiles of a pooled-bifidobacterial genome (PBG) with those of a representative microbiome (RM) of the human gut. In bifidobacterial genomes, only 15% of GHs contained signal peptides, suggesting a limited ability to utilize complex carbohydrates such as plant cell wall polysaccharides. However, compared with other intestinal bacteria, bifidobacterial genomes encoded more GH genes for degrading starch and starch hydrolysates, indicating that they have genetic advantages in utilizing these substrates. Bifidobacterium longum subsp. longum BBMN68, isolated from a centenarian's faeces, was used as a model strain to further investigate carbohydrate utilization. The pathway for degrading starch and starch hydrolysates was the only complete pathway for complex carbohydrates of the human gut. Notably, all of the GH genes for degrading starch and starch hydrolysates in the BBMN68 genome were conserved in all studied bifidobacterial strains. The in silico analyses of BBMN68 were further confirmed by growth experiments, proteomic and real-time quantitative PCR (RT-PCR) analyses. Our results demonstrated that starch and starch hydrolysates were the most universal and favorable carbon sources for bifidobacteria. 
The low amount of these carbon sources in adult intestine was speculated to contribute to the low relative abundance of bifidobacteria.
Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter
2017-08-10
A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction (GBLUP) model to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model, which uses a single random effect, assumes that all genomic variants contribute equally to the genomic relationship, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than with milk production, and that several biologically meaningful GO terms improved prediction accuracy with GFBLUP for the four traits, compared with GBLUP. The improvement in genomic prediction between breeds (average increase across the four traits: 0.161) was more apparent than within HOL (average increase across the four traits: 0.020). 
Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of independent biological knowledge.
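The GBLUP/GFBLUP contrast described above can be written in the usual linear mixed-model form. This is a sketch under assumed notation (y, X, b, G, G_f, G_r and the variance components are not defined in the abstract), not the authors' exact specification:

$$
\text{GBLUP:}\quad \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{g} + \mathbf{e}, \qquad \mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2), \quad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)
$$

$$
\text{GFBLUP:}\quad \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{f} + \mathbf{r} + \mathbf{e}, \qquad \mathbf{f} \sim N(\mathbf{0}, \mathbf{G}_f\sigma_f^2), \quad \mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\sigma_r^2), \quad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)
$$

Here G is assumed to be a genomic relationship matrix built from all variants, G_f one built only from variants in the genomic feature (for example, the genes annotated to one GO term), and G_r one built from the remaining variants. Because σ_f² and σ_r² are estimated separately, the implied total genomic covariance G_f σ_f² + G_r σ_r² weights the feature variants differently from the rest, whereas GBLUP gives every variant the same share of a single variance σ_g².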
Integrating multi-omic features exploiting Chromosome Conformation Capture data.
Merelli, Ivan; Tordini, Fabio; Drocco, Maurizio; Aldinucci, Marco; Liò, Pietro; Milanesi, Luciano
2015-01-01
The representation, integration, and interpretation of omic data is a complex task, particularly given the huge amount of information produced daily in molecular biology laboratories around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains are difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding the functional clusters created by the spatial conformation of DNA in the nucleus. In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale and offers formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture allows the analysis of chromosome organization in the cell's natural state. When performed genome-wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighborhood starting from information about gene positions, with the possibility of mapping genomic features such as methylation patterns and histone modifications, along with expression profiles, onto the resulting graphs. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information for understanding why genes are in specific positions inside the nucleus and how epigenetic patterns correlate with their expression.
Lin, Choun-Sea; Chen, Jeremy J W; Chiu, Chi-Chou; Hsiao, Han C W; Yang, Chen-Jui; Jin, Xiao-Hua; Leebens-Mack, James; de Pamphilis, Claude W; Huang, Yao-Ting; Yang, Ling-Hung; Chang, Wan-Jung; Kui, Ling; Wong, Gane Ka-Shu; Hu, Jer-Ming; Wang, Wen; Shih, Ming-Che
2017-06-01
The chloroplast NAD(P)H dehydrogenase-like (NDH) complex consists of about 30 subunits encoded by both the nuclear and chloroplast genomes and is present in most land plants. In some orchids, such as Phalaenopsis equestris, Dendrobium officinale and Dendrobium catenatum, most of the 11 chloroplast genome-encoded ndh genes (cp-ndh) have been lost. Here we investigated whether functional cp-ndh genes have been completely lost in these orchids or whether they have been transferred to and retained in the nuclear genome. Further, we assessed whether both cp-ndh genes and nucleus-encoded NDH-related genes can be lost, resulting in the absence of the NDH complex. Comparative analyses of the genome of Apostasia odorata, an orchid species with a complete complement of cp-ndh genes that represents the sister lineage to all other orchids, and of three published orchid genome sequences for P. equestris, D. officinale and D. catenatum, which all lack cp-ndh genes, indicated that copies of cp-ndh genes are not present in any of these four nuclear genomes. This observation suggests that the NDH complex is not necessary for some plants. Comparative genomic/transcriptomic analyses of currently available plastid genome sequences and nuclear transcriptome data showed that 47 out of 660 photoautotrophic plants and all the heterotrophic plants lack plastid-encoded cp-ndh genes and show no evidence for maintenance of a functional NDH complex. Our data indicate that the NDH complex can be lost in photoautotrophic plant species. Further, the loss of the NDH complex may increase the probability of a transition from a photoautotrophic to a heterotrophic life history. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
Evolution and the complexity of bacteriophages.
Serwer, Philip
2007-03-13
The genomes of both long-genome (> 200 Kb) bacteriophages and long-genome eukaryotic viruses contain cellular gene homologs whose selective advantage has not been explained. These homologs add genomic and possibly biochemical complexity. Understanding their significance requires a definition of complexity that is more biochemically oriented than past empirically based definitions. Initially, I propose two biochemistry-oriented definitions of complexity: either decreased randomness or increased encoded information that does not serve immediate needs. Then, I assume that these two definitions are equivalent. This assumption and recent data lead to the following four-part hypothesis, which explains the presence of cellular gene homologs in long bacteriophage genomes and also provides a pathway for complexity increases in prokaryotic cells: (1) Prokaryotes underwent evolutionary increases in biochemical complexity after the eukaryote/prokaryote splits. (2) Some of the complexity increases occurred via multi-step, weak selection that was both protected from strong selection and accelerated by embedding evolving cellular genes in the genomes of bacteriophages and, presumably, also archaeal viruses (first-tier selection). (3) The mechanisms for retaining cellular genes in viral genomes evolved under additional, longer-term selection that was stronger (second-tier selection). (4) The second-tier selection was based on increased access by prokaryotic cells to improved biochemical systems. This access was achieved when DNA transfer moved both the more evolved genes and their more competitive and complex biochemical systems into prokaryotic cells.
I propose testing this hypothesis by controlled evolution in microbial communities to (1) determine the effects of deleting individual cellular gene homologs on the growth and evolution of long-genome bacteriophages and hosts, (2) find the environmental conditions that select for the presence of cellular gene homologs, (3) determine which, if any, bacteriophage genes were selected for maintaining the homologs and (4) determine the dynamics of homolog evolution. This hypothesis also offers an explanation of evolutionary leaps in general. If accurate, it will assist both in understanding and in influencing the evolution of microbes and their communities. Analyses of increases in evolutionary complexity, at least in prokaryotes, should include analysis of the genomes of long-genome bacteriophages.
Jiao, J; Wu, J; Lv, Z; Sun, C; Gao, L; Yan, X; Cui, L; Tang, Z; Yan, B; Jia, Y
2015-11-26
This study aimed to investigate cytosine methylation profiles in different tobacco (Nicotiana tabacum) cultivars grown in China. Methylation-sensitive amplified polymorphism was used to analyze genome-wide global methylation profiles in four tobacco cultivars (Yunyan 85, NC89, K326, and Yunyan 87). Amplicons with methylated C motifs were reamplified by polymerase chain reaction, cloned, sequenced, and analyzed. The results show that geographical location had a greater effect on methylation patterns in the tobacco genome than did sampling time. Analysis of the CG dinucleotide distribution in methylation-sensitive polymorphic restriction fragments suggested that CpG dinucleotide cluster-enriched areas are possible sites of cytosine methylation in the tobacco genome. Sequence alignments of the Nia1 gene (which encodes nitrate reductase) from Yunyan 87 grown in different regions indicate that a C-T transition might be responsible for the tobacco phenotype. A T-C nucleotide replacement might also be responsible for the phenotype and may be influenced by geographical location.
Sha, Yanwei; Sha, Yankun; Ji, Zhiyong; Ding, Lu; Zhang, Qing; Ouyang, Honggen; Lin, Shaobin; Wang, Xu; Shao, Lin; Shi, Chong; Li, Ping; Song, Yueqiang
2017-03-01
Robertsonian translocation (RT) is a common cause of male infertility, recurrent pregnancy loss, and birth defects. Studying meiotic recombination in RT carriers helps to decipher the underlying mechanism and to improve the clinical management of infertility and birth defects caused by RT. Here we present a new method to study spermatogenesis on a single-gamete basis in two RT carriers. Using a combined single-cell whole-genome amplification and sequencing protocol, we comprehensively profiled the chromosomal copy number of 88 single sperm from two RT-carrier patients. From these profiles, chromosomal aberrations were identified on a whole-genome, per-sperm basis. We found that the previously reported interchromosomal effect might not exist in RT carriers. These results suggest that single-cell genome sequencing enables comprehensive chromosomal aneuploidy screening and provides a powerful tool for studying gamete generation in patients carrying chromosomal diseases. © 2017 John Wiley & Sons Ltd/University College London.
Stratification of co-evolving genomic groups using ranked phylogenetic profiles
Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A
2009-01-01
Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, which have a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for assigning protein sequences to genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity against a reference database of 243 fully sequenced genomes. The approach, a combination of sequence searches, statistical estimation and clustering, analyses the degree of sequence divergence between sets of protein sequences and classifies protein sequences according to their species of origin with high accuracy, achieving taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and for the detection of species groups in metagenomics samples. PMID:19860884
Multichromosomal median and halving problems under different genomic distances
Tannier, Eric; Zheng, Chunfang; Sankoff, David
2009-01-01
Background Genome median and genome halving are combinatorial optimization problems that aim at reconstructing ancestral genomes as well as the evolutionary events leading from the ancestor to extant species. Exploring complexity issues is a first step towards devising efficient algorithms. The complexity of the median problem for unichromosomal genomes (permutations) has been settled for both the breakpoint distance and the reversal distance. Although the multichromosomal case has often been assumed to be a simple generalization of the unichromosomal case, it is also a relaxation, so its complexity does not follow from existing results and is open for all distances. Results We settle here the complexity of several genome median and halving problems, including a surprising polynomial result for the breakpoint median and guided halving problems in genomes with circular and linear chromosomes, showing that the multichromosomal problem is actually easier than the unichromosomal problem. Still other variants of these problems are NP-complete, including the DCJ double distance problem, previously mentioned as an open question. We list the remaining open problems. Conclusion This theoretical study clears up a wide swathe of the algorithmic study of genome rearrangements with multiple multichromosomal genomes. PMID:19386099
ICTV Virus Taxonomy Profile: Virgaviridae
USDA-ARS's Scientific Manuscript database
The family Virgaviridae comprises plant-infecting viruses with rod-shaped particles, single-stranded RNA genomes with 3'-terminal tRNA-like structures, and replication proteins typical of alpha-like viruses. Differences in the number of genome components, genome organization and transmission m...
Genomic and Genetic Diversity within the Pseudomonas fluorescens Complex
Garrido-Sanz, Daniel; Meier-Kolthoff, Jan P.; Göker, Markus; Martín, Marta; Rivilla, Rafael; Redondo-Nieto, Miguel
2016-01-01
The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far, the phylogeny of this complex has been analyzed using phenotypic traits, 16S rDNA, and MLSA, and has been inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, a correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, revealing substantial genomic heterogeneity and providing information suitable for genomic studies that are important for understanding the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DDH (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a small number of strains was enough to explain most of the CDS diversity in the core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genomes and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications, as well as into their role as PGPR. PMID:26915094
Stelzer, Claus-Peter; Riss, Simone; Stadler, Peter
2011-04-07
Studies on genome size variation in animals are rarely conducted at lower taxonomic levels, e.g., slightly above or below the species level. Yet such variation might provide important clues to the tempo and mode of genome size evolution. In this study we used flow cytometry to study the evolution of genome size in the rotifer Brachionus plicatilis, a cryptic species complex consisting of at least 14 closely related species. We found unexpectedly high variation in this species complex, with genome sizes ranging approximately seven-fold (haploid '1C' genome sizes: 0.056-0.416 pg). Most of this variation (67%) could be ascribed to the major clades of the species complex, i.e., clades that are well separated according to most species definitions. However, we also found substantial variation (32%) at lower taxonomic levels (within and among genealogical species) and, interestingly, among species pairs that are not completely reproductively isolated. In one genealogical species, called B. 'Austria', we found greatly enlarged genome sizes that could roughly be approximated as multiples of the genomes of its closest relatives, which suggests that whole-genome duplications occurred early during the separation of this lineage. Overall, genome size was significantly correlated with egg size and body size, although the latter correlation became non-significant after controlling for phylogenetic non-independence. Our study suggests that substantial genome size variation can build up early during speciation, potentially even among isolated populations. An alternative, but not mutually exclusive, interpretation might be that reproductive isolation tends to build up unusually slowly in this species complex.
2011-01-01
Background Studies on genome size variation in animals are rarely conducted at lower taxonomic levels, e.g., slightly above or below the species level. Yet such variation might provide important clues to the tempo and mode of genome size evolution. In this study we used flow cytometry to study the evolution of genome size in the rotifer Brachionus plicatilis, a cryptic species complex consisting of at least 14 closely related species. Results We found unexpectedly high variation in this species complex, with genome sizes ranging approximately seven-fold (haploid '1C' genome sizes: 0.056-0.416 pg). Most of this variation (67%) could be ascribed to the major clades of the species complex, i.e., clades that are well separated according to most species definitions. However, we also found substantial variation (32%) at lower taxonomic levels (within and among genealogical species) and, interestingly, among species pairs that are not completely reproductively isolated. In one genealogical species, called B. 'Austria', we found greatly enlarged genome sizes that could roughly be approximated as multiples of the genomes of its closest relatives, which suggests that whole-genome duplications occurred early during the separation of this lineage. Overall, genome size was significantly correlated with egg size and body size, although the latter correlation became non-significant after controlling for phylogenetic non-independence. Conclusions Our study suggests that substantial genome size variation can build up early during speciation, potentially even among isolated populations. An alternative, but not mutually exclusive, interpretation might be that reproductive isolation tends to build up unusually slowly in this species complex. PMID:21473744
DNA methylation profiles of donor nuclei cells and tissues of cloned bovine fetuses.
Kremenskoy, Maksym; Kremenska, Yuliya; Suzuki, Masako; Imai, Kei; Takahashi, Seiya; Hashizume, Kazuyoshi; Yagi, Shintaro; Shiota, Kunio
2006-04-01
Methylation of DNA in CpG islands plays an important role during fetal development and differentiation, because CpG islands are preferentially located in upstream regions of mammalian genomic DNA, including the transcription start sites of housekeeping genes, and are also associated with tissue-specific genes. Somatic nuclear transfer (NT) technology has been used to generate live clones in numerous mammalian species, but only a low percentage of nuclear-transferred animals develop to term. Abnormal epigenetic changes in the CpG islands of donor nuclei after nuclear transfer could contribute to the high rate of abortion during early gestation and to increased perinatal death. These changes have yet to be explored. We therefore investigated the genome-wide DNA methylation profiles of CpG islands in nuclear donor cells and NT animals. Using Restriction Landmark Genomic Scanning (RLGS), we showed, for the first time, the formation of epigenetic profiles in tissues from NT bovine fetuses produced from cumulus cells. Of approximately 2600 unmethylated NotI sites visualized on the RLGS profile, at least 35 NotI sites showed different methylation statuses. Moreover, we showed that fetal and placental tissues from artificially inseminated and cloned cattle have tissue-specific differences in the genome-wide methylation profiles of the CpG islands. We also found possible abnormalities in the fetal brain and placental tissues of cloned animals.
Fourie, Gerda; van der Merwe, Nicolaas A; Wingfield, Brenda D; Bogale, Mesfin; Tudzynski, Bettina; Wingfield, Michael J; Steenkamp, Emma T
2013-09-08
The availability of mitochondrial genomes has allowed for the resolution of numerous questions regarding the evolutionary history of fungi and other eukaryotes. In the Gibberella fujikuroi species complex, the exact relationships among the so-called "African", "Asian" and "American" Clades remain largely unresolved, irrespective of the markers employed. In this study, we considered the feasibility of using mitochondrial genes to infer the phylogenetic relationships among Fusarium species in this complex. The mitochondrial genomes of representatives of the three Clades (Fusarium circinatum, F. verticillioides and F. fujikuroi) were characterized, and we determined whether the mitochondrial genomes of these fungi have value in resolving the higher-level evolutionary relationships in the complex. Overall, the mitochondrial genomes of the three species displayed a high degree of synteny, with all the genes (protein-coding genes, unique ORFs, ribosomal RNA and tRNA genes) in identical order and orientation, as well as introns that share similar positions within genes. The intergenic regions and introns generally contributed significantly to the size differences and diversity observed among these genomes. Phylogenetic analysis of the concatenated protein-coding dataset separated members of the Gibberella fujikuroi complex from other Fusarium species and suggested that F. fujikuroi ("Asian" Clade) is basal in the complex. However, individual mitochondrial gene trees were largely incongruent with one another and with the concatenated gene tree: six distinct phylogenetic trees were recovered from the various single-gene datasets. The mitochondrial genomes of Fusarium species in the Gibberella fujikuroi complex are remarkably similar to those of the previously characterized Fusarium species and Sordariomycetes.
Although each mitochondrial genome apparently represents a single replicative unit, not all of the genes it encodes share the same evolutionary history in these fungi. This incongruence could be due to biased selection on some genes or to recombination among mitochondrial genomes. The results thus suggest that the use of individual mitochondrial genes for phylogenetic inference could mask the true relationships between species in this complex.
Nuclear import of viral DNA genomes.
Greber, Urs F; Fassati, Ariberto
2003-03-01
The genomes of many viruses traffic into the nucleus, where they are either integrated into host chromosomes or maintained as episomal DNA, and are then transcriptionally activated or silenced. Here, we discuss the existing evidence on how lentiviruses, adenoviruses, herpesviruses, hepadnaviruses and autonomous parvoviruses enter the nucleus. Depending on the size of the capsid enclosing the genome, three principles of viral nucleic acid import are discussed. First, the capsid disassembles in the cytosol or in a state docked at the nuclear pore complex, and a subviral genomic complex is trafficked through the pore. Second, the genome is injected from a capsid that is docked to the pore complex. Third, import factors are recruited to cytosolic capsids to increase capsid affinity for the pore complex, mediate translocation and allow disassembly in the nucleoplasm.
Precision medicine for advanced prostate cancer
Mullane, Stephanie A.; Van Allen, Eliezer M.
2016-01-01
Purpose of review Precision cancer medicine, the use of genomic profiling of patient tumors at the point of care to inform treatment decisions, is rapidly changing treatment strategies across cancer types. Precision medicine for advanced prostate cancer may identify new treatment strategies and change clinical practice. In this review, we discuss the potential and challenges of precision medicine in advanced prostate cancer. Recent findings Although primary prostate cancers do not harbor highly recurrent targetable genomic alterations, recent reports on the genomics of metastatic castration-resistant prostate cancer have shown multiple targetable alterations in metastatic biopsies of castration-resistant prostate cancer. Therapeutic implications include targeting prevalent DNA repair pathway alterations with PARP-1 inhibition in genomically defined subsets of patients, among other genomically stratified targets. In addition, multiple recent efforts have demonstrated the promise of liquid tumor profiling (e.g., profiling circulating tumor cells or cell-free tumor DNA) and highlighted the steps necessary to scale these approaches in prostate cancer. Summary Although precision medicine for prostate cancer is still in its initial phase, there is extraordinary potential for clinical impact. Efforts to overcome current scientific and clinical barriers will enable widespread use of precision medicine approaches for patients with advanced prostate cancer. PMID:26909474
Precision medicine for advanced prostate cancer.
Mullane, Stephanie A; Van Allen, Eliezer M
2016-05-01
Precision cancer medicine, the use of genomic profiling of patient tumors at the point of care to inform treatment decisions, is rapidly changing treatment strategies across cancer types. Precision medicine for advanced prostate cancer may identify new treatment strategies and change clinical practice. In this review, we discuss the potential and challenges of precision medicine in advanced prostate cancer. Although primary prostate cancers do not harbor highly recurrent targetable genomic alterations, recent reports on the genomics of metastatic castration-resistant prostate cancer have shown multiple targetable alterations in metastatic biopsies of castration-resistant prostate cancer. Therapeutic implications include targeting prevalent DNA repair pathway alterations with PARP-1 inhibition in genomically defined subsets of patients, among other genomically stratified targets. In addition, multiple recent efforts have demonstrated the promise of liquid tumor profiling (e.g., profiling circulating tumor cells or cell-free tumor DNA) and highlighted the steps necessary to scale these approaches in prostate cancer. Although precision medicine for prostate cancer is still in its initial phase, there is extraordinary potential for clinical impact. Efforts to overcome current scientific and clinical barriers will enable widespread use of precision medicine approaches for patients with advanced prostate cancer.
Maggi, Elaine; Montagna, Cristina
2015-12-01
The American Association for Cancer Research (AACR) Precision Medicine Series "Integrating Clinical Genomics and Cancer Therapy" took place June 13-16, 2015 in Salt Lake City, Utah. The conference was co-chaired by Charles L. Sawyers from Memorial Sloan Kettering Cancer Center in New York, Elaine R. Mardis from Washington University School of Medicine in St. Louis, and Arul M. Chinnaiyan from the University of Michigan in Ann Arbor. About 500 clinicians, basic science investigators, bioinformaticians, and postdoctoral fellows came together to discuss the current state of clinical genomics and the advances and challenges of integrating Next Generation Sequencing (NGS) technologies into clinical practice. The plenary sessions and panel discussions covered current platforms and sequencing approaches adopted for NGS assays of the cancer genome at several national and international institutions, different approaches used to map and classify targetable sequence variants, and how information acquired from sequencing the cancer genome is used to guide treatment options. While technological challenges remain, it emerged that there is a considerable need for tools that help identify the most suitable therapy based on the mutational profile of the somatic cancer genome. The process of matching patients to ongoing clinical trials is still complex. In addition, centralized data repositories that facilitate the sharing of sequencing information, preferably linked to well-annotated clinical records, are essential for beginning to understand the contribution of variants of unknown significance to tumor etiology and response to therapy. Here we summarize the highlights of this stimulating four-day conference, with a major emphasis on the open problems that the clinical genomics community currently faces and the tools most needed to advance the field. Copyright © 2015. Published by Elsevier B.V. All rights reserved.
Babak, Tomas; Garrett-Engele, Philip; Armour, Christopher D; Raymond, Christopher K; Keller, Mark P; Chen, Ronghua; Rohl, Carol A; Johnson, Jason M; Attie, Alan D; Fraser, Hunter B; Schadt, Eric E
2010-08-13
Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for the functional characterization of disease states, but its prohibitive cost, given that hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing.
Company Profile: AKESOgen, Inc.
Bouzyk, Mark; Boisjoli, Robert
2012-07-01
The rapid advancement of genomic, genetic and bioinformatic technologies has paved the way for an explosion of opportunities in pharmacogenomics, reflected in the growing number of biomarkers in the 'personalized medicine cabinet'. AKESOgen, Inc. (GA, USA) was established to meet and champion these needs. AKESOgen, Inc. is a biomarker, genomics and pharmacogenomics contract research organization that serves the academic, pharmaceutical, biotechnology and agricultural sectors. AKESOgen, Inc. performs biomarker profiling and genomics services using different types of markers (e.g., DNA, RNA and methylation) for the research and development market. AKESOgen, Inc. establishes and validates biomarkers in the clinical trials arena and provides expertise in biobanking.
mRNA Expression Profiling of Laser Microbeam Microdissected Cells from Slender Embryonic Structures
Scheidl, Stefan J.; Nilsson, Sven; Kalén, Mattias; Hellström, Mats; Takemoto, Minoru; Håkansson, Joakim; Lindahl, Per
2002-01-01
Microarray hybridization has rapidly evolved into an important tool for genomic studies and studies of gene regulation at the transcriptome level. Expression profiles from homogeneous samples such as yeast and mammalian cell cultures are currently extending our understanding of biology, whereas analyses of multicellular organisms are more difficult because of tissue complexity. The combination of laser microdissection, RNA amplification, and microarray hybridization has the potential to provide expression profiles from selected populations of cells in vivo. In this article, we present and evaluate an experimental procedure for global gene expression analysis of slender embryonic structures using laser microbeam microdissection and laser pressure catapulting. As a proof of principle, expression profiles from 1000 cells in the mouse embryonic (E9.5) dorsal aorta were generated and compared with profiles of captured mesenchymal cells located one cell diameter farther from the aortic lumen. A number of genes were overexpressed in the aorta, including 11 previously known markers for blood vessels. Among the blood vessel markers were endoglin, tie-2, PDGFB, and integrin-β1, which are important regulators of blood vessel formation. This demonstrates that microarray analysis of laser microbeam microdissected cells is sufficiently sensitive to identify genes with regulatory functions. PMID:11891179
Ikegami, Kohta; Ohgane, Jun; Tanaka, Satoshi; Yagi, Shintaro; Shiota, Kunio
2009-01-01
Genes constitute only a small proportion of the mammalian genome, the majority of which is composed of non-genic repetitive elements including interspersed repeats and satellites. A unique feature of the mammalian genome is that there are numerous tissue-dependent, differentially methylated regions (T-DMRs) in the non-repetitive sequences, which include genes and their regulatory elements. The epigenetic status of T-DMRs differs from that of repetitive elements and constitutes the genome-wide DNA methylation profile. Since the DNA methylation profile is specific to each cell and tissue type, much like a fingerprint, it can be used as a means of identification. The formation of DNA methylation profiles is the basis for cell differentiation and development in mammals. The epigenetic status of each T-DMR is regulated by the interplay between DNA methyltransferases, histone modification enzymes, histone subtypes, non-histone nuclear proteins and non-coding RNAs. In this review, we discuss how these epigenetic factors cooperate to establish cell- and tissue-specific DNA methylation profiles.
Watson for Genomics: Moving Personalized Medicine Forward.
Rhrissorrakrai, Kahn; Koyama, Takahiko; Parida, Laxmi
2016-08-01
The confluence of genomic technologies and cognitive computing has brought us to the doorstep of widespread usage of personalized medicine. Cognitive systems, such as Watson for Genomics (WG), integrate massive amounts of new omic data with the current body of knowledge to assist physicians in analyzing and acting on patients' genomic profiles. Copyright © 2016 Elsevier Inc. All rights reserved.
Weinberg, Benjamin A.; Gowen, Kyle; Lee, Thomas K.; Ou, Sai‐Hong Ignatius; Bristow, Robert; Krill, Lauren; Almira‐Suarez, M. Isabel; Ali, Siraj M.; Miller, Vincent A.; Liu, Stephen V.
2017-01-01
Background. Metastatic recurrence after treatment for locoregional cancer is a major cause of morbidity and cancer‐specific mortality. Distinguishing metastatic recurrence from the development of a second primary cancer has important prognostic and therapeutic value and represents a difficult clinical scenario. Advances beyond histopathological comparison are needed. We sought to interrogate the ability of comprehensive genomic profiling (CGP) to aid in distinguishing between these clinical scenarios. Materials and Methods. We identified three prospective cases of recurrent tumors in patients previously treated for localized cancers in which histologic analyses suggested subsequent development of a distinct second primary. Paired samples from the original primary and recurrent tumor were subjected to hybrid capture next‐generation sequencing‐based CGP to identify base pair substitutions, insertions, deletions, copy number alterations (CNA), and chromosomal rearrangements. Genomic profiles between paired samples were compared using previously established statistical clonality assessment software to gauge relatedness beyond global CGP similarities. Results. A high degree of similarity was observed among genomic profiles from morphologically distinct primary and recurrent tumors. Genomic information suggested reclassification as recurrent metastatic disease, and patients received therapy for metastatic disease based on the molecular determination. Conclusions. Our cases demonstrate an important adjunct role for CGP technologies in separating metastatic recurrence from development of a second primary cancer. Larger series are needed to confirm our observations, but comparative CGP may be considered in patients for whom distinguishing metastatic recurrence from a second primary would alter the therapeutic approach. Implications for Practice.
Distinguishing a metastatic recurrence from a second primary cancer can represent a difficult clinicopathologic problem but has important prognostic and therapeutic implications. Approaches to aid histologic analysis may improve clinician and pathologist confidence in this increasingly common clinical scenario. Our series provides early support for incorporating paired comprehensive genomic profiling in clinical situations in which determination of metastatic recurrence versus a distinct second primary cancer would influence patient management. PMID:28193735
Valdes Franco, José A; Wang, Yi; Huo, Naxin; Ponciano, Grisel; Colvin, Howard A; McMahan, Colleen M; Gu, Yong Q; Belknap, William R
2018-04-19
Guayule (Parthenium argentatum A. Gray) is a rubber-producing desert shrub native to Mexico and the United States. Guayule represents an alternative to Hevea brasiliensis as a source for commercial natural rubber. The efficient application of modern molecular/genetic tools to guayule improvement requires characterization of its genome. The 1.6 Gb guayule genome was sequenced, assembled and annotated. The final 1.5 Gb assembly, while fragmented (N50 = 22 kb), maps >95% of the shotgun reads and is essentially complete. Approximately 40,000 transcribed, protein-encoding genes were annotated on the assembly. Further characterization of this genome revealed 15 families of small, microsatellite-associated, transposable elements (TEs) with unexpected chromosomal distribution profiles. These SaTar (Satellite Targeted) elements, which are non-autonomous Mu-like elements (MULEs), were frequently observed in multimeric linear arrays of unrelated individual elements within which no individual element is interrupted by another. This uniformly non-nested TE multimer architecture has not been previously described in either eukaryotic or prokaryotic genomes. Five families of similarly distributed non-autonomous MULEs (microsatellite associated, modularly assembled) were characterized in the rice genome. Families of TEs with similar structures and distribution profiles were identified in sorghum and citrus. The sequencing and assembly of the guayule genome provides a foundation for application of current crop improvement technologies to this plant. In addition, characterization of this genome revealed SaTar elements with distribution profiles unique among TEs. SaTar targeting appears to be based on an alternative MULE recombination mechanism with the potential to impact gene evolution.
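The assembly statistic quoted for guayule, N50, is the contig length L such that contigs of length L or longer together hold at least half of the assembled bases. A minimal Python sketch of that definition follows; the function name and the contig lengths in the usage line are illustrative only, not data from the guayule assembly.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating assembled bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8
```

Here the total is 54 bases; 10 + 9 + 8 = 27 reaches half, so the N50 is 8.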
Emerging trends in the functional genomics of the abiotic stress response in crop plants.
Vij, Shubha; Tyagi, Akhilesh K
2007-05-01
Plants are exposed to different abiotic stresses, such as water deficit, high temperature, salinity, cold, heavy metals and mechanical wounding, under field conditions. It is estimated that such stress conditions can potentially reduce the yield of crop plants by more than 50%. Investigations of the physiological, biochemical and molecular aspects of stress tolerance have been conducted to unravel the intrinsic mechanisms that plants have evolved to mitigate stress. Before the advent of the genomics era, researchers primarily used a gene-by-gene approach to decipher the function of the genes involved in the abiotic stress response. However, abiotic stress tolerance is a complex trait and, although large numbers of genes have been identified to be involved in the abiotic stress response, there remain large gaps in our understanding of the trait. The availability of the genome sequences of certain important plant species has enabled the use of strategies, such as genome-wide expression profiling, to identify the genes associated with the stress response, followed by the verification of gene function by the analysis of mutants and transgenics. Certain components of both abscisic acid-dependent and -independent cascades involved in the stress response have already been identified. Information originating from the genome-wide analysis of abiotic stress tolerance will help to provide an insight into the stress-responsive network(s), and may allow the modification of this network to reduce the loss caused by stress and to increase agricultural productivity.
Malaria in India: The Center for the Study of Complex Malaria in India
Das, Aparup; Anvikar, Anupkumar R.; Cator, Lauren J.; Dhiman, Ramesh C.; Eapen, Alex; Mishra, Neelima; Nagpal, Bhupinder N.; Nanda, Nutan; Raghavendra, Kamaraju; Read, Andrew F.; Sharma, Surya K.; Singh, Om P.; Singh, Vineeta; Sinnis, Photini; Srivastava, Harish C.; Sullivan, Steven A.; Sutton, Patrick L.; Thomas, Matthew B.; Carlton, Jane M.; Valecha, Neena
2012-01-01
Malaria is a major public health problem in India and one which contributes significantly to the overall malaria burden in Southeast Asia. The National Vector Borne Disease Control Program of India reported ~1.6 million cases and ~1100 malaria deaths in 2009. Some experts argue that this is a serious underestimation and that the actual number of malaria cases per year is likely between 9 and 50 times greater, with an approximate 13-fold underestimation of malaria-related mortality. The difficulty in making these estimations is further exacerbated by (i) highly variable malaria eco-epidemiological profiles, (ii) the transmission and overlap of multiple Plasmodium species and Anopheles vectors, (iii) increasing antimalarial drug resistance and insecticide resistance, and (iv) the impact of climate change on each of these variables. Simply stated, the burden of malaria in India is complex. Here we describe plans for a Center for the Study of Complex Malaria in India (CSCMi), one of ten International Centers of Excellence in Malaria Research (ICEMRs) located in malarious regions of the world recently funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health. The CSCMi is a close partnership between Indian and United States scientists, and aims to address major gaps in our understanding of the complexity of malaria in India, including changing patterns of epidemiology, vector biology and control, drug resistance, and parasite genomics. We hope that such a multidisciplinary approach that integrates clinical and field studies with laboratory, molecular, and genomic methods will provide a powerful combination for malaria control and prevention in India. PMID:22142788
In silico polypharmacology of natural products.
Fang, Jiansong; Liu, Chuang; Wang, Qi; Lin, Ping; Cheng, Feixiong
2017-04-27
Natural products with polypharmacological profiles have demonstrated promise as novel therapeutics for various complex diseases, including cancer. Currently, many gaps exist in our knowledge of which compounds interact with which targets, and experimentally testing all possible interactions is infeasible. Recent advances in systems pharmacology and computational (in silico) approaches provide powerful tools for exploring the polypharmacological profiles of natural products. In this review, we introduce recent progress in computational tools and systems pharmacology approaches for identifying drug targets of natural products, focusing on the development of targeted cancer therapy. We survey the polypharmacological and systems immunology profiles of five representative natural products that are being considered as cancer therapies. We summarize various chemoinformatics, bioinformatics and systems biology resources for reconstructing drug-target networks of natural products. We then review currently available computational approaches and tools for prediction of drug-target interactions by focusing on five domains: target-based, ligand-based, chemogenomics-based, network-based and omics-based systems biology approaches. In addition, we describe a practical example of the application of systems pharmacology approaches by integrating the polypharmacology of natural products and large-scale cancer genomics data for the development of precision oncology under the systems biology framework. Finally, we highlight the promise of cancer immunotherapies and combination therapies that target tumor ecosystems (e.g. clones or 'selfish' sub-clones) via exploiting the immunological and inflammatory 'side' effects of natural products in the cancer post-genomics era. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Genome-enabled prediction models for yield related traits in chickpea
USDA-ARS's Scientific Manuscript database
Genomic selection (GS) unlike marker-assisted backcrossing (MABC) predicts breeding values of lines using genome-wide marker profiling and allows selection of lines prior to field-phenotyping, thereby shortening the breeding cycle. A collection of 320 elite breeding lines was selected and phenotyped...
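Genomic selection models differ, and the truncated abstract above does not say which one was used for chickpea. As one common illustration, the sketch below fits ridge regression (the shrinkage idea behind RR-BLUP) of phenotypes on marker genotypes coded -1/0/1, then predicts genomic estimated breeding values (GEBVs) for lines that need not have been phenotyped. The marker coding, the penalty `lam`, and all function names are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(markers, phenotypes, lam):
    """Estimate marker effects by ridge regression with an unpenalised intercept.

    markers: one row per training line, one column per marker (coded -1/0/1).
    Returns (intercept, effects).
    """
    n, p = len(markers), len(markers[0])
    col_means = [sum(row[j] for row in markers) / n for j in range(p)]
    y_mean = sum(phenotypes) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in markers]
    yc = [y - y_mean for y in phenotypes]
    # Normal equations with a ridge penalty: (Xc'Xc + lam*I) b = Xc'yc
    xtx = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    effects = solve(xtx, xty)
    intercept = y_mean - sum(mu * b for mu, b in zip(col_means, effects))
    return intercept, effects


def predict_gebv(markers, intercept, effects):
    """GEBV of each line = intercept + sum of its marker effects."""
    return [intercept + sum(x * b for x, b in zip(row, effects)) for row in markers]
```

In practice `lam` would be tuned (for example by cross-validation) and marker matrices are far larger, so a numerical linear-algebra library would replace the hand-written solver.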
Yuan, Bo; Liu, Pengfei; Gupta, Aditya; Beck, Christine R.; Tejomurtula, Anusha; Campbell, Ian M.; Gambin, Tomasz; Simmons, Alexandra D.; Withers, Marjorie A.; Harris, R. Alan; Rogers, Jeffrey; Schwartz, David C.; Lupski, James R.
2015-01-01
Many loci in the human genome harbor complex genomic structures that can result in susceptibility to genomic rearrangements leading to various genomic disorders. Nephronophthisis 1 (NPHP1, MIM# 256100) is an autosomal recessive disorder that can be caused by defects of NPHP1; the gene maps within the human 2q13 region where low copy repeats (LCRs) are abundant. Loss of function of NPHP1 is responsible for approximately 85% of the NPHP1 cases—about 80% of such individuals carry a large recurrent homozygous NPHP1 deletion that occurs via nonallelic homologous recombination (NAHR) between two flanking directly oriented ~45 kb LCRs. Published data revealed a non-pathogenic inversion polymorphism involving the NPHP1 gene flanked by two inverted ~358 kb LCRs. Using optical mapping and array-comparative genomic hybridization, we identified three potential novel structural variant (SV) haplotypes at the NPHP1 locus that may protect a haploid genome from the NPHP1 deletion. Inter-species comparative genomic analyses among primate genomes revealed massive genomic changes during evolution. The aggregated data suggest that dynamic genomic rearrangements occurred historically within the NPHP1 locus and generated SV haplotypes observed in the human population today, which may confer differential susceptibility to genomic instability and the NPHP1 deletion within a personal genome. Our study documents diverse SV haplotypes at a complex LCR-laden human genomic region. Comparative analyses provide a model for how this complex region arose during primate evolution, and studies among humans suggest that intra-species polymorphism may potentially modulate an individual’s susceptibility to acquiring disease-associated alleles. PMID:26641089
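The two rearrangement outcomes described above follow from the orientation of the flanking LCRs: NAHR between directly oriented copies deletes the intervening segment together with one LCR copy, whereas NAHR between inverted copies inverts the intervening segment. The toy Python model below illustrates only that geometric rule on short strings. It assumes identical LCR copies, ignores where within the LCRs the crossover occurs, and does not model the reciprocal duplication product; the sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def nahr_product(seq, lcr):
    """Toy outcome of intrachromosomal NAHR between two copies of `lcr`.

    Direct copies (lcr ... lcr): one copy and the spacer are deleted.
    Inverted copies (lcr ... revcomp(lcr)): the spacer is inverted.
    """
    first = seq.find(lcr)
    if first < 0:
        raise ValueError("LCR not found")
    after = first + len(lcr)
    direct = seq.find(lcr, after)
    if direct >= 0:
        # Keep the first LCR copy, drop spacer + second copy.
        return seq[:after] + seq[direct + len(lcr):]
    inverted = seq.find(revcomp(lcr), after)
    if inverted >= 0:
        spacer = seq[after:inverted]
        return seq[:after] + revcomp(spacer) + seq[inverted:]
    raise ValueError("no second LCR copy")


print(nahr_product("AAAA" + "GATTACA" + "CCGT" + "GATTACA" + "TTTT", "GATTACA"))
# → AAAAGATTACATTTT
print(nahr_product("AAAA" + "GATTACA" + "CCGT" + "TGTAATC" + "TTTT", "GATTACA"))
# → AAAAGATTACAACGGTGTAATCTTTT
```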
From genes to genomes: a new paradigm for studying fungal pathogenesis in Magnaporthe oryzae.
Xu, Jin-Rong; Zhao, Xinhua; Dean, Ralph A
2007-01-01
Magnaporthe oryzae is the most destructive fungal pathogen of rice worldwide and because of its amenability to classical and molecular genetic manipulation, availability of a genome sequence, and other resources it has emerged as a leading model system to study host-pathogen interactions. This chapter reviews recent progress toward elucidation of the molecular basis of infection-related morphogenesis, host penetration, invasive growth, and host-pathogen interactions. Related information on genome analysis and genomic studies of plant infection processes is summarized under specific topics where appropriate. Particular emphasis is placed on the role of MAP kinase and cAMP signal transduction pathways and unique features in the genome such as repetitive sequences and expanded gene families. Emerging developments in functional genome analysis through large-scale insertional mutagenesis and gene expression profiling are detailed. The chapter concludes with new prospects in the area of systems biology, such as protein expression profiling, and highlights remaining crucial information needed to fully appreciate host-pathogen interactions.
Davey, Mark W; Graham, Neil S; Vanholme, Bartel; Swennen, Rony; May, Sean T; Keulemans, Johan
2009-01-01
Background 'Systems-wide' approaches such as microarray RNA-profiling are ideally suited to the study of the complex overlapping responses of plants to biotic and abiotic stresses. However, commercial microarrays are only available for a limited number of plant species and development costs are so substantial as to be prohibitive for most research groups. Here we evaluate the use of cross-hybridisation to Affymetrix oligonucleotide GeneChip® microarrays to profile the response of the banana (Musa spp.) leaf transcriptome to drought stress using a genomic DNA (gDNA)-based probe-selection strategy to improve the efficiency of detection of differentially expressed Musa transcripts. Results Following cross-hybridisation of Musa gDNA to the Rice GeneChip® Genome Array, ~33,700 gene-specific probe-sets had a sufficiently high degree of homology to be retained for transcriptomic analyses. In a proof-of-concept approach, pooled RNA representing a single biological replicate of control and drought stressed leaves of the Musa cultivar 'Cachaco' were hybridised to the Affymetrix Rice Genome Array. A total of 2,910 Musa gene homologues with a >2-fold difference in expression levels were subsequently identified. These drought-responsive transcripts included many functional classes associated with plant biotic and abiotic stress responses, as well as a range of regulatory genes known to be involved in coordinating abiotic stress responses. This latter group included members of the ERF, DREB, MYB, bZIP and bHLH transcription factor families. Fifty-two of these drought-sensitive Musa transcripts were homologous to genes underlying QTLs for drought and cold tolerance in rice, including in 2 instances QTLs associated with a single underlying gene. The list of drought-responsive transcripts also included genes identified in publicly-available comparative transcriptomics experiments. 
Conclusion Our results demonstrate that despite the general paucity of nucleotide sequence data in Musa and only distant phylogenetic relations to rice, gDNA probe-based cross-hybridisation to the Rice GeneChip® is a highly promising strategy to study complex biological responses and illustrates the potential of such strategies for gene discovery in non-model species. PMID:19758430
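The gDNA-based probe-selection strategy and the >2-fold filter described above can be sketched in two steps: discard individual probes whose genomic DNA hybridisation signal falls below a threshold (poorly conserved probes), keep a probe-set only if enough probes survive, then compare mean log2 signals of the retained probes between control and stressed samples. The threshold, minimum probe count, summarisation by mean log2 intensity, and dictionary layout are assumptions for illustration, not the settings or summarisation algorithm used in the study.

```python
import math


def select_probes(gdna_signal, threshold, min_probes=1):
    """Keep, for each probe-set, the indices of probes whose gDNA hybridisation
    signal exceeds `threshold`; drop probe-sets left with fewer than `min_probes`."""
    kept = {}
    for probe_set, signals in gdna_signal.items():
        good = [i for i, s in enumerate(signals) if s > threshold]
        if len(good) >= min_probes:
            kept[probe_set] = good
    return kept


def log2_fold_changes(control, stressed, kept):
    """Mean log2 signal over the retained probes, stressed minus control."""
    changes = {}
    for probe_set, idx in kept.items():
        c = sum(math.log2(control[probe_set][i]) for i in idx) / len(idx)
        s = sum(math.log2(stressed[probe_set][i]) for i in idx) / len(idx)
        changes[probe_set] = s - c
    return changes


def responsive(changes, min_fold=2.0):
    """Probe-sets whose change exceeds `min_fold` in either direction."""
    cutoff = math.log2(min_fold)
    return sorted(ps for ps, lfc in changes.items() if abs(lfc) > cutoff)


gdna = {"ps1": [50, 400, 600], "ps2": [10, 20, 30], "ps3": [500, 700, 900]}
kept = select_probes(gdna, threshold=100, min_probes=2)
control = {"ps1": [5, 100, 100], "ps3": [100, 100, 100]}
stressed = {"ps1": [5, 400, 400], "ps3": [150, 150, 150]}
print(responsive(log2_fold_changes(control, stressed, kept)))  # → ['ps1']
```

Probe-set `ps2` never reaches the expression step because none of its probes hybridise strongly to the genomic DNA, and `ps3` changes only 1.5-fold.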
Schmidt, Martin; Van Bel, Michiel; Woloszynska, Magdalena; Slabbinck, Bram; Martens, Cindy; De Block, Marc; Coppens, Frederik; Van Lijsebettens, Mieke
2017-07-06
Cytosine methylation in plant genomes is important for the regulation of gene transcription and transposon activity. Genome-wide methylomes are studied upon mutation of the DNA methyltransferases, adaptation to environmental stresses or during development. However, from basic biology to breeding programs, there is a need to monitor multiple samples to determine transgenerational methylation inheritance or differential cytosine methylation. Methylome data obtained by sodium hydrogen sulfite (bisulfite)-conversion and next-generation sequencing (NGS) provide genome-wide information on cytosine methylation. However, a profiling method that detects cytosine methylation state dispersed over the genome would allow high-throughput analysis of multiple plant samples with distinct epigenetic signatures. We use specific restriction endonucleases to enrich for cytosine coverage in a bisulfite and NGS-based profiling method, which was compared to whole-genome bisulfite sequencing of the same plant material. We established an effective methylome profiling method in plants, termed plant-reduced representation bisulfite sequencing (plant-RRBS), using optimized double restriction endonuclease digestion, fragment end repair, adapter ligation, followed by bisulfite conversion, PCR amplification and NGS. We report an efficient laboratory protocol and a straightforward bioinformatics data analysis pipeline for plant-RRBS, applicable for any reference-sequenced plant species. As a proof of concept, methylome profiling was performed using an Oryza sativa ssp. indica pure breeding line and a derived epigenetically altered line (epiline). Plant-RRBS detects methylation levels at tens of millions of cytosine positions deduced from bisulfite conversion in multiple samples. To evaluate the method, the coverage of cytosine positions, the intra-line similarity and the differential cytosine methylation levels between the pure breeding line and the epiline were determined.
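In bisulfite sequencing, unmethylated cytosines are converted and read as T, while methylated cytosines remain C, so the methylation level at a reference cytosine is the fraction of covering reads that still show C. The sketch below applies that rule to reads given as (start, sequence) pairs already aligned without gaps to the forward strand. It is a simplified illustration, assuming no indels, no base-quality filtering, no reverse-strand (G/A) reads and no sequence-context (CG/CHG/CHH) split, and it is not the plant-RRBS pipeline itself.

```python
def methylation_levels(reference, reads):
    """Per-cytosine methylation level from forward-strand bisulfite reads.

    reads: iterable of (start, sequence) with gapless alignment to `reference`.
    At a reference C, read base C = methylated, T = unmethylated (converted).
    Returns {position: (level, coverage)}.
    """
    counts = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue
            meth, unmeth = counts.get(pos, (0, 0))
            if base == "C":
                meth += 1
            elif base == "T":
                unmeth += 1
            else:
                continue  # any other base is uninformative here
            counts[pos] = (meth, unmeth)
    return {pos: (m / (m + u), m + u) for pos, (m, u) in counts.items()}


levels = methylation_levels("ACGTCCG", [(0, "ACGTTCG"), (0, "ATGTCCG"), (1, "CGTCC")])
print(levels[5])  # → (1.0, 3)
```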
Plant-RRBS commonly and reproducibly covers up to one fourth of the cytosine positions in the rice genome when MspI-DpnII is used, within a group of five biological replicates of a line. The method predominantly detects cytosine methylation in putative promoter regions and unannotated regions in rice. Plant-RRBS offers high-throughput and broad, genome-dispersed methylation detection by effective read number generation obtained from reproducibly covered genome fractions using optimized endonuclease combinations, facilitating comparative analyses of multi-sample studies for cytosine methylation and transgenerational stability in experimental material and plant breeding populations.
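The MspI-DpnII combination acts as a double digest: MspI recognises CCGG and cuts C^CGG, and DpnII recognises GATC and cuts ^GATC, so fragment ends are tied to known sequence motifs and, for MspI, every fragment end carries a CpG. The sketch below performs an in silico double digest of a top-strand sequence. It records only top-strand cut positions (both sites are palindromic) and ignores overhangs, end repair and any fragment size selection; the enzyme table and function names are illustrative.

```python
import re

# Recognition site and top-strand cut offset within that site.
ENZYMES = {
    "MspI": ("CCGG", 1),   # C^CGG
    "DpnII": ("GATC", 0),  # ^GATC
}


def cut_positions(seq, enzymes):
    """Sorted top-strand cut coordinates for the chosen enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Zero-width lookahead so that overlapping sites are all found.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(c for c in cuts if 0 < c < len(seq))


def digest(seq, enzymes=("MspI", "DpnII")):
    """Split seq into fragments at every cut position."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


print(digest("AACCGGTTGATCAA"))  # → ['AAC', 'CGGTT', 'GATCAA']
```

Using two enzymes with different sites yields more, shorter fragments than either alone, which is the intuition behind combining endonucleases to raise cytosine coverage.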
Genome engineering in ophthalmology: Application of CRISPR/Cas to the treatment of eye disease.
Hung, Sandy S C; McCaughey, Tristan; Swann, Olivia; Pébay, Alice; Hewitt, Alex W
2016-07-01
The Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) and CRISPR-associated protein (Cas) system provides an accurate and efficient means to edit the human genome. Rapid advances in this technology could result in imminent clinical application, and, given their favourable anatomical and immunological profiles, ophthalmic diseases will be at the forefront of such work. There have been a number of breakthroughs improving the specificity and efficacy of CRISPR/Cas-mediated genome editing. Better methods to identify off-target cleavage sites have also been developed. With the impending clinical utility of CRISPR/Cas technology, complex ethical issues related to the regulation and management of the precise applications of human gene editing must be considered. This review discusses the current progress and recent breakthroughs in CRISPR/Cas-based gene engineering, and outlines some of the technical issues that must be addressed before gene correction, be it in vivo or in vitro, is integrated into ophthalmic care. We outline a clinical pipeline for CRISPR-based treatments of inherited eye diseases and provide an overview of the important ethical implications of gene editing and how these may influence the future of this technology. Copyright © 2016 Elsevier Ltd. All rights reserved.
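Off-target identification starts from sequence similarity: SpCas9 needs an NGG protospacer-adjacent motif (PAM) immediately 3' of its 20-nt target, and sites that differ from the guide at only a few positions may still be cleaved. The brute-force sketch below lists forward-strand sites within a mismatch budget that are followed by NGG. It assumes SpCas9, counts only mismatches (no DNA or RNA bulges), skips the reverse strand, and is not any specific published off-target tool; the guide and sequence in the example are invented.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(genome, guide, max_mismatches=3):
    """Forward-strand sites within `max_mismatches` of `guide` that are
    immediately followed by an NGG PAM. Returns (start, mismatches) pairs."""
    genome, guide = genome.upper(), guide.upper()
    k = len(guide)
    hits = []
    for i in range(len(genome) - k - 2):
        if genome[i + k + 1:i + k + 3] != "GG":  # the N of NGG can be any base
            continue
        mm = mismatches(genome[i:i + k], guide)
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits


guide = "GACGCATAAAGATGAGACGC"
off_target = "TACGCTTAAAGATGAGACGC"  # two mismatches relative to the guide
genome = "TT" + guide + "AGG" + "CC" + off_target + "TGG" + "AAAA" + guide + "ATT"
print(find_candidate_sites(genome, guide))  # → [(2, 0), (27, 2)]
```

The third perfect match at position 54 is not reported because it lacks a PAM.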
King, Matthew R.; Matzat, Leah H.; Dale, Ryan K.; Lim, Su Jun; Lei, Elissa P.
2014-01-01
Chromatin insulators are DNA–protein complexes, situated throughout the genome, that are proposed to contribute to higher-order organization and to the demarcation of distinct transcriptional domains. Mounting evidence in different species implicates RNA and RNA-binding proteins as regulators of chromatin insulator activities. Here, we identify the Drosophila hnRNP M homolog Rumpelstiltskin (Rump) as an antagonist of gypsy chromatin insulator enhancer-blocking and barrier activities. Despite ubiquitous expression of Rump, decreasing Rump levels leads to improvement of barrier activity only in tissues outside of the central nervous system (CNS). Furthermore, rump mutants restore insulator body localization in an insulator mutant background only in non-CNS tissues. Rump associates physically with core gypsy insulator proteins, and chromatin immunoprecipitation and sequencing analysis of Rump demonstrates extensive colocalization with a subset of insulator sites across the genome. The genome-wide binding profile and tissue specificity of Rump contrast with that of Shep, a recently identified RNA-binding protein that antagonizes gypsy insulator activity primarily in the CNS. Our findings indicate parallel roles for RNA-binding proteins in mediating tissue-specific regulation of chromatin insulator activity. PMID:24706949
CellLineNavigator: a workbench for cancer cell line analysis
Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas
2013-01-01
The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large-scale comparisons across a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome-wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To broaden the scope of CellLineNavigator, the database was also closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and searchability, a simple data interface and an intuitive querying interface were implemented. It allows the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487
Lee, Mikyung; Kim, Yangseok
2009-12-16
Genomic alterations frequently occur in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Furthermore, they can modify the expression levels of genes through altered copy number in the corresponding chromosomal region. An accumulating body of evidence supports the possibility that a strong genome-wide correlation exists between DNA content and gene expression. Therefore, more comprehensive analysis is needed to quantify the relationship between genomic alteration and gene expression. A well-designed bioinformatics tool is essential to perform this kind of integrative analysis. A few programs have already been introduced for integrative analysis. However, existing software is limited in its ability to perform comprehensive integrated analysis because of limitations in its implemented algorithms and visualization modules. To address this issue, we have implemented the Java-based program CHESS to allow integrative analysis of two experimental data sets: genomic alteration and genome-wide expression profile. CHESS is composed of a genomic alteration analysis module and an integrative analysis module. The genomic alteration analysis module detects genomic alterations by applying a threshold-based method or the SW-ARRAY algorithm and investigates whether the detected alteration is phenotype specific. The integrative analysis module, in turn, measures the influence of genomic alterations on gene expression. It is divided into two separate parts. The first part calculates the overall correlation between comparative genomic hybridization ratio and gene expression level by applying the following three statistical methods: simple linear regression, Spearman rank correlation and Pearson's correlation.
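The three first-part statistics can be computed directly from paired per-sample values for one gene: a CGH log2 ratio and an expression level. The pure-Python sketch below implements Pearson's correlation, Spearman's rank correlation (Pearson's correlation on average ranks, which handles ties) and an ordinary least-squares line. It illustrates the statistics named above and is not CHESS's own code; the example values are invented.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def linear_fit(x, y):
    """Least-squares slope and intercept of y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


cgh = [-1.0, -0.5, 0.0, 0.5, 1.0]   # CGH log2 ratios (hypothetical)
expr = [0.1, 0.4, 1.0, 2.3, 4.0]    # expression levels (hypothetical)
print(spearman(cgh, expr))  # → 1.0
```

Spearman's coefficient is exactly 1 here because expression rises monotonically with copy number, even though the relationship is not linear.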
In the second part, CHESS detects the genes that are differentially expressed according to the genomic alteration pattern with three alternative statistical approaches: Student's t-test, Fisher's exact test and the chi-square test. By running the two modules in succession, users can clarify how gene expression levels are affected by phenotype-specific genomic alterations. Because CHESS was developed both as a Java application and for web environments, it can be run in a web browser or on a local machine. It also supports all experimental platforms, provided that a properly formatted text file is supplied that includes the chromosomal positions of probes and their gene identifiers. CHESS is a user-friendly tool for investigating disease-specific genomic alterations and the quantitative relationships between those alterations and genome-wide gene expression profiles.
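For the second part, each sample can be cross-classified by whether the region is altered and whether the gene is over-expressed, giving a 2x2 table. The sketch below implements a two-sided Fisher's exact test on such a table by summing the hypergeometric probabilities of every table with the same margins that is no more probable than the observed one. The rule that classifies samples into the four cells is left open here, and the function is an illustration rather than CHESS's implementation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: altered / not altered; columns: over-expressed / not (for example).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # → 0.4857
```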
Genomic signatures predict migration and spawning failure in wild Canadian salmon.
Miller, Kristina M; Li, Shaorong; Kaukinen, Karia H; Ginther, Norma; Hammill, Edd; Curtis, Janelle M R; Patterson, David A; Sierocinski, Thomas; Donnison, Louise; Pavlidis, Paul; Hinch, Scott G; Hruska, Kimberly A; Cooke, Steven J; English, Karl K; Farrell, Anthony P
2011-01-14
Long-term population viability of Fraser River sockeye salmon (Oncorhynchus nerka) is threatened by unusually high levels of mortality during migration to their spawning areas, before they spawn. Functional genomic studies on biopsied gill tissue from tagged wild adults that were tracked through ocean and river environments revealed physiological profiles predictive of successful migration and spawning. We identified a common genomic profile that was correlated with survival in each study. In ocean-tagged fish, a mortality-related genomic signature was associated with a 13.5-fold greater chance of dying en route. In river-tagged fish, the same genomic signature was associated with a 50% increase in mortality before reaching the spawning grounds in one of three stocks tested. At the spawning grounds, the same signature was associated with 3.7-fold greater odds of dying without spawning. Functional analysis raises the possibility that the mortality-related signature reflects a viral infection.
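Several of the effect sizes above are expressed as odds. For a 2x2 table of fish with and without the mortality-related signature against fish that died or survived, the odds ratio compares the odds of dying in the two groups; it only approximates a ratio of probabilities when deaths are rare. The sketch below computes an odds ratio with a Wald 95% confidence interval on the log scale. The counts are hypothetical, and the study's actual statistical models may have differed.

```python
import math


def odds_ratio(died_with, survived_with, died_without, survived_without):
    """Odds ratio of dying for carriers vs. non-carriers of a signature,
    with a Wald 95% confidence interval computed on the log scale.
    All four counts must be positive."""
    ratio = (died_with * survived_without) / (survived_with * died_without)
    se = math.sqrt(1 / died_with + 1 / survived_with
                   + 1 / died_without + 1 / survived_without)
    low = math.exp(math.log(ratio) - 1.96 * se)
    high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, (low, high)


print(odds_ratio(30, 10, 15, 20)[0])  # → 4.0
```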
Smola, Matthew J.; Rice, Greggory M.; Busan, Steven; Siegfried, Nathan A.; Weeks, Kevin M.
2016-01-01
SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by exploiting the ability of reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as for simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures. PMID:26426499
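In SHAPE-MaP, per-nucleotide reactivity is commonly derived from three mutation rates: the SHAPE-modified sample, an untreated control (background reverse-transcription errors) and a denatured control (which corrects for nucleotide-specific detection bias), combined as (modified - untreated) / denatured. The sketch below computes that ratio from per-position mutation counts and read depths. ShapeMapper additionally applies depth and rate filters and normalises reactivities, which are omitted here, and the function names are illustrative rather than ShapeMapper's interface.

```python
import math


def mutation_rates(mutations, depths):
    """Per-position mutation rate = mutations / read depth (0.0 where depth is 0)."""
    return [m / n if n > 0 else 0.0 for m, n in zip(mutations, depths)]


def shape_reactivity(modified, untreated, denatured):
    """(modified - untreated) / denatured per nucleotide; NaN where the
    denatured-control rate is zero and the ratio is undefined."""
    return [(m - u) / d if d > 0 else math.nan
            for m, u, d in zip(modified, untreated, denatured)]


depth = [1000, 1000, 1000]
react = shape_reactivity(mutation_rates([30, 5, 12], depth),
                         mutation_rates([2, 4, 2], depth),
                         mutation_rates([20, 10, 0], depth))
print(react)
```

The first nucleotide comes out highly reactive (about 1.4), the second barely above background (about 0.1), and the third is undefined because its denatured control shows no mutations.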
Green, Michael R; Aya-Bonilla, Carlos; Gandhi, Maher K; Lea, Rod A; Wellwood, Jeremy; Wood, Peter; Marlton, Paula; Griffiths, Lyn R
2011-05-01
Recent developments in genomic technologies have resulted in increased understanding of pathogenic mechanisms and emphasized the importance of central survival pathways. Here, we use a novel bioinformatics-based integrative genomic profiling approach to elucidate conserved mechanisms of lymphomagenesis in the three most common non-Hodgkin's lymphoma (NHL) entities: diffuse large B-cell lymphoma, follicular lymphoma, and B-cell chronic lymphocytic leukemia. By integrating genome-wide DNA copy number analysis and transcriptome profiling of tumor cohorts, we identified genetic lesions present in each entity and highlighted their likely target genes. This revealed a significant enrichment of components of both the apoptosis pathway and the mitogen-activated protein kinase pathway, including amplification of the MAP3K12 locus in all three entities, within the set of genes targeted by genetic alterations in these diseases. Furthermore, amplification of 12p13.33 was identified in all three entities and found to target the FOXM1 oncogene. Amplification of FOXM1 was subsequently found to be associated with an increased MYC oncogenic signaling signature, and siRNA-mediated knock-down of FOXM1 resulted in decreased MYC expression and induced G2 arrest. Together, these findings underscore genetic alteration of the MAPK and apoptosis pathways, and genetic amplification of FOXM1, as conserved mechanisms of lymphomagenesis in common NHL entities. Integrative genomic profiling identifies common central survival mechanisms and highlights them as attractive targets for directed therapy. Copyright © 2011 Wiley-Liss, Inc.
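The abstract reports significant enrichment of apoptosis and MAPK pathway components among genes targeted by genetic lesions but does not name the test. A common choice for this kind of over-representation question is the one-sided hypergeometric test, sketched below as an assumption about method rather than a description of the study's analysis; the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(hits_in_pathway, targeted, pathway_size, universe):
    """One-sided hypergeometric P(X >= hits_in_pathway).

    Probability of drawing at least `hits_in_pathway` pathway genes when
    `targeted` genes are drawn at random from `universe` genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(targeted, pathway_size)
    total = comb(universe, targeted)
    favourable = sum(comb(pathway_size, k) * comb(universe - pathway_size, targeted - k)
                     for k in range(hits_in_pathway, upper + 1))
    return favourable / total


print(enrichment_p(5, 5, 5, 10))  # equals 1/252
```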
Tao, Xiang; Lai, Xian-Jun; Zhang, Yi-Zheng; Tan, Xue-Mei; Wang, Haiyan
2014-01-01
Background Transposable elements (TEs) are the most abundant genomic components in eukaryotes and affect the genome through their replication and movement, generating genetic plasticity. Sweet potato generally reproduces asexually, and TEs may be an important genetic factor in its genome reorganization. Complete identification of TEs is essential for the study of genome evolution. However, the TEs of sweet potato are still poorly understood because of its complex hexaploid genome and the difficulty of sequencing it. The recent availability of sweet potato transcriptome databases provides an opportunity for discovering and characterizing the expressed TEs. Methodology/Principal Findings We first established an integrated transcriptome database by de novo assembly of four published sweet potato transcriptome databases from three cultivars in China. Using sequence-similarity search and analysis, a total of 1,405 TEs, including 883 retrotransposons and 522 DNA transposons, were predicted and categorized. By mapping sets of raw RNA-Seq short reads to the predicted TEs, we compared the quantities, classifications and expression activities of TEs between and within cultivars. Moreover, the differential expression of TEs in seven tissues of the Xushu 18 cultivar was analyzed using Illumina digital gene expression (DGE) tag profiling. We found that 417 TEs were expressed in one or more tissues and 107 in all seven tissues. Furthermore, the copy number of 11 transposase genes was determined to be 1–3 copies in the sweet potato genome by real-time PCR-based absolute quantification. Conclusions/Significance Our results provide a new method for TE searching in species that have transcriptome sequences but lack genome information. The identification and expression analysis of TEs provide useful TE information for sweet potato, valuable for further studies of TE-mediated gene mutation and optimization in asexual reproduction, and contribute to elucidating the roles of TEs in genome evolution. PMID:24608103
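Real-time PCR absolute quantification usually relies on a standard curve of Cq values against log10 copy number from a dilution series. The abstract does not describe its reference scheme, so the sketch below makes two labeled assumptions: a separate standard curve for each assay, and a reference gene whose genomic copy number (`ref_copies`) is known. The function names are hypothetical.

```python
def fit_standard_curve(log10_copies, cqs):
    """Least-squares fit of Cq = slope * log10(copies) + intercept; returns (slope, intercept)."""
    n = len(log10_copies)
    mx = sum(log10_copies) / n
    my = sum(cqs) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, cqs))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope):
    """Efficiency implied by the curve; a slope near -3.32 corresponds to ~100% (doubling per cycle)."""
    return 10 ** (-1 / slope) - 1


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate starting copies."""
    return 10 ** ((cq - intercept) / slope)


def gene_copy_number(target_cq, target_curve, ref_cq, ref_curve, ref_copies=1):
    """Genomic copies of the target, scaled by an assumed known reference-gene copy number."""
    target = copies_from_cq(target_cq, *target_curve)
    reference = copies_from_cq(ref_cq, *ref_curve)
    return ref_copies * target / reference
```

Because both genes are measured in the same DNA template, their ratio cancels out the amount of input DNA.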
Lin, Douglas I; Chudnovsky, Yakov; Duggan, Bridget; Zajchowski, Deborah; Greenbowe, Joel; Ross, Jeffrey S; Gay, Laurie M; Ali, Siraj M; Elvin, Julia A
2017-12-01
Small cell carcinoma of the ovary, hypercalcemic type (SCCOHT) is a rare, extremely aggressive neoplasm that usually occurs in young women and is characterized by deleterious germline or somatic SMARCA4 mutations. We performed comprehensive genomic profiling (CGP) to identify additional clinically and pathophysiologically relevant genomic alterations in SCCOHT. CGP assessment of all classes of coding alterations in up to 406 genes commonly altered in cancer, and of intronic regions of up to 31 genes commonly rearranged in cancer, was performed on 18 SCCOHT cases (16 exhibiting classic morphology and 2 exhibiting exclusively large cell variant morphology). In addition, a retrospective database search for clinically advanced ovarian tumors with genomic profiles similar to SCCOHT yielded 3 additional cases originally diagnosed as non-SCCOHT. CGP revealed inactivating SMARCA4 alterations and low tumor mutational burden (TMB) (<6 mutations/Mb) in 94% (15/16) of SCCOHT with classic morphology. In contrast, both (2/2) cases exhibiting only large cell variant morphology were hypermutated (TMB scores of 90 and 360 mutations/Mb) and were wild-type for SMARCA4. In our retrospective search, an index ovarian cancer patient harboring inactivating SMARCA4 alterations, initially diagnosed with endometrioid carcinoma, was reclassified as having SCCOHT and responded to an SCCOHT chemotherapy regimen. The vast majority of SCCOHT cases demonstrate genomic SMARCA4 loss with only rare co-occurring alterations. Our data support a role for CGP in the diagnosis and management of SCCOHT and of other lesions with overlapping histological and clinical features, since identifying SCCOHT by its genomic profile can guide the choice of an appropriate regimen and other treatment decisions, as illustrated by the index patient. Copyright © 2017 Elsevier Inc. All rights reserved.
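TMB is reported as mutations per megabase of sequenced territory. The sketch below shows only that arithmetic and applies the <6 mutations/Mb cutoff quoted in the abstract. It assumes the mutation count has already been filtered in whatever way the assay requires, for example by removing germline variants; that filtering is not shown, and the function names are hypothetical.

```python
def tumor_mutational_burden(n_mutations, territory_bp):
    """Mutations per megabase: count / (sequenced territory in bp / 1e6)."""
    return n_mutations / (territory_bp / 1_000_000)


def is_low_tmb(tmb, cutoff=6.0):
    """Low TMB if strictly below the cutoff (6 mutations/Mb, as quoted in the abstract)."""
    return tmb < cutoff


tmb = tumor_mutational_burden(3, 1_500_000)
print(tmb, is_low_tmb(tmb))  # 2.0 True
```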
Reisle, Caralyn; Martin, Lee Ann; Alwelaie, Yazeed; Mungall, Karen L.; Ch'ng, Carolyn; Thomas, Ruth; Ng, Tony; Yip, Stephen; J. Lim, Howard; Sun, Sophie; Young, Sean S.; Karsan, Aly; Zhao, Yongjun; Mungall, Andrew J.; Moore, Richard A.; J. Renouf, Daniel; Gelmon, Karen; Ma, Yussanne P.; Hayes, Malcolm; Laskin, Janessa; Marra, Marco A.; Schrader, Kasmintan A.; Jones, Steven J. M.
2017-01-01
We describe a woman with the known pathogenic germline variant CHEK2:c.1100delC and synchronous diagnoses of both pelvic genital type leiomyosarcoma (LMS) and metastatic invasive ductal breast carcinoma. CHEK2 (checkpoint kinase 2) is a tumor-suppressor gene encoding a serine/threonine-protein kinase (CHEK2) involved in double-strand DNA break repair and cell cycle arrest. The CHEK2:c.1100delC variant is a moderate penetrance allele resulting in an approximately twofold increase in breast cancer risk. Whole-genome and whole-transcriptome sequencing were performed on the leiomyosarcoma and matched blood-derived DNA. Despite the presence of several genomic hits within the double-strand DNA damage pathway (CHEK2 germline variant and multiple RAD51B somatic structural variants), tumor profiling did not show an obvious DNA repair deficiency signature. However, even though the LMS displayed clear malignant features, its genomic profiling revealed several characteristics classically associated with leiomyomas including a translocation, t(12;14), with one breakpoint disrupting RAD51B and the other breakpoint upstream of HMGA2 with very high expression of HMGA2 and PLAG1. This is the first report of LMS genomic profiling in a patient with the germline CHEK2:c.1100delC variant and an additional diagnosis of metastatic invasive ductal breast carcinoma. We also describe a possible mechanistic relationship between leiomyoma and LMS based on genomic and transcriptome data. Our findings suggest that RAD51B translocation and HMGA2 overexpression may play an important role in LMS oncogenesis. PMID:28514723
Thibodeau, My Linh; Reisle, Caralyn; Zhao, Eric; Martin, Lee Ann; Alwelaie, Yazeed; Mungall, Karen L; Ch'ng, Carolyn; Thomas, Ruth; Ng, Tony; Yip, Stephen; J Lim, Howard; Sun, Sophie; Young, Sean S; Karsan, Aly; Zhao, Yongjun; Mungall, Andrew J; Moore, Richard A; J Renouf, Daniel; Gelmon, Karen; Ma, Yussanne P; Hayes, Malcolm; Laskin, Janessa; Marra, Marco A; Schrader, Kasmintan A; Jones, Steven J M
2017-09-01
We describe a woman with the known pathogenic germline variant CHEK2:c.1100delC and synchronous diagnoses of both pelvic genital type leiomyosarcoma (LMS) and metastatic invasive ductal breast carcinoma. CHEK2 (checkpoint kinase 2) is a tumor-suppressor gene encoding a serine/threonine-protein kinase (CHEK2) involved in double-strand DNA break repair and cell cycle arrest. The CHEK2:c.1100delC variant is a moderate penetrance allele resulting in an approximately twofold increase in breast cancer risk. Whole-genome and whole-transcriptome sequencing were performed on the leiomyosarcoma and matched blood-derived DNA. Despite the presence of several genomic hits within the double-strand DNA damage pathway (CHEK2 germline variant and multiple RAD51B somatic structural variants), tumor profiling did not show an obvious DNA repair deficiency signature. However, even though the LMS displayed clear malignant features, its genomic profiling revealed several characteristics classically associated with leiomyomas including a translocation, t(12;14), with one breakpoint disrupting RAD51B and the other breakpoint upstream of HMGA2 with very high expression of HMGA2 and PLAG1. This is the first report of LMS genomic profiling in a patient with the germline CHEK2:c.1100delC variant and an additional diagnosis of metastatic invasive ductal breast carcinoma. We also describe a possible mechanistic relationship between leiomyoma and LMS based on genomic and transcriptome data. Our findings suggest that RAD51B translocation and HMGA2 overexpression may play an important role in LMS oncogenesis. © 2017 Thibodeau et al.; Published by Cold Spring Harbor Laboratory Press.
Molecular Targeted Therapies of Childhood Choroid Plexus Carcinoma
2013-10-01
Microarray intensities were analyzed in PGS, using the benign human choroid plexus papilloma (CPP) samples as an expression baseline reference. This...additional human and mouse CPC genomic profiles (timeframe: months 1-5). The goal of these studies is to expand our number of genomic profiles (DNA and...mRNA arrays) of both human and mouse CPCs to provide a comprehensive dataset with which to identify key candidate oncogenes, tumor suppressor genes
Molecular Targeted Therapies of Childhood Choroid Plexus Carcinoma
2012-10-01
Microarray intensities were analyzed in PGS, using the benign human choroid plexus papilloma (CPP) samples as an expression baseline reference...identify candidate drug targets of CPC. Task 1: Generation of additional human and mouse CPC genomic profiles (timeframe: months 1-5). The goal...of these studies is to expand our number of genomic profiles (DNA and mRNA arrays) of both human and mouse CPCs to provide a comprehensive dataset
Molecular Targeted Therapies of Childhood Choroid Plexus Carcinoma
2011-10-01
were analyzed in PGS, using the benign human choroid plexus papilloma (CPP) samples as an expression baseline reference. This analysis highlights...Task 1: Generation of additional human and mouse CPC genomic profiles (timeframe: months 1-5). The goal of these studies is to expand our...number of genomic profiles (DNA and mRNA arrays) of both human and mouse CPCs to provide a comprehensive dataset with which to identify key candidate
Angelastro, James M.; Klimaschewski, Lars; Tang, Song; Vitolo, Ottavio V.; Weissman, Tamily A.; Donlin, Laura T.; Shelanski, Michael L.; Greene, Lloyd A.
2000-01-01
Neurotrophic factors such as nerve growth factor (NGF) promote a wide variety of responses in neurons, including differentiation, survival, plasticity, and repair. Such actions often require changes in gene expression. To identify the regulated genes and thereby to more fully understand the NGF mechanism, we carried out serial analysis of gene expression (SAGE) profiling of transcripts derived from rat PC12 cells before and after NGF-promoted neuronal differentiation. Multiple criteria supported the reliability of the profile. Approximately 157,000 SAGE tags were analyzed, representing at least 21,000 unique transcripts. Of these, nearly 800 were regulated by 6-fold or more in response to NGF. Approximately 150 of the regulated transcripts have been matched to named genes, the majority of which were not previously known to be NGF-responsive. Functional categorization of the regulated genes provides insight into the complex, integrated mechanism by which NGF promotes its multiple actions. It is anticipated that as genomic sequence information accrues the data derived here will continue to provide information about neurotrophic factor mechanisms. PMID:10984536
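Finding tags regulated by 6-fold or more between two SAGE libraries comes down to comparing library-normalized tag counts. The sketch below is a minimal version under stated assumptions that do not come from the abstract: normalization to tags per 100,000, a pseudocount of 1 so tags absent from one library still get a finite ratio, and negative values for down-regulation. It is not the study's statistical procedure, and the function names are hypothetical.

```python
def normalized(counts, total, scale=100_000):
    """Scale raw tag counts to tags per `scale` tags sequenced."""
    return {tag: c * scale / total for tag, c in counts.items()}


def regulated_tags(before, after, min_fold=6.0, pseudo=1.0):
    """Return {tag: signed fold change} for tags changing by at least min_fold.

    Positive values mean higher after treatment; negative values mean lower.
    """
    nb = normalized(before, sum(before.values()))
    na = normalized(after, sum(after.values()))
    result = {}
    for tag in set(before) | set(after):
        b = nb.get(tag, 0.0) + pseudo
        a = na.get(tag, 0.0) + pseudo
        fold = a / b if a >= b else -(b / a)
        if abs(fold) >= min_fold:
            result[tag] = fold
    return result


# Hypothetical tag labels and counts, for illustration only.
print(regulated_tags({"A": 10, "B": 80, "C": 10}, {"A": 80, "B": 10, "C": 10}))
```

Real SAGE comparisons usually add a significance test, because a large fold change based on a handful of tags is unreliable.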
Gao, Shan; Chen, Weiyang; Zeng, Yingxin; Jing, Haiming; Zhang, Nan; Flavel, Matthew; Jois, Markandeya; Han, Jing-Dong J; Xian, Bo; Li, Guojun
2018-04-18
Traditional toxicological studies have relied heavily on animal models to understand the effects of various compounds in a biological context. Considering the great cost, complexity and time involved in experiments using higher-order organisms, researchers have been exploring alternative models that avoid these disadvantages. One example of such a model is the nematode Caenorhabditis elegans. C. elegans offers several advantages, such as small size, a short life cycle, a well-defined genome, ease of maintenance and efficient reproduction. Because these benefits allow large-scale studies to be initiated with relative ease, the problem of how to efficiently capture, organize and analyze the resulting large volumes of data must be addressed. We have developed a new method for quantitative screening of chemicals using C. elegans, in which 33 features were identified for each chemical treatment. Compounds with different toxicities were shown to alter the phenotypes of C. elegans in distinct and detectable patterns. We found that phenotypic profiling captured conserved functions that could be used to classify and predict the toxicity of different chemicals. Our results demonstrate the power of phenotypic profiling in C. elegans under different chemical environments.
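Classifying chemicals from multi-feature phenotypic profiles can be illustrated with a nearest-centroid classifier on z-scored features. The abstract does not specify its classifier, so this is a swapped-in, hypothetical technique. The feature values and the labels "low" and "high" below are invented for illustration, and only two features are used instead of 33.

```python
from math import dist
from statistics import mean, pstdev


def fit_scaler(rows):
    """Per-feature mean and population SD (SD of 0 replaced by 1 to avoid division by zero)."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(row, mus, sds):
    """Z-score one profile with the training means and SDs."""
    return [(v - m) / s for v, m, s in zip(row, mus, sds)]


def fit_centroids(rows, labels):
    """Return (centroid per label, feature means, feature SDs)."""
    mus, sds = fit_scaler(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(scale(row, mus, sds))
    centroids = {lab: [mean(c) for c in zip(*vecs)] for lab, vecs in groups.items()}
    return centroids, mus, sds


def predict(row, model):
    """Assign the label of the nearest centroid (Euclidean distance in z-score space)."""
    centroids, mus, sds = model
    x = scale(row, mus, sds)
    return min(centroids, key=lambda lab: dist(x, centroids[lab]))


model = fit_centroids([[1.0, 100.0], [1.2, 110.0], [5.0, 400.0], [5.2, 420.0]],
                      ["low", "high"][0:1] * 2 + ["high"] * 2)
print(predict([1.1, 105.0], model))  # low
```

Z-scoring keeps features measured on large scales (for example counts) from dominating features measured on small scales (for example ratios).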
Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W H; Pimanda, John E
2014-01-01
The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression.
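A query for genes bound by a user-defined combination of TFs across one or more cell types amounts to set intersection. The sketch below assumes a hypothetical in-memory layout, `{cell_type: {tf: set_of_target_genes}}`. It does not describe BloodChIP's actual schema or query interface, and the TF and gene names are placeholders.

```python
def combinatorial_targets(binding, tfs, cell_types):
    """Genes bound by every TF in `tfs`.

    Returns (per_cell, across): per_cell maps each cell type to the genes
    bound by all chosen TFs there; across keeps only the genes that satisfy
    this in every chosen cell type.
    """
    per_cell = {}
    for ct in cell_types:
        sets = [binding[ct].get(tf, set()) for tf in tfs]
        per_cell[ct] = set.intersection(*sets) if sets else set()
    across = set.intersection(*per_cell.values()) if per_cell else set()
    return per_cell, across


binding = {
    "CD34": {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g2", "g3"}},
    "leukaemic": {"TF_A": {"g2"}, "TF_B": {"g2", "g4"}},
}
print(combinatorial_targets(binding, ["TF_A", "TF_B"], ["CD34", "leukaemic"]))
```

A TF with no profile in a cell type contributes an empty set, so any combination that includes it returns no genes for that cell type.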
Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.
Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G
2010-06-01
The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.
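The abstract notes that the electrostatic potential correlates with GC content but does not match it exactly. A sliding-window GC profile is therefore only a crude proxy to compare against. The sketch below computes that proxy and nothing more: it is not the DEPPDB potential calculation, which also depends on sequence arrangement and flanking context, and the window and step sizes are arbitrary assumptions.

```python
def gc_profile(seq, window, step=1):
    """(start, GC fraction) for each full window of length `window` along `seq`."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        out.append((start, (w.count("G") + w.count("C")) / window))
    return out


print(gc_profile("GGCCAATT", window=4, step=2))  # [(0, 1.0), (2, 0.5), (4, 0.0)]
```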
Panzenhagen, P H N; Cabral, C C; Suffys, P N; Franco, R M; Rodrigues, D P; Conte-Junior, C A
2018-04-01
Salmonella pathogenicity relies on virulence factors, many of which are clustered within the Salmonella pathogenicity islands. Salmonella also harbours mobile genetic elements such as virulence plasmids, prophage-like elements and antimicrobial resistance genes, which can contribute to increased pathogenicity. Here, we have genetically characterized a selected S. Typhimurium strain (CCRJ_26) from our previous study with a multidrug-resistant profile and a high-frequency PFGE clonal profile, which apparently persists in the pork production centre of Rio de Janeiro State, Brazil. By whole-genome sequencing, we described the virulence content of the strain's genome and characterized its repertoire of bacterial plasmids, antibiotic resistance genes and prophage-like elements. We present evidence that the genome of strain CCRJ_26 may represent a virulence-associated phenotype that could be virulent in human infection. Whole-genome sequencing technologies are still costly and remain underexplored for applied microbiology in Brazil. Hence, this genomic description of S. Typhimurium strain CCRJ_26 will aid future molecular epidemiological studies. The analysis described here provides a quick and useful pipeline for bacterial virulence characterization using a whole-genome sequencing approach. © 2018 The Society for Applied Microbiology.
An imbalanced parental genome ratio affects the development of rice zygotes.
Toda, Erika; Ohnishi, Yukinosuke; Okamoto, Takashi
2018-04-27
Upon double fertilization, one sperm cell fuses with the egg cell to form a zygote with a 1:1 maternal-to-paternal genome ratio (1m:1p), and another sperm cell fuses with the central cell to form a triploid primary endosperm cell with a 2m:1p ratio, resulting in formation of the embryo and the endosperm, respectively. The endosperm is known to be considerably sensitive to the ratio of the parental genomes. However, the effect of an imbalance of the parental genomes on zygotic development and embryogenesis has not been well studied, because it is difficult to reproduce the parental genome-imbalanced situation in zygotes and to monitor the developmental profile of zygotes without external effects from the endosperm. In this study, we produced polyploid zygotes with an imbalanced parental genome ratio by electro-fusion of isolated rice gametes and observed their developmental profiles. Polyploid zygotes with an excess maternal gamete/genome developed normally, whereas approximately half to three-quarters of polyploid zygotes with a paternal excess showed developmental arrests. These results indicate that paternal and maternal genomes synergistically serve zygote development with distinct functions, and that genes with monoallelic expression play important roles during zygotic development and embryogenesis.
Landscape community genomics: understanding eco-evolutionary processes in complex environments
Hand, Brian K.; Lowe, Winsor H.; Kovach, Ryan P.; Muhlfeld, Clint C.; Luikart, Gordon
2015-01-01
Extrinsic factors influencing evolutionary processes are often categorically lumped into interactions that are environmentally (e.g., climate, landscape) or community-driven, with little consideration of the overlap or influence of one on the other. However, genomic variation is strongly influenced by complex and dynamic interactions between environmental and community effects. Failure to consider both effects on evolutionary dynamics simultaneously can lead to incomplete, spurious, or erroneous conclusions about the mechanisms driving genomic variation. We highlight the need for a landscape community genomics (LCG) framework to help to motivate and challenge scientists in diverse fields to consider a more holistic, interdisciplinary perspective on the genomic evolution of multi-species communities in complex environments.
Global Gene Expression Profiles Identify Metastasis Regulatory Networks | Center for Cancer Research
Metastasis is a systemic disease in which cancer cells break away from a tumor and migrate to other parts of the body, usually via the blood or lymphatic systems, to form new tumors. Metastatic tumors are difficult to treat and account for the majority of cancer-related deaths. Susceptibility to metastasis is known to have a genetic component, with some individuals more predisposed than others. However, because of the complex interchange between random genomic and epigenetic events that contribute to the disease, characterization of individual genes or small numbers of genes is not sufficient to understand the processes leading up to metastasis.
Zhang, Aihua; Sun, Hui; Wu, Xiuhong; Wang, Xijun
2012-12-24
Metabolomics is a powerful technique for the discovery of novel biomarkers and elucidation of biochemical pathways to improve diagnosis, prognosis and therapy. An advantage of this approach is its ability to assess global metabolic profiles to enhance pathologic characterization. Urine is an ideal bio-medium for disease study because it is readily available, easily obtained and less complex than other body fluids. Ease of collection allows for serial sampling to monitor disease and therapeutic response. Because of this potential, this paper will review urine metabolomic analysis, discuss its significance in the post-genomic era and highlight the specific roles of endogenous small molecule metabolites in this emerging field. Copyright © 2012 Elsevier B.V. All rights reserved.
Baichoo, Shakuntala; Ouzounis, Christos A
A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been reported in original research papers, yet this often neglected property has not previously been reviewed systematically or for a wider audience. We review the space and time complexity of key sequence analysis algorithms and highlight their properties comprehensively, in order to identify potential opportunities for further research in algorithm or data structure optimization. Complexity is poised to become pivotal as we face the continuing growth of genomic data at unprecedented scale and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.
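As a concrete example of the time and space trade-offs such a review covers, global alignment by Needleman-Wunsch dynamic programming takes O(mn) time for sequences of lengths m and n. If only the score is needed, keeping two rows of the table reduces space from O(mn) to O(n). This is a standard textbook algorithm chosen for illustration, not one taken from the review. The scoring values below are arbitrary linear-gap assumptions.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Time: O(len(a) * len(b)), since every cell of the DP table is filled once.
    Space: O(len(b)), since only the previous and current rows are kept.
    Recovering the alignment itself would need the full table or Hirschberg's method.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


print(nw_score("ACGT", "ACGT"))  # 4
print(nw_score("AC", "AG"))      # 0
```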
Kogelman, Lisette J A; Zhernakova, Daria V; Westra, Harm-Jan; Cirera, Susanna; Fredholm, Merete; Franke, Lude; Kadarmideen, Haja N
2015-10-20
Obesity is a multi-factorial health problem in which genetic factors play an important role. Limited results have been obtained in single-gene studies using either genomic or transcriptomic data. RNA sequencing technology has shown its potential in gaining accurate knowledge about the transcriptome, and may reveal novel genes affecting complex diseases. Integration of genomic and transcriptomic variation (expression quantitative trait loci [eQTL] mapping) has identified causal variants that affect complex diseases. We integrated transcriptomic data from adipose tissue and genomic data from a porcine model to investigate the mechanisms involved in obesity using a systems genetics approach. Using a selective gene expression profiling approach, we selected 36 animals based on a previously created genomic Obesity Index for RNA sequencing of subcutaneous adipose tissue. Differential expression analysis was performed using the Obesity Index as a continuous variable in a linear model. eQTL mapping was then performed to integrate 60K porcine SNP chip data with the RNA sequencing data. Results were restricted based on genome-wide significant single nucleotide polymorphisms, detected differentially expressed genes, and previously detected co-expressed gene modules. Further data integration was performed by detecting co-expression patterns among eQTLs and integration with protein data. Differential expression analysis of RNA sequencing data revealed 458 differentially expressed genes. The eQTL mapping resulted in 987 cis-eQTLs and 73 trans-eQTLs (false discovery rate < 0.05), of which the cis-eQTLs were associated with metabolic pathways. We reduced the eQTL search space by focusing on differentially expressed and co-expressed genes and disease-associated single nucleotide polymorphisms to detect obesity-related genes and pathways. Building a co-expression network using eQTLs resulted in the detection of a module strongly associated with lipid pathways.
Furthermore, we detected several obesity candidate genes, for example, ENPP1, CTSL, and ABHD12B. To our knowledge, this is the first integrated genomics and transcriptomics (eQTL) study to use, and model, genomic and subcutaneous adipose tissue RNA sequencing data on obesity in a porcine model. We detected several pathways and potential causal genes for obesity. Further validation and investigation may reveal their exact function and association with obesity.
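A single SNP-gene eQTL test can be sketched as a regression of expression on genotype dosage (0, 1 or 2 copies of one allele), followed by false discovery rate control across all tests. This is a generic, assumed illustration and not the study's pipeline. The p-value here comes from a permutation test on the correlation, to avoid depending on a statistics library, and Benjamini-Hochberg is used for FDR. The function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def eqtl_slope(dosage, expr):
    """OLS slope: change in expression per additional copy of the counted allele."""
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(expr) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(dosage, expr))
    sxx = sum((a - mx) ** 2 for a in dosage)
    return sxy / sxx


def permutation_p(dosage, expr, n_perm=999, seed=0):
    """Two-sided permutation p-value for |r|, with the +1 correction."""
    rng = random.Random(seed)
    observed = abs(pearson(dosage, expr))
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(dosage, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

Tests with q below 0.05 would then be reported at a 5% FDR, matching the threshold quoted in the abstract.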
Dumas, Marc-Emmanuel; Domange, Céline; Calderari, Sophie; Martínez, Andrea Rodríguez; Ayala, Rafael; Wilder, Steven P; Suárez-Zamorano, Nicolas; Collins, Stephan C; Wallis, Robert H; Gu, Quan; Wang, Yulan; Hue, Christophe; Otto, Georg W; Argoud, Karène; Navratil, Vincent; Mitchell, Steve C; Lindon, John C; Holmes, Elaine; Cazier, Jean-Baptiste; Nicholson, Jeremy K; Gauguier, Dominique
2016-09-30
The genetic regulation of metabolic phenotypes (i.e., metabotypes) in type 2 diabetes mellitus occurs through complex organ-specific cellular mechanisms and networks contributing to impaired insulin secretion and insulin resistance. Genome-wide gene expression profiling systems can dissect the genetic contributions to metabolome and transcriptome regulation. The integrative analysis of multiple gene expression traits and metabotypes together with their underlying genetic regulation remains a challenge. Here, we introduce a systems genetics approach based on the topological analysis of a combined molecular network made of genes and metabolites identified through expression and metabotype quantitative trait locus mapping (i.e., eQTL and mQTL) to prioritise biological characterisation of candidate genes and traits. We used systematic metabotyping by ¹H NMR spectroscopy and genome-wide gene expression in white adipose tissue to map molecular phenotypes to genomic blocks associated with obesity and insulin secretion in a series of rat congenic strains derived from spontaneously diabetic Goto-Kakizaki (GK) and normoglycemic Brown-Norway (BN) rats. We implemented a network biology approach to visualize the shortest paths between metabolites and genes significantly associated with each genomic block. Despite strong genomic similarities (95-99%) among congenics, each strain exhibited specific patterns of gene expression and metabotypes, reflecting the metabolic consequences of series of linked genetic polymorphisms in the congenic intervals. We subsequently used the congenic panel to map quantitative trait loci underlying specific mQTLs and genome-wide eQTLs. Variation in key metabolites like glucose, succinate, lactate, or 3-hydroxybutyrate and second messenger precursors like inositol was associated with several independent genomic intervals, indicating functional redundancy in these regions.
To navigate through the complexity of these association networks we mapped candidate genes and metabolites onto metabolic pathways and implemented a shortest path strategy to highlight potential mechanistic links between metabolites and transcripts at colocalized mQTLs and eQTLs. Minimizing the shortest path length drove prioritization of biological validations by gene silencing. These results underline the importance of network-based integration of multilevel systems genetics datasets to improve understanding of the genetic architecture of metabotype and transcriptomic regulation and to characterize novel functional roles for genes determining tissue-specific metabolism.
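The shortest-path prioritization can be sketched with breadth-first search on an unweighted, undirected gene-metabolite network. Candidate genes are then ranked by their path length to a metabolite of interest. The network below is hypothetical: "metabolite_M", "gene_A" and "gene_B" are placeholders, and only "succinate" comes from the abstract. The study's actual network construction and edge weighting are not reproduced.

```python
from collections import deque


def shortest_path(graph, source, target):
    """BFS on an adjacency dict; returns the node list from source to target, or None."""
    if source not in graph or target not in graph:
        return None
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk predecessors back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nb in graph[node]:
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    return None


def rank_by_path_length(graph, metabolite, genes):
    """(gene, number of edges) for reachable genes, shortest paths first."""
    ranked = []
    for g in genes:
        p = shortest_path(graph, metabolite, g)
        if p is not None:
            ranked.append((g, len(p) - 1))
    return sorted(ranked, key=lambda t: t[1])


graph = {
    "succinate": ["metabolite_M"],
    "metabolite_M": ["succinate", "gene_A"],
    "gene_A": ["metabolite_M"],
    "gene_B": [],
}
print(rank_by_path_length(graph, "succinate", ["gene_B", "gene_A"]))  # [('gene_A', 2)]
```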
Nilsson, Emil K; Boström, Adrian E; Mwinyi, Jessica; Schiöth, Helgi B
2016-06-01
Despite an established link between sleep deprivation and epigenetic processes in humans, it remains unclear to what extent sleep deprivation modulates DNA methylation. We performed a within-subject randomized blinded study with 16 healthy subjects to examine the effect of one night of total sleep deprivation (TSD) on the genome-wide methylation profile in blood compared with that after normal sleep. Genome-wide differences in methylation between the two conditions were assessed by applying a paired regression model that corrected for monocyte subpopulations. In addition, the correlations between the methylation of genes found to be modulated by TSD and gene expression were examined in a separate, publicly available cohort of 10 healthy male donors (E-GEOD-49065). Sleep deprivation significantly affected the DNA methylation profile both independently of and dependent on shifts in monocyte composition. Our study detected differential methylation of 269 probes. Notably, one CpG site was located 69 bp upstream of ING5, which has been shown to be differentially expressed after sleep deprivation. Gene set enrichment analysis detected the Notch and Wnt signaling pathways to be enriched among the differentially methylated genes. These results provide evidence that total acute sleep deprivation alters the methylation profile in healthy human subjects. This is, to our knowledge, the first study that systematically investigated the impact of total acute sleep deprivation on genome-wide DNA methylation profiles in blood and related the epigenomic findings to expression data.
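One simple form of a paired model that adjusts for cell composition regresses, for one probe, each subject's methylation change (TSD minus normal sleep) on that subject's change in monocyte fraction. The intercept is then the condition effect at zero monocyte shift. The abstract does not give its model specification, so this is an assumed illustration for a single probe and not the authors' model. The function names and example values are hypothetical.

```python
def within_subject_deltas(tsd, sleep):
    """Per-subject difference: value after TSD minus value after normal sleep."""
    return [a - b for a, b in zip(tsd, sleep)]


def paired_tsd_effect(delta_meth, delta_mono):
    """Fit delta_meth = b0 + b1 * delta_mono by least squares; return (b0, b1).

    b0 estimates the TSD effect on methylation adjusted for the change in
    monocyte fraction; b1 is the methylation change per unit monocyte shift.
    """
    n = len(delta_meth)
    mx = sum(delta_mono) / n
    my = sum(delta_meth) / n
    sxx = sum((x - mx) ** 2 for x in delta_mono)
    sxy = sum((x - mx) * (y - my) for x, y in zip(delta_mono, delta_meth))
    b1 = sxy / sxx
    return my - b1 * mx, b1
```

Working with within-subject differences removes stable between-person effects, which is the main reason for the within-subject design.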
Jung, Seung-Hyun; Shin, Seung-Hun; Yim, Seon-Hee; Choi, Hye-Sun; Lee, Sug-Hyung; Chung, Yeun-Jun
2009-07-31
Recently, microarray-based comparative genomic hybridization (array-CGH) has emerged as a very efficient, higher-resolution technology for the genome-wide identification of copy number alterations (CNAs). Although CNAs are thought to affect gene expression, there is currently no platform available for integrated CNA-expression analysis. To achieve high-resolution copy number analysis integrated with expression profiles, we established a human 30k oligoarray-based genome-wide copy number analysis system and explored its applicability for integrated genome and transcriptome analysis using the MDA-MB-231 cell line. For validation, we compared the CNAs detected by the oligoarray with those detected by a 3k BAC array. The oligoarray identified single-copy differences more accurately and sensitively than the BAC array. Seventeen CNAs detected by both platforms in MDA-MB-231, such as gains of 5p15.33-13.1, 8q11.22-8q21.13 and 17p11.2 and losses of 1p32.3, 8p23.3-8p11.21 and 9p21, had been consistently identified in previous studies on breast cancer. There were 122 other small CNAs (mean size 1.79 Mb) that were detected only by the oligoarray, not by the BAC array. To validate the oligoarray CNA detection, we performed genomic qPCR targeting 7 CNA regions detected only by the oligoarray, plus one non-CNA region. All qPCR results were consistent with the oligoarray-CGH results. When we explored the combined interpretation of DNA copy number and RNA expression profiles, mean DNA copy number and RNA expression levels showed a significant correlation. In conclusion, this 30k oligoarray-CGH system can be a reasonable choice for analyzing whole-genome CNAs and RNA expression profiles at a lower cost.
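Relating array-CGH output to expression typically involves two steps. First, the log2 test/reference ratio is converted to an approximate copy number. Second, copy number is correlated with expression across genes. The sketch below assumes a pure sample of known ploidy (diploid by default), which ignores contamination by normal cells and tumour heterogeneity. It uses a plain Pearson correlation and is not the platform's own software; the function names are hypothetical.

```python
def log2_ratio_to_copy_number(log2_ratio, ploidy=2):
    """Approximate copy number = ploidy * 2 ** log2_ratio (pure sample assumed)."""
    return ploidy * 2 ** log2_ratio


def pearson(x, y):
    """Pearson correlation between per-gene copy number and expression values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


print([log2_ratio_to_copy_number(r) for r in (-1, 0, 1)])  # [1.0, 2, 4]
```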
2011-01-01
Background Coffee is one of the world's most important crops; it is consumed worldwide and plays a significant role in the economy of producing countries. Coffea arabica and C. canephora are responsible for 70 and 30% of commercial production, respectively. C. arabica is an allotetraploid from a recent hybridization of the diploid species, C. canephora and C. eugenioides. C. arabica has lower genetic diversity and results in a higher quality beverage than C. canephora. Research initiatives have been launched to produce genomic and transcriptomic data about Coffea spp. as a strategy to improve breeding efficiency. Results Assembling the expressed sequence tags (ESTs) of C. arabica and C. canephora produced by the Brazilian Coffee Genome Project and the Nestlé-Cornell Consortium revealed 32,007 clusters of C. arabica and 16,665 clusters of C. canephora. We detected different GC3 profiles between these species that are related to their genome structure and mating system. BLAST analysis revealed similarities between coffee and grape (Vitis vinifera) genes. Using Ka/Ks analysis, we identified coffee genes under purifying and positive selection. Protein domain and gene ontology analyses suggested differences between Coffea spp. data, mainly in relation to complex sugar synthases and nucleotide binding proteins. OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Among the interesting families annotated are new cystatins, glycine-rich proteins and RALF-like peptides. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories.
Conclusion We present the first comprehensive genome-wide transcript profile study of C. arabica and C. canephora, which can be freely accessed by the scientific community at http://www.lge.ibi.unicamp.br/coffea. Our data reveal species-specific and species-prevalent genes in coffee that may help explain particular characteristics of these two crops. The identification of differentially expressed transcripts offers a starting point for correlating gene expression profiles with Coffea spp. developmental traits, providing valuable insights for coffee breeding and biotechnology, especially concerning sugar metabolism and stress tolerance. PMID:21303543
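GC3 is the fraction of G or C bases at third codon positions. The abstract compares GC3 profiles between the two species without describing how they were computed. A minimal sketch, assuming each EST cluster has already been assigned a reading frame (a step the abstract does not describe) and using hypothetical helper names, is:

```python
def gc3(cds: str) -> float:
    """Fraction of third codon positions that are G or C.

    Assumes `cds` is already in the correct reading frame; a trailing
    partial codon is ignored.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    # Indices 2, 5, 8, ... are the third positions of complete codons.
    third_positions = seq[2 : 3 * n_codons : 3]
    return sum(base in "GC" for base in third_positions) / n_codons


def gc3_profile(sequences, n_bins=10):
    """Histogram of GC3 values using n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for s in sequences:
        index = min(int(gc3(s) * n_bins), n_bins - 1)  # GC3 == 1.0 goes in the last bin
        counts[index] += 1
    return counts


# Hypothetical in-frame coding fragments.
print(gc3_profile(["ATGGCCAAA", "GCGGCC", "ATGAAATTT"]))
```

Comparing such histograms between species is one simple way to visualize a GC3 profile; a bimodal distribution is often discussed in relation to genome structure.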
Zhang, Qi; Zeng, Xin; Younkin, Sam; Kawli, Trupti; Snyder, Michael P; Keleş, Sündüz
2016-02-24
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments have revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bp) or long (75-100 bp), single-end or paired-end reads, the impact of these read parameters on downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101-bp paired-end ChIP-seq data for many transcription factors from the human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed a complex interplay between sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several respects, such as alignment accuracy, peak resolution, and, most notably, allele-specific binding detection. Our work elucidates the effect of design on downstream analysis and provides insights to investigators deciding on sequencing parameters for ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.
Charlesworth, Jac C; Peralta, Juan M; Drigalenko, Eugene; Göring, Harald Hh; Almasy, Laura; Dyer, Thomas D; Blangero, John
2009-12-15
Gene identification using linkage, association, or genome-wide expression data is often underpowered. We propose that formally combining information from multiple gene-identification approaches may identify novel loci that are missed when only one form of information is available. First, we analyze the Genetic Analysis Workshop 16 Framingham Heart Study Problem 2 genome-wide association data for high-density lipoprotein cholesterol (HDL-C) using a "gene-centric" approach. We then formally combine the association test results with genome-wide transcriptional profiling data for HDL-C from the San Antonio Family Heart Study, using a Z-transform test (Stouffer's method). We identified 39 genes by the joint test at a conservative 1% false-discovery rate, including 9 from the significant gene-based association test and 23 whose expression was significantly correlated with HDL-C. Seven genes identified as significant in the joint test were not independently identified by either the association or the expression tests. This combined approach has increased power and directly nominates novel candidate genes likely to be involved in determining HDL-C levels. Such information can then justify a more exhaustive search for functional sequence variation within the nominated genes. We anticipate that this type of analysis will speed the identification of regulatory genes causally involved in disease risk.
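Stouffer's method converts each p-value to a standard normal deviate, z_i = Phi^-1(1 - p_i), and combines them as Z = (sum of w_i * z_i) / sqrt(sum of w_i^2). A minimal sketch, assuming one-sided p-values per gene and optional weights (the abstract does not state how the association and expression p-values were oriented or weighted), is:

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def stouffer(p_values, weights=None):
    """Combine one-sided p-values with Stouffer's Z-transform.

    Returns (combined_z, combined_one_sided_p).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    # z_i = Phi^-1(1 - p_i): small p-values map to large positive z.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return combined_z, 1.0 - _STD_NORMAL.cdf(combined_z)


# Hypothetical gene: association p = 0.03, expression-correlation p = 0.02.
z, p = stouffer([0.03, 0.02])
print(f"combined Z = {z:.2f}, combined p = {p:.4f}")
```

With two equally weighted tests that each give p = 0.05, the combined p is about 0.01. This is how a gene that is only suggestive in each test alone can pass the joint test, as with the seven genes found only by the combined analysis. Multiple-testing control (such as the FDR threshold in the abstract) would be applied to the combined p-values afterwards.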
Alonso, Sergio; Suzuki, Koichi; Yamamoto, Fumiichiro; Perucho, Manuel
2018-01-01
Somatic and, to a lesser extent, germline epigenetic aberrations are fundamental to carcinogenesis, cancer progression, and tumor phenotype. DNA methylation is the most extensively studied and arguably the best understood epigenetic mechanism altered in cancer. Both somatic loss of methylation (hypomethylation) and gain of methylation (hypermethylation) occur in the genomes of malignant cells. In general, the cancer cell epigenome is globally hypomethylated, while some regions, typically gene-associated CpG islands, become hypermethylated. Given the profound impact of DNA methylation on the transcriptional profile and genomic stability of cancer cells, its characterization is essential to fully understand the complexity of cancer biology, improve tumor classification, and ultimately advance the management and treatment of cancer patients. A plethora of methods have been devised to analyze and quantify DNA methylation alterations. Several early methods relied on methylation-sensitive restriction enzymes, whose activity depends on the methylation status of their recognition sequences. Among these techniques, methylation-sensitive amplification length polymorphism (MS-AFLP) was developed in the early 2000s and successfully adapted from its original gel electrophoresis fingerprinting format to a microarray format, which notably increased its throughput and allowed the methylation changes to be quantified. This array-based platform interrogates over 9500 independent loci putatively amplified by the MS-AFLP technique, corresponding to NotI sites mapped throughout the human genome.
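NotI recognizes the 8-bp palindrome GCGGCCGC, which contains two CpG dinucleotides, and its cleavage is blocked by CpG methylation; this is what makes NotI-anchored loci informative about methylation. As a sketch of how candidate NotI loci might be enumerated in a reference sequence (hypothetical helpers, not the procedure used to design the array):

```python
import re

NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence (palindromic)
_NOTI_PATTERN = re.compile(f"(?={NOTI_SITE})")  # lookahead allows overlapping hits


def find_noti_sites(sequence: str) -> list[int]:
    """Return 0-based start positions of every NotI site in `sequence`.

    Because the site is its own reverse complement, scanning one strand
    also finds every site on the other strand.
    """
    return [m.start() for m in _NOTI_PATTERN.finditer(sequence.upper())]


def cpg_positions_in_sites(sequence: str) -> list[int]:
    """0-based positions of the C of each CpG lying inside a NotI site."""
    positions = set()
    for start in find_noti_sites(sequence):
        site = sequence[start : start + len(NOTI_SITE)].upper()
        for i in range(len(site) - 1):
            if site[i : i + 2] == "CG":
                positions.add(start + i)
    return sorted(positions)


# Hypothetical sequence with one NotI site.
print(find_noti_sites("AAGCGGCCGCTT"), cpg_positions_in_sites("AAGCGGCCGCTT"))
```

Methylation at either of these CpGs would prevent NotI digestion at that locus.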
Yamaoka, Shuhei; Yoshimura, Kazusa; Kondou, Youichi; Onogi, Akio; Matsui, Minami; Iwata, Hiroyoshi; Ban, Tomohiro
2017-01-01
Profiling elemental contents in wheat grains and clarifying the underlying genetic systems are important for the breeding of biofortified crops. Our objective was to evaluate the genetic potential of 269 Afghan wheat landraces for increasing elemental contents in wheat cultivars. The contents of three major (Mg, K, and P) and three minor (Mn, Fe, and Zn) elements in wheat grains were measured by energy dispersive X-ray fluorescence spectrometry. Large variations in elemental contents were observed among landraces. Marker-based heritability estimates were low to moderate, suggesting that the elemental contents are complex quantitative traits. Genetic correlations between two locations (Japan and Afghanistan) and among the six elements were estimated using a multi-response Bayesian linear mixed model. Low-to-moderate genetic correlations were observed among major elements and among minor elements respectively, but not between major and minor elements. A single-response genome-wide association study detected only one significant marker, which was associated with Zn, suggesting it will be difficult to increase the elemental contents of wheat by conventional marker-assisted selection. Genomic predictions for major elemental contents were moderately or highly accurate, whereas those for minor elements were mostly low or moderate. Our results indicate genomic selection may be useful for the genetic improvement of elemental contents in wheat. PMID:28072876
Retrotransposon profiling of RNA polymerase III initiation sites.
Qi, Xiaojie; Daily, Kenneth; Nguyen, Kim; Wang, Haoyi; Mayhew, David; Rigor, Paul; Forouzan, Sholeh; Johnston, Mark; Mitra, Robi David; Baldi, Pierre; Sandmeyer, Suzanne
2012-04-01
Although retroviruses are relatively promiscuous in choice of integration sites, retrotransposons can display marked integration specificity. In yeast and slime mold, some retrotransposons are associated with tRNA genes (tDNAs). In the Saccharomyces cerevisiae genome, the long terminal repeat retrotransposon Ty3 is found at RNA polymerase III (Pol III) transcription start sites of tDNAs. Ty1, 2, and 4 elements also cluster in the upstream regions of these genes. To determine the extent to which other Pol III-transcribed genes serve as genomic targets for Ty3, a set of 10,000 Ty3 genomic retrotranspositions were mapped using high-throughput DNA sequencing. Integrations occurred at all known tDNAs, two tDNA relics (iYGR033c and ZOD1), and six non-tDNA, Pol III-transcribed types of genes (RDN5, SNR6, SNR52, RPR1, RNA170, and SCR1). Previous work in vitro demonstrated that the Pol III transcription factor (TF) IIIB is important for Ty3 targeting. However, seven loci that bind the TFIIIB loader, TFIIIC, were not targeted, underscoring the unexplained absence of TFIIIB at those sites. Ty3 integrations also occurred in two open reading frames not previously associated with Pol III transcription, suggesting the existence of a small number of additional sites in the yeast genome that interact with Pol III transcription complexes.
Genomics and Genetics in the Biology of Adaptation to Exercise
Bouchard, Claude; Rankinen, Tuomo; Timmons, James A.
2014-01-01
This chapter is devoted to the role of genetic variation and gene-exercise interactions in the biology of adaptation to exercise. There is evidence from genetic epidemiology research that DNA sequence differences contribute to human variation in physical activity level, cardiorespiratory fitness in the untrained state, cardiovascular and metabolic responses to acute exercise, and responsiveness to regular exercise. Methodological and technological advances have made it possible to dissect, at the molecular level, the genetic component of complex, multifactorial traits such as those of interest to exercise biology, in terms of tissue expression profiles, genes, and allelic variants. Evidence from animal models and human studies is considered. Data on candidate genes, genome-wide linkage results, genome-wide association findings, expression arrays, and combinations of these approaches are reviewed. Combining transcriptomic and genomic technologies has proven particularly powerful, as evidenced by the recent development of a molecular predictor of the ability to increase VO2max with exercise training. For exercise as a behavior and physiological fitness as a state to become major players in public health policy, the role of human individuality and the influence of DNA sequence differences will need to be understood. Likewise, progress in the use of exercise in therapeutic medicine will depend largely on our ability to identify the favorable responders, for given physiological properties, to a given exercise regimen. PMID:23733655
Enhancing genomic prediction with genome-wide association studies in multiparental maize populations
USDA-ARS's Scientific Manuscript database
Genome-wide association mapping using dense marker sets has identified some nucleotide variants affecting complex traits which have been validated with fine-mapping and functional analysis. Many sequence variants associated with complex traits in maize have small effects and low repeatability, howev...
Golby, Paul; Nunez, Javier; Cockle, Paul J.; Ewer, Katie; Logan, Karen; Hogarth, Philip; Vordermeier, H. Martin; Hinds, Jason; Hewinson, R. Glyn; Gordon, Stephen V.
2011-01-01
Genome sequencing of Mycobacterium tuberculosis complex members has accelerated the search for new disease-control tools. Antigen mining is one area that has benefited enormously from access to genome data. As part of an ongoing antigen mining programme, we screened genes previously identified by transcriptome analysis as upregulated in response to in vitro acid shock for their in vivo expression profiles and antigenicity. We show that the genes encoding two methyltransferases, Mb1438c/Rv1403c and Mb1440c/Rv1404c, were highly upregulated in a mouse model of infection and were antigenic in M. bovis-infected cattle. Because the genes encoding these antigens were highly upregulated in vivo, we sought to define their genetic regulation. We constructed a mutant deleted for their putative regulator, Mb1439/Rv1404; loss of the regulator increased expression of the flanking methyltransferases and of a defined set of distal genes. This work therefore yields both applied and fundamental outputs: the description of novel mycobacterial antigens that can now be moved into field trials, and the description of a regulatory network that responds to both in vivo and in vitro stimuli. PMID:18375799
The Complex Transcriptional Response of Acaryochloris marina to Different Oxygen Levels.
Hernández-Prieto, Miguel A; Lin, Yuankui; Chen, Min
2017-02-09
Ancient oxygenic photosynthetic prokaryotes produced oxygen as a waste product but existed for a long time under an oxygen-free (anoxic) atmosphere before an oxic atmosphere emerged. The change in atmospheric oxygen levels influenced the chemistry and structure of many enzymes containing prosthetic groups that are inactivated by oxygen. In the genome of Acaryochloris marina, multiple gene copies exist for proteins that are normally encoded by a single gene copy in other cyanobacteria. Using high-throughput RNA sequencing to profile transcriptome responses of cells grown under microoxic and hyperoxic conditions, we detected 8446 transcripts out of the 8462 annotated genes in the Cyanobase database. Two-thirds of the 50 most abundant transcripts encode key photosynthetic proteins. Microoxic conditions reduced the expression of genes encoding photosynthetic complexes, with the exception of some subunits. In addition to the known regulation of the multiple copies of psbA, we detected a similar transcriptional pattern for psbJ and psbU, which might play a key role in the altered components of photosystem II. Furthermore, the regulation of genes encoding proteins important for reactive oxygen species scavenging is discussed at the genome level, including, for the first time, specific small RNAs with possible regulatory roles under varying oxygen levels. Copyright © 2017 Hernandez-Prieto et al.
The Complex Transcriptional Response of Acaryochloris marina to Different Oxygen Levels
Hernández-Prieto, Miguel A.; Lin, Yuankui; Chen, Min
2016-01-01
Ancient oxygenic photosynthetic prokaryotes produced oxygen as a waste product but existed for a long time under an oxygen-free (anoxic) atmosphere before an oxic atmosphere emerged. The change in atmospheric oxygen levels influenced the chemistry and structure of many enzymes containing prosthetic groups that are inactivated by oxygen. In the genome of Acaryochloris marina, multiple gene copies exist for proteins that are normally encoded by a single gene copy in other cyanobacteria. Using high-throughput RNA sequencing to profile transcriptome responses of cells grown under microoxic and hyperoxic conditions, we detected 8446 transcripts out of the 8462 annotated genes in the Cyanobase database. Two-thirds of the 50 most abundant transcripts encode key photosynthetic proteins. Microoxic conditions reduced the expression of genes encoding photosynthetic complexes, with the exception of some subunits. In addition to the known regulation of the multiple copies of psbA, we detected a similar transcriptional pattern for psbJ and psbU, which might play a key role in the altered components of photosystem II. Furthermore, the regulation of genes encoding proteins important for reactive oxygen species scavenging is discussed at the genome level, including, for the first time, specific small RNAs with possible regulatory roles under varying oxygen levels. PMID:27974439
El Hage Chehade, Hiba; Wazir, Umar; Mokbel, Kinan; Kasem, Abdul; Mokbel, Kefah
2018-01-01
Decisions regarding adjuvant chemotherapy have been based on clinical and pathological features. However, such decisions are seldom consistent. Web-based predictive models have been developed using cancer registry data to help determine the need for adjuvant therapy. More recently, with the recognition of the heterogeneous nature of breast cancer, genomic assays have been developed to aid therapeutic decision-making. We carried out a comprehensive literature review of online prognostication tools and genomic assays to assess whether online tools could serve as valid alternatives to genomic profiling in decisions regarding adjuvant therapy in early breast cancer. Breast cancer has recently been recognized as a heterogeneous disease based on variations in molecular characteristics. Online tools are valuable in guiding adjuvant treatment, especially in resource-constrained countries. However, in the era of personalized therapy, molecular profiling appears to be superior in predicting clinical outcome and guiding therapy. Copyright © 2017 Elsevier Inc. All rights reserved.
Full-Genome Analysis of Alternative Splicing in Mouse Liver after Hepatotoxicant Exposure
Alternative splicing plays a role in determining gene function and protein diversity. We have employed whole genome exon profiling using Affymetrix Mouse Exon 1.0 ST arrays to understand the significance of alternative splicing on a genome-wide scale in response to multiple toxic...
Genome-Wide Profiling of RNA–Protein Interactions Using CLIP-Seq
Stork, Cheryl; Zheng, Sika
2017-01-01
UV crosslinking and immunoprecipitation (CLIP) is an increasingly popular technique for studying protein–RNA interactions in tissues and cells. Whole cells or tissues are irradiated with ultraviolet light to generate covalent bonds between RNA and proteins in close contact. After partial RNase digestion, antibodies specific to an RNA-binding protein (RBP) or to an epitope tag fused to the protein are used to immunoprecipitate the protein–RNA complexes. After stringent washing and gel separation, the RBP–RNA complex is excised. The RBP is digested with protease to allow purification of the bound RNA. Reverse transcription of the RNA followed by high-throughput sequencing of the cDNA library is now often used to identify protein-bound RNA on a genome-wide scale. UV irradiation can cause cDNA truncations and/or mutations at the crosslink sites, which complicates both alignment of the sequencing library to the reference genome and identification of the crosslink sites. Meanwhile, one or more amino acids of a crosslinked RBP can remain attached to its bound RNA because of incomplete protein digestion. As a result, reverse transcriptase may fail to read through the crosslink site, producing cDNA that ends at the crosslinked nucleotide. One variant of CLIP harnesses this to identify crosslink sites at nucleotide resolution: individual-nucleotide resolution CLIP (iCLIP) circularizes cDNA to capture the truncated cDNAs, which also increases the efficiency of ligating sequencing adapters to the library. Here, we describe the detailed iCLIP procedure. PMID:26965263
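Because iCLIP reads from truncated cDNAs begin immediately downstream of the crosslinked nucleotide, the crosslink site is conventionally taken as the nucleotide just upstream of each read's 5' end. A minimal sketch, assuming reads have already had barcodes and adapters removed and been aligned, with coordinates as 0-based half-open intervals (the `Alignment` type and function names are hypothetical, and PCR-duplicate collapsing is omitted):

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for an aligned iCLIP read (0-based, half-open)."""
    chrom: str
    start: int   # leftmost aligned base, inclusive
    end: int     # one past the rightmost aligned base
    strand: str  # "+" or "-"


def crosslink_site(aln: Alignment) -> tuple[str, int, str]:
    """Position of the nucleotide immediately upstream of the read's 5' end."""
    if aln.strand == "+":
        # The 5' end is at `start`, so upstream is start - 1 (a read at position 0
        # yields -1, which a real pipeline would discard).
        return (aln.chrom, aln.start - 1, "+")
    if aln.strand == "-":
        # On the minus strand the 5' end is at end - 1, so upstream is `end`.
        return (aln.chrom, aln.end, "-")
    raise ValueError(f"unknown strand {aln.strand!r}")


def count_crosslinks(alignments: Iterable[Alignment]) -> Counter:
    """Number of truncated-cDNA reads supporting each putative crosslink site."""
    return Counter(crosslink_site(a) for a in alignments)


# Hypothetical reads: two share a 5' end on the plus strand.
reads = [
    Alignment("chr1", 100, 130, "+"),
    Alignment("chr1", 100, 125, "+"),
    Alignment("chr1", 200, 230, "-"),
]
print(count_crosslinks(reads))
```

Reads of different lengths that share a 5' end support the same crosslink site, which is why the count depends only on the read start and strand.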
Linking Genes to Cardiovascular Diseases: Gene Action and Gene–Environment Interactions
2016-01-01
A unique characteristic of the myocardium is its ability to grow and remodel in order to adapt; this capacity is determined partly by genes and partly by the environment and the milieu intérieur. In the “post-genomic” era, there is an emerging need to elucidate the physiologic functions of myocardial genes, as well as the potential adaptive and maladaptive modulations induced by environmental/epigenetic factors. Advances in genome sequencing and analysis have accelerated exponentially in recent years, rapidly expanding our knowledge of the sometimes controversial genetic underpinnings of cardiovascular diseases. Current technologies can identify candidate genes involved in diverse normal and abnormal morphomechanical phenotypes, and offer insights into the multiple genetic factors implicated in complex cardiovascular syndromes. The expression profiles of thousands of genes are regularly ascertained under diverse conditions. Global analyses of gene expression levels are useful for cataloging genes and correlated phenotypes, and for elucidating the role of genes in disease. Comparative expression of gene networks coupled to complex disorders can provide insights into how “modifier genes” influence the expressed phenotypes. A more comprehensive and detailed systematic understanding of the genetic abnormalities underlying, for example, various genetic cardiomyopathies is increasingly emerging. Implementing genomic findings in cardiology practice may well lead directly to better diagnosis and therapy. There is a growing appreciation of the value of studying gene anomalies in a cohesive rather than disjointed manner. However, many practitioners and investigators find it challenging to comprehend, interpret, and use cardiovascular genomics studies, which are becoming increasingly accessible and affordable in the clinic. This survey addresses the need for a fundamental understanding of this vital area. PMID:26545598
[Research progress in neuropsychopharmacology updated for the post-genomic era].
Nakanishi, Toru
2009-11-01
Neuropsychopharmacological research in the post-genomic (genomic sequence) era has been developing rapidly through the use of novel techniques, including DNA chips. We have applied these techniques to investigate the anti-tumor effect of NSAIDs, to isolate novel genes specifically expressed in rheumatoid arthritis, and to analyze gene expression profiles in mesenchymal stem cells. Recently, we developed a novel quantitative PCR system for detecting BDNF mRNA isoforms. Using this system, we identified exon-specific modes of expression in acute and chronic pain. In addition, we have generated gene expression profiles of knockout mice lacking the beta2 subunit of acetylcholine receptors.
Eccleston, Mark; Morris, George
2008-09-01
ValiRx plc is a therapeutics and diagnostics company developing an integrated approach to the diagnosis, treatment, and prognosis of cancer through its two subsidiaries: ValiPharma and ValiBio. Over 95% of cellular DNA is tightly packaged into a complex structure called chromatin, with only 1% available to be read by the cell's machinery. ValiRx's two proprietary technology platforms exploit this epigenomic structure. ValiBio is developing low-cost, rapid, high-throughput, noninvasive screening tests for the early detection, differential diagnosis, and prognosis of cancer using its patented Hypergenomics™ and Nucleosomics™ technology. Its therapeutics subsidiary, ValiPharma, is developing novel gene-silencing therapeutics based on its GeneICE™ technology platform, which works by repackaging specific open areas of DNA, resulting in targeted gene deactivation. HyperGenomics and GeneICE are synergistic but independent business areas based on the company's core patent portfolio. ValiRx intends to facilitate early, optimal personalized treatment regimens by correlating 'hypersensitive' site profiles within the genome with specific types of cancer.
Parker, Heidi G; Dreger, Dayna L; Rimbault, Maud; Davis, Brian W; Mullen, Alexandra B; Carpintero-Ramirez, Gretchen; Ostrander, Elaine A
2017-04-25
There are nearly 400 modern domestic dog breeds, each with a unique history and genetic profile. To track the genetic signatures of breed development, we have assembled the most diverse dataset of dog breeds, reflecting their extensive phenotypic variation and heritage. Combining genetic distance, migration, and genome-wide haplotype sharing analyses, we uncover geographic patterns of development and independent origins of common traits. Our analyses reveal the hybrid history of breeds and elucidate the effects of immigration, revealing for the first time a suggestion of New World dog ancestry within some modern breeds. Finally, we used cladistics and haplotype sharing to show that some common traits have arisen more than once in the history of the dog. These analyses characterize the complexities of breed development, resolving longstanding questions regarding individual breed origination, the effect of migration on geographically distinct breeds, and, by inference, the transfer of trait and disease alleles among dog breeds. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Quantitative Profiling of Peptides from RNAs classified as non-coding
Prabakaran, Sudhakaran; Hemberg, Martin; Chauhan, Ruchi; Winter, Dominic; Tweedie-Cullen, Ry Y.; Dittrich, Christian; Hong, Elizabeth; Gunawardena, Jeremy; Steen, Hanno; Kreiman, Gabriel; Steen, Judith A.
2014-01-01
Only a small fraction of the mammalian genome codes for messenger RNAs destined to be translated into proteins, and it is generally assumed that a large portion of transcribed sequences, including introns and several classes of non-coding RNAs (ncRNAs), do not give rise to peptide products. A systematic examination of the translation and physiological regulation of ncRNAs has not been conducted. Here, we use computational methods to identify the products of non-canonical translation in mouse neurons by analyzing unannotated transcripts in combination with proteomic data. This study supports the existence of non-canonical translation products from both intragenic and extragenic genomic regions, including peptides derived from antisense transcripts and introns. Moreover, the novel translation products studied exhibit temporal regulation similar to that of proteins known to be involved in neuronal activity processes. These observations highlight a potentially large and complex set of biologically regulated translational events from transcripts formerly thought to lack coding potential. PMID:25403355
Rodrigues, Raquel; Grosso, Ana Rita; Moita, Luís
2013-01-01
The immune system relies on the plasticity of its components to produce appropriate responses to frequent environmental challenges. Dendritic cells (DCs) are critical initiators of innate immunity and orchestrate the later, more specific adaptive immunity. The generation of diversity in transcriptional programs is central to effective immune responses. Alternative splicing is widely considered a key generator of transcriptional and proteomic complexity, but its role has rarely been addressed systematically in immune cells. Here we used splicing-sensitive arrays to assess genome-wide gene- and exon-level expression profiles in human DCs in response to a bacterial challenge. We find widespread alternative splicing events and splicing factor transcriptional signatures induced by an E. coli challenge to human DCs. Alternative splicing acts in concert with transcriptional modulation, but these two mechanisms of gene regulation primarily affect distinct functional gene groups. Alternative splicing is likely to play an important role in DC immunobiology because it affects genes known to be involved in DC development, endocytosis, antigen presentation, and cell cycle arrest.
Common genetic variation drives molecular heterogeneity in human iPSCs.
Kilpinen, Helena; Goncalves, Angela; Leha, Andreas; Afzal, Vackar; Alasoo, Kaur; Ashford, Sofie; Bala, Sendu; Bensaddek, Dalila; Casale, Francesco Paolo; Culley, Oliver J; Danecek, Petr; Faulconbridge, Adam; Harrison, Peter W; Kathuria, Annie; McCarthy, Davis; McCarthy, Shane A; Meleckyte, Ruta; Memari, Yasin; Moens, Nathalie; Soares, Filipa; Mann, Alice; Streeter, Ian; Agu, Chukwuma A; Alderton, Alex; Nelson, Rachel; Harper, Sarah; Patel, Minal; White, Alistair; Patel, Sharad R; Clarke, Laura; Halai, Reena; Kirton, Christopher M; Kolb-Kokocinski, Anja; Beales, Philip; Birney, Ewan; Danovi, Davide; Lamond, Angus I; Ouwehand, Willem H; Vallier, Ludovic; Watt, Fiona M; Durbin, Richard; Stegle, Oliver; Gaffney, Daniel J
2017-06-15
Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping, and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling, we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. Finally, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.
Principles of gene microarray data analysis.
Mocellin, Simone; Rossi, Carlo Riccardo
2007-01-01
The development of several genome-wide profiling methods, such as comparative genomic hybridization (CGH), differential display, serial analysis of gene expression (SAGE), and gene microarrays, together with the sequencing of the human genome, has provided an opportunity to monitor and investigate the complex cascade of molecular events leading to tumor development and progression. The availability of such large amounts of information has shifted scientists' attention toward a nonreductionist approach to biological phenomena. High-throughput technologies can be used to follow changing patterns of gene expression over time. Among them, the gene microarray has become prominent because it is easier to use, does not require large-scale DNA sequencing, and allows parallel quantification of thousands of genes from multiple samples. Gene microarray technology is rapidly spreading worldwide and has the potential to drastically change the therapeutic approach to patients with tumors. Therefore, it is of paramount importance for both researchers and clinicians to know the principles underlying the analysis of the huge amount of data generated by microarray technology.
Assessment of Parkinson’s disease risk loci in Greece
Kara, Eleanna; Xiromerisiou, Georgia; Spanaki, Cleanthe; Bozi, Maria; Koutsis, Georgios; Panas, Marios; Dardiotis, Efthimios; Ralli, Styliani; Bras, Jose; Letson, Christopher; Edsall, Connor; Pliner, Hannah; Arepali, Sampath; Kalinderi, Kallirhoe; Fidani, Liana; Bostanjopoulou, Sevasti; Keller, Margaux F; Wood, Nicholas W; Hardy, John; Houlden, Henry; Stefanis, Leonidas; Plaitakis, Andreas; Hernandez, Dena; Hadjigeorgiou, Georgios M; Nalls, Mike A; Singleton, Andrew B
2013-01-01
Genome-wide association studies (GWAS) have proven to be a powerful approach for identifying risk loci for neurodegenerative diseases. Recent GWAS in Parkinson’s disease (PD) have identified numerous risk variants pointing to novel pathways potentially implicated in PD pathogenesis. Contributing to these GWAS efforts, we genotyped previously identified risk alleles in PD patients and controls from Greece. We showed that previously published risk profiles for Northern European and American populations are also applicable to the Greek population. In addition, although we were largely underpowered to detect individual associations, we replicated 5 of 32 previously published risk variants with nominal p-values < 0.05. Genome-wide complex trait analysis (GCTA) revealed that known risk loci explain 1.27% of the variance in PD risk in the Greek cohort. Collectively, these results indicate that, as in other populations worldwide, there is likely a substantial genetic component to PD in Greece that remains to be discovered. PMID:24080174
North, Matthew; Tandon, Vickram J.; Thomas, Reuben; Loguinov, Alex; Gerlovina, Inna; Hubbard, Alan E.; Zhang, Luoping; Smith, Martyn T.; Vulpe, Chris D.
2011-01-01
Benzene is a ubiquitous environmental contaminant and is widely used in industry. Exposure to benzene causes a number of serious health problems, including blood disorders and leukemia. Benzene undergoes complex metabolism in humans, making mechanistic determination of benzene toxicity difficult. We used a functional genomics approach to identify the genes that modulate the cellular toxicity of three of the phenolic metabolites of benzene, hydroquinone (HQ), catechol (CAT) and 1,2,4-benzenetriol (BT), in the model eukaryote Saccharomyces cerevisiae. Benzene metabolites generate oxidative and cytoskeletal stress, and tolerance requires correct regulation of iron homeostasis and the vacuolar ATPase. We have identified a conserved bZIP transcription factor, Yap3p, as important for a HQ-specific response pathway, as well as two genes that encode putative NAD(P)H:quinone oxidoreductases, PST2 and YCP4. Many of the yeast genes identified have human orthologs that may modulate human benzene toxicity in a similar manner and could play a role in benzene exposure-related disease. PMID:21912624
Context influences on TALE–DNA binding revealed by quantitative profiling
Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.
2015-01-01
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
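To make the "seemingly simple" recognition code concrete, here is a minimal Python sketch of the canonical one-repeat-to-one-base model that the study found to be incomplete. The RVD-to-base table (NI to A, HD to C, NG to T, NN to G) is the commonly cited canonical code. This sketch ignores NN's tolerance of A, and the function names are hypothetical. It is not the SIFTED model, which additionally captures protein context effects.

```python
# Commonly cited canonical TALE code: each repeat-variable diresidue (RVD)
# specifies one base. NN also tolerates A; this sketch maps it to G only.
CANONICAL_CODE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def predicted_target(rvds):
    """Translate an ordered list of RVDs into the base sequence the code predicts."""
    return "".join(CANONICAL_CODE[rvd] for rvd in rvds)


def code_mismatches(rvds, site):
    """Count positions where a candidate site disagrees with the code's prediction."""
    target = predicted_target(rvds)
    if len(site) != len(target):
        raise ValueError("site length must equal the number of RVDs")
    return sum(1 for expected, observed in zip(target, site.upper()) if expected != observed)


rvds = ["NI", "HD", "NG", "NN", "HD"]
print(predicted_target(rvds))          # ACTGC
print(code_mismatches(rvds, "ACTGA"))  # 1
```

This sketch scores every position independently and equally. The PBM data show that this simplification is insufficient for predicting off-target binding.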
Common genetic variation drives molecular heterogeneity in human iPSCs
Leha, Andreas; Afzal, Vackar; Alasoo, Kaur; Ashford, Sofie; Bala, Sendu; Bensaddek, Dalila; Casale, Francesco Paolo; Culley, Oliver J; Danecek, Petr; Faulconbridge, Adam; Harrison, Peter W; Kathuria, Annie; McCarthy, Davis; McCarthy, Shane A; Meleckyte, Ruta; Memari, Yasin; Moens, Nathalie; Soares, Filipa; Mann, Alice; Streeter, Ian; Agu, Chukwuma A; Alderton, Alex; Nelson, Rachel; Harper, Sarah; Patel, Minal; White, Alistair; Patel, Sharad R; Clarke, Laura; Halai, Reena; Kirton, Christopher M; Kolb-Kokocinski, Anja; Beales, Philip; Birney, Ewan; Danovi, Davide; Lamond, Angus I; Ouwehand, Willem H; Vallier, Ludovic; Watt, Fiona M; Durbin, Richard
2017-01-01
Induced pluripotent stem cell (iPSC) technology has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterisation of many existing iPSC lines limits their potential use for research and therapy. Here, we describe the systematic generation, genotyping and phenotyping of 711 iPSC lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative (HipSci: http://www.hipsci.org). Our study outlines the major sources of genetic and phenotypic variation in iPSCs and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPSC phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of rare, genomic copy number mutations that are repeatedly observed in iPSC reprogramming and present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells. PMID:28489815
Sunflower Hybrid Breeding: From Markers to Genomic Selection
Dimitrijevic, Aleksandra; Horn, Renate
2018-01-01
In sunflower, molecular markers for simple traits such as fertility restoration, high oleic acid content, herbicide tolerance, or resistance to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been used successfully in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits such as yield, heterosis, drought tolerance, oil content, or disease resistance (e.g., against Sclerotinia sclerotiorum) have been challenging to select for and will require genome-wide approaches. Sunflower plant genetic resources, which are being collected and conserved worldwide, represent valuable resources for studying complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming the disadvantages of biparental populations. Technological advances and the availability of the sunflower genome sequence have made novel whole-genome approaches possible. Genotyping-by-sequencing and whole-genome sequencing based on next-generation sequencing technologies have facilitated the production of large numbers of SNP markers for high-density maps as well as SNP arrays, enabling genome-wide association studies and genomic selection in sunflower. Genome-wide or candidate-gene-based association studies have been performed for traits such as branching, flowering time, and resistance to Sclerotinia head and stalk rot. First steps in genomic selection for hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive with other oil crops, higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to increase yields per hectare considerably. 
Integrative approaches combining omics technologies (genomics, transcriptomics, proteomics, metabolomics and phenomics) using bioinformatic tools will facilitate the identification of target genes and markers for complex traits and will give better insight into the mechanisms behind the traits. PMID:29387071
Digital gene expression analysis of the zebra finch genome
2010-01-01
Background In order to understand patterns of adaptation and molecular evolution, it is important to quantify both variation in gene expression and nucleotide sequence divergence. Gene expression profiling in non-model organisms has recently been facilitated by the advent of massively parallel sequencing technology. Here we investigate tissue-specific gene expression patterns in the zebra finch (Taeniopygia guttata), with special emphasis on the genes of the major histocompatibility complex (MHC). Results Almost 2 million 454-sequencing reads from cDNA of six different tissues were assembled and analysed. A total of 11,793 zebra finch transcripts were represented in these EST data, indicating a transcriptome coverage of about 65%. There was a positive correlation between the tissue specificity of gene expression and the ratio of non-synonymous to synonymous nucleotide substitutions, suggesting that genes with a specialised function are evolving at a higher rate (or with less constraint) than genes with a more general function. In line with this, there was also a negative correlation between overall expression levels and expression specificity of contigs. We found evidence for expression of 10 different genes related to the MHC. MHC genes showed relatively tissue-specific expression and were in general primarily expressed in spleen. Several MHC genes, including MHC class I, also showed expression in brain. Furthermore, among the genes with the highest expression levels in spleen, several gene ontology terms related to immune function were overrepresented. Conclusions Our study highlights the usefulness of next-generation sequence data for quantifying gene expression in the genome as a whole as well as in specific candidate genes. Overall, the data show the predicted patterns of gene expression and molecular evolution in the zebra finch genome. Expression of MHC genes in particular corresponds well with expression patterns in other vertebrates. 
PMID:20359325
Cui, Hao-Ran; Zhang, Zheng-Rong; Lv, Wei; Xu, Jia-Ning; Wang, Xiao-Yun
2015-08-01
The F-box protein family is a large family characterized by a conserved F-box domain of approximately 40-50 amino acids at the N-terminus. F-box proteins participate in diverse cellular processes, such as floral organ development, signal transduction and stress responses, primarily as components of the Skp1-cullin-F-box (SCF) complex. In this study, a global search of the apple genome identified 517 F-box protein-encoding genes (F-box genes for short), which were further subdivided into 12 groups according to their known functional domains, suggesting that they may have different functions or be involved in different processes. Among these domains, the galactose oxidase domain was analyzed for the first time in plants; it was present with or without the Kelch domain. The F-box genes were distributed across all 17 apple chromosomes at various densities and tended to form gene clusters. Spatial expression profile analysis revealed that F-box genes are widely expressed across all organs, with organ-specific expression patterns. Proteins that contained the galactose oxidase domain were highly expressed in leaves, flowers and seeds. From a fruit ripening expression profile, 166 F-box genes were identified. The expression of most of these genes changed little during maturation, but that of five of them increased significantly. qRT-PCR analysis of F-box genes encoding proteins with stress-related domains revealed that they were up- or down-regulated, suggesting that F-box genes are involved in abiotic stress responses. The results of this study help to elucidate the functions of F-box proteins, especially in Rosaceae plants.
Immunological network signatures of cancer progression and survival
2011-01-01
Background The immune contribution to cancer progression is complex and difficult to characterize. For example, in tumors, immune gene expression is detected from the combination of normal, tumor and immune cells in the tumor microenvironment. Profiling the immune component of tumors may facilitate the characterization of the poorly understood roles that immunity plays in cancer progression. However, current approaches to analyzing the immune component of a tumor rely on incomplete identification of immune factors. Methods To facilitate a more comprehensive approach, we created a ranked immunological relevance score for all human genes, developed using a novel strategy that combines text mining and information theory. We used this score to assign an immunological grade to gene expression profiles, and thereby quantify the immunological component of tumors. This immunological relevance score was benchmarked against existing manually curated immune resources as well as high-throughput studies. To further characterize immunological relevance for genes, the relevance score was charted against both the human interactome and cancer information, forming an expanded interactome landscape of tumor immunity. We applied this approach to expression profiles in melanomas, thus identifying and grading their immunological components, followed by identification of their associated protein interactions. Results The power of this strategy was demonstrated by the observation of early activation of the adaptive immune response and the diversity of the immune component during melanoma progression. Furthermore, the genome-wide immunological relevance score classified melanoma patient groups, whose immunological grade correlated with clinical features, such as immune phenotypes and survival. 
Conclusions The assignment of a ranked immunological relevance score to all human genes extends the content of existing immune gene resources and enriches our understanding of immune involvement in complex biological networks. The application of this approach to tumor immunity represents an automated systems strategy that quantifies the immunological component in complex disease. In so doing, it stratifies patients according to their immune profiles, which may lead to effective computational prognostic and clinical guides. PMID:21453479
An analysis of sponge genomes.
Costantini, Maria
2004-11-24
The genome of sponges has so far been investigated only by Bartmann-Lindholm et al. [Progr. Colloid. Polym. Sci. 107 (1997) 122-126], who reported a multimodal CsCl profile that could be resolved into five peaks for Geodia cydonium. This question was reinvestigated here in both G. cydonium and Suberites domuncula. DNAs from both sponges were shown to be characterized by unimodal CsCl profiles, with the additional peaks being due to contaminating prokaryotic and eukaryotic microorganisms.
Genome-wide Gene Expression Profiling of Acute Metal Exposures in Male Zebrafish
2014-10-23
Data in Brief. Christine E. Baer, Danielle L. Ippolito, Naissan... Keywords: Zebrafish; Whole organism; Nickel; Chromium; Cobalt; Toxicogenomics. To capture global responses to metal poisoning and mechanistic insights into metal...toxicity, gene expression changes were evaluated in whole adult male zebrafish following acute 24 h high-dose exposure to three metals with known human
Entropic Profiler – detection of conservation in genomes using information theory
Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana
2009-01-01
Background In recent decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were recently proposed as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighted relative abundance of motifs for each position in genomes. Their study is highly relevant because under- or over-represented segments are often associated with significant biological meaning. Findings The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes by using EP. It computes EP very efficiently by relying on improved algorithms and data structures, including modified suffix trees. Available through a web interface and as downloadable source code, it allows users to study positions and to search for motifs within the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples, or previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion EP are directly related to the statistical significance of motifs and can be considered a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
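The abstract notes that Entropic Profiler also computes the Chaos Game Representation (CGR) of the input sequence. The sketch below shows the standard CGR iteration in plain Python. It is not the tool's own code. The corner assignment (A, C, G, T at fixed corners of the unit square) is one common convention and is an assumption here, as is the choice to skip non-ACGT symbols.

```python
# Corner assignment follows one common CGR convention (an assumption here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_representation(seq):
    """Return one (x, y) point per nucleotide; non-ACGT symbols are skipped."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # e.g. N or gap characters contribute no point
        # Move halfway from the current point toward this base's corner.
        x, y = (x + corner[0]) / 2, (y + corner[1]) / 2
        points.append((x, y))
    return points


print(chaos_game_representation("ACG"))
# [(0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125)]
```

Each step halves the distance to a corner, so all positions whose preceding k nucleotides are identical fall into the same sub-square of side 2^-k. This property is what links the CGR plot to motif frequencies.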
A Deluge of Complex Repeats: The Solanum Genome
Mehra, Mrigaya; Gangwar, Indu; Shankar, Ravi
2015-01-01
Repetitive elements have lately emerged as key components of genomes, performing a variety of roles. It has now become necessary to have an account of the repeats in every genome to understand its dynamics and state. Recently, the genomes of two major Solanaceae species, Solanum tuberosum and Solanum lycopersicum, were sequenced. These species are important crops of high commercial significance as well as valuable model species. However, there is a considerable gap in information about repetitive elements and their possible roles in genome regulation in these species. The present study aimed at detailed identification and characterization of complex repetitive elements in these genomes, at studying their possible functional associations, and at assessing which repetitive elements may be transcriptionally active. In this study, ~50–60% of the genomes of S. tuberosum and S. lycopersicum were found to be composed of repetitive elements. Complex repetitive elements were also found to be associated with >95% of genes in both species. These two genomes are mostly composed of LTR retrotransposons. Two novel repeat families very similar to LTR/ERV1 and LINE/RTE-BovB are reported for the first time. The activity of complex repeats was estimated by measuring their transcriptional abundance using Next Generation Sequencing read data and Microarray platforms. A considerable number of regulatory components, such as transcription factor binding sites and miRNAs, appear to be under the influence of these complex repetitive elements in these species, while several genes appear to possess exonized repeats. PMID:26241045
Schlecht, Ulrich; Erb, Ionas; Demougin, Philippe; Robine, Nicolas; Borde, Valérie; van Nimwegen, Erik; Nicolas, Alain
2008-01-01
The autonomously replicating sequence binding factor 1 (Abf1) was initially identified as an essential DNA replication factor and later shown to be a component of the regulatory network controlling mitotic and meiotic cell cycle progression in budding yeast. The protein is thought to exert its functions via specific interaction with its target site as part of distinct protein complexes, but its roles during mitotic growth and meiotic development are only partially understood. Here, we report a comprehensive approach aiming at the identification of direct Abf1-target genes expressed during fermentation, respiration, and sporulation. Computational prediction of the protein's target sites was integrated with a genome-wide DNA binding assay in growing and sporulating cells. The resulting data were combined with the output of expression profiling studies using wild-type versus temperature-sensitive alleles. This work identified 434 protein-coding loci as being transcriptionally dependent on Abf1. More than 60% of their putative promoter regions contained a computationally predicted Abf1 binding site and/or were bound by Abf1 in vivo, identifying them as direct targets. The present study revealed numerous loci previously unknown to be under Abf1 control, and it yielded evidence for the protein's variable DNA binding pattern during mitotic growth and meiotic development. PMID:18305101
Genome-Wide Analysis Reveals the Unique Stem Cell Identity of Human Amniocytes
Maguire, Colin T.; Demarest, Bradley L.; Hill, Jonathon T.; Palmer, James D.; Brothman, Arthur R.; Yost, H. Joseph; Condic, Maureen L.
2013-01-01
Human amniotic fluid contains cells that potentially have important stem cell characteristics, yet the programs controlling their developmental potency are unclear. Here, we provide evidence that amniocytes derived from multiple patients are marked by heterogeneity and variability in expression levels of pluripotency markers. Clonal analysis from multiple patients indicates that amniocytes have large pools of self-renewing cells that have an inherent property to give rise to a distinct amniocyte phenotype with a heterogeneity of pluripotent markers. Significant to their therapeutic potential, genome-wide profiles are distinct at different gestational ages and times in culture, but do not differ between genders. Based on hierarchical clustering and differential expression analyses of the entire transcriptome, amniocytes express canonical regulators associated with pluripotency and stem cell repression. Their profiles are distinct from human embryonic stem cells (ESCs), induced-pluripotent stem cells (iPSCs), and newborn foreskin fibroblasts. Amniocytes have a complex molecular signature, coexpressing trophoblastic, ectodermal, mesodermal, and endodermal cell-type-specific regulators. In contrast to the current view of the ground state of stem cells, ESCs and iPSCs also express high levels of a wide range of cell-type-specific regulators. The coexpression of multilineage differentiation markers combined with the strong expression of a subset of ES cell repressors in amniocytes suggests that these cells have a distinct phenotype that is unlike any other known cell-type or lineage. PMID:23326421
The translational landscape of Arabidopsis mitochondria.
Planchard, Noelya; Bertin, Pierre; Quadrado, Martine; Dargel-Graffin, Céline; Hatin, Isabelle; Namy, Olivier; Mireau, Hakim
2018-06-05
Messenger RNA translation is a complex process that is still poorly understood in eukaryotic organelles such as mitochondria. Growing evidence indicates, however, that mitochondrial translation differs from its bacterial counterpart in many key aspects. In this analysis, we used ribosome profiling technology to generate a genome-wide snapshot of mitochondrial translation in Arabidopsis. We show that, unlike in humans, most Arabidopsis mitochondrial ribosome footprints measure 27 and 28 bases. We also reveal that mRNAs encoding respiratory subunits show much higher ribosome association than other mitochondrial mRNAs, implying that they are translated at higher levels. Homogeneous ribosome densities were generally detected within each respiratory complex except for complex V, where higher ribosome coverage correlated with higher requirements for specific subunits. In complex I respiratory mutants, a reorganization of ribosome association with mitochondrial mRNAs was detected, involving increased ribosome densities for certain transcripts encoding ribosomal proteins and reduced translation of a few complex V mRNAs. Taken together, our observations reveal that plant mitochondrial translation is a dynamic process and that translational control is important for gene expression in plant mitochondria. This study paves the way for future advances in understanding translation in higher plant mitochondria.
LDSplitDB: a database for studies of meiotic recombination hotspots in MHC using human genomic data.
Guo, Jing; Chen, Hao; Yang, Peng; Lee, Yew Ti; Wu, Min; Przytycka, Teresa M; Kwoh, Chee Keong; Zheng, Jie
2018-04-20
Meiotic recombination occurs during meiosis, when chromosomes inherited from the two parents exchange genetic material to generate the chromosomes of the gamete cells. Recombination events tend to occur in narrow genomic regions called recombination hotspots. Dysregulation of recombination could lead to serious human diseases such as birth defects. Although the regulatory mechanism of recombination events is still unclear, DNA sequence polymorphisms have been found to play crucial roles in the regulation of recombination hotspots. To facilitate studies of the underlying mechanism, we developed a database named LDSplitDB, which provides an integrative and interactive data mining and visualization platform for genome-wide association studies of recombination hotspots. It contains pre-computed association maps of the major histocompatibility complex (MHC) region in the 1000 Genomes Project and the HapMap Phase III datasets, and a genome-scale study of the European population from the HapMap Phase II dataset. Besides the recombination profiles, related data on genes, SNPs and different types of epigenetic modifications that could be associated with meiotic recombination are provided for comprehensive analysis. To meet the computational requirements of the rapidly increasing population genomics data, we prepared a lookup table of 400 haplotypes, which includes all possible two-locus haplotype configurations, for recombination rate estimation using the well-known LDhat algorithm. To the best of our knowledge, LDSplitDB is the first large-scale database for the association analysis of human recombination hotspots with DNA sequence polymorphisms. It provides valuable resources for discovering the mechanism of meiotic recombination hotspots. The information about MHC in this database could help in understanding the roles of recombination in the human immune system. DATABASE URL: http://histone.scse.ntu.edu.sg/LDSplitDB.
Regions of very low H3K27me3 partition the Drosophila genome into topological domains
Flower, Rosalyn; Choo, Siew Woh
2017-01-01
It is now well established that eukaryote genomes have a common architectural organization into topologically associated domains (TADs) and evidence is accumulating that this organization plays an important role in gene regulation. However, the mechanisms that partition the genome into TADs and the nature of domain boundaries are still poorly understood. We have investigated boundary regions in the Drosophila genome and find that they can be identified as domains of very low H3K27me3. The genome-wide H3K27me3 profile partitions into two states; very low H3K27me3 identifies Depleted (D) domains that contain housekeeping genes and their regulators such as the histone acetyltransferase-containing NSL complex, whereas domains containing moderate-to-high levels of H3K27me3 (Enriched or E domains) are associated with regulated genes, irrespective of whether they are active or inactive. The D domains correlate with the boundaries of TADs and are enriched in a subset of architectural proteins, particularly Chromator, BEAF-32, and Z4/Putzig. However, rather than being clustered at the borders of these domains, these proteins bind throughout the H3K27me3-depleted regions and are much more strongly associated with the transcription start sites of housekeeping genes than with the H3K27me3 domain boundaries. While we have not demonstrated causality, we suggest that the D domain chromatin state, characterised by very low or absent H3K27me3 and established by housekeeping gene regulators, acts to separate topological domains thereby setting up the domain architecture of the genome. PMID:28282436
Sherrard, Laura J; Tai, Anna S; Wee, Bryan A; Ramsay, Kay A; Kidd, Timothy J; Ben Zakour, Nouri L; Whiley, David M; Beatson, Scott A; Bell, Scott C
2017-01-01
A Pseudomonas aeruginosa AUST-02 strain sub-type (M3L7) has been identified in Australia; it infects the lungs of some people with cystic fibrosis and is associated with antibiotic resistance. Multiple clonal lineages may emerge during treatment, and mutations in chromosomally encoded antibiotic resistance genes are commonly observed. Here we use whole genome sequencing to describe the within-host diversity and antibiotic resistance of M3L7 during and after antibiotic treatment of an acute pulmonary exacerbation, and show both variation and shared mutations in important genes. Eleven isolates from an M3L7 population (n = 134) isolated over 3 months from an individual with cystic fibrosis underwent whole genome sequencing. A phylogeny based on core genome SNPs identified three distinct phylogenetic groups: two groups with higher rates of mutation (hypermutators) and one non-hypermutator group. Genomes were screened for acquired antibiotic resistance genes; none were detected, suggesting that M3L7 resistance is principally driven by chromosomal mutations. Small genetic variations shared by all 11 isolates were found in 49 genes associated with antibiotic resistance, including frame-shift mutations (mexA, mexT), premature stop codons (oprD, mexB) and mutations in quinolone-resistance determining regions (gyrA, parE). However, whole genome sequencing also revealed mutations in 21 genes that were acquired after the groups diverged, which may also affect the activity of antibiotics and multi-drug efflux pumps. Comparison of mutations with minimum inhibitory concentrations of anti-pseudomonal antibiotics could not easily explain all of the resistance profiles observed. These data further demonstrate the complexity of chronic and antibiotic-resistant P. aeruginosa infection, in which multiple genotypically diverse sub-lineages may co-exist during and after intravenous antibiotic treatment.
BAC sequencing using pooled methods.
Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina
2015-01-01
Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence can be challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically result in highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing through BAC sequencing offers complexity reduction, by removing distal segments of the genome, and a systematic mechanism for exploring prioritized genomic content. If one isolates and sequences the genome fraction that encodes the relevant biological information, it is possible to reduce overall sequencing costs and focus effort on a targeted genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. The methods described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
Li, LiQi; Jothi, Raja; Cui, Kairong; Lee, Jan Y; Cohen, Tsadok; Gorivodsky, Marat; Tzchori, Itai; Zhao, Yangu; Hayes, Sandra M; Bresnick, Emery H; Zhao, Keji; Westphal, Heiner; Love, Paul E
2013-01-01
The nuclear adaptor Ldb1 functions as a core component of multiprotein transcription complexes that regulate differentiation in diverse cell types. In the hematopoietic lineage, Ldb1 forms a complex with the non–DNA-binding adaptor Lmo2 and the transcription factors E2A, Scl and GATA-1 (or GATA-2). Here we demonstrate a critical and continuous requirement for Ldb1 in the maintenance of both fetal and adult mouse hematopoietic stem cells (HSCs). Deletion of Ldb1 in hematopoietic progenitors resulted in the downregulation of many transcripts required for HSC maintenance. Genome-wide profiling by chromatin immunoprecipitation followed by sequencing (ChIP-Seq) identified Ldb1 complex–binding sites at highly conserved regions in the promoters of genes involved in HSC maintenance. Our results identify a central role for Ldb1 in regulating the transcriptional program responsible for the maintenance of HSCs. PMID:21186366
Modular probes for enriching and detecting complex nucleic acid sequences
NASA Astrophysics Data System (ADS)
Wang, Juexiao Sherry; Yan, Yan Helen; Zhang, David Yu
2017-12-01
Complex DNA sequences are difficult to detect and profile, but are important contributors to human health and disease. Existing hybridization probes lack the capability to selectively bind and enrich hypervariable, long or repetitive sequences. Here, we present a generalized strategy for constructing modular hybridization probes (M-Probes) that overcomes these challenges. We demonstrate that M-Probes can tolerate sequence variations of up to 7 nt at prescribed positions while maintaining single nucleotide sensitivity at other positions. M-Probes are also shown to be capable of sequence-selectively binding a continuous DNA sequence of more than 500 nt. Furthermore, we show that M-Probes can detect genes with triplet repeats exceeding a programmed threshold. As a demonstration of this technology, we have developed a hybrid capture method to determine the exact triplet repeat expansion number in the Huntington's disease gene in genomic DNA using quantitative PCR.
USDA-ARS's Scientific Manuscript database
The technological advances of RNA-seq and de novo transcriptome assembly have enabled genome annotation and transcriptome profiling in heterozygous species. This is a promising approach to improving the annotation of the reference genome sequence of grapevine (Vitis vinifera L.), a species of high-l...
Diagnosis and therapy of oral squamous cell carcinoma.
Konkimalla, V Badireenath; Suhas, Venkatramana Laxminarayana; Chandra, Nagasuma R; Gebhart, Erich; Efferth, Thomas
2007-03-01
Oral squamous cell carcinoma ranks among the ten most common cancers worldwide. Despite advances in diagnosis and therapy during the past 30 years, oral squamous cell carcinoma remains among the tumor types with a very unfavorable prognosis. In an effort to identify genomic alterations with prognostic relevance, we applied comparative genomic hybridization to oral squamous cell carcinoma. The tumors exhibited between 5 and 47 DNA copy number alterations, indicating a considerable degree of genomic imbalance. Of 35 tumors, 19 showed a gain of chromosome band 7p12. Genomic imbalances were investigated by hierarchical cluster analysis and clustered image mapping to determine whether genomic profiles correlate with clinical data. The present results show that profiling of genomic imbalances in general, and of the epidermal growth factor receptor (EGFR) gene on 7p12 in particular, may be suitable as a prognostic factor. To identify small-molecule inhibitors of EGFR, we established a database of 531 natural compounds derived from medicinal plants used in traditional Chinese medicine. Candidate compounds were identified by correlating the IC50 values of tumor cell lines with microarray-based EGFR mRNA expression using Kendall's tau test. Further validation was performed by molecular docking studies using the AutoDock program, with the crystal structure of the EGFR tyrosine kinase domain as the docking template. We expect these results to be a further step toward the ultimate goal of individualized, patient-adapted tumor treatment based on tumor molecular profiling.
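The correlation step between IC50 values and EGFR expression can be sketched in plain Python. This is a minimal Kendall tau-b implementation (it corrects for ties) computed over all pairs of cell lines; the function name and example values are hypothetical, and the abstract does not say which tau variant or software was used.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1      # pair tied in x (may also be tied in y)
        if dy == 0:
            tied_y += 1      # pair tied in y
        if dx * dy > 0:
            concordant += 1  # both orderings agree
        elif dx * dy < 0:
            discordant += 1  # orderings disagree
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical data: one IC50 value and one EGFR expression value per cell line.
ic50 = [0.8, 1.5, 2.1, 3.0, 4.2]
egfr_expression = [9.1, 7.4, 7.9, 5.2, 4.0]
print(kendall_tau_b(ic50, egfr_expression))
```

A negative tau here would mean that cell lines with higher EGFR expression tend to have lower IC50 values, that is, to be more sensitive to the compound. Significance testing (for example by permutation) is not shown.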
USDA-ARS's Scientific Manuscript database
Bovine Respiratory Disease Complex is a disease that is very costly to the dairy industry. Genomic selection may be an effective tool to improve host resistance to the pathogens that cause this disease. Use of genomic predicted transmitting abilities (GPTA) for selection has had a dramatic effect on...
USDA-ARS's Scientific Manuscript database
New and emerging next-generation sequencing technologies have shown promise in reducing sequencing costs, but not substantially for complex polyploid plant genomes such as cotton. The large and highly repetitive genome of G. hirsutum (~2.5 Gb) is less amenable to, and more costly to sequence with, traditional BAC-by...
Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements
Liu, Pengfei; Erez, Ayelet; Sreenath Nagamani, Sandesh C.; Dhar, Shweta U.; Kołodziejska, Katarzyna E.; Dharmadhikari, Avinash V.; Cooper, M. Lance; Wiszniewska, Joanna; Zhang, Feng; Withers, Marjorie A.; Bacino, Carlos A.; Campos-Acevedo, Luis Daniel; Delgado, Mauricio R.; Freedenberg, Debra; Garnica, Adolfo; Grebe, Theresa A.; Hernández-Almaguer, Dolores; Immken, LaDonna; Lalani, Seema R.; McLean, Scott D.; Northrup, Hope; Scaglia, Fernando; Strathearn, Lane; Trapane, Pamela; Kang, Sung-Hae L.; Patel, Ankita; Cheung, Sau Wai; Hastings, P. J.; Stankiewicz, Paweł; Lupski, James R.; Bi, Weimin
2011-01-01
SUMMARY Complex genomic rearrangements (CGRs), consisting of two or more breakpoint junctions, have been observed in genomic disorders. Recently, a chromosome catastrophe phenomenon termed chromothripsis, in which numerous genomic rearrangements are apparently acquired in a single catastrophic event, was described in multiple cancers. Here we show that constitutionally acquired CGRs share similarities with cancer chromothripsis. In the 17 CGR cases investigated, we observed localized, multiple copy number changes, including deletions, duplications and/or triplications, as well as extensive translocations and inversions. The rearrangements varied in size and complexity; in one case, array comparative genomic hybridization revealed 18 copy number changes. Breakpoint sequencing identified characteristic features, including small templated insertions at breakpoints and microhomology at breakpoint junctions, which have been attributed to replicative processes. The resemblance between CGR and chromothripsis suggests similar mechanistic underpinnings. Such catastrophic chromosomal events appear to reflect basic DNA metabolism operative throughout an organism’s life cycle. PMID:21925314
MetaSort untangles metagenome assembly by reducing microbial community complexity
Ji, Peifeng; Zhang, Yanming; Wang, Jinfeng; Zhao, Fangqing
2017-01-01
Most current approaches to analyse metagenomic data rely on reference genomes. Novel microbial communities extend far beyond the coverage of reference databases, and de novo metagenome assembly from complex microbial communities remains a great challenge. Here we present a novel experimental and bioinformatic framework, metaSort, for the effective construction of bacterial genomes from metagenomic samples. MetaSort provides a sorted mini-metagenome approach based on flow cytometry and single-cell sequencing methodologies, and employs new computational algorithms to efficiently recover high-quality genomes from the sorted mini-metagenome, using the original metagenome as a complement. Through extensive evaluations, we demonstrated that metaSort achieves excellent, unbiased performance in genome recovery and assembly. Furthermore, we applied metaSort to an unexplored microflora colonizing the surface of marine kelp and successfully recovered 75 high-quality genomes in a single run. This approach will greatly improve access to microbial genomes from complex or novel communities. PMID:28112173
Kay, Neil E.; Eckel-Passow, Jeanette E.; Braggio, Esteban; VanWier, Scott; Shanafelt, Tait D.; Van Dyke, Daniel L.; Jelinek, Diane F.; Tschumper, Renee C.; Kipps, Thomas; Byrd, John C.; Fonseca, Rafael
2010-01-01
To better understand the implications of genomic instability and outcome in B-cell CLL, we sought to address genomic complexity as a predictor of chemosensitivity and ultimately clinical outcome in this disease. We employed array-based comparative genomic hybridization (aCGH), using a one-million probe array and identified gains and losses of genetic material in 48 patients treated on a chemoimmunotherapy (CIT) clinical trial. We identified chromosomal gain or loss in ≥6% of the patients on chromosomes 3, 8, 9, 10, 11, 12, 13, 14 and 17. Higher genomic complexity, as a mechanism favoring clonal selection, was associated with shorter progression-free survival and predicted a poor response to treatment. Of interest, CLL cases with loss of p53 surveillance showed more complex genomic features and were found both in patients with a 17p13.1 deletion and in the more favorable genetic subtype characterized by the presence of 13q14.1 deletion. This aCGH study adds information on the association between inferior trial response and increasing genetic complexity as CLL progresses. PMID:21156228
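Two summaries mentioned in this abstract, the per-chromosome frequency of gains or losses across patients and a per-patient genomic complexity score, can be sketched as follows. The input format (a set of (chromosome, event) calls per patient) and the use of a plain event count as the complexity measure are assumptions for illustration; the study's own aCGH calling and scoring are not described here.

```python
from collections import Counter


def recurrent_chromosomes(calls, threshold=0.06):
    """Chromosomes gained or lost in at least `threshold` of patients.

    calls: dict mapping patient ID -> set of (chromosome, "gain" | "loss") events.
    """
    if not calls:
        return []
    counts = Counter()
    for events in calls.values():
        # Count each chromosome at most once per patient.
        counts.update({chrom for chrom, _kind in events})
    n_patients = len(calls)
    return sorted(c for c, k in counts.items() if k / n_patients >= threshold)


def complexity_scores(calls):
    """Hypothetical complexity proxy: number of called events per patient."""
    return {patient: len(events) for patient, events in calls.items()}


calls = {
    "patient_1": {(13, "loss"), (12, "gain"), (17, "loss")},
    "patient_2": {(13, "loss")},
    "patient_3": set(),
}
print(recurrent_chromosomes(calls, threshold=0.5))  # altered in >= 50% of patients
print(complexity_scores(calls))
```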
PAF Complex Plays Novel Subunit-Specific Roles in Alternative Cleavage and Polyadenylation
Yang, Yan; Li, Wencheng; Hoque, Mainul; Hou, Liming; Shen, Steven; Tian, Bin; Dynlacht, Brian D.
2016-01-01
The PAF complex (Paf1C) has been shown to regulate chromatin modifications, gene transcription, and RNA polymerase II (PolII) elongation. Here, we provide the first genome-wide profiles of the distribution of the entire complex in mammalian cells, using chromatin immunoprecipitation and high-throughput sequencing. We show that Paf1C is recruited not only to promoters and gene bodies but also to regions downstream of cleavage/polyadenylation (pA) sites at 3’ ends, a profile that contrasts sharply with that of the yeast complex. Remarkably, using RNAi coupled with deep sequencing of transcript 3’ ends, we identified novel, subunit-specific links between Paf1C and the regulation of alternative cleavage and polyadenylation (APA) and of upstream antisense transcription. Moreover, we found that depletion of Paf1C subunits resulted in the accumulation of PolII over gene bodies, which coincided with APA. Depletion of specific Paf1C subunits led to a global loss of histone H2B ubiquitylation, although Paf1C depletion had little impact on other histone modifications previously associated with this complex, including tri-methylation of histone H3 on lysines 4 and 36 (H3K4me3 and H3K36me3). Our results reveal surprising differences from yeast, while unifying observations that link Paf1C with PolII elongation and RNA processing, and indicate that Paf1C subunits could control transcript length, by suppressing PolII accumulation at transcription start site (TSS)-proximal pA sites, and regulate pA site choice in 3’UTRs. PMID:26765774
Brägelmann, Johannes; Klümper, Niklas; Offermann, Anne; von Mässenhausen, Anne; Böhm, Diana; Deng, Mario; Queisser, Angela; Sanders, Christine; Syring, Isabella; Merseburger, Axel S; Vogel, Wenzel; Sievers, Elisabeth; Vlasic, Ignacija; Carlsson, Jessica; Andrén, Ove; Brossart, Peter; Duensing, Stefan; Svensson, Maria A; Shaikhibrahim, Zaki; Kirfel, Jutta; Perner, Sven
2017-04-01
Purpose: The Mediator complex is a multiprotein assembly that serves as a hub for diverse signaling pathways to regulate gene expression. Because gene expression is frequently altered in cancer, a systematic understanding of the Mediator complex in malignancies could foster the development of novel targeted therapeutic approaches. Experimental Design: We performed a systematic deconvolution of Mediator subunit expression profiles across 23 cancer entities (n = 8,568) using data from The Cancer Genome Atlas (TCGA). Prostate cancer-specific findings were validated in two publicly available gene expression cohorts and in a large cohort of primary and advanced prostate cancer (n = 622) stained by immunohistochemistry. The role of CDK19 and CDK8 was evaluated by siRNA-mediated gene knockdown and inhibitor treatment in prostate cancer cell lines, using functional assays and gene expression analysis by RNA-seq. Results: Cluster analysis of TCGA expression data segregated tumor entities, indicating tumor-type-specific Mediator complex compositions. Only prostate cancer was marked by high expression of CDK19. In primary prostate cancer, CDK19 was associated with increased aggressiveness and shorter disease-free survival. During cancer progression, the highest levels of CDK19 and of its paralog CDK8 were present in metastases. In vitro, inhibition of CDK19 and CDK8 by knockdown or by treatment with a selective CDK8/CDK19 inhibitor significantly decreased migration and invasion. Conclusions: Our analysis revealed distinct transcriptional expression profiles of the Mediator complex across cancer entities, indicating differential modes of transcriptional regulation. Moreover, it identified CDK19 and CDK8 as specifically overexpressed during prostate cancer progression, highlighting their potential as novel therapeutic targets in advanced prostate cancer. Clin Cancer Res; 23(7); 1829-40. ©2016 AACR.
Transposon identification using profile HMMs
2010-01-01
Background Transposons are "jumping genes" that account for large quantities of repetitive content in genomes. They are known to affect transcriptional regulation in several different ways and are implicated in many human diseases. Transposons are related to microRNAs and viruses, and many genes, pseudogenes, and gene promoters are derived from transposons or originated in transposon-induced duplication. Modeling transposon-derived genomic content is difficult because such sequences are poorly conserved. Profile hidden Markov models (profile HMMs), widely used for modeling protein sequence families, are rarely used for modeling DNA sequence families. The algorithm commonly used to estimate the parameters of profile HMMs, Baum-Welch, is prone to converging prematurely to local optima. The DNA domain is especially problematic for the Baum-Welch algorithm, since it has only four letters, as opposed to the twenty residues of the amino acid alphabet. Results We demonstrate, with a simulation study and with an application to modeling the MIR family of transposons, that two recently introduced methods, Conditional Baum-Welch and Dynamic Model Surgery, achieve better estimates of the parameters of profile HMMs across a range of conditions. Conclusions We argue that these new algorithms expand the range of potential applications of profile HMMs to many important DNA sequence family modeling problems, including that of searching for and modeling the virus-like transposons that are found in all known genomes. PMID:20158867
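The local-optimum behavior can be explored with standard Baum-Welch. The sketch below is a plain, fully connected discrete HMM over the DNA alphabet, not a profile HMM with match/insert/delete states, and it implements neither Conditional Baum-Welch nor Dynamic Model Surgery. It uses per-position scaling to avoid numerical underflow; all names and the toy sequence are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def random_model(n_states, rng):
    """Random strictly positive start, transition and emission probabilities."""
    start = normalize([rng.random() + 0.1 for _ in range(n_states)])
    trans = [normalize([rng.random() + 0.1 for _ in range(n_states)]) for _ in range(n_states)]
    emit = [normalize([rng.random() + 0.1 for _ in ALPHABET]) for _ in range(n_states)]
    return start, trans, emit


def forward_backward(obs, start, trans, emit):
    """Scaled forward-backward pass; each alpha row is normalized to sum to 1."""
    n, T = len(start), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [start[s] * emit[s][obs[0]] for s in range(n)]
        else:
            row = [emit[s][obs[t]] * sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
                   for s in range(n)]
        scale.append(sum(row))
        alpha.append([a / scale[t] for a in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r] for r in range(n))
                   / scale[t + 1] for s in range(n)]
    return alpha, beta, scale


def baum_welch_step(seq, start, trans, emit):
    """One EM update (seq length >= 2); returns new parameters and the old log-likelihood."""
    obs = [ALPHABET.index(ch) for ch in seq]
    n, T = len(start), len(obs)
    alpha, beta, scale = forward_backward(obs, start, trans, emit)
    # gamma[t][s]: posterior probability of state s at position t.
    gamma = [[alpha[t][s] * beta[t][s] for s in range(n)] for t in range(T)]
    # xi[r][s]: expected number of r -> s transitions over the sequence.
    xi = [[sum(alpha[t][r] * trans[r][s] * emit[s][obs[t + 1]] * beta[t + 1][s] / scale[t + 1]
               for t in range(T - 1)) for s in range(n)] for r in range(n)]
    new_start = gamma[0][:]
    new_trans = [normalize(xi[r]) for r in range(n)]
    new_emit = []
    for s in range(n):
        counts = [0.0] * len(ALPHABET)
        for t in range(T):
            counts[obs[t]] += gamma[t][s]
        new_emit.append(normalize(counts))
    log_likelihood = sum(math.log(c) for c in scale)
    return new_start, new_trans, new_emit, log_likelihood


seq = "ATATATATGCGCGCGCATATATATGCGCGCGC"
params = random_model(2, random.Random(0))
history = []
for _ in range(25):
    *params, ll = baum_welch_step(seq, *params)
    history.append(ll)
print(history[0], history[-1])
```

Each EM iteration cannot decrease the likelihood, but rerunning with a different seed in `random_model` can settle on a different final log-likelihood. That seed dependence is the premature convergence to local optima that the abstract describes.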
Data compression and genomes: a two-dimensional life domain map.
Menconi, Giulia; Benci, Vieri; Buiatti, Marcello
2008-07-21
We define the complexity of DNA sequences as the information content per nucleotide, calculated by means of a Lempel-Ziv data compression algorithm. The statistics of the complexity values of the functional regions of different complete genomes can be used to distinguish among genomes from different domains of life (Archaea, Bacteria and Eukarya). We focus on the distribution function of the complexity of non-coding regions. We show that the three domains can be plotted in separate regions of the two-dimensional space whose axes are the skewness coefficient and the kurtosis coefficient of this distribution. Preliminary results on 15 genomes are presented.
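The pipeline in this abstract (per-region complexity from a Lempel-Ziv algorithm, then the skewness and kurtosis of the resulting distribution) can be sketched as below. The abstract does not say which Lempel-Ziv variant or normalization was used; the LZ78 phrase count and the bit-cost formula here are assumptions, and kurtosis is reported as excess kurtosis.

```python
import math
import random


def lz78_phrase_count(seq):
    """Number of phrases in an LZ78 incremental parsing of seq."""
    seen, phrase, count = set(), "", 0
    for ch in seq:
        phrase += ch
        if phrase not in seen:  # a new phrase ends here
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)  # trailing, already-seen phrase


def complexity_per_nt(seq):
    """Approximate LZ78 code length in bits per nucleotide.

    Each phrase is charged log2(c) bits for its prefix pointer plus 2 bits
    for its final nucleotide (4-letter alphabet). This is an asymptotic
    estimate and can exceed 2 bits/nt on short sequences.
    """
    if not seq:
        raise ValueError("empty sequence")
    c = lz78_phrase_count(seq)
    return c * (math.log2(c) + 2) / len(seq)


def skewness_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis (a normal distribution gives 0, 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(1)
random_seq = "".join(rng.choice("ACGT") for _ in range(1000))
repeat_seq = "A" * 1000
print(complexity_per_nt(repeat_seq), complexity_per_nt(random_seq))
```

Applied to each non-coding region of a genome, `complexity_per_nt` gives one value per region; `skewness_and_excess_kurtosis` of that list then places the genome as one point on the two-dimensional map.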
Cifola, Ingrid; Bianchi, Cristina; Mangano, Eleonora; Bombelli, Silvia; Frascati, Fabio; Fasoli, Ester; Ferrero, Stefano; Di Stefano, Vitalba; Zipeto, Maria A; Magni, Fulvio; Signorini, Stefano; Battaglia, Cristina; Perego, Roberto A
2011-06-13
Clear cell renal cell carcinoma (ccRCC) is characterized by recurrent copy number alterations (CNAs) and loss of heterozygosity (LOH), which may have diagnostic and prognostic applications. Here, we explored whether ccRCC primary cultures, established from surgical tumor specimens, maintain the DNA profile of the parental tumor tissues, allowing more confident discrimination of CNAs and LOH than in the original tissues. We established a collection of 9 phenotypically well-characterized ccRCC primary cell cultures. Using Affymetrix SNP array technology, we performed genome-wide copy number (CN) profiling of both the cultures and the corresponding tumor tissues. Global concordance for each culture/tissue pair was assessed by evaluating the correlations between whole-genome CN profiles and between SNP allelic calls. CN analysis was performed using two software packages, CNAG v3.0 and Partek, and by comparing the results returned by two different algorithms (Hidden Markov Model and Genomic Segmentation). A very good overlap between the CNAs of each culture and its corresponding tissue was observed. This finding, reinforced by high whole-genome CN correlations and SNP call concordances, provided evidence that each culture was derived from its corresponding tissue and maintained the genomic alterations of the parental tumor. In addition, the primary culture DNA profile remained stable for at least 3 weeks, up to the third passage. These cultures showed greater cell homogeneity and greater enrichment in the tumor component than the original tissues, thus enabling better discrimination of CNAs and LOH. Especially for hemizygous deletions, primary cultures presented more evident CN losses, typically accompanied by LOH; in contrast, in the original tissues the intensity of these deletions was weakened by normal cell contamination and LOH calls were missed.
ccRCC primary cultures are a reliable in vitro model that closely reproduces the genetics and phenotype of the original tumor, and they are potentially useful for future functional approaches aimed at studying genes or pathways involved in ccRCC etiopathogenesis and at identifying novel clinical markers or therapeutic targets. Moreover, SNP array technology proved to be a powerful tool for better defining the cell composition and homogeneity of RCC primary cultures. © 2011 Cifola et al; licensee BioMed Central Ltd.
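The two concordance measures for a culture/tissue pair (correlation of whole-genome CN profiles and agreement of SNP genotype calls) can be sketched as below. The data layout, the "NoCall" marker, and the example values are hypothetical; the study's actual computations in CNAG and Partek are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length copy number profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def call_concordance(calls_a, calls_b, no_call="NoCall"):
    """Fraction of SNPs with identical genotype calls, skipping SNPs without a call in either sample."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b) if no_call not in (a, b)]
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical per-probe log2 CN ratios and SNP calls for one culture/tissue pair.
culture_cn = [0.0, -0.9, -1.0, 0.1, 0.6]
tissue_cn = [0.05, -0.5, -0.6, 0.0, 0.4]
culture_calls = ["AA", "AB", "BB", "AA", "NoCall"]
tissue_calls = ["AA", "AB", "AB", "AA", "BB"]
print(pearson(culture_cn, tissue_cn), call_concordance(culture_calls, tissue_calls))
```

Because Pearson correlation is unchanged by linear rescaling, a tissue profile whose losses are dampened by normal-cell contamination (the shallower negative values above) can still correlate strongly with its culture, which is consistent with the abstract's observation of high correlations alongside weaker deletion signals in tissue.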
Functional Study of Genes Essential for Autogamy and Nuclear Reorganization in Paramecium
Nowak, Jacek K.; Gromadka, Robert; Juszczuk, Marek; Jerka-Dziadosz, Maria; Maliszewska, Kamila; Mucchielli, Marie-Hélène; Gout, Jean-François; Arnaiz, Olivier; Agier, Nicolas; Tang, Thomas; Aggerbeck, Lawrence P.; Cohen, Jean; Delacroix, Hervé; Sperling, Linda; Herbert, Christopher J.; Zagulski, Marek; Bétermier, Mireille
2011-01-01
Like all ciliates, Paramecium tetraurelia is a unicellular eukaryote that harbors two kinds of nuclei within its cytoplasm. At each sexual cycle, a new somatic macronucleus (MAC) develops from the germ line micronucleus (MIC) through a sequence of complex events, which includes meiosis, karyogamy, and assembly of the MAC genome from MIC sequences. The latter process involves developmentally programmed genome rearrangements controlled by noncoding RNAs and a specialized RNA interference machinery. We describe our first attempts to identify genes and biological processes that contribute to the progression of the sexual cycle. Given the high percentage of unknown genes annotated in the P. tetraurelia genome, we applied a global strategy to monitor gene expression profiles during autogamy, a self-fertilization process. We focused this pilot study on the genes carried by the largest somatic chromosome and designed dedicated DNA arrays covering 484 genes from this chromosome (1.2% of all genes annotated in the genome). Transcriptome analysis revealed four major patterns of gene expression, including two successive waves of gene induction. Functional analysis of 15 upregulated genes revealed four that are essential for vegetative growth, one of which is involved in the maintenance of MAC integrity and another in cell division or membrane trafficking. Two additional genes, encoding a MIC-specific protein and a putative RNA helicase localizing to the old and then to the new MAC, are specifically required during sexual processes. Our work provides a proof of principle that genes essential for meiosis and nuclear reorganization can be uncovered following genome-wide transcriptome analysis. PMID:21257794
Accurate and robust genomic prediction of celiac disease using statistical learning.
Abraham, Gad; Tye-Din, Jason A; Bhalala, Oneil G; Kowalczyk, Adam; Zobel, Justin; Inouye, Michael
2014-02-01
Practical application of genomic-based risk stratification to clinical diagnosis is appealing, yet performance varies widely depending on the disease and the genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude a diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof of concept that statistical learning approaches that simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity was replicated both in cross-validation within each cohort (AUC of 0.87-0.89) and in independent replication across cohorts (AUC of 0.86-0.9), despite differences in ethnicity. The models explained 30-35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, fine-scale stratification of individuals into higher-risk categories for CD can identify those who would benefit from more invasive and costly definitive testing. The GRS is flexible, and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate that a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD, and they support further studies evaluating the clinical utility of this approach in CD and other complex diseases.
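The two performance measures quoted above, AUC and negative predictive value at a chosen cut-off, can be computed from risk scores as sketched below. The scores are hypothetical, and the AUC is computed directly as the Mann-Whitney probability rather than by integrating an ROC curve.

```python
def auc(case_scores, control_scores):
    """Probability that a random case scores above a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def npv_at_threshold(case_scores, control_scores, threshold):
    """NPV when individuals scoring below `threshold` are called negative."""
    true_neg = sum(s < threshold for s in control_scores)
    false_neg = sum(s < threshold for s in case_scores)
    if true_neg + false_neg == 0:
        return float("nan")
    return true_neg / (true_neg + false_neg)


# Hypothetical genomic risk scores.
cases = [2.1, 1.7, 0.9, 2.5]
controls = [0.2, 0.8, 1.0, -0.3, 0.4]
print(auc(cases, controls), npv_at_threshold(cases, controls, 0.85))
```

Lowering the threshold generally raises sensitivity and NPV at the cost of specificity, which is the kind of adjustment the abstract refers to. NPV computed from a case-control sample reflects that sample's case fraction rather than the clinical prevalence.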
Kurth, Daniel; Belfiore, Carolina; Gorriti, Marta F.; Cortez, Néstor; Farias, María E.; Albarracín, Virginia H.
2015-01-01
Ultraviolet radiation can damage biomolecules, with detrimental or even lethal effects for life. Even though lower wavelengths are filtered by the ozone layer, a significant amount of harmful UV-B and UV-A radiation reaches Earth’s surface, particularly in high-altitude environments. High-altitude Andean lakes (HAALs) are a group of dispersed shallow lakes and salterns located in the Dry Central Andes region of South America at altitudes above 3,000 m. Because this is considered one of the most UV-exposed environments, HAAL microbes constitute model systems for studying UV-resistance mechanisms in environmental bacteria at various levels of complexity. Herein, we present the genome sequence of Acinetobacter sp. Ver3, a gammaproteobacterium isolated from Lake Verde (4,400 m), together with further experimental evidence supporting the phenomenological observations regarding this bacterium’s ability to cope with increased UV-induced DNA damage. Comparison with the genomes of other Acinetobacter strains highlighted a number of unique genes, such as a novel cryptochrome. Proteomic profiling of UV-exposed cells identified up-regulated proteins such as a specific cytoplasmic catalase, a putative regulator, and proteins associated with amino acid and protein synthesis. Down-regulated proteins were related to several energy-generating pathways such as glycolysis, beta-oxidation of fatty acids, and the respiratory electron transport chain. To the best of our knowledge, this is the first report of a genome from a polyextremophilic Acinetobacter strain. From the genomic and proteomic data, a “UV-resistome” was defined, encompassing the genes that would support the outstanding UV resistance of this strain. PMID:25954258
Integrated analysis of chromosome copy number variation and gene expression in cervical carcinoma
Yan, Deng; Yi, Song; Chiu, Wang Chi; Qin, Liu Gui; Kin, Wong Hoi; Kwok Hung, Chung Tony; Linxiao, Han; Wai, Choy Kwong; Yi, Sui; Tao, Yang; Tao, Tang
2017-01-01
Objective This study was conducted to explore chromosomal copy number variations (CNVs) and transcript expression and to examine pathways in cervical pathogenesis using genome-wide high-resolution microarrays. Methods Genome-wide chromosomal CNVs were investigated in 6 cervical cancer cell lines using the Human Genome CGH Microarray Kit (4x44K). Gene expression profiles in cervical cancer cell lines, primary cervical carcinoma and normal cervical epithelium tissues were also studied using the Whole Human Genome Microarray Kit (4x44K). Results Fifty common chromosomal CNVs were identified in the cervical cancer cell lines. Correlation analysis revealed that gene up-regulation and down-regulation were significantly correlated with genomic amplification (P=0.009) and deletion (P=0.006) events, respectively. Expression profiles were identified through cluster analysis. Gene annotation analysis pinpointed cell cycle pathways as significantly (P=1.15E-08) affected in cervical cancer. Common CNVs were associated with cervical cancer. Conclusion Chromosomal CNVs may contribute to altered transcript expression in cervical cancer. PMID:29312578
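The abstract does not state which test produced its P values for the association between copy number and expression. One common choice for this kind of gene-level question is a one-sided Fisher exact test on a 2x2 table, sketched below with hypothetical counts; it is an illustration, not the authors' analysis.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: gene lies in an amplified region (yes / no).
    Columns: gene is up-regulated (yes / no).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    # Sum hypergeometric weights of tables at least as extreme as the observed one.
    extreme = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, min(row1, col1) + 1))
    return extreme / comb(n, col1)


# Hypothetical counts: 30 of 40 genes in amplified regions are up-regulated,
# versus 200 of 960 genes elsewhere.
print(fisher_one_sided(30, 10, 200, 760))
```

The same function applies to deletions and down-regulation by redefining the rows and columns.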
BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks.
Yan, Winston X; Mirzazadeh, Reza; Garnerone, Silvano; Scott, David; Schneider, Martin W; Kallas, Tomasz; Custodio, Joaquin; Wernersson, Erik; Li, Yinqing; Gao, Linyi; Federova, Yana; Zetsche, Bernd; Zhang, Feng; Bienko, Magda; Crosetto, Nicola
2017-05-12
Precisely measuring the location and frequency of DNA double-strand breaks (DSBs) along the genome is instrumental to understanding genomic fragility, but current methods are limited in versatility, sensitivity or practicality. Here we present Breaks Labeling In Situ and Sequencing (BLISS), featuring the following: (1) direct labelling of DSBs in fixed cells or tissue sections on a solid surface; (2) low-input requirement by linear amplification of tagged DSBs by in vitro transcription; (3) quantification of DSBs through unique molecular identifiers; and (4) easy scalability and multiplexing. We apply BLISS to profile endogenous and exogenous DSBs in low-input samples of cancer cells, embryonic stem cells and liver tissue. We demonstrate the sensitivity of BLISS by assessing the genome-wide off-target activity of two CRISPR-associated RNA-guided endonucleases, Cas9 and Cpf1, observing that Cpf1 has higher specificity than Cas9. Our results establish BLISS as a versatile, sensitive and efficient method for genome-wide DSB mapping in many applications.
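Quantification through unique molecular identifiers can be sketched as counting distinct UMIs per break site, so that PCR duplicates of one labeled break are counted once. The read tuple format is an assumption, and a production pipeline may additionally merge UMIs that differ only by sequencing errors or breaks at neighboring positions; neither refinement is shown.

```python
from collections import defaultdict


def count_dsbs(reads):
    """Estimate DSB counts per site as the number of distinct UMIs observed there.

    reads: iterable of (chromosome, position, umi) tuples, one per sequenced read.
    PCR duplicates carry the same UMI, so they collapse to one molecule.
    """
    umis_by_site = defaultdict(set)
    for chrom, pos, umi in reads:
        umis_by_site[(chrom, pos)].add(umi)
    return {site: len(umis) for site, umis in umis_by_site.items()}


reads = [
    ("chr1", 1000, "ACGTAC"),
    ("chr1", 1000, "ACGTAC"),  # PCR duplicate of the read above
    ("chr1", 1000, "TTGACA"),
    ("chr2", 52, "ACGTAC"),    # same UMI at a different site: a separate molecule
]
print(count_dsbs(reads))
```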
Integrated analysis of chromosome copy number variation and gene expression in cervical carcinoma.
Yan, Deng; Yi, Song; Chiu, Wang Chi; Qin, Liu Gui; Kin, Wong Hoi; Kwok Hung, Chung Tony; Linxiao, Han; Wai, Choy Kwong; Yi, Sui; Tao, Yang; Tao, Tang
2017-12-12
This study was conducted to explore chromosomal copy number variations (CNVs) and transcript expression and to examine pathways in cervical pathogenesis using genome-wide high-resolution microarrays. Genome-wide chromosomal CNVs were investigated in 6 cervical cancer cell lines using the Human Genome CGH Microarray Kit (4x44K). Gene expression profiles in cervical cancer cell lines, primary cervical carcinoma and normal cervical epithelium tissues were also studied using the Whole Human Genome Microarray Kit (4x44K). Fifty common chromosomal CNVs were identified in the cervical cancer cell lines. Correlation analysis revealed that gene up-regulation and down-regulation were significantly correlated with genomic amplification (P=0.009) and deletion (P=0.006) events, respectively. Expression profiles were identified through cluster analysis. Gene annotation analysis pinpointed cell cycle pathways as significantly (P=1.15E-08) affected in cervical cancer. Common CNVs were associated with cervical cancer. Chromosomal CNVs may contribute to altered transcript expression in cervical cancer.
An Adenovirus DNA Replication Factor, but Not Incoming Genome Complexes, Targets PML Nuclear Bodies.
Komatsu, Tetsuro; Nagata, Kyosuke; Wodrich, Harald
2016-02-01
Promyelocytic leukemia protein nuclear bodies (PML-NBs) are subnuclear domains implicated in cellular antiviral responses. Despite their antiviral activity, several nuclear-replicating DNA viruses use these domains as deposition sites for incoming viral genomes and/or as sites for viral DNA replication, suggesting that PML-NBs are functionally relevant during early viral infection for establishing productive replication. Although PML-NBs and their components have also been implicated in the adenoviral life cycle, it remains unclear whether incoming adenoviral genome complexes target PML-NBs. Here, using immunofluorescence and live-cell imaging analyses, we show that incoming adenovirus genome complexes neither localize at nor recruit components of PML-NBs during early phases of infection. We further show that the viral DNA binding protein (DBP), an early-expressed viral protein and essential DNA replication factor, independently targets PML-NBs. We show that DBP oligomerization is required to selectively recruit the PML-NB components Sp100 and USP7. Depletion experiments suggest that the absence of one PML-NB component might not affect the recruitment of other components to DBP oligomers. Thus, our findings suggest a model in which an adenoviral DNA replication factor, but not incoming viral genome complexes, targets and modulates PML-NBs to establish a state conducive to viral DNA replication, and they argue against a generalized concept that PML-NBs target incoming viral genomes. The immediate fate of incoming DNA virus genomes upon nuclear delivery is largely unclear. Early reports suggested that incoming herpesvirus genomes are targeted and repressed by PML-NBs immediately upon nuclear import. Genome localization and/or viral DNA replication has also been observed at PML-NBs for other DNA viruses. 
It was therefore suggested that PML-NBs may immediately sense and target nuclear viral genomes and hence serve as sites for deposition of incoming viral genomes and/or subsequent viral DNA replication. Here we performed a detailed analysis of the spatiotemporal distribution of incoming adenoviral genome complexes and found, contrary to expectation, that an adenoviral DNA replication factor, but not incoming genomes, targets PML-NBs. Our findings may thus explain why adenoviral genomes were observed at PML-NBs in earlier reports, but they argue against a generalized role for PML-NBs in targeting invading viral genomes. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Malaria in India: the center for the study of complex malaria in India.
Das, Aparup; Anvikar, Anupkumar R; Cator, Lauren J; Dhiman, Ramesh C; Eapen, Alex; Mishra, Neelima; Nagpal, Bhupinder N; Nanda, Nutan; Raghavendra, Kamaraju; Read, Andrew F; Sharma, Surya K; Singh, Om P; Singh, Vineeta; Sinnis, Photini; Srivastava, Harish C; Sullivan, Steven A; Sutton, Patrick L; Thomas, Matthew B; Carlton, Jane M; Valecha, Neena
2012-03-01
Malaria is a major public health problem in India and one which contributes significantly to the overall malaria burden in Southeast Asia. The National Vector Borne Disease Control Program of India reported ∼1.6 million cases and ∼1100 malaria deaths in 2009. Some experts argue that this is a serious underestimation and that the actual number of malaria cases per year is likely between 9 and 50 times greater, with an approximate 13-fold underestimation of malaria-related mortality. The difficulty in making these estimations is further exacerbated by (i) highly variable malaria eco-epidemiological profiles, (ii) the transmission and overlap of multiple Plasmodium species and Anopheles vectors, (iii) increasing antimalarial drug resistance and insecticide resistance, and (iv) the impact of climate change on each of these variables. Simply stated, the burden of malaria in India is complex. Here we describe plans for a Center for the Study of Complex Malaria in India (CSCMi), one of ten International Centers of Excellence in Malaria Research (ICEMRs) located in malarious regions of the world recently funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health. The CSCMi is a close partnership between Indian and United States scientists, and aims to address major gaps in our understanding of the complexity of malaria in India, including changing patterns of epidemiology, vector biology and control, drug resistance, and parasite genomics. We hope that such a multidisciplinary approach that integrates clinical and field studies with laboratory, molecular, and genomic methods will provide a powerful combination for malaria control and prevention in India. Copyright © 2011 Elsevier B.V. All rights reserved.
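To make the scale of the suggested underestimation concrete, the approximate figures quoted above can be multiplied out. This is simple arithmetic on the abstract's own numbers, not an independent estimate.

```python
reported_cases = 1_600_000                 # ~1.6 million cases reported for 2009
reported_deaths = 1_100                    # ~1,100 deaths reported for 2009
case_factor_low, case_factor_high = 9, 50  # suggested range of under-counting for cases
death_factor = 13                          # suggested under-counting factor for deaths

implied_cases = (reported_cases * case_factor_low, reported_cases * case_factor_high)
implied_deaths = reported_deaths * death_factor
print(f"implied cases: {implied_cases[0]:,} to {implied_cases[1]:,}")
print(f"implied deaths: about {implied_deaths:,}")
```

The quoted factors therefore correspond to roughly 14.4 to 80 million cases and about 14,300 deaths.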
Comparative genome analysis in the integrated microbial genomes (IMG) system.
Markowitz, Victor M; Kyrpides, Nikos C
2007-01-01
Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.
Ostrovnaya, Irina; Seshan, Venkatraman E; Olshen, Adam B; Begg, Colin B
2011-06-15
If a cancer patient develops multiple tumors, it is sometimes impossible to determine whether these tumors are independent or clonal based solely on pathological characteristics. Investigators have tried to address this diagnostic challenge by comparing the presence of loss of heterozygosity (LOH) at selected genetic loci in tumor samples, or by comparing genome-wide copy number array profiles. We have previously developed statistical methodology to compare such genomic profiles for evidence of clonality. We have assembled the software for these tests in a new R package called 'Clonality'. For LOH profiles, the package contains significance tests. The analysis of copy number profiles includes a likelihood ratio statistic and reference distribution, as well as an option to produce various plots that summarize the results. Availability: Bioconductor (http://bioconductor.org/packages/release/bioc/html/Clonality.html) and http://www.mskcc.org/mskcc/html/13287.cfm.
Rice, K L; Lin, X; Wolniak, K; Ebert, B L; Berkofsky-Fessler, W; Buzzai, M; Sun, Y; Xi, C; Elkin, P; Levine, R; Golub, T; Gilliland, D G; Crispino, J D; Licht, J D; Zhang, W
2011-01-01
Polycythemia vera (PV), essential thrombocythemia and primary myelofibrosis are myeloproliferative neoplasms (MPNs) with distinct clinical features and are associated with the JAK2V617F mutation. To identify genomic anomalies involved in the pathogenesis of these disorders, we profiled 87 MPN patients using Affymetrix 250K single-nucleotide polymorphism (SNP) arrays. Aberrations affecting chr9 were the most frequently observed and included 9pLOH (n=16), trisomy 9 (n=6) and amplifications of 9p13.3–23.3 (n=1), 9q33.1–34.13 (n=1) and 9q34.13 (n=6). Trisomy 9 was associated with elevated JAK2V617F mutant allele burden, suggesting that gain of chr9 represents an alternative mechanism for increasing JAK2V617F dosage. Gene expression profiling of patients with and without chr9 abnormalities (+9, 9pLOH) identified genes potentially involved in disease pathogenesis, including JAK2, STAT5B and MAPK14. We also observed recurrent gains of 1p36.31–36.33 (n=6), 17q21.2–q21.31 (n=5) and 17q25.1–25.3 (n=5) and deletions affecting 18p11.31–11.32 (n=8). Combined SNP and gene expression analysis identified aberrations affecting components of a non-canonical PRC2 complex (EZH1, SUZ12 and JARID2) and genes comprising an 'HSC signature' (MLLT3, SMARCA2 and PBX1). We show that NFIB, which is amplified in 7/87 MPN patients and upregulated in PV CD34+ cells, protects cells from apoptosis induced by cytokine withdrawal. PMID:22829077
Daniels, Noah M; Hosur, Raghavendra; Berger, Bonnie; Cowen, Lenore J
2012-05-01
One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines both simplified MRFs and simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif. We test SMURFLite on all propeller and barrel folds in the mainly-beta class of the SCOP hierarchy in stringent cross-validation experiments. We show a mean 26% (median 16%) improvement in area under the curve (AUC) for beta-structural motif recognition as compared with HMMER (a well-known HMM method), a mean 33% (median 19%) improvement as compared with RAPTOR (a well-known threading method), and even a mean 18% (median 10%) improvement in AUC over HHpred (a profile-profile HMM method), despite HHpred's use of extensive additional training data. We demonstrate SMURFLite's ability to scale to whole genomes by running a SMURFLite library of 207 beta-structural SCOP superfamilies against the entire genome of Thermotoga maritima, and make over 100 new fold predictions. Availability and implementation: A webserver that runs SMURFLite is available at: http://smurf.cs.tufts.edu/smurflite/
He, Liye; Tang, Jing; Andersson, Emma I; Timonen, Sanna; Koschmieder, Steffen; Wennerberg, Krister; Mustjoki, Satu; Aittokallio, Tero
2018-05-01
The molecular pathways that drive cancer progression and treatment resistance are highly redundant and variable between individual patients with the same cancer type. To tackle this complex rewiring of pathway cross-talk, personalized combination treatments targeting multiple cancer growth and survival pathways are required. Here we implemented a computational-experimental drug combination prediction and testing (DCPT) platform for efficient in silico prioritization and ex vivo testing in patient-derived samples to identify customized synergistic combinations for individual cancer patients. DCPT used drug-target interaction networks to traverse the massive combinatorial search space among 218 compounds (a total of 23,653 pairwise combinations) and identified cancer-selective synergies by using differential single-compound sensitivity profiles between patient cells and healthy controls, hence reducing the likelihood of toxic combination effects. Polypharmacology-based machine learning modeling and network visualization used baseline genomic and molecular profiles to guide patient-specific combination testing and clinical translation phases. Using T-cell prolymphocytic leukemia (T-PLL) as a first case study, we show how the DCPT platform successfully predicted distinct synergistic combinations for each of the three T-PLL patients, each presenting with different resistance patterns and synergy mechanisms. In total, 10 of 24 (42%) selective combination predictions were experimentally confirmed to show synergy in patient-derived samples ex vivo. The identified selective synergies among approved drugs, including tacrolimus and temsirolimus combined with the BCL-2 inhibitor venetoclax, may offer novel drug repurposing opportunities for treating T-PLL. Significance: An integrated use of functional drug screening combined with genomic and molecular profiling enables patient-customized prediction and testing of drug combination synergies for T-PLL patients.
Cancer Res; 78(9); 2407-18. ©2018 AACR.
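The size of the DCPT search space and the confirmation rate quoted above follow from two one-line calculations: the number of unordered pairs among 218 compounds is the binomial coefficient C(218, 2), and the 42% figure is 10/24 rounded. A quick check using only the Python standard library (variable names are illustrative):

```python
from math import comb

# Unordered pairs among n compounds: C(n, 2) = n(n - 1) / 2.
n_compounds = 218
n_pairs = comb(n_compounds, 2)
print(n_pairs)  # 23653

# Share of selective combination predictions confirmed ex vivo.
confirmed, tested = 10, 24
confirmation_rate = confirmed / tested
print(f"{confirmation_rate:.0%}")  # 42%
```

Counting unordered pairs assumes drug A plus drug B is the same combination as B plus A, which matches the 23,653 total reported.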
Manzoor, Shahid; Bongcam-Rudloff, Erik; Schnürer, Anna; Müller, Bettina
2016-01-01
Syntrophaceticus schinkii is a mesophilic, anaerobic bacterium capable of oxidising acetate to CO2 and H2 in intimate association with a methanogenic partner, a syntrophic relationship that operates close to the energetic limits of microbial life. Syntrophaceticus schinkii has been identified as a key organism in engineered methane-producing processes that rely on syntrophic acetate oxidation as the main methane-producing pathway. However, owing to strict cultivation requirements and difficulties in reconstituting the thermodynamically unfavourable acetate oxidation, the physiology of this functional group is poorly understood. The genome-guided and whole-transcriptome analyses performed in the present study provide new insights into habitat adaptation, syntrophic acetate oxidation and energy conservation. The working draft genome of Syntrophaceticus schinkii indicates limited metabolic capacities, with a lack of organic nutrient uptake systems, chemotactic machinery and carbon catabolite repression, and incomplete biosynthesis pathways. Ech hydrogenase, [FeFe] hydrogenases, [NiFe] hydrogenases, F1F0-ATP synthase and membrane-bound and cytoplasmic formate dehydrogenases were clearly expressed, whereas Rnf and a predicted oxidoreductase/heterodisulphide reductase complex, both encoded in the genome, were not expressed under syntrophic growth conditions. A transporter sharing similarities with the high-affinity acetate transporters of aceticlastic methanogens was also expressed, suggesting that Syntrophaceticus schinkii can potentially compete with methanogens for acetate. Acetate oxidation seems to proceed via the Wood-Ljungdahl pathway, as all genes involved in this pathway were highly expressed. This study shows that Syntrophaceticus schinkii is a highly specialised, habitat-adapted organism relying on syntrophic acetate oxidation rather than metabolic versatility.
By expanding its complement of respiratory complexes, it might overcome limiting bioenergetic barriers, and drive efficient energy conservation from reactions operating close to the thermodynamic equilibrium, which might enable S. schinkii to occupy the same niche as the aceticlastic methanogens. The knowledge gained here will help specify process conditions supporting efficient and robust biogas production and will help identify mechanisms important for the syntrophic lifestyle. PMID:27851830
Integrative Analysis Reveals Relationships of Genetic and Epigenetic Alterations in Osteosarcoma
Skårn, Magne; Namløs, Heidi M.; Barragan-Polania, Ana H.; Cleton-Jansen, Anne-Marie; Serra, Massimo; Liestøl, Knut; Hogendoorn, Pancras C. W.; Hovig, Eivind; Myklebost, Ola; Meza-Zepeda, Leonardo A.
2012-01-01
Background: Osteosarcomas are the most common non-haematological primary malignant tumours of bone, and all conventional osteosarcomas are high-grade tumours showing complex genomic aberrations. We have integrated genome-wide genetic and epigenetic profiles from the EuroBoNeT panel of 19 human osteosarcoma cell lines based on microarray technologies. Principal Findings: The cell lines showed complex patterns of DNA copy number changes, where genomic copy number gains were significantly associated with gene-rich regions and losses with gene-poor regions. By integrating the datasets, 350 genes were identified as having two types of aberrations (gain/over-expression, hypo-methylation/over-expression, loss/under-expression or hyper-methylation/under-expression) using a recurrence threshold of 6/19 (>30%) cell lines. In general, these genes showed alterations in either DNA copy number or DNA methylation, both within individual samples and across the sample panel. These 350 genes are involved in embryonic skeletal system development and morphogenesis, as well as remodelling of the extracellular matrix. The aberrations of three selected genes, CXCL5, DLX5 and RUNX2, were validated in five cell lines and five tumour samples using PCR techniques. Several genes were hyper-methylated and under-expressed compared to normal osteoblasts, and for the four genes tested (AKAP12, CXCL5, EFEMP1 and IL11RA), expression could be reactivated by demethylation using 5-Aza-2′-deoxycytidine treatment. Globally, there was, as expected, a significant positive association between gain and over-expression, loss and under-expression, and hyper-methylation and under-expression, but gain was also associated with hyper-methylation and under-expression, suggesting that hyper-methylation may oppose the effects of increased copy number for detrimental genes.
Conclusions: Integrative analysis of genome-wide genetic and epigenetic alterations identified dependencies and relationships between DNA copy number, DNA methylation and mRNA expression in osteosarcomas, contributing to a better understanding of osteosarcoma biology. PMID:23144859
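The recurrence filter described in this abstract (keep a gene only if a concordant DNA/expression pair recurs in at least 6 of 19 cell lines) can be sketched as below. The tuple layout, the event labels and the toy data are hypothetical and chosen for illustration; the paper's actual calling pipeline is not reproduced here.

```python
# Hypothetical per-gene, per-cell-line calls; the tuple layout and event labels are
# illustrative assumptions, not the authors' data format.
CONCORDANT = {
    ("gain", "over"),    # copy-number gain with over-expression
    ("hypo", "over"),    # hypo-methylation with over-expression
    ("loss", "under"),   # copy-number loss with under-expression
    ("hyper", "under"),  # hyper-methylation with under-expression
}

N_LINES = 19
MIN_RECURRENCE = 6  # 6/19 is about 0.316, i.e. just over 30% of the panel


def recurrent_genes(calls, min_recurrence=MIN_RECURRENCE):
    """Return genes with a concordant DNA/expression pair in >= min_recurrence cell lines.

    calls: iterable of (gene, cell_line, dna_event, expression_event) tuples.
    """
    lines_per_gene = {}
    for gene, line, dna_event, expr_event in calls:
        if (dna_event, expr_event) in CONCORDANT:
            # A set, so a line with both a copy-number and a methylation hit counts once.
            lines_per_gene.setdefault(gene, set()).add(line)
    return sorted(g for g, lines in lines_per_gene.items() if len(lines) >= min_recurrence)


toy = [("GENE_A", f"line{i}", "gain", "over") for i in range(6)]
toy += [("GENE_B", f"line{i}", "loss", "under") for i in range(5)]   # only 5/19
toy += [("GENE_C", f"line{i}", "gain", "under") for i in range(8)]   # discordant pair
print(recurrent_genes(toy))  # ['GENE_A']
```

Counting distinct cell lines per gene, rather than raw events, keeps a line that carries both a copy-number and a methylation aberration from being counted twice toward the threshold.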
The Genomic and Transcriptomic Landscape of a HeLa Cell Line
Landry, Jonathan J. M.; Pyl, Paul Theodor; Rausch, Tobias; Zichner, Thomas; Tekkedil, Manu M.; Stütz, Adrian M.; Jauch, Anna; Aiyar, Raeka S.; Pau, Gregoire; Delhomme, Nicolas; Gagneur, Julien; Korbel, Jan O.; Huber, Wolfgang; Steinmetz, Lars M.
2013-01-01
HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies performed using HeLa cells require accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. Some of the extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology. PMID:23550136
Jang, In Sock; Dienstmann, Rodrigo; Margolin, Adam A; Guinney, Justin
2015-01-01
Complex mechanisms involving genomic aberrations in numerous proteins and pathways are believed to be a key cause of many diseases such as cancer. With recent advances in genomics, elucidating the molecular basis of cancer at a patient level is now feasible, and has led to personalized treatment strategies whereby a patient is treated according to his or her genomic profile. However, there is growing recognition that existing treatment modalities are overly simplistic, and do not fully account for the deep genomic complexity associated with sensitivity or resistance to cancer therapies. To overcome these limitations, large-scale pharmacogenomic screens of cancer cell lines--in conjunction with modern statistical learning approaches--have been used to explore the genetic underpinnings of drug response. While these analyses have demonstrated the ability to infer genetic predictors of compound sensitivity, to date most modeling approaches have been data-driven, i.e. they do not explicitly incorporate domain-specific knowledge (priors) in the process of learning a model. While a purely data-driven approach offers an unbiased perspective of the data--and may yield unexpected or novel insights--this strategy introduces challenges for both model interpretability and accuracy. In this study, we propose a novel prior-incorporated sparse regression model in which the choice of informative predictor sets is carried out by knowledge-driven priors (gene sets) in a stepwise fashion. Under regularization in a linear regression model, our algorithm is able to incorporate prior biological knowledge across the predictive variables thereby improving the interpretability of the final model with no loss--and often an improvement--in predictive performance. 
We compare the performance of our algorithm with well-known regularization methods such as LASSO, ridge and elastic net regression on the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (Sanger) pharmacogenomics datasets, demonstrating that incorporating the biological priors selected by our model improves predictability and interpretability over existing state-of-the-art methods, despite using far fewer predictors.
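The idea of choosing predictors set-by-set from knowledge-driven gene sets, in a stepwise fashion, can be illustrated with a deliberately simplified stand-in. This is not the authors' algorithm: it is a greedy forward selection over whole gene sets, where each candidate is scored by the held-out mean squared error of a closed-form ridge fit. The function names, the ridge penalty `lam`, the validation-split stopping rule and the synthetic data are all assumptions made for illustration (requires NumPy).

```python
import numpy as np


def ridge_predict(X_tr, y_tr, X_ev, lam):
    # Centre with training means so the intercept is not penalised, then solve
    # the ridge normal equations (X^T X + lam * I) beta = X^T y.
    x_mu, y_mu = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - x_mu
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (y_tr - y_mu))
    return (X_ev - x_mu) @ beta + y_mu


def stepwise_gene_sets(X_tr, y_tr, X_va, y_va, gene_sets, lam=1.0, max_steps=5):
    """Hypothetical greedy selection: add whole gene sets while validation MSE drops."""
    selected, cols = [], []
    best = np.mean((y_va - y_tr.mean()) ** 2)  # intercept-only baseline
    for _ in range(max_steps):
        candidate = None
        for name, idx in gene_sets.items():
            if name in selected:
                continue
            trial = sorted(set(cols) | set(idx))
            pred = ridge_predict(X_tr[:, trial], y_tr, X_va[:, trial], lam)
            err = np.mean((y_va - pred) ** 2)
            if err < best:
                best, candidate = err, (name, trial)
        if candidate is None:  # no remaining set improves the held-out error
            break
        selected.append(candidate[0])
        cols = candidate[1]
    return selected, cols, best


# Synthetic example: the response depends only on the columns in gene set "A".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = X[:, 0] + X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=200)
sets = {"A": [0, 1, 2], "B": [10, 11, 12], "C": [20, 21, 22, 23]}
sel, cols, err = stepwise_gene_sets(X[:150], y[:150], X[150:], y[150:], sets)
print(sel, round(err, 4))
```

Selecting whole sets rather than individual genes is what makes the resulting model readable in terms of pathways; the paper's own regularization scheme and priors should be consulted for the actual method.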
Protein complexes are assemblies of subunits that have co-evolved to execute one or many coordinated functions in the cellular environment. Functional annotation of mammalian protein complexes is critical to understanding biological processes, as well as disease mechanisms. Here, we used genetic co-essentiality derived from genome-scale RNAi- and CRISPR-Cas9-based fitness screens performed across hundreds of human cancer cell lines to assign measures of functional similarity.
Transposable elements as a molecular evolutionary force
NASA Technical Reports Server (NTRS)
Fedoroff, N. V.
1999-01-01
This essay addresses the paradoxes of complex and highly redundant genomes. The central theses developed are that: (1) the distinctive feature of complex genomes is the existence of epigenetic mechanisms that permit extremely high levels of both tandem and dispersed redundancy; (2) the special contribution of transposable elements is to modularize the genome; and (3) the labilizing forces of recombination and transposition are just barely contained, giving a dynamic genetic system of ever-increasing complexity that verges on the chaotic.
Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg
2014-01-01
Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophiles' genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It uses InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 Enzyme Commission (EC) numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. These SAGs code for 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present.
Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website. PMID:24778629
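The two independent filters and the "at least two relevant descriptors" rule described above can be sketched as set operations. This is an illustrative sketch only: the dictionary layout and the GO/PROSITE identifiers are hypothetical placeholders, and the stringent reliability filtering and redundancy removal steps mentioned in the abstract are not modelled. The published PPM scripts are the authoritative implementation.

```python
def ppm_filter(annotations, relevant_go, relevant_prosite, min_descriptors=2):
    """Apply a profile filter (GO terms) and a pattern filter (PROSITE IDs).

    annotations: dict gene_id -> {"go": set of GO terms, "prosite": set of PROSITE IDs}
    Returns (profile_hits, pattern_hits, reliable) as sets of gene IDs.
    """
    profile_hits, pattern_hits, reliable = set(), set(), set()
    for gene, ann in annotations.items():
        go_matches = ann.get("go", set()) & relevant_go
        ps_matches = ann.get("prosite", set()) & relevant_prosite
        if go_matches:
            profile_hits.add(gene)
        if ps_matches:
            pattern_hits.add(gene)
        # Reliable when the relevant descriptors, of either kind, add up to the threshold.
        if len(go_matches) + len(ps_matches) >= min_descriptors:
            reliable.add(gene)
    return profile_hits, pattern_hits, reliable


# Hypothetical identifiers, not real GO or PROSITE accessions.
annotations = {
    "gene1": {"go": {"GO_A", "GO_B"}, "prosite": set()},   # two profile descriptors
    "gene2": {"go": {"GO_A"}, "prosite": {"PS_X"}},        # one of each kind
    "gene3": {"go": {"GO_A"}, "prosite": set()},           # only one descriptor
    "gene4": {"go": {"GO_Z"}, "prosite": {"PS_Y"}},        # GO_Z is not relevant
}
profile, pattern, reliable = ppm_filter(annotations, {"GO_A", "GO_B"}, {"PS_X", "PS_Y"})
print(sorted(reliable))  # ['gene1', 'gene2']
```

Requiring two descriptors lets either filter confirm the other, which is the mechanism the abstract relies on to suppress false-positive function assignments for genes with low sequence homology.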
USDA-ARS's Scientific Manuscript database
Single Molecule Real-Time (SMRT) sequencing provides advantages to the sequencing of complex genomes. The long reads generated are superior for resolving complex genomic regions and provide highly contiguous de novo assemblies. Current SMRTbell libraries generate average read lengths of 10-15 kb. How...
USDA-ARS's Scientific Manuscript database
The identification of specific genes underlying phenotypic variation of complex traits remains one of the greatest challenges in biology despite having genome sequences and more powerful tools. Most genome-wide screens lack sufficient resolving power as they typically depend on linkage. One altern...
Mass Spectrometry Based Ultrasensitive DNA Methylation Profiling Using Target Fragmentation Assay.
Lin, Xiang-Cheng; Zhang, Ting; Liu, Lan; Tang, Hao; Yu, Ru-Qin; Jiang, Jian-Hui
2016-01-19
Efficient tools for profiling DNA methylation in specific genes are essential for epigenetics and clinical diagnostics. Current DNA methylation profiling techniques are limited by inconvenient implementation, requirements for specific reagents, and poor accuracy in quantifying the degree of methylation. We developed a novel mass spectrometry (MS) method, the target fragmentation assay (TFA), that enables methylation profiling of specific sequences. This method combines selective capture of the DNA target from restriction-cleaved genomic DNA, using magnetic separation, with MS detection of the nonenzymatic hydrolysates of the target DNA. The method is highly sensitive, with a detection limit as low as 0.056 amol, allowing direct methylation profiling of genomic DNA without preamplification. Moreover, it offers a unique advantage in accurately determining DNA methylation levels. Its clinical applicability was demonstrated by DNA methylation analysis of prostate tissue samples, suggesting its potential as a useful tool for DNA methylation profiling in the early detection of related diseases.
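For scale, the reported detection limit can be converted into an approximate number of target molecules using Avogadro's constant. This is our own arithmetic on the quoted figure, not a value stated in the paper:

```latex
0.056\ \text{amol} = 5.6\times10^{-20}\ \text{mol},\qquad
5.6\times10^{-20}\ \text{mol}\times 6.022\times10^{23}\ \text{mol}^{-1} \approx 3.4\times10^{4}\ \text{molecules}.
```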
Genomic profiling of human penile carcinoma predicts worse prognosis and survival.
Busso-Lopes, Ariane F; Marchi, Fábio A; Kuasne, Hellen; Scapulatempo-Neto, Cristovam; Trindade-Filho, José Carlos S; de Jesus, Carlos Márcio N; Lopes, Ademar; Guimarães, Gustavo C; Rogatto, Silvia R
2015-02-01
The molecular mechanisms underlying penile carcinoma are still poorly understood, and the detection of genetic markers would be of great benefit for these patients. In this study, we assessed the genomic profile with the aim of identifying potential prognostic biomarkers in penile carcinoma. In total, 46 penile carcinoma samples were evaluated for DNA copy-number alterations by array comparative genomic hybridization (aCGH), combined with human papillomavirus (HPV) genotyping. Specific genes were investigated using qPCR, FISH, and RT-qPCR. Genomic alterations mapped at 3p and 8p were related to worse prognostic features, including advanced T and clinical stage, recurrence and death from the disease. Losses of 3p21.1-p14.3 and gains of 3q25.31-q29 were associated with reduced cancer-specific and disease-free survival. Genomic alterations detected on chromosomes 3 (LAMP3, PPARG, TNFSF10 genes) and 8 (DLC1) were evaluated by qPCR. DLC1 and PPARG losses were associated with poor prognostic characteristics. DLC1 loss was an independent risk factor for recurrence on multivariate analysis. Gene-expression analysis showed under-expression of DLC1 and PPARG and over-expression of LAMP3 and TNFSF10. Chromosome Y losses and MYC gene (8q24) gains were confirmed by FISH. HPV infection was detected in 34.8% of the samples, and 19 genomic regions differed according to viral status. We describe, for the first time, recurrent copy-number alterations and their potential prognostic value in penile carcinomas. We also show a specific genomic profile according to HPV infection, supporting the hypothesis that penile tumors have distinct etiologies depending on viral status. ©2014 American Association for Cancer Research.
Clinically Applicable Inhibitors Impacting Genome Stability.
Prakash, Anu; Garcia-Moreno, Juan F; Brown, James A L; Bourke, Emer
2018-05-13
Advances in technology have facilitated the molecular profiling (genomic and transcriptomic) of tumours and have led to improved stratification of patients and the individualisation of treatment regimens. To fully realise the potential of truly personalised treatment options, we need targeted therapies that precisely disrupt the compensatory pathways, identified by profiling, that allow tumours to survive or gain resistance to treatments. Here, we discuss recent advances in novel therapies that impact the genome (chromosomes and chromatin), the pathways they target and the stage of the pathways targeted. The current state of research will be discussed, with a focus on compounds that have advanced into trials (clinical and pre-clinical). We will discuss inhibitors of specific DNA damage responses and other genome stability pathways, including those in development, which are likely to combine synergistically with current therapeutic options. Tumour profiling data, combined with knowledge of new treatments that affect the regulation of essential tumour signalling pathways, is revealing fundamental insights into cancer progression and resistance mechanisms. This is the forefront of the next evolution of advanced oncology medicine, which will ultimately lead to improved survival and may, one day, turn many cancers into chronic conditions rather than fatal diseases.
Williams, Richard D; Al-Saadi, Reem; Natrajan, Rachael; Mackay, Alan; Chagtai, Tasnim; Little, Suzanne; Hing, Sandra N; Fenwick, Kerry; Ashworth, Alan; Grundy, Paul; Anderson, James R; Dome, Jeffrey S; Perlman, Elizabeth J; Jones, Chris; Pritchard-Jones, Kathy
2011-12-01
Anaplasia in Wilms tumor, a distinctive histology characterized by abnormal mitoses, is associated with poor patient outcome. While anaplastic tumors frequently harbour TP53 mutations, little is otherwise known about their molecular biology. We have used array comparative genomic hybridization (aCGH) and cDNA microarray expression profiling to compare anaplastic and favorable histology Wilms tumors to determine their common and differentiating features. In addition to changes on 17p, consistent with TP53 deletion, recurrent anaplasia-specific genomic loss and under-expression were noted in several other regions, most strikingly 4q and 14q. Further aberrations, including gain of 1q and loss of 16q were common to both histologies. Focal gain of MYCN, initially detected by high resolution aCGH profiling in 6/61 anaplastic samples, was confirmed in a significant proportion of both tumor types by a genomic quantitative PCR survey of over 400 tumors. Overall, these results are consistent with a model where anaplasia, rather than forming an entirely distinct molecular entity, arises from the general continuum of Wilms tumor by the acquisition of additional genomic changes at multiple loci. Copyright © 2011 Wiley Periodicals, Inc.
Hilson, Pierre; Allemeersch, Joke; Altmann, Thomas; Aubourg, Sébastien; Avon, Alexandra; Beynon, Jim; Bhalerao, Rishikesh P.; Bitton, Frédérique; Caboche, Michel; Cannoot, Bernard; Chardakov, Vasil; Cognet-Holliger, Cécile; Colot, Vincent; Crowe, Mark; Darimont, Caroline; Durinck, Steffen; Eickhoff, Holger; de Longevialle, Andéol Falcon; Farmer, Edward E.; Grant, Murray; Kuiper, Martin T.R.; Lehrach, Hans; Léon, Céline; Leyva, Antonio; Lundeberg, Joakim; Lurin, Claire; Moreau, Yves; Nietfeld, Wilfried; Paz-Ares, Javier; Reymond, Philippe; Rouzé, Pierre; Sandberg, Goran; Segura, Maria Dolores; Serizet, Carine; Tabrett, Alexandra; Taconnat, Ludivine; Thareau, Vincent; Van Hummelen, Paul; Vercruysse, Steven; Vuylsteke, Marnik; Weingartner, Magdalena; Weisbeek, Peter J.; Wirta, Valtteri; Wittink, Floyd R.A.; Zabeau, Marc; Small, Ian
2004-01-01
Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, prompting us to create a collection of gene-specific sequence tags (GSTs) that represent at least 21,500 Arabidopsis genes and are compatible with both approaches. The GSTs were carefully selected to ensure that none of them shared significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics. PMID:15489341
Pancreatic Cancer Genomics 2.0: Profiling Metastases.
Collisson, Eric A; Maitra, Anirban
2017-03-13
Pancreatic ductal adenocarcinoma, even when diagnosed early, nearly always metastasizes. Recurrent mutations and genomic instability are early events in the disease. Two recent papers advance our understanding of how the cancer genome evolves as the primary tumor migrates from its origin in the pancreas to colonize distant metastatic sites. Copyright © 2017 Elsevier Inc. All rights reserved.
Microeconomic principles explain an optimal genome size in bacteria.
Ranea, Juan A G; Grant, Alastair; Thornton, Janet M; Orengo, Christine A
2005-01-01
Bacteria can clearly enhance their survival by expanding their genetic repertoire. However, the tight packing of the bacterial genome and the fact that the most evolved species do not necessarily have the biggest genomes suggest that other evolutionary factors limit genome expansion. To clarify these restrictions on size, we studied the protein families contributing most significantly to bacterial genome complexity. We found that all bacteria apply the same basic and ancestral 'molecular technology' to optimize their reproductive efficiency. The same microeconomic principles that define the optimum size of a factory can also explain the existence of a statistical optimum in bacterial genome size. This optimum is reached when the bacterial genome obtains the maximum metabolic complexity (revenue) for minimal regulatory genes (logistic cost).
Integration of biological networks and gene expression data using Cytoscape
Cline, Melissa S; Smoot, Michael; Cerami, Ethan; Kuchinsky, Allan; Landys, Nerius; Workman, Chris; Christmas, Rowan; Avila-Campilo, Iliana; Creech, Michael; Gross, Benjamin; Hanspers, Kristina; Isserlin, Ruth; Kelley, Ryan; Killcoyne, Sarah; Lotia, Samad; Maere, Steven; Morris, John; Ono, Keiichiro; Pavlovic, Vuk; Pico, Alexander R; Vailaya, Aditya; Wang, Peng-Liang; Adler, Annette; Conklin, Bruce R; Hood, Leroy; Kuiper, Martin; Sander, Chris; Schmulevich, Ilya; Schwikowski, Benno; Warner, Guy J; Ideker, Trey; Bader, Gary D
2013-01-01
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape. PMID:17947979
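Step (v), identifying enriched Gene Ontology annotations in a network module, is commonly done with a one-sided hypergeometric test. The protocol's exact statistic is not stated in this abstract, so treat the test choice, the function name and the toy numbers below as assumptions; this is a standard-library sketch of that test, not Cytoscape's implementation.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background set
    K: background genes annotated with the GO term
    n: genes in the network module being tested
    k: module genes annotated with the GO term
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy numbers: a 4-gene module drawn from 20 background genes, 5 of which carry the term.
p = hypergeom_enrichment_p(N=20, K=5, n=4, k=4)
print(f"{p:.5f}")  # 0.00103
```

Because a typical analysis tests many GO terms at once, the resulting p-values would normally be corrected for multiple testing (for example, with a false discovery rate procedure) before a term is reported as enriched.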
C. elegans network biology: a beginning.
Piano, Fabio; Gunsalus, Kristin C; Hill, David E; Vidal, Marc
2006-01-01
The architecture and dynamics of molecular networks can provide an understanding of complex biological processes complementary to that obtained from the in-depth study of single genes and proteins. With a completely sequenced and well-annotated genome, a fully characterized cell lineage, and powerful tools available to dissect development, Caenorhabditis elegans, among metazoans, provides an optimal system to bridge cellular and organismal biology with the global properties of macromolecular networks. This chapter considers omic technologies available for C. elegans to describe molecular networks (encompassing transcriptional and phenotypic profiling as well as physical interaction mapping) and discusses how their individual and integrated applications are paving the way for a network-level understanding of C. elegans biology. PMID:18050437
Genome-wide high-resolution aCGH analysis of gestational choriocarcinomas.
Poaty, Henriette; Coullin, Philippe; Peko, Jean Félix; Dessen, Philippe; Diatta, Ange Lucien; Valent, Alexander; Leguern, Eric; Prévot, Sophie; Gombé-Mbalawa, Charles; Candelier, Jean-Jacques; Picard, Jean-Yves; Bernheim, Alain
2012-01-01
Eleven DNA samples from choriocarcinomas were studied by high-resolution 244K array CGH. Samples were included after histopathological confirmation of the diagnosis and of the androgenic etiology, and after microsatellite marker analysis confirmed that the tumor DNA was not contaminated with maternal DNA. Three cell lines, BeWo, JAR, and JEG, were also studied with this high-resolution pangenomic technique. According to aCGH analysis, the de novo choriocarcinomas exhibited simple chromosomal rearrangements or normal profiles, whereas the cell lines showed varied and complex chromosomal aberrations. Twenty-three minimal critical regions were defined, allowing us to list the potentially implicated genes. Among them, unusually high numbers of microRNA clusters and imprinted genes were observed.
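The abstract does not state how the minimal critical regions were delimited. One common approach, assumed here purely for illustration, is to collect aberration intervals from several samples on the same chromosome and report the stretches shared by the largest number of samples. The function name and the coordinate convention (half-open base-pair intervals) are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs, one chromosome


def max_overlap_segments(intervals: List[Interval]) -> Tuple[int, List[Interval]]:
    """Return (depth, segments): the stretches covered by the most intervals."""
    events = []
    for start, end in intervals:
        if start >= end:
            raise ValueError(f"empty interval {(start, end)}")
        events.append((start, 1))   # an aberration begins
        events.append((end, -1))    # an aberration ends
    # Ends (-1) sort before starts (+1) at the same position, so intervals
    # that merely touch are not counted as overlapping.
    events.sort()
    depth, best, segments = 0, 0, []
    for i, (pos, delta) in enumerate(events):
        depth += delta
        next_pos = events[i + 1][0] if i + 1 < len(events) else pos
        if next_pos == pos:
            continue  # more events at this position, or the last event
        # `depth` now holds the coverage of the stretch [pos, next_pos).
        if depth > best:
            best, segments = depth, [(pos, next_pos)]
        elif depth == best and depth > 0:
            if segments[-1][1] == pos:
                segments[-1] = (segments[-1][0], next_pos)  # extend a contiguous stretch
            else:
                segments.append((pos, next_pos))
    return best, segments


print(max_overlap_segments([(0, 10), (5, 15), (8, 20)]))  # (3, [(8, 10)])
```

A sweep over sorted start/end events keeps the cost at O(m log m) for m intervals, which matters little for 11 samples but scales to larger cohorts.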
Beauregard, France; Angers, Bernard
2018-05-31
Unisexuals of the blue-spotted salamander complex are thought to reproduce by kleptogenesis. Genome exchanges associated with this sperm-dependent mode of reproduction are expected to produce higher genetic variation and multiple ploidy levels than clonal reproduction would. However, the existence of some populations formed exclusively of genetically identical individuals suggests that some factors may prevent genome exchange. This study aimed to assess the prevalence of genome exchange among unisexuals of the Ambystoma laterale-jeffersonianum complex at 10 sites in the northern part of their distribution. A total of 235 individuals, including 207 unisexuals, were genotyped using microsatellite loci and AFLP. Unisexual individuals could be sorted into five genetically distinct groups, likely derived from the same paternal A. jeffersonianum haplome. One of these groups reproduced exclusively clonally, even when found in sympatry with lineages showing signatures of genome exchange. Genome exchange was site-dependent for another group and was detected at all sites for the three remaining groups. The prevalence of genome exchange appears to be associated with ecological conditions such as the availability of effective sperm donors. Intrinsic genomic factors may also affect this process, since different lineages in sympatry show highly variable rates of genome exchange. The coexistence of clonal and genetically diversified lineages opens the door to further research on alternatives to genetic variation.
Selective recruitment of nuclear factors to productively replicating herpes simplex virus genomes.
Dembowski, Jill A; DeLuca, Neal A
2015-05-01
Much of the HSV-1 life cycle is carried out in the cell nucleus, including the expression, replication, repair, and packaging of viral genomes. Viral proteins, as well as cellular factors, play essential roles in these processes. Isolation of proteins on nascent DNA (iPOND) was developed to label and purify cellular replication forks. We adapted aspects of this method to label viral genomes, both to image and to purify replicating HSV-1 genomes for the identification of associated proteins. Many viral and cellular factors were enriched on viral genomes, including factors that mediate DNA replication, repair, chromatin remodeling, transcription, and RNA processing. As infection proceeded, packaging and structural components were enriched to a greater extent. Among the more abundant proteins that copurified with genomes were the viral transcription factor ICP4 and the replication protein ICP8. Furthermore, all seven viral replication proteins were enriched on viral genomes, along with cellular PCNA and topoisomerases, while other cellular replication proteins were not detected. The chromatin-remodeling complexes present on viral genomes included the INO80, SWI/SNF, NURD, and FACT complexes, which may prevent chromatinization of the genome. Consistent with this conclusion, histones were not readily recovered with purified viral genomes, and imaging studies revealed an underrepresentation of histones on viral genomes. RNA polymerase II, the mediator complex, TFIID, TFIIH, and several other transcriptional activators and repressors were also affinity purified with viral DNA. The presence of INO80, NURD, SWI/SNF, mediator, TFIID, and TFIIH components is consistent with previous studies in which these complexes copurified with ICP4. Therefore, ICP4 is likely involved in the recruitment of these key cellular chromatin remodeling and transcription factors to viral genomes.
Taken together, iPOND is a valuable method for the study of viral genome dynamics during infection and provides a comprehensive view of how HSV-1 selectively utilizes cellular resources.
The dynamics of genome replication using deep sequencing
Müller, Carolin A.; Hawkins, Michelle; Retkute, Renata; Malla, Sunir; Wilson, Ray; Blythe, Martin J.; Nakato, Ryuichiro; Komata, Makiko; Shirahige, Katsuhiko; de Moura, Alessandro P.S.; Nieduszynski, Conrad A.
2014-01-01
Eukaryotic genomes are replicated from multiple DNA replication origins. We present complementary deep sequencing approaches to measure origin location and activity in Saccharomyces cerevisiae. Measuring the increase in DNA copy number during a synchronous S-phase allowed genome replication to be determined precisely. To map origin locations, replication forks were stalled close to their initiation sites; therefore, copy number enrichment was limited to origins. Replication timing profiles were generated from asynchronous cultures using fluorescence-activated cell sorting. Applying this technique, we show that the replication profiles of haploid and diploid cells are indistinguishable, indicating that both cell types use the same cohort of origins with the same activities. Finally, increasing sequencing depth allowed the direct measurement of replication dynamics from an exponentially growing culture. This is the first time this approach, called marker frequency analysis, has been successfully applied to a eukaryote. These data provide a high-resolution resource and methodological framework for studying genome biology. PMID:24089142
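Marker frequency analysis rests on a simple ratio: genomic bins near active origins are copied earlier, so in a replicating population they carry more reads relative to a non-replicating reference. The sketch below computes that library-size-normalised ratio per bin and reports local maxima as candidate origin positions. It is a deliberate simplification (no smoothing, no GC or mappability correction), not the authors' pipeline, and the function names are hypothetical.

```python
import math
from typing import List


def relative_copy_number(replicating: List[int], reference: List[int]) -> List[float]:
    """Per-bin read ratio, replicating vs non-replicating sample, normalised by library size."""
    if len(replicating) != len(reference):
        raise ValueError("bins must align between the two samples")
    rep_total, ref_total = sum(replicating), sum(reference)
    if rep_total == 0 or ref_total == 0:
        raise ValueError("each sample needs at least one read")
    ratios = []
    for rep, ref in zip(replicating, reference):
        if ref == 0:
            ratios.append(math.nan)  # no reference coverage: ratio undefined
        else:
            ratios.append((rep / rep_total) / (ref / ref_total))
    return ratios


def local_maxima(values: List[float]) -> List[int]:
    """Indices of bins higher than both neighbours (candidate origin positions).

    Comparisons involving NaN are False, so bins at or beside a NaN are skipped.
    """
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]


# Hypothetical per-bin read counts for a synchronous S-phase sample and a G1 reference.
ratios = relative_copy_number([12, 30, 14, 26, 11], [10, 10, 10, 10, 10])
print(local_maxima(ratios))  # [1, 3]
```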
Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi
2018-03-20
Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical decision-making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics.
Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi
2018-01-01
Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical decision-making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics. PMID:29644010
Romero-López, Cristina; Barroso-delJesus, Alicia; García-Sacristán, Ana; Briones, Carlos; Berzal-Herranz, Alfredo
2012-01-01
Hepatitis C virus (HCV) translation initiation is directed by an internal ribosome entry site (IRES) and regulated by distant regions at the 3′-end of the viral genome. Through a combination of improved RNA chemical probing methods, SHAPE structural analysis and screening of RNA accessibility using antisense oligonucleotide microarrays, here, we show that HCV IRES folding is fine-tuned by the genomic 3′-end. The essential IRES subdomains IIIb and IIId, and domain IV, adopted a different conformation in the presence of the cis-acting replication element and/or the 3′-untranslated region compared to that taken up in their absence. Importantly, many of the observed changes involved significant decreases in the dimethyl sulfate or N-methylisatoic anhydride reactivity profiles at subdomains IIIb and IIId, while domain IV appeared as a more flexible element. These observations were additionally confirmed in a replication-competent RNA molecule. Significantly, protein factors are not required for these conformational differences to be made manifest. Our results suggest that a complex, direct and long-distance RNA–RNA interaction network plays an important role in the regulation of HCV translation and replication, as well as in the switching between different steps of the viral cycle. PMID:23066110
DiRE: identifying distant regulatory elements of co-expressed genes
Gotea, Valer; Ovcharenko, Ivan
2008-01-01
Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org. PMID:18487623
Yoshino, Timothy P.; Dinguirard, Nathalie; de Moraes Mourão, Marina
2013-01-01
Summary: With rapid developments in DNA and protein sequencing technologies, combined with powerful bioinformatics tools, gene identification in parasitic helminths is predicted to keep accelerating, potentially leading to the discovery of new drug and vaccine targets, enhanced diagnostics and insights into the complex biology underlying host-parasite interactions. For the schistosome blood flukes, the recent completion of genome sequencing and of comprehensive transcriptomic datasets has produced massive amounts of gene sequence data, yet for the vast majority of these genes little is known about their actual functions within the intact organism. In this review, we attempt to bring together traditional in vitro cultivation approaches and recently emerging technologies of molecular genomics, transcriptomics and genetic manipulation to illustrate the considerable progress made in our understanding of trematode gene expression and function during development of the intramolluscan larval stages. Using several prominent trematode families (Schistosomatidae, Fasciolidae, Echinostomatidae), we focus on the current status of in vitro larval isolation and cultivation as a source of valuable raw material supporting gene discovery efforts in model digeneans, including whole genome sequencing, transcript and protein expression profiling during larval development, and progress made in the in vitro manipulation of genes and their expression in larval trematodes using transgenic and RNA interference (RNAi) approaches. PMID:19961646
Chang, Ting-Yu; Wu, Yu-Hsuan; Cheng, Cheng-Chung; Wang, Hsei-Wei
2011-09-01
Alternative RNA splicing greatly increases proteome diversity, and genome-wide study of alternative splicing (AS) events has become possible with the advent of high-throughput genomics tools designed for this purpose. Kaposi's sarcoma-associated herpesvirus (KSHV) is the etiological agent of KS, a tumor of lymphatic endothelial cell (LEC) lineage, but little is known about the AS variations induced by KSHV. We analyzed KSHV-controlled AS using high-density microarrays capable of detecting all exons in the human genome. Splicing variants and altered exon-intron usage were found in infected LEC, and these correlated with protein domain modification. The different 3'-UTRs used in the new transcripts also help the isoforms escape microRNA-mediated surveillance. Exome-level analysis further revealed information that cannot be obtained with classical gene-level profiling: a significant difference in exon usage existed between LEC and CD34(+) precursor cells, and KSHV infection resulted in LEC-to-precursor, dedifferentiation-like reprogramming at the exon level. Our results demonstrate the application of exon arrays in systems biology research and suggest that the regulatory effects of AS in endothelial cells are far more complex than previously observed. This extra layer of molecular diversity helps to account for various aspects of endothelial biology, the KSHV life cycle and disease pathogenesis that have so far been unexplored.
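The abstract does not name the statistic used to compare exon usage. A widely used exon-array measure, assumed here for illustration, is the splicing index: the log2 ratio of an exon's gene-normalised intensity in one condition (for example KSHV-infected LEC) to that in another. A positive value means the exon contributes relatively more of its gene's signal in the first condition. The function name is hypothetical, and inputs are assumed to be linear-scale, background-corrected intensities.

```python
import math


def splicing_index(exon_a: float, gene_a: float, exon_b: float, gene_b: float) -> float:
    """log2((exon_a / gene_a) / (exon_b / gene_b)) for linear-scale intensities."""
    if min(exon_a, gene_a, exon_b, gene_b) <= 0:
        raise ValueError("intensities must be positive")
    # Normalising each exon by its gene-level signal removes overall expression changes,
    # so what remains reflects a change in the exon's relative inclusion.
    return math.log2((exon_a / gene_a) / (exon_b / gene_b))


# Hypothetical intensities: the gene is twice as abundant in A but the exon four times,
# so the exon's relative inclusion has doubled.
print(splicing_index(exon_a=400.0, gene_a=200.0, exon_b=100.0, gene_b=100.0))  # 1.0
```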
Que, Feng; Wang, Guang-Long; Li, Tong; Wang, Ya-Hui; Xu, Zhi-Sheng; Xiong, Ai-Sheng
2018-06-16
The homeobox gene family, a large family of transcription factors, has been implicated in secondary growth, early embryo patterning, and hormone response pathways in plants. However, information on the homeobox gene family and its evolutionary history in carrot is limited. In the present study, a total of 130 homeobox family genes were identified in the carrot genome. Specific codomain and phylogenetic analyses revealed that the genes were classified into 14 subgroups. Whole-genome and proximal duplications contributed to the expansion of the homeobox gene family in carrot. Purifying selection also contributed to the evolution of carrot homeobox genes. In Gene Ontology (GO) analysis, most members of the HD-ZIP III and IV subfamilies were found to have a lipid binding (GO:0008289) term. Most HD-ZIP III and IV genes also harbored a steroidogenic acute regulatory protein-related lipid transfer (START) domain. These results suggested that the HD-ZIP III and IV subfamilies might be related to lipid transfer. Transcriptome and quantitative real-time PCR (RT-qPCR) data indicated that members of the WOX and KNOX subfamilies were likely implicated in carrot root development. Our study provides a useful basis for further studies on the complexity and function of the homeobox gene family in carrot.
Genome-wide association studies for multiple diseases of the German Shepherd Dog
Tsai, Kate L.; Noorai, Rooksana E.; Starr-Moss, Alison N.; Quignon, Pascale; Rinz, Caitlin J.; Ostrander, Elaine A.; Steiner, Jörg M.; Murphy, Keith E.
2012-01-01
The German Shepherd Dog (GSD) is a popular working and companion breed for which over 50 hereditary diseases have been documented. Herein, SNP profiles for 197 GSDs were generated using the Affymetrix v2 canine SNP array for a genome-wide association study to identify loci associated with four diseases: pituitary dwarfism, degenerative myelopathy (DM), congenital megaesophagus (ME), and pancreatic acinar atrophy (PAA). A locus on Chr 9 is strongly associated with pituitary dwarfism and is proximal to a plausible candidate gene, LHX3. Results for DM confirm a major locus encompassing SOD1, in which an associated point mutation was previously identified, but do not suggest modifier loci. Several SNPs on Chr 12 are associated with ME, and a 4.7 Mb haplotype block is present in affected dogs. Analysis of additional ME cases for a SNP within the haplotype provides further support for this association. Results for PAA indicate more complex genetic underpinnings. Several regions on multiple chromosomes reach genome-wide significance. However, no major locus is apparent, and only two associated haplotype blocks, on Chrs 7 and 12, are observed. These data suggest that PAA may be governed by multiple loci with small effects, or it may be a heterogeneous disorder. PMID:22105877
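The abstract does not specify the association statistic. For illustration only, the sketch below shows the basic per-SNP allelic test often used as a starting point in case-control GWAS: a Pearson chi-square on a 2x2 table of allele counts in cases and controls, with a 1-degree-of-freedom p-value. Analyses of related purebred dogs usually also correct for population structure, which this sketch ignores, and the function name is hypothetical.

```python
import math


def allelic_chi2(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in rows or 0 in cols:
        raise ValueError("each row and column needs at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 df, chi2 = Z**2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: alternate allele on 30 of 40 case chromosomes vs 10 of 40 controls.
print(allelic_chi2(30, 10, 10, 30))
```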
Sargent, Rachel; Jones, Dan; Abruzzo, Lynne V.; Yao, Hui; Bonderover, Jaime; Cisneros, Marissa; Wierda, William G.; Keating, Michael J.; Luthra, Rajyalakshmi
2009-01-01
Chromosome gains and losses used for risk stratification in chronic lymphocytic leukemia (CLL) are commonly assessed by multiprobe fluorescence in situ hybridization (FISH) studies. We designed and validated a customized array-comparative genomic hybridization (aCGH) platform as a clinical assay for CLL genomic profiling. A 60-mer, 44,000-probe oligonucleotide array with a 50-kb average spatial resolution was augmented with high-density probe tiling at loci that are frequently aberrant in CLL. Aberrations identified by aCGH were compared with those identified by a FISH panel, including locus-specific probes to ATM (11q22.3), the centromeric region of chromosome 12 (12p11.1–q11), D13S319 (13q14.3), LAMP1 (13q34), and TP53 (17p13.1). In 100 CLL samples, aCGH/FISH concordance was seen for 89% of FISH-called aberrations at the ATM (n = 18), D13S319 (n = 42), LAMP1 (n = 12), and TP53 (n = 22) loci and for chromosome 12 (n = 14). Eighty-four percent of FISH/aCGH discordant calls were in samples either at or below the limit of aCGH sensitivity (10% to 25% FISH aberration-containing cells). Therefore, aCGH profiling is a feasible routine clinical test that gives results comparable to multiprobe FISH studies; however, it may be less sensitive than FISH in cases with low-level aberrations. Further, a customized array design can provide comprehensive genomic profiling with additional accuracy in both identifying and defining the extent of small aberrations at target loci. PMID:19074592
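The 89% figure is a concordance measured over FISH-called aberrations: of the aberrations FISH detected at each locus, the fraction that aCGH also detected. A minimal sketch of that bookkeeping follows, using hypothetical call records; the record layout and the function name are assumptions, not the authors' software.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Call = Tuple[str, str, bool, bool]  # (sample_id, locus, fish_positive, acgh_positive)


def fish_concordance(calls: Iterable[Call]) -> Dict[str, float]:
    """Fraction of FISH-positive aberrations also detected by aCGH, per locus and overall."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for _sample, locus, fish, acgh in calls:
        if not fish:
            continue  # concordance here is measured over FISH-called aberrations only
        for key in (locus, "overall"):
            totals[key] += 1
            hits[key] += int(acgh)
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical calls for four samples.
calls = [
    ("s1", "ATM", True, True),
    ("s2", "ATM", True, False),
    ("s3", "TP53", True, True),
    ("s4", "TP53", False, True),  # aCGH-only call: excluded from this FISH-based measure
]
print(fish_concordance(calls))
```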
Prediction of individualized therapeutic vulnerabilities in cancer from genomic profiles
Aksoy, Bülent Arman; Demir, Emek; Babur, Özgün; Wang, Weiqing; Jing, Xiaohong; Schultz, Nikolaus; Sander, Chris
2014-01-01
Motivation: Somatic homozygous deletions of chromosomal regions in cancer, while not necessarily oncogenic, may lead to therapeutic vulnerabilities specific to cancer cells compared with normal cells. A recently reported example is the loss of one of two isoenzymes in glioblastoma cells, such that a specific inhibitor selectively inhibited growth of the cancer cells, which had become fully dependent on the second isoenzyme. We have now made use of the unprecedented conjunction of large-scale cancer genomics profiling of tumor samples in The Cancer Genome Atlas (TCGA) and of tumor-derived cell lines in the Cancer Cell Line Encyclopedia, as well as the availability of integrated pathway information systems, such as Pathway Commons, to systematically search for a comprehensive set of such epistatic vulnerabilities. Results: Based on homozygous deletions affecting metabolic enzymes in 16 TCGA cancer studies and 972 cancer cell lines, we identified 4104 candidate metabolic vulnerabilities present in 1019 tumor samples and 482 cell lines. Up to 44% of these vulnerabilities can be targeted with at least one Food and Drug Administration-approved drug. We suggest focused experiments to test these vulnerabilities and clinical trials based on personalized genomic profiles of those that pass preclinical filters. We conclude that genomic profiling will in the future provide a basis for network pharmacology that exploits epistatic vulnerabilities, a promising therapeutic strategy. Availability and implementation: A web-based tool for exploring all vulnerabilities and their details is available at http://cbio.mskcc.org/cancergenomics/statius/ along with supplemental data files. Contact: statius@cbio.mskcc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24665131
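The idea behind these epistatic vulnerabilities can be sketched as a filter over reaction-to-enzyme mappings: if a tumour has homozygously lost some of the enzymes for a reaction and exactly one remains, that surviving enzyme is a candidate target. The sketch uses a toy mapping and a hypothetical drug-target set; it is a simplification for illustration, not the published pipeline, which drew on integrated pathway resources such as Pathway Commons.

```python
from typing import Dict, List, Set, Tuple


def candidate_vulnerabilities(
    deleted: Set[str],
    reaction_enzymes: Dict[str, Set[str]],
    druggable: Set[str],
) -> List[Tuple[str, str, bool]]:
    """Return (reaction, surviving_enzyme, has_drug) triples for one tumour sample.

    Simplifying assumption: a reaction qualifies only when at least one of its
    enzymes is homozygously deleted and exactly one enzyme remains, so inhibiting
    that survivor could block the reaction in tumour cells while normal cells
    still have the deleted isoenzyme.
    """
    hits = []
    for reaction, enzymes in sorted(reaction_enzymes.items()):
        lost = enzymes & deleted
        remaining = enzymes - deleted
        if lost and len(remaining) == 1:
            survivor = next(iter(remaining))
            hits.append((reaction, survivor, survivor in druggable))
    return hits


# Toy data; every gene and reaction name here is hypothetical.
reactions = {"R1": {"ISO1", "ISO2"}, "R2": {"ISO3", "ISO4"}, "R3": {"ISO1"}}
print(candidate_vulnerabilities({"ISO1"}, reactions, druggable={"ISO2"}))
# [('R1', 'ISO2', True)]
```

Reaction R3 is not reported: with its only enzyme deleted the reaction is already lost in the tumour, so there is no survivor to inhibit.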
Chen, Josephine; Zhao, Po; Massaro, Donald; Clerch, Linda B.; Almon, Richard R.; DuBois, Debra C.; Jusko, William J.; Hoffman, Eric P.
2004-01-01
Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes as interrelated variables. The high dimensionality of microarray expression profile data and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27-time-point in vivo muscle regeneration series. This data warehouse and the associated analysis tools provide access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis and for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp). PMID:14681485
Appels, R; Barrero, R; Bellgard, M
2012-03-01
The Plant and Animal Genome (PAG, held annually) meeting in January 2012 provided insights into the advances in plant, animal, and microbe genome studies particularly as they impact on our understanding of complex biological systems. The diverse areas of biology covered included the advances in technologies, variation in complex traits, genome change in evolution, and targeting phenotypic changes, across the broad spectrum of life forms. This overview aims to summarize the major advances in research areas presented in the plenary lectures and does not attempt to summarize the diverse research activities covered throughout the PAG in workshops, posters, presentations, and displays by suppliers of cutting-edge technologies.
The NCI Genomic Data Commons as an engine for precision medicine.
Jensen, Mark A; Ferretti, Vincent; Grossman, Robert L; Staudt, Louis M
2017-07-27
The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.
KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.
Wang, Dapeng; Xu, Jiayue; Yu, Jun
2015-09-16
The K-mer approach, which treats genomic sequences as simple character strings and counts the relative abundance of each string of a fixed length K, has been extensively applied to phylogenetic inference and to genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we have developed a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogenies from genomic elements in an alignment-free fashion and provides in-depth data processing that enables users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data.
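The K-mer approach described above can be sketched in a few lines: slide a window of length K along each sequence, convert the counts to relative abundances, and compare genomes by a distance between their profiles. The Euclidean distance used here is an assumption for illustration; the abstract does not state which metric KGCAK uses, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int) -> dict[str, float]:
    """Relative abundance of every overlapping k-mer in seq."""
    seq = seq.upper()
    if k <= 0 or len(seq) < k:
        raise ValueError("need 0 < k <= len(seq)")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two k-mer profiles; absent k-mers count as 0."""
    keys = p.keys() | q.keys()
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


# Alignment-free comparison of two short toy sequences.
print(profile_distance(kmer_profile("ACGTACGT", 3), kmer_profile("ACGTTCGT", 3)))
```

Pairwise distances computed this way for many genomes can then be fed to a standard tree-building method such as neighbor joining, which is the alignment-free route to phylogeny the abstract refers to.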
Mack, Stephen C; Northcott, Paul A
2017-07-20
Recent breakthroughs in next-generation sequencing technology and complementary genomic platforms have transformed our capacity to interrogate the molecular landscapes of human cancers, including childhood brain tumors. Numerous high-throughput genomic studies have been reported for the major histologic brain tumor entities diagnosed in children, including interrogations at the level of the genome, epigenome, and transcriptome, many of which have yielded essential new insights into disease biology. The nature of these discoveries has been largely platform dependent, exemplifying the usefulness of applying different genomic and computational strategies, or integrative approaches, to address specific biologic and/or clinical questions. The goal of this article is to summarize the spectrum of molecular profiling methods available for investigating genomic aspects of childhood brain tumors in both the research and the clinical setting. We provide an overview of the main next-generation sequencing and array-based technologies currently being applied in this field and draw from key examples in the recent neuro-oncology literature to illustrate how these genomic approaches have profoundly advanced our understanding of individual tumor entities. Moreover, we discuss the current status of genomic profiling in the clinic and how different platforms are being used to improve patient diagnosis and stratification, as well as to identify actionable targets for informing molecularly guided therapies, especially for patients for whom conventional standard-of-care treatments have failed. Both the demand for genomic testing and the main challenges associated with incorporating genomics into the clinical management of pediatric patients with brain tumors are discussed, as are recommendations for incorporating these assays into future clinical trials.
2010-01-01
Background: Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given that hundreds to thousands of individuals from a population have to be genotyped and expression profiled, has limited its widespread application. Results: Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Conclusion: Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing. PMID:20707912
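At a heterozygous site, allele-specific expression is commonly tested by asking whether cDNA-derived reads split between the two alleles more unevenly than a fair 50:50 draw would allow. The exact two-sided binomial test below is one standard way to do this; it assumes no reference-mapping bias, the abstract does not say this was the authors' exact test, and the function name is hypothetical.

```python
from math import comb


def ase_binomial_p(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial p-value for ref vs alt read counts under p = 0.5."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads at this site")
    k = min(ref_reads, alt_reads)
    # Under p = 0.5 the null distribution is symmetric, so the two-sided p-value
    # is twice the lower tail P(X <= k), capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical site: 18 reads carry the reference allele and 2 the alternate allele.
print(ase_binomial_p(18, 2))
```

Exact integer arithmetic with `math.comb` avoids underflow for deep sites; in a genome-wide scan the per-site p-values would still need multiple-testing correction.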
3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.
Goldfarb, Katherine C; Cech, Thomas R
2013-09-21
Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues), with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogeneous 3' terminus of an in vitro transcribed MRP RNA control and with the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.
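The sketch below makes the distinction between genomically encoded termini and non-templated oligo(A) tails concrete. It assumes that each 3' RACE read has already been aligned so that it starts at the same position as a reference string holding the RNA sequence plus its downstream genomic sequence. That input convention and the function name are hypothetical; this is not the authors' pipeline. The approach has an inherent ambiguity: an A added after transcription cannot be told apart from a genomically encoded A at the same position. This sketch counts such bases as templated.

```python
def split_3prime_end(read: str, genomic: str):
    """Split an aligned read into its templated part and any
    non-templated 3' tail.

    Returns (templated_length, tail, tail_type)."""
    read, genomic = read.upper(), genomic.upper()
    templated = 0
    # Extend the templated region while the read still matches the genome.
    while (templated < len(read) and templated < len(genomic)
           and read[templated] == genomic[templated]):
        templated += 1
    tail = read[templated:]
    if not tail:
        tail_type = "none"          # read ends at a genomically encoded terminus
    elif set(tail) == {"A"}:
        tail_type = "oligo(A)"      # non-templated adenosines only
    else:
        tail_type = "other"         # other non-templated additions
    return templated, tail, tail_type


genomic = "GCAUUUUCAGG"  # toy RNA end plus downstream genomic sequence
for read in ["GCAUUUUAAA", "GCAUUU", "GCAUUUUGGA"]:
    print(read, split_3prime_end(read, genomic))
```

Tabulating `templated_length` across many reads would give the distribution of genomic termini, and counting the `oligo(A)` calls would give the fraction of tailed molecules.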
Costaglioli, Patricia; Barthe, Christophe; Claverol, Stephane; Brözel, Volker S; Perrot, Michel; Crouzet, Marc; Bonneu, Marc; Garbay, Bertrand; Vilain, Sebastien
2012-01-01
Bacterial biofilms are complex cell communities found attached to surfaces and surrounded by an extracellular matrix composed of exopolysaccharides, DNA, and proteins. We investigated the whole-genome expression profile of Pseudomonas aeruginosa sessile cells (SCs) present in biofilms developed on a glass wool substratum. The transcriptome and proteome of SCs were compared with those of planktonic cell cultures. Principal component analysis revealed a biofilm-specific gene expression profile. Our study highlighted the overexpression of genes controlling the anthranilate degradation pathway in the SCs grown on glass wool for 24 h. In this condition, the metabolic pathway that uses anthranilate for Pseudomonas quinolone signal production was not activated, which suggested that anthranilate was primarily being consumed for energy metabolism. Transposon mutants defective for anthranilate degradation were analyzed in a simple assay of biofilm formation. The phenotypic analyses confirmed that P. aeruginosa biofilm formation partially depended on the activity of the anthranilate degradation pathway. This work points to a new feature concerning anthranilate metabolism in P. aeruginosa SCs. PMID:23170231
Functional proteomics outlines the complexity of breast cancer molecular subtypes.
Gámez-Pozo, Angelo; Trilla-Fuertes, Lucía; Berges-Soria, Julia; Selevsek, Nathalie; López-Vacas, Rocío; Díaz-Almirón, Mariana; Nanni, Paolo; Arevalillo, Jorge M; Navarro, Hilario; Grossmann, Jonas; Gayá Moreno, Francisco; Gómez Rioja, Rubén; Prado-Vázquez, Guillermo; Zapater-Moros, Andrea; Main, Paloma; Feliú, Jaime; Martínez Del Prado, Purificación; Zamora, Pilar; Ciruelos, Eva; Espinosa, Enrique; Fresno Vara, Juan Ángel
2017-08-30
Breast cancer is a heterogeneous disease comprising a variety of entities with various genetic backgrounds. Estrogen receptor-positive, human epidermal growth factor receptor 2-negative tumors typically have a favorable outcome; however, some patients eventually relapse, which suggests some heterogeneity within this category. In the present study, we used proteomics and miRNA profiling techniques to characterize a set of 102 formalin-fixed, paraffin-embedded breast tumors that were either estrogen receptor-positive (ER+)/progesterone receptor-positive (PR+) or triple-negative. Protein expression-based probabilistic graphical models and flux balance analyses revealed that some ER+/PR+ samples had a protein expression profile similar to that of triple-negative samples, and a clinical outcome similar to that of patients with triple-negative disease. This probabilistic graphical model-based classification had prognostic value in patients with luminal A breast cancer. This prognostic information was independent of that provided by standard genomic tests for breast cancer, such as MammaPrint, OncoType Dx and the 8-gene Score.
2013-01-01
Background Homosporous ferns are distinctive amongst the land plant lineages for their high chromosome numbers and enigmatic genomes. Genome size measurements are an underexploited tool in homosporous ferns and show great potential to provide an overview of the mechanisms that define genome evolution in these ferns. The aim of this study is to investigate the evolution of genome size and the relationship between genome size and spore size within the apomictic Asplenium monanthes fern complex and related lineages. Results Comparative analyses to test for a relationship between spore size and genome size show that they are not correlated. The data do, however, provide evidence for marked genome size variation between species in this group. These results indicate that Asplenium monanthes has undergone a two-fold expansion in genome size. Conclusions Our findings challenge the widely held assumption that spore size can be used to infer ploidy levels within apomictic fern complexes. We argue that the observed genome size variation is likely to have arisen via increases in both chromosome number due to polyploidy and chromosome size due to amplification of repetitive DNA (e.g. transposable elements, especially retrotransposons). However, to date the latter has not been considered to be an important process of genome evolution within homosporous ferns. We infer that genome evolution, at least in some homosporous fern lineages, is a more dynamic process than existing studies would suggest. PMID:24354467
Synthetic Genetic Arrays: Automation of Yeast Genetics.
Kuzmin, Elena; Costanzo, Michael; Andrews, Brenda; Boone, Charles
2016-04-01
Genome-sequencing efforts have led to great strides in the annotation of protein-coding genes and other genomic elements. The current challenge is to understand the functional role of each gene and how genes work together to modulate cellular processes. Genetic interactions define phenotypic relationships between genes and reveal the functional organization of a cell. Synthetic genetic array (SGA) methodology automates yeast genetics and enables large-scale and systematic mapping of genetic interaction networks in the budding yeast, Saccharomyces cerevisiae. SGA facilitates construction of an output array of double mutants from an input array of single mutants through a series of replica pinning steps. Subsequent analysis of genetic interactions from SGA-derived mutants relies on accurate quantification of colony size, which serves as a proxy for fitness. Since its development, SGA has given rise to a variety of other experimental approaches for functional profiling of the yeast genome and has been applied in a multitude of other contexts, such as genome-wide screens for synthetic dosage lethality and integration with high-content screening for systematic assessment of morphology defects. SGA-like strategies can also be implemented similarly in a number of other cell types and organisms, including Schizosaccharomyces pombe, Escherichia coli, Caenorhabditis elegans, and human cancer cell lines. The genetic networks emerging from these studies not only generate functional wiring diagrams but may also play a key role in our understanding of the complex relationship between genotype and phenotype. © 2016 Cold Spring Harbor Laboratory Press.
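Colony size, used as a fitness proxy, is typically converted into a genetic interaction score by comparing the observed fitness of the double mutant with an expectation derived from the two single mutants. The sketch below uses the multiplicative model, epsilon = W_ab - W_a * W_b, which is widely applied to SGA data. Several details are illustrative assumptions rather than anything stated in this abstract: normalization to a single wild-type control, the function names, the colony sizes and the 0.08 cut-off. Real SGA scoring also corrects for plate position, batch and other systematic effects.

```python
def relative_fitness(colony_size: float, wild_type_size: float) -> float:
    """Fitness proxy: colony size normalised to a wild-type control colony."""
    if wild_type_size <= 0:
        raise ValueError("wild-type colony size must be positive")
    return colony_size / wild_type_size


def interaction_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Multiplicative model: epsilon = W_ab - W_a * W_b."""
    return w_ab - w_a * w_b


def classify_interaction(epsilon: float, cutoff: float = 0.08) -> str:
    """Label an interaction; the cutoff value is an illustrative assumption."""
    if epsilon < -cutoff:
        return "negative (aggravating, e.g. synthetic sick/lethal)"
    if epsilon > cutoff:
        return "positive (alleviating)"
    return "neutral"


wt = 500.0                          # hypothetical wild-type colony area
w_a = relative_fitness(450.0, wt)   # 0.9
w_b = relative_fitness(400.0, wt)   # 0.8
w_ab = relative_fitness(150.0, wt)  # 0.3, well below the expected 0.72
eps = interaction_score(w_a, w_b, w_ab)
print(round(eps, 2), classify_interaction(eps))
```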
Beyond endometriosis GWAS: from Genomics to Phenomics to the Patient
Zondervan, Krina T.; Rahmioglu, Nilufer; Morris, Andrew P.; Nyholt, Dale R.; Montgomery, Grant W.; Becker, Christian M.; Missmer, Stacey A.
2017-01-01
Endometriosis is a heritable, complex chronic inflammatory disease, for which much of the causal pathogenic mechanism remains unknown. Genome-wide association studies (GWAS) to date have identified 12 single nucleotide polymorphisms (SNPs) at 10 independent genetic loci associated with endometriosis. Most of these were more strongly associated with rAFS stage III/IV than with stage I/II. The loci are almost all located in intergenic regions that are known to play a role in the regulation of expression of target genes yet to be identified. To identify the target genes and pathways perturbed by the implicated variants, studies are required involving functional genomic annotation of the surrounding chromosomal regions in terms of transcription factor binding sites and epigenetic modification sites (e.g. DNA methylation and histone modification), as well as their correlation with RNA transcription. These studies need to be conducted in tissue types relevant to endometriosis – in particular endometrium. In addition, to allow biologically and clinically relevant interpretation of molecular profiling data, they need to be combined and correlated with detailed, systematically collected phenotypic information (surgical and clinical). The WERF Endometriosis Phenome and Biobanking Harmonization project (EPHect) is a global standardisation initiative that has produced consensus data and sample collection protocols for endometriosis research. These now pave the way for collaborative studies integrating phenomic with genomic data, to identify informative subtypes of endometriosis that will enhance understanding of the pathogenic mechanisms of the disease and discovery of novel, targeted treatments. PMID:27513026
Perrino, Cinzia; Barabási, Albert-Laszló; Condorelli, Gianluigi; Davidson, Sean Michael; De Windt, Leon; Dimmeler, Stefanie; Engel, Felix Benedikt; Hausenloy, Derek John; Hill, Joseph Addison; Van Laake, Linda Wilhelmina; Lecour, Sandrine; Leor, Jonathan; Madonna, Rosalinda; Mayr, Manuel; Prunier, Fabrice; Sluijter, Joost Petrus Geradus; Schulz, Rainer; Thum, Thomas; Ytrehus, Kirsti
2017-01-01
Despite advances in myocardial reperfusion therapies, acute myocardial ischaemia/reperfusion injury and consequent ischaemic heart failure represent the number one cause of morbidity and mortality in industrialized societies. Although different therapeutic interventions have been shown beneficial in preclinical settings, an effective cardioprotective or regenerative therapy has yet to be successfully introduced in the clinical arena. Given the complex pathophysiology of the ischaemic heart, large scale, unbiased, global approaches capable of identifying multiple branches of the signalling networks activated in the ischaemic/reperfused heart might be more successful in the search for novel diagnostic or therapeutic targets. High-throughput techniques allow high-resolution, genome-wide investigation of genetic variants, epigenetic modifications, and associated gene expression profiles. Platforms such as proteomics and metabolomics (not described here in detail) also offer simultaneous readouts of hundreds of proteins and metabolites. Isolated omics analyses usually provide Big Data requiring large data storage, advanced computational resources and complex bioinformatics tools. The possibility of integrating different omics approaches gives new hope to better understand the molecular circuitry activated by myocardial ischaemia, putting it in the context of the human ‘diseasome’. Since modifications of cardiac gene expression have been consistently linked to pathophysiology of the ischaemic heart, the integration of epigenomic and transcriptomic data seems a promising approach to identify crucial disease networks. Thus, the scope of this Position Paper will be to highlight potentials and limitations of these approaches, and to provide recommendations to optimize the search for novel diagnostic or therapeutic targets for acute ischaemia/reperfusion injury and ischaemic heart failure in the post-genomic era. PMID:28460026
Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antczak, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J.; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco
2016-01-01
The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact on the biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in-depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and is predictive of patient survival. 
Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication networks in a wide spectrum of biological systems. PMID:27124473
Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antczak, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J; Guindani, Michele; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco
2016-04-01
The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact on the biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in-depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and is predictive of patient survival. 
Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication networks in a wide spectrum of biological systems.
Modularity and evolutionary constraints in a baculovirus gene regulatory network
2013-01-01
Background The structure of regulatory networks remains an open question in our understanding of complex biological systems. Interactions during complete viral life cycles present unique opportunities to understand how host-parasite networks take shape and behave. The Anticarsia gemmatalis multiple nucleopolyhedrovirus (AgMNPV) is a large double-stranded DNA virus whose genome may encode 152 open reading frames (ORFs). Here we present the analysis of the ordered cascade of AgMNPV gene expression. Results We observed an earlier onset of expression than previously reported for other baculoviruses, especially for genes involved in DNA replication. Most ORFs were expressed at higher levels in a more permissive host cell line. Genes with more than one copy in the genome had distinct expression profiles, which could indicate the acquisition of new functionalities. The transcription gene regulatory network (GRN) for 149 ORFs had a modular topology comprising five communities of highly interconnected nodes that separated functionally related key genes into different communities, possibly maximizing redundancy and GRN robustness by compartmentalization of important functions. Core conserved functions showed expression synchronicity, distinct GRN features and significantly less genetic diversity, consistent with evolutionary constraints imposed on key elements of biological systems. This reduced genetic diversity also had a positive correlation with the importance of the gene in our estimated GRN, supporting a relationship between phylogenetic data of baculovirus genes and network features inferred from expression data. We also observed that gene arrangement in overlapping transcripts was conserved among related baculoviruses, suggesting a principle of genome organization. 
Conclusions Despite its reduced number of nodes (149), the AgMNPV GRN had a topology and key characteristics similar to those observed in complex cellular organisms, which indicates that modularity may be a general feature of biological gene regulatory networks. PMID:24006890
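The abstract describes a modular GRN made up of five communities. Newman's modularity Q is a standard score for how strongly a network splits into such communities. It is the fraction of edges that fall inside communities minus the fraction expected at random given the node degrees. The sketch below only scores a partition that is already given; it does not detect communities. It treats the network as undirected and unweighted with no self-loops, and it uses a toy network. It is not claimed to be the method used in the study.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c/m - (d_c/(2m))^2].

    edges: list of undirected (u, v) pairs, each listed once, no self-loops.
    community: dict mapping every node to a community label.
    L_c = edges inside community c; d_c = total degree of its nodes."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy network: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {n: "A" for n in range(6)}
print(modularity(edges, two_modules))  # 5/14, about 0.357
print(modularity(edges, one_module))   # 0.0
```

Community-detection algorithms such as greedy modularity maximization search over partitions to make Q as large as possible. A clearly positive Q indicates more within-module wiring than a degree-matched random network would produce.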
Targeted and genome-scale methylomics reveals gene body signatures in human cell lines
Ball, Madeleine Price; Li, Jin Billy; Gao, Yuan; Lee, Je-Hyuk; LeProust, Emily; Park, In-Hyun; Xie, Bin; Daley, George Q.; Church, George M.
2012-01-01
Cytosine methylation, an epigenetic modification of DNA, is a target of growing interest for developing high throughput profiling technologies. Here we introduce two new, complementary techniques for cytosine methylation profiling utilizing next generation sequencing technology: bisulfite padlock probes (BSPPs) and methyl sensitive cut counting (MSCC). In the first method, we designed a set of ~10,000 BSPPs distributed over the ENCODE pilot project regions to take advantage of existing expression and chromatin immunoprecipitation data. We observed a pattern of low promoter methylation coupled with high gene body methylation in highly expressed genes. Using the second method, MSCC, we gathered genome-scale data for 1.4 million HpaII sites and confirmed that gene body methylation in highly expressed genes is a consistent phenomenon over the entire genome. Our observations highlight the usefulness of techniques which are not inherently or intentionally biased in favor of only profiling particular subsets like CpG islands or promoter regions. PMID:19329998
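MSCC relies on HpaII, a restriction enzyme that cuts its recognition site CCGG only when the internal CpG is unmethylated. Sequence tags generated at a site therefore indicate that the site was at least partly unmethylated in the sample. The sketch below locates HpaII sites and applies a deliberately crude call based on per-site tag counts. The function names, the input format (tag counts keyed by site position) and the `min_tags` threshold are hypothetical. A real analysis would model sequencing depth rather than use a hard cut-off.

```python
import re

HPAII_SITE = "CCGG"  # HpaII recognition sequence; the internal CpG is the target


def hpaii_sites(seq: str):
    """Return 0-based start positions of every CCGG site.

    CCGG cannot overlap a shifted copy of itself, so non-overlapping
    regex matching finds every site."""
    return [m.start() for m in re.finditer(HPAII_SITE, seq.upper())]


def call_sites(sites, tag_counts, min_tags: int = 1):
    """Crude per-site call: sites with at least min_tags cut-derived tags are
    'unmethylated'; sites without are methylated or simply not sampled."""
    return {
        s: "unmethylated" if tag_counts.get(s, 0) >= min_tags
        else "methylated_or_unsampled"
        for s in sites
    }


seq = "aaCCGGttCCGGa"
sites = hpaii_sites(seq)
print(sites)
print(call_sites(sites, {2: 5}))
```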
Decoherence in yeast cell populations and its implications for genome-wide expression noise.
Briones, M R S; Bosco, F
2009-01-20
Gene expression "noise" is commonly defined as the stochastic variation of gene expression levels in different cells of the same population under identical growth conditions. Here, we tested whether this "noise" is amplified with time, as a consequence of decoherence in global gene expression profiles (genome-wide microarrays) of synchronized cells. The stochastic component of transcription causes fluctuations that tend to be amplified as time progresses, leading to a decay of correlations of expression profiles, in perfect analogy with elementary relaxation processes. Measuring decoherence, defined here as a decay in the auto-correlation function of yeast genome-wide expression profiles, we found a slowdown in the decay of correlations, opposite to what would be expected if, as in mixing systems, correlations decay exponentially as the equilibrium state is reached. Our results indicate that the populational variation in gene expression (noise) is a consequence of temporal decoherence, in which the slow decay of correlations is a signature of strong interdependence of the transcription dynamics of different genes.
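The decoherence measure described above can be illustrated as the correlation between the genome-wide expression profile at the first time point and the profile at each later time point. This minimal sketch uses Pearson correlation and toy data, and its function names are hypothetical. It assumes one expression vector per time point, with genes in the same order, and equally spaced sampling. The authors' exact estimator may differ. For a purely exponential decay, C(t) = exp(-t/tau), the ratio of successive correlations is constant. Ratios that climb toward 1 therefore indicate the kind of slowdown the abstract reports.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def autocorrelation_curve(profiles):
    """profiles[t] is the expression vector at time point t (same gene order)."""
    reference = profiles[0]
    return [pearson(reference, p) for p in profiles]


def successive_ratios(curve):
    """C(t+1)/C(t) for equally spaced time points; constant for exponential
    decay. Only meaningful while the correlations stay positive."""
    return [curve[i + 1] / curve[i] for i in range(len(curve) - 1)]


profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 1.8, 3.1, 3.9],
    [2.0, 1.5, 3.5, 3.0],
]
curve = autocorrelation_curve(profiles)
print(curve)
print(successive_ratios(curve))
```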
Darwinian evolution in the light of genomics
Koonin, Eugene V.
2009-01-01
Comparative genomics and systems biology offer unprecedented opportunities for testing central tenets of evolutionary biology formulated by Darwin in the Origin of Species in 1859 and expanded in the Modern Synthesis 100 years later. Evolutionary-genomic studies show that natural selection is only one of the forces that shape genome evolution and is not quantitatively dominant, whereas non-adaptive processes are much more prominent than previously suspected. Major contributions of horizontal gene transfer and diverse selfish genetic elements to genome evolution undermine the Tree of Life concept. An adequate depiction of evolution requires the more complex concept of a network or ‘forest’ of life. There is no consistent tendency of evolution towards increased genomic complexity, and when complexity increases, this appears to be a non-adaptive consequence of evolution under weak purifying selection rather than an adaptation. Several universals of genome evolution were discovered including the invariant distributions of evolutionary rates among orthologous genes from diverse genomes and of paralogous gene family sizes, and the negative correlation between gene expression level and sequence evolution rate. Simple, non-adaptive models of evolution explain some of these universals, suggesting that a new synthesis of evolutionary biology might become feasible in a not so remote future. PMID:19213802
Comparative genomic analysis of Mycobacterium tuberculosis clinical isolates.
Liu, Fei; Hu, Yongfei; Wang, Qi; Li, Hong Min; Gao, George F; Liu, Cui Hua; Zhu, Baoli
2014-06-13
Due to excessive antibiotic use, drug-resistant Mycobacterium tuberculosis has become a serious public health threat and a major obstacle to disease control in many countries. To better understand the evolution of drug-resistant M. tuberculosis strains, we performed whole genome sequencing for 7 M. tuberculosis clinical isolates with different antibiotic resistance profiles and conducted comparative genomic analysis of gene variations among them. We observed that all 7 M. tuberculosis clinical isolates with different levels of drug resistance harbored similar numbers of SNPs, ranging from 1,409 to 1,464. The numbers of insertions/deletions (Indels) identified in the 7 isolates were also similar, ranging from 56 to 101. A total of 39 types of mutations were identified in drug resistance-associated loci, including 14 previously reported ones and 25 newly identified ones. Sixteen of the identified large Indels spanned PE-PPE-PGRS genes, which represent a major source of antigenic variability. Aside from SNPs and Indels, a CRISPR locus with varied spacers was observed in all 7 clinical isolates, suggesting that it might play an important role in the plasticity of the M. tuberculosis genome. The nucleotide diversity (π value) and selection intensity (dN/dS value) of the whole genome sequences of the 7 isolates were similar. The dN/dS values were less than 1 for all 7 isolates (range 0.608885 to 0.637365), supporting the notion that M. tuberculosis genomes undergo purifying selection. The π values and dN/dS values were comparable between drug-susceptible and drug-resistant strains. In this study, we show that clinical M. tuberculosis isolates exhibit distinct variations in terms of the distribution of SNPs, Indels and the CRISPR-cas locus, as well as nucleotide diversity and selection intensity, but there are no generalizable differences between drug-susceptible and drug-resistant isolates on the genomic scale. 
Our study provides evidence strengthening the notion that the evolution of drug resistance among clinical M. tuberculosis isolates is clearly a complex and diversified process.
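Nucleotide diversity π, reported in this study alongside dN/dS, is the average number of nucleotide differences per site between pairs of sequences. The sketch below computes π for a toy alignment and applies the conventional reading of dN/dS: below 1 suggests purifying selection and above 1 suggests positive selection. It assumes aligned sequences of equal length with no gaps. The function names and the data are hypothetical. Computing dN/dS itself requires counting synonymous and non-synonymous sites, which is not shown here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site over all sequence pairs.

    Assumes equal-length, gap-free aligned sequences."""
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need >= 2 aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * length)


def selection_regime(dn_ds: float) -> str:
    """Conventional interpretation of a dN/dS ratio."""
    if dn_ds < 1:
        return "purifying"
    if dn_ds > 1:
        return "positive"
    return "neutral"


alignment = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(alignment))  # 4 differences / (3 pairs * 4 sites)
print(selection_regime(0.608885))       # lower bound of the reported dN/dS range
```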
Jaiswal, Sarika; Sheoran, Sonia; Arora, Vasu; Angadi, Ulavappa B; Iquebal, Mir A; Raghav, Nishu; Aneja, Bharti; Kumar, Deepender; Singh, Rajender; Sharma, Pradeep; Singh, G P; Rai, Anil; Tiwari, Ratan; Kumar, Dinesh
2017-01-01
Wheat fulfills 20% of the global caloric requirement. The world will need 60% more wheat to feed a population of 9 billion by 2050, but climate change with increasing temperature is projected to affect wheat productivity adversely. Trait improvement and management of wheat germplasm require genomic resources. Simple Sequence Repeats (SSRs), being highly polymorphic and ubiquitously distributed in the genome, can be markers of choice, but there is no structured marker database with options to generate primer pairs for genotyping at a desired chromosome/physical location. Markers previously associated with different wheat traits are also not available in any database. Limitations of in vitro SSR discovery can be overcome by genome-wide in silico mining of SSRs. Triticum aestivum SSR database (TaSSRDb) is an integrated online database with three-tier architecture, developed using PHP and MySQL and accessible at http://webtom.cabgrid.res.in/wheatssr/. For genotyping, standalone Primer3 code computes primers on user request. Chromosome-wise SSR calling for all three subgenomes, along with a choice of motif types, is provided in addition to primer generation for a desired marker. We report here a database with the highest number of SSRs (476,169) from the complex, hexaploid wheat genome (~17 Gb), along with 268 previously reported SSR markers associated with 11 traits. The highest (116.93 SSRs/Mb) and lowest (74.57 SSRs/Mb) SSR densities were found on chromosomes 2D and 3A, respectively. e-PCR was performed to obtain homozygous loci, and 30 such loci were randomly selected for PCR validation in a panel of 18 wheat Advance Varietal Trial (AVT) lines. TaSSRDb can be a valuable genomic resource for linkage mapping, gene/QTL (quantitative trait locus) discovery, diversity analysis, traceability and variety identification. 
Variety-specific profiling and differentiation can supplement DUS (Distinctiveness, Uniformity, and Stability) testing, EDV (Essentially Derived Variety)/IV (Initial Variety) disputes, and seed purity and hybrid wheat testing. All of these are required in germplasm management and in efforts to improve wheat productivity. PMID:29234333
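Genome-wide in silico SSR mining of the kind described for TaSSRDb can be sketched with regular expressions. For each motif length from 1 to 6 bp, the search looks for a motif followed by enough tandem copies of itself. The per-Mb density then follows directly from the count. The minimum repeat numbers below are illustrative assumptions; tools such as MISA use similar motif-length-specific thresholds. The function names are hypothetical, and this is not the TaSSRDb pipeline. The sketch also skips motifs that are themselves repeats of a shorter unit (e.g. AA, ATAT) and does not merge compound SSRs.

```python
import re

# Hypothetical minimum number of tandem copies per motif length (bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq: str):
    """Return (start, motif, copies) for perfect SSRs, sorted by position."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-bp motif followed by at least (min_rep - 1) more copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_rep - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def ssr_density_per_mb(n_ssrs: int, seq_len_bp: int) -> float:
    """SSRs per megabase, the unit used for the chromosome densities above."""
    return n_ssrs / (seq_len_bp / 1_000_000)


toy = "CC" + "A" * 12 + "CC" + "AG" * 7 + "CC" + "ATG" * 5 + "CC"
ssrs = find_ssrs(toy)
print(ssrs)
print(ssr_density_per_mb(len(ssrs), len(toy)))
```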
Jung, Sungwon
2018-04-20
Obesity and type 2 diabetes (T2D) are two major conditions that are related to metabolic disorders and affect a large population. Although there have been significant efforts to identify their therapeutic targets, few benefits have come from comprehensive molecular profiling. This limited availability of comprehensive molecular profiling of obesity and T2D may be due to multiple challenges, as these conditions involve multiple organs and collecting tissue samples from subjects is more difficult in obesity and T2D than in other diseases, where surgical treatments are popular choices. While there is no repository of comprehensive molecular profiling data for obesity and T2D, multiple existing data resources can be utilized to cover various aspects of these conditions. This review presents studies with available genomic data resources for obesity and T2D and discusses genome-wide association studies (GWAS), a knockout (KO)-based phenotyping study, and gene expression profiles. These studies, based on their assessed coverage and characteristics, can provide insights into how such data can be utilized to identify therapeutic targets for obesity and T2D.
Ensembl genomes 2016: more genomes, more complexity
USDA-ARS's Scientific Manuscript database
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent...
NASA Astrophysics Data System (ADS)
Meyer, Sam; Everaers, Ralf
2015-02-01
The histone-DNA interaction in the nucleosome is a fundamental mechanism of genomic compaction and regulation, which remains poorly understood despite increasing structural knowledge of the complex. In this paper, we propose a framework for the extraction of a nanoscale histone-DNA force-field from a collection of high-resolution structures, which may be adapted to a larger class of protein-DNA complexes. We applied the procedure to a large crystallographic database extended by snapshots from molecular dynamics simulations. The comparison of the structural models first shows that, at histone-DNA contact sites, the DNA base-pairs are shifted outwards locally, consistent with locally repulsive forces exerted by the histones. The second step shows that the various force profiles of the structures under analysis derive locally from a unique, sequence-independent, quadratic repulsive force-field, while the sequence preferences are entirely due to internal DNA mechanics. We have thus obtained the first knowledge-derived nanoscale interaction potential for histone-DNA in the nucleosome. The conformations obtained by relaxation of nucleosomal DNA with high-affinity sequences in this potential accurately reproduce the experimental values of binding preferences. Finally, we address the more generic binding mechanisms relevant to the 80% of genomic sequences that are incorporated in nucleosomes by computing the conformation of nucleosomal DNA with sequence-averaged properties. This conformation differs from those found in crystals, and the analysis suggests that repulsive histone forces are related to local stretch tension in nucleosomal DNA, mostly between adjacent contact points. This tension could play a role in the stability of the complex.
Jenkins, Adam M; Waterhouse, Robert M; Muskavitch, Marc A T
2015-04-23
Long non-coding RNAs (lncRNAs) have been defined as mRNA-like transcripts longer than 200 nucleotides that lack significant protein-coding potential, and many of them constitute scaffolds for ribonucleoprotein complexes with critical roles in epigenetic regulation. Various lncRNAs have been implicated in the modulation of chromatin structure, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. The purpose of this study is to identify the lncRNA landscape in the malaria vector Anopheles gambiae and assess the evolutionary conservation of lncRNAs and their secondary structures across the Anopheles genus. Using deep RNA sequencing of multiple An. gambiae life stages, we have identified 2,949 lncRNAs and more than 300 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and between adult sexes. We find that across the genus Anopheles, lncRNAs display much lower sequence conservation than protein-coding genes. Additionally, we find that lncRNA secondary structure is highly conserved within the Gambiae complex but diverges rapidly across the rest of the genus. This study offers one of the first lncRNA secondary structure analyses in vector insects. Our description of lncRNAs in An. gambiae offers the most comprehensive genome-wide insights to date into lncRNAs in this vector mosquito, and defines a set of potential targets for the development of vector-based interventions that may further curb the human malaria burden in disease-endemic countries.
Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H.; Lau, Ching C.; Behl, Sanjiv; Man, Tsz-Kwong
2007-01-01
With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License. PMID:19936083
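The core matching step described above, assigning each aCGH BAC clone to the genes whose annotated coordinates it overlaps, amounts to an interval-overlap join. The sketch below illustrates that join using Python's built-in sqlite3 module as a stand-in for CGI's MySQL backend. The table names, column names, clone IDs, and coordinates are hypothetical and are not taken from CGI or from the UCSC annotation schema.

```python
import sqlite3

def match_clones_to_genes(clones, genes):
    """Return (clone_id, gene_symbol) pairs whose genomic intervals overlap.

    clones: iterable of (clone_id, chrom, pos_start, pos_end)
    genes:  iterable of (gene_symbol, chrom, pos_start, pos_end)
    Intervals are treated as half-open [pos_start, pos_end).
    Table and column names are hypothetical, chosen for this illustration.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clone (id TEXT, chrom TEXT, pos_start INTEGER, pos_end INTEGER)")
    con.execute("CREATE TABLE gene (symbol TEXT, chrom TEXT, pos_start INTEGER, pos_end INTEGER)")
    con.executemany("INSERT INTO clone VALUES (?, ?, ?, ?)", clones)
    con.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", genes)
    # An index on (chrom, pos_start) helps the database narrow candidate genes.
    con.execute("CREATE INDEX gene_pos ON gene (chrom, pos_start)")
    rows = con.execute(
        """
        SELECT c.id, g.symbol
        FROM clone AS c
        JOIN gene AS g
          ON g.chrom = c.chrom
         AND g.pos_start < c.pos_end   -- gene begins before the clone ends
         AND g.pos_end > c.pos_start   -- gene ends after the clone begins
        ORDER BY c.id, g.symbol
        """
    ).fetchall()
    con.close()
    return rows
```

For example, a clone spanning chr1:100-500 matches a gene at chr1:400-600 but not one that starts exactly at position 500, because the intervals are half-open. A production tool would also need to handle coordinate conventions and genome build versions consistently.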
Przytycki, Pawel F; Singh, Mona
2017-08-25
A major aim of cancer genomics is to pinpoint which somatically mutated genes are involved in tumor initiation and progression. We introduce a new framework for uncovering cancer genes, differential mutation analysis, which compares the mutational profiles of genes across cancer genomes with their natural germline variation across healthy individuals. We present DiffMut, a fast and simple approach for differential mutational analysis, and demonstrate that it is more effective in discovering cancer genes than considerably more sophisticated approaches. We conclude that germline variation across healthy human genomes provides a powerful means for characterizing somatic mutation frequency and identifying cancer driver genes. DiffMut is available at https://github.com/Singh-Lab/Differential-Mutation-Analysis .
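To make the idea of differential mutation analysis concrete, the sketch below compares how highly a gene ranks by somatic mutation count within each tumor with how highly it ranks by germline variant count within each healthy genome. It assumes per-individual rank normalization and a simple difference of mean normalized scores. This is a simplified illustration of the concept only, not the scoring statistic implemented in DiffMut, and the gene names and counts are invented.

```python
from statistics import mean

def rank_normalize(counts):
    """Map each gene's count in one individual to [0, 1] by rank.

    Tied counts share their average rank; a single gene gets 0.5.
    """
    genes = sorted(counts, key=counts.get)
    n = len(genes)
    if n == 1:
        return {genes[0]: 0.5}
    scores = {}
    i = 0
    while i < n:
        j = i
        # Extend j across a run of genes with equal counts (ties).
        while j + 1 < n and counts[genes[j + 1]] == counts[genes[i]]:
            j += 1
        avg_rank = (i + j) / 2  # 0-based average rank of the tied run
        for k in range(i, j + 1):
            scores[genes[k]] = avg_rank / (n - 1)
        i = j + 1
    return scores

def differential_scores(tumors, healthy):
    """Mean normalized somatic score minus mean normalized germline score, per gene.

    tumors, healthy: lists of {gene: count} dicts sharing the same gene keys.
    Positive values flag genes that are mutated more often in tumors than
    their natural germline variability would suggest (simplified heuristic).
    """
    tumor_norm = [rank_normalize(c) for c in tumors]
    healthy_norm = [rank_normalize(c) for c in healthy]
    return {
        g: mean(s[g] for s in tumor_norm) - mean(s[g] for s in healthy_norm)
        for g in tumors[0]
    }
```

Rank normalization within each individual is one way to make counts comparable across genomes with very different overall mutation burdens; it is an assumption of this sketch.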
Single-cell transcriptional analysis of taste sensory neuron pair in Caenorhabditis elegans.
Takayama, Jun; Faumont, Serge; Kunitomo, Hirofumi; Lockery, Shawn R; Iino, Yuichi
2010-01-01
The nervous system is composed of a wide variety of neurons. A description of the transcriptional profile of each neuron would yield a wealth of information about the molecular mechanisms that define its morphological or functional characteristics. Here we show that RNA isolation from single neurons is feasible by using an optimized mRNA tagging method. This method extracts transcripts from the target cells by co-immunoprecipitating complexes of RNA and an epitope-tagged poly(A) binding protein expressed specifically in those cells. With this method and a genome-wide microarray, we compared the transcriptional profiles of two functionally different neurons in the main C. elegans gustatory neuron class ASE. Eight of the 13 known subtype-specific genes were successfully detected. Additionally, we identified nine novel genes, including a receptor guanylyl cyclase, secreted proteins, a TRPC channel, and uncharacterized genes conserved among nematodes, suggesting that the two neurons differ more substantially than previously thought. The expression of these novel genes was controlled by the previously known regulatory network for subtype differentiation. We also describe a unique motif organization within individual gene groups classified by their expression patterns in ASE. Our study paves the way toward a complete catalog of the expression profiles of individual C. elegans neurons.
Genomic Prediction of Testcross Performance in Canola (Brassica napus)
Jan, Habib U.; Abbadi, Amine; Lücke, Sophie; Nichols, Richard A.; Snowdon, Rod J.
2016-01-01
Genomic selection (GS) is a modern breeding approach in which genome-wide single-nucleotide polymorphism (SNP) marker profiles are used simultaneously to estimate the performance of untested genotypes. In this study, we evaluated the potential of genomic selection methods to predict testcross performance for various agronomic traits in hybrid canola breeding, based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits, including seedling emergence, days to flowering, lodging, oil yield, and seed yield, along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population and within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81), followed by oil yield (0.75), and lowest for seedling emergence (0.29). For seed yield, seed glucosinolate content, lodging resistance, and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39, and 0.56, respectively. Treating subpopulations separately increased prediction accuracy for some traits, although this strategy led to only moderate improvements for some traits with low heritability, such as seedling emergence. No useful or consistent increase in accuracy was obtained by including a population substructure covariate in the model. Testcross performance prediction using genome-wide SNP markers shows considerable potential for pre-selection of promising hybrid combinations prior to resource-intensive field testing over multiple locations and years. PMID:26824924
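RR-BLUP shrinks all SNP effects equally toward zero, which is equivalent to ridge regression of phenotypes on the marker matrix, and prediction accuracy is often reported as the correlation between predicted and observed values in held-out lines. The sketch below shows that workflow on simulated data with NumPy. The shrinkage value `lam`, the 5-fold split, the pooled correlation, and the `simulate_population` helper are illustrative assumptions; they do not reproduce the study's 500 cross-validation runs or its model settings.

```python
import numpy as np

def rr_blup_fit(X, y, lam):
    """Ridge-regression (RR-BLUP-style) estimate of marker effects.

    X: (n_lines, n_snps) marker matrix, e.g. coded -1/0/1.
    y: (n_lines,) phenotypes.
    lam: shrinkage; under RR-BLUP it equals residual variance / marker-effect variance.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # center markers on the training lines
    y_mean = y.mean()
    p = X.shape[1]
    # Solve (Xc'Xc + lam*I) beta = Xc'(y - mean(y)) for the marker effects.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    return x_mean, y_mean, beta

def rr_blup_predict(model, X):
    """Predict genomic values for new lines, reusing the training-set centering."""
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def cv_accuracy(X, y, lam, k=5, seed=0):
    """k-fold cross-validated accuracy: Pearson r of pooled predictions vs phenotypes."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = rr_blup_fit(X[train], y[train], lam)
        preds[test_idx] = rr_blup_predict(model, X[test_idx])
    return float(np.corrcoef(preds, y)[0, 1])

def simulate_population(n_lines=300, n_snps=100, effect_sd=0.3, noise_sd=1.5, seed=1):
    """Hypothetical simulated genotypes and phenotypes, for illustration only."""
    rng = np.random.default_rng(seed)
    X = rng.integers(-1, 2, size=(n_lines, n_snps)).astype(float)
    effects = rng.normal(0.0, effect_sd, n_snps)
    y = X @ effects + rng.normal(0.0, noise_sd, n_lines)
    return X, y

if __name__ == "__main__":
    X, y = simulate_population()
    print(f"5-fold CV accuracy: {cv_accuracy(X, y, lam=25.0):.2f}")
```

Here `lam=25.0` is simply the simulated noise variance (1.5²) divided by the simulated effect variance (0.3²); with real data this ratio would have to be estimated, for example by REML.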
Sexton-Oates, Alexandra; Carmody, Jake; Ekinci, Elif I.; Dwyer, Karen M.; Saffery, Richard
2018-01-01
Aim: To characterise the genomic DNA (gDNA) yield from urine and the quality of methylation data generated from it on the widely used Illumina Infinium MethylationEPIC (HM850K) platform, and to compare these with buffy coat samples. Background: DNA methylation is the most widely studied epigenetic mark, and variations in DNA methylation profile have been implicated in diabetes, which affects approximately 415 million people worldwide. Methods: The QIAamp Viral RNA Mini Kit and QIAamp DNA Micro Kit were used to extract DNA from frozen and fresh urine samples, as well as from increasing volumes of fresh urine. Buffy coats matched to the frozen urine samples were also obtained, and DNA was extracted from them using the QIAamp DNA Mini Kit. Genomic DNA samples with a concentration greater than 20 μg/mL were used for methylation analysis on the HM850K array. Results: Irrespective of extraction technique or the use of fresh versus frozen urine samples, limited genomic DNA was obtained from a starting sample volume of 5 mL (0–0.86 μg/mL). To improve the yield, we increased the starting volume to 50 mL of fresh urine, which yielded only 0–9.66 μg/mL. A different kit, the QIAamp DNA Micro Kit, was trialled in six fresh and ten frozen urine samples, with inadequate DNA yields of 0–17.7 μg/mL and 0–1.6 μg/mL, respectively. Sufficient genomic DNA for DNA methylation profiling was obtained from only 4 of the initial 41 frozen urine samples (10%). In comparison, all four buffy coat samples (100%) provided sufficient genomic DNA. Conclusion: High-quality data can be obtained provided a sufficient yield of genomic DNA is isolated. Despite optimization of various extraction methods, the modest amount of genomic DNA derived from urine may limit the generalisability of this approach for identifying DNA methylation biomarkers of chronic diabetic kidney disease. PMID:29462136
Lecamwasam, Ashani; Sexton-Oates, Alexandra; Carmody, Jake; Ekinci, Elif I; Dwyer, Karen M; Saffery, Richard
2018-01-01
We aimed to characterise the genomic DNA (gDNA) yield from urine and the quality of methylation data generated from it on the widely used Illumina Infinium MethylationEPIC (HM850K) platform, and to compare these with buffy coat samples. DNA methylation is the most widely studied epigenetic mark, and variations in DNA methylation profile have been implicated in diabetes, which affects approximately 415 million people worldwide. The QIAamp Viral RNA Mini Kit and QIAamp DNA Micro Kit were used to extract DNA from frozen and fresh urine samples, as well as from increasing volumes of fresh urine. Buffy coats matched to the frozen urine samples were also obtained, and DNA was extracted from them using the QIAamp DNA Mini Kit. Genomic DNA samples with a concentration greater than 20 μg/mL were used for methylation analysis on the HM850K array. Irrespective of extraction technique or the use of fresh versus frozen urine samples, limited genomic DNA was obtained from a starting sample volume of 5 mL (0–0.86 μg/mL). To improve the yield, we increased the starting volume to 50 mL of fresh urine, which yielded only 0–9.66 μg/mL. A different kit, the QIAamp DNA Micro Kit, was trialled in six fresh and ten frozen urine samples, with inadequate DNA yields of 0–17.7 μg/mL and 0–1.6 μg/mL, respectively. Sufficient genomic DNA for DNA methylation profiling was obtained from only 4 of the initial 41 frozen urine samples (10%). In comparison, all four buffy coat samples (100%) provided sufficient genomic DNA. High-quality data can be obtained provided a sufficient yield of genomic DNA is isolated. Despite optimization of various extraction methods, the modest amount of genomic DNA derived from urine may limit the generalisability of this approach for identifying DNA methylation biomarkers of chronic diabetic kidney disease.
Cho, Yong-Joon; Yi, Hana; Chun, Jongsik; Cho, Sang-Nae; Daley, Charles L; Koh, Won-Jung; Shin, Sung Jae
2013-01-01
Members of the Mycobacterium abscessus complex are rapidly growing mycobacteria that are emerging as human pathogens. The M. abscessus complex was previously composed of three species, namely M. abscessus sensu stricto, 'M. massiliense', and 'M. bolletii'. In 2011, 'M. massiliense' and 'M. bolletii' were united and reclassified as a single subspecies within M. abscessus: M. abscessus subsp. bolletii. However, the placement of 'M. massiliense' within the boundary of M. abscessus subsp. bolletii remains highly controversial from a clinical perspective. In this study, we revisited the taxonomic status of members of the M. abscessus complex based on comparative analysis of the whole-genome sequences of 53 strains. The genome sequence of the previous type strain of 'Mycobacterium massiliense' (CIP 108297) was determined using next-generation sequencing. The genome tree based on average nucleotide identity (ANI) values supported the differentiation of 'M. bolletii' and 'M. massiliense' at the subspecies level. The genome tree also clearly illustrated that 'M. bolletii' and 'M. massiliense' form a distinct phylogenetic clade within the radiation of the M. abscessus complex. The genomic distances observed in this study suggest that the current M. abscessus subsp. bolletii taxon should be divided into two subspecies, M. abscessus subsp. massiliense subsp. nov. and M. abscessus subsp. bolletii, to accommodate the previously known 'M. massiliense' and 'M. bolletii' strains, respectively.
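ANI between two genomes is commonly computed by cutting one genome into fixed-length fragments, aligning each fragment against the other genome, and averaging the percent identity of the fragments whose alignments pass identity and coverage filters; 100 minus ANI can then serve as a distance for building a genome tree. The sketch below assumes the fragment alignments have already been produced by an external aligner. The `FragmentHit` record, the default filter thresholds, and the averaging of both comparison directions are illustrative assumptions, not the settings used in this study.

```python
from dataclasses import dataclass

@dataclass
class FragmentHit:
    """Best alignment of one query-genome fragment against the reference genome."""
    identity: float        # percent identity of the aligned region (0-100)
    aligned_length: int    # number of aligned bases
    fragment_length: int   # length of the query fragment

def average_nucleotide_identity(hits, min_identity=30.0, min_coverage=0.70):
    """Mean percent identity over fragments passing identity and coverage filters.

    The default thresholds are assumptions chosen for illustration.
    """
    kept = [
        h.identity
        for h in hits
        if h.identity >= min_identity
        and h.aligned_length / h.fragment_length >= min_coverage
    ]
    if not kept:
        raise ValueError("no fragment passed the identity/coverage filters")
    return sum(kept) / len(kept)

def ani_distance(ani_ab, ani_ba):
    """Symmetric distance for tree building: 100 minus the mean of both directions."""
    return 100.0 - (ani_ab + ani_ba) / 2
```

A matrix of such pairwise distances could then be passed to any standard distance-based tree method, such as neighbor joining or UPGMA.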
Phenotypic and Genomic Analysis of Hypervirulent Human-associated Bordetella bronchiseptica
2012-01-01
Background: B. bronchiseptica infections are usually associated with wild or domesticated animals and only infrequently with humans. A recent phylogenetic analysis distinguished two distinct B. bronchiseptica subpopulations, designated complexes I and IV. Complex IV isolates appear to have a bias for infecting humans; however, little is known regarding their epidemiology, virulence properties, or comparative genomics. Results: Here we report a characterization of the virulence of human-associated complex IV B. bronchiseptica strains. In in vitro cytotoxicity assays, complex IV strains showed increased cytotoxicity in comparison to a panel of complex I strains. Some complex IV isolates were remarkably cytotoxic, inducing LDH release in A549 cells at levels 10- to 20-fold greater than those induced by complex I strains. In vivo, a subset of complex IV strains was found to be hypervirulent, with an increased ability to cause lethal pulmonary infections in mice. Hypercytotoxicity in vitro and hypervirulence in vivo were both dependent on the activity of the bsc T3SS and the BteA effector. To clarify differences between lineages, representative complex IV isolates were sequenced and their genomes compared with those of complex I isolates. Although our analysis found no genomic sequences that could be considered unique to complex IV strains, several loci were found predominantly in complex IV isolates. Conclusion: Our observations reveal a T3SS-dependent hypervirulence phenotype in human-associated complex IV isolates, highlighting the need for further studies on the epidemiology and evolutionary dynamics of this B. bronchiseptica lineage. PMID:22863321
Biswas et al. describe an “exceptional responder” patient who survived with metastatic lung adenocarcinoma for 7 years while undergoing single or combination ERBB2-directed therapies. Whole-genome, whole-exome, and high-coverage Ion Torrent targeted sequencing were used to demonstrate extreme genomic heterogeneity between the lung and lymph node metastatic
Schwaenen, Carsten; Viardot, Andreas; Berger, Hilmar; Barth, Thomas F E; Bentink, Stefan; Döhner, Hartmut; Enz, Martina; Feller, Alfred C; Hansmann, Martin-Leo; Hummel, Michael; Kestler, Hans A; Klapper, Wolfram; Kreuz, Markus; Lenze, Dido; Loeffler, Markus; Möller, Peter; Müller-Hermelink, Hans-Konrad; Ott, German; Rosolowski, Maciej; Rosenwald, Andreas; Ruf, Sandra; Siebert, Reiner; Spang, Rainer; Stein, Harald; Truemper, Lorenz; Lichter, Peter; Bentz, Martin; Wessendorf, Swen
2009-01-01
Follicular lymphoma (FL) is characterized by a large number of chromosomal aberrations. However, their exact genomic extent and the target genes involved remain to be determined. For this purpose, we used array-based intermediate-high resolution genomic profiling in combination with Affymetrix gene expression analysis. Tumor specimens from 128 FL patients were analyzed for genomic aberrations, and the results were correlated with clinical data sets and mRNA expression levels. In 114 (89%) of the 128 analyzed cases, a total of 688 genomic aberrations (384 gains/amplifications and 304 losses) were detected. Frequent genomic aberrations were: -1p36 (18%), +2p15 (24%), -3q (14%), -6q (25%), +7p (19%), +7q (23%), +8q (14%), -9p (16%), -11q (15%), +12q (20%), -13q (11%), -17p (16%), +18p (18%), and +18q (28%). Critical segments of these imbalances were narrowed down to genomic fragments as small as 0.2 Mb. By comparing these segments with mRNA gene expression data, we identified putative candidate genes. Moreover, we detected deletions affecting the tumor suppressor gene CDKN2A/B on 9p21 in nontransformed FL grade I-II. For this aberration, as well as for -6q25 and -6q26, an association with inferior survival was observed.
GDA, a web-based tool for Genomics and Drugs integrated analysis.
Caroli, Jimmy; Sorrentino, Giovanni; Forcato, Mattia; Del Sal, Giannino; Bicciato, Silvio
2018-05-25
Several major genetic profiling and drug screening efforts in cancer cell lines have shown that integrating genomic portraits with compound activities is effective in discovering new genetic markers of drug sensitivity and clinically relevant anticancer compounds. Although most genetic and drug response data are publicly available, user-friendly tools for their integrative analysis remain scarce, hampering the effective exploitation of this information. Here, we present GDA, a web-based tool for Genomics and Drugs integrated Analysis that combines drug response data for >50,800 compounds with mutations and gene expression profiles across 73 cancer cell lines. Genomic and pharmacological data are integrated through a modular architecture that allows users to identify compounds active towards cancer cell lines bearing a specific genomic background and, conversely, the mutational or transcriptional status of cells that do or do not respond to a specific compound. Results are presented through intuitive graphical representations and supplemented with information obtained from public repositories. As both personalized targeted therapies and drug repurposing are gaining increasing attention, GDA represents a resource for formulating hypotheses on the interplay between genomic traits and drug response in cancer. GDA is freely available at http://gda.unimore.it/.
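One way to picture the query "which compounds are active towards cell lines bearing a specific genomic background" is to compare each compound's mean activity in lines with and without the feature of interest. The sketch below does exactly that and nothing more; it is not GDA's statistic, and the data layout, the "higher means more active" convention, and the compound and cell line names are hypothetical.

```python
from statistics import mean

def rank_compounds_by_genotype(activity, mutated_lines):
    """Rank compounds by how much more active they are in mutant cell lines.

    activity: {compound: {cell_line: activity_score}}, where a higher score
              means more active (an assumed convention, e.g. negated log IC50).
    mutated_lines: set of cell lines carrying the genomic feature of interest.
    Returns (compound, mean_mutant - mean_wildtype) pairs, most selective first.
    Compounds lacking data in either group are skipped.
    """
    ranked = []
    for compound, by_line in activity.items():
        mut = [v for line, v in by_line.items() if line in mutated_lines]
        wt = [v for line, v in by_line.items() if line not in mutated_lines]
        if mut and wt:
            ranked.append((compound, mean(mut) - mean(wt)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

A real analysis would add a significance test and multiple-testing correction on top of this effect size, since with tens of thousands of compounds many large differences arise by chance.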
Gene integrated set profile analysis: a context-based approach for inferring biological endpoints
Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M.; Pauly, Rini; Gutman, David A.; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S.; Rossi, Michael R.; Vertino, Paula M.; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H.
2016-01-01
The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers, or of samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types, for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation that the intersection becomes smaller as more data types are added. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis), for integrated genomic analysis, and its variation, SISPA (Sample Integrated Set Profile Analysis), for defining genes and samples, respectively, within the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710
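The shrinking-intersection limitation of the IBI approach can be seen with plain set operations: each additional data type can only keep or remove genes from the candidate list, never add any. The gene lists below are invented for illustration, and GISPA's own profile-based ranking is not reproduced here.

```python
def integrate_by_intersection(hits_by_type):
    """Intersect per-data-type gene lists cumulatively.

    hits_by_type: ordered list of (data_type, genes) pairs, each gene collection
    coming from a separate single-platform analysis.
    Returns (final_gene_set, [(data_type, size_after_adding_it), ...]).
    """
    current = None
    sizes = []
    for data_type, genes in hits_by_type:
        genes = set(genes)
        current = genes if current is None else current & genes
        sizes.append((data_type, len(current)))
    return (current if current is not None else set()), sizes
```

For example, intersecting six down-regulated genes with four methylated genes and then with three genes showing copy-number loss might leave only two candidates, even though every gene in each input list was individually supported by its own data type.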