targeted genomic regions: Topics by Science.gov

Sample records for targeted genomic regions

A multiplex primer design algorithm for target amplification of continuous genomic regions.

PubMed

Ozturk, Ahmet Rasit; Can, Tolga

2017-06-19

Targeted Next Generation Sequencing (NGS) assays are cost-efficient and reliable alternatives to Sanger sequencing. For sequencing of a very large set of genes, the target enrichment approach is suitable. However, for smaller genomic regions, the target amplification method is more efficient than both the target enrichment method and Sanger sequencing. The major difficulty of the target amplification method is the preparation of amplicons, regarding required time, equipment, and labor. Multiplex PCR (MPCR) is a good solution for the mentioned problems. We propose a novel method to design MPCR primers for a continuous genomic region, following the best practices of clinically reliable PCR design processes. On an experimental setup with 48 different combinations of factors, we have shown that multiple parameters might affect finding the first feasible solution. Increasing the length of the initial primer candidate selection sequence gives better results whereas waiting for a longer time to find the first feasible solution does not have a significant impact. We generated MPCR primer designs for the HBB whole gene, MEFV coding regions, and human exons between 2000 bp and 2100 bp long. Our benchmarking experiments show that the proposed MPCR approach is able to produce reliable NGS assay primers for a given sequence in a reasonable amount of time.
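The idea of covering a continuous region with amplicons can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the fixed amplicon length, the overlap parameter, and the use of the Wallace rule (2 °C per A/T, 4 °C per G/C) as a rough melting-temperature estimate are all assumptions made for illustration.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer (hypothetical helper)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in deg C using the Wallace rule.

    Only a coarse estimate for short oligonucleotides; real designs
    would use a nearest-neighbor model.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def tile_amplicons(region_length, amplicon_length, overlap):
    """Return 0-based half-open (start, end) tiles covering the region.

    Consecutive tiles share `overlap` bases; the last tile is clipped
    to the region end and may therefore be shorter than the others.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be in [0, amplicon_length)")
    step = amplicon_length - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + amplicon_length, region_length)
        tiles.append((start, end))
        if end == region_length:
            break
        start += step
    return tiles
```

For a 1000 bp region, 300 bp amplicons, and a 50 bp overlap, `tile_amplicons(1000, 300, 50)` yields four tiles. A real design would then choose primers near each tile boundary and check them for mutual compatibility, which this sketch does not attempt.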
TARGETED CAPTURE IN EVOLUTIONARY AND ECOLOGICAL GENOMICS

PubMed Central

Jones, Matthew R.; Good, Jeffrey M.

2016-01-01

The rapid expansion of next-generation sequencing has yielded a powerful array of tools to address fundamental biological questions at a scale that was inconceivable just a few years ago. Various genome partitioning strategies to sequence select subsets of the genome have emerged as powerful alternatives to whole genome sequencing in ecological and evolutionary genomic studies. High throughput targeted capture is one such strategy that involves the parallel enrichment of pre-selected genomic regions of interest. The growing use of targeted capture demonstrates its potential power to address a range of research questions, yet these approaches have yet to expand broadly across labs focused on evolutionary and ecological genomics. In part, the use of targeted capture has been hindered by the logistics of capture design and implementation in species without established reference genomes. Here we aim to 1) increase the accessibility of targeted capture to researchers working in non-model taxa by discussing capture methods that circumvent the need of a reference genome, 2) highlight the evolutionary and ecological applications where this approach is emerging as a powerful sequencing strategy, and 3) discuss the future of targeted capture and other genome partitioning approaches in light of the increasing accessibility of whole genome sequencing. Given the practical advantages and increasing feasibility of high-throughput targeted capture, we anticipate an ongoing expansion of capture-based approaches in evolutionary and ecological research, synergistic with an expansion of whole genome sequencing. PMID:26137993
GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research

PubMed Central

Zhang, Hao; van Diepeningen, Anne D.; van der Lee, Theo A. J.; Waalwijk, Cees; de Hoog, G. Sybren

2016-01-01

GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assembling specific genomic regions from NGS data. This approach is especially useful when dealing with multi-copy regions, such as the mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled although they contain interesting information from phylogenetic or epidemiologic perspectives; single-copy regions can also be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data, based on homology, like sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species, must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated, such as exhaustive assembly (rDNA region and mitochondrial genome), extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), as well as extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/). PMID
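The general notion of baiting reads with a known sequence can be sketched in a few lines of Python. This is an illustrative toy, not GRAbB's implementation: the function names, the exact k-mer matching rule, the default `k`, and the round limit are assumptions, and reverse-complement matches and the assembly step itself are ignored.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait_reads(reads, bait, k=5, max_rounds=10):
    """Iteratively collect reads that share a k-mer with the bait.

    After each round the k-mers of newly recruited reads are added to
    the bait set, so the selection can extend outward from the seed.
    Stops when no new read is recruited or max_rounds is reached.
    """
    bait_kmers = kmers(bait, k)
    selected = set()
    for _ in range(max_rounds):
        new = {
            i for i, read in enumerate(reads)
            if i not in selected and kmers(read, k) & bait_kmers
        }
        if not new:
            break
        selected |= new
        for i in new:
            bait_kmers |= kmers(reads[i], k)
    return [reads[i] for i in sorted(selected)]
```

Only the recruited reads would then be passed to an assembler, which is why restricting the input in this way can make the assembly much less demanding than assembling the full data set.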
GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research.

PubMed

Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D; van der Lee, Theo A J; Waalwijk, Cees; de Hoog, G Sybren

2016-06-01

The use of genomic coancestry matrices in the optimisation of contributions to maintain genetic diversity at specific regions of the genome.

PubMed

Gómez-Romano, Fernando; Villanueva, Beatriz; Fernández, Jesús; Woolliams, John A; Pong-Wong, Ricardo

2016-01-13

Optimal contribution methods have proved to be very efficient for controlling the rates at which coancestry and inbreeding increase and therefore, for maintaining genetic diversity. These methods have usually relied on pedigree information for estimating genetic relationships between animals. However, with the large amount of genomic information now available such as high-density single nucleotide polymorphism (SNP) chips that contain thousands of SNPs, it becomes possible to calculate more accurate estimates of relationships and to target specific regions in the genome where there is a particular interest in maximising genetic diversity. The objective of this study was to investigate the effectiveness of using genomic coancestry matrices for: (1) minimising the loss of genetic variability at specific genomic regions while restricting the overall loss in the rest of the genome; or (2) maximising the overall genetic diversity while restricting the loss of diversity at specific genomic regions. Our study shows that the use of genomic coancestry was very successful at minimising the loss of diversity and outperformed the use of pedigree-based coancestry (genetic diversity even increased in some scenarios). The results also show that genomic information allows a targeted optimisation to maintain diversity at specific genomic regions, whether they are linked or not. The level of variability maintained increased when the targeted regions were closely linked. However, such targeted management leads to an important loss of diversity in the rest of the genome and, thus, it is necessary to take further actions to constrain this loss. Optimal contribution methods also proved to be effective at restricting the loss of diversity in the rest of the genome, although the resulting rate of coancestry was higher than the constraint imposed. The use of genomic matrices when optimising contributions permits the control of genetic diversity and inbreeding at specific regions of the
Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data

PubMed Central

Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

2018-01-01

Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets. PMID:29618048
Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data.

PubMed

Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

2018-03-01

Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics

PubMed Central

Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron

2014-01-01

• Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629
AnnotateGenomicRegions: a web application.

PubMed

Zammataro, Luca; DeMolfetta, Rita; Bucci, Gabriele; Ceol, Arnaud; Muller, Heiko

2014-01-01

Modern genomic technologies produce large amounts of data that can be mapped to specific regions in the genome. Among the first steps in interpreting the results is annotation of genomic regions with known features such as genes, promoters, CpG islands etc. Several tools have been published to perform this task. However, using these tools often requires a significant amount of bioinformatics skills and/or downloading and installing dedicated software. Here we present AnnotateGenomicRegions, a web application that accepts genomic regions as input and outputs a selection of overlapping and/or neighboring genome annotations. Supported organisms include human (hg18, hg19), mouse (mm8, mm9, mm10), zebrafish (danRer7), and Saccharomyces cerevisiae (sacCer2, sacCer3). AnnotateGenomicRegions is accessible online on a public server or can be installed locally. Some frequently used annotations and genomes are embedded in the application while custom annotations may be added by the user. The increasing spread of genomic technologies generates the need for a simple-to-use annotation tool for genomic regions that can be used by biologists and bioinformaticians alike. AnnotateGenomicRegions meets this demand. AnnotateGenomicRegions is an open-source web application that can be installed on any personal computer or institute server. AnnotateGenomicRegions is available at: http://cru.genomics.iit.it/AnnotateGenomicRegions.
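The core operation described here, reporting known features that overlap each input region, can be sketched in Python as follows. This is not the AnnotateGenomicRegions code: the `Interval` type, the function names, and the choice of 0-based half-open coordinates (as in BED files) are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical record type: chromosome, 0-based start, exclusive end, label.
Interval = namedtuple("Interval", "chrom start end name")


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(regions, features):
    """Map each region name to the names of all overlapping features.

    Uses a simple all-against-all comparison; for large inputs an
    interval tree or a sorted sweep would be the usual choice.
    """
    return {
        region.name: [f.name for f in features if overlaps(region, f)]
        for region in regions
    }
```

With half-open coordinates, a region ending at position 200 and a feature starting at 200 are adjacent rather than overlapping, so reporting neighboring annotations would need a separate distance test.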
AnnotateGenomicRegions: a web application

PubMed Central

2014-01-01

Background: Modern genomic technologies produce large amounts of data that can be mapped to specific regions in the genome. Among the first steps in interpreting the results is annotation of genomic regions with known features such as genes, promoters, CpG islands etc. Several tools have been published to perform this task. However, using these tools often requires a significant amount of bioinformatics skills and/or downloading and installing dedicated software. Results: Here we present AnnotateGenomicRegions, a web application that accepts genomic regions as input and outputs a selection of overlapping and/or neighboring genome annotations. Supported organisms include human (hg18, hg19), mouse (mm8, mm9, mm10), zebrafish (danRer7), and Saccharomyces cerevisiae (sacCer2, sacCer3). AnnotateGenomicRegions is accessible online on a public server or can be installed locally. Some frequently used annotations and genomes are embedded in the application while custom annotations may be added by the user. Conclusions: The increasing spread of genomic technologies generates the need for a simple-to-use annotation tool for genomic regions that can be used by biologists and bioinformaticians alike. AnnotateGenomicRegions meets this demand. AnnotateGenomicRegions is an open-source web application that can be installed on any personal computer or institute server. AnnotateGenomicRegions is available at: http://cru.genomics.iit.it/AnnotateGenomicRegions. PMID:24564446
Advances in targeted genome editing.

PubMed

Perez-Pinera, Pablo; Ousterout, David G; Gersbach, Charles A

2012-08-01

New technologies have recently emerged that enable targeted editing of genomes in diverse systems. This includes precise manipulation of gene sequences in their natural chromosomal context and addition of transgenes to specific genomic loci. This progress has been facilitated by advances in engineering targeted nucleases with programmable, site-specific DNA-binding domains, including zinc finger proteins and transcription activator-like effectors (TALEs). Recent improvements have enhanced nuclease performance, accelerated nuclease assembly, and lowered the cost of genome editing. These advances are driving new approaches to many areas of biotechnology, including biopharmaceutical production, agriculture, creation of transgenic organisms and cell lines, and studies of genome structure, regulation, and function. Genome editing is also being investigated in preclinical and clinical gene therapies for many diseases. Copyright © 2012 Elsevier Ltd. All rights reserved.
Diverse patterns of genomic targeting by transcriptional regulators in Drosophila melanogaster.

PubMed

Slattery, Matthew; Ma, Lijia; Spokony, Rebecca F; Arthur, Robert K; Kheradpour, Pouya; Kundaje, Anshul; Nègre, Nicolas; Crofts, Alex; Ptashkin, Ryan; Zieba, Jennifer; Ostapenko, Alexander; Suchy, Sarah; Victorsen, Alec; Jameel, Nader; Grundstad, A Jason; Gao, Wenxuan; Moran, Jennifer R; Rehm, E Jay; Grossman, Robert L; Kellis, Manolis; White, Kevin P

2014-07-01

Annotation of regulatory elements and identification of the transcription-related factors (TRFs) targeting these elements are key steps in understanding how cells interpret their genetic blueprint and their environment during development, and how that process goes awry in the case of disease. One goal of the modENCODE (model organism ENCyclopedia of DNA Elements) Project is to survey a diverse sampling of TRFs, both DNA-binding and non-DNA-binding factors, to provide a framework for the subsequent study of the mechanisms by which transcriptional regulators target the genome. Here we provide an updated map of the Drosophila melanogaster regulatory genome based on the location of 84 TRFs at various stages of development. This regulatory map reveals a variety of genomic targeting patterns, including factors with strong preferences toward proximal promoter binding, factors that target intergenic and intronic DNA, and factors with distinct chromatin state preferences. The data also highlight the stringency of the Polycomb regulatory network, and show association of the Trithorax-like (Trl) protein with hotspots of DNA binding throughout development. Furthermore, the data identify more than 5800 instances in which TRFs target DNA regions with demonstrated enhancer activity. Regions of high TRF co-occupancy are more likely to be associated with open enhancers used across cell types, while lower TRF occupancy regions are associated with complex enhancers that are also regulated at the epigenetic level. Together these data serve as a resource for the research community in the continued effort to dissect transcriptional regulatory mechanisms directing Drosophila development. © 2014 Slattery et al.; Published by Cold Spring Harbor Laboratory Press.
High-Throughput resequencing of maize landraces at genomic regions associated with flowering time

USDA-ARS's Scientific Manuscript database

Despite the reduction in the price of sequencing, it remains expensive to sequence and assemble whole, complex genomes of multiple samples for population studies, particularly for large genomes like those of many crop species. Enrichment of target genome regions coupled with next generation sequenci...
GANESH: software for customized annotation of genome regions.

PubMed

Huntley, Derek; Hummerich, Holger; Smedley, Damian; Kittivoravitkul, Sasivimol; McCarthy, Mark; Little, Peter; Sergot, Marek

2003-09-01

GANESH is a software package designed to support the genetic analysis of regions of human and other genomes. It provides a set of components that may be assembled to construct a self-updating database of DNA sequence, mapping data, and annotations of possible genome features. Once one or more remote sources of data for the target region have been identified, all sequences for that region are downloaded, assimilated, and subjected to a (configurable) set of standard database-searching and genome-analysis packages. The results are stored in compressed form in a relational database, and are updated automatically on a regular schedule so that they are always immediately available in their most up-to-date versions. A Java front-end, executed as a stand alone application or web applet, provides a graphical interface for navigating the database and for viewing the annotations. There are facilities for importing and exporting data in the format of the Distributed Annotation System (DAS), enabling a GANESH database to be used as a component of a DAS configuration. The system has been used to construct databases for about a dozen regions of human chromosomes and for three regions of mouse chromosomes.
Chromosomal targeting by CRISPR-Cas systems can contribute to genome plasticity in bacteria

PubMed Central

Dy, Ron L; Pitman, Andrew R; Fineran, Peter C

2013-01-01

The clustered regularly interspaced short palindromic repeats (CRISPR) and their associated (Cas) proteins form adaptive immune systems in bacteria to combat phage and other foreign genetic elements. Typically, short spacer sequences are acquired from the invader DNA and incorporated into CRISPR arrays in the bacterial genome. Small RNAs are generated that contain these spacer sequences and enable sequence-specific destruction of the foreign nucleic acids. Occasionally, spacers are acquired from the chromosome, which instead leads to targeting of the host genome. Chromosomal targeting is highly toxic to the bacterium, providing a strong selective pressure for a variety of evolutionary routes that enable host cell survival. Mutations that inactivate the CRISPR-Cas functionality, such as within the cas genes, CRISPR repeat, protospacer adjacent motifs (PAM), and target sequence, mediate escape from toxicity. This self-targeting might provide some explanation for the incomplete distribution of CRISPR-Cas systems in less than half of sequenced bacterial genomes. More importantly, self-genome targeting can cause large-scale genomic alterations, including remodeling or deletion of pathogenicity islands and other non-mobile chromosomal regions. While control of horizontal gene transfer is perceived as their main function, our recent work illuminates an alternative role of CRISPR-Cas systems in causing host genomic changes and influencing bacterial evolution. PMID:24251073
Genomic Target Database (GTD): A database of potential targets in human pathogenic bacteria

PubMed Central

Barh, Debmalya; Kumar, Anil; Misra, Amarendra Narayana

2009-01-01

A Genomic Target Database (GTD) has been developed having putative genomic drug targets for human bacterial pathogens. The selected pathogens are either drug resistant or have no vaccine developed against them yet. The drug targets have been identified using subtractive genomics approaches and are subsequently classified into drug targets in pathogen-specific unique metabolic pathways, drug targets in host-pathogen common metabolic pathways, and membrane-localized drug targets. HTML code is used to link each target to its various properties and other available public resources. Essential resources and tools for subtractive genomic analysis, sub-cellular localization, vaccine and drug designing are also mentioned. To the best of the authors' knowledge, no such database (DB) is presently available that has listed metabolic pathways and membrane-specific genomic drug targets based on subtractive genomics. Listed targets in GTD are a readily available resource for developing drugs and vaccines against the respective pathogen, its subtypes, and other family members. Currently GTD contains 58 drug targets for four pathogens. Shortly, drug targets for six more pathogens will be listed. Availability: GTD is available at the IIOAB website http://www.iioab.webs.com/GTD.htm. It can also be accessed at http://www.iioabdgd.webs.com. GTD is free for academic research and non-commercial use only. Commercial use is strictly prohibited without prior permission from IIOAB. PMID:20011153
An analysis of possible off target effects following CAS9/CRISPR targeted deletions of neuropeptide gene enhancers from the mouse genome.

PubMed

Hay, Elizabeth Anne; Khalaf, Abdulla Razak; Marini, Pietro; Brown, Andrew; Heath, Karyn; Sheppard, Darrin; MacKenzie, Alasdair

2017-08-01

We have successfully used comparative genomics to identify putative regulatory elements within the human genome that contribute to the tissue specific expression of neuropeptides such as galanin and receptors such as CB1. However, a previous inability to rapidly delete these elements from the mouse genome has prevented optimal assessment of their function in vivo. This has been solved using CAS9/CRISPR genome editing technology which uses a bacterial endonuclease called CAS9 that, in combination with specifically designed guide RNA (gRNA) molecules, cuts specific regions of the mouse genome. However, reports of "off target" effects, whereby the CAS9 endonuclease is able to cut sites other than those targeted, limit the appeal of this technology. We used cytoplasmic microinjection of gRNA and CAS9 mRNA into 1-cell mouse embryos to rapidly generate enhancer knockout mouse lines. The current study describes our analysis of the genomes of these enhancer knockout lines to detect possible off-target effects. Bioinformatic analysis was used to identify the most likely putative off-target sites and to design PCR primers that would amplify these sequences from genomic DNA of founder enhancer deletion mouse lines. Amplified DNA was then sequenced and searched with BLAST against the mouse genome sequence to detect off-target effects. Using this approach we were unable to detect any evidence of off-target effects in the genomes of three founder lines using any of the four gRNAs used in the analysis. This study suggests that the problem of off-target effects in transgenic mice has been exaggerated and that CAS9/CRISPR represents a highly effective and accurate method of deleting putative neuropeptide gene enhancer sequences from the mouse genome. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
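One simple way to nominate putative off-target sites is to scan a sequence for windows that differ from the guide by only a few bases. The Python sketch below illustrates that idea only; it is not the bioinformatic analysis used in the study. The function names and the mismatch threshold are assumptions, and PAM requirements, the reverse strand, and insertions or deletions are all ignored.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(genome, guide, max_mismatches=3):
    """List windows that nearly match the guide on the given strand.

    Returns (position, site, mismatches) tuples for every window with
    between 1 and max_mismatches differences; exact matches are treated
    as the intended on-target site and skipped.
    """
    n = len(guide)
    hits = []
    for i in range(len(genome) - n + 1):
        site = genome[i:i + n]
        d = hamming(site, guide)
        if 0 < d <= max_mismatches:
            hits.append((i, site, d))
    return hits
```

Sites returned by such a scan would be the ones to flank with PCR primers and sequence in the founder lines.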
Quantifying on- and off-target genome editing.

PubMed

Hendel, Ayal; Fine, Eli J; Bao, Gang; Porteus, Matthew H

2015-02-01

Genome editing with engineered nucleases is a rapidly growing field thanks to transformative technologies that allow researchers to precisely alter genomes for numerous applications including basic research, biotechnology, and human gene therapy. While the ability to make precise and controlled changes at specified sites throughout the genome has grown tremendously in recent years, we still lack a comprehensive and standardized battery of assays for measuring the different genome editing outcomes created at endogenous genomic loci. Here we review the existing assays for quantifying on- and off-target genome editing and describe their utility in advancing the technology. We also highlight unmet assay needs for quantifying on- and off-target genome editing outcomes and discuss their importance for the genome editing field. Copyright © 2014 Elsevier Ltd. All rights reserved.
Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting

Cancer.gov

The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.
Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

DOE PAGES

Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; ...

2014-09-01

Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.

Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.
Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

DOE PAGES

Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.; ...

2014-01-01

Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.
Differential contribution of genomic regions to marked genetic variation and prediction of quantitative traits in broiler chickens.

PubMed

Abdollahi-Arpanahi, Rostam; Morota, Gota; Valente, Bruno D; Kranis, Andreas; Rosa, Guilherme J M; Gianola, Daniel

2016-02-03

Genome-wide association studies in humans have found enrichment of trait-associated single nucleotide polymorphisms (SNPs) in coding regions of the genome and depletion of these in intergenic regions. However, a recent release of the ENCyclopedia of DNA elements showed that ~80 % of the human genome has a biochemical function. Similar studies on the chicken genome are lacking, thus assessing the relative contribution of its genic and non-genic regions to variation is relevant for biological studies and genetic improvement of chicken populations. A dataset including 1351 birds that were genotyped with the 600K Affymetrix platform was used. We partitioned SNPs according to genome annotation data into six classes to characterize the relative contribution of genic and non-genic regions to genetic variation as well as their predictive power using all available quality-filtered SNPs. Target traits were body weight, ultrasound measurement of breast muscle and hen house egg production in broiler chickens. Six genomic regions were considered: intergenic regions, introns, missense, synonymous, 5' and 3' untranslated regions, and regions that are located 5 kb upstream and downstream of coding genes. Genomic relationship matrices were constructed for each genomic region and fitted in the models, separately or simultaneously. Kernel-based ridge regression was used to estimate variance components and assess predictive ability. Contribution of each class of genomic regions to dominance variance was also considered. Variance component estimates indicated that all genomic regions contributed to marked additive genetic variation and that the class of synonymous regions tended to have the greatest contribution. The marked dominance genetic variation explained by each class of genomic regions was similar and negligible (~0.05). In terms of prediction mean-square error, the whole-genome approach showed the best predictive ability. All genic and non-genic regions contributed to
Nuclease Target Site Selection for Maximizing On-target Activity and Minimizing Off-target Effects in Genome Editing

PubMed Central

Lee, Ciaran M; Cradick, Thomas J; Fine, Eli J; Bao, Gang

2016-01-01

The rapid advancement in targeted genome editing using engineered nucleases such as ZFNs, TALENs, and CRISPR/Cas9 systems has resulted in a suite of powerful methods that allows researchers to target any genomic locus of interest. A complementary set of design tools has been developed to aid researchers with nuclease design, target site selection, and experimental validation. Here, we review the various tools available for target selection in designing engineered nucleases, and for quantifying nuclease activity and specificity, including web-based search tools and experimental methods. We also elucidate challenges in target selection, especially in predicting off-target effects, and discuss future directions in precision genome editing and its applications. PMID:26750397
Multi-targeted priming for genome-wide gene expression assays.

PubMed

Adomas, Aleksandra B; Lopez-Giraldez, Francesc; Clark, Travis A; Wang, Zheng; Townsend, Jeffrey P

2010-08-17

Complementary approaches to assaying global gene expression are needed to assess gene expression in regions that are poorly assayed by current methodologies. A key component of nearly all gene expression assays is the reverse transcription of transcribed sequences that has traditionally been performed by priming the poly-A tails on many of the transcribed genes in eukaryotes with oligo-dT, or by priming RNA indiscriminately with random hexamers. We designed an algorithm to find common sequence motifs that were present within most protein-coding genes of Saccharomyces cerevisiae and of Neurospora crassa, but that were not present within their ribosomal RNA or transfer RNA genes. We then experimentally tested whether degenerately priming these motifs with multi-targeted primers improved the accuracy and completeness of transcriptomic assays. We discovered two multi-targeted primers that would prime a preponderance of genes in the genomes of Saccharomyces cerevisiae and Neurospora crassa while avoiding priming ribosomal RNA or transfer RNA. Examining the response of Saccharomyces cerevisiae to nitrogen deficiency and profiling Neurospora crassa early sexual development, we demonstrated that using multi-targeted primers in reverse transcription led to superior performance of microarray profiling and next-generation RNA tag sequencing. Priming with multi-targeted primers in addition to oligo-dT resulted in higher sensitivity, a larger number of well-measured genes and greater power to detect differences in gene expression. Our results provide the most complete and detailed expression profiles of the yeast nitrogen starvation response and N. crassa early sexual development to date. Furthermore, our multi-targeting priming methodology for genome-wide gene expression assays provides selective targeting of multiple sequences and counter-selection against undesirable sequences, facilitating a more complete and precise assay of the transcribed sequences within the genome.
TARGET Publication Guidelines | Office of Cancer Genomics

Cancer.gov

Like other NCI large-scale genomics initiatives, TARGET is a community resource project and data are made available rapidly after validation for use by other researchers. To act in accord with the Fort Lauderdale principles and support the continued prompt public release of large-scale genomic data prior to publication, researchers who plan to prepare manuscripts containing descriptions of TARGET pediatric cancer data that would be of comparable scope to an initial TARGET disease-specific comprehensive, global analysis publication, and journal editors who receive such manuscripts, are
[Genome-editing: focus on the off-target effects].

PubMed

He, Xiubin; Gu, Feng

2017-10-25

Breakthroughs in genome editing in recent years have paved the way to develop new therapeutic strategies. These genome-editing tools mainly include zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR)/Cas-based RNA-guided DNA endonucleases. However, off-target effects are still the major issue in genome editing and limit its application in gene therapy. Here, we summarize the causes of off-target effects and compare different methods for detecting them.
Whole-genome resequencing of 292 pigeonpea accessions identifies genomic regions associated with domestication and agronomic traits.

PubMed

Varshney, Rajeev K; Saxena, Rachit K; Upadhyaya, Hari D; Khan, Aamir W; Yu, Yue; Kim, Changhoon; Rathore, Abhishek; Kim, Dongseon; Kim, Jihun; An, Shaun; Kumar, Vinay; Anuradha, Ghanta; Yamini, Kalinati Narasimhan; Zhang, Wei; Muniswamy, Sonnappa; Kim, Jong-So; Penmetsa, R Varma; von Wettberg, Eric; Datta, Swapan K

2017-07-01

Pigeonpea (Cajanus cajan), a tropical grain legume with low input requirements, is expected to continue to have an important role in supplying food and nutritional security in developing countries in Asia, Africa and the tropical Americas. From whole-genome resequencing of 292 Cajanus accessions encompassing breeding lines, landraces and wild species, we characterize genome-wide variation. On the basis of a scan for selective sweeps, we find several genomic regions that were likely targets of domestication and breeding. Using genome-wide association analysis, we identify associations between several candidate genes and agronomically important traits. Candidate genes for these traits in pigeonpea have sequence similarity to genes functionally characterized in other plants for flowering time control, seed development and pod dehiscence. Our findings will allow acceleration of genetic gains for key traits to improve yield and sustainability in pigeonpea.
How many genomics targets can a portfolio afford?

PubMed

Betz, Ulrich A K

2005-08-01

The pharmaceutical industry can look back at a history of successful innovations. Although genomics technologies have provided drug discovery pipelines with a plethora of new potential drug targets, solid target validation is crucial to avoiding high attrition rates. Biomarkers for patient stratification and approaches for personalized medicine will further help to reduce the risk associated with new targets. To achieve an overall risk balance, portfolios have to be supplemented with precedented targets, me-too approaches and line extensions of existing drugs. However, capitalizing on genomics investments and working on unprecedented targets is essential for a continuous stream of innovative drugs.
Genome-wide analysis of Polycomb targets in Drosophila

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schwartz, Yuri B.; Kahn, Tatyana G.; Nix, David A.

2006-04-01

Polycomb Group (PcG) complexes are multiprotein assemblages that bind to chromatin and establish chromatin states leading to epigenetic silencing. PcG proteins regulate homeotic genes in flies and vertebrates but little is known about other PcG targets and the role of the PcG in development, differentiation and disease. We have determined the distribution of the PcG proteins PC, E(Z) and PSC and of histone H3K27 trimethylation in the Drosophila genome. At more than 200 PcG target genes, binding sites for the three PcG proteins colocalize to presumptive Polycomb Response Elements (PREs). In contrast, H3 me3K27 forms broad domains including the entire transcription unit and regulatory regions. PcG targets are highly enriched in genes encoding transcription factors but receptors, signaling proteins, morphogens and regulators representing all major developmental pathways are also included.
In silico screening of the chicken genome for overlaps between genomic regions: microRNA genes, coding and non-coding transcriptional units, QTL, and genetic variations.

PubMed

Zorc, Minja; Kunej, Tanja

2016-05-01

MicroRNAs (miRNAs) are a class of non-coding RNAs involved in posttranscriptional regulation of target genes. Regulation requires complementarity between target mRNA and the mature miRNA seed region, responsible for their recognition and binding. It has been estimated that each miRNA targets approximately 200 genes, and genetic variability of miRNA genes has been reported to affect phenotypic variability and disease susceptibility in humans, livestock species, and model organisms. Polymorphisms in miRNA genes could therefore represent biomarkers for phenotypic traits in livestock animals. In our previous study, we collected polymorphisms within miRNA genes in chicken. In the present study, we identified miRNA-related genomic overlaps to prioritize genomic regions of interest for further functional studies and biomarker discovery. Overlapping genomic regions in chicken were analyzed using the following bioinformatics tools and databases: miRNA SNiPer, Ensembl, miRBase, NCBI Blast, and QTLdb. Out of 740 known pre-miRNA genes, 263 (35.5 %) contain polymorphisms; among them, 35 contain more than three polymorphisms. The most polymorphic miRNA genes in chicken are gga-miR-6662, containing 23 single nucleotide polymorphisms (SNPs) within the pre-miRNA region, including five consecutive SNPs, and gga-miR-6688, containing ten polymorphisms including three consecutive polymorphisms. Several miRNA-related genomic hotspots have been revealed in the chicken genome; polymorphic miRNA genes are located within protein-coding and/or non-coding transcription units and quantitative trait loci (QTL) associated with production traits. The present study includes the first description of an exonic miRNA in a chicken genome, an overlap between the miRNA gene and the exon of the protein-coding gene (gga-miR-6578/HADHB), and the first report of a missense polymorphism located within a mature miRNA seed region. Identified miRNA-related genomic hotspots in chicken can serve researchers as a
A Targeted Capture Linkage Map Anchors the Genome of the Schistosomiasis Vector Snail, Biomphalaria glabrata.

PubMed

Tennessen, Jacob A; Bollmann, Stephanie R; Blouin, Michael S

2017-07-05

The aquatic planorbid snail Biomphalaria glabrata is one of the most intensively-studied mollusks due to its role in the transmission of schistosomiasis. Its 916 Mb genome has recently been sequenced and annotated, but it remains poorly assembled. Here, we used targeted capture markers to map over 10,000 B. glabrata scaffolds in a linkage cross of 94 F1 offspring, generating 24 linkage groups (LGs). We added additional scaffolds to these LGs based on linkage disequilibrium (LD) analysis of targeted capture and whole-genome sequences of 96 unrelated snails. Our final linkage map consists of 18,613 scaffolds comprising 515 Mb, representing 56% of the genome and 75% of genic and nonrepetitive regions. There are 18 large (> 10 Mb) LGs, likely representing the expected 18 haploid chromosomes, and > 50% of the genome has been assigned to LGs of at least 17 Mb. Comparisons with other gastropod genomes reveal patterns of synteny and chromosomal rearrangements. Linkage relationships of key immune-relevant genes may help clarify snail-schistosome interactions. By focusing on linkage among genic and nonrepetitive regions, we have generated a useful resource for associating snail phenotypes with causal genes, even in the absence of a complete genome assembly. A similar approach could potentially improve numerous poorly-assembled genomes in other taxa. This map will facilitate future work on this host of a serious human parasite. Copyright © 2017 Tennessen et al.
Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

PubMed

Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

2014-01-01

Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence genome-wide, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3'-terminal of the papain-like cysteine protease domain, were found to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed that these regions performed identically to the full-length genome in genotyping, dividing the HEV isolates involved into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
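The abstract above does not name the statistical method used, so the following is only a minimal sketch of one plausible approach: for each sliding window of an alignment, compute pairwise p-distances and correlate them with the full-length pairwise distances. The function names, the p-distance measure, and the Pearson correlation are all assumptions for illustration, not the authors' procedure.

```python
from itertools import combinations
from math import sqrt

def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pairwise_distances(seqs, start, end):
    """p-distances for every pair of sequences, restricted to columns [start, end)."""
    return [p_distance(s[start:end], t[start:end]) for s, t in combinations(seqs, 2)]

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def window_correlations(seqs, window, step):
    """Correlate each window's pairwise distances with the full-length distances."""
    length = len(seqs[0])
    full = pairwise_distances(seqs, 0, length)
    return [
        (start, pearson(pairwise_distances(seqs, start, start + window), full))
        for start in range(0, length - window + 1, step)
    ]

# Toy alignment (hypothetical data): only the second half carries signal.
toy = ["AAAAAAAA", "AAAAAACC", "AAAACCCC"]
print(window_correlations(toy, window=4, step=4))
```

Windows whose correlation approaches 1 would be candidates for a short genotyping region under these assumptions; a real analysis would use model-based evolutionary distances and a proper multiple alignment.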
Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting.

PubMed

Aguirre, Andrew J; Meyers, Robin M; Weir, Barbara A; Vazquez, Francisca; Zhang, Cheng-Zhong; Ben-David, Uri; Cook, April; Ha, Gavin; Harrington, William F; Doshi, Mihir B; Kost-Alimova, Maria; Gill, Stanley; Xu, Han; Ali, Levi D; Jiang, Guozhi; Pantel, Sasha; Lee, Yenarae; Goodale, Amy; Cherniack, Andrew D; Oh, Coyin; Kryukov, Gregory; Cowley, Glenn S; Garraway, Levi A; Stegmaier, Kimberly; Roberts, Charles W; Golub, Todd R; Meyerson, Matthew; Root, David E; Tsherniak, Aviad; Hahn, William C

2016-08-01

The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest. By examining single-guide RNAs that map to multiple genomic sites, we found that this cell response to CRISPR/Cas9 editing correlated strongly with the number of target loci. These observations indicate that genome targeting by CRISPR/Cas9 elicits a gene-independent antiproliferative cell response. This effect has important practical implications for the interpretation of CRISPR/Cas9 screening data and confounds the use of this technology for the identification of essential genes in amplified regions. We found that the number of CRISPR/Cas9-induced DNA breaks dictates a gene-independent antiproliferative response in cells. These observations have practical implications for using CRISPR/Cas9 to interrogate cancer gene function and illustrate that cancer cells are highly sensitive to site-specific DNA damage, which may provide a path to novel therapeutic strategies. Cancer Discov; 6(8); 914-29. ©2016 AACR. See related commentary by Sheel and Xue, p. 824. See related article by Munoz et al., p. 900. This article is highlighted in the In This Issue feature, p. 803.
Targeted activation of diverse CRISPR-Cas systems for mammalian genome editing via proximal CRISPR targeting.

PubMed

Chen, Fuqiang; Ding, Xiao; Feng, Yongmei; Seebeck, Timothy; Jiang, Yanfang; Davis, Gregory D

2017-04-07

Bacterial CRISPR-Cas systems comprise diverse effector endonucleases with different targeting ranges, specificities and enzymatic properties, but many of them are inactive in mammalian cells and are thus precluded from genome-editing applications. Here we show that the type II-B FnCas9 from Francisella novicida possesses novel properties, but its nuclease function is frequently inhibited at many genomic loci in living human cells. Moreover, we develop a proximal CRISPR (termed proxy-CRISPR) targeting method that restores FnCas9 nuclease activity in a target-specific manner. We further demonstrate that this proxy-CRISPR strategy is applicable to diverse CRISPR-Cas systems, including type II-C Cas9 and type V Cpf1 systems, and can facilitate precise gene editing even between identical genomic sites within the same genome. Our findings provide a novel strategy to enable use of diverse otherwise inactive CRISPR-Cas systems for genome-editing applications and a potential path to modulate the impact of chromatin microenvironments on genome modification.
Targeted activation of diverse CRISPR-Cas systems for mammalian genome editing via proximal CRISPR targeting

PubMed Central

Chen, Fuqiang; Ding, Xiao; Feng, Yongmei; Seebeck, Timothy; Jiang, Yanfang; Davis, Gregory D.

2017-01-01

Bacterial CRISPR–Cas systems comprise diverse effector endonucleases with different targeting ranges, specificities and enzymatic properties, but many of them are inactive in mammalian cells and are thus precluded from genome-editing applications. Here we show that the type II-B FnCas9 from Francisella novicida possesses novel properties, but its nuclease function is frequently inhibited at many genomic loci in living human cells. Moreover, we develop a proximal CRISPR (termed proxy-CRISPR) targeting method that restores FnCas9 nuclease activity in a target-specific manner. We further demonstrate that this proxy-CRISPR strategy is applicable to diverse CRISPR–Cas systems, including type II-C Cas9 and type V Cpf1 systems, and can facilitate precise gene editing even between identical genomic sites within the same genome. Our findings provide a novel strategy to enable use of diverse otherwise inactive CRISPR–Cas systems for genome-editing applications and a potential path to modulate the impact of chromatin microenvironments on genome modification. PMID:28387220
regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests.

PubMed

Gel, Bernat; Díez-Villanueva, Anna; Serra, Eduard; Buschbeck, Marcus; Peinado, Miguel A; Malinverni, Roberto

2016-01-15

Statistically assessing the relation between a set of genomic regions and other genomic features is a common challenging task in genomic and epigenomic analyses. Randomization based approaches implicitly take into account the complexity of the genome without the need of assuming an underlying statistical model. regioneR is an R package that implements a permutation test framework specifically designed to work with genomic regions. In addition to the predefined randomization and evaluation strategies, regioneR is fully customizable allowing the use of custom strategies to adapt it to specific questions. Finally, it also implements a novel function to evaluate the local specificity of the detected association. regioneR is an R package released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/regioneR). Contact: rmalinverni@carrerasresearch.org. © The Author 2015. Published by Oxford University Press.
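To make the idea of a permutation test on genomic regions concrete, here is a minimal Python sketch. It is not regioneR's interface (regioneR is an R package); the function names, the single linear "genome", and the uniform re-placement strategy are simplifying assumptions chosen for illustration.

```python
import random

def count_overlaps(a, b):
    """Number of regions in `a` overlapping at least one region in `b`.
    Regions are half-open (start, end) tuples on one coordinate axis."""
    return sum(any(s < b_end and b_start < e for b_start, b_end in b) for s, e in a)

def overlap_permutation_test(a, b, genome_length, n_perm=1000, seed=0):
    """Randomization strategy: re-place every region of `a` uniformly at random,
    preserving its length. Evaluation strategy: count overlaps with `b`.
    Returns the observed count and a one-sided empirical p-value."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in a:
            length = e - s
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_overlaps(shuffled, b) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical toy data: every region in `peaks` sits on a region in `features`.
peaks = [(100, 200), (300, 400), (500, 600)]
features = [(150, 160), (350, 360), (550, 560)]
print(overlap_permutation_test(peaks, features, genome_length=1_000_000))
```

Swapping in a different randomization (for example, shuffling only within mappable regions) or a different evaluation function changes the null hypothesis being tested, which is the kind of customization the abstract describes.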
microRNA-122 target sites in the hepatitis C virus RNA NS5B coding region and 3' untranslated region: function in replication and influence of RNA secondary structure.

PubMed

Gerresheim, Gesche K; Dünnes, Nadia; Nieder-Röhrmann, Anika; Shalamova, Lyudmila A; Fricke, Markus; Hofacker, Ivo; Höner Zu Siederdissen, Christian; Marz, Manja; Niepmann, Michael

2017-02-01

We have analyzed the binding of the liver-specific microRNA-122 (miR-122) to three conserved target sites of hepatitis C virus (HCV) RNA, two in the non-structural protein 5B (NS5B) coding region and one in the 3' untranslated region (3'UTR). miR-122 binding efficiency strongly depends on target site accessibility under conditions when the range of flanking sequences available for the formation of local RNA secondary structures changes. Our results indicate that the particular sequence feature that contributes most to the correlation between target site accessibility and binding strength varies between different target sites. This suggests that the dynamics of miRNA/Ago2 binding not only depends on the target site itself but also on flanking sequence context to a considerable extent, in particular in a small viral genome in which strong selection constraints act on coding sequence and overlapping cis-signals and model the accessibility of cis-signals. In full-length genomes, single and combination mutations in the miR-122 target sites reveal that site 5B.2 is positively involved in regulating overall genome replication efficiency, whereas mutation of site 5B.3 showed a weaker effect. Mutation of the 3'UTR site and double or triple mutants showed no significant overall effect on genome replication, whereas in a translation reporter RNA, the 3'UTR target site inhibits translation directed by the HCV 5'UTR. Thus, the miR-122 target sites in the 3'-region of the HCV genome are involved in a complex interplay in regulating different steps of the HCV replication cycle.
Indel-seq: a fast-forward genetics approach for identification of trait-associated putative candidate genomic regions and its application in pigeonpea (Cajanus cajan).

PubMed

Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Sinha, Pallavi; Kale, Sandip M; Parupalli, Swathi; Kumar, Vinay; Chitikineni, Annapurna; Vechalapu, Suryanarayana; Sameer Kumar, Chanda Venkata; Sharma, Mamta; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Muniswamy, Sonnappa; Varshney, Rajeev K

2017-07-01

Identification of candidate genomic regions associated with target traits using conventional mapping methods is challenging and time-consuming. In recent years, a number of single nucleotide polymorphism (SNP)-based mapping approaches have been developed and used for identification of candidate/putative genomic regions. However, in the majority of these studies, insertion-deletions (Indels) were largely ignored. For efficient use of Indels in mapping target traits, we propose the Indel-seq approach, which is a combination of whole-genome resequencing (WGRS) and bulked segregant analysis (BSA) and relies on the Indel frequencies in extreme bulks. Deployment of the Indel-seq approach for identification of candidate genomic regions associated with fusarium wilt (FW) and sterility mosaic disease (SMD) resistance in pigeonpea has identified 16 Indels affecting 26 putative candidate genes. Of these 26 affected putative candidate genes, 24 genes showed effect in the upstream/downstream of the genic region and two genes showed effect in the genes. Validation of these 16 candidate Indels in other FW- and SMD-resistant and FW- and SMD-susceptible genotypes revealed a significant association of five Indels (three for FW and two for SMD resistance). Comparative analysis of Indel-seq with other genetic mapping approaches highlighted the importance of the approach in identification of significant genomic regions associated with target traits. Therefore, the Indel-seq approach can be used for quick and precise identification of candidate genomic regions for any target traits in any crop species. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Progress of targeted genome modification approaches in higher plants.

PubMed

Cardi, Teodoro; Neal Stewart, C

2016-07-01

Transgene integration in plants is based on illegitimate recombination between non-homologous sequences. The low control of integration site and number of (trans/cis)gene copies might have negative consequences on the expression of transferred genes and their insertion within endogenous coding sequences. The first experiments conducted to use precise homologous recombination for gene integration commenced soon after the first demonstration that transgenic plants could be produced. Modern transgene targeting categories used in plant biology are: (a) homologous recombination-dependent gene targeting; (b) recombinase-mediated site-specific gene integration; (c) oligonucleotide-directed mutagenesis; (d) nuclease-mediated site-specific genome modifications. New tools enable precise gene replacement or stacking with exogenous sequences and targeted mutagenesis of endogenous sequences. The possibility to engineer chimeric designer nucleases, which are able to target virtually any genomic site, and use them for inducing double-strand breaks in host DNA creates new opportunities for both applied plant breeding and functional genomics. CRISPR is the most recent technology available for precise genome editing. Its rapid adoption in biological research is based on its inherent simplicity and efficacy. Its utilization, however, depends on available sequence information, especially for genome-wide analysis. We will review the approaches used for genome modification, specifically those for affecting gene integration and modification in higher plants. For each approach, the advantages and limitations will be noted. We also will speculate on how their actual commercial development and implementation in plant breeding will be affected by governmental regulations.

Selective whole genome amplification for resequencing target microbial species from complex natural samples.

PubMed

Leichty, Aaron R; Brisson, Dustin

2014-10-01

Population genomic analyses have demonstrated power to address major questions in evolutionary and molecular microbiology. Collecting populations of genomes is hindered in many microbial species by the absence of a cost effective and practical method to collect ample quantities of sufficiently pure genomic DNA for next-generation sequencing. Here we present a simple method to amplify genomes of a target microbial species present in a complex, natural sample. The selective whole genome amplification (SWGA) technique amplifies target genomes using nucleotide sequence motifs that are common in the target microbe genome, but rare in the background genomes, to prime the highly processive phi29 polymerase. SWGA thus selectively amplifies the target genome from samples in which it originally represented a minor fraction of the total DNA. The post-SWGA samples are enriched in target genomic DNA, which is ideal for population resequencing. We demonstrate the efficacy of SWGA using both laboratory-prepared mixtures of cultured microbes as well as a natural host-microbe association. Targeted amplification of Borrelia burgdorferi mixed with Escherichia coli at genome ratios of 1:2000 resulted in >10^5-fold amplification of the target genomes with <6.7-fold amplification of the background. SWGA-treated genomic extracts from Wolbachia pipientis-infected Drosophila melanogaster resulted in up to 70% of high-throughput resequencing reads mapping to the W. pipientis genome. By contrast, 2-9% of sequencing reads were derived from W. pipientis without prior amplification. The SWGA technique results in high sequencing coverage at a fraction of the sequencing effort, thus allowing population genomic studies at affordable costs. Copyright © 2014 by the Genetics Society of America.
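The motif-selection step described above (motifs common in the target genome but rare in the background) can be sketched as a k-mer frequency ratio. This is an illustrative simplification, not the published SWGA procedure: the function names, the pseudocount, and the scoring formula are assumptions, and real primer selection would also weigh reverse complements, binding-site spacing, and melting temperature.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def rank_selective_motifs(target, background, k=8, pseudocount=1.0):
    """Rank target k-mers by (frequency in target) / (frequency in background).
    The pseudocount keeps motifs absent from the background finite."""
    target_counts = kmer_counts(target, k)
    background_counts = kmer_counts(background, k)
    target_total = max(len(target) - k + 1, 1)
    background_total = max(len(background) - k + 1, 1)
    scored = []
    for motif, n in target_counts.items():
        target_rate = n / target_total
        background_rate = (background_counts.get(motif, 0) + pseudocount) / background_total
        scored.append((target_rate / background_rate, motif))
    scored.sort(reverse=True)  # highest selectivity first
    return [(motif, score) for score, motif in scored]

# Hypothetical toy sequences: ACGT is frequent in the target, TTTT dominates the background.
ranked = rank_selective_motifs("ACGTACGTACGTTTTT", "TTTTTTTTTTTT", k=4)
print(ranked[0][0], ranked[-1][0])
```

In this toy case the most selective motif is the one repeated in the target and absent from the background, while the motif shared with the background falls to the bottom of the ranking.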
RNA-guided genome editing for target gene mutations in wheat.

PubMed

Upadhyay, Santosh Kumar; Kumar, Jitesh; Alok, Anshu; Tuli, Rakesh

2013-12-09

The clustered, regularly interspaced, short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system has been used as an efficient tool for genome editing. We report the application of CRISPR-Cas-mediated genome editing to wheat (Triticum aestivum), the most important food crop plant with a very large and complex genome. The mutations were targeted in the inositol oxygenase (inox) and phytoene desaturase (pds) genes using cell suspension culture of wheat and in the pds gene in leaves of Nicotiana benthamiana. The expression of chimeric guide RNAs (cgRNA) targeting single and multiple sites resulted in indel mutations in all the tested samples. The expression of Cas9 or sgRNA alone did not cause any mutation. The expression of duplex cgRNA with Cas9 targeting two sites in the same gene resulted in deletion of DNA fragment between the targeted sequences. Multiplexing the cgRNA could target two genes at one time. Target specificity analysis of cgRNA showed that mismatches at the 3' end of the target site abolished the cleavage activity completely. The mismatches at the 5' end reduced cleavage, suggesting that the off target effects can be abolished in vivo by selecting target sites with unique sequences at 3' end. This approach provides a powerful method for genome engineering in plants.
New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits.

PubMed

Saski, Christopher A; Li, Zhigang; Feltus, Frank A; Luo, Hong

2011-07-18

Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. The construction of the first switchgrass BAC library and comparative analysis of homoeologous harboring OsBRI1 orthologs present a glimpse into
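The "approximately 10 times" coverage figure can be checked with simple arithmetic from the numbers given in the abstract. The sketch below assumes that only the 95% of clones carrying inserts contribute and that the 120 kb average insert size applies to all of them; whether the authors applied exactly this correction is not stated, and the function name is hypothetical.

```python
def library_coverage(n_clones: int, insert_fraction: float,
                     mean_insert_kb: float, genome_gb: float) -> float:
    """Return the fold coverage of a clone library over a genome.

    coverage = (clones with inserts * mean insert size) / genome size
    """
    total_insert_kb = n_clones * insert_fraction * mean_insert_kb
    genome_kb = genome_gb * 1_000_000  # 1 Gb = 1,000,000 kb
    return total_insert_kb / genome_kb


# Figures from the abstract: 147,456 clones, 95% with inserts,
# 120 kb average insert, ~1.6 Gb effective genome.
coverage = library_coverage(147_456, 0.95, 120, 1.6)
print(f"{coverage:.1f}x")
```

Under these assumptions the library covers the effective genome about 10.5 times; dropping the 95% correction gives roughly 11 times. Both values are consistent with the rounded figure reported.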
Genome-wide determination of on-target and off-target characteristics for RNA-guided DNA methylation by dCas9 methyltransferases

PubMed Central

Lin, Lin; Liu, Yong; Xu, Fengping; Huang, Jinrong; Daugaard, Tina Fuglsang; Petersen, Trine Skov; Hansen, Bettina; Ye, Lingfei; Zhou, Qing; Fang, Fang; Yang, Ling; Li, Shengting; Fløe, Lasse; Jensen, Kristopher Torp; Shrock, Ellen; Chen, Fang; Yang, Huanming; Wang, Jian; Liu, Xin; Xu, Xun; Bolund, Lars; Nielsen, Anders Lade; Luo, Yonglun

2018-01-01

Background: Fusion of DNA methyltransferase domains to the nuclease-deficient clustered regularly interspaced short palindromic repeat (CRISPR) associated protein 9 (dCas9) has been used for epigenome editing, but the specificities of these dCas9 methyltransferases have not been fully investigated. Findings: We generated CRISPR-guided DNA methyltransferases by fusing the catalytic domain of DNMT3A or DNMT3B to the C terminus of the dCas9 protein from Streptococcus pyogenes and validated its on-target and global off-target characteristics. Using targeted quantitative bisulfite pyrosequencing, we prove that dCas9-BFP-DNMT3A and dCas9-BFP-DNMT3B can efficiently methylate the CpG dinucleotides flanking its target sites at different genomic loci (uPA and TGFBR3) in human embryonic kidney cells (HEK293T). Furthermore, we conducted whole genome bisulfite sequencing (WGBS) to address the specificity of our dCas9 methyltransferases. WGBS revealed that although dCas9-BFP-DNMT3A and dCas9-BFP-DNMT3B did not cause global methylation changes, a substantial number (more than 1000) of the off-target differentially methylated regions (DMRs) were identified. The off-target DMRs, which were hypermethylated in cells expressing dCas9 methyltransferase and guide RNAs, were predominantly found in promoter regions, 5′ untranslated regions, CpG islands, and DNase I hypersensitivity sites, whereas unexpected hypomethylated off-target DMRs were significantly enriched in repeated sequences. Through chromatin immunoprecipitation with massive parallel DNA sequencing analysis, we further revealed that these off-target DMRs were weakly correlated with dCas9 off-target binding sites. Using quantitative polymerase chain reaction, RNA sequencing, and fluorescence reporter cells, we also found that dCas9-BFP-DNMT3A and dCas9-BFP-DNMT3B can mediate transient inhibition of gene expression, which might be caused by dCas9-mediated de novo DNA methylation as well as interference with
Hyb-Seq: combining target enrichment and genome skimming for plant phylogenomics

Treesearch

Kevin Weitemier; Shannon C.K. Straub; Richard C. Cronn; Mark Fishbein; Roswitha Schmickl; Angela McDonnell; Aaron Liston

2014-01-01

• Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385...
SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand.

PubMed

Tang, Haibao; Bomhoff, Matthew D; Briones, Evan; Zhang, Liangsheng; Schnable, James C; Lyons, Eric

2015-11-11

The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true-negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind is capable of identifying conserved syntenic regions in any set of target genomes. SynFind is capable of reporting per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic data from over 15,000 organisms from all domains of life as well as supporting multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at http://genomevolution.org/CoGe/SynFind.pl. A video tutorial of SynFind using Phytophthora as an example is available at http://www.youtube.com/watch?v=2Agczny9Nyc. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Genome-wide Target Enrichment-aided Chip Design: a 66 K SNP Chip for Cashmere Goat.

PubMed

Qiao, Xian; Su, Rui; Wang, Yang; Wang, Ruijun; Yang, Ting; Li, Xiaokai; Chen, Wei; He, Shiyang; Jiang, Yu; Xu, Qiwu; Wan, Wenting; Zhang, Yaolei; Zhang, Wenguang; Chen, Jiang; Liu, Bin; Liu, Xin; Fan, Yixing; Chen, Duoyuan; Jiang, Huaizhi; Fang, Dongming; Liu, Zhihong; Wang, Xiaowen; Zhang, Yanjun; Mao, Danqing; Wang, Zhiying; Di, Ran; Zhao, Qianjun; Zhong, Tao; Yang, Huanming; Wang, Jian; Wang, Wen; Dong, Yang; Chen, Xiaoli; Xu, Xun; Li, Jinquan

2017-08-17

Compared with the commercially available single nucleotide polymorphism (SNP) chip based on the Bead Chip technology, the solution hybrid selection (SHS)-based target enrichment SNP chip is not only design-flexible, but also cost-effective for genotype sequencing. In this study, we propose to design an animal SNP chip using the SHS-based target enrichment strategy for the first time. As an update to the international collaboration on goat research, a 66 K SNP chip for cashmere goat was created from the whole-genome sequencing data of 73 individuals. Verification of this 66 K SNP chip with the whole-genome sequencing data of 436 cashmere goats showed that the SNP call rate was between 95.3% and 99.8%. The average sequencing depth for target SNPs was 40X. The capture regions were shown to be 200 bp that flank target SNPs. This chip was further tested in a genome-wide association analysis of cashmere fineness (fiber diameter). Several top hit loci were found marginally associated with signaling pathways involved in hair growth. These results demonstrate that the 66 K SNP chip is a useful tool in the genomic analyses of cashmere goats. The successful chip design shows that the SHS-based target enrichment strategy could be applied to SNP chip design in other species.
Redundancy analysis allows improved detection of methylation changes in large genomic regions.

PubMed

Ruiz-Arenas, Carlos; González, Juan R

2017-12-14

DNA methylation is an epigenetic process that regulates gene expression. Methylation can be modified by environmental exposures and changes in the methylation patterns have been associated with diseases. Methylation microarrays measure methylation levels at more than 450,000 CpGs in a single experiment, and the most common analysis strategy is to perform a single probe analysis to find methylation probes associated with the outcome of interest. However, methylation changes usually occur at the regional level: for example, genomic structural variants can affect methylation patterns in regions up to several megabases in length. Existing DMR methods provide lists of Differentially Methylated Regions (DMRs) of only up to a few kilobases in length, and cannot check if a target region is differentially methylated. Therefore, these methods are not suitable to evaluate methylation changes in large regions. To address these limitations, we developed a new DMR approach based on redundancy analysis (RDA) that assesses whether a target region is differentially methylated. Using simulated and real datasets, we compared our approach to three common DMR detection methods (Bumphunter, blockFinder, and DMRcate). We found that Bumphunter underestimated methylation changes and blockFinder showed poor performance. DMRcate showed poor power in the simulated datasets and low specificity in the real data analysis. Our method showed very high performance in all simulation settings, even with small sample sizes and subtle methylation changes, while controlling type I error. Other advantages of our method are: 1) it estimates the degree of association between the DMR and the outcome; 2) it can analyze a targeted region or region of interest; and 3) it can evaluate the simultaneous effects of different variables. The proposed methodology is implemented in MEAL, a Bioconductor package designed to facilitate the analysis of methylation data. We propose a multivariate approach to decipher whether an
Rescue of Targeted Regions of Mammalian Chromosomes by in Vivo Recombination in Yeast

PubMed Central

Kouprina, Natalya; Kawamoto, Kensaku; Barrett, J. Carl; Larionov, Vladimir; Koi, Minoru

1998-01-01

In contrast to other animal cell lines, the chicken pre-B cell lymphoma line, DT40, exhibits a high level of homologous recombination, which can be exploited to generate site-specific alterations in defined target genes or regions. In addition, the ability to generate human/chicken monochromosomal hybrids in the DT40 cell line opens a way for specific targeting of human genes. Here we describe a new strategy for direct isolation of a human chromosomal region that is based on targeting of the chromosome with a vector containing a yeast selectable marker, centromere, and an ARS element. This procedure allows rescue of the targeted region by transfection of total genomic DNA into yeast spheroplasts. Selection for the yeast marker results in isolation of chromosome sequences in the form of large circular yeast artificial chromosomes (YACs) up to 170 kb in size containing the targeted region. These YACs are generated by homologous recombination in yeast between common repeated sequences in the targeted chromosomal fragment. Alternatively, the targeted region can be rescued as linear YACs when a YAC fragmentation vector is included in the yeast transformation mixture. Because the entire isolation procedure of the chromosomal region, once a target insertion is obtained, can be accomplished in ∼1 week, the new method greatly expands the utility of the homologous recombination-proficient DT40 chicken cell system. PMID:9647640
Pan-genome analysis of human gastric pathogen H. pylori: comparative genomics and pathogenomics approaches to identify regions associated with pathogenicity and prediction of potential core therapeutic targets.

PubMed

Ali, Amjad; Naz, Anam; Soares, Siomar C; Bakhtiar, Marriam; Tiwari, Sandeep; Hassan, Syed S; Hanan, Fazal; Ramos, Rommel; Pereira, Ulisses; Barh, Debmalya; Figueiredo, Henrique César Pereira; Ussery, David W; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

2015-01-01

Helicobacter pylori is a human gastric pathogen implicated as the major cause of peptic ulcer and second leading cause of gastric cancer (~70%) around the world. Conversely, an increased resistance to antibiotics and hindrances in the development of vaccines against H. pylori are observed. Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan-genome approach; the predicted conserved gene families (1,193) constitute ~77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. A total of 28 nonhost homolog proteins were characterized as universal therapeutic targets against H. pylori based on their functional annotation and protein-protein interaction. Finally, pathogenomics and genome plasticity analysis revealed 3 highly conserved and 2 highly variable putative pathogenicity islands in all of the H. pylori genomes analyzed.
Genome-wide evidence for local DNA methylation spreading from small RNA-targeted sequences in Arabidopsis.

PubMed

Ahmed, Ikhlak; Sarazin, Alexis; Bowler, Chris; Colot, Vincent; Quesneville, Hadi

2011-09-01

Transposable elements (TEs) and their relics play major roles in genome evolution. However, mobilization of TEs is usually deleterious and strongly repressed. In plants and mammals, this repression is typically associated with DNA methylation, but the relationship between this epigenetic mark and TE sequences has not been investigated systematically. Here, we present an improved annotation of TE sequences and use it to analyze genome-wide DNA methylation maps obtained at single-nucleotide resolution in Arabidopsis. We show that although the majority of TE sequences are methylated, ∼26% are not. Moreover, a significant fraction of TE sequences densely methylated at CG, CHG and CHH sites (where H = A, T or C) have no or few matching small interfering RNA (siRNAs) and are therefore unlikely to be targeted by the RNA-directed DNA methylation (RdDM) machinery. We provide evidence that these TE sequences acquire DNA methylation through spreading from adjacent siRNA-targeted regions. Further, we show that although both methylated and unmethylated TE sequences located in euchromatin tend to be more abundant closer to genes, this trend is least pronounced for methylated, siRNA-targeted TE sequences located 5' to genes. Based on these and other findings, we propose that spreading of DNA methylation through promoter regions explains at least in part the negative impact of siRNA-targeted TE sequences on neighboring gene expression.
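The three sequence contexts named above (CG, CHG and CHH, where H = A, T or C) can be made concrete with a small classifier. This is a minimal sketch, assuming forward-strand coordinates and a hypothetical function name; it is not taken from the study's annotation pipeline.

```python
from typing import Optional

H_BASES = {"A", "C", "T"}  # H = any base except G


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the cytosine at 0-based position pos as 'CG', 'CHG' or 'CHH'.

    Returns None if the base is not a C, or if the downstream bases needed to
    decide are missing (sequence end) or ambiguous (e.g. N).
    """
    seq = seq.upper()
    if pos < 0 or pos >= len(seq) or seq[pos] != "C":
        return None
    first = seq[pos + 1:pos + 2]   # base immediately 3' of the C ("" at the end)
    second = seq[pos + 2:pos + 3]  # base two positions 3' of the C
    if first == "G":
        return "CG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None


# Toy sequence, made up for illustration: C at positions 1, 4, 7 and 10.
toy = "ACGTCAGCTTC"
contexts = {i: cytosine_context(toy, i) for i, base in enumerate(toy) if base == "C"}
print(contexts)
```

Cytosines on the reverse strand would be handled the same way after taking the reverse complement of the sequence.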
The genome editing revolution: A CRISPR-Cas TALE off-target story.

PubMed

Stella, Stefano; Montoya, Guillermo

2016-07-01

In the last 10 years, we have witnessed a blooming of targeted genome editing systems and applications. The area was revolutionized by the discovery and characterization of the transcription activator-like effector proteins, which are easier to engineer to target new DNA sequences than the previously available DNA binding templates, zinc fingers and meganucleases. Recently, the area experienced a quantum leap because of the introduction of the clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein (Cas) system. This ribonucleoprotein complex protects bacteria from invading DNAs, and it was adapted to be used in genome editing. The CRISPR ribonucleic acid (RNA) molecule guides the Cas9 nuclease to the specific DNA site to cleave the DNA target. Two years and more than 1000 publications later, the CRISPR-Cas system has become the main tool for genome editing in many laboratories. Currently the targeted genome editing technology has been used in many fields and may be a possible approach for human gene therapy. Furthermore, it can also be used to modify the genomes of model organisms for studying human pathways or to improve key organisms for biotechnological applications, such as plants, livestock, yeasts and bacterial strains. © 2016 The Authors. BioEssays published by WILEY Periodicals, Inc.
Seamless Genome Editing in Rice via Gene Targeting and Precise Marker Elimination.

PubMed

Nishizawa-Yokoi, Ayako; Saika, Hiroaki; Toki, Seiichi

2016-01-01

Positive-negative selection using hygromycin phosphotransferase (hpt) and diphtheria toxin A-fragment (DT-A) as positive and negative selection markers, respectively, allows enrichment of cells harboring target genes modified via gene targeting (GT). We have developed a successful GT system employing positive-negative selection and subsequent precise marker excision via the piggyBac transposon derived from the cabbage looper moth to introduce desired modifications into target genes in the rice genome. This approach could be applied to the precision genome editing of almost all endogenous genes throughout the genome, at least in rice.
Telomere maintenance through recruitment of internal genomic regions.

PubMed

Seo, Beomseok; Kim, Chuna; Hills, Mark; Sung, Sanghyun; Kim, Hyesook; Kim, Eunkyeong; Lim, Daisy S; Oh, Hyun-Seok; Choi, Rachael Mi Jung; Chun, Jongsik; Shim, Jaegal; Lee, Junho

2015-09-18

Cells surviving crisis are often tumorigenic and their telomeres are commonly maintained through the reactivation of telomerase. However, surviving cells occasionally activate a recombination-based mechanism called alternative lengthening of telomeres (ALT). Here we establish stably maintained survivors in telomerase-deleted Caenorhabditis elegans that escape from sterility by activating ALT. ALT survivors trans-duplicate an internal genomic region, which is already cis-duplicated to chromosome ends, across the telomeres of all chromosomes. These 'Template for ALT' (TALT) regions consist of a block of genomic DNA flanked by telomere-like sequences, and are different between two genetic backgrounds. We establish a model that an ancestral duplication of a donor TALT region to a proximal telomere region forms a genomic reservoir ready to be incorporated into telomeres on ALT activation.
Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome.

PubMed

Wang, Yi; Liu, Xianju; Ren, Chong; Zhong, Gan-Yuan; Yang, Long; Li, Shaohua; Liang, Zhenchang

2016-04-21

CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of humans, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific target sites for CRISPR/Cas9 have been computationally identified for several annual model and crop species, but such sites have not been reported for perennial, woody fruit species. In this study, we identified and characterized five types of CRISPR/Cas9 target sites in the widely cultivated grape species Vitis vinifera and developed a user-friendly database for editing grape genomes in the future. A total of 35,767,960 potential CRISPR/Cas9 target sites were identified from grape genomes in this study. Among them, 22,597,817 target sites were mapped to specific genomic locations and 7,269,788 were found to be highly specific. Protospacers and PAMs were found to distribute uniformly and abundantly in the grape genomes. They were present in all the structural elements of genes with the coding region having the highest abundance. Five PAM types, TGG, AGG, GGG, CGG and NGG, were observed. With the exception of the NGG type, they were abundantly present in the grape genomes. Synteny analysis of similar genes revealed that the synteny of protospacers matched the synteny of homologous genes. A user-friendly database containing protospacers and detailed information of the sites was developed and is available for public use at the Grape-CRISPR website ( http://biodb.sdau.edu.cn/gc/index.html ). Grape genomes harbour millions of potential CRISPR/Cas9 target sites. These sites are widely distributed among and within chromosomes with predominant abundance in the coding regions of genes. We developed a publicly-accessible Grape-CRISPR database for facilitating the use of the CRISPR/Cas9 system as a genome editing tool for functional studies and molecular breeding of grapes. Among
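Computational identification of candidate target sites of this kind usually comes down to scanning both strands for a protospacer followed by a PAM. The sketch below assumes a 20-nt protospacer and an NGG PAM; the function names and the toy sequence are hypothetical, and it does not reproduce the specificity filtering or site typing used to build the Grape-CRISPR database.

```python
import re

# Lookahead so that overlapping sites are all reported:
# group 1 = 20-nt protospacer, group 2 = 3-nt PAM matching NGG.
PAM_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(seq: str):
    """List (strand, start, protospacer, pam) for every 20-nt + NGG site.

    For the '-' strand, start is the 0-based offset within the
    reverse-complemented sequence, not within the input sequence.
    """
    seq = seq.upper()
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in PAM_SITE.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Toy sequence, made up for illustration only.
toy = "GATTACAGATTACAGATTACATGGTT"
sites = find_target_sites(toy)
print(sites)
```

Tallying the second element of each PAM (for example with collections.Counter) would give per-PAM counts of the sort summarised in the abstract.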
Targeted and genome-scale methylomics reveals gene body signatures in human cell lines

PubMed Central

Ball, Madeleine Price; Li, Jin Billy; Gao, Yuan; Lee, Je-Hyuk; LeProust, Emily; Park, In-Hyun; Xie, Bin; Daley, George Q.; Church, George M.

2012-01-01

Cytosine methylation, an epigenetic modification of DNA, is a target of growing interest for developing high throughput profiling technologies. Here we introduce two new, complementary techniques for cytosine methylation profiling utilizing next generation sequencing technology: bisulfite padlock probes (BSPPs) and methyl sensitive cut counting (MSCC). In the first method, we designed a set of ~10,000 BSPPs distributed over the ENCODE pilot project regions to take advantage of existing expression and chromatin immunoprecipitation data. We observed a pattern of low promoter methylation coupled with high gene body methylation in highly expressed genes. Using the second method, MSCC, we gathered genome-scale data for 1.4 million HpaII sites and confirmed that gene body methylation in highly expressed genes is a consistent phenomenon over the entire genome. Our observations highlight the usefulness of techniques which are not inherently or intentionally biased in favor of only profiling particular subsets like CpG islands or promoter regions. PMID:19329998
Off-target Effects in CRISPR/Cas9-mediated Genome Engineering

PubMed Central

Zhang, Xiao-Hui; Tee, Louis Y; Wang, Xiao-Gang; Huang, Qun-Shan; Yang, Shi-Hua

2015-01-01

CRISPR/Cas9 is a versatile genome-editing technology that is widely used for studying the functionality of genetic elements, creating genetically modified organisms as well as preclinical research of genetic disorders. However, the high frequency of off-target activity (≥50%), that is, RGEN (RNA-guided endonuclease)-induced mutations at sites other than the intended on-target site, is one major concern, especially for therapeutic and clinical applications. Here, we review the basic mechanisms underlying off-target cutting in the CRISPR/Cas9 system, methods for detecting off-target mutations, and strategies for minimizing off-target cleavage. Improving the specificity of the CRISPR/Cas9 system will provide solid genotype–phenotype correlations, and thus enable faithful interpretation of genome-editing data, which will certainly facilitate the basic and clinical application of this technology. PMID:26575098
EGFR-targeted therapies in the post-genomic era.

PubMed

Xu, Mary Jue; Johnson, Daniel E; Grandis, Jennifer R

2017-09-01

Over 90% of head and neck cancers overexpress the epidermal growth factor receptor (EGFR). In diverse tumor types, EGFR overexpression has been associated with poorer prognosis and outcomes. Therapies targeting EGFR include monoclonal antibodies, tyrosine kinase inhibitors, phosphatidylinositol 3-kinase (PI3K) inhibitors, and antisense gene therapy. Few EGFR-targeted therapeutics are approved for clinical use. The monoclonal antibody cetuximab is a Food and Drug Administration (FDA)-approved EGFR-targeted therapy, yet has exhibited modest benefit in clinical trials. The humanized monoclonal antibody nimotuzumab is also approved for head and neck cancers in Cuba, Argentina, Colombia, Peru, India, Ukraine, Ivory Coast, and Gabon in addition to nasopharyngeal cancers in China. Few other EGFR-targeted therapeutics for head and neck cancers have led to as significant responses as seen in lung carcinomas, for instance. Recent genome sequencing of head and neck tumors has helped identify patient subgroups with improved response to EGFR inhibitors, for example, cetuximab in patients with the KRAS-variant and the tyrosine kinase inhibitor erlotinib for tumors harboring MAPK1 E322K mutations. Genome sequencing has furthermore broadened our understanding of dysregulated pathways, holding the potential to enhance the benefit derived from therapies targeting EGFR.

Cell-Free DNA Analysis of Targeted Genomic Regions in Maternal Plasma for Non-Invasive Prenatal Testing of Trisomy 21, Trisomy 18, Trisomy 13, and Fetal Sex.

PubMed

Koumbaris, George; Kypri, Elena; Tsangaras, Kyriakos; Achilleos, Achilleas; Mina, Petros; Neofytou, Maria; Velissariou, Voula; Christopoulou, Georgia; Kallikas, Ioannis; González-Liñán, Alicia; Benusiene, Egle; Latos-Bielenska, Anna; Marek, Pietryga; Santana, Alfredo; Nagy, Nikoletta; Széll, Márta; Laudanski, Piotr; Papageorgiou, Elisavet A; Ioannides, Marios; Patsalis, Philippos C

2016-06-01

There is a great need for highly accurate, cost-effective technologies that could facilitate the widespread adoption of noninvasive prenatal testing (NIPT). We developed an assay based on the targeted analysis of cell-free DNA for the detection of fetal aneuploidies of chromosomes 21, 18, and 13. This method enabled the capture and analysis of selected genomic regions of interest. An advanced fetal fraction estimation and aneuploidy determination algorithm was also developed. This assay allowed for accurate counting and assessment of chromosomal regions of interest. The analytical performance of the assay was evaluated in a blind study of 631 samples derived from pregnancies of at least 10 weeks of gestation that had also undergone invasive testing. Our blind study exhibited 100% diagnostic sensitivity and specificity and correctly classified 52/52 (95% CI, 93.2%-100%) cases of trisomy 21, 16/16 (95% CI, 79.4%-100%) cases of trisomy 18, 5/5 (95% CI, 47.8%-100%) cases of trisomy 13, and 538/538 (95% CI, 99.3%-100%) normal cases. The test also correctly identified fetal sex in all cases (95% CI, 99.4%-100%). One sample failed prespecified assay quality control criteria, and 19 samples were nonreportable because of low fetal fraction. The extent to which free fetal DNA testing can be applied as a universal screening tool for trisomy 21, 18, and 13 depends mainly on assay accuracy and cost. Cell-free DNA analysis of targeted genomic regions in maternal plasma enables accurate and cost-effective noninvasive fetal aneuploidy detection, which is critical for widespread adoption of NIPT. © 2016 American Association for Clinical Chemistry.
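The confidence intervals quoted in this abstract are consistent with exact (Clopper-Pearson) binomial limits, although the abstract does not name the method used. When all n cases are classified correctly, the two-sided 95% lower limit reduces to 0.025^(1/n). A minimal sketch under that assumption (the function name is illustrative, not from the study):

```python
def exact_lower_limit_all_correct(n, confidence=0.95):
    """Lower Clopper-Pearson limit for n successes out of n trials.

    With x = n the lower limit p solves p**n = alpha / 2.
    Assumption: the study used an exact binomial interval.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# Group sizes taken from the abstract.
groups = [("trisomy 21", 52), ("trisomy 18", 16), ("trisomy 13", 5), ("normal", 538)]
for label, n in groups:
    lower = 100 * exact_lower_limit_all_correct(n)
    print(f"{label}: {n}/{n} correct, lower 95% limit = {lower:.1f}%")
```

For the four group sizes above, this gives lower limits of 93.2%, 79.4%, 47.8%, and 99.3%, matching the reported values. The width of the interval for trisomy 13 reflects the small number of cases (5), not a weaker result.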
Partial DNA-guided Cas9 enables genome editing with reduced off-target activity

PubMed Central

Yin, Hao; Song, Chun-Qing; Suresh, Sneha; Kwan, Suet-Yan; Wu, Qiongqiong; Walsh, Stephen; Ding, Junmei; Bogorad, Roman L; Zhu, Lihua Julie; Wolfe, Scot A; Koteliansky, Victor; Xue, Wen; Langer, Robert; Anderson, Daniel G

2018-01-01

CRISPR–Cas9 is a versatile RNA-guided genome editing tool. Here we demonstrate that partial replacement of RNA nucleotides with DNA nucleotides in CRISPR RNA (crRNA) enables efficient gene editing in human cells. This strategy of partial DNA replacement retains on-target activity when used with both crRNA and sgRNA, as well as with multiple guide sequences. Partial DNA replacement also works for crRNA of Cpf1, another CRISPR system. We find that partial DNA replacement in the guide sequence significantly reduces off-target genome editing through focused analysis of off-target cleavage, measurement of mismatch tolerance and genome-wide profiling of off-target sites. Using the structure of the Cas9–sgRNA complex as a guide, the majority of the 3′ end of crRNA can be replaced with DNA nucleotides, and the 5′- and 3′-DNA-replaced crRNA enables efficient genome editing. Cas9 guided by a DNA–RNA chimera may provide a generalized strategy to reduce both the cost and the off-target genome editing in human cells. PMID:29377001
An Adenovirus DNA Replication Factor, but Not Incoming Genome Complexes, Targets PML Nuclear Bodies.

PubMed

Komatsu, Tetsuro; Nagata, Kyosuke; Wodrich, Harald

2016-02-01

Promyelocytic leukemia protein nuclear bodies (PML-NBs) are subnuclear domains implicated in cellular antiviral responses. Despite the antiviral activity, several nuclear replicating DNA viruses use the domains as deposition sites for the incoming viral genomes and/or as sites for viral DNA replication, suggesting that PML-NBs are functionally relevant during early viral infection to establish productive replication. Although PML-NBs and their components have also been implicated in the adenoviral life cycle, it remains unclear whether incoming adenoviral genome complexes target PML-NBs. Here we show using immunofluorescence and live-cell imaging analyses that incoming adenovirus genome complexes neither localize at nor recruit components of PML-NBs during early phases of infection. We further show that the viral DNA binding protein (DBP), an early expressed viral gene and essential DNA replication factor, independently targets PML-NBs. We show that DBP oligomerization is required to selectively recruit the PML-NB components Sp100 and USP7. Depletion experiments suggest that the absence of one PML-NB component might not affect the recruitment of other components toward DBP oligomers. Thus, our findings suggest a model in which an adenoviral DNA replication factor, but not incoming viral genome complexes, targets and modulates PML-NBs to support a conducive state for viral DNA replication and argue against a generalized concept that PML-NBs target incoming viral genomes. The immediate fate upon nuclear delivery of genomes of incoming DNA viruses is largely unclear. Early reports suggested that incoming genomes of herpesviruses are targeted and repressed by PML-NBs immediately upon nuclear import. Genome localization and/or viral DNA replication has also been observed at PML-NBs for other DNA viruses. Thus, it was suggested that PML-NBs may immediately sense and target nuclear viral genomes and hence serve as sites for deposition of incoming viral genomes and
DArT Markers Effectively Target Gene Space in the Rye Genome.

PubMed

Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

2016-01-01

Large genome size and complexity considerably hamper genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops, and repetitive sequences account for over 90% of its length. Diversity Arrays Technology (DArT) is a high-throughput genotyping method in which preferential sampling of gene-rich regions is achieved through the use of methylation-sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and, following a redundancy analysis, assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total, 515 DArT sequences could be incorporated into publicly available rye genome zippers, providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2Go pipeline, we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high-density consensus maps of rye. Furthermore, we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes, we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity-based identification of candidate genes.
Transfer RNA gene-targeted integration: an adaptation of retrotransposable elements to survive in the compact Dictyostelium discoideum genome.

PubMed

Winckler, T; Szafranski, K; Glöckner, G

2005-01-01

Almost every organism carries along a multitude of molecular parasites known as transposable elements (TEs). TEs influence their host genomes in many ways by expanding genome size and complexity, rearranging genomic DNA, mutagenizing host genes, and altering transcription levels of nearby genes. The eukaryotic microorganism Dictyostelium discoideum is attractive for the study of fundamental biological phenomena such as intercellular communication, formation of multicellularity, cell differentiation, and morphogenesis. D. discoideum has a highly compacted, haploid genome with less than 1 kb of genomic DNA separating coding regions. Nevertheless, the D. discoideum genome is loaded with 10% of TEs that managed to settle and survive in this inhospitable environment. In depth analysis of D. discoideum genome project data has provided intriguing insights into the evolutionary challenges that mobile elements face when they invade compact genomes. Two different mechanisms are used by D. discoideum TEs to avoid disruption of host genes upon retrotransposition. Several TEs have invented the specific targeting of tRNA gene-flanking regions as a means to avoid integration into coding regions. These elements have been dispersed on all chromosomes, closely following the distribution of tRNA genes. By contrast, TEs that lack bona fide integration specificities show a strong bias to nested integration, thus forming large TE clusters at certain chromosomal loci that are hardly resolved by bioinformatics approaches. We summarize our current view of D. discoideum TEs and present new data from the analysis of the complete sequences of D. discoideum chromosomes 1 and 2, which comprise more than one third of the total genome.
Enhancer scanning to locate regulatory regions in genomic loci

PubMed Central

Buckley, Melissa; Gjyshi, Anxhela; Mendoza-Fandiño, Gustavo; Baskin, Rebekah; Carvalho, Renato S.; Carvalho, Marcelo A.; Woods, Nicholas T.; Monteiro, Alvaro N.A.

2016-01-01

The present protocol provides a rapid, streamlined and scalable strategy to systematically scan genomic regions for the presence of transcriptional regulatory regions active in a specific cell type. It creates genomic tiles spanning a region of interest that are subsequently cloned by recombination into a luciferase reporter vector containing the Simian Virus 40 promoter. Tiling clones are transfected into specific cell types to test for the presence of transcriptional regulatory regions. The protocol includes testing of different SNP (single nucleotide polymorphism) alleles to determine their effect on regulatory activity. This procedure provides a systematic framework to identify candidate functional SNPs within a locus during functional analysis of genome-wide association studies. This protocol adapts and combines previous well-established molecular biology methods to provide a streamlined strategy, based on automated primer design and recombinational cloning to rapidly go from a genomic locus to a set of candidate functional SNPs in eight weeks. PMID:26658467
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

PubMed

Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

2018-05-31

In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, based on supervised deep learning approaches, for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
Discovery of novel targets for multi-epitope vaccines: Screening of HIV-1 genomes using association rule mining

PubMed Central

Paul, Sinu; Piontkivska, Helen

2009-01-01

Background: Studies have shown that in the genome of human immunodeficiency virus (HIV-1), regions responsible for interactions with the host's immune system, namely, cytotoxic T-lymphocyte (CTL) epitopes, tend to cluster together in relatively conserved regions. On the other hand, "epitope-less" regions or regions with relatively low density of epitopes tend to be more variable. However, very little is known about relationships among epitopes from different genes, in other words, whether particular epitopes from different genes would occur together in the same viral genome. To identify CTL epitopes in different genes that co-occur in HIV genomes, association rule mining was used. Results: Using a set of 189 best-defined HIV-1 CTL/CD8+ epitopes from 9 different protein-coding genes, as described by Frahm, Linde & Brander (2007), we examined the complete genomic sequences of 62 reference HIV sequences (including 13 subtypes and sub-subtypes with approximately 4 representative sequences for each subtype or sub-subtype, and 18 circulating recombinant forms). The results showed that despite inclusion of recombinant sequences that would be expected to break up associations of epitopes in different genes when two different genomes are recombined, there exist particular combinations of epitopes (epitope associations) that occur repeatedly across the world-wide population of HIV-1. For example, Pol epitope LFLDGIDKA is found to be significantly associated with epitopes GHQAAMQML and FLKEKGGL from Gag and Nef, respectively, and this association rule is observed even among circulating recombinant forms. Conclusion: We have identified CTL epitope combinations co-occurring in HIV-1 genomes including different subtypes and recombinant forms. Such co-occurrence has important implications for design of complex vaccines (multi-epitope vaccines) and/or drugs that would target multiple HIV-1 regions at once and, thus, may be expected to overcome challenges associated with viral escape
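Association rule mining typically scores a rule such as {A} -> {B, C} by its support, confidence, and lift. The sketch below shows those three quantities on toy data. Only the epitope names come from the abstract; the four "genomes" are hypothetical presence/absence sets, and the function name is illustrative rather than the authors' implementation.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets, each holding the items (here, epitopes)
    present in one record (here, one genome).
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n                            # P(A and B)
    confidence = n_both / n_ante if n_ante else 0.0  # P(B | A)
    lift = confidence / (n_cons / n) if n_cons else 0.0  # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical toy data: which epitopes are present in each of four genomes.
genomes = [
    {"LFLDGIDKA", "GHQAAMQML", "FLKEKGGL"},
    {"LFLDGIDKA", "GHQAAMQML", "FLKEKGGL"},
    {"GHQAAMQML"},
    {"FLKEKGGL"},
]

support, confidence, lift = rule_metrics(
    genomes, {"LFLDGIDKA"}, {"GHQAAMQML", "FLKEKGGL"}
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the consequent epitopes appear with the antecedent more often than their overall frequency would predict, which is the sense in which epitopes "co-occur".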
Enhancing Targeted Genomic DNA Editing in Chicken Cells Using the CRISPR/Cas9 System

PubMed Central

Wang, Ling; Yang, Likai; Guo, Yijie; Du, Weili; Yin, Yajun; Zhang, Tao; Lu, Hongzhao

2017-01-01

The CRISPR/Cas9 system has enabled highly efficient genome targeted editing for various organisms. However, few studies have focused on CRISPR/Cas9 nuclease-mediated chicken genome editing compared with mammalian genomes. The current study combined CRISPR with yeast Rad52 (yRad52) to enhance targeted genomic DNA editing in chicken DF-1 cells. The efficiency of CRISPR/Cas9 nuclease-induced targeted mutations in the chicken genome was increased to 41.9% via the enrichment of the dual-reporter surrogate system. In addition, the combined effect of CRISPR nuclease and yRad52 dramatically increased the efficiency of the targeted substitution in the myostatin gene using 50-mer oligodeoxynucleotides (ssODN) as the donor DNA, resulting in a 36.7% editing efficiency after puromycin selection. Furthermore, based on the effect of yRad52, the frequency of exogenous gene integration in the chicken genome was more than 3-fold higher than that without yRad52. Collectively, these results suggest that ssODN is an ideal donor DNA for targeted substitution and that CRISPR/Cas9 combined with yRad52 significantly enhances chicken genome editing. These findings could be extensively applied in other organisms. PMID:28068387
A resource for characterizing genome-wide binding and putative target genes of transcription factors expressed during secondary growth and wood formation in Populus

Treesearch

Lijun Liu; Trevor Ramsay; Matthew S. Zinkgraf; David Sundell; Nathaniel Robert Street; Vladimir Filkov; Andrew Groover

2015-01-01

Identifying transcription factor target genes is essential for modeling the transcriptional networks underlying developmental processes. Here we report a chromatin immunoprecipitation sequencing (ChIP-seq) resource consisting of genome-wide binding regions and associated putative target genes for four Populus homeodomain transcription factors...
Systems genetics for drug target discovery

PubMed Central

Penrod, Nadia M.; Cowper-Sal·lari, Richard; Moore, Jason H.

2011-01-01

The collection and analysis of genomic data has the potential to reveal novel druggable targets by providing insight into the genetic basis of disease. However, the number of drugs, targeting new molecular entities, approved by the US Food and Drug Administration (FDA) has not increased in the years since the collection of genomic data has become commonplace. The paucity of translatable results can be partly attributed to conventional analysis methods that test one gene at a time in an effort to identify disease-associated factors as candidate drug targets. By disengaging genetic factors from their position within the genetic regulatory system, much of the information stored within the genomic data set is lost. Here we discuss how genomic data is used to identify disease-associated genes or genomic regions, how disease-associated regions are validated as functional targets, and the role network analysis can play in bridging the gap between data generation and effective drug target identification. PMID:21862141
Augmenting Chinese hamster genome assembly by identifying regions of high confidence.

PubMed

Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou

2016-09-01

Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequences of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high-confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome: one by de novo assembly from shotgun sequencing reads, and one by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high-confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78% of the assembled genome. Further, a genome-wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome-scale collinearity was complemented with EST-based synteny, which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and can potentially facilitate the engineering of cell lines. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
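The first approach, keeping only regions on which two independent assemblies agree, can be pictured as an interval intersection. The sketch below is a deliberate simplification: it assumes both assemblies have already been aligned onto a shared coordinate system and reduced to sorted, non-overlapping half-open intervals. The coordinates and function name are hypothetical, and the study's actual sequence-level comparison is not described in the abstract.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # the two current intervals overlap
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical regions supported by each assembly on a 400 bp toy scaffold.
assembly_1 = [(0, 100), (150, 300)]
assembly_2 = [(50, 200), (250, 400)]

consensus = intersect_intervals(assembly_1, assembly_2)
covered = sum(end - start for start, end in consensus)
print(consensus, f"{covered / 400:.1%} of the toy scaffold")
```

The fraction of the genome covered by such consensus intervals is the kind of figure summarized by the "at least 78%" statement, although the real calculation depends on how the two assemblies are aligned.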
Is mammalian chromosomal evolution driven by regions of genome fragility?

PubMed Central

Ruiz-Herrera, Aurora; Castresana, Jose; Robinson, Terence J

2006-01-01

Background: A fundamental question in comparative genomics concerns the identification of mechanisms that underpin chromosomal change. In an attempt to shed light on the dynamics of mammalian genome evolution, we analyzed the distribution of syntenic blocks, evolutionary breakpoint regions, and evolutionary breakpoints taken from public databases available for seven eutherian species (mouse, rat, cattle, dog, pig, cat, and horse) and the chicken, and examined these for correspondence with human fragile sites and tandem repeats. Results: Our results confirm previous investigations that showed the presence of chromosomal regions in the human genome that have been repeatedly used, as illustrated by a high breakpoint accumulation in certain chromosomes and chromosomal bands. We show, however, that there is a striking correspondence between fragile site location, the positions of evolutionary breakpoints, and the distribution of tandem repeats throughout the human genome, which similarly reflect a non-uniform pattern of occurrence. Conclusion: These observations provide further evidence that certain chromosomal regions in the human genome have been repeatedly used in the evolutionary process. As a consequence, the genome is a composite of fragile regions prone to reorganization that have been conserved in different lineages, and genomic tracts that do not exhibit the same levels of evolutionary plasticity. PMID:17156441
Whole-Genome Thermodynamic Analysis Reduces siRNA Off-Target Effects

PubMed Central

Chen, Xi; Liu, Peng; Chou, Hui-Hsien

2013-01-01

Small interfering RNAs (siRNAs) are important tools for knocking down targeted genes, and have been widely applied to biological and biomedical research. To design siRNAs, two important aspects must be considered: the potency in knocking down target genes and the off-target effect on any nontarget genes. Although many studies have produced useful tools to design potent siRNAs, off-target prevention has mostly been delegated to sequence-level alignment tools such as BLAST. We hypothesize that whole-genome thermodynamic analysis can identify potential off-targets with higher precision and help us avoid siRNAs that may have strong off-target effects. To validate this hypothesis, two siRNA sets were designed to target three human genes IDH1, ITPR2 and TRIM28. They were selected from the output of two popular siRNA design tools, siDirect and siDesign. Both siRNA design tools have incorporated sequence-level screening to avoid off-targets, thus their output is believed to be optimal. However, one of the sets we tested has off-target genes predicted by Picky, a whole-genome thermodynamic analysis tool. Picky can identify off-target genes that may hybridize to a siRNA within a user-specified melting temperature range. Our experiments validated that some off-target genes predicted by Picky can indeed be inhibited by siRNAs. Similar experiments were performed using commercially available siRNAs and a few off-target genes were also found to be inhibited as predicted by Picky. In summary, we demonstrate that whole-genome thermodynamic analysis can identify off-target genes that are missed in sequence-level screening. Because Picky prediction is deterministic according to thermodynamics, if a siRNA candidate has no Picky predicted off-targets, it is unlikely to cause off-target effects. Therefore, we recommend including Picky as an additional screening step in siRNA design. PMID:23484018
DArT Markers Effectively Target Gene Space in the Rye Genome

PubMed Central

Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

2016-01-01

Large genome size and complexity considerably hamper genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops, and repetitive sequences account for over 90% of its length. Diversity Arrays Technology (DArT) is a high-throughput genotyping method in which preferential sampling of gene-rich regions is achieved through the use of methylation-sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and, following a redundancy analysis, assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total, 515 DArT sequences could be incorporated into publicly available rye genome zippers, providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2Go pipeline, we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high-density consensus maps of rye. Furthermore, we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes, we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity-based identification of candidate genes. PMID:27833625
The genome editing toolbox: a spectrum of approaches for targeted modification.

PubMed

Cheng, Joseph K; Alper, Hal S

2014-12-01

The increase in quality, quantity, and complexity of recombinant products heavily drives the need to predictably engineer model and complex (mammalian) cell systems. However, until recently, limited tools offered the ability to precisely manipulate their genomes, thus impeding the full potential of rational cell line development processes. Targeted genome editing can combine the advances in synthetic and systems biology with current cellular hosts to further push productivity and expand the product repertoire. This review highlights recent advances in targeted genome editing techniques, discussing some of their capabilities and limitations and their potential to aid advances in pharmaceutical biotechnology. Copyright © 2014 Elsevier Ltd. All rights reserved.
GUIDE-Seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases

PubMed Central

Nguyen, Nhu T.; Liebers, Matthew; Topkar, Ved V.; Thapar, Vishal; Wyvekens, Nicolas; Khayter, Cyd; Iafrate, A. John; Le, Long P.; Aryee, Martin J.; Joung, J. Keith

2014-01-01

CRISPR RNA-guided nucleases (RGNs) are widely used genome-editing reagents, but methods to delineate their genome-wide off-target cleavage activities have been lacking. Here we describe an approach for global detection of DNA double-stranded breaks (DSBs) introduced by RGNs and potentially other nucleases. This method, called Genome-wide Unbiased Identification of DSBs Enabled by Sequencing (GUIDE-Seq), relies on capture of double-stranded oligodeoxynucleotides into breaks. Application of GUIDE-Seq to thirteen RGNs in two human cell lines revealed wide variability in RGN off-target activities and unappreciated characteristics of off-target sequences. The majority of identified sites were not detected by existing computational methods or ChIP-Seq. GUIDE-Seq also identified RGN-independent genomic breakpoint ‘hotspots’. Finally, GUIDE-Seq revealed that truncated guide RNAs exhibit substantially reduced RGN-induced off-target DSBs. Our experiments define the most rigorous framework for genome-wide identification of RGN off-target effects to date and provide a method for evaluating the safety of these nucleases prior to clinical use. PMID:25513782
Conserved microstructure of the Brassica B Genome of Brassica nigra in relation to homologous regions of Arabidopsis thaliana, B. rapa and B. oleracea

PubMed Central

2013-01-01

Background: The Brassica B genome is known to carry several important traits, yet there has been limited analysis of its underlying genome structure, especially in comparison to the closely related A and C genomes. A bacterial artificial chromosome (BAC) library of Brassica nigra was developed and screened with 17 genes from a 222 kb region of A. thaliana that had been well characterised in both the Brassica A and C genomes. Results: Fingerprinting of 483 apparently non-redundant clones defined physical contigs for the corresponding regions in B. nigra. The target region is duplicated in A. thaliana and six homologous contigs were found in B. nigra resulting from the whole genome triplication event shared by the Brassiceae tribe. BACs representative of each region were sequenced to elucidate the level of microscale rearrangements across the Brassica species divide. Conclusions: Although the B genome species separated from the A/C lineage some 6 Mya, comparisons between the three paleopolyploid Brassica genomes revealed extensive conservation of gene content and sequence identity. The level of fractionation or gene loss varied across genomes and genomic regions; however, the greatest loss of genes was observed to be common to all three genomes. One large-scale chromosomal rearrangement differentiated the B genome, suggesting such events could contribute to the lack of recombination observed between B genome species and those of the closely related A/C lineage. PMID:23586706
Mapping and genomic targeting of the major leaf shape gene (L) in Upland cotton (Gossypium hirsutum L.).

PubMed

Andres, Ryan J; Bowman, Daryl T; Kaur, Baljinder; Kuraparthy, Vasu

2014-01-01

A major leaf shape locus (L) was mapped with molecular markers and genomically targeted to a small region in the D-genome of cotton. By using expression analysis and candidate gene mapping, two LMI1-like genes are identified as possible candidates for the leaf shape trait in cotton. Leaf shape in cotton is an important trait that influences yield, flowering rates, disease resistance, lint trash, and the efficacy of foliar chemical application. The leaves of okra leaf cotton display a significantly enhanced lobing pattern, as well as ectopic outgrowths along the lobe margins when compared with normal leaf cotton. These phenotypes are the hallmark characteristics of mutations in various known modifiers of leaf shape that culminate in the mis/over-expression of Class I KNOX genes. To better understand the molecular and genetic processes underlying leaf shape in cotton, a normal leaf accession (PI607650) was crossed to an okra leaf breeding line (NC05AZ21). An F2 population of 236 individuals confirmed the incompletely dominant single gene nature of the okra leaf shape trait in Gossypium hirsutum L. Molecular mapping with simple sequence repeat markers localized the leaf shape gene to a 5.4 cM interval in the distal region of the short arm of chromosome 15. Orthologous mapping of the closely linked markers with the sequenced diploid D-genome (Gossypium raimondii) tentatively resolved the leaf shape locus to a small genomic region. RT-PCR-based expression analysis and candidate gene mapping indicated that the okra leaf shape gene (L(o)) in cotton might be an upstream regulator of Class I KNOX genes. The linked molecular markers and delineated genomic region in the sequenced diploid D-genome will assist in the future high-resolution mapping and map-based cloning of the leaf shape gene in cotton.
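An incompletely dominant single gene predicts a 1:2:1 ratio of the three phenotypic classes in an F2 population. One common way to check such an expectation is a chi-square goodness-of-fit test. The abstract does not report the class counts or the test that was used, so in the sketch below only the total of 236 comes from the text; the per-class counts are hypothetical.

```python
def chi_square_1_2_1(observed):
    """Chi-square statistic for three class counts against a 1:2:1 expectation."""
    if len(observed) != 3:
        raise ValueError("expected exactly three class counts")
    total = sum(observed)
    expected = [total * ratio / 4 for ratio in (1, 2, 1)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of the chi-square distribution for 2 degrees of freedom, alpha = 0.05.
CRITICAL_DF2_ALPHA05 = 5.991

# Hypothetical counts (normal, intermediate, okra) summing to the 236 F2 plants.
counts = (60, 118, 58)
stat = chi_square_1_2_1(counts)
fits = stat < CRITICAL_DF2_ALPHA05
print(f"chi-square = {stat:.3f}; consistent with 1:2:1 at alpha 0.05: {fits}")
```

A statistic below the critical value means the observed counts do not differ significantly from 1:2:1, which is what single-gene, incompletely dominant inheritance would produce.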
Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars.

PubMed

Cavanagh, Colin R; Chao, Shiaoman; Wang, Shichen; Huang, Bevan Emma; Stephen, Stuart; Kiani, Seifollah; Forrest, Kerrie; Saintenac, Cyrille; Brown-Guedira, Gina L; Akhunova, Alina; See, Deven; Bai, Guihua; Pumphrey, Michael; Tomar, Luxmi; Wong, Debbie; Kong, Stephan; Reynolds, Matthew; da Silva, Marta Lopez; Bockelman, Harold; Talbert, Luther; Anderson, James A; Dreisigacker, Susanne; Baenziger, Stephen; Carter, Arron; Korzun, Viktor; Morrell, Peter Laurent; Dubcovsky, Jorge; Morell, Matthew K; Sorrells, Mark E; Hayden, Matthew J; Akhunov, Eduard

2013-05-14

Domesticated crops experience strong human-mediated selection aimed at developing high-yielding varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated single-nucleotide polymorphisms (SNP) in a worldwide sample of 2,994 accessions of hexaploid wheat including landraces and modern cultivars. Using a SNP-based diversity map we characterized the impact of crop improvement on genomic and geographic patterns of genetic diversity. We found evidence of a small population bottleneck and extensive use of ancestral variation often traceable to founders of cultivars from diverse geographic regions. Analyzing genetic differentiation among populations and the extent of haplotype sharing, we identified allelic variants subjected to selection during improvement. Selective sweeps were found around genes involved in the regulation of flowering time and phenology. An introgression of a wild relative-derived gene conferring resistance to a fungal pathogen was detected by haplotype-based analysis. Comparing selective sweeps identified in different populations, we show that selection likely acts on distinct targets or multiple functionally equivalent alleles in different portions of the geographic range of wheat. The majority of the selected alleles were present at low frequency in local populations, suggesting either weak selection pressure or temporal variation in the targets of directional selection during breeding probably associated with changing agricultural practices or environmental conditions. The developed SNP chip and map of genetic variation provide a resource for advancing wheat breeding and supporting future population genomic and genome-wide association studies in wheat.

Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars

PubMed Central

Cavanagh, Colin R.; Chao, Shiaoman; Wang, Shichen; Huang, Bevan Emma; Stephen, Stuart; Kiani, Seifollah; Forrest, Kerrie; Saintenac, Cyrille; Brown-Guedira, Gina L.; Akhunova, Alina; See, Deven; Bai, Guihua; Pumphrey, Michael; Tomar, Luxmi; Wong, Debbie; Kong, Stephan; Reynolds, Matthew; da Silva, Marta Lopez; Bockelman, Harold; Talbert, Luther; Anderson, James A.; Dreisigacker, Susanne; Baenziger, Stephen; Carter, Arron; Korzun, Viktor; Morrell, Peter Laurent; Dubcovsky, Jorge; Morell, Matthew K.; Sorrells, Mark E.; Hayden, Matthew J.; Akhunov, Eduard

2013-01-01

Domesticated crops experience strong human-mediated selection aimed at developing high-yielding varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated single-nucleotide polymorphisms (SNP) in a worldwide sample of 2,994 accessions of hexaploid wheat including landraces and modern cultivars. Using a SNP-based diversity map we characterized the impact of crop improvement on genomic and geographic patterns of genetic diversity. We found evidence of a small population bottleneck and extensive use of ancestral variation often traceable to founders of cultivars from diverse geographic regions. Analyzing genetic differentiation among populations and the extent of haplotype sharing, we identified allelic variants subjected to selection during improvement. Selective sweeps were found around genes involved in the regulation of flowering time and phenology. An introgression of a wild relative-derived gene conferring resistance to a fungal pathogen was detected by haplotype-based analysis. Comparing selective sweeps identified in different populations, we show that selection likely acts on distinct targets or multiple functionally equivalent alleles in different portions of the geographic range of wheat. The majority of the selected alleles were present at low frequency in local populations, suggesting either weak selection pressure or temporal variation in the targets of directional selection during breeding probably associated with changing agricultural practices or environmental conditions. The developed SNP chip and map of genetic variation provide a resource for advancing wheat breeding and supporting future population genomic and genome-wide association studies in wheat. PMID:23630259
Targeted Sequencing of Venom Genes from Cone Snail Genomes Improves Understanding of Conotoxin Molecular Evolution

PubMed Central

Mahardika, Gusti N

2018-01-01

To expand our capacity to discover venom sequences from the genomes of venomous organisms, we applied targeted sequencing techniques to selectively recover venom gene superfamilies and nontoxin loci from the genomes of 32 cone snail species (family, Conidae), a diverse group of marine gastropods that capture their prey using a cocktail of neurotoxic peptides (conotoxins). We were able to successfully recover conotoxin gene superfamilies across all species with high confidence (>100× coverage) and used these data to provide new insights into conotoxin evolution. First, we found that conotoxin gene superfamilies are composed of one to six exons and are typically short in length (mean = ∼85 bp). Second, we expanded our understanding of the following genetic features of conotoxin evolution: 1) positive selection, where exons coding the mature toxin region were often three times more divergent than their adjacent noncoding regions, 2) expression regulation, with comparisons to transcriptome data showing that cone snails only express a fraction of the genes available in their genome (24–63%), and 3) extensive gene turnover, where Conidae species varied from 120 to 859 conotoxin gene copies. Finally, using comparative phylogenetic methods, we found that while diet specificity did not predict patterns of conotoxin evolution, dietary breadth was positively correlated with total conotoxin gene diversity. Overall, the targeted sequencing technique demonstrated here has the potential to radically increase the pace at which venom gene families are sequenced and studied, reshaping our ability to understand the impact of genetic changes on ecologically relevant phenotypes and subsequent diversification. PMID:29514313
Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef).

PubMed

Cannarozzi, Gina; Plaza-Wüthrich, Sonia; Esfeld, Korinna; Larti, Stéphanie; Wilson, Yi Song; Girma, Dejene; de Castro, Edouard; Chanyalew, Solomon; Blösch, Regula; Farinelli, Laurent; Lyons, Eric; Schneider, Michel; Falquet, Laurent; Kuhlemeier, Cris; Assefa, Kebebew; Tadele, Zerihun

2014-07-09

Tef (Eragrostis tef), an indigenous cereal critical to food security in the Horn of Africa, is rich in minerals and protein, resistant to many biotic and abiotic stresses and safe for diabetics as well as sufferers of immune reactions to wheat gluten. We present the genome of tef, the first species in the grass subfamily Chloridoideae and the first allotetraploid assembled de novo. We sequenced the tef genome for marker-assisted breeding, to shed light on the molecular mechanisms conferring tef's desirable nutritional and agronomic properties, and to make its genome publicly available as a community resource. The draft genome contains 672 Mbp representing 87% of the genome size estimated from flow cytometry. We also sequenced two transcriptomes, one from a normalized RNA library and another from unnormalized RNASeq data. The normalized RNA library revealed around 38000 transcripts that were then annotated by the SwissProt group. The CoGe comparative genomics platform was used to compare the tef genome to other genomes, notably sorghum. Scaffolds comprising approximately half of the genome size were ordered by syntenic alignment to sorghum producing tef pseudo-chromosomes, which were sorted into A and B genomes as well as compared to the genetic map of tef. The draft genome was used to identify novel SSR markers, investigate target genes for abiotic stress resistance studies, and understand the evolution of the prolamin family of proteins that are responsible for the immune response to gluten. It is highly plausible that breeding targets previously identified in other cereal crops will also be valuable breeding targets in tef. The draft genome and transcriptome will be of great use for identifying these targets for genetic improvement of this orphan crop that is vital for feeding 50 million people in the Horn of Africa.
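The abstract above states that the 672 Mbp draft represents 87% of the flow-cytometry estimate, which implies a total genome size it does not give directly. The short Python sketch below works out that figure; the function name `implied_genome_size` is purely illustrative and not taken from the paper.

```python
def implied_genome_size(assembled_mbp: float, fraction_covered: float) -> float:
    """Total genome size implied by an assembly covering a known fraction of it."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return assembled_mbp / fraction_covered


# Figures quoted in the abstract: 672 Mbp assembled, 87% of the estimated size.
estimate = implied_genome_size(672, 0.87)
print(f"Implied genome size: {estimate:.0f} Mbp")
```

Under these two figures, the flow-cytometry estimate works out to roughly 772 Mbp.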
Combining functional genomics and chemical biology to identify targets of bioactive compounds.

PubMed

Ho, Cheuk Hei; Piotrowski, Jeff; Dixon, Scott J; Baryshnikova, Anastasia; Costanzo, Michael; Boone, Charles

2011-02-01

Genome sequencing projects have revealed thousands of suspected genes, challenging researchers to develop efficient large-scale functional analysis methodologies. Determining the function of a gene product generally requires a means to alter its function. Genetically tractable model organisms have been widely exploited for the isolation and characterization of activating and inactivating mutations in genes encoding proteins of interest. Chemical genetics represents a complementary approach involving the use of small molecules capable of either inactivating or activating their targets. Saccharomyces cerevisiae has been an important test bed for the development and application of chemical genomic assays aimed at identifying targets and modes of action of known and uncharacterized compounds. Here we review yeast chemical genomic assay strategies for drug target identification.
Applications of CRISPR genome editing technology in drug target identification and validation.

PubMed

Lu, Quinn; Livi, George P; Modha, Sundip; Yusa, Kosuke; Macarrón, Ricardo; Dow, David J

2017-06-01

The analysis of pharmaceutical industry data indicates that the major reason for drug candidates failing in late stage clinical development is lack of efficacy, with a high proportion of these due to erroneous hypotheses about target to disease linkage. More than ever, there is a requirement to better understand potential new drug targets and their role in disease biology in order to reduce attrition in drug development. Genome editing technology enables precise modification of individual protein coding genes, as well as noncoding regulatory sequences, enabling the elucidation of functional effects in human disease relevant cellular systems. Areas covered: This article outlines applications of CRISPR genome editing technology in target identification and target validation studies. Expert opinion: Applications of CRISPR technology in target validation studies are in evidence and gaining momentum. Whilst technical challenges remain, we are on the cusp of CRISPR being applied in complex cell systems such as iPS derived differentiated cells and stem cell derived organoids. In the meantime, our experience to date suggests that precise genome editing of putative targets in primary cell systems is possible, offering more human disease relevant systems than conventional cell lines.
Harnessing genomics to improve health in the Eastern Mediterranean Region – an executive course in genomics policy

PubMed Central

Acharya, Tara; Rab, Mohammed Abdur; Singer, Peter A; Daar, Abdallah S

2005-01-01

Background While innovations in medicine, science and technology have resulted in improved health and quality of life for many people, the benefits of modern medicine continue to elude millions of people in many parts of the world. To assess the potential of genomics to address health needs in EMR, the World Health Organization's Eastern Mediterranean Regional Office and the University of Toronto Joint Centre for Bioethics jointly organized a Genomics and Public Health Policy Executive Course, held September 20th–23rd, 2003, in Muscat, Oman. The 4-day course was sponsored by WHO-EMRO with additional support from the Canadian Program in Genomics and Global Health. The overall objective of the course was to collectively explore how to best harness genomics to improve health in the region. This article presents the course findings and recommendations for genomics policy in EMR. Methods The course brought together senior representatives from academia, biotechnology companies, regulatory bodies, media, voluntary, and legal organizations to engage in discussion. Topics covered included scientific advances in genomics, followed by innovations in business models, public sector perspectives, ethics, legal issues and national innovation systems. Results A set of recommendations, summarized below, was formulated for the Regional Office, the Member States and for individuals. • Advocacy for genomics and biotechnology for political leadership; • Networking between member states to share information, expertise, training, and regional cooperation in biotechnology; coordination of national surveys for assessment of health biotechnology innovation systems, science capacity, government policies, legislation and regulations, intellectual property policies, private sector activity; • Creation in each member country of an effective National Body on genomics, biotechnology and health to: - formulate national biotechnology strategies - raise biotechnology awareness - encourage
Sequencing of a new target genome: the Pediculus humanus humanus (Phthiraptera: Pediculidae) genome project.

PubMed

Pittendrigh, B R; Clark, J M; Johnston, J S; Lee, S H; Romero-Severson, J; Dasch, G A

2006-11-01

The human body louse, Pediculus humanus humanus (L.), and the human head louse, Pediculus humanus capitis, belong to the hemimetabolous order Phthiraptera. The body louse is the primary vector that transmits the bacterial agents of louse-borne relapsing fever, trench fever, and epidemic typhus. The genomes of the bacterial causative agents of several of these aforementioned diseases have been sequenced. Thus, determining the body louse genome will enhance studies of host-vector-pathogen interactions. Although not important as a major disease vector, head lice are of major social concern. Resistance to traditional pesticides used to control head and body lice has developed. It is imperative that new molecular targets be discovered for the development of novel compounds to control these insects. No complete genome sequence exists for a hemimetabolous insect species primarily because hemimetabolous insects often have large (2000 Mb) to very large (up to 16,300 Mb) genomes. Fortuitously, we determined that the human body louse has one of the smallest genome sizes known in insects, suggesting it may be a suitable choice as a minimal hemimetabolous genome in which many genes have been eliminated during its adaptation to human parasitism. Because many louse species infest birds and mammals, the body louse genome-sequencing project will facilitate studies of their comparative genomics. A 6-8X coverage of the body louse genome, plus sequenced expressed sequence tags, should provide the entomological, evolutionary biology, medical, and public health communities with useful genetic information.
Drug target inference through pathway analysis of genomics data

PubMed Central

Ma, Haisu; Zhao, Hongyu

2013-01-01

Statistical modeling coupled with bioinformatics is commonly used for drug discovery. Although there exist many approaches for single target based drug design and target inference, recent years have seen a paradigm shift to system-level pharmacological research. Pathway analysis of genomics data represents one promising direction for computational inference of drug targets. This article aims at providing a comprehensive review on the evolving issues in this field, covering methodological developments, their pros and cons, as well as future research directions. PMID:23369829
Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens.

PubMed

Morgens, David W; Wainberg, Michael; Boyle, Evan A; Ursu, Oana; Araya, Carlos L; Tsui, C Kimberly; Haney, Michael S; Hess, Gaelen T; Han, Kyuho; Jeng, Edwin E; Li, Amy; Snyder, Michael P; Greenleaf, William J; Kundaje, Anshul; Bassik, Michael C

2017-05-05

CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens.
Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens

PubMed Central

Morgens, David W.; Wainberg, Michael; Boyle, Evan A.; Ursu, Oana; Araya, Carlos L.; Tsui, C. Kimberly; Haney, Michael S.; Hess, Gaelen T.; Han, Kyuho; Jeng, Edwin E.; Li, Amy; Snyder, Michael P.; Greenleaf, William J.; Kundaje, Anshul; Bassik, Michael C.

2017-01-01

CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens. PMID:28474669
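The position-dependent sensitivity to single mismatches and the use of truncated guides described above can be made concrete with a small Python sketch. It is not the authors' pipeline. The sequences are made up, positions are numbered from the 5' end as an arbitrary convention, and truncation is assumed to remove bases from the 5' end of the guide.

```python
def mismatch_positions(guide: str, site: str) -> list[int]:
    """Return 1-based positions (counted from the 5' end) where guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [
        i + 1
        for i, (g, s) in enumerate(zip(guide.upper(), site.upper()))
        if g != s
    ]


def truncate_guide(guide: str, length: int) -> str:
    """Shorten a guide to `length` bases by dropping bases from its 5' end (assumed convention)."""
    if not 0 < length <= len(guide):
        raise ValueError("length must be between 1 and len(guide)")
    return guide[-length:]


# Hypothetical 20-nt guide and a candidate off-target site with one mismatch.
guide = "GACGTTACCGGATCAATGCA"
off_target = "GACGTTACCGGATCTATGCA"
print(mismatch_positions(guide, off_target))  # [15]
print(truncate_guide(guide, 17))              # GTTACCGGATCAATGCA
```

Tabulating a toxicity or depletion score against the mismatch position of each guide and site pair is one way such a position-dependent profile could be assembled.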
Microfluidic droplet enrichment for targeted sequencing

PubMed Central

Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

2015-01-01

Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629
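Fold enrichment figures such as the 31.6-fold value quoted above are commonly computed as a ratio of on-target read fractions. The Python sketch below uses that definition, which may differ from the one used in the MESA study; the read counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fold_enrichment(
    on_target_enriched: int,
    total_enriched: int,
    on_target_control: int,
    total_control: int,
) -> float:
    """Ratio of the on-target read fraction after enrichment to that in an unenriched control."""
    if min(total_enriched, total_control) <= 0:
        raise ValueError("total read counts must be positive")
    if on_target_control <= 0:
        raise ValueError("control needs at least one on-target read")
    enriched_fraction = on_target_enriched / total_enriched
    control_fraction = on_target_control / total_control
    return enriched_fraction / control_fraction


# Hypothetical counts: 5% of reads on target after enrichment vs 0.2% before.
print(round(fold_enrichment(500, 10_000, 20, 10_000), 1))  # 25.0
```

Comparing fractions, not raw counts, keeps the result independent of how deeply each library was sequenced.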
RGmatch: matching genomic regions to proximal genes in omics data integration.

PubMed

Furió-Tarí, Pedro; Conesa, Ana; Tarazona, Sonia

2016-11-22

The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher's specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch's flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
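The region-to-gene association task described above can be illustrated with a much simplified Python sketch. This is not RGmatch's actual rule set or interface. The `Gene` type, the three location labels, and the example coordinates are all assumptions made for illustration, and only gene-level matching is shown (no transcript or exon level).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def distance(region_start: int, region_end: int, gene: Gene) -> int:
    """Gap in bp between a region and a gene; 0 if they overlap."""
    if region_end < gene.start:
        return gene.start - region_end
    if region_start > gene.end:
        return region_start - gene.end
    return 0


def locate(region_start: int, region_end: int, gene: Gene) -> str:
    """Label the region's position relative to the gene, taking strand into account."""
    if distance(region_start, region_end, gene) == 0:
        return "overlaps gene"
    lower_coordinate_side = region_end < gene.start
    upstream = lower_coordinate_side if gene.strand == "+" else not lower_coordinate_side
    return "upstream" if upstream else "downstream"


def closest_gene(region_start: int, region_end: int, genes: list[Gene]) -> tuple[str, int, str]:
    """Return (gene name, distance, location label) for the nearest gene."""
    gene = min(genes, key=lambda g: distance(region_start, region_end, g))
    return (
        gene.name,
        distance(region_start, region_end, gene),
        locate(region_start, region_end, gene),
    )


# Hypothetical annotation on a single chromosome.
genes = [Gene("geneA", 1000, 2000, "+"), Gene("geneB", 5000, 6000, "-")]
print(closest_gene(6200, 6300, genes))  # ('geneB', 200, 'upstream')
print(closest_gene(1500, 1600, genes))  # ('geneA', 0, 'overlaps gene')
```

A region at higher coordinates than a minus-strand gene lies upstream of it, which is why the first example is labelled upstream of geneB.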
Genome-wide analysis of murine renal distal convoluted tubular cells for the target genes of mineralocorticoid receptor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ueda, Kohei; Fujiki, Katsunori; Shirahige, Katsuhiko

Highlights: • We define a target gene of MR as that with MR-binding to the adjacent region of DNA. • We use ChIP-seq analysis in combination with microarray. • We, for the first time, explore the genome-wide binding profile of MR. • We reveal 5 genes as the direct target genes of MR in the renal epithelial cell-line. Abstract: Background and objective: Mineralocorticoid receptor (MR) is a member of nuclear receptor family proteins and contributes to fluid homeostasis in the kidney. Although aldosterone-MR pathway induces several gene expressions in the kidney, it is often unclear whether the gene expressions are accompanied by direct regulations of MR through its binding to the regulatory region of each gene. The purpose of this study is to identify the direct target genes of MR in a murine distal convoluted tubular epithelial cell-line (mDCT). Methods: We analyzed the DNA samples of mDCT cells overexpressing 3xFLAG-hMR after treatment with 10⁻⁷ M aldosterone for 1 h by chromatin immunoprecipitation with deep-sequence (ChIP-seq) and mRNA of the cell-line with treatment of 10⁻⁷ M aldosterone for 3 h by microarray. Results: 3xFLAG-hMR overexpressed in mDCT cells accumulated in the nucleus in response to 10⁻⁹ M aldosterone. Twenty-five genes were indicated as the candidate target genes of MR by ChIP-seq and microarray analyses. Five genes, Sgk1, Fkbp5, Rasl12, Tns1 and Tsc22d3 (Gilz), were validated as the direct target genes of MR by quantitative RT-qPCR and ChIP-qPCR. MR binding regions adjacent to Ctgf and Serpine1 were also validated. Conclusions: We, for the first time, captured the genome-wide distribution of MR in mDCT cells and, furthermore, identified five MR target genes in the cell-line. These results will contribute to further studies on the mechanisms of kidney diseases.
Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing

PubMed Central

Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

2016-01-01

Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039
Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes.

PubMed

Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos

2011-07-27

BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.
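Selecting a minimum tiling path, as described above, amounts to covering a target interval with as few overlapping clones as possible. The Python sketch below uses the standard greedy interval-cover approach. It is an illustration only and not the procedure used for the cacao physical map; the clone names and map coordinates are hypothetical.

```python
Clone = tuple[str, int, int]  # (name, start, end) in map coordinates


def minimum_tiling_path(clones: list[Clone], region_start: int, region_end: int) -> list[Clone]:
    """Greedily pick the fewest clones whose union covers [region_start, region_end]."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: list[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones that start at or before the covered point, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


# Hypothetical BAC clones (coordinates in kb) across a 450 kb interval.
clones = [
    ("BAC1", 0, 150),
    ("BAC2", 100, 260),
    ("BAC3", 120, 300),
    ("BAC4", 280, 450),
    ("BAC5", 290, 400),
]
print([name for name, _, _ in minimum_tiling_path(clones, 0, 450)])  # ['BAC1', 'BAC3', 'BAC4']
```

The greedy choice minimizes the number of clones to pool; a real selection would probably also weigh clone quality and the amount of overlap between neighbours.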
Genomes2Drugs: Identifies Target Proteins and Lead Drugs from Proteome Data

PubMed Central

Toomey, David; Hoppe, Heinrich C.; Brennan, Marian P.; Nolan, Kevin B.; Chubb, Anthony J.

2009-01-01

Background Genome sequencing and bioinformatics have provided the full hypothetical proteome of many pathogenic organisms. Advances in microarray and mass spectrometry have also yielded large output datasets of possible target proteins/genes. However, the challenge remains to identify new targets for drug discovery from this wealth of information. Further analysis includes bioinformatics and/or molecular biology tools to validate the findings. This is time consuming and expensive, and could fail to yield novel drugs if protein purification and crystallography is impossible. To pre-empt this, a researcher may want to rapidly filter the output datasets for proteins that show good homology to proteins that have already been structurally characterised or proteins that are already targets for known drugs. Critically, those researchers developing novel antibiotics need to select out the proteins that show close homology to any human proteins, as future inhibitors are likely to cross-react with the host protein, causing off-target toxicity effects later in clinical trials. Methodology/Principal Findings To solve many of these issues, we have developed a free online resource called Genomes2Drugs which ranks sequences to identify proteins that are (i) homologous to previously crystallized proteins or (ii) targets of known drugs, but are (iii) not homologous to human proteins. When tested using the Plasmodium falciparum malarial genome the program correctly enriched the ranked list of proteins with known drug target proteins. Conclusions/Significance Genomes2Drugs rapidly identifies proteins that are likely to succeed in drug discovery pipelines. This free online resource helps in the identification of potential drug targets. Importantly, the program further highlights proteins that are likely to be inhibited by FDA-approved drugs. These drugs can then be rapidly moved into Phase IV clinical studies under ‘change-of-application’ patents. PMID:19593435
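The three criteria above (homology to crystallized proteins, homology to known drug targets, no close homology to human proteins) can be expressed as a simple filter and sort. The Python sketch below is not the scoring used by Genomes2Drugs. The `Hit` fields, the percent-identity thresholds, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    protein: str
    structure_identity: float    # best % identity to a crystallized protein
    drug_target_identity: float  # best % identity to a known drug target
    human_identity: float        # best % identity to any human protein


def rank_candidates(
    hits: list[Hit],
    min_identity: float = 40.0,
    max_human_identity: float = 30.0,
) -> list[Hit]:
    """Keep proteins unlike human proteins but similar to a structure or drug target, best first."""

    def score(hit: Hit) -> float:
        return max(hit.structure_identity, hit.drug_target_identity)

    kept = [
        h for h in hits
        if h.human_identity < max_human_identity and score(h) >= min_identity
    ]
    return sorted(kept, key=score, reverse=True)


# Hypothetical best-hit summaries for four pathogen proteins.
hits = [
    Hit("P1", 80.0, 10.0, 5.0),   # kept: good structural homolog, unlike human
    Hit("P2", 60.0, 90.0, 70.0),  # dropped: too similar to a human protein
    Hit("P3", 20.0, 25.0, 0.0),   # dropped: no useful homolog
    Hit("P4", 45.0, 55.0, 10.0),  # kept: resembles a known drug target
]
print([h.protein for h in rank_candidates(hits)])  # ['P1', 'P4']
```

Excluding human-like proteins before ranking mirrors the abstract's concern that inhibitors of such proteins are likely to cross-react with the host protein.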
Identification of Thiotetronic Acid Antibiotic Biosynthetic Pathways by Target-directed Genome Mining.

PubMed

Tang, Xiaoyu; Li, Jie; Millán-Aguiñaga, Natalie; Zhang, Jia Jia; O'Neill, Ellis C; Ugalde, Juan A; Jensen, Paul R; Mantovani, Simone M; Moore, Bradley S

2015-12-18

Recent genome sequencing efforts have led to the rapid accumulation of uncharacterized or "orphaned" secondary metabolic biosynthesis gene clusters (BGCs) in public databases. This increase in DNA-sequenced big data has given rise to significant challenges in the applied field of natural product genome mining, including (i) how to prioritize the characterization of orphan BGCs and (ii) how to rapidly connect genes to biosynthesized small molecules. Here, we show that by correlating putative antibiotic resistance genes that encode target-modified proteins with orphan BGCs, we predict the biological function of pathway specific small molecules before they have been revealed in a process we call target-directed genome mining. By querying the pan-genome of 86 Salinispora bacterial genomes for duplicated house-keeping genes colocalized with natural product BGCs, we prioritized an orphan polyketide synthase-nonribosomal peptide synthetase hybrid BGC (tlm) with a putative fatty acid synthase resistance gene. We employed a new synthetic double-stranded DNA-mediated cloning strategy based on transformation-associated recombination to efficiently capture tlm and the related ttm BGCs directly from genomic DNA and to heterologously express them in Streptomyces hosts. We show the production of a group of unusual thiotetronic acid natural products, including the well-known fatty acid synthase inhibitor thiolactomycin that was first described over 30 years ago, yet never at the genetic level in regards to biosynthesis and autoresistance. This finding not only validates the target-directed genome mining strategy for the discovery of antibiotic producing gene clusters without a priori knowledge of the molecule synthesized but also paves the way for the investigation of novel enzymology involved in thiotetronic acid natural product biosynthesis.
Targeted or whole genome sequencing of formalin fixed tissue samples: potential applications in cancer genomics.

PubMed

Munchel, Sarah; Hoang, Yen; Zhao, Yue; Cottrell, Joseph; Klotzle, Brandy; Godwin, Andrew K; Koestler, Devin; Beyerlein, Peter; Fan, Jian-Bing; Bibikova, Marina; Chien, Jeremy

2015-09-22

Current genomic studies are limited by the poor availability of fresh-frozen tissue samples. Although formalin-fixed diagnostic samples are in abundance, they are seldom used in current genomic studies because of the concern of formalin-fixation artifacts. Better characterization of these artifacts will allow the use of archived clinical specimens in translational and clinical research studies. To provide a systematic analysis of formalin-fixation artifacts on Illumina sequencing, we generated 26 DNA sequencing data sets from 13 pairs of matched formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) tissue samples. The results indicate a high rate of concordant calls between matched FF/FFPE pairs at reference and variant positions in three commonly used sequencing approaches (whole genome, whole exome, and targeted exon sequencing). Global mismatch rates and C · G > T · A substitutions were comparable between matched FF/FFPE samples, and discordant rates were low (<0.26%) in all samples. Finally, low-pass whole genome sequencing produces a similar pattern of copy number alterations between FF/FFPE pairs. The results from our studies suggest the potential use of diagnostic FFPE samples for cancer genomic studies to characterize and catalog variations in cancer genomes.
Ebolavirus comparative genomics

DOE PAGES

Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

2015-07-14

The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

PubMed Central

2011-01-01

Background BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110
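The clone-selection step described above (choosing a small set of overlapping BACs that spans a prioritized interval) can be illustrated with a greedy interval-cover routine. This is only a minimal sketch under stated assumptions: the clone names and coordinates are hypothetical, each clone is assumed to already have a known start and end on the physical map, and the function `minimum_tiling_path` is an illustration rather than the procedure the authors used.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Pick a smallest set of clones covering [region_start, region_end).

    clones: list of (name, start, end) tuples with half-open coordinates.
    Greedy rule: among clones that start at or before the covered
    position, take the one that reaches farthest to the right.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Consider every clone that overlaps or abuts the covered stretch.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clones on an arbitrary coordinate scale.
example_clones = [
    ("A", 0, 100), ("B", 50, 120), ("C", 90, 250),
    ("D", 200, 300), ("E", 240, 310),
]
example_path = minimum_tiling_path(example_clones, 0, 300)
```

The clones named in the returned path would then be pooled for library preparation; a `ValueError` signals that the map leaves part of the interval uncovered.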

Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters.

PubMed

Javierre, Biola M; Burren, Oliver S; Wilder, Steven P; Kreuzhuber, Roman; Hill, Steven M; Sewitz, Sven; Cairns, Jonathan; Wingett, Steven W; Várnai, Csilla; Thiecke, Michiel J; Burden, Frances; Farrow, Samantha; Cutler, Antony J; Rehnström, Karola; Downes, Kate; Grassi, Luigi; Kostadima, Myrto; Freire-Pritchett, Paula; Wang, Fan; Stunnenberg, Hendrik G; Todd, John A; Zerbino, Daniel R; Stegle, Oliver; Ouwehand, Willem H; Frontini, Mattia; Wallace, Chris; Spivakov, Mikhail; Fraser, Peter

2016-11-17

Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
How may targeted proteomics complement genomic data in breast cancer?

PubMed

Guerin, Mathilde; Gonçalves, Anthony; Toiron, Yves; Baudelet, Emilie; Audebert, Stéphane; Boyer, Jean-Baptiste; Borg, Jean-Paul; Camoin, Luc

2017-01-01

Breast cancer (BC) is the most common female cancer in the world and was recently deconstructed in different molecular entities. Although most of the recent assays to characterize tumors at the molecular level are genomic-based, proteins are the actual executors of cellular functions and represent the vast majority of targets for anticancer drugs. Accumulated data has demonstrated an important level of quantitative and qualitative discrepancies between genomic/transcriptomic alterations and their protein counterparts, mostly related to the large number of post-translational modifications. Areas covered: This review will present novel proteomics technologies such as Reverse Phase Protein Array (RPPA) or mass-spectrometry (MS) based approaches that have emerged and that could progressively replace old-fashioned methods (e.g. immunohistochemistry, ELISA, etc.) to validate proteins as diagnostic, prognostic or predictive biomarkers, and eventually monitor them in routine practice. Expert commentary: These different targeted proteomic approaches, able to complement genomic data in BC and characterize tumors more precisely, will enable more personalized treatment for each patient and tumor.
Genome-Wide Analysis in Brazilians Reveals Highly Differentiated Native American Genome Regions

PubMed Central

Havt, Alexandre; Nayak, Uma; Pinkerton, Relana; Farber, Emily; Concannon, Patrick; Lima, Aldo A.; Guerrant, Richard L.

2017-01-01

Despite its population, geographic size, and emerging economic importance, disproportionately little genome-scale research exists into genetic factors that predispose Brazilians to disease, or the population genetics of risk. After identification of suitable proxy populations and careful analysis of tri-continental admixture in 1,538 North-Eastern Brazilians to estimate individual ancestry and ancestral allele frequencies, we computed 400,000 genome-wide locus-specific branch length (LSBL) Fst statistics of Brazilian Amerindian ancestry compared to European and African; and a similar set of differentiation statistics for their Amerindian component compared with the closest Asian 1000 Genomes population (surprisingly, Bengalis in Bangladesh). After ranking SNPs by these statistics, we identified the top 10 highly differentiated SNPs in five genome regions in the LSBL tests of Brazilian Amerindian ancestry compared to European and African; and the top 10 SNPs in eight regions comparing their Amerindian component to the closest Asian 1000 Genomes population. We found SNPs within or proximal to the genes CIITA (rs6498115), SMC6 (rs1834619), and KLHL29 (rs2288697) were most differentiated in the Amerindian-specific branch, while SNPs in the genes ADAMTS9 (rs7631391), DOCK2 (rs77594147), SLC28A1 (rs28649017), ARHGAP5 (rs7151991), and CIITA (rs45601437) were most highly differentiated in the Asian comparison. These genes are known to influence immune function, metabolic and anthropometry traits, and embryonic development. These analyses have identified candidate genes for selection within Amerindian ancestry, and by comparison of the two analyses, those for which the differentiation may have arisen during the migration from Asia to the Americas. PMID:28100790
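For readers unfamiliar with the locus-specific branch length (LSBL) statistic mentioned above, the following sketch shows the usual three-population definition: each branch length is half of the sum of the two pairwise Fst values involving that population minus the Fst between the other two. The function name `lsbl` and the input values are illustrative assumptions; the abstract does not state which Fst estimator or software the authors used.

```python
def lsbl(fst_ab, fst_ac, fst_bc):
    """Locus-specific branch lengths for populations A, B and C.

    Inputs are the three pairwise Fst values at a single SNP.
    The three branches sum pairwise back to the input distances,
    e.g. branch A + branch B equals fst_ab.
    """
    return {
        "A": (fst_ab + fst_ac - fst_bc) / 2,
        "B": (fst_ab + fst_bc - fst_ac) / 2,
        "C": (fst_ac + fst_bc - fst_ab) / 2,
    }


# Made-up Fst values for one SNP: A is far from both B and C,
# while B and C are close to each other.
example_branches = lsbl(fst_ab=0.30, fst_ac=0.40, fst_bc=0.10)
```

A SNP with a long branch for one population and short branches for the other two is differentiated specifically on that population's lineage, which is the property used to rank SNPs in a scan of this kind.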
Investigation of potential targets of Porphyromonas CRISPRs among the genomes of Porphyromonas species

PubMed Central

Shibasaki, Masaki; Maruyama, Fumito; Sekizaki, Tsutomu; Nakagawa, Ichiro

2017-01-01

The oral bacterial species Porphyromonas gingivalis, a periodontal pathogen, has plastic genomes that may be driven by homologous recombination with exogenous deoxyribonucleic acid (DNA) that is incorporated by natural transformation and conjugation. However, bacteriophages and plasmids, both of which are main resources of exogenous DNA, do not exist in the known P. gingivalis genomes. This could be associated with an adaptive immunity system conferred by clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated (cas) genes in P. gingivalis as well as innate immune systems such as a restriction-modification system. In a previous study, few immune targets were predicted for P. gingivalis CRISPR/Cas. In this paper, we analyzed 51 P. gingivalis genomes, which were newly sequenced, and publicly available genomes of 13 P. gingivalis and 46 other Porphyromonas species. We detected 6 CRISPR/Cas types (classified by sequence similarity of repeat) in P. gingivalis and 12 other types in the remaining species. The Porphyromonas CRISPR spacers with potential targets in the genus Porphyromonas were approximately 23 times more abundant than those with potential targets in other genus taxa (1,720/6,896 spacers vs. 74/6,896 spacers). Porphyromonas CRISPR/Cas may be involved in genome plasticity by exhibiting selective interference against intra- and interspecies nucleic acids. PMID:28837670
Cytotoxic chromosomal targeting by CRISPR/Cas systems can reshape bacterial genomes and expel or remodel pathogenicity islands.

PubMed

Vercoe, Reuben B; Chang, James T; Dy, Ron L; Taylor, Corinda; Gristwood, Tamzin; Clulow, James S; Richter, Corinna; Przybilski, Rita; Pitman, Andrew R; Fineran, Peter C

2013-04-01

In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas-mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA-targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity.
Comparative Genomic Analyses of the Human NPHP1 Locus Reveal Complex Genomic Architecture and Its Regional Evolution in Primates

PubMed Central

Yuan, Bo; Liu, Pengfei; Gupta, Aditya; Beck, Christine R.; Tejomurtula, Anusha; Campbell, Ian M.; Gambin, Tomasz; Simmons, Alexandra D.; Withers, Marjorie A.; Harris, R. Alan; Rogers, Jeffrey; Schwartz, David C.; Lupski, James R.

2015-01-01

Many loci in the human genome harbor complex genomic structures that can result in susceptibility to genomic rearrangements leading to various genomic disorders. Nephronophthisis 1 (NPHP1, MIM# 256100) is an autosomal recessive disorder that can be caused by defects of NPHP1; the gene maps within the human 2q13 region where low copy repeats (LCRs) are abundant. Loss of function of NPHP1 is responsible for approximately 85% of the NPHP1 cases—about 80% of such individuals carry a large recurrent homozygous NPHP1 deletion that occurs via nonallelic homologous recombination (NAHR) between two flanking directly oriented ~45 kb LCRs. Published data revealed a non-pathogenic inversion polymorphism involving the NPHP1 gene flanked by two inverted ~358 kb LCRs. Using optical mapping and array-comparative genomic hybridization, we identified three potential novel structural variant (SV) haplotypes at the NPHP1 locus that may protect a haploid genome from the NPHP1 deletion. Inter-species comparative genomic analyses among primate genomes revealed massive genomic changes during evolution. The aggregated data suggest that dynamic genomic rearrangements occurred historically within the NPHP1 locus and generated SV haplotypes observed in the human population today, which may confer differential susceptibility to genomic instability and the NPHP1 deletion within a personal genome. Our study documents diverse SV haplotypes at a complex LCR-laden human genomic region. Comparative analyses provide a model for how this complex region arose during primate evolution, and studies among humans suggest that intra-species polymorphism may potentially modulate an individual’s susceptibility to acquiring disease-associated alleles. PMID:26641089
Attenuation of monkeypox virus by deletion of genomic regions

USGS Publications Warehouse

Lopera, Juan G.; Falendysz, Elizabeth A.; Rocke, Tonie E.; Osorio, Jorge E.

2015-01-01

Monkeypox virus (MPXV) is an emerging pathogen from Africa that causes disease similar to smallpox. Two clades with different geographic distributions and virulence have been described. Here, we utilized bioinformatic tools to identify genomic regions in MPXV containing multiple virulence genes and explored their roles in pathogenicity; two selected regions were then deleted singularly or in combination. In vitro and in vivo studies indicated that these regions play a significant role in MPXV replication, tissue spread, and mortality in mice. Interestingly, while deletion of either region led to decreased virulence in mice, one region had no effect on in vitro replication. Deletion of both regions simultaneously also reduced cell culture replication and significantly increased the attenuation in vivo over either single deletion. Attenuated MPXV with genomic deletions present a safe and efficacious tool in the study of MPX pathogenesis and in the identification of genetic factors associated with virulence.
Attenuation of monkeypox virus by deletion of genomic regions.

PubMed

Lopera, Juan G; Falendysz, Elizabeth A; Rocke, Tonie E; Osorio, Jorge E

2015-01-15

Monkeypox virus (MPXV) is an emerging pathogen from Africa that causes disease similar to smallpox. Two clades with different geographic distributions and virulence have been described. Here, we utilized bioinformatic tools to identify genomic regions in MPXV containing multiple virulence genes and explored their roles in pathogenicity; two selected regions were then deleted singularly or in combination. In vitro and in vivo studies indicated that these regions play a significant role in MPXV replication, tissue spread, and mortality in mice. Interestingly, while deletion of either region led to decreased virulence in mice, one region had no effect on in vitro replication. Deletion of both regions simultaneously also reduced cell culture replication and significantly increased the attenuation in vivo over either single deletion. Attenuated MPXV with genomic deletions present a safe and efficacious tool in the study of MPX pathogenesis and in the identification of genetic factors associated with virulence. Copyright © 2014 Elsevier Inc. All rights reserved.
Whole genome analysis of CRISPR Cas9 sgRNA off-target homologies via an efficient computational algorithm.

PubMed

Zhou, Hong; Zhou, Michael; Li, Daisy; Manthey, Joseph; Lioutikova, Ekaterina; Wang, Hong; Zeng, Xiao

2017-11-17

The beauty and power of the CRISPR Cas9 endonuclease genome editing system lie in the fact that it is RNA-programmable: Cas9 can be guided to any genomic locus complementary to a 20-nt single guide RNA (sgRNA) to cleave double-stranded DNA, allowing the introduction of desired mutations. Unfortunately, it has been reported repeatedly that the sgRNA can also guide Cas9 to off-target sites where the DNA sequence is homologous to the sgRNA. Using the human genome and Streptococcus pyogenes Cas9 (SpCas9) as an example, this article mathematically analyzed the probabilities of off-target homologies of sgRNAs and found that for a large genome such as the human genome, potential off-target homologies are inevitable in sgRNA selection. A highly efficient computational algorithm was developed for whole genome sgRNA design and off-target homology searches. By means of a dynamically constructed sequence-indexed database and a simplified sequence alignment method, this algorithm achieves very high efficiency while guaranteeing the identification of all existing potential off-target homologies. Via this algorithm, 1,876,775 sgRNAs were designed for the 19,153 human mRNA genes and only two sgRNAs were found to be free of off-target homology. By means of the novel and efficient sgRNA homology search algorithm introduced in this article, genome wide sgRNA design and off-target analysis were conducted and the results confirmed the mathematical analysis that for a sgRNA sequence, it is almost impossible to escape potential off-target homologies. Future innovations on the CRISPR Cas9 gene editing technology need to focus on how to eliminate the Cas9 off-target activity.
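To make the idea of an off-target homology concrete, the sketch below scans a sequence for windows that match a 20-nt guide within a mismatch budget and are followed by an NGG PAM, and it also estimates how many such windows a purely random sequence would contain. Both functions are illustrative assumptions: the scan is a naive linear search on the forward strand only, not the indexed algorithm described in the abstract, and the estimate assumes independent, equally frequent bases, which real genomes do not satisfy. The guide and test sequence are made up.

```python
from math import comb


def find_sites(genome, guide, max_mismatches=3):
    """Return (position, mismatches) for forward-strand windows that match
    `guide` within `max_mismatches` and are followed by an NGG PAM."""
    n = len(guide)
    hits = []
    for i in range(len(genome) - n - 2):
        # PAM is the 3 bases after the window; only its last two are fixed.
        if genome[i + n + 1:i + n + 3] != "GG":
            continue
        mismatches = sum(a != b for a, b in zip(genome[i:i + n], guide))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


def expected_random_hits(genome_len, guide_len=20, max_mismatches=0, strands=2):
    """Expected number of guide-like sites with an NGG PAM in a random
    sequence (uniform, independent bases; an idealized assumption)."""
    ways = sum(comb(guide_len, k) * 3 ** k for k in range(max_mismatches + 1))
    p_site = ways / 4 ** guide_len / 16  # 1/16 for the two fixed G's of NGG
    return strands * genome_len * p_site


# Made-up guide and sequence: an exact site, a 1-mismatch site,
# and an exact match that lacks a PAM.
guide = "GACCTGATCGTTAGCAATCG"
near = "T" + guide[1:]
sequence = "TT" + guide + "AGG" + "CC" + near + "TGG" + guide + "TAC"
example_hits = find_sites(sequence, guide, max_mismatches=1)
```

Under the random model the number of acceptable sequences grows quickly with the mismatch budget, which is one intuition for why candidate off-target homologies become hard to avoid in a genome of billions of bases.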
Application of industrial scale genomics to discovery of therapeutic targets in heart failure.

PubMed

Mehraban, F; Tomlinson, J E

2001-12-01

In recent years intense activity in both academic and industrial sectors has provided a wealth of information on the human genome with an associated impressive increase in the number of novel gene sequences deposited in sequence data repositories and patent applications. This genomic industrial revolution has transformed the way in which drug target discovery is now approached. In this article we discuss how various differential gene expression (DGE) technologies are being utilized for cardiovascular disease (CVD) drug target discovery. Other approaches such as sequencing cDNA from cardiovascular derived tissues and cells coupled with bioinformatic sequence analysis are used with the aim of identifying novel gene sequences that may be exploited towards target discovery. Additional leverage from gene sequence information is obtained through identification of polymorphisms that may confer disease susceptibility and/or affect drug responsiveness. Pharmacogenomic studies are described wherein gene expression-based techniques are used to evaluate drug response and/or efficacy. Industrial-scale genomics supports and addresses not only novel target gene discovery but also the burgeoning issues in pharmaceutical and clinical cardiovascular medicine relative to polymorphic gene responses.
Expansion of the CRISPR-Cas9 genome targeting space through the use of H1 promoter-expressed guide RNAs.

PubMed

Ranganathan, Vinod; Wahlin, Karl; Maruotti, Julien; Zack, Donald J

2014-08-08

The repurposed CRISPR-Cas9 system has recently emerged as a revolutionary genome-editing tool. Here we report a modification in the expression of the guide RNA (gRNA) required for targeting that greatly expands the targetable genome. gRNA expression through the commonly used U6 promoter requires a guanosine nucleotide to initiate transcription, thus constraining genomic-targeting sites to GN19NGG. We demonstrate the ability to modify endogenous genes using H1 promoter-expressed gRNAs, which can be used to target both AN19NGG and GN19NGG genomic sites. AN19NGG sites occur ~15% more frequently than GN19NGG sites in the human genome and the increase in targeting space is also enriched at human genes and disease loci. Together, our results enhance the versatility of the CRISPR technology by more than doubling the number of targetable sites within the human genome and other eukaryotic species.
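The two site classes named above can be counted in any DNA string with a pair of regular expressions. The sketch below is an illustration only: the function name `count_target_sites` is hypothetical, it scans the forward strand of the given string and ignores the reverse complement, and it is not the pipeline used in the study. A zero-width lookahead is used so that overlapping sites are each counted.

```python
import re

# G or A, then 19 arbitrary bases, then an NGG PAM (23 nt in total).
SITE_PATTERNS = {
    "GN19NGG": re.compile(r"(?=G[ACGT]{19}[ACGT]GG)"),
    "AN19NGG": re.compile(r"(?=A[ACGT]{19}[ACGT]GG)"),
}


def count_target_sites(seq):
    """Count forward-strand GN19NGG and AN19NGG sites, overlaps included."""
    seq = seq.upper()
    return {name: len(pat.findall(seq)) for name, pat in SITE_PATTERNS.items()}


# Two toy sequences, each holding exactly one site of one class.
u6_style = count_target_sites("G" + "T" * 19 + "AGG")
h1_style = count_target_sites("a" + "c" * 19 + "tgg")
```

Running the same counter over both strands of a real chromosome would give the kind of site totals that the comparison between the two promoters rests on.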
Integrated genomic and interfacility patient-transfer data reveal the transmission pathways of multidrug-resistant Klebsiella pneumoniae in a regional outbreak.

PubMed

Snitkin, Evan S; Won, Sarah; Pirani, Ali; Lapp, Zena; Weinstein, Robert A; Lolans, Karen; Hayden, Mary K

2017-11-22

Development of effective strategies to limit the proliferation of multidrug-resistant organisms requires a thorough understanding of how such organisms spread among health care facilities. We sought to uncover the chains of transmission underlying a 2008 U.S. regional outbreak of carbapenem-resistant Klebsiella pneumoniae by performing an integrated analysis of genomic and interfacility patient-transfer data. Genomic analysis yielded a high-resolution transmission network that assigned directionality to regional transmission events and discriminated between intra- and interfacility transmission when epidemiologic data were ambiguous or misleading. Examining the genomic transmission network in the context of interfacility patient transfers (patient-sharing networks) supported the role of patient transfers in driving the outbreak, with genomic analysis revealing that a small subset of patient-transfer events was sufficient to explain regional spread. Further integration of the genomic and patient-sharing networks identified one nursing home as an important bridge facility early in the outbreak, a role that was not apparent from analysis of genomic or patient-transfer data alone. Last, we found that when simulating a real-time regional outbreak, our methodology was able to accurately infer the facility at which patients acquired their infections. This approach has the potential to identify facilities with high rates of intra- or interfacility transmission, data that will be useful for triggering targeted interventions to prevent further spread of multidrug-resistant organisms. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
An integrated CRISPR Bombyx mori genome editing system with improved efficiency and expanded target sites.

PubMed

Ma, Sanyuan; Liu, Yue; Liu, Yuanyuan; Chang, Jiasong; Zhang, Tong; Wang, Xiaogang; Shi, Run; Lu, Wei; Xia, Xiaojuan; Zhao, Ping; Xia, Qingyou

2017-04-01

Genome editing has enabled unprecedented opportunities for targeted genomic engineering of a wide variety of organisms, ranging from microbes, plants and animals to human embryos. The successive establishment and rapid application of genome editing tools have significantly accelerated Bombyx mori (B. mori) research in recent years. However, the only CRISPR system in B. mori was the commonly used SpCas9, which recognizes only target sites containing an NGG PAM sequence. In the present study, we first improved the efficiency of our previously established SpCas9 system 3.5-fold. The improved efficiency was also observed at several loci in both BmNs cells and B. mori embryos. Then, to expand the target sites, we showed that two newly discovered CRISPR systems, SaCas9 and AsCpf1, could also induce highly efficient site-specific genome editing in BmNs cells, and constructed an integrated CRISPR system. Genome-wide analysis of targetable sites was further conducted and showed that the integrated system covers 69,144,399 sites in the B. mori genome, with one site found every 6.5 bp. The efficiency and resolution of this CRISPR platform will probably accelerate both fundamental research and applied studies in B. mori, and perhaps other insects. Copyright © 2017 Elsevier Ltd. All rights reserved.
Human long intrinsically disordered protein regions are frequent targets of positive selection.

PubMed

Afanasyeva, Arina; Bockwoldt, Mathias; Cooney, Christopher R; Heiland, Ines; Gossmann, Toni I

2018-06-01

Intrinsically disordered regions occur frequently in proteins and are characterized by a lack of a well-defined three-dimensional structure. Although these regions do not show a higher order of structural organization, they are known to be functionally important. Disordered regions are rapidly evolving, largely attributed to relaxed purifying selection and an increased role of genetic drift. It has also been suggested that positive selection might contribute to their rapid diversification. However, for our own species, it is currently unknown whether positive selection has played a role during the evolution of these protein regions. Here, we address this question by investigating the evolutionary pattern of more than 6600 human proteins with intrinsically disordered regions and their ordered counterparts. Our comparative approach with data from more than 90 mammalian genomes uses a priori knowledge of disordered protein regions, and we show that this increases the power to detect positive selection by an order of magnitude. We can confirm that human intrinsically disordered regions evolve more rapidly, not only within humans but also across the entire mammalian phylogeny. They have, however, experienced substantial evolutionary constraint, hinting at their fundamental functional importance. We find compelling evidence that disordered protein regions are frequent targets of positive selection and estimate that the relative rate of adaptive substitutions differs fourfold between disordered and ordered protein regions in humans. Our results suggest that disordered protein regions are important targets of genetic innovation and that the contribution of positive selection in these regions is more pronounced than in other protein parts. © 2018 Afanasyeva et al.; Published by Cold Spring Harbor Laboratory Press.
Unbiased Combinatorial Genomic Approaches to Identify Alternative Therapeutic Targets within the TSC Signaling Network

DTIC Science & Technology

2014-06-01

Specifically, we combined the CRISPR genome editing system with a novel approach allowing efficient single cell cloning of Drosophila cells with the aim of...and culture these to produce cultures completely lacking wildtype sequence at the target locus. No robust methods existed to clone single Drosophila ...targeting all kinases and phosphatases (563 genes) in the Drosophila genome. 65 samples that displayed synthetic lethality (15 genes) or synthetic
Genomic scan of selective sweeps in thin and fat tail sheep breeds for identifying of candidate regions associated with fat deposition

PubMed Central

2012-01-01

Background Identification of genomic regions that have been targets of selection for phenotypic traits is one of the most important and challenging areas of research in animal genetics. However, currently there are relatively few genomic regions identified that have been subject to positive selection. In this study, a genome-wide scan using ~50,000 Single Nucleotide Polymorphisms (SNPs) was performed in an attempt to identify genomic regions associated with fat deposition in fat-tail breeds. This trait and its modification are very important in those countries grazing these breeds. Results Two independent experiments using either Iranian or Ovine HapMap genotyping data contrasted thin and fat tail breeds. Population differentiation using FST in Iranian thin and fat tail breeds revealed seven genomic regions. Almost all of these regions overlapped with QTLs that had previously been identified as affecting fat and carcass yield traits in beef and dairy cattle. Study of selection sweep signatures using FST in thin and fat tail breeds sampled from the Ovine HapMap project confirmed three of these regions located on Chromosomes 5, 7 and X. We found increased homozygosity in these regions in favour of fat tail breeds on chromosome 5 and X and in favour of thin tail breeds on chromosome 7. Conclusions In this study, we were able to identify three novel regions associated with fat deposition in thin and fat tail sheep breeds. Two of these were associated with an increase of homozygosity in the fat tail breeds which would be consistent with selection for mutations affecting fat tail size several thousand years after domestication. PMID:22364287
Genomic regions underlying susceptibility to bovine tuberculosis in Holstein-Friesian cattle.

PubMed

Raphaka, Kethusegile; Matika, Oswald; Sánchez-Molano, Enrique; Mrode, Raphael; Coffey, Mike Peter; Riggio, Valentina; Glass, Elizabeth Janet; Woolliams, John Arthur; Bishop, Stephen Christopher; Banos, Georgios

2017-03-23

The significant social and economic loss as a result of bovine tuberculosis (bTB) presents a continuous challenge to cattle industries in the UK and worldwide. However, host genetic variation in cattle susceptibility to bTB provides an opportunity to select for resistant animals and further understand the genetic mechanisms underlying disease dynamics. The present study identified genomic regions associated with susceptibility to bTB using genome-wide association (GWA), regional heritability mapping (RHM) and chromosome association approaches. Phenotypes comprised de-regressed estimated breeding values of 804 Holstein-Friesian sires and pertained to three bTB indicator traits: i) positive reactors to the skin test with positive post-mortem examination results (phenotype 1); ii) positive reactors to the skin test regardless of post-mortem examination results (phenotype 2) and iii) as in (ii) plus non-reactors and inconclusive reactors to the skin tests with positive post-mortem examination results (phenotype 3). Genotypes based on the 50 K SNP DNA array were available and a total of 34,874 SNPs remained per animal after quality control. The estimated polygenic heritability for susceptibility to bTB was 0.26, 0.37 and 0.34 for phenotypes 1, 2 and 3, respectively. GWA analysis identified a putative SNP on Bos taurus autosomes (BTA) 2 associated with phenotype 1, and another on BTA 23 associated with phenotype 2. Genomic regions encompassing these SNPs were found to harbour potentially relevant annotated genes. RHM confirmed the effect of these genomic regions and identified new regions on BTA 18 for phenotype 1 and BTA 3 for phenotypes 2 and 3. Heritabilities of the genomic regions ranged between 0.05 and 0.08 across the three phenotypes. Chromosome association analysis indicated a major role of BTA 23 on susceptibility to bTB. Genomic regions and candidate genes identified in the present study provide an opportunity to further understand pathways critical to cattle
Genome assemblies for 11 Yersinia pestis strains isolated in the Caucasus region

DOE PAGES

Zhgenti, Ekaterine; Johnson, Shannon L.; Davenport, Karen W.; ...

2015-09-17

Yersinia pestis, the causative agent of plague, is endemic to the Caucasus region but few reference strain genome sequences from that region are available. We present the improved draft or finished assembled genomes from 11 strains isolated in the nation of Georgia and surrounding countries.
[Comparative analysis of variable regions in the genomes of variola virus].

PubMed

Babkin, I V; Nepomniashchikh, T S; Maksiutov, R A; Gutorov, V V; Babkina, I N; Shchelkunov, S N

2008-01-01

Nucleotide sequences of two extended segments of the terminal variable regions in variola virus genome were determined. The size of the left segment was 13.5 kbp and of the right, 10.5 kbp. In total, over 540 kbp were sequenced for 22 variola virus strains. The conducted phylogenetic analysis and the data published earlier allowed us to find the interrelations between 70 variola virus isolates, the character of their clustering, and the degree of intergroup and intragroup variations of the clusters of variola virus strains. The most polymorphic loci of the genome segments studied were determined. It was demonstrated that these loci are localized to either noncoding genome regions or to the regions of destroyed open reading frames, characteristic of the ancestor virus. These loci are promising for development of the strategy for genotyping variola virus strains. Analysis of recombination using various methods demonstrated that, with a single exception, no statistically significant recombinational events in the genomes of variola virus strains studied were detectable.
Unbiased Combinatorial Genomic Approaches to Identify Alternative Therapeutic Targets within the TSC Signaling Network

DTIC Science & Technology

2015-09-01

assessed the specificity of mutation in Drosophila S2R+ cells. We generated a quantitative mutation reporter vector in which an sgRNA target sequence ...phosphatases (563 genes) in the Drosophila genome (Figure 4). 65 samples that displayed synthetic lethality (15 genes) or synthetic increases in viability...targeting all kinases and phosphatases (563 genes) in the Drosophila genome . . Identified three hits (mRNA-Cap, Pitslre and CycT) that scored as

Functional annotation of HOT regions in the human genome: implications for human disease and cancer

PubMed Central

Li, Hao; Chen, Hebing; Liu, Feng; Ren, Chao; Wang, Shengqi; Bo, Xiaochen; Shu, Wenjie

2015-01-01

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences; however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionally enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy. PMID:26113264
Functional annotation of HOT regions in the human genome: implications for human disease and cancer.

PubMed

Li, Hao; Chen, Hebing; Liu, Feng; Ren, Chao; Wang, Shengqi; Bo, Xiaochen; Shu, Wenjie

2015-06-26

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences; however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionally enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy.
Cytotoxic Chromosomal Targeting by CRISPR/Cas Systems Can Reshape Bacterial Genomes and Expel or Remodel Pathogenicity Islands

PubMed Central

Vercoe, Reuben B.; Chang, James T.; Dy, Ron L.; Taylor, Corinda; Gristwood, Tamzin; Clulow, James S.; Richter, Corinna; Przybilski, Rita; Pitman, Andrew R.; Fineran, Peter C.

2013-01-01

In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas–mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA–targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity. PMID:23637624
Region 6 Targeted Brownfields Assessment

EPA Pesticide Factsheets

A Targeted Brownfields Assessment (TBA) is a free service the EPA Region 6 Brownfields Team provides to communities to support their eligible brownfields projects. Region 6 consists of Arkansas, Louisiana, New Mexico, Oklahoma, and Texas.
Genome-wide identification of microRNA targets in the neglected disease pathogens of the genus Echinococcus.

PubMed

Macchiaroli, Natalia; Maldonado, Lucas L; Zarowiecki, Magdalena; Cucher, Marcela; Gismondi, María Inés; Kamenetzky, Laura; Rosenzvit, Mara Cecilia

2017-06-01

MicroRNAs (miRNAs), a class of small non-coding RNAs, are key regulators of gene expression at post-transcriptional level and play essential roles in biological processes such as development. MiRNAs silence target mRNAs by binding to complementary sequences in the 3' untranslated regions (3'UTRs). The parasitic helminths of the genus Echinococcus are the causative agents of echinococcosis, a zoonotic neglected disease. In previous work, we performed a comprehensive identification and characterization of Echinococcus miRNAs. However, current knowledge about their targets is limited. Since target prediction algorithms rely on complementarity between 3'UTRs and miRNA sequences, a major limitation is the lack of accurate sequence information of 3'UTR for most species including parasitic helminths. We performed RNA-seq and developed a pipeline that integrates the transcriptomic data with available genomic data of this parasite in order to identify 3'UTRs of Echinococcus canadensis. The high confidence set of 3'UTRs obtained allowed the prediction of miRNA targets in Echinococcus through a bioinformatic approach. We performed for the first time a comparative analysis of miRNA targets in Echinococcus and Taenia. We found that many evolutionarily conserved target sites in Echinococcus and Taenia may be functional and under selective pressure. Signaling pathways such as MAPK and Wnt were among the most represented pathways indicating miRNA roles in parasite growth and development. Genome-wide identification and characterization of miRNA target genes in Echinococcus provide valuable information to guide experimental studies in order to understand miRNA functions in the parasite's biology. miRNAs involved in essential functions, especially those being absent in the host or showing sequence divergence with respect to host orthologs, might be considered as novel therapeutic targets for echinococcosis control. Copyright © 2017 Elsevier B.V. All rights reserved.
Sequencing small genomic targets with high efficiency and extreme accuracy

PubMed Central

Schmitt, Michael W.; Fox, Edward J.; Prindle, Marc J.; Reid-Bayliss, Kate S.; True, Lawrence D.; Radich, Jerald P.; Loeb, Lawrence A.

2015-01-01

The detection of minority variants in mixed samples demands methods for enrichment and accurate sequencing of small genomic intervals. We describe an efficient approach based on sequential rounds of hybridization with biotinylated oligonucleotides, enabling more than one-million-fold enrichment of genomic regions of interest. In conjunction with error-correcting double-stranded molecular tags, our approach enables the quantification of mutations in individual DNA molecules. PMID:25849638
The CRISPR-Cas9 technology: Closer to the ultimate toolkit for targeted genome editing.

PubMed

Quétier, Francis

2016-01-01

The first period of plant genome editing was based on Agrobacterium; chemical mutagenesis by EMS (ethyl methanesulfonate) and ionizing radiations; each of these technologies led to randomly distributed genome modifications. The second period is associated with the discoveries of homing and meganuclease enzymes during the 80s and 90s, which were then engineered to provide efficient tools for targeted editing. From 2006 to 2012, a few crop plants were successfully and precisely modified using zinc-finger nucleases. A third wave of improvement in genome editing, which led to a dramatic decrease in off-target events, was achieved in 2009-2011 with the TALEN technology. The latest revolution surfaced in 2013 with the CRISPR-Cas9 system, whose high efficiency and technical ease of use is really impressive; scientists can use in-house kits or commercially available kits; the only two requirements are to carefully choose the location of the DNA double strand breaks to be induced and then to order an oligonucleotide. While this close-to-ultimate toolkit for targeted editing of genomes represents dramatic scientific progress which allows the development of more complex useful agronomic traits through synthetic biology, the social acceptance of genome editing remains regularly questioned by anti-GMO citizens and organizations. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Genomic evaluation of regional dairy cattle breeds in single-breed and multibreed contexts.

PubMed

Jónás, D; Ducrocq, V; Fritz, S; Baur, A; Sanchez, M-P; Croiseau, P

2017-02-01

An important prerequisite for high prediction accuracy in genomic prediction is the availability of a large training population, which allows accurate marker effect estimation. This requirement is not fulfilled in case of regional breeds with a limited number of breeding animals. We assessed the efficiency of the current French routine genomic evaluation procedure in four regional breeds (Abondance, Tarentaise, French Simmental and Vosgienne) as well as the potential benefits when the training populations consisting of males and females of these breeds are merged to form a multibreed training population. Genomic evaluation was 5-11% more accurate than a pedigree-based BLUP in three of the four breeds, while the numerically smallest breed showed a < 1% increase in accuracy. Multibreed genomic evaluation was beneficial for two breeds (Abondance and French Simmental) with maximum gains of 5 and 8% in correlation coefficients between yield deviations and genomic estimated breeding values, when compared to the single-breed genomic evaluation results. Inflation of genomic evaluation of young candidates was also reduced. Our results indicate that genomic selection can be effective in regional breeds as well. Here, we provide empirical evidence proving that genetic distance between breeds is only one of the factors affecting the efficiency of multibreed genomic evaluation. © 2016 Blackwell Verlag GmbH.
Combined genome-wide linkage and targeted association analysis of head circumference in autism spectrum disorder families.

PubMed

Woodbury-Smith, M; Bilder, D A; Morgan, J; Jerominski, L; Darlington, T; Dyer, T; Paterson, A D; Coon, H

2017-01-01

It has long been recognized that there is an association between enlarged head circumference (HC) and autism spectrum disorder (ASD), but the genetics of HC in ASD is not well understood. In order to investigate the genetic underpinning of HC in ASD, we undertook a genome-wide linkage study of HC followed by linkage signal targeted association among a sample of 67 extended pedigrees with ASD. HC measurements on members of 67 multiplex ASD extended pedigrees were used as a quantitative trait in a genome-wide linkage analysis. The Illumina 6K SNP linkage panel was used, and analyses were carried out using the SOLAR implemented variance components model. Loci identified in this way formed the target for subsequent association analysis using the Illumina OmniExpress chip and imputed genotypes. A modification of the qTDT was used as implemented in SOLAR. We identified a linkage signal spanning 6p21.31 to 6p22.2 (maximum LOD = 3.4). Although targeted association did not find evidence of association with any SNP overall, in one family with the strongest evidence of linkage, there was evidence for association (rs17586672, p = 1.72E-07). Although this region does not overlap with ASD linkage signals in these same samples, it has been associated with other psychiatric risk, including ADHD, developmental dyslexia, schizophrenia, specific language impairment, and juvenile bipolar disorder. The genome-wide significant linkage signal represents the first reported observation of a potential quantitative trait locus for HC in ASD and may be relevant in the context of complex multivariate risk likely leading to ASD.
Genome-Wide Association Identifies SLC2A9 and NLN Gene Regions as Associated with Entropion in Domestic Sheep

PubMed Central

Mousel, Michelle R.; Reynolds, James O.; White, Stephen N.

2015-01-01

Entropion is an inward rolling of the eyelid allowing contact between the eyelashes and cornea that may lead to blindness if not corrected. Although many mammalian species, including humans and dogs, are afflicted by congenital entropion, no specific genes or gene regions related to development of entropion have been reported in any mammalian species to date. Entropion in domestic sheep is known to have a genetic component; therefore, we used domestic sheep as a model system to identify genomic regions containing genes associated with entropion. A genome-wide association was conducted with congenital entropion in 998 Columbia, Polypay, and Rambouillet sheep genotyped with 50,000 SNP markers. Prevalence of entropion was 6.01%, with all breeds represented. Logistic regression was performed in PLINK with additive allelic, recessive, dominant, and genotypic inheritance models. Two genome-wide significant (empirical P<0.05) SNP were identified, specifically markers in SLC2A9 (empirical P = 0.007; genotypic model) and near NLN (empirical P = 0.026; dominance model). Six additional genome-wide suggestive SNP (nominal P<1x10^-5) were identified including markers in or near PIK3CB (P = 2.22x10^-6; additive model), KCNB1 (P = 2.93x10^-6; dominance model), ZC3H12C (P = 3.25x10^-6; genotypic model), JPH1 (P = 4.68x10^-6; genotypic model), and MYO3B (P = 5.74x10^-6; recessive model). This is the first report of specific gene regions associated with congenital entropion in any mammalian species, to our knowledge. Further, none of these genes have previously been associated with any eyelid traits. These results represent the first genome-wide analysis of gene regions associated with entropion and provide target regions for the development of sheep genetic markers for marker-assisted selection. PMID:26098909
Genome-Wide Association Identifies SLC2A9 and NLN Gene Regions as Associated with Entropion in Domestic Sheep.

PubMed

Mousel, Michelle R; Reynolds, James O; White, Stephen N

2015-01-01

Entropion is an inward rolling of the eyelid allowing contact between the eyelashes and cornea that may lead to blindness if not corrected. Although many mammalian species, including humans and dogs, are afflicted by congenital entropion, no specific genes or gene regions related to development of entropion have been reported in any mammalian species to date. Entropion in domestic sheep is known to have a genetic component; therefore, we used domestic sheep as a model system to identify genomic regions containing genes associated with entropion. A genome-wide association was conducted with congenital entropion in 998 Columbia, Polypay, and Rambouillet sheep genotyped with 50,000 SNP markers. Prevalence of entropion was 6.01%, with all breeds represented. Logistic regression was performed in PLINK with additive allelic, recessive, dominant, and genotypic inheritance models. Two genome-wide significant (empirical P<0.05) SNP were identified, specifically markers in SLC2A9 (empirical P = 0.007; genotypic model) and near NLN (empirical P = 0.026; dominance model). Six additional genome-wide suggestive SNP (nominal P<1x10(-5)) were identified including markers in or near PIK3CB (P = 2.22x10(-6); additive model), KCNB1 (P = 2.93x10(-6); dominance model), ZC3H12C (P = 3.25x10(-6); genotypic model), JPH1 (P = 4.68x10(-6); genotypic model), and MYO3B (P = 5.74x10(-6); recessive model). This is the first report of specific gene regions associated with congenital entropion in any mammalian species, to our knowledge. Further, none of these genes have previously been associated with any eyelid traits. These results represent the first genome-wide analysis of gene regions associated with entropion and provide target regions for the development of sheep genetic markers for marker-assisted selection.
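The inheritance models named in the abstract above differ mainly in how a genotype is turned into a predictor for the logistic regression. The following is a minimal sketch of one common coding, assuming genotypes are given as the count of minor alleles (0, 1 or 2); the function name `encode_genotype` is hypothetical, and this is not a reproduction of PLINK's internals or of the authors' pipeline.

```python
def encode_genotype(minor_allele_count, model):
    """Code one genotype (0, 1 or 2 minor alleles) under an inheritance model.

    Illustrative coding only; the predictor(s) returned here would be the
    covariates entered into a logistic regression on case/control status.
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        # Each extra copy of the minor allele shifts the log-odds equally.
        return minor_allele_count
    if model == "dominant":
        # One copy is enough to show the effect.
        return 1 if minor_allele_count >= 1 else 0
    if model == "recessive":
        # Two copies are required to show the effect.
        return 1 if minor_allele_count == 2 else 0
    if model == "genotypic":
        # Two indicator variables (heterozygote, minor homozygote): 2 df test.
        return (int(minor_allele_count == 1), int(minor_allele_count == 2))
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive", "genotypic"):
        print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```

Under this coding the additive, dominant and recessive models each contribute a single predictor, while the genotypic model contributes two, which is why the same SNP can reach significance under one model and not another.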
GEAR: genomic enrichment analysis of regional DNA copy number changes.

PubMed

Kim, Tae-Min; Jung, Yu-Chae; Rhyu, Mun-Gan; Jung, Myeong Ho; Chung, Yeun-Jun

2008-02-01

We developed an algorithm named GEAR (genomic enrichment analysis of regional DNA copy number changes) for functional interpretation of genome-wide DNA copy number changes identified by array-based comparative genomic hybridization. GEAR selects two types of chromosomal alterations with potential biological relevance, i.e. recurrent and phenotype-specific alterations. Then it performs functional enrichment analysis using a priori selected functional gene sets to identify primary and clinical genomic signatures. The genomic signatures identified by GEAR represent functionally coordinated genomic changes, which can provide clues on the underlying molecular mechanisms related to the phenotypes of interest. GEAR can help identify key molecular functions that are activated or repressed in tumor genomes, leading to an improved understanding of tumor biology. GEAR software is available with an online manual at http://www.systemsbiology.co.kr/GEAR/.
Fine organization of genomic regions tagged to the 5S rDNA locus of the bread wheat 5B chromosome.

PubMed

Sergeeva, Ekaterina M; Shcherban, Andrey B; Adonina, Irina G; Nesterov, Michail A; Beletsky, Alexey V; Rakitin, Andrey L; Mardanov, Andrey V; Ravin, Nikolai V; Salina, Elena A

2017-11-14

The multigene family encoding the 5S rRNA, one of the most important structural and functional parts of the large ribosomal subunit, is an obligate component of all eukaryotic genomes. 5S rDNA has long been a favored target for cytological and phylogenetic studies due to the inherent peculiarities of its structural organization, such as the tandem arrays of repetitive units and their high interspecific divergence. The complex polyploid nature of the genome of bread wheat, Triticum aestivum, and the technically difficult task of sequencing clusters of tandem repeats mean that the detailed organization of extended genomic regions containing 5S rRNA genes remains unclear. This is despite the recent progress made in wheat genomic sequencing. Using pyrosequencing of BAC clones, in this work we studied the organization of two distinct 5S rDNA-tagged regions of the 5BS chromosome of bread wheat. Three BAC clones containing 5S rDNA were identified in the 5BS chromosome-specific BAC library of Triticum aestivum. Using the results of pyrosequencing and assembly, we obtained six 5S rDNA-containing contigs with a total length of 140,417 bp, and two sets (pools) of individual 5S rDNA sequences belonging to separate, but closely located genomic regions on the 5BS chromosome. Both regions are characterized by the presence of approximately 70-80 copies of 5S rDNA; however, they are completely different in their structural organization. The first region contained highly diverged short-type 5S rDNA units that were disrupted by multiple insertions of transposable elements. The second region contained the more conserved long-type 5S rDNA, organized as a single tandem array. FISH using probes specific to both 5S rDNA unit types showed differences in the distribution and intensity of signals on the chromosomes of polyploid wheat species and their diploid progenitors. A detailed structural organization of two closely located 5S rDNA-tagged genomic regions on the 5BS chromosome of bread
From Bioengineering to CRISPR/Cas9 – A Personal Retrospective of 20 Years of Research in Programmable Genome Targeting

PubMed Central

Jeltsch, Albert

2018-01-01

Genome targeting of restriction enzymes and DNA methyltransferases has many important applications including genome and epigenome editing. 15–20 years ago, my group was involved in the development of approaches for programmable genome targeting, aiming to connect enzymes with an oligodeoxynucleotide (ODN), which could form a sequence-specific triple helix at the genomic target site. Importantly, the target site of such an enzyme-ODN conjugate could be varied simply by altering the ODN sequence, promising great applicative value. However, this approach was facing many problems including the preparation and purification of the enzyme-ODN conjugates, their efficient delivery into cells, slow kinetics of triple helix formation and the requirement of a poly-purine target site sequence. Hence, for several years genome and epigenome editing approaches mainly were based on Zinc fingers and TAL proteins as targeting devices. More recently, CRISPR/Cas systems were discovered, which use a bound RNA for genome targeting that forms an RNA/DNA duplex with one DNA strand of the target site. These systems combine all potential advantages of the once imagined enzyme-ODN conjugates and avoid all the main disadvantages. Consequently, the application of CRISPR/Cas in genome and epigenome editing has exploded in recent years. We can draw two important conclusions from this example of research history. First, evolution still is the better bioengineer than humans and, whenever tested in parallel, natural solutions outcompete engineered ones. Second, CRISPR/Cas systems were discovered in pure, curiosity driven, basic research, highlighting that it is basic, bottom-up research paving the way for fundamental innovation. PMID:29434619
Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR).

PubMed

Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J; Laclette, Juan P; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

2015-05-19

Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest.
Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR)

PubMed Central

Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J.; Laclette, Juan P.; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

2015-01-01

Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest. PMID:25989346
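The abstract above says only that the AAR value is built from sequence length and the number of antigenic regions. As a rough illustration, the sketch below assumes the simplest such combination, residues per predicted antigenic region, averaged over a protein set; that formula, the `Protein` record and the function names are assumptions made for this example, not the authors' published definition.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    length: int             # sequence length in amino acids
    antigenic_regions: int  # number of predicted antigenic regions


def aar(protein):
    """Assumed AAR: residues per predicted antigenic region.

    Under this assumption a lower value means antigenic regions are
    more densely packed along the sequence.
    """
    if protein.antigenic_regions == 0:
        return float("inf")  # no predicted regions at all
    return protein.length / protein.antigenic_regions


def mean_aar(proteins):
    """Average AAR over proteins with at least one predicted region."""
    values = [aar(p) for p in proteins if p.antigenic_regions > 0]
    if not values:
        raise ValueError("no protein has a predicted antigenic region")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    secretome = [Protein("ES_1", 300, 10), Protein("ES_2", 200, 4)]
    print(mean_aar(secretome))
```

Comparing `mean_aar` for a predicted secretome against the same statistic for non-ES proteins would mirror the kind of comparison the abstract describes, subject to the assumed formula.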
Genomic regions associated with kyphosis in swine

USDA-ARS's Scientific Manuscript database

Background: A back curvature defect similar to kyphosis in humans has been observed in swine herds. The defect ranges from mild to severe curvature of the thoracic vertebrae in split carcasses and has an estimated heritability of 0.3. The objective of this study was to identify genomic regions that...
Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

PubMed Central

Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

2000-01-01

The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
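The search for evolutionary conserved regions (ECRs) "exceeding a defined threshold of sequence identity" can be pictured as a sliding-window scan over a pairwise alignment. The sketch below is one simple way to do that, assuming two already-aligned strings of equal length with `-` for gaps; the window size, the threshold and the function name `conserved_windows` are illustrative choices, not the parameters used in the study.

```python
def conserved_windows(aln_a, aln_b, window=100, min_identity=0.7):
    """Return (start, end, identity) for alignment windows at or above a threshold.

    aln_a and aln_b are rows of one pairwise alignment (same length,
    '-' marks a gap). Identity is the fraction of window columns where
    both rows carry the same non-gap character.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(0, len(aln_a) - window + 1):
        end = start + window
        matches = sum(
            1
            for x, y in zip(aln_a[start:end], aln_b[start:end])
            if x == y and x != "-"
        )
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, end, identity))
    return hits


if __name__ == "__main__":
    # Toy alignment: only the first four columns are fully conserved.
    print(conserved_windows("ACGTACGT", "ACGTTTTT", window=4, min_identity=1.0))
```

A practical scan would usually merge overlapping hits into maximal regions and work from a real alignment format, both of which are left out here for brevity.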
NEBNext Direct: A Novel, Rapid, Hybridization-Based Approach for the Capture and Library Conversion of Genomic Regions of Interest.

PubMed

Emerman, Amy B; Bowman, Sarah K; Barry, Andrew; Henig, Noa; Patel, Kruti M; Gardner, Andrew F; Hendrickson, Cynthia L

2017-07-05

Next-generation sequencing (NGS) is a powerful tool for genomic studies, translational research, and clinical diagnostics that enables the detection of single nucleotide polymorphisms, insertions and deletions, copy number variations, and other genetic variations. Target enrichment technologies improve the efficiency of NGS by only sequencing regions of interest, which reduces sequencing costs while increasing coverage of the selected targets. Here we present NEBNext Direct®, a hybridization-based, target-enrichment approach that addresses many of the shortcomings of traditional target-enrichment methods. This approach features a simple, 7-hr workflow that uses enzymatic removal of off-target sequences to achieve a high specificity for regions of interest. Additionally, unique molecular identifiers are incorporated for the identification and filtering of PCR duplicates. The same protocol can be used across a wide range of input amounts, input types, and panel sizes, enabling NEBNext Direct to be broadly applicable across a wide variety of research and diagnostic needs. © 2017 by John Wiley & Sons, Inc.
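The idea of using unique molecular identifiers (UMIs) to filter PCR duplicates can be sketched as follows. This is a generic illustration and not the NEBNext Direct software: the read representation (dictionaries with `chrom`, `pos`, `strand`, and `umi` keys) and the function name `filter_pcr_duplicates` are assumptions made for the example.

```python
def filter_pcr_duplicates(reads):
    """Keep the first read seen for each (chrom, pos, strand, umi) key.

    Reads that share a mapping position AND a UMI are treated as PCR
    copies of one original molecule; reads at the same position with
    different UMIs are kept as independent molecules.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique
```

Production tools usually also tolerate sequencing errors inside the UMI itself (for example by clustering UMIs within a small edit distance); exact matching is used here only for brevity.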
Identification of genomic variants putatively targeted by selection during dog domestication.

PubMed

Cagan, Alex; Blass, Torsten

2016-01-12

Dogs (Canis lupus familiaris) were the first animal species to be domesticated and continue to occupy an important place in human societies. Recent studies have begun to reveal when and where dog domestication occurred. While much progress has been made in identifying the genetic basis of phenotypic differences between dog breeds, we still know relatively little about the genetic changes underlying the phenotypes that differentiate all dogs from their wild progenitors, wolves (Canis lupus). In particular, dogs generally show reduced aggression and fear towards humans compared to wolves. Therefore, selection for tameness was likely a necessary prerequisite for dog domestication. With the increasing availability of whole-genome sequence data it is possible to try and directly identify the genetic variants contributing to the phenotypic differences between dogs and wolves. We analyse the largest available database of genome-wide polymorphism data in a global sample of 69 dogs and 7 wolves. We perform a scan to identify regions of the genome that are highly differentiated between dogs and wolves. We identify putatively functional genomic variants that are segregating or at high frequency (Fst ≥ 0.75) for alternative alleles between dogs and wolves. A biological pathways analysis of the genes containing these variants suggests that there has been selection on the 'adrenaline and noradrenaline biosynthesis pathway', well known for its involvement in the fight-or-flight response. We identify 11 genes with putatively functional variants fixed for alternative alleles between dogs and wolves. The segregating variants in these genes are strong candidates for having been targets of selection during early dog domestication. We present the first genome-wide analysis of the different categories of putatively functional variants that are fixed or segregating at high frequency between a global sampling of dogs and wolves. We find evidence that selection has been strongest

Regulation of Sex Determination in Mice by a Non-coding Genomic Region

PubMed Central

Arboleda, Valerie A.; Fleming, Alice; Barseghyan, Hayk; Délot, Emmanuèle; Sinsheimer, Janet S.; Vilain, Eric

2014-01-01

To identify novel genomic regions that regulate sex determination, we utilized the powerful C57BL/6J-YPOS (B6-YPOS) model of XY sex reversal where mice with autosomes from the B6 strain and a Y chromosome from a wild-derived strain, Mus domesticus poschiavinus (YPOS), show complete sex reversal. In B6-YPOS, the presence of a 55-Mb congenic region on chromosome 11 protects from sex reversal in a dose-dependent manner. Using mouse genetic backcross designs and high-density SNP arrays, we narrowed the congenic region to a 1.62-Mb genomic region on chromosome 11 that confers 80% protection from B6-YPOS sex reversal when one copy is present and complete protection when two copies are present. It was previously believed that the protective congenic region originated from the 129S1/SviMJ (129) strain. However, genomic analysis revealed that this region is not derived from 129 and most likely is derived from the semi-inbred strain POSA. We show that the small 1.62-Mb congenic region that protects against B6-YPOS sex reversal is located within the Sox9 promoter and promotes the expression of Sox9, thereby driving testis development within the B6-YPOS background. Through 30 years of backcrossing, this congenic region was maintained, as it promoted male sex determination and fertility despite the female-promoting B6-YPOS genetic background. Our findings demonstrate that long-range enhancer regions are critical to developmental processes and can be used to identify the complex interplay between genome variants, epigenetics, and developmental gene regulation. PMID:24793290
A resource for characterizing genome-wide binding and putative target genes of transcription factors expressed during secondary growth and wood formation in Populus.

PubMed

Liu, Lijun; Ramsay, Trevor; Zinkgraf, Matthew; Sundell, David; Street, Nathaniel Robert; Filkov, Vladimir; Groover, Andrew

2015-06-01

Identifying transcription factor target genes is essential for modeling the transcriptional networks underlying developmental processes. Here we report a chromatin immunoprecipitation sequencing (ChIP-seq) resource consisting of genome-wide binding regions and associated putative target genes for four Populus homeodomain transcription factors expressed during secondary growth and wood formation. Software code (programs and scripts) for processing the Populus ChIP-seq data is provided within a publicly available iPlant image, including tools for ChIP-seq data quality control and evaluation adapted from the human Encyclopedia of DNA Elements (ENCODE) project. Basic binding information for each transcription factor (including members of the Class I KNOX, Class III HD ZIP, and BEL1-like families) is summarized, including the number and location of binding regions, distribution of binding regions relative to gene features, associated putative target genes, and enriched functional categories of putative target genes. These ChIP-seq data have been integrated within the Populus Genome Integrative Explorer (PopGenIE) where they can be analyzed using a variety of web-based tools. We present an example analysis that shows preferential binding of transcription factor ARBORKNOX1 to the nearest neighbor genes in a pre-calculated co-expression network module, and enrichment for meristem-related genes within this module including multiple orthologs of Arabidopsis KNOTTED-like Arabidopsis 2/6. © 2015 Society for Experimental Biology and John Wiley & Sons Ltd This article has been contributed to by US Government employees and their work is in the public domain in the USA.
[Efficient genome editing in human pluripotent stem cells through CRISPR/Cas9].

PubMed

Liu, Gai-gai; Li, Shuang; Wei, Yu-da; Zhang, Yong-xian; Ding, Qiu-rong

2015-11-01

The RNA-guided CRISPR (clustered regularly interspaced short palindromic repeat)-associated Cas9 nuclease has offered a new platform for genome editing with high efficiency. Here, we report the use of CRISPR/Cas9 technology to target a specific genomic region in human pluripotent stem cells. We show that CRISPR/Cas9 can be used to disrupt a gene by introducing frameshift mutations into its coding region; to knock in specific sequences (e.g., a FLAG tag DNA sequence) at a targeted genomic locus via homology-directed repair; and to induce large genomic deletions through dual-guide multiplexing. Our results demonstrate the versatile application of CRISPR/Cas9 in stem cell genome editing, which can be widely utilized for functional studies of genes or genomic loci in human pluripotent stem cells.
Short interspersed transposable elements (SINEs) are excluded from imprinted regions in the human genome.

PubMed

Greally, John M

2002-01-08

To test whether regions undergoing genomic imprinting have unique genomic characteristics, imprinted and nonimprinted human loci were compared for nucleotide and retroelement composition. Maternally and paternally expressed subgroups of imprinted genes were found to differ in terms of guanine and cytosine, CpG, and retroelement content, indicating a segregation into distinct genomic compartments. Imprinted regions have been normally permissive to L1 long interspersed transposable element retroposition during mammalian evolution but universally and significantly lack short interspersed transposable elements (SINEs). The primate-specific Alu SINEs, as well as the more ancient mammalian-wide interspersed repeat SINEs, are found at significantly low densities in imprinted regions. The latter paleogenomic signature indicates that the sequence characteristics of currently imprinted regions existed before the mammalian radiation. Transitions from imprinted to nonimprinted genomic regions in cis are characterized by a sharp inflection in SINE content, demonstrating that this genomic characteristic can help predict the presence and extent of regions undergoing imprinting. During primate evolution, SINE accumulation in imprinted regions occurred at a decreased rate compared with control loci. The constraint on SINE accumulation in imprinted regions may be mediated by an active selection process. This selection could be because of SINEs attracting and spreading methylation, as has been found at other loci. Methylation-induced silencing could lead to deleterious consequences at imprinted loci, where inactivation of one allele is already established, and expression is often essential for embryonic growth and survival.
Short interspersed transposable elements (SINEs) are excluded from imprinted regions in the human genome

PubMed Central

Greally, John M.

2002-01-01

To test whether regions undergoing genomic imprinting have unique genomic characteristics, imprinted and nonimprinted human loci were compared for nucleotide and retroelement composition. Maternally and paternally expressed subgroups of imprinted genes were found to differ in terms of guanine and cytosine, CpG, and retroelement content, indicating a segregation into distinct genomic compartments. Imprinted regions have been normally permissive to L1 long interspersed transposable element retroposition during mammalian evolution but universally and significantly lack short interspersed transposable elements (SINEs). The primate-specific Alu SINEs, as well as the more ancient mammalian-wide interspersed repeat SINEs, are found at significantly low densities in imprinted regions. The latter paleogenomic signature indicates that the sequence characteristics of currently imprinted regions existed before the mammalian radiation. Transitions from imprinted to nonimprinted genomic regions in cis are characterized by a sharp inflection in SINE content, demonstrating that this genomic characteristic can help predict the presence and extent of regions undergoing imprinting. During primate evolution, SINE accumulation in imprinted regions occurred at a decreased rate compared with control loci. The constraint on SINE accumulation in imprinted regions may be mediated by an active selection process. This selection could be because of SINEs attracting and spreading methylation, as has been found at other loci. Methylation-induced silencing could lead to deleterious consequences at imprinted loci, where inactivation of one allele is already established, and expression is often essential for embryonic growth and survival. PMID:11756672
Genome-wide prediction of vaccine targets for human herpes simplex viruses using Vaxign reverse vaccinology

PubMed Central

2013-01-01

Herpes simplex virus (HSV) types 1 and 2 (HSV-1 and HSV-2) are the most common infectious agents of humans. No safe and effective HSV vaccines have been licensed. Reverse vaccinology is an emerging and revolutionary vaccine development strategy that starts with the prediction of vaccine targets by informatics analysis of genome sequences. Vaxign (http://www.violinet.org/vaxign) is the first web-based vaccine design program based on reverse vaccinology. In this study, we used Vaxign to analyze 52 herpesvirus genomes, including 3 HSV-1 genomes, one HSV-2 genome, 8 other human herpesvirus genomes, and 40 non-human herpesvirus genomes. The HSV-1 strain 17 genome that contains 77 proteins was used as the seed genome. These 77 proteins are conserved in two other HSV-1 strains (strain F and strain H129). Two envelope glycoproteins gJ and gG do not have orthologs in HSV-2 or 8 other human herpesviruses. Seven HSV-1 proteins (including gJ and gG) do not have orthologs in all 40 non-human herpesviruses. Nineteen proteins are conserved in all human herpesviruses, including capsid scaffold protein UL26.5 (NP_044628.1). As the only HSV-1 protein predicted to be an adhesin, UL26.5 is a promising vaccine target. The MHC Class I and II epitopes were predicted by the Vaxign Vaxitop prediction program and IEDB prediction programs recently installed and incorporated in Vaxign. Our comparative analysis found that the two programs identified largely the same top epitopes but also some positive results predicted from one program might not be positive from another program. Overall, our Vaxign computational prediction provides many promising candidates for rational HSV vaccine development. The method is generic and can also be used to predict other viral vaccine targets. PMID:23514126
The Effects of Signal Erosion and Core Genome Reduction on the Identification of Diagnostic Markers

PubMed Central

Sahl, Jason W.; Vazquez, Adam J.; Hall, Carina M.; Busch, Joseph D.; Tuanyok, Apichai; Mayo, Mark; Schupp, James M.; Lummis, Madeline; Pearson, Talima; Shippy, Kenzie; Allender, Christopher J.; Theobald, Vanessa; Hutcheson, Alex; Korlach, Jonas; LiPuma, John J.; Ladner, Jason; Lovett, Sean; Koroleva, Galina; Palacios, Gustavo; Limmathurotsakul, Direk; Wuthiekanun, Vanaporn; Wongsuwan, Gumphol; Currie, Bart J.

2016-01-01

ABSTRACT Whole-genome sequence (WGS) data are commonly used to design diagnostic targets for the identification of bacterial pathogens. To do this effectively, genomics databases must be comprehensive to identify the strict core genome that is specific to the target pathogen. As additional genomes are analyzed, the core genome size is reduced and there is erosion of the target-specific regions due to commonality with related species, potentially resulting in the identification of false positives and/or false negatives. PMID:27651357
Region 6 Targeted Brownfields Assessment Brochure

EPA Pesticide Factsheets

A Targeted Brownfields Assessment (TBA) is a free service the EPA Region 6 Brownfields Team provides to communities to support their eligible brownfields projects. Region 6 consists of Arkansas, Louisiana, New Mexico, Oklahoma, and Texas.
Programmable Removal of Bacterial Strains by Use of Genome-Targeting CRISPR-Cas Systems

PubMed Central

Gomaa, Ahmed A.; Klumpe, Heidi E.; Luo, Michelle L.; Selle, Kurt; Barrangou, Rodolphe; Beisel, Chase L.

2014-01-01

ABSTRACT CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) systems in bacteria and archaea employ CRISPR RNAs to specifically recognize the complementary DNA of foreign invaders, leading to sequence-specific cleavage or degradation of the target DNA. Recent work has shown that the accidental or intentional targeting of the bacterial genome is cytotoxic and can lead to cell death. Here, we have demonstrated that genome targeting with CRISPR-Cas systems can be employed for the sequence-specific and titratable removal of individual bacterial strains and species. Using the type I-E CRISPR-Cas system in Escherichia coli as a model, we found that this effect could be elicited using native or imported systems and was similarly potent regardless of the genomic location, strand, or transcriptional activity of the target sequence. Furthermore, the specificity of targeting with CRISPR RNAs could readily distinguish between even highly similar strains in pure or mixed cultures. Finally, varying the collection of delivered CRISPR RNAs could quantitatively control the relative number of individual strains within a mixed culture. Critically, the observed selectivity and programmability of bacterial removal would be virtually impossible with traditional antibiotics, bacteriophages, selectable markers, or tailored growth conditions. Once delivery challenges are addressed, we envision that this approach could offer a novel means to quantitatively control the composition of environmental and industrial microbial consortia and may open new avenues for the development of “smart” antibiotics that circumvent multidrug resistance and differentiate between pathogenic and beneficial microorganisms. PMID:24473129
A new age in functional genomics using CRISPR/Cas9 in arrayed library screening.

PubMed

Agrotis, Alexander; Ketteler, Robin

2015-01-01

CRISPR technology has rapidly changed the face of biological research, such that precise genome editing has now become routine for many labs within several years of its initial development. What makes CRISPR/Cas9 so revolutionary is the ability to target a protein (Cas9) to an exact genomic locus, through designing a specific short complementary nucleotide sequence, that together with a common scaffold sequence, constitute the guide RNA bridging the protein and the DNA. Wild-type Cas9 cleaves both DNA strands at its target sequence, but this protein can also be modified to exert many other functions. For instance, by attaching an activation domain to catalytically inactive Cas9 and targeting a promoter region, it is possible to stimulate the expression of a specific endogenous gene. In principle, any genomic region can be targeted, and recent efforts have successfully generated pooled guide RNA libraries for coding and regulatory regions of human, mouse and Drosophila genomes with high coverage, thus facilitating functional phenotypic screening. In this review, we will highlight recent developments in the area of CRISPR-based functional genomics and discuss potential future directions, with a special focus on mammalian cell systems and arrayed library screening.
[Overview of patents on targeted genome editing technologies and their implications for innovation and entrepreneurship education in universities].

PubMed

Fan, Xiang-yu; Lin, Yan-ping; Liao, Guo-jian; Xie, Jian-ping

2015-12-01

Zinc finger nuclease, transcription activator-like effector nuclease, and clustered regularly interspaced short palindromic repeats/Cas9 nuclease are important targeted genome editing technologies. They have great significance in scientific research and applications on aspects of functional genomics research, species improvement, disease prevention and gene therapy. There are past or ongoing disputes over ownership of the intellectual property behind every technology. In this review, we summarize the patents on these three targeted genome editing technologies in order to provide some reference for developing genome editing technologies with self-owned intellectual property rights and some implications for current innovation and entrepreneurship education in universities.
Gross rearrangements within the 5'-untranslated region of the picornaviral genomes.

PubMed

Pilipenko, E V; Blinov, V M; Agol, V I

1990-06-11

An analysis of reported nucleotide sequences revealed several cases of gross rearrangements in the 5'-untranslated region (5'-UTR) of picornaviral genomes. A large (greater than 100 nt) duplication was discovered in a downstream region of the poliovirus 5'-UTR involved in translational control. Properties of the poliovirus mutants with large deletions [Kuge and Nomoto (1987) J. Virol. 61, 1478-1487] show that a single copy of the appropriate repeating unit is compatible with a wild-type phenotype of the virus. In contrast to poliovirus and other enterovirus genomes, human rhinovirus RNAs contain only a single copy of this repeating unit. Another similarly large repeat was found in an upstream segment of the bovine enterovirus 5'-UTR. A comparison of the primary and secondary structures of cardio- and aphthovirus 5'-UTRs demonstrated the existence of a large (ca. 250 nucleotides) insertion/deletion in a region preceding the poly(C) tract. The two latter rearrangements appear to involve elements of the viral genome replication machinery. The possible origin as well as the evolutionary and functional implications of these structural peculiarities are discussed.
Advances in sarcoma genomics and new therapeutic targets

PubMed Central

Taylor, Barry S.; Barretina, Jordi; Maki, Robert G.; Antonescu, Cristina R.; Singer, Samuel; Ladanyi, Marc

2012-01-01

Preface Increasingly, human mesenchymal malignancies are classified by the abnormalities that drive their pathogenesis. While many of these aberrations are highly prevalent within particular sarcoma subtypes, few are currently targeted therapeutically. Indeed, most subtypes of sarcoma are still treated with traditional therapeutic modalities and in many cases are resistant to adjuvant therapies. In this Review, we discuss the core molecular determinants of sarcomagenesis and emphasize the emerging genomic and functional genetic approaches that, coupled to novel therapeutic strategies, have the potential to transform the care of patients with sarcoma. PMID:21753790
Peroxisome Proliferator-Activated Receptor Subtype- and Cell-Type-Specific Activation of Genomic Target Genes upon Adenoviral Transgene Delivery

PubMed Central

Nielsen, Ronni; Grøntved, Lars; Stunnenberg, Hendrik G.; Mandrup, Susanne

2006-01-01

Investigations of the molecular events involved in activation of genomic target genes by peroxisome proliferator-activated receptors (PPARs) have been hampered by the inability to establish a clean on/off state of the receptor in living cells. Here we show that the combination of adenoviral delivery and chromatin immunoprecipitation (ChIP) is ideal for dissecting these mechanisms. Adenoviral delivery of PPARs leads to a rapid and synchronous expression of the PPAR subtypes, establishment of transcriptional active complexes at genomic loci, and immediate activation of even silent target genes. We demonstrate that PPARγ2 possesses considerable ligand-dependent as well as independent transactivation potential and that agonists increase the occupancy of PPARγ2/retinoid X receptor at PPAR response elements. Intriguingly, by direct comparison of the PPARs (α, γ, and β/δ), we show that the subtypes have very different abilities to gain access to target sites and that in general the genomic occupancy correlates with the ability to activate the corresponding target gene. In addition, the specificity and potency of activation by PPAR subtypes are highly dependent on the cell type. Thus, PPAR subtype-specific activation of genomic target genes involves an intricate interplay between the properties of the subtype- and cell-type-specific settings at the individual target loci. PMID:16847324
Chromosomal Targeting by the Type III-A CRISPR-Cas System Can Reshape Genomes in Staphylococcus aureus

PubMed Central

Guan, Jing; Wang, Wanying

2017-01-01

ABSTRACT CRISPR-Cas (clustered regularly interspaced short palindromic repeat [CRISPR]-CRISPR-associated protein [Cas]) systems can provide protection against invading genetic elements by using CRISPR RNAs (crRNAs) as a guide to locate and degrade the target DNA. CRISPR-Cas systems have been classified into two classes and five types according to the content of cas genes. Previous studies have indicated that CRISPR-Cas systems can avoid viral infection and block plasmid transfer. Here we show that chromosomal targeting by the Staphylococcus aureus type III-A CRISPR-Cas system can drive large-scale genome deletion and alteration within integrated staphylococcal cassette chromosome mec (SCCmec). The targeting activity of the CRISPR-Cas system is associated with the complementarity between crRNAs and protospacers, and 10- to 13-nucleotide truncations of spacers partially block CRISPR attack and more than 13-nucleotide truncation can fully abolish targeting, suggesting that a minimal length is required to license cleavage. Avoiding base pairings in the upstream region of protospacers is also necessary for CRISPR targeting. Successive trinucleotide complementarity between the 5′ tag of crRNAs and protospacers can disrupt targeting. Our findings reveal that type III-A CRISPR-Cas systems can modulate bacterial genome stability and may serve as a high-efficiency tool for deleting resistance or virulence genes in bacteria. IMPORTANCE Staphylococcus aureus is a pathogen that can cause a wide range of infections in humans. Studies have suggested that CRISPR-Cas systems can drive the loss of integrated mobile genetic elements (MGEs) by chromosomal targeting. Here we demonstrate that CRISPR-mediated cleavage contributes to the partial deletion of integrated SCCmec in methicillin-resistant S. aureus (MRSA), which provides a strategy for the treatment of MRSA infections. The spacer within artificial CRISPR arrays should contain more than 25 nucleotides for immunity, and
Chromosomal Targeting by the Type III-A CRISPR-Cas System Can Reshape Genomes in Staphylococcus aureus.

PubMed

Guan, Jing; Wang, Wanying; Sun, Baolin

2017-01-01

CRISPR-Cas (clustered regularly interspaced short palindromic repeat [CRISPR]-CRISPR-associated protein [Cas]) systems can provide protection against invading genetic elements by using CRISPR RNAs (crRNAs) as a guide to locate and degrade the target DNA. CRISPR-Cas systems have been classified into two classes and five types according to the content of cas genes. Previous studies have indicated that CRISPR-Cas systems can avoid viral infection and block plasmid transfer. Here we show that chromosomal targeting by the Staphylococcus aureus type III-A CRISPR-Cas system can drive large-scale genome deletion and alteration within integrated staphylococcal cassette chromosome mec (SCCmec). The targeting activity of the CRISPR-Cas system is associated with the complementarity between crRNAs and protospacers, and 10- to 13-nucleotide truncations of spacers partially block CRISPR attack and more than 13-nucleotide truncation can fully abolish targeting, suggesting that a minimal length is required to license cleavage. Avoiding base pairings in the upstream region of protospacers is also necessary for CRISPR targeting. Successive trinucleotide complementarity between the 5' tag of crRNAs and protospacers can disrupt targeting. Our findings reveal that type III-A CRISPR-Cas systems can modulate bacterial genome stability and may serve as a high-efficiency tool for deleting resistance or virulence genes in bacteria. IMPORTANCE Staphylococcus aureus is a pathogen that can cause a wide range of infections in humans. Studies have suggested that CRISPR-Cas systems can drive the loss of integrated mobile genetic elements (MGEs) by chromosomal targeting. Here we demonstrate that CRISPR-mediated cleavage contributes to the partial deletion of integrated SCCmec in methicillin-resistant S. aureus (MRSA), which provides a strategy for the treatment of MRSA infections. The spacer within artificial CRISPR arrays should contain more than 25 nucleotides for immunity, and consecutive
The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

PubMed

Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

2013-02-01

Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.
Engineered Cpf1 variants with altered PAM specificities increase genome targeting range

PubMed Central

Gao, Linyi; Cox, David B.T.; Yan, Winston X.; Manteiga, John C.; Schneider, Martin W.; Yamano, Takashi; Nishimasu, Hiroshi; Nureki, Osamu; Crosetto, Nicola; Zhang, Feng

2017-01-01

The RNA-guided endonuclease Cpf1 is a promising tool for genome editing in eukaryotic cells [1–7]. However, the utility of the commonly used Acidaminococcus sp. BV3L6 Cpf1 (AsCpf1) and Lachnospiraceae bacterium ND2006 Cpf1 (LbCpf1) is limited by their requirement of a TTTV protospacer adjacent motif (PAM) in the DNA substrate. To address this limitation, we performed a structure-guided mutagenesis screen to increase the targeting range of Cpf1. We engineered two AsCpf1 variants carrying the mutations S542R/K607R and S542R/K548V/N552R, which recognize TYCV and TATV PAMs, respectively, with enhanced activities in vitro and in human cells. Genome-wide assessment of off-target activity using the BLISS assay [7] indicated that these variants retain high DNA targeting specificity, which we further improved by introducing an additional non-PAM-interacting mutation. Introducing the identified mutations at their corresponding positions in LbCpf1 similarly altered its PAM specificity. Together, these variants increase the targeting range of Cpf1 by approximately three-fold in human coding sequences to one cleavage site per ~11 bp. PMID:28581492
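How additional PAMs widen the targeting range can be illustrated by counting PAM occurrences on both strands of a sequence. The sketch below is not the authors' analysis: it only expands the standard IUPAC codes used in the abstract (V = A, C, or G; Y = C or T) and counts matching positions, and the function names are hypothetical.

```python
import re

# IUPAC codes needed for the PAMs named in the abstract.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "[ACG]", "Y": "[CT]"}

def pam_regex(pam):
    # A lookahead lets overlapping occurrences be found.
    return re.compile("(?=(" + "".join(IUPAC[b] for b in pam) + "))")

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_pam_sites(seq, pams):
    """Count positions on either strand where any of the given PAMs occurs."""
    seq = seq.upper()
    total = 0
    for strand in (seq, reverse_complement(seq)):
        positions = set()
        for pam in pams:
            positions.update(m.start() for m in pam_regex(pam).finditer(strand))
        total += len(positions)
    return total
```

Comparing `count_pam_sites(seq, ["TTTV"])` with `count_pam_sites(seq, ["TTTV", "TYCV", "TATV"])` on a sequence of interest gives a rough sense of the gain in candidate sites; it does not account for guide quality or off-target effects.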
Complete genome-wide screening and subtractive genomic approach revealed new virulence factors, potential drug targets against bio-war pathogen Brucella melitensis 16M

PubMed Central

Pradeepkiran, Jangampalli Adi; Sainath, Sri Bhashyam; Kumar, Konidala Kranthi; Bhaskar, Matcha

2015-01-01

Brucella melitensis 16M is a Gram-negative coccobacillus that infects both animals and humans. It causes a disease known as brucellosis, which is characterized by acute febrile illness in humans and causes abortions in livestock. To prevent and control brucellosis, identification of putative drug targets is crucial. The present study aimed to identify drug targets in B. melitensis 16M by using a subtractive genomic approach. We used available database repositories (Database of Essential Genes, Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server, and Kyoto Encyclopedia of Genes and Genomes) to identify putative genes that are nonhomologous to humans and essential for the pathogen B. melitensis 16M. The results revealed that, within the 3 Mb genome of the pathogen, 53 putative characterized and 13 uncharacterized hypothetical genes were identified; further, from Basic Local Alignment Search Tool protein analysis, one hypothetical protein showed a close resemblance (50%) to Silicibacter pomeroyi DUF1285 family protein (2RE3). A homology model of the target was then constructed using MODELLER 9.12 and optimized through the variable target function method by molecular dynamics optimization with simulated annealing. The stereochemical quality of the restrained model was evaluated by PROCHECK, VERIFY-3D, ERRAT, and WHATIF servers. Furthermore, structure-based virtual screening was carried out against the predicted active site of the respective protein using the glycerol structural analogs from the PubChem database. We identified five best inhibitors with strong affinities, stable interactions, and also with reliable drug-like properties. Hence, these leads might be used as the most effective inhibitors of the modeled protein. The outcome of the present work of virtual screening of putative gene targets might facilitate design of potential drugs for better treatment against brucellosis. PMID:25834405
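The core of a subtractive genomic approach, as described above, is a set filter: keep pathogen genes that are essential and have no human homolog. The sketch below assumes the essentiality and homology calls have already been made elsewhere (for example from database searches); the function name and the gene identifiers in the usage example are placeholders, not results from the study.

```python
def subtractive_filter(pathogen_genes, essential_genes, human_homologs):
    """Return pathogen genes that are essential and lack a human homolog.

    All three arguments are collections of gene identifiers; the
    essentiality and homology calls are assumed to come from upstream
    database searches that are not shown here.
    """
    essential = set(essential_genes)
    homologous = set(human_homologs)
    return sorted(
        g for g in set(pathogen_genes)
        if g in essential and g not in homologous
    )

# Usage with placeholder identifiers:
candidates = subtractive_filter(
    pathogen_genes=["geneA", "geneB", "geneC", "geneD"],
    essential_genes=["geneA", "geneB", "geneD"],
    human_homologs=["geneB"],
)
```

The surviving candidates would then go on to the structural modeling and virtual screening steps the abstract describes.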
A Functional Genomics Approach to Identify Novel Breast Cancer Gene Targets in Yeast

DTIC Science & Technology

2004-05-01

Award Number: DAMD17-03-1-0232. Title: A Functional Genomics Approach to Identify Novel Breast Cancer Gene Targets in Yeast. Principal Investigator... Author(s): Craig Bennett, Ph.D. Performing Organization Name(s)... Unlimited. Abstract (Maximum 200 Words): We are using the yeast Saccharomyces cerevisiae to identify new cancer gene targets that interact with the

Variability among the Most Rapidly Evolving Plastid Genomic Regions is Lineage-Specific: Implications of Pairwise Genome Comparisons in Pyrus (Rosaceae) and Other Angiosperms for Marker Choice

PubMed Central

Ter-Voskanyan, Hasmik; Allgaier, Martin; Borsch, Thomas

2014-01-01

Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae)—a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations
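The overall p-distance reported above is, by the usual definition, the proportion of aligned sites at which two sequences differ. A minimal sketch of that calculation follows; the function name `p_distance` and the choice to skip columns containing gaps or ambiguity characters are assumptions made for illustration, not details taken from the study.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-N?") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence carries a gap or ambiguity character
    are skipped (an assumption; other conventions exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # ignore columns that cannot be compared
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared
```

Applied to a whole-genome alignment, a value such as 0.00145 would correspond to roughly 1.45 substitutions per 1,000 compared sites; indels and inversions are not counted by this measure.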
Nonviral Genome Editing Based on a Polymer-Derivatized CRISPR Nanocomplex for Targeting Bacterial Pathogens and Antibiotic Resistance.

PubMed

Kang, Yoo Kyung; Kwon, Kyu; Ryu, Jea Sung; Lee, Ha Neul; Park, Chankyu; Chung, Hyun Jung

2017-04-19

The overuse of antibiotics plays a major role in the emergence and spread of multidrug-resistant bacteria. A molecularly targeted, specific treatment method for bacterial pathogens can prevent this problem by reducing the selective pressure during microbial growth. Herein, we introduce a nonviral treatment strategy delivering genome editing material for targeting antibacterial resistance. We apply the CRISPR-Cas9 system, which has been recognized as an innovative tool for highly specific and efficient genome engineering in different organisms, as the delivery cargo. We utilize polymer-derivatized Cas9, by direct covalent modification of the protein with cationic polymer, for subsequent complexation with single-guide RNA targeting antibiotic resistance. We show that nanosized CRISPR complexes (Cr-Nanocomplex) were successfully formed, while maintaining the functional activity of Cas9 endonuclease to induce double-strand DNA cleavage. We also demonstrate that the Cr-Nanocomplex designed to target mecA, the major gene involved in methicillin resistance, can be efficiently delivered into Methicillin-resistant Staphylococcus aureus (MRSA), and allow the editing of the bacterial genome with much higher efficiency compared to using native Cas9 complexes or conventional lipid-based formulations. The present study shows for the first time that a covalently modified CRISPR system allows nonviral, therapeutic genome editing, and can be potentially applied as a target-specific antimicrobial.
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non
DNA methylation in the APOE genomic region is associated with cognitive function in African Americans.

PubMed

Liu, Jiaxuan; Zhao, Wei; Ware, Erin B; Turner, Stephen T; Mosley, Thomas H; Smith, Jennifer A

2018-05-08

Genetic variations in apolipoprotein E (APOE) and proximal genes (PVRL2, TOMM40, and APOC1) are associated with cognitive function and dementia, particularly Alzheimer's disease. Epigenetic mechanisms such as DNA methylation play a central role in the regulation of gene expression. Recent studies have found evidence that DNA methylation may contribute to the pathogenesis of dementia, but its association with cognitive function in populations without dementia remains unclear. We assessed DNA methylation levels of 48 CpG sites in the APOE genomic region in peripheral blood leukocytes collected from 289 African Americans (mean age = 67 years) from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. Using linear regression, we examined the relationship between methylation in the APOE genomic region and multiple cognitive measures including learning, memory, processing speed, concentration, language and global cognitive function. We identified eight CpG sites in three genes (PVRL2, TOMM40, and APOE) that showed an inverse association between methylation level and delayed recall, a measure of memory, after adjusting for age and sex (False Discovery Rate q-value < 0.1). All eight CpGs are located in either CpG islands (CGIs) or CGI shelves, and six of them are in promoter regions. Education and APOE ε4 carrier status significantly modified the effect of methylation in cg08583001 (PVRL2) and cg22024783 (TOMM40), respectively. Together, methylation of the eight CpGs explained an additional 8.7% of the variance in delayed recall, after adjustment for age, sex, education, and APOE ε4 carrier status. Methylation was not significantly associated with any other cognitive measures. Our results suggest that methylation levels at multiple CpGs in the APOE genomic region are inversely associated with delayed recall during normal cognitive aging, even after accounting for known genetic predictors for cognition. Our findings highlight the important role of
Organizational heterogeneity of vertebrate genomes.

PubMed

Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham

2012-01-01

Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter, GDM), allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
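As a rough illustration of scoring fixed-length oligonucleotide occurrences, the sketch below counts exact k-mers in a sliding window and normalizes them to frequencies. It is a simplified stand-in: the published CS method may score occurrences differently, and the function name `kmer_spectrum` and the rule of skipping windows with non-ACGT characters are assumptions.

```python
from collections import Counter


def kmer_spectrum(sequence: str, k: int) -> dict:
    """Relative frequency of each exact k-mer observed in a DNA sequence.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption made for this sketch).
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}
```

Two regions could then be compared by any distance between their spectra; which distance to use is a separate design choice not specified here.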
Targeted Genome Editing Using DNA-Free RNA-Guided Cas9 Ribonucleoprotein for CHO Cell Engineering.

PubMed

Shin, Jongoh; Lee, Namil; Cho, Suhyung; Cho, Byung-Kwan

2018-01-01

Recent advances in the CRISPR/Cas9 system have dramatically facilitated genome engineering in various cell systems. Among the protocols, the direct delivery of the Cas9-sgRNA ribonucleoprotein (RNP) complex into cells is an efficient approach to increase genome editing efficiency. This method uses purified Cas9 protein and in vitro transcribed sgRNA to edit the target gene without vector DNA. We have applied the RNP complex to CHO cell engineering to obtain desirable phenotypes and to reduce unintended insertional mutagenesis and off-target effects. Here, we describe our routine methods for RNP complex-mediated gene deletion including the protocols to prepare the purified Cas9 protein and the in vitro transcribed sgRNA. Subsequently, we also describe a protocol to confirm the edited genomic positions using the T7E1 enzymatic assay and next-generation sequencing.
Comparative Analysis of Predicted Plastid-Targeted Proteomes of Sequenced Higher Plant Genomes

PubMed Central

Schaeffer, Scott; Harper, Artemus; Raja, Rajani; Jaiswal, Pankaj; Dhingra, Amit

2014-01-01

Plastids are actively involved in numerous plant processes critical to growth, development and adaptation. They play a primary role in photosynthesis, pigment and monoterpene synthesis, gravity sensing, starch and fatty acid synthesis, as well as oil and protein storage. We applied two complementary methods to analyze the recently published apple genome (Malus × domestica) to identify putative plastid-targeted proteins, the first using TargetP and the second using a custom workflow utilizing a set of predictive programs. Apple shares roughly 40% of its 10,492 putative plastid-targeted proteins with the Arabidopsis (Arabidopsis thaliana) plastid-targeted proteome as identified by the Chloroplast 2010 project and ∼57% of its entire proteome with Arabidopsis. This suggests that the plastid-targeted proteomes of apple and Arabidopsis are different, and interestingly alludes to the presence of differential targeting of homologs between the two species. Co-expression analysis of 2,224 genes encoding putative plastid-targeted apple proteins suggests that they play a role in plant developmental and intermediary metabolism. Further, an inter-specific comparison of Arabidopsis, Prunus persica (Peach), Malus × domestica (Apple), Populus trichocarpa (Black cottonwood), Fragaria vesca (Woodland Strawberry), Solanum lycopersicum (Tomato) and Vitis vinifera (Grapevine) also identified a large number of novel species-specific plastid-targeted proteins. This analysis also revealed the presence of alternatively targeted homologs across species. Two separate analyses revealed that a small subset of proteins, one representing 289 protein clusters and the other 737 unique protein sequences, are conserved between seven plastid-targeted angiosperm proteomes. The majority of the novel proteins were annotated to play roles in stress response, transport, catabolic processes, and cellular component organization. Our results suggest that the current state of knowledge regarding
Cas9-based tools for targeted genome editing and transcriptional control.

PubMed

Xu, Tao; Li, Yongchao; Van Nostrand, Joy D; He, Zhili; Zhou, Jizhong

2014-03-01

Development of tools for targeted genome editing and regulation of gene expression has significantly expanded our ability to elucidate the mechanisms of interesting biological phenomena and to engineer desirable biological systems. Recent rapid progress in the study of a clustered, regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated (Cas) protein system in bacteria has facilitated the development of newly facile and programmable platforms for genome editing and transcriptional control in a sequence-specific manner. The core RNA-guided Cas9 endonuclease in the type II CRISPR system has been harnessed to realize gene mutation and DNA deletion and insertion, as well as transcriptional activation and repression, with multiplex targeting ability, just by customizing 20-nucleotide RNA components. Here we describe the molecular basis of the type II CRISPR/Cas system and summarize applications and factors affecting its utilization in model organisms. We also discuss the advantages and disadvantages of Cas9-based tools in comparison with widely used customizable tools, such as Zinc finger nucleases and transcription activator-like effector nucleases.
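To make the idea of customizing a 20-nucleotide guide concrete, the sketch below scans the forward strand of a DNA string for 20-nt stretches followed directly by an NGG motif, the protospacer-adjacent motif (PAM) commonly associated with Streptococcus pyogenes Cas9. The PAM choice, the function name `find_protospacers`, and the forward-strand-only search are assumptions for illustration; the abstract itself specifies only the 20-nucleotide RNA component.

```python
import re


def find_protospacers(dna: str, guide_len: int = 20, pam_regex: str = "[ACGT]GG"):
    """Return (position, protospacer, PAM) for each forward-strand candidate.

    A lookahead is used so that overlapping candidates are all reported.
    The reverse strand is not searched in this sketch.
    """
    seq = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (guide_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

A practical design tool would also search the reverse complement and screen candidates for off-target matches, neither of which is attempted here.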
A genome-wide association study identifies a genomic region for the polycerate phenotype in sheep (Ovis aries).

PubMed

Ren, Xue; Yang, Guang-Li; Peng, Wei-Feng; Zhao, Yong-Xin; Zhang, Min; Chen, Ze-Hui; Wu, Fu-An; Kantanen, Juha; Shen, Min; Li, Meng-Hua

2016-02-17

Horns are a cranial appendage found exclusively in Bovidae, and play important roles in accessing resources and mates. In sheep (Ovis aries), horns vary from polled to six-horned, and humans have been selecting polled animals in farming and breeding. Here, we conducted a genome-wide association study on 24 two-horned versus 22 four-horned phenotypes in a native Chinese breed of Sishui Fur sheep. Together with linkage disequilibrium (LD) analyses and haplotype-based association tests, we identified a genomic region comprising 132.0-133.1 Mb on chromosome 2 that contained the top 10 SNPs (including 4 significant SNPs) and 5 most significant haplotypes associated with the polycerate phenotype. In humans and mice, this genomic region contains the HOXD gene cluster and adjacent functional genes EVX2 and KIAA1715, which have a close association with the formation of limbs and genital buds. Our results provide new insights into the genetic basis underlying variable numbers of horns and represent a new resource for use in sheep genetics and breeding.
Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria.

PubMed

Thorpe, Harry A; Bayliss, Sion C; Sheppard, Samuel K; Feil, Edward J

2018-04-01

The concept of the "pan-genome," which refers to the total complement of genes within a given sample or species, is well established in bacterial genomics. Rapid and scalable pipelines are available for managing and interpreting pan-genomes from large batches of annotated assemblies. However, despite overwhelming evidence that variation in intergenic regions in bacteria can directly influence phenotypes, most current approaches for analyzing pan-genomes focus exclusively on protein-coding sequences. To address this we present Piggy, a novel pipeline that emulates Roary except that it is based only on intergenic regions. A key utility provided by Piggy is the detection of highly divergent ("switched") intergenic regions (IGRs) upstream of genes. We demonstrate the use of Piggy on large datasets of clinically important lineages of Staphylococcus aureus and Escherichia coli. For S. aureus, we show that highly divergent (switched) IGRs are associated with differences in gene expression and we establish a multilocus reference database of IGR alleles (igMLST; implemented in BIGSdb).
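For readers unfamiliar with how intergenic regions are delimited, the sketch below derives them on a single contig as the gaps left between annotated gene intervals. This is not Piggy's implementation; the function name `intergenic_regions`, the half-open `(start, end)` coordinate convention, and the inclusion of contig-end flanks are assumptions for illustration.

```python
def intergenic_regions(genes, contig_length):
    """Return half-open (start, end) intervals not covered by any gene.

    `genes` is an iterable of half-open (start, end) gene coordinates on
    one contig; overlapping genes are handled by tracking the furthest
    gene end seen so far.
    """
    regions = []
    cursor = 0  # end of the furthest-reaching gene processed so far
    for start, end in sorted(genes):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < contig_length:
        regions.append((cursor, contig_length))
    return regions
```

Clustering such regions across many assemblies, and detecting divergent ("switched") variants upstream of genes, are the harder steps that the pipeline described above addresses.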
Genome-wide methylation analysis identified sexually dimorphic methylated regions in hybrid tilapia

PubMed Central

Wan, Zi Yi; Xia, Jun Hong; Lin, Grace; Wang, Le; Lin, Valerie C. L.; Yue, Gen Hua

2016-01-01

Sexual dimorphism is an interesting biological phenomenon. Previous studies showed that DNA methylation might play a role in sexual dimorphism. However, the overall picture of the genome-wide methylation landscape in sexually dimorphic species remains unclear. We analyzed the DNA methylation landscape and transcriptome in hybrid tilapia (Oreochromis spp.) using whole genome bisulfite sequencing (WGBS) and RNA-sequencing (RNA-seq). We found 4,757 sexually dimorphic differentially methylated regions (DMRs), with significant clusters of DMRs located on chromosomal regions associated with sex determination. CpG methylation in promoter regions was negatively correlated with the gene expression level. The MAPK/ERK pathway was upregulated in male tilapia. We also inferred active cis-regulatory regions (ACRs) in skeletal muscle tissues from WGBS datasets, revealing sexually dimorphic cis-regulatory regions. These results suggest that DNA methylation contributes to sex-specific phenotypes and serve as resources for further investigation to analyze the functions of these regions and their contributions towards sexual dimorphisms. PMID:27782217
Genomic region operation kit for flexible processing of deep sequencing data.

PubMed

Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa

2013-01-01

Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/.
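The set-algebra view of genomic regions can be illustrated with plain interval operations. The sketch below does not use GROK or its API; the functions `merge`, `union`, and `intersection`, and the half-open `(chrom, start, end)` tuple representation, are hypothetical names and conventions chosen for this example.

```python
def merge(regions):
    """Collapse overlapping or touching regions into a sorted, disjoint list."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def union(a, b):
    """Set union: every position covered by a region in either input."""
    return merge(list(a) + list(b))


def intersection(a, b):
    """Set intersection: positions covered by regions in both inputs."""
    a, b = merge(a), merge(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # advance whichever list is behind in chromosome order
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # advance the interval that ends first
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out
```

With operations like these, a question such as "which peaks of factor A overlap peaks of factor B" becomes a single `intersection` call, which is the kind of translation from research question to computation that the formalism is meant to support.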
MicroRNA-guided prioritization of genome-wide association signals reveals the importance of microRNA-target gene networks for complex traits in cattle.

PubMed

Fang, Lingzhao; Sørensen, Peter; Sahana, Goutam; Panitz, Frank; Su, Guosheng; Zhang, Shengli; Yu, Ying; Li, Bingjie; Ma, Li; Liu, George; Lund, Mogens Sandø; Thomsen, Bo

2018-06-19

MicroRNAs (miRNA) are key modulators of gene expression and so act as putative fine-tuners of complex phenotypes. Here, we hypothesized that causal variants of complex traits are enriched in miRNAs and miRNA-target networks. First, we conducted a genome-wide association study (GWAS) for seven functional and milk production traits using imputed sequence variants (13~15 million) and >10,000 animals from three dairy cattle breeds, i.e., Holstein (HOL), Nordic red cattle (RDC) and Jersey (JER). Second, we analyzed for enrichments of association signals in miRNAs and their miRNA-target networks. Our results demonstrated that genomic regions harboring miRNA genes were significantly (P < 0.05) enriched with GWAS signals for milk production traits and mastitis, and that enrichments within miRNA-target gene networks were significantly higher than in random gene-sets for the majority of traits. Furthermore, most between-trait and across-breed correlations of enrichments with miRNA-target networks were significantly greater than with random gene-sets, suggesting pleiotropic effects of miRNAs. Intriguingly, genes that were differentially expressed in response to mammary gland infections were significantly enriched in the miRNA-target networks associated with mastitis. All these findings were consistent across three breeds. Collectively, our observations demonstrate the importance of miRNAs and their targets for the expression of complex traits.
The genomic landscape at a late stage of stickleback speciation: High genomic divergence interspersed by small localized regions of introgression.

PubMed

Ravinet, Mark; Yoshida, Kohta; Shigenobu, Shuji; Toyoda, Atsushi; Fujiyama, Asao; Kitano, Jun

2018-05-01

Speciation is a continuous process and analysis of species pairs at different stages of divergence provides insight into how it unfolds. Previous genomic studies on young species pairs have revealed peaks of divergence and heterogeneous genomic differentiation. Yet less known is how localised peaks of differentiation progress to genome-wide divergence during the later stages of speciation in the presence of persistent gene flow. Spanning the speciation continuum, stickleback species pairs are ideal for investigating how genomic divergence builds up during speciation. However, attention has largely focused on young postglacial species pairs, with little knowledge of the genomic signatures of divergence and introgression in older stickleback systems. The Japanese stickleback species pair, composed of the Pacific Ocean three-spined stickleback (Gasterosteus aculeatus) and the Japan Sea stickleback (G. nipponicus), which co-occur in the Japanese islands, is at a late stage of speciation. Divergence likely started well before the end of the last glacial period and crosses between Japan Sea females and Pacific Ocean males result in hybrid male sterility. Here we use coalescent analyses and Approximate Bayesian Computation to show that the two species split approximately 0.68-1 million years ago but that they have continued to exchange genes at a low rate throughout divergence. Population genomic data revealed that, despite gene flow, a high level of genomic differentiation is maintained across the majority of the genome. However, we identified multiple, small regions of introgression, occurring mainly in areas of low recombination rate. Our results demonstrate that a high level of genome-wide divergence can establish in the face of persistent introgression and that gene flow can be localized to small genomic regions at the later stages of speciation with gene flow.
A Gene-Oriented Haplotype Comparison Reveals Recently Selected Genomic Regions in Temperate and Tropical Maize Germplasm

PubMed Central

Zhang, Jie; Li, Yongxiang; Zheng, Jun; Zhang, Hongwei; Yang, Xiaohong; Wang, Jianhua; Wang, Guoying

2017-01-01

The extensive genetic variation present in maize (Zea mays) germplasm makes it possible to detect signatures of positive artificial selection that occurred during temperate and tropical maize improvement. Here we report an analysis of 532,815 polymorphisms from a maize association panel consisting of 368 diverse temperate and tropical inbred lines. We developed a gene-oriented approach adapting exonic polymorphisms to identify recently selected alleles by comparing haplotypes across the maize genome. This analysis revealed evidence of selection for more than 1100 genomic regions during recent improvement, and included regulatory genes and key genes with visible mutant phenotypes. We find that selected candidate target genes in temperate maize are enriched in biosynthetic processes, and further examination of these candidates highlights two cases, sucrose flux and oil storage, in which multiple genes in a common pathway can be cooperatively selected. Finally, based on available parallel gene expression data, we hypothesize that some genes were selected for regulatory variations, resulting in altered gene expression. PMID:28099470
Modular assembly of transposable element arrays by microsatellite targeting in the guayule and rice genomes.

PubMed

Valdes Franco, José A; Wang, Yi; Huo, Naxin; Ponciano, Grisel; Colvin, Howard A; McMahan, Colleen M; Gu, Yong Q; Belknap, William R

2018-04-19

Guayule (Parthenium argentatum A. Gray) is a rubber-producing desert shrub native to Mexico and the United States. Guayule represents an alternative to Hevea brasiliensis as a source for commercial natural rubber. The efficient application of modern molecular/genetic tools to guayule improvement requires characterization of its genome. The 1.6 Gb guayule genome was sequenced, assembled and annotated. The final 1.5 Gb assembly, while fragmented (N50 = 22 kb), maps > 95% of the shotgun reads and is essentially complete. Approximately 40,000 transcribed, protein encoding genes were annotated on the assembly. Further characterization of this genome revealed 15 families of small, microsatellite-associated, transposable elements (TEs) with unexpected chromosomal distribution profiles. These SaTar (Satellite Targeted) elements, which are non-autonomous Mu-like elements (MULEs), were frequently observed in multimeric linear arrays of unrelated individual elements within which no individual element is interrupted by another. This uniformly non-nested TE multimer architecture has not been previously described in either eukaryotic or prokaryotic genomes. Five families of similarly distributed non-autonomous MULEs (microsatellite associated, modularly assembled) were characterized in the rice genome. Families of TEs with similar structures and distribution profiles were identified in sorghum and citrus. The sequencing and assembly of the guayule genome provides a foundation for application of current crop improvement technologies to this plant. In addition, characterization of this genome revealed SaTar elements with distribution profiles unique among TEs. SaTar targeting appears based on an alternative MULE recombination mechanism with the potential to impact gene evolution.
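The N50 statistic quoted for the assembly is conventionally the contig length L such that contigs of length L or longer together contain at least half of the total assembled bases. A minimal sketch of that calculation is given below; the function name `n50` is an assumption, and the authors' own tooling is not described in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # stop at the first contig that pushes the cumulative sum to half
        if 2 * running >= total:
            return length
```

An N50 of 22 kb against a 1.5 Gb total therefore indicates that half of the assembled sequence sits in contigs of 22 kb or longer, consistent with the abstract's description of the assembly as fragmented.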
Analysis of genomic regions of Trichoderma harzianum IOC-3844 related to biomass degradation.

PubMed

Crucello, Aline; Sforça, Danilo Augusto; Horta, Maria Augusta Crivelente; dos Santos, Clelton Aparecido; Viana, Américo José Carvalho; Beloti, Lilian Luzia; de Toledo, Marcelo Augusto Szymanski; Vincentz, Michel; Kuroshu, Reginaldo Massanobu; de Souza, Anete Pereira

2015-01-01

Trichoderma harzianum IOC-3844 secretes high levels of cellulolytic-active enzymes and is therefore a promising strain for use in biotechnological applications in second-generation bioethanol production. However, the T. harzianum biomass degradation mechanism has not been well explored at the genetic level. The present work investigates six genomic regions (~150 kbp each) in this fungus that are enriched with genes related to biomass conversion. A BAC library consisting of 5,760 clones was constructed, with an average insert length of 90 kbp. The assembled BAC sequences revealed 232 predicted genes, 31.5% of which were related to catabolic pathways, including those involved in biomass degradation. An expression profile analysis based on RNA-Seq data demonstrated that putative regulatory elements, such as membrane transport proteins and transcription factors, are located in the same genomic regions as genes related to carbohydrate metabolism and exhibit similar expression profiles. Thus, we demonstrate a rapid and efficient tool that focuses on specific genomic regions by combining a BAC library with transcriptomic data. This is the first BAC-based structural genomic study of the cellulolytic fungus T. harzianum, and its findings provide new perspectives regarding the use of this species in biomass degradation processes.
A fungal avirulence factor encoded in a highly plastic genomic region triggers partial resistance to septoria tritici blotch.

PubMed

Meile, Lukas; Croll, Daniel; Brunner, Patrick C; Plissonneau, Clémence; Hartmann, Fanny E; McDonald, Bruce A; Sánchez-Vallet, Andrea

2018-04-25

Cultivar-strain specificity in the wheat-Zymoseptoria tritici pathosystem determines the infection outcome and is controlled by resistance genes on the host side, many of which have been identified. On the pathogen side, however, the molecular determinants of specificity remain largely unknown. We used genetic mapping, targeted gene disruption and allele swapping to characterise the recognition of the new avirulence factor Avr3D1. We then combined population genetic and comparative genomic analyses to characterise the evolutionary trajectory of Avr3D1. Avr3D1 is specifically recognised by wheat cultivars harbouring the Stb7 resistance gene, triggering a strong defence response without preventing pathogen infection and reproduction. Avr3D1 resides in a cluster of putative effector genes located in a genome region populated by independent transposable element insertions. The gene was present in all 132 investigated strains and is highly polymorphic, with 30 different protein variants identified. We demonstrated that specific amino acid substitutions in Avr3D1 led to evasion of recognition. These results demonstrate that quantitative resistance and gene-for-gene interactions are not mutually exclusive. Localising avirulence genes in highly plastic genomic regions probably facilitates accelerated evolution that enables escape from recognition by resistance proteins. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.
Target Discovery for Precision Medicine Using High-Throughput Genome Engineering.

PubMed

Guo, Xinyi; Chitale, Poonam; Sanjana, Neville E

2017-01-01

Over the past few years, programmable RNA-guided nucleases such as the CRISPR/Cas9 system have ushered in a new era of precision genome editing in diverse model systems and in human cells. Functional screens using large libraries of RNA guides can interrogate a large hypothesis space to pinpoint particular genes and genetic elements involved in fundamental biological processes and disease-relevant phenotypes. Here, we review recent high-throughput CRISPR screens (e.g. loss-of-function, gain-of-function, and targeting noncoding elements) and highlight their potential for uncovering novel therapeutic targets, such as those involved in cancer resistance to small molecular drugs and immunotherapies, tumor evolution, infectious disease, inborn genetic disorders, and other therapeutic challenges.
New target for inhibition of bacterial RNA polymerase: 'switch region'.

PubMed

Srivastava, Aashish; Talaue, Meliza; Liu, Shuang; Degen, David; Ebright, Richard Y; Sineva, Elena; Chakraborty, Anirban; Druzhinin, Sergey Y; Chatterjee, Sujoy; Mukhopadhyay, Jayanta; Ebright, Yon W; Zozula, Alex; Shen, Juan; Sengupta, Sonali; Niedfeldt, Rui Rong; Xin, Cai; Kaneko, Takushi; Irschik, Herbert; Jansen, Rolf; Donadio, Stefano; Connell, Nancy; Ebright, Richard H

2011-10-01

A new drug target - the 'switch region' - has been identified within bacterial RNA polymerase (RNAP), the enzyme that mediates bacterial RNA synthesis. The new target serves as the binding site for compounds that inhibit bacterial RNA synthesis and kill bacteria. Since the new target is present in most bacterial species, compounds that bind to the new target are active against a broad spectrum of bacterial species. Since the new target is different from targets of other antibacterial agents, compounds that bind to the new target are not cross-resistant with other antibacterial agents. Four antibiotics that function through the new target have been identified: myxopyronin, corallopyronin, ripostatin, and lipiarmycin. This review summarizes the switch region, switch-region inhibitors, and implications for antibacterial drug discovery. Copyright © 2011 Elsevier Ltd. All rights reserved.

Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions

DOE Office of Scientific and Technical Information (OSTI.GOV)

MacArthur, Stewart; Li, Xiao-Yong; Li, Jingyi

2009-05-15

BACKGROUND: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. RESULTS: Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of function and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. CONCLUSIONS: It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.
Functional precision medicine identifies novel druggable targets and therapeutic options in head and neck cancer. | Office of Cancer Genomics

Cancer.gov

Purpose: Head and neck squamous cell carcinoma (HNSCC) is the sixth most common cancer worldwide with high mortality and a lack of targeted therapies. To identify and prioritize druggable targets, we performed genome analysis together with genome-scale siRNA and oncology drug profiling using low passage tumor cells derived from a patient with a treatment-resistant HPV-negative HNSCC.
Functional interrogation of non-coding DNA through CRISPR genome editing.

PubMed

Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

2017-05-15

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Functional interrogation of non-coding DNA through CRISPR genome editing

PubMed Central

Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.

2017-01-01

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
Identification of genomic regions contributing to etoposide-induced cytotoxicity

PubMed Central

Bleibel, Wasim K.; Duan, Shiwei; Huang, R. Stephanie; Kistner, Emily O.; Shukla, Sunita J.; Wu, Xiaolin; Badner, Judith A.

2009-01-01

Etoposide is routinely used in combination-based chemotherapy for testicular cancer and small-cell lung cancer; however, myelosuppression, therapy-related leukemia and neurotoxicity limit its utility. To determine the genetic contribution to cellular sensitivity to etoposide, we evaluated cell growth inhibition in Centre d'Etude du Polymorphisme Humain lymphoblastoid cell lines from 24 multi-generational pedigrees (321 samples) following treatment with 0.02–2.5 µM etoposide for 72 h. Heritability analysis showed that genetic variation contributes significantly to the cytotoxic phenotypes (h² = 0.17–0.25, P = 4.9 × 10⁻⁵ to 7.3 × 10⁻³). Whole genome linkage scans uncovered 8 regions with peak LOD scores ranging from 1.57 to 2.55, with the most significant signals being found on chromosome 5 (LOD = 2.55) and chromosome 6 (LOD = 2.52). Linkage-directed association was performed on a subset of HapMap samples within the pedigrees to find 22 SNPs significantly associated with etoposide cytotoxicity at one or more treatment concentrations. UVRAG, a DNA repair gene, SEMA5A, SLC7A6 and PRMT7 are implicated from these unbiased studies. Our findings suggest that susceptibility to etoposide-induced cytotoxicity is heritable and using an integrated genomics approach we identified both genomic regions and SNPs associated with the cytotoxic phenotypes. PMID:19089452
Identification of genomic regions contributing to etoposide-induced cytotoxicity.

PubMed

Bleibel, Wasim K; Duan, Shiwei; Huang, R Stephanie; Kistner, Emily O; Shukla, Sunita J; Wu, Xiaolin; Badner, Judith A; Dolan, M Eileen

2009-03-01

Etoposide is routinely used in combination-based chemotherapy for testicular cancer and small-cell lung cancer; however, myelosuppression, therapy-related leukemia and neurotoxicity limit its utility. To determine the genetic contribution to cellular sensitivity to etoposide, we evaluated cell growth inhibition in Centre d'Etude du Polymorphisme Humain lymphoblastoid cell lines from 24 multi-generational pedigrees (321 samples) following treatment with 0.02–2.5 µM etoposide for 72 h. Heritability analysis showed that genetic variation contributes significantly to the cytotoxic phenotypes (h² = 0.17–0.25, P = 4.9 × 10⁻⁵ to 7.3 × 10⁻³). Whole genome linkage scans uncovered 8 regions with peak LOD scores ranging from 1.57 to 2.55, with the most significant signals being found on chromosome 5 (LOD = 2.55) and chromosome 6 (LOD = 2.52). Linkage-directed association was performed on a subset of HapMap samples within the pedigrees to find 22 SNPs significantly associated with etoposide cytotoxicity at one or more treatment concentrations. UVRAG, a DNA repair gene, SEMA5A, SLC7A6 and PRMT7 are implicated from these unbiased studies. Our findings suggest that susceptibility to etoposide-induced cytotoxicity is heritable and using an integrated genomics approach we identified both genomic regions and SNPs associated with the cytotoxic phenotypes.
Comparative mitochondrial genomics of snakes: extraordinary substitution rate dynamics and functionality of the duplicate control region

PubMed Central

Jiang, Zhi J; Castoe, Todd A; Austin, Christopher C; Burbrink, Frank T; Herron, Matthew D; McGuire, Jimmy A; Parkinson, Christopher L; Pollock, David D

2007-01-01

Background: The mitochondrial genomes of snakes are characterized by an overall evolutionary rate that appears to be one of the most accelerated among vertebrates. They also possess other unusual features, including short tRNAs and other genes, and a duplicated control region that has been stably maintained since it originated more than 70 million years ago. Here, we provide a detailed analysis of evolutionary dynamics in snake mitochondrial genomes to better understand the basis of these extreme characteristics, and to explore the relationship between mitochondrial genome molecular evolution, genome architecture, and molecular function. We sequenced complete mitochondrial genomes from Slowinski's corn snake (Pantherophis slowinskii) and two cottonmouths (Agkistrodon piscivorus) to complement previously existing mitochondrial genomes, and to provide an improved comparative view of how genome architecture affects molecular evolution at contrasting levels of divergence. Results: We present a Bayesian genetic approach that suggests that the duplicated control region can function as an additional origin of heavy strand replication. The two control regions also appear to have different intra-specific versus inter-specific evolutionary dynamics that may be associated with complex modes of concerted evolution. We find that different genomic regions have experienced substantial accelerated evolution along early branches in snakes, with different genes having experienced dramatic accelerations along specific branches. Some of these accelerations appear to coincide with, or subsequent to, the shortening of various mitochondrial genes and the duplication of the control region and flanking tRNAs. Conclusion: Fluctuations in the strength and pattern of selection during snake evolution have had widely varying gene-specific effects on substitution rates, and these rate accelerations may have been functionally related to unusual changes in genomic architecture. The among-lineage and
Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

PubMed

Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz

2011-07-01

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
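The idea of allocating multi-reads as fractional counts can be illustrated with a small sketch. The abstract does not specify the weighting scheme, so the version below is only an assumption for illustration: each multi-read is split across its candidate bins in proportion to the uni-read counts already observed there, plus a pseudocount. The function name `allocate_reads`, the bin labels, and the `pseudocount` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def allocate_reads(uni_reads, multi_reads, pseudocount=1.0):
    """Return per-bin read counts with multi-reads split fractionally.

    uni_reads:   list of bin ids, one per uniquely mapping read.
    multi_reads: list of lists; each inner list holds the candidate
                 bin ids that one multi-read aligns to.
    pseudocount: assumed smoothing term so that bins without uni-reads
                 can still receive part of a multi-read.
    """
    counts = defaultdict(float)
    for bin_id in uni_reads:
        counts[bin_id] += 1.0

    # Weights are taken from uni-read evidence only (single pass),
    # so the result does not depend on the order of the multi-reads.
    uni_counts = dict(counts)

    for candidates in multi_reads:
        weights = [uni_counts.get(b, 0.0) + pseudocount for b in candidates]
        total = sum(weights)
        for bin_id, weight in zip(candidates, weights):
            counts[bin_id] += weight / total  # fractional count

    return dict(counts)


# Hypothetical toy input: three uni-reads in bin A, one in bin B,
# and one multi-read that aligns equally well to bins A and C.
example = allocate_reads(["A", "A", "A", "B"], [["A", "C"]])
```

In this toy input the multi-read contributes more to bin A than to bin C because A has more unique support, while the total count still equals the number of reads. An iterative scheme that re-estimates the weights would be a natural refinement, but that is beyond what the abstract states.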
Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.

PubMed

Pehkonen, Petri; Wong, Garry; Törönen, Petri

2010-01-01

Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
Computational approach for elucidating interactions of cross-species miRNAs and their targets in Flaviviruses.

PubMed

Shinde, Santosh P; Banerjee, Amit Kumar; Arora, Neelima; Murty, U S N; Sripathi, Venkateswara Rao; Pal-Bhadra, Manika; Bhadra, Utpal

2015-03-01

Combating viral diseases has been a challenging task since time immemorial. Available molecular approaches are limited and not very effective for this daunting task. MicroRNA-based therapies have shown promise in recent times. MicroRNAs are tiny non-coding RNAs that regulate translational repression of target mRNA in a highly specific manner. In this study, we have determined the target regions for human and viral microRNAs in the conserved genomic regions of selected viruses of the Flaviviridae family using miRanda and performed a comparative target selectivity analysis among them. Specific target regions were determined and compared extensively with one another by position to determine their proximity. Based on the multiplicity and cooperativity analysis, interaction maps were developed manually to represent the interactions between top-ranking miRNAs and genomes of the viruses considered in this study. A self-organizing map (SOM) was used to cluster the best-ranked microRNAs based on vital physicochemical properties. This study will provide deep insight into the interrelation of viral and human microRNA interactions with the selected Flaviviridae genomes and will help to identify cross-species microRNA targets on the viral genome.
Enhanced CRISPR/Cas9-mediated biallelic genome targeting with dual surrogate reporter-integrated donors.

PubMed

Wu, Yun; Xu, Kun; Ren, Chonghua; Li, Xinyi; Lv, Huijiao; Han, Furong; Wei, Zehui; Wang, Xin; Zhang, Zhiying

2017-03-01

The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) system has recently emerged as a simple, yet powerful genome engineering tool, which has been widely used for genome modification in various organisms and cell types. However, screening biallelic genome-modified cells is often time-consuming and technically challenging. In this study, we incorporated two different surrogate reporter cassettes into paired donor plasmids, which were used as both the surrogate reporters and the knock-in donors. By applying our dual surrogate reporter-integrated donor system, we demonstrate high frequency of CRISPR/Cas9-mediated biallelic genome integration in both human HEK293T and porcine PK15 cells (34.09% and 18.18%, respectively). Our work provides a powerful genetic tool for assisting the selection and enrichment of cells with targeted biallelic genome modification. © 2017 Federation of European Biochemical Societies.
Comparative analyses of CTCF and BORIS occupancies uncover two distinct classes of CTCF binding genomic regions.

PubMed

Pugacheva, Elena M; Rivero-Hinojosa, Samuel; Espinoza, Celso A; Méndez-Catalá, Claudia Fabiola; Kang, Sungyun; Suzuki, Teruhiko; Kosaka-Suzuki, Natsuki; Robinson, Susan; Nagarajan, Vijayaraj; Ye, Zhen; Boukaba, Abdelhalim; Rasko, John E J; Strunnikov, Alexander V; Loukinov, Dmitri; Ren, Bing; Lobanenkov, Victor V

2015-08-14

CTCF and BORIS (CTCFL), two paralogous mammalian proteins sharing nearly identical DNA binding domains, are thought to function in a mutually exclusive manner in DNA binding and transcriptional regulation. Here we show that these two proteins co-occupy a specific subset of regulatory elements consisting of clustered CTCF binding motifs (termed 2xCTSes). BORIS occupancy at 2xCTSes is largely invariant in BORIS-positive cancer cells, with the genomic pattern recapitulating the germline-specific BORIS binding to chromatin. In contrast to the single-motif CTCF target sites (1xCTSes), the 2xCTS elements are preferentially found at active promoters and enhancers, both in cancer and germ cells. 2xCTSes are also enriched in genomic regions that escape histone to protamine replacement in human and mouse sperm. Depletion of the BORIS gene leads to altered transcription of a large number of genes and the differentiation of K562 cells, while the ectopic expression of this CTCF paralog leads to specific changes in transcription in MCF7 cells. We discover two functionally and structurally different classes of CTCF binding regions, 2xCTSes and 1xCTSes, revealed by their predisposition to bind BORIS. We propose that 2xCTSes play key roles in the transcriptional program of cancer and germ cells.
Exome capture from the spruce and pine giga-genomes.

PubMed

Suren, H; Hodgins, K A; Yeaman, S; Nurkowski, K A; Smets, P; Rieseberg, L H; Aitken, S N; Holliday, J A

2016-09-01

Sequence capture is a flexible tool for generating reduced representation libraries, particularly in species with massive genomes. We used an exome capture approach to sequence the gene space of two of the dominant species in Canadian boreal and montane forests - interior spruce (Picea glauca × engelmannii) and lodgepole pine (Pinus contorta). Transcriptome data generated with RNA-seq were coupled with draft genome sequences to design baits corresponding to 26 824 genes from pine and 28 649 genes from spruce. A total of 579 samples for spruce and 631 samples for pine were included, as well as two pine congeners and six spruce congeners. More than 50% of targeted regions were sequenced at >10× depth in each species, while ~12% captured near-target regions within 500 bp of a bait position were sequenced to a depth >10×. Much of our read data arose from off-target regions, which was likely due to the fragmented and incomplete nature of the draft genome assemblies. Capture in general was successful for the related species, suggesting that baits designed for a single species are likely to successfully capture sequences from congeners. From these data, we called approximately 10 million SNPs and INDELs in each species from coding regions, introns, untranslated and flanking regions, as well as from the intergenic space. Our study demonstrates the utility of sequence capture for resequencing in complex conifer genomes, suggests guidelines for improving capture efficiency and provides a rich resource of genetic variants for studies of selection and local adaptation in these species. © 2016 John Wiley & Sons Ltd.
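The summary statistic reported above (the share of targeted regions sequenced at >10× depth) is simple to compute once a mean depth per target is available. The sketch below assumes such per-target depths have already been calculated by some upstream tool; the function name `fraction_covered` and the example depths are hypothetical and are not data from the study.

```python
def fraction_covered(mean_depths, min_depth=10.0):
    """Fraction of targets whose mean depth is strictly above min_depth."""
    if not mean_depths:
        return 0.0
    passing = sum(1 for depth in mean_depths if depth > min_depth)
    return passing / len(mean_depths)


# Hypothetical per-target mean depths, for illustration only.
depths = [12.0, 3.0, 25.0, 10.0]
share = fraction_covered(depths)
```

The comparison is strict, so a target at exactly 10× does not count as covered; whether the original analysis used a strict or inclusive threshold is not stated in the abstract.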
Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

PubMed Central

Zhang, Yan-Cong; Lin, Kui

2015-01-01

Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828
Cancer Genomic Resources and Present Needs in the Latin American Region.

PubMed

Torres, Ángela; Oliver, Javier; Frecha, Cecilia; Montealegre, Ana Lorena; Quezada-Urbán, Rosalía; Díaz-Velásquez, Clara Estela; Vaca-Paniagua, Felipe; Perdomo, Sandra

2017-01-01

In Latin America (LA), cancer is the second leading cause of death, and little is known about the capacities and needs for the development of research in the field of cancer genomics. In order to evaluate the current capacity for and development of cancer genomics in LA, we collected the available information on genomics, including the number of next-generation sequencing (NGS) platforms, the number of cancer research institutions and research groups, publications in the last 10 years, educational programs, and related national cancer control policies. Currently, there are 221 NGS platforms and 118 research groups in LA developing cancer genomics projects. A total of 272 articles in the field of cancer genetics/genomics were published by authors affiliated with Latin American institutions. Educational programs in genomics are scarce and almost exclusively graduate programs, and only a few concern cancer. Only 14 countries have national cancer control plans, but all of them consider secondary prevention strategies for early diagnosis, opportune treatment, and decreasing mortality, where genomic analyses could be implemented. Despite recent advances in introducing knowledge about cancer genomics and its application to LA, the region lacks development of integrated genomic research projects, improved use of NGS platforms, implementation of associated educational programs, and health policies that could have an impact on cancer care. © 2017 S. Karger AG, Basel.
The druggable genome and support for target identification and validation in drug development.

PubMed

Finan, Chris; Gaulton, Anna; Kruger, Felix A; Lumbers, R Thomas; Shah, Tina; Engmann, Jorgen; Galver, Luana; Kelley, Ryan; Karlsson, Anneli; Santos, Rita; Overington, John P; Hingorani, Aroon D; Casas, Juan P

2017-03-29

Target identification (determining the correct drug targets for a disease) and target validation (demonstrating an effect of target perturbation on disease biomarkers and disease end points) are important steps in drug development. Clinically relevant associations of variants in genes encoding drug targets model the effect of modifying the same targets pharmacologically. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome-wide association studies to an updated set of genes encoding druggable human proteins, to agents with bioactivity against these targets, and, where there were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, which will enable association studies of druggable genes for drug target selection and validation in human disease. Copyright © 2017, American Association for the Advancement of Science.
Plant genome editing made easy: targeted mutagenesis in model and crop plants using the CRISPR/Cas system.

PubMed

Belhaj, Khaoula; Chaparro-Garcia, Angela; Kamoun, Sophien; Nekrasov, Vladimir

2013-10-11

Targeted genome engineering (also known as genome editing) has emerged as an alternative to classical plant breeding and transgenic (GMO) methods to improve crop plants. Until recently, available tools for introducing site-specific double strand DNA breaks were restricted to zinc finger nucleases (ZFNs) and TAL effector nucleases (TALENs). However, these technologies have not been widely adopted by the plant research community due to complicated design and laborious assembly of specific DNA binding proteins for each target gene. Recently, an easier method has emerged based on the bacterial type II CRISPR (clustered regularly interspaced short palindromic repeats)/Cas (CRISPR-associated) immune system. The CRISPR/Cas system allows targeted cleavage of genomic DNA guided by a customizable small noncoding RNA, resulting in gene modifications by both non-homologous end joining (NHEJ) and homology-directed repair (HDR) mechanisms. In this review we summarize and discuss recent applications of the CRISPR/Cas technology in plants.
CRISPR: From Prokaryotic Immune Systems to Plant Genome Editing Tools.

PubMed

Bandyopadhyay, Anindya; Mazumdar, Shamik; Yin, Xiaojia; Quick, William Paul

2017-01-01

The clustered regularly interspaced short palindromic repeats (CRISPR) system is a prokaryotic adaptive immune system that has the ability to identify specific locations on the bacteriophage (phage) genome to create breaks in it, and internalize the phage genome fragments in its own genome as CRISPR arrays for memory-dependent resistance. Although CRISPR has been used in the dairy industry for a long time, it recently gained importance in the field of genome editing because of its ability to precisely target locations in a genome. This system has further been modified to locate and target any region of a genome of choice due to modifications in the components of the system. By changing the nucleotide sequence of the 20-nucleotide target sequence in the guide RNA, targeting any location is possible. It has found an application in the modification of plant genomes with its ability to generate mutations and insertions, thus helping to create new varieties of plants. With the ability to introduce specific sequences into the plant genome after cleavage by the CRISPR system and subsequent DNA repair through homology-directed repair (HDR), CRISPR ensures that genome editing can be successfully applied in plants, thus generating stronger and more improved traits. Also, the use of the CRISPR editing system can generate plants that are transgene-free and have mutations that are stably inherited, thus helping to circumvent current GMO regulations.
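Retargeting by changing the 20-nucleotide sequence in the guide RNA can be illustrated by scanning a DNA sequence for candidate target sites. The sketch below rests on an assumption not stated in the abstract: it models the commonly used Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of the 20-nucleotide target. The function name `find_targets` is hypothetical, and only the forward strand is searched.

```python
import re


def find_targets(seq, guide_len=20):
    """List (start, protospacer) pairs followed by an NGG PAM.

    Assumes an SpCas9-style NGG PAM and scans the forward strand only;
    a real design tool would also scan the reverse complement and
    score off-target risk.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are found.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical toy sequence: 20 A's followed by the PAM "TGG".
sites = find_targets("A" * 20 + "TGG")
```

Each returned protospacer is the sequence that would be copied into the guide RNA to direct cleavage at that site under the stated PAM assumption.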
An Integrated Tool to Study MHC Region: Accurate SNV Detection and HLA Genes Typing in Human MHC Region Using Targeted High-Throughput Sequencing

PubMed Central

Liu, Xiao; Xu, Yinyin; Liang, Dequan; Gao, Peng; Sun, Yepeng; Gifford, Benjamin; D’Ascenzo, Mark; Liu, Xiaomin; Tellier, Laurent C. A. M.; Yang, Fang; Tong, Xin; Chen, Dan; Zheng, Jing; Li, Weiyang; Richmond, Todd; Xu, Xun; Wang, Jun; Li, Yingrui

2013-01-01

The major histocompatibility complex (MHC) is one of the most variable and gene-dense regions of the human genome. Most studies of the MHC, and associated regions, focus on minor variants and HLA typing, many of which have been demonstrated to be associated with human disease susceptibility and metabolic pathways. However, the detection of variants in the MHC region, and diagnostic HLA typing, still lacks a coherent, standardized, cost-effective and high-coverage protocol of clinical quality and reliability. In this paper, we present such a method for the accurate detection of minor variants and HLA types in the human MHC region, using high-throughput, high-coverage sequencing of target regions. A probe set was designed to template upon the 8 annotated human MHC haplotypes, and to encompass the 5 megabases (Mb) of the extended MHC region. We deployed our probes upon three genetically diverse human samples for probe set evaluation, and sequencing data show that ∼97% of the MHC region, and over 99% of the genes in the MHC region, are covered with sufficient depth and good evenness. 98% of genotypes called by this capture sequencing prove consistent with established HapMap genotypes. We have concurrently developed a one-step pipeline for calling any HLA type referenced in the IMGT/HLA database from this target capture sequencing data, which shows over 96% typing accuracy when deployed at 4-digit resolution. This cost-effective and highly accurate approach for variant detection and HLA typing in the MHC region may lend further insight into immune-mediated disease studies, and may find clinical utility in transplantation medicine research. This one-step pipeline is released for general evaluation and use by the scientific community. PMID:23894464
Transposon Mutagenesis of the Zika Virus Genome Highlights Regions Essential for RNA Replication and Restricted for Immune Evasion.

PubMed

Fulton, Benjamin O; Sachs, David; Schwarz, Megan C; Palese, Peter; Evans, Matthew J

2017-08-01

The molecular constraints affecting Zika virus (ZIKV) evolution are not well understood. To investigate ZIKV genetic flexibility, we used transposon mutagenesis to add 15-nucleotide insertions throughout the ZIKV MR766 genome and subsequently deep sequenced the viable mutants. Few ZIKV insertion mutants replicated, which likely reflects a high degree of functional constraints on the genome. The NS1 gene exhibited distinct mutational tolerances at different stages of the screen. This result may define regions of the NS1 protein that are required for the different stages of the viral life cycle. The ZIKV structural genes showed the highest degree of insertional tolerance. Although the envelope (E) protein exhibited particular flexibility, the highly conserved envelope domain II (EDII) fusion loop of the E protein was intolerant of transposon insertions. The fusion loop is also a target of pan-flavivirus antibodies that are generated against other flaviviruses and neutralize a broad range of dengue virus and ZIKV isolates. The genetic restrictions identified within the epitopes in the EDII fusion loop likely explain the sequence and antigenic conservation of these regions in ZIKV and among multiple flaviviruses. Thus, our results provide insights into the genetic restrictions on ZIKV that may affect the evolution of this virus. IMPORTANCE Zika virus recently emerged as a significant human pathogen. Determining the genetic constraints on Zika virus is important for understanding the factors affecting viral evolution. We used a genome-wide transposon mutagenesis screen to identify where mutations were tolerated in replicating viruses. We found that the genetic regions involved in RNA replication were mostly intolerant of mutations. The genes coding for structural proteins were more permissive to mutations. Despite the flexibility observed in these regions, we found that epitopes bound by broadly reactive antibodies were genetically constrained. This finding may explain

Cre/lox-Recombinase-Mediated Cassette Exchange for Reversible Site-Specific Genomic Targeting of the Disease Vector, Aedes aegypti.

PubMed

Häcker, Irina; Harrell II, Robert A; Eichner, Gerrit; Pilitt, Kristina L; O'Brochta, David A; Handler, Alfred M; Schetelig, Marc F

2017-03-07

Site-specific genome modification (SSM) is an important tool for mosquito functional genomics and comparative gene expression studies, which contribute to a better understanding of mosquito biology and are thus a key to finding new strategies to eliminate vector-borne diseases. Moreover, it allows for the creation of advanced transgenic strains for vector control programs. SSM circumvents the drawbacks of transposon-mediated transgenesis, where random transgene integration into the host genome results in insertional mutagenesis and variable position effects. We applied the Cre/lox recombinase-mediated cassette exchange (RMCE) system to Aedes aegypti, the vector of dengue, chikungunya, and Zika viruses. In this context we created four target site lines for RMCE and evaluated their fitness costs. Cre-RMCE is functional in a two-step mechanism and with good efficiency in Ae. aegypti. The advantages of Cre-RMCE over existing site-specific modification systems for Ae. aegypti, phiC31-RMCE and CRISPR, originate in the preservation of the recombination sites, which 1) allows successive modifications and rapid expansion or adaptation of existing systems by repeated targeting of the same site; and 2) provides reversibility, thus allowing the excision of undesired sequences. Thereby, Cre-RMCE complements existing genomic modification tools, adding flexibility and versatility to vector genome targeting.
CRISPR/Cas9-mediated gene knockout screens and target identification via whole-genome sequencing uncover host genes required for picornavirus infection.

PubMed

Kim, Heon Seok; Lee, Kyungjin; Bae, Sangsu; Park, Jeongbin; Lee, Chong-Kyo; Kim, Meehyein; Kim, Eunji; Kim, Minju; Kim, Seokjoong; Kim, Chonsaeng; Kim, Jin-Soo

2017-06-23

Several groups have used genome-wide libraries of lentiviruses encoding small guide RNAs (sgRNAs) for genetic screens. In most cases, sgRNA expression cassettes are integrated into cells by using lentiviruses, and target genes are statistically estimated by the readout of sgRNA sequences after targeted sequencing. We present a new virus-free method for human gene knockout screens using a genome-wide library of CRISPR/Cas9 sgRNAs based on plasmids and target gene identification via whole-genome sequencing (WGS) confirmation of authentic mutations rather than statistical estimation through targeted amplicon sequencing. We used 30,840 pairs of individually synthesized oligonucleotides to construct the genome-scale sgRNA library, collectively targeting 10,280 human genes (i.e., three sgRNAs per gene). These plasmid libraries were co-transfected with a Cas9-expression plasmid into human cells, which were then treated with cytotoxic drugs or viruses. Only cells lacking key factors essential for cytotoxic drug metabolism or viral infection were able to survive. Genomic DNA isolated from cells that survived these challenges was subjected to WGS to directly identify CRISPR/Cas9-mediated causal mutations essential for cell survival. With this approach, we were able to identify known and novel genes essential for viral infection in human cells. We propose that genome-wide sgRNA screens based on plasmids coupled with WGS are powerful tools for forward genetics studies and drug target discovery. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Ebolavirus comparative genomics

PubMed Central

Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.

2015-01-01

The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035
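The oligomer frequency analysis mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the helper names `kmer_frequencies` and `profile_distance`, the default oligomer length k = 3, and the use of Euclidean distance are assumptions chosen only to show how genomes could be compared by their oligomer composition.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=3):
    """Relative frequency of every DNA k-mer in seq (hypothetical helper).

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def profile_distance(a, b):
    """Euclidean distance between two k-mer profiles built with the same k."""
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in a))


# Toy sequences only; real genomes would be read from FASTA files.
p1 = kmer_frequencies("ACGTACGTACGT", k=2)
p2 = kmer_frequencies("AAAAAAAACCCC", k=2)
print(profile_distance(p1, p2))
```

Profiles from related genomes would be expected to lie closer together than profiles from unrelated ones, which is the intuition behind grouping a viral family by oligomer usage.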
Pre-capture multiplexing improves efficiency and cost-effectiveness of targeted genomic enrichment.

PubMed

Shearer, A Eliot; Hildebrand, Michael S; Ravi, Harini; Joshi, Swati; Guiffre, Angelica C; Novak, Barbara; Happe, Scott; LeProust, Emily M; Smith, Richard J H

2012-11-14

Targeted genomic enrichment (TGE) is a widely used method for isolating and enriching specific genomic regions prior to massively parallel sequencing. To make effective use of sequencer output, barcoding and sample pooling (multiplexing) after TGE and prior to sequencing (post-capture multiplexing) has become routine. While previous reports have indicated that multiplexing prior to capture (pre-capture multiplexing) is feasible, no thorough examination of the effect of this method has been completed on a large number of samples. Here we compare standard post-capture TGE to two levels of pre-capture multiplexing: 12 or 16 samples per pool. We evaluated these methods using standard TGE metrics and determined the ability to identify several classes of genetic mutations in three sets of 96 samples, including 48 controls. Our overall goal was to maximize cost reduction and minimize experimental time while maintaining a high percentage of reads on target and a high depth of coverage at thresholds required for variant detection. We adapted the standard post-capture TGE method for pre-capture TGE with several protocol modifications, including redesign of blocking oligonucleotides and optimization of enzymatic and amplification steps. Pre-capture multiplexing reduced costs for TGE by at least 38% and significantly reduced hands-on time during the TGE protocol. We found that pre-capture multiplexing reduced capture efficiency by 23 or 31% for pre-capture pools of 12 and 16, respectively. However, efficiency losses at this step can be compensated by reducing the number of simultaneously sequenced samples. Pre-capture multiplexing and post-capture TGE performed similarly with respect to variant detection of positive control mutations. In addition, we detected no instances of sample switching due to aberrant barcode identification. Pre-capture multiplexing improves efficiency of TGE experiments with respect to hands-on time and reagent use compared to standard post-capture TGE
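The compensation described in this abstract (sequencing fewer samples together to offset lower capture efficiency) can be sketched as simple arithmetic. The sketch below assumes that the reported 23% and 31% figures are relative reductions in the on-target read fraction and that total sequencer output is fixed; neither assumption is stated in the abstract, and the function name `compensated_sample_count` and the baseline of 96 samples per run are hypothetical.

```python
import math


def compensated_sample_count(baseline_samples, relative_efficiency_loss):
    """Samples per run that keep on-target reads per sample roughly unchanged.

    Assumes a fixed sequencer output and a relative loss in on-target fraction.
    """
    if not 0 <= relative_efficiency_loss < 1:
        raise ValueError("relative_efficiency_loss must be in [0, 1)")
    return math.floor(baseline_samples * (1 - relative_efficiency_loss))


# Hypothetical baseline of 96 samples per sequencing run.
for pool_size, loss in ((12, 0.23), (16, 0.31)):
    n = compensated_sample_count(96, loss)
    print(f"pre-capture pool of {pool_size}: about {n} samples per run")
```

Under these assumptions the number of co-sequenced samples scales with the remaining capture efficiency, so a larger pre-capture pool trades reagent savings against sequencing capacity.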
Full-genome sequences of hepatitis B virus subgenotype D3 isolates from the Brazilian Amazon Region.

PubMed

Spitz, Natália; Mello, Francisco C A; Araujo, Natalia Motta

2015-02-01

The Brazilian Amazon Region is a highly endemic area for hepatitis B virus (HBV). However, little is known regarding the genetic variability of the strains circulating in this geographical region. Here, we describe the first full-length genomes of HBV isolated in the Brazilian Amazon Region; these genomes are also the first complete HBV subgenotype D3 genomes reported for Brazil. The genomes of the five Brazilian isolates were all 3,182 base pairs in length and the isolates were classified as belonging to subgenotype D3, subtypes ayw2 (n = 3) and ayw3 (n = 2). Phylogenetic analysis suggested that the Brazilian sequences are not likely to be closely related to European D3 sequences. Such results will contribute to further epidemiological and evolutionary studies of HBV.
Uprobe: a genome-wide universal probe resource for comparative physical mapping in vertebrates.

PubMed

Kellner, Wendy A; Sullivan, Robert T; Carlson, Brian H; Thomas, James W

2005-01-01

Interspecies comparisons are important for deciphering the functional content and evolution of genomes. The expansive array of >70 public vertebrate genomic bacterial artificial chromosome (BAC) libraries can provide a means of comparative mapping, sequencing, and functional analysis of targeted chromosomal segments that is independent and complementary to whole-genome sequencing. However, at the present time, no complementary resource exists for the efficient targeted physical mapping of the majority of these BAC libraries. Universal overgo-hybridization probes, designed from regions of sequenced genomes that are highly conserved between species, have been demonstrated to be an effective resource for the isolation of orthologous regions from multiple BAC libraries in parallel. Here we report the application of the universal probe design principle across entire genomes, and the subsequent creation of a complementary probe resource, Uprobe, for screening vertebrate BAC libraries. Uprobe currently consists of whole-genome sets of universal overgo-hybridization probes designed for screening mammalian or avian/reptilian libraries. Retrospective analysis, experimental validation of the probe design process on a panel of representative BAC libraries, and estimates of probe coverage across the genome indicate that the majority of all eutherian and avian/reptilian genes or regions of interest can be isolated using Uprobe. Future implementation of the universal probe design strategy will be used to create an expanded number of whole-genome probe sets that will encompass all vertebrate genomes.
Analysis of Genomic Regions of Trichoderma harzianum IOC-3844 Related to Biomass Degradation

PubMed Central

Crucello, Aline; Sforça, Danilo Augusto; Horta, Maria Augusta Crivelente; dos Santos, Clelton Aparecido; Viana, Américo José Carvalho; Beloti, Lilian Luzia; de Toledo, Marcelo Augusto Szymanski; Vincentz, Michel; Kuroshu, Reginaldo Massanobu; de Souza, Anete Pereira

2015-01-01

Trichoderma harzianum IOC-3844 secretes high levels of cellulolytic-active enzymes and is therefore a promising strain for use in biotechnological applications in second-generation bioethanol production. However, the T. harzianum biomass degradation mechanism has not been well explored at the genetic level. The present work investigates six genomic regions (~150 kbp each) in this fungus that are enriched with genes related to biomass conversion. A BAC library consisting of 5,760 clones was constructed, with an average insert length of 90 kbp. The assembled BAC sequences revealed 232 predicted genes, 31.5% of which were related to catabolic pathways, including those involved in biomass degradation. An expression profile analysis based on RNA-Seq data demonstrated that putative regulatory elements, such as membrane transport proteins and transcription factors, are located in the same genomic regions as genes related to carbohydrate metabolism and exhibit similar expression profiles. Thus, we demonstrate a rapid and efficient tool that focuses on specific genomic regions by combining a BAC library with transcriptomic data. This is the first BAC-based structural genomic study of the cellulolytic fungus T. harzianum, and its findings provide new perspectives regarding the use of this species in biomass degradation processes. PMID:25836973
Using in Vitro Evolution and Whole Genome Analysis To Discover Next Generation Targets for Antimalarial Drug Discovery

PubMed Central

2018-01-01

Although many new anti-infectives have been discovered and developed solely using phenotypic cellular screening and assay optimization, most researchers recognize that structure-guided drug design is more practical and less costly. In addition, a greater chemical space can be interrogated with structure-guided drug design. The practicality of structure-guided drug design has launched a search for the targets of compounds discovered in phenotypic screens. One method that has been used extensively in malaria parasites for target discovery and chemical validation is in vitro evolution and whole genome analysis (IVIEWGA). Here, small molecules from phenotypic screens with demonstrated antiparasitic activity are used in genome-based target discovery methods. In this Review, we discuss the newest, most promising druggable targets discovered or further validated by evolution-based methods, as well as some exceptions. PMID:29451780
Bat white-nose syndrome: a real-time TaqMan polymerase chain reaction test targeting the intergenic spacer region of Geomyces destructans.

USGS Publications Warehouse

Muller, Laura K.; Lorch, Jeffrey M.; Lindner, Daniel L.; O'Connor, Michael; Gargas, Andrea; Blehert, David S.

2013-01-01

The fungus Geomyces destructans is the causative agent of white-nose syndrome (WNS), a disease that has killed millions of North American hibernating bats. We describe a real-time TaqMan PCR test that detects DNA from G. destructans by targeting a portion of the multicopy intergenic spacer region of the rRNA gene complex. The test is highly sensitive, consistently detecting as little as 3.3 fg of genomic DNA from G. destructans. The real-time PCR test specifically amplified genomic DNA from G. destructans but did not amplify target sequence from 54 closely related fungal isolates (including 43 Geomyces spp. isolates) associated with bats. The test was further qualified by analyzing DNA extracted from 91 bat wing skin samples, and PCR results matched histopathology findings. These data indicate the real-time TaqMan PCR method described herein is a sensitive, specific, and rapid test to detect DNA from G. destructans and provides a valuable tool for WNS diagnostics and research.
Genome-wide association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants.

PubMed

Iso-Touru, T; Sahana, G; Guldbrandtsen, B; Lund, M S; Vilkki, J

2016-03-22

The Nordic Red Cattle, consisting of three different populations from Finland, Sweden and Denmark, are under a joint breeding value estimation system. The long history of recording of production and health traits offers a great opportunity to study production traits and identify causal variants behind them. In this study, we used whole genome sequence level data from 4280 progeny tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. Using a genome-wise significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were associated with fat yield. Regions on chromosomes 5, 14, 16, 19, 20 and 25 were associated with milk yield and chromosomes 5, 14 and 25 had regions associated with protein yield. Significantly associated variations were found in 227 genes for fat yield, 72 genes for milk yield and 30 genes for protein yield. Ingenuity Pathway Analysis was used to identify networks connecting these genes displaying significant hits. When compared to previously mapped genomic regions associated with fertility, significantly associated variations were found in 5 genes common for fat yield and fertility, thus linking these two traits via biological networks. This is the first time that whole genome sequence data have been used to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence level data offers the possibility to study quantitative traits in detail but still cannot unambiguously reveal which of the associated variations is causative. Linkage disequilibrium makes it difficult to pinpoint the causative genes and variations. One solution to overcome these difficulties is the identification of the functional gene networks and pathways to reveal important interacting genes as candidates for the observed effects. This information on target genomic regions may be exploited to improve genomic prediction.
Comparative Genomics of Campylobacter iguaniorum to Unravel Genetic Regions Associated with Reptilian Hosts

PubMed Central

Gilbert, Maarten J.; Miller, William G.; Yee, Emma; Kik, Marja; Zomer, Aldert L.; Wagenaar, Jaap A.; Duim, Birgitta

2016-01-01

Campylobacter iguaniorum is most closely related to the species C. fetus, C. hyointestinalis, and C. lanienae. Reptiles, chelonians and lizards in particular, appear to be a primary reservoir of this Campylobacter species. Here we report the genome comparison of C. iguaniorum strain 1485E, isolated from a bearded dragon (Pogona vitticeps), and strain 2463D, isolated from a green iguana (Iguana iguana), with the genomes of closely related taxa, in particular with reptile-associated C. fetus subsp. testudinum. In contrast to C. fetus, C. iguaniorum is lacking an S-layer encoding region. Furthermore, a defined lipooligosaccharide biosynthesis locus, encoding multiple glycosyltransferases and bounded by waa genes, is absent from C. iguaniorum. Instead, multiple predicted glycosylation regions were identified in C. iguaniorum. One of these regions is >50 kb with deviant G + C content, suggesting acquisition via lateral transfer. These similar, but non-homologous glycosylation regions were located at the same position on the genome in both strains. Multiple genes encoding respiratory enzymes not identified to date within the C. fetus clade were present. C. iguaniorum shared highest homology with C. hyointestinalis and C. fetus. As in reptile-associated C. fetus subsp. testudinum, a putative tricarballylate catabolism locus was identified. However, despite colonizing a shared host, no recent recombination between both taxa was detected. This genomic study provides a better understanding of host adaptation, virulence, phylogeny, and evolution of C. iguaniorum and related Campylobacter taxa. PMID:27604878
Emergence of the Noncoding Cancer Genome: A Target of Genetic and Epigenetic Alterations.

PubMed

Zhou, Stanley; Treloar, Aislinn E; Lupien, Mathieu

2016-11-01

The emergence of whole-genome annotation approaches is paving the way for the comprehensive annotation of the human genome across diverse cell and tissue types exposed to various environmental conditions. This has already unmasked the positions of thousands of functional cis-regulatory elements integral to transcriptional regulation, such as enhancers, promoters, and anchors of chromatin interactions that populate the noncoding genome. Recent studies have shown that cis-regulatory elements are commonly the targets of genetic and epigenetic alterations associated with aberrant gene expression in cancer. Here, we review these findings to showcase the contribution of the noncoding genome and its alteration in the development and progression of cancer. We also highlight the opportunities to translate the biological characterization of genetic and epigenetic alterations in the noncoding cancer genome into novel approaches to treat or monitor disease. The majority of genetic and epigenetic alterations accumulate in the noncoding genome throughout oncogenesis. Discriminating driver from passenger events is a challenge that holds great promise to improve our understanding of the etiology of different cancer types. Advancing our understanding of the noncoding cancer genome may thus identify new therapeutic opportunities and accelerate our capacity to find improved biomarkers to monitor various stages of cancer development. Cancer Discov; 6(11); 1215-29. ©2016 AACR. ©2016 American Association for Cancer Research.
A tailing genome walking method suitable for genomes with high local GC content.

PubMed

Liu, Taian; Fang, Yongxiang; Yao, Wenjuan; Guan, Qisai; Bai, Gang; Jing, Zhizhong

2013-10-15

The tailing genome walking strategies are simple and efficient. However, they sometimes can be restricted due to the low stringency of homo-oligomeric primers. Here we modified their conventional tailing step by adding polythymidine and polyguanine to the target single-stranded DNA (ssDNA). The tailed ssDNA was then amplified exponentially with a specific primer in the known region and a primer comprising 5' polycytosine and 3' polyadenosine. The successful application of this novel method for identifying integration sites mediated by φC31 integrase in goat genome indicates that the method is more suitable for genomes with high complexity and local GC content. Copyright © 2013 Elsevier Inc. All rights reserved.
Genomic Organization of the Murine Miller–Dieker/Lissencephaly Region: Conservation of Linkage with the Human Region

PubMed Central

Hirotsune, Shinji; Pack, Svetlana D.; Chong, Samuel S.; Robbins, Christiane M.; Pavan, William J.; Ledbetter, David H.; Wynshaw-Boris, Anthony

1997-01-01

Several human syndromes are associated with haploinsufficiency of chromosomal regions secondary to microdeletions. Isolated lissencephaly sequence (ILS), a human developmental disease characterized by a smooth cerebral surface (classical lissencephaly) and microscopic evidence of incomplete neuronal migration, is often associated with small deletions or translocations at chromosome 17p13.3. Miller–Dieker syndrome (MDS) is associated with larger deletions of 17p13.3 and consists of classical lissencephaly with additional phenotypes including facial abnormalities. We have isolated the murine homologs of three genes located inside and outside the MDS region: Lis1, Mnt/Rox, and 14-3-3ε. These genes are all located on mouse chromosome 11B2, as determined by metaphase FISH, and the relative order and approximate gene distance was determined by interphase FISH analysis. The transcriptional orientation and intergenic distance of Lis1 and Mnt/Rox were ascertained by fragmentation analysis of a mouse yeast artificial chromosome containing both genes. To determine the distance and orientation of 14-3-3ε with respect to Lis1 and Mnt/Rox, we introduced a super-rare cutter site (VDE) that is unique in the mouse genome into 14-3-3ε by gene targeting. Using the introduced VDE site, the orientation of this gene was determined by pulsed field gel electrophoresis and Southern blot analysis. Our results demonstrate that the MDS region is conserved between human and mouse. This conservation of linkage suggests that the mouse can be used to model microdeletions that occur in ILS and MDS. PMID:9199935
Targeting vector construction through recombineering.

PubMed

Malureanu, Liviu A

2011-01-01

Gene targeting in mouse embryonic stem cells is an essential, yet still very expensive and highly time-consuming, tool and method to study gene function at the organismal level or to create mouse models of human diseases. Conventional cloning-based methods have been largely used for generating targeting vectors, but are hampered by a number of limiting factors, including the variety and location of restriction enzymes in the gene locus of interest, the specific PCR amplification of repetitive DNA sequences, and cloning of large DNA fragments. Recombineering is a technique that exploits the highly efficient homologous recombination function encoded by λ phage in Escherichia coli. Bacteriophage-based recombination can recombine homologous sequences as short as 30-50 bases, allowing manipulations such as insertion, deletion, or mutation of virtually any genomic region. The large availability of mouse genomic bacterial artificial chromosome (BAC) libraries covering most of the genome facilitates the retrieval of genomic DNA sequences from the bacterial chromosomes through recombineering. This chapter describes a successfully applied protocol and aims to be a detailed guide through the steps of generation of targeting vectors through recombineering.
Determining Epigenetic Targets: A Beginner's Guide to Identifying Genome Functionality Through Database Analysis.

PubMed

Hay, Elizabeth A; Cowie, Philip; MacKenzie, Alasdair

2017-01-01

There can now be little doubt that the cis-regulatory genome represents the largest information source within the human genome essential for health. In addition to containing up to five times more information than the coding genome, the cis-regulatory genome also acts as a major reservoir of disease-associated polymorphic variation. The cis-regulatory genome, which is comprised of enhancers, silencers, promoters, and insulators, also acts as a major functional target for epigenetic modification including DNA methylation and chromatin modifications. These epigenetic modifications impact the ability of cis-regulatory sequences to maintain tissue-specific and inducible expression of genes that preserve health. There has been limited ability to identify and characterize the functional components of this huge and largely misunderstood part of the human genome that, for decades, was ignored as "Junk" DNA. In an attempt to address this deficit, the current chapter will first describe methods of identifying and characterizing functional elements of the cis-regulatory genome at a genome-wide level using databases such as ENCODE, the UCSC browser, and NCBI. We will then explore the databases on the UCSC genome browser, which provides access to DNA methylation and chromatin modification datasets. Finally, we will describe how we can superimpose the huge volume of study data contained in the NCBI archives onto that contained within the UCSC browser in order to glean relevant in vivo study data for any locus within the genome. An ability to access and utilize these information sources will become essential to informing the future design of experiments and subsequent determination of the role of epigenetics in health and disease and will form a critical step in our development of personalized medicine.
Identification of Genomic Regions Associated with Phenotypic Variation between Dog Breeds using Selection Mapping

PubMed Central

Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H.; Hansen, Mark S. T.; Lawley, Cindy T.; Karlsson, Elinor K.; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Åke; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T.

2011-01-01

The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease. PMID:22022279
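The search for long blocks of homozygosity described in this abstract can be illustrated with a minimal Python sketch. It is a simplified, single-individual scan rather than the authors' breed-level test statistics; the function name `homozygous_runs`, the 0/1/2 genotype coding (1 meaning heterozygous), and the toy coordinates are assumptions for illustration.

```python
def homozygous_runs(positions, genotypes, min_length_bp=1_000_000):
    """Return (start, end) spans of consecutive homozygous SNPs on one chromosome.

    positions: SNP coordinates in bp, sorted ascending.
    genotypes: alternate-allele counts per SNP (0 or 2 = homozygous, 1 = heterozygous).
    Only spans of at least min_length_bp are reported.
    """
    runs = []
    start = None  # first SNP of the current homozygous run
    last = None   # most recent homozygous SNP in the current run
    for pos, gt in zip(positions, genotypes):
        if gt in (0, 2):
            if start is None:
                start = pos
            last = pos
        else:
            # A heterozygous call closes the current run.
            if start is not None and last - start >= min_length_bp:
                runs.append((start, last))
            start = None
    if start is not None and last - start >= min_length_bp:
        runs.append((start, last))
    return runs


# Toy data: two homozygous stretches separated by one heterozygous SNP.
pos = [0, 500_000, 1_200_000, 1_300_000, 1_400_000, 3_000_000]
gts = [0, 2, 0, 1, 2, 2]
print(homozygous_runs(pos, gts))
```

A real analysis would also tolerate occasional genotyping errors and account for SNP density, which this sketch deliberately leaves out.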
Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping.

PubMed

Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H; Hansen, Mark S T; Lawley, Cindy T; Karlsson, Elinor K; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Ake; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T

2011-10-01

The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
Estimation of (co)variances for genomic regions of flexible sizes: application to complex infectious udder diseases in dairy cattle

PubMed Central

2012-01-01

Background Multi-trait genomic models in a Bayesian context can be used to estimate genomic (co)variances, either for a complete genome or for genomic regions (e.g. per chromosome) for the purpose of multi-trait genomic selection or to gain further insight into the genomic architecture of related traits such as mammary disease traits in dairy cattle. Methods Data on progeny means of six traits related to mastitis resistance in dairy cattle (general mastitis resistance and five pathogen-specific mastitis resistance traits) were analyzed using a bivariate Bayesian SNP-based genomic model with a common prior distribution for the marker allele substitution effects and estimation of the hyperparameters in this prior distribution from the progeny means data. From the Markov chain Monte Carlo samples of the allele substitution effects, genomic (co)variances were calculated on a whole-genome level, per chromosome, and in regions of 100 SNP on a chromosome. Results Genomic proportions of the total variance differed between traits. Genomic correlations were lower than pedigree-based genetic correlations and they were highest between general mastitis and pathogen-specific traits because of the part-whole relationship between these traits. The chromosome-wise genomic proportions of the total variance differed between traits, with some chromosomes explaining higher or lower values than expected in relation to chromosome size. Few chromosomes showed pleiotropic effects and only chromosome 19 had a clear effect on all traits, indicating the presence of QTL with a general effect on mastitis resistance. The region-wise patterns of genomic variances differed between traits. Peaks indicating QTL were identified but were not very distinctive because a common prior for the marker effects was used. There was a clear difference in the region-wise patterns of genomic correlation among combinations of traits, with distinctive peaks indicating the presence of pleiotropic QTL. Conclusions
Genomic landscape of gastric cancer: molecular classification and potential targets.

PubMed

Guo, Jiawei; Yu, Weiwei; Su, Hui; Pang, Xiufeng

2017-02-01

Gastric cancer imposes a considerable health burden worldwide, and its mortality ranks as the second highest for all types of cancers. The limited knowledge of the molecular mechanisms underlying gastric cancer tumorigenesis hinders the development of therapeutic strategies. However, ongoing collaborative sequencing efforts facilitate molecular classification and unveil the genomic landscape of gastric cancer. Several new drivers and tumorigenic pathways in gastric cancer, including chromatin remodeling genes, RhoA-related pathways, TP53 dysregulation, activation of receptor tyrosine kinases, stem cell pathways and abnormal DNA methylation, have been revealed. These newly identified genomic alterations await translation into clinical diagnosis and targeted therapies. Considering that loss-of-function mutations are intractable, synthetic lethality could be employed when discussing feasible therapeutic strategies. Although many challenges remain to be tackled, we are optimistic regarding improvements in the prognosis and treatment of gastric cancer in the near future.

PAM multiplicity marks genomic target sites as inhibitory to CRISPR-Cas9 editing.

PubMed

Malina, Abba; Cameron, Christopher J F; Robert, Francis; Blanchette, Mathieu; Dostie, Josée; Pelletier, Jerry

2015-12-08

In CRISPR-Cas9 genome editing, the underlying principles for selecting guide RNA (gRNA) sequences that would ensure efficient target site modification remain poorly understood. Here we show that target sites harbouring multiple protospacer adjacent motifs (PAMs) are refractory to Cas9-mediated repair in situ. Thus we refine which substrates should be avoided in gRNA design, implicating PAM density as a novel sequence-specific feature that inhibits in vivo Cas9-driven DNA modification.
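
The abstract above names PAM density as a feature to avoid in gRNA design but does not say how it was scored. As a rough illustration only, the sketch below counts candidate PAMs in a target site on both strands. The NGG motif (the canonical Streptococcus pyogenes Cas9 PAM), the function names, and the choice to count overlapping matches are all assumptions made for this example, not the authors' method.

```python
import re

# Complement table for DNA; used to scan the reverse strand.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_pams(site: str, pam_regex: str = "[ACGT]GG") -> int:
    """Count PAM matches in `site` on both strands.

    Overlapping matches are counted (a lookahead is used), and the
    default pattern is NGG. Both choices are illustrative assumptions.
    """
    site = site.upper()
    pattern = re.compile(f"(?=({pam_regex}))")
    forward = len(pattern.findall(site))
    reverse = len(pattern.findall(reverse_complement(site)))
    return forward + reverse


if __name__ == "__main__":
    for candidate in ("ACGGTT", "AGGGG", "CCTAGG"):
        print(candidate, count_pams(candidate))
```

A score like this could be used to rank or filter candidate sites before guide selection; any cut-off for "too many PAMs" would have to come from the paper itself or from experimental data.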
Genome-Wide Analysis of Transposon and Retroviral Insertions Reveals Preferential Integrations in Regions of DNA Flexibility.

PubMed

Vrljicak, Pavle; Tao, Shijie; Varshney, Gaurav K; Quach, Helen Ngoc Bao; Joshi, Adita; LaFave, Matthew C; Burgess, Shawn M; Sampath, Karuna

2016-04-07

DNA transposons and retroviruses are important transgenic tools for genome engineering. An important consideration affecting the choice of transgenic vector is their insertion site preferences. Previous large-scale analyses of Ds transposon integration sites in plants were done on the basis of reporter gene expression or germ-line transmission, making it difficult to discern vertebrate integration preferences. Here, we compare over 1300 Ds transposon integration sites in zebrafish with Tol2 transposon and retroviral integration sites. Genome-wide analysis shows that Ds integration sites in the presence or absence of marker selection are remarkably similar and distributed throughout the genome. No strict motif was found, but a preference for structural features in the target DNA associated with DNA flexibility (Twist, Tilt, Rise, Roll, Shift, and Slide) was observed. Remarkably, this feature is also found in transposon and retroviral integrations in maize and mouse cells. Our findings show that structural features influence the integration of heterologous DNA in genomes, and have implications for targeted genome engineering. Copyright © 2016 Vrljicak et al.
Recurrent DNA inversion rearrangements in the human genome

PubMed Central

Flores, Margarita; Morales, Lucía; Gonzaga-Jauregui, Claudia; Domínguez-Vidaña, Rocío; Zepeda, Cinthya; Yañez, Omar; Gutiérrez, María; Lemus, Tzitziki; Valle, David; Avila, Ma. Carmen; Blanco, Daniel; Medina-Ruiz, Sofía; Meza, Karla; Ayala, Erandi; García, Delfino; Bustos, Patricia; González, Víctor; Girard, Lourdes; Tusie-Luna, Teresa; Dávila, Guillermo; Palacios, Rafael

2007-01-01

Several lines of evidence suggest that reiterated sequences in the human genome are targets for nonallelic homologous recombination (NAHR), which facilitates genomic rearrangements. We have used a PCR-based approach to identify breakpoint regions of rearranged structures in the human genome. In particular, we have identified intrachromosomal identical repeats that are located in reverse orientation, which may lead to chromosomal inversions. A bioinformatic workflow pathway to select appropriate regions for analysis was developed. Three such regions overlapping with known human genes, located on chromosomes 3, 15, and 19, were analyzed. The relative proportion of wild-type to rearranged structures was determined in DNA samples from blood obtained from different, unrelated individuals. The results obtained indicate that recurrent genomic rearrangements occur at relatively high frequency in somatic cells. Interestingly, the rearrangements studied were significantly more abundant in adults than in newborn individuals, suggesting that such DNA rearrangements might start to appear during embryogenesis or fetal life and continue to accumulate after birth. The relevance of our results in regard to human genomic variation is discussed. PMID:17389356
Centromere-Like Regions in the Budding Yeast Genome

PubMed Central

Lefrançois, Philippe; Auerbach, Raymond K.; Yellman, Christopher M.; Roeder, G. Shirleen; Snyder, Michael

2013-01-01

Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP–Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres. PMID:23349633
Genome-wide selection components analysis in a fish with male pregnancy.

PubMed

Flanagan, Sarah P; Jones, Adam G

2017-04-01

A major goal of evolutionary biology is to identify the genome-level targets of natural and sexual selection. With the advent of next-generation sequencing, whole-genome selection components analysis provides a promising avenue in the search for loci affected by selection in nature. Here, we implement a genome-wide selection components analysis in the sex role reversed Gulf pipefish, Syngnathus scovelli. Our approach involves a double-digest restriction-site associated DNA sequencing (ddRAD-seq) technique, applied to adult females, nonpregnant males, pregnant males, and their offspring. An FST comparison of allele frequencies among these groups reveals 47 genomic regions putatively experiencing sexual selection, as well as 468 regions showing a signature of differential viability selection between males and females. A complementary likelihood ratio test identifies similar patterns in the data as the FST analysis. Sexual selection and viability selection both tend to favor the rare alleles in the population. Ultimately, we conclude that genome-wide selection components analysis can be a useful tool to complement other approaches in the effort to pinpoint genome-level targets of selection in the wild. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.
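
The comparison of allele frequencies among groups described above rests on a fixation index. The abstract does not state which estimator was used, so the following is only a minimal sketch of the textbook heterozygosity-based form, FST = (HT - HS) / HT, for one biallelic locus with equally weighted groups. The function name and the equal-weighting choice are assumptions for illustration.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Heterozygosity-based FST for one biallelic locus.

    `freqs` holds the frequency of one allele in each group
    (for example adult females and pregnant males). Groups are
    weighted equally, which is a simplifying assumption.
    """
    if not freqs:
        raise ValueError("at least one group frequency is required")
    # Mean expected heterozygosity within groups (HS).
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled population (HT).
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Locus is fixed across all groups; no differentiation to measure.
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5]))  # identical frequencies
    print(fst_biallelic([0.2, 0.8]))  # diverged frequencies
```

In a selection components design, a value like this would be computed per locus for each pair of groups and outliers examined; sample-size corrections used by real estimators are omitted here.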
Smooth Muscle Cell Genome Browser: Enabling the Identification of Novel Serum Response Factor Target Genes

PubMed Central

Lee, Moon Young; Park, Chanjae; Berent, Robyn M.; Park, Paul J.; Fuchs, Robert; Syn, Hannah; Chin, Albert; Townsend, Jared; Benson, Craig C.; Redelman, Doug; Shen, Tsai-wei; Park, Jong Kun; Miano, Joseph M.; Sanders, Kenton M.; Ro, Seungil

2015-01-01

Genome-scale expression data on the absolute numbers of gene isoforms offers essential clues in cellular functions and biological processes. Smooth muscle cells (SMCs) perform a unique contractile function through expression of specific genes controlled by serum response factor (SRF), a transcription factor that binds to DNA sites known as the CArG boxes. To identify SRF-regulated genes specifically expressed in SMCs, we isolated SMC populations from mouse small intestine and colon, obtained their transcriptomes, and constructed an interactive SMC genome and CArGome browser. To our knowledge, this is the first online resource that provides a comprehensive library of all genetic transcripts expressed in primary SMCs. The browser also serves as the first genome-wide map of SRF binding sites. The browser analysis revealed novel SMC-specific transcriptional variants and SRF target genes, which provided new and unique insights into the cellular and biological functions of the cells in gastrointestinal (GI) physiology. The SRF target genes in SMCs, which were discovered in silico, were confirmed by proteomic analysis of SMC-specific Srf knockout mice. Our genome browser offers a new perspective into the alternative expression of genes in the context of SRF binding sites in SMCs and provides a valuable reference for future functional studies. PMID:26241044
Pooled-DNA sequencing identifies genomic regions of selection in Nigerian isolates of Plasmodium falciparum.

PubMed

Oyebola, Kolapo M; Idowu, Emmanuel T; Olukosi, Yetunde A; Awolola, Taiwo S; Amambua-Ngwa, Alfred

2017-06-29

The burden of falciparum malaria is especially high in sub-Saharan Africa. Differences in pressure from host immunity and antimalarial drugs lead to adaptive changes responsible for high level of genetic variations within and between the parasite populations. Population-specific genetic studies to survey for genes under positive or balancing selection resulting from drug pressure or host immunity will allow for refinement of interventions. We performed a pooled sequencing (pool-seq) of the genomes of 100 Plasmodium falciparum isolates from Nigeria. We explored allele-frequency based neutrality test (Tajima's D) and integrated haplotype score (iHS) to identify genes under selection. Fourteen shared iHS regions that had at least 2 SNPs with a score > 2.5 were identified. These regions code for genes that were likely to have been under strong directional selection. Two of these genes were the chloroquine resistance transporter (CRT) on chromosome 7 and the multidrug resistance 1 (MDR1) on chromosome 5. There was a weak signature of selection in the dihydrofolate reductase (DHFR) gene on chromosome 4 and MDR5 genes on chromosome 13, with only 2 and 3 SNPs respectively identified within the iHS window. We observed strong selection pressure attributable to continued chloroquine and sulfadoxine-pyrimethamine use despite their official proscription for the treatment of uncomplicated malaria. There was also a major selective sweep on chromosome 6 which had 32 SNPs within the shared iHS region. Tajima's D of circumsporozoite protein (CSP), erythrocyte-binding antigen (EBA-175), merozoite surface proteins - MSP3 and MSP7, merozoite surface protein duffy binding-like (MSPDBL2) and serine repeat antigen (SERA-5) were 1.38, 1.29, 0.73, 0.84 and 0.21, respectively. We have demonstrated the use of pool-seq to understand genomic patterns of selection and variability in P. falciparum from Nigeria, which bears the highest burden of infections. This investigation identified known
PAM multiplicity marks genomic target sites as inhibitory to CRISPR-Cas9 editing

PubMed Central

Malina, Abba; Cameron, Christopher J. F.; Robert, Francis; Blanchette, Mathieu; Dostie, Josée; Pelletier, Jerry

2015-01-01

In CRISPR-Cas9 genome editing, the underlying principles for selecting guide RNA (gRNA) sequences that would ensure efficient target site modification remain poorly understood. Here we show that target sites harbouring multiple protospacer adjacent motifs (PAMs) are refractory to Cas9-mediated repair in situ. Thus we refine which substrates should be avoided in gRNA design, implicating PAM density as a novel sequence-specific feature that inhibits in vivo Cas9-driven DNA modification. PMID:26644285
Targeted Capture Sequencing in Whitebark Pine Reveals Range-Wide Demographic and Adaptive Patterns Despite Challenges of a Large, Repetitive Genome.

PubMed

Syring, John V; Tennessen, Jacob A; Jennings, Tara N; Wegrzyn, Jill; Scelfo-Dalbey, Camille; Cronn, Richard

2016-01-01

Whitebark pine (Pinus albicaulis) inhabits an expansive range in western North America, and it is a keystone species of subalpine environments. Whitebark is susceptible to multiple threats - climate change, white pine blister rust, mountain pine beetle, and fire exclusion - and it is suffering significant mortality range-wide, prompting the tree to be listed as 'globally endangered' by the International Union for Conservation of Nature and 'endangered' by the Canadian government. Conservation collections (in situ and ex situ) are being initiated to preserve the genetic legacy of the species. Reliable, transferrable, and highly variable genetic markers are essential for quantifying the genetic profiles of seed collections relative to natural stands, and ensuring the completeness of conservation collections. We evaluated the use of hybridization-based target capture to enrich specific genomic regions from the 27 GB genome of whitebark pine, and to evaluate genetic variation across loci, trees, and geography. Probes were designed to capture 7,849 distinct genes, and screening was performed on 48 trees. Despite the inclusion of repetitive elements in the probe pool, the resulting dataset provided information on 4,452 genes and 32% of targeted positions (528,873 bp), and we were able to identify 12,390 segregating sites from 47 trees. Variations reveal strong geographic trends in heterozygosity and allelic richness, with trees from the southern Cascade and Sierra Range showing the greatest distinctiveness and differentiation. Our results show that even under non-optimal conditions (low enrichment efficiency; inclusion of repetitive elements in baits), targeted enrichment produces high quality, codominant genotypes from large genomes. The resulting data can be readily integrated into management and gene conservation activities for whitebark pine, and have the potential to be applied to other members of 5-needle pine group (Pinus subsect. Quinquefolia) due to their
Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse

PubMed Central

Kortschak, R. Daniel

2018-01-01

The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or “churning” in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. PMID:29677183
Comparative Genomics of Campylobacter iguaniorum to Unravel Genetic Regions Associated with Reptilian Hosts.

PubMed

Gilbert, Maarten J; Miller, William G; Yee, Emma; Kik, Marja; Zomer, Aldert L; Wagenaar, Jaap A; Duim, Birgitta

2016-10-05

Campylobacter iguaniorum is most closely related to the species C. fetus, C. hyointestinalis, and C. lanienae. Reptiles, chelonians and lizards in particular, appear to be a primary reservoir of this Campylobacter species. Here we report the genome comparison of C. iguaniorum strain 1485E, isolated from a bearded dragon (Pogona vitticeps), and strain 2463D, isolated from a green iguana (Iguana iguana), with the genomes of closely related taxa, in particular with reptile-associated C. fetus subsp. testudinum. In contrast to C. fetus, C. iguaniorum is lacking an S-layer encoding region. Furthermore, a defined lipooligosaccharide biosynthesis locus, encoding multiple glycosyltransferases and bounded by waa genes, is absent from C. iguaniorum. Instead, multiple predicted glycosylation regions were identified in C. iguaniorum. One of these regions is > 50 kb with deviant G + C content, suggesting acquisition via lateral transfer. These similar, but non-homologous glycosylation regions were located at the same position on the genome in both strains. Multiple genes encoding respiratory enzymes not identified to date within the C. fetus clade were present. C. iguaniorum shared highest homology with C. hyointestinalis and C. fetus. As in reptile-associated C. fetus subsp. testudinum, a putative tricarballylate catabolism locus was identified. However, despite colonizing a shared host, no recent recombination between both taxa was detected. This genomic study provides a better understanding of host adaptation, virulence, phylogeny, and evolution of C. iguaniorum and related Campylobacter taxa. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
LAMP detection assays for boxwood blight pathogens: A comparative genomics approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Malapi-Wight, Martha; Demers, Jill E.; Veltri, Daniel

Rapid and accurate molecular diagnostic tools are critical to efforts to minimize the impact and spread of emergent pathogens. The identification of diagnostic markers for novel pathogens presents several challenges, especially in the absence of information about population diversity and where genetic resources are limited. The objective of this study was to use comparative genomics datasets to find unique target regions suitable for the diagnosis of two fungal species causing a newly emergent blight disease of boxwood. Candidate marker regions for loop-mediated isothermal amplification (LAMP) assays were identified from draft genomes of Calonectria henricotiae and C. pseudonaviculata, as well as three related species not associated with this disease. To increase the probability of identifying unique targets, we used three approaches to mine genome datasets, based on (i) unique regions, (ii) polymorphisms, and (iii) presence/absence of regions across datasets. From a pool of candidate markers, we demonstrate LAMP assay specificity by testing related fungal species, common boxwood pathogens, and environmental samples containing 445 diverse fungal taxa. In conclusion, this comparative-genomics-based approach to the development of LAMP diagnostic assays is the first of its kind for fungi and could be easily applied to diagnostic marker development for other newly emergent plant pathogens.
LAMP detection assays for boxwood blight pathogens: A comparative genomics approach

DOE PAGES

Malapi-Wight, Martha; Demers, Jill E.; Veltri, Daniel; ...

2016-05-20

Rapid and accurate molecular diagnostic tools are critical to efforts to minimize the impact and spread of emergent pathogens. The identification of diagnostic markers for novel pathogens presents several challenges, especially in the absence of information about population diversity and where genetic resources are limited. The objective of this study was to use comparative genomics datasets to find unique target regions suitable for the diagnosis of two fungal species causing a newly emergent blight disease of boxwood. Candidate marker regions for loop-mediated isothermal amplification (LAMP) assays were identified from draft genomes of Calonectria henricotiae and C. pseudonaviculata, as well as three related species not associated with this disease. To increase the probability of identifying unique targets, we used three approaches to mine genome datasets, based on (i) unique regions, (ii) polymorphisms, and (iii) presence/absence of regions across datasets. From a pool of candidate markers, we demonstrate LAMP assay specificity by testing related fungal species, common boxwood pathogens, and environmental samples containing 445 diverse fungal taxa. In conclusion, this comparative-genomics-based approach to the development of LAMP diagnostic assays is the first of its kind for fungi and could be easily applied to diagnostic marker development for other newly emergent plant pathogens.
Genome-scale prediction of proteins with long intrinsically disordered regions.

PubMed

Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz

2014-01-01

Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster compared to the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs with majority of proteomes having between 25 and 40%, where higher abundance is characteristic to proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Copyright © 2013 Wiley Periodicals, Inc.
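
The abstract above defines a long disordered region (LDR) as 30 or more consecutive disordered residues. SLIDER itself predicts the presence of an LDR directly from sequence-derived features with logistic regression; the sketch below does not reproduce that model. It only illustrates the definition, assuming per-residue disorder calls are already available as a string in which "D" marks a disordered residue and "O" an ordered one (a hypothetical encoding chosen for this example).

```python
from itertools import groupby


def longest_disordered_run(labels: str, disordered: str = "D") -> int:
    """Length of the longest run of consecutive disordered residues."""
    return max(
        (sum(1 for _ in run) for key, run in groupby(labels) if key == disordered),
        default=0,
    )


def has_ldr(labels: str, min_length: int = 30) -> bool:
    """True if the protein has at least one run of `min_length` or more
    consecutive disordered residues (30 follows the definition above)."""
    return longest_disordered_run(labels) >= min_length


if __name__ == "__main__":
    protein = "O" * 10 + "D" * 30 + "O" * 5
    print(longest_disordered_run(protein), has_ldr(protein))
```

Note that two shorter disordered stretches separated by even one ordered residue do not count as an LDR under this definition, which is why the longest run is tracked rather than the total disordered content.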
Evolutionary history of the ABCB2 genomic region in teleosts

USGS Publications Warehouse

Palti, Y.; Rodriguez, M.F.; Gahr, S.A.; Hansen, J.D.

2007-01-01

Gene duplication, silencing and translocation have all been implicated in shaping the unique genomic architecture of the teleost MH regions. Previously, we demonstrated that trout possess five unlinked regions encoding MH genes. One of these regions harbors ABCB2 which in all other vertebrate classes is found in the MHC class II region. In this study, we sequenced a BAC contig for the trout ABCB2 region. Analysis of this region revealed the presence of genes homologous to those located in the human class II (ABCB2, BRD2, ??DAA), extended class II (RGL2, PHF1, SYGP1) and class III (PBX2, Notch-L) regions. The organization and syntenic relationships of this region were then compared to similar regions in humans, Tetraodon and zebrafish to learn more about the evolutionary history of this region. Our analysis indicates that this region was generated during the teleost-specific duplication event while also providing insight about potential MH paralogous regions in teleosts. © 2006 Elsevier Ltd. All rights reserved.
Synthetic Zinc Finger Proteins: The Advent of Targeted Gene Regulation and Genome Modification Technologies

PubMed Central

2015-01-01

Conspectus The understanding of gene regulation and the structure and function of the human genome increased dramatically at the end of the 20th century. Yet the technologies for manipulating the genome have been slower to develop. For instance, the field of gene therapy has been focused on correcting genetic diseases and augmenting tissue repair for more than 40 years. However, with the exception of a few very low efficiency approaches, conventional genetic engineering methods have only been able to add auxiliary genes to cells. This has been a substantial obstacle to the clinical success of gene therapies and has also led to severe unintended consequences in several cases. Therefore, technologies that facilitate the precise modification of cellular genomes have diverse and significant implications in many facets of research and are essential for translating the products of the Genomic Revolution into tangible benefits for medicine and biotechnology. To address this need, in the 1990s, we embarked on a mission to develop technologies for engineering protein–DNA interactions with the aim of creating custom tools capable of targeting any DNA sequence. Our goal has been to allow researchers to reach into genomes to specifically regulate, knock out, or replace any gene. To realize these goals, we initially focused on understanding and manipulating zinc finger proteins. In particular, we sought to create a simple and straightforward method that enables unspecialized laboratories to engineer custom DNA-modifying proteins using only defined modular components, a web-based utility, and standard recombinant DNA technology. Two significant challenges we faced were (i) the development of zinc finger domains that target sequences not recognized by naturally occurring zinc finger proteins and (ii) determining how individual zinc finger domains could be tethered together as polydactyl proteins to recognize unique locations within complex genomes. We and others have since used
Host and viral RNA-binding proteins involved in membrane targeting, replication and intercellular movement of plant RNA virus genomes

PubMed Central

Hyodo, Kiwamu; Kaido, Masanori; Okuno, Tetsuro

2014-01-01

Many plant viruses have positive-strand RNA [(+)RNA] as their genome. Therefore, it is not surprising that RNA-binding proteins (RBPs) play important roles during (+)RNA virus infection in host plants. Increasing evidence demonstrates that viral and host RBPs play critical roles in multiple steps of the viral life cycle, including translation and replication of viral genomic RNAs, and their intra- and intercellular movement. Although studies focusing on the RNA-binding activities of viral and host proteins, and their associations with membrane targeting and intercellular movement of viral genomes, have been limited to a few viruses, these studies have provided important insights into the molecular mechanisms underlying the replication and movement of viral genomic RNAs. In this review, we briefly overview the currently defined roles of viral and host RBPs whose RNA-binding activity has been confirmed experimentally in association with their membrane targeting and intercellular movement of plant RNA virus genomes. PMID:25071804
Pulmonary Sarcomatoid Carcinomas Commonly Harbor Either Potentially Targetable Genomic Alterations or High Tumor Mutational Burden as Observed by Comprehensive Genomic Profiling.

PubMed

Schrock, Alexa B; Li, Shuyu D; Frampton, Garrett M; Suh, James; Braun, Eduardo; Mehra, Ranee; Buck, Steven C; Bufill, Jose A; Peled, Nir; Karim, Nagla Abdel; Hsieh, K Cynthia; Doria, Manuel; Knost, James; Chen, Rong; Ou, Sai-Hong Ignatius; Ross, Jeffrey S; Stephens, Philip J; Fishkin, Paul; Miller, Vincent A; Ali, Siraj M; Halmos, Balazs; Liu, Jane J

2017-06-01

Pulmonary sarcomatoid carcinoma (PSC) is a high-grade NSCLC characterized by poor prognosis and resistance to chemotherapy. Development of targeted therapeutic strategies for PSC has been hampered because of limited and inconsistent molecular characterization. Hybrid capture-based comprehensive genomic profiling was performed on DNA from formalin-fixed paraffin-embedded sections of 15,867 NSCLCs, including 125 PSCs (0.8%). Tumor mutational burden (TMB) was calculated from 1.11 megabases (Mb) of sequenced DNA. The median age of the patients with PSC was 67 years (range 32-87), 58% were male, and 78% had stage IV disease. Tumor protein p53 gene (TP53) genomic alterations (GAs) were identified in 74% of cases, which had genomics distinct from TP53 wild-type cases, and 62% featured a GA in KRAS (34%) or one of seven genes currently recommended for testing in the National Comprehensive Cancer Network NSCLC guidelines, including the following: hepatocyte growth factor receptor gene (MET) (13.6%), EGFR (8.8%), BRAF (7.2%), erb-b2 receptor tyrosine kinase 2 gene (HER2) (1.6%), and ret proto-oncogene (RET) (0.8%). MET exon 14 alterations were enriched in PSC (12%) compared with non-PSC NSCLCs (∼3%) (p < 0.0001) and were more prevalent in PSC cases with an adenocarcinoma component. The fraction of PSC with a high TMB (>20 mutations per Mb) was notably higher than in non-PSC NSCLC (20% versus 14%, p = 0.056). Of nine patients with PSC treated with targeted or immunotherapies, three had partial responses and three had stable disease. Potentially targetable GAs in National Comprehensive Cancer Network NSCLC genes (30%) or intermediate or high TMB (43%, >10 mutations per Mb) were identified in most of the PSC cases. Thus, the use of comprehensive genomic profiling in clinical care may provide important treatment options for a historically poorly characterized and difficult to treat disease. Copyright © 2017 International Association for the Study of Lung Cancer. Published
A programmable method for massively parallel targeted sequencing

PubMed Central

Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.

2014-01-01

We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526
Identification of genomic regions associated with resistance to clinical mastitis in US Holstein cattle

USDA-ARS's Scientific Manuscript database

The objective of this research was to identify genomic regions associated with clinical mastitis (MAST) in US Holsteins using producer-reported data. Genome-wide association studies (GWAS) were performed on deregressed PTA using GEMMA v. 0.94. Genotypes included 60,671 SNP for all predictor bulls (n...

Enhanced guide-RNA design and targeting analysis for precise CRISPR genome editing of single and consortia of industrially relevant and non-model organisms.

PubMed

Mendoza, Brian J; Trinh, Cong T

2018-01-01

Genetic diversity of non-model organisms offers a repertoire of unique phenotypic features for exploration and cultivation for synthetic biology and metabolic engineering applications. To realize this enormous potential, it is critical to have an efficient genome editing tool for rapid strain engineering of these organisms to perform novel programmed functions. To accommodate the use of CRISPR/Cas systems for genome editing across organisms, we have developed a novel method, named CRISPR Associated Software for Pathway Engineering and Research (CASPER), for identifying on- and off-targets with enhanced predictability coupled with an analysis of non-unique (repeated) targets to assist in editing any organism with various endonucleases. Utilizing CASPER, we demonstrated a modest 2.4% and significant 30.2% improvement (F-test, P < 0.05) over the conventional methods for predicting on- and off-target activities, respectively. Further, we used CASPER to develop novel applications in genome editing: multitargeting analysis (i.e. simultaneous multiple-site modification on a target genome with a sole guide-RNA requirement) and multispecies population analysis (i.e. guide-RNA design for genome editing across a consortium of organisms). Our analysis of a selection of industrially relevant organisms revealed a number of non-unique target sites associated with genes and transposable elements that can be used as potential sites for multitargeting. The analysis also identified shared and unshared targets that enable genome editing of single or multiple genomes in a consortium of interest. We envision CASPER as a useful platform to enhance precise CRISPR genome editing for metabolic engineering and synthetic biology applications. CASPER is available at https://github.com/TrinhLab/CASPER. © The Author (2017). Published by Oxford University Press.
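The abstract above describes identifying guide-RNA targets and separating unique from non-unique (repeated) ones. The following is a minimal sketch of that idea only, not CASPER's implementation: it assumes an SpCas9-style NGG PAM immediately 3' of the spacer, scans both strands, and uses exact spacer identity as the test for repetition. The function names and the `spacer_len` parameter are hypothetical, introduced here for illustration.

```python
from collections import Counter

# Translation table for complementing an upper-case DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(genome, spacer_len=20):
    """List (strand, position, spacer) for every spacer followed by an NGG PAM.

    Assumption: an SpCas9-like PAM. Positions are 0-based offsets on the
    scanned strand ("-" positions refer to the reverse-complemented sequence).
    """
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - spacer_len - 2):
            pam = seq[i + spacer_len : i + spacer_len + 3]
            if pam[1:] == "GG":  # first PAM base is N (any base)
                hits.append((strand, i, seq[i : i + spacer_len]))
    return hits


def split_unique_repeated(hits):
    """Separate targets whose spacer occurs once from those occurring repeatedly."""
    counts = Counter(spacer for _, _, spacer in hits)
    unique = [h for h in hits if counts[h[2]] == 1]
    repeated = [h for h in hits if counts[h[2]] > 1]
    return unique, repeated
```

In this simplified picture, the repeated spacers correspond to candidate multitargeting sites, while the unique spacers are candidates for single-site edits; real off-target scoring would also have to consider mismatched near-matches, which this sketch ignores.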
Comparison of Genome-Wide Binding of MyoD in Normal Human Myogenic Cells and Rhabdomyosarcomas Identifies Regional and Local Suppression of Promyogenic Transcription Factors

PubMed Central

MacQuarrie, Kyle L.; Yao, Zizhen; Fong, Abraham P.; Diede, Scott J.; Rudzinski, Erin R.; Hawkins, Douglas S.

2013-01-01

Rhabdomyosarcoma is a pediatric tumor of skeletal muscle that expresses the myogenic basic helix-loop-helix protein MyoD but fails to undergo terminal differentiation. Prior work has determined that DNA binding by MyoD occurs in the tumor cells, but myogenic targets fail to activate. Using MyoD chromatin immunoprecipitation coupled to high-throughput sequencing and gene expression analysis in both primary human muscle cells and RD rhabdomyosarcoma cells, we demonstrate that MyoD binds in a similar genome-wide pattern in both tumor and normal cells but binds poorly at a subset of myogenic genes that fail to activate in the tumor cells. Binding differences are found both across genomic regions and locally at specific sites that are associated with binding motifs for RUNX1, MEF2C, JDP2, and NFIC. These factors are expressed at lower levels in RD cells than muscle cells and rescue myogenesis when expressed in RD cells. MEF2C is located in a genomic region that exhibits poor MyoD binding in RD cells, whereas JDP2 exhibits local DNA hypermethylation in its promoter in both RD cells and primary tumor samples. These results demonstrate that regional and local silencing of differentiation factors contributes to the differentiation defect in rhabdomyosarcomas. PMID:23230269
Capturing the target genes of BldD in Saccharopolyspora erythraea using improved genomic SELEX method.

PubMed

Wu, Hang; Mao, Yongrong; Chen, Meng; Pan, Hui; Huang, Xunduan; Ren, Min; Wu, Hao; Li, Jiali; Xu, Zhongdong; Yuan, Hualing; Geng, Ming; Weaver, David T; Zhang, Lixin; Zhang, Buchang

2015-03-01

BldD (SACE_2077), a key developmental regulator in actinomycetes, is the first identified transcriptional factor in Saccharopolyspora erythraea positively regulating erythromycin production and morphological differentiation. Although the BldD of S. erythraea binds to the promoters of erythromycin biosynthetic genes, the interaction affinities are relatively low, implying the existence of its other target genes in S. erythraea. Through the genomic systematic evolution of ligands by exponential enrichment (SELEX) method that we herein improved, four DNA sequences of S. erythraea A226, corresponding to the promoter regions of SACE_0306 (beta-galactosidase), SACE_0811 (50S ribosomal protein L25), SACE_3410 (fumarylacetoacetate hydrolase), and SACE_6014 (aldehyde dehydrogenase), were captured with all three BldD concentrations of 0.5, 1, and 2 μM, while the previously identified intergenic regions of eryBIV-eryAI and ermE-eryCI plus the promoter region of SACE_7115, the amfC homolog for aerial mycelium formation, could be captured only when the BldD concentration reached 2 μM. Electrophoretic mobility shift assay (EMSA) analysis indicated that BldD specifically bound to the above seven DNA sequences, and quantitative real-time PCR (qRT-PCR) assay showed that the transcriptional levels of the abovementioned target genes decreased when bldD was disrupted in A226. Furthermore, SACE_7115 and SACE_0306 in A226 were individually inactivated, showing that SACE_7115 was predominantly involved in aerial mycelium formation, while SACE_0306 mainly controlled erythromycin production. This study provides valuable information for better understanding of the pleiotropic regulator BldD in S. erythraea, and the improved method may be useful for uncovering regulatory networks of other transcriptional factors.
Analysis of illegitimate genomic integration mediated by zinc-finger nucleases: implications for specificity of targeted gene correction

PubMed Central

2010-01-01

Background: Formation of site-specific genomic double strand breaks (DSBs), induced by the expression of a pair of engineered zinc-finger nucleases (ZFNs), dramatically increases the rates of homologous recombination (HR) between a specific genomic target and a donor plasmid. However, for the safe use of ZFN-induced HR in practical applications, possible adverse effects of the technology such as cytotoxicity and genotoxicity need to be well understood. In this work, off-target activity of a pair of ZFNs has been examined by measuring the ratio between HR and illegitimate genomic integration in cells that are growing exponentially, and in cells that have been arrested in the G2/M phase. Results: A reporter cell line that contained consensus ZFN binding sites in an enhanced green fluorescent protein (EGFP) reporter gene was used to measure ratios between HR and non-homologous integration of a plasmid template. Both in human cells (HEK 293) containing the consensus ZFN binding sites and in cells lacking the ZFN binding sites, a 3.5-fold increase in the level of illegitimate integration was observed upon ZFN expression. Since the reporter gene containing the consensus ZFN target sites was found to be intact in cells where illegitimate integration had occurred, increased rates of illegitimate integration most likely resulted from the formation of off-target genomic DSBs. Additionally, in a fraction of the ZFN-treated cells the co-occurrence of both specific HR and illegitimate integration was observed. As a means to minimize unspecific effects, cell cycle manipulation of the target cells by induction of a transient G2/M cell cycle arrest was shown to stimulate the activity of HR while having little effect on the levels of illegitimate integration, thus resulting in a nearly eight-fold increase in the ratio between the two processes. Conclusions: The demonstration that ZFN expression, in addition to stimulating specific gene targeting by HR, leads to increased rates of
Read clouds uncover variation in complex regions of the human genome

PubMed Central

Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim

2015-01-01

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
Genome editing for crop improvement: Challenges and opportunities

PubMed Central

Abdallah, Naglaa A; Prakash, Channapatna S; McHughen, Alan G

2015-01-01

Genome or gene editing includes several new techniques to help scientists precisely modify genome sequences. These techniques also enable us to alter the regulation of gene expression patterns in a pre-determined region and facilitate novel insights into the functional genomics of an organism. The emergence of genome editing has brought considerable excitement, especially among agricultural scientists, because of its simplicity, precision and power, as it offers new opportunities to develop improved crop varieties with clear-cut addition of valuable traits or removal of undesirable traits. Research is underway to develop crop varieties with higher yields, stronger stress tolerance, better disease and pest resistance, lower input costs, and increased nutritional value. Genome editing encompasses a wide variety of tools using either a site-specific recombinase (SSR) or a site-specific nuclease (SSN) system. Both systems require recognition of a known sequence. The SSN system generates single or double strand DNA breaks and activates endogenous DNA repair pathways. SSR technologies, such as the Cre/loxP and Flp/FRT mediated systems, are able to knock down or knock in genes in the genome of eukaryotes, depending on the orientation of the specific sites (loxP, FRT, etc.) flanking the target site. There are 4 main classes of SSN developed to cleave genomic sequences: mega-nucleases (homing endonucleases), zinc finger nucleases (ZFNs), transcriptional activator-like effector nucleases (TALENs), and the CRISPR/Cas nuclease system (clustered regularly interspaced short palindromic repeat/CRISPR-associated protein). Recombinase-mediated genome engineering depends on the recombinase (sub-)family and target site and induces high frequencies of homologous recombination. Improving crops with gene editing provides a range of options: by altering only a few nucleotides from billions found in the genomes of living cells, altering the full allele or by inserting a new gene in a targeted
Programmable removal of bacterial strains by use of genome-targeting CRISPR-Cas systems.

PubMed

Gomaa, Ahmed A; Klumpe, Heidi E; Luo, Michelle L; Selle, Kurt; Barrangou, Rodolphe; Beisel, Chase L

2014-01-28

CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) systems in bacteria and archaea employ CRISPR RNAs to specifically recognize the complementary DNA of foreign invaders, leading to sequence-specific cleavage or degradation of the target DNA. Recent work has shown that the accidental or intentional targeting of the bacterial genome is cytotoxic and can lead to cell death. Here, we have demonstrated that genome targeting with CRISPR-Cas systems can be employed for the sequence-specific and titratable removal of individual bacterial strains and species. Using the type I-E CRISPR-Cas system in Escherichia coli as a model, we found that this effect could be elicited using native or imported systems and was similarly potent regardless of the genomic location, strand, or transcriptional activity of the target sequence. Furthermore, the specificity of targeting with CRISPR RNAs could readily distinguish between even highly similar strains in pure or mixed cultures. Finally, varying the collection of delivered CRISPR RNAs could quantitatively control the relative number of individual strains within a mixed culture. Critically, the observed selectivity and programmability of bacterial removal would be virtually impossible with traditional antibiotics, bacteriophages, selectable markers, or tailored growth conditions. Once delivery challenges are addressed, we envision that this approach could offer a novel means to quantitatively control the composition of environmental and industrial microbial consortia and may open new avenues for the development of "smart" antibiotics that circumvent multidrug resistance and differentiate between pathogenic and beneficial microorganisms. Controlling the composition of microbial populations is a critical aspect in medicine, biotechnology, and environmental cycles. While different antimicrobial strategies, such as antibiotics, antimicrobial peptides, and lytic bacteriophages, offer partial solutions
Phosphorodiamidate morpholino targeting the 5' untranslated region of the ZIKV RNA inhibits virus replication.

PubMed

Popik, Waldemar; Khatua, Atanu; Hildreth, James E K; Lee, Benjamin; Alcendor, Donald J

2018-06-01

Zika virus (ZIKV) infection has been associated with microcephaly in infants. Currently there is no treatment or vaccine. Here we explore the use of a morpholino oligonucleotide targeted to the 5' untranslated region (5'-UTR) of the ZIKV RNA to prevent ZIKV replication. Morpholino DWK-1 inhibition of ZIKV replication in human glomerular podocytes was examined by qRT-PCR, reduction in ZIKV genome copy number, western blot analysis, immunofluorescence and proinflammatory cytokine gene expression. Podocytes pretreated with DWK-1 showed reduced levels of both viral mRNA and ZIKV E protein expression compared to controls. We observed suppression of proinflammatory gene expression for IFN-β (interferon β), RANTES (regulated on activation, normal T cell expressed and secreted), MIP-1α (macrophage inflammatory protein-1α), TNF-α (tumor necrosis factor-α) and IL1-α (interleukin 1-α) in ZIKV-infected podocytes pretreated with DWK-1. Morpholino DWK-1 targeting the ZIKV 5'-UTR effectively inhibits ZIKV replication and suppresses ZIKV-induced proinflammatory gene expression. Copyright © 2018 Elsevier Inc. All rights reserved.
Intra-Genomic Internal Transcribed Spacer Region Sequence Heterogeneity and Molecular Diagnosis in Clinical Microbiology.

PubMed

Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K P; Woo, Patrick C Y

2015-10-22

Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10-49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n=2), Pichia (Candida) norvegensis (n=2), Candida tropicalis (n=1) and Saccharomyces cerevisiae (n=1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study.
The Perennial Ryegrass GenomeZipper: Targeted Use of Genome Resources for Comparative Grass Genomics

PubMed Central

Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F.X.; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

2013-01-01

Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species. PMID:23184232
The siRNA Non-seed Region and Its Target Sequences Are Auxiliary Determinants of Off-Target Effects.

PubMed

Kamola, Piotr J; Nakano, Yuko; Takahashi, Tomoko; Wilson, Paul A; Ui-Tei, Kumiko

2015-12-01

RNA interference (RNAi) is a powerful tool for post-transcriptional gene silencing. However, the siRNA guide strand may bind unintended off-target transcripts via partial sequence complementarity by a mechanism closely mirroring microRNA (miRNA) silencing. To better understand these off-target effects, we investigated the correlation between off-target activities and sequence features within various subsections of siRNA guide strands and their corresponding target sequences. Our results confirm previous reports that the strength of base-pairing in the siRNA seed region is the primary factor determining the efficiency of off-target silencing. However, the degree of downregulation of off-target transcripts with a shared seed sequence is not necessarily similar, suggesting that there are additional auxiliary factors that influence the silencing potential. Here, we demonstrate that both the melting temperature (Tm) in a subsection of the siRNA non-seed region and the GC content of its corresponding target sequences are negatively correlated with the efficiency of off-target effects. Analysis of experimentally validated miRNA targets demonstrated a similar trend, indicating a putative conserved mechanistic feature of the seed region-dependent targeting mechanism. These observations may prove useful as parameters for off-target prediction algorithms and improve siRNA 'specificity' design rules.
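The two sequence features named in the abstract above (Tm of a non-seed subsection and GC content) can be illustrated with a short sketch. This is not the authors' calculation: the Wallace rule used here (2 °C per A/U pair, 4 °C per G/C pair) is a crude stand-in for the nearest-neighbour thermodynamics normally used for RNA duplexes, the seed is taken as guide positions 2-8, and the non-seed window (positions 9-14) is a hypothetical choice made only for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule.

    Assumption: 4 degrees per G/C and 2 degrees per A/U(T); only a
    coarse approximation for short duplexes.
    """
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    return 4 * gc + 2 * (len(seq) - gc)


def guide_features(guide, seed=(2, 8), non_seed=(9, 14)):
    """Summarise seed and non-seed windows of an siRNA guide strand.

    Windows are 1-based, inclusive positions counted from the 5' end.
    The non_seed default is a hypothetical window, not taken from the study.
    """
    seed_seq = guide[seed[0] - 1 : seed[1]]
    non_seed_seq = guide[non_seed[0] - 1 : non_seed[1]]
    return {
        "seed_seq": seed_seq,
        "seed_tm": wallace_tm(seed_seq),
        "non_seed_seq": non_seed_seq,
        "non_seed_tm": wallace_tm(non_seed_seq),
        "non_seed_gc": gc_content(non_seed_seq),
    }
```

Under the trend reported in the abstract, guides with a higher `non_seed_tm` would be expected to show weaker off-target silencing for a given seed, which is why such features are candidate inputs for off-target prediction.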
Outlier analysis of functional genomic profiles enriches for oncology targets and enables precision medicine.

PubMed

Zhu, Zhou; Ihle, Nathan T; Rejto, Paul A; Zarrinkar, Patrick P

2016-06-13

Genome-scale functional genomic screens across large cell line panels provide a rich resource for discovering tumor vulnerabilities that can lead to the next generation of targeted therapies. Their data analysis typically has focused on identifying genes whose knockdown enhances response in various pre-defined genetic contexts, which are limited by biological complexities as well as the incompleteness of our knowledge. We thus introduce a complementary data mining strategy to identify genes with exceptional sensitivity in subsets, or outlier groups, of cell lines, allowing an unbiased analysis without any a priori assumption about the underlying biology of dependency. Genes with outlier features are strongly and specifically enriched with those known to be associated with cancer and relevant biological processes, despite no a priori knowledge being used to drive the analysis. Identification of exceptional responders (outliers) may lead not only to new candidates for therapeutic intervention, but also to tumor indications and response biomarkers for companion precision medicine strategies. Several tumor suppressors have an outlier sensitivity pattern, supporting and generalizing the notion that tumor suppressors can play context-dependent oncogenic roles. The novel application of outlier analysis described here demonstrates a systematic and data-driven analytical strategy to decipher large-scale functional genomic data for oncology target and precision medicine discoveries.
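The abstract above does not state which outlier statistic was used, so the following is only a generic sketch of the idea of flagging cell lines with exceptional sensitivity to knockdown of one gene. It swaps in a median/MAD-based modified z-score; the 0.6745 scaling and the 3.5 cutoff follow the common Iglewicz-Hoaglin convention, and the function names and input layout (a mapping from cell line name to a score where more negative means more sensitive) are assumptions.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be called an outlier.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def outlier_lines(scores_by_line, threshold=-3.5):
    """Names of cell lines whose score is an exceptionally low outlier.

    Assumption: more negative scores mean stronger dependency on the gene.
    """
    names = list(scores_by_line)
    z = robust_z_scores([scores_by_line[name] for name in names])
    return [name for name, score in zip(names, z) if score <= threshold]
```

Because the median and MAD are barely affected by a few extreme values, a small group of highly sensitive lines stands out without any prior grouping of the panel by genotype, which matches the unbiased spirit of the strategy described.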
Genome Engineering in Bacillus anthracis Using Cre Recombinase

PubMed Central

Pomerantsev, Andrei P.; Sitaraman, Ramakrishnan; Galloway, Craig R.; Kivovich, Violetta; Leppla, Stephen H.

2006-01-01

Genome engineering is a powerful method for the study of bacterial virulence. With the availability of the complete genomic sequence of Bacillus anthracis, it is now possible to inactivate or delete selected genes of interest. However, many current methods for disrupting or deleting more than one gene require use of multiple antibiotic resistance determinants. In this report we used an approach that temporarily inserts an antibiotic resistance marker into a selected region of the genome and subsequently removes it, leaving the target region (a single gene or a larger genomic segment) permanently mutated. For this purpose, a spectinomycin resistance cassette flanked by bacteriophage P1 loxP sites oriented as direct repeats was inserted within a selected gene. After identification of strains having the spectinomycin cassette inserted by a double-crossover event, a thermo-sensitive plasmid expressing Cre recombinase was introduced at the permissive temperature. Cre recombinase action at the loxP sites excised the spectinomycin marker, leaving a single loxP site within the targeted gene or genomic segment. The Cre-expressing plasmid was then removed by growth at the restrictive temperature. The procedure could then be repeated to mutate additional genes. In this way, we sequentially mutated two pairs of genes: pepM and spo0A, and mcrB and mrr. Furthermore, loxP sites introduced at distant genes could be recombined by Cre recombinase to cause deletion of large intervening regions. In this way, we deleted the capBCAD region of the pXO2 plasmid and the entire 30 kb of chromosomal DNA between the mcrB and mrr genes, and in the latter case we found that the 32 intervening open reading frames were not essential to growth. PMID:16369025
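The marker-removal step described above, in which Cre acts on two directly repeated loxP sites and leaves a single loxP site behind, can be sketched as a string operation. This is an illustrative model only: it assumes the canonical 34-bp loxP sequence, handles only two same-orientation sites on the given strand, and ignores inverted sites (which would give inversion rather than excision). The function name is hypothetical.

```python
# Canonical 34-bp loxP site: 13-bp repeat, 8-bp spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(seq, site=LOXP):
    """Model Cre excision between the first two direct-repeat sites in seq.

    Returns (remaining, excised). The remaining sequence keeps one site;
    the excised product (circular in vivo) carries the intervening DNA
    plus the other site. If fewer than two sites are found, the sequence
    is returned unchanged with an empty excised product.
    """
    first = seq.find(site)
    if first == -1:
        return seq, ""
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq, ""
    end_first = first + len(site)
    end_second = second + len(site)
    excised = seq[end_first:end_second]
    remaining = seq[:end_first] + seq[end_second:]
    return remaining, excised
```

Applied to a gene interrupted by a loxP-flanked resistance cassette, the model returns the gene with a single loxP scar; applied to two sites at distant genes, the same call removes the whole intervening segment, as in the large deletions reported.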
Multi-region and single-cell sequencing reveal variable genomic heterogeneity in rectal cancer.

PubMed

Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian

2017-11-23

Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and the single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed at the single-cell level. We found that major SCNAs were early events in cancer development and were stably inherited. Single-cell sequencing revealed mutations and SCNAs that were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in two patients. The mutational scenarios and SCNA profiles of the two treatment-naïve patients of the same molecular subtype are quite different. Our results suggest each tumor possesses its own architecture, which may result in different diagnosis, prognosis, and drug responses. Remarkable ITH exists in the two patients we have studied, providing a preliminary impression of ITH in rectal cancer.
High-efficiency targeted editing of large viral genomes by RNA-guided nucleases.

PubMed

Bi, Yanwei; Sun, Le; Gao, Dandan; Ding, Chen; Li, Zhihua; Li, Yadong; Cun, Wei; Li, Qihan

2014-05-01

A facile and efficient method for the precise editing of large viral genomes is required for the selection of attenuated vaccine strains and the construction of gene therapy vectors. The type II prokaryotic CRISPR-Cas (clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas)) RNA-guided nuclease system can be introduced into host cells during viral replication. The CRISPR-Cas9 system robustly stimulates targeted double-stranded breaks in the genomes of DNA viruses, where the non-homologous end joining (NHEJ) and homology-directed repair (HDR) pathways can be exploited to introduce site-specific indels or insert heterologous genes with high frequency. Furthermore, CRISPR-Cas9 can specifically inhibit the replication of the original virus, thereby significantly increasing the abundance of the recombinant virus among progeny virus. As a result, purified recombinant virus can be obtained with only a single round of selection. In this study, we used recombinant adenovirus and type I herpes simplex virus as examples to demonstrate that the CRISPR-Cas9 system is a valuable tool for editing the genomes of large DNA viruses.
Pedigree-based analysis of derivation of genome segments of an elite rice reveals key regions during its breeding.

PubMed

Zhou, Degui; Chen, Wei; Lin, Zechuan; Chen, Haodong; Wang, Chongrong; Li, Hong; Yu, Renbo; Zhang, Fengyun; Zhen, Gang; Yi, Junliang; Li, Kanghuo; Liu, Yaoguang; Terzaghi, William; Tang, Xiaoyan; He, Hang; Zhou, Shaochuan; Deng, Xing Wang

2016-02-01

Analyses of genome variations with high-throughput assays have improved our understanding of the genetic basis of crop domestication and identified the selected genome regions, but little is known about that of modern breeding, which has limited the usefulness of massive elite cultivars in further breeding. Here we deploy pedigree-based analysis of an elite rice, Huanghuazhan, to identify key genome regions during its breeding. The cultivars in the pedigree were resequenced with 7.6× depth on average, and 2.1 million high-quality single nucleotide polymorphisms (SNPs) were obtained. Tracing the derivation of genome blocks with pedigree and information on SNPs revealed the chromosomal recombination during breeding, which showed that 26.22% of the Huanghuazhan genome consists of strictly conserved key regions. These major effect regions were further supported by QTL mapping of 260 recombinant inbred lines derived from the cross of Huanghuazhan and a very dissimilar cultivar, Shuanggui 36, and by the genome profile of eight cultivars and 36 elite lines derived from Huanghuazhan. Overlaying cloned genes on these regions revealed that they include a number of key genes, which were then used to demonstrate how Huanghuazhan was bred after 30 years of effort and to dissect the deficiency of artificial selection. We conclude that the regions are helpful for further breeding based on this pedigree and for performing breeding by design. Our study provides a genetic dissection of modern rice breeding and sheds new light on how to perform genome-wide breeding by design. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.
Identification of coding and non-coding mutational hotspots in cancer genomes.

PubMed

Piraino, Scott W; Furney, Simon J

2017-01-05

The identification of mutations that play a causal role in tumour development, so-called "driver" mutations, is of critical importance for understanding how cancers form and how they might be treated. Several large cancer sequencing projects have identified genes that are recurrently mutated in cancer patients, suggesting a role in tumourigenesis. While the landscape of coding drivers has been extensively studied and many of the most prominent driver genes are well characterised, comparatively less is known about the role of mutations in the non-coding regions of the genome in cancer development. The continuing fall in genome sequencing costs has resulted in a concomitant increase in the number of cancer whole genome sequences being produced, facilitating systematic interrogation of both the coding and non-coding regions of cancer genomes. To examine the mutational landscapes of tumour genomes we have developed a novel method to identify mutational hotspots in tumour genomes using both mutational data and information on evolutionary conservation. We have applied our methodology to over 1300 whole cancer genomes and show that it identifies prominent coding and non-coding regions that are known or highly suspected to play a role in cancer. Importantly, we applied our method to the entire genome, rather than relying on predefined annotations (e.g. promoter regions) and we highlight recurrently mutated regions that may have resulted from increased exposure to mutational processes rather than selection, some of which have been identified previously as targets of selection. Finally, we implicate several pan-cancer and cancer-specific candidate non-coding regions, which could be involved in tumourigenesis. We have developed a framework to identify mutational hotspots in cancer genomes, which is applicable to the entire genome. This framework identifies known and novel coding and non-coding mutational hotspots and can be used to differentiate candidate driver regions from
Small interfering RNA against the 2C genomic region of coxsackievirus B3 exerts potential antiviral effects in permissive HeLa cells.

PubMed

Luan, Ying; Dai, Hai-Li; Yang, Dan; Zhu, Lin; Gao, Tie-Lei; Shao, Hong-Jiang; Peng, Xue; Jin, Zhan-Feng

2012-01-01

Coxsackievirus B3 (CVB3) is the most important causal agent of viral heart muscle disease, but no specific antiviral drug is currently available. Small interfering RNA (siRNA) has been used as an antiviral therapeutic strategy via posttranscriptional gene silencing. In this study, eleven siRNAs were designed to target seven distinct regions of the CVB3 genome including VP1, VP2, VP3, 2A, 2C, 3C, and 3D. All of the siRNAs were individually transfected into HeLa cells, which were subsequently infected with CVB3. The impacts of RNA interference (RNAi) on viral replication were evaluated using five measures: cytopathic effect (CPE), 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay, 50% tissue culture infectious dose (TCID(50)), real-time RT-PCR, and Western blot. Five of the eleven siRNAs were highly efficient at inhibiting viral replication. This was especially true for siRNA-5, which targeted the ATPase 2C. However, antiviral activity varied significantly among siRNA-9, -10, and -11 even though they all targeted the 3D region. Our results revealed several effective targets for CVB3 silencing, and provided evidence that sequences other than the CRE within the 2C region may also be potential targets for CVB3-specific siRNA design. These data supported a potential role of RNA interference in future antiviral intervention therapies. Copyright © 2011 Elsevier B.V. All rights reserved.

Variation block-based genomics method for crop plants.

PubMed

Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon

2014-06-15

In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
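The three steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that variant detection (step 1) has already reduced each cultivar to a list of per-window flags, and the names `boundaries`, `variation_blocks`, and `block_profile`, the cultivar labels, and the toy data are all hypothetical.

```python
def boundaries(flags):
    """Window indices where the variation state changes (assumed recombination sites)."""
    return {i for i in range(1, len(flags)) if flags[i] != flags[i - 1]}


def variation_blocks(cultivars):
    """Split the windows at the union of boundaries found in any cultivar."""
    n_windows = len(next(iter(cultivars.values())))
    cuts = set()
    for flags in cultivars.values():
        cuts |= boundaries(flags)
    edges = [0, *sorted(cuts), n_windows]
    return list(zip(edges[:-1], edges[1:]))


def block_profile(flags, blocks):
    """One state per block; the state is constant inside a block by construction."""
    return [flags[start] for start, _ in blocks]


# Toy input (hypothetical): 1 = window carries variants relative to the
# reference genome, 0 = window is reference-like.
cultivars = {
    "cultivar_A": [0, 0, 1, 1, 1, 0],
    "cultivar_B": [0, 1, 1, 1, 0, 0],
}
blocks = variation_blocks(cultivars)
profiles = {name: block_profile(flags, blocks) for name, flags in cultivars.items()}
shared = sum(a == b for a, b in zip(profiles["cultivar_A"], profiles["cultivar_B"]))
print(blocks)  # [(0, 1), (1, 2), (2, 4), (4, 5), (5, 6)]
print(shared, "of", len(blocks), "blocks share the same state")  # 3 of 5 blocks ...
```

Because every cultivar's boundaries are pooled before splitting, each block has a single state in each cultivar, so the block-level comparison in step 3 reduces to comparing two short lists.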
Templated sequence insertion polymorphisms in the human genome

NASA Astrophysics Data System (ADS)

Onozawa, Masahiro; Aplan, Peter

2016-11-01

Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.
Identification of striated muscle activator of Rho signaling (STARS) as a novel calmodulin target by a newly developed genome-wide screen.

PubMed

Furuya, Yusui; Denda, Miwako; Sakane, Kyohei; Ogusu, Tomoko; Takahashi, Sumio; Magari, Masaki; Kanayama, Naoki; Morishita, Ryo; Tokumitsu, Hiroshi

2016-07-01

To search for novel target(s) of the Ca(2+)-signaling transducer, calmodulin (CaM), we performed a newly developed genome-wide CaM interaction screening of 19,676 GST-fused proteins expressed in human. We identified striated muscle activator of Rho signaling (STARS) as a novel CaM target and characterized its CaM binding ability and found that the Ca(2+)/CaM complex interacted stoichiometrically with the N-terminal region (Ala13-Gln35) of STARS in vitro as well as in living cells. Mutagenesis studies identified Ile20 and Trp33 as the essential hydrophobic residues in CaM anchoring. Furthermore, the CaM binding deficient mutant (Ile20Ala, Trp33Ala) of STARS further enhanced its stimulatory effect on SRF-dependent transcriptional activation. These results suggest a connection between Ca(2+)-signaling via excitation-contraction coupling and the regulation of STARS-mediated gene expression in muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.
New Regions of the Human Genome Linked to Skin Color Variation in Some African Populations

Cancer.gov

In the first study of its kind, an international team of genomics researchers has identified new regions of the human genome that are associated with skin color variation in some African populations, opening new avenues for research on skin diseases and cancer in all populations.
Development of Real Time PCR Using Novel Genomic Target for Detection of Multiple Salmonella Serovars from Milk and Chickens

USDA-ARS's Scientific Manuscript database

Background: A highly sensitive and specific novel genomic and plasmid target-based PCR platform was developed to detect multiple Salmonella serovars (S. Heidelberg, S. Dublin, S. Hadar, S. Kentucky and S. Enteritidis). Through extensive genome mining of protein databases of these serovars and compar...
A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

PubMed

Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

2015-05-27

Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
A Rapid Method of Genomic Array Analysis of Scaffold/Matrix Attachment Regions (S/MARs) Identifies a 2.5-Mb Region of Enhanced Scaffold/Matrix Attachment at a Human Neocentromere

PubMed Central

Sumer, Huseyin; Craig, Jeffrey M.; Sibson, Mandy; Choo, K.H. Andy

2003-01-01

Human neocentromeres are fully functional centromeres that arise at previously noncentromeric regions of the genome. We have tested a rapid procedure of genomic array analysis of chromosome scaffold/matrix attachment regions (S/MARs), involving the isolation of S/MAR DNA and hybridization of this DNA to a genomic BAC/PAC array. Using this procedure, we have defined a 2.5-Mb domain of S/MAR-enriched chromatin that fully encompasses a previously mapped centromere protein-A (CENP-A)-associated domain at a human neocentromere. We have independently verified this procedure using a previously established fluorescence in situ hybridization method on salt-treated metaphase chromosomes. In silico sequence analysis of the S/MAR-enriched and surrounding regions has revealed no outstanding sequence-related predisposition. This study defines the S/MAR-enriched domain of a higher eukaryotic centromere and provides a method that has broad application for the mapping of S/MAR attachment sites over large genomic regions or throughout a genome. PMID:12840048
Genome-wide identification and characterisation of HOT regions in the human genome.

PubMed

Li, Hao; Liu, Feng; Ren, Chao; Bo, Xiaochen; Shu, Wenjie

2016-09-15

HOT (high-occupancy target) regions, which are bound by a surprisingly large number of transcription factors, are considered to be among the most intriguing findings of recent years. An improved understanding of the roles that HOT regions play in biology would be afforded by knowing the constellation of factors that constitute these domains and by identifying HOT regions across the spectrum of human cell types. We characterised and validated HOT regions in embryonic stem cells (ESCs) and produced a catalogue of HOT regions in a broad range of human cell types. We found that HOT regions are associated with genes that control and define the developmental processes of the respective cell and tissue types. We also showed evidence of the developmental persistence of HOT regions at primitive enhancers and demonstrate unique signatures of HOT regions that distinguish them from typical enhancers and super-enhancers. Finally, we performed a dynamic analysis to reveal the dynamical regulation of HOT regions upon H1 differentiation. Taken together, our results provide a resource for the functional exploration of HOT regions and extend our understanding of the key roles of HOT regions in development and differentiation.
Intra-Genomic Internal Transcribed Spacer Region Sequence Heterogeneity and Molecular Diagnosis in Clinical Microbiology

PubMed Central

Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K. P.; Woo, Patrick C. Y.

2015-01-01

Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10–49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n = 2), Pichia (Candida) norvegensis (n = 2), Candida tropicalis (n = 1) and Saccharomyces cerevisiae (n = 1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study. PMID:26506340
Genome wide association analyses based on a multiple trait approach for modeling feed efficiency

USDA-ARS's Scientific Manuscript database

Genome wide association (GWA) of feed efficiency (FE) could help target important genomic regions influencing FE. Data provided by an international dairy FE research consortium consisted of phenotypic records on dry matter intakes (DMI), milk energy (MILKE), and metabolic body weight (MBW) on 6,937 ...
A novel program to design siRNAs simultaneously effective to highly variable virus genomes.

PubMed

Lee, Hui Sun; Ahn, Jeonghyun; Jun, Eun Jung; Yang, Sanghwa; Joo, Chul Hyun; Kim, Yoo Kyum; Lee, Heuiran

2009-07-10

A major concern of antiviral therapy using small interfering RNAs (siRNAs) targeting RNA viral genomes is high sequence diversity and mutation rate due to genetic instability. To overcome this problem, it is essential to design siRNAs targeting highly conserved regions. We thus designed CAPSID (Convenient Application Program for siRNA Design), a novel bioinformatics program to identify siRNAs targeting highly conserved regions within RNA viral genomes. From a set of input RNAs of diverse sequences, CAPSID rapidly searches conserved patterns and suggests highly potent siRNA candidates in a hierarchical manner. To validate the usefulness of this novel program, we investigated the antiviral potency of universal siRNA for various Human enterovirus B (HEB) serotypes. Assessment of antiviral efficacy using HeLa cells clearly demonstrates that HEB-specific siRNAs exhibit protective effects against all HEBs examined. These findings strongly indicate that CAPSID can be applied to select universal antiviral siRNAs against highly divergent viral genomes.
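The idea of searching a set of diverse input RNAs for conserved patterns can be sketched in Python as follows. This is not CAPSID's algorithm, which the abstract does not specify: it simply assumes that a candidate target is a fixed-length subsequence present in every input sequence. The function names, the window length `k`, and the toy sequences are hypothetical.

```python
def kmers(seq, k):
    """All distinct subsequences of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(seqs, k):
    """Subsequences of length k that occur in every input sequence."""
    common = kmers(seqs[0], k)
    for seq in seqs[1:]:
        common &= kmers(seq, k)
    return sorted(common)


# Toy RNA sequences (hypothetical); real siRNA targets are longer, so k would
# be set to the intended target length rather than 5.
genomes = ["AUGGCUAGCUA", "CCGGCUAGCAA", "GGCUAGCUU"]
print(conserved_kmers(genomes, k=5))  # ['CUAGC', 'GCUAG', 'GGCUA']
```

Exact matching across all inputs is the strictest possible definition of conservation. A practical tool would presumably also rank candidates by predicted potency, which this sketch leaves out.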
The Genome Sequencer FLX System--longer reads, more applications, straight forward bioinformatics and more complete data sets.

PubMed

Droege, Marcus; Hill, Brendon

2008-08-31

The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven application categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and detected low-frequency somatic mutations linked to cancer.
A decade of human genome project conclusion: Scientific diffusion about our genome knowledge.

PubMed

Moraes, Fernanda; Góes, Andréa

2016-05-06

The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development and the 10 years following the HGP completion. The fact that only 20,000 genes are protein and RNA-coding is one of the most striking HGP results. A new concept about the organization of genome arose. The ENCODE project was initiated in 2003 and targeted to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. Besides, a gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still requires further in depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016. © 2016 The International Union of Biochemistry and Molecular Biology.
Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs

PubMed Central

Freedman, Adam H.; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Davis, Brian W.; Gronau, Ilan; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Marques-Bonet, Tomas; Ostrander, Elaine A.; Wayne, Robert K.; Novembre, John

2016-01-01

Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers. PMID:26943675
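One common way to obtain an FDR from a neutral demographic model is to compare how often a selection statistic exceeds a threshold in the observed data with how often it does so in data simulated under that model. The Python sketch below illustrates this generic estimator. It is an assumption for illustration, not the procedure used in the study, and the function name `empirical_fdr` and all numbers are hypothetical.

```python
def empirical_fdr(observed, simulated_null, threshold):
    """Estimated FDR among regions whose statistic is >= threshold.

    observed: statistics from real genomic regions.
    simulated_null: the same statistic from neutral simulations under the
        demographic model (assumed to be supplied by the caller).
    """
    obs_frac = sum(x >= threshold for x in observed) / len(observed)
    null_frac = sum(x >= threshold for x in simulated_null) / len(simulated_null)
    if obs_frac == 0:
        return 0.0  # no regions called, so no false discoveries
    return min(1.0, null_frac / obs_frac)


# Toy statistics (hypothetical): 4 of 10 observed regions and 1 of 20 neutral
# simulations reach the threshold of 0.8.
observed = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.85, 0.90, 0.95, 0.99]
simulated = [0.05 * i for i in range(1, 17)] + [0.10, 0.20, 0.30, 0.40]
print(round(empirical_fdr(observed, simulated, threshold=0.8), 3))  # 0.125
```

An outlier approach that ignores demography would in effect take the top fraction of `observed` without consulting `simulated`, which matches the contrast the abstract draws between the two methods.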
Adenovirus Delivered Short Hairpin RNA Targeting a Conserved Site in the 5′ Non-Translated Region Inhibits All Four Serotypes of Dengue Viruses

PubMed Central

Korrapati, Anil Babu; Swaminathan, Gokul; Singh, Aarti; Khanna, Navin; Swaminathan, Sathyamangalam

2012-01-01

Background: Dengue is a mosquito-borne viral disease caused by four closely related serotypes of Dengue viruses (DENVs). This disease, whose symptoms range from mild fever to potentially fatal haemorrhagic fever and hypovolemic shock, threatens nearly half the global population. There is neither a preventive vaccine nor an effective antiviral therapy against dengue disease. The difference between severe and mild disease appears to be dependent on the viral load. Early diagnosis may enable timely therapeutic intervention to blunt disease severity by reducing the viral load. Harnessing the therapeutic potential of RNA interference (RNAi) to attenuate DENV replication may offer one approach to dengue therapy. Methodology/Principal Findings: We screened the non-translated regions (NTRs) of the RNA genomes of representative members of the four DENV serotypes for putative siRNA targets mapping to known transcription/translation regulatory elements. We identified a target site in the 5′ NTR that maps to the 5′ upstream AUG region, a highly conserved cis-acting element essential for viral replication. We used a replication-defective human adenovirus type 5 (AdV5) vector to deliver a short-hairpin RNA (shRNA) targeting this site into cells. We show that this shRNA matures to the cognate siRNA and is able to effectively inhibit antigen secretion, viral RNA replication and infectious virus production by all four DENV serotypes. Conclusion/Significance: The data demonstrate the feasibility of using AdV5-mediated delivery of shRNAs targeting conserved sites in the viral genome to achieve inhibition of all four DENV serotypes. This paves the way towards exploration of RNAi as a possible therapeutic strategy to curtail DENV infection. PMID:22848770
Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

PubMed

Li, Qing; Hermanson, Peter J; Springer, Nathan M

2018-01-01

DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. This protocol has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.
Genome-wide association study of body weight in Australian Merino sheep reveals an orthologous region on OAR6 to human and bovine genomic regions affecting height and weight.

PubMed

Al-Mamun, Hawlader A; Kwan, Paul; Clark, Samuel A; Ferdosi, Mohammad H; Tellam, Ross; Gondro, Cedric

2015-08-14

Body weight (BW) is an important trait for meat production in sheep. Although over the past few years, numerous quantitative trait loci (QTL) have been detected for production traits in cattle, few QTL studies have been reported for sheep, with even fewer on meat production traits. Our objective was to perform a genome-wide association study (GWAS) with the medium-density Illumina Ovine SNP50 BeadChip to identify genomic regions and corresponding haplotypes associated with BW in Australian Merino sheep. A total of 1781 Australian Merino sheep were genotyped using the medium-density Illumina Ovine SNP50 BeadChip. Among the 53 862 single nucleotide polymorphisms (SNPs) on this array, 48 640 were used to perform a GWAS using a linear mixed model approach. Genotypes were phased with hsphase; to estimate SNP haplotype effects, linkage disequilibrium blocks were identified in the detected QTL region. Thirty-nine SNPs were associated with BW at a Bonferroni-corrected genome-wide significance threshold of 1 %. One region on sheep (Ovis aries) chromosome 6 (OAR6) between 36.15 and 38.56 Mb, included 13 significant SNPs that were associated with BW; the most significant SNP was OAR6_41936490.1 (P = 2.37 × 10(-16)) at 37.69 Mb with an allele substitution effect of 2.12 kg, which corresponds to 0.248 phenotypic standard deviations for BW. The region that surrounds this association signal on OAR6 contains three genes: leucine aminopeptidase 3 (LAP3), which is involved in the processing of the oxytocin precursor; NCAPG non-SMC condensin I complex, subunit G (NCAPG), which is associated with foetal growth and carcass size in cattle; and ligand dependent nuclear receptor corepressor-like (LCORL), which is associated with height in humans and cattle. The GWAS analysis detected 39 SNPs associated with BW in sheep and a major QTL region was identified on OAR6. In several other mammalian species, regions that are syntenic with this region have been found to be associated with body
Comparative map and trait viewer (CMTV): an integrated bioinformatic tool to construct consensus maps and compare QTL and functional genomics data across genomes and experiments.

PubMed

Sawkins, M C; Farmer, A D; Hoisington, D; Sullivan, J; Tolopko, A; Jiang, Z; Ribaut, J-M

2004-10-01

In the past few decades, a wealth of genomic data has been produced in a wide variety of species using a diverse array of functional and molecular marker approaches. In order to unlock the full potential of the information contained in these independent experiments, researchers need efficient and intuitive means to identify common genomic regions and genes involved in the expression of target phenotypic traits across diverse conditions. To address this need, we have developed a Comparative Map and Trait Viewer (CMTV) tool that can be used to construct dynamic aggregations of a variety of types of genomic datasets. By algorithmically determining correspondences between sets of objects on multiple genomic maps, the CMTV can display syntenic regions across taxa, combine maps from separate experiments into a consensus map, or project data from different maps into a common coordinate framework using dynamic coordinate translations between source and target maps. We present a case study that illustrates the utility of the tool for managing large and varied datasets by integrating data collected by CIMMYT in maize drought tolerance research with data from public sources. This example will focus on one of the visualization features for Quantitative Trait Locus (QTL) data, using likelihood ratio (LR) files produced by generic QTL analysis software and displaying the data in a unique visual manner across different combinations of traits, environments and crosses. Once a genomic region of interest has been identified, the CMTV can search and display additional QTLs meeting a particular threshold for that region, or other functional data such as sets of differentially expressed genes located in the region; it thus provides an easily used means for organizing and manipulating data sets that have been dynamically integrated under the focus of the researcher's specific hypothesis.
Regional price targets appropriate for advanced coal extraction

NASA Technical Reports Server (NTRS)

Terasawa, K. L.; Whipple, D. M.

1980-01-01

A methodology is presented for predicting coal prices in regional markets for the target time frames 1985 and 2000 that could subsequently be used to guide the development of an advanced coal extraction system. The model constructed is a supply and demand model that focuses on underground mining since the advanced technology is expected to be developed for these reserves by the target years. Coal reserve data and the cost of operating a mine are used to obtain the minimum acceptable selling price that would induce the producer to bring the mine into production. Based on this information, market supply curves can be generated. Demand by region is calculated based on an EEA methodology that emphasizes demand by electric utilities and demand by industry. The demand and supply curves are then used to obtain the price targets. The results show a growth in the size of the markets for compliance and low sulphur coal regions. A significant rise in the real price of coal is not expected even by the year 2000. The model predicts heavy reliance on mines with thick seams, larger block size and deep overburden.
Genome-Wide Analysis of Androgen Receptor Targets Reveals COUP-TF1 as a Novel Player in Human Prostate Cancer

PubMed Central

Perets, Ruth; Kaplan, Tommy; Stein, Ilan; Hidas, Guy; Tayeb, Shay; Avraham, Eti; Ben-Neriah, Yinon; Simon, Itamar; Pikarsky, Eli

2012-01-01

Androgen activity plays a key role in prostate cancer progression. Androgen receptor (AR) is the main mediator of androgen activity in the prostate, through its ability to act as a transcription mediator. Here we performed a genome-wide analysis of human AR binding to promoters in the presence of an agonist or antagonist in an androgen dependent prostate cancer cell line. Many of the AR bound promoters are bound in all examined conditions while others are bound only in the presence of an agonist or antagonist. Several motifs are enriched in AR bound promoters, including the AR Response Element (ARE) half-site and recognition elements for the transcription factors OCT1 and SOX9. This suggests that these 3 factors could define a module of co-operating transcription factors in the prostate. Interestingly, AR bound promoters are preferentially located in AT rich genomic regions. Analysis of mRNA expression identified chicken ovalbumin upstream promoter-transcription factor 1 (COUP-TF1) as a direct AR target gene that is downregulated upon binding by the agonist liganded AR. COUP-TF1 immunostaining revealed nucleolar localization of COUP-TF1 in epithelium of human androgen dependent prostate cancer, but not in adjacent benign prostate epithelium. Stromal cells both in human and mouse prostate show nuclear COUP-TF1 staining. We further show that there is an inverse correlation between COUP-TF1 expression in prostate stromal cells and the rising levels of androgen with advancing puberty. This study extends the pool of recognized putative AR targets and identifies a negatively regulated target of AR – COUP-TF1 – which could possibly play a role in human prostate cancer. PMID:23056316

Genome-wide analysis of androgen receptor targets reveals COUP-TF1 as a novel player in human prostate cancer.

PubMed

Perets, Ruth; Kaplan, Tommy; Stein, Ilan; Hidas, Guy; Tayeb, Shay; Avraham, Eti; Ben-Neriah, Yinon; Simon, Itamar; Pikarsky, Eli

2012-01-01

Androgen activity plays a key role in prostate cancer progression. Androgen receptor (AR) is the main mediator of androgen activity in the prostate, through its ability to act as a transcription mediator. Here we performed a genome-wide analysis of human AR binding to promoters in the presence of an agonist or antagonist in an androgen dependent prostate cancer cell line. Many of the AR bound promoters are bound in all examined conditions while others are bound only in the presence of an agonist or antagonist. Several motifs are enriched in AR bound promoters, including the AR Response Element (ARE) half-site and recognition elements for the transcription factors OCT1 and SOX9. This suggests that these 3 factors could define a module of co-operating transcription factors in the prostate. Interestingly, AR bound promoters are preferentially located in AT rich genomic regions. Analysis of mRNA expression identified chicken ovalbumin upstream promoter-transcription factor 1 (COUP-TF1) as a direct AR target gene that is downregulated upon binding by the agonist liganded AR. COUP-TF1 immunostaining revealed nucleolar localization of COUP-TF1 in epithelium of human androgen dependent prostate cancer, but not in adjacent benign prostate epithelium. Stromal cells both in human and mouse prostate show nuclear COUP-TF1 staining. We further show that there is an inverse correlation between COUP-TF1 expression in prostate stromal cells and the rising levels of androgen with advancing puberty. This study extends the pool of recognized putative AR targets and identifies a negatively regulated target of AR - COUP-TF1 - which could possibly play a role in human prostate cancer.
Development of a targeted transgenesis strategy in highly differentiated cells: a powerful tool for functional genomic analysis.

PubMed

Puttini, Stefania; Ouvrard-Pascaud, Antoine; Palais, Gael; Beggah, Ahmed T; Gascard, Philippe; Cohen-Tannoudji, Michel; Babinet, Charles; Blot-Chabaud, Marcel; Jaisser, Frederic

2005-03-16

Functional genomic analysis is a challenging step in the so-called post-genomic field. Identification of potential targets using large-scale gene expression analysis requires functional validation to identify those that are physiologically relevant. Genetically modified cell models are often used for this purpose allowing up- or down-expression of selected targets in a well-defined and if possible highly differentiated cell type. However, the generation of such models remains time-consuming and expensive. In order to alleviate this step, we developed a strategy aimed at the rapid and efficient generation of genetically modified cell lines with conditional, inducible expression of various target genes. Efficient knock-in of various constructs, called targeted transgenesis, in a locus selected for its permissibility to the tet inducible system, was obtained through the stimulation of site-specific homologous recombination by the meganuclease I-SceI. Our results demonstrate that targeted transgenesis in a reference inducible locus greatly facilitated the functional analysis of the selected recombinant cells. The efficient screening strategy we have designed makes possible automation of the transfection and selection steps. Furthermore, this strategy could be applied to a variety of highly differentiated cells.
Large-scale chromatin immunoprecipitation with promoter sequence microarray analysis of the interaction of the NSs protein of Rift Valley fever virus with regulatory DNA regions of the host genome.

PubMed

Benferhat, Rima; Josse, Thibaut; Albaud, Benoit; Gentien, David; Mansuroglu, Zeyni; Marcato, Vasco; Souès, Sylvie; Le Bonniec, Bernard; Bouloy, Michèle; Bonnefoy, Eliette

2012-10-01

Rift Valley fever virus (RVFV) is a highly pathogenic Phlebovirus that infects humans and ruminants. Initially confined to Africa, RVFV has spread outside Africa and presently represents a high risk to other geographic regions. It is responsible for high fatality rates in sheep and cattle. In humans, RVFV can induce hepatitis, encephalitis, retinitis, or fatal hemorrhagic fever. The nonstructural NSs protein that is the major virulence factor is found in the nuclei of infected cells where it associates with cellular transcription factors and cofactors. In previous work, we have shown that NSs interacts with the promoter region of the beta interferon gene abnormally maintaining the promoter in a repressed state. In this work, we performed a genome-wide analysis of the interactions between NSs and the host genome using a genome-wide chromatin immunoprecipitation combined with promoter sequence microarray, the ChIP-on-chip technique. Several cellular promoter regions were identified as significantly interacting with NSs, and the establishment of NSs interactions with these regions was often found linked to deregulation of expression of the corresponding genes. Among annotated NSs-interacting genes were present not only genes regulating innate immunity and inflammation but also genes regulating cellular pathways that have not yet been identified as targeted by RVFV. Several of these pathways, such as cell adhesion, axonal guidance, development, and coagulation were closely related to RVFV-induced disorders. In particular, we show in this work that NSs targeted and modified the expression of genes coding for coagulation factors, demonstrating for the first time that this hemorrhagic virus impairs the host coagulation cascade at the transcriptional level.
Large-Scale Chromatin Immunoprecipitation with Promoter Sequence Microarray Analysis of the Interaction of the NSs Protein of Rift Valley Fever Virus with Regulatory DNA Regions of the Host Genome

PubMed Central

Benferhat, Rima; Josse, Thibaut; Albaud, Benoit; Gentien, David; Mansuroglu, Zeyni; Marcato, Vasco; Souès, Sylvie; Le Bonniec, Bernard

2012-01-01

Rift Valley fever virus (RVFV) is a highly pathogenic Phlebovirus that infects humans and ruminants. Initially confined to Africa, RVFV has spread outside Africa and presently represents a high risk to other geographic regions. It is responsible for high fatality rates in sheep and cattle. In humans, RVFV can induce hepatitis, encephalitis, retinitis, or fatal hemorrhagic fever. The nonstructural NSs protein that is the major virulence factor is found in the nuclei of infected cells where it associates with cellular transcription factors and cofactors. In previous work, we have shown that NSs interacts with the promoter region of the beta interferon gene abnormally maintaining the promoter in a repressed state. In this work, we performed a genome-wide analysis of the interactions between NSs and the host genome using a genome-wide chromatin immunoprecipitation combined with promoter sequence microarray, the ChIP-on-chip technique. Several cellular promoter regions were identified as significantly interacting with NSs, and the establishment of NSs interactions with these regions was often found linked to deregulation of expression of the corresponding genes. Among annotated NSs-interacting genes were present not only genes regulating innate immunity and inflammation but also genes regulating cellular pathways that have not yet been identified as targeted by RVFV. Several of these pathways, such as cell adhesion, axonal guidance, development, and coagulation were closely related to RVFV-induced disorders. In particular, we show in this work that NSs targeted and modified the expression of genes coding for coagulation factors, demonstrating for the first time that this hemorrhagic virus impairs the host coagulation cascade at the transcriptional level. PMID:22896612
Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem

PubMed Central

Lim, Hansaim; Gray, Paul; Xie, Lei; Poleksic, Aleksandar

2016-01-01

Conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design. PMID:27958331
Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem.

PubMed

Lim, Hansaim; Gray, Paul; Xie, Lei; Poleksic, Aleksandar

2016-12-13

Conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.
Genome editing strategies: potential tools for eradicating HIV-1/AIDS

PubMed Central

Khalili, Kamel; Gordon, Jennifer; Cosentino, Laura; Hu, Wenhui

2015-01-01

Current therapy for controlling HIV-1 infection and preventing AIDS progression has profoundly decreased viral replication in cells susceptible to HIV-1 infection, but it does not eliminate the low level of viral replication in latently infected cells which contain integrated copies of HIV-1 proviral DNA. There is an urgent need for the development of HIV-1 genome eradication strategies that will lead to a permanent or “sterile” cure of HIV-1/AIDS. In the past few years, novel nuclease-initiated genome editing tools have been developing rapidly, including ZFNs, TALENs, and the CRISPR/Cas9 system. These surgical knives, which can excise any genome, provide a great opportunity to eradicate the HIV-1 genome by targeting highly conserved regions of the HIV-1 long terminal repeats or essential viral genes. Given the time consuming and costly engineering of target-specific ZFNs and TALENs, the RNA-guided endonuclease Cas9 technology has emerged as a simpler and more versatile technology to allow permanent removal of integrated HIV-1 proviral DNA in eukaryotic cells, and hopefully animal models or human patients. The major unmet challenges of this approach at present include inefficient nuclease gene delivery, potential off-target cleavage, and cell-specific genome targeting. Nanoparticle or lentivirus-mediated delivery of next generation Cas9 technologies including nickase or RNA-guided FokI nuclease (RFN) will further improve the potential for genome editing to become a promising approach for curing HIV-1/AIDS. PMID:25716921
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

PubMed Central

2009-01-01

Background: Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results: We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion: We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

PubMed

Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

2009-08-06

Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The
Characterisation of the subtelomeric regions of Giardia lamblia genome isolate WBC6.

PubMed

Prabhu, Anjali; Morrison, Hilary G; Martinez, Charles R; Adam, Rodney D

2007-04-01

Giardia trophozoites are polyploid and have five chromosomes. The chromosome homologues demonstrate considerable size heterogeneity due to variation in the subtelomeric regions. We used clones from the genome project with telomeric sequence at one end to identify six subtelomeric regions in addition to previously identified subtelomeric regions, to study the telomeric arrangement of the chromosomes. The subtelomeric regions included two retroposons, one retroposon pseudogene, and two vsp genes, in addition to the previously identified subtelomeric regions that include ribosomal DNA repeats. The presence of vsp genes in a subtelomeric region suggests that telomeric rearrangements may contribute to the generation of vsp diversity. These studies of the subtelomeric regions of Giardia may contribute to our understanding of the factors that maintain stability, while allowing diversity in chromosome structure.
L1-associated genomic regions are deleted in somatic cells of the healthy human brain.

PubMed

Erwin, Jennifer A; Paquola, Apuã C M; Singer, Tatjana; Gallina, Iryna; Novotny, Mark; Quayle, Carolina; Bedrosian, Tracy A; Alves, Francisco I A; Butcher, Cheyenne R; Herdy, Joseph R; Sarkar, Anindita; Lasken, Roger S; Muotri, Alysson R; Gage, Fred H

2016-12-01

The healthy human brain is a mosaic of varied genomes. Long interspersed element-1 (LINE-1 or L1) retrotransposition is known to create mosaicism by inserting L1 sequences into new locations of somatic cell genomes. Using a machine learning-based, single-cell sequencing approach, we discovered that somatic L1-associated variants (SLAVs) are composed of two classes: L1 retrotransposition insertions and retrotransposition-independent L1-associated variants. We demonstrate that a subset of SLAVs comprises somatic deletions generated by L1 endonuclease cutting activity. Retrotransposition-independent rearrangements in inherited L1s resulted in the deletion of proximal genomic regions. These rearrangements were resolved by microhomology-mediated repair, which suggests that L1-associated genomic regions are hotspots for somatic copy number variants in the brain and therefore a heritable genetic contributor to somatic mosaicism. We demonstrate that SLAVs are present in crucial neural genes, such as DLG2 (also called PSD93), and affect 44-63% of the cells in the healthy brain.
L1-Associated Genomic Regions are Deleted in Somatic Cells of the Healthy Human Brain

PubMed Central

Erwin, Jennifer A.; Paquola, Apuã C.M.; Singer, Tatjana; Gallina, Iryna; Novotny, Mark; Quayle, Carolina; Bedrosian, Tracy; Ivanio, Francisco; Butcher, Cheyenne R.; Herdy, Joseph R.; Sarkar, Anindita; Lasken, Roger S.; Muotri, Alysson R.; Gage, Fred H.

2016-01-01

The healthy human brain is a mosaic of varied genomes. L1 retrotransposition is known to create mosaicism by inserting L1 sequences into new locations of somatic cell genomes. Using a machine learning-based, single-cell sequencing approach, we discovered that Somatic L1-Associated Variants (SLAVs) are actually composed of two classes: L1 retrotransposition insertions and retrotransposition-independent L1-associated variants. We demonstrate that a subset of SLAVs are, in fact, somatic deletions generated by L1 endonuclease cutting activity. Retrotransposition-independent rearrangements within inherited L1s resulted in the deletion of proximal genomic regions. These rearrangements were resolved by microhomology-mediated repair, which suggests that L1-associated genomic regions are hotspots for somatic copy number variants in the brain and therefore a heritable genetic contributor to somatic mosaicism. We demonstrate that SLAVs are present in crucial neural genes, such as DLG2/PSD93, and affect 44–63% of the cells in the healthy brain. PMID:27618310
Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome

USDA-ARS's Scientific Manuscript database

CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of human, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific t...
Genomic signatures of positive selection in humans and the limits of outlier approaches.

PubMed

Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M

2006-08-01

Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.
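The basic idea of an empirical outlier scan can be sketched as follows: rank every gene by a summary statistic of genetic variation and flag those in the extreme tail of the genome-wide distribution. This is only an illustrative sketch; the statistic, the 1% cutoff, the gene names, and the function name `empirical_outliers` are assumptions and do not reproduce the authors' procedure.

```python
from typing import Dict, List

def empirical_outliers(stats: Dict[str, float], tail: float = 0.01) -> List[str]:
    """Return the genes whose statistic lies in the upper `tail` fraction.

    `stats` maps a gene name to a per-gene summary statistic (hypothetical input).
    """
    # Sort genes from the most to the least extreme value.
    ranked = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)
    # Keep a fixed fraction of genes, but always at least one.
    n_keep = max(1, int(len(ranked) * tail))
    return [gene for gene, _ in ranked[:n_keep]]

# Made-up example: 200 genes with increasing statistic values.
stats = {f"gene{i}": float(i) for i in range(200)}
print(empirical_outliers(stats, tail=0.01))
```

By construction, this kind of cutoff always flags the same fraction of genes whether or not any of them were actually under selection, which is one reason the limits of outlier approaches need to be evaluated.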
Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

PubMed

Seal, B S; Neill, J D; Ridpath, J F

1994-07-01

Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.
Recurrent Targeted Genes of Hepatitis B Virus in the Liver Cancer Genomes Identified by a Next-Generation Sequencing–Based Approach

PubMed Central

Ding, Dong; Lou, Xiaoyan; Hua, Dasong; Yu, Wei; Li, Lisha; Wang, Jun; Gao, Feng; Zhao, Na; Ren, Guoping; Li, Lanjuan; Lin, Biaoyang

2012-01-01

Integration of the viral DNA into host chromosomes was found in most of the hepatitis B virus (HBV)–related hepatocellular carcinomas (HCCs). Here we devised a massive anchored parallel sequencing (MAPS) method using next-generation sequencing to isolate and sequence HBV integrants. Applying MAPS to 40 pairs of HBV–related HCC tissues (cancer and adjacent tissues), we identified 296 HBV integration events corresponding to 286 unique integration sites (UISs) with precise HBV–Human DNA junctions. HBV integration favored chromosome 17 and preferentially integrated into human transcript units. HBV targeted genes were enriched in GO terms: cAMP metabolic processes, T cell differentiation and activation, TGF beta receptor pathway, ncRNA catabolic process, and dsRNA fragmentation and cellular response to dsRNA. The HBV targeted genes include 7 genes (PTPRJ, CNTN6, IL12B, MYOM1, FNDC3B, LRFN2, FN1) containing IPR003961 (Fibronectin, type III domain), 7 genes (NRG3, MASP2, NELL1, LRP1B, ADAM21, NRXN1, FN1) containing IPR013032 (EGF-like region, conserved site), and three genes (PDE7A, PDE4B, PDE11A) containing IPR002073 (3′, 5′-cyclic-nucleotide phosphodiesterase). Enriched pathways include hsa04512 (ECM-receptor interaction), hsa04510 (Focal adhesion), and hsa04012 (ErbB signaling pathway). Fewer integration events were found in cancers compared to cancer-adjacent tissues, suggesting a clonal expansion model in HCC development. Finally, we identified 8 genes that were recurrent target genes by HBV integration including fibronectin 1 (FN1) and telomerase reverse transcriptase (TERT1), two known recurrent target genes, and additional novel target genes such as SMAD family member 5 (SMAD5), phosphatase and actin regulator 4 (PHACTR4), and RNA binding protein fox-1 homolog (C. elegans) 1 (RBFOX1). Integrating analysis with recently published whole-genome sequencing analysis, we identified 14 additional recurrent HBV target genes, greatly expanding the HBV recurrent
Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

PubMed

Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

2013-01-01

Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.
Genetically Based Location from Triploid Populations and Gene Ontology of a 3.3-Mb Genome Region Linked to Alternaria Brown Spot Resistance in Citrus Reveal Clusters of Resistance Genes

PubMed Central

Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

2013-01-01

Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids. PMID:24116149
Genomic and Epigenomic Alterations in Cancer.

PubMed

Chakravarthi, Balabhadrapatruni V S K; Nepal, Saroj; Varambally, Sooryanarayana

2016-07-01

Multiple genetic and epigenetic events characterize tumor progression and define the identity of the tumors. Advances in high-throughput technologies, like gene expression profiling, next-generation sequencing, proteomics, and metabolomics, have enabled detailed molecular characterization of various tumors. The integration and analyses of these high-throughput data have unraveled many novel molecular aberrations and network alterations in tumors. These molecular alterations include multiple cancer-driving mutations, gene fusions, amplification, deletion, and post-translational modifications, among others. Many of these genomic events are being used in cancer diagnosis, whereas others are therapeutically targeted with small-molecule inhibitors. Multiple genes/enzymes that play a role in DNA and histone modifications are also altered in various cancers, changing the epigenomic landscape during cancer initiation and progression. Apart from protein-coding genes, studies are uncovering the critical regulatory roles played by noncoding RNAs and noncoding regions of the genome during cancer progression. Many of these genomic and epigenetic events function in tandem to drive tumor development and metastasis. Concurrent advances in genome-modulating technologies, like gene silencing and genome editing, are providing ability to understand in detail the process of cancer initiation, progression, and signaling as well as opening up avenues for therapeutic targeting. In this review, we discuss some of the recent advances in cancer genomic and epigenomic research. Copyright © 2016 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.
CTD² Dashboard: a searchable web interface to connect validated results from the Cancer Target Discovery and Development Network | Office of Cancer Genomics

Cancer.gov

The Cancer Target Discovery and Development (CTD²) Network aims to use functional genomics to accelerate the translation of high-throughput and high-content genomic and small-molecule data towards use in precision oncology.

Dataset of potential targets for Mycobacterium tuberculosis H37Rv through comparative genome analysis.

PubMed

Asif, Siddiqui M; Asad, Amir; Faizan, Ahmad; Anjali, Malik S; Arvind, Arya; Neelesh, Kapoor; Hirdesh, Kumar; Sanjay, Kumar

2009-12-31

Mycobacterium tuberculosis is the causative agent of tuberculosis, and H37Rv is its most studied clinical strain. We used comparative genome analysis of Mycobacterium tuberculosis H37Rv and human to compile a dataset of potential targets. We used DEG (Database of Essential Genes) to identify essential genes in the H37Rv strain. The analysis shows that 628 of the 3989 genes in Mycobacterium tuberculosis H37Rv are essential, of which 324 lack similarity to the human genome. Hypothetical proteins were subsequently removed through manual curation. This resulted in a dataset of 135 proteins with essential function and no homology to human.
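The filtering cascade described in this abstract can be pictured as successive set operations. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the gene identifiers, the three input sets, and the function name `subtractive_filter` are hypothetical. In practice the essential set would come from DEG hits, the human-homolog set from a similarity search, and the hypothetical-protein set from manual curation.

```python
def subtractive_filter(all_genes, essential, human_homologs, hypothetical):
    """Return candidate targets: essential, no human homolog, annotated function.

    All arguments are iterables of gene identifiers (hypothetical toy inputs).
    """
    # Step 1: keep only genes flagged as essential (e.g. by a DEG search).
    step1 = set(all_genes) & set(essential)
    # Step 2: subtract genes with detectable similarity to the human genome.
    step2 = step1 - set(human_homologs)
    # Step 3: subtract hypothetical proteins (removed by curation in the study).
    step3 = step2 - set(hypothetical)
    return sorted(step3)


# Toy data, invented for illustration only.
genome = ["g1", "g2", "g3", "g4", "g5", "g6"]
essential = {"g1", "g2", "g3", "g4"}
human_like = {"g2"}
hypothetical = {"g4", "g6"}

candidates = subtractive_filter(genome, essential, human_like, hypothetical)
print(candidates)
```

In the study itself the same three steps reduce 3989 genes to 628, then 324, then 135 candidates.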
Target identification in Fusobacterium nucleatum by subtractive genomics approach and enrichment analysis of host-pathogen protein-protein interactions.

PubMed

Kumar, Amit; Thotakura, Pragna Lakshmi; Tiwary, Basant Kumar; Krishna, Ramadas

2016-05-12

Fusobacterium nucleatum, a well-studied bacterium in periodontal diseases, appendicitis, gingivitis, osteomyelitis and pregnancy complications, has recently gained attention due to its association with colorectal cancer (CRC) progression. Treatment with berberine was shown to reverse F. nucleatum-induced CRC progression in mice by balancing the growth of opportunistic pathogens in the tumor microenvironment. Intestinal microbiota imbalance and the infections caused by F. nucleatum might be regulated by therapeutic intervention. Hence, we aimed to predict drug target proteins in F. nucleatum through a subtractive genomics approach and host-pathogen protein-protein interactions (HP-PPIs). We also carried out enrichment analysis of host interacting partners to hypothesize the possible mechanisms involved in CRC progression due to F. nucleatum. In the subtractive genomics approach, the essential, virulence and resistance related proteins were retrieved from the RefSeq proteome of F. nucleatum by searching against the Database of Essential Genes (DEG), the Virulence Factor Database (VFDB) and the Antibiotic Resistance Gene-ANNOTation (ARG-ANNOT) tool, respectively. A subsequent hierarchical screening to identify non-human homologous, metabolic pathway-independent/pathway-specific and druggable proteins resulted in eight pathway-independent and 27 pathway-specific druggable targets. Co-aggregation of F. nucleatum with the host induces proinflammatory gene expression and thereby potentiates tumorigenesis. Hence, proteins from IBDsite, a database for inflammatory bowel disease (IBD) research, and those involved in colorectal adenocarcinoma as interpreted from The Cancer Genome Atlas (TCGA) were retrieved to predict drug targets based on HP-PPIs with the F. nucleatum proteome. Prediction of HP-PPIs yielded 186 interactions contributed by 103 host and 76 bacterial proteins. Bacterial interacting partners were considered putative targets. Enrichment analysis of host interacting partners showed statistically
A Cryptosporidium parvum genomic region encoding hemolytic activity.

PubMed Central

Steele, M I; Kuhls, T L; Nida, K; Meka, C S; Halabi, I M; Mosier, D A; Elliott, W; Crawford, D L; Greenfield, R A

1995-01-01

Successful parasitization by Cryptosporidium parvum requires multiple disruptions in both host and protozoan cell membranes as cryptosporidial sporozoites invade intestinal epithelial cells and subsequently develop into asexual and sexual life stages. To identify cryptosporidial proteins which may play a role in these membrane alterations, hemolytic activity was used as a marker to screen a C. parvum genomic expression library. A stable hemolytic clone (H4) containing a 5.5-kb cryptosporidial genomic fragment was identified. The hemolytic activity encoded on H4 was mapped to a 1-kb region that contained a complete 690-bp open reading frame (hemA) ending in a common stop codon. A 21-kDa plasmid-encoded recombinant protein was expressed in maxicells containing H4. Subclones of H4 which contained only a portion of hemA did not induce hemolysis on blood agar or promote expression of the recombinant protein in maxicells. Reverse transcriptase-mediated PCR analysis of total RNA isolated from excysted sporozoites and the intestines of infected adult mice with severe combined immunodeficiency demonstrated that hemA is actively transcribed during the cryptosporidial life cycle. PMID:7558289
Complete sequences of organelle genomes from the medicinal plant Rhazya stricta (Apocynaceae) and contrasting patterns of mitochondrial genome evolution across asterids.

PubMed

Park, Seongjun; Ruhlman, Tracey A; Sabir, Jamal S M; Mutwakil, Mohammed H Z; Baeshen, Mohammed N; Sabir, Meshaal J; Baeshen, Nabih A; Jansen, Robert K

2014-05-28

Rhazya stricta is native to arid regions in South Asia and the Middle East and is used extensively in folk medicine to treat a wide range of diseases. In addition to generating genomic resources for this medicinally important plant, analyses of the complete plastid and mitochondrial genomes and a nuclear transcriptome from Rhazya provide insights into inter-compartmental transfers between genomes and the patterns of evolution among eight asterid mitochondrial genomes. The 154,841 bp plastid genome is highly conserved with gene content and order identical to the ancestral organization of angiosperms. The 548,608 bp mitochondrial genome exhibits a number of phenomena including the presence of recombinogenic repeats that generate a multipartite organization, transferred DNA from the plastid and nuclear genomes, and bidirectional DNA transfers between the mitochondrion and the nucleus. The mitochondrial genes sdh3 and rps14 have been transferred to the nucleus and have acquired targeting presequences. In the case of rps14, two copies are present in the nucleus; only one has a mitochondrial targeting presequence and may be functional. Phylogenetic analyses of both nuclear and mitochondrial copies of rps14 across angiosperms suggest Rhazya has experienced a single transfer of this gene to the nucleus, followed by a duplication event. Furthermore, the phylogenetic distribution of gene losses and the high level of sequence divergence in targeting presequences suggest multiple, independent transfers of both sdh3 and rps14 across asterids. Comparative analyses of mitochondrial genomes of eight sequenced asterids indicate a complicated evolutionary history in this large angiosperm clade with considerable diversity in genome organization and size, repeat, gene and intron content, and amount of foreign DNA from the plastid and nuclear genomes. Organelle genomes of Rhazya stricta provide valuable information for improving the understanding of mitochondrial genome evolution
Tandem repeat regions within the Burkholderia pseudomallei genome and their application for high resolution genotyping.

PubMed

U'Ren, Jana M; Schupp, James M; Pearson, Talima; Hornstra, Heidie; Friedman, Christine L Clark; Smith, Kimothy L; Daugherty, Rebecca R Leadem; Rhoton, Shane D; Leadem, Ben; Georgia, Shalamar; Cardon, Michelle; Huynh, Lynn Y; DeShazer, David; Harvey, Steven P; Robison, Richard; Gal, Daniel; Mayo, Mark J; Wagner, David; Currie, Bart J; Keim, Paul

2007-03-30

The facultative, intracellular bacterium Burkholderia pseudomallei is the causative agent of melioidosis, a serious infectious disease of humans and animals. We identified and categorized tandem repeat arrays and their distribution throughout the genome of B. pseudomallei strain K96243 in order to develop a genetic typing method for B. pseudomallei. We then screened 104 of the potentially polymorphic loci across a diverse panel of 31 isolates including B. pseudomallei, B. mallei and B. thailandensis in order to identify loci with varying degrees of polymorphism. A subset of these tandem repeat arrays was subsequently developed into a multiple-locus VNTR analysis to examine 66 B. pseudomallei and 21 B. mallei isolates from around the world, as well as 95 lineages from a serial transfer experiment encompassing ~18,000 generations. B. pseudomallei contains a preponderance of tandem repeat loci throughout its genome, many of which are duplicated elsewhere in the genome. The majority of these loci are composed of repeat motif lengths of 6 to 9 bp with 4 to 10 repeat units and are predominantly located in intergenic regions of the genome. Across geographically diverse B. pseudomallei and B. mallei isolates, the 32 VNTR loci displayed between 7 and 28 alleles, with Nei's diversity values ranging from 0.47 to 0.94. Mutation rates for these loci are comparable (>10^-5 per locus per generation) to that of the most diverse tandemly repeated regions found in other less diverse bacteria. The frequency, location and duplicate nature of tandemly repeated regions within the B. pseudomallei genome indicate that these tandem repeat regions may play a role in generating and maintaining adaptive genomic variation. Multiple-locus VNTR analysis revealed extensive diversity within the global isolate set containing B. pseudomallei and B. mallei, and it detected genotypic differences within clonal lineages of both species that were identical using previous typing methods. Given the health
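The Nei's diversity values quoted in this abstract summarize how evenly alleles are spread at a locus. A minimal sketch of the calculation follows, assuming the common definition D = 1 − Σ pᵢ² over allele frequencies pᵢ, with an optional n/(n − 1) small-sample correction. Which variant the authors used is not stated in the abstract, and the function name and the allele calls below are hypothetical.

```python
from collections import Counter


def nei_diversity(alleles, unbiased=False):
    """Nei's diversity for one locus from a list of allele calls.

    alleles  : one allele call (e.g. repeat copy number) per isolate
    unbiased : apply the n/(n-1) small-sample correction (an assumption;
               the abstract does not say which form was used)
    """
    n = len(alleles)
    if n == 0:
        raise ValueError("need at least one allele call")
    counts = Counter(alleles)
    # 1 minus the sum of squared allele frequencies.
    diversity = 1.0 - sum((c / n) ** 2 for c in counts.values())
    if unbiased:
        if n < 2:
            raise ValueError("unbiased estimate needs at least two calls")
        diversity *= n / (n - 1)
    return diversity


# Hypothetical repeat counts at one VNTR locus across eight isolates.
calls = [7, 7, 8, 9, 9, 9, 10, 12]
print(nei_diversity(calls))
print(nei_diversity(calls, unbiased=True))
```

A locus where every isolate carries the same allele scores 0, and the value approaches 1 as alleles become more numerous and evenly distributed.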
Pediatric, Adolescent, and Young Adult Thyroid Carcinoma Harbors Frequent and Diverse Targetable Genomic Alterations, Including Kinase Fusions

PubMed Central

Schrock, Alexa B.; Anderson, Peter M.; Morris, John C.; Heilmann, Andreas M.; Holmes, Oliver; Wang, Kai; Johnson, Adrienne; Waguespack, Steven G.; Ou, Sai‐Hong Ignatius; Khan, Saad; Fung, Kar‐Ming; Stephens, Philip J.; Erlich, Rachel L.; Miller, Vincent A.; Ross, Jeffrey S.; Ali, Siraj M.

2017-01-01

Background. Thyroid carcinoma, which is rare in pediatric patients (age 0–18 years) but more common in adolescent and young adult (AYA) patients (age 15–39 years), carries the potential for morbidity and mortality. Methods. Hybrid‐capture‐based comprehensive genomic profiling (CGP) was performed prospectively on 512 consecutively submitted thyroid carcinomas, including 58 from pediatric and AYA (PAYA) patients, to identify genomic alterations (GAs), including base substitutions, insertions/deletions, copy number alterations, and rearrangements. This PAYA data series includes 41 patients with papillary thyroid carcinoma (PTC), 3 with anaplastic thyroid carcinoma (ATC), and 14 with medullary thyroid carcinoma (MTC). Results. GAs were detected in 93% (54/58) of PAYA cases, with a mean of 1.4 GAs per case. In addition to BRAF V600E mutations, detected in 46% (19/41) of PAYA PTC cases and in 1 of 3 AYA ATC cases, oncogenic fusions involving RET, NTRK1, NTRK3, and ALK were detected in 37% (15/41) of PAYA PTC and 33% (1/3) of AYA ATC cases. Ninety‐three percent (13/14) of MTC patients harbored RET alterations, including 3 novel insertions/deletions in exons 6 and 11. Two of these MTC patients with novel alterations in RET experienced clinical benefit from vandetanib treatment. Conclusion. CGP identified diverse clinically relevant GAs in PAYA patients with thyroid carcinoma, including 83% (34/41) of PTC cases harboring activating kinase mutations or activating kinase rearrangements. These genomic observations and index cases exhibiting clinical benefit from targeted therapy suggest that young patients with advanced thyroid carcinoma can benefit from CGP and rationally matched targeted therapy. Implications for Practice. The detection of diverse clinically relevant genomic alterations in the majority of pediatric, adolescent, and young adult patients with thyroid carcinoma in this study suggests that comprehensive genomic profiling may be beneficial for young
A Tool for Multiple Targeted Genome Deletions that Is Precise, Scar-Free, and Suitable for Automation

PubMed Central

Aubrey, Wayne; Riley, Michael C.; Young, Michael; King, Ross D.; Oliver, Stephen G.; Clare, Amanda

2015-01-01

Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method’s primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc. in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from the S. pombe genome, and 85% of the ORFs from the L. lactis genome. PMID:26630677
RRE: a tool for the extraction of non-coding regions surrounding annotated genes from genomic datasets.

PubMed

Lazzarato, F; Franceschinis, G; Botta, M; Cordero, F; Calogero, R A

2004-11-01

RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, downstream region] from annotated genomic datasets available at NCBI. RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html
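The kind of extraction described here can be illustrated with a short, strand-aware slice of the regions flanking an annotated feature. This is a sketch only and does not reproduce RRE's parser or web interface: the function names, the 0-based half-open coordinate convention, and the toy sequence are all assumptions made for illustration.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def flanking_regions(chrom_seq, start, end, strand="+", flank=5):
    """Return (upstream, downstream) sequence around a feature.

    start, end : 0-based, half-open coordinates on the forward strand
    strand     : "+" or "-"; for "-" the flanks are swapped and
                 reverse-complemented so they read 5'->3' relative to the gene
    flank      : number of bases to take on each side (clipped at the ends)
    """
    left = chrom_seq[max(0, start - flank):start]
    right = chrom_seq[end:end + flank]
    if strand == "+":
        return left, right
    return reverse_complement(right), reverse_complement(left)


# Toy 20-bp "chromosome" with a hypothetical feature at positions 6..12.
seq = "AAACCCGGGTTTACGTACGT"
plus = flanking_regions(seq, 6, 12, strand="+", flank=3)
minus = flanking_regions(seq, 6, 12, strand="-", flank=3)
print(plus, minus)
```

Introns and untranslated regions would additionally need exon and CDS coordinates from the annotation, which this sketch leaves out.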
Genomic and oncogenic preference of HBV integration in hepatocellular carcinoma

PubMed Central

Zhao, Ling-Hao; Liu, Xiao; Yan, He-Xin; Li, Wei-Yang; Zeng, Xi; Yang, Yuan; Zhao, Jie; Liu, Shi-Ping; Zhuang, Xue-Han; Lin, Chuan; Qin, Chen-Jie; Zhao, Yi; Pan, Ze-Ya; Huang, Gang; Liu, Hui; Zhang, Jin; Wang, Ruo-Yu; Yang, Yun; Wen, Wen; Lv, Gui-Shuai; Zhang, Hui-Lu; Wu, Han; Huang, Shuai; Wang, Ming-Da; Tang, Liang; Cao, Hong-Zhi; Wang, Ling; Lee, Tin-Lap; Jiang, Hui; Tan, Ye-Xiong; Yuan, Sheng-Xian; Hou, Guo-Jun; Tao, Qi-Fei; Xu, Qin-Guo; Zhang, Xiu-Qing; Wu, Meng-Chao; Xu, Xun; Wang, Jun; Yang, Huan-Ming; Zhou, Wei-Ping; Wang, Hong-Yang

2016-01-01

Hepatitis B virus (HBV) can integrate into the human genome, contributing to genomic instability and hepatocarcinogenesis. Here by conducting high-throughput viral integration detection and RNA sequencing, we identify 4,225 HBV integration events in tumour and adjacent non-tumour samples from 426 patients with HCC. We show that HBV is prone to integrate into rare fragile sites and functional genomic regions including CpG islands. We observe a distinct pattern in the preferential sites of HBV integration between tumour and non-tumour tissues. HBV insertional sites are significantly enriched in the proximity of telomeres in tumours. Recurrent HBV target genes are identified with few that overlap. The overall HBV integration frequency is much higher in tumour genomes of males than in females, with a significant enrichment of integration into chromosome 17. Furthermore, a cirrhosis-dependent HBV integration pattern is observed, affecting distinct targeted genes. Our data suggest that HBV integration has a high potential to drive oncogenic transformation. PMID:27703150
Novel myosin mutations for hereditary hearing loss revealed by targeted genomic capture and massively parallel sequencing

PubMed Central

Brownstein, Zippora; Abu-Rayyan, Amal; Karfunkel-Doron, Daphne; Sirigu, Serena; Davidov, Bella; Shohat, Mordechai; Frydman, Moshe; Houdusse, Anne; Kanaan, Moien; Avraham, Karen B

2014-01-01

Hereditary hearing loss is genetically heterogeneous, with a large number of genes and mutations contributing to this sensory, often monogenic, disease. This number, as well as large size, precludes comprehensive genetic diagnosis of all known deafness genes. A combination of targeted genomic capture and massively parallel sequencing (MPS), also referred to as next-generation sequencing, was applied to determine the deafness-causing genes in hearing-impaired individuals from Israeli Jewish and Palestinian Arab families. Among the mutations detected, we identified nine novel mutations in the genes encoding myosin VI, myosin VIIA and myosin XVA, doubling the number of myosin mutations in the Middle East. Myosin VI mutations were identified in this population for the first time. Modeling of the mutations provided predicted mechanisms for the damage they inflict in the molecular motors, leading to impaired function and thus deafness. The myosin mutations span all regions of these molecular motors, leading to a wide range of hearing phenotypes, reinforcing the key role of this family of proteins in auditory function. This study demonstrates that multiple mutations responsible for hearing loss can be identified in a relatively straightforward manner by targeted-gene MPS technology and concludes that this is the optimal genetic diagnostic approach for identification of mutations responsible for hearing loss. PMID:24105371
Genomic identification of direct target genes of LEAFY

PubMed Central

William, Dilusha A.; Su, Yanhui; Smith, Michael R.; Lu, Meina; Baldwin, Don A.; Wagner, Doris

2004-01-01

The switch from vegetative to reproductive development in plants necessitates a switch in the developmental program of the descendants of the stem cells in the shoot apical meristem. Genetic and molecular investigations have demonstrated that the plant-specific transcription factor and meristem identity regulator LEAFY (LFY) controls this developmental transition by inducing expression of a second transcription factor, APETALA1, and by regulating the expression of additional, as yet unknown, genes. Here we show that the additional LFY targets include the APETALA1-related factor, CAULIFLOWER, as well as three transcription factors and two putative signal transduction pathway components. These genes are up-regulated by LFY even when protein synthesis is inhibited and, hence, appear to be direct targets of LFY. Supporting this conclusion, cis-regulatory regions upstream of these genes are bound by LFY in vivo. The newly identified LFY targets likely initiate the transcriptional changes that are required for the switch from vegetative to reproductive development in Arabidopsis. PMID:14736918
Genetic Recombination Is Targeted towards Gene Promoter Regions in Dogs

PubMed Central

Auton, Adam; Rui Li, Ying; Kidd, Jeffrey; Oliveira, Kyle; Nadel, Julie; Holloway, J. Kim; Hayward, Jessica J.; Cohen, Paula E.; Greally, John M.; Wang, Jun; Bustamante, Carlos D.; Boyko, Adam R.

2013-01-01

The identification of the H3K4 trimethylase, PRDM9, as the gene responsible for recombination hotspot localization has provided considerable insight into the mechanisms by which recombination is initiated in mammals. However, uniquely amongst mammals, canids appear to lack a functional version of PRDM9 and may therefore provide a model for understanding recombination that occurs in the absence of PRDM9, and thus how PRDM9 functions to shape the recombination landscape. We have constructed a fine-scale genetic map from patterns of linkage disequilibrium assessed using high-throughput sequence data from 51 free-ranging dogs, Canis lupus familiaris. While broad-scale properties of recombination appear similar to other mammalian species, our fine-scale estimates indicate that canine highly elevated recombination rates are observed in the vicinity of CpG rich regions including gene promoter regions, but show little association with H3K4 trimethylation marks identified in spermatocytes. By comparison to genomic data from the Andean fox, Lycalopex culpaeus, we show that biased gene conversion is a plausible mechanism by which the high CpG content of the dog genome could have occurred. PMID:24348265
MicroTrout: A comprehensive, genome-wide miRNA target prediction framework for rainbow trout, Oncorhynchus mykiss.

PubMed

Mennigen, Jan A; Zhang, Dapeng

2016-12-01

Rainbow trout represent an important teleost research model and aquaculture species. As such, rainbow trout are employed in diverse areas of biological research, including basic biological disciplines such as comparative physiology, toxicology, and, since rainbow trout have undergone both teleost- and salmonid-specific rounds of genome duplication, molecular evolution. In recent years, microRNAs (miRNAs, small non-protein coding RNAs) have emerged as important posttranscriptional regulators of gene expression in animals. Given the increasingly recognized importance of miRNAs as an additional layer in the regulation of gene expression and hence biological function, recent efforts using RNA- and genome sequencing approaches have resulted in the creation of several resources for the construction of a comprehensive repertoire of rainbow trout miRNAs and isomiRs (variant miRNA sequences that all appear to derive from the same gene but vary in sequence due to post-transcriptional processing). Importantly, through the recent publication of the rainbow trout genome (Berthelot et al., 2014), mRNA 3'UTR information has become available, allowing for the first time the genome-wide prediction of miRNA-target RNA relationships in this species. We here report the creation of the microtrout database, a comprehensive resource for rainbow trout miRNA and annotated 3'UTRs. The comprehensive database was used to implement an algorithm to predict genome-wide rainbow trout-specific miRNA-mRNA target relationships, generating an improved predictive framework over previously published approaches. This work will serve as a useful framework and sequence resource to experimentally address the role of miRNAs in several research areas using the rainbow trout model, examples of which are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
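The abstract does not specify the prediction algorithm. One widely used ingredient of miRNA target prediction in general is seed matching: scanning a 3'UTR for sites complementary to nucleotides 2-8 of the miRNA. The sketch below shows only that step, under that assumption; it is not the MicroTrout algorithm, and the function name and both sequences are illustrative toy inputs.

```python
def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that pair with the miRNA seed.

    mirna, utr : RNA or DNA strings written 5'->3' (T is treated as U)
    seed_start, seed_end : 0-based half-open slice of the miRNA used as
                           the seed; the default covers nucleotides 2-8
    """
    complement = str.maketrans("ACGU", "UGCA")
    seed = mirna.upper().replace("T", "U")[seed_start:seed_end]
    # The target site is the reverse complement of the seed.
    site = seed.translate(complement)[::-1]
    utr = utr.upper().replace("T", "U")
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping sites
    return hits


# Toy inputs for illustration only.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGCUACCUCUU"
hits = seed_sites(mirna, utr)
print(hits)
```

Practical predictors usually combine seed matches with further evidence such as site context or conservation, none of which is modeled here.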
Comparative genomics of Lupinus angustifolius gene-rich regions: BAC library exploration, genetic mapping and cytogenetics

PubMed Central

2013-01-01

Background. The narrow-leafed lupin, Lupinus angustifolius L., is a grain legume species with a relatively compact genome. The species has 2n = 40 chromosomes and its genome size is 960 Mbp/1C. During the last decade, L. angustifolius genomic studies have achieved several milestones, such as molecular-marker development, linkage maps, and bacterial artificial chromosome (BAC) libraries. Here, these resources were integratively used to identify and sequence two gene-rich regions (GRRs) of the genome. Results. The genome was screened with a probe representing the sequence of a microsatellite fragment length polymorphism (MFLP) marker linked to Phomopsis stem blight resistance. BAC clones selected by hybridization were subjected to restriction fingerprinting and contig assembly, and 232 BAC-ends were sequenced and annotated. BAC fluorescence in situ hybridization (BAC-FISH) identified eight single-locus clones. Based on physical mapping, cytogenetic localization, and BAC-end annotation, five clones were chosen for sequencing. Within the sequences of clones that hybridized in FISH to a single locus, two large GRRs were identified. The GRRs showed strong and conserved synteny to Glycine max duplicated genome regions, illustrated by both identical gene order and parallel orientation. In contrast, in the clones with dispersed FISH signals, more than one-third of sequences were transposable elements. Sequenced, single-locus clones were used to develop 12 genetic markers, increasing the number of L. angustifolius chromosomes linked to appropriate linkage groups by five pairs. Conclusions. In general, probes originating from MFLP sequences can assist genome screening and gene discovery. However, such probes are not useful for positional cloning, because they tend to hybridize to numerous loci. GRRs identified in L. angustifolius contained a low number of interspersed repeats and had a high level of synteny to the genome of the model legume G. max. Our results showed that
A Genome-Wide Association Study Identifies Multiple Regions Associated with Head Size in Catfish

PubMed Central

Geng, Xin; Liu, Shikai; Yao, Jun; Bao, Lisui; Zhang, Jiaren; Li, Chao; Wang, Ruijia; Sha, Jin; Zeng, Peng; Zhi, Degui; Liu, Zhanjiang

2016-01-01

Skull morphology is fundamental to evolution and the biological adaptation of species to their environments. With aquaculture fish species, head size is also important for economic reasons because it has a direct impact on fillet yield. However, little is known about the underlying genetic basis of head size. Catfish is the primary aquaculture species in the United States. In this study, we performed a genome-wide association study using the catfish 250K SNP array with backcross hybrid catfish to map the QTL for head size (head length, head width, and head depth). One significantly associated region on linkage group (LG) 7 was identified for head length. In addition, LGs 7, 9, and 16 contain suggestively associated regions for head length. For head width, significantly associated regions were found on LG9, and additional suggestively associated regions were identified on LGs 5 and 7. No region was found associated with head depth. Head size genetic loci were mapped in catfish to genomic regions with candidate genes involved in bone development. Comparative analysis indicated that homologs of several candidate genes are also involved in skull morphology in various other species ranging from amphibian to mammalian species, suggesting possible evolutionary conservation of those genes in the control of skull morphologies. PMID:27558670
Genomes by design

PubMed Central

Haimovich, Adrian D.; Muir, Paul; Isaacs, Farren J.

2016-01-01

Next-generation DNA sequencing has revealed the complete genome sequences of numerous organisms, establishing a fundamental and growing understanding of genetic variation and phenotypic diversity. Engineering at the gene, network and whole-genome scale aims to introduce targeted genetic changes both to explore emergent phenotypes and to introduce new functionalities. Expansion of these approaches into massively parallel platforms establishes the ability to generate targeted genome modifications, elucidating causal links between genotype and phenotype, as well as the ability to design and reprogramme organisms. In this Review, we explore techniques and applications in genome engineering, outlining key advances and defining challenges. PMID:26260262
Whole Genome Amplification of Labeled Viable Single Cells Suited for Array-Comparative Genomic Hybridization.

PubMed

Kroneis, Thomas; El-Heliebi, Amin

2015-01-01

Understanding details of a complex biological system makes it necessary to dismantle it down to its components. Immunostaining techniques allow identification of several distinct cell types, thereby giving an inside view of intercellular heterogeneity. Often staining reveals that the most remarkable cells are the rarest. To further characterize the target cells on a molecular level, single cell techniques are necessary. Here, we describe the immunostaining, micromanipulation, and whole genome amplification of single cells for the purpose of genomic characterization. First, we exemplify the preparation of cell suspensions from cultured cells as well as the isolation of peripheral mononucleated cells from blood. The target cell population is then subjected to immunostaining. After cytocentrifugation, target cells are isolated by micromanipulation and forwarded to whole genome amplification. For whole genome amplification, we use GenomePlex® technology, allowing downstream genomic analysis such as array-comparative genomic hybridization.
Center for Cancer Genomics | Office of Cancer Genomics

Cancer.gov

The Center for Cancer Genomics (CCG) was established to unify the National Cancer Institute's activities in cancer genomics, with the goal of advancing genomics research and translating findings into the clinic to improve the precise diagnosis and treatment of cancers. In addition to promoting genomic sequencing approaches, CCG aims to accelerate structural, functional and computational research to explore cancer mechanisms, discover new cancer targets, and develop new therapeutics.

Non-Homologous End Joining and Homology Directed DNA Repair Frequency of Double-Stranded Breaks Introduced by Genome Editing Reagents.

PubMed

Zaboikin, Michail; Zaboikina, Tatiana; Freter, Carl; Srinivasakumar, Narasimhachar

2017-01-01

Genome editing using transcription-activator like effector nucleases or RNA guided nucleases allows one to precisely engineer desired changes within a given target sequence. The genome editing reagents introduce double stranded breaks (DSBs) at the target site which can then undergo DNA repair by non-homologous end joining (NHEJ) or homology directed recombination (HDR) when a template DNA molecule is available. NHEJ repair results in indel mutations at the target site. As PCR amplified products from mutant target regions are likely to exhibit different melting profiles than PCR products amplified from wild type target region, we designed a high resolution melting analysis (HRMA) for rapid identification of efficient genome editing reagents. We also designed TaqMan assays using probes situated across the cut site to discriminate wild type from mutant sequences present after genome editing. The experiments revealed that the sensitivity of the assays to detect NHEJ-mediated DNA repair could be enhanced by selection of transfected cells to reduce the contribution of unmodified genomic DNA from untransfected cells to the DNA melting profile. The presence of donor template DNA lacking the target sequence at the time of genome editing further enhanced the sensitivity of the assays for detection of mutant DNA molecules by excluding the wild-type sequences modified by HDR. A second TaqMan probe that bound to an adjacent site, outside of the primary target cut site, was used to directly determine the contribution of HDR to DNA repair in the presence of the donor template sequence. The TaqMan qPCR assay, designed to measure the contribution of NHEJ and HDR in DNA repair, corroborated the results from HRMA. The data indicated that genome editing reagents can produce DSBs at high efficiency in HEK293T cells but a significant proportion of these are likely masked by reversion to wild type as a result of HDR. Supplying a donor plasmid to provide a template for HDR (that
Genome-wide target profiling of piggyBac and Tol2 in HEK 293: pros and cons for gene discovery and gene therapy

PubMed Central

2011-01-01

Background DNA transposons have emerged as indispensable tools for manipulating vertebrate genomes with applications ranging from insertional mutagenesis and transgenesis to gene therapy. To fully explore the potential of two highly active DNA transposons, piggyBac and Tol2, as mammalian genetic tools, we have conducted a side-by-side comparison of the two transposon systems in the same setting to evaluate their advantages and disadvantages for use in gene therapy and gene discovery. Results We have observed that (1) the Tol2 transposase (but not piggyBac) is highly sensitive to molecular engineering; (2) the piggyBac donor with only the 40 bp 3'- and 67 bp 5'-terminal repeat domain is sufficient for effective transposition; and (3) a small amount of piggyBac transposase results in robust transposition, suggesting the piggyBac transposase is highly active. Performing genome-wide target profiling on data sets obtained by retrieving chromosomal targeting sequences from individual clones, we have identified several piggyBac and Tol2 hotspots and observed that (4) piggyBac and Tol2 display a clear difference in targeting preferences in the human genome. Finally, we have observed that (5) only sites with a particular sequence context can be targeted by either piggyBac or Tol2. Conclusions The non-overlapping targeting preference of piggyBac and Tol2 makes them complementary research tools for manipulating mammalian genomes. PiggyBac is the most promising transposon-based vector system for achieving site-specific targeting of therapeutic genes due to the flexibility of its transposase for being molecularly engineered. Insights from this study will provide a basis for engineering piggyBac transposases to achieve site-specific therapeutic gene targeting. PMID:21447194
Functional assessment of human enhancer activities using whole-genome STARR-sequencing.

PubMed

Liu, Yuwen; Yu, Shan; Dhiman, Vineet K; Brunetti, Tonya; Eckart, Heather; White, Kevin P

2017-11-20

Genome-wide quantification of enhancer activity in the human genome has proven to be a challenging problem. Recent efforts have led to the development of powerful tools for enhancer quantification. However, because of genome size and complexity, these tools have yet to be applied to the whole human genome. In the current study, we use a human prostate cancer cell line, LNCaP as a model to perform whole human genome STARR-seq (WHG-STARR-seq) to reliably obtain an assessment of enhancer activity. This approach builds upon previously developed STARR-seq in the fly genome and CapSTARR-seq techniques in targeted human genomic regions. With an improved library preparation strategy, our approach greatly increases the library complexity per unit of starting material, which makes it feasible and cost-effective to explore the landscape of regulatory activity in the much larger human genome. In addition to our ability to identify active, accessible enhancers located in open chromatin regions, we can also detect sequences with the potential for enhancer activity that are located in inaccessible, closed chromatin regions. When treated with the histone deacetylase inhibitor, Trichostatin A, genes nearby this latter class of enhancers are up-regulated, demonstrating the potential for endogenous functionality of these regulatory elements. WHG-STARR-seq provides an improved approach to current pipelines for analysis of high complexity genomes to gain a better understanding of the intricacies of transcriptional regulation.
Genome-wide identification and characterization of microRNA genes and their targets in flax (Linum usitatissimum): Characterization of flax miRNA genes.

PubMed

Barvkar, Vitthal T; Pardeshi, Varsha C; Kale, Sandip M; Qiu, Shuqing; Rollins, Meaghen; Datla, Raju; Gupta, Vidya S; Kadoo, Narendra Y

2013-04-01

MicroRNAs (miRNAs) are small (20-24 nucleotides long) endogenous regulatory RNAs that play important roles in plant growth and development. They regulate gene expression at the post-transcriptional level by translational repression or target degradation and gene silencing. In this study, we identified 116 conserved miRNAs belonging to 23 families from the flax (Linum usitatissimum L.) genome using a computational approach. The precursor miRNAs varied in length, while most of the mature miRNAs were 21 nucleotides long, intergenic, and showed conserved signatures of RNA polymerase II transcripts in their upstream regions. Promoter region analysis of the flax miRNA genes indicated prevalence of MYB transcription factor binding sites. Four miRNA gene clusters containing members of three phylogenetic groups were identified. Further, 142 target genes were predicted for these miRNAs and most of these represent transcriptional regulators. The miRNA encoding genes were expressed in diverse tissues as determined by digital expression analysis as well as real-time PCR. The expression of fourteen miRNAs and nine target genes was independently validated using quantitative reverse transcription PCR (qRT-PCR). This study suggests that a large number of conserved plant miRNAs are also found in flax and these may play important roles in growth and development of flax.
Protospacer Adjacent Motif (PAM)-Distal Sequences Engage CRISPR Cas9 DNA Target Cleavage

PubMed Central

Ethier, Sylvain; Schmeing, T. Martin; Dostie, Josée; Pelletier, Jerry

2014-01-01

The clustered regularly interspaced short palindromic repeat (CRISPR)-associated enzyme Cas9 is an RNA-guided nuclease that has been widely adapted for genome editing in eukaryotic cells. However, the in vivo target specificity of Cas9 is poorly understood and most studies rely on in silico predictions to define the potential off-target editing spectrum. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we delineate the genome-wide binding panorama of catalytically inactive Cas9 directed by two different single guide (sg) RNAs targeting the Trp53 locus. Cas9:sgRNA complexes are able to load onto multiple sites with short seed regions adjacent to 5′NGG3′ protospacer adjacent motifs (PAM). Yet among 43 ChIP-seq sites harboring seed regions analyzed for mutational status, we find editing only at the intended on-target locus and one off-target site. In vitro analysis of target site recognition revealed that interactions between the 5′ end of the guide and PAM-distal target sequences are necessary to efficiently engage Cas9 nucleolytic activity, providing an explanation for why off-target editing is significantly lower than expected from ChIP-seq data. PMID:25275497
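The candidate binding sites described in this record (a short seed match immediately followed by a 5′-NGG-3′ PAM) can be located with a simple scan. The following is a minimal sketch, not the authors' pipeline: the function name, the 5-nt default seed length, and the single-strand search are all assumptions made for illustration.

```python
import re

def find_seed_pam_sites(sequence, guide, seed_len=5):
    """Return 0-based start positions of seed+NGG matches on the given strand.

    The seed is taken as the last `seed_len` bases of the guide, i.e. its
    PAM-proximal 3' end. The seed length is an illustrative assumption.
    """
    seed = guide[-seed_len:].upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile("(?=" + re.escape(seed) + "[ACGT]GG)")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Toy example: three seed occurrences, only two of them followed by NGG.
toy_sequence = "TTACGTAAGGCCACGTATGGACGTACCA"
toy_guide = "GGGGGGGGGGGGGGGACGTA"  # hypothetical 20-nt guide ending in ACGTA
sites = find_seed_pam_sites(toy_sequence, toy_guide)
```

A real off-target search would also scan the reverse complement and tolerate mismatches; both are left out here to keep the sketch short.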
megaTALs: a rare-cleaving nuclease architecture for therapeutic genome engineering.

PubMed

Boissel, Sandrine; Jarjour, Jordan; Astrakhan, Alexander; Adey, Andrew; Gouble, Agnès; Duchateau, Philippe; Shendure, Jay; Stoddard, Barry L; Certo, Michael T; Baker, David; Scharenberg, Andrew M

2014-02-01

Rare-cleaving endonucleases have emerged as important tools for making targeted genome modifications. While multiple platforms are now available to generate reagents for research applications, each existing platform has significant limitations in one or more of three key properties necessary for therapeutic application: efficiency of cleavage at the desired target site, specificity of cleavage (i.e. rate of cleavage at 'off-target' sites), and efficient/facile means for delivery to desired target cells. Here, we describe the development of a single-chain rare-cleaving nuclease architecture, which we designate 'megaTAL', in which the DNA binding region of a transcription activator-like (TAL) effector is used to 'address' a site-specific meganuclease adjacent to a single desired genomic target site. This architecture allows the generation of extremely active and hyper-specific compact nucleases that are compatible with all current viral and nonviral cell delivery methods.
Single nucleotide polymorphisms in the Mycobacterium bovis genome resolve phylogenetic relationships

USDA-ARS's Scientific Manuscript database

Mycobacterium bovis isolates carry restricted allelic variation yet exhibit a range of disease phenotypes and host preferences. Conventional genotyping methods target small hyper-variable regions of their genome and provide anonymous biallelic information insufficient to develop phylogeny. To resolv...
In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration

PubMed Central

Suzuki, Keiichiro; Tsunekawa, Yuji; Hernandez-Benitez, Reyna; Wu, Jun; Zhu, Jie; Kim, Euiseok J.; Hatanaka, Fumiyuki; Yamamoto, Mako; Araoka, Toshikazu; Li, Zhe; Kurita, Masakazu; Hishida, Tomoaki; Li, Mo; Aizawa, Emi; Guo, Shicheng; Chen, Song; Goebl, April; Soligalla, Rupa Devi; Qu, Jing; Jiang, Tingshuai; Fu, Xin; Jafari, Maryam; Esteban, Concepcion Rodriguez; Berggren, W. Travis; Lajara, Jeronimo; Nuñez-Delicado, Estrella; Guillen, Pedro; Campistol, Josep M.; Matsuzaki, Fumio; Liu, Guang-Hui; Magistretti, Pierre; Zhang, Kun; Callaway, Edward M.; Zhang, Kang; Belmonte, Juan Carlos Izpisua

2017-01-01

Targeted genome editing via engineered nucleases is an exciting area of biomedical research and holds potential for clinical applications. Despite rapid advances in the field, in vivo targeted transgene integration is still infeasible because current tools are inefficient1, especially for non-dividing cells, which compose most adult tissues. This poses a barrier for uncovering fundamental biological principles and developing treatments for a broad range of genetic disorders2. Based on clustered regularly interspaced short palindromic repeat/Cas9 (CRISPR/Cas9)3,4 technology, here we devise a homology-independent targeted integration (HITI) strategy, which allows for robust DNA knock-in in both dividing and non-dividing cells in vitro and, more importantly, in vivo (for example, in neurons of postnatal mammals). As a proof of concept of its therapeutic potential, we demonstrate the efficacy of HITI in improving visual function using a rat model of the retinal degeneration condition retinitis pigmentosa. The HITI method presented here establishes new avenues for basic research and targeted gene therapies. PMID:27851729
Comparison of gene expression in segregating families identifies genes and genomic regions involved in a novel adaptation, zinc hyperaccumulation.

PubMed

Filatov, Victor; Dowdle, John; Smirnoff, Nicholas; Ford-Lloyd, Brian; Newbury, H John; Macnair, Mark R

2006-09-01

One of the challenges of comparative genomics is to identify specific genetic changes associated with the evolution of a novel adaptation or trait. We need to be able to disassociate the genes involved with a particular character from all the other genetic changes that take place as lineages diverge. Here we show that by comparing the transcriptional profile of segregating families with that of parent species differing in a novel trait, it is possible to narrow down substantially the list of potential target genes. In addition, by assuming synteny with a related model organism for which the complete genome sequence is available, it is possible to use the cosegregation of markers differing in transcription level to identify regions of the genome which probably contain quantitative trait loci (QTLs) for the character. This novel combination of genomics and classical genetics provides a very powerful tool to identify candidate genes. We use this methodology to investigate zinc hyperaccumulation in Arabidopsis halleri, the sister species to the model plant, Arabidopsis thaliana. We compare the transcriptional profile of A. halleri with that of its sister nonaccumulator species, Arabidopsis petraea, and between accumulator and nonaccumulator F(3)s derived from the cross between the two species. We identify eight genes which consistently show greater expression in accumulator phenotypes in both roots and shoots, including two metal transporter genes (NRAMP3 and ZIP6), and cytoplasmic aconitase, a gene involved in iron homeostasis in mammals. We also show that there appear to be two QTLs for zinc accumulation, on chromosomes 3 and 7.
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes.

PubMed

Haiminen, Niina; Feltus, F Alex; Parida, Laxmi

2011-04-15

We investigate if pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most. BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes

PubMed Central

2011-01-01

Background We investigate if pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most. Conclusions BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies. PMID:21496274
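The coverage figures quoted in this record follow the usual relation coverage = (number of reads × read length) / region length. The sketch below only illustrates that arithmetic. The 400 bp read length is an assumed, illustrative value that does not come from the abstract, and the function name is hypothetical.

```python
import math

def reads_for_coverage(region_bp, coverage, read_len_bp):
    """Number of reads needed to reach `coverage`-fold depth over a region.

    Uses coverage = reads * read_len / region_len, rounded up to whole reads.
    """
    return math.ceil(coverage * region_bp / read_len_bp)

# 15x linear coverage of a 12 Mbp region with an ASSUMED 400 bp read length.
linear_reads = reads_for_coverage(12_000_000, 15, 400)
```

The same function applies to the 5× paired component once a per-read length for the paired library is chosen.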
Genome-Scale Screening of Drug-Target Associations Relevant to Ki Using a Chemogenomics Approach

PubMed Central

Cao, Dong-Sheng; Liang, Yi-Zeng; Deng, Zhe; Hu, Qian-Nan; He, Min; Xu, Qing-Song; Zhou, Guang-Hua; Zhang, Liu-Xia; Deng, Zi-xin; Liu, Shao

2013-01-01

The identification of interactions between drugs and target proteins plays a key role in genomic drug discovery. In the present study, the quantitative binding affinities of drug-target pairs are differentiated as a measurement to define whether a drug interacts with a protein or not, and then a chemogenomics framework using an unbiased set of general integrated features and random forest (RF) is employed to construct a predictive model which can accurately classify drug-target pairs. The predictability of the model is further investigated and validated by several independent validation sets. The built model is used to predict drug-target associations, some of which were confirmed by comparing experimental data from public biological resources. A drug-target interaction network with high confidence drug-target pairs was also reconstructed. This network provides further insight for the action of drugs and targets. Finally, a web-based server called PreDPI-Ki was developed to predict drug-target interactions for drug discovery. In addition to providing a high-confidence list of drug-target associations for subsequent experimental investigation guidance, these results also contribute to the understanding of drug-target interactions. We can also see that quantitative information of drug-target associations could greatly promote the development of more accurate models. The PreDPI-Ki server is freely available via: http://sdd.whu.edu.cn/dpiki. PMID:23577055
Genome-wide copy number variation (CNV) detection in Nelore cattle reveals highly frequent variants in genome regions harboring QTLs affecting production traits.

PubMed

da Silva, Joaquim Manoel; Giachetto, Poliana Fernanda; da Silva, Luiz Otávio; Cintra, Leandro Carrijo; Paiva, Samuel Rezende; Yamagishi, Michel Eduardo Beleza; Caetano, Alexandre Rodrigues

2016-06-13

Copy number variations (CNVs) have been shown to account for substantial portions of observed genomic variation and have been associated with qualitative and quantitative traits and the onset of disease in a number of species. Information from high-resolution studies to detect, characterize and estimate population-specific variant frequencies will facilitate the incorporation of CNVs in genomic studies to identify genes affecting traits of importance. Genome-wide CNVs were detected in high-density single nucleotide polymorphism (SNP) genotyping data from 1,717 Nelore (Bos indicus) cattle, and in NGS data from eight key ancestral bulls. A total of 68,007 and 12,786 distinct CNVs were observed, respectively. Cross-comparisons of results obtained for the eight resequenced animals revealed that 92 % of the CNVs were observed in both datasets, while 62 % of all detected CNVs were observed to overlap with previously validated cattle copy number variant regions (CNVRs). Observed CNVs were used for obtaining breed-specific CNV frequencies and identification of CNVRs, which were subsequently used for gene annotation. A total of 688 of the detected CNVRs were observed to overlap with 286 non-redundant QTLs associated with important production traits in cattle. All of 34 CNVs previously reported to be associated with milk production traits in Holsteins were also observed in Nelore cattle. Comparisons of estimated frequencies of these CNVs in the two breeds revealed 14, 13, 6 and 14 regions in high (>20 %), low (<20 %) and divergent (NEL > HOL, NEL < HOL) frequencies, respectively. Obtained results significantly enriched the bovine CNV map and enabled the identification of variants that are potentially associated with traits under selection in Nelore cattle, particularly in genome regions harboring QTLs affecting production traits.
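Counting CNVRs that overlap QTLs, as reported in this record, comes down to an interval-overlap test per chromosome. A minimal sketch follows, assuming closed (chromosome, start, end) coordinates and toy data. The function names are hypothetical and the abstract does not state the overlap criterion the study actually used.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def cnvrs_overlapping_qtls(cnvrs, qtls):
    """Return (CNVRs hitting any QTL, set of QTLs hit by any CNVR)."""
    hit_cnvrs = []
    hit_qtls = set()
    for cnvr in cnvrs:
        matched = [q for q in qtls if overlaps(cnvr, q)]
        if matched:
            hit_cnvrs.append(cnvr)
            hit_qtls.update(matched)
    return hit_cnvrs, hit_qtls

# Toy intervals (not real cattle coordinates).
toy_cnvrs = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_qtls = [("chr1", 150, 300), ("chr1", 190, 195), ("chr2", 300, 400)]
hit_cnvrs, hit_qtls = cnvrs_overlapping_qtls(toy_cnvrs, toy_qtls)
```

The pairwise loop is quadratic; for genome-scale inputs a sorted sweep or an interval tree would be the more usual choice.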
GenomeLandscaper: Landscape analysis of genome-fingerprints maps assessing chromosome architecture.

PubMed

Ai, Hannan; Ai, Yuncan; Meng, Fanmei

2018-01-18

Assessing correctness of an assembled chromosome architecture is a central challenge. We create a geometric analysis method (called GenomeLandscaper) to conduct landscape analysis of genome-fingerprints maps (GFM), trace large-scale repetitive regions, and assess their impacts on the global architectures of assembled chromosomes. We develop an alignment-free method for phylogenetics analysis. The human Y chromosomes (GRCh.chrY, HuRef.chrY and YH.chrY) are analysed as a proof-of-concept study. We construct a galaxy of genome-fingerprints maps (GGFM) for them, and a landscape compatibility among relatives is observed. But a long sharp straight line on the GGFM breaks such a landscape compatibility, distinguishing GRCh38p1.chrY (and throughout GRCh38p7.chrY) from GRCh37p13.chrY, HuRef.chrY and YH.chrY. We delete a 1.30-Mbp target segment to rescue the landscape compatibility, matching the antecedent GRCh37p13.chrY. We re-locate it into the modelled centromeric and pericentromeric region of GRCh38p10.chrY, matching a gap placeholder of GRCh37p13.chrY. We decompose it into sub-constituents (such as BACs, interspersed repeats, and tandem repeats) and trace their homologues by phylogenetics analysis. We elucidate that most examined tandem repeats are of reasonable quality, but the BAC-sized repeats, 173U1020C (176.46 Kbp) and 5U41068C (205.34 Kbp), are likely over-repeated. These results offer unique insights into the centromeric and pericentromeric regions of the human Y chromosomes.
Genomics of Parallel Ecological Speciation in Lake Victoria Cichlids.

PubMed

Meier, Joana Isabel; Marques, David Alexander; Wagner, Catherine Elise; Excoffier, Laurent; Seehausen, Ole

2018-06-01

The genetic basis of parallel evolution of similar species is of great interest in evolutionary biology. In the adaptive radiation of Lake Victoria cichlid fishes, sister species with either blue or red-back male nuptial coloration have evolved repeatedly, often associated with shallower and deeper water, respectively. One such case is blue and red-backed Pundamilia species, for which we recently showed that a young species pair may have evolved through "hybrid parallel speciation". Coalescent simulations suggested that the older species P. pundamilia (blue) and P. nyererei (red-back) admixed in the Mwanza Gulf and that new "nyererei-like" and "pundamilia-like" species evolved from the admixed population. Here, we use genome scans to study the genomic architecture of differentiation, and assess the influence of hybridization on the evolution of the younger species pair. For each of the two species pairs, we find over 300 genomic regions, widespread across the genome, which are highly differentiated. A subset of the most strongly differentiated regions of the older pair are also differentiated in the younger pair. These shared differentiated regions often show parallel allele frequency differences, consistent with the hypothesis that admixture-derived alleles were targeted by divergent selection in the hybrid population. However, two-thirds of the genomic regions that are highly differentiated between the younger species are not highly differentiated between the older species, suggesting independent evolutionary responses to selection pressures. Our analyses reveal how divergent selection on admixture-derived genetic variation can facilitate new speciation events.
Clustered Mutation Signatures Reveal that Error-Prone DNA Repair Targets Mutations to Active Genes.

PubMed

Supek, Fran; Lehner, Ben

2017-07-27

Many processes can cause the same nucleotide change in a genome, making the identification of the mechanisms causing mutations a difficult challenge. Here, we show that clustered mutations provide a more precise fingerprint of mutagenic processes. Of nine clustered mutation signatures identified from >1,000 tumor genomes, three relate to variable APOBEC activity and three are associated with tobacco smoking. An additional signature matches the spectrum of translesion DNA polymerase eta (POLH). In lymphoid cells, these mutations target promoters, consistent with AID-initiated somatic hypermutation. In solid tumors, however, they are associated with UV exposure and alcohol consumption and target the H3K36me3 chromatin of active genes in a mismatch repair (MMR)-dependent manner. These regions normally have a low mutation rate because error-free MMR also targets H3K36me3 chromatin. Carcinogens and error-prone repair therefore redistribute mutations to the more important regions of the genome, contributing a substantial mutation load in many tumors, including driver mutations. Copyright © 2017 Elsevier Inc. All rights reserved.
aCGH Local Copy Number Aberrations Associated with Overall Copy Number Genomic Instability in Colorectal Cancer: Coordinate Involvement of the Regions Including BCR and ABL

PubMed Central

Bartos, Jeremy D.; Gaile, Daniel P.; McQuaid, Devin E.; Conroy, Jeffrey M.; Darbary, Huferesh; Nowak, Norma J.; Block, Annemarie; Petrelli, Nicholas J.; Mittelman, Arnold; Stoler, Daniel L.; Anderson, Garth R.

2007-01-01

In order to identify small regions of the genome whose specific copy number alteration is associated with high genomic instability in the form of overall genome-wide copy number aberrations, we have analyzed array-based comparative genomic hybridization (aCGH) data from 33 sporadic colorectal carcinomas. Copy number changes of a small number of specific regions were significantly correlated with elevated overall amplifications and deletions scattered throughout the entire genome. One significant region at 9q34 includes the c-ABL gene. Another region spanning 22q11–13 includes the breakpoint cluster region (BCR) of the Philadelphia chromosome. Coordinate 22q11–13 alterations were observed in nine of eleven tumors with the 9q34 alteration. Additional regions on 1q and 14q were associated with overall genome-wide copy number changes, while copy number aberrations on chromosome 7p, 7q, and 13q21.1–31.3 were found associated with this instability only in tumors from patients with a smoking history. Our analysis demonstrates there are a small number of regions of the genome where gain or loss is commonly associated with a tumor’s overall level of copy number aberrations. Our finding that BCR and ABL are located within two of the instability-associated regions, and that these two regions are involved coordinately, suggests a system akin to the BCR-ABL translocation of CML may be involved in genomic instability in about one-third of human colorectal carcinomas. PMID:17196995
Engineered chromosome-based genetic mapping establishes a 3.7 Mb critical genomic region for Down syndrome-associated heart defects in mice.

PubMed

Liu, Chunhong; Morishima, Masae; Jiang, Xiaoling; Yu, Tao; Meng, Kai; Ray, Debjit; Pao, Annie; Ye, Ping; Parmacek, Michael S; Yu, Y Eugene

2014-06-01

Trisomy 21 (Down syndrome, DS) is the most common human genetic anomaly associated with heart defects. Based on evolutionary conservation, DS-associated heart defects have been modeled in mice. By generating and analyzing mouse mutants carrying different genomic rearrangements in human chromosome 21 (Hsa21) syntenic regions, we found the triplication of the Tiam1-Kcnj6 region on mouse chromosome 16 (Mmu16) resulted in DS-related cardiovascular abnormalities. In this study, we developed two tandem duplications spanning the Tiam1-Kcnj6 genomic region on Mmu16 using recombinase-mediated genome engineering, Dp(16)3Yey and Dp(16)4Yey, spanning the 2.1 Mb Tiam1-Il10rb and 3.7 Mb Ifnar1-Kcnj6 regions, respectively. We found that Dp(16)4Yey/+, but not Dp(16)3Yey/+, led to heart defects, suggesting the triplication of the Ifnar1-Kcnj6 region is sufficient to cause DS-associated heart defects. Our transcriptional analysis of Dp(16)4Yey/+ embryos showed that the Hsa21 gene orthologs located within the duplicated interval were expressed at the elevated levels, reflecting the consequences of the gene dosage alterations. Therefore, we have identified a 3.7 Mb genomic region, the smallest critical genomic region, for DS-associated heart defects, and our results should set the stage for the final step to establish the identities of the causal gene(s), whose elevated expression(s) directly underlie this major DS phenotype.
Genome-wide Determinants of Proviral Targeting, Clonal Abundance and Expression in Natural HTLV-1 Infection

PubMed Central

Melamed, Anat; Laydon, Daniel J.; Gillet, Nicolas A.; Tanaka, Yuetsu; Taylor, Graham P.; Bangham, Charles R. M.

2013-01-01

The regulation of proviral latency is a central problem in retrovirology. We postulate that the genomic integration site of human T lymphotropic virus type 1 (HTLV-1) determines the pattern of expression of the provirus, which in turn determines the abundance and pathogenic potential of infected T cell clones in vivo. We recently developed a high-throughput method for the genome-wide amplification, identification and quantification of proviral integration sites. Here, we used this protocol to test two hypotheses. First, that binding sites for transcription factors and chromatin remodelling factors in the genome flanking the proviral integration site of HTLV-1 are associated with integration targeting, spontaneous proviral expression, and in vivo clonal abundance. Second, that the transcriptional orientation of the HTLV-1 provirus relative to that of the nearest host gene determines spontaneous proviral expression and in vivo clonal abundance. Integration targeting was strongly associated with the presence of a binding site for specific host transcription factors, especially STAT1 and p53. The presence of the chromatin remodelling factors BRG1 and INI1 and certain host transcription factors either upstream or downstream of the provirus was associated respectively with silencing or spontaneous expression of the provirus. Cells expressing HTLV-1 Tax protein were significantly more frequent in clones of low abundance in vivo. We conclude that transcriptional interference and chromatin remodelling are critical determinants of proviral latency in natural HTLV-1 infection. PMID:23555266
The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes.

PubMed

Bohlin, Jon; Eldholm, Vegard; Pettersson, John H O; Brynildsrud, Ola; Snipen, Lars

2017-02-10

The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes are most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes signatures of selection or selectively neutral processes.
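The two composition measures named in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: relative entropy is taken here to be the Kullback-Leibler divergence (in bits) between observed trinucleotide frequencies and the frequencies expected from the product of mononucleotide frequencies, trinucleotides are counted in overlapping windows, and the function names `gc_content` and `relative_entropy` are hypothetical.

```python
from collections import Counter
from math import log2


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_entropy(seq):
    """KL divergence (bits) of observed trinucleotide frequencies from those
    expected under independent mononucleotide frequencies.

    Assumption: overlapping trinucleotide windows; the abstract does not say
    whether windows or in-frame codons were used.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    tri = Counter(seq[i:i + 3] for i in range(n - 2))
    total = sum(tri.values())
    divergence = 0.0
    for kmer, count in tri.items():
        observed = count / total
        expected = 1.0
        for base in kmer:
            expected *= mono[base] / n
        divergence += observed * log2(observed / expected)
    return divergence


if __name__ == "__main__":
    toy = "ACGTACGTACGT"  # toy sequence, not real data
    print(gc_content(toy), relative_entropy(toy))
```

A sequence whose trinucleotides are fully predicted by its base composition (for example a homopolymer) scores zero; structured sequences score higher.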

Can we use genetic and genomic approaches to identify candidate animals for targeted selective treatment.

PubMed

Laurenson, Yan C S M; Kyriazakis, Ilias; Bishop, Stephen C

2013-10-18

Estimated breeding values (EBV) for faecal egg count (FEC) and genetic markers for host resistance to nematodes may be used to identify resistant animals for selective breeding programmes. Similarly, targeted selective treatment (TST) requires the ability to identify the animals that will benefit most from anthelmintic treatment. A mathematical model was used to combine the concepts and evaluate the potential of using genetic-based methods to identify animals for a TST regime. EBVs obtained by genomic prediction were predicted to be the best determinant criterion for TST in terms of the impact on average empty body weight and average FEC, whereas pedigree-based EBVs for FEC were predicted to be marginally worse than using phenotypic FEC as a determinant criterion. Whilst each method has financial implications, if the identification of host resistance is incorporated into wider genomic selection indices or selective breeding programmes, then genetic or genomic information may be plausibly included in TST regimes. Copyright © 2013 Elsevier B.V. All rights reserved.
An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region.

PubMed Central

Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S

1999-01-01

A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species. "Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." (Milne, 1926) PMID:10471707
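The densities and percentages quoted in the abstract above can be checked with simple arithmetic. The sketch below assumes a region length of 2,900 kb (the title gives 2.9 Mb, the abstract "nearly 3 Mb") and assumes that both percentages use the 218 predicted protein-coding genes as the denominator; neither assumption is stated explicitly in the record.

```python
# Counts quoted in the abstract.
PREDICTED_GENES = 218
TRANSPOSABLE_ELEMENTS = 17
EST_MATCHES = 95
PROTEIN_SIMILARITIES = 144

# Assumption: region length taken from the title (2.9 Mb).
REGION_KB = 2900

kb_per_gene = REGION_KB / PREDICTED_GENES
kb_per_te = REGION_KB / TRANSPOSABLE_ELEMENTS
est_pct = 100 * EST_MATCHES / PREDICTED_GENES
similar_pct = 100 * PROTEIN_SIMILARITIES / PREDICTED_GENES

print(f"one gene every {kb_per_gene:.1f} kb")
print(f"one transposable element every {kb_per_te:.1f} kb")
print(f"EST matches: {est_pct:.1f}%  protein similarities: {similar_pct:.1f}%")
```

Under these assumptions the rounded values agree with the reported figures of 13 kb, 171 kb, 44% and 66%.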
Expanding probe repertoire and improving reproducibility in human genomic hybridization

PubMed Central

Dorman, Stephanie N.; Shirley, Ben C.; Knoll, Joan H. M.; Rogan, Peter K.

2013-01-01

Diagnostic DNA hybridization relies on probes composed of single copy (sc) genomic sequences. Sc sequences in probe design ensure high specificity and avoid cross-hybridization to other regions of the genome, which could lead to ambiguous results that are difficult to interpret. We examine how the distribution and composition of repetitive sequences in the genome affect sc probe performance. A divide and conquer algorithm was implemented to design sc probes. With this approach, sc probes can include divergent repetitive elements, which hybridize to unique genomic targets under higher stringency experimental conditions. Genome-wide custom probe sets were created for fluorescent in situ hybridization (FISH) and microarray genomic hybridization. The scFISH probes were developed for detection of copy number changes within small tumour suppressor genes and oncogenes. The microarrays demonstrated increased reproducibility by eliminating cross-hybridization to repetitive sequences adjacent to probe targets. The genome-wide microarrays exhibited lower median coefficients of variation (17.8%) for two HapMap family trios. The coefficients of variation of commercial probes within 300 nt of a repetitive element were 48.3% higher than the nearest custom probe. Furthermore, the custom microarray called a chromosome 15q11.2q13 deletion more consistently. This method for sc probe design increases probe coverage for FISH and lowers variability in genomic microarrays. PMID:23376933
Phenome-genome association studies of pancreatic cancer: new targets for therapy and diagnosis.

PubMed

Narayanan, Ramaswamy

2015-01-01

Pancreatic cancer has a very high mortality rate and requires novel molecular targets for diagnosis and therapy. Genetic association studies over databases offer an attractive starting point for gene discovery. The National Center for Biotechnology Information (NCBI) Phenome Genome Integrator (PheGenI) tool was enriched for pancreatic cancer-associated traits. The genes associated with the trait were characterized using diverse bioinformatics tools for Genome-Wide Association (GWA), transcriptome and proteome profile and protein classes for motif and domain. Two hundred twenty-six genes were identified that had a genetic association with pancreatic cancer in the human genome. This included 25 uncharacterized open reading frames (ORFs). Bioinformatics analysis of these ORFs identified putative druggable proteins and biomarkers including enzymes, transporters and G-protein-coupled receptor signaling proteins. Secreted proteins including a neuroendocrine factor and a chemokine were identified. Five of these ORFs encompassed non-coding RNAs. The ORF protein expression was detected in numerous body fluids, such as ascites, bile, pancreatic juice, milk, plasma, serum and saliva. Transcriptome and proteome analyses showed a correlation of mRNA and protein expression for nine ORFs. Analysis of the Catalogue of Somatic Mutations in Cancer (COSMIC) database revealed a strong correlation across copy number variations and mRNA over-expression for four ORFs. Mining of the International Cancer Gene Consortium (ICGC) database identified somatic mutations in a significant number of pancreatic patients' tumors for most of these ORFs. The pancreatic cancer-associated ORFs were also found to be genetically associated with other neoplasms, including leukemia, malignant melanoma, neuroblastoma and prostate carcinomas, as well as other unrelated diseases and disorders, such as Alzheimer's disease, Crohn's disease, coronary diseases, attention deficit disorder and addiction. Based
Single Molecule Analysis of Replicated DNA Reveals the Usage of Multiple KSHV Genome Regions for Latent Replication

PubMed Central

Verma, Subhash C.; Lu, Jie; Cai, Qiliang; Kosiyatrakul, Settapong; McDowell, Maria E.; Schildkraut, Carl L.; Robertson, Erle S.

2011-01-01

Kaposi's sarcoma associated herpesvirus (KSHV), an etiologic agent of Kaposi's sarcoma, Body Cavity Based Lymphoma and Multicentric Castleman's Disease, establishes lifelong latency in infected cells. The KSHV genome tethers to the host chromosome with the help of a latency associated nuclear antigen (LANA). Additionally, LANA supports replication of the latent origins within the terminal repeats by recruiting cellular factors. Our previous studies identified and characterized another latent origin, which supported the replication of plasmids ex-vivo without LANA expression in trans. Therefore identification of an additional origin site prompted us to analyze the entire KSHV genome for replication initiation sites using single molecule analysis of replicated DNA (SMARD). Our results showed that replication of DNA can initiate throughout the KSHV genome and the usage of these regions is not conserved in two different KSHV strains investigated. SMARD also showed that the utilization of multiple replication initiation sites occurs across large regions of the genome rather than a specified sequence. The replication origin of the terminal repeats showed only a slight preference for their usage indicating that LANA dependent origin at the terminal repeats (TR) plays only a limited role in genome duplication. Furthermore, we performed chromatin immunoprecipitation for ORC2 and MCM3, which are part of the pre-replication initiation complex to determine the genomic sites where these proteins accumulate, to provide further characterization of potential replication initiation sites on the KSHV genome. The ChIP data confirmed accumulation of these pre-RC proteins at multiple genomic sites in a cell cycle dependent manner. Our data also show that both the frequency and the sites of replication initiation vary within the two KSHV genomes studied here, suggesting that initiation of replication is likely to be affected by the genomic context rather than the DNA sequences. PMID
Genomic Mapping of Human DNA provides Evidence of Difference in Stretch between AT and GC rich regions

NASA Astrophysics Data System (ADS)

Reifenberger, Jeffrey; Dorfman, Kevin; Cao, Han

Human DNA is not a polymer consisting of a uniform distribution of all 4 nucleotides, but rather contains regions of high AT and high GC content. When confined, these regions could have different stretch due to the extra hydrogen bond present in the GC base pair. To measure this potential difference, human genomic DNA was nicked with NtBspQI, labeled with a Cy3-like fluorophore at the nick site, stained with YOYO, loaded into a device containing an array of nanochannels, and imaged. Over 473,000 individual molecules of DNA, corresponding to roughly 30x coverage of a human genome, were collected and aligned to the human reference. Based on the known AT/GC content between aligned pairs of labels, the stretch was measured for regions of similar size but different AT/GC content. We found that regions of high GC content were consistently more stretched than regions of high AT content between pairs of labels varying in size between 2.5 kbp and 500 kbp. We measured that for every 1% increase in GC content there was roughly a 0.06% increase in stretch. While this effect is small, it is important to take into account differences in stretch between AT and GC rich regions to improve the sensitivity of detection of structural variations from genomic variations. NIH Grant: R01-HG006851.
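To give a sense of scale for the reported slope, the sketch below applies it as a simple linear model. This is an illustration only: it assumes the "1% increase in GC content" means one percentage point, that the relationship is linear across the whole range, and the function names and the 40% and 60% GC values are hypothetical.

```python
# Reported slope: roughly 0.06% more stretch per 1% increase in GC content.
# Assumption: "1%" is read as one percentage point of GC content.
STRETCH_PCT_PER_GC_POINT = 0.06


def relative_stretch_change(gc_low_pct, gc_high_pct):
    """Percent increase in stretch expected when moving from the lower-GC
    interval to the higher-GC interval, under a linear model."""
    return STRETCH_PCT_PER_GC_POINT * (gc_high_pct - gc_low_pct)


def apparent_length_ratio(gc_low_pct, gc_high_pct):
    """Ratio of measured lengths for two intervals of equal base-pair size."""
    return 1 + relative_stretch_change(gc_low_pct, gc_high_pct) / 100


if __name__ == "__main__":
    # Hypothetical pair of label intervals at 40% and 60% GC.
    print(relative_stretch_change(40, 60))
    print(apparent_length_ratio(40, 60))
```

Under these assumptions, a 20-point difference in GC content corresponds to about a 1.2% difference in measured length, which is small but comparable to the size of some structural variants relative to the interval between labels.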
Unlocking hidden genomic sequence

PubMed Central

Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

2004-01-01

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330
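The claim that random mutation distributes, rather than destroys, the original sequence information can be illustrated with a toy simulation. This is not the authors' reconstruction method, which the abstract does not describe; it swaps in a plain per-position majority vote and assumes substitutions only, variants that are already aligned, and an 18% per-base substitution rate. The target sequence and all names are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of seq with each base substituted with probability rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(variants):
    """Per-position majority vote over equal-length, pre-aligned variants."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*variants))


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical target containing poly(A), GC-rich and AT-rich stretches.
target = "A" * 20 + "GC" * 10 + "AT" * 10
variants = [mutate(target, 0.18, rng) for _ in range(40)]
recovered = consensus(variants)

if __name__ == "__main__":
    print(recovered == target)
```

Each variant differs from the target at many positions, yet the pooled variants still carry enough information to recover it, because errors at any one position are spread across different variants.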
Analysis of new isolates reveals new genome organization and a hypervariable region in infectious myonecrosis virus (IMNV).

PubMed

Dantas, Márcia Danielle A; Chavante, Suely F; Teixeira, Dárlio Inácio A; Lima, João Paulo M S; Lanza, Daniel C F

2015-05-04

Infectious myonecrosis virus (IMNV) has been the cause of many losses in shrimp farming since 2002, when the first myonecrosis outbreak was reported on Brazil's northeast coast. Two additional genomes of Brazilian IMNV isolates collected in 2009 and 2013 were sequenced and analyzed in the present study. The sequencing revealed an extra 643 bp and 22 bp at the 5' and 3' ends of the IMNV genome, respectively, confirming that its actual size is at least 8226 bp. Considering these additional sequences at the genome extremities, ORF1 can start at nt 470, encoding a 1708 aa polyprotein. Computational predictions reveal two stem loops and two pseudoknots in the 5' end and a putative stem loop and a slippery motif located at the 3' end, indicating that these regions can be involved in the start and termination of translation. Through a careful phylogenetic analysis, a higher genetic variability could be observed among Brazilian isolates than among Indonesian IMNV isolates. It was also observed that the most variable region of the IMNV genome is located in the first half of ORF1, coinciding with a region which probably encodes the capsid protrusions. The results presented here are a starting point to elucidate the virus's translational regulation and the mechanisms involved in virulence. Copyright © 2015 Elsevier B.V. All rights reserved.
A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

PubMed

Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

2006-04-01

Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.
Comparative analysis of the 5' genomic and promoter regions between the mouse (Hdh) and human Huntington disease (HD) gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kalchman, M.; Lin, B.; Nasir, J.

1994-09-01

The mouse homologue of the Huntington disease gene (Hdh) has recently been cloned and mapped to a region of synteny with the human, on mouse chromosome 5. The two genes share a high degree of both coding (90% amino acid) and nucleotide (86.2%) identity. We have subsequently performed a detailed comparison of the genomic organization of the 5' region of the two genes encompassing the promoter region and first five exons of both the human and mouse genes. The comparative sequence analysis of the promoter region between HD and Hdh reveals two highly conserved regions. One region (-56 to -118) (+1 is the ATG start codon) shared 84% nucleotide identity and another region (-130 to -206) had 81% nucleotide identity. Nine putative Sp1 sites appear in the human promoter region contrasted with only 3 in a similar region in the mouse. Furthermore, 17 and 20 base pair direct repeats present in the HD 5' region are absent in the similar Hdh region. Although both the mouse and human intron/exon boundaries conform to the GT/AG rule, the intron sizes between HD and Hdh are markedly different. The first four introns in Hdh are 15, 7, 5 and 0.5 kb compared to sizes of 10, 15, 7 and 0.5 kb, respectively, in HD. Comparison between the mouse and human intronic sequences immediately adjacent to the first five exons (excluding exon 1) reveals only about 46 to 50% identity within the first 60 bp of intronic sequence. Furthermore, we have identified novel polymorphic di-, tri- and tetra-nucleotide repeats in Hdh introns of various mouse strains that are not present in the human. For example, polymorphic CT repeats are present in introns 2 and 4 of Hdh and a novel mouse 56 AAG trinucleotide repeat (interrupted by an AAGG) is also located within intron 2. This information concerning the promoter and genomic organization of both HD and Hdh is critical for designing appropriate gene targeting vectors for studying the normal function of the HD and Hdh genes in model systems.
AID targeting: old mysteries and new challenges.

PubMed

Chandra, Vivek; Bortnick, Alexandra; Murre, Cornelis

2015-09-01

Activation-induced cytidine deaminase (AID) mediates cytosine deamination and underlies two central processes in antibody diversification: somatic hypermutation and class-switch recombination. AID deamination is not exclusive to immunoglobulin loci; it can instigate DNA lesions in non-immunoglobulin genes and thus stringent checks are in place to constrain and restrict its activity. Recent findings have provided new insights into the mechanisms that target AID activity to specific genomic regions, revealing an involvement for noncoding RNAs associated with polymerase pausing and with enhancer transcription as well as genomic architecture. We review these findings and integrate them into a model for multilevel regulation of AID expression and targeting in immunoglobulin and non-immunoglobulin loci. Within this framework we discuss gaps in understanding, and outline important areas of further research. Copyright © 2015 Elsevier Ltd. All rights reserved.
AID Targeting: Old Mysteries and New Challenges

PubMed Central

Chandra, Vivek; Bortnick, Alexandra; Murre, Cornelis

2015-01-01

Activation-induced cytidine deaminase (AID) mediates cytosine deamination and underlies two central processes in antibody diversification: somatic hypermutation and class-switch recombination. AID deamination is not exclusive to immunoglobulin loci; it can instigate DNA lesions in non-immunoglobulin genes and thus, stringent checks are in place to constrain and restrict its activity. Recent findings have provided new insights into the mechanisms that target AID activity to specific genomic regions, revealing an involvement for non-coding RNAs associated with polymerase pausing and with enhancer transcription as well as genomic architecture. We review these findings and integrate them into a model for multi-level regulation of AID expression and targeting in immunoglobulin and non-immunoglobulin loci. Within this framework we discuss gaps in understanding, and outline important areas of further research. PMID:26254147
Use of randomly mutagenized genomic cDNA banks of potato spindle tuber viroid to screen for viable versions of the viroid genome.

PubMed

Więsyk, Aneta; Candresse, Thierry; Zagórski, Włodzimierz; Góra-Sochacka, Anna

2011-02-01

In an effort to study sequence space allowing the recovery of viable potato spindle tuber viroid (PSTVd) variants we have developed an in vivo selection (Selex) method to produce and bulk-inoculate by agroinfiltration large PSTVd cDNA banks in which a short stretch of the genome is mutagenized to saturation. This technique was applied to two highly conserved 6 nt-long regions of the PSTVd genome, the left terminal loop (TL bank) and part of the polypurine stretch in the upper strand of pre-melting loop 1 (PM1 bank). In each case, PSTVd accumulation was observed in a large fraction of bank-inoculated tomato plants. Characterization of the progeny molecules showed the recovery of the parental PSTVd sequence in 89 % (TL bank) and 18 % (PM1 bank) of the analysed plants. In addition, viable and genetically stable PSTVd variants with mutations outside of the known natural variability of PSTVd were recovered in both cases, although at different rates. In the case of the TL region, mutations were recovered at five of the six mutagenized positions (357, 358, 359, 1 and 3 of the genome) while for the PM1 region mutations were recovered at all six targeted positions (50-55), providing significant new insight on the plasticity of the PSTVd genome.
Genome-wide association study using high-density single nucleotide polymorphism arrays and whole-genome sequences for clinical mastitis traits in dairy cattle.

PubMed

Sahana, G; Guldbrandtsen, B; Thomsen, B; Holm, L-E; Panitz, F; Brøndum, R F; Bendixen, C; Lund, M S

2014-11-01

Mastitis is a mammary disease that frequently affects dairy cattle. Despite considerable research on the development of effective prevention and treatment strategies, mastitis continues to be a significant issue in bovine veterinary medicine. To identify major genes that affect mastitis in dairy cattle, 6 chromosomal regions on Bos taurus autosome (BTA) 6, 13, 16, 19, and 20 were selected from a genome scan for 9 mastitis phenotypes using imputed high-density single nucleotide polymorphism arrays. Association analyses using sequence-level variants for the 6 targeted regions were carried out to map causal variants using whole-genome sequence data from 3 breeds. The quantitative trait loci (QTL) discovery population comprised 4,992 progeny-tested Holstein bulls, and QTL were confirmed in 4,442 Nordic Red and 1,126 Jersey cattle. The targeted regions were imputed to the sequence level. The highest association signal for clinical mastitis was observed on BTA 6 at 88.97 Mb in Holstein cattle and was confirmed in Nordic Red cattle. The peak association region on BTA 6 contained 2 genes: vitamin D-binding protein precursor (GC) and neuropeptide FF receptor 2 (NPFFR2), which, based on known biological functions, are good candidates for affecting mastitis. However, strong linkage disequilibrium in this region prevented conclusive determination of the causal gene. A different QTL on BTA 6 located at 88.32 Mb in Holstein cattle affected mastitis. In addition, QTL on BTA 13 and 19 were confirmed to segregate in Nordic Red cattle and QTL on BTA 16 and 20 were confirmed in Jersey cattle. Although several candidate genes were identified in these targeted regions, it was not possible to identify a gene or polymorphism as the causal factor for any of these regions. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Whole genome association study identifies regions of the bovine genome and biological pathways involved in carcass trait performance in Holstein-Friesian cattle.

PubMed

Doran, Anthony G; Berry, Donagh P; Creevey, Christopher J

2014-10-01

Four traits related to carcass performance have been identified as economically important in beef production: carcass weight, carcass fat, carcass conformation of progeny and cull cow carcass weight. Although Holstein-Friesian cattle are primarily utilized for milk production, they are also an important source of meat for beef production and export. Because of this, there is great interest in understanding the underlying genomic structure influencing these traits. Several genome-wide association studies have identified regions of the bovine genome associated with growth or carcass traits; however, little is known about the mechanisms or underlying biological pathways involved. This study aims to detect regions of the bovine genome associated with carcass performance traits (employing a panel of 54,001 SNPs) using measures of genetic merit (as predicted transmitting abilities) for 5,705 Irish Holstein-Friesian animals. Candidate genes and biological pathways were then identified for each trait under investigation. Following adjustment for false discovery (q-value < 0.05), 479 quantitative trait loci (QTL) were associated with at least one of the four carcass traits using a single SNP regression approach. Using a Bayesian approach, 46 QTL were associated (posterior probability > 0.5) with at least one of the four traits. In total, 557 unique bovine genes, which mapped to 426 human orthologs, were within 500 kb of QTL found associated with a trait using the Bayesian approach. Using this information, 24 significantly over-represented pathways were identified across all traits. The most significantly over-represented biological pathway was the peroxisome proliferator-activated receptor (PPAR) signaling pathway. A large number of genomic regions putatively associated with bovine carcass traits were detected using two different statistical approaches. Notably, several significant associations were detected in close proximity to genes with a known role in animal growth
Implementing Genome-Driven Oncology

PubMed Central

Hyman, David M.; Taylor, Barry S.; Baselga, José

2017-01-01

Early successes in identifying and targeting individual oncogenic drivers, together with the increasing feasibility of sequencing tumor genomes, have brought forth the promise of genome-driven oncology care. As we expand the breadth and depth of genomic analyses, the biological and clinical complexity of its implementation will be unparalleled. Challenges include target credentialing and validation, implementing drug combinations, clinical trial designs, targeting tumor heterogeneity, and deploying technologies beyond DNA sequencing, among others. We review how contemporary approaches are tackling these challenges and will ultimately serve as an engine for biological discovery and increase our insight into cancer and its treatment. PMID:28187282
Advances in plant gene-targeted and functional markers: a review

PubMed Central

2013-01-01

Public genomic databases have provided new directions for molecular marker development and initiated a shift in the types of PCR-based techniques commonly used in plant science. Alongside commonly used arbitrarily amplified DNA markers, other methods have been developed. Targeted fingerprinting marker techniques are based on the well-established practices of arbitrarily amplified DNA methods, but employ novel methodological innovations such as the incorporation of gene or promoter elements in the primers. These markers provide good reproducibility and increased resolution by the concurrent incidence of dominant and co-dominant bands. Despite their promising features, these semi-random markers suffer from possible problems of collision and non-homology analogous to those found with randomly generated fingerprints. Transposable elements, present in abundance in plant genomes, may also be used to generate fingerprints. These markers provide increased genomic coverage by utilizing specific targeted sites and produce bands that mostly seem to be homologous. The biggest drawback with most of these techniques is that prior genomic information about retrotransposons is needed for primer design, prohibiting universal applications. Another class of recently developed methods exploits length polymorphism present in arrays of multi-copy gene families such as cytochrome P450 and β-tubulin genes to provide cross-species amplification and transferability. A specific class of marker makes use of common features of plant resistance genes to generate bands linked to a given phenotype, or to reveal genetic diversity. Conserved DNA-based strategies have limited genome coverage and may fail to reveal genetic diversity, while resistance genes may be under specific evolutionary selection. Markers may also be generated from functional and/or transcribed regions of the genome using different gene-targeting approaches coupled with the use of RNA information. Such techniques have the
QTL mapping of genome regions controlling temephos resistance in larvae of the mosquito Aedes aegypti.

PubMed

Reyes-Solis, Guadalupe Del Carmen; Saavedra-Rodriguez, Karla; Suarez, Adriana Flores; Black, William C

2014-10-01

The mosquito Aedes aegypti is the principal vector of dengue and yellow fever flaviviruses. Temephos is an organophosphate insecticide used globally to suppress Ae. aegypti larval populations but resistance has evolved in many locations. Quantitative Trait Loci (QTL) controlling temephos survival in Ae. aegypti larvae were mapped in a pair of F3 advanced intercross lines arising from temephos resistant parents from Solidaridad, México and temephos susceptible parents from Iquitos, Peru. Two sets of 200 F3 larvae were exposed to a discriminating dose of temephos and then dead larvae were collected and preserved for DNA isolation every two hours up to 16 hours. Larvae surviving longer than 16 hours were considered resistant. For QTL mapping, single nucleotide polymorphisms (SNPs) were identified at 23 single copy genes and 26 microsatellite loci of known physical positions in the Ae. aegypti genome. In both reciprocal crosses, Multiple Interval Mapping identified eleven QTL associated with time until death. In the Solidaridad×Iquitos (SLD×Iq) cross twelve were associated with survival but in the reciprocal Iq×SLD cross, only six QTL were survival associated. Polymorphisms at acetylcholine esterase (AchE) loci 1 and 2 were not associated with either resistance phenotype suggesting that target site insensitivity is not an organophosphate resistance mechanism in this region of México. Temephos resistance is under the control of many metabolic genes of small effect and dispersed throughout the Ae. aegypti genome.
QTL Mapping of Genome Regions Controlling Temephos Resistance in Larvae of the Mosquito Aedes aegypti

PubMed Central

Reyes-Solis, Guadalupe del Carmen; Saavedra-Rodriguez, Karla; Suarez, Adriana Flores; Black, William C.

2014-01-01

Introduction: The mosquito Aedes aegypti is the principal vector of dengue and yellow fever flaviviruses. Temephos is an organophosphate insecticide used globally to suppress Ae. aegypti larval populations, but resistance has evolved in many locations. Methodology/Principal Findings: Quantitative Trait Loci (QTL) controlling temephos survival in Ae. aegypti larvae were mapped in a pair of F3 advanced intercross lines arising from temephos-resistant parents from Solidaridad, México and temephos-susceptible parents from Iquitos, Peru. Two sets of 200 F3 larvae were exposed to a discriminating dose of temephos, and dead larvae were then collected and preserved for DNA isolation every two hours up to 16 hours. Larvae surviving longer than 16 hours were considered resistant. For QTL mapping, single nucleotide polymorphisms (SNPs) were identified at 23 single copy genes and 26 microsatellite loci of known physical positions in the Ae. aegypti genome. In both reciprocal crosses, Multiple Interval Mapping identified eleven QTL associated with time until death. In the Solidaridad×Iquitos (SLD×Iq) cross, twelve QTL were associated with survival, but in the reciprocal Iq×SLD cross only six QTL were survival associated. Polymorphisms at acetylcholine esterase (AchE) loci 1 and 2 were not associated with either resistance phenotype, suggesting that target site insensitivity is not an organophosphate resistance mechanism in this region of México. Conclusions/Significance: Temephos resistance is under the control of many metabolic genes of small effect that are dispersed throughout the Ae. aegypti genome. PMID:25330200
Segmental Duplications and Copy-Number Variation in the Human Genome

PubMed Central

Sharp, Andrew J. ; Locke, Devin P. ; McGrath, Sean D. ; Cheng, Ze ; Bailey, Jeffrey A. ; Vallente, Rhea U. ; Pertz, Lisa M. ; Clark, Royden A. ; Schwartz, Stuart ; Segraves, Rick ; Oseroff, Vanessa V. ; Albertson, Donna G. ; Pinkel, Daniel ; Eichler, Evan E.

2005-01-01

The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, an assessment of the role of segmental duplications in normal variation has not yet been made. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and we identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions, compared with control BACs (P < .000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched >4-fold within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that these either are recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that the consideration of genomic architecture can significantly improve the ascertainment of large-scale rearrangements. Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic

Assembling the Setaria italica L. Beauv. genome into nine chromosomes and insights into regions affecting growth and drought tolerance

PubMed Central

Tsai, Kevin J.; Lu, Mei-Yeh Jade; Yang, Kai-Jung; Li, Mengyun; Teng, Yuchuan; Chen, Shihmay; Ku, Maurice S. B.; Li, Wen-Hsiung

2016-01-01

The diploid C4 plant foxtail millet (Setaria italica L. Beauv.) is an important crop in many parts of Africa and Asia for the vast consumption of its grain and ability to grow in harsh environments, but remains understudied in terms of complete genomic architecture. To date, there have been only two genome assembly and annotation efforts with neither assembly reaching over 86% of the estimated genome size. We have combined de novo assembly with custom reference-guided improvements on a popular cultivar of foxtail millet and have achieved a genome assembly of 477 Mbp in length, which represents over 97% of the estimated 490 Mbp. The assembly anchors over 98% of the predicted genes to the nine assembled nuclear chromosomes and contains more functional annotation gene models than previous assemblies. Our annotation has identified a large number of unique gene ontology terms related to metabolic activities, a region of chromosome 9 with several growth factor proteins, and regions syntenic with pearl millet or maize genomic regions that have been previously shown to affect growth. The new assembly and annotation for this important species can be used for detailed investigation and future innovations in growth for millet and other grains. PMID:27734962
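The completeness figure quoted in this abstract can be checked with a short calculation. The sketch below uses only the two sizes stated above (477 Mbp assembled, 490 Mbp estimated); the function and constant names are illustrative and do not come from the paper.

```python
# Sizes quoted in the abstract, in Mbp.
ASSEMBLY_MBP = 477
ESTIMATED_GENOME_MBP = 490


def assembly_completeness(assembled_mbp: float, estimated_mbp: float) -> float:
    """Return the assembled fraction of the estimated genome size."""
    if estimated_mbp <= 0:
        raise ValueError("estimated genome size must be positive")
    return assembled_mbp / estimated_mbp


fraction = assembly_completeness(ASSEMBLY_MBP, ESTIMATED_GENOME_MBP)
print(f"{fraction:.1%}")  # prints 97.3%
```

The result agrees with the "over 97%" stated in the abstract and is above the 86% ceiling reported for the two earlier assemblies.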
Assembling the Setaria italica L. Beauv. genome into nine chromosomes and insights into regions affecting growth and drought tolerance.

PubMed

Tsai, Kevin J; Lu, Mei-Yeh Jade; Yang, Kai-Jung; Li, Mengyun; Teng, Yuchuan; Chen, Shihmay; Ku, Maurice S B; Li, Wen-Hsiung

2016-10-13

The diploid C4 plant foxtail millet (Setaria italica L. Beauv.) is an important crop in many parts of Africa and Asia for the vast consumption of its grain and ability to grow in harsh environments, but remains understudied in terms of complete genomic architecture. To date, there have been only two genome assembly and annotation efforts with neither assembly reaching over 86% of the estimated genome size. We have combined de novo assembly with custom reference-guided improvements on a popular cultivar of foxtail millet and have achieved a genome assembly of 477 Mbp in length, which represents over 97% of the estimated 490 Mbp. The assembly anchors over 98% of the predicted genes to the nine assembled nuclear chromosomes and contains more functional annotation gene models than previous assemblies. Our annotation has identified a large number of unique gene ontology terms related to metabolic activities, a region of chromosome 9 with several growth factor proteins, and regions syntenic with pearl millet or maize genomic regions that have been previously shown to affect growth. The new assembly and annotation for this important species can be used for detailed investigation and future innovations in growth for millet and other grains.
Epigenetics, chromatin and genome organization: recent advances from the ENCODE project.

PubMed

Siggens, L; Ekwall, K

2014-09-01

The organization of the genome into functional units, such as enhancers and active or repressed promoters, is associated with distinct patterns of DNA and histone modifications. The Encyclopedia of DNA Elements (ENCODE) project has advanced our understanding of the principles of genome, epigenome and chromatin organization, identifying hundreds of thousands of potential regulatory regions and transcription factor binding sites. Part of the ENCODE consortium, GENCODE, has annotated the human genome with novel transcripts including new noncoding RNAs and pseudogenes, highlighting transcriptional complexity. Many disease variants identified in genome-wide association studies are located within putative enhancer regions defined by the ENCODE project. Understanding the principles of chromatin and epigenome organization will help to identify new disease mechanisms, biomarkers and drug targets, particularly as ongoing epigenome mapping projects generate data for primary human cell types that play important roles in disease. © 2014 The Association for the Publication of the Journal of Internal Medicine.
Targeting the human genome-microbiome axis for drug discovery: inspirations from global systems biology and traditional Chinese medicine.

PubMed

Zhao, Liping; Nicholson, Jeremy K; Lu, Aiping; Wang, Zhengtao; Tang, Huiru; Holmes, Elaine; Shen, Jian; Zhang, Xu; Li, Jia V; Lindon, John C

2012-07-06

Most chronic diseases impairing current human public health involve not only the human genome but also gene-environment interactions, and in the latter case the gut microbiome is an important factor. This makes the classical single drug-receptor target drug discovery paradigm much less applicable. There is widespread and increasing international interest in understanding the properties of traditional Chinese medicines (TCMs) for their potential utilization as a source of new drugs for Western markets as emerging evidence indicates that most TCM drugs are actually targeting both the host and its symbiotic microbes. In this review, we explore the challenges of and opportunities for harmonizing Eastern-Western drug discovery paradigms by focusing on emergent functions at the whole body level of humans as superorganisms. This could lead to new drug candidate compounds for chronic diseases targeting receptors outside the currently accepted "druggable genome" and shed light on current high interest issues in Western medicine such as drug-drug and drug-diet-gut microbial interactions that will be crucial in the development and delivery of future therapeutic regimes optimized for the individual patient.
Intrinsically disordered region of influenza A NP regulates viral genome packaging via interactions with viral RNA and host PI(4,5)P2.

PubMed

Kakisaka, Michinori; Yamada, Kazunori; Yamaji-Hasegawa, Akiko; Kobayashi, Toshihide; Aida, Yoko

2016-09-01

To be incorporated into progeny virions, the viral genome must be transported to the inner leaflet of the plasma membrane (PM) and accumulate there. Some viruses utilize lipid components to assemble at the PM. For example, simian virus 40 (SV40) targets the ganglioside GM1 and human immunodeficiency virus type 1 (HIV-1) utilizes phosphatidylinositol (4,5) bisphosphate [PI(4,5)P2]. Recent studies clearly indicate that Rab11-mediated recycling endosomes are required for influenza A virus (IAV) trafficking of vRNPs to the PM, but it remains unclear how IAV vRNPs localize or accumulate underneath the PM for viral genome incorporation into progeny virions. In this study, we found that the second intrinsically disordered region (IDR2) of NP regulates two binding steps involved in viral genome packaging. First, IDR2 facilitates NP oligomer binding to viral RNA to form vRNP. Second, vRNPs assemble by interacting with PI(4,5)P2 at the PM via IDR2. These findings suggest that PI(4,5)P2 functions as the determinant of vRNP accumulation at the PM. Copyright © 2016 Elsevier Inc. All rights reserved.
Convergent Transcription At Intragenic Super-Enhancers Targets AID-initiated Genomic Instability

PubMed Central

Meng, Fei-Long; Du, Zhou; Federation, Alexander; Hu, Jiazhi; Wang, Qiao; Kieffer-Kwon, Kyong-Rim; Meyers, Robin M.; Amor, Corina; Wasserman, Caitlyn R.; Neuberg, Donna; Casellas, Rafael; Nussenzweig, Michel C.; Bradner, James E.; Liu, X. Shirley; Alt, Frederick W.

2015-01-01

Summary Activation-induced cytidine deaminase (AID) initiates both somatic hypermutation (SHM) for antibody affinity maturation and DNA breakage for antibody class switch recombination (CSR) via transcription-dependent cytidine deamination of single stranded DNA targets. While largely specific for immunoglobulin genes, AID also acts on a limited set of off-targets, generating oncogenic translocations and mutations that contribute to B cell lymphoma. How AID is recruited to off-targets has been a long-standing mystery. Based on deep GRO-Seq studies of mouse and human B lineage cells activated for CSR or SHM, we report that most robust AID off-target translocations occur within highly focal regions of target genes in which sense and antisense transcription converge. Moreover, we found that such AID-targeting “convergent” transcription arises from antisense transcription that emanates from Super-Enhancers within sense transcribed gene bodies. Our findings provide an explanation for AID off-targeting to a small subset of mostly lineage-specific genes in activated B cells. PMID:25483776
Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization.

PubMed

Nora, Elphège P; Goloborodko, Anton; Valton, Anne-Laure; Gibcus, Johan H; Uebersohn, Alec; Abdennur, Nezar; Dekker, Job; Mirny, Leonid A; Bruneau, Benoit G

2017-05-18

The molecular mechanisms underlying folding of mammalian chromosomes remain poorly understood. The transcription factor CTCF is a candidate regulator of chromosomal structure. Using the auxin-inducible degron system in mouse embryonic stem cells, we show that CTCF is absolutely and dose-dependently required for looping between CTCF target sites and insulation of topologically associating domains (TADs). Restoring CTCF reinstates proper architecture on altered chromosomes, indicating a powerful instructive function for CTCF in chromatin folding. CTCF remains essential for TAD organization in non-dividing cells. Surprisingly, active and inactive genome compartments remain properly segregated upon CTCF depletion, revealing that compartmentalization of mammalian chromosomes emerges independently of proper insulation of TADs. Furthermore, our data support that CTCF mediates transcriptional insulator function through enhancer blocking but not as a direct barrier to heterochromatin spreading. Beyond defining the functions of CTCF in chromosome folding, these results provide new fundamental insights into the rules governing mammalian genome organization. Copyright © 2017 Elsevier Inc. All rights reserved.
Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization

PubMed Central

Nora, Elphège P.; Goloborodko, Anton; Valton, Anne-Laure; Gibcus, Johan H.; Uebersohn, Alec; Abdennur, Nezar; Dekker, Job; Mirny, Leonid A.; Bruneau, Benoit G.

2017-01-01

Summary The molecular mechanisms underlying folding of mammalian chromosomes remain poorly understood. The transcription factor CTCF is a candidate regulator of chromosomal structure. Using the auxin-inducible degron system in mouse embryonic stem cells, we show that CTCF is absolutely and dose-dependently required for looping between CTCF target sites and insulation of topologically associating domains (TADs). Restoring CTCF reinstates proper architecture on altered chromosomes, indicating a powerful instructive function for CTCF in chromatin folding. CTCF remains essential for TAD organization in non-dividing cells. Surprisingly, active and inactive genome compartments remain properly segregated upon CTCF depletion, revealing that compartmentalization of mammalian chromosomes emerges independently of proper insulation of TADs. Further, our data support that CTCF mediates transcriptional insulator function through enhancer-blocking but not as a direct barrier to heterochromatin spreading. Beyond defining the functions of CTCF in chromosome folding these results provide new fundamental insights into the rules governing mammalian genome organization. PMID:28525758
Targeted sequencing of plant genomes

Treesearch

Mark D. Huynh

2014-01-01

Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, whole-genome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

PubMed

Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

2017-07-01

PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence single DNA molecules in real time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing difficult genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is an effective tool not only in the basic biological sciences but also in the medical/clinical setting.
Comparative transgenic analysis of enhancers from the human SHOX and mouse Shox2 genomic regions.

PubMed

Rosin, Jessica M; Abassah-Oppong, Samuel; Cobb, John

2013-08-01

Disruption of presumptive enhancers downstream of the human SHOX gene (hSHOX) is a frequent cause of the zeugopodal limb defects characteristic of Léri-Weill dyschondrosteosis (LWD). The closely related mouse Shox2 gene (mShox2) is also required for limb development, but in the more proximal stylopodium. In this study, we used transgenic mice in a comparative approach to characterize enhancer sequences in the hSHOX and mShox2 genomic regions. Among conserved noncoding elements (CNEs) that function as enhancers in vertebrate genomes, those that are maintained near paralogous genes are of particular interest given their ancient origins. Therefore, we first analyzed the regulatory potential of a genomic region containing one such duplicated CNE (dCNE) downstream of mShox2 and hSHOX. We identified a strong limb enhancer directly adjacent to the mShox2 dCNE that recapitulates the expression pattern of the endogenous gene. Interestingly, this enhancer requires sequences only conserved in the mammalian lineage in order to drive strong limb expression, whereas the more deeply conserved sequences of the dCNE function as a neural enhancer. Similarly, we found that a conserved element downstream of hSHOX (CNE9) also functions as a neural enhancer in transgenic mice. However, when the CNE9 transgenic construct was enlarged to include adjacent, non-conserved sequences frequently deleted in LWD patients, the transgene drove expression in the zeugopodium of the limbs. Therefore, both hSHOX and mShox2 limb enhancers are coupled to distinct neural enhancers. This is the first report demonstrating the activity of cis-regulatory elements from the hSHOX and mShox2 genomic regions in mammalian embryos.
Functions of the 3′ and 5′ genome RNA regions of members of the genus Flavivirus

PubMed Central

Brinton, Margo A.; Basu, Mausumi

2015-01-01

The positive sense genomes of members of the genus Flavivirus in the family Flaviviridae are ~11 kb in length and have a 5′ type I cap but no 3′ poly A. The 5′ and 3′ terminal regions contain short conserved sequences that are proposed to be repeated remnants of an ancient sequence. However, the functions of most of these conserved sequences have not yet been determined. The terminal regions of the genome also contain multiple conserved RNA structures. Functional data for many of these structures have been obtained. Three sets of complementary 3′ and 5′ terminal region sequences, some of which are located in conserved RNA structures, interact to form a panhandle structure that is required for initiation of minus strand RNA synthesis, with the 5′ terminal structure functioning as the promoter. How the switch from the terminal RNA structure base pairing to the long distance RNA-RNA interaction is triggered and regulated is not well understood, but evidence suggests involvement of a cell protein binding to three sites on the 3′ terminal RNA structures and a cis-acting metastable 3′ RNA element in the 3′ terminal structure. Cell proteins may also be involved in facilitating exponential replication of nascent genomic RNA within replication vesicles at later times in the infection cycle. Other conserved RNA structures and/or sequences in the 5′ and 3′ terminal regions have been proposed to regulate genome translation. Additional functions of the 5′ and 3′ terminal sequences have also been reported. PMID:25683510
Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia.

PubMed

Zhang, Bin; Yang, Xia; Yang, Chunping; Li, Mingyang; Guo, Yulong

2016-02-03

Recently, CRISPR/Cas9 technology has emerged as a powerful approach for targeted genome modification in eukaryotic organisms from yeast to human cell lines. Its successful application in several plant species promises enormous potential for basic and applied plant research. However, extensive studies are still needed to assess this system in other important plant species, to broaden its fields of application and to improve methods. Here we showed that the CRISPR/Cas9 system is efficient in petunia (Petunia hybrida), an important ornamental plant and a model for comparative research. When PDS was used as target gene, transgenic shoot lines with albino phenotype accounted for 55.6%-87.5% of the total regenerated T0 Basta-resistant lines. A homozygous deletion close to 1 kb in length can be readily generated and identified in the first generation. A sequential transformation strategy, introducing Cas9 and sgRNA expression cassettes sequentially into petunia, can be used to make targeted mutations with short indels or chromosomal fragment deletions. Our results present a new plant species amenable to CRISPR/Cas9 technology and provide an alternative procedure for its exploitation.
Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia

PubMed Central

Zhang, Bin; Yang, Xia; Yang, Chunping; Li, Mingyang; Guo, Yulong

2016-01-01

Recently, CRISPR/Cas9 technology has emerged as a powerful approach for targeted genome modification in eukaryotic organisms from yeast to human cell lines. Its successful application in several plant species promises enormous potential for basic and applied plant research. However, extensive studies are still needed to assess this system in other important plant species, to broaden its fields of application and to improve methods. Here we showed that the CRISPR/Cas9 system is efficient in petunia (Petunia hybrida), an important ornamental plant and a model for comparative research. When PDS was used as target gene, transgenic shoot lines with albino phenotype accounted for 55.6%–87.5% of the total regenerated T0 Basta-resistant lines. A homozygous deletion close to 1 kb in length can be readily generated and identified in the first generation. A sequential transformation strategy, introducing Cas9 and sgRNA expression cassettes sequentially into petunia, can be used to make targeted mutations with short indels or chromosomal fragment deletions. Our results present a new plant species amenable to CRISPR/Cas9 technology and provide an alternative procedure for its exploitation. PMID:26837606
Identification of potential drug targets by subtractive genome analysis of Escherichia coli O157:H7: an in silico approach

PubMed Central

Mondal, Shakhinur Islam; Ferdous, Sabiha; Jewel, Nurnabi Azad; Akter, Arzuba; Mahmud, Zabed; Islam, Md Muzahidul; Afrin, Tanzila; Karim, Nurul

2015-01-01

Bacterial enteric infections resulting in diarrhea, dysentery, or enteric fever constitute a huge public health problem, with more than a billion episodes of disease annually in developing and developed countries. In this study, the deadly agent of hemorrhagic diarrhea and hemolytic uremic syndrome, Escherichia coli O157:H7 was investigated with extensive computational approaches aimed at identifying novel and broad-spectrum antibiotic targets. A systematic in silico workflow consisting of comparative genomics, metabolic pathways analysis, and additional drug prioritizing parameters was used to identify novel drug targets that were essential for the pathogen’s survival but absent in its human host. Comparative genomic analysis of Kyoto Encyclopedia of Genes and Genomes annotated metabolic pathways identified 350 putative target proteins in E. coli O157:H7 which showed no similarity to human proteins. Further bio-informatic approaches including prediction of subcellular localization, calculation of molecular weight, and web-based investigation of 3D structural characteristics greatly aided in filtering the potential drug targets from 350 to 120. Ultimately, 44 non-homologous essential proteins of E. coli O157:H7 were prioritized and proved to have the eligibility to become novel broad-spectrum antibiotic targets and DNA polymerase III alpha (dnaE) was the top-ranked among these targets. Moreover, druggability of each of the identified drug targets was evaluated by the DrugBank database. In addition, 3D structure of the dnaE was modeled and explored further for in silico docking with ligands having potential druggability. Finally, we confirmed that the compounds N-coeleneterazine and N-(1,4-dihydro-5H-tetrazol-5-ylidene)-9-oxo-9H-xanthene-2-sulfon-amide were the most suitable ligands of dnaE and hence proposed as the potential inhibitors of this target protein. The results of this study could facilitate the discovery and release of new and effective drugs against E
Beyond endometriosis GWAS: from Genomics to Phenomics to the Patient

PubMed Central

Zondervan, Krina T.; Rahmioglu, Nilufer; Morris, Andrew P.; Nyholt, Dale R.; Montgomery, Grant W.; Becker, Christian M.; Missmer, Stacey A.

2017-01-01

Endometriosis is a heritable, complex chronic inflammatory disease, for which much of the causal pathogenic mechanism remains unknown. Genome-wide association studies (GWAS) to date have identified 12 single nucleotide polymorphisms (SNPs) at 10 independent genetic loci associated with endometriosis. Most of these were more strongly associated with rAFS stage III/IV, rather than I/II. The loci are almost all located in inter-genic regions that are known to play a role in the regulation of expression of target genes yet to be identified. To identify the target genes and pathways perturbed by the implicated variants, studies are required involving functional genomic annotation of the surrounding chromosomal regions, in terms of transcription factor binding, epigenetic modification (e.g. DNA methylation and histone modification) sites, as well as their correlation with RNA transcription. These studies need to be conducted in tissue types relevant to endometriosis, in particular endometrium. In addition, to allow biologically and clinically relevant interpretation of molecular profiling data, they need to be combined and correlated with detailed, systematically collected phenotypic information (surgical and clinical). The WERF Endometriosis Phenome and Biobanking Harmonization project (EPHect) is a global standardisation initiative that has produced consensus data and sample collection protocols for endometriosis research. These now pave the way for collaborative studies integrating phenomic with genomic data, to identify informative subtypes of endometriosis that will enhance understanding of the pathogenic mechanisms of the disease and discovery of novel, targeted treatments. PMID:27513026
The mitochondrial genome of Pocillopora (Cnidaria: Scleractinia) contains two variable regions: the putative D-loop and a novel ORF of unknown function.

PubMed

Flot, Jean-François; Tillier, Simon

2007-10-15

The complete mitochondrial genomes of two individuals attributed to different morphospecies of the scleractinian coral genus Pocillopora have been sequenced. Both genomes, respectively 17,415 and 17,422 nt long, share the presence of a previously undescribed ORF encoding a putative protein made up of 302 amino acids and of unknown function. Surprisingly, this ORF turns out to be the second most variable region of the mitochondrial genome (1% nucleotide sequence difference between the two individuals) after the putative control region (1.5% sequence difference). Except for the presence of this ORF and for the location of the putative control region, the mitochondrial genome of Pocillopora is organized in a fashion similar to the other scleractinian coral genomes published to date. For the first time in a cnidarian, a putative second origin of replication is described based on its secondary structure similar to the stem-loop structure of O(L), the origin of L-strand replication in vertebrates.
Identification and characterization of potential druggable targets among hypothetical proteins of extensively drug resistant Mycobacterium tuberculosis (XDR KZN 605) through subtractive genomics approach.

PubMed

Uddin, Reaz; Siddiqui, Quratulain Nehal; Azam, Syed Sikander; Saima, Bibi; Wadood, Abdul

2018-03-01

Among the resistant isolates of tuberculosis (TB), multidrug-resistant tuberculosis (MDR-TB) and extensively drug-resistant tuberculosis (XDR-TB) are areas of growing concern, for which the front-line antibiotics are no longer effective. As a result, the search for new therapeutic targets against TB is urgently needed. Target identification is a prerequisite step in drug discovery research. Furthermore, the availability of the complete proteomic data of extensively drug-resistant Mycobacterium tuberculosis (XDR-MTB) has made it possible to carry out in silico analysis for the discovery of new drug targets. In the current study, we aimed to prioritize the potential drug targets among the hypothetical proteins of XDR-TB via a subtractive genomics approach. In the subtractive genomics analysis, we reduced the complete proteome of XDR-MTB stepwise to only two hypothetical proteins and proposed them as new therapeutic targets. The 3D structure of one of the two target proteins was predicted via homology modeling and later validated by various analysis tools. Our study suggested that the domains identified and the motif hits found in the sequences of the shortlisted drug targets are crucial for the survival of XDR-MTB. To the best of our knowledge, the current study is the first attempt in which the complete proteomic data of XDR-MTB were subjected to the computational subtractive genomics approach and, therefore, would provide an opportunity to identify unique therapeutic targets against deadly XDR-MTB. Copyright © 2017 Elsevier B.V. All rights reserved.
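The stepwise reduction described in this abstract can be pictured as successive set operations. The sketch below is only an illustration under stated assumptions: the two filters (no host homolog, essential for survival) are the generic criteria of subtractive genomics, not the exact steps of this study, and every protein identifier is hypothetical.

```python
# Hypothetical protein identifiers; none of these come from the study.
proteome = {"P1", "P2", "P3", "P4", "P5"}
host_homologs = {"P2", "P5"}    # assumed: pathogen proteins similar to a host protein
essential = {"P1", "P3", "P5"}  # assumed: proteins required for pathogen survival


def subtractive_filter(proteome, host_homologs, essential):
    """Return (non-homologous proteins, non-homologous essential proteins)."""
    # Step 1: subtract everything that resembles a host protein.
    non_homologous = proteome - host_homologs
    # Step 2: keep only proteins that are also essential to the pathogen.
    candidates = non_homologous & essential
    return non_homologous, candidates


non_homologous, candidates = subtractive_filter(proteome, host_homologs, essential)
print(sorted(non_homologous))  # ['P1', 'P3', 'P4']
print(sorted(candidates))      # ['P1', 'P3']
```

Each filter can only shrink the candidate set, which is why such workflows end with a handful of proteins from a complete proteome.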
Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-wide Association Studies.

PubMed

Benner, Christian; Havulinna, Aki S; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ripatti, Samuli; Pirinen, Matti

2017-10-05

During the past few years, various novel statistical methods have been developed for fine-mapping with the use of summary statistics from genome-wide association studies (GWASs). Although these approaches require information about the linkage disequilibrium (LD) between variants, there has not been a comprehensive evaluation of how estimation of the LD structure from reference genotype panels performs in comparison with that from the original individual-level GWAS data. Using population genotype data from Finland and the UK Biobank, we show here that a reference panel of 1,000 individuals from the target population is adequate for a GWAS cohort of up to 10,000 individuals, whereas smaller panels, such as those from the 1000 Genomes Project, should be avoided. We also show, both theoretically and empirically, that the size of the reference panel needs to scale with the GWAS sample size; this has important consequences for the application of these methods in ongoing GWAS meta-analyses and large biobank studies. We conclude by providing software tools and by recommending practices for sharing LD information to more efficiently exploit summary statistics in genetics research. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
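The LD information these fine-mapping methods require is, at its simplest, the correlation between genotype dosages of two variants across the individuals in a panel. The sketch below shows that estimator only; it is not the software provided by the authors, and the function name and the toy dosage vectors are assumptions for illustration.

```python
from math import sqrt


def ld_correlation(dosages_a, dosages_b):
    """Pearson correlation r between two variants' genotype dosages (0, 1 or 2)."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equally long dosage vectors with at least 2 samples")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("LD is undefined for a monomorphic variant")
    return cov / sqrt(var_a * var_b)


# Toy reference panel of six individuals (hypothetical data).
variant_1 = [0, 1, 2, 1, 0, 2]
variant_2 = [0, 1, 2, 1, 0, 2]  # identical dosages: perfect LD
variant_3 = [2, 1, 0, 1, 2, 0]  # mirrored dosages: perfect negative correlation

r_same = ld_correlation(variant_1, variant_2)
r_mirror = ld_correlation(variant_1, variant_3)
print(round(r_same, 6), round(r_mirror, 6))
```

Because r is estimated from a finite panel, its sampling error shrinks as the panel grows, which is consistent with the abstract's point that the reference panel size needs to scale with the GWAS sample size.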
A high quality assembly of the Nile Tilapia (Oreochromis niloticus) genome reveals the structure of two sex determination regions.

PubMed

Conte, Matthew A; Gammerdinger, William J; Bartie, Kerry L; Penman, David J; Kocher, Thomas D

2017-05-02

Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species. A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly (contig NG50 of 3.3Mbp) was selected using principal component analysis of likelihood scores calculated from several paired-end sequencing libraries. Comparison of the new assembly to the previous O. niloticus genome assembly reveals that recently duplicated portions of the genome are now well represented. The overall number of genes in the new assembly increased by 27.3%, including a 67% increase in pseudogenes. The new tilapia genome assembly correctly represents two recent vasa gene duplication events that have been verified with BAC sequencing. A total of 146Mbp of additional transposable element sequence is now assembled, a large proportion of which are recent insertions. Large centromeric satellite repeats are assembled and annotated in cichlid fish for the first time. Finally, the new assembly identifies the long-range structure of both a ~9Mbp XY sex determination region on LG1 in O. niloticus, and a ~50Mbp WZ sex determination region on LG3 in the related species O. aureus. This study highlights the use of long read sequencing to correctly assemble recent duplications and to characterize repeat-filled regions of the genome. The study serves as an example of the need for high quality genome assemblies and provides a framework for identifying sex determining genes in tilapia and related fish species.
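The contig NG50 quoted above is a standard assembly contiguity metric. The sketch below shows how it is conventionally computed: contigs are sorted from longest to shortest, and NG50 is the length of the contig at which the running total first reaches half of the estimated genome size. The function name and the toy contig lengths are hypothetical and are not taken from the tilapia assembly.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length L such that contigs of length >= L together
    cover at least half of the (estimated) genome size.
    Returns 0 if the assembly covers less than half the genome.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Toy example with made-up contig lengths and genome size.
contigs = [50, 40, 30, 20, 10]
print(ng50(contigs, genome_size=200))  # 50 + 40 + 30 = 120 >= 100, so NG50 is 30
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the genome size, so candidate assemblies of different total length can be compared on the same scale.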

Differentially Methylated Region-Representational Difference Analysis (DMR-RDA): A Powerful Method to Identify DMRs in Uncharacterized Genomes.

PubMed

Sasheva, Pavlina; Grossniklaus, Ueli

2017-01-01

Over recent years, it has become increasingly clear that environmental influences can affect the epigenomic landscape and that some epigenetic variants can have heritable, phenotypic effects. While there are a variety of methods to perform genome-wide analyses of DNA methylation in model organisms, this is still a challenging task for non-model organisms without a reference genome. Differentially methylated region-representational difference analysis (DMR-RDA) is a sensitive and powerful PCR-based technique that isolates DNA fragments that are differentially methylated between two otherwise identical genomes. The technique does not require special equipment and is independent of prior knowledge about the genome. It is applicable even to large, highly complex genomes, making it the method of choice for the analysis of plant non-model systems.
Targeted DNA Mutagenesis for the Cure of Chronic Viral Infections

PubMed Central

Schiffer, Joshua T.; Aubert, Martine; Weber, Nicholas D.; Mintzer, Esther; Stone, Daniel

2012-01-01

Human immunodeficiency virus type 1 (HIV-1), hepatitis B virus (HBV), and herpes simplex virus (HSV) have been incurable to date because effective antiviral therapies target only replicating viruses and do not eradicate latently integrated or nonreplicating episomal viral genomes. Endonucleases that can target and cleave critical regions within latent viral genomes are currently in development. These enzymes are being engineered with high specificity such that off-target binding of cellular DNA will be absent or minimal. Imprecise nonhomologous-end-joining (NHEJ) DNA repair following repeated cleavage at the same critical site may permanently disrupt translation of essential viral proteins. We discuss the benefits and drawbacks of three types of DNA cleavage enzymes (zinc finger endonucleases, transcription activator-like [TAL] effector nucleases [TALENs], and homing endonucleases [also called meganucleases]), the development of delivery vectors for these enzymes, and potential obstacles for successful treatment of chronic viral infections. We then review issues regarding persistence of HIV-1, HBV, and HSV that are relevant to eradication with genome-altering approaches. PMID:22718830
Histone deacetylase inhibition modulates histone acetylation at gene promoter regions and affects genome-wide gene transcription in Schistosoma mansoni

PubMed Central

Anderson, Letícia; Gomes, Monete Rajão; daSilva, Lucas Ferreira; Pereira, Adriana da Silva Andrade; Mourão, Marina M.; Romier, Christophe; Pierce, Raymond

2017-01-01

Background: Schistosomiasis is a parasitic disease infecting hundreds of millions of people worldwide. Treatment depends on a single drug, praziquantel, which kills the Schistosoma spp. parasite only at the adult stage. HDAC inhibitors (HDACi) such as Trichostatin A (TSA) induce parasite mortality in vitro (schistosomula and adult worms); however, the downstream effects of histone hyperacetylation on the parasite are not known. Methodology/Principal findings: TSA treatment of adult worms in vitro increased histone acetylation at H3K9ac and H3K14ac, which are transcription activation marks, not affecting the unrelated transcription repression mark H3K27me3. We investigated the effect of TSA HDACi on schistosomula gene expression at three different time points, finding a marked genome-wide change in the transcriptome profile. Gene transcription activity was correlated with changes on the chromatin acetylation mark at gene promoter regions. Moreover, combining expression data with ChIP-Seq public data for schistosomula, we found that differentially expressed genes having the H3K4me3 mark at their promoter region in general showed transcription activation upon HDACi treatment, compared with those without the mark, which showed transcription down-regulation. Affected genes are enriched for DNA replication processes, most of them being up-regulated. Twenty out of 22 genes encoding proteins involved in reducing reactive oxygen species accumulation were down-regulated. Dozens of genes encoding proteins with histone reader motifs were changed, including SmEED from the PRC2 complex. We targeted SmEZH2 methyltransferase PRC2 component with a new EZH2 inhibitor (GSK343) and showed a synergistic effect with TSA, significantly increasing schistosomula mortality. Conclusions/Significance: Genome-wide gene expression analyses have identified important pathways and cellular functions that were affected and may explain the schistosomicidal effect of TSA HDACi. The change in expression
Enhancement of single guide RNA transcription for efficient CRISPR/Cas-based genomic engineering.

PubMed

Ui-Tei, Kumiko; Maruyama, Shohei; Nakano, Yuko

2017-06-01

Genomic engineering using clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) protein is a promising approach for targeting the genomic DNA of virtually any organism in a sequence-specific manner. Recent remarkable advances in CRISPR/Cas technology have made it a feasible system for use in therapeutic applications and biotechnology. In the CRISPR/Cas system, a guide RNA (gRNA), interacting with the Cas protein, recognizes a genomic region with sequence complementarity, and the double-stranded DNA at the target site is cleaved by the Cas protein. A widely used gRNA is an RNA polymerase III (pol III)-driven single gRNA (sgRNA), which is produced by artificial fusion of CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA). However, we identified a TTTT stretch, known as a termination signal of RNA pol III, in the scaffold region of the sgRNA. Here, we revealed that sgRNA carrying a TTTT stretch reduces the efficiency of sgRNA transcription due to premature transcriptional termination, and decreases the efficiency of genome editing. Unexpectedly, it was also shown that the prematurely terminated sgRNA may have an adverse effect of inducing RNA interference. Such disadvantageous effects were avoided by substituting one base in the TTTT stretch.
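The check described in this abstract, scanning an sgRNA template for a TTTT stretch and breaking it with a single-base substitution, can be sketched as follows. This is an illustrative sketch only: the function names, the toy sequence, the position chosen for substitution (the fourth T of each run) and the replacement base are all assumptions, since the abstract does not state which base the authors substituted.

```python
import re


def find_t_runs(dna, min_run=4):
    """Return (start, end) for every run of >= min_run T's (0-based, end exclusive)."""
    pattern = "T{%d,}" % min_run
    return [(m.start(), m.end()) for m in re.finditer(pattern, dna.upper())]


def break_t_runs(dna, replacement="C"):
    """Substitute one base per TTTT stretch until no TTTT remains.

    The fourth T of the first remaining stretch is replaced on each pass
    (an arbitrary choice made for this sketch).
    """
    seq = dna.upper()
    while True:
        match = re.search("TTTT", seq)
        if match is None:
            return seq
        i = match.start() + 3
        seq = seq[:i] + replacement + seq[i + 1:]


# Toy DNA template (made up, not a real sgRNA scaffold).
template = "ACGTTTTAGC"
print(find_t_runs(template))    # one run covering positions 3..6
print(break_t_runs(template))   # the fourth T of the run is replaced
```

In a real design the substituted base would also have to preserve the scaffold's function, which this sequence-only sketch does not evaluate.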
Genome-Wide Analysis of Grain Yield Stability and Environmental Interactions in a Multiparental Soybean Population.

PubMed

Xavier, Alencar; Jarquin, Diego; Howard, Reka; Ramasubramanian, Vishnu; Specht, James E; Graef, George L; Beavis, William D; Diers, Brian W; Song, Qijian; Cregan, Perry B; Nelson, Randall; Mian, Rouf; Shannon, J Grover; McHale, Leah; Wang, Dechun; Schapaugh, William; Lorenz, Aaron J; Xu, Shizhong; Muir, William M; Rainey, Katy M

2018-02-02

Genetic improvement toward optimized and stable agronomic performance of soybean genotypes is desirable for food security. Understanding how genotypes perform in different environmental conditions helps breeders develop sustainable cultivars adapted to target regions. Complex traits of importance are known to be controlled by a large number of genomic regions with small effects whose magnitude and direction are modulated by environmental factors. Knowledge of the constraints and undesirable effects resulting from genotype by environmental interactions is a key objective in improving selection procedures in soybean breeding programs. In this study, the genetic basis of soybean grain yield responsiveness to environmental factors was examined in a large soybean nested association population. For this, a genome-wide association analysis was implemented on performance stability estimates generated from a Finlay-Wilkinson analysis, including the interaction between marker genotypes and environmental factors. Genomic footprints were investigated by analysis and meta-analysis using a recently published multiparent model. Results indicated that specific soybean genomic regions were associated with stability, and that multiplicative interactions were present between environments and genetic background. Seven genomic regions in six chromosomes were identified as being associated with genotype-by-environment interactions. This study provides insight into genomic-assisted breeding aimed at achieving a more stable agronomic performance of soybean, and documents opportunities to exploit genomic regions that were specifically associated with interactions involving environments and subpopulations. Copyright © 2018 Xavier et al.
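For readers unfamiliar with Finlay-Wilkinson analysis, the sketch below shows the classical two-step form of the idea: each environment gets an index equal to its mean yield minus the grand mean, and each genotype's yields are regressed on that index. The slope is the stability estimate (a slope near 1 means average responsiveness; a slope near 0 means yield barely changes across environments). The function name and the toy yields are hypothetical, and the study itself may have used a different (for example, mixed-model) formulation.

```python
def finlay_wilkinson(yields):
    """Classical Finlay-Wilkinson regression.

    yields: dict mapping genotype -> list of yields, one per environment,
            with environments in the same order for every genotype.
    Returns dict mapping genotype -> (mean_yield, slope).
    """
    genotypes = list(yields)
    n_env = len(yields[genotypes[0]])

    # Environmental index: environment mean minus grand mean.
    env_means = [
        sum(yields[g][j] for g in genotypes) / len(genotypes)
        for j in range(n_env)
    ]
    grand_mean = sum(env_means) / n_env
    index = [m - grand_mean for m in env_means]
    sxx = sum(x * x for x in index)

    results = {}
    for g in genotypes:
        y = yields[g]
        y_mean = sum(y) / n_env
        slope = sum(x * (v - y_mean) for x, v in zip(index, y)) / sxx
        results[g] = (y_mean, slope)
    return results


# Toy data (made up): two genotypes grown in three environments.
toy = {"A": [2.0, 4.0, 6.0], "B": [4.0, 4.0, 4.0]}
for genotype, (mean_yield, slope) in finlay_wilkinson(toy).items():
    print(genotype, mean_yield, slope)
```

In this toy case genotype A is highly responsive (slope 2) while genotype B is completely stable (slope 0); slopes like these are the kind of per-genotype stability estimates that can then serve as the phenotype in an association analysis.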
Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

PubMed

Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

2013-01-01

Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal
Variants in Several Genomic Regions Associated with Asperger Disorder

PubMed Central

Salyakina, D.; Ma, D.Q.; Jaworski, J.M.; Konidari, I.; Whitehead, P.L.; Henson, R.; Martinez, D.; Robinson, J.L.; Sacharow, S.; Wright, H.H.; Abramson, R.K.; Gilbert, J.R.; Cuccaro, M.L.; Pericak-Vance, M.A.

2010-01-01

Asperger disorder (ASP) is one of the autism spectrum disorders (ASD) and is differentiated from autism largely on the absence of clinically significant cognitive and language delays. Analysis of a homogeneous subset of families with ASP may help to address the corresponding effect of genetic heterogeneity on identifying ASD genetic risk factors. To examine the hypothesis that common variation is important in ASD, we performed a genome-wide association study (GWAS) in 124 ASP families in a discovery data set and 110 ASP families in a validation data set. We prioritized the top 100 association results from both cohorts by employing a ranking strategy. Novel regions on 5q21.1 (P = 9.7 × 10⁻⁷) and 15q22.1–q22.2 (P = 7.3 × 10⁻⁶) were our most significant findings in the combined data set. Three chromosomal regions showing association, 3p14.2 (P = 3.6 × 10⁻⁶), 3q25–26 (P = 6.0 × 10⁻⁵) and 3p23 (P = 3.3 × 10⁻⁴), overlapped linkage regions reported in Finnish ASP families, and eight association regions overlapped ASD linkage areas. Our findings suggest that ASP both shares ASD-related genetic risk factors and has genetic risk factors unique to the ASP phenotype. PMID:21182207
Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations

PubMed Central

Marinier, Eric; Zaheer, Rahat; Berry, Chrystal; Weedmark, Kelly A.; Domaratzki, Michael; Mabon, Philip; Knox, Natalie C.; Reimer, Aleisha R.; Graham, Morag R.; Chui, Linda; Patterson-Fortin, Laura; Zhang, Jian; Pagotto, Franco; Farber, Jeff; Mahony, Jim; Seyer, Karine; Bekal, Sadjia; Tremblay, Cécile; Isaac-Renton, Judy; Prystajecky, Natalie; Chen, Jessica; Slade, Peter

2017-01-01

The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using ‘big data’ approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune’s loci discovery process identifies sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-targets using probabilistic models. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real datasets showed that Neptune rapidly identifies regions that are both sensitive and specific. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable for comparative bacterial analyses, yet will particularly benefit pathogenomic applications, owing to efficient and sensitive discovery of differentially abundant genomic loci. The software is available for download at: http://github.com/phac-nml/neptune. PMID:29048594
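The core idea of exact k-mer matching between target and non-target groups can be illustrated with a much simplified sketch. It is not Neptune's algorithm or API: it uses plain set operations, ignores reverse complements, mismatches and the probabilistic models mentioned in the abstract, and all names and sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(targets, non_targets, k, min_fraction=1.0):
    """k-mers present in at least min_fraction of targets and in no non-target."""
    counts = {}
    for seq in targets:
        for km in kmers(seq, k):          # each k-mer counted once per genome
            counts[km] = counts.get(km, 0) + 1

    excluded = set()
    for seq in non_targets:
        excluded |= kmers(seq, k)

    needed = min_fraction * len(targets)
    return {km for km, c in counts.items() if c >= needed and km not in excluded}


# Toy sequences (made up).
targets = ["ACGTAC", "TACGTA"]
non_targets = ["CGTACC"]
print(signature_kmers(targets, non_targets, k=3))  # only ACG is target-specific
```

Lowering `min_fraction` below 1.0 relaxes "present in every target" to "sufficiently common among targets", which loosely mirrors the sensitivity side of the trade-off the abstract describes.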
Genomic Variation in Natural Populations of Drosophila melanogaster

PubMed Central

Langley, Charles H.; Stevens, Kristian; Cardeno, Charis; Lee, Yuh Chwen G.; Schrider, Daniel R.; Pool, John E.; Langley, Sasha A.; Suarez, Charlyn; Corbett-Detig, Russell B.; Kolaczkowski, Bryan; Fang, Shu; Nista, Phillip M.; Holloway, Alisha K.; Kern, Andrew D.; Dewey, Colin N.; Song, Yun S.; Hahn, Matthew W.; Begun, David J.

2012-01-01

This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5′- and 3′-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species. PMID:22673804
Remarkably Divergent Regions Punctuate the Genome Assembly of the Caenorhabditis elegans Hawaiian Strain CB4856

PubMed Central

Thompson, Owen A.; Snoek, L. Basten; Nijveen, Harm; Sterken, Mark G.; Volkers, Rita J. M.; Brenchley, Rachel; van’t Hof, Arjen; Bevers, Roel P. J.; Cossins, Andrew R.; Yanai, Itai; Hajnal, Alex; Schmid, Tobias; Perkins, Jaryn D.; Spencer, David; Kruglyak, Leonid; Andersen, Erik C.; Moerman, Donald G.; Hillier, LaDeana W.; Kammenga, Jan E.; Waterston, Robert H.

2015-01-01

The Hawaiian strain (CB4856) of Caenorhabditis elegans is one of the most divergent from the canonical laboratory strain N2 and has been widely used in developmental, population, and evolutionary studies. To enhance the utility of the strain, we have generated a draft sequence of the CB4856 genome, exploiting a variety of resources and strategies. When compared against the N2 reference, the CB4856 genome has 327,050 single nucleotide variants (SNVs) and 79,529 insertion–deletion events that result in a total of 3.3 Mb of N2 sequence missing from CB4856 and 1.4 Mb of sequence present in CB4856 but not present in N2. As previously reported, the density of SNVs varies along the chromosomes, with the arms of chromosomes showing greater average variation than the centers. In addition, we find 61 regions totaling 2.8 Mb, distributed across all six chromosomes, which have a greatly elevated SNV density, ranging from 2 to 16% SNVs. A survey of other wild isolates shows that the two alternative haplotypes for each region are widely distributed, suggesting they have been maintained by balancing selection over long evolutionary times. These divergent regions contain an abundance of genes from large rapidly evolving families encoding F-box, MATH, BATH, seven-transmembrane G-coupled receptors, and nuclear hormone receptors, suggesting that they provide selective advantages in natural environments. The draft sequence makes available a comprehensive catalog of sequence differences between the CB4856 and N2 strains that will facilitate the molecular dissection of their phenotypic differences. Our work also emphasizes the importance of going beyond simple alignment of reads to a reference genome when assessing differences between genomes. PMID:25995208
Genomic organization of the canine herpesvirus US region.

PubMed

Haanes, E J; Tomlinson, C C

1998-02-01

Canine herpesvirus (CHV) is an alpha-herpesvirus of limited pathogenicity in healthy adult dogs and infectivity of the virus appears to be largely limited to cells of canine origin. CHV's low virulence and species specificity make it an attractive candidate for a recombinant vaccine vector to protect dogs against a variety of pathogens. As part of the analysis of the CHV genome, the authors determined the complete nucleotide sequence of the CHV US region as well as portions of the flanking inverted repeats. Seven full open reading frames (ORFs) encoding proteins larger than 100 amino acids were identified within, or partially within the CHV US: cUS2, cUS3, cUS4, cUS6, cUS7, cUS8 and cUS9; which are homologs of the herpes simplex virus type-1 US2; protein kinase; gG, gD, gI, gE; and US9 genes, respectively. An eighth ORF was identified in the inverted repeat region, cIR6, a homolog of the equine herpesvirus type-1 IR6 gene. The authors identified and mapped most of the major transcripts for the predicted CHV US ORFs by Northern analysis.
Structural complexity of Dengue virus untranslated regions: cis-acting RNA motifs and pseudoknot interactions modulating functionality of the viral genome

PubMed Central

Sztuba-Solinska, Joanna; Teramoto, Tadahisa; Rausch, Jason W.; Shapiro, Bruce A.; Padmanabhan, Radhakrishnan; Le Grice, Stuart F. J.

2013-01-01

The Dengue virus (DENV) genome contains multiple cis-acting elements required for translation and replication. Previous studies indicated that a 719-nt subgenomic minigenome (DENV-MINI) is an efficient template for translation and (−) strand RNA synthesis in vitro. We performed a detailed structural analysis of DENV-MINI RNA, combining chemical acylation techniques, Pb²⁺ ion-induced hydrolysis and site-directed mutagenesis. Our results highlight protein-independent 5′–3′ terminal interactions involving hybridization between recognized cis-acting motifs. Probing analyses identified tandem dumbbell structures (DBs) within the 3′ terminus spaced by single-stranded regions, internal loops and hairpins with embedded GNRA-like motifs. Analysis of conserved motifs and top loops (TLs) of these dumbbells, and their proposed interactions with downstream pseudoknot (PK) regions, predicted an H-type pseudoknot involving TL1 of the 5′ DB and the complementary region, PK2. As disrupting the TL1/PK2 interaction, via ‘flipping’ mutations of PK2, previously attenuated DENV replication, this pseudoknot may participate in regulation of RNA synthesis. Computer modeling implied that this motif might function as an autonomous structural/regulatory element. In addition, our studies targeting elements of the 3′ DB and its complementary region PK1 indicated that communication between 5′–3′ terminal regions strongly depends on structure and sequence composition of the 5′ cyclization region. PMID:23531545
Collateral DNA damage produced by genome-editing drones: exception or rule?

PubMed

Canela, Andres; Stanlie, Andre; Nussenzweig, André

2015-05-21

In a recent issue of Nature Biotechnology, Frock et al. (2015) developed an elegant technique to capture translocation partners that can be utilized to determine off-target regions of genome-editing endonucleases as well as endogenous mutators at nucleotide resolution. Copyright © 2015 Elsevier Inc. All rights reserved.
Comprehensive genomic profiling of different subtypes of nasopharyngeal carcinoma reveals similarities and differences to guide targeted therapy.

PubMed

Ali, Siraj M; Yao, Ming; Yao, Jicheng; Wang, Jing; Cheng, Yuwei; Schrock, Alexa B; Chirn, Gung-Wei; Chen, Hui; Mu, Shuo; Gay, Laurie; Elvin, Julia A; Suh, James; Miller, Vincent A; Stephens, Philip J; Ross, Jeffrey S; Wang, Kai

2017-09-15

To date, no targeted therapy has been approved for nasopharyngeal carcinoma (NPC), and this underscores the need for an in-depth understanding of clinically relevant genomic alterations (CRGAs). Comprehensive genomic profiling was performed for 190 NPC patients, including 20 patients with nasopharyngeal adenocarcinoma (NPAC), 62 patients with nasopharyngeal squamous cell carcinoma (NPSCC), and 108 patients with nasopharyngeal undifferentiated carcinoma (NPUC). The associations of genes and pathways with subtypes, Epstein-Barr virus (EBV) infections, and the tumor mutation burden (TMB) were statistically evaluated. Although the overall rates of genomic alterations were similar, the 3 NPC subtypes exhibited different mutational landscapes. Notably, mutations in a proven-treatable target gene, isocitrate dehydrogenase 2 (IDH2), were significantly associated with NPUC but not with NPAC or NPSCC. The top 5 ranked CRGAs included CDKN2A (29%), IDH2 (16%), SMARCB1 (7%), PIK3CA (6%), and NF1 (5%) in NPUC; CDKN2A (27%), PIK3CA (23%), FBXW7 (11%), PTEN (11%), and EGFR (8%) in NPSCC; and CDKN2A (20%), KRAS (15%), CCND1 (10%), MAP3K1 (10%), and NOTCH1 (10%) in NPAC. The incidence of EBV infections significantly correlated with the subtypes and with TP53, CDKN2A, and CDKN2B. The TMB status correlated with the subtypes and with LRP1B, FBXW7, and PIK3CA mutations as well as DNA repair, phosphoinositide 3-kinase/mammalian target of rapamycin, and mitogen-activated protein kinase pathways. These results indicate that different NPC subtypes harbor different CRGAs. Both EBV infections and the TMB are associated with the NPC subtypes as well as the alterations of individual genes and pathways. The high frequency of IDH2 mutations in NPUC may facilitate potential targeted therapy and will ultimately point to new therapeutic strategies. Cancer 2017;123:3628-37. © 2017 American Cancer Society.
Application of Selection Mapping to Identify Genomic Regions Associated with Dairy Production in Sheep

PubMed Central

Gutiérrez-Gil, Beatriz; Arranz, Juan Jose; Pong-Wong, Ricardo; García-Gámez, Elsa; Kijas, James; Wiener, Pamela

2014-01-01

In Europe, especially in Mediterranean areas, the sheep has been traditionally exploited as a dual purpose species, with income from both meat and milk. Modernization of husbandry methods and the establishment of breeding schemes focused on milk production have led to the development of “dairy breeds.” This study investigated selective sweeps specifically related to dairy production in sheep by searching for regions commonly identified in different European dairy breeds. With this aim, genotypes from 44,545 SNP markers covering the sheep autosomes were analysed in both European dairy and non-dairy sheep breeds using two approaches: (i) identification of genomic regions showing extreme genetic differentiation between each dairy breed and a closely related non-dairy breed, and (ii) identification of regions with reduced variation (heterozygosity) in the dairy breeds using two methods. Regions detected in at least two breeds (breed pairs) by the two approaches (genetic differentiation and at least one of the heterozygosity-based analyses) were labeled as core candidate convergence regions and further investigated for candidate genes. Following this approach six regions were detected. For some of them, strong candidate genes have been proposed (e.g. ABCG2, SPP1), whereas some other genes designated as candidates based on their association with sheep and cattle dairy traits (e.g. LALBA, DGAT1A) were not associated with a detectable sweep signal. Few of the identified regions were coincident with QTL previously reported in sheep, although many of them corresponded to orthologous regions in cattle where QTL for dairy traits have been identified. Due to the limited number of QTL studies reported in sheep compared with cattle, the results illustrate the potential value of selection mapping to identify genomic regions associated with dairy traits in sheep. PMID:24788864
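The reduced-heterozygosity signal mentioned in approach (ii) can be illustrated with a generic windowed calculation. The abstract does not name the two heterozygosity-based methods used, so the following is only a minimal sketch of the general idea under stated assumptions: expected heterozygosity at a biallelic SNP is taken as 2p(1 − p), values are averaged in fixed-size windows, and windows with unusually low averages would be candidate sweep regions. The function name, window size and allele frequencies are hypothetical.

```python
def window_heterozygosity(positions, allele_freqs, window_size):
    """Mean expected heterozygosity, 2p(1 - p), per fixed-size window.

    positions:    SNP coordinates (integers) on one chromosome
    allele_freqs: frequency p of one allele at each SNP within a breed
    Returns dict mapping window start coordinate -> mean heterozygosity.
    """
    windows = {}
    for pos, p in zip(positions, allele_freqs):
        start = (pos // window_size) * window_size
        windows.setdefault(start, []).append(2 * p * (1 - p))
    return {start: sum(vals) / len(vals) for start, vals in sorted(windows.items())}


# Toy data (made up): the second window contains a fixed SNP (p = 1.0).
positions = [100, 900, 1500]
freqs = [0.5, 0.5, 1.0]
print(window_heterozygosity(positions, freqs, window_size=1000))
```

Here the first window has the maximum possible heterozygosity for biallelic markers (0.5), while the second has none, the pattern expected around a variant driven to fixation by selection.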
Elucidating the genomic architecture of Asian EGFR-mutant lung adenocarcinoma through multi-region exome sequencing.

PubMed

Nahar, Rahul; Zhai, Weiwei; Zhang, Tong; Takano, Angela; Khng, Alexis J; Lee, Yin Yeng; Liu, Xingliang; Lim, Chong Hee; Koh, Tina P T; Aung, Zaw Win; Lim, Tony Kiat Hon; Veeravalli, Lavanya; Yuan, Ju; Teo, Audrey S M; Chan, Cheryl X; Poh, Huay Mei; Chua, Ivan M L; Liew, Audrey Ann; Lau, Dawn Ping Xi; Kwang, Xue Lin; Toh, Chee Keong; Lim, Wan-Teck; Lim, Bing; Tam, Wai Leong; Tan, Eng-Huat; Hillmer, Axel M; Tan, Daniel S W

2018-01-15

EGFR-mutant lung adenocarcinomas (LUAD) display diverse clinical trajectories and are characterized by rapid but short-lived responses to EGFR tyrosine kinase inhibitors (TKIs). Through sequencing of 79 spatially distinct regions from 16 early stage tumors, we show that despite low mutation burdens, EGFR-mutant Asian LUADs unexpectedly exhibit a complex genomic landscape with frequent and early whole-genome doubling, aneuploidy, and high clonal diversity. Multiple truncal alterations, including TP53 mutations and loss of CDKN2A and RB1, converge on cell cycle dysregulation, with late sector-specific high-amplitude amplifications and deletions that potentially beget drug resistant clones. We highlight the association between genomic architecture and clinical phenotypes, such as co-occurring truncal drivers and primary TKI resistance. Through comparative analysis with published smoking-related LUAD, we postulate that the high intra-tumor heterogeneity observed in Asian EGFR-mutant LUAD may arise from an early dominant driver, genomic instability, and low background mutation rates.
The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species

PubMed Central

Park, Inkyu; Kim, Wook-jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin

2017-01-01

Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, Aconitum species position among the Ranunculaceae was determined with other family cp genomes in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC–trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species. PMID:28863163
The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species.

PubMed

Park, Inkyu; Kim, Wook-Jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin; Moon, Byeong Cheol

2017-01-01

Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, Aconitum species position among the Ranunculaceae was determined with other family cp genomes in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species.
Genomic regions controlling shape variation in the first upper molar of the house mouse

PubMed Central

Pantalacci, Sophie; Turner, Leslie M; Steingrimsson, Eirikur; Renaud, Sabrina

2017-01-01

Numerous loci of large effect have been shown to underlie phenotypic variation between species. However, loci with subtle effects are presumably more frequently involved in microevolutionary processes but have rarely been discovered. We explore the genetic basis of shape variation in the first upper molar of hybrid mice between Mus musculus musculus and M. m. domesticus. We performed the first genome-wide association study for molar shape and used 3D surface morphometrics to quantify subtle variation between individuals. We show that many loci of small effect underlie phenotypic variation, and identify five genomic regions associated with tooth shape; one region contained the gene microphthalmia-associated transcription factor Mitf that has previously been associated with tooth malformations. Using a panel of five mutant laboratory strains, we show the effect of the Mitf gene on tooth shape. This is the first report of a gene causing subtle but consistent variation in tooth shape resembling variation in nature. PMID:29091026

Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences.

PubMed

Colonna, Vincenza; Ayub, Qasim; Chen, Yuan; Pagani, Luca; Luisi, Pierre; Pybus, Marc; Garrison, Erik; Xue, Yali; Tyler-Smith, Chris; Abecasis, Goncalo R; Auton, Adam; Brooks, Lisa D; DePristo, Mark A; Durbin, Richard M; Handsaker, Robert E; Kang, Hyun Min; Marth, Gabor T; McVean, Gil A

2014-06-30

Population differentiation has proved to be effective for identifying loci under geographically localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing high frequency differences. Here, we extend this dataset to include additional variants, survey sites with low levels of differentiation, and evaluate the extent to which highly differentiated sites are likely to result from selective or other processes. We demonstrate that while sites with low differentiation represent sampling effects rather than balancing selection, sites showing extremely high population differentiation are enriched for positive selection events and that one half may be the result of classic selective sweeps. Among these, we rediscover known examples, where we actually identify the established functional SNP, and discover novel examples including the genes ABCA12, CALD1 and ZNF804, which we speculate may be linked to adaptations in skin, calcium metabolism and defense, respectively. We identify known and many novel candidate regions for geographically restricted positive selection, and suggest several directions for further research.
Genome-Wide Analysis of miRNA targets in Brachypodium and Biomass Energy Crops

DOE Office of Scientific and Technical Information (OSTI.GOV)

Green, Pamela J.

2015-08-11

MicroRNAs (miRNAs) contribute to the control of numerous biological processes through the regulation of specific target mRNAs. Although the identities of these targets are essential to elucidate miRNA function, the targets are much more difficult to identify than the small RNAs themselves. Before this work, we pioneered the genome-wide identification of the targets of Arabidopsis miRNAs using an approach called PARE (German et al., Nature Biotech. 2008; Nature Protocols, 2009). Under this project, we applied PARE to Brachypodium distachyon (Brachypodium), a model plant in the Poaceae family, which includes the major food grain and bioenergy crops. Through in-depth global analysis and examination of specific examples, this research greatly expanded our knowledge of miRNAs and target RNAs of Brachypodium. New regulation in response to environmental stress or tissue type was found, and many new miRNAs were discovered. More than 260 targets of new and known miRNAs with PARE sequences at the precise sites of miRNA-guided cleavage were identified and characterized. Combining PARE data with the small RNA data also identified the miRNAs responsible for initiating approximately 500 phased loci, including one of the novel miRNAs. PARE analysis also revealed that differentially expressed miRNAs in the same family guide specific target RNA cleavage in a correspondingly tissue-preferential manner. The project included generation of small RNA and PARE resources for bioenergy crops, to facilitate ongoing discovery of conserved miRNA-target RNA regulation. By associating specific miRNA-target RNA pairs with known physiological functions, the research provides insights about gene regulation in different tissues and in response to environmental stress. This, and release of new PARE and small RNA data sets should contribute basic knowledge to enhance breeding and may suggest new strategies for improvement of biomass energy crops.
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome

PubMed Central

Ferlaino, Michael; Rogers, Mark F.; Shihab, Hashem A.; Mort, Matthew; Cooper, David N.; Gaunt, Tom R.; Campbell, Colin

2018-01-01

Background Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. Results We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. Conclusions FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome. PMID:28985712
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome.

PubMed

Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin

2017-10-06

Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.
Whole-genome sequencing of an aggressive BRAF wild-type papillary thyroid cancer identified EML4-ALK translocation as a therapeutic target.

PubMed

Demeure, Michael J; Aziz, Meraj; Rosenberg, Richard; Gurley, Steven D; Bussey, Kimberly J; Carpten, John D

2014-06-01

Recent advances in the treatment of cancer have focused on targeting genomic aberrations with selective therapeutic agents. In radioiodine resistant aggressive papillary thyroid cancers, there remain few effective therapeutic options. A 62-year-old man who underwent multiple operations for papillary thyroid cancer and whose metastases progressed despite standard treatments provided tumor tissue. We analyzed tumor and whole blood DNA by whole genome sequencing, achieving 80× or greater coverage over 94 % of the exome and 90 % of the genome. We determined somatic mutations and structural alterations. We found a total of 57 somatic mutations in 55 genes of the cancer genome. There was notably a lack of mutations in NRAS and BRAF, and no RET/PTC rearrangement. There was a mutation in the TRAPP oncogene and a loss of heterozygosity of the p16, p18, and RB1 tumor suppressor genes. The oncogenic driver for this tumor is a translocation involving the genes for anaplastic lymphoma receptor tyrosine kinase (ALK) and echinoderm microtubule associated protein like 4 (EML4). The EML4-ALK translocation has been reported in approximately 5 % of lung cancers, as well as in pediatric neuroblastoma, and is a therapeutic target for crizotinib. This is the first report of the whole genomic sequencing of a papillary thyroid cancer in which we identified an EML4-ALK translocation of a TRAPP oncogene mutation. These findings suggest that this tumor has a more distinct oncogenesis than BRAF mutant papillary thyroid cancer. Whole genome sequencing can elucidate an oncogenic context and expose potential therapeutic vulnerabilities in rare cancers.
In vivo therapeutic potential of Dicer-hunting siRNAs targeting infectious hepatitis C virus.

PubMed

Watanabe, Tsunamasa; Hatakeyama, Hiroto; Matsuda-Yasui, Chiho; Sato, Yusuke; Sudoh, Masayuki; Takagi, Asako; Hirata, Yuichi; Ohtsuki, Takahiro; Arai, Masaaki; Inoue, Kazuaki; Harashima, Hideyoshi; Kohara, Michinori

2014-04-23

The development of RNA interference (RNAi)-based therapy faces two major obstacles: selecting small interfering RNA (siRNA) sequences with strong activity, and identifying a carrier that allows efficient delivery to target organs. Additionally, when RNAi is applied to a virus, a region conserved at the nucleotide level must be targeted, because hepatitis C virus (HCV) can escape therapeutic pressure through genome mutations. Dicer-generated siRNAs prepared in vitro and targeting a conserved, highly ordered HCV 5' untranslated region are capable of inducing strong RNAi activity. By dissecting the 5'-end of an RNAi-mediated cleavage site in the HCV genome, we identified potent siRNA sequences, which we designate as Dicer-hunting siRNAs (dh-siRNAs). Furthermore, formulation of the dh-siRNAs in an optimized multifunctional envelope-type nano device inhibited ongoing infectious HCV replication in human hepatocytes in vivo. Our efforts, combining identification of optimal siRNA sequences with delivery to human hepatocytes, suggest the therapeutic potential of siRNA against a virus.
Dictyostelium mobile elements: strategies to amplify in a compact genome.

PubMed

Winckler, T; Dingermann, T; Glöckner, G

2002-12-01

Dictyostelium discoideum is a eukaryotic microorganism that is attractive for the study of fundamental biological phenomena such as cell-cell communication, formation of multicellularity, cell differentiation and morphogenesis. Large-scale sequencing of the D. discoideum genome has provided new insights into evolutionary strategies evolved by transposable elements (TEs) to settle in compact microbial genomes and to maintain active populations over evolutionary time. The high gene density (about 1 gene/2.6 kb) of the D. discoideum genome leaves limited space for selfish molecular invaders to move and amplify without causing deleterious mutations that eradicate their host. Targeting of transfer RNA (tRNA) gene loci appears to be a generally successful strategy for TEs residing in compact genomes to insert away from coding regions. In D. discoideum, tRNA gene-targeted retrotransposition has evolved independently at least three times by both non-long terminal repeat (LTR) retrotransposons and retrovirus-like LTR retrotransposons. Unlike the nonspecifically inserting D. discoideum TEs, which have a strong tendency to insert into preexisting TE copies and form large and complex clusters near the ends of chromosomes, the tRNA gene-targeted retrotransposons have managed to occupy 75% of the tRNA gene loci spread on chromosome 2 and represent 80% of the TEs recognized on the assembled central 6.5-Mb part of chromosome 2. In this review we update the available information about D. discoideum TEs which emerges both from previous work and current large-scale genome sequencing, with special emphasis on the fact that tRNA genes are principal determinants of retrotransposon insertions into the D. discoideum genome.
QTL mapping in white spruce: gene maps and genomic regions underlying adaptive traits across pedigrees, years and environments.

PubMed

Pelgas, Betty; Bousquet, Jean; Meirmans, Patrick G; Ritland, Kermit; Isabel, Nathalie

2011-03-10

adaptation and growth in Picea taxa. The putative QTNs identified will be tested for associations in natural populations, with potential applications in molecular breeding and gene conservation programs. QTLs mapping consistently across years and environments could also be the most important targets for breeding, because they represent genomic regions that may be least affected by G × E interactions.
QTL mapping in white spruce: gene maps and genomic regions underlying adaptive traits across pedigrees, years and environments

PubMed Central

2011-01-01

association genetic studies of adaptation and growth in Picea taxa. The putative QTNs identified will be tested for associations in natural populations, with potential applications in molecular breeding and gene conservation programs. QTLs mapping consistently across years and environments could also be the most important targets for breeding, because they represent genomic regions that may be least affected by G × E interactions. PMID:21392393
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.

This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

DOE PAGES

Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...

2017-07-18

This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

PubMed

Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

2017-01-01

This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

PubMed Central

Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.

2017-01-01

This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
RNAi Functions in Adaptive Reprogramming of the Genome | Center for Cancer Research

Cancer.gov

The regulation of transcribing DNA into RNA, including the production, processing, and degradation of RNA transcripts, affects the expression and the regulation of the genome in ways that are just beginning to be unraveled. A surprising discovery in recent years is that the vast majority of the genome is transcribed to yield an abundance of RNA transcripts. Many transcripts are regulated by the exosome, a multi-protein complex that degrades RNAs, and may also be targeted, under certain conditions, by the RNA interference (RNAi) pathway. These RNA degrading activities can recruit factors to silence certain regions of the genome by condensing the DNA into tightly-packed heterochromatin. For some chromosomal regions, such as centromeres and telomeres, which lie at the center and ends of chromosomes, respectively, silencing must be stably enforced through each cell generation. For other regions, silencing mechanisms must be easily reversible to activate gene expression in response to changing environmental or developmental conditions. Thus, the regulation of gene silencing is key to maintaining the integrity of the genome and proper cellular expression patterns, which, when disrupted can underlie many diseases, including cancer.
Revised annotation of Plutella xylostella microRNAs and their genome-wide target identification.

PubMed

Etebari, K; Asgari, S

2016-12-01

The diamondback moth, Plutella xylostella, is the most devastating pest of brassica crops worldwide. Although 128 mature microRNAs (miRNAs) have been annotated from this species in miRBase, there is a need to extend and correct the current P. xylostella miRNA repertoire as a result of its recently improved genome assembly and more available small RNA sequence data. We used our new ultra-deep sequence data and bioinformatics to re-annotate the P. xylostella genome for high confidence miRNAs with the correct 5p and 3p arm features. Furthermore, all the P. xylostella annotated genes were also screened to identify potential miRNA binding sites using three target-predicting algorithms. In total, 203 mature miRNAs were annotated, including 33 novel miRNAs. We identified 7691 highly confident binding sites for 160 pxy-miRNAs. The data provided here will facilitate future studies involving functional analyses of P. xylostella miRNAs as a platform to introduce novel approaches for sustainable management of this destructive pest. © 2016 The Royal Entomological Society.
Origins of the Xylella fastidiosa Prophage-Like Regions and Their Impact in Genome Differentiation

PubMed Central

de Mello Varani, Alessandro; Souza, Rangel Celso; Nakaya, Helder I.; de Lima, Wanessa Cristina; Paula de Almeida, Luiz Gonzaga; Kitajima, Elliot Watanabe; Chen, Jianchi; Civerolo, Edwin; Vasconcelos, Ana Tereza Ribeiro; Van Sluys, Marie-Anne

2008-01-01

Xylella fastidiosa is a Gram negative plant pathogen causing many economically important diseases, and analyses of completely sequenced X. fastidiosa genome strains allowed the identification of many prophage-like elements and possibly phage remnants, accounting for up to 15% of the genome composition. To better evaluate the recent evolution of the X. fastidiosa chromosome backbone among distinct pathovars, the number and location of prophage-like regions on two finished genomes (9a5c and Temecula1), and in two candidate molecules (Ann1 and Dixon) were assessed. Based on comparative best bidirectional hit analyses, the majority (51%) of the predicted genes in the X. fastidiosa prophage-like regions are related to structural phage genes belonging to the Siphoviridae family. Electron micrograph reveals the existence of putative viral particles with similar morphology to lambda phages in the bacterial cell in planta. Moreover, analysis of microarray data indicates that 9a5c strain cultivated under stress conditions presents enhanced expression of phage anti-repressor genes, suggesting switches from lysogenic to lytic cycle of phages under stress-induced situations. Furthermore, virulence-associated proteins and toxins are found within these prophage-like elements, thus suggesting an important role in host adaptation. Finally, clustering analyses of phage integrase genes based on multiple alignment patterns reveal they group in five lineages, all possessing a tyrosine recombinase catalytic domain, and phylogenetically close to other integrases found in phages that are genetic mosaics and able to perform generalized and specialized transduction. Integration sites and tRNA association is also evidenced. In summary, we present comparative and experimental evidence supporting the association and contribution of phage activity on the differentiation of Xylella genomes. PMID:19116666
Origins of the Xylella fastidiosa prophage-like regions and their impact in genome differentiation.

PubMed

de Mello Varani, Alessandro; Souza, Rangel Celso; Nakaya, Helder I; de Lima, Wanessa Cristina; Paula de Almeida, Luiz Gonzaga; Kitajima, Elliot Watanabe; Chen, Jianchi; Civerolo, Edwin; Vasconcelos, Ana Tereza Ribeiro; Van Sluys, Marie-Anne

2008-01-01

Xylella fastidiosa is a Gram negative plant pathogen causing many economically important diseases, and analyses of completely sequenced X. fastidiosa genome strains allowed the identification of many prophage-like elements and possibly phage remnants, accounting for up to 15% of the genome composition. To better evaluate the recent evolution of the X. fastidiosa chromosome backbone among distinct pathovars, the number and location of prophage-like regions on two finished genomes (9a5c and Temecula1), and in two candidate molecules (Ann1 and Dixon) were assessed. Based on comparative best bidirectional hit analyses, the majority (51%) of the predicted genes in the X. fastidiosa prophage-like regions are related to structural phage genes belonging to the Siphoviridae family. Electron micrograph reveals the existence of putative viral particles with similar morphology to lambda phages in the bacterial cell in planta. Moreover, analysis of microarray data indicates that 9a5c strain cultivated under stress conditions presents enhanced expression of phage anti-repressor genes, suggesting switches from lysogenic to lytic cycle of phages under stress-induced situations. Furthermore, virulence-associated proteins and toxins are found within these prophage-like elements, thus suggesting an important role in host adaptation. Finally, clustering analyses of phage integrase genes based on multiple alignment patterns reveal they group in five lineages, all possessing a tyrosine recombinase catalytic domain, and phylogenetically close to other integrases found in phages that are genetic mosaics and able to perform generalized and specialized transduction. Integration sites and tRNA association is also evidenced. In summary, we present comparative and experimental evidence supporting the association and contribution of phage activity on the differentiation of Xylella genomes.
The complete mitochondrial genome of the common sea slater, Ligia oceanica (Crustacea, Isopoda) bears a novel gene order and unusual control region features

PubMed Central

Kilpert, Fabian; Podsiadlowski, Lars

2006-01-01

Background Sequence data and other characters from mitochondrial genomes (gene translocations, secondary structure of RNA molecules) are useful in phylogenetic studies among metazoan animals from population to phylum level. Moreover, the comparison of complete mitochondrial sequences gives valuable information about the evolution of small genomes, e.g. about different mechanisms of gene translocation, gene duplication and gene loss, or concerning nucleotide frequency biases. The Peracarida (gammarids, isopods, etc.) comprise about 21,000 species of crustaceans, living in many environments from deep sea floor to arid terrestrial habitats. Ligia oceanica is a terrestrial isopod living at rocky seashores of the European North Sea and Atlantic coastlines. Results The study reveals the first complete mitochondrial DNA sequence from a peracarid crustacean. The mitochondrial genome of Ligia oceanica is a circular double-stranded DNA molecule, with a size of 15,289 bp. It shows several changes in mitochondrial gene order compared to other crustacean species. An overview about mitochondrial gene order of all crustacean taxa yet sequenced is also presented. The largest non-coding part (the putative mitochondrial control region) of the mitochondrial genome of Ligia oceanica is unexpectedly not AT-rich compared to the remainder of the genome. It bears two repeat regions (4× 10 bp and 3× 64 bp), and a GC-rich hairpin-like secondary structure. Some of the transfer RNAs show secondary structures which derive from the usual cloverleaf pattern. While some tRNA genes are putative targets for RNA editing, trnR could not be localized at all. Conclusion Gene order is not conserved among Peracarida, not even among isopods. The two isopod species Ligia oceanica and Idotea baltica show a similarly derived gene order, compared to the arthropod ground pattern and to the amphipod Parhyale hawaiiensis, suggesting that most of the translocation events were already present the last common
The patterns of genomic variances and covariances across genome for milk production traits between Chinese and Nordic Holstein populations.

PubMed

Li, Xiujin; Lund, Mogens Sandø; Janss, Luc; Wang, Chonglong; Ding, Xiangdong; Zhang, Qin; Su, Guosheng

2017-03-15

With the development of SNP chips, SNP information provides an efficient approach to further disentangle different patterns of genomic variances and covariances across the genome for traits of interest. Due to the interaction between genotype and environment as well as possible differences in genetic background, it is reasonable to treat the performances of a biological trait in different populations as different but genetically correlated traits. In the present study, we investigated the patterns of region-specific genomic variances, covariances and correlations between Chinese and Nordic Holstein populations for three milk production traits. Variances and covariances between Chinese and Nordic Holstein populations were estimated for genomic regions at three different levels of genome region (all SNP as one region, each chromosome as one region and every 100 SNP as one region) using a novel multi-trait random regression model which uses latent variables to model heterogeneous variance and covariance. In the scenario of the whole genome as one region, the genomic variances, covariances and correlations obtained from the new multi-trait Bayesian method were comparable to those obtained from a multi-trait GBLUP for all the three milk production traits. In the scenario of each chromosome as one region, BTA 14 and BTA 5 accounted for very large genomic variance, covariance and correlation for milk yield and fat yield, whereas no specific chromosome showed very large genomic variance, covariance and correlation for protein yield. In the scenario of every 100 SNP as one region, most regions explained <0.50% of genomic variance and covariance for milk yield and fat yield, and explained <0.30% for protein yield, while some regions could present large variance and covariance. Although overall correlations between two populations for the three traits were positive and high, a few regions still showed weakly positive or highly negative genomic correlations for
Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

USDA-ARS's Scientific Manuscript database

Background: BAC-based physical maps provide for sequencing across an entire genome or selected sub-genome regions of biological interest. Using the minimum tiling path as a guide, it is possible to select specific BAC clones from prioritized genome sections such as a genetically defined QTL interv...

Target-Pathogen: a structural bioinformatic approach to prioritize drug targets in pathogens.

PubMed

Sosa, Ezequiel J; Burguener, Germán; Lanzarotti, Esteban; Defelipe, Lucas; Radusky, Leandro; Pardo, Agustín M; Marti, Marcelo; Turjanski, Adrián G; Fernández Do Porto, Darío

2018-01-04

Available genomic data for pathogens has created new opportunities for drug discovery and development to fight them, including new resistant and multiresistant strains. In particular, structural data must be integrated with both gene information and experimental results. In this sense, there is a lack of an online resource that allows genome wide-based data consolidation from diverse sources together with thorough bioinformatic analysis that allows easy filtering and scoring for fast target selection for drug discovery. Here, we present Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource that allows the integration and weighting of protein information such as: function, metabolic role, off-targeting, structural properties including druggability, essentiality and omic experiments, to facilitate the identification and prioritization of candidate drug targets in pathogens. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wolbachia bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma mansoni) and show its applicability. New genomes can be uploaded upon request. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Target-Pathogen: a structural bioinformatic approach to prioritize drug targets in pathogens

PubMed Central

Sosa, Ezequiel J; Burguener, Germán; Lanzarotti, Esteban; Radusky, Leandro; Pardo, Agustín M; Marti, Marcelo

2018-01-01

Available genomic data for pathogens has created new opportunities for drug discovery and development to fight them, including new resistant and multiresistant strains. In particular, structural data must be integrated with both gene information and experimental results. In this sense, there is a lack of an online resource that allows genome wide-based data consolidation from diverse sources together with thorough bioinformatic analysis that allows easy filtering and scoring for fast target selection for drug discovery. Here, we present Target-Pathogen database (http://target.sbg.qb.fcen.uba.ar/patho), designed and developed as an online resource that allows the integration and weighting of protein information such as: function, metabolic role, off-targeting, structural properties including druggability, essentiality and omic experiments, to facilitate the identification and prioritization of candidate drug targets in pathogens. We include in the database 10 genomes of some of the most relevant microorganisms for human health (Mycobacterium tuberculosis, Mycobacterium leprae, Klebsiella pneumoniae, Plasmodium vivax, Toxoplasma gondii, Leishmania major, Wolbachia bancrofti, Trypanosoma brucei, Shigella dysenteriae and Schistosoma mansoni) and show its applicability. New genomes can be uploaded upon request. PMID:29106651
The importance of the genomic landscape in Waldenström's Macroglobulinemia for targeted therapeutical interventions

PubMed Central

Sacco, Antonio; Fenotti, Adriano; Affò, Loredana; Bazzana, Stefano; Russo, Domenico; Presta, Marco; Malagola, Michele; Anastasia, Antonella; Motta, Marina; Patterson, Christopher J.; Rossi, Giuseppe; Imberti, Luisa; Treon, Steven P.; Ghobrial, Irene M.; Roccaro, Aldo M.

2017-01-01

The literature has recently reported on the importance of genomics in the field of hematologic malignancies, including B-cell lymphoproliferative disorders such as Waldenström's Macroglobulinemia (WM). Particularly, whole exome sequencing has led to the identification of the MYD88 L265P and CXCR4 C1013G somatic variants in WM, occurring in about 90% and 30% of the patients, respectively. Subsequently, functional studies have demonstrated their role in supporting WM pathogenesis and disease progression, both in vitro and in vivo, thus providing the pre-clinical evidence for extremely attractive targets for novel therapeutic interventions in WM. Of note, recent studies have also defined the transcriptome profile of WM cells, revealing a signature that mirrors the somatic aberrations demonstrated within the tumor clone. A parallel research field has also reported on microRNAs (miRNAs), highlighting the oncogenic role of miRNA-155 in WM. In the present review, we focus on the latest reports on genomics and miRNAs in WM, providing an overview of the clinical relevance of the latest acquired knowledge about genomics and miRNA aberrations in WM. PMID:28423722
The importance of the genomic landscape in Waldenström's Macroglobulinemia for targeted therapeutical interventions.

PubMed

Sacco, Antonio; Fenotti, Adriano; Affò, Loredana; Bazzana, Stefano; Russo, Domenico; Presta, Marco; Malagola, Michele; Anastasia, Antonella; Motta, Marina; Patterson, Christopher J; Rossi, Giuseppe; Imberti, Luisa; Treon, Steven P; Ghobrial, Irene M; Roccaro, Aldo M

2017-05-23

The literature has recently reported on the importance of genomics in the field of hematologic malignancies, including B-cell lymphoproliferative disorders such as Waldenström's Macroglobulinemia (WM). Particularly, whole exome sequencing has led to the identification of the MYD88 L265P and CXCR4 C1013G somatic variants in WM, occurring in about 90% and 30% of the patients, respectively. Subsequently, functional studies have demonstrated their role in supporting WM pathogenesis and disease progression, both in vitro and in vivo, thus providing the pre-clinical evidence for extremely attractive targets for novel therapeutic interventions in WM. Of note, recent studies have also defined the transcriptome profile of WM cells, revealing a signature that mirrors the somatic aberrations demonstrated within the tumor clone. A parallel research field has also reported on microRNAs (miRNAs), highlighting the oncogenic role of miRNA-155 in WM. In the present review, we focus on the latest reports on genomics and miRNAs in WM, providing an overview of the clinical relevance of the latest acquired knowledge about genomics and miRNA aberrations in WM.
Applied Genomics: Data Mining Reveals Species-Specific Malaria Diagnostic Targets More Sensitive than 18S rRNA

PubMed Central

Demas, Allison; Oberstaller, Jenna; DeBarry, Jeremy; Lucchi, Naomi W.; Srinivasamoorthy, Ganesh; Sumari, Deborah; Kabanywanyi, Abdunoor M.; Villegas, Leopoldo; Escalante, Ananias A.; Kachur, S. Patrick; Barnwell, John W.; Peterson, David S.; Udhayakumar, Venkatachalam; Kissinger, Jessica C.

2011-01-01

Accurate and rapid diagnosis of malaria infections is crucial for implementing species-appropriate treatment and saving lives. Molecular diagnostic tools are the most accurate and sensitive method of detecting Plasmodium, differentiating between Plasmodium species, and detecting subclinical infections. Despite available whole-genome sequence data for Plasmodium falciparum and P. vivax, the majority of PCR-based methods still rely on the 18S rRNA gene targets. Historically, this gene has served as the best target for diagnostic assays. However, it is limited in its ability to detect mixed infections in multiplex assay platforms without the use of nested PCR. New diagnostic targets are needed. Ideal targets will be species specific, highly sensitive, and amenable to both single-step and multiplex PCRs. We have mined the genomes of P. falciparum and P. vivax to identify species-specific, repetitive sequences that serve as new PCR targets for the detection of malaria. We show that these targets (Pvr47 and Pfr364) exist in 14 to 41 copies and are more sensitive than 18S rRNA when utilized in a single-step PCR. Parasites are routinely detected at levels of 1 to 10 parasites/μl. The reaction can be multiplexed to detect both species in a single reaction. We have examined 7 P. falciparum strains and 91 P. falciparum clinical isolates from Tanzania and 10 P. vivax strains and 96 P. vivax clinical isolates from Venezuela, and we have verified a sensitivity and specificity of ∼100% for both targets compared with a nested 18S rRNA approach. We show that bioinformatics approaches can be successfully applied to identify novel diagnostic targets and improve molecular methods for pathogen detection. These novel targets provide a powerful alternative molecular diagnostic method for the detection of P. falciparum and P. vivax in conventional or multiplex PCR platforms. PMID:21525225
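The genome-mining idea in this abstract (finding sequences that are repeated in one species and absent from the other) can be illustrated with a toy k-mer comparison. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the k-mer length, the copy-number threshold, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def species_specific_repeats(target, other, k, min_copies):
    """Return k-mers seen at least min_copies times in target and never in other."""
    target_counts = kmer_counts(target, k)
    other_kmers = set(kmer_counts(other, k))
    return {
        kmer: n
        for kmer, n in target_counts.items()
        if n >= min_copies and kmer not in other_kmers
    }


# Toy sequences (hypothetical, not real Plasmodium data).
genome_a = "ACGTTGCA" * 3 + "GGGCCC"   # contains a three-copy repeat
genome_b = "TTTTACGTAAAA" + "GGGCCC"   # shares only the GGGCCC tail

hits = species_specific_repeats(genome_a, genome_b, k=6, min_copies=3)
for kmer, copies in sorted(hits.items()):
    print(kmer, copies)
```

A real screen would also need to handle reverse complements, near-identical repeat copies, and primer design constraints, none of which are modeled here.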
A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes

PubMed Central

Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

2018-01-01

We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. PMID:29367403
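The perfect-match logic described in this abstract can be sketched in a few lines. The following is an illustrative toy, assuming that the landscape counts, for each reference position, how many read substrings of a fixed length k match the reference exactly; the function names, the value of k, and the toy sequences are hypothetical, and reverse-complement reads are ignored. It is not the deposited PMGL pipeline.

```python
from collections import Counter


def perfect_match_landscape(reference, reads, k):
    """For each reference position, count read k-mers identical to the
    reference k-mer that starts at that position."""
    read_kmers = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            read_kmers[read[i:i + k]] += 1
    return [read_kmers[reference[i:i + k]]
            for i in range(len(reference) - k + 1)]


def zero_runs(landscape):
    """Return half-open (start, end) intervals where the landscape is zero."""
    runs, start = [], None
    for i, count in enumerate(landscape):
        if count == 0 and start is None:
            start = i
        elif count != 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(landscape)))
    return runs


# Toy data (hypothetical): the query differs from the reference by a
# single T->C substitution at 0-based position 7.
reference = "AACCGGTTACGTAGCT"
query_reads = ["AACCGGTCACGTAGCT"]

landscape = perfect_match_landscape(reference, query_reads, k=4)
signatures = zero_runs(landscape)
print(landscape)
print(signatures)
```

In this toy case a single-nucleotide variant removes every reference k-mer that overlaps it, so the signature is a run of k zeros whose last position is the variant site. Locating that run says where the genomes differ; characterizing the variant would still require aligning the relevant reads, as the abstract notes.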
A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes.

PubMed

Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

2018-04-01

We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. Copyright © 2018 by the Genetics Society of America.
MHC class I-associated peptides derive from selective regions of the human genome.

PubMed

Pearson, Hillary; Daouda, Tariq; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Mader, Sylvie; Lemieux, Sébastien; Thibault, Pierre; Perreault, Claude

2016-12-01

MHC class I-associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology.
MHC class I–associated peptides derive from selective regions of the human genome

PubMed Central

Pearson, Hillary; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Thibault, Pierre

2016-01-01

MHC class I–associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology. PMID:27841757
Quantitative in vivo whole genome motility screen reveals novel therapeutic targets to block cancer metastasis.

PubMed

Stoletov, Konstantin; Willetts, Lian; Paproski, Robert J; Bond, David J; Raha, Srijan; Jovel, Juan; Adam, Benjamin; Robertson, Amy E; Wong, Francis; Woolner, Emma; Sosnowski, Deborah L; Bismar, Tarek A; Wong, Gane Ka-Shu; Zijlstra, Andries; Lewis, John D

2018-06-14

Metastasis is the most lethal aspect of cancer, yet current therapeutic strategies do not target its key rate-limiting steps. We have previously shown that the entry of cancer cells into the blood stream, or intravasation, is highly dependent upon in vivo cancer cell motility, making it an attractive therapeutic target. To systemically identify genes required for tumor cell motility in an in vivo tumor microenvironment, we established a novel quantitative in vivo screening platform based on intravital imaging of human cancer metastasis in ex ovo avian embryos. Utilizing this platform to screen a genome-wide shRNA library, we identified a panel of novel genes whose function is required for productive cancer cell motility in vivo, and whose expression is closely associated with metastatic risk in human cancers. The RNAi-mediated inhibition of these gene targets resulted in a nearly total (>99.5%) block of spontaneous cancer metastasis in vivo.
A Comparative Genomics Strategy for Targeted Discovery of Single-Nucleotide Polymorphisms and Conserved-Noncoding Sequences in Orphan Crops

PubMed Central

Feltus, F.A.; Singh, H.P.; Lohithaswa, H.C.; Schulze, S.R.; Silva, T.D.; Paterson, A.H.

2006-01-01

Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species. PMID:16607031
Non-Enzymatic Detection of Bacterial Genomic DNA Using the Bio-Barcode Assay

PubMed Central

Hill, Haley D.; Vega, Rafael A.; Mirkin, Chad A.

2011-01-01

The detection of bacterial genomic DNA through a non-enzymatic nanomaterials based amplification method, the bio-barcode assay, is reported. The assay utilizes oligonucleotide functionalized magnetic microparticles to capture the target of interest from the sample. A critical step in the new assay involves the use of blocking oligonucleotides during heat denaturation of the double stranded DNA. These blockers bind to specific regions of the target DNA upon cooling, and prevent the duplex DNA from re-hybridizing, which allows the particle probes to bind. Following target isolation using the magnetic particles, oligonucleotide functionalized gold nanoparticles act as target recognition agents. The oligonucleotides on the nanoparticle (barcodes) act as amplification surrogates. The barcodes are then detected using the Scanometric method. The limit of detection for this assay was determined to be 2.5 femtomolar, and this is the first demonstration of a barcode type assay for the detection of double stranded, genomic DNA. PMID:17927207
Utilizing the Dog Genome in the Search for Novel Candidate Genes Involved in Glioma Development—Genome Wide Association Mapping followed by Targeted Massive Parallel Sequencing Identifies a Strongly Associated Locus

PubMed Central

Dickinson, Peter; Xiong, Anqi; York, Daniel; Jayashankar, Kartika; Pielberg, Gerli; Koltookian, Michele; Murén, Eva; Fuxelius, Hans-Henrik; Weishaupt, Holger; Andersson, Göran; Hedhammar, Åke; Bongcam-Rudloff, Erik; Forsberg-Nilsson, Karin

2016-01-01

Gliomas are the most common form of malignant primary brain tumors in humans and second most common in dogs, occurring with similar frequencies in both species. Dogs are valuable spontaneous models of human complex diseases including cancers and may provide insight into disease susceptibility and oncogenesis. Several brachycephalic breeds such as Boxer, Bulldog and Boston Terrier have an elevated risk of developing glioma, but others, including Pug and Pekingese, are not at higher risk. To identify glioma-associated genetic susceptibility factors, an across-breed genome-wide association study (GWAS) was performed on 39 dog glioma cases and 141 controls from 25 dog breeds, identifying a genome-wide significant locus on canine chromosome (CFA) 26 (p = 2.8 × 10⁻⁸). Targeted re-sequencing of the 3.4 Mb candidate region was performed, followed by genotyping of the 56 SNVs that best fit the association pattern between the re-sequenced cases and controls. We identified three candidate genes that were highly associated with glioma susceptibility: CAMKK2, P2RX7 and DENR. CAMKK2 showed reduced expression in both canine and human brain tumors, and a non-synonymous variant in P2RX7, previously demonstrated to have a 50% decrease in receptor function, was also associated with disease. Thus, one or more of these genes appear to affect glioma susceptibility. PMID:27171399
Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia

PubMed Central

Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M.

2016-01-01

Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints. PMID:28175287
Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia.

PubMed

Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M; Ruiz-Herrera, Aurora

2016-12-01

Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints.
Schistosoma comparative genomics: integrating genome structure, parasite biology and anthelmintic discovery

PubMed Central

Swain, Martin T.; Larkin, Denis M.; Caffrey, Conor R.; Davies, Stephen J.; Loukas, Alex; Skelly, Patrick J.; Hoffmann, Karl F.

2011-01-01

Schistosoma genomes provide a comprehensive resource for identifying the molecular processes that shape parasite evolution and for discovering novel chemotherapeutic or immunoprophylactic targets. Here, we demonstrate how intra- and intergenus comparative genomics can be used to drive these investigations forward, illustrate the advantages and limitations of these approaches and review how post genomic technologies offer complementary strategies for genome characterisation. While sequencing and functional characterisation of other schistosome/platyhelminth genomes continues to expedite anthelmintic discovery, we contend that future priorities should equally focus on improving assembly quality, and chromosomal assignment, of existing schistosome/platyhelminth genomes. PMID:22024648
Genomic signatures of selection at linked sites: unifying the disparity among species

PubMed Central

Cutter, Asher D.; Payseur, Bret A.

2014-01-01

Population genetics theory supplies powerful predictions about how natural selection interacts with genetic linkage to sculpt the genomic landscape of nucleotide polymorphism. Both the spread of beneficial mutations and removal of deleterious mutations act to depress polymorphism levels, especially in low-recombination regions. However, empiricists have documented extreme disparities among species. Here we characterize the dominant features that could drive variation in linked selection among species, including roles for selective sweeps being ‘hard’ or ‘soft’, and concealing by demography and genomic confounds. We advocate targeted studies of close relatives to unify our understanding of how selection and linkage interact to shape genome evolution. PMID:23478346
Gaussian decomposition of high-resolution melt curve derivatives for measuring genome-editing efficiency

PubMed Central

Zaboikin, Michail; Freter, Carl

2018-01-01

We describe a method for measuring genome editing efficiency from in silico analysis of high-resolution melt curve data. The melt curve data derived from amplicons of genome-edited or unmodified target sites were processed to remove the background fluorescent signal emanating from free fluorophore and then corrected for temperature-dependent quenching of fluorescence of double-stranded DNA-bound fluorophore. Corrected data were normalized and numerically differentiated to obtain the first derivatives of the melt curves. These were then mathematically modeled as a sum or superposition of minimal number of Gaussian components. Using Gaussian parameters determined by modeling of melt curve derivatives of unedited samples, we were able to model melt curve derivatives from genetically altered target sites where the mutant population could be accommodated using an additional Gaussian component. From this, the proportion contributed by the mutant component in the target region amplicon could be accurately determined. Mutant component computations compared well with the mutant frequency determination from next generation sequencing data. The results were also consistent with our earlier studies that used difference curve areas from high-resolution melt curves for determining the efficiency of genome-editing reagents. The advantage of the described method is that it does not require calibration curves to estimate proportion of mutants in amplicons of genome-edited target sites. PMID:29300734
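The final step described in this abstract, reading off the mutant proportion once the melt curve derivative has been modeled as a sum of Gaussians, can be sketched as follows. The sketch assumes the proportion is taken as the mutant component's share of the total area under the fitted Gaussians (area of one component = amplitude × σ × √(2π)); that definition, the function names, and the parameter values are assumptions for illustration, and the curve-fitting step itself is omitted.

```python
import math


def gaussian(t, amplitude, mean, sigma):
    """Value of one Gaussian component at temperature t."""
    return amplitude * math.exp(-0.5 * ((t - mean) / sigma) ** 2)


def gaussian_area(amplitude, sigma):
    """Closed-form area under a Gaussian: amplitude * sigma * sqrt(2*pi)."""
    return amplitude * sigma * math.sqrt(2.0 * math.pi)


def mutant_fraction(wild_type, mutant):
    """Share of total area contributed by the mutant components.

    Each argument is a list of (amplitude, mean, sigma) tuples.
    """
    wt_area = sum(gaussian_area(a, s) for a, _, s in wild_type)
    mut_area = sum(gaussian_area(a, s) for a, _, s in mutant)
    return mut_area / (wt_area + mut_area)


# Hypothetical fitted parameters: (amplitude, melting temperature in C, sigma).
wild_type = [(3.0, 84.0, 0.8)]   # component fitted on unedited samples
mutant = [(1.0, 81.5, 0.8)]      # extra component needed for edited samples

fraction = mutant_fraction(wild_type, mutant)
print(f"estimated mutant fraction: {fraction:.2f}")
```

With equal widths the fraction reduces to the ratio of amplitudes, here 1 / (3 + 1). In practice the parameters would come from a least-squares fit to the background-corrected, normalized derivative curve.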
CRISPR/Cas9-mediated genome editing of Epstein-Barr virus in human cells.

PubMed

Yuen, Kit-San; Chan, Chi-Ping; Wong, Nok-Hei Mickey; Ho, Chau-Ha; Ho, Ting-Hin; Lei, Ting; Deng, Wen; Tsao, Sai Wah; Chen, Honglin; Kok, Kin-Hang; Jin, Dong-Yan

2015-03-01

The CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 (CRISPR-associated 9) system is a highly efficient and powerful tool for RNA-guided editing of the cellular genome. Whether CRISPR/Cas9 can also cleave the genome of DNA viruses such as Epstein-Barr virus (EBV), which undergo episomal replication in human cells, remains to be established. Here, we reported on CRISPR/Cas9-mediated editing of the EBV genome in human cells. Two guide RNAs (gRNAs) were used to direct a targeted deletion of 558 bp in the promoter region of BART (BamHI A rightward transcript) which encodes viral microRNAs (miRNAs). Targeted editing was achieved in several human epithelial cell lines latently infected with EBV, including nasopharyngeal carcinoma C666-1 cells. CRISPR/Cas9-mediated editing of the EBV genome was efficient. A recombinant virus with the desired deletion was obtained after puromycin selection of cells expressing Cas9 and gRNAs. No off-target cleavage was found by deep sequencing. The loss of BART miRNA expression and activity was verified, supporting the BART promoter as the major promoter of BART RNA. Although CRISPR/Cas9-mediated editing of the multicopy episome of EBV in infected HEK293 cells was mostly incomplete, viruses could be recovered and introduced into other cells at low m.o.i. Recombinant viruses with an edited genome could be further isolated through single-cell sorting. Finally, a DsRed selectable marker was successfully introduced into the EBV genome during the course of CRISPR/Cas9-mediated editing. Taken together, our work provided not only the first genetic evidence that the BART promoter drives the expression of the BART transcript, but also a new and efficient method for targeted editing of EBV genome in human cells. © 2015 The Authors.
Pervasive Transcription of a Herpesvirus Genome Generates Functionally Important RNAs

PubMed Central

Canny, Susan P.; Reese, Tiffany A.; Johnson, L. Steven; Zhang, Xin; Kambal, Amal; Duan, Erning; Liu, Catherine Y.; Virgin, Herbert W.

2014-01-01

Pervasive transcription is observed in a wide range of organisms, including humans, mice, and viruses, but the functional significance of the resulting transcripts remains uncertain. Current genetic approaches are often limited by their emphasis on protein-coding open reading frames (ORFs). We previously identified extensive pervasive transcription from the murine gammaherpesvirus 68 (MHV68) genome outside known ORFs and antisense to known genes (termed expressed genomic regions [EGRs]). Similar antisense transcripts have been identified in many other herpesviruses, including Kaposi’s sarcoma-associated herpesvirus and human and murine cytomegalovirus. Despite their prevalence, whether these RNAs have any functional importance in the viral life cycle is unknown, and one interpretation is that these are merely “noise” generated by functionally unimportant transcriptional events. To determine whether pervasive transcription of a herpesvirus genome generates RNA molecules that are functionally important, we used a strand-specific functional approach to target transcripts from thirteen EGRs in MHV68. We found that targeting transcripts from six EGRs reduced viral protein expression, proving that pervasive transcription can generate functionally important RNAs. We characterized transcripts emanating from EGRs 26 and 27 in detail using several methods, including RNA sequencing, and identified several novel polyadenylated transcripts that were enriched in the nuclei of infected cells. These data provide the first evidence of the functional importance of regions of pervasive transcription emanating from MHV68 EGRs. Therefore, studies utilizing mutation of a herpesvirus genome must account for possible effects on RNAs generated by pervasive transcription. PMID:24618256

Techniques for Large-Scale Bacterial Genome Manipulation and Characterization of the Mutants with Respect to In Silico Metabolic Reconstructions.

PubMed

diCenzo, George C; Finan, Turlough M

2018-01-01

The rate at which all genes within a bacterial genome can be identified far exceeds the ability to characterize these genes. To assist in associating genes with cellular functions, a large-scale bacterial genome deletion approach can be employed to rapidly screen tens to thousands of genes for desired phenotypes. Here, we provide a detailed protocol for the generation of deletions of large segments of bacterial genomes that relies on the activity of a site-specific recombinase. In this procedure, two recombinase recognition target sequences are introduced into known positions of a bacterial genome through single cross-over plasmid integration. Subsequent expression of the site-specific recombinase mediates recombination between the two target sequences, resulting in the excision of the intervening region and its loss from the genome. We further illustrate how this deletion system can be readily adapted to function as a large-scale in vivo cloning procedure, in which the region excised from the genome is captured as a replicative plasmid. We next provide a procedure for the metabolic analysis of bacterial large-scale genome deletion mutants using the Biolog Phenotype MicroArray™ system. Finally, a pipeline is described, and a sample Matlab script is provided, for the integration of the obtained data with a draft metabolic reconstruction for the refinement of the reactions and gene-protein-reaction relationships in a metabolic reconstruction.
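The excision step can be pictured with a toy string model. This is a sketch for illustration only, not part of the published protocol: the recognition sequence below is a made-up placeholder, the function name is hypothetical, and the model assumes two directly repeated target sites, in which case recombination leaves one site in the chromosome and carries the other on the excised circle.

```python
def excise_between_sites(genome, site):
    """Toy model of recombinase-mediated excision between two direct repeats.

    Returns (remaining_genome, excised_fragment). The excised fragment is
    written linearly here, although in the cell it would be a circle.
    """
    first = genome.find(site)
    second = genome.find(site, first + len(site)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("two copies of the target site are required")
    # The chromosome keeps the first site copy; everything up to and
    # including the second copy leaves with the excised fragment.
    cut_start = first + len(site)
    cut_end = second + len(site)
    remaining = genome[:cut_start] + genome[cut_end:]
    excised = genome[cut_start:cut_end]
    return remaining, excised


site = "GAAGTTCCTA"  # placeholder sequence, not a real recognition site
genome = "AAAA" + site + "GGGGCCCC" + site + "TTTT"
remaining, excised = excise_between_sites(genome, site)
```

In the in vivo cloning variant mentioned in the abstract, the excised fragment would additionally need to carry whatever is required for it to replicate as a plasmid; the toy model does not represent that.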
In situ optical sequencing and structure analysis of a trinucleotide repeat genome region by localization microscopy after specific COMBO-FISH nano-probing

NASA Astrophysics Data System (ADS)

Stuhlmüller, M.; Schwarz-Finsterle, J.; Fey, E.; Lux, J.; Bach, M.; Cremer, C.; Hinderhofer, K.; Hausmann, M.; Hildenbrand, G.

2015-10-01

Trinucleotide repeat expansions (like (CGG)n) of chromatin in the genome of cell nuclei can cause neurological disorders such as Fragile-X syndrome. Until now, the mechanisms by which these expansions develop during cell proliferation have not been clearly understood. Therefore, in situ investigations of chromatin structures on the nanoscale are required to better understand supra-molecular mechanisms at the single-cell level. By super-resolution localization microscopy (Spectral Position Determination Microscopy; SPDM) in combination with nano-probing using COMBO-FISH (COMBinatorial Oligonucleotide FISH), novel insights into the nano-architecture of the genome will become possible. The native spatial structure of trinucleotide repeat expansion genome regions was analysed and optical sequencing of repetitive units was performed within 3D-conserved nuclei using SPDM after COMBO-FISH. We analysed a (CGG)n-expansion region inside the 5' untranslated region of the FMR1 gene. The number of CGG repeats for a full mutation causing Fragile-X syndrome was found and also verified by Southern blot. The FMR1 promoter region was condensed similarly to a centromeric region, whereas the arrangement of the probes labelling the expansion region seemed to indicate a loop-like nano-structure. These results demonstrate for the first time that in situ chromatin structure measurements on the nanoscale are feasible. With further methodological progress it will become possible to estimate the state of trinucleotide repeat mutations in detail and to determine the associated structural changes of the chromatin strand at the single-cell level. In general, the application of the described approach to any genome region will lead to new insights into genome nano-architecture and open new avenues for understanding mechanisms and their relevance in the development of hereditary diseases.
GWASeq: targeted re-sequencing follow up to GWAS.

PubMed

Salomon, Matthew P; Li, Wai Lok Sibon; Edlund, Christopher K; Morrison, John; Fortini, Barbara K; Win, Aung Ko; Conti, David V; Thomas, Duncan C; Duggan, David; Buchanan, Daniel D; Jenkins, Mark A; Hopper, John L; Gallinger, Steven; Le Marchand, Loïc; Newcomb, Polly A; Casey, Graham; Marjoram, Paul

2016-03-03

For the last decade the conceptual framework of the Genome-Wide Association Study (GWAS) has dominated the investigation of human disease and other complex traits. While GWAS have been successful in identifying a large number of variants associated with various phenotypes, the overall amount of heritability explained by these variants remains small. This raises the question of how best to follow up on a GWAS, localize causal variants accounting for GWAS hits, and as a consequence explain more of the so-called "missing" heritability. Advances in high throughput sequencing technologies now allow for the efficient and cost-effective collection of vast amounts of fine-scale genomic data to complement GWAS. We investigate these issues using a colon cancer dataset. After QC, our data consisted of 1993 cases, 899 controls. Using marginal tests of associations, we identify 10 variants distributed among six targeted regions that are significantly associated with colorectal cancer, with eight of the variants being novel to this study. Additionally, we perform so-called 'SNP-set' tests of association and identify two sets of variants that implicate both common and rare variants in the etiology of colorectal cancer. Here we present a large-scale targeted re-sequencing resource focusing on genomic regions implicated in colorectal cancer susceptibility previously identified in several GWAS, which aims to 1) provide fine-scale targeted sequencing data for fine-mapping and 2) provide data resources to address methodological questions regarding the design of sequencing-based follow-up studies to GWAS. Additionally, we show that this strategy successfully identifies novel variants associated with colorectal cancer susceptibility and can implicate both common and rare variants.
Dana-Farber Cancer Institute: Identification of Therapeutic Targets Across Cancer Types | Office of Cancer Genomics

Cancer.gov

The Dana Farber Cancer Institute CTD2 Center focuses on the use of high-throughput genetic and bioinformatic approaches to identify and credential oncogenes and co-dependencies in cancers. This Center aims to provide the cancer research community with information that will facilitate the prioritization of targets based on both genomic and functional evidence, inform the most appropriate genetic context for downstream mechanistic and validation studies, and enable the translation of this information into therapeutics and diagnostics.
Improvement of experimental testing and network training conditions with genome-wide microarrays for more accurate predictions of drug gene targets

PubMed Central

2014-01-01

Background: Genome-wide microarrays have been useful for predicting chemical-genetic interactions at the gene level. However, interpreting genome-wide microarray results can be overwhelming due to the vast output of gene expression data combined with the off-target transcriptional responses often induced by a drug treatment. This study demonstrates how experimental and computational methods can interact with each other to arrive at more accurate predictions of drug-induced perturbations. We present a two-stage strategy that links microarray experimental testing and network training conditions to predict gene perturbations for a drug with a known mechanism of action in a well-studied organism. Results: S. cerevisiae cells were treated with the antifungal fluconazole, and expression profiling was conducted under different biological conditions using Affymetrix genome-wide microarrays. Transcripts were filtered with a formal network-based method, sparse simultaneous equation models and Lasso regression (SSEM-Lasso), under different network training conditions. Gene expression results were evaluated using both gene set and single gene target analyses, and the drug’s transcriptional effects were narrowed first by pathway and then by individual genes. Variables included: (i) testing conditions (exposure time and concentration) and (ii) network training conditions (training compendium modifications). Two analyses of SSEM-Lasso output, gene set and single gene, were conducted to gain a better understanding of how SSEM-Lasso predicts perturbation targets. Conclusions: This study demonstrates that genome-wide microarrays can be optimized using a two-stage strategy for a more in-depth understanding of how a cell manifests biological reactions to a drug treatment at the transcription level. Additionally, a more detailed understanding of how the statistical model, SSEM-Lasso, propagates perturbations through a network of gene regulatory interactions is achieved.
Nuclease-mediated genome editing: At the front-line of functional genomics technology.

PubMed

Sakuma, Tetsushi; Woltjen, Knut

2014-01-01

Genome editing with engineered endonucleases is rapidly becoming a staple method in developmental biology studies. Engineered nucleases permit random or designed genomic modification at precise loci through the stimulation of endogenous double-strand break repair. Homology-directed repair following targeted DNA damage is mediated by co-introduction of a custom repair template, allowing the derivation of knock-out and knock-in alleles in animal models previously refractory to classic gene targeting procedures. Currently there are three main types of customizable site-specific nucleases delineated by the source mechanism of DNA binding that guides nuclease activity to a genomic target: zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR). Among these genome engineering tools, characteristics such as the ease of design and construction, mechanism of inducing DNA damage, and DNA sequence specificity all differ, making their application complementary. By understanding the advantages and disadvantages of each method, one may make the best choice for their particular purpose. © 2014 The Authors Development, Growth & Differentiation © 2014 Japanese Society of Developmental Biologists.
Merkel cell polyomavirus small T antigen induces genome instability by E3 ubiquitin ligase targeting.

PubMed

Kwun, H J; Wendzicki, J A; Shuda, Y; Moore, P S; Chang, Y

2017-12-07

The formation of a bipolar mitotic spindle is an essential process for the equal segregation of duplicated DNA into two daughter cells during mitosis. As a result of deregulated cellular signaling pathways, cancer cells often suffer a loss of genome integrity that might etiologically contribute to carcinogenesis. Merkel cell polyomavirus (MCV) small T (sT) oncoprotein induces centrosome overduplication, aneuploidy, chromosome breakage and the formation of micronuclei by targeting cellular ligases through a sT domain that also inhibits MCV large T oncoprotein turnover. These results provide important insight as to how centrosome number and chromosomal stability can be affected by the E3 ligase targeting capacity of viral oncoproteins such as MCV sT, which may contribute to Merkel cell carcinogenesis.
Intergenic disease-associated regions are abundant in novel transcripts.

PubMed

Bartonicek, N; Clark, M B; Quek, X C; Torpy, J R; Pritchard, A L; Maag, J L V; Gloss, B S; Crawford, J; Taft, R J; Hayward, N K; Montgomery, G W; Mattick, J S; Mercer, T R; Dinger, M E

2017-12-28

Genotyping of large populations through genome-wide association studies (GWAS) has successfully identified many genomic variants associated with traits or disease risk. Unexpectedly, a large proportion of GWAS single nucleotide polymorphisms (SNPs) and associated haplotype blocks are in intronic and intergenic regions, hindering their functional evaluation. While some of these risk-susceptibility regions encompass cis-regulatory sites, their transcriptional potential has never been systematically explored. To detect rare tissue-specific expression, we employed the transcript-enrichment method CaptureSeq on 21 human tissues to identify 1775 multi-exonic transcripts from 561 intronic and intergenic haploblocks associated with 392 traits and diseases, covering 73.9 Mb (2.2%) of the human genome. We show that a large proportion (85%) of disease-associated haploblocks express novel multi-exonic non-coding transcripts that are tissue-specific and enriched for GWAS SNPs as well as epigenetic markers of active transcription and enhancer activity. Similarly, we captured transcriptomes from 13 melanomas, targeting nine melanoma-associated haploblocks, and characterized 31 novel melanoma-specific transcripts that include fusion proteins, novel exons and non-coding RNAs, one-third of which showed allelically imbalanced expression. This resource of previously unreported transcripts in disease-associated regions ( http://gwas-captureseq.dingerlab.org ) should provide an important starting point for the translational community in search of novel biomarkers, disease mechanisms, and drug targets.
Integrated Genomic Characterization Reveals Novel, Therapeutically Relevant Drug Targets in FGFR and EGFR Pathways in Sporadic Intrahepatic Cholangiocarcinoma

PubMed Central

Liang, Winnie S.; Fonseca, Rafael; Bryce, Alan H.; McCullough, Ann E.; Barrett, Michael T.; Hunt, Katherine; Patel, Maitray D.; Young, Scott W.; Collins, Joseph M.; Silva, Alvin C.; Condjella, Rachel M.; Block, Matthew; McWilliams, Robert R.; Lazaridis, Konstantinos N.; Klee, Eric W.; Bible, Keith C.; Harris, Pamela; Oliver, Gavin R.; Bhavsar, Jaysheel D.; Nair, Asha A.; Middha, Sumit; Asmann, Yan; Kocher, Jean-Pierre; Schahl, Kimberly; Kipp, Benjamin R.; Barr Fritcher, Emily G.; Baker, Angela; Aldrich, Jessica; Kurdoglu, Ahmet; Izatt, Tyler; Christoforides, Alexis; Cherni, Irene; Nasser, Sara; Reiman, Rebecca; Phillips, Lori; McDonald, Jackie; Adkins, Jonathan; Mastrian, Stephen D.; Placek, Pamela; Watanabe, Aprill T.; LoBello, Janine; Han, Haiyong; Von Hoff, Daniel; Craig, David W.; Stewart, A. Keith; Carpten, John D.

2014-01-01

Advanced cholangiocarcinoma continues to harbor a difficult prognosis and therapeutic options have been limited. During the course of a clinical trial of whole genomic sequencing seeking druggable targets, we examined six patients with advanced cholangiocarcinoma. Integrated genome-wide and whole transcriptome sequence analyses were performed on tumors from six patients with advanced, sporadic intrahepatic cholangiocarcinoma (SIC) to identify potential therapeutically actionable events. Among the somatic events captured in our analysis, we uncovered two novel therapeutically relevant genomic contexts that when acted upon, resulted in preliminary evidence of anti-tumor activity. Genome-wide structural analysis of sequence data revealed recurrent translocation events involving the FGFR2 locus in three of six assessed patients. These observations and supporting evidence triggered the use of FGFR inhibitors in these patients. In one example, preliminary anti-tumor activity of pazopanib (in vitro FGFR2 IC50≈350 nM) was noted in a patient with an FGFR2-TACC3 fusion. After progression on pazopanib, the same patient also had stable disease on ponatinib, a pan-FGFR inhibitor (in vitro, FGFR2 IC50≈8 nM). In an independent non-FGFR2 translocation patient, exome and transcriptome analysis revealed an allele specific somatic nonsense mutation (E384X) in ERRFI1, a direct negative regulator of EGFR activation. Rapid and robust disease regression was noted in this ERRFI1 inactivated tumor when treated with erlotinib, an EGFR kinase inhibitor. FGFR2 fusions and ERRFI mutations may represent novel targets in sporadic intrahepatic cholangiocarcinoma and trials should be characterized in larger cohorts of patients with these aberrations. PMID:24550739
Genome scale enzyme–metabolite and drug–target interaction predictions using the signature molecular descriptor

DOE PAGES

Faulon, Jean-Loup; Misra, Milind; Martin, Shawn; ...

2007-11-23

Motivation: Identifying protein enzymatic or pharmacological activities are important areas of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. Additionally, there is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein–chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformatics representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Lastly, such predictions cannot be made with current machine-learning techniques requiring binding information for individual reactions or individual targets.
Genome-wide scan of gastrointestinal nematode resistance in closed Angus population selected for minimized influence of MHC.

PubMed

Kim, Eui-Soo; Sonstegard, Tad S; da Silva, Marcos V G B; Gasbarre, Louis C; Van Tassell, Curtis P

2015-01-01

Genetic markers associated with parasite indicator traits are ideal targets for study of marker assisted selection aimed at controlling infections that reduce herd use of anthelminthics. For this study, we collected gastrointestinal (GI) nematode fecal egg count (FEC) data from post-weaning animals of an Angus resource population challenged with a 26-week natural exposure on pasture. In all, data from 487 animals were collected over a 16-year period between 1992 and 2007, most of which were selected for a specific DRB1 allele to reduce the influence of potential allelic variant effects of the MHC locus. A genome-wide association study (GWAS) based on BovineSNP50 genotypes revealed six genomic regions located on bovine Chromosomes 3, 5, 8, 15 and 27, which were significantly associated (-log10 p=4.3) with Box-Cox transformed mean FEC (BC-MFEC). DAVID analysis of the genes within the significant genomic regions suggested a correlation between our results and annotation for genes involved in inflammatory response to infection. Furthermore, ROH and selection signature analyses provided strong evidence that the genomic regions associated with BC-MFEC have not been affected by local autozygosity or recent experimental selection. These findings provide useful information for parasite resistance prediction for young grazing cattle and suggest new candidate gene targets for development of disease-modifying therapies or future studies of host response to GI parasite infection.
A Genome-Wide Association Study Identifies Genomic Regions for Virulence in the Non-Model Organism Heterobasidion annosum s.s

PubMed Central

Dalman, Kerstin; Himmelstrand, Kajsa; Olson, Åke; Lind, Mårten; Brandström-Durling, Mikael; Stenlid, Jan

2013-01-01

The dense single nucleotide polymorphisms (SNP) panels needed for genome wide association (GWA) studies have hitherto been expensive to establish and use on non-model organisms. To overcome this, we used a next generation sequencing approach to both establish SNPs and to determine genotypes. We conducted a GWA study on a fungal species, analysing the virulence of Heterobasidion annosum s.s., a necrotrophic pathogen, on its hosts Picea abies and Pinus sylvestris. From a set of 33,018 single nucleotide polymorphisms (SNP) in 23 haploid isolates, twelve SNP markers distributed on seven contigs were associated with virulence (P<0.0001). Four of the contigs harbour known virulence genes from other fungal pathogens and the remaining three harbour novel candidate genes. Two contigs link closely to virulence regions recognized previously by QTL mapping in the congeneric hybrid H. irregulare × H. occidentale. Our study demonstrates the efficiency of GWA studies for dissecting important complex traits of small populations of non-model haploid organisms with small genomes. PMID:23341945
Global Identification and Characterization of Transcriptionally Active Regions in the Rice Genome

PubMed Central

Stolc, Viktor; Deng, Wei; He, Hang; Korbel, Jan; Chen, Xuewei; Tongprasit, Waraporn; Ronald, Pamela; Chen, Runsheng; Gerstein, Mark; Wang Deng, Xing

2007-01-01

Genome tiling microarray studies have consistently documented rich transcriptional activity beyond the annotated genes. However, systematic characterization and transcriptional profiling of the putative novel transcripts on the genome scale are still lacking. We report here the identification of 25,352 and 27,744 transcriptionally active regions (TARs) not encoded by annotated exons in the rice (Oryza sativa) subspecies japonica and indica, respectively. The non-exonic TARs account for approximately two thirds of the total TARs detected by tiling arrays and represent transcripts likely conserved between japonica and indica. Transcription of 21,018 (83%) japonica non-exonic TARs was verified through expression profiling in 10 tissue types using a re-array in which annotated genes and TARs were each represented by five independent probes. Subsequent analyses indicate that about 80% of the japonica TARs that were not assigned to annotated exons can be assigned to various putatively functional or structural elements of the rice genome, including splice variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. These results provide a systematic characterization of non-exonic transcripts in rice and thus expand the current view of the complexity and dynamics of the rice transcriptome. PMID:17372628
Hundreds of conserved non-coding genomic regions are independently lost in mammals

PubMed Central

Hiller, Michael; Schaar, Bruce T.; Bejerano, Gill

2012-01-01

Conserved non-protein-coding DNA elements (CNEs) often encode cis-regulatory elements and are rarely lost during evolution. However, CNE losses that do occur can be associated with phenotypic changes, exemplified by pelvic spine loss in sticklebacks. Using a computational strategy to detect complete loss of CNEs in mammalian genomes while strictly controlling for artifacts, we find >600 CNEs that are independently lost in at least two mammalian lineages, including a spinal cord enhancer near GDF11. We observed several genomic regions where multiple independent CNE loss events happened; the most extreme is the DIAPH2 locus. We show that CNE losses often involve deletions and that CNE loss frequencies are non-uniform. Similar to less pleiotropic enhancers, we find that independently lost CNEs are shorter, slightly less constrained and evolutionarily younger than CNEs without detected losses. This suggests that independently lost CNEs are less pleiotropic and that pleiotropic constraints contribute to non-uniform CNE loss frequencies. We also detected 35 CNEs that are independently lost in the human lineage and in other mammals. Our study uncovers an interesting aspect of the evolution of functional DNA in mammalian genomes. Experiments are necessary to test if these independently lost CNEs are associated with parallel phenotype changes in mammals. PMID:23042682
Mind the gap! The mitochondrial control region and its power as a phylogenetic marker in echinoids.

PubMed

Bronstein, Omri; Kroh, Andreas; Haring, Elisabeth

2018-05-30

In Metazoa, mitochondrial markers are the most commonly used targets for inferring species-level molecular phylogenies due to their extremely low rate of recombination, maternal inheritance, ease of use and fast substitution rate in comparison to nuclear DNA. The mitochondrial control region (CR) is the main non-coding area of the mitochondrial genome and contains the mitochondrial origin of replication and transcription. While sequences of the cytochrome oxidase subunit 1 (COI) and 16S rRNA genes are the prime mitochondrial markers in phylogenetic studies, the highly variable CR is typically ignored and not targeted in such analyses. However, the higher substitution rate of the CR can be harnessed to infer the phylogeny of closely related species, and the use of a non-coding region alleviates biases resulting from both directional and purifying selection. Additionally, complete mitochondrial genome assemblies utilizing next generation sequencing (NGS) data often show exceptionally low coverage at specific regions, including the CR. This can only be resolved by targeted sequencing of this region. Here we provide novel sequence data for the echinoid mitochondrial control region in over 40 species across the echinoid phylogenetic tree. We demonstrate the advantages of directly targeting the CR and adjacent tRNAs to facilitate complementing low coverage NGS data from complete mitochondrial genome assemblies. Finally, we test the performance of this region as a phylogenetic marker both in the lab and in phylogenetic analyses, and demonstrate its superior performance over the other available mitochondrial markers in echinoids. Our target region of the mitochondrial CR (1) facilitates the first thorough investigation of this region across a wide range of echinoid taxa, (2) provides a tool for complementing missing data in NGS experiments, and (3) identifies the CR as a powerful, novel marker for phylogenetic inference in echinoids due to its high variability, lack of
[Genome editing of industrial microorganism].

PubMed

Zhu, Linjiang; Li, Qi

2015-03-01

Genome editing is defined as highly effective and precise modification of the cellular genome on a large scale. In recent years, genome-editing methods have developed rapidly in the field of industrial strain improvement. These rapidly evolving methods are replacing the old, inefficient mode of genetic modification, namely "one modification, one selection marker, and one target site". Highly effective modes of genome editing have been developed, including simultaneous modification of multiple genes; highly effective insertion, replacement, and deletion of target genes at the genome scale; and cut-and-paste of large DNA fragments. These new tools for microbial genome editing will certainly be applied widely, increasing the efficiency of industrial strain improvement and promoting the transformation of the traditional fermentation industry and the rapid development of novel industrial biotechnology such as the production of biofuels and biomaterials. This review summarizes the technological principles of these genome-editing methods and their applications, which can benefit the engineering and construction of industrial microorganisms.
Trends in genome-wide and region-specific genetic diversity in the Dutch-Flemish Holstein-Friesian breeding program from 1986 to 2015.

PubMed

Doekes, Harmen P; Veerkamp, Roel F; Bijma, Piter; Hiemstra, Sipke J; Windig, Jack J

2018-04-11

In recent decades, Holstein-Friesian (HF) selection schemes have undergone profound changes, including the introduction of optimal contribution selection (OCS; around 2000), a major shift in breeding goal composition (around 2000) and the implementation of genomic selection (GS; around 2010). These changes are expected to have influenced genetic diversity trends. Our aim was to evaluate genome-wide and region-specific diversity in HF artificial insemination (AI) bulls in the Dutch-Flemish breeding program from 1986 to 2015. Pedigree and genotype data (~ 75.5 k) of 6280 AI-bulls were used to estimate rates of genome-wide inbreeding and kinship and corresponding effective population sizes. Region-specific inbreeding trends were evaluated using regions of homozygosity (ROH). Changes in observed allele frequencies were compared to those expected under pure drift to identify putative regions under selection. We also investigated the direction of changes in allele frequency over time. Effective population size estimates for the 1986-2015 period ranged from 69 to 102. Two major breakpoints were observed in genome-wide inbreeding and kinship trends. Around 2000, inbreeding and kinship levels temporarily dropped. From 2010 onwards, they steeply increased, with pedigree-based, ROH-based and marker-based inbreeding rates as high as 1.8, 2.1 and 2.8% per generation, respectively. Accumulation of inbreeding varied substantially across the genome. A considerable fraction of markers showed changes in allele frequency that were greater than expected under pure drift. Putative selected regions harboured many quantitative trait loci (QTL) associated to a wide range of traits. In consecutive 5-year periods, allele frequencies changed more often in the same direction than in opposite directions, except when comparing the 1996-2000 and 2001-2005 periods. Genome-wide and region-specific diversity trends reflect major changes in the Dutch-Flemish HF breeding program. Introduction of
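As a rough worked example of how a rate of inbreeding relates to effective population size, the classical relation Ne = 1 / (2 ΔF) can be applied to the per-generation rates quoted for the period from 2010 onwards. The abstract does not state which estimator the authors used, so this is an assumption made for illustration; the function name is hypothetical. Note that the 69 to 102 range reported above refers to the whole 1986-2015 period, not to these recent rates.

```python
def effective_population_size(rate_per_generation):
    """Classical relation Ne = 1 / (2 * delta_F), with delta_F as a fraction."""
    if rate_per_generation <= 0:
        raise ValueError("rate of inbreeding must be positive")
    return 1.0 / (2.0 * rate_per_generation)


# Per-generation inbreeding rates quoted in the abstract (1.8, 2.1, 2.8 %).
rates = {"pedigree": 0.018, "ROH": 0.021, "marker": 0.028}
ne = {name: effective_population_size(rate) for name, rate in rates.items()}
```

Under this relation the three rates correspond to effective sizes of roughly 28, 24 and 18, which illustrates how steep the post-2010 increase is relative to the long-term estimates.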
CisMiner: Genome-Wide In-Silico Cis-Regulatory Module Prediction by Fuzzy Itemset Mining

PubMed Central

Navarro, Carmen; Lopez, Francisco J.; Cano, Carlos; Garcia-Alcalde, Fernando; Blanco, Armando

2014-01-01

Eukaryotic gene control regions are known to be spread throughout non-coding DNA sequences which may appear distant from the gene promoter. Transcription factors are proteins that coordinately bind to these regions at transcription factor binding sites to regulate gene expression. Several tools can detect significant co-occurrences of closely located binding sites (cis-regulatory modules, CRMs). However, these tools present at least one of the following limitations: 1) their scope is limited to promoter or conserved regions of the genome; 2) they cannot identify combinations involving more than two motifs; 3) they require prior information about target motifs. In this work we present CisMiner, a novel methodology to detect putative CRMs by means of a fuzzy itemset mining approach able to operate at genome-wide scale. CisMiner performs a blind search for CRMs without any prior information about target CRMs or any limit on the number of motifs. CisMiner tackles the combinatorial complexity of genome-wide cis-regulatory module extraction using a natural representation of motif combinations as itemsets and applying the Top-Down Fuzzy Frequent-Pattern Tree algorithm to identify significant itemsets. Fuzzy technology allows CisMiner to better handle the imprecision and noise inherent to regulatory processes. Results obtained for a set of well-known binding sites in the S. cerevisiae genome show that our method yields highly reliable predictions. Furthermore, CisMiner was also applied to putative in-silico predicted transcription factor binding sites to identify significant combinations in S. cerevisiae and D. melanogaster, proving that our approach can be further applied genome-wide to more complex genomes. CisMiner is freely accessible at: http://genome2.ugr.es/cisminer. CisMiner can be queried for the results presented in this work and can also perform a customized cis-regulatory module prediction on a query set of transcription factor binding sites provided by
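The idea of representing motif combinations as itemsets with fuzzy membership can be sketched as follows. This is an illustration only, not CisMiner's implementation: it enumerates candidate itemsets by brute force rather than with the Top-Down Fuzzy Frequent-Pattern Tree algorithm, it assumes a common definition of fuzzy support (the mean, over windows, of the minimum membership among the itemset's motifs), and the motif names, membership values and function names are all hypothetical.

```python
from itertools import combinations


def fuzzy_support(windows, itemset):
    """Mean over windows of the minimum membership among the itemset's motifs.

    A motif absent from a window contributes a membership of 0.0.
    """
    total = sum(min(w.get(m, 0.0) for m in itemset) for w in windows)
    return total / len(windows)


def frequent_itemsets(windows, min_support, max_size=3):
    """Brute-force search for motif combinations with support >= min_support."""
    motifs = sorted({m for w in windows for m in w})
    result = {}
    for size in range(2, max_size + 1):
        for itemset in combinations(motifs, size):
            support = fuzzy_support(windows, itemset)
            if support >= min_support:
                result[itemset] = support
    return result


# Hypothetical genomic windows: motif -> degree of membership in [0, 1].
windows = [
    {"A": 0.9, "B": 0.8, "C": 0.1},
    {"A": 0.7, "B": 0.6},
    {"A": 0.2, "C": 0.9},
    {"A": 0.8, "B": 0.9, "C": 0.7},
]

for itemset, support in frequent_itemsets(windows, min_support=0.5).items():
    print(itemset, round(support, 2))
```

Graded memberships let a weak or imprecisely located binding site contribute partially to a combination instead of being counted as simply present or absent, which is the kind of noise tolerance the abstract attributes to the fuzzy approach.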
Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

PubMed Central

Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.

2012-01-01

High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests
Comparative genomics identification of a novel set of temporally regulated hedgehog target genes in the retina.

PubMed

McNeill, Brian; Perez-Iratxeta, Carol; Mazerolle, Chantal; Furimsky, Marosh; Mishina, Yuji; Andrade-Navarro, Miguel A; Wallace, Valerie A

2012-03-01

The hedgehog (Hh) signaling pathway is involved in numerous developmental and adult processes with many links to cancer. In vertebrates, the activity of the Hh pathway is mediated primarily through three Gli transcription factors (Gli1, 2 and 3) that can serve as transcriptional activators or repressors. The identification of Gli target genes is essential for the understanding of the Hh-mediated processes. We used a comparative genomics approach using the mouse and human genomes to identify 390 genes that contained conserved Gli binding sites. RT-qPCR validation of 46 target genes in E14.5 and P0.5 retinal explants revealed that Hh pathway activation resulted in the modulation of 30 of these targets, 25 of which demonstrated a temporal regulation. Further validation revealed that the expression of Bok, FoxA1, Sox8 and Wnt7a was dependent upon Sonic Hh (Shh) signaling in the retina and their regulation is under positive and negative controls by Gli2 and Gli3, respectively. We also show using chromatin immunoprecipitation that Gli2 binds to the Sox8 promoter, suggesting that Sox8 is an Hh-dependent direct target of Gli2. Finally, we demonstrate that the Hh pathway also modulates the expression of Sox9 and Sox10, which together with Sox8 make up the SoxE group. Previously, it has been shown that Hh and SoxE group genes promote Müller glial cell development in the retina. Our data are consistent with the possibility for a role of SoxE group genes downstream of Hh signaling on Müller cell development. Crown Copyright © 2012. Published by Elsevier Inc. All rights reserved.

Formation of new chromatin domains determines pathogenicity of genomic duplications.

PubMed

Franke, Martin; Ibrahim, Daniel M; Andrey, Guillaume; Schwarzer, Wibke; Heinrich, Verena; Schöpflin, Robert; Kraft, Katerina; Kempfer, Rieke; Jerković, Ivana; Chan, Wing-Lee; Spielmann, Malte; Timmermann, Bernd; Wittler, Lars; Kurth, Ingo; Cambiaso, Paola; Zuffardi, Orsetta; Houge, Gunnar; Lambie, Lindsay; Brancati, Francesco; Pombo, Ana; Vingron, Martin; Spitz, Francois; Mundlos, Stefan

2016-10-13

Chromosome conformation capture methods have identified subchromosomal structures of higher-order chromatin interactions called topologically associated domains (TADs) that are separated from each other by boundary regions. By subdividing the genome into discrete regulatory units, TADs restrict the contacts that enhancers establish with their target genes. However, the mechanisms that underlie partitioning of the genome into TADs remain poorly understood. Here we show by chromosome conformation capture (capture Hi-C and 4C-seq methods) that genomic duplications in patient cells and genetically modified mice can result in the formation of new chromatin domains (neo-TADs) and that this process determines their molecular pathology. Duplications of non-coding DNA within the mouse Sox9 TAD (intra-TAD) that cause female to male sex reversal in humans, showed increased contact of the duplicated regions within the TAD, but no change in the overall TAD structure. In contrast, overlapping duplications that extended over the next boundary into the neighbouring TAD (inter-TAD), resulted in the formation of a new chromatin domain (neo-TAD) that was isolated from the rest of the genome. As a consequence of this insulation, inter-TAD duplications had no phenotypic effect. However, incorporation of the next flanking gene, Kcnj2, in the neo-TAD resulted in ectopic contacts of Kcnj2 with the duplicated part of the Sox9 regulatory region, consecutive misexpression of Kcnj2, and a limb malformation phenotype. Our findings provide evidence that TADs are genomic regulatory units with a high degree of internal stability that can be sculptured by structural genomic variations. This process is important for the interpretation of copy number variations, as these variations are routinely detected in diagnostic tests for genetic disease and cancer. This finding also has relevance in an evolutionary setting because copy-number differences are thought to have a crucial role in the evolution of
Genome-wide high-throughput SNP discovery and genotyping for understanding natural (functional) allelic diversity and domestication patterns in wild chickpea

PubMed Central

Bajaj, Deepak; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

2015-01-01

We identified 82489 high-quality genome-wide SNPs from 93 wild and cultivated Cicer accessions through integrated reference genome- and de novo-based GBS assays. High intra- and inter-specific polymorphic potential (66–85%) and broader natural allelic diversity (6–64%) detected by genome-wide SNPs among accessions signify their efficacy for monitoring introgression and transferring target trait-regulating genomic (gene) regions/allelic variants from wild to cultivated Cicer gene pools for genetic improvement. The population-specific assignment of wild Cicer accessions pertaining to the primary gene pool is more influenced by geographical origin/phenotypic characteristics than by species/gene-pools of origination. The functional significance of allelic variants (non-synonymous and regulatory SNPs) scanned from transcription factors and stress-responsive genes in differentiating wild accessions (with potential known sources of yield-contributing and stress tolerance traits) from cultivated desi and kabuli accessions, fine-mapping/map-based cloning of QTLs and determination of LD patterns across wild and cultivated gene-pools are suitably elucidated. The correlation between phenotypic (agromorphological traits) and molecular diversity-based admixed domestication patterns within six structured populations of wild and cultivated accessions via genome-wide SNPs was apparent. This suggests utility of whole genome SNPs as a potential resource for identifying naturally selected trait-regulating genomic targets/functional allelic variants adaptive to diverse agroclimatic regions for genetic enhancement of cultivated gene-pools. PMID:26208313
Comparative genomics in chicken and Pekin duck using FISH mapping and microarray analysis

PubMed Central

2009-01-01

Background: The availability of the complete chicken (Gallus gallus) genome sequence as well as a large number of chicken probes for fluorescent in-situ hybridization (FISH) and microarray resources facilitates comparative genomic studies between chicken and other bird species. In a previous study, we provided a comprehensive cytogenetic map for the turkey (Meleagris gallopavo) and the first analysis of copy number variants (CNVs) in birds. Here, we extend this approach to the Pekin duck (Anas platyrhynchos), an obvious target for comparative genomic studies due to its agricultural importance and resistance to avian flu. Results: We provide a detailed molecular cytogenetic map of the duck genome through FISH assignment of 155 chicken clones. We identified one inter- and six intrachromosomal rearrangements between chicken and duck macrochromosomes and demonstrated conserved synteny among all microchromosomes analysed. Array comparative genomic hybridisation revealed 32 CNVs, of which 5 overlap previously designated "hotspot" regions between chicken and turkey. Conclusion: Our results suggest extensive conservation of avian genomes across 90 million years of evolution in both macro- and microchromosomes. The data on CNVs between chicken and duck extend previous analyses in chicken and turkey and support the hypotheses that avian genomes contain fewer CNVs than mammalian genomes and that genomes of evolutionarily distant species share regions of copy number variation ("CNV hotspots"). Our results will expedite duck genomics, assist marker development and highlight areas of interest for future evolutionary and functional studies. PMID:19656363
Secondary structure of the 3'-noncoding region of flavivirus genomes: comparative analysis of base pairing probabilities.

PubMed

Rauscher, S; Flamm, C; Mandl, C W; Heinz, F X; Stadler, P F

1997-07-01

The prediction of the complete matrix of base pairing probabilities was applied to the 3' noncoding region (NCR) of flavivirus genomes. This approach identifies not only well-defined secondary structure elements, but also regions of high structural flexibility. Flaviviruses, many of which are important human pathogens, have a common genomic organization, but exhibit a significant degree of RNA sequence diversity in the functionally important 3'-NCR. We demonstrate the presence of secondary structures shared by all flaviviruses, as well as structural features that are characteristic for groups of viruses within the genus reflecting the established classification scheme. The significance of most of the predicted structures is corroborated by compensatory mutations. The availability of infectious clones for several flaviviruses will allow the assessment of these structural elements in processes of the viral life cycle, such as replication and assembly.
Convergent genomic signatures of domestication in sheep and goats.

PubMed

Alberto, Florian J; Boyer, Frédéric; Orozco-terWengel, Pablo; Streeter, Ian; Servin, Bertrand; de Villemereuil, Pierre; Benjelloun, Badr; Librado, Pablo; Biscarini, Filippo; Colli, Licia; Barbato, Mario; Zamani, Wahid; Alberti, Adriana; Engelen, Stefan; Stella, Alessandra; Joost, Stéphane; Ajmone-Marsan, Paolo; Negrini, Riccardo; Orlando, Ludovic; Rezaei, Hamid Reza; Naderi, Saeid; Clarke, Laura; Flicek, Paul; Wincker, Patrick; Coissac, Eric; Kijas, James; Tosser-Klopp, Gwenola; Chikhi, Abdelkader; Bruford, Michael W; Taberlet, Pierre; Pompanon, François

2018-03-06

The evolutionary basis of domestication has been a longstanding question and its genetic architecture is becoming more tractable as more domestic species become genome-enabled. Before becoming established worldwide, sheep and goats were domesticated in the fertile crescent 10,500 years before present (YBP) where their wild relatives remain. Here we sequence the genomes of wild Asiatic mouflon and Bezoar ibex in the sheep and goat domestication center and compare their genomes with that of domestics from local, traditional, and improved breeds. Among the genomic regions carrying selective sweeps differentiating domestic breeds from wild populations, which are associated among others to genes involved in nervous system, immunity and productivity traits, 20 are common to Capra and Ovis. The patterns of selection vary between species, suggesting that while common targets of selection related to domestication and improvement exist, different solutions have arisen to achieve similar phenotypic end-points within these closely related livestock species.
Variability among Cucurbitaceae species (melon, cucumber and watermelon) in a genomic region containing a cluster of NBS-LRR genes.

PubMed

Morata, Jordi; Puigdomènech, Pere

2017-02-08

Cucurbitaceae species contain a significantly lower number of genes coding for proteins with similarity to plant resistance genes belonging to the NBS-LRR family than other plant species of similar genome size. A large proportion of these genes are organized in clusters that appear to be hotspots of variability. The genomes of the Cucurbitaceae species measured until now are intermediate in size (between 350 and 450 Mb) and they apparently have not undergone any genome duplications besides those at the origin of eudicots. The cluster containing the largest number of NBS-LRR genes has previously been analyzed in melon and related species and showed a high degree of interspecific and intraspecific variability. It was of interest to study whether similar behavior occurred in other clusters of the same family of genes. The cluster of NBS-LRR genes located in melon chromosome 9 was analyzed and compared with the syntenic regions in other cucurbit genomes. This is the second largest cluster within this species and it contains nine sequences with an NBS-LRR annotation, including two genes, Fom1 and Prv, providing resistance against Fusarium and Papaya ring-spot virus (PRSV). The variability within the melon species appears to consist essentially of single nucleotide polymorphisms. Clusters of similar genes are present in the syntenic regions of the two species of Cucurbitaceae that were sequenced, cucumber and watermelon. Most of the genes in the syntenic clusters can be aligned between species and a hypothesis of generation of the cluster is proposed. The number of genes in the watermelon cluster is similar to that in melon while a higher number of genes (12) is present in cucumber, a species with a smaller genome than melon. After comparing genome resequencing data of 115 cucumber varieties, deletion of a group of genes is observed in a group of varieties of Indian origin. Clusters of genes coding for NBS-LRR proteins in cucurbits appear to have specific variability in
Mechanism of Genome Interrogation: How CRISPR RNA-Guided Cas9 Proteins Locate Specific Targets on DNA.

PubMed

Shvets, Alexey A; Kolomeisky, Anatoly B

2017-10-03

The ability to precisely edit and modify a genome opens endless opportunities to investigate fundamental properties of living systems as well as to advance various medical techniques and bioengineering applications. This possibility is now close to reality due to a recent discovery of the adaptive bacterial immune system, which is based on clustered regularly interspaced short palindromic repeats (CRISPR)-associated proteins (Cas) that utilize RNA to find and cut the double-stranded DNA molecules at specific locations. Here we develop a quantitative theoretical approach to analyze the mechanism of target search on DNA by CRISPR RNA-guided Cas9 proteins, which is followed by a selective cleavage of nucleic acids. It is based on a discrete-state stochastic model that takes into account the most relevant physical-chemical processes in the system. Using a method of first-passage processes, a full dynamic description of the target search is presented. It is found that the location of specific sites on DNA by CRISPR Cas9 proteins is governed by binding first to protospacer adjacent motif sequences on DNA, which is followed by reversible transitions into DNA interrogation states. In addition, the search dynamics is strongly influenced by the off-target cutting. Our theoretical calculations allow us to explain the experimental observations and to give experimentally testable predictions. Thus, the presented theoretical model clarifies some molecular aspects of the genome interrogation by CRISPR RNA-guided Cas9 proteins. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
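The two-step search described in this abstract (recognition of a protospacer adjacent motif first, then interrogation of the neighbouring DNA) can be sketched as a simple sequence scan. This is a deterministic illustration, not the authors' discrete-state stochastic model: it assumes the NGG motif of the commonly used S. pyogenes Cas9 located directly 3' of the protospacer, scans one strand only, and uses a hypothetical function name and made-up sequences.

```python
def find_cas9_sites(dna: str, guide: str, max_mismatches: int = 0):
    """Return (protospacer_start, mismatches) for candidate sites on one strand.

    Step 1: require an NGG PAM directly 3' of a guide-length window.
    Step 2: compare that window with the guide and count mismatches.
    """
    dna, guide = dna.upper(), guide.upper()
    n = len(guide)
    hits = []
    for start in range(len(dna) - n - 2):
        pam = dna[start + n:start + n + 3]
        if pam[1:] != "GG":  # step 1: no PAM, so the site is never interrogated
            continue
        protospacer = dna[start:start + n]
        mismatches = sum(a != b for a, b in zip(protospacer, guide))
        if mismatches <= max_mismatches:  # step 2: interrogation
            hits.append((start, mismatches))
    return hits


# Made-up example: one perfect site and one site with a single mismatch.
guide = "GATTACAGATTACAGATTAC"
near_match = "GATTACAGATTACAGATTAG"
dna = "TTT" + guide + "TGG" + "AAAA" + near_match + "AGG" + "CC"

print(find_cas9_sites(dna, guide))                    # perfect matches only
print(find_cas9_sites(dna, guide, max_mismatches=1))  # allow one mismatch
```

Raising `max_mismatches` exposes near-matching sites next to a PAM, a crude stand-in for the off-target cutting that the abstract identifies as a strong influence on the search dynamics.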
Atlantic salmon populations reveal adaptive divergence of immune related genes - a duplicated genome under selection.

PubMed

Kjærner-Semb, Erik; Ayllon, Fernando; Furmanek, Tomasz; Wennevik, Vidar; Dahle, Geir; Niemelä, Eero; Ozerov, Mikhail; Vähä, Juha-Pekka; Glover, Kevin A; Rubin, Carl J; Wargelius, Anna; Edvardsen, Rolf B

2016-08-11

Populations of Atlantic salmon display highly significant genetic differences with unresolved molecular basis. These differences may result from separate postglacial colonization patterns, diversifying natural selection and adaptation, or a combination. Adaptation could be influenced or even facilitated by the recent whole genome duplication in the salmonid lineage which resulted in a partly tetraploid species with duplicated genes and regions. In order to elucidate the genes and genomic regions underlying the genetic differences, we conducted a genome wide association study using whole genome resequencing data from eight populations from Northern and Southern Norway. From a total of ~4.5 million sequencing-derived SNPs, more than 10 % showed significant differentiation between populations from these two regions and ten selective sweeps on chromosomes 5, 10, 11, 13-15, 21, 24 and 25 were identified. These comprised 59 genes, of which 15 had one or more differentiated missense mutation. Our analysis showed that most sweeps have paralogous regions in the partially tetraploid genome, each lacking the high number of significant SNPs found in the sweeps. The most significant sweep was found on Chr 25 and carried several missense mutations in the antiviral mx genes, suggesting that these populations have experienced differing viral pressures. Interestingly, the second most significant sweep, found on Chr 5, contains two genes involved in the NF-κB pathway (nkap and nkrf), which is also a known pathogen target that controls a large number of processes in animals. Our results show that natural selection acting on immune related genes has contributed to genetic divergence between salmon populations in Norway. The differences between populations may have been facilitated by the plasticity of the salmon genome. The observed signatures of selection in duplicated genomic regions suggest that the recently duplicated genome has provided raw material for evolutionary adaptation.
Genomic profiling of a Hepatocyte growth factor-dependent signature for MET-targeted therapy in glioblastoma.

PubMed

Johnson, Jennifer; Ascierto, Maria Libera; Mittal, Sandeep; Newsome, David; Kang, Liang; Briggs, Michael; Tanner, Kirk; Marincola, Francesco M; Berens, Michael E; Vande Woude, George F; Xie, Qian

2015-09-17

Constitutive MET signaling promotes invasiveness in most primary and recurrent GBM. However, deployment of available MET-targeting agents is confounded by lack of effective biomarkers for selecting suitable patients for treatment. Because endogenous HGF overexpression often causes autocrine MET activation, and also indicates sensitivity to MET inhibitors, we investigated whether it drives the expression of distinct genes which could serve as a signature indicating vulnerability to MET-targeted therapy in GBM. Interrogation of genomic data from TCGA GBM (Student's t test, GBM patients with high and low HGF expression, p ≤ 0.00001) referenced against patient-derived xenograft (PDX) models (Student's t test, sensitive vs. insensitive models, p ≤ 0.005) was used to identify the HGF-dependent signature. Genomic analysis of GBM xenograft models using both human and mouse gene expression microarrays (Student's t test, treated vs. vehicle tumors, p ≤ 0.01) was performed to elucidate the tumor and microenvironment cross talk. A PDX model with EGFR(amp) was tested for MET activation as a mechanism of erlotinib resistance. We identified a group of 20 genes that were highly associated with HGF overexpression in GBM and were up- or down-regulated only in tumors sensitive to MET inhibitor. The MET inhibitors regulate tumor (human) and host (mouse) cells within the tumor via distinct molecular processes, but overall impede tumor growth by inhibiting cell cycle progression. EGFR(amp) tumors undergoing erlotinib resistance responded to a combination of MET and EGFR inhibitors. Combining TCGA primary tumor datasets (human) and xenograft tumor model datasets (human tumor grown in mice) using therapeutic efficacy as an endpoint may serve as a useful approach to discover and develop molecular signatures as therapeutic biomarkers for targeted therapy. The HGF dependent signature may serve as a candidate predictive signature for patient enrollment in clinical trials using MET inhibitors
Implications of publicly available genomic data resources in searching for therapeutic targets of obesity and type 2 diabetes.

PubMed

Jung, Sungwon

2018-04-20

Obesity and type 2 diabetes (T2D) are two major conditions that are related to metabolic disorders and affect a large population. Although there have been significant efforts to identify their therapeutic targets, few benefits have come from comprehensive molecular profiling. This limited availability of comprehensive molecular profiling of obesity and T2D may be due to multiple challenges, as these conditions involve multiple organs and collecting tissue samples from subjects is more difficult in obesity and T2D than in other diseases, where surgical treatments are popular choices. While there is no repository of comprehensive molecular profiling data for obesity and T2D, multiple existing data resources can be utilized to cover various aspects of these conditions. This review presents studies with available genomic data resources for obesity and T2D and discusses genome-wide association studies (GWAS), a knockout (KO)-based phenotyping study, and gene expression profiles. These studies, based on their assessed coverage and characteristics, can provide insights into how such data can be utilized to identify therapeutic targets for obesity and T2D.
Controlling gene networks and cell fate with precision-targeted DNA-binding proteins and small-molecule-based genome readers

PubMed Central

Eguchi, Asuka; Lee, Garrett O.; Wan, Fang; Erwin, Graham S.; Ansari, Aseem Z.

2014-01-01

Transcription factors control the fate of a cell by regulating the expression of genes and regulatory networks. Recent successes in inducing pluripotency in terminally differentiated cells as well as directing differentiation with natural transcription factors has lent credence to the efforts that aim to direct cell fate with rationally designed transcription factors. Because DNA-binding factors are modular in design, they can be engineered to target specific genomic sequences and perform pre-programmed regulatory functions upon binding. Such precision-tailored factors can serve as molecular tools to reprogramme or differentiate cells in a targeted manner. Using different types of engineered DNA binders, both regulatory transcriptional controls of gene networks, as well as permanent alteration of genomic content, can be implemented to study cell fate decisions. In the present review, we describe the current state of the art in artificial transcription factor design and the exciting prospect of employing artificial DNA-binding factors to manipulate the transcriptional networks as well as epigenetic landscapes that govern cell fate. PMID:25145439
Evolutionary Genomics of Peach and Almond Domestication

PubMed Central

Velasco, Dianne; Hough, Josh; Aradhya, Mallikarjuna; Ross-Ibarra, Jeffrey

2016-01-01

The domesticated almond [Prunus dulcis (L.) Batsch] and peach [P. persica (Mill.) D. A. Webb] originated on opposite sides of Asia and were independently domesticated ∼5000 yr ago. While interfertile, they possess alternate mating systems and differ in a number of morphological and physiological traits. Here, we evaluated patterns of genome-wide diversity in both almond and peach to better understand the impacts of mating system, adaptation, and domestication on the evolution of these taxa. Almond has around seven times the genetic diversity of peach, and high genome-wide FST values support their status as separate species. We estimated a divergence time of ∼8 MYA (million years ago), coinciding with an active period of uplift in the northeast Tibetan Plateau and subsequent Asian climate change. We see no evidence of a bottleneck during domestication of either species, but identify a number of regions showing signatures of selection during domestication and a significant overlap in candidate regions between peach and almond. While we expected gene expression in fruit to overlap with candidate selected regions, instead we find enrichment for loci highly differentiated between the species, consistent with recent fossil evidence suggesting fruit divergence long preceded domestication. Taken together, this study tells us how closely related tree species evolve and are domesticated, the impact of these events on their genomes, and the utility of genomic information for long-lived species. Further exploration of this data will contribute to the genetic knowledge of these species and provide information regarding targets of selection for breeding application, and further the understanding of evolution in these species. PMID:27707802
Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landrace and cultivars

USDA-ARS's Scientific Manuscript database

Domesticated crops have experienced strong human-driven selection aimed at the development of improved varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated DNA m...
Genomic regions with a history of divergent selection affect fitness of hybrids between two butterfly species.

PubMed

Gompert, Zachariah; Lucas, Lauren K; Nice, Chris C; Fordyce, James A; Forister, Matthew L; Buerkle, C Alex

2012-07-01

Speciation is the process by which reproductively isolated lineages arise, and is one of the fundamental means by which the diversity of life increases. Whereas numerous studies have documented an association between ecological divergence and reproductive isolation, relatively little is known about the role of natural selection in genome divergence during the process of speciation. Here, we use genome-wide DNA sequences and Bayesian models to test the hypothesis that loci under divergent selection between two butterfly species (Lycaeides idas and L. melissa) also affect fitness in an admixed population. Locus-specific measures of genetic differentiation between L. idas and L. melissa and genomic introgression in hybrids varied across the genome. The most differentiated genetic regions were characterized by elevated L. idas ancestry in the admixed population, which occurs in L. idas-like habitat, consistent with the hypothesis that local adaptation contributes to speciation. Moreover, locus-specific measures of genetic differentiation (a metric of divergent selection) were positively associated with extreme genomic introgression (a metric of hybrid fitness). Interestingly, concordance of differentiation and introgression was only partial. We discuss multiple, complementary explanations for this partial concordance. © 2012 The Author(s).
Identification of the TFII-I family target genes in the vertebrate genome.

PubMed

Chimge, Nyam-Osor; Makeyev, Aleksandr V; Ruddle, Frank H; Bayarsaihan, Dashzeveg

2008-07-01

GTF2I and GTF2IRD1 encode members of the TFII-I transcription factor family and are prime candidates in the Williams syndrome, a complex neurodevelopmental disorder. Our previous expression microarray studies implicated TFII-I proteins in the regulation of a number of genes critical in various aspects of cell physiology. Here, we combined bioinformatics and microarray results to identify TFII-I downstream targets in the vertebrate genome. These results were validated by chromatin immunoprecipitation and siRNA analysis. The collected evidence revealed the complexity of TFII-I-mediated processes that involve distinct regulatory networks. Altogether, these results lead to a better understanding of specific molecular events, some of which may be responsible for the Williams syndrome phenotype.
Genomic regions associated with kyphosis in swine

PubMed Central

2010-01-01

Background A back curvature defect similar to kyphosis in humans has been observed in swine herds. The defect ranges from mild to severe curvature of the thoracic vertebrae in split carcasses and has an estimated heritability of 0.3. The objective of this study was to identify genomic regions that affect this trait. Results Single nucleotide polymorphism (SNP) associations performed with 198 SNPs and microsatellite markers in a Duroc-Landrace-Yorkshire resource population (U.S. Meat Animal Research Center, USMARC resource population) of swine provided regions of association with this trait on 15 chromosomes. Positional candidate genes, especially those involved in human skeletal development pathways, were selected for SNP identification. SNPs in 16 candidate genes were genotyped in an F2 population (n = 371) and the USMARC resource herd (n = 1,257) with kyphosis scores. SNPs in KCNN2 on SSC2, RYR1 and PLOD1 on SSC6 and MYST4 on SSC14 were significantly associated with kyphosis in the resource population of swine (P ≤ 0.05). SNPs in CER1 and CDH7 on SSC1, PSMA5 on SSC4, HOXC6 and HOXC8 on SSC5, ADAMTS18 on SSC6 and SOX9 on SSC12 were significantly associated with the kyphosis trait in the F2 population of swine (P ≤ 0.05). Conclusions These data suggest that this kyphosis trait may be affected by several loci and that these may differ by population. Carcass value could be improved by effectively removing this undesirable trait from pig populations. PMID:21176156
Physical mapping of a large plant genome using global high-information-content-fingerprinting: the distal region of the wheat ancestor Aegilops tauschii chromosome 3DS

PubMed Central

2010-01-01

Background Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of the hexaploid bread wheat. The diploid ancestor of the D-genome of hexaploid wheat (Triticum aestivum), Aegilops tauschii, is used as a resource for wheat genomics. The barley diploid genome also provides a good model for the Triticeae and T. aestivum since it is only slightly larger than the ancestor wheat D genome. Gene co-linearity between the grasses can be exploited by extrapolating from rice and Brachypodium distachyon to Ae. tauschii or barley, and then to wheat. Results We report the use of Ae. tauschii for the construction of the physical map of a large distal region of chromosome arm 3DS. A physical map of 25.4 Mb was constructed by anchoring BAC clones of Ae. tauschii with 85 EST on the Ae. tauschii and barley genetic maps. The 24 contigs were aligned to the rice and B. distachyon genomic sequences and a high density SNP genetic map of barley. As expected, the mapped region is highly collinear to the orthologous chromosome 1 in rice, chromosome 2 in B. distachyon and chromosome 3H in barley. However, the chromosome scale of the comparative maps presented provides new insights into grass genome organization. The disruptions of the Ae. tauschii-rice and Ae. tauschii-Brachypodium syntenies were identical. We observed chromosomal rearrangements between Ae. tauschii and barley. The comparison of Ae. tauschii physical and genetic maps showed that the recombination rate across the region dropped from 2.19 cM/Mb in the distal region to 0.09 cM/Mb in the proximal region. The size of the gaps between contigs was evaluated by comparing the recombination rate along the map with the local recombination rates calculated on single contigs. Conclusions The physical map reported here is the first physical map using fingerprinting of a complete Triticeae genome. This study
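The recombination rates quoted above are ratios of genetic distance (cM) to physical distance (Mb). A minimal sketch of that arithmetic follows, assuming two markers anchored on both the genetic and the physical map; the function name and the marker positions are hypothetical and are not data from the study.

```python
def recombination_rate(cm_a, cm_b, bp_a, bp_b):
    """Return the recombination rate in cM/Mb between two anchored markers.

    cm_a, cm_b: genetic map positions in centimorgans.
    bp_a, bp_b: physical map positions in base pairs.
    """
    physical_mb = abs(bp_b - bp_a) / 1_000_000
    if physical_mb == 0:
        raise ValueError("markers share the same physical position")
    return abs(cm_b - cm_a) / physical_mb


# Illustrative values only: two markers 2 Mb and 4.38 cM apart.
rate = recombination_rate(0.0, 4.38, 0, 2_000_000)
print(f"{rate:.2f} cM/Mb")
```

Comparing this ratio for a whole region with the ratio computed inside single contigs is one way to reason about how much physical sequence the gaps between contigs might hold.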
Detecting long tandem duplications in genomic sequences.

PubMed

Audemard, Eric; Schiex, Thomas; Faraut, Thomas

2012-05-08

Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. In this paper, we introduce ReD Tandem, software that uses a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non-coding elements such as pseudogenes or RNA genes. ReD Tandem allows identification of large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein-gene-based approach, which ignores duplications involving non-coding regions. It is, however, inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help improve existing annotations.
[Identification of Clonorchis sinensis metacercariae based on PCR targeting ribosomal DNA ITS regions and COX1 gene].

PubMed

Yang, Qing-Li; Shen, Ji-Qing; Jiang, Zhi-Hua; Yang, Yi-Chao; Li, Hong-Mei; Chen, Ying-Dan; Zhou, Xiao-Nong

2014-06-01

To identify Clonorchis sinensis metacercariae using PCR targeting ribosomal DNA ITS region and COX1 gene. Pseudorasbora parva were collected from Hengxian County of Guangxi at the end of May 2013. Single metacercaria of C. sinensis and other trematodes were separated from muscle tissue of P. parva by digestion method. Primers targeting ribosomal DNA ITS region and COX1 gene of C. sinensis were designed for PCR and the universal primers were used as control. The sensitivity and specificity of the PCR detection were analyzed. C. sinensis metacercariae at different stages were identified by PCR. DNA from single C. sinensis metacercaria was detected by PCR targeting ribosomal DNA ITS region and COX1 gene. The specific amplicons have sizes of 437/549, 156/249 and 195/166 bp, respectively. The ratio of the two positive numbers in PCR with universal primers and specific primers targeting C. sinensis ribosomal DNA ITS1 and ITS2 regions was 0.905 and 0.952, respectively. The target gene fragments were amplified by PCR using COX1 gene-specific primers. The PCR with specific primers did not show any non-specific amplification. However, the PCR with universal primers targeting ribosomal DNA ITS regions showed serious non-specific amplification. C. sinensis metacercariae at different stages are identified by morphological observation and PCR method. Species-specific primers targeting ribosomal DNA ITS region show higher sensitivity and specificity than the universal primers. PCR targeting COX1 gene shows similar sensitivity and specificity to PCR with specific primers targeting ribosomal DNA ITS regions.
[Three-dimensional genome organization: a lesson from the Polycomb-Group proteins].

PubMed

Bantignies, Frédéric

2013-01-01

As more and more genomes are being explored and annotated, important features of three-dimensional (3D) genome organization are just being uncovered. In the light of what we know about Polycomb group (PcG) proteins, we will present the latest findings on this topic. The PcG proteins are well-conserved chromatin factors that repress transcription of numerous target genes. They bind the genome at specific sites, forming chromatin domains of associated histone modifications as well as higher-order chromatin structures. These 3D chromatin structures involve the interactions between PcG-bound regulatory regions at short- and long-range distances, and may significantly contribute to PcG function. Recent high throughput "Chromosome Conformation Capture" (3C) analyses have revealed many other higher order structures along the chromatin fiber, partitioning the genomes into well demarcated topological domains. This revealed an unprecedented link between linear epigenetic domains and chromosome architecture, which might be intimately connected to genome function. © Société de Biologie, 2013.

Detailed analysis of targeted gene mutations caused by the Platinum-Fungal TALENs in Aspergillus oryzae RIB40 strain and a ligD disruptant.

PubMed

Mizutani, Osamu; Arazoe, Takayuki; Toshida, Kenji; Hayashi, Risa; Ohsato, Shuichi; Sakuma, Tetsushi; Yamamoto, Takashi; Kuwata, Shigeru; Yamada, Osamu

2017-03-01

Transcription activator-like effector nucleases (TALENs), which can generate DNA double-strand breaks at specific sites in the desired genome locus, have been used in many organisms as a tool for genome editing. In Aspergilli, including Aspergillus oryzae, however, the use of TALENs has not been validated. In this study, we performed genome editing of A. oryzae wild-type strain via error of nonhomologous end-joining (NHEJ) repair by transient expression of high-efficiency Platinum-Fungal TALENs (PtFg TALENs). Targeted mutations were observed as various mutation patterns. In particular, approximately half of the PtFg TALEN-mediated deletion mutants had deletions larger than 1 kb in the TALEN-targeting region. We also conducted PtFg TALEN-based genome editing in A. oryzae ligD disruptant (ΔligD) lacking the ligD gene involved in the final step of the NHEJ repair and found that mutations were still obtained, as in the wild-type. In this case, the ratio of large deletions was reduced compared with PtFg TALEN-based genome editing in the wild-type. In conclusion, we demonstrate that PtFg TALENs are sufficiently functional to cause genome editing via error of NHEJ in A. oryzae. In addition, we reveal that genome editing using TALENs in A. oryzae tends to cause large deletions at the target region, which were partly suppressed by deletion of ligD. Copyright © 2016 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Enhancers Are Major Targets for Murine Leukemia Virus Vector Integration

PubMed Central

De Ravin, Suk See; Su, Ling; Theobald, Narda; Choi, Uimook; Macpherson, Janet L.; Poidinger, Michael; Symonds, Geoff; Pond, Susan M.; Ferris, Andrea L.; Hughes, Stephen H.

2014-01-01

ABSTRACT Retroviral vectors have been used in successful gene therapies. However, in some patients, insertional mutagenesis led to leukemia or myelodysplasia. Both the strong promoter/enhancer elements in the long terminal repeats (LTRs) of murine leukemia virus (MLV)-based vectors and the vector-specific integration site preferences played an important role in these adverse clinical events. MLV integration is known to prefer regions in or near transcription start sites (TSS). Recently, BET family proteins were shown to be the major cellular proteins responsible for targeting MLV integration. Although MLV integration sites are significantly enriched at TSS, only a small fraction of the MLV integration sites (<15%) occur in this region. To resolve this apparent discrepancy, we created a high-resolution genome-wide integration map of more than one million integration sites from CD34+ hematopoietic stem cells transduced with a clinically relevant MLV-based vector. The integration sites form ∼60,000 tight clusters. These clusters comprise ∼1.9% of the genome. The vast majority (87%) of the integration sites are located within histone H3K4me1 islands, a hallmark of enhancers. The majority of these clusters also have H3K27ac histone modifications, which mark active enhancers. The enhancers of some oncogenes, including LMO2, are highly preferred targets for integration without in vivo selection. IMPORTANCE We show that active enhancer regions are the major targets for MLV integration; this means that MLV preferentially integrates in regions that are favorable for viral gene expression in a variety of cell types. The results provide insights for MLV integration target site selection and also explain the high risk of insertional mutagenesis that is associated with gene therapy trials using MLV vectors. PMID:24501411
Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

PubMed Central

Jung, Sook; Main, Dorrie; Staton, Margaret; Cho, Ilhyung; Zhebentyayeva, Tatyana; Arús, Pere; Abbott, Albert

2006-01-01

Background Due to the lack of availability of large genomic sequences for peach or other Prunus species, the degree of synteny conservation between the Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to the peach physical map, we analyzed the extent of conserved synteny between the Prunus and the Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, which existed prior to the proposed recent polyploidy event, was also utilized in our analysis to further elucidate the evolutionary relationship. Results We analyzed the synteny conservation between the Prunus and the Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched to more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions was more conserved in the ancestral genome than in the present Arabidopsis region. Statistical significance of each syntenic group was calculated using a simulated Arabidopsis genome. Conclusion We
HiView: an integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants.

PubMed

Xu, Zheng; Zhang, Guosheng; Duan, Qing; Chai, Shengjie; Zhang, Baqun; Wu, Cong; Jin, Fulai; Yue, Feng; Li, Yun; Hu, Ming

2016-03-11

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. However, most of them are located in the non-protein coding regions, and therefore it is challenging to hypothesize the functions of these non-coding GWAS variants. Recent large efforts such as the ENCODE and Roadmap Epigenomics projects have predicted a large number of regulatory elements. However, the target genes of these regulatory elements remain largely unknown. Chromatin conformation capture based technologies such as Hi-C can directly measure the chromatin interactions and have generated an increasingly comprehensive catalog of the interactome between the distal regulatory elements and their potential target genes. Leveraging such information revealed by Hi-C holds the promise of elucidating the functions of genetic variants in human diseases. In this work, we present HiView, the first integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants. HiView is able to display Hi-C data and statistical evidence for chromatin interactions in genomic regions surrounding any given GWAS variant, enabling straightforward visualization and interpretation. We believe that as the first GWAS variants-centered Hi-C genome browser, HiView is a useful tool guiding post-GWAS functional genomics studies. HiView is freely accessible at: http://www.unc.edu/~yunmli/HiView .
[Exon-intron structure of the fet5+ gene of Schizosaccharomyces pombe and physical mapping of genome encompassing regions].

PubMed

Shpakovskiĭ, G V; Lebedenko, E N

1998-01-01

Plasmid pYUK3 bearing the fet5+ gene of Schizosaccharomyces pombe was isolated from a genomic library of the fission yeast, and a detailed physical map of the whole genomic insert (ca. 9.6 Kbp) was constructed. The primary structure of the fet5+ gene and its flanking regions is established. The gene contains a single 45-bp intron in its distal part. A typical TATA-box (TATAAG) was found in the 5'-noncoding region ca. 50 bp upstream of the putative start of transcription, and the 3'-noncoding region contains AT-rich palindromes, which are probably involved in termination of the fet5+ transcription. A previously unidentified gene of Sz. pombe encoding a protein with some similarity to one of the transcriptional activators from the TBP (TATA-binding protein) group of SPT factors of transcription was found in the vicinity of the fet5+ gene. Taking into account that cDNA of the fet5(+)-gene was isolated as a suppressor of the genetic-defect of nuclear RNA polymerases I-III (Bioorg. Khim., 1997, vol. 23, No 3, pp. 234-237), this vicinity may be the first evidence of possible clustering, in the genome of the fission yeast, of genes participating in transcription regulation.
A difference in the pattern of repair in a large genomic region in UV-irradiated normal human and Cockayne syndrome cells.

PubMed

Shanower, G A; Kantor, G J

1997-11-01

Xeroderma pigmentosum group C cells repair DNA damaged by ultraviolet radiation in an unusual pattern throughout the genome. They remove cyclobutane pyrimidine dimers only from the DNA of transcriptionally active chromatin regions and only from the strand that contains the transcribed strand. The repair proceeds in a manner that creates damage-free islands which are in some cases much larger than the active gene associated with them. For example, the small transcriptionally active beta-actin gene (3.5 kb) is repaired as part of a 50 kb single-stranded region. The repair responsible for creating these islands requires active transcription, suggesting that the two activities are coupled. A preferential repair pathway in normal human cells promotes repair of actively transcribed DNA strands and is coupled to transcription. It is not known if similar large islands, referred to as repair domains, are preferentially created as a result of the coupling. Data are presented showing that in normal cells, preferential repair in the beta-actin region is associated with the creation of a large, completely repaired region in the partially repaired genome. Repair at other genomic locations which contain inactive genes (insulin, 754) does not create similar large regions as quickly. In contrast, repair in Cockayne syndrome cells, which are defective in the preferential repair pathway but not in genome-overall repair, proceeds in the beta-actin region by a mechanism which does not create preferentially a large repaired region. Thus a correlation between the activity required to preferentially repair active genes and that required to create repaired domains is detected. We propose an involvement of the transcription-repair coupling factor in a coordinated repair pathway for removing DNA damage from entire transcription units.
Genome-wide analyses of LINE–LINE-mediated nonallelic homologous recombination

PubMed Central

Startek, Michał; Szafranski, Przemyslaw; Gambin, Tomasz; Campbell, Ian M.; Hixson, Patricia; Shaw, Chad A.; Stankiewicz, Paweł; Gambin, Anna

2015-01-01

Nonallelic homologous recombination (NAHR), occurring between low-copy repeats (LCRs) >10 kb in size and sharing >97% DNA sequence identity, is responsible for the majority of recurrent genomic rearrangements in the human genome. Recent studies have shown that transposable elements (TEs) can also mediate recurrent deletions and translocations, indicating the features of substrates that mediate NAHR may be significantly less stringent than previously believed. Using >4 kb length and >95% sequence identity criteria, we analyzed the genome-wide distribution of long interspersed element (LINE) retrotransposons and their potential to mediate NAHR. We identified 17 005 directly oriented LINE pairs located <10 Mbp from each other as potential NAHR substrates, placing 82.8% of the human genome at risk of LINE–LINE-mediated instability. Cross-referencing these regions with CNVs in the Baylor College of Medicine clinical chromosomal microarray database of 36 285 patients, we identified 516 CNVs potentially mediated by LINEs. Using long-range PCR of five different genomic regions in a total of 44 patients, we confirmed that the CNV breakpoints in each patient map within the LINE elements. To additionally assess the scale of LINE–LINE/NAHR phenomenon in the human genome, we tested DNA samples from six healthy individuals on a custom aCGH microarray targeting LINE elements predicted to mediate CNVs and identified 25 LINE–LINE rearrangements. Our data indicate that LINE–LINE-mediated NAHR is widespread and under-recognized, and is an important mechanism of structural rearrangement contributing to human genomic variability. PMID:25613453
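The selection criteria stated in this abstract (elements >4 kb, directly oriented, <10 Mbp apart, >95% identity) can be sketched as a simple pair filter. This is only an illustration under stated assumptions, not the authors' pipeline: the `Line` record, the `candidate_pairs` function, the gap-based distance measure and the caller-supplied `identity` function are all hypothetical, and the coordinates are invented.

```python
from itertools import combinations
from typing import NamedTuple


class Line(NamedTuple):
    """Hypothetical record for one annotated LINE element."""
    chrom: str
    start: int
    end: int
    strand: str


MIN_LENGTH = 4_000          # elements must be longer than 4 kb
MAX_DISTANCE = 10_000_000   # pairs must be less than 10 Mbp apart


def candidate_pairs(elements, identity=None, min_identity=0.95):
    """Return directly oriented LINE pairs that pass the abstract's thresholds.

    `identity` is an optional caller-supplied function (a, b) -> fraction;
    computing sequence identity itself is outside this sketch.
    Distance is taken as the gap between the two elements (an assumption).
    """
    long_enough = [e for e in elements if e.end - e.start > MIN_LENGTH]
    pairs = []
    # Sorting orders elements by chromosome, then start coordinate.
    for a, b in combinations(sorted(long_enough), 2):
        if a.chrom != b.chrom or a.strand != b.strand:
            continue  # different chromosome or not directly oriented
        gap = b.start - a.end
        if gap < 0 or gap >= MAX_DISTANCE:
            continue  # overlapping or too far apart
        if identity is not None and identity(a, b) <= min_identity:
            continue  # not similar enough
        pairs.append((a, b))
    return pairs


# Invented coordinates for illustration.
elements = [
    Line("chr1", 1_000, 6_500, "+"),
    Line("chr1", 2_000_000, 2_006_000, "+"),
    Line("chr1", 3_000_000, 3_006_000, "-"),    # opposite orientation
    Line("chr1", 20_000_000, 20_005_000, "+"),  # too far from the others
    Line("chr2", 1_000, 3_000, "+"),            # too short
]
for a, b in candidate_pairs(elements):
    print(a, b)
```

The all-pairs loop is quadratic, which is fine for a toy list; a genome-wide run would more plausibly sweep a sorted list with a sliding 10 Mbp window.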
Whole-genome sequencing of a quarter-century melioidosis outbreak in temperate Australia uncovers a region of low-prevalence endemicity

PubMed Central

Chapple, Stephanie N. J.; Sarovich, Derek S.; Holden, Matthew T. G.; Peacock, Sharon J.; Buller, Nicky; Golledge, Clayton; Mayo, Mark; Currie, Bart J.

2016-01-01

Melioidosis, caused by the highly recombinogenic bacterium Burkholderia pseudomallei, is a disease with high mortality. Tracing the origin of melioidosis outbreaks and understanding how the bacterium spreads and persists in the environment are essential to protecting public and veterinary health and reducing mortality associated with outbreaks. We used whole-genome sequencing to compare isolates from a historical quarter-century outbreak that occurred between 1966 and 1991 in the Avon Valley, Western Australia, a region far outside the known range of B. pseudomallei endemicity. All Avon Valley outbreak isolates shared the same multilocus sequence type (ST-284), which has not been identified outside this region. We found substantial genetic diversity among isolates based on a comparison of genome-wide variants, with no clear correlation between genotypes and temporal, geographical or source data. We observed little evidence of recombination in the outbreak strains, indicating that genetic diversity among these isolates has primarily accrued by mutation. Phylogenomic analysis demonstrated that the isolates confidently grouped within the Australian B. pseudomallei clade, thereby ruling out introduction from a melioidosis-endemic region outside Australia. Collectively, our results point to B. pseudomallei ST-284 being present in the Avon Valley for longer than previously recognized, with its persistence and genomic diversity suggesting long-term, low-prevalence endemicity in this temperate region. Our findings provide a concerning demonstration of the potential for environmental persistence of B. pseudomallei far outside the conventional endemic regions. An expected increase in extreme weather events may reactivate latent B. pseudomallei populations in this region. PMID:28348862
Improving mapping and SNP-calling performance in multiplexed targeted next-generation sequencing

PubMed Central

2012-01-01

Background Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions). Results We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP-calling with a new faster approach: target-region mapping with subsequent ‘read-backmapping’ to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach. Conclusions We recommend applying our general
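Reproducibility of coverage profiles is reported above as a Pearson correlation coefficient between samples. The sketch below shows how such a coefficient could be computed for two per-target coverage vectors; the function name and the depth values are hypothetical, and this is not the pipeline described in the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented mean read depths over the same four target regions in two samples.
sample_a = [10, 20, 30, 40]
sample_b = [20, 40, 60, 80]
print(pearson(sample_a, sample_b))
```

Because the coefficient is insensitive to scale, two samples sequenced to different overall depths can still correlate strongly if their relative coverage across targets is similar.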
Detection of genome-wide copy number variants in myeloid malignancies using next-generation sequencing.

PubMed

Shen, Wei; Paxton, Christian N; Szankasi, Philippe; Longhurst, Maria; Schumacher, Jonathan A; Frizzell, Kimberly A; Sorrells, Shelly M; Clayton, Adam L; Jattani, Rakhi P; Patel, Jay L; Toydemir, Reha; Kelley, Todd W; Xu, Xinjie

2018-04-01

Genetic abnormalities, including copy number variants (CNV), copy number neutral loss of heterozygosity (CN-LOH) and gene mutations, underlie the pathogenesis of myeloid malignancies and serve as important diagnostic, prognostic and/or therapeutic markers. Currently, multiple testing strategies are required for comprehensive genetic testing in myeloid malignancies. The aim of this proof-of-principle study was to investigate the feasibility of combining detection of genome-wide large CNVs, CN-LOH and targeted gene mutations into a single assay using next-generation sequencing (NGS). For genome-wide CNV detection, we designed a single nucleotide polymorphism (SNP) sequencing backbone with 22 762 SNP regions evenly distributed across the entire genome. For targeted mutation detection, 62 frequently mutated genes in myeloid malignancies were targeted. We combined this SNP sequencing backbone with a targeted mutation panel, and sequenced 9 healthy individuals and 16 patients with myeloid malignancies using NGS. We detected 52 somatic CNVs, 11 instances of CN-LOH and 39 oncogenic mutations in the 16 patients with myeloid malignancies, and none in the 9 healthy individuals. All CNVs and CN-LOH were confirmed by SNP microarray analysis. We describe a genome-wide SNP sequencing backbone which allows for sensitive detection of genome-wide CNVs and CN-LOH using NGS. This proof-of-principle study has demonstrated that this strategy can provide more comprehensive genetic profiling for patients with myeloid malignancies using a single assay. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
The construction of recombinant industrial yeasts free of bacterial sequences by directed gene replacement into a nonessential region of the genome.

PubMed

Xiao, W; Rank, G H

1989-03-15

The yeast SMR1 gene was used as a dominant resistance-selectable marker for industrial yeast transformation and for targeting integration of an economically important gene at the homologous ILV2 locus. A MEL1 gene, which codes for alpha-galactosidase, was inserted into a dispensable upstream region of SMR1 in vitro; different treatments of the plasmid (pWX813) prior to transformation resulted in 3' end, 5' end and replacement integrations that exhibited distinct integrant structures. One-step replacement within a nonessential region of the host genome generated a stable integration of MEL1 devoid of bacterial plasmid DNA. Using this method, we have constructed several alpha-galactosidase positive industrial Saccharomyces strains. Our study provides a general method for stable gene transfer in most industrial Saccharomyces yeasts, including those used in the baking, brewing (ale and lager), distilling, wine and sake industries, with solely nucleotide sequences of interest. The absence of bacterial DNA in the integrant structure facilitates the commercial application of recombinant DNA technology in the food and beverage industry.
A network-based drug repositioning infrastructure for precision cancer medicine through targeting significantly mutated genes in the human cancer genomes.

PubMed

Cheng, Feixiong; Zhao, Junfei; Fooksa, Michaela; Zhao, Zhongming

2016-07-01

Development of computational approaches and tools to effectively integrate multidomain data is urgently needed for the development of newly targeted cancer therapeutics. We proposed an integrative network-based infrastructure to identify new druggable targets and anticancer indications for existing drugs through targeting significantly mutated genes (SMGs) discovered in the human cancer genomes. The underlying assumption is that a drug would have a high potential for anticancer indication if its up-/down-regulated genes from the Connectivity Map tended to be SMGs or their neighbors in the human protein interaction network. We assembled and curated 693 SMGs in 29 cancer types and found 121 proteins currently targeted by known anticancer or noncancer (repurposed) drugs. We found that the approved or experimental cancer drugs could potentially target these SMGs in 33.3% of the mutated cancer samples, and this number increased to 68.0% by drug repositioning through surveying exome-sequencing data in approximately 5000 normal-tumor pairs from The Cancer Genome Atlas. Furthermore, we identified 284 potential new indications connecting 28 cancer types and 48 existing drugs (adjusted P < .05), with a 66.7% success rate validated by literature data. Several existing drugs (e.g., niclosamide, valproic acid, captopril, and resveratrol) were predicted to have potential indications for multiple cancer types. Finally, we used integrative analysis to showcase a potential mechanism-of-action for resveratrol in breast and lung cancer treatment whereby it targets several SMGs (ARNTL, ASPM, CTTN, EIF4G1, FOXP1, and STIP1). In summary, we demonstrated that our integrative network-based infrastructure is a promising strategy to identify potential druggable targets and uncover new indications for existing drugs to speed up molecularly targeted cancer therapeutics. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All
Sequencing intractable DNA to close microbial genomes.

PubMed

Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

2012-01-01

Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
St2-80: a new FISH marker for St genome and genome analysis in Triticeae.

PubMed

Wang, Long; Shi, Qinghua; Su, Handong; Wang, Yi; Sha, Lina; Fan, Xing; Kang, Houyang; Zhang, Haiqin; Zhou, Yonghong

2017-07-01

The St genome is one of the most fundamental genomes in Triticeae. Repetitive sequences are widely used to distinguish different genomes or species. The primary objectives of this study were to (i) screen for a new sequence that could easily distinguish the chromosomes of the St genome from those of other genomes by fluorescence in situ hybridization (FISH) and (ii) investigate the genome constitution of some species that remain uncertain and controversial. We used degenerate oligonucleotide-primed PCR (DOP-PCR), Dot-blot, and FISH to screen for a new marker of the St genome and to test the efficiency of this marker in the detection of the St chromosome at different ploidy levels. Signals produced by a new FISH marker (denoted St2-80) were present on the entire arm of chromosomes of the St genome, except in the centromeric region. In contrast, St2-80 signals were present only in the terminal region of chromosomes of the E, H, P, and Y genomes. No signal was detected in the A and B genomes, and only weak signals were detected in the terminal region of chromosomes of the D genome. St2-80 signals were obvious and stable in chromosomes of different genomes, whether diploid or polyploid. Therefore, St2-80 is a potentially useful FISH marker that can be used to distinguish the St genome from other genomes in Triticeae.
Editor’s Highlight: A Genome-wide Screening of Target Genes Against Silver Nanoparticles in Fission Yeast

PubMed Central

Lee, Sook-Jeong; Lee, Minho; Nam, Miyoung; Lee, Sol; Choi, Jian; Lee, Hye-Jin; Kim, Dong-Uk; Hoe, Kwang-Lae

2018-01-01

To identify target genes against silver nanoparticles (AgNPs), we screened a genome-wide gene deletion library of 4843 fission yeast heterozygous mutants covering 96% of all protein-encoding genes. A total of 33 targets were identified by a microarray and subsequent individual confirmation. The target pattern of AgNPs was most similar to those of AgNO3 and H2O2, followed by Cd and As. The toxic effect of AgNPs on fission yeast was attributed to the intracellular uptake of AgNPs, followed by the subsequent release of Ag+, leading to the generation of reactive oxygen species (ROS). Next, we focused on the top 10 sensitive targets for further studies. As described previously, 7 nonessential targets were associated with detoxification of ROS, because their heterozygous mutants showed elevated ROS levels. Three novel essential targets were related to folate metabolism or cellular component organization, resulting in cell cycle arrest and no induction in the transcriptional level of antioxidant enzymes such as Sod1 and Gpx1 when 1 of the 2 copies was deleted. Intriguingly, met9 played a key role in combating AgNP-induced ROS generation via NADPH production and was also conserved in a human cell line. PMID:29294138
Molecular characterization, genomic distribution and evolutionary dynamics of Short INterspersed Elements in the termite genome.

PubMed

Luchetti, Andrea; Mantovani, Barbara

2011-02-01

Short INterspersed Elements (SINEs) in invertebrates, and especially in inbred animal genomes such as that of termites, are poorly known; in this paper we characterize three new SINE families (Talub, Taluc and Talud) through the analysis of 341 sequences, either isolated from the Reticulitermes lucifugus genome or drawn from the GenBank EST collection. We further add new data on the only isopteran element known so far, Talua. These SINEs are tRNA-derived elements, with an average length ranging from 258 to 372 bp. The tails are made up of poly(A) or microsatellite motifs. Their copy number varies from 7.9 × 10^3 to 10^5 copies, well within the range observed for other metazoan genomes. Species distribution, age and target site duplication analyses indicate Talud as the oldest, possibly inactive SINE, which originated before the onset of Isoptera (~150 Myr ago). Taluc underwent substantial sequence changes throughout the evolution of termites, and data suggest it was silenced and then re-activated in the R. lucifugus lineage. Moreover, Taluc shares a conserved sequence block with other unrelated SINEs, as observed for some vertebrate and cephalopod elements. The study of the genomic environment showed that insertions are mainly surrounded by microsatellites and other SINEs, indicating a biased accumulation within non-coding regions. The evolutionary dynamics of Talu~ elements is explained through selective mechanisms acting in an inbred genome; in this respect, the study of termite SINE activity may provide an interesting framework to address the (co)evolution of mobile elements and the host genome.
Substantial genome synteny preservation among woody angiosperm species: comparative genomics of Chinese chestnut (Castanea mollissima) and plant reference genomes.

PubMed

Staton, Margaret; Zhebentyayeva, Tetyana; Olukolu, Bode; Fang, Guang Chen; Nelson, Dana; Carlson, John E; Abbott, Albert G

2015-10-05

Chinese chestnut (Castanea mollissima) has emerged as a model species for the Fagaceae family with extensive genomic resources including a physical map, a dense genetic map and quantitative trait loci (QTLs) for chestnut blight resistance. These resources enable comparative genomics analyses relative to model plants. We assessed the degree of conservation between the chestnut genome and other well annotated and assembled plant genomic sequences, focusing on the QTL regions of most interest to the chestnut breeding community. The integrated physical and genetic map of Chinese chestnut has been improved to now include 858 shared sequence-based markers. The utility of the integrated map has also been improved through the addition of 42,970 BAC (bacterial artificial chromosome) end sequences spanning over 26 million bases of the estimated 800 Mb chestnut genome. Synteny analysis between chestnut and ten model plant species was conducted on a macro-syntenic scale using sequences from both individual probes and BAC end sequences across the chestnut physical map. Blocks of synteny with chestnut were found in all ten reference species, with the percent of the chestnut physical map that could be aligned ranging from 10 to 39%. The integrated genetic and physical map was utilized to identify BACs that spanned the three previously identified QTL regions conferring blight resistance. The clones were pooled and sequenced, yielding 396 sequence scaffolds covering 13.9 Mbp. Comparative genomic analysis on a micro-syntenic scale, using the QTL-associated genomic sequence, identified synteny from chestnut to other plant genomes, with 5.4 to 12.9% of the genome sequences aligning. On both the macro- and micro-synteny levels, the peach, grape and poplar genomes were found to be the most structurally conserved with chestnut. Interestingly, these results did not strictly follow the expectation that decreased phylogenetic distance would correspond to increased levels of genome
Targeted Next-generation Sequencing and Bioinformatics Pipeline to Evaluate Genetic Determinants of Constitutional Disease.

PubMed

Dilliott, Allison A; Farhan, Sali M K; Ghani, Mahdi; Sato, Christine; Liang, Eric; Zhang, Ming; McIntyre, Adam D; Cao, Henian; Racacho, Lemuel; Robinson, John F; Strong, Michael J; Masellis, Mario; Bulman, Dennis E; Rogaeva, Ekaterina; Lang, Anthony; Tartaglia, Carmela; Finger, Elizabeth; Zinman, Lorne; Turnbull, John; Freedman, Morris; Swartz, Rick; Black, Sandra E; Hegele, Robert A

2018-04-04

Next-generation sequencing (NGS) is quickly revolutionizing how research into the genetic determinants of constitutional disease is performed. The technique is highly efficient with millions of sequencing reads being produced in a short time span and at relatively low cost. Specifically, targeted NGS is able to focus investigations to genomic regions of particular interest based on the disease of study. Not only does this further reduce costs and increase the speed of the process, but it lessens the computational burden that often accompanies NGS. Although targeted NGS is restricted to certain regions of the genome, preventing identification of potential novel loci of interest, it can be an excellent technique when faced with a phenotypically and genetically heterogeneous disease, for which there are previously known genetic associations. Because of the complex nature of the sequencing technique, it is important to closely adhere to protocols and methodologies in order to achieve sequencing reads of high coverage and quality. Further, once sequencing reads are obtained, a sophisticated bioinformatics workflow is utilized to accurately map reads to a reference genome, to call variants, and to ensure the variants pass quality metrics. Variants must also be annotated and curated based on their clinical significance, which can be standardized by applying the American College of Medical Genetics and Genomics Pathogenicity Guidelines. The methods presented herein will display the steps involved in generating and analyzing NGS data from a targeted sequencing panel, using the ONDRISeq neurodegenerative disease panel as a model, to identify variants that may be of clinical significance.
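The post-calling step described in this abstract (keeping only variants that fall in the targeted regions and pass quality metrics) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ONDRISeq pipeline: the `Variant` fields, the quality and depth thresholds, and the example coordinates are all hypothetical, and a real workflow would read these values from VCF files produced by a variant caller.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    chrom: str
    pos: int      # 1-based position
    ref: str
    alt: str
    qual: float   # caller-reported variant quality
    depth: int    # read depth at the site

def passes_quality(v, min_qual=30.0, min_depth=20):
    """Hypothetical thresholds; real panels tune these per assay."""
    return v.qual >= min_qual and v.depth >= min_depth

def in_targets(v, targets):
    """targets is a list of (chrom, start, end) tuples, inclusive coordinates."""
    return any(c == v.chrom and start <= v.pos <= end for c, start, end in targets)

def filter_variants(variants, targets, min_qual=30.0, min_depth=20):
    """Keep variants inside a targeted region that also pass quality checks."""
    return [v for v in variants
            if in_targets(v, targets) and passes_quality(v, min_qual, min_depth)]

# Toy example with made-up coordinates.
targets = [("chr1", 100, 200)]
calls = [
    Variant("chr1", 150, "A", "G", qual=50.0, depth=40),   # kept
    Variant("chr1", 160, "C", "T", qual=10.0, depth=40),   # low quality
    Variant("chr1", 500, "G", "A", qual=60.0, depth=40),   # off target
    Variant("chr2", 150, "T", "C", qual=60.0, depth=40),   # wrong chromosome
]
kept = filter_variants(calls, targets)
```

Annotation and clinical curation, which the abstract describes as the next steps, would operate on the `kept` list.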
Pathways to Genome-targeted Therapies in Serous Ovarian Cancer.

PubMed

Axelrod, Joshua; Delaney, Joe

2017-07-01

Genome sequencing technologies and corresponding oncology publications have generated enormous publicly available datasets for many cancer types. While this has enabled new treatments, and in some limited cases lifetime management of the disease, the treatment options for serous ovarian cancer remain dismal. This review summarizes recent advances in our understanding of ovarian cancer, with a focus on heterogeneity, functional genomics, and actionable data.
Genome sequencing and comparative genomics of honey bee microsporidia, Nosema apis reveal novel insights into host-parasite interactions.

PubMed

Chen, Yan ping; Pettis, Jeffery S; Zhao, Yan; Liu, Xinyue; Tallon, Luke J; Sadzewicz, Lisa D; Li, Renhua; Zheng, Huoqing; Huang, Shaokang; Zhang, Xuan; Hamilton, Michele C; Pernal, Stephen F; Melathopoulos, Andony P; Yan, Xianghe; Evans, Jay D

2013-07-05

The microsporidian parasite Nosema contributes to the steep global decline of honey bees, which are critical pollinators of food crops. Two species of Nosema have been found to infect honey bees, Nosema apis and N. ceranae. Genome sequencing of N. apis and comparative genome analysis with N. ceranae, a fully sequenced microsporidian species, reveal novel insights into the host-parasite interactions underlying the parasite infections. We applied the whole-genome shotgun sequencing approach to sequence and assemble the genome of N. apis, which has an estimated size of 8.5 Mbp. We predicted 2,771 protein-coding genes and predicted the function of each putative protein using the Gene Ontology. The comparative genomic analysis led to the identification of 1,356 orthologs that are conserved between the two Nosema species and of genes that are unique to the individual species, thereby providing a list of virulence factors and new genetic tools for studying host-parasite interactions. We also identified a highly abundant motif in the upstream promoter regions of N. apis genes. This motif is also conserved in N. ceranae and other microsporidian species and likely plays a role in gene regulation across the microsporidia. The availability of the N. apis genome sequence is a significant addition to the rapidly expanding body of microsporidian genomic data, which has been improving our understanding of eukaryotic genome diversity and evolution in a broad sense. The predicted virulence genes and transcriptional regulatory elements are potential targets for innovative therapeutics to break down the life cycle of the parasite.
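One simple way to look for an abundant motif in upstream regions, as mentioned in this abstract, is to count how many upstream sequences contain each k-mer. The sketch below is an assumption-laden illustration and not the authors' method, which the abstract does not describe; the function names and the toy sequences are hypothetical, and dedicated motif-discovery tools also model background composition and degenerate positions.

```python
from collections import Counter

def kmer_presence(upstream_seqs, k):
    """Count, for each k-mer, the number of sequences containing it
    at least once (presence per sequence, not total occurrences)."""
    counts = Counter()
    for seq in upstream_seqs:
        seq = seq.upper()
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts

def most_common_motif(upstream_seqs, k):
    """Return (k-mer, number of sequences) for the most widespread k-mer,
    or None when no sequence is at least k bases long."""
    counts = kmer_presence(upstream_seqs, k)
    return counts.most_common(1)[0] if counts else None

# Toy upstream regions (made-up sequences).
upstream = ["AATTCCGG", "ggttccaa", "TTCC"]
top = most_common_motif(upstream, 4)
```

Counting presence per sequence rather than total occurrences keeps a single repeat-rich promoter from dominating the ranking.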
