units enables genome-wide: Topics by Science.gov

Sample records for units enables genome-wide

Genome of Drosophila suzukii, the Spotted Wing Drosophila

PubMed Central

Chiu, Joanna C.; Jiang, Xuanting; Zhao, Li; Hamm, Christopher A.; Cridland, Julie M.; Saelao, Perot; Hamby, Kelly A.; Lee, Ernest K.; Kwok, Rosanna S.; Zhang, Guojie; Zalom, Frank G.; Walton, Vaughn M.; Begun, David J.

2013-01-01

Drosophila suzukii Matsumura (spotted wing drosophila) has recently become a serious pest of a wide variety of fruit crops in the United States as well as in Europe, leading to substantial yearly crop losses. To enable basic and applied research of this important pest, we sequenced the D. suzukii genome to obtain a high-quality reference sequence. Here, we discuss the basic properties of the genome and transcriptome and describe patterns of genome evolution in D. suzukii and its close relatives. Our analyses and genome annotations are presented in a web portal, SpottedWingFlyBase, to facilitate public access. PMID:24142924
Each cell counts: Hematopoiesis and immunity research in the era of single cell genomics.

PubMed

Jaitin, Diego Adhemar; Keren-Shaul, Hadas; Elefant, Naama; Amit, Ido

2015-02-01

Hematopoiesis and immunity are mediated through complex interactions between multiple cell types and states. This complexity is currently addressed following a reductionist approach of characterizing cell types by a small number of cell surface molecular features and gross functions. While the introduction of global transcriptional profiling technologies enabled a more comprehensive view, heterogeneity within sampled populations remained unaddressed, obscuring the true picture of hematopoiesis and immune system function. A critical mass of technological advances in molecular biology and genomics has enabled genome-wide measurements of single cells - the fundamental unit of immunity. These new advances are expected to boost detection of less frequent cell types and fuzzy intermediate cell states, greatly expanding the resolution of current available classifications. This new era of single-cell genomics in immunology research holds great promise for further understanding of the mechanisms and circuits regulating hematopoiesis and immunity in both health and disease. In the near future, the accuracy of single-cell genomics will ultimately enable precise diagnostics and treatment of multiple hematopoietic and immune related diseases. Copyright © 2015 Elsevier Ltd. All rights reserved.
The UCSC Genome Browser: What Every Molecular Biologist Should Know

PubMed Central

Mangan, Mary E.; Williams, Jennifer M.; Kuhn, Robert M.; Lathe, Warren C.

2014-01-01

Electronic data resources can enable molecular biologists to quickly get information from around the world that a decade ago would have been buried in papers scattered throughout the library. The ability to access, query, and display these data makes benchwork much more efficient and drives new discoveries. Increasingly, mastery of software resources and corresponding data repositories is required to fully explore the volume of data generated in biomedical and agricultural research, because only small amounts of data are actually found in traditional publications. The UCSC Genome Browser provides a wealth of data and tools that advance understanding of genomic context for many species, enable detailed analysis of data, and provide the ability to interrogate regions of interest across disparate data sets from a wide variety of sources. Researchers can also supplement the standard display with their own data to query and share this with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser. PMID:24984850
GUIDE-Seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases

PubMed Central

Nguyen, Nhu T.; Liebers, Matthew; Topkar, Ved V.; Thapar, Vishal; Wyvekens, Nicolas; Khayter, Cyd; Iafrate, A. John; Le, Long P.; Aryee, Martin J.; Joung, J. Keith

2014-01-01

CRISPR RNA-guided nucleases (RGNs) are widely used genome-editing reagents, but methods to delineate their genome-wide off-target cleavage activities have been lacking. Here we describe an approach for global detection of DNA double-stranded breaks (DSBs) introduced by RGNs and potentially other nucleases. This method, called Genome-wide Unbiased Identification of DSBs Enabled by Sequencing (GUIDE-Seq), relies on capture of double-stranded oligodeoxynucleotides into breaks. Application of GUIDE-Seq to thirteen RGNs in two human cell lines revealed wide variability in RGN off-target activities and unappreciated characteristics of off-target sequences. The majority of identified sites were not detected by existing computational methods or ChIP-Seq. GUIDE-Seq also identified RGN-independent genomic breakpoint ‘hotspots’. Finally, GUIDE-Seq revealed that truncated guide RNAs exhibit substantially reduced RGN-induced off-target DSBs. Our experiments define the most rigorous framework for genome-wide identification of RGN off-target effects to date and provide a method for evaluating the safety of these nucleases prior to clinical use. PMID:25513782
The UCSC Genome Browser: What Every Molecular Biologist Should Know.

PubMed

Mangan, Mary E; Williams, Jennifer M; Kuhn, Robert M; Lathe, Warren C

2014-07-01

Electronic data resources can enable molecular biologists to quickly get information from around the world that a decade ago would have been buried in papers scattered throughout the library. The ability to access, query, and display these data makes benchwork much more efficient and drives new discoveries. Increasingly, mastery of software resources and corresponding data repositories is required to fully explore the volume of data generated in biomedical and agricultural research, because only small amounts of data are actually found in traditional publications. The UCSC Genome Browser provides a wealth of data and tools that advance understanding of genomic context for many species, enable detailed analysis of data, and provide the ability to interrogate regions of interest across disparate data sets from a wide variety of sources. Researchers can also supplement the standard display with their own data to query and share this with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser. Copyright © 2014 John Wiley & Sons, Inc.
Genetic markers, genotyping methods & next generation sequencing in Mycobacterium tuberculosis

PubMed Central

Desikan, Srinidhi; Narayanan, Sujatha

2015-01-01

Molecular epidemiology (ME) is one of the main areas in tuberculosis research and is widely used to study the transmission epidemics and outbreaks of tubercle bacilli. It exploits the presence of various polymorphisms in the genome of the bacteria that can be used as genetic markers. Many DNA typing methods apply these genetic markers to differentiate various strains and to study the evolutionary relationships between them. The three widely used genotyping tools to differentiate Mycobacterium tuberculosis strains are IS6110 restriction fragment length polymorphism (RFLP), spacer oligotyping (Spoligotyping), and mycobacterial interspersed repeat units - variable number of tandem repeats (MIRU-VNTR). A new prospect for ME was introduced with the development of whole genome sequencing (WGS) and next generation sequencing (NGS) methods, in which the entire genome is sequenced; this not only helps pinpoint minute differences between sequences but also saves time and cost. NGS is also useful for identifying single nucleotide polymorphisms (SNPs), for comparative genomics, and for studying various aspects of transmission dynamics. These techniques enable the identification of mycobacterial strains and also facilitate the study of their phylogenetic and evolutionary traits. PMID:26205019
OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species

USDA-ARS's Scientific Manuscript database

Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that i...
Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering.

PubMed

Garst, Andrew D; Bassalo, Marcelo C; Pines, Gur; Lynch, Sean A; Halweg-Edwards, Andrea L; Liu, Rongming; Liang, Liya; Wang, Zhiwen; Zeitoun, Ramsey; Alexander, William G; Gill, Ryan T

2017-01-01

Improvements in DNA synthesis and sequencing have underpinned comprehensive assessment of gene function in bacteria and eukaryotes. Genome-wide analyses require high-throughput methods to generate mutations and analyze their phenotypes, but approaches to date have been unable to efficiently link the effects of mutations in coding regions or promoter elements in a highly parallel fashion. We report that CRISPR-Cas9 gene editing in combination with massively parallel oligomer synthesis can enable trackable editing on a genome-wide scale. Our method, CRISPR-enabled trackable genome engineering (CREATE), links each guide RNA to homologous repair cassettes that both edit loci and function as barcodes to track genotype-phenotype relationships. We apply CREATE to site saturation mutagenesis for protein engineering, reconstruction of adaptive laboratory evolution experiments, and identification of stress tolerance and antibiotic resistance genes in bacteria. We provide preliminary evidence that CREATE will work in yeast. We also provide a webtool to design multiplex CREATE libraries.
Comprehensive definition of genome features in Spirodela polyrhiza by high-depth physical mapping and short-read DNA sequencing strategies.

PubMed

Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric

2017-02-01

Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Genome-wide Analysis Reveals Extensive Functional Interaction between DNA Replication Initiation and Transcription in the Genome of Trypanosoma brucei

PubMed Central

Tiengwe, Calvin; Marcello, Lucio; Farr, Helen; Dickens, Nicholas; Kelly, Steven; Swiderski, Michal; Vaughan, Diane; Gull, Keith; Barry, J. David; Bell, Stephen D.; McCulloch, Richard

2012-01-01

Summary: Identification of replication initiation sites, termed origins, is a crucial step in understanding genome transmission in any organism. Transcription of the Trypanosoma brucei genome is highly unusual, with each chromosome comprising a few discrete transcription units. To understand how DNA replication occurs in the context of such organization, we have performed genome-wide mapping of the binding sites of the replication initiator ORC1/CDC6 and have identified replication origins, revealing that both localize to the boundaries of the transcription units. A remarkably small number of active origins is seen, whose spacing is greater than in any other eukaryote. We show that replication and transcription in T. brucei have a profound functional overlap, as reducing ORC1/CDC6 levels leads to genome-wide increases in mRNA levels arising from the boundaries of the transcription units. In addition, ORC1/CDC6 loss causes derepression of silent Variant Surface Glycoprotein genes, which are critical for host immune evasion. PMID:22840408
Memory management in genome-wide association studies

PubMed Central

2009-01-01

Genome-wide association is a powerful tool for the identification of genes that underlie common diseases. Genome-wide association studies generate billions of genotypes and pose significant computational challenges for most users including limited computer memory. We applied a recently developed memory management tool to two analyses of North American Rheumatoid Arthritis Consortium studies and measured the performance in terms of central processing unit and memory usage. We conclude that our memory management approach is simple, efficient, and effective for genome-wide association studies. PMID:20018047
Comparison and quantitative verification of mapping algorithms for whole genome bisulfite sequencing

USDA-ARS's Scientific Manuscript database

Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitat...
Genome-enabled prediction models for yield related traits in chickpea

USDA-ARS's Scientific Manuscript database

Genomic selection (GS) unlike marker-assisted backcrossing (MABC) predicts breeding values of lines using genome-wide marker profiling and allows selection of lines prior to field-phenotyping, thereby shortening the breeding cycle. A collection of 320 elite breeding lines was selected and phenotyped...
The role of genomics in the neonatal ICU.

PubMed

Maresso, Karen; Broeckel, Ulrich

2009-03-01

Results of both the Human Genome and International HapMap Projects have provided the technology and resources necessary to enable fundamental advances through the study of DNA sequence variation in almost all fields of medicine, including neonatology. Genome-wide association studies are now practical, and the first of these studies are appearing in the literature. This article provides the reader with an overview of the issues in technology and study design relating to genome-wide association studies and summarizes the current state of association studies in neonatal ICU populations with a brief review of the relevant literature. Future recommendations for genomic association studies in neonatal ICU populations are also provided.
Microfluidic-based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples.

PubMed

Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R

2017-07-05

Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell.
Building the tree of life from scratch: an end-to-end work flow for phylogenomic studies

USDA-ARS's Scientific Manuscript database

Whole genome sequences are rich sources of information about organisms that are superbly useful for addressing a wide variety of evolutionary questions. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understan...
The UCSC genome browser: what every molecular biologist should know.

PubMed

Mangan, Mary E; Williams, Jennifer M; Kuhn, Robert M; Lathe, Warren C

2009-10-01

Electronic data resources can enable molecular biologists to query and display many useful features that make benchwork more efficient and drive new discoveries. The UCSC Genome Browser provides a wealth of data and tools that advance one's understanding of genomic context for many species, enable detailed understanding of data, and provide the ability to interrogate regions of interest. Researchers can also supplement the standard display with their own data to query and share with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser.
Whole genome sequences in pulse crops: a global community resource to expedite translational genomics and knowledge-based crop improvement.

PubMed

Bohra, Abhishek; Singh, Narendra P

2015-08-01

Unprecedented developments in legume genomics over the last decade have resulted in the acquisition of a wide range of modern genomic resources to underpin genetic improvement of grain legumes. The genome enabled insights direct investigators in various ways that primarily include unearthing novel structural variations, retrieving the lost genetic diversity, introducing novel/exotic alleles from wider gene pools, finely resolving the complex quantitative traits and so forth. To this end, ready availability of cost-efficient and high-density genotyping assays allows genome wide prediction to be increasingly recognized as the key selection criterion in crop breeding. Further, the high-dimensional measurements of agronomically significant phenotypes obtained by using new-generation screening techniques will empower reference based resequencing as well as allele mining and trait mapping methods to comprehensively associate genome diversity with the phenome scale variation. Besides stimulating the forward genetic systems, accessibility to precisely delineated genomic segments reveals novel candidates for reverse genetic techniques like targeted genome editing. The shifting paradigm in plant genomics in turn necessitates optimization of crop breeding strategies to enable the most efficient integration of advanced omics knowledge and tools. We anticipate that the crop improvement schemes will be bolstered remarkably with rational deployment of these genome-guided approaches, ultimately resulting in expanded plant breeding capacities and improved crop performance.
The UCSC Genome Browser: What Every Molecular Biologist Should Know

PubMed Central

Mangan, Mary E.; Williams, Jennifer M.; Kuhn, Robert M.; Lathe, Warren C.

2016-01-01

Electronic data resources can enable molecular biologists to query and display many useful features that make benchwork more efficient and drive new discoveries. The UCSC Genome Browser provides a wealth of data and tools that advance one’s understanding of genomic context for many species, enable detailed understanding of data, and provide the ability to interrogate regions of interest. Researchers can also supplement the standard display with their own data to query and share with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser. PMID:19816931
Human Genomic Loci Important in Common Infectious Diseases: Role of High-Throughput Sequencing and Genome-Wide Association Studies

PubMed Central

Sserwadda, Ivan; Amujal, Marion; Namatovu, Norah

2018-01-01

HIV/AIDS, tuberculosis (TB), and malaria are 3 major global public health threats that undermine development in many resource-poor settings. Recently, the notion that positive selection during epidemics or longer periods of exposure to common infectious diseases may have had a major effect in modifying the constitution of the human genome is being interrogated at a large scale in many populations around the world. This positive selection from infectious diseases increases power to detect associations in genome-wide association studies (GWASs). High-throughput sequencing (HTS) has transformed the management of infectious diseases and continues to enable large-scale functional characterization of host resistance/susceptibility alleles and loci, a paradigm shift from single candidate gene studies. Application of genome sequencing technologies and genomics has enabled us to interrogate the host-pathogen interface for improving human health. Human populations are constantly locked in evolutionary arms races with pathogens; therefore, identification of common infectious disease-associated genomic variants/markers is important in therapeutic and vaccine development and in screening susceptible individuals in a population. This review describes a range of host-pathogen genomic loci that have been associated with disease susceptibility and resistance patterns in the era of HTS. We further highlight potential opportunities for these genetic markers. PMID:29755620

Microfluidic-based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples

PubMed Central

Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R

2017-01-01

Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell. DOI: http://dx.doi.org/10.7554/eLife.26580.001 PMID:28678007
Genome wide selection in Citrus breeding.

PubMed

Gois, I B; Borém, A; Cristofani-Yaly, M; de Resende, M D V; Azevedo, C F; Bastianel, M; Novelli, V M; Machado, M A

2016-10-17

Genome wide selection (GWS) is essential for the genetic improvement of perennial species such as Citrus because of its ability to increase gain per unit time and to enable the efficient selection of characteristics with low heritability. This study assessed GWS efficiency in a population of Citrus and compared it with selection based on phenotypic data. A total of 180 individual trees from a cross between Pera sweet orange (Citrus sinensis Osbeck) and Murcott tangor (Citrus sinensis Osbeck × Citrus reticulata Blanco) were evaluated for 10 characteristics related to fruit quality. The hybrids were genotyped using 5287 DArT_seq™ (diversity arrays technology) molecular markers and their effects on phenotypes were predicted using the random regression - best linear unbiased predictor (rr-BLUP) method. The predictive ability, prediction bias, and accuracy of GWS were estimated to verify its effectiveness for phenotype prediction. The proportion of genetic variance explained by the markers was also computed. The heritability of the traits, as determined by markers, was 16-28%. The predictive ability of these markers ranged from 0.53 to 0.64, and the regression coefficients between predicted and observed phenotypes were close to unity. Over 35% of the genetic variance was accounted for by the markers. Accuracy estimates with GWS were lower than those obtained by phenotypic analysis; however, GWS was superior in terms of genetic gain per unit time. Thus, GWS may be useful for Citrus breeding as it can predict phenotypes early and accurately, and reduce the length of the selection cycle. This study demonstrates the feasibility of genomic selection in Citrus.
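The two summary statistics named in this abstract, predictive ability and the regression coefficient between predicted and observed phenotypes, can be illustrated with a minimal sketch. This is not the authors' code: it assumes predictive ability means the Pearson correlation between predicted and observed phenotypes and that the regression is of observed on predicted values (the abstract does not state the direction). The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def _centered_sums(predicted, observed):
    """Return (covariance sum, predicted variance sum, observed variance sum)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov, var_p, var_o


def predictive_ability(predicted, observed):
    """Pearson correlation between predicted and observed phenotypes (assumed definition)."""
    cov, var_p, var_o = _centered_sums(predicted, observed)
    return cov / sqrt(var_p * var_o)


def prediction_slope(predicted, observed):
    """Slope of observed regressed on predicted; a value near 1 suggests little bias."""
    cov, var_p, _ = _centered_sums(predicted, observed)
    return cov / var_p


# Toy values for illustration only; they are not data from the study.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.5, 1.5, 3.5, 3.5]
r = predictive_ability(predicted, observed)
b = prediction_slope(predicted, observed)
```

In practice the predicted values would come from a cross-validated marker model such as rr-BLUP, so that each phenotype is predicted by a model that did not see it.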
Complete genome sequence of Tomato mosaic virus isolated from jasmine in the United States

USDA-ARS's Scientific Manuscript database

Tomato mosaic virus (ToMV) was first identified in jasmine in the U.S. in Florida in 1999. This report provides the first full genome sequence of a ToMV isolate from jasmine. The full genome sequence of this virus will enable research scientists to develop additional specific diagnostic tests for ...
Quantifying Temporal Genomic Erosion in Endangered Species.

PubMed

Díez-Del-Molino, David; Sánchez-Barreiro, Fatima; Barnes, Ian; Gilbert, M Thomas P; Dalén, Love

2018-03-01

Many species have undergone dramatic population size declines over the past centuries. Although stochastic genetic processes during and after such declines are thought to elevate the risk of extinction, comparative analyses of genomic data from several endangered species suggest little concordance between genome-wide diversity and current population sizes. This is likely because species-specific life-history traits and ancient bottlenecks overshadow the genetic effect of recent demographic declines. Therefore, we advocate that temporal sampling of genomic data provides a more accurate approach to quantify genetic threats in endangered species. Specifically, genomic data from predecline museum specimens will provide valuable baseline data that enable accurate estimation of recent decreases in genome-wide diversity, increases in inbreeding levels, and accumulation of deleterious genetic variation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets

PubMed Central

Macosko, Evan Z.; Basu, Anindita; Satija, Rahul; Nemesh, James; Shekhar, Karthik; Goldman, Melissa; Tirosh, Itay; Bialas, Allison R.; Kamitaki, Nolan; Martersteck, Emily M.; Trombetta, John J.; Weitz, David A.; Sanes, Joshua R.; Shalek, Alex K.; Regev, Aviv; McCarroll, Steven A.

2015-01-01

Summary: Cells, the basic units of biological structure and function, vary broadly in type and state. Single-cell genomics can characterize cell identity and function, but limitations of ease and scale have prevented its broad application. Here we describe Drop-Seq, a strategy for quickly profiling thousands of individual cells by separating them into nanoliter-sized aqueous droplets, associating a different barcode with each cell’s RNAs, and sequencing them all together. Drop-Seq analyzes mRNA transcripts from thousands of individual cells simultaneously while remembering transcripts’ cell of origin. We analyzed transcriptomes from 44,808 mouse retinal cells and identified 39 transcriptionally distinct cell populations, creating a molecular atlas of gene expression for known retinal cell classes and novel candidate cell subtypes. Drop-Seq will accelerate biological discovery by enabling routine transcriptional profiling at single-cell resolution. PMID:26000488
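The idea of sequencing pooled transcripts while "remembering" each transcript's cell of origin can be sketched as a grouping step keyed on the cell barcode. The following is a toy illustration, not the Drop-Seq software: it assumes each read has already been reduced to a (cell barcode, gene) pair, and the function name, barcodes, and gene assignments are made up for the example.

```python
from collections import defaultdict


def group_reads_by_cell(reads):
    """Build per-cell gene counts from pooled (cell_barcode, gene) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, gene in reads:
        # The barcode identifies the cell of origin even though all
        # reads were sequenced together in one pool.
        counts[barcode][gene] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {barcode: dict(genes) for barcode, genes in counts.items()}


# Hypothetical pooled reads from two cells.
reads = [
    ("AAAC", "Rho"),
    ("AAAC", "Rho"),
    ("GGTT", "Opn1sw"),
    ("AAAC", "Pax6"),
    ("GGTT", "Opn1sw"),
]
profiles = group_reads_by_cell(reads)
```

Each entry of `profiles` is one cell's expression profile; clustering such profiles is what would separate transcriptionally distinct cell populations.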
Genome-wide association study for feed efficiency traits using SNP and haplotype models

USDA-ARS's Scientific Manuscript database

Feed costs comprise the majority of variable expenses in beef cattle systems making feed efficiency an important economic consideration within the beef industry. Due to the expense of recording individual feed intake phenotypes, a genomic-enabled approach could be advantageous towards improving this...
Microbial genome-wide association studies: lessons from human GWAS.

PubMed

Power, Robert A; Parkhill, Julian; de Oliveira, Tulio

2017-01-01

The reduced costs of sequencing have led to whole-genome sequences for a large number of microorganisms, enabling the application of microbial genome-wide association studies (GWAS). Given the successes of human GWAS in understanding disease aetiology and identifying potential drug targets, microbial GWAS are likely to further advance our understanding of infectious diseases. These advances include insights into pressing global health problems, such as antibiotic resistance and disease transmission. In this Review, we outline the methodologies of GWAS, the current state of the field of microbial GWAS, and how lessons from human GWAS can direct the future of the field.
Partial DNA-guided Cas9 enables genome editing with reduced off-target activity

PubMed Central

Yin, Hao; Song, Chun-Qing; Suresh, Sneha; Kwan, Suet-Yan; Wu, Qiongqiong; Walsh, Stephen; Ding, Junmei; Bogorad, Roman L; Zhu, Lihua Julie; Wolfe, Scot A; Koteliansky, Victor; Xue, Wen; Langer, Robert; Anderson, Daniel G

2018-01-01

CRISPR–Cas9 is a versatile RNA-guided genome editing tool. Here we demonstrate that partial replacement of RNA nucleotides with DNA nucleotides in CRISPR RNA (crRNA) enables efficient gene editing in human cells. This strategy of partial DNA replacement retains on-target activity when used with both crRNA and sgRNA, as well as with multiple guide sequences. Partial DNA replacement also works for crRNA of Cpf1, another CRISPR system. We find that partial DNA replacement in the guide sequence significantly reduces off-target genome editing through focused analysis of off-target cleavage, measurement of mismatch tolerance and genome-wide profiling of off-target sites. Using the structure of the Cas9–sgRNA complex as a guide, the majority of the 3′ end of crRNA can be replaced with DNA nucleotides, and the 5′- and 3′-DNA-replaced crRNA enables efficient genome editing. Cas9 guided by a DNA–RNA chimera may provide a generalized strategy to reduce both the cost and the off-target genome editing in human cells. PMID:29377001
Computer vision and machine learning for robust phenotyping in genome-wide studies

PubMed Central

Zhang, Jiaoping; Naik, Hsiang Sing; Assefa, Teshale; Sarkar, Soumik; Reddy, R. V. Chowda; Singh, Arti; Ganapathysubramanian, Baskar; Singh, Asheesh K.

2017-01-01

Traditional evaluation of crop biotic and abiotic stresses is time-consuming and labor-intensive, limiting the ability to dissect the genetic basis of quantitative traits. A machine learning (ML)-enabled image-phenotyping pipeline for the genetic studies of abiotic stress iron deficiency chlorosis (IDC) of soybean is reported. IDC classification and severity for an association panel of 461 diverse plant-introduction accessions was evaluated using an end-to-end phenotyping workflow. The workflow consisted of a multi-stage procedure including: (1) optimized protocols for consistent image capture across plant canopies, (2) canopy identification and registration from cluttered backgrounds, (3) extraction of domain expert informed features from the processed images to accurately represent IDC expression, and (4) supervised ML-based classifiers that linked the automatically extracted features with expert-rating equivalent IDC scores. ML-generated phenotypic data were subsequently utilized for the genome-wide association study and genomic prediction. The results illustrate the reliability and advantage of the ML-enabled image-phenotyping pipeline by identifying a previously reported locus and a novel locus harboring a gene homolog involved in iron acquisition. This study demonstrates a promising path for integrating the phenotyping pipeline into genomic prediction, and provides a systematic framework enabling robust and quicker phenotyping through ground-based systems. PMID:28272456
Landscape genomics reveals altered genome wide diversity within revegetated stands of Eucalyptus microcarpa (Grey Box).

PubMed

Jordan, Rebecca; Dillon, Shannon K; Prober, Suzanne M; Hoffmann, Ary A

2016-12-01

In order to contribute to evolutionary resilience and adaptive potential in highly modified landscapes, revegetated areas should ideally reflect levels of genetic diversity within and across natural stands. Landscape genomic analyses enable such diversity patterns to be characterized at genome and chromosomal levels. Landscape-wide patterns of genomic diversity were assessed in Eucalyptus microcarpa, a dominant tree species widely used in revegetation in Southeastern Australia. Trees from small and large patches within large remnants, small isolated remnants and revegetation sites were assessed across the now highly fragmented distribution of this species using the DArTseq genomic approach. Genomic diversity was similar within all three types of remnant patches analysed, although often significantly but only slightly lower in revegetation sites compared with natural remnants. Differences in diversity between stand types varied across chromosomes. Genomic differentiation was higher between small, isolated remnants, and among revegetated sites compared with natural stands. We conclude that small remnants and revegetated sites of our E. microcarpa samples largely but not completely capture patterns in genomic diversity across the landscape. Genomic approaches provide a powerful tool for assessing restoration efforts across the landscape. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes.

PubMed

Mägi, Reedik; Suleimanov, Yury V; Clarke, Geraldine M; Kaakinen, Marika; Fischer, Krista; Prokopenko, Inga; Morris, Andrew P

2017-01-11

Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have been successful in identifying loci contributing genetic effects to a wide range of complex human diseases and quantitative traits. The traditional approach to GWAS analysis is to consider each phenotype separately, despite the fact that many diseases and quantitative traits are correlated with each other, and often measured in the same sample of individuals. Multivariate analyses of correlated phenotypes have been demonstrated, by simulation, to increase power to detect association with SNPs, and thus may enable improved detection of novel loci contributing to diseases and quantitative traits. We have developed the SCOPA software to enable GWAS analysis of multiple correlated phenotypes. The software implements "reverse regression" methodology, which treats the genotype of an individual at a SNP as the outcome and the phenotypes as predictors in a general linear model. SCOPA can be applied to quantitative traits and categorical phenotypes, and can accommodate imputed genotypes under a dosage model. The accompanying META-SCOPA software enables meta-analysis of association summary statistics from SCOPA across GWAS. Application of SCOPA to two GWAS of high- and low-density lipoprotein cholesterol, triglycerides and body mass index, and subsequent meta-analysis with META-SCOPA, highlighted stronger association signals than univariate phenotype analysis at established lipid and obesity loci. The META-SCOPA meta-analysis also revealed a novel signal of association at genome-wide significance for triglycerides mapping to GPC5 (lead SNP rs71427535, p = 1.1 × 10^-8), which has not been reported in previous large-scale GWAS of lipid traits. The SCOPA and META-SCOPA software enable discovery and dissection of multiple phenotype association signals through implementation of a powerful reverse regression approach.
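The "reverse regression" idea described in this abstract can be illustrated with a minimal sketch. It is not the SCOPA implementation: the function name, the use of ordinary least squares, and the joint F statistic are assumptions made here for illustration only. Genotype dosage is the outcome and the phenotypes are the predictors.

```python
import numpy as np

def reverse_regression_f(genotype, phenotypes):
    """Regress genotype dosage on phenotypes (hypothetical helper, not SCOPA's API).

    Returns the fitted coefficients (intercept first) and the F statistic
    for the joint effect of all phenotypes.
    """
    g = np.asarray(genotype, dtype=float)
    P = np.asarray(phenotypes, dtype=float)
    n, k = P.shape
    X = np.column_stack([np.ones(n), P])           # intercept + phenotypes
    beta, *_ = np.linalg.lstsq(X, g, rcond=None)   # least-squares fit
    rss_full = np.sum((g - X @ beta) ** 2)         # residuals, full model
    rss_null = np.sum((g - g.mean()) ** 2)         # residuals, intercept only
    f_stat = ((rss_null - rss_full) / k) / (rss_full / (n - k - 1))
    return beta, f_stat

if __name__ == "__main__":
    # Simulated data: phenotype 1 depends on the SNP, phenotype 2 does not.
    rng = np.random.default_rng(0)
    dosage = rng.binomial(2, 0.3, size=500).astype(float)
    pheno = np.column_stack([dosage + rng.normal(size=500),
                             rng.normal(size=500)])
    beta, f_stat = reverse_regression_f(dosage, pheno)
    print(beta, f_stat)
```

Because the genotype sits on the left-hand side, any number of correlated phenotypes can be tested jointly with a single model per SNP.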
A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

PubMed Central

Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

2016-01-01

Actinobacillus pleuropneumoniae is the pathogen causing porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591
WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures

PubMed Central

2010-01-01

Background: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. Methods: This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. Results: A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. Conclusion: WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data. PMID:21210985
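The two worker steps named in this abstract (enumerating a subset of the word space, then scoring words against a Markov model) can be sketched in a few lines. This is an illustration only and not WordSeeker's code: the round-robin partition, the function names, and the maximal-order Markov estimate of the expected count are all assumptions chosen for simplicity.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def words_for_worker(k, worker, n_workers):
    """Words of length k assigned to one worker (round-robin over the word space)."""
    return ["".join(w) for i, w in enumerate(product(ALPHABET, repeat=k))
            if i % n_workers == worker]

def count_words(seq, k):
    """Observed counts of every overlapping k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def markov_expected(seq, word):
    """Expected count of word under a Markov model of order len(word) - 2.

    Uses the estimate N(prefix) * N(suffix) / N(middle), where prefix and
    suffix are the word minus its last / first letter.
    """
    k = len(word)
    if k < 3:
        raise ValueError("word must have at least 3 letters")
    shorter = count_words(seq, k - 1)
    middle = count_words(seq, k - 2)[word[1:-1]]
    if middle == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / middle

def score_words(seq, k, worker, n_workers):
    """(observed, expected) counts for the words this worker owns and that occur."""
    observed = count_words(seq, k)
    return {w: (observed[w], markov_expected(seq, w))
            for w in words_for_worker(k, worker, n_workers) if observed[w]}

if __name__ == "__main__":
    print(score_words("ACGTACGTACGT", 3, worker=0, n_workers=1))
```

Words whose observed count is far above the expected count are the candidates a motif-discovery tool would report; a controller would merge the dictionaries returned by each worker.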
Genome-wide scans for loci under selection in humans

PubMed Central

2005-01-01

Natural selection, which can be defined as the differential contribution of genetic variants to future generations, is the driving force of Darwinian evolution. Identifying regions of the human genome that have been targets of natural selection is an important step in clarifying human evolutionary history and understanding how genetic variation results in phenotypic diversity; it may also facilitate the search for complex disease genes. Technological advances in high-throughput DNA sequencing and single nucleotide polymorphism genotyping have enabled several genome-wide scans of natural selection to be undertaken. Here, some of the observations that are beginning to emerge from these studies will be reviewed, including evidence for geographically restricted selective pressures (i.e., local adaptation) and a relationship between genes subject to natural selection and human disease. In addition, the paper will highlight several important problems that need to be addressed in future genome-wide studies of natural selection. PMID:16004726
SuperDCA for genome-wide epistasis analysis.

PubMed

Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Xu, Ying Ying; Lees, John A; Bentley, Stephen D; Croucher, Nicholas J; Corander, Jukka

2018-05-29

The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4-10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations.

PubMed

Lin, Yao-Cheng; Boone, Morgane; Meuris, Leander; Lemmens, Irma; Van Roy, Nadine; Soete, Arne; Reumers, Joke; Moisse, Matthieu; Plaisance, Stéphane; Drmanac, Radoje; Chen, Jason; Speleman, Frank; Lambrechts, Diether; Van de Peer, Yves; Tavernier, Jan; Callewaert, Nico

2014-09-03

The HEK293 human cell lineage is widely used in cell biology and biotechnology. Here we use whole-genome resequencing of six 293 cell lines to study the dynamics of this aneuploid genome in response to the manipulations used to generate common 293 cell derivatives, such as transformation and stable clone generation (293T); suspension growth adaptation (293S); and cytotoxic lectin selection (293SG). Remarkably, we observe that copy number alteration detection could identify the genomic region that enabled cell survival under selective conditions (i.c. ricin selection). Furthermore, we present methods to detect human/vector genome breakpoints and a user-friendly visualization tool for the 293 genome data. We also establish that the genome structure composition is in steady state for most of these cell lines when standard cell culturing conditions are used. This resource enables novel and more informed studies with 293 cells, and we will distribute the sequenced cell lines to this effect.
Genomic Definition of Hypervirulent and Multidrug-Resistant Klebsiella pneumoniae Clonal Groups

PubMed Central

Bialek-Davenet, Suzanne; Criscuolo, Alexis; Ailloud, Florent; Passet, Virginie; Jones, Louis; Delannoy-Vieillard, Anne-Sophie; Garin, Benoit; Le Hello, Simon; Arlet, Guillaume; Nicolas-Chanoine, Marie-Hélène; Decré, Dominique

2014-01-01

Multidrug-resistant and highly virulent Klebsiella pneumoniae isolates are emerging, but the clonal groups (CGs) corresponding to these high-risk strains have remained imprecisely defined. We aimed to identify K. pneumoniae CGs on the basis of genome-wide sequence variation and to provide a simple bioinformatics tool to extract virulence and resistance gene data from genomic data. We sequenced 48 K. pneumoniae isolates, mostly of serotypes K1 and K2, and compared the genomes with 119 publicly available genomes. A total of 694 highly conserved genes were included in a core-genome multilocus sequence typing scheme, and cluster analysis of the data enabled precise definition of globally distributed hypervirulent and multidrug-resistant CGs. In addition, we created a freely accessible database, BIGSdb-Kp, to enable rapid extraction of medically and epidemiologically relevant information from genomic sequences of K. pneumoniae. Although drug-resistant and virulent K. pneumoniae populations were largely nonoverlapping, isolates with combined virulence and resistance features were detected. PMID:25341126
StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

PubMed

Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A

2017-10-15

Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact: favorov@sensi.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
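A rough sketch of why kernel correlation captures relationships between neighboring positions: each track is smoothed with a kernel before the correlation is taken, so features that sit a few positions apart still contribute. The code below is not StereoGene's algorithm; the Gaussian kernel, its width, and the function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized Gaussian kernel truncated at 3 sigma (assumed kernel shape)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def kernel_correlation(track_a, track_b, sigma=5.0):
    """Pearson correlation of two per-position tracks after kernel smoothing."""
    k = gaussian_kernel(sigma)
    a = np.convolve(np.asarray(track_a, dtype=float), k, mode="same")
    b = np.convolve(np.asarray(track_b, dtype=float), k, mode="same")
    return float(np.corrcoef(a, b)[0, 1])

if __name__ == "__main__":
    # Two sparse tracks whose peaks are offset by 3 positions.
    a = np.zeros(200)
    a[[20, 60, 100, 140, 180]] = 1.0
    b = np.roll(a, 3)
    print("raw correlation:   ", float(np.corrcoef(a, b)[0, 1]))
    print("kernel correlation:", kernel_correlation(a, b))
```

In this toy case the position-by-position correlation is near zero because the peaks never coincide, while the smoothed tracks correlate strongly. No binarization of the signal is needed at any step.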
Genome-wide specificity of DNA binding, gene regulation, and chromatin remodeling by TALE- and CRISPR/Cas9-based transcriptional activators

PubMed Central

Polstein, Lauren R.; Perez-Pinera, Pablo; Kocak, D. Dewran; Vockley, Christopher M.; Bledsoe, Peggy; Song, Lingyun; Safi, Alexias; Crawford, Gregory E.; Reddy, Timothy E.; Gersbach, Charles A.

2015-01-01

Genome engineering technologies based on the CRISPR/Cas9 and TALE systems are enabling new approaches in science and biotechnology. However, the specificity of these tools in complex genomes and the role of chromatin structure in determining DNA binding are not well understood. We analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators in human cells using ChIP-seq to assess DNA-binding specificity and RNA-seq to measure the specificity of perturbing the transcriptome. Additionally, DNase-seq was used to assess genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these transcription factors are highly specific in both DNA binding and gene regulation and are able to open targeted regions of closed chromatin independent of gene activation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. PMID:26025803
Strategies for the acquisition of transcriptional and epigenetic information in single cells.

PubMed

Li, Guang; Dzilic, Elda; Flores, Nick; Shieh, Alice; Wu, Sean M

2017-03-01

As the basic unit of living organisms, each single cell has unique molecular signatures and functions. Our ability to uncover the transcriptional and epigenetic signature of single cells has been hampered by the lack of tools to explore this area of research. The advent of microfluidic single cell technology along with single cell genome-wide DNA amplification methods has greatly improved our understanding of the expression variation in single cells. Transcriptional expression profiling by multiplex qPCR or genome-wide RNA sequencing has enabled us to examine gene expression in single cells in different tissues. With the new tools, new cellular heterogeneity, novel marker genes, unique subpopulations, and the spatial location of each single cell can be identified. Epigenetic modifications for each single cell can also be obtained via similar methods. Based on single cell genome sequencing, single cell epigenetic information including histone modifications, DNA methylation, and chromatin accessibility has been explored and has provided valuable insights regarding gene regulation and disease prognosis. In this article, we review the development of strategies to obtain single cell transcriptional and epigenetic data. Furthermore, we discuss ways in which single cell studies may help to provide greater understanding of the mechanisms of basic cardiovascular biology that will eventually lead to improvement in our ability to diagnose disease and develop new therapies.

High-utility conserved avian microsatellite markers enable parentage and population studies across a wide range of species

PubMed Central

2013-01-01

Background: Microsatellites are widely used for many genetic studies. In contrast to single nucleotide polymorphism (SNP) and genotyping-by-sequencing methods, they are readily typed in samples of low DNA quality/concentration (e.g. museum/non-invasive samples), and enable the quick, cheap identification of species, hybrids, clones and ploidy. Microsatellites also have the highest cross-species utility of all types of markers used for genotyping, but, despite this, when isolated from a single species, only a relatively small proportion will be of utility. Marker development of any type requires skill and time. The availability of sufficient “off-the-shelf” markers that are suitable for genotyping a wide range of species would not only save resources but also uniquely enable new comparisons of diversity among taxa at the same set of loci. No other marker types are capable of enabling this. We therefore developed a set of avian microsatellite markers with enhanced cross-species utility. Results: We selected highly-conserved sequences with a high number of repeat units in both of two genetically distant species. Twenty-four primer sets were designed from homologous sequences that possessed at least eight repeat units in both the zebra finch (Taeniopygia guttata) and chicken (Gallus gallus). Each primer sequence was a complete match to zebra finch and, after accounting for degenerate bases, at least 86% similar to chicken. We assessed primer-set utility by genotyping individuals belonging to eight passerine and four non-passerine species. The majority of the new Conserved Avian Microsatellite (CAM) markers amplified in all 12 species tested (on average, 94% in passerines and 95% in non-passerines). This new marker set is of especially high utility in passerines, with a mean 68% of loci polymorphic per species, compared with 42% in non-passerine species. Conclusions: When combined with previously described conserved loci, this new set of conserved markers will not only reduce the necessity and expense of microsatellite isolation for a wide range of genetic studies, including avian parentage and population analyses, but will also now enable comparisons of genetic diversity among different species (and populations) at the same set of loci, with no or reduced bias. Finally, the approach used here can be applied to other taxa in which appropriate genome sequences are available. PMID:23497230
snpGeneSets: An R Package for Genome-Wide Study Annotation

PubMed Central

Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

2016-01-01

Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048
An interactive environment for agile analysis and visualization of ChIP-sequencing data.

PubMed

Lerdrup, Mads; Johansen, Jens Vilstrup; Agrawal-Singh, Shuchi; Hansen, Klaus

2016-04-01

To empower experimentalists with a means for fast and comprehensive chromatin immunoprecipitation sequencing (ChIP-seq) data analyses, we introduce an integrated computational environment, EaSeq. The software combines the exploratory power of genome browsers with an extensive set of interactive and user-friendly tools for genome-wide abstraction and visualization. It enables experimentalists to easily extract information and generate hypotheses from their own data and public genome-wide datasets. For demonstration purposes, we performed meta-analyses of public Polycomb ChIP-seq data and established a new screening approach to analyze more than 900 datasets from mouse embryonic stem cells for factors potentially associated with Polycomb recruitment. EaSeq, which is freely available and works on a standard personal computer, can substantially increase the throughput of many analysis workflows, facilitate transparency and reproducibility by automatically documenting and organizing analyses, and enable a broader group of scientists to gain insights from ChIP-seq data.
Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing.

PubMed

Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L

2016-05-01

Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide translocation sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina MiSeq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement with a turnaround time of <1 week.
Genomic markers for decision making: what is preventing us from using markers?

PubMed

Coyle, Vicky M; Johnston, Patrick G

2010-02-01

The advent of novel genomic technologies that enable the evaluation of genomic alterations on a genome-wide scale has significantly altered the field of genomic marker research in solid tumors. Researchers have moved away from the traditional model of identifying a particular genomic alteration and evaluating the association between this finding and a clinical outcome measure to a new approach involving the identification and measurement of multiple genomic markers simultaneously within clinical studies. This in turn has presented additional challenges in considering the use of genomic markers in oncology, such as clinical study design, reproducibility and interpretation and reporting of results. This Review will explore these challenges, focusing on microarray-based gene-expression profiling, and highlights some common failings in study design that have impacted on the use of putative genomic markers in the clinic. Despite these rapid technological advances there is still a paucity of genomic markers in routine clinical use at present. A rational and focused approach to the evaluation and validation of genomic markers is needed, whereby analytically validated markers are investigated in clinical studies that are adequately powered and have pre-defined patient populations and study endpoints. Furthermore, novel adaptive clinical trial designs, incorporating putative genomic markers into prospective clinical trials, will enable the evaluation of these markers in a rigorous and timely fashion. Such approaches have the potential to facilitate the implementation of such markers into routine clinical practice and consequently enable the rational and tailored use of cancer therapies for individual patients.
Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple

PubMed Central

Chagné, David; Crowhurst, Ross N.; Troggio, Michela; Davey, Mark W.; Gilmore, Barbara; Lawley, Cindy; Vanderzande, Stijn; Hellens, Roger P.; Kumar, Satish; Cestaro, Alessandro; Velasco, Riccardo; Main, Dorrie; Rees, Jasper D.; Iezzoni, Amy; Mockler, Todd; Wilhelm, Larry; Van de Weg, Eric; Gardiner, Susan E.; Bassil, Nahla; Peace, Cameron

2012-01-01

As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of ‘Golden Delicious’, SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple. PMID:22363718
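The SNP counts quoted in this abstract can be cross-checked with simple arithmetic. The short calculation below only multiplies and divides numbers stated above; the variable names are ours, and the "implied" sequence length is an inference from those numbers, not a figure reported in the abstract.

```python
# Figures taken from the abstract above.
snps_detected = 2_113_120   # SNPs found across 27 re-sequenced cultivars
bp_per_snp = 288            # reported density: one SNP per 288 bp
array_snps = 7867           # SNPs on the IRSC apple 8K array v1
polymorphic = 5554          # array SNPs found to be polymorphic

# Sequence length implied by the detection count and the density.
implied_bp = snps_detected * bp_per_snp

# Share of array SNPs that were polymorphic after evaluation.
polymorphic_pct = 100 * polymorphic / array_snps

print(f"{implied_bp:,} bp; {polymorphic_pct:.1f}% polymorphic")
# prints "608,578,560 bp; 70.6% polymorphic"
```

So the stated density corresponds to roughly 609 Mb of sequence over which SNPs were called, and about seven in ten array SNPs proved informative.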
Systems Biology Approaches for Understanding Genome Architecture.

PubMed

Sewitz, Sven; Lipkow, Karen

2016-01-01

The linear and three-dimensional arrangement and composition of chromatin in eukaryotic genomes underlies the mechanisms directing gene regulation. Understanding this organization requires the integration of many data types and experimental results. Here we describe the approach of integrating genome-wide protein-DNA binding data to determine chromatin states. To investigate spatial aspects of genome organization, we present a detailed description of how to run stochastic simulations of protein movements within a simulated nucleus in 3D. This systems level approach enables the development of novel questions aimed at understanding the basic mechanisms that regulate genome dynamics.
Comparison of Models and Whole-Genome Profiling Approaches for Genomic-Enabled Prediction of Septoria Tritici Blotch, Stagonospora Nodorum Blotch, and Tan Spot Resistance in Wheat.

PubMed

Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E

2017-07-01

The leaf spotting diseases in wheat that include Septoria tritici blotch (STB) caused by Zymoseptoria tritici, Stagonospora nodorum blotch (SNB) caused by Parastagonospora nodorum, and tan spot (TS) caused by Pyrenophora tritici-repentis pose challenges to breeding programs in selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection that uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of least-squares (LS) approach with genomic-enabled prediction models including genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces markers (RKHS-M), a pedigree-based model (RKHS-P) and RKHS markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP, the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling) and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches: genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq) for prediction. While GBS markers performed slightly better than DArTseq, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection. Copyright © 2017 Crop Science Society of America.
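To make the idea of marker-based prediction concrete, the sketch below fits a ridge regression on simulated marker genotypes and reports prediction accuracy as the Pearson correlation between predicted and observed phenotypes in held-out lines. This is an assumption-laden toy, not the study's pipeline: the data are simulated, the shrinkage parameter is fixed by hand, and accuracy is defined here as a correlation, which is a common convention but is not spelled out in the abstract. Ridge regression on markers is closely related to GBLUP, which is why it is used as the stand-in.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam * I) b = X'y for the marker effects b."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data: 200 lines, 50 markers coded 0/1/2 (all values hypothetical).
rng = np.random.default_rng(1)
n_lines, n_markers = 200, 50
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
X -= X.mean(axis=0)                      # centre marker codes
true_effects = rng.normal(size=n_markers)
y = X @ true_effects + rng.normal(scale=2.0, size=n_lines)

# Train on the first 150 lines, predict the remaining 50.
train, test = slice(0, 150), slice(150, 200)
y_mean = y[train].mean()
b_hat = ridge_marker_effects(X[train], y[train] - y_mean, lam=10.0)
predicted = X[test] @ b_hat + y_mean

# Prediction accuracy as the correlation of predicted and observed values.
accuracy = np.corrcoef(predicted, y[test])[0, 1]
print(f"Prediction accuracy (held-out lines): {accuracy:.2f}")
```

In practice the shrinkage parameter would be estimated from variance components or by cross-validation rather than fixed.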
Understanding development and stem cells using single cell-based analyses of gene expression

PubMed Central

Kumar, Pavithra; Tan, Yuqi; Cahan, Patrick

2017-01-01

In recent years, genome-wide profiling approaches have begun to uncover the molecular programs that drive developmental processes. In particular, technical advances that enable genome-wide profiling of thousands of individual cells have provided the tantalizing prospect of cataloging cell type diversity and developmental dynamics in a quantitative and comprehensive manner. Here, we review how single-cell RNA sequencing has provided key insights into mammalian developmental and stem cell biology, emphasizing the analytical approaches that are specific to studying gene expression in single cells. PMID:28049689
Next-generation sequencing strategies enable routine detection of balanced chromosome rearrangements for clinical diagnostics and genetic research.

PubMed

Talkowski, Michael E; Ernst, Carl; Heilbut, Adrian; Chiang, Colby; Hanscom, Carrie; Lindgren, Amelia; Kirby, Andrew; Liu, Shangtao; Muddukrishna, Bhavana; Ohsumi, Toshiro K; Shen, Yiping; Borowsky, Mark; Daly, Mark J; Morton, Cynthia C; Gusella, James F

2011-04-08

The contribution of balanced chromosomal rearrangements to complex disorders remains unclear because they are not detected routinely by genome-wide microarrays and clinical localization is imprecise. Failure to consider these events bypasses a potentially powerful complement to single nucleotide polymorphism and copy-number association approaches to complex disorders, where much of the heritability remains unexplained. To capitalize on this genetic resource, we have applied optimized sequencing and analysis strategies to test whether these potentially high-impact variants can be mapped at reasonable cost and throughput. By using a whole-genome multiplexing strategy, rearrangement breakpoints could be delineated at a fraction of the cost of standard sequencing. For rearrangements already mapped regionally by karyotyping and fluorescence in situ hybridization, a targeted approach enabled capture and sequencing of multiple breakpoints simultaneously. Importantly, this strategy permitted capture and unique alignment of up to 97% of repeat-masked sequences in the targeted regions. Genome-wide analyses estimate that only 3.7% of bases should be routinely omitted from genomic DNA capture experiments. Illustrating the power of these approaches, the rearrangement breakpoints were rapidly defined to base pair resolution and revealed unexpected sequence complexity, such as co-occurrence of inversion and translocation as an underlying feature of karyotypically balanced alterations. These findings have implications ranging from genome annotation to de novo assemblies and could enable sequencing screens for structural variations at a cost comparable to that of microarrays in standard clinical practice. Copyright © 2011 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Acid Stress Response Mechanisms of Group B Streptococci

PubMed Central

Shabayek, Sarah; Spellerberg, Barbara

2017-01-01

Group B streptococcus (GBS) is a leading cause of neonatal mortality and morbidity in the United States and Europe. It is part of the vaginal microbiota in up to 30% of pregnant women and can be passed on to the newborn through perinatal transmission. GBS has the ability to survive in multiple different host niches. The pathophysiology of this bacterium reveals an outstanding ability to withstand varying pH fluctuations of the surrounding environments inside the human host. GBS host-pathogen interactions include colonization of the acidic vaginal mucosa, invasion of the neutral human blood or amniotic fluid, breaching of the blood-brain barrier as well as survival within the acidic phagolysosomal compartment of macrophages. However, investigations on GBS responses to acid stress are limited. Technologies such as whole genome sequencing, genome-wide transcription and proteome mapping facilitate large scale identification of genes and proteins. Mechanisms enabling GBS to cope with acid stress have mainly been studied through these techniques and are summarized in the current review. PMID:28936424
Guidelines for whole genome bisulphite sequencing of intact and FFPET DNA on the Illumina HiSeq X Ten.

PubMed

Nair, Shalima S; Luu, Phuc-Loi; Qu, Wenjia; Maddugoda, Madhavi; Huschtscha, Lily; Reddel, Roger; Chenevix-Trench, Georgia; Toso, Martina; Kench, James G; Horvath, Lisa G; Hayes, Vanessa M; Stricker, Phillip D; Hughes, Timothy P; White, Deborah L; Rasko, John E J; Wong, Justin J-L; Clark, Susan J

2018-05-28

Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single base resolution. However, the high sequencing cost to achieve the optimal depth of coverage limits its application in both basic and clinical research. Achieving 15× coverage of the human methylome using WGBS requires approximately three lanes of 100-bp paired-end Illumina HiSeq 2500 sequencing. It is important, therefore, for advances in sequencing technologies to be developed to enable cost-effective high-coverage sequencing. In this study, we provide an optimised WGBS methodology, from library preparation to sequencing and data processing, to enable 16-20× genome-wide coverage per single lane of HiSeq X Ten, HCS 3.3.76. To process and analyse the data, we developed a WGBS pipeline (METH10X) that is fast and can call SNPs. We performed WGBS on both high-quality intact DNA and degraded DNA from formalin-fixed paraffin-embedded tissue. First, we compared different library preparation methods on the HiSeq 2500 platform to identify the best method for sequencing on the HiSeq X Ten. Second, we optimised the PhiX and genome spike-ins to achieve higher quality and coverage of WGBS data on the HiSeq X Ten. Third, we performed integrated whole genome sequencing (WGS) and WGBS of the same DNA sample in a single lane of HiSeq X Ten to improve data output. Finally, we compared methylation data from the HiSeq 2500 and HiSeq X Ten and found high concordance (Pearson r > 0.9). Together we provide a systematic, efficient and complete approach to perform and analyse WGBS on the HiSeq X Ten. Our protocol allows for large-scale WGBS studies at reasonable processing time and cost on the HiSeq X Ten platform.
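The coverage figures in this abstract follow the usual relation: mean coverage = number of reads × read length / genome size. The sketch below applies that relation. It is only an idealised estimate: the genome size of 3.1 Gb is an assumed round figure, and the calculation ignores duplicates, unmapped reads, adapter trimming and spike-ins, all of which lower usable coverage in a real WGBS run.

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Idealised mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target mean depth."""
    return target_coverage * genome_size_bp / read_length_bp

HUMAN_GENOME_BP = 3.1e9   # assumed approximate human genome size

# Reads (not read pairs) of 100 bp needed for 15x mean coverage.
needed = reads_needed(15, 100, HUMAN_GENOME_BP)
print(f"{needed / 1e6:.0f} million 100-bp reads for 15x coverage")
```

For paired-end sequencing, each pair contributes two reads, so the number of pairs is half the figure printed above.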
Screening of duplicated loci reveals hidden divergence patterns in a complex salmonid genome

USGS Publications Warehouse

Limborg, Morten T.; Larson, Wesley; Seeb, Lisa W.; Seeb, James E.

2017-01-01

A whole-genome duplication (WGD) doubles the entire genomic content of a species and is thought to have catalysed adaptive radiation in some polyploid-origin lineages. However, little is known about general consequences of a WGD because gene duplicates (i.e., paralogs) are commonly filtered in genomic studies; such filtering may remove substantial portions of the genome in data sets from polyploid-origin species. We demonstrate a new method that enables genome-wide scans for signatures of selection at both nonduplicated and duplicated loci by taking locus-specific copy number into account. We apply this method to RAD sequence data from different ecotypes of a polyploid-origin salmonid (Oncorhynchus nerka) and reveal signatures of divergent selection that would have been missed if duplicated loci were filtered. We also find conserved signatures of elevated divergence at pairs of homeologous chromosomes with residual tetrasomic inheritance, suggesting that joint evolution of some nondiverged gene duplicates may affect the adaptive potential of these genes. These findings illustrate that including duplicated loci in genomic analyses enables novel insights into the evolutionary consequences of WGDs and local segmental gene duplications.
Genome-wide specificity of DNA binding, gene regulation, and chromatin remodeling by TALE- and CRISPR/Cas9-based transcriptional activators.

PubMed

Polstein, Lauren R; Perez-Pinera, Pablo; Kocak, D Dewran; Vockley, Christopher M; Bledsoe, Peggy; Song, Lingyun; Safi, Alexias; Crawford, Gregory E; Reddy, Timothy E; Gersbach, Charles A

2015-08-01

Genome engineering technologies based on the CRISPR/Cas9 and TALE systems are enabling new approaches in science and biotechnology. However, the specificity of these tools in complex genomes and the role of chromatin structure in determining DNA binding are not well understood. We analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators in human cells using ChIP-seq to assess DNA-binding specificity and RNA-seq to measure the specificity of perturbing the transcriptome. Additionally, DNase-seq was used to assess genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these transcription factors are highly specific in both DNA binding and gene regulation and are able to open targeted regions of closed chromatin independent of gene activation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. © 2015 Polstein et al.; Published by Cold Spring Harbor Laboratory Press.
Genome-wide mapping of DNase I hypersensitive sites in rare cell populations using single-cell DNase sequencing.

PubMed

Cooper, James; Ding, Yi; Song, Jiuzhou; Zhao, Keji

2017-11-01

Increased chromatin accessibility is a feature of cell-type-specific cis-regulatory elements; therefore, mapping of DNase I hypersensitive sites (DHSs) enables the detection of active regulatory elements of transcription, including promoters, enhancers, insulators and locus-control regions. Single-cell DNase sequencing (scDNase-seq) is a method of detecting genome-wide DHSs when starting with either single cells or <1,000 cells from primary cell sources. This technique enables genome-wide mapping of hypersensitive sites in a wide range of cell populations that cannot be analyzed using conventional DNase I sequencing because of the requirement for millions of starting cells. Fresh cells, formaldehyde-cross-linked cells or cells recovered from formalin-fixed paraffin-embedded (FFPE) tissue slides are suitable for scDNase-seq assays. To generate scDNase-seq libraries, cells are lysed and then digested with DNase I. Circular carrier plasmid DNA is included during subsequent DNA purification and library preparation steps to prevent loss of the small quantity of DHS DNA. Libraries are generated for high-throughput sequencing on the Illumina platform using standard methods. Preparation of scDNase-seq libraries requires only 2 d. The materials and molecular biology techniques described in this protocol should be accessible to any general molecular biology laboratory. Processing of high-throughput sequencing data requires basic bioinformatics skills and uses publicly available bioinformatics software.
Partitioning heritability by functional annotation using genome-wide association summary statistics.

PubMed

Finucane, Hilary K; Bulik-Sullivan, Brendan; Gusev, Alexander; Trynka, Gosia; Reshef, Yakir; Loh, Po-Ru; Anttila, Verneri; Xu, Han; Zang, Chongzhi; Farh, Kyle; Ripke, Stephan; Day, Felix R; Purcell, Shaun; Stahl, Eli; Lindstrom, Sara; Perry, John R B; Okada, Yukinori; Raychaudhuri, Soumya; Daly, Mark J; Patterson, Nick; Neale, Benjamin M; Price, Alkes L

2015-11-01

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.
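Stratified LD score regression rests on a linear model for the expected association statistic of SNP j: E[χ²_j] = N Σ_c τ_c ℓ(j, c) + N a + 1, where ℓ(j, c) is the LD score of SNP j with respect to category c, τ_c is the per-SNP heritability contribution of category c, N is the sample size and a captures confounding. The sketch below fits that model by ordinary least squares on synthetic, noise-free data, with the intercept absorbing N a + 1. It is a simplified illustration only: the published method uses weighted regression and block-jackknife standard errors, which are omitted here, and all numbers except the sample size quoted in the abstract are made up.

```python
import numpy as np

def fit_stratified_ldsc(chi2, ld_scores, n_samples):
    """Least-squares fit of chi2_j = N * sum_c tau_c * l(j, c) + intercept.

    Returns (tau, intercept). Unweighted and without standard errors.
    """
    X = np.column_stack([ld_scores, np.ones(len(chi2))])
    beta, *_ = np.linalg.lstsq(X, chi2, rcond=None)
    # Regression slopes estimate N * tau_c, so divide by N.
    return beta[:-1] / n_samples, beta[-1]

# Synthetic example: 1000 SNPs, 3 functional categories (hypothetical values).
rng = np.random.default_rng(0)
n_snps, n_categories, n_samples = 1000, 3, 73_599
ld_scores = rng.uniform(1.0, 200.0, size=(n_snps, n_categories))
true_tau = np.array([2e-8, 5e-8, 1e-7])
chi2 = n_samples * (ld_scores @ true_tau) + 1.0   # no confounding, no noise

tau_hat, intercept_hat = fit_stratified_ldsc(chi2, ld_scores, n_samples)
print(tau_hat, intercept_hat)
```

Because the method needs only χ² statistics and LD scores computed from a reference panel, it works from GWAS summary statistics without individual-level genotypes.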
Genomics-enabled analysis of the emergent disease cotton bacterial blight

PubMed Central

Phillips, Anne Z.; Burke, Jillian; Bunn, J. Imani; Allen, Tom W.; Wheeler, Terry

2017-01-01

Cotton bacterial blight (CBB), an important disease of cotton (Gossypium hirsutum) in the early 20th century, had been controlled by resistant germplasm for over half a century. Recently, CBB re-emerged as an agronomic problem in the United States. Here, we report analysis of cotton variety planting statistics that indicate a steady increase in the percentage of susceptible cotton varieties grown each year since 2009. Phylogenetic analysis revealed that strains from the current outbreak cluster with race 18 Xanthomonas citri pv. malvacearum (Xcm) strains. Illumina based draft genomes were generated for thirteen Xcm isolates and analyzed along with 4 previously published Xcm genomes. These genomes encode 24 conserved and nine variable type three effectors. Strains in the race 18 clade contain 3 to 5 more effectors than other Xcm strains. SMRT sequencing of two geographically and temporally diverse strains of Xcm yielded circular chromosomes and accompanying plasmids. These genomes encode eight and thirteen distinct transcription activator-like effector genes. RNA-sequencing revealed 52 genes induced within two cotton cultivars by both tested Xcm strains. This gene list includes a homeologous pair of genes, with homology to the known susceptibility gene, MLO. In contrast, the two strains of Xcm induce different clade III SWEET sugar transporters. Subsequent genome wide analysis revealed patterns in the overall expression of homeologous gene pairs in cotton after inoculation by Xcm. These data reveal important insights into the Xcm-G. hirsutum disease complex and strategies for future development of resistant cultivars. PMID:28910288
Inferring transposons activity chronology by TRANScendence - TEs database and de-novo mining tool.

PubMed

Startek, Michał Piotr; Nogły, Jakub; Gromadka, Agnieszka; Grzebelus, Dariusz; Gambin, Anna

2017-10-16

The constant progress in sequencing technology leads to ever-increasing amounts of genomic data. In the light of current evidence, transposable elements (TEs) are becoming useful tools for learning about the evolution of the host genome. Software for genome-wide detection and analysis of TEs is therefore of great interest. Here we describe a computational tool for mining, classifying and storing TEs from newly sequenced genomes. This is an online, web-based, user-friendly service that enables users to upload their own genomic data and perform de-novo searches for TEs. The detected TEs are automatically analyzed, compared to reference databases, annotated, clustered into families, and stored in a TE repository. In addition, the genome-wide nesting structure of the detected elements is identified and analyzed by a new method for inferring the evolutionary history of TEs. We illustrate the functionality of our tool by performing a full-scale analysis of the TE landscape in the Medicago truncatula genome. TRANScendence is an effective tool for the de-novo annotation and classification of transposable elements in newly acquired genomes. Its streamlined interface makes it well suited for evolutionary studies.
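The link between nesting and chronology can be illustrated with a small sketch: if one element lies entirely inside another, the inner element most plausibly inserted after the outer one was already in place. The code below finds such nested pairs from interval coordinates. It is not TRANScendence's algorithm: the `TE` record, the coordinates and the strict-containment rule are hypothetical simplifications, and real annotations usually show an interrupted host element as separate fragments that must first be joined.

```python
from typing import NamedTuple

class TE(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def nested_pairs(elements):
    """Return (inner, outer) name pairs where inner lies within outer."""
    pairs = []
    for inner in elements:
        for outer in elements:
            if inner is outer:
                continue
            contained = outer.start <= inner.start and inner.end <= outer.end
            shorter = (inner.end - inner.start) < (outer.end - outer.start)
            if contained and shorter:
                pairs.append((inner.name, outer.name))
    return pairs

# Hypothetical annotation on a single sequence.
elements = [TE("A", 100, 900), TE("B", 300, 500),
            TE("C", 350, 400), TE("D", 1000, 1200)]
pairs = nested_pairs(elements)
for inner, outer in pairs:
    print(f"{inner} is nested in {outer}: {inner} likely inserted later")
```

Aggregating such pairs by family across a genome gives a partial order of family activity; the quadratic pairwise scan is fine for a sketch but would be replaced by a sorted sweep on real data.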
Using Full Genomic Information to Predict Disease: Breaking Down the Barriers Between Complex and Mendelian Diseases.

PubMed

Jordan, Daniel M; Do, Ron

2018-04-11

While sequence-based genetic tests have long been available for specific loci, especially for Mendelian disease, the rapidly falling costs of genome-wide genotyping arrays, whole-exome sequencing, and whole-genome sequencing are moving us toward a future where full genomic information might inform the prognosis and treatment of a variety of diseases, including complex disease. Similarly, the availability of large populations with full genomic information has enabled new insights about the etiology and genetic architecture of complex disease. Insights from the latest generation of genomic studies suggest that our categorization of diseases as complex may conceal a wide spectrum of genetic architectures and causal mechanisms that ranges from Mendelian forms of complex disease to complex regulatory structures underlying Mendelian disease. Here, we review these insights, along with advances in the prediction of disease risk and outcomes from full genomic information. Expected final online publication date for the Annual Review of Genomics and Human Genetics Volume 19 is August 31, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Development and application of a novel genome-wide SNP array reveals domestication history in soybean

PubMed Central

Wang, Jiao; Chu, Shanshan; Zhang, Huairen; Zhu, Ying; Cheng, Hao; Yu, Deyue

2016-01-01

Domestication of soybeans occurred under the intense human-directed selections aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied with a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean. PMID:26856884

Informatics Infrastructure for the Materials Genome Initiative

NASA Astrophysics Data System (ADS)

Dima, Alden; Bhaskarla, Sunil; Becker, Chandler; Brady, Mary; Campbell, Carelyn; Dessauw, Philippe; Hanisch, Robert; Kattner, Ursula; Kroenlein, Kenneth; Newrock, Marcus; Peskin, Adele; Plante, Raymond; Li, Sheng-Yen; Rigodiat, Pierre-François; Amaral, Guillaume Sousa; Trautt, Zachary; Schmitt, Xavier; Warren, James; Youssef, Sharief

2016-08-01

A materials data infrastructure that enables the sharing and transformation of a wide range of materials data is an essential part of achieving the goals of the Materials Genome Initiative. We describe two high-level requirements of such an infrastructure as well as an emerging open-source implementation consisting of the Materials Data Curation System and the National Institute of Standards and Technology Materials Resource Registry.
Genome Editing of Structural Variations: Modeling and Gene Correction.

PubMed

Park, Chul-Yong; Sung, Jin Jea; Kim, Dong-Wook

2016-07-01

The analysis of chromosomal structural variations (SVs), such as inversions and translocations, was made possible by the completion of the human genome project and the development of genome-wide sequencing technologies. SVs contribute to genetic diversity and evolution, although some SVs can cause diseases such as hemophilia A in humans. Genome engineering technology using programmable nucleases (e.g., ZFNs, TALENs, and CRISPR/Cas9) has been rapidly developed, enabling precise and efficient genome editing for SV research. Here, we review advances in modeling and gene correction of SVs, focusing on inversion, translocation, and nucleotide repeat expansion. Copyright © 2016 Elsevier Ltd. All rights reserved.
Large-scale gene function analysis with the PANTHER classification system.

PubMed

Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

2013-08-01

The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
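One kind of statistical test commonly applied to gene lists is an overrepresentation test: given a reference set of genes, some of which carry an annotation, how surprising is the number of annotated genes in an uploaded list? The sketch below computes a one-sided hypergeometric p-value for that question. It is a generic illustration and not necessarily the test PANTHER itself applies; the function name and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total

# Hypothetical example: 20,000 reference genes, 200 annotated with a term,
# and 8 of the 100 genes in the uploaded list carry that term.
p_value = hypergeom_enrichment_p(20_000, 200, 100, 8)
print(f"Enrichment p-value: {p_value:.3g}")
```

When many terms are tested at once, the resulting p-values need a multiple-testing correction before interpretation.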
Molecular inversion probe assay.

PubMed

Absalan, Farnaz; Ronaghi, Mostafa

2007-01-01

We have described molecular inversion probe technologies for large-scale genetic analyses. This technique provides a comprehensive and powerful tool for the analysis of genetic variation and enables affordable, large-scale studies that will help uncover the genetic basis of complex disease and explain the individual variation in response to therapeutics. Major applications of the molecular inversion probes (MIP) technologies include targeted genotyping from focused regions to whole-genome studies, and allele quantification of genomic rearrangements. The MIP technology (used in the HapMap project) provides an efficient, scalable, and affordable way to score polymorphisms in case/control populations for genetic studies. The MIP technology provides the highest commercially available multiplexing levels and assay conversion rates for targeted genotyping. This enables more informative, genome-wide studies with either the functional (direct detection) approach or the indirect detection approach.
Understanding development and stem cells using single cell-based analyses of gene expression.

PubMed

Kumar, Pavithra; Tan, Yuqi; Cahan, Patrick

2017-01-01

In recent years, genome-wide profiling approaches have begun to uncover the molecular programs that drive developmental processes. In particular, technical advances that enable genome-wide profiling of thousands of individual cells have provided the tantalizing prospect of cataloging cell type diversity and developmental dynamics in a quantitative and comprehensive manner. Here, we review how single-cell RNA sequencing has provided key insights into mammalian developmental and stem cell biology, emphasizing the analytical approaches that are specific to studying gene expression in single cells. © 2017. Published by The Company of Biologists Ltd.
The FLEXGene repository: exploiting the fruits of the genome projects by creating a needed resource to face the challenges of the post-genomic era.

PubMed

Brizuela, Leonardo; Richardson, Aaron; Marsischky, Gerald; Labaer, Joshua

2002-01-01

Thanks to the results of the multiple completed and ongoing genome sequencing projects and to the newly available recombination-based cloning techniques, it is now possible to build gene repositories with no precedent in their composition, formatting, and potential. This new type of gene repository is necessary to address the challenges imposed by the post-genomic era, i.e., experimentation on a genome-wide scale. We are building the FLEXGene (Full Length EXpression-ready) repository. This unique resource will contain clones representing the complete ORFeome of different organisms, including Homo sapiens as well as several pathogens and model organisms. It will consist of a comprehensive, characterized (sequence-verified), and arrayed gene repository. This resource will allow full exploitation of the genomic information by enabling genome-wide scale experimentation at the level of functional/phenotypic assays as well as at the level of protein expression, purification, and analysis. Here we describe the rationale and construction of this resource and focus on the data obtained from the Saccharomyces cerevisiae project.
Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

PubMed

West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

2014-07-01

The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
New bioinformatic tool for quick identification of functionally relevant endogenous retroviral inserts in human genome.

PubMed

Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton

2015-01-01

Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LR database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERVs/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.
Enabling functional genomics with genome engineering

PubMed Central

Hilton, Isaac B.; Gersbach, Charles A.

2015-01-01

Advances in genome engineering technologies have made the precise control over genome sequence and regulation possible across a variety of disciplines. These tools can expand our understanding of fundamental biological processes and create new opportunities for therapeutic designs. The rapid evolution of these methods has also catalyzed a new era of genomics that includes multiple approaches to functionally characterize and manipulate the regulation of genomic information. Here, we review the recent advances of the most widely adopted genome engineering platforms and their application to functional genomics. This includes engineered zinc finger proteins, TALEs/TALENs, and the CRISPR/Cas9 system as nucleases for genome editing, transcription factors for epigenome editing, and other emerging applications. We also present current and potential future applications of these tools, as well as their current limitations and areas for future advances. PMID:26430154
Genomics and integrated systems biology in Plasmodium falciparum: a path to malaria control and eradication.

PubMed

Le Roch, K G; Chung, D-W D; Ponts, N

2012-01-01

The first draft of the human malaria parasite's genome was released in 2002. Since then, the malaria scientific community has witnessed a steady embrace of new and powerful functional genomic studies. Over the years, these approaches have slowly revolutionized malaria research and enabled the comprehensive, unbiased investigation of various aspects of the parasite's biology. These genome-wide analyses delivered a refined annotation of the parasite's genome, delivered a better knowledge of its RNA, proteins and metabolite derivatives, and fostered the discovery of new vaccine and drug targets. Despite the positive impacts of these genomic studies, most research and investment still focus on protein targets, drugs and vaccine candidates that were known before the publication of the parasite genome sequence. However, recent access to next-generation sequencing technologies, along with an increased number of genome-wide applications, is expanding the impact of the parasite genome on biomedical research, contributing to a paradigm shift in research activities that may possibly lead to new optimized diagnosis and treatments. This review provides an update of Plasmodium falciparum genome sequences and an overview of the rapid development of genomics and system biology applications that have an immense potential of creating powerful tools for a successful malaria eradication campaign. © 2011 Blackwell Publishing Ltd.
CRISPR/Cas9-mediated gene targeting in Arabidopsis using sequential transformation.

PubMed

Miki, Daisuke; Zhang, Wenxin; Zeng, Wenjie; Feng, Zhengyan; Zhu, Jian-Kang

2018-05-17

Homologous recombination-based gene targeting is a powerful tool for precise genome modification and has been widely used in organisms ranging from yeast to higher organisms such as Drosophila and mouse. However, gene targeting in higher plants, including the most widely used model plant Arabidopsis thaliana, remains challenging. Here we report a sequential transformation method for gene targeting in Arabidopsis. We find that parental lines expressing the bacterial endonuclease Cas9 from the egg cell- and early embryo-specific DD45 gene promoter can improve the frequency of single-guide RNA-targeted gene knock-ins and sequence replacements via homologous recombination at several endogenous sites in the Arabidopsis genome. These heritable gene targeting events can be identified by regular PCR. Our approach enables routine and fine manipulation of the Arabidopsis genome.
Genome-Wide Mutagenesis of Dengue Virus Reveals Plasticity of the NS1 Protein and Enables Generation of Infectious Tagged Reporter Viruses

PubMed Central

Johnson, Stephen M.; Eltahla, Auda A.; Aloi, Maria; Aloia, Amanda L.; McDevitt, Christopher A.; Bull, Rowena A.

2017-01-01

ABSTRACT Dengue virus (DENV) is a major global pathogen that causes significant morbidity and mortality in tropical and subtropical areas worldwide. An improved understanding of the regions within the DENV genome and its encoded proteins that are required for the virus replication cycle will expedite the development of urgently required therapeutics and vaccines. We subjected an infectious DENV genome to unbiased insertional mutagenesis and used next-generation sequencing to identify sites that tolerate 15-nucleotide insertions during the virus replication cycle in hepatic cell culture. This revealed that the regions within capsid, NS1, and the 3′ untranslated region were the most tolerant of insertions. In contrast, prM- and NS2A-encoding regions were largely intolerant of insertions. Notably, the multifunctional NS1 protein readily tolerated insertions in regions within the Wing, connector, and β-ladder domains with minimal effects on viral RNA replication and infectious virus production. Using this information, we generated infectious reporter viruses, including a variant encoding the APEX2 electron microscopy tag in NS1 that uniquely enabled high-resolution imaging of its localization to the surface and interior of viral replication vesicles. In addition, we generated a tagged virus bearing an mScarlet fluorescent protein insertion in NS1 that, despite an impact on fitness, enabled live cell imaging of NS1 localization and traffic in infected cells. Overall, this genome-wide profile of DENV genome flexibility may be further dissected and exploited in reporter virus generation and antiviral strategies. IMPORTANCE Regions of genetic flexibility in viral genomes can be exploited in the generation of reporter virus tools and should arguably be avoided in antiviral drug and vaccine design. Here, we subjected the DENV genome to high-throughput insertional mutagenesis to identify regions of genetic flexibility and enable tagged reporter virus generation. 
In particular, the viral NS1 protein displayed remarkable tolerance of small insertions. This genetic flexibility enabled generation of several novel NS1-tagged reporter viruses, including an APEX2-tagged virus that we used in high-resolution imaging of NS1 localization in infected cells by electron microscopy. For the first time, this analysis revealed the localization of NS1 within viral replication factories known as “vesicle packets” (VPs), in addition to its acknowledged localization to the luminal surface of these VPs. Together, this genetic profile of DENV may be further refined and exploited in the identification of antiviral targets and the generation of reporter virus tools. PMID:28956770
Conservation genomics of threatened animal species.

PubMed

Steiner, Cynthia C; Putnam, Andrea S; Hoeck, Paquita E A; Ryder, Oliver A

2013-01-01

The genomics era has opened up exciting possibilities in the field of conservation biology by enabling genomic analyses of threatened species that previously were limited to model organisms. Next-generation sequencing (NGS) and the collection of genome-wide data allow for more robust studies of the demographic history of populations and adaptive variation associated with fitness and local adaptation. Genomic analyses can also advance management efforts for threatened wild and captive populations by identifying loci contributing to inbreeding depression and disease susceptibility, and predicting fitness consequences of introgression. However, the development of genomic tools in wild species still carries multiple challenges, particularly those associated with computational and sampling constraints. This review provides an overview of the most significant applications of NGS and the implications and limitations of genomic studies in conservation.
Fluorescence Reporter-Based Genome-Wide RNA Interference Screening to Identify Alternative Splicing Regulators.

PubMed

Misra, Ashish; Green, Michael R

2017-01-01

Alternative splicing is a regulated process that leads to inclusion or exclusion of particular exons in a pre-mRNA transcript, resulting in multiple protein isoforms being encoded by a single gene. With more than 90% of human genes known to undergo alternative splicing, it represents a major source for biological diversity inside cells. Although in vitro splicing assays have revealed insights into the mechanisms regulating individual alternative splicing events, our global understanding of alternative splicing regulation is still evolving. In recent years, genome-wide RNA interference (RNAi) screening has transformed biological research by enabling genome-scale loss-of-function screens in cultured cells and model organisms. In addition to resulting in the identification of new cellular pathways and potential drug targets, these screens have also uncovered many previously unknown mechanisms regulating alternative splicing. Here, we describe a method for the identification of alternative splicing regulators using genome-wide RNAi screening, as well as assays for further validation of the identified candidates. With modifications, this method can also be adapted to study the splicing regulation of pre-mRNAs that contain two or more splice isoforms.
genipe: an automated genome-wide imputation pipeline with automatic reporting and statistical tools.

PubMed

Lemieux Perreault, Louis-Philippe; Legault, Marc-André; Asselin, Géraldine; Dubé, Marie-Pierre

2016-12-01

Genotype imputation is now commonly performed following genome-wide genotyping experiments. Imputation increases the density of analyzed genotypes in the dataset, enabling fine-mapping across the genome. However, the process of imputation using the most recent publicly available reference datasets can require considerable computation power and the management of hundreds of large intermediate files. We have developed genipe, a complete genome-wide imputation pipeline which includes automatic reporting, imputed data indexing and management, and a suite of statistical tests for imputed data commonly used in genetic epidemiology (Sequence Kernel Association Test, Cox proportional hazards for survival analysis, and linear mixed models for repeated measurements in longitudinal studies). The genipe package is an open source Python software and is freely available for non-commercial use (CC BY-NC 4.0) at https://github.com/pgxcentre/genipe. Documentation and tutorials are available at http://pgxcentre.github.io/genipe. Contact: louis-philippe.lemieux.perreault@statgen.org or marie-pierre.dube@statgen.org. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Advances in Cryptococcus genomics: insights into the evolution of pathogenesis.

PubMed

Cuomo, Christina A; Rhodes, Johanna; Desjardins, Christopher A

2018-01-01

Cryptococcus species are the causative agents of cryptococcal meningitis, a significant source of mortality in immunocompromised individuals. Initial work on the molecular epidemiology of this fungal pathogen utilized genotyping approaches to describe the genetic diversity and biogeography of two species, Cryptococcus neoformans and Cryptococcus gattii. Whole genome sequencing of representatives of both species resulted in reference assemblies enabling a wide array of downstream studies and genomic resources. With the increasing availability of whole genome sequencing, both species have now had hundreds of individual isolates sequenced, providing fine-scale insight into the evolution and diversification of Cryptococcus and allowing for the first genome-wide association studies to identify genetic variants associated with human virulence. Sequencing has also begun to examine the microevolution of isolates during prolonged infection and to identify variants specific to outbreak lineages, highlighting the potential role of hyper-mutation in evolving within short time scales. We can anticipate that further advances in sequencing technology and sequencing microbial genomes at scale, including metagenomics approaches, will continue to refine our view of how the evolution of Cryptococcus drives its success as a pathogen.
Genome-Wide Association Mapping and Genomic Prediction Elucidate the Genetic Architecture of Morphological Traits in Arabidopsis.

PubMed

Kooke, Rik; Kruijer, Willem; Bours, Ralph; Becker, Frank; Kuhn, André; van de Geest, Henri; Buntjer, Jaap; Doeswijk, Timo; Guerra, José; Bouwmeester, Harro; Vreugdenhil, Dick; Keurentjes, Joost J B

2016-04-01

Quantitative traits in plants are controlled by a large number of genes and their interaction with the environment. To disentangle the genetic architecture of such traits, natural variation within species can be explored by studying genotype-phenotype relationships. Genome-wide association studies that link phenotypes to thousands of single nucleotide polymorphism markers are nowadays common practice for such analyses. In many cases, however, the identified individual loci cannot fully explain the heritability estimates, suggesting missing heritability. We analyzed 349 Arabidopsis accessions and found extensive variation and high heritabilities for different morphological traits. The number of significant genome-wide associations was, however, very low. The application of genomic prediction models that take into account the effects of all individual loci may greatly enhance the elucidation of the genetic architecture of quantitative traits in plants. Here, genomic prediction models revealed different genetic architectures for the morphological traits. Integrating genomic prediction and association mapping enabled the assignment of many plausible candidate genes explaining the observed variation. These genes were analyzed for functional and sequence diversity, and good indications that natural allelic variation in many of these genes contributes to phenotypic variation were obtained. For ACS11, an ethylene biosynthesis gene, haplotype differences explaining variation in the ratio of petiole and leaf length could be identified. © 2016 American Society of Plant Biologists. All Rights Reserved.
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

PubMed

Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

2018-05-31

In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The development of high-throughput sequencing and machine learning technologies makes it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
Genome-wide SNP identification, linkage map construction and QTL mapping for seed mineral concentrations and contents in pea (Pisum sativum L.).

PubMed

Ma, Yu; Coyne, Clarice J; Grusak, Michael A; Mazourek, Michael; Cheng, Peng; Main, Dorrie; McGee, Rebecca J

2017-02-13

Marker-assisted breeding is now routinely used in major crops to facilitate more efficient cultivar improvement. This has been significantly enabled by the use of next-generation sequencing technology to identify loci and markers associated with traits of interest. While rich in a range of nutritional components, such as protein, mineral nutrients, carbohydrates and several vitamins, pea (Pisum sativum L.), one of the oldest domesticated crops in the world, remains behind many other crops in the availability of genomic and genetic resources. Further improving mineral nutrient levels in pea seeds requires the development of genome-wide tools. The objectives of this research were to develop these tools by: identifying genome-wide single nucleotide polymorphisms (SNPs) using genotyping by sequencing (GBS); constructing a high-density linkage map and comparative maps with other legumes; and identifying quantitative trait loci (QTL) for levels of boron, calcium, iron, potassium, magnesium, manganese, molybdenum, phosphorus, sulfur, and zinc in the seed, as well as for seed weight. In this study, 1609 high quality SNPs were found to be polymorphic between 'Kiflica' and 'Aragorn', two parents of an F6-derived recombinant inbred line (RIL) population. Mapping 1683 markers including 75 previously published markers and 1608 SNPs developed from the present study generated a linkage map of size 1310.1 cM. Comparative mapping with other legumes demonstrated that the highest level of synteny was observed between pea and the genome of Medicago truncatula. QTL analysis of the RIL population across two locations revealed at least one QTL for each of the mineral nutrient traits. In total, 46 seed mineral concentration QTLs, 37 seed mineral content QTLs, and 6 seed weight QTLs were discovered. The QTLs explained from 2.4% to 43.3% of the phenotypic variance. 
The genome-wide SNPs and the genetic linkage map developed in this study permitted QTL identification for pea seed mineral nutrients that will serve as important resources to enable marker-assisted selection (MAS) for nutritional quality traits in pea breeding programs.

Genomic analyses of Northern snakehead (Channa argus) populations in North America

PubMed Central

Resh, Carlee A.; Galaska, Matthew P.

2018-01-01

Background The introduction of northern snakehead (Channa argus; Anabantiformes: Channidae) and their subsequent expansion is one of many problematic biological invasions in the United States. This harmful aquatic invasive species has become established in various parts of the eastern United States, including the Potomac River basin, and has recently become established in the Mississippi River basin in Arkansas. Effective management of C. argus and prevention of its further spread depends upon knowledge of current population structure in the United States. Methods Novel methods for invasive species using whole genomic scans provide unprecedented levels of data, which make it possible to investigate fine scale differences between and within populations of organisms. In this study, we utilize 2b-RAD genomic sequencing to recover 1,007 single-nucleotide polymorphism (SNP) loci from genomic DNA extracted from 165 C. argus individuals: 147 individuals sampled along the East Coast of the United States and 18 individuals sampled throughout Arkansas. Results Analysis of those SNP loci helps to resolve existing population structure and recover five genetically distinct populations of C. argus in the United States. Additionally, information from the SNP loci enables us to begin to calculate the long-term effective population size ranges of this harmful aquatic invasive species. We estimate long-term Ne to be 1,840,000–18,400,000 for the Upper Hudson River basin, 4,537,500–45,375,000 for the Lower Hudson River basin, 3,422,500–34,225,000 for the Potomac River basin, 2,715,000–7,150,000 for Philadelphia, and 2,580,000–25,800,000 for Arkansas populations. Discussion and Conclusions This work provides evidence for the presence of more genetic populations than previously estimated and estimates population size, showing the invasive potential of C. argus in the United States. 
The valuable information gained from this study will allow effective management of the existing populations to avoid expansion and possibly enable future eradication efforts. PMID:29637024
Genomic analyses of Northern snakehead (Channa argus) populations in North America.

PubMed

Resh, Carlee A; Galaska, Matthew P; Mahon, Andrew R

2018-01-01

The introduction of northern snakehead (Channa argus; Anabantiformes: Channidae) and their subsequent expansion is one of many problematic biological invasions in the United States. This harmful aquatic invasive species has become established in various parts of the eastern United States, including the Potomac River basin, and has recently become established in the Mississippi River basin in Arkansas. Effective management of C. argus and prevention of its further spread depends upon knowledge of current population structure in the United States. Novel methods for invasive species using whole genomic scans provide unprecedented levels of data, which make it possible to investigate fine scale differences between and within populations of organisms. In this study, we utilize 2b-RAD genomic sequencing to recover 1,007 single-nucleotide polymorphism (SNP) loci from genomic DNA extracted from 165 C. argus individuals: 147 individuals sampled along the East Coast of the United States and 18 individuals sampled throughout Arkansas. Analysis of those SNP loci helps to resolve existing population structure and recover five genetically distinct populations of C. argus in the United States. Additionally, information from the SNP loci enables us to begin to calculate the long-term effective population size ranges of this harmful aquatic invasive species. We estimate long-term Ne to be 1,840,000-18,400,000 for the Upper Hudson River basin, 4,537,500-45,375,000 for the Lower Hudson River basin, 3,422,500-34,225,000 for the Potomac River basin, 2,715,000-7,150,000 for Philadelphia, and 2,580,000-25,800,000 for Arkansas populations. This work provides evidence for the presence of more genetic populations than previously estimated and estimates population size, showing the invasive potential of C. argus in the United States. 
The valuable information gained from this study will allow effective management of the existing populations to avoid expansion and possibly enable future eradication efforts.
High-throughput microarray technology in diagnostics of enterobacteria based on genome-wide probe selection and regression analysis.

PubMed

Friedrich, Torben; Rahmann, Sven; Weigel, Wilfried; Rabsch, Wolfgang; Fruth, Angelika; Ron, Eliora; Gunzer, Florian; Dandekar, Thomas; Hacker, Jörg; Müller, Tobias; Dobrindt, Ulrich

2010-10-21

The Enterobacteriaceae comprise a large number of clinically relevant species with several individual subspecies. Overlapping virulence-associated gene pools and the high overall genome plasticity often interfere with correct enterobacterial strain typing and risk assessment. Array technology offers a fast, reproducible and standardisable means for bacterial typing and thus provides many advantages for bacterial diagnostics, risk assessment and surveillance. The development of highly discriminative broad-range microbial diagnostic microarrays remains a challenge, because of marked genome plasticity of many bacterial pathogens. We developed a DNA microarray for strain typing and detection of major antimicrobial resistance genes of clinically relevant enterobacteria. For this purpose, we applied a global genome-wide probe selection strategy on 32 available complete enterobacterial genomes combined with a regression model for pathogen classification. The discriminative power of the probe set was further tested in silico on 15 additional complete enterobacterial genome sequences. DNA microarrays based on the selected probes were used to type 92 clinical enterobacterial isolates. Phenotypic tests confirmed the array-based typing results and corroborated that the selected probes allowed correct typing and prediction of major antibiotic resistances of clinically relevant Enterobacteriaceae, including the subspecies level, e.g. the reliable distinction of different E. coli pathotypes. Our results demonstrate that the global probe selection approach based on longest common factor statistics as well as the design of a DNA microarray with a restricted set of discriminative probes enables robust discrimination of different enterobacterial variants and represents a proof of concept that can be adopted for diagnostics of a wide range of microbial pathogens. 
Our approach circumvents misclassifications arising from the application of virulence markers, which are highly affected by horizontal gene transfer. Moreover, a broad range of pathogens have been covered by an efficient probe set size enabling the design of high-throughput diagnostics.
Genomics and functional genomics in Chlamydomonas reinhardtii

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blaby, Ian K.; Blaby-Haas, Crysten E.

The availability of the Chlamydomonas reinhardtii nuclear genome sequence continues to enable researchers to address biological questions relevant to algae, land plants and animals in unprecedented ways. As we continue to characterize and understand biological processes in C. reinhardtii and translate that knowledge to other systems, we are faced with the realization that many genes encode proteins without a defined function. The field of functional genomics aims to close this gap between genome sequence and protein function. Transcriptomes, proteomes and phenomes can each provide layers of gene-specific functional data while supplying a global snapshot of cellular behavior under different conditions. Herein we present a brief history of functional genomics, the present status of the C. reinhardtii genome, how genome-wide experiments can aid in supplying protein function inferences, and provide an outlook for functional genomics in C. reinhardtii.
Enabling functional genomics with genome engineering.

PubMed

Hilton, Isaac B; Gersbach, Charles A

2015-10-01

Advances in genome engineering technologies have made the precise control over genome sequence and regulation possible across a variety of disciplines. These tools can expand our understanding of fundamental biological processes and create new opportunities for therapeutic designs. The rapid evolution of these methods has also catalyzed a new era of genomics that includes multiple approaches to functionally characterize and manipulate the regulation of genomic information. Here, we review the recent advances of the most widely adopted genome engineering platforms and their application to functional genomics. This includes engineered zinc finger proteins, TALEs/TALENs, and the CRISPR/Cas9 system as nucleases for genome editing, transcription factors for epigenome editing, and other emerging applications. We also present current and potential future applications of these tools, as well as their current limitations and areas for future advances. © 2015 Hilton and Gersbach; Published by Cold Spring Harbor Laboratory Press.
KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

PubMed

Wang, Dapeng; Xu, Jiayue; Yu, Jun

2015-09-16

The K-mer approach, treating genomic sequences as simple characters and counting the relative abundance of each string upon a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK (http://kgcak.big.ac.cn/KGCAK/), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data.
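The counting step described in this abstract can be illustrated with a minimal Python sketch. It is not the KGCAK implementation: the function names kmer_profile and profile_distance are hypothetical, and the Euclidean distance between relative-abundance profiles is an assumed, commonly used alignment-free measure chosen here only for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k):
    """Return the relative abundance of each K-mer in seq for a fixed k.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped. This filtering rule is an assumption of the sketch.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two K-mer relative-abundance profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    a = kmer_profile("ACGTACGTACGT", 3)
    b = kmer_profile("ACGTTTGTACGA", 3)
    print(profile_distance(a, b))
```

A matrix of such pairwise distances could then be passed to a standard tree-building method such as neighbor joining to obtain an alignment-free phylogeny, which is the general idea the abstract describes.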
Genomics and functional genomics in Chlamydomonas reinhardtii

DOE PAGES

Blaby, Ian K.; Blaby-Haas, Crysten E.

2017-03-21

The availability of the Chlamydomonas reinhardtii nuclear genome sequence continues to enable researchers to address biological questions relevant to algae, land plants and animals in unprecedented ways. As we continue to characterize and understand biological processes in C. reinhardtii and translate that knowledge to other systems, we are faced with the realization that many genes encode proteins without a defined function. The field of functional genomics aims to close this gap between genome sequence and protein function. Transcriptomes, proteomes and phenomes can each provide layers of gene-specific functional data while supplying a global snapshot of cellular behavior under different conditions. Herein we present a brief history of functional genomics, the present status of the C. reinhardtii genome, how genome-wide experiments can aid in supplying protein function inferences, and provide an outlook for functional genomics in C. reinhardtii.
Hematopoietic transcriptional mechanisms: from locus-specific to genome-wide vantage points.

PubMed

DeVilbiss, Andrew W; Sanalkumar, Rajendran; Johnson, Kirby D; Keles, Sunduz; Bresnick, Emery H

2014-08-01

Hematopoiesis is an exquisitely regulated process in which stem cells in the developing embryo and the adult generate progenitor cells that give rise to all blood lineages. Master regulatory transcription factors control hematopoiesis by integrating signals from the microenvironment and dynamically establishing and maintaining genetic networks. One of the most rudimentary aspects of cell type-specific transcription factor function, how they occupy a highly restricted cohort of cis-elements in chromatin, remains poorly understood. Transformative technologic advances involving the coupling of next-generation DNA sequencing technology with the chromatin immunoprecipitation assay (ChIP-seq) have enabled genome-wide mapping of factor occupancy patterns. However, formidable problems remain; notably, ChIP-seq analysis yields hundreds to thousands of chromatin sites occupied by a given transcription factor, and only a fraction of the sites appear to be endowed with critical, non-redundant function. It has become en vogue to map transcription factor occupancy patterns genome-wide, while using powerful statistical tools to establish correlations to inform biology and mechanisms. With the advent of revolutionary genome editing technologies, one can now reach beyond correlations to conduct definitive hypothesis testing. This review focuses on key discoveries that have emerged during the path from single loci to genome-wide analyses, specifically in the context of hematopoietic transcriptional mechanisms. Copyright © 2014 ISEH - International Society for Experimental Hematology. Published by Elsevier Inc. All rights reserved.
Horizontal transfer of potential mobile units in phytoplasmas

PubMed Central

Ku, Chuan; Lo, Wen-Sui; Kuo, Chih-Horng

2013-01-01

Phytoplasmas are uncultivated phytopathogenic bacteria that cause diseases in a wide range of economically important plants. Through secretion of effector proteins, they are able to manipulate their plant hosts to facilitate their multiplication and dispersal by insect vectors. The genome sequences of several phytoplasmas have been characterized to date and a group of putative composite transposons called potential mobile units (PMUs) are found in these highly reduced genomes. Recently, our team reported the genome sequence and comparative analysis of a peanut witches’ broom (PnWB) phytoplasma, the first representative of the phytoplasma 16SrII group. Comparisons between the species phylogeny and the phylogenies of the PMU genes revealed that the PnWB PMU is likely to have been transferred from the 16SrI group. This indicates that PMUs are not only the DNA unit for transposition within a genome, but also for horizontal transfer among divergent phytoplasma lineages. Given the association of PMUs with effector genes, the mobility of PMUs across genomes has important implications for phytoplasma ecology and evolution. PMID:24251068
Horizontal transfer of potential mobile units in phytoplasmas.

PubMed

Ku, Chuan; Lo, Wen-Sui; Kuo, Chih-Horng

2013-09-01

Phytoplasmas are uncultivated phytopathogenic bacteria that cause diseases in a wide range of economically important plants. Through secretion of effector proteins, they are able to manipulate their plant hosts to facilitate their multiplication and dispersal by insect vectors. The genome sequences of several phytoplasmas have been characterized to date and a group of putative composite transposons called potential mobile units (PMUs) are found in these highly reduced genomes. Recently, our team reported the genome sequence and comparative analysis of a peanut witches' broom (PnWB) phytoplasma, the first representative of the phytoplasma 16SrII group. Comparisons between the species phylogeny and the phylogenies of the PMU genes revealed that the PnWB PMU is likely to have been transferred from the 16SrI group. This indicates that PMUs are not only the DNA unit for transposition within a genome, but also for horizontal transfer among divergent phytoplasma lineages. Given the association of PMUs with effector genes, the mobility of PMUs across genomes has important implications for phytoplasma ecology and evolution.
Genome-Wide Associations for Multiple Pest Resistances in a Northwestern United States Elite Spring Wheat Panel

USDA-ARS's Scientific Manuscript database

Northern areas of the western United States are one of the most productive wheat growing regions in the United States. Increasing productivity through breeding is hindered by several biotic stresses which slow and constrain targeted yield improvement. In order to understand genetic variation for str...
Genome-Wide Meta-Analysis Identifies Regions on 7p21 (AHR) and 15q24 (CYP1A2) As Determinants of Habitual Caffeine Consumption

PubMed Central

Azzato, Elizabeth M.; Bennett, Siiri N.; Berndt, Sonja I.; Boerwinkle, Eric; Chanock, Stephen; Chatterjee, Nilanjan; Couper, David; Curhan, Gary; Heiss, Gerardo; Hu, Frank B.; Hunter, David J.; Jacobs, Kevin; Jensen, Majken K.; Kraft, Peter; Landi, Maria Teresa; Nettleton, Jennifer A.; Purdue, Mark P.; Rajaraman, Preetha; Rimm, Eric B.; Rose, Lynda M.; Rothman, Nathaniel; Silverman, Debra; Stolzenberg-Solomon, Rachael; Subar, Amy; Yeager, Meredith; Chasman, Daniel I.; van Dam, Rob M.; Caporaso, Neil E.

2011-01-01

We report the first genome-wide association study of habitual caffeine intake. We included 47,341 individuals of European descent based on five population-based studies within the United States. In a meta-analysis adjusted for age, sex, smoking, and eigenvectors of population variation, two loci achieved genome-wide significance: 7p21 (P = 2.4×10⁻¹⁹), near AHR, and 15q24 (P = 5.2×10⁻¹⁴), between CYP1A1 and CYP1A2. Both the AHR and CYP1A2 genes are biologically plausible candidates as CYP1A2 metabolizes caffeine and AHR regulates CYP1A2. PMID:21490707
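As a reading aid for the significance claim above, the following minimal sketch compares the two reported P-values against a cutoff. The cutoff of 5×10⁻⁸ is the conventional genome-wide significance threshold and is an assumption here, since the abstract does not state which threshold the authors applied; the function name is hypothetical.

```python
# Conventional genome-wide significance threshold (an assumption;
# the abstract does not state the cutoff the authors used).
GENOME_WIDE_ALPHA = 5e-8

# P-values reported in the abstract for the two loci.
reported = {
    "7p21 (near AHR)": 2.4e-19,
    "15q24 (between CYP1A1 and CYP1A2)": 5.2e-14,
}


def genome_wide_significant(p_values, alpha=GENOME_WIDE_ALPHA):
    """Return the sorted names of loci whose P-value falls below alpha."""
    return sorted(locus for locus, p in p_values.items() if p < alpha)


significant = genome_wide_significant(reported)
```

Under that assumed cutoff, both reported loci pass by several orders of magnitude.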
Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato

USDA-ARS's Scientific Manuscript database

Potato breeding cycles typically last 6-7 years because of the modest seed multiplication rate and large number of traits required of new varieties. Genomic selection has the potential to increase genetic gain per unit of time, through higher accuracy and/or a shorter cycle. Both possibilities were ...
Population Stratification in the Context of Diverse Epidemiologic Surveys Sans Genome-Wide Data

PubMed Central

Oetjens, Matthew T.; Brown-Gentry, Kristin; Goodloe, Robert; Dilks, Holli H.; Crawford, Dana C.

2016-01-01

Population stratification or confounding by genetic ancestry is a potential cause of false associations in genetic association studies. Estimation of and adjustment for genetic ancestry has become common practice thanks in part to the availability of ancestry informative markers on genome-wide association study (GWAS) arrays. While array data are now widespread, they are not ubiquitous, as several large epidemiologic and clinic-based studies lack genome-wide data. One such large epidemiologic-based study lacking genome-wide data accessible to investigators is the National Health and Nutrition Examination Surveys (NHANES), population-based cross-sectional surveys of Americans linked to demographic, health, and lifestyle data conducted by the Centers for Disease Control and Prevention. DNA samples (n = 14,998) were extracted from biospecimens from consented NHANES participants between 1991–1994 (NHANES III, phase 2) and 1999–2002 and represent three major self-identified racial/ethnic groups: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). We as the Epidemiologic Architecture for Genes Linked to Environment study genotyped candidate gene and GWAS-identified index variants in NHANES as part of the larger Population Architecture using Genomics and Epidemiology I study for collaborative genetic association studies. To enable basic quality control such as estimation of genetic ancestry to control for population stratification in NHANES sans genome-wide data, we outline here strategies that use limited genetic data to identify the markers optimal for characterizing genetic ancestry. From among 411 and 295 autosomal SNPs available in NHANES III and NHANES 1999–2002, we demonstrate that markers with ancestry information can be identified to estimate global ancestry. Despite limited resolution, global genetic ancestry is highly correlated with self-identified race for the majority of participants, although less so for ethnicity. Overall, the strategies outlined here for a large epidemiologic study can be applied to other datasets that are accessible for genotype–phenotype studies but are sans genome-wide data. PMID:27200085
Genome-wide re-sequencing of multidrug-resistant Mycobacterium leprae Airaku-3.

PubMed

Singh, P; Benjak, A; Carat, S; Kai, M; Busso, P; Avanzi, C; Paniz-Mondolfi, A; Peter, C; Harshman, K; Rougemont, J; Matsuoka, M; Cole, S T

2014-10-01

Genotyping and molecular characterization of drug resistance mechanisms in Mycobacterium leprae enable disease transmission and drug resistance trends to be monitored. In the present study, we performed genome-wide analysis of Airaku-3, a multidrug-resistant strain with an unknown mechanism of resistance to rifampicin. We identified 12 unique non-synonymous single-nucleotide polymorphisms (SNPs) including two in the transporter-encoding ctpC and ctpI genes. In addition, two SNPs were found that improve the resolution of SNP-based genotyping, particularly for Venezuelan and South East Asian strains of M. leprae. © 2014 The Authors Clinical Microbiology and Infection © 2014 European Society of Clinical Microbiology and Infectious Diseases.
The role of protozoa-driven selection in shaping human genetic variability.

PubMed

Pozzoli, Uberto; Fumagalli, Matteo; Cagliani, Rachele; Comi, Giacomo P; Bresolin, Nereo; Clerici, Mario; Sironi, Manuela

2010-03-01

Protozoa exert a strong selective pressure in humans. The selection signatures left by these pathogens can be exploited to identify genetic modulators of infection susceptibility. We show that protozoa diversity in different geographic locations is a good measure of protozoa-driven selective pressure; protozoa diversity captured selection signatures at known malaria resistance loci and identified several selected single nucleotide polymorphisms in immune and hemolytic anemia genes. A genome-wide search enabled us to identify 5180 variants mapping to 1145 genes that are subjected to protozoa-driven selective pressure. We provide a genome-wide estimate of protozoa-driven selective pressure and identify candidate susceptibility genes for protozoa-borne diseases. Copyright 2010 Elsevier Ltd. All rights reserved.
Decoding the Heart through Next Generation Sequencing Approaches.

PubMed

Pawlak, Michal; Niescierowicz, Katarzyna; Winata, Cecilia Lanny

2018-06-07

Vertebrate organs develop through a complex process which involves interaction between multiple signaling pathways at the molecular, cell, and tissue levels. Heart development is an example of such a complex process which, when disrupted, results in congenital heart disease (CHD). This complexity necessitates a holistic approach which allows the visualization of genome-wide interaction networks, as opposed to assessment of limited subsets of factors. Genomics offers a powerful solution to address the problem of biological complexity by enabling the observation of molecular processes at a genome-wide scale. The emergence of next generation sequencing (NGS) technology has facilitated the expansion of genomics, increasing its output capacity and applicability in various biological disciplines. The application of NGS in various aspects of heart biology has resulted in new discoveries, generating novel insights into this field of study. Here we review the contributions of NGS technology to the understanding of heart development and its disruption reflected in CHD and discuss how emerging NGS based methodologies can contribute to the further understanding of heart repair.
Efficient genome-wide detection and cataloging of EMS-induced mutations using exome capture and next-generation sequencing

USDA-ARS's Scientific Manuscript database

Chemical mutagenesis efficiently generates phenotypic variation in otherwise homogeneous genetic backgrounds, enabling functional analysis of genes. Advances in mutation detection have brought the utility of induced mutant populations on par with those produced by insertional mutagenesis, but system...
Wiki-based Data Management System for Toxicogenomics

EPA Science Inventory

We are developing a data management system to enable systems-based toxicology at the US EPA. This is built upon the WikiLIMS platform and is capable of housing not just genomics data but also a wide variety of toxicology data and associated experimental design information. Thi...
Three-dimensional reconstruction of single-cell chromosome structure using recurrence plots.

PubMed

Hirata, Yoshito; Oda, Arisa; Ohta, Kunihiro; Aihara, Kazuyuki

2016-10-11

Single-cell analysis of the three-dimensional (3D) chromosome structure can reveal cell-to-cell variability in genome activities. Here, we propose to apply recurrence plots, a mathematical method of nonlinear time series analysis, to reconstruct the 3D chromosome structure of a single cell based on information of chromosomal contacts from genome-wide chromosome conformation capture (Hi-C) data. This recurrence plot-based reconstruction (RPR) method enables rapid reconstruction of a unique structure in single cells, even from incomplete Hi-C information.
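For readers unfamiliar with recurrence plots, the sketch below shows the standard definition in the forward direction: a binary matrix marking which pairs of points lie within a distance threshold, which is analogous to a contact map. It does not implement the RPR method itself, which addresses the inverse problem of recovering a structure from such a matrix. The coordinates, the threshold `eps`, and the function name are illustrative assumptions.

```python
import math


def recurrence_plot(points, eps):
    """Binary recurrence matrix: R[i][j] is 1 when points i and j lie within eps."""
    n = len(points)
    return [
        [1 if math.dist(points[i], points[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 3D coordinates of four chromosomal segments (illustrative only).
segments = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Pairs separated by at most eps are marked as being in contact.
contacts = recurrence_plot(segments, eps=1.0)
```

The resulting matrix is symmetric with ones on the diagonal, as a contact map would be.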

Three-dimensional reconstruction of single-cell chromosome structure using recurrence plots

NASA Astrophysics Data System (ADS)

Hirata, Yoshito; Oda, Arisa; Ohta, Kunihiro; Aihara, Kazuyuki

2016-10-01

Single-cell analysis of the three-dimensional (3D) chromosome structure can reveal cell-to-cell variability in genome activities. Here, we propose to apply recurrence plots, a mathematical method of nonlinear time series analysis, to reconstruct the 3D chromosome structure of a single cell based on information of chromosomal contacts from genome-wide chromosome conformation capture (Hi-C) data. This recurrence plot-based reconstruction (RPR) method enables rapid reconstruction of a unique structure in single cells, even from incomplete Hi-C information.
OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species.

PubMed

Wang, Yi; Coleman-Derr, Devin; Chen, Guoping; Gu, Yong Q

2015-07-01

Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single copy genes and allows for a customized search of clusters of specific genes through key words or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Novel applications of array comparative genomic hybridization in molecular diagnostics.

PubMed

Cheung, Sau W; Bi, Weimin

2018-05-31

In 2004, the implementation of array comparative genomic hybridization (array CGH) into clinical practice marked a new milestone for genetic diagnosis. Array CGH and single-nucleotide polymorphism (SNP) arrays enable genome-wide detection of copy number changes at high resolution, and therefore microarray has been recognized as the first-tier test for patients with intellectual disability or multiple congenital anomalies, and has also been applied prenatally for detection of clinically relevant copy number variations in the fetus. Area covered: In this review, the authors summarize the evolution of array CGH technology from their diagnostic laboratory, highlighting exonic SNP arrays developed in the past decade which detect small intragenic copy number changes as well as large DNA segments for the region of heterozygosity. The applications of array CGH to human diseases with different modes of inheritance, with the emphasis on autosomal recessive disorders, are discussed. Expert commentary: An exonic array is a powerful and highly efficient clinical tool for detecting genome-wide small copy number variants in both dominant and recessive disorders. However, whole-genome sequencing may become the single integrated platform for detection of copy number changes, single-nucleotide changes as well as balanced chromosomal rearrangements in the near future.
Anonymization of electronic medical records for validating genome-wide association studies

PubMed Central

Loukides, Grigorios; Gkoulalas-Divanis, Aris; Malin, Bradley

2010-01-01

Genome-wide association studies (GWAS) facilitate the discovery of genotype–phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients’ standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data “as is” may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients’ identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them in a way that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks. PMID:20385806
Differential DNA Methylation Analysis without a Reference Genome.

PubMed

Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

2015-12-22

Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
XGR software for enhanced interpretation of genomic summary data, illustrated by application to immunological traits.

PubMed

Fang, Hai; Knezevic, Bogdan; Burnham, Katie L; Knight, Julian C

2016-12-13

Biological interpretation of genomic summary data such as those resulting from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is one of the major bottlenecks in medical genomics research, calling for efficient and integrative tools to resolve this problem. We introduce eXploring Genomic Relations (XGR), an open source tool designed for enhanced interpretation of genomic summary data enabling downstream knowledge discovery. Targeting users of varying computational skills, XGR utilises prior biological knowledge and relationships in a highly integrated but easily accessible way to make user-input genomic summary datasets more interpretable. We show how by incorporating ontology, annotation, and systems biology network-driven approaches, XGR generates more informative results than conventional analyses. We apply XGR to GWAS and eQTL summary data to explore the genomic landscape of the activated innate immune response and common immunological diseases. We provide genomic evidence for a disease taxonomy supporting the concept of a disease spectrum from autoimmune to autoinflammatory disorders. We also show how XGR can define SNP-modulated gene networks and pathways that are shared and distinct between diseases, how it achieves functional, phenotypic and epigenomic annotations of genes and variants, and how it enables exploring annotation-based relationships between genetic variants. XGR provides a single integrated solution to enhance interpretation of genomic summary data for downstream biological discovery. XGR is released as both an R package and a web-app, freely available at http://galahad.well.ox.ac.uk/XGR .
Helicos BioSciences.

PubMed

Milos, Patrice

2008-04-01

Helicos BioSciences Corporation is a life sciences company developing revolutionary new single molecule sequencing technology to provide the path to the US$1000 genome. True Single Molecule Sequencing (tSMS) will drive advancements in pharmacogenomics that can enable a better understanding of an individual's susceptibility to disease, develop more effective disease diagnoses and differentiate response to disease therapies. During 2007, genome-wide disease-association studies, the Encyclopedia of DNA Elements (ENCODE) and the published genome sequence of two individuals have revealed human genome variation far more extensive than originally believed. These also demonstrated that common variations explain only a fraction of the genetic basis of disease. Therefore, the capability to understand an individual genome is critical in setting the foundation for the next great revolution in healthcare. Helicos is committed to this vision and will provide cost-effective genome sequencing and comprehensive analysis of the transcribed genome that can unlock the era of personalized healthcare.
Genome Evolution of Plant-Parasitic Nematodes.

PubMed

Kikuchi, Taisei; Eves-van den Akker, Sebastian; Jones, John T

2017-08-04

Plant parasitism has evolved independently on at least four separate occasions in the phylum Nematoda. The application of next-generation sequencing (NGS) to plant-parasitic nematodes has allowed a wide range of genome- or transcriptome-level comparisons, and these have identified genome adaptations that enable parasitism of plants. Current genome data suggest that horizontal gene transfer, gene family expansions, evolution of new genes that mediate interactions with the host, and parasitism-specific gene regulation are important adaptations that allow nematodes to parasitize plants. Sequencing of a larger number of nematode genomes, including plant parasites that show different modes of parasitism or that have evolved in currently unsampled clades, and using free-living taxa as comparators would allow more detailed analysis and a better understanding of the organization of key genes within the genomes. This would facilitate a more complete understanding of the way in which parasitism has shaped the genomes of plant-parasitic nematodes.
ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications.

PubMed

Sablok, Gaurav; Chen, Ting-Wen; Lee, Chi-Ching; Yang, Chi; Gan, Ruei-Chi; Wegrzyn, Jill L; Porta, Nicola L; Nayak, Kinshuk C; Huang, Po-Jung; Varotto, Claudio; Tang, Petrus

2017-06-01

Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been the subject of investigation of topics ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. From the overexpression point of view, it is critical to systematically analyze the codon usage patterns of the organelle genomes. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first-ever curated, web-based reference catalog of the codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU allows the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns based on user-submitted transcripts or assembled organelle genes, and comparative analysis with the pre-compiled patterns across species of interest. ChloroMitoCU can increase our understanding of the biased patterns of codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
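To make "codon usage pattern" concrete, here is a minimal sketch that tabulates the relative frequency of each codon in a single coding sequence. It is not ChloroMitoCU's implementation; the function name and the example sequence are hypothetical, and real analyses typically also normalize within synonymous codon families.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in a coding sequence (CDS)."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    if total == 0:
        return {}
    return {codon: n / total for codon, n in Counter(codons).items()}


# Hypothetical five-codon CDS: ATG GCT GCT GCA TAA
example = codon_usage("ATGGCTGCTGCATAA")
```

In this toy sequence, GCT accounts for two of the five codons, which illustrates the kind of bias between synonymous alanine codons (GCT versus GCA) that such catalogs record.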
The Microarray Revolution: Perspectives from Educators

ERIC Educational Resources Information Center

Brewster, Jay L.; Beason, K. Beth; Eckdahl, Todd T.; Evans, Irene M.

2004-01-01

In recent years, microarray analysis has become a key experimental tool, enabling the analysis of genome-wide patterns of gene expression. This review approaches the microarray revolution with a focus upon four topics: 1) the early development of this technology and its application to cancer diagnostics; 2) a primer of microarray research,…
The expanding footprint of CRISPR/Cas9 in the plant sciences

USDA-ARS's Scientific Manuscript database

CRISPR/Cas9 has evolved and transformed the field of biology at an unprecedented pace. From the initial purpose of introducing a site specific mutation within a genome of choice, this technology has morphed into enabling a wide array of molecular applications, including site-specific transgene inser...
Identification of pathogen avirulence genes in the fusiform rust pathosystem

Treesearch

John M. Davis; Katherine E. Smith; Amanda Pendleton; Jason A. Smith; C. Dana Nelson

2012-01-01

The Cronartium quercuum f.sp. fusiforme (Cqf) whole genome sequencing project will enable identification of avirulence genes in the most devastating pine fungal pathogen in the southeastern United States. Amerson and colleagues (unpublished) have mapped nine fusiform rust resistance genes in loblolly pine,...
Cloud computing for comparative genomics

PubMed Central

2010-01-01

Background: Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results: We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions: The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786
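The figures reported above allow a rough back-of-the-envelope cost breakdown, sketched here. It assumes that all 100 nodes were billed for the full run, takes "just under 70 hours" as 70 and "more than 300,000" as 300,000, so the derived values are approximations rather than numbers reported by the authors.

```python
# Figures taken from the abstract; rounding choices are assumptions.
nodes = 100            # high capacity compute nodes
wall_hours = 70        # "just under 70 hours", used as an upper bound
total_cost_usd = 6302  # total reported cost
processes = 300_000    # "more than 300,000", used as a lower bound

# Assumes every node was billed for the entire run.
node_hours = nodes * wall_hours
cost_per_node_hour = total_cost_usd / node_hours
cost_per_process = total_cost_usd / processes

print(f"{node_hours} node-hours")
print(f"about ${cost_per_node_hour:.2f} per node-hour")
print(f"about ${cost_per_process:.3f} per RSD process")
```

Under these assumptions the run comes to roughly 90 cents per node-hour and about two cents per process; because the process count is a lower bound, the true per-process cost would be slightly lower.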
Cloud computing for comparative genomics.

PubMed

Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

2010-05-18

Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
Focus on the good, the bad and the unknown: genomics-enabled discovery of plant-associated microbial processes and diversity.

PubMed

2015-03-01

MPMI has played a leading role in disseminating new insights into plant-microbe interactions and promoting new approaches. Articles in this Focus Issue highlight the power of genomic studies in uncovering novel determinants of plant interactions with microbial symbionts (good), pathogens (bad), and complex microbial communities (unknown). Many articles also illustrate how genomics can support translational research by quickly advancing our knowledge of important microbes that have not been widely studied.
Toward genome-enabled mycology.

PubMed

Hibbett, David S; Stajich, Jason E; Spatafora, Joseph W

2013-01-01

Genome-enabled mycology is a rapidly expanding field that is characterized by the pervasive use of genome-scale data and associated computational tools in all aspects of fungal biology. Genome-enabled mycology is integrative and often requires teams of researchers with diverse skills in organismal mycology, bioinformatics and molecular biology. This issue of Mycologia presents the first complete fungal genomes in the history of the journal, reflecting the ongoing transformation of mycology into a genome-enabled science. Here, we consider the prospects for genome-enabled mycology and the technical and social challenges that will need to be overcome to grow the database of complete fungal genomes and enable all fungal biologists to make use of the new data.
The Draft Genome Sequence of Actinokineospora bangkokensis 44EHWT Reveals the Biosynthetic Pathway of the Antifungal Thailandin Compounds with Unusual Butylmalonyl-CoA Extender Units.

PubMed

Greule, Anja; Intra, Bungonsiri; Flemming, Stephan; Rommel, Marcel G E; Panbangred, Watanalai; Bechthold, Andreas

2016-11-23

We report the draft genome sequence of Actinokineospora bangkokensis 44EHWT, the producer of the antifungal polyene compounds, thailandins A and B. The sequence contains 7.45 Mb, 74.1% GC content and 35 putative gene clusters for the biosynthesis of secondary metabolites. There are three gene clusters encoding large polyketide synthases of type I. Annotation of the ORF functions and targeted gene disruption enabled us to identify the cluster for thailandin biosynthesis. We propose a plausible biosynthetic pathway for thailandin, where the unusual butylmalonyl-CoA extender unit is incorporated and results in an untypical side chain.
Genome-Wide Tuning of Protein Expression Levels to Rapidly Engineer Microbial Traits.

PubMed

Freed, Emily F; Winkler, James D; Weiss, Sophie J; Garst, Andrew D; Mutalik, Vivek K; Arkin, Adam P; Knight, Rob; Gill, Ryan T

2015-11-20

The reliable engineering of biological systems requires quantitative mapping of predictable and context-independent expression over a broad range of protein expression levels. However, current techniques for modifying expression levels are cumbersome and are not amenable to high-throughput approaches. Here we present major improvements to current techniques through the design and construction of E. coli genome-wide libraries using synthetic DNA cassettes that can tune expression over a ∼10^4 range. The cassettes also contain molecular barcodes that are optimized for next-generation sequencing, enabling rapid and quantitative tracking of alleles that have the highest fitness advantage. We show these libraries can be used to determine which genes and expression levels confer greater fitness to E. coli under different growth conditions.
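As a rough illustration of how barcode sequencing counts can be turned into a relative fitness signal, the sketch below computes a log2 change in barcode frequency between two time points. This is a generic enrichment score, not the authors' analysis; the function name, the dictionary layout, and the pseudocount are all assumptions made for the example.

```python
import math


def barcode_log2_enrichment(counts_before, counts_after, pseudocount=1.0):
    """Return log2(frequency after / frequency before) for each barcode.

    counts_before / counts_after: hypothetical dicts mapping barcode -> read count.
    The pseudocount (an assumption) avoids division by zero for unseen barcodes.
    """
    barcodes = sorted(set(counts_before) | set(counts_after))
    total_before = sum(counts_before.get(b, 0) + pseudocount for b in barcodes)
    total_after = sum(counts_after.get(b, 0) + pseudocount for b in barcodes)
    scores = {}
    for b in barcodes:
        freq_before = (counts_before.get(b, 0) + pseudocount) / total_before
        freq_after = (counts_after.get(b, 0) + pseudocount) / total_after
        scores[b] = math.log2(freq_after / freq_before)
    return scores
```

A positive score means the barcode gained share of the population during growth; a negative score means it lost share. Real analyses would also need replicates and a noise model, which this sketch omits.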
TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data.

PubMed

Jorjani, Hadi; Zavolan, Mihaela

2014-04-01

Accurate identification of transcription start sites (TSSs) is an essential step in the analysis of transcription regulatory networks. In higher eukaryotes, the capped analysis of gene expression technology enabled comprehensive annotation of TSSs in genomes such as those of mice and humans. In bacteria, an equivalent approach, termed differential RNA sequencing (dRNA-seq), has recently been proposed, but the application of this approach to a large number of genomes is hindered by the paucity of computational analysis methods. With few exceptions, when the method has been used, annotation of TSSs has been largely done manually. In this work, we present a computational method called 'TSSer' that enables the automatic inference of TSSs from dRNA-seq data. The method rests on a probabilistic framework for identifying both genomic positions that are preferentially enriched in the dRNA-seq data as well as preferentially captured relative to neighboring genomic regions. Evaluating our approach for TSS calling on several publicly available datasets, we find that TSSer achieves high consistency with the curated lists of annotated TSSs, but identifies many additional TSSs. Therefore, TSSer can accelerate genome-wide identification of TSSs in bacterial genomes and can aid in further characterization of bacterial transcription regulatory networks. TSSer is freely available under GPL license at http://www.clipz.unibas.ch/TSSer/index.php
Next generation tools for genomic data generation, distribution, and visualization

PubMed Central

2010-01-01

Background: With the rapidly falling cost and growing availability of high-throughput sequencing and microarray technologies, the bottleneck for effectively using genomic analysis in the laboratory and clinic is shifting to one of effectively managing, analyzing, and sharing genomic data. Results: Here we present three open-source, platform-independent software tools for generating, analyzing, distributing, and visualizing genomic data. These include a next-generation sequencing/microarray LIMS and analysis project center (GNomEx); an application for annotating and programmatically distributing genomic data using the community-vetted DAS/2 data exchange protocol (GenoPub); and a standalone Java Swing application (GWrap) that makes cutting-edge command-line analysis tools available to those who prefer graphical user interfaces. Both GNomEx and GenoPub use the rich client Flex/Flash web browser interface to interact with Java classes and a relational database on a remote server. Both employ a public-private user-group security model enabling controlled distribution of patient and unpublished data alongside public resources. As such, they function as genomic data repositories that can be accessed manually or programmatically through DAS/2-enabled client applications such as the Integrated Genome Browser. Conclusions: These tools have gained wide use in our core facilities, research laboratories and clinics and are freely available for non-profit use. See http://sourceforge.net/projects/gnomex/, http://sourceforge.net/projects/genoviz/, and http://sourceforge.net/projects/useq. PMID:20828407

Kernel-based whole-genome prediction of complex traits: a review.

PubMed

Morota, Gota; Gianola, Daniel

2014-01-01

Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
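To make the idea of kernel-based genome-enabled prediction concrete, here is a minimal sketch of kernel ridge regression with a Gaussian kernel on marker genotypes. It is a generic textbook formulation, not a method taken from the review; the 0/1/2 marker coding, the `bandwidth` and `ridge` parameters, and all function names are assumptions for illustration, and phenotypes are assumed to be centered beforehand.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel between two marker vectors (assumed coded 0/1/2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-bandwidth * sq_dist)


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_kernel_ridge(markers, phenotypes, bandwidth, ridge):
    """Return dual coefficients alpha solving (K + ridge * I) alpha = y."""
    n = len(markers)
    k = [
        [
            gaussian_kernel(markers[i], markers[j], bandwidth)
            + (ridge if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return solve_linear(k, phenotypes)


def predict(markers, alpha, new_marker, bandwidth):
    """Predict a genetic value as a kernel-weighted sum over training lines."""
    return sum(
        a * gaussian_kernel(x, new_marker, bandwidth)
        for a, x in zip(alpha, markers)
    )
```

Because the kernel depends on the full marker vector rather than on each marker separately, the fitted function can reflect interactions among loci without modeling them explicitly. A small `ridge` nearly interpolates the training phenotypes, while a large one shrinks predictions toward zero.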
Comparative Genomics and Host Resistance against Infectious Diseases

PubMed Central

Qureshi, Salman T.; Skamene, Emil

1999-01-01

The large size and complexity of the human genome have limited the identification and functional characterization of components of the innate immune system that play a critical role in front-line defense against invading microorganisms. However, advances in genome analysis (including the development of comprehensive sets of informative genetic markers, improved physical mapping methods, and novel techniques for transcript identification) have reduced the obstacles to discovery of novel host resistance genes. Study of the genomic organization and content of widely divergent vertebrate species has shown a remarkable degree of evolutionary conservation and enables meaningful cross-species comparison and analysis of newly discovered genes. Application of comparative genomics to host resistance will rapidly expand our understanding of human immune defense by facilitating the translation of knowledge acquired through the study of model organisms. We review the rationale and resources for comparative genomic analysis and describe three examples of host resistance genes successfully identified by this approach. PMID:10081670
Next-generation genome-scale models for metabolic engineering.

PubMed

King, Zachary A; Lloyd, Colton J; Feist, Adam M; Palsson, Bernhard O

2015-12-01

Constraint-based reconstruction and analysis (COBRA) methods have become widely used tools for metabolic engineering in both academic and industrial laboratories. By employing a genome-scale in silico representation of the metabolic network of a host organism, COBRA methods can be used to predict optimal genetic modifications that improve the rate and yield of chemical production. A new generation of COBRA models and methods is now being developed, encompassing many biological processes and simulation strategies, and next-generation models enable new types of predictions. Here, three key examples of applying COBRA methods to strain optimization are presented and discussed. Then, an outlook is provided on the next generation of COBRA models and the new types of predictions they will enable for systems metabolic engineering. Copyright © 2014 Elsevier Ltd. All rights reserved.
OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes.

PubMed

Chetal, Kashish; Janga, Sarath Chandra

2015-01-01

Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons: codirectionally organized genes that share a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use for visualizing, downloading, and querying data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.
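The notion of an operon as a run of adjacent, codirectional genes can be sketched with a simple distance heuristic. This is not how operomeDB predicts operons (the database relies on RNA-seq evidence); it is only a naive baseline, and the `Gene` record, the `max_gap` threshold of 50 bp, and the function name are assumptions for illustration.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def group_codirectional(genes, max_gap=50):
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    groups = []
    for gene in ordered:
        if groups:
            prev = groups[-1][-1]
            same_strand = gene.strand == prev.strand
            if same_strand and gene.start - prev.end <= max_gap:
                groups[-1].append(gene)
                continue
        # Start a new candidate transcription unit.
        groups.append([gene])
    return groups
```

A condition-specific resource would refine such candidates with expression evidence, since the same gene neighborhood may be transcribed as different units under different conditions.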
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Xiaofan; Peris, David; Kominek, Jacek

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE PAGES

Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

2016-09-16

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
Transcription facilitated genome-wide recruitment of topoisomerase I and DNA gyrase.

PubMed

Ahmed, Wareed; Sala, Claudia; Hegde, Shubhada R; Jha, Rajiv Kumar; Cole, Stewart T; Nagaraja, Valakunja

2017-05-01

Movement of the transcription machinery along a template alters DNA topology resulting in the accumulation of supercoils in DNA. The positive supercoils generated ahead of transcribing RNA polymerase (RNAP) and the negative supercoils accumulating behind impose severe topological constraints impeding transcription process. Previous studies have implied the role of topoisomerases in the removal of torsional stress and the maintenance of template topology but the in vivo interaction of functionally distinct topoisomerases with heterogeneous chromosomal territories is not deciphered. Moreover, how the transcription-induced supercoils influence the genome-wide recruitment of DNA topoisomerases remains to be explored in bacteria. Using ChIP-Seq, we show the genome-wide occupancy profile of both topoisomerase I and DNA gyrase in conjunction with RNAP in Mycobacterium tuberculosis taking advantage of minimal topoisomerase representation in the organism. The study unveils the first in vivo genome-wide interaction of both the topoisomerases with the genomic regions and establishes that transcription-induced supercoils govern their recruitment at genomic sites. Distribution profiles revealed co-localization of RNAP and the two topoisomerases on the active transcriptional units (TUs). At a given locus, topoisomerase I and DNA gyrase were localized behind and ahead of RNAP, respectively, correlating with the twin-supercoiled domains generated. The recruitment of topoisomerases was higher at the genomic loci with higher transcriptional activity and/or at regions under high torsional stress compared to silent genomic loci. Importantly, the occupancy of DNA gyrase, sole type II topoisomerase in Mtb, near the Ter domain of the Mtb chromosome validates its function as a decatenase.
Transcription facilitated genome-wide recruitment of topoisomerase I and DNA gyrase

PubMed Central

Ahmed, Wareed; Sala, Claudia; Hegde, Shubhada R.; Jha, Rajiv Kumar

2017-01-01

Movement of the transcription machinery along a template alters DNA topology resulting in the accumulation of supercoils in DNA. The positive supercoils generated ahead of transcribing RNA polymerase (RNAP) and the negative supercoils accumulating behind impose severe topological constraints impeding transcription process. Previous studies have implied the role of topoisomerases in the removal of torsional stress and the maintenance of template topology but the in vivo interaction of functionally distinct topoisomerases with heterogeneous chromosomal territories is not deciphered. Moreover, how the transcription-induced supercoils influence the genome-wide recruitment of DNA topoisomerases remains to be explored in bacteria. Using ChIP-Seq, we show the genome-wide occupancy profile of both topoisomerase I and DNA gyrase in conjunction with RNAP in Mycobacterium tuberculosis taking advantage of minimal topoisomerase representation in the organism. The study unveils the first in vivo genome-wide interaction of both the topoisomerases with the genomic regions and establishes that transcription-induced supercoils govern their recruitment at genomic sites. Distribution profiles revealed co-localization of RNAP and the two topoisomerases on the active transcriptional units (TUs). At a given locus, topoisomerase I and DNA gyrase were localized behind and ahead of RNAP, respectively, correlating with the twin-supercoiled domains generated. The recruitment of topoisomerases was higher at the genomic loci with higher transcriptional activity and/or at regions under high torsional stress compared to silent genomic loci. Importantly, the occupancy of DNA gyrase, sole type II topoisomerase in Mtb, near the Ter domain of the Mtb chromosome validates its function as a decatenase. PMID:28463980
A Year of Infection in the Intensive Care Unit: Prospective Whole Genome Sequencing of Bacterial Clinical Isolates Reveals Cryptic Transmissions and Novel Microbiota

PubMed Central

Roach, David J.; Burton, Joshua N.; Lee, Choli; Stackhouse, Bethany; Butler-Wu, Susan M.; Cookson, Brad T.

2015-01-01

Bacterial whole genome sequencing holds promise as a disruptive technology in clinical microbiology, but it has not yet been applied systematically or comprehensively within a clinical context. Here, over the course of one year, we performed prospective collection and whole genome sequencing of nearly all bacterial isolates obtained from a tertiary care hospital’s intensive care units (ICUs). This unbiased collection of 1,229 bacterial genomes from 391 patients enables detailed exploration of several features of clinical pathogens. A sizable fraction of isolates identified as clinically relevant corresponded to previously undescribed species: 12% of isolates assigned a species-level classification by conventional methods actually qualified as distinct, novel genomospecies on the basis of genomic similarity. Pan-genome analysis of the most frequently encountered pathogens in the collection revealed substantial variation in pan-genome size (1,420 to 20,432 genes) and the rate of gene discovery (1 to 152 genes per isolate sequenced). Surprisingly, although potential nosocomial transmission of actively surveilled pathogens was rare, 8.7% of isolates belonged to genomically related clonal lineages that were present among multiple patients, usually with overlapping hospital admissions, and were associated with clinically significant infection in 62% of patients from which they were recovered. Multi-patient clonal lineages were particularly evident in the neonatal care unit, where seven separate Staphylococcus epidermidis clonal lineages were identified, including one lineage associated with bacteremia in 5/9 neonates. Our study highlights key differences in the information made available by conventional microbiological practices versus whole genome sequencing, and motivates the further integration of microbial genome sequencing into routine clinical care. PMID:26230489
Studying the genetic basis of speciation in high gene flow marine invertebrates

PubMed Central

2016-01-01

A growing number of genes responsible for reproductive incompatibilities between species (barrier loci) exhibit the signals of positive selection. However, the possibility that genes experiencing positive selection diverge early in speciation and commonly cause reproductive incompatibilities has not been systematically investigated on a genome-wide scale. Here, I outline a research program for studying the genetic basis of speciation in broadcast spawning marine invertebrates that uses a priori genome-wide information on a large, unbiased sample of genes tested for positive selection. A targeted sequence capture approach is proposed that scores single-nucleotide polymorphisms (SNPs) in widely separated species populations at an early stage of allopatric divergence. The targeted capture of both coding and non-coding sequences enables SNPs to be characterized at known locations across the genome and at genes with known selective or neutral histories. The neutral coding and non-coding SNPs provide robust background distributions for identifying FST-outliers within genes that can, in principle, identify specific mutations experiencing diversifying selection. If natural hybridization occurs between species, the neutral coding and non-coding SNPs can provide a neutral admixture model for genomic clines analyses aimed at finding genes exhibiting strong blocks to introgression. Strongylocentrotid sea urchins are used as a model system to outline the approach but it can be used for any group that has a complete reference genome available. PMID:29491951
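The FST-outlier idea described above can be sketched for the simplest case of one biallelic SNP scored in two populations. The code below uses the classic heterozygosity-based definition FST = (HT - HS) / HT with equal population weights and flags candidate SNPs that exceed an empirical percentile of the neutral background. The equal weighting, the percentile cutoff, and all names are assumptions for illustration, not the estimator proposed in the paper.

```python
def fst(p1, p2):
    """Wright's FST for one biallelic SNP from two allele frequencies."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def fst_outliers(candidate_fst, neutral_fst, percentile=95):
    """Return candidate SNP ids whose FST exceeds a neutral-background cutoff.

    candidate_fst: hypothetical dict mapping SNP id -> FST.
    neutral_fst: list of FST values from SNPs assumed to be neutral.
    """
    ranked = sorted(neutral_fst)
    index = min(len(ranked) * percentile // 100, len(ranked) - 1)
    cutoff = ranked[index]
    return {snp for snp, value in candidate_fst.items() if value > cutoff}
```

For example, allele frequencies of 0.2 and 0.8 give a pooled heterozygosity of 0.5 and a mean within-population heterozygosity of 0.32, so FST = 0.18 / 0.5 = 0.36. Sample-size corrections used by standard estimators are deliberately left out of this sketch.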
Genome-wide engineering of an infectious clone of herpes simplex virus type 1 using synthetic genomics assembly methods.

PubMed

Oldfield, Lauren M; Grzesik, Peter; Voorhies, Alexander A; Alperovich, Nina; MacMath, Derek; Najera, Claudia D; Chandra, Diya Sabrina; Prasad, Sanjana; Noskov, Vladimir N; Montague, Michael G; Friedman, Robert M; Desai, Prashant J; Vashee, Sanjay

2017-10-17

Here, we present a transformational approach to genome engineering of herpes simplex virus type 1 (HSV-1), which has a large DNA genome, using synthetic genomics tools. We believe this method will enable more rapid and complex modifications of HSV-1 and other large DNA viruses than previous technologies, facilitating many useful applications. Yeast transformation-associated recombination was used to clone 11 fragments comprising the HSV-1 strain KOS 152 kb genome. Using overlapping sequences between the adjacent pieces, we assembled the fragments into a complete virus genome in yeast, transferred it into an Escherichia coli host, and reconstituted infectious virus following transfection into mammalian cells. The virus derived from this yeast-assembled genome, KOSYA, replicated with kinetics similar to wild-type virus. We demonstrated the utility of this modular assembly technology by making numerous modifications to a single gene, making changes to two genes at the same time and, finally, generating individual and combinatorial deletions to a set of five conserved genes that encode virion structural proteins. While the ability to perform genome-wide editing through assembly methods in large DNA virus genomes raises dual-use concerns, we believe the incremental risks are outweighed by potential benefits. These include enhanced functional studies, generation of oncolytic virus vectors, development of delivery platforms of genes for vaccines or therapy, as well as more rapid development of countermeasures against potential biothreats.
Genome-wide engineering of an infectious clone of herpes simplex virus type 1 using synthetic genomics assembly methods

PubMed Central

Grzesik, Peter; Voorhies, Alexander A.; Alperovich, Nina; MacMath, Derek; Najera, Claudia D.; Chandra, Diya Sabrina; Prasad, Sanjana; Noskov, Vladimir N.; Montague, Michael G.; Friedman, Robert M.; Desai, Prashant J.

2017-01-01

Here, we present a transformational approach to genome engineering of herpes simplex virus type 1 (HSV-1), which has a large DNA genome, using synthetic genomics tools. We believe this method will enable more rapid and complex modifications of HSV-1 and other large DNA viruses than previous technologies, facilitating many useful applications. Yeast transformation-associated recombination was used to clone 11 fragments comprising the HSV-1 strain KOS 152 kb genome. Using overlapping sequences between the adjacent pieces, we assembled the fragments into a complete virus genome in yeast, transferred it into an Escherichia coli host, and reconstituted infectious virus following transfection into mammalian cells. The virus derived from this yeast-assembled genome, KOSYA, replicated with kinetics similar to wild-type virus. We demonstrated the utility of this modular assembly technology by making numerous modifications to a single gene, making changes to two genes at the same time and, finally, generating individual and combinatorial deletions to a set of five conserved genes that encode virion structural proteins. While the ability to perform genome-wide editing through assembly methods in large DNA virus genomes raises dual-use concerns, we believe the incremental risks are outweighed by potential benefits. These include enhanced functional studies, generation of oncolytic virus vectors, development of delivery platforms of genes for vaccines or therapy, as well as more rapid development of countermeasures against potential biothreats. PMID:28928148
Apollo2Go: a web service adapter for the Apollo genome viewer to enable distributed genome annotation.

PubMed

Klee, Kathrin; Ernst, Rebecca; Spannagl, Manuel; Mayer, Klaus F X

2007-08-30

Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity. To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to render the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice.
Apollo2Go: a web service adapter for the Apollo genome viewer to enable distributed genome annotation

PubMed Central

Klee, Kathrin; Ernst, Rebecca; Spannagl, Manuel; Mayer, Klaus FX

2007-01-01

Background: Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity. Results: To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. Conclusion: This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to render the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice. PMID:17760972
A genome-wide association study of production traits in a commercial population of Large White pigs: evidence of haplotypes affecting meat quality

PubMed Central

2014-01-01

Background: Numerous quantitative trait loci (QTL) have been detected in pigs over the past 20 years using microsatellite markers. However, due to the low density of these markers, the accuracy of QTL location has generally been poor. Since 2009, the dense genome coverage provided by the Illumina PorcineSNP60 BeadChip has made it possible to more accurately map QTL using genome-wide association studies (GWAS). Our objective was to perform high-density GWAS in order to identify genomic regions and corresponding haplotypes associated with production traits in a French Large White population of pigs. Methods: Animals (385 Large White pigs from 106 sires) were genotyped using the PorcineSNP60 BeadChip and evaluated for 19 traits related to feed intake, growth, carcass composition and meat quality. Of the 64 432 SNPs on the chip, 44 412 were used for GWAS with an animal mixed model that included a regression coefficient for the tested SNPs and a genomic kinship matrix. SNP haplotype effects in QTL regions were then tested for association with phenotypes following phase reconstruction based on the Sscrofa10.2 pig genome assembly. Results: Twenty-three QTL regions were identified on autosomes and their effects ranged from 0.25 to 0.75 phenotypic standard deviation units for feed intake and feed efficiency (four QTL), carcass (12 QTL) and meat quality traits (seven QTL). The 10 most significant QTL regions had effects on carcass (chromosomes 7, 10, 16, 17 and 18) and meat quality traits (two regions on chromosome 1 and one region on chromosomes 8, 9 and 13). Thirteen of the 23 QTL regions had not been previously described. A haplotype block of 183 kb on chromosome 1 (six SNPs) was identified and displayed three distinct haplotypes with significant (0.0001 < P < 0.03) associations with all evaluated meat quality traits. Conclusions: GWAS analyses with the PorcineSNP60 BeadChip enabled the detection of 23 QTL regions that affect feed consumption, carcass and meat quality traits in a Large White population, of which 13 were novel QTL. The proportionally larger number of QTL found for meat quality traits suggests a specific opportunity for improving these traits in the pig by genomic selection. PMID:24528607
Expanding the biological periodic table.

PubMed

Seravalli, Javier; Ragsdale, Stephen W

2010-08-27

Metal ions play an indispensable role in biology, enabling enzymes to perform their functions and lending support to the structures of numerous macromolecules. Despite their prevalence and importance, the metalloproteome is still relatively unexplored. Cvetkovic et al. (2010) now describe an approach to identify metalloproteins on a genome-wide scale. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
NEBNext Direct: A Novel, Rapid, Hybridization-Based Approach for the Capture and Library Conversion of Genomic Regions of Interest.

PubMed

Emerman, Amy B; Bowman, Sarah K; Barry, Andrew; Henig, Noa; Patel, Kruti M; Gardner, Andrew F; Hendrickson, Cynthia L

2017-07-05

Next-generation sequencing (NGS) is a powerful tool for genomic studies, translational research, and clinical diagnostics that enables the detection of single nucleotide polymorphisms, insertions and deletions, copy number variations, and other genetic variations. Target enrichment technologies improve the efficiency of NGS by only sequencing regions of interest, which reduces sequencing costs while increasing coverage of the selected targets. Here we present NEBNext Direct®, a hybridization-based, target-enrichment approach that addresses many of the shortcomings of traditional target-enrichment methods. This approach features a simple, 7-hr workflow that uses enzymatic removal of off-target sequences to achieve a high specificity for regions of interest. Additionally, unique molecular identifiers are incorporated for the identification and filtering of PCR duplicates. The same protocol can be used across a wide range of input amounts, input types, and panel sizes, enabling NEBNext Direct to be broadly applicable across a wide variety of research and diagnostic needs. Copyright © 2017 John Wiley & Sons, Inc.
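The role of unique molecular identifiers (UMIs) in filtering PCR duplicates can be illustrated with a small sketch: reads that share the same mapping position, strand, and UMI are treated as copies of one original molecule, and only the first is kept. This is a generic exact-match scheme, not the NEBNext Direct analysis pipeline; the read dictionary layout and field names are assumptions for illustration, and sequencing errors within UMIs are ignored.

```python
def filter_pcr_duplicates(reads):
    """Keep the first read seen for each (chrom, pos, strand, umi) combination.

    reads: hypothetical list of dicts with keys "chrom", "pos", "strand", "umi".
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique
```

Without the UMI in the key, two distinct molecules that happen to map to the same position would be collapsed into one; including it lets them be counted separately, which matters for deep targeted sequencing.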
Computational analysis of conserved RNA secondary structure in transcriptomes and genomes.

PubMed

Eddy, Sean R

2014-01-01

Transcriptomics experiments and computational predictions both enable systematic discovery of new functional RNAs. However, many putative noncoding transcripts arise instead from artifacts and biological noise, and current computational prediction methods have high false positive rates. I discuss prospects for improving computational methods for analyzing and identifying functional RNAs, with a focus on detecting signatures of conserved RNA secondary structure. An interesting new front is the application of chemical and enzymatic experiments that probe RNA structure on a transcriptome-wide scale. I review several proposed approaches for incorporating structure probing data into the computational prediction of RNA secondary structure. Using probabilistic inference formalisms, I show how all these approaches can be unified in a well-principled framework, which in turn allows RNA probing data to be easily integrated into a wide range of analyses that depend on RNA secondary structure inference. Such analyses include homology search and genome-wide detection of new structural RNAs.
Toward the 1,000 dollars human genome.

PubMed

Bennett, Simon T; Barnes, Colin; Cox, Anthony; Davies, Lisa; Brown, Clive

2005-06-01

Revolutionary new technologies, capable of transforming the economics of sequencing, are providing an unparalleled opportunity to analyze human genetic variation comprehensively at the whole-genome level within a realistic timeframe and at affordable costs. Current estimates suggest that it would cost somewhere in the region of 30 million US dollars to sequence an entire human genome using Sanger-based sequencing, and on one machine it would take about 60 years. Solexa is widely regarded as a company with the necessary disruptive technology to be the first to achieve the ultimate goal of the so-called 1,000 dollars human genome - the conceptual cost-point needed for routine analysis of individual genomes. Solexa's technology is based on completely novel sequencing chemistry capable of sequencing billions of individual DNA molecules simultaneously, a base at a time, to enable highly accurate, low cost analysis of an entire human genome in a single experiment. When applied over a large enough genomic region, these new approaches to resequencing will enable the simultaneous detection and typing of known, as well as unknown, polymorphisms, and will also offer information about patterns of linkage disequilibrium in the population being studied. Technological progress, leading to the advent of single-molecule-based approaches, is beginning to dramatically drive down costs and increase throughput to unprecedented levels, each being several orders of magnitude better than that which is currently available. A new sequencing paradigm based on single molecules will be faster, cheaper and more sensitive, and will permit routine analysis at the whole-genome level.
Capturing Three-Dimensional Genome Organization in Individual Cells by Single-Cell Hi-C.

PubMed

Nagano, Takashi; Wingett, Steven W; Fraser, Peter

2017-01-01

Hi-C is a powerful method to investigate genome-wide, higher-order chromatin and chromosome conformations averaged from a population of cells. To expand the potential of Hi-C for single-cell analysis, we developed single-cell Hi-C. Similar to the existing "ensemble" Hi-C method, single-cell Hi-C detects proximity-dependent ligation events between cross-linked and restriction-digested chromatin fragments in cells. A major difference between the single-cell Hi-C and ensemble Hi-C protocol is that the proximity-dependent ligation is carried out in the nucleus. This allows the isolation of individual cells in which nearly the entire Hi-C procedure has been carried out, enabling the production of a Hi-C library and data from individual cells. With this new method, we studied genome conformations and found evidence for conserved topological domain organization from cell to cell, but highly variable interdomain contacts and chromosome folding genome wide. In addition, we found that the single-cell Hi-C protocol provided cleaner results with less technical noise suggesting it could be used to improve the ensemble Hi-C technique.

Evo-Devo-EpiR: a genome-wide search platform for epistatic control on the evolution of development.

PubMed

Jiang, Libo; Zhang, Miaomiao; Sang, Mengmeng; Ye, Meixia; Wu, Rongling

2017-09-01

Evo-devo is a theory proposed to study how phenotypes evolve by comparing the developmental processes of different organisms or the same organism experiencing changing environments. It has been recognized that nonallelic interactions at different genes or quantitative trait loci, known as epistasis, may play a pivotal role in the evolution of development, but it has proven difficult to quantify and elucidate this role into a coherent picture. We implement a high-dimensional genome-wide association study model into the evo-devo paradigm and pack it into the R-based Evo-Devo-EpiR, aimed at facilitating the genome-wide landscaping of epistasis for the diversification of phenotypic development. By analyzing a high-throughput assay of DNA markers and their pairs simultaneously, Evo-Devo-EpiR is equipped with a capacity to systematically characterize various epistatic interactions that impact on the pattern and timing of development and its evolution. Enabling a global search for all possible genetic interactions for developmental processes throughout the whole genome, Evo-Devo-EpiR provides a computational tool to illustrate a precise genotype-phenotype map at interface between epistasis, development and evolution. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie

2014-06-18

Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.
Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie

2014-05-12

Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.
A Genome-Wide Association Study of Chronic Obstructive Pulmonary Disease in Hispanics

PubMed Central

Chen, Wei; Brehm, John M.; Manichaikul, Ani; Cho, Michael H.; Boutaoui, Nadia; Yan, Qi; Burkart, Kristin M.; Enright, Paul L.; Rotter, Jerome I.; Petersen, Hans; Leng, Shuguang; Obeidat, Ma’en; Bossé, Yohan; Brandsma, Corry-Anke; Hao, Ke; Rich, Stephen S.; Powell, Rhea; Avila, Lydiana; Soto-Quiros, Manuel; Silverman, Edwin K.; Tesfaigzi, Yohannes; Barr, R. Graham

2015-01-01

Rationale: Genome-wide association studies (GWAS) of chronic obstructive pulmonary disease (COPD) have identified disease-susceptibility loci, mostly in subjects of European descent. Objectives: We hypothesized that by studying Hispanic populations we would be able to identify unique loci that contribute to COPD pathogenesis in Hispanics but remain undetected in GWAS of non-Hispanic populations. Methods: We conducted a metaanalysis of two GWAS of COPD in independent cohorts of Hispanics in Costa Rica and the United States (Multi-Ethnic Study of Atherosclerosis [MESA]). We performed a replication study of the top single-nucleotide polymorphisms in an independent Hispanic cohort in New Mexico (the Lovelace Smokers Cohort). We also attempted to replicate prior findings from genome-wide studies in non-Hispanic populations in Hispanic cohorts. Measurements and Main Results: We found no genome-wide significant association with COPD in our metaanalysis of Costa Rica and MESA. After combining the top results from this metaanalysis with those from our replication study in the Lovelace Smokers Cohort, we identified two single-nucleotide polymorphisms approaching genome-wide significance for an association with COPD. The first (rs858249, combined P value = 6.1 × 10^-8) is near the genes KLHL7 and NUPL2 on chromosome 7. The second (rs286499, combined P value = 8.4 × 10^-8) is located in an intron of DLG2. The two most significant single-nucleotide polymorphisms in FAM13A from a previous genome-wide study in non-Hispanics were associated with COPD in Hispanics. Conclusions: We have identified two novel loci (in or near the genes KLHL7/NUPL2 and DLG2) that may play a role in COPD pathogenesis in Hispanic populations. PMID:25584925
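To make "approaching genome-wide significance" concrete, the short Python sketch below compares the two combined P values reported above with the conventional GWAS threshold of 5 × 10^-8. That threshold is an assumption here: the abstract does not state which cutoff the authors applied.

```python
# Conventional genome-wide significance threshold; assumed, since the
# abstract does not state the cutoff the authors used.
GENOME_WIDE_ALPHA = 5e-8

# Combined P values as reported in the abstract.
combined_p = {"rs858249": 6.1e-8, "rs286499": 8.4e-8}


def classify(p, alpha=GENOME_WIDE_ALPHA):
    """Label a P value relative to the assumed significance threshold."""
    return "genome-wide significant" if p < alpha else "not genome-wide significant"


results = {snp: classify(p) for snp, p in combined_p.items()}

# How far each P value sits above the threshold (1.0 would be exactly at it).
fold_above = {snp: p / GENOME_WIDE_ALPHA for snp, p in combined_p.items()}

for snp in combined_p:
    print(f"{snp}: P = {combined_p[snp]:.1e}, {results[snp]}, "
          f"{fold_above[snp]:.2f}x the threshold")
```

Under this assumed cutoff both variants fall just short of significance, which is consistent with the abstract's wording.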
A genome-wide association study of chronic obstructive pulmonary disease in Hispanics.

PubMed

Chen, Wei; Brehm, John M; Manichaikul, Ani; Cho, Michael H; Boutaoui, Nadia; Yan, Qi; Burkart, Kristin M; Enright, Paul L; Rotter, Jerome I; Petersen, Hans; Leng, Shuguang; Obeidat, Ma'en; Bossé, Yohan; Brandsma, Corry-Anke; Hao, Ke; Rich, Stephen S; Powell, Rhea; Avila, Lydiana; Soto-Quiros, Manuel; Silverman, Edwin K; Tesfaigzi, Yohannes; Barr, R Graham; Celedón, Juan C

2015-03-01

Genome-wide association studies (GWAS) of chronic obstructive pulmonary disease (COPD) have identified disease-susceptibility loci, mostly in subjects of European descent. We hypothesized that by studying Hispanic populations we would be able to identify unique loci that contribute to COPD pathogenesis in Hispanics but remain undetected in GWAS of non-Hispanic populations. We conducted a metaanalysis of two GWAS of COPD in independent cohorts of Hispanics in Costa Rica and the United States (Multi-Ethnic Study of Atherosclerosis [MESA]). We performed a replication study of the top single-nucleotide polymorphisms in an independent Hispanic cohort in New Mexico (the Lovelace Smokers Cohort). We also attempted to replicate prior findings from genome-wide studies in non-Hispanic populations in Hispanic cohorts. We found no genome-wide significant association with COPD in our metaanalysis of Costa Rica and MESA. After combining the top results from this metaanalysis with those from our replication study in the Lovelace Smokers Cohort, we identified two single-nucleotide polymorphisms approaching genome-wide significance for an association with COPD. The first (rs858249, combined P value = 6.1 × 10(-8)) is near the genes KLHL7 and NUPL2 on chromosome 7. The second (rs286499, combined P value = 8.4 × 10(-8)) is located in an intron of DLG2. The two most significant single-nucleotide polymorphisms in FAM13A from a previous genome-wide study in non-Hispanics were associated with COPD in Hispanics. We have identified two novel loci (in or near the genes KLHL7/NUPL2 and DLG2) that may play a role in COPD pathogenesis in Hispanic populations.
GWAS and admixture mapping identify different asthma-associated loci in Latinos: The GALA II Study

PubMed Central

Galanter, Joshua M; Gignoux, Christopher R; Torgerson, Dara G; Roth, Lindsey A; Eng, Celeste; Oh, Sam S; Nguyen, Elizabeth A; Drake, Katherine A; Huntsman, Scott; Hu, Donglei; Sen, Saunak; Davis, Adam; Farber, Harold J.; Avila, Pedro C.; Brigino-Buenaventura, Emerita; LeNoir, Michael A.; Meade, Kelley; Serebrisky, Denise; Borrell, Luisa N; Rodríguez-Cintrón, William; Estrada, Andres Moreno; Mendoza, Karla Sandoval; Winkler, Cheryl A.; Klitz, William; Romieu, Isabelle; London, Stephanie J.; Gilliland, Frank; Martinez, Fernando; Bustamante, Carlos; Williams, L Keoki; Kumar, Rajesh; Rodríguez-Santana, José R.; Burchard, Esteban G.

2013-01-01

Background: Asthma is a complex disease with both genetic and environmental causes. Genome-wide association studies of asthma have mostly involved European populations and replication of positive associations has been inconsistent. Objective: To identify asthma-associated genes in a large Latino population with genome-wide association analysis and admixture mapping. Methods: Latino children with asthma (n = 1,893) and healthy controls (n = 1,881) were recruited from five sites in the United States: Puerto Rico, New York, Chicago, Houston, and the San Francisco Bay Area. Subjects were genotyped on an Affymetrix World Array IV chip. We performed genome-wide association and admixture mapping to identify asthma-associated loci. Results: We identified a significant association between ancestry and asthma at 6p21 (lowest p-value: rs2523924, p < 5 × 10^-6). This association replicates in a meta-analysis of the EVE Asthma Consortium (p = 0.01). Fine mapping of the region in this study and the EVE Asthma Consortium suggests an association between PSORS1C1 and asthma. We confirmed the strong allelic association between the 17q21 locus and asthma in Latinos (IKZF3, lowest p-value: rs90792, OR: 0.67, 95% CI 0.61 – 0.75, p = 6 × 10^-13) and replicated associations in several genes that had previously been associated with asthma in genome-wide association studies. Conclusions: Admixture mapping and genome-wide association are complementary techniques that provide evidence for multiple asthma-associated loci in Latinos. Admixture mapping identifies a novel locus on 6p21 that replicates in a meta-analysis of several Latino populations, while genome-wide association confirms the previously identified locus on 17q21. PMID:24406073
Environmental Adaptation Contributes to Gene Polymorphism across the Arabidopsis thaliana Genome

PubMed Central

Lee, Cheng-Ruei

2012-01-01

The level of within-species polymorphism differs greatly among genes in a genome. Many genomic studies have investigated the relationship between gene polymorphism and factors such as recombination rate or expression pattern. However, the polymorphism of a gene is affected not only by its physical properties or functional constraints but also by natural selection on organisms in their environments. Specifically, if functionally divergent alleles enable adaptation to different environments, locus-specific polymorphism may be maintained by spatially heterogeneous natural selection. To test this hypothesis and estimate the extent to which environmental selection shapes the pattern of genome-wide polymorphism, we define the "environmental relevance" of a gene as the proportion of genetic variation explained by environmental factors, after controlling for population structure. We found substantial effects of environmental relevance on patterns of polymorphism among genes. In addition, the correlation between environmental relevance and gene polymorphism is positive, consistent with the expectation that balancing selection among heterogeneous environments maintains genetic variation at ecologically important genes. Comparison of the gene ontology annotations shows that genes with high environmental relevance are enriched in unknown function categories. These results suggest an important role for environmental factors in shaping genome-wide patterns of polymorphism and indicate another direction of genomic study. PMID:22798389
Engineering customized TALE nucleases (TALENs) and TALE transcription factors by fast ligation-based automatable solid-phase high-throughput (FLASH) assembly.

PubMed

Reyon, Deepak; Maeder, Morgan L; Khayter, Cyd; Tsai, Shengdar Q; Foley, Jonathan E; Sander, Jeffry D; Joung, J Keith

2013-07-01

Customized DNA-binding domains made using transcription activator-like effector (TALE) repeats are rapidly growing in importance as widely applicable research tools. TALE nucleases (TALENs), composed of an engineered array of TALE repeats fused to the FokI nuclease domain, have been used successfully for directed genome editing in various organisms and cell types. TALE transcription factors (TALE-TFs), consisting of engineered TALE repeat arrays linked to a transcriptional regulatory domain, have been used to up- or downregulate expression of endogenous genes in human cells and plants. This unit describes a detailed protocol for the recently described fast ligation-based automatable solid-phase high-throughput (FLASH) assembly method. FLASH enables automated high-throughput construction of engineered TALE repeats using an automated liquid handling robot or manually using a multichannel pipet. Using the automated approach, a single researcher can construct up to 96 DNA fragments encoding TALE repeat arrays of various lengths in a single day, and then clone these to construct sequence-verified TALEN or TALE-TF expression plasmids in a week or less. Plasmids required for FLASH are available by request from the Joung lab (http://eGenome.org). This unit also describes improvements to the Zinc Finger and TALE Targeter (ZiFiT Targeter) web server (http://ZiFiT.partners.org) that facilitate the design and construction of FLASH TALE repeat arrays in high throughput. © 2013 by John Wiley & Sons, Inc.
GenomeD3Plot: a library for rich, interactive visualizations of genomic data in web applications.

PubMed

Laird, Matthew R; Langille, Morgan G I; Brinkman, Fiona S L

2015-10-15

A simple static image of genomes and associated metadata is very limiting, as researchers expect rich, interactive tools similar to the web applications found in the post-Web 2.0 world. GenomeD3Plot is a lightweight visualization library written in JavaScript using the D3 library. GenomeD3Plot provides a rich API to allow the rapid visualization of complex genomic data using a convenient standards-based JSON configuration file. When integrated into existing web services, GenomeD3Plot allows researchers to interact with data, dynamically alter the view, or even resize or reposition the visualization in their browser window. In addition, GenomeD3Plot has built-in functionality to export any resulting genome visualization in PNG or SVG format for easy inclusion in manuscripts or presentations. GenomeD3Plot is being utilized in the recently released Islandviewer 3 (www.pathogenomics.sfu.ca/islandviewer/) to visualize predicted genomic islands with other genome annotation data. However, its features enable it to be more widely applicable for dynamic visualization of genomic data in general. GenomeD3Plot is licensed under the GNU-GPL v3 at https://github.com/brinkmanlab/GenomeD3Plot/. Contact: brinkman@sfu.ca. © The Author 2015. Published by Oxford University Press.
Holocentromeres in Rhynchospora are associated with genome-wide centromere-specific repeat arrays interspersed among euchromatin.

PubMed

Marques, André; Ribeiro, Tiago; Neumann, Pavel; Macas, Jiří; Novák, Petr; Schubert, Veit; Pellino, Marco; Fuchs, Jörg; Ma, Wei; Kuhlmann, Markus; Brandt, Ronny; Vanzela, André L L; Beseda, Tomáš; Šimková, Hana; Pedrosa-Harand, Andrea; Houben, Andreas

2015-11-03

Holocentric chromosomes lack a primary constriction, in contrast to monocentrics. They form kinetochores distributed along almost the entire poleward surface of the chromatids, to which spindle fibers attach. No centromere-specific DNA sequence has been found for any holocentric organism studied so far. It was proposed that centromeric repeats, typical for many monocentric species, could not occur in holocentrics, most likely because of differences in the centromere organization. Here we show that the holokinetic centromeres of the Cyperaceae Rhynchospora pubera are highly enriched by a centromeric histone H3 variant-interacting centromere-specific satellite family designated "Tyba" and by centromeric retrotransposons (i.e., CRRh) occurring as genome-wide interspersed arrays. Centromeric arrays vary in length from 3 to 16 kb and are intermingled with gene-coding sequences and transposable elements. We show that holocentromeres of metaphase chromosomes are composed of multiple centromeric units rather than possessing a diffuse organization, thus favoring the polycentric model. A cell-cycle-dependent shuffling of multiple centromeric units results in the formation of functional (poly)centromeres during mitosis. The genome-wide distribution of centromeric repeat arrays interspersing the euchromatin provides a previously unidentified type of centromeric chromatin organization among eukaryotes. Thus, different types of holocentromeres exist in different species, namely with and without centromeric repetitive sequences.
GBOOST: a GPU-based tool for detecting gene-gene interactions in genome-wide case control studies.

PubMed

Yung, Ling Sing; Yang, Can; Wan, Xiang; Yu, Weichuan

2011-05-01

Collecting millions of genetic variations is feasible with the advanced genotyping technology. With a huge amount of genetic variations data in hand, developing efficient algorithms to carry out the gene-gene interaction analysis in a timely manner has become one of the key problems in genome-wide association studies (GWAS). Boolean operation-based screening and testing (BOOST), a recent work in GWAS, completes gene-gene interaction analysis in 2.5 days on a desktop computer. Compared with central processing units (CPUs), graphic processing units (GPUs) are highly parallel hardware and provide massive computing resources. We are, therefore, motivated to use GPUs to further speed up the analysis of gene-gene interactions. We implement the BOOST method based on a GPU framework and name it GBOOST. GBOOST achieves a 40-fold speedup compared with BOOST. It completes the analysis of Wellcome Trust Case Control Consortium Type 2 Diabetes (WTCCC T2D) genome data within 1.34 h on a desktop computer equipped with Nvidia GeForce GTX 285 display card. GBOOST code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST.
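The abstract does not spell out how Boolean operations speed up interaction screening, so the following Python sketch only illustrates the general idea under stated assumptions: each genotype value of a SNP is stored as a bitmask over samples, and each cell of a pairwise genotype contingency table is obtained with a bitwise AND followed by a bit count. The function names and the toy data are hypothetical and are not taken from the BOOST or GBOOST code.

```python
def encode(genotypes):
    """Return three integer bitmasks, one per genotype value 0, 1 and 2.

    Bit i of masks[g] is set when sample i carries genotype g.
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def contingency(snp_a, snp_b, is_case):
    """Build the 3 x 3 genotype table for a SNP pair, split by case status.

    Returns {(ga, gb): (n_cases, n_controls)}, using only AND and popcount.
    """
    a, b = encode(snp_a), encode(snp_b)
    case_mask = 0
    for i, flag in enumerate(is_case):
        if flag:
            case_mask |= 1 << i
    all_mask = (1 << len(is_case)) - 1
    control_mask = all_mask & ~case_mask
    table = {}
    for ga in range(3):
        for gb in range(3):
            both = a[ga] & b[gb]  # samples with genotype ga at A and gb at B
            table[(ga, gb)] = (popcount(both & case_mask),
                               popcount(both & control_mask))
    return table


# Toy data for six samples (hypothetical values).
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 1, 1, 2, 2]
is_case = [1, 1, 0, 1, 0, 0]
table = contingency(snp_a, snp_b, is_case)
```

Because every cell reduces to word-level AND and bit-count operations that are independent across SNP pairs, this kind of counting is a natural fit for highly parallel hardware such as GPUs.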
2010 Award for Outstanding Doctoral Thesis Research in Biological Physics Talk: How the Genome Folds

NASA Astrophysics Data System (ADS)

Lieberman-Aiden, Erez

2011-03-01

I describe Hi-C, a novel technology for probing the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. Working with collaborators at the Broad Institute and UMass Medical School, we used Hi-C to construct spatial proximity maps of the human genome at a resolution of 1Mb. These maps confirm the presence of chromosome territories and the spatial proximity of small, gene-rich chromosomes. We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model. Our results demonstrate the power of Hi-C to map the dynamic conformations of whole genomes.
An integrated CRISPR Bombyx mori genome editing system with improved efficiency and expanded target sites.

PubMed

Ma, Sanyuan; Liu, Yue; Liu, Yuanyuan; Chang, Jiasong; Zhang, Tong; Wang, Xiaogang; Shi, Run; Lu, Wei; Xia, Xiaojuan; Zhao, Ping; Xia, Qingyou

2017-04-01

Genome editing has opened unprecedented opportunities for targeted genomic engineering of a wide variety of organisms, ranging from microbes and plants to animals and even human embryos. The successive establishment and rapid application of genome editing tools have significantly accelerated Bombyx mori (B. mori) research in recent years. However, the only CRISPR system available in B. mori was the commonly used SpCas9, which recognizes only target sites containing an NGG PAM sequence. In the present study, we first improved the efficiency of our previously established SpCas9 system 3.5-fold. The improved efficiency was also observed at several loci in both BmNs cells and B. mori embryos. Then, to expand the range of target sites, we showed that two newly discovered CRISPR systems, SaCas9 and AsCpf1, could also induce highly efficient site-specific genome editing in BmNs cells, and we constructed an integrated CRISPR system. Genome-wide analysis of targetable sites showed that the integrated system covers 69,144,399 sites in the B. mori genome, with one site found every 6.5 bp on average. The efficiency and resolution of this CRISPR platform will probably accelerate both fundamental research and applied studies in B. mori, and perhaps in other insects. Copyright © 2017 Elsevier Ltd. All rights reserved.
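A minimal Python sketch of what a scan for SpCas9 target sites could look like is given below. It looks for a protospacer followed by an NGG PAM on the forward strand and for the mirrored CCN pattern that marks a reverse-strand site. The 20-nt protospacer length, the function names, and the toy sequence are assumptions; the abstract does not describe how the authors' genome-wide analysis was implemented, and SaCas9 and AsCpf1 PAMs are not covered here.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq, protospacer_len=20):
    """List candidate SpCas9 sites as (strand, start, protospacer).

    `start` is the 0-based forward-strand index where the protospacer
    begins. The 20-nt default length is an assumption of this sketch.
    """
    seq = seq.upper()
    sites = []
    # Forward strand: protospacer immediately followed by N-G-G.
    # The lookahead lets overlapping candidates all be reported.
    fwd = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % protospacer_len)
    for m in fwd.finditer(seq):
        sites.append(("+", m.start(), m.group(1)))
    # Reverse strand: C-C-N on the forward strand, then the protospacer.
    rev = re.compile(r"(?=CC[ACGT]([ACGT]{%d}))" % protospacer_len)
    for m in rev.finditer(seq):
        sites.append(("-", m.start() + 3, reverse_complement(m.group(1))))
    return sites


# Toy sequence (hypothetical): one reverse-strand and one forward-strand site.
toy = "CCA" + "ACGTACGTACGTACGTACGT" + "TGG"
sites = find_ngg_sites(toy)

# Genome span implied by the abstract's own figures:
# 69,144,399 sites at one site per 6.5 bp.
implied_span_bp = 69_144_399 * 6.5
```

Multiplying the reported site count by the reported spacing gives roughly 4.5 × 10^8 bp, which is the genome span implied by the abstract's two numbers taken together.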
VirtualPlant: A Software Platform to Support Systems Biology Research

PubMed Central

Katari, Manpreet S.; Nowicki, Steve D.; Aceituno, Felipe F.; Nero, Damion; Kelfer, Jonathan; Thompson, Lee Parnell; Cabello, Juan M.; Davidson, Rebecca S.; Goldberg, Arthur P.; Shasha, Dennis E.; Coruzzi, Gloria M.; Gutiérrez, Rodrigo A.

2010-01-01

Data generation is no longer the limiting factor in advancing biological research. In addition, data integration, analysis, and interpretation have become key bottlenecks and challenges that biologists conducting genomic research face daily. To enable biologists to derive testable hypotheses from the increasing amount of genomic data, we have developed the VirtualPlant software platform. VirtualPlant enables scientists to visualize, integrate, and analyze genomic data from a systems biology perspective. VirtualPlant integrates genome-wide data concerning the known and predicted relationships among genes, proteins, and molecules, as well as genome-scale experimental measurements. VirtualPlant also provides visualization techniques that render multivariate information in visual formats that facilitate the extraction of biological concepts. Importantly, VirtualPlant helps biologists who are not trained in computer science to mine lists of genes, microarray experiments, and gene networks to address questions in plant biology, such as: What are the molecular mechanisms by which internal or external perturbations affect processes controlling growth and development? We illustrate the use of VirtualPlant with three case studies, ranging from querying a gene of interest to the identification of gene networks and regulatory hubs that control seed development. Whereas the VirtualPlant software was developed to mine Arabidopsis (Arabidopsis thaliana) genomic data, its data structures, algorithms, and visualization tools are designed in a species-independent way. VirtualPlant is freely available at www.virtualplant.org. PMID:20007449
Genome-wide SNP identification, linkage map construction and QTL mapping for mineral nutrient concentrations and contents in pea (Pisum sativum L.)

USDA-ARS's Scientific Manuscript database

Marker-assisted breeding is now routinely used in major crops to facilitate more efficient cultivar improvement. This has been significantly enabled by the use of next-generation sequencing technology to identify loci and markers associated with traits of interest. While rich in a variety of nutriti...
A genome-wide resource for the analysis of protein localisation in Drosophila

PubMed Central

Sarov, Mihail; Barz, Christiane; Jambor, Helena; Hein, Marco Y; Schmied, Christopher; Suchold, Dana; Stender, Bettina; Janosch, Stephan; KJ, Vinay Vikas; Krishnan, RT; Krishnamoorthy, Aishwarya; Ferreira, Irene RS; Ejsmont, Radoslaw K; Finkl, Katja; Hasse, Susanne; Kämpfer, Philipp; Plewka, Nicole; Vinis, Elisabeth; Schloissnig, Siegfried; Knust, Elisabeth; Hartenstein, Volker; Mann, Matthias; Ramaswami, Mani; VijayRaghavan, K; Tomancak, Pavel; Schnorrer, Frank

2016-01-01

The Drosophila genome contains >13000 protein-coding genes, the majority of which remain poorly investigated. Important reasons include the lack of antibodies or reporter constructs to visualise these proteins. Here, we present a genome-wide fosmid library of 10000 GFP-tagged clones, comprising tagged genes and most of their regulatory information. For 880 tagged proteins, we created transgenic lines, and for a total of 207 lines, we assessed protein expression and localisation in ovaries, embryos, pupae or adults by stainings and live imaging approaches. Importantly, we visualised many proteins at endogenous expression levels and found a large fraction of them localising to subcellular compartments. By applying genetic complementation tests, we estimate that about two-thirds of the tagged proteins are functional. Moreover, these tagged proteins enable interaction proteomics from developing pupae and adult flies. Taken together, this resource will boost systematic analysis of protein expression and localisation in various cellular and developmental contexts. DOI: http://dx.doi.org/10.7554/eLife.12068.001 PMID:26896675
Decoding the complex genetic causes of heart diseases using systems biology.

PubMed

Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K

2015-03-01

The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.
Advances in ecological genomics in forest trees and applications to genetic resources conservation and breeding.

PubMed

Holliday, Jason A; Aitken, Sally N; Cooke, Janice E K; Fady, Bruno; González-Martínez, Santiago C; Heuertz, Myriam; Jaramillo-Correa, Juan-Pablo; Lexer, Christian; Staton, Margaret; Whetten, Ross W; Plomion, Christophe

2017-02-01

Forest trees are an unparalleled group of organisms in their combined ecological, economic and societal importance. With widespread distributions, predominantly random mating systems and large population sizes, most tree species harbour extensive genetic variation both within and among populations. At the same time, demographic processes associated with Pleistocene climate oscillations and land-use change have affected contemporary range-wide diversity and may impinge on the potential for future adaptation. Understanding how these adaptive and neutral processes have shaped the genomes of tree species is therefore central to their management and conservation. As for many other taxa, the advent of high-throughput sequencing methods is expected to yield an understanding of the interplay between the genome and environment at a level of detail and depth not possible only a few years ago. An international conference entitled 'Genomics and Forest Tree Genetics' was held in May 2016, in Arcachon (France), and brought together forest geneticists with a wide range of research interests to disseminate recent efforts that leverage contemporary genomic tools to probe the population, quantitative and evolutionary genomics of trees. An important goal of the conference was to discuss how such data can be applied to both genome-enabled breeding and the conservation of forest genetic resources under land use and climate change. Here, we report discoveries presented at the meeting and discuss how the ecological genomic toolkit can be used to address both basic and applied questions in tree biology. © 2016 John Wiley & Sons Ltd.
Phenotypes, genome wide markers and structured genetic populations; a means to understand economically important traits in Beta vulgaris and to inform the process of germplasm enhancement

USDA-ARS's Scientific Manuscript database

Although hybrid seed systems in beet have been widely adopted due to profitability and productivity, the population remains the operational unit of beet improvement and thus characterizing populations in terms of markers and phenotypes is critical for novel trait discovery and eventual deployment of...
Development of a tissue-specific ribosome profiling approach in Drosophila enables genome-wide evaluation of translational adaptations

PubMed Central

2017-01-01

Recent advances in next-generation sequencing approaches have revolutionized our understanding of transcriptional expression in diverse systems. However, measurements of transcription do not necessarily reflect gene translation, the process of ultimate importance in understanding cellular function. To circumvent this limitation, biochemical tagging of ribosome subunits to isolate ribosome-associated mRNA has been developed. However, this approach, called TRAP, lacks quantitative resolution compared to a superior technology, ribosome profiling. Here, we report the development of an optimized ribosome profiling approach in Drosophila. We first demonstrate successful ribosome profiling from a specific tissue, larval muscle, with enhanced resolution compared to conventional TRAP approaches. We next validate the ability of this technology to define genome-wide translational regulation. This technology is leveraged to test the relative contributions of transcriptional and translational mechanisms in the postsynaptic muscle that orchestrate the retrograde control of presynaptic function at the neuromuscular junction. Surprisingly, we find no evidence that significant changes in the transcription or translation of specific genes are necessary to enable retrograde homeostatic signaling, implying that post-translational mechanisms ultimately gate instructive retrograde communication. Finally, we show that a global increase in translation induces adaptive responses in both transcription and translation of protein chaperones and degradation factors to promote cellular proteostasis. Together, this development and validation of tissue-specific ribosome profiling enables sensitive and specific analysis of translation in Drosophila. PMID:29194454

Equine Clinical Genomics: A Clinician’s Primer

PubMed Central

Brosnahan, Margaret Mary; Brooks, Samantha A.; Antczak, Douglas F.

2012-01-01

Summary The objective of this review is to introduce equine clinicians to the rapidly evolving field of clinical genomics with a vision of improving the health and welfare of the domestic horse. For fifteen years a consortium of veterinary geneticists and clinicians has worked together under the umbrella of The Horse Genome Project. This group, encompassing 22 laboratories in 12 countries, has made rapid progress, developing several iterations of linkage, physical and comparative gene maps of the horse with increasing levels of detail. In early 2006, the research was greatly facilitated when the U.S. National Human Genome Research Institute of the National Institutes of Health added the horse to the list of mammalian species scheduled for whole genome sequencing. The genome of the domestic horse has now been sequenced and is available to researchers worldwide in publicly accessible databases. This achievement creates the potential for transformative change within the horse industry, particularly in the fields of internal medicine, sports medicine and reproduction. The genome sequence has enabled the development of new genome-wide tools and resources for studying inherited diseases of the horse. To date, researchers have identified eleven mutations causing ten clinical syndromes in the horse. Testing is commercially available for all but one of these diseases. Future research will probably identify the genetic bases for other equine diseases, produce new diagnostic tests and generate novel therapeutics for some of these conditions. This will enable equine clinicians to play a critical role in ensuring the thoughtful and appropriate application of this knowledge as they assist clients with breeding and clinical decision-making. PMID:20840582
Genome-Wide Spectra of Transcription Insertions and Deletions Reveal That Slippage Depends on RNA:DNA Hybrid Complementarity

PubMed Central

Traverse, Charles C.

2017-01-01

ABSTRACT Advances in sequencing technologies have enabled direct quantification of genome-wide errors that occur during RNA transcription. These errors occur at rates that are orders of magnitude higher than rates during DNA replication, but due to technical difficulties such measurements have been limited to single-base substitutions and have not yet quantified the scope of transcription insertions and deletions. Previous reporter gene assay findings suggested that transcription indels are produced exclusively by elongation complex slippage at homopolymeric runs, so we enumerated indels across the protein-coding transcriptomes of Escherichia coli and Buchnera aphidicola, which differ widely in their genomic base compositions and incidence of repeat regions. As anticipated from prior assays, transcription insertions prevailed in homopolymeric runs of A and T; however, transcription deletions arose in much more complex sequences and were rarely associated with homopolymeric runs. By reconstructing the relocated positions of the elongation complex as inferred from the sequences inserted or deleted during transcription, we show that continuation of transcription after slippage hinges on the degree of nucleotide complementarity within the RNA:DNA hybrid at the new DNA template location. PMID:28851848
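As a rough illustration of what "homopolymeric runs" means in the abstract above, the following sketch locates stretches of a single repeated base in a sequence. The function name, the minimum run length of 4 and the example sequence are assumptions made for illustration; they are not taken from the study.

```python
import re

def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for each run of one base repeated
    at least min_len times. min_len=4 is an arbitrary illustrative cutoff."""
    # ([ACGT]) captures one base; \1{n,} requires n or more further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]

# Hypothetical sequence with one run of A and one run of T.
example = "GCAAAAATCGTTTTGCC"
runs = homopolymer_runs(example)
```

Positions are 0-based. A real analysis would apply the same idea to annotated transcripts rather than a toy string.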
CscoreTool: fast Hi-C compartment analysis at high resolution.

PubMed

Zheng, Xiaobin; Zheng, Yixian

2018-05-01

Genome-wide chromosome conformation capture (Hi-C) has revealed that the eukaryotic genome can be partitioned into A and B compartments that have distinctive chromatin and transcription features. The current principal component analysis (PCA)-based method for A/B compartment prediction from Hi-C data requires substantial CPU time and memory. We report the development of a method, CscoreTool, which enables fast and memory-efficient determination of A/B compartments at high resolution even in datasets with low sequencing depth. Availability: https://github.com/scoutzxb/CscoreTool. Contact: xzheng@carnegiescience.edu. Supplementary data are available at Bioinformatics online.
Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes.

PubMed

Win, Joe; Kamoun, Sophien

2008-04-01

Plant pathogenic microbes deliver effector proteins inside host cells to modulate plant defense circuitry and enable parasitic colonization. As genome sequences from plant pathogens become available, genome-wide evolutionary analyses will shed light on how pathogen effector genes evolved and adapted to the cellular environment of their host plants. In the August 2007 issue of Plant Cell, we described adaptive evolution (positive selection) in the cytoplasmic RXLR effectors of three recently sequenced oomycete plant pathogens. Here, we summarize our findings and describe additional data that further validate our approach.
Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas

The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.
Genome-Enabled Molecular Tools for Reductive Dehalogenation

DTIC Science & Technology

2011-11-01

Genome-Enabled Molecular Tools for Reductive Dehalogenation - A Shift in Paradigm for Bioremediation. Alfred M. Spormann, Departments of Chemical... Technical Session No. 3D, C-77. Professor Alfred Spormann, Stanford
Evolution of rDNA in Nicotiana Allopolyploids: A Potential Link between rDNA Homogenization and Epigenetics

PubMed Central

Kovarik, Ales; Dadejova, Martina; Lim, Yoong K.; Chase, Mark W.; Clarkson, James J.; Knapp, Sandra; Leitch, Andrew R.

2008-01-01

Background The evolution and biology of rDNA have interested biologists for many years, in part, because of two intriguing processes: (1) nucleolar dominance and (2) sequence homogenization. We review patterns of evolution in rDNA in the angiosperm genus Nicotiana to determine consequences of allopolyploidy on these processes. Scope Allopolyploid species of Nicotiana are ideal for studying rDNA evolution because phylogenetic reconstruction of DNA sequences has revealed patterns of species divergence and their parents. From these studies we also know that polyploids formed over widely different timeframes (thousands to millions of years), enabling comparative and temporal studies of rDNA structure, activity and chromosomal distribution. In addition, studies on synthetic polyploids enable the consequences of de novo polyploidy on rDNA activity to be determined. Conclusions We propose that rDNA epigenetic expression patterns established even in F1 hybrids have a material influence on the likely patterns of divergence of rDNA. It is the active rDNA units that are vulnerable to homogenization, which probably acts to reduce mutational load across the active array. Those rDNA units that are epigenetically silenced may be less vulnerable to sequence homogenization. Selection cannot act on these silenced genes, and they are likely to accumulate mutations and eventually be eliminated from the genome. It is likely that whole silenced arrays will be deleted in polyploids of 1 million years of age and older. PMID:18310159
Genome-wide association study identifies novel susceptibility loci for cutaneous squamous cell carcinoma.

PubMed

Chahal, Harvind S; Lin, Yuan; Ransohoff, Katherine J; Hinds, David A; Wu, Wenting; Dai, Hong-Ji; Qureshi, Abrar A; Li, Wen-Qing; Kraft, Peter; Tang, Jean Y; Han, Jiali; Sarin, Kavita Y

2016-07-18

Cutaneous squamous cell carcinoma represents the second most common cutaneous malignancy, affecting 7-11% of Caucasians in the United States. The genetic determinants of susceptibility to cutaneous squamous cell carcinoma remain largely unknown. Here we report the results of a two-stage genome-wide association study of cutaneous squamous cell carcinoma, totalling 7,404 cases and 292,076 controls. Eleven loci reached genome-wide significance (P < 5 × 10⁻⁸) including seven previously confirmed pigmentation-related loci: MC1R, ASIP, TYR, SLC45A2, OCA2, IRF4 and BNC2. We identify an additional four susceptibility loci: 11q23.3 CADM1, a metastasis suppressor gene involved in modifying tumour interaction with cell-mediated immunity; 2p22.3; 7p21.1 AHR, the dioxin receptor involved in anti-apoptotic pathways and melanoma progression; and 9q34.3 SEC16A, a putative oncogene with roles in secretion and cellular proliferation. These susceptibility loci provide deeper insight into the pathogenesis of squamous cell carcinoma.
Robust One-Tube Ω-PCR Strategy Accelerates Precise Sequence Modification of Plasmids for Functional Genomics
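The genome-wide significance threshold quoted above is conventionally motivated as a Bonferroni correction of a 0.05 error rate over roughly one million independent tests. The sketch below shows that arithmetic and a simple filter; the locus names and p-values are invented placeholders, not results from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# 0.05 spread over about one million independent common variants.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)

def genome_wide_hits(p_values, threshold=GENOME_WIDE):
    """Keep only the loci whose p-value falls below the threshold."""
    return {locus: p for locus, p in p_values.items() if p < threshold}

# Hypothetical p-values for illustration only.
hits = genome_wide_hits({"locus_A": 1e-9, "locus_B": 3e-6})
```

Here only `locus_A` passes, because 3e-6 is well above the threshold even though it would look striking in a single-test setting.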

PubMed Central

Chen, Letian; Wang, Fengpin; Wang, Xiaoyu; Liu, Yao-Guang

2013-01-01

Functional genomics requires vector construction for protein expression and functional characterization of target genes; therefore, a simple, flexible and low-cost molecular manipulation strategy will be highly advantageous for genomics approaches. Here, we describe an Ω-PCR strategy that enables multiple types of sequence modification, including precise insertion, deletion and substitution, in any position of a circular plasmid. Ω-PCR is based on an overlap extension site-directed mutagenesis technique, and is named for its characteristic Ω-shaped secondary structure during PCR. Ω-PCR can be performed either in two steps, or in one tube in combination with exonuclease I treatment. These strategies have wide applications for protein engineering, gene function analysis and in vitro gene splicing. PMID:23335613
Energy biotechnology in the CRISPR-Cas9 era.

PubMed

Estrela, Raissa; Cate, Jamie Harrison Doudna

2016-04-01

The production of bioenergy from plant biomass previously relied on using microorganisms that rapidly and efficiently convert simple sugars into fuels and chemicals. However, to exploit the far more abundant carbon fixed in plant cell walls, future industrial production hosts will need to be engineered to leverage the most efficient biochemical pathways and most robust traits that can be found in nature. The CRISPR-Cas9 genome editing technology now enables writing the genome at will, which will allow biotechnology to become an 'information science.' This review covers recent advances in using CRISPR-Cas9 to engineer the genomes of a wide variety of organisms that could be used in the industrial production of biofuels and renewable chemicals. Copyright © 2016 Elsevier Ltd. All rights reserved.
VariantSpark: population scale clustering of genotype information.

PubMed

O'Brien, Aidan R; Saunders, Neil F W; Guo, Yi; Buske, Fabian A; Scott, Rodney J; Bauer, Denis C

2015-12-10

Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the Map-Reduce paradigm. We therefore utilise the recently developed SPARK engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks. The resulting tool, VARIANTSPARK, provides an interface from MLlib to the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results. To demonstrate the capabilities of VARIANTSPARK, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VARIANTSPARK is 80% faster than the SPARK-based genome clustering approach, ADAM, the comparable implementation using Hadoop/Mahout, as well as ADMIXTURE, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. The benefits of speed, resource consumption and scalability enable VARIANTSPARK to open up the usage of advanced, efficient machine learning algorithms to genomic data.
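To make the idea of clustering individuals from VCF genotypes more concrete, here is a minimal sketch of the usual first step: turning each VCF genotype string into a count of non-reference alleles so that individuals become numeric vectors. This is not VariantSpark's interface; the function names and the toy genotypes are assumptions for illustration.

```python
def alt_allele_count(genotype):
    """Convert a VCF GT string such as '0/1' or '1|1' into the number of
    non-reference alleles; return None when any allele is missing ('.')."""
    alleles = genotype.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")

def genotype_distance(a, b):
    """Manhattan distance over the variants that are called in both individuals."""
    return sum(abs(x - y) for x, y in zip(a, b)
               if x is not None and y is not None)

# Two hypothetical individuals typed at three variants.
ind1 = [alt_allele_count(g) for g in ["0/0", "0/1", "1/1"]]
ind2 = [alt_allele_count(g) for g in ["0/1", "0/1", "./."]]
distance = genotype_distance(ind1, ind2)
```

A clustering method such as k-means would then operate on these vectors; at population scale that step is what a distributed engine parallelises.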
The Genome-Wide Influence on Human BMI Depends on Physical Activity, Life Course, and Historical Period.

PubMed

Guo, Guang; Liu, Hexuan; Wang, Ling; Shen, Haipeng; Hu, Wen

2015-10-01

In this analysis, guided by an evolutionary framework, we investigate how the human genome as a whole interacts with historical period, age, and physical activity to influence body mass index (BMI). The genomic influence is estimated by (1) heritability or the proportion of variance in BMI explained by genome-wide genotype data, and (2) the random effects or the best linear unbiased predictors (BLUPs) of genome-wide association studies (GWAS) data on BMI. Data were used from the Framingham Heart Study (FHS) in the United States. The study was initiated in 1948, and the obesity data were collected repeatedly over the subsequent decades. The analyses draw analysis samples from a pool of >8,000 individuals in the FHS. The hypothesis testing based on Pitman test, permutation Pitman test, F test, and permutation F test produces three sets of significant findings. First, the genomic influence on BMI is substantially larger after the mid-1980s than in the few decades before the mid-1980s within each age group of 21-40, 41-50, 51-60, and >60. Second, the genomic influence on BMI weakens as one ages across the life course, or the genomic influence on BMI tends to be more important during reproductive ages than after reproductive ages within each of the two historical periods. Third, within the age group of 21-50 and not in the age group of >50, the genomic influence on BMI among physically active individuals is substantially smaller than the influence on those who are not physically active. In summary, this study provides evidence that the influence of human genome as a whole on obesity depends on historical period, age, and level of physical activity.
Time-Resolved Transposon Insertion Sequencing Reveals Genome-Wide Fitness Dynamics during Infection.

PubMed

Yang, Guanhua; Billings, Gabriel; Hubbard, Troy P; Park, Joseph S; Yin Leung, Ka; Liu, Qin; Davis, Brigid M; Zhang, Yuanxing; Wang, Qiyao; Waldor, Matthew K

2017-10-03

Transposon insertion sequencing (TIS) is a powerful high-throughput genetic technique that is transforming functional genomics in prokaryotes, because it enables genome-wide mapping of the determinants of fitness. However, current approaches for analyzing TIS data assume that selective pressures are constant over time and thus do not yield information regarding changes in the genetic requirements for growth in dynamic environments (e.g., during infection). Here, we describe structured analysis of TIS data collected as a time series, termed pattern analysis of conditional essentiality (PACE). From a temporal series of TIS data, PACE derives a quantitative assessment of each mutant's fitness over the course of an experiment and identifies mutants with related fitness profiles. In so doing, PACE circumvents major limitations of existing methodologies, specifically the need for artificial effect size thresholds and enumeration of bacterial population expansion. We used PACE to analyze TIS samples of Edwardsiella piscicida (a fish pathogen) collected over a 2-week infection period from a natural host (the flatfish turbot). PACE uncovered more genes that affect E. piscicida's fitness in vivo than were detected using a cutoff at a terminal sampling point, and it identified subpopulations of mutants with distinct fitness profiles, one of which informed the design of new live vaccine candidates. Overall, PACE enables efficient mining of time series TIS data and enhances the power and sensitivity of TIS-based analyses. IMPORTANCE Transposon insertion sequencing (TIS) enables genome-wide mapping of the genetic determinants of fitness, typically based on observations at a single sampling point. Here, we move beyond analysis of endpoint TIS data to create a framework for analysis of time series TIS data, termed pattern analysis of conditional essentiality (PACE). We applied PACE to identify genes that contribute to colonization of a natural host by the fish pathogen Edwardsiella piscicida. PACE uncovered more genes that affect E. piscicida's fitness in vivo than were detected using a terminal sampling point, and its clustering of mutants with related fitness profiles informed design of new live vaccine candidates. PACE yields insights into patterns of fitness dynamics and circumvents major limitations of existing methodologies. Finally, the PACE method should be applicable to additional "omic" time series data, including screens based on clustered regularly interspaced short palindromic repeats with Cas9 (CRISPR/Cas9). Copyright © 2017 Yang et al.
A genome-wide CRISPR library for high-throughput genetic screening in Drosophila cells.

PubMed

Bassett, Andrew R; Kong, Lesheng; Liu, Ji-Long

2015-06-20

The simplicity of the CRISPR/Cas9 system of genome engineering has opened up the possibility of performing genome-wide targeted mutagenesis in cell lines, enabling screening for cellular phenotypes resulting from genetic aberrations. Drosophila cells have proven to be highly effective in identifying genes involved in cellular processes through similar screens using partial knockdown by RNAi. This is in part due to the lower degree of redundancy between genes in this organism, whilst still maintaining highly conserved gene networks and orthologs of many human disease-causing genes. The ability of CRISPR to generate genetic loss of function mutations not only increases the magnitude of any effect over currently employed RNAi techniques, but allows analysis over longer periods of time which can be critical for certain phenotypes. In this study, we have designed and built a genome-wide CRISPR library covering 13,501 genes, among which 8989 genes are targeted by three or more independent single guide RNAs (sgRNAs). Moreover, we describe strategies to monitor the population of guide RNAs by high throughput sequencing (HTS). We hope that this library will provide an invaluable resource for the community to screen loss of function mutations for cellular phenotypes, and as a source of guide RNA designs for future studies. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
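Monitoring a guide RNA population by sequencing comes down to counting how often each guide appears among the reads. The sketch below assumes, purely for illustration, that the guide sits at a fixed offset in every read and must match exactly; the guide sequences, names and read layout are hypothetical and not taken from the library described above.

```python
from collections import Counter

def count_guides(reads, guide_to_name, offset=0, guide_len=20):
    """Count exact matches of known guide sequences found at a fixed
    position in each read. Reads with no matching guide are ignored."""
    # Start every known guide at zero so dropouts remain visible.
    counts = Counter({name: 0 for name in guide_to_name.values()})
    for read in reads:
        candidate = read[offset:offset + guide_len]
        name = guide_to_name.get(candidate)
        if name is not None:
            counts[name] += 1
    return counts

# Hypothetical 20-nt guides and reads carrying a 2-nt prefix.
guides = {
    "ACGTACGTACGTACGTACGT": "gene1_sg1",
    "TTTTCCCCGGGGAAAATTTT": "gene2_sg1",
}
reads = [
    "GGACGTACGTACGTACGTACGTAAA",
    "GGACGTACGTACGTACGTACGTAAA",
    "GGTTTTCCCCGGGGAAAATTTTAAA",
    "GGNNNNNNNNNNNNNNNNNNNNAAA",
]
counts = count_guides(reads, guides, offset=2)
```

Real pipelines usually tolerate mismatches and variable offsets; exact matching keeps the example short.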
Retroposed SNOfall--a mammalian-wide comparison of platypus snoRNAs.

PubMed

Schmitz, Jürgen; Zemann, Anja; Churakov, Gennady; Kuhl, Heiner; Grützner, Frank; Reinhardt, Richard; Brosius, Jürgen

2008-06-01

Diversification of mammalian species began more than 160 million years ago when the egg-laying monotremes diverged from live bearing mammals. The duck-billed platypus (Ornithorhynchus anatinus) and echidnas are the only potential contemporary witnesses of this period and, thereby, provide a unique insight into mammalian genome evolution. It has become clear that small RNAs are major regulatory agents in eukaryotic cells, and the significant role of non-protein-coding (npc) RNAs in transcription, processing, and translation is now well accepted. Here we show that the platypus genome contains more than 200 small nucleolar (sno) RNAs among hundreds of other diverse npcRNAs. Their comparison among key mammalian groups and other vertebrates enabled us to reconstruct a complete temporal pathway of acquisition and loss of these snoRNAs. In platypus we found cis- and trans-duplication distribution patterns for snoRNAs, which have not been described in any other vertebrates but are known to occur in nematodes. An exciting novelty in platypus is a snoRNA-derived retroposon (termed snoRTE) that facilitates a very effective dispersal of an H/ACA snoRNA via RTE-mediated retroposition. From more than 40,000 detected full-length and truncated genomic copies of this snoRTE, at least 21 are processed into mature snoRNAs. High-copy retroposition via multiple host gene-promoted transcription units is a novel pathway for combining housekeeping function and SINE-like dispersal and reveals a new dimension in the evolution of novel snoRNA function.
Retroposed SNOfall—A mammalian-wide comparison of platypus snoRNAs

PubMed Central

Schmitz, Jürgen; Zemann, Anja; Churakov, Gennady; Kuhl, Heiner; Grützner, Frank; Reinhardt, Richard; Brosius, Jürgen

2008-01-01

Diversification of mammalian species began more than 160 million years ago when the egg-laying monotremes diverged from live bearing mammals. The duck-billed platypus (Ornithorhynchus anatinus) and echidnas are the only potential contemporary witnesses of this period and, thereby, provide a unique insight into mammalian genome evolution. It has become clear that small RNAs are major regulatory agents in eukaryotic cells, and the significant role of non-protein-coding (npc) RNAs in transcription, processing, and translation is now well accepted. Here we show that the platypus genome contains more than 200 small nucleolar (sno) RNAs among hundreds of other diverse npcRNAs. Their comparison among key mammalian groups and other vertebrates enabled us to reconstruct a complete temporal pathway of acquisition and loss of these snoRNAs. In platypus we found cis- and trans-duplication distribution patterns for snoRNAs, which have not been described in any other vertebrates but are known to occur in nematodes. An exciting novelty in platypus is a snoRNA-derived retroposon (termed snoRTE) that facilitates a very effective dispersal of an H/ACA snoRNA via RTE-mediated retroposition. From more than 40,000 detected full-length and truncated genomic copies of this snoRTE, at least 21 are processed into mature snoRNAs. High-copy retroposition via multiple host gene-promoted transcription units is a novel pathway for combining housekeeping function and SINE-like dispersal and reveals a new dimension in the evolution of novel snoRNA function. PMID:18463303
Genome-wide data (ChIP-seq) enabled identification of cell wall-related and aquaporin genes as targets of tomato ASR1, a drought stress-responsive transcription factor

USDA-ARS's Scientific Manuscript database

Here we report efforts to take advantage of previous knowledge on well characterized proteins that extensively accumulate in dehydration, for example those belonging to the LEA (late embryogenesis abundant) superfamily. ASR proteins, a subgroup exclusive to the plant kingdom (albeit absent in Arabid...
SpDamID: Marking DNA Bound by Protein Complexes Identifies Notch-Dimer Responsive Enhancers

PubMed Central

Hass, Matthew R.; Liow, Hien-haw; Chen, Xiaoting; Sharma, Ankur; Inoue, Yukiko U.; Inoue, Takayoshi; Reeb, Ashley; Martens, Andrew; Fulbright, Mary; Raju, Saravanan; Stevens, Michael; Boyle, Scott; Park, Joo-Seop; Weirauch, Matthew T.; Brent, Michael; Kopan, Raphael

2015-01-01

SUMMARY We developed Split DamID (SpDamID), a protein complementation version of DamID, to mark genomic DNA bound in vivo by interacting or juxtapositioned transcription factors. Inactive halves of DAM (DNA Adenine Methyltransferase) were fused to protein pairs to be queried. Interaction or proximity enabled DAM reconstitution and methylation of adenine in GATC. Inducible SpDamID was used to analyze Notch-mediated transcriptional activation. We demonstrate that Notch complexes label RBP sites broadly across the genome, and show that a subset of these complexes that recruit MAML and p300 undergo changes in chromatin accessibility in response to Notch signaling. SpDamID differentiates between monomeric and dimeric binding thereby allowing for identification of half-site motifs used by Notch dimers. Motif enrichment of Notch enhancers coupled with SpDamID reveals co-targeting of regulatory sequences by Notch and Runx1. SpDamID represents a sensitive and powerful tool that enables dynamic analysis of combinatorial protein-DNA transactions at a genome-wide level. PMID:26257285
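Because the readout described above is adenine methylation within GATC, the candidate sites in any stretch of DNA can be listed with a simple scan. The sketch below is only an illustration of that step; the function name and example sequence are assumptions.

```python
def gatc_sites(seq):
    """Return the 0-based start positions of every GATC in the sequence.
    GATC is its own reverse complement, so one strand is enough to scan."""
    seq = seq.upper()
    sites = []
    pos = seq.find("GATC")
    while pos != -1:
        sites.append(pos)
        pos = seq.find("GATC", pos + 1)
    return sites

# Hypothetical sequence containing three GATC motifs.
sites = gatc_sites("AAGATCTTGATCGATC")
# The adenine within each motif lies one base after the motif start.
adenine_positions = [s + 1 for s in sites]
```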
Functional Analysis With a Barcoder Yeast Gene Overexpression System

PubMed Central

Douglas, Alison C.; Smith, Andrew M.; Sharifpoor, Sara; Yan, Zhun; Durbic, Tanja; Heisler, Lawrence E.; Lee, Anna Y.; Ryan, Owen; Göttert, Hendrikje; Surendra, Anu; van Dyk, Dewald; Giaever, Guri; Boone, Charles; Nislow, Corey; Andrews, Brenda J.

2012-01-01

Systematic analysis of gene overexpression phenotypes provides an insight into gene function, enzyme targets, and biological pathways. Here, we describe a novel functional genomics platform that enables a highly parallel and systematic assessment of overexpression phenotypes in pooled cultures. First, we constructed a genome-level collection of ~5100 yeast barcoder strains, each of which carries a unique barcode, enabling pooled fitness assays with a barcode microarray or sequencing readout. Second, we constructed a yeast open reading frame (ORF) galactose-induced overexpression array by generating a genome-wide set of yeast transformants, each of which carries an individual plasmid-borne and sequence-verified ORF derived from the Saccharomyces cerevisiae full-length EXpression-ready (FLEX) collection. We combined these collections genetically using synthetic genetic array methodology, generating ~5100 strains, each of which is barcoded and overexpresses a specific ORF, a set we termed “barFLEX.” Additional synthetic genetic array allows the barFLEX collection to be moved into different genetic backgrounds. As a proof-of-principle, we describe the properties of the barFLEX overexpression collection and its application in synthetic dosage lethality studies under different environmental conditions. PMID:23050238
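A pooled fitness assay with a barcode sequencing readout is commonly summarised as a log ratio of each barcode's relative abundance between two conditions. The following is a minimal sketch of that calculation under assumed inputs; the barcode names, counts and pseudocount are hypothetical, and the study's actual scoring procedure is not described in the abstract.

```python
import math

def barcode_log2_ratios(control, treated, pseudocount=1.0):
    """Log2 ratio of relative barcode abundance (treated vs control).
    Negative values indicate strains depleted in the treated pool.
    The pseudocount avoids taking the log of zero for dropouts."""
    total_c = sum(control.values())
    total_t = sum(treated.values())
    ratios = {}
    for barcode in control:
        freq_c = (control[barcode] + pseudocount) / total_c
        freq_t = (treated.get(barcode, 0) + pseudocount) / total_t
        ratios[barcode] = math.log2(freq_t / freq_c)
    return ratios

# Hypothetical read counts for two barcoded strains.
control = {"bc1": 99, "bc2": 99}
treated = {"bc1": 99, "bc2": 24}
ratios = barcode_log2_ratios(control, treated)
```

In this toy pool `bc2` drops relative to `bc1`, which is the pattern one would look for when overexpression of a gene is detrimental under the tested condition.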
Evidence for suppression of immunity as a driver for genomic introgressions and host range expansion in races of Albugo candida, a generalist parasite

PubMed Central

McMullan, Mark; Gardiner, Anastasia; Bailey, Kate; Kemen, Eric; Ward, Ben J; Cevik, Volkan; Robert-Seilaniantz, Alexandre; Schultz-Larsen, Torsten; Balmuth, Alexi; Holub, Eric; van Oosterhout, Cock; Jones, Jonathan DG

2015-01-01

How generalist parasites with wide host ranges can evolve is a central question in parasite evolution. Albugo candida is an obligate biotrophic parasite that consists of many physiological races that each specialize on distinct Brassicaceae host species. By analyzing genome sequence assemblies of five isolates, we show they represent three races that are genetically diverged by ∼1%. Despite this divergence, their genomes are mosaic-like, with ∼25% being introgressed from other races. Sequential infection experiments show that infection by adapted races enables subsequent infection of hosts by normally non-infecting races. This facilitates introgression and the exchange of effector repertoires, and may enable the evolution of novel races that can undergo clonal population expansion on new hosts. We discuss recent studies on hybridization in other eukaryotes such as yeast, Heliconius butterflies, Darwin's finches, sunflowers and cichlid fishes, and the implications of introgression for pathogen evolution in an agro-ecological environment. DOI: http://dx.doi.org/10.7554/eLife.04550.001 PMID:25723966

Mouse Genome Informatics (MGI) Resource: Genetic, Genomic, and Biological Knowledgebase for the Laboratory Mouse.

PubMed

Eppig, Janan T

2017-07-01

The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided. © The Author 2017. Published by Oxford University Press.
Mouse Genome Informatics (MGI) Resource: Genetic, Genomic, and Biological Knowledgebase for the Laboratory Mouse

PubMed Central

Eppig, Janan T.

2017-01-01

The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided. PMID:28838066
Genetics on the Fly: A Primer on the Drosophila Model System

PubMed Central

Hales, Karen G.; Korey, Christopher A.; Larracuente, Amanda M.; Roberts, David M.

2015-01-01

Fruit flies of the genus Drosophila have been an attractive and effective genetic model organism since Thomas Hunt Morgan and colleagues made seminal discoveries with them a century ago. Work with Drosophila has enabled dramatic advances in cell and developmental biology, neurobiology and behavior, molecular biology, evolutionary and population genetics, and other fields. With more tissue types and observable behaviors than in other short-generation model organisms, and with vast genome data available for many species within the genus, the fly’s tractable complexity will continue to enable exciting opportunities to explore mechanisms of complex developmental programs, behaviors, and broader evolutionary questions. This primer describes the organism’s natural history, the features of sequenced genomes within the genus, the wide range of available genetic tools and online resources, the types of biological questions Drosophila can help address, and historical milestones. PMID:26564900
ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data

PubMed Central

2010-01-01

Background Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. 
Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database. PMID:20459804
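The nearest-gene annotation described above can be sketched in a few lines. This is a Python illustration of the idea only and does not reproduce the ChIPpeakAnno R API; the function name, the use of peak midpoints and transcription start sites, and the coordinates are assumptions.

```python
from bisect import bisect_left

def nearest_gene(peak_mid, genes):
    """Return (gene_name, signed_distance) for the TSS closest to peak_mid.

    genes: list of (tss_position, name) tuples sorted by position,
    all on one chromosome (hypothetical annotation).
    """
    positions = [pos for pos, _ in genes]
    i = bisect_left(positions, peak_mid)
    # Only the TSS immediately left and right of the peak can be nearest.
    candidates = genes[max(i - 1, 0):i + 1]
    pos, name = min(candidates, key=lambda g: abs(g[0] - peak_mid))
    return name, peak_mid - pos

genes = [(1000, "geneA"), (5000, "geneB"), (9000, "geneC")]
peaks = [1200, 4900, 7500]
annotated = [nearest_gene(p, genes) for p in peaks]
```

Collecting the signed distances over all peaks gives the values one would bin to draw the distance-to-nearest-gene histogram mentioned in the abstract.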
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gilbert, Jack A.; Quinn, Robert A.; Debelius, Justine

Rapid advances in DNA sequencing, metabolomics, proteomics and computation dramatically increase accessibility of microbiome studies and identify links between the microbiome and disease. Microbial time-series and multiple molecular perspectives enable Microbiome-Wide Association Studies (MWAS), analogous to Genome-Wide Association Studies (GWAS). Rapid research advances point towards actionable results, although approved clinical tests based on MWAS are still in the future. Appreciating the complexity of interactions between diet, chemistry, health and the microbiome, and determining the frequency of observations needed to capture and integrate this dynamic interface, is paramount for addressing the need for personalized and precision microbiome-based diagnostics and therapies.
Read clouds uncover variation in complex regions of the human genome

PubMed Central

Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim

2015-01-01

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
Genome-wide Reconstruction of OxyR and SoxRS Transcriptional Regulatory Networks under Oxidative Stress in Escherichia coli K-12 MG1655.

PubMed

Seo, Sang Woo; Kim, Donghyuk; Szubin, Richard; Palsson, Bernhard O

2015-08-25

Three transcription factors (TFs), OxyR, SoxR, and SoxS, play a critical role in transcriptional regulation of the defense system for oxidative stress in bacteria. However, their full genome-wide regulatory potential is unknown. Here, we perform a genome-scale reconstruction of the OxyR, SoxR, and SoxS regulons in Escherichia coli K-12 MG1655. Integrative data analysis reveals that a total of 68 genes in 51 transcription units (TUs) belong to these regulons. Among them, 48 genes showed more than 2-fold changes in expression level under single-TF-knockout conditions. This reconstruction expands the genome-wide roles of these factors to include direct activation of genes related to amino acid biosynthesis (methionine and aromatic amino acids), cell wall synthesis (lipid A biosynthesis and peptidoglycan growth), and divalent metal ion transport (Mn(2+), Zn(2+), and Mg(2+)). Investigating the co-regulation of these genes with other stress-response TFs reveals that they are independently regulated by stress-specific TFs. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Novel insights into the elm yellows phytoplasma genome and into the metagenome of elm yellows-infected elms

Treesearch

Christina Rosa; Paolo Margaria; Scott M. Geib; Erin D. Scully

2017-01-01

In North America, American elms were historically present throughout the northeastern United States and southeastern Canada. The longevity of these trees, their resistance to the harsh urban environment, and their aesthetics led to their wide use in landscaping and streetscaping over several decades. American elms were one of the most cultivated plants in the United States...
The Importance of Biological Databases in Biological Discovery.

PubMed

Baxevanis, Andreas D; Bateman, Alex

2015-06-19

Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.
GenomicTools: a computational platform for developing high-throughput analytics in genomics.

PubMed

Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo

2012-01-15

Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
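One example of an operation between sets of genomic regions is intersection. The sketch below is a generic two-pointer sweep written in Python, not the GenomicTools command-line tools or C++ API; it assumes each input is a sorted list of non-overlapping half-open intervals on a single chromosome, and the coordinates are invented.

```python
def intersect(a, b):
    """Intersect two sorted lists of half-open (start, end) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

peaks = [(100, 200), (300, 450)]
promoters = [(150, 320), (400, 500)]
overlaps = intersect(peaks, promoters)
```

Because each list is scanned once and nothing beyond the two current intervals is held, the sweep runs in linear time with constant extra memory, which is the kind of property that matters for large datasets.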
Oncogenomic portals for the visualization and analysis of genome-wide cancer data

PubMed Central

Klonowska, Katarzyna; Czubak, Karol; Wojciechowska, Marzena; Handschuh, Luiza; Zmienko, Agnieszka; Figlerowicz, Marek; Dams-Kozlowska, Hanna; Kozlowski, Piotr

2016-01-01

Somatically acquired genomic alterations that drive oncogenic cellular processes are of great scientific and clinical interest. Since the initiation of large-scale cancer genomic projects (e.g., the Cancer Genome Project, The Cancer Genome Atlas, and the International Cancer Genome Consortium cancer genome projects), a number of web-based portals have been created to facilitate access to multidimensional oncogenomic data and assist with the interpretation of the data. The portals provide the visualization of small-size mutations, copy number variations, methylation, and gene/protein expression data that can be correlated with the available clinical, epidemiological, and molecular features. Additionally, the portals enable analysis of the gathered data with various user-friendly statistical tools. Herein, we present a highly illustrated review of seven portals, i.e., Tumorscape, UCSC Cancer Genomics Browser, ICGC Data Portal, COSMIC, cBioPortal, IntOGen, and BioProfiling.de. All of the selected portals are user-friendly and can be exploited by scientists from different cancer-associated fields, including those without a bioinformatics background. It is expected that the use of the portals will contribute to a better understanding of cancer molecular etiology and will ultimately accelerate the translation of genomic knowledge into clinical practice. PMID:26484415
Oncogenomic portals for the visualization and analysis of genome-wide cancer data.

PubMed

Klonowska, Katarzyna; Czubak, Karol; Wojciechowska, Marzena; Handschuh, Luiza; Zmienko, Agnieszka; Figlerowicz, Marek; Dams-Kozlowska, Hanna; Kozlowski, Piotr

2016-01-05

Somatically acquired genomic alterations that drive oncogenic cellular processes are of great scientific and clinical interest. Since the initiation of large-scale cancer genomic projects (e.g., the Cancer Genome Project, The Cancer Genome Atlas, and the International Cancer Genome Consortium cancer genome projects), a number of web-based portals have been created to facilitate access to multidimensional oncogenomic data and assist with the interpretation of the data. The portals provide the visualization of small-size mutations, copy number variations, methylation, and gene/protein expression data that can be correlated with the available clinical, epidemiological, and molecular features. Additionally, the portals enable analysis of the gathered data with various user-friendly statistical tools. Herein, we present a highly illustrated review of seven portals, i.e., Tumorscape, UCSC Cancer Genomics Browser, ICGC Data Portal, COSMIC, cBioPortal, IntOGen, and BioProfiling.de. All of the selected portals are user-friendly and can be exploited by scientists from different cancer-associated fields, including those without a bioinformatics background. It is expected that the use of the portals will contribute to a better understanding of cancer molecular etiology and will ultimately accelerate the translation of genomic knowledge into clinical practice.
Proteome Studies of Filamentous Fungi

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, Scott E.; Panisko, Ellen A.

2011-04-20

The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide breadth of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a “baseline” for proteomic studies of fungi.
Proteome studies of filamentous fungi.

PubMed

Baker, Scott E; Panisko, Ellen A

2011-01-01

The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.
Polycistronic tRNA and CRISPR guide-RNA enables highly efficient multiplexed genome engineering in human cells

PubMed Central

Dong, Fengping; Xie, Kabin; Chen, Yueying; Yang, Yinong; Mao, Yingwei

2016-01-01

CRISPR/Cas9 has been widely used for genomic editing in many organisms. Many human diseases are caused by multiple mutations. The CRISPR/Cas9 system provides a potential tool to introduce multiple mutations in a genome. To mimic complicated genomic variants in human diseases, such as multiple gene deletions or mutations, two or more small guide RNAs (sgRNAs) need to be introduced all together. This can be achieved by separate Pol III promoters in a construct. However, limited enzyme sites and increased insertion size lower the efficiency of making a construct. Here, we report a strategy to quickly assemble multiple sgRNAs in one construct using a polycistronic-tRNA-gRNA (PTG) strategy. Taking advantage of the endogenous tRNA processing system in mammalian cells, we efficiently express multiple sgRNAs driven by only one Pol III promoter. Using an all-in-one construct carrying PTG, we disrupt the deacetylase domain in multiple histone deacetylases (HDACs) in human cells simultaneously. We demonstrate that multiple HDAC deletions significantly affect the activation of the Wnt-signaling pathway. Thus, this method enables efficient targeting of multiple genes and provides a useful tool to establish mutated cells mimicking human diseases. PMID:27890617
Polycistronic tRNA and CRISPR guide-RNA enables highly efficient multiplexed genome engineering in human cells.

PubMed

Dong, Fengping; Xie, Kabin; Chen, Yueying; Yang, Yinong; Mao, Yingwei

2017-01-22

CRISPR/Cas9 has been widely used for genomic editing in many organisms. Many human diseases are caused by multiple mutations. The CRISPR/Cas9 system provides a potential tool to introduce multiple mutations in a genome. To mimic complicated genomic variants in human diseases, such as multiple gene deletions or mutations, two or more small guide RNAs (sgRNAs) need to be introduced all together. This can be achieved by separate Pol III promoters in a construct. However, limited enzyme sites and increased insertion size lower the efficiency of making a construct. Here, we report a strategy to quickly assemble multiple sgRNAs in one construct using a polycistronic-tRNA-gRNA (PTG) strategy. Taking advantage of the endogenous tRNA processing system in mammalian cells, we efficiently express multiple sgRNAs driven by only one Pol III promoter. Using an all-in-one construct carrying PTG, we disrupt the deacetylase domain in multiple histone deacetylases (HDACs) in human cells simultaneously. We demonstrate that multiple HDAC deletions significantly affect the activation of the Wnt-signaling pathway. Thus, this method enables efficient targeting of multiple genes and provides a useful tool to establish mutated cells mimicking human diseases. Copyright © 2016 Elsevier Inc. All rights reserved.
High-resolution definition of the Vibrio cholerae essential gene set with hidden Markov model–based analyses of transposon-insertion sequencing data

PubMed Central

Chao, Michael C.; Pritchard, Justin R.; Zhang, Yanjia J.; Rubin, Eric J.; Livny, Jonathan; Davis, Brigid M.; Waldor, Matthew K.

2013-01-01

The coupling of high-density transposon mutagenesis to high-throughput DNA sequencing (transposon-insertion sequencing) enables simultaneous and genome-wide assessment of the contributions of individual loci to bacterial growth and survival. We have refined analysis of transposon-insertion sequencing data by normalizing for the effect of DNA replication on sequencing output and using a hidden Markov model (HMM)-based filter to exploit heretofore unappreciated information inherent in all transposon-insertion sequencing data sets. The HMM can smooth variations in read abundance and thereby reduce the effects of read noise, as well as permit fine scale mapping that is independent of genomic annotation and enable classification of loci into several functional categories (e.g. essential, domain essential or ‘sick’). We generated a high-resolution map of genomic loci (encompassing both intra- and intergenic sequences) that are required or beneficial for in vitro growth of the cholera pathogen, Vibrio cholerae. This work uncovered new metabolic and physiologic requirements for V. cholerae survival, and by combining transposon-insertion sequencing and transcriptomic data sets, we also identified several novel noncoding RNA species that contribute to V. cholerae growth. Our findings suggest that HMM-based approaches will enhance extraction of biological meaning from transposon-insertion sequencing genomic data. PMID:23901011
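To show how an HMM can smooth noisy insertion data, here is a minimal two-state Viterbi sketch. It is not the authors' model: the state names, the reduction of read counts to a binary "reads present / absent" observation per site, and every probability below are assumptions picked for illustration.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden-state path for a sequence of discrete observations."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

log = math.log
states = ["essential", "dispensable"]
log_start = {s: log(0.5) for s in states}
# Neighbouring sites tend to share a state (assumed 0.9 stay, 0.1 switch).
log_trans = {a: {b: log(0.9 if a == b else 0.1) for b in states} for a in states}
# Observation 0 = no reads at the site, 1 = reads present (assumed rates).
log_emit = {
    "essential": {0: log(0.9), 1: log(0.1)},
    "dispensable": {0: log(0.2), 1: log(0.8)},
}
obs = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1]
path = viterbi(obs, states, log_start, log_trans, log_emit)
```

With these parameters the isolated empty site is treated as noise inside a dispensable stretch, while the run of four empty sites is called essential, which is the smoothing behaviour the abstract attributes to the HMM filter.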
Analysis of Bioprocesses. Dynamic Modeling is a Must.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ramkrishna, Doraiswami; Song, Hyun-Seob

2016-01-01

The goal of this paper is to report on the performance of a promising dynamic framework based on the cybernetic concepts which have evolved over three decades. We present case studies of successful dynamic simulations of wild-type strains as well as specific KO mutants on bacteria and yeast. An extensive metabolic engineering effort, including genome scale networks, is called for to secure the methodology and realize its full potential. Towards this end, the software AUMIC is under active further development to enable speedy applications. Its wide use will be enabled by a publication that is shortly due.
Comparative genomics of the marine bacterial genus Glaciecola reveals the high degree of genomic diversity and genomic characteristic for cold adaptation.

PubMed

Qin, Qi-Long; Xie, Bin-Bin; Yu, Yong; Shu, Yan-Li; Rong, Jin-Cheng; Zhang, Yan-Jiao; Zhao, Dian-Li; Chen, Xiu-Lan; Zhang, Xi-Ying; Chen, Bo; Zhou, Bai-Cheng; Zhang, Yu-Zhong

2014-06-01

To what extent the genomes of different species belonging to one genus can be diverse and the relationship between genomic differentiation and environmental factors remain unclear for oceanic bacteria. With many new bacterial genera and species being isolated from marine environments, this question warrants attention. In this study, we sequenced all the type strains of the published species of Glaciecola, a recently defined cold-adapted genus with species from diverse marine locations, to study the genomic diversity and cold-adaptation strategy in this genus. The genome size diverged widely from 3.08 to 5.96 Mb, which can be explained by massive gene gain and loss events. Horizontal gene transfer and new gene emergence contributed substantially to the genome size expansion. The genus Glaciecola had an open pan-genome. Comparative genomic research indicated that species of the genus Glaciecola had high diversity in genome size, gene content and genetic relatedness. This may be prevalent in marine bacterial genera considering the dynamic and complex environments of the ocean. Species of Glaciecola had some common genomic features related to cold adaptation, which enable them to thrive and play a role in biogeochemical cycles in cold marine environments.
Directional genomic hybridization for chromosomal inversion discovery and detection.

PubMed

Ray, F Andrew; Zimmerman, Erin; Robinson, Bruce; Cornforth, Michael N; Bedford, Joel S; Goodwin, Edwin H; Bailey, Susan M

2013-04-01

Chromosomal rearrangements are a source of structural variation within the genome that figure prominently in human disease, where the importance of translocations and deletions is well recognized. In principle, inversions (reversals in the orientation of DNA sequences within a chromosome) should have similar detrimental potential. However, the study of inversions has been hampered by traditional approaches used for their detection, which are not particularly robust. Even with significant advances in whole genome approaches, changes in the absolute orientation of DNA remain difficult to detect routinely. Consequently, our understanding of inversions is still surprisingly limited, as is our appreciation for their frequency and involvement in human disease. Here, we introduce the directional genomic hybridization methodology of chromatid painting, a whole new way of looking at structural features of the genome, that can be employed with high resolution on a cell-by-cell basis, and demonstrate its basic capabilities for genome-wide discovery and targeted detection of inversions. Bioinformatics enabled development of sequence- and strand-specific directional probe sets, which when coupled with single-stranded hybridization, greatly improved the resolution and ease of inversion detection. We highlight examples of the far-ranging applicability of this cytogenomics-based approach, which include confirmation of the alignment of the human genome database and evidence that individuals themselves share similar sequence directionality, as well as use in comparative and evolutionary studies for any species whose genome has been sequenced. In addition to applications related to basic mechanistic studies, the information obtainable with strand-specific hybridization strategies may ultimately enable novel gene discovery, thereby benefitting the diagnosis and treatment of a variety of human disease states and disorders including cancer, autism, and idiopathic infertility.

iss047e066248

NASA Image and Video Library

2016-04-19

ISS047e066248 (04/19/2016) --- NASA astronaut and Expedition 47 Flight Engineer Jeff Williams works with the Wet Lab RNA SmartCycler on-board the International Space Station. Wetlab RNA SmartCycler is a research platform for conducting real-time quantitative gene expression analysis aboard the ISS. The system enables spaceflight genomic studies involving a wide variety of biospecimen types in the unique microgravity environment of space.
D-GENIES: dot plot large genomes in an interactive, efficient and simple way.

PubMed

Cabanettes, Floréal; Klopp, Christophe

2018-01-01

Dot plots are widely used to quickly compare sequence sets. They provide a synthetic similarity overview, highlighting repetitions, breaks and inversions. Different tools have been developed to easily generate genomic alignment dot plots, but they are often limited in the input sequence size. D-GENIES is a standalone and web application performing large genome alignments using the minimap2 software package and generating interactive dot plots. It enables users to sort query sequences along the reference, zoom in on the plot and download several image, alignment or sequence files. D-GENIES is an easy-to-install, open-source software package (GPL) developed in Python and JavaScript. The source code is available at https://github.com/genotoul-bioinfo/dgenies and it can be tested at http://dgenies.toulouse.inra.fr/.
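The idea behind a dot plot can be shown with a toy exact k-mer matcher. This is not how D-GENIES works internally (it plots minimap2 alignments); the function name, the choice of k, and the sequences are assumptions for illustration.

```python
from collections import defaultdict

def dot_plot(query, target, k=3):
    """Return (query_pos, target_pos) pairs where k-mers match exactly,
    ordered by query position."""
    # Index every k-mer start position in the target.
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    dots = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            dots.append((i, j))
    return dots

dots = dot_plot("ACGTAC", "TTACGT", k=3)
```

Consecutive dots whose two coordinates increase together, such as (0, 2) followed by (1, 3), lie on a diagonal and mark a shared segment; an off-diagonal dot such as (3, 1) hints at a repeat or rearrangement.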
Simultaneous live imaging of the transcription and nuclear position of specific genes

PubMed Central

Ochiai, Hiroshi; Sugawara, Takeshi; Yamamoto, Takashi

2015-01-01

The relationship between genome organization and gene expression has recently been established. However, the relationships between spatial organization, dynamics, and transcriptional regulation of the genome remain unknown. In this study, we developed a live-imaging method for simultaneous measurements of the transcriptional activity and nuclear position of endogenous genes, which we termed the ‘Real-time Observation of Localization and EXpression (ROLEX)’ system. We demonstrated that ROLEX is highly specific and does not affect the expression level of the target gene. ROLEX enabled detection of sub-genome-wide mobility changes that depended on the state of Nanog transactivation in embryonic stem cells. We believe that the ROLEX system will become a powerful tool for exploring the relationship between transcription and nuclear dynamics in living cells. PMID:26092696
Network-assisted crop systems genetics: network inference and integrative analysis.

PubMed

Lee, Tak; Kim, Hyojin; Lee, Insuk

2015-04-01

Although next-generation sequencing (NGS) technology has enabled the decoding of many crop species genomes, most of the underlying genetic components for economically important crop traits remain to be determined. Network approaches have proven useful for the study of the reference plant, Arabidopsis thaliana, and the success of network-based crop genetics will also require the availability of genome-scale functional networks for crop species. In this review, we discuss how to construct functional networks and elucidate the holistic view of a crop system. The crop gene network then can be used for gene prioritization and the analysis of resequencing-based genome-wide association study (GWAS) data, the amount of which will rapidly grow in the field of crop science in the coming years. Copyright © 2015 Elsevier Ltd. All rights reserved.
Quantitative Tracking of Combinatorially Engineered Populations with Multiplexed Binary Assemblies.

PubMed

Zeitoun, Ramsey I; Pines, Gur; Grau, William C; Gill, Ryan T

2017-04-21

Advances in synthetic biology and genomics have enabled full-scale genome engineering efforts on laboratory time scales. However, the absence of sufficient approaches for mapping engineered genomes at system-wide scales onto performance has limited the adoption of more sophisticated algorithms for engineering complex biological systems. Here we report on the development and application of a robust approach to quantitatively map combinatorially engineered populations at scales up to several dozen target sites. This approach works by assembling genome engineered sites with cell-specific barcodes into a format compatible with high-throughput sequencing technologies. This approach, called barcoded-TRACE (bTRACE) was applied to assess E. coli populations engineered by recursive multiplex recombineering across both 6-target sites and 31-target sites. The 31-target library was then tracked throughout growth selections in the presence and absence of isopentenol (a potential next-generation biofuel). We also use the resolution of bTRACE to compare the influence of technical and biological noise on genome engineering efforts.
Genome-scale CRISPR-Cas9 knockout screening in human cells.

PubMed

Shalem, Ophir; Sanjana, Neville E; Hartenian, Ella; Shi, Xi; Scott, David A; Mikkelson, Tarjei; Heckl, Dirk; Ebert, Benjamin L; Root, David E; Doench, John G; Zhang, Feng

2014-01-03

The simplicity of programming the CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease Cas9 to modify specific genomic loci suggests a new way to interrogate gene function on a genome-wide scale. We show that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells. First, we used the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, we screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic RAF inhibitor. Our highest-ranking candidates include previously validated genes NF1 and MED12, as well as novel hits NF2, CUL3, TADA2B, and TADA1. We observe a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, demonstrating the promise of genome-scale screening with Cas9.
Mapping and phasing of structural variation in patient genomes using nanopore sequencing.

PubMed

Cretu Stancu, Mircea; van Roosmalen, Markus J; Renkens, Ivo; Nieboer, Marleen M; Middelkamp, Sjors; de Ligt, Joep; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin; Talkowski, Michael E; Marschall, Tobias; de Ridder, Jeroen; Kloosterman, Wigard P

2017-11-06

Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.
Unexplored therapeutic opportunities in the human genome.

PubMed

Oprea, Tudor I; Bologa, Cristian G; Brunak, Søren; Campbell, Allen; Gan, Gregory N; Gaulton, Anna; Gomez, Shawn M; Guha, Rajarshi; Hersey, Anne; Holmes, Jayme; Jadhav, Ajit; Jensen, Lars Juhl; Johnson, Gary L; Karlson, Anneli; Leach, Andrew R; Ma'ayan, Avi; Malovannaya, Anna; Mani, Subramani; Mathias, Stephen L; McManus, Michael T; Meehan, Terrence F; von Mering, Christian; Muthas, Daniel; Nguyen, Dac-Trung; Overington, John P; Papadatos, George; Qin, Jun; Reich, Christian; Roth, Bryan L; Schürer, Stephan C; Simeonov, Anton; Sklar, Larry A; Southall, Noel; Tomita, Susumu; Tudose, Ilinca; Ursu, Oleg; Vidovic, Dušica; Waller, Anna; Westergaard, David; Yang, Jeremy J; Zahoránszky-Köhalmi, Gergely

2018-05-01

A large proportion of biomedical research and the development of therapeutics is focused on a small fraction of the human genome. In a strategic effort to map the knowledge gaps around proteins encoded by the human genome and to promote the exploration of currently understudied, but potentially druggable, proteins, the US National Institutes of Health launched the Illuminating the Druggable Genome (IDG) initiative in 2014. In this article, we discuss how the systematic collection and processing of a wide array of genomic, proteomic, chemical and disease-related resource data by the IDG Knowledge Management Center have enabled the development of evidence-based criteria for tracking the target development level (TDL) of human proteins, which indicates a substantial knowledge deficit for approximately one out of three proteins in the human proteome. We then present spotlights on the TDL categories as well as key drug target classes, including G protein-coupled receptors, protein kinases and ion channels, which illustrate the nature of the unexplored opportunities for biomedical research and therapeutic development.
Policy perspectives on the emerging pathways of personalized medicine

PubMed Central

Downing, Gregory J.

2009-01-01

Remarkable advances in the fundamental knowledge about the biological basis of disease and technical advances in methods to assess genomic information have led the health care system to the threshold of personalized medicine. It is now feasible to consider strategic application of genomic information to guide patient management by being predictive, preemptive, and preventive, and enabling patient participation in medical decisions. Early evidence of this transition has some hallmarks of disruptive innovation to existing health care practices. Presented here is an examination of the changes underway to enable this new concept in health care in the United States, to improve precision and quality of care through innovations aimed at individualized approaches to medical decision making. A broad range of public policy positions will need to be considered for the health care delivery enterprise to accommodate the promise of this new science and technology for the benefit of patients. PMID:20135895
Genome-wide and fine-resolution association analysis of malaria in West Africa.

PubMed

Jallow, Muminatou; Teo, Yik Ying; Small, Kerrin S; Rockett, Kirk A; Deloukas, Panos; Clark, Taane G; Kivinen, Katja; Bojang, Kalifa A; Conway, David J; Pinder, Margaret; Sirugo, Giorgio; Sisay-Joof, Fatou; Usen, Stanley; Auburn, Sarah; Bumpstead, Suzannah J; Campino, Susana; Coffey, Alison; Dunham, Andrew; Fry, Andrew E; Green, Angela; Gwilliam, Rhian; Hunt, Sarah E; Inouye, Michael; Jeffreys, Anna E; Mendy, Alieu; Palotie, Aarno; Potter, Simon; Ragoussis, Jiannis; Rogers, Jane; Rowlands, Kate; Somaskantharajah, Elilan; Whittaker, Pamela; Widden, Claire; Donnelly, Peter; Howie, Bryan; Marchini, Jonathan; Morris, Andrew; SanJoaquin, Miguel; Achidi, Eric Akum; Agbenyega, Tsiri; Allen, Angela; Amodu, Olukemi; Corran, Patrick; Djimde, Abdoulaye; Dolo, Amagana; Doumbo, Ogobara K; Drakeley, Chris; Dunstan, Sarah; Evans, Jennifer; Farrar, Jeremy; Fernando, Deepika; Hien, Tran Tinh; Horstmann, Rolf D; Ibrahim, Muntaser; Karunaweera, Nadira; Kokwaro, Gilbert; Koram, Kwadwo A; Lemnge, Martha; Makani, Julie; Marsh, Kevin; Michon, Pascal; Modiano, David; Molyneux, Malcolm E; Mueller, Ivo; Parker, Michael; Peshu, Norbert; Plowe, Christopher V; Puijalon, Odile; Reeder, John; Reyburn, Hugh; Riley, Eleanor M; Sakuntabhai, Anavaj; Singhasivanon, Pratap; Sirima, Sodiomon; Tall, Adama; Taylor, Terrie E; Thera, Mahamadou; Troye-Blomberg, Marita; Williams, Thomas N; Wilson, Michael; Kwiatkowski, Dominic P

2009-06-01

We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10^-7 to P = 4 × 10^-14, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.
Outbreak Investigation Using High-Throughput Genome Sequencing within a Diagnostic Microbiology Laboratory

PubMed Central

Sherry, Norelle L.; Porter, Jessica L.; Seemann, Torsten; Watkins, Andrew; Stinear, Timothy P.

2013-01-01

Next-generation sequencing (NGS) of bacterial genomes has recently become more accessible and is now available to the routine diagnostic microbiology laboratory. However, questions remain regarding its feasibility, particularly with respect to data analysis in nonspecialist centers. To test the applicability of NGS to outbreak investigations, Ion Torrent sequencing was used to investigate a putative multidrug-resistant Escherichia coli outbreak in the neonatal unit of the Mercy Hospital for Women, Melbourne, Australia. Four suspected outbreak strains and a comparator strain were sequenced. Genome-wide single nucleotide polymorphism (SNP) analysis demonstrated that the four neonatal intensive care unit (NICU) strains were identical and easily differentiated from the comparator strain. Genome sequence data also determined that the NICU strains belonged to multilocus sequence type 131 and carried the blaCTX-M-15 extended-spectrum beta-lactamase. Comparison of the outbreak strains to all publicly available complete E. coli genome sequences showed that they clustered with neonatal meningitis and uropathogenic isolates. The turnaround time from a positive culture to the completion of sequencing (prior to data analysis) was 5 days, and the cost was approximately $300 per strain (for the reagents only). The main obstacles to a mainstream adoption of NGS technologies in diagnostic microbiology laboratories are currently cost (although this is decreasing), a paucity of user-friendly and clinically focused bioinformatics platforms, and a lack of genomics expertise outside the research environment. Despite these hurdles, NGS technologies provide unparalleled high-resolution genotyping in a short time frame and are likely to be widely implemented in the field of diagnostic microbiology in the next few years, particularly for epidemiological investigations (replacing current typing methods) and the characterization of resistance determinants. Clinical microbiologists need to familiarize themselves with these technologies and their applications. PMID:23408689
Systems Biology of the Vervet Monkey

PubMed Central

Jasinska, Anna J.; Schmitt, Christopher A.; Service, Susan K.; Cantor, Rita M.; Dewar, Ken; Jentsch, James D.; Kaplan, Jay R.; Turner, Trudy R.; Warren, Wesley C.; Weinstock, George M.; Woods, Roger P.; Freimer, Nelson B.

2013-01-01

Nonhuman primates (NHP) provide crucial biomedical model systems intermediate between rodents and humans. The vervet monkey (also called the African green monkey) is a widely used NHP model that has unique value for genetic and genomic investigations of traits relevant to human diseases. This article describes the phylogeny and population history of the vervet monkey and summarizes the use of both captive and wild vervet monkeys in biomedical research. It also discusses the effort of an international collaboration to develop the vervet monkey as the most comprehensively phenotypically and genomically characterized NHP, a process that will enable the scientific community to employ this model for systems biology investigations. PMID:24174437
Towards a complete map of the human long non-coding RNA transcriptome.

PubMed

Uszczynska-Ratajczak, Barbara; Lagarde, Julien; Frankish, Adam; Guigó, Roderic; Johnson, Rory

2018-05-23

Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.
Live single-cell laser tag.

PubMed

Binan, Loïc; Mazzaferri, Javier; Choquet, Karine; Lorenzo, Louis-Etienne; Wang, Yu Chang; Affar, El Bachir; De Koninck, Yves; Ragoussis, Jiannis; Kleinman, Claudia L; Costantino, Santiago

2016-05-20

The ability to conduct image-based, non-invasive cell tagging, independent of genetic engineering, is key to cell biology applications. Here we introduce cell labelling via photobleaching (CLaP), a method that enables instant, specific tagging of individual cells based on a wide array of criteria such as shape, behaviour or positional information. CLaP uses laser illumination to crosslink biotin onto the plasma membrane, coupled with streptavidin conjugates to label individual cells for genomic, cell-tracking, flow cytometry or ultra-microscopy applications. We show that the incorporated mark is stable, non-toxic, retained for several days, and transferred by cell division but not to adjacent cells in culture. To demonstrate the potential of CLaP for genomic applications, we combine CLaP with microfluidics-based single-cell capture followed by transcriptome-wide next-generation sequencing. Finally, we show that CLaP can also be exploited for inducing transient cell adhesion to substrates for microengineering cultures with spatially patterned cell types.
Epigenetics: the language of the cell?

PubMed

Huang, Biao; Jiang, Cizhong; Zhang, Rongxin

2014-02-01

Epigenetics is one of the most rapidly developing fields of biological research. Breakthroughs in several technologies have made genome-wide epigenetic research possible, for example the mapping of human genome-wide DNA methylation. In addition, with the development of various high-throughput and high-resolution sequencing technologies, a large number of functional noncoding RNAs (ncRNAs) have been identified. Numerous studies have indicated that these functional ncRNAs also play an important role in epigenetics. In this review, we gain inspiration from the recent proposal of the ceRNA hypothesis. This hypothesis proposes that miRNAs act as a language of communication. Accordingly, we further deduce that all of epigenetics may functionally acquire such a unique language characteristic. In summary, various epigenetic markers may not only participate in regulating cellular processes, but may also act as the intracellular 'language' of communication and be involved in extensive information exchanges within the cell.
Application of the stepwise focusing method to optimize the cost-effectiveness of genome-wide association studies with limited research budgets for genotyping and phenotyping.

PubMed

Ohashi, J; Clark, A G

2005-05-01

The recent cataloguing of a large number of SNPs enables us to perform genome-wide association studies for detecting common genetic variants associated with disease. Such studies, however, generally have limited research budgets for genotyping and phenotyping. It is therefore necessary to optimize the study design by determining the most cost-effective numbers of SNPs and individuals to analyze. In this report we applied the stepwise focusing method, with two-stage design, developed by Satagopan et al. (2002) and Saito & Kamatani (2002), to optimize the cost-effectiveness of a genome-wide direct association study using a transmission/disequilibrium test (TDT). The stepwise focusing method consists of two steps: a large number of SNPs are examined in the first focusing step, and then all the SNPs showing a significant P-value are tested again using a larger set of individuals in the second focusing step. In the framework of optimization, the numbers of SNPs and families and the significance levels in the first and second steps were regarded as variables to be considered. Our results showed that the stepwise focusing method achieves a distinct gain of power compared to a conventional method with the same research budget.
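The two focusing steps described in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy P-values, the sample sizes and the per-genotype cost model are all hypothetical, and the sketch takes P-values as given rather than computing a TDT.

```python
from typing import Callable, Dict, List


def stepwise_focus(
    stage1_p: Dict[str, float],
    stage2_p: Callable[[str], float],
    alpha1: float,
    alpha2: float,
) -> List[str]:
    """Hypothetical two-step screen: filter at alpha1, retest survivors at alpha2."""
    # First focusing step: every SNP is tested on the smaller sample.
    carried = [snp for snp, p in stage1_p.items() if p < alpha1]
    # Second focusing step: only the carried SNPs are retested on the larger sample.
    return [snp for snp in carried if stage2_p(snp) < alpha2]


def genotyping_cost(n_snps, n_stage1, n_carried, n_stage2, unit_cost=1.0):
    """Assumed cost model: number of genotypes times a flat per-genotype cost."""
    return unit_cost * (n_snps * n_stage1 + n_carried * n_stage2)


# Toy P-values, invented for illustration only.
stage1 = {"rs1": 0.001, "rs2": 0.20, "rs3": 0.03, "rs4": 0.60}
stage2 = {"rs1": 0.0001, "rs3": 0.40}

hits = stepwise_focus(stage1, lambda snp: stage2[snp], alpha1=0.05, alpha2=0.001)
# Two-step design: 4 SNPs on 100 samples, then 2 carried SNPs on 400 samples.
two_stage = genotyping_cost(n_snps=4, n_stage1=100, n_carried=2, n_stage2=400)
# Conventional design: all 4 SNPs on 500 samples.
one_stage = genotyping_cost(n_snps=4, n_stage1=500, n_carried=0, n_stage2=0)
print(hits, two_stage, one_stage)
```

In an optimization of the kind the abstract describes, the two significance levels and the two sample sizes would be the variables tuned against a fixed budget; here they are fixed by hand purely to show the data flow.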
Genome-wide identification of bacterial plant colonization genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cole, Benjamin J.; Feltcher, Meghan E.; Waters, Robert J.

Diverse soil-resident bacteria can contribute to plant growth and health, but the molecular mechanisms enabling them to effectively colonize their plant hosts remain poorly understood. We used randomly barcoded transposon mutagenesis sequencing (RB-TnSeq) in Pseudomonas simiae, a model root-colonizing bacterium, to establish a genome-wide map of bacterial genes required for colonization of the Arabidopsis thaliana root system. We identified 115 genes (2% of all P. simiae genes) with functions that are required for maximal competitive colonization of the root system. Among the genes we identified were some with obvious colonization-related roles in motility and carbon metabolism, as well as 44 other genes that had no or vague functional predictions. Independent validation assays of individual genes confirmed colonization functions for 20 of 22 (91%) cases tested. To further characterize genes identified by our screen, we compared the functional contributions of P. simiae genes to growth in 90 distinct in vitro conditions by RB-TnSeq, highlighting specific metabolic functions associated with root colonization genes. Here, our analysis of bacterial genes by sequence-driven saturation mutagenesis revealed a genome-wide map of the genetic determinants of plant root colonization and offers a starting point for targeted improvement of the colonization capabilities of plant-beneficial microbes.
A Panel of Ancestry Informative Markers for the Complex Five-Way Admixed South African Coloured Population

PubMed Central

Daya, Michelle; van der Merwe, Lize; Galal, Ushma; Möller, Marlo; Salie, Muneeb; Chimusa, Emile R.; Galanter, Joshua M.; van Helden, Paul D.; Henn, Brenna M.; Gignoux, Chris R.; Hoal, Eileen

2013-01-01

Admixture is a well-known confounder in genetic association studies. If genome-wide data are not available, as would be the case for candidate gene studies, ancestry informative markers (AIMs) are required in order to adjust for admixture. The predominant population group in the Western Cape, South Africa, is the admixed group known as the South African Coloured (SAC). A small set of AIMs that is optimized to distinguish between the five source populations of this population (African San, African non-San, European, South Asian, and East Asian) will enable researchers to cost-effectively reduce false-positive findings resulting from ignoring admixture in genetic association studies of the population. Using genome-wide data to find SNPs with large allele frequency differences between the source populations of the SAC, as quantified by Rosenberg et al.'s In-statistic, we developed a panel of AIMs by experimenting with various selection strategies. Subsets of different sizes were evaluated by measuring the correlation between ancestry proportions estimated by each AIM subset with ancestry proportions estimated using genome-wide data. We show that a panel of 96 AIMs can be used to assess ancestry proportions and to adjust for the confounding effect of the complex five-way admixture that occurred in the South African Coloured population. PMID:24376522
Genome-wide identification of bacterial plant colonization genes

DOE PAGES

Cole, Benjamin J.; Feltcher, Meghan E.; Waters, Robert J.; ...

2017-09-22

Diverse soil-resident bacteria can contribute to plant growth and health, but the molecular mechanisms enabling them to effectively colonize their plant hosts remain poorly understood. We used randomly barcoded transposon mutagenesis sequencing (RB-TnSeq) in Pseudomonas simiae, a model root-colonizing bacterium, to establish a genome-wide map of bacterial genes required for colonization of the Arabidopsis thaliana root system. We identified 115 genes (2% of all P. simiae genes) with functions that are required for maximal competitive colonization of the root system. Among the genes we identified were some with obvious colonization-related roles in motility and carbon metabolism, as well as 44 other genes that had no or vague functional predictions. Independent validation assays of individual genes confirmed colonization functions for 20 of 22 (91%) cases tested. To further characterize genes identified by our screen, we compared the functional contributions of P. simiae genes to growth in 90 distinct in vitro conditions by RB-TnSeq, highlighting specific metabolic functions associated with root colonization genes. Here, our analysis of bacterial genes by sequence-driven saturation mutagenesis revealed a genome-wide map of the genetic determinants of plant root colonization and offers a starting point for targeted improvement of the colonization capabilities of plant-beneficial microbes.
A draft annotation and overview of the human genome

PubMed Central

Wright, Fred A; Lemon, William J; Zhao, Wei D; Sears, Russell; Zhuo, Degen; Wang, Jian-Ping; Yang, Hee-Yung; Baer, Troy; Stredney, Don; Spitzner, Joe; Stutz, Al; Krahe, Ralf; Yuan, Bo

2001-01-01

Background: The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. Results: We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. Conclusions: We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence. PMID:11516338
Microsatellite Interruptions Stabilize Primate Genomes and Exist as Population-Specific Single Nucleotide Polymorphisms within Individual Human Genomes

PubMed Central

Ananda, Guruprasad; Hile, Suzanne E.; Breski, Amanda; Wang, Yanli; Kelkar, Yogeshwar; Makova, Kateryna D.; Eckert, Kristin A.

2014-01-01

Interruptions of microsatellite sequences impact genome evolution and can alter disease manifestation. However, human polymorphism levels at interrupted microsatellites (iMSs) are not known at a genome-wide scale, and the pathways for gaining interruptions are poorly understood. Using the 1000 Genomes Phase-1 variant call set, we interrogated mono-, di-, tri-, and tetranucleotide repeats up to 10 units in length. We detected ∼26,000–40,000 iMSs within each of four human population groups (African, European, East Asian, and American). We identified population-specific iMSs within exonic regions, and discovered that known disease-associated iMSs contain alleles present at differing frequencies among the populations. By analyzing longer microsatellites in primate genomes, we demonstrate that single interruptions result in a genome-wide average two- to six-fold reduction in microsatellite mutability, as compared with perfect microsatellites. Centrally located interruptions lowered mutability dramatically, by two to three orders of magnitude. Using a biochemical approach, we tested directly whether the mutability of a specific iMS is lower because of decreased DNA polymerase strand slippage errors. Modeling the adenomatous polyposis coli tumor suppressor gene sequence, we observed that a single base substitution interruption reduced strand slippage error rates five- to 50-fold, relative to a perfect repeat, during synthesis by DNA polymerases α, β, or η. Computationally, we demonstrate that iMSs arise primarily by base substitution mutations within individual human genomes. Our biochemical survey of human DNA polymerase α, β, δ, κ, and η error rates within certain microsatellites suggests that interruptions are created most frequently by low fidelity polymerases. Our combined computational and biochemical results demonstrate that iMSs are abundant in human genomes and are sources of population-specific genetic variation that may affect genome stability. The genome-wide identification of iMSs in human populations presented here has important implications for current models describing the impact of microsatellite polymorphisms on gene expression. PMID:25033203
Genome-Wide Patterns of Polymorphism in an Inbred Line of the African Malaria Mosquito Anopheles gambiae

PubMed Central

Turissini, David A.; Gamez, Stephanie; White, Bradley J.

2014-01-01

Anopheles gambiae is a major mosquito vector of malaria in Africa. Although increased use of insecticide-based vector control tools has decreased malaria transmission, elimination is likely to require novel genetic control strategies. It can be argued that the absence of an A. gambiae inbred line has slowed progress toward genetic vector control. In order to empower genetic studies and enable precise and reproducible experimentation, we set out to create an inbred line of this species. We found that amenability to inbreeding varied between populations of A. gambiae. After full-sib inbreeding for ten generations, we genotyped 112 individuals—56 saved prior to inbreeding and 56 collected after inbreeding—at a genome-wide panel of single nucleotide polymorphisms (SNPs). Although inbreeding dramatically reduced diversity across much of the genome, we discovered numerous, discrete genomic blocks that maintained high heterozygosity. For one large genomic region, we were able to definitively show that high diversity is due to the persistent polymorphism of a chromosomal inversion. Inbred lines in other eukaryotes often exhibit a qualitatively similar retention of polymorphism when typed at a small number of markers. Our whole-genome SNP data provide the first strong, empirical evidence supporting associative overdominance as the mechanism maintaining higher than expected diversity in inbred lines. Although creation of A. gambiae lines devoid of nearly all polymorphism may not be feasible, our results provide critical insights into how more fully isogenic lines can be created. PMID:25377942
The rapid evolution of molecular genetic diagnostics in neuromuscular diseases.

PubMed

Volk, Alexander E; Kubisch, Christian

2017-10-01

The development of massively parallel sequencing (MPS) has revolutionized molecular genetic diagnostics in monogenic disorders. The present review gives a brief overview of different MPS-based approaches used in clinical diagnostics of neuromuscular disorders (NMDs) and highlights their advantages and limitations. MPS-based approaches like gene panel sequencing, (whole) exome sequencing, (whole) genome sequencing, and RNA sequencing have been used to identify the genetic cause in NMDs. Although gene panel sequencing has evolved as a standard test for heterogeneous diseases, it is still debated, mainly because of financial issues and unsolved problems of variant interpretation, whether genome sequencing (and to a lesser extent also exome sequencing) of single patients can already be regarded as routine diagnostics. However, it has been shown that the inclusion of parents and additional family members often leads to a substantial increase in the diagnostic yield in exome-wide/genome-wide MPS approaches. In addition, MPS-based RNA sequencing is just entering the research and diagnostic scene. Next-generation sequencing increasingly enables the detection of the genetic cause in highly heterogeneous diseases like NMDs in an efficient and affordable way. Gene panel sequencing and family-based exome sequencing have proven to be potent and cost-efficient diagnostic tools. Although clinical validation and interpretation of genome sequencing are still challenging, diagnostic RNA sequencing represents a promising tool to bypass some hurdles of diagnostics using genomic DNA.
Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants

PubMed Central

Unamba, Chibuikem I. N.; Nag, Akshay; Sharma, Ram K.

2015-01-01

Non-model plants, i.e., species with one or more characteristics such as a long life cycle, difficulty of cultivation in the laboratory, or poor fecundity, were previously left out of sequencing projects because of the high running cost of Sanger sequencing. Consequently, information about their genomics and key biological processes is inadequate. However, the recent advent of fast and cost-effective next generation sequencing (NGS) platforms has enabled the unearthing of certain characteristic gene structures unique to these species. It has also aided in gaining insight into the mechanisms underlying gene expression and secondary metabolism, and has facilitated the development of genomic resources for diversity characterization, evolutionary analysis and marker-assisted breeding, even without prior availability of genomic sequence information. In this review we explore how different NGS platforms, as well as recent advances in NGS-based high-throughput genotyping technologies, are rewarding efforts on de novo whole genome/transcriptome sequencing and the development of genome-wide sequence-based marker resources for improvement of non-model crops that are less costly than phenotyping. PMID:26734016
Population-based structural variation discovery with Hydra-Multi.

PubMed

Lindberg, Michael R; Hall, Ira M; Quinlan, Aaron R

2015-04-15

Current strategies for SNP and INDEL discovery incorporate sequence alignments from multiple individuals to maximize sensitivity and specificity. It is widely accepted that this approach also improves structural variant (SV) detection. However, multisample SV analysis has been stymied by the fundamental difficulties of SV calling, e.g. library insert size variability, SV alignment signal integration and detecting long-range genomic rearrangements involving disjoint loci. Extant tools suffer from poor scalability, which limits the number of genomes that can be co-analyzed and complicates analysis workflows. We have developed an approach that enables multisample SV analysis in hundreds to thousands of human genomes using commodity hardware. Here, we describe Hydra-Multi and measure its accuracy, speed and scalability using publicly available datasets provided by The 1000 Genomes Project and by The Cancer Genome Atlas (TCGA). Hydra-Multi is written in C++ and is freely available at https://github.com/arq5x/Hydra. Contact: aaronquinlan@gmail.com or ihall@genome.wustl.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Ebolavirus comparative genomics

PubMed Central

Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.

2015-01-01

The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035
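The oligomer frequency analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a simple overlapping k-mer count and a Euclidean distance between frequency profiles; the function names, the choice of k, and the toy sequences are illustrative assumptions, not the authors' actual pipeline or data.

```python
from collections import Counter


def oligomer_frequencies(sequence, k=4):
    """Return relative frequencies of all overlapping k-mers in a sequence."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def frequency_distance(freq_a, freq_b):
    """Euclidean distance between two k-mer frequency profiles."""
    kmers = set(freq_a) | set(freq_b)
    return sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2 for m in kmers) ** 0.5


# Toy sequences (hypothetical, not real viral genomes).
profile_a = oligomer_frequencies("ATGCATGCATGC", k=2)
profile_b = oligomer_frequencies("ATATATATATAT", k=2)
distance_ab = frequency_distance(profile_a, profile_b)
```

Genomes with similar oligomer usage yield a small distance; grouping genomes by such distances is one plausible way to see a family separate from other sequences.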
Flow cytometry sorting of nuclei enables the first global characterization of Paramecium germline DNA and transposable elements.

PubMed

Guérin, Frédéric; Arnaiz, Olivier; Boggetto, Nicole; Denby Wilkes, Cyril; Meyer, Eric; Sperling, Linda; Duharcourt, Sandra

2017-04-26

DNA elimination is developmentally programmed in a wide variety of eukaryotes, including unicellular ciliates, and leads to the generation of distinct germline and somatic genomes. The ciliate Paramecium tetraurelia harbors two types of nuclei with different functions and genome structures. The transcriptionally inactive micronucleus contains the complete germline genome, while the somatic macronucleus contains a reduced genome streamlined for gene expression. During development of the somatic macronucleus, the germline genome undergoes massive and reproducible DNA elimination events. Availability of both the somatic and germline genomes is essential to examine the genome changes that occur during programmed DNA elimination and ultimately decipher the mechanisms underlying the specific removal of germline-limited sequences. We developed a novel experimental approach that uses flow cell imaging and flow cytometry to sort subpopulations of nuclei to high purity. We sorted vegetative micronuclei and macronuclei during development of P. tetraurelia. We validated the method by flow cell imaging and by high throughput DNA sequencing. Our work establishes the proof of principle that developing somatic macronuclei can be sorted from a complex biological sample to high purity based on their size, shape and DNA content. This method enabled us to sequence, for the first time, the germline DNA from pure micronuclei and to identify novel transposable elements. Sequencing the germline DNA confirms that the Pgm domesticated transposase is required for the excision of all ~45,000 Internal Eliminated Sequences. Comparison of the germline DNA and unrearranged DNA obtained from PGM-silenced cells reveals that the latter does not provide a faithful representation of the germline genome. We developed a flow cytometry-based method to purify P. tetraurelia nuclei to high purity and provided quality control with flow cell imaging and high throughput DNA sequencing. 
We identified 61 germline transposable elements including the first Paramecium retrotransposons. This approach paves the way to sequence the germline genomes of P. aurelia sibling species for future comparative genomic studies.
Best practices for mapping replication origins in eukaryotic chromosomes.

PubMed

Besnard, Emilie; Desprat, Romain; Ryan, Michael; Kahli, Malik; Aladjem, Mirit I; Lemaitre, Jean-Marc

2014-09-02

Understanding the regulatory principles ensuring complete DNA replication in each cell division is critical for deciphering the mechanisms that maintain genomic stability. Recent advances in genome sequencing technology facilitated complete mapping of DNA replication sites and helped move the field from observing replication patterns at a handful of single loci to analyzing replication patterns genome-wide. These advances address issues, such as the relationship between replication initiation events, transcription, and chromatin modifications, and identify potential replication origin consensus sequences. This unit summarizes the technological and fundamental aspects of replication profiling and briefly discusses novel insights emerging from mining large datasets, published in the last 3 years, and also describes DNA replication dynamics on a whole-genome scale. Copyright © 2014 John Wiley & Sons, Inc.
QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks.

PubMed

Thibodeau, Asa; Márquez, Eladio J; Luo, Oscar; Ruan, Yijun; Menghi, Francesca; Shin, Dong-Guk; Stitzel, Michael L; Vera-Licona, Paola; Ucar, Duygu

2016-06-01

Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. QuIN's web server is available at http://quin.jax.org QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database and the source code is available under the GPLV3 license available on GitHub: https://github.com/UcarLab/QuIN/.
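The network representation described in the abstract above can be sketched in a few lines. This is a generic illustration of building an undirected interaction network, ranking nodes by degree, and collecting direct and indirect partners of a queried element; it is not QuIN's code or API, and the node names are hypothetical.

```python
from collections import defaultdict, deque


def build_network(interactions):
    """Build an undirected adjacency map from (anchor_a, anchor_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree(graph):
    """Number of direct interaction partners per node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def reachable_from(graph, start):
    """Breadth-first search: the start node plus all direct and indirect partners."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical interaction pairs, e.g. as might be derived from ChIA-PET or Hi-C calls.
interactions = [
    ("enhancer_1", "promoter_A"),
    ("enhancer_1", "promoter_B"),
    ("enhancer_2", "promoter_B"),
    ("enhancer_3", "promoter_C"),
]
network = build_network(interactions)
degrees = degree(network)
component = reachable_from(network, "promoter_A")
```

Degree is only the simplest network measure; a real prioritization would likely combine several centrality measures with annotation data.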
Atlas2 Cloud: a framework for personal genome analysis in the cloud

PubMed Central

2012-01-01

Background: Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. Results: We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. Conclusions: We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms. PMID:23134663
Atlas2 Cloud: a framework for personal genome analysis in the cloud.

PubMed

Evani, Uday S; Challis, Danny; Yu, Jin; Jackson, Andrew R; Paithankar, Sameer; Bainbridge, Matthew N; Jakkamsetti, Adinarayana; Pham, Peter; Coarfa, Cristian; Milosavljevic, Aleksandar; Yu, Fuli

2012-01-01

Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Read clouds uncover variation in complex regions of the human genome.

PubMed

Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim

2015-10-01

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.
Artificial selection on introduced Asian haplotypes shaped the genetic architecture in European commercial pigs.

PubMed

Bosse, Mirte; Lopes, Marcos S; Madsen, Ole; Megens, Hendrik-Jan; Crooijmans, Richard P M A; Frantz, Laurent A F; Harlizius, Barbara; Bastiaansen, John W M; Groenen, Martien A M

2015-12-22

Early pig farmers in Europe imported Asian pigs to cross with their local breeds in order to improve traits of commercial interest. Current genomics techniques enabled genome-wide identification of these Asian introgressed haplotypes in modern European pig breeds. We propose that the Asian variants are still present because they affect phenotypes that were important for ancient traditional, as well as recent, commercial pig breeding. Genome-wide introgression levels were only weakly correlated with gene content and recombination frequency. However, regions with an excess or absence of Asian haplotypes (AS) contained genes that were previously identified as phenotypically important such as FASN, ME1, and KIT. Therefore, the Asian alleles are thought to have an effect on phenotypes that were historically under selection. We aimed to estimate the effect of AS in introgressed regions in Large White pigs on the traits of backfat (BF) and litter size. The majority of regions we tested that retained Asian deoxyribonucleic acid (DNA) showed significantly increased BF from the Asian alleles. Our results suggest that the introgression in Large White pigs has been strongly determined by the selective pressure acting upon the introgressed AS. We therefore conclude that human-driven hybridization and selection contributed to the genomic architecture of these commercial pigs. © 2015 The Author(s).
Integration of Genomic and Other Epidemiologic Data to Investigate and Control a Cross-Institutional Outbreak of Streptococcus pyogenes.

PubMed

Chalker, Victoria J; Smith, Alyson; Al-Shahib, Ali; Botchway, Stella; Macdonald, Emily; Daniel, Roger; Phillips, Sarah; Platt, Steven; Doumith, Michel; Tewolde, Rediat; Coelho, Juliana; Jolley, Keith A; Underwood, Anthony; McCarthy, Noel D

2016-06-01

Single-strain outbreaks of Streptococcus pyogenes infections are common and often go undetected. In 2013, two clusters of invasive group A Streptococcus (iGAS) infection were identified in independent but closely located care homes in Oxfordshire, United Kingdom. Investigation included visits to each home, chart review, staff survey, microbiologic sampling, and genome sequencing. S. pyogenes emm type 1.0, the most common circulating type nationally, was identified from all cases yielding GAS isolates. A tailored whole-genome reference population comprising epidemiologically relevant contemporaneous isolates and published isolates was assembled. Data were analyzed independently using whole-genome multilocus sequencing and single-nucleotide polymorphism analyses. Six isolates from staff and residents of the homes formed a single cluster that was separated from the reference population by both analytical approaches. No further cases occurred after mass chemoprophylaxis and enhanced infection control. Our findings demonstrate the ability of 2 independent analytical approaches to enable robust conclusions from nonstandardized whole-genome analysis to support public health practice.
A Hierarchical Framework for State-Space Matrix Inference and Clustering.

PubMed

Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J; Bresnick, Emery H; Keleş, Sündüz

2016-09-01

In recent years, a large number of genomic and epigenomic studies have been focusing on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criterion. In our data-driven simulation studies, MBASIC showed significant accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types.
The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its endogenous locus by utilizing transcription factor occupancy data and illustrated applicability of MBASIC in a wide variety of problems. In both studies, MBASIC showed higher levels of raw data fidelity than analyzing these data with a two-step approach using ENCODE results on transcription factor occupancy data.
Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies

PubMed Central

Manitz, Juliane; Burger, Patricia; Amos, Christopher I.; Chang-Claude, Jenny; Wichmann, Heinz-Erich; Kneib, Thomas; Bickeböller, Heike

2017-01-01

The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility. PMID:28785300
Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies.

PubMed

Friedrichs, Stefanie; Manitz, Juliane; Burger, Patricia; Amos, Christopher I; Risch, Angela; Chang-Claude, Jenny; Wichmann, Heinz-Erich; Kneib, Thomas; Bickeböller, Heike; Hofner, Benjamin

2017-01-01

The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility.
Metabolic Engineering for Probiotics and their Genome-Wide Expression Profiling.

PubMed

Yadav, Ruby; Singh, Puneet K; Shukla, Pratyoosh

2018-01-01

Probiotic supplements in the food industry have attracted a lot of attention, and the field has shown remarkable growth. Metabolic engineering (ME) approaches enable understanding of their mechanism of action and increase the possibility of designing probiotic strains with desired functions. Probiotic microorganisms are generally referred to as industrially important lactic acid bacteria (LAB), which are involved in fermenting dairy products, food and beverages and produce lactic acid as the final product. A number of illustrations of metabolic engineering approaches in industrial probiotic bacteria are described in this review, including transcriptomic studies of Lactobacillus reuteri and improvement in exopolysaccharide (EPS) biosynthesis yield in Lactobacillus casei LC2W. This review summarizes various metabolic engineering approaches for exploring metabolic pathways. These approaches enable evaluation of the cellular metabolic state and effective editing of the microbial genome or introduction of novel enzymes to redirect carbon fluxes. In addition, various systems biology tools, such as in silico design, that are commonly used for improving strain performance are also discussed. Finally, we discuss the integration of metabolic engineering and genome profiling, which offers a new way to explore metabolic interactions, fluxomics and probiogenomics using probiotic bacteria such as Bifidobacterium spp. and Lactobacillus spp. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
A MBD-seq protocol for large-scale methylome-wide studies with (very) low amounts of DNA.

PubMed

Aberg, Karolina A; Chan, Robin F; Shabalin, Andrey A; Zhao, Min; Turecki, Gustavo; Staunstrup, Nicklas Heine; Starnawska, Anna; Mors, Ole; Xie, Lin Y; van den Oord, Edwin Jcg

2017-09-01

We recently showed that, after optimization, our methyl-CpG binding domain sequencing (MBD-seq) application approximates the methylome-wide coverage obtained with whole-genome bisulfite sequencing (WGB-seq), but at a cost that enables adequately powered large-scale association studies. A prior drawback of MBD-seq is the relatively large amount of genomic DNA (ideally >1 µg) required to obtain high-quality data. Biomaterials are typically expensive to collect, provide a finite amount of DNA, and may simply not yield sufficient starting material. The ability to use low amounts of DNA will increase the breadth and number of studies that can be conducted. Therefore, we further optimized the enrichment step. With this low starting material protocol, MBD-seq performed equally well, or better, than the protocol requiring ample starting material (>1 µg). Using only 15 ng of DNA as input, there is minimal loss in data quality, achieving 93% of the coverage of WGB-seq (with standard amounts of input DNA) at similar false/positive rates. Furthermore, across a large number of genomic features, the MBD-seq methylation profiles closely tracked those observed for WGB-seq with even slightly larger effect sizes. This suggests that MBD-seq provides similar information about the methylome and classifies methylation status somewhat more accurately. Performance decreases with <15 ng DNA as starting material but, even with as little as 5 ng, MBD-seq still achieves 90% of the coverage of WGB-seq with comparable genome-wide methylation profiles. Thus, the proposed protocol is an attractive option for adequately powered and cost-effective methylome-wide investigations using (very) low amounts of DNA.
Genome-wide patterns of promoter sharing and co-expression in bovine skeletal muscle.

PubMed

Gu, Quan; Nagaraj, Shivashankar H; Hudson, Nicholas J; Dalrymple, Brian P; Reverter, Antonio

2011-01-12

Gene regulation by transcription factors (TF) is species, tissue and time specific. To better understand how the genetic code controls gene expression in bovine muscle we associated gene expression data from developing Longissimus thoracis et lumborum skeletal muscle with bovine promoter sequence information. We created a highly conserved genome-wide promoter landscape comprising 87,408 interactions relating 333 TFs with their 9,242 predicted target genes (TGs). We discovered that the complete set of predicted TGs share an average of 2.75 predicted TF binding sites (TFBSs) and that the average co-expression between a TF and its predicted TGs is higher than the average co-expression between the same TF and all genes. Conversely, pairs of TFs sharing predicted TGs showed a co-expression correlation higher than pairs of TFs not sharing TGs. Finally, we exploited the co-occurrence of predicted TFBS in the context of muscle-derived functionally-coherent modules including cell cycle, mitochondria, immune system, fat metabolism, muscle/glycolysis, and ribosome. Our findings enabled us to reverse engineer a regulatory network of core processes, and correctly identified the involvement of E2F1, GATA2 and NFKB1 in the regulation of cell cycle, fat, and muscle/glycolysis, respectively. The pivotal implication of our research is two-fold: (1) there exists a robust genome-wide expression signal between TFs and their predicted TGs in cattle muscle consistent with the extent of promoter sharing; and (2) this signal can be exploited to recover the cellular mechanisms underpinning transcription regulation of muscle structure and development in bovine. Our study represents the first genome-wide report linking tissue specific co-expression to co-regulation in a non-model vertebrate.
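The comparison in the abstract above (average co-expression of a TF with its predicted targets versus with all genes) can be sketched as follows. This assumes co-expression is measured as a Pearson correlation across samples; the gene names and expression values are hypothetical toy data, and the study's actual measure may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_coexpression(tf, genes, expression):
    """Average correlation between a TF and each gene in `genes` (excluding the TF)."""
    values = [pearson(expression[tf], expression[g]) for g in genes if g != tf]
    return sum(values) / len(values)


# Hypothetical expression profiles over four samples (e.g. developmental time points).
expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [1.0, 3.0, 2.0, 5.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
predicted_targets = ["geneA", "geneB"]  # assumed TF1 targets from promoter analysis
all_genes = ["geneA", "geneB", "geneC"]

with_targets = mean_coexpression("TF1", predicted_targets, expression)
with_all = mean_coexpression("TF1", all_genes, expression)
```

In this toy case the TF is more strongly co-expressed with its predicted targets than with the full gene set, which is the direction of the signal the abstract reports.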

Comprehensive Genome-Wide Classification Reveals That Many Plant-Specific Transcription Factors Evolved in Streptophyte Algae

PubMed Central

Wilhelmsson, Per K I; Mühlich, Cornelia; Ullrich, Kristian K

2017-01-01

Plant genomes encode many lineage-specific, unique transcription factors. Expansion of such gene families has been previously found to coincide with the evolution of morphological complexity, although comparative analyses have been hampered by severe sampling bias. Here, we make use of the recently increased availability of plant genomes. We have updated and expanded previous rule sets for domain-based classification of transcription associated proteins (TAPs), comprising transcription factors and transcriptional regulators. The genome-wide annotation of these protein families has been analyzed and made available via the novel TAPscan web interface. We find that many TAP families previously thought to be specific for land plants actually evolved in streptophyte (charophyte) algae; 26 out of 36 TAP family gains are inferred to have occurred in the common ancestor of the Streptophyta (uniting the land plants—Embryophyta—with their closest algal relatives). In contrast, expansions of TAP families were found to occur throughout streptophyte evolution. 17 out of 76 expansion events were found to be common to all land plants and thus probably evolved concomitant with the water-to-land-transition. PMID:29216360
PEPIS: A Pipeline for Estimating Epistatic Effects in Quantitative Trait Locus Mapping and Genome-Wide Association Studies.

PubMed

Zhang, Wenchao; Dai, Xinbin; Wang, Qishan; Xu, Shizhong; Zhao, Patrick X

2016-05-01

The term epistasis refers to interactions between multiple genetic loci. Genetic epistasis is important in regulating biological function and is considered to explain part of the 'missing heritability,' which involves marginal genetic effects that cannot be accounted for in genome-wide association studies. Thus, the study of epistasis is of great interest to geneticists. However, estimating epistatic effects for quantitative traits is challenging due to the large number of interaction effects that must be estimated, thus significantly increasing computing demands. Here, we present a new web server-based tool, the Pipeline for estimating EPIStatic genetic effects (PEPIS), for analyzing polygenic epistatic effects. The PEPIS software package is based on a new linear mixed model that has been used to predict the performance of hybrid rice. The PEPIS includes two main sub-pipelines: the first for kinship matrix calculation, and the second for polygenic component analyses and genome scanning for main and epistatic effects. To accommodate the demand for high-performance computation, the PEPIS utilizes C/C++ for mathematical matrix computing. In addition, the modules for kinship matrix calculations and main and epistatic-effect genome scanning employ parallel computing technology that effectively utilizes multiple computer nodes across our networked cluster, thus significantly improving the computational speed. For example, when analyzing the same immortalized F2 rice population genotypic data examined in a previous study, the PEPIS returned results identical at each analysis step to those of the original prototype R code, but the computational time was reduced from more than one month to about five minutes. These advances will help overcome the bottleneck frequently encountered in genome-wide epistatic genetic effect analysis and enable accommodation of the high computational demand. The PEPIS is publicly available at http://bioinfo.noble.org/PolyGenic_QTL/.
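The kinship matrix step named in the abstract above can be illustrated with a minimal sketch. It assumes a marker-based additive kinship K = ZZᵀ/m for an n × m genotype matrix Z coded as -1/0/1, and approximates an additive-by-additive epistatic kinship by the element-wise (Hadamard) product of K with itself. Both the coding and the normalization are assumptions made for illustration; PEPIS's own formulas and its C/C++ implementation may differ.

```python
def additive_kinship(genotypes):
    """Additive kinship K = Z Z^T / m for an n x m genotype matrix (list of rows)."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [
        [sum(genotypes[i][k] * genotypes[j][k] for k in range(m)) / m for j in range(n)]
        for i in range(n)
    ]


def hadamard(a, b):
    """Element-wise product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical genotypes: 3 individuals x 4 markers, coded -1 / 0 / 1.
Z = [
    [1, 0, -1, 1],
    [1, 1, -1, 0],
    [-1, 0, 1, -1],
]
K_additive = additive_kinship(Z)
K_epistatic = hadamard(K_additive, K_additive)  # additive x additive approximation
```

The double loop is O(n²m), which hints at why a compiled, parallel implementation matters once n and m reach realistic sizes.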
Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study.

PubMed

de Vries, Paul S; Sabater-Lleal, Maria; Chasman, Daniel I; Trompet, Stella; Ahluwalia, Tarunveer S; Teumer, Alexander; Kleber, Marcus E; Chen, Ming-Huei; Wang, Jie Jin; Attia, John R; Marioni, Riccardo E; Steri, Maristella; Weng, Lu-Chen; Pool, Rene; Grossmann, Vera; Brody, Jennifer A; Venturini, Cristina; Tanaka, Toshiko; Rose, Lynda M; Oldmeadow, Christopher; Mazur, Johanna; Basu, Saonli; Frånberg, Mattias; Yang, Qiong; Ligthart, Symen; Hottenga, Jouke J; Rumley, Ann; Mulas, Antonella; de Craen, Anton J M; Grotevendt, Anne; Taylor, Kent D; Delgado, Graciela E; Kifley, Annette; Lopez, Lorna M; Berentzen, Tina L; Mangino, Massimo; Bandinelli, Stefania; Morrison, Alanna C; Hamsten, Anders; Tofler, Geoffrey; de Maat, Moniek P M; Draisma, Harmen H M; Lowe, Gordon D; Zoledziewska, Magdalena; Sattar, Naveed; Lackner, Karl J; Völker, Uwe; McKnight, Barbara; Huang, Jie; Holliday, Elizabeth G; McEvoy, Mark A; Starr, John M; Hysi, Pirro G; Hernandez, Dena G; Guan, Weihua; Rivadeneira, Fernando; McArdle, Wendy L; Slagboom, P Eline; Zeller, Tanja; Psaty, Bruce M; Uitterlinden, André G; de Geus, Eco J C; Stott, David J; Binder, Harald; Hofman, Albert; Franco, Oscar H; Rotter, Jerome I; Ferrucci, Luigi; Spector, Tim D; Deary, Ian J; März, Winfried; Greinacher, Andreas; Wild, Philipp S; Cucca, Francesco; Boomsma, Dorret I; Watkins, Hugh; Tang, Weihong; Ridker, Paul M; Jukema, Jan W; Scott, Rodney J; Mitchell, Paul; Hansen, Torben; O'Donnell, Christopher J; Smith, Nicholas L; Strachan, David P; Dehghan, Abbas

2017-01-01

An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10^-8 is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10^-8), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development.
Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

PubMed Central

de Vries, Paul S.; Sabater-Lleal, Maria; Chasman, Daniel I.; Trompet, Stella; Kleber, Marcus E.; Chen, Ming-Huei; Wang, Jie Jin; Attia, John R.; Marioni, Riccardo E.; Weng, Lu-Chen; Grossmann, Vera; Brody, Jennifer A.; Venturini, Cristina; Tanaka, Toshiko; Rose, Lynda M.; Oldmeadow, Christopher; Mazur, Johanna; Basu, Saonli; Yang, Qiong; Ligthart, Symen; Hottenga, Jouke J.; Rumley, Ann; Mulas, Antonella; de Craen, Anton J. M.; Grotevendt, Anne; Taylor, Kent D.; Delgado, Graciela E.; Kifley, Annette; Lopez, Lorna M.; Berentzen, Tina L.; Mangino, Massimo; Bandinelli, Stefania; Morrison, Alanna C.; Hamsten, Anders; Tofler, Geoffrey; de Maat, Moniek P. M.; Draisma, Harmen H. M.; Lowe, Gordon D.; Zoledziewska, Magdalena; Sattar, Naveed; Lackner, Karl J.; Völker, Uwe; McKnight, Barbara; Huang, Jie; Holliday, Elizabeth G.; McEvoy, Mark A.; Starr, John M.; Hysi, Pirro G.; Hernandez, Dena G.; Guan, Weihua; Rivadeneira, Fernando; McArdle, Wendy L.; Slagboom, P. Eline; Zeller, Tanja; Psaty, Bruce M.; Uitterlinden, André G.; de Geus, Eco J. C.; Stott, David J.; Binder, Harald; Hofman, Albert; Franco, Oscar H.; Rotter, Jerome I.; Ferrucci, Luigi; Spector, Tim D.; Deary, Ian J.; März, Winfried; Greinacher, Andreas; Wild, Philipp S.; Cucca, Francesco; Boomsma, Dorret I.; Watkins, Hugh; Tang, Weihong; Ridker, Paul M.; Jukema, Jan W.; Scott, Rodney J.; Mitchell, Paul; Hansen, Torben; O'Donnell, Christopher J.; Smith, Nicholas L.; Strachan, David P.

2017-01-01

An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10^−8 is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10^−8), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development. PMID:28107422
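The two significance thresholds in this abstract can be read as Bonferroni corrections for different numbers of independent tests. The sketch below works backwards from each threshold, assuming the conventional family-wise error rate of 0.05. That value is an assumption, since the abstract does not state it, and the function name is hypothetical.

```python
def implied_independent_tests(p_threshold: float, alpha: float = 0.05) -> float:
    """Number of independent tests for which p_threshold is the Bonferroni cutoff.

    Bonferroni sets p_threshold = alpha / n_tests, so n_tests = alpha / p_threshold.
    alpha = 0.05 is an assumed convention, not a value given in the abstract.
    """
    return alpha / p_threshold


for threshold in (5e-8, 2.5e-8):
    n_tests = implied_independent_tests(threshold)
    print(f"P < {threshold:.1e} corresponds to about {n_tests:,.0f} independent tests")
```

Under that assumption, halving the threshold amounts to correcting for twice as many independent tests, which is the sense in which the 1000G correction is stricter.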
Analysis of the Pantoea ananatis pan-genome reveals factors underlying its ability to colonize and interact with plant, insect and vertebrate hosts.

PubMed

De Maayer, Pieter; Chan, Wai Yin; Rubagotti, Enrico; Venter, Stephanus N; Toth, Ian K; Birch, Paul R J; Coutinho, Teresa A

2014-05-27

Pantoea ananatis is found in a wide range of natural environments, including water, soil, as part of the epi- and endophytic flora of various plant hosts, and in the insect gut. Some strains have proven effective as biological control agents and plant-growth promoters, while other strains have been implicated in diseases of a broad range of plant hosts and humans. By analysing the pan-genome of eight sequenced P. ananatis strains isolated from different sources we identified factors potentially underlying its ability to colonize and interact with hosts in both the plant and animal Kingdoms. The pan-genome of the eight compared P. ananatis strains consisted of a core genome comprised of 3,876 protein coding sequences (CDSs) and a sizeable accessory genome consisting of 1,690 CDSs. We estimate that ~106 unique CDSs would be added to the pan-genome with each additional P. ananatis genome sequenced in the future. The accessory fraction is derived mainly from integrated prophages and codes mostly for proteins of unknown function. Comparison of the translated CDSs on the P. ananatis pan-genome with the proteins encoded on all sequenced bacterial genomes currently available revealed that P. ananatis carries a number of CDSs with orthologs restricted to bacteria associated with distinct hosts, namely plant-, animal- and insect-associated bacteria. These CDSs encode proteins with putative roles in transport and metabolism of carbohydrate and amino acid substrates, adherence to host tissues, protection against plant and animal defense mechanisms and the biosynthesis of potential pathogenicity determinants including insecticidal peptides, phytotoxins and type VI secretion system effectors. P. ananatis has an 'open' pan-genome typical of bacterial species that colonize several different environments. The pan-genome incorporates a large number of genes encoding proteins that may enable P. ananatis to colonize, persist in and potentially cause disease symptoms in a wide range of plant and animal hosts.
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.

PubMed

Zhu, Lihua Julie; Lawrence, Michael; Gupta, Ankit; Pagès, Hervé; Kucukural, Alper; Garber, Manuel; Wolfe, Scot A

2017-05-15

Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. 
For each identified off-target, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as a cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity. The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html .
Off-target Effects in CRISPR/Cas9-mediated Genome Engineering

PubMed Central

Zhang, Xiao-Hui; Tee, Louis Y; Wang, Xiao-Gang; Huang, Qun-Shan; Yang, Shi-Hua

2015-01-01

CRISPR/Cas9 is a versatile genome-editing technology that is widely used for studying the functionality of genetic elements, creating genetically modified organisms, and preclinical research on genetic disorders. However, the high frequency of off-target activity (≥50%), that is, RGEN (RNA-guided endonuclease)-induced mutations at sites other than the intended on-target site, is one major concern, especially for therapeutic and clinical applications. Here, we review the basic mechanisms underlying off-target cutting in the CRISPR/Cas9 system, methods for detecting off-target mutations, and strategies for minimizing off-target cleavage. Improved specificity against off-target sites in the CRISPR/Cas9 system will provide solid genotype–phenotype correlations, and thus enable faithful interpretation of genome-editing data, which will certainly facilitate the basic and clinical application of this technology. PMID:26575098
Genome-wide identification of significant aberrations in cancer genome.

PubMed

Yuan, Xiguo; Yu, Guoqiang; Hou, Xuchu; Shih, Ie-Ming; Clarke, Robert; Zhang, Junying; Hoffman, Eric P; Wang, Roger R; Zhang, Zhen; Wang, Yue

2012-07-27

Somatic Copy Number Alterations (CNAs) in human genomes are present in almost all human cancers. Systematic efforts to characterize such structural variants must effectively distinguish significant consensus events from random background aberrations. Here we introduce Significant Aberration in Cancer (SAIC), a new method for characterizing and assessing the statistical significance of recurrent CNA units. Three main features of SAIC include: (1) exploiting the intrinsic correlation among consecutive probes to assign a score to each CNA unit instead of single probes; (2) performing permutations on CNA units that preserve correlations inherent in the copy number data; and (3) iteratively detecting Significant Copy Number Aberrations (SCAs) and estimating an unbiased null distribution by applying an SCA-exclusive permutation scheme. We test and compare the performance of SAIC against four peer methods (GISTIC, STAC, KC-SMART, CMDS) on a large number of simulation datasets. Experimental results show that SAIC outperforms peer methods in terms of larger area under the Receiver Operating Characteristics curve and increased detection power. We then apply SAIC to analyze structural genomic aberrations acquired in four real cancer genome-wide copy number data sets (ovarian cancer, metastatic prostate cancer, lung adenocarcinoma, glioblastoma). When compared with previously reported results, SAIC successfully identifies most SCAs known to be of biological significance and associated with oncogenes (e.g., KRAS, CCNE1, and MYC) or tumor suppressor genes (e.g., CDKN2A/B). Furthermore, SAIC identifies a number of novel SCAs in these copy number data that encompass tumor related genes and may warrant further studies. Supported by a well-grounded theoretical framework, SAIC has been developed and used to identify SCAs in various cancer copy number data sets, providing useful information to study the landscape of cancer genomes. 
Open-source and platform-independent SAIC software is implemented using C++, together with R scripts for data formatting and Perl scripts for user interfacing, and it is easy to install and efficient to use. The source code and documentation are freely available at http://www.cbil.ece.vt.edu/software.htm.
Advanced imaging techniques for the study of plant growth and development.

PubMed

Sozzani, Rosangela; Busch, Wolfgang; Spalding, Edgar P; Benfey, Philip N

2014-05-01

A variety of imaging methodologies are being used to collect data for quantitative studies of plant growth and development from living plants. Multi-level data, from macroscopic to molecular, and from weeks to seconds, can be acquired. Furthermore, advances in parallelized and automated image acquisition enable the throughput to capture images from large populations of plants under specific growth conditions. Image-processing capabilities allow for 3D or 4D reconstruction of image data and automated quantification of biological features. These advances facilitate the integration of imaging data with genome-wide molecular data to enable systems-level modeling. Copyright © 2013 Elsevier Ltd. All rights reserved.
Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.

Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.
Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

PubMed Central

Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.; Davenport, Karen W.; Bishop-Lilly, Kimberly A.; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P.; Chain, Patrick S.G.

2017-01-01

Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. PMID:27899609
Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

DOE PAGES

Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.; ...

2016-11-24

Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.
Inference of gene regulatory networks from genome-wide knockout fitness data

PubMed Central

Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

2013-01-01

Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only structural, but also dynamical and non-linear nature of biomolecular systems involved. A state–space model with non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. Unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data as well as empirical measurements of GAL network in yeast Saccharomyces cerevisiae and TyrR–LiuR network in bacteria Shewanella oneidensis. 
Availability: MATLAB code and datasets are available to download at http://www.duke.edu/~lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23271269
SUSCEPTIBILITY LOCI FOR UMBILICAL HERNIA IN SWINE DETECTED BY GENOME-WIDE ASSOCIATION.

PubMed

Liao, X J; Lia, L; Zhang, Z Y; Long, Y; Yang, B; Ruan, G R; Su, Y; Ai, H S; Zhang, W C; Deng, W Y; Xiao, S J; Ren, J; Ding, N S; Huang, L S

2015-10-01

Umbilical hernia (UH) is a complex disorder caused by both genetic and environmental factors. UH causes animal welfare problems and severe economic loss to the pig industry. Until now, the genetic basis of UH has been poorly understood. The high-density 60K porcine SNP array enables the rapid application of genome-wide association studies (GWAS) to identify genetic loci for phenotypic traits at genome-wide scale in pigs. The objective of this research was to identify susceptibility loci for swine umbilical hernia using the GWAS approach. We genotyped 478 piglets from 142 families representing three Western commercial breeds with the Illumina PorcineSNP60 BeadChip. Significant SNPs were then detected by GWAS using ROADTRIPS (Robust Association-Detection Test for Related Individuals with Population Substructure) software, based on a Bonferroni-corrected threshold (P = 1.67E-06) or suggestive threshold (P = 3.34E-05) and false discovery rate (FDR = 0.05). After quality control, 29,924 qualified SNPs and 472 piglets were used for GWAS. Two suggestive loci predisposing to pig UH were identified in the Duroc population, at 44.25 Mb on SSC2 (rs81358018, P = 3.34E-06, FDR = 0.049933) and at 45.90 Mb on SSC17 (rs81479278, P = 3.30E-06, FDR = 0.049933). No SNP was significantly associated with pig UH in either the Landrace or the Large White population. Furthermore, we carried out a meta-analysis in the combined pure-breed population containing all 472 piglets. rs81479278 (P = 1.16E-06, FDR = 0.022475) was found to be associated with pig UH at the genome-wide significance level. SRC was characterized as a plausible candidate gene for susceptibility to pig UH according to its genomic position and biological functions. To our knowledge, this study gives the first description of a GWAS identifying susceptibility loci for umbilical hernia in pigs. Our findings provide deeper insights into the genetic architecture of umbilical hernia in pigs.
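The abstract does not say how its two thresholds were derived, but both values are consistent with the common convention of dividing by the number of SNPs that passed quality control. The sketch below reproduces them under that assumption. The function names are hypothetical, and alpha = 0.05 is assumed rather than quoted.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Genome-wide significance cutoff: alpha divided by the number of tests."""
    return alpha / n_tests


def suggestive_threshold(n_tests: int) -> float:
    """Suggestive cutoff: one expected false positive per genome scan (1 / n_tests)."""
    return 1.0 / n_tests


n_snps = 29_924  # qualified SNPs after quality control, from the abstract

print(f"Bonferroni: P = {bonferroni_threshold(n_snps):.2E}")  # prints P = 1.67E-06
print(f"Suggestive: P = {suggestive_threshold(n_snps):.2E}")  # prints P = 3.34E-05
```

Both printed values match the thresholds reported in the abstract, which supports this reading of how they were chosen without confirming it.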
Medium-sized tandem repeats represent an abundant component of the Drosophila virilis genome.

PubMed

Abdurashitov, Murat A; Gonchar, Danila A; Chernukhin, Valery A; Tomilov, Victor N; Tomilova, Julia E; Schostak, Natalia G; Zatsepina, Olga G; Zelentsova, Elena S; Evgen'ev, Michael B; Degtyarev, Sergey K H

2013-11-09

Previously, we developed a simple method for carrying out a restriction enzyme analysis of eukaryotic DNA in silico, based on the known DNA sequences of the genomes. This method allows the user to calculate the lengths of all DNA fragments that are formed after a whole genome is digested at the theoretical recognition sites of a given restriction enzyme. A comparison of the observed peaks in distribution diagrams with the results of DNA cleavage performed in vitro using several restriction enzymes has shown good correspondence between the theoretical and experimental data in several cases. Here, we applied this approach to the annotated genome of Drosophila virilis, which is extremely rich in various repeats, and explored the combined approach to perform the restriction analysis of D. virilis DNA. This approach enabled us to reveal three abundant medium-sized tandem repeats within the D. virilis genome. While the 225 bp repeats were revealed previously in intergenic non-transcribed spacers between ribosomal genes of D. virilis, two other families comprised of 154 bp and 172 bp repeats had not been described. A Tandem Repeats Finder search demonstrated that the 154 bp and 172 bp units are organized in multiple clusters in the genome of D. virilis. Characteristically, only the 154 bp repeats, derived from a Helitron transposon, are transcribed. Using in silico digestion in combination with conventional restriction analysis and sequencing of repeated DNA fragments enabled us to isolate and characterize three highly abundant families of medium-sized repeats present in the D. virilis genome. These repeats comprise a significant portion of the genome and may have important roles in genome function and structural integrity. Therefore, we demonstrated an approach that makes it possible to investigate in detail the gross arrangement and expression of medium-sized repeats based on sequencing data, even in the case of incompletely assembled and/or annotated genomes.
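The in silico digestion described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions and is not the authors' software. It treats the sequence as linear, searches one strand only (sufficient for palindromic sites), and uses a toy sequence. The function name and the `cut_offset` parameter are hypothetical. The example site is EcoRI (G^AATTC), which cuts one base into its recognition sequence.

```python
from collections import Counter


def digest(sequence: str, site: str, cut_offset: int = 0) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the position of the cut within the recognition site
    (for example 1 for EcoRI, G^AATTC). Overlapping sites are all reported.
    """
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments that arise when a cut falls on a sequence end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


toy_sequence = "AAGAATTCGGGAATTCTT"  # toy data, not from D. virilis
fragments = digest(toy_sequence, "GAATTC", cut_offset=1)
print(fragments)           # fragment lengths in genome order
print(Counter(fragments))  # length distribution, the basis of a peak diagram
```

On a real genome, tandem repeats with one site per unit would show up as a sharp peak in the length distribution at the repeat unit size, which is the signal the abstract describes for the 154 bp, 172 bp and 225 bp families.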
Meta-analysis of Genome-wide Association Studies for Neuroticism, and the Polygenic Association With Major Depressive Disorder.

PubMed

de Moor, Marleen H M; van den Berg, Stéphanie M; Verweij, Karin J H; Krueger, Robert F; Luciano, Michelle; Arias Vasquez, Alejandro; Matteson, Lindsay K; Derringer, Jaime; Esko, Tõnu; Amin, Najaf; Gordon, Scott D; Hansell, Narelle K; Hart, Amy B; Seppälä, Ilkka; Huffman, Jennifer E; Konte, Bettina; Lahti, Jari; Lee, Minyoung; Miller, Mike; Nutile, Teresa; Tanaka, Toshiko; Teumer, Alexander; Viktorin, Alexander; Wedenoja, Juho; Abecasis, Goncalo R; Adkins, Daniel E; Agrawal, Arpana; Allik, Jüri; Appel, Katja; Bigdeli, Timothy B; Busonero, Fabio; Campbell, Harry; Costa, Paul T; Davey Smith, George; Davies, Gail; de Wit, Harriet; Ding, Jun; Engelhardt, Barbara E; Eriksson, Johan G; Fedko, Iryna O; Ferrucci, Luigi; Franke, Barbara; Giegling, Ina; Grucza, Richard; Hartmann, Annette M; Heath, Andrew C; Heinonen, Kati; Henders, Anjali K; Homuth, Georg; Hottenga, Jouke-Jan; Iacono, William G; Janzing, Joost; Jokela, Markus; Karlsson, Robert; Kemp, John P; Kirkpatrick, Matthew G; Latvala, Antti; Lehtimäki, Terho; Liewald, David C; Madden, Pamela A F; Magri, Chiara; Magnusson, Patrik K E; Marten, Jonathan; Maschio, Andrea; Medland, Sarah E; Mihailov, Evelin; Milaneschi, Yuri; Montgomery, Grant W; Nauck, Matthias; Ouwens, Klaasjan G; Palotie, Aarno; Pettersson, Erik; Polasek, Ozren; Qian, Yong; Pulkki-Råback, Laura; Raitakari, Olli T; Realo, Anu; Rose, Richard J; Ruggiero, Daniela; Schmidt, Carsten O; Slutske, Wendy S; Sorice, Rossella; Starr, John M; St Pourcain, Beate; Sutin, Angelina R; Timpson, Nicholas J; Trochet, Holly; Vermeulen, Sita; Vuoksimaa, Eero; Widen, Elisabeth; Wouda, Jasper; Wright, Margaret J; Zgaga, Lina; Porteous, David; Minelli, Alessandra; Palmer, Abraham A; Rujescu, Dan; Ciullo, Marina; Hayward, Caroline; Rudan, Igor; Metspalu, Andres; Kaprio, Jaakko; Deary, Ian J; Räikkönen, Katri; Wilson, James F; Keltikangas-Järvinen, Liisa; Bierut, Laura J; Hettema, John M; Grabe, Hans J; van Duijn, Cornelia M; Evans, David M; Schlessinger, David; Pedersen, Nancy L; 
Terracciano, Antonio; McGue, Matt; Penninx, Brenda W J H; Martin, Nicholas G; Boomsma, Dorret I

2015-07-01

Importance: Neuroticism is a pervasive risk factor for psychiatric conditions. It genetically overlaps with major depressive disorder (MDD) and is therefore an important phenotype for psychiatric genetics. The Genetics of Personality Consortium has created a resource for genome-wide association analyses of personality traits in more than 63,000 participants (including MDD cases). Objectives: To identify genetic variants associated with neuroticism by performing a meta-analysis of genome-wide association results based on 1000 Genomes imputation; to evaluate whether common genetic variants as assessed by single-nucleotide polymorphisms (SNPs) explain variation in neuroticism by estimating SNP-based heritability; and to examine whether SNPs that predict neuroticism also predict MDD. Design, Setting, and Participants: Genome-wide association meta-analysis of 30 cohorts with genome-wide genotype, personality, and MDD data from the Genetics of Personality Consortium. The study included 63,661 participants from 29 discovery cohorts and 9786 participants from a replication cohort. Participants came from Europe, the United States, or Australia. Analyses were conducted between 2012 and 2014. Main Outcomes and Measures: Neuroticism scores harmonized across all 29 discovery cohorts by item response theory analysis, and clinical MDD case-control status in 2 of the cohorts. Results: A genome-wide significant SNP was found on 3p14 in MAGI1 (rs35855737; P = 9.26 × 10^-9 in the discovery meta-analysis). This association was not replicated (P = .32), but the SNP was still genome-wide significant in the meta-analysis of all 30 cohorts (P = 2.38 × 10^-8). Common genetic variants explain 15% of the variance in neuroticism. Polygenic scores based on the meta-analysis of neuroticism in 27 cohorts significantly predicted neuroticism (1.09 × 10^-12 < P < .05) and MDD (4.02 × 10^-9 < P < .05) in the 2 other cohorts. Conclusions and Relevance: This study identifies a novel locus for neuroticism. 
The variant is located in a known gene that has been associated with bipolar disorder and schizophrenia in previous studies. In addition, the study shows that neuroticism is influenced by many genetic variants of small effect that are either common or tagged by common variants. These genetic variants also influence MDD. Future studies should confirm the role of the MAGI1 locus for neuroticism and further investigate the association of MAGI1 and the polygenic association to a range of other psychiatric disorders that are phenotypically correlated with neuroticism.
Reference-guided de novo assembly approach improves genome reconstruction for related species.

PubMed

Lischer, Heidi E L; Shimizu, Kentaro K

2017-11-10

The development of next-generation sequencing has made it possible to sequence whole genomes at a relatively low cost. However, de novo genome assemblies remain challenging due to short read length, missing data, repetitive regions, polymorphisms and sequencing errors. As more and more genomes are sequenced, reference-guided assembly approaches can be used to assist the assembly process. However, previous methods mostly focused on the assembly of other genotypes within the same species. We adapted and extended a reference-guided de novo assembly approach, which enables the usage of a related reference sequence to guide the genome assembly. In order to compare and evaluate de novo and our reference-guided de novo assembly approaches, we used a simulated data set of a repetitive and heterozygotic plant genome. The extended reference-guided de novo assembly approach almost always outperforms the corresponding de novo assembly program even when a reference of a different species is used. Similar improvements can be observed in high and low coverage situations. In addition, we show that a single evaluation metric, like the widely used N50 length, is not enough to properly rate assemblies, as it does not always point to the best assembly when evaluated with other criteria. Therefore, we used the summed z-scores of 36 different statistics to evaluate the assemblies. The combination of reference mapping and de novo assembly provides a powerful tool to improve genome reconstruction by integrating information from a related genome. Our extension of the reference-guided de novo assembly approach enables the application of this strategy not only within but also between related species. Finally, the evaluation of genome assemblies is often not straightforward, as the truth is not known. Thus one should always use a combination of evaluation metrics that assess not only the continuity but also the accuracy of an assembly.
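For readers unfamiliar with the N50 length mentioned in this abstract, the sketch below computes it using the usual definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The contig lengths are made up for illustration and do not come from the study.

```python
def n50(contig_lengths):
    """Return the N50: the contig length at which the cumulative sum of lengths,
    sorted from longest to shortest, first reaches half the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one contig")


# Hypothetical contig lengths: total 290, half 145, reached at the second contig.
print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```

N50 measures continuity only. An assembly that wrongly joins contigs can raise its N50 while becoming less accurate, which is consistent with the abstract's advice to combine several evaluation metrics.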
Management of familial cancer: sequencing, surveillance and society.

PubMed

Samuel, Nardin; Villani, Anita; Fernandez, Conrad V; Malkin, David

2014-12-01

The clinical management of familial cancer begins with recognition of patterns of cancer occurrence suggestive of genetic susceptibility in a proband or pedigree, to enable subsequent investigation of the underlying DNA mutations. In this regard, next-generation sequencing of DNA continues to transform cancer diagnostics, by enabling screening for cancer-susceptibility genes in the context of known and emerging familial cancer syndromes. Increasingly, not only are candidate cancer genes sequenced, but also entire 'healthy' genomes are mapped in children with cancer and their family members. Although large-scale genomic analysis is considered intrinsic to the success of cancer research and discovery, a number of accompanying ethical and technical issues must be addressed before this approach can be adopted widely in personalized therapy. In this Perspectives article, we describe our views on how the emergence of new sequencing technologies and cancer surveillance strategies is altering the framework for the clinical management of hereditary cancer. Genetic counselling and disclosure issues are discussed, and strategies for approaching ethical dilemmas are proposed.
Family genome browser: visualizing genomes with pedigree information.

PubMed

Juan, Liran; Liu, Yongzhuang; Wang, Yongtian; Teng, Mingxiang; Zang, Tianyi; Wang, Yadong

2015-07-15

Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to the advances in high-throughput sequencing technologies, family genome sequencing is becoming increasingly prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize the family genome variants and their functions with integrated pedigree information remains a critical challenge. We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analysis, including determination of parental origin of the variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibrium, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. The FGB is available at http://mlg.hit.edu.cn/FGB/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
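As an illustration of one analysis mentioned above, detection of candidate de novo mutations in a parent-child trio, the following Python sketch flags child alleles that neither parent carries. It is not the FGB implementation; the function names, the genotype representation (tuples of allele strings) and the site labels are hypothetical, and real pipelines would also account for genotype quality and sequencing error.

```python
def candidate_de_novo(child, mother, father):
    """Return the child's alleles that are absent from both parents.

    Each genotype is a tuple of allele strings, e.g. ("A", "G").
    An empty list means the child genotype is explainable by inheritance
    at the level of allele presence (a simplification).
    """
    parental = set(mother) | set(father)
    return sorted(set(child) - parental)


def scan_trio(sites):
    """sites: {site_id: (child, mother, father)} -> {site_id: novel alleles}."""
    hits = {}
    for site_id, (child, mother, father) in sites.items():
        novel = candidate_de_novo(child, mother, father)
        if novel:
            hits[site_id] = novel
    return hits


# Hypothetical genotypes at two sites.
trio_sites = {
    "chr1:1000": (("A", "T"), ("A", "A"), ("A", "G")),  # T seen in neither parent
    "chr1:2000": (("A", "G"), ("A", "A"), ("G", "G")),  # consistent with inheritance
}
hits = scan_trio(trio_sites)
```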
Genome-wide association study of a nicotine metabolism biomarker in African American smokers: impact of chromosome 19 genetic influences.

PubMed

Chenoweth, Meghan J; Ware, Jennifer J; Zhu, Andy Z X; Cole, Christopher B; Cox, Lisa Sanderson; Nollen, Nikki; Ahluwalia, Jasjit S; Benowitz, Neal L; Schnoll, Robert A; Hawk, Larry W; Cinciripini, Paul M; George, Tony P; Lerman, Caryn; Knight, Joanne; Tyndale, Rachel F

2018-03-01

The activity of CYP2A6, the major nicotine-inactivating enzyme, is measurable in smokers using the nicotine metabolite ratio (NMR; 3'hydroxycotinine/cotinine). Due to its role in nicotine clearance, the NMR is associated with smoking behaviours and response to pharmacotherapies. The NMR is highly heritable (~80%), and on average lower in African Americans (AA) versus whites. We previously identified several reduced- and loss-of-function CYP2A6 variants common in individuals of African descent. Our current aim was to identify novel genetic influences on the NMR in AA smokers using genome-wide approaches. Genome-wide association study (GWAS). Multiple sites within Canada and the United States. AA smokers from two clinical trials: Pharmacogenetics of Nicotine Addiction Treatment (PNAT)-2 (NCT01314001; n = 504) and Kick-it-at-Swope (KIS)-3 (NCT00666978; n = 450). Genome-wide SNP genotyping, the NMR (phenotype) and population substructure and NMR covariates. Meta-analysis revealed three independent chromosome 19 signals (rs12459249, rs111645190 and rs185430475) associated with the NMR. The top overall hit, rs12459249 (P = 1.47e-39; beta = 0.59 per C (versus T) allele, SE = 0.045), located ~9.5 kb 3' of CYP2A6, remained genome-wide significant after controlling for the common (~10% in AA) non-functional CYP2A6*17 allele. In contrast, rs111645190 and rs185430475 were not genome-wide significant when controlling for CYP2A6*17. In total, 96 signals associated with the NMR were identified; many were not found in prior NMR GWASs in individuals of European descent. The top hits were also associated with the NMR in a third cohort of AA (KIS2; n = 480). None of the hits were in UGT or OCT2 genes. Three independent chromosome 19 signals account for ~20% of the variability in the nicotine metabolite ratio in African American smokers. The hits identified may contribute to inter-ethnic variability in nicotine metabolism, smoking behaviours and tobacco-related disease risk.
© 2017 Society for the Study of Addiction.

RNAi screening comes of age: improved techniques and complementary approaches

PubMed Central

Mohr, Stephanie E.; Smith, Jennifer A.; Shamu, Caroline E.; Neumüller, Ralph A.; Perrimon, Norbert

2014-01-01

Gene silencing through sequence-specific targeting of mRNAs by RNAi has enabled genome-wide functional screens in cultured cells and in vivo in model organisms. These screens have resulted in the identification of new cellular pathways and potential drug targets. Considerable progress has been made to improve the quality of RNAi screen data through the development of new experimental and bioinformatics approaches. The recent availability of genome-editing strategies, such as the CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 system, when combined with RNAi, could lead to further improvements in screen data quality and follow-up experiments, thus promoting our understanding of gene function and gene regulatory networks. PMID:25145850
Cell-type-specific profiling of protein-DNA interactions without cell isolation using targeted DamID with next-generation sequencing.

PubMed

Marshall, Owen J; Southall, Tony D; Cheetham, Seth W; Brand, Andrea H

2016-09-01

This protocol is an extension to: Nat. Protoc. 2, 1467-1478 (2007); doi:10.1038/nprot.2007.148; published online 7 June 2007. The ability to profile transcription and chromatin binding in a cell-type-specific manner is a powerful aid to understanding cell-fate specification and cellular function in multicellular organisms. We recently developed targeted DamID (TaDa) to enable genome-wide, cell-type-specific profiling of DNA- and chromatin-binding proteins in vivo without cell isolation. As a protocol extension, this article describes substantial modifications to an existing protocol, and it offers additional applications. TaDa builds upon DamID, a technique for detecting genome-wide DNA-binding profiles of proteins, by coupling it with the GAL4 system in Drosophila to enable both temporal and spatial resolution. TaDa ensures that Dam-fusion proteins are expressed at very low levels, thus avoiding toxicity and potential artifacts from overexpression. The modifications to the core DamID technique presented here also increase the speed of sample processing and throughput, and adapt the method to next-generation sequencing technology. TaDa is robust, reproducible and highly sensitive. Compared with other methods for cell-type-specific profiling, the technique requires no cell-sorting, cross-linking or antisera, and binding profiles can be generated from as few as 10,000 total induced cells. By profiling the genome-wide binding of RNA polymerase II (Pol II), TaDa can also identify transcribed genes in a cell-type-specific manner. Here we describe a detailed protocol for carrying out TaDa experiments and preparing the material for next-generation sequencing. Although we developed TaDa in Drosophila, it should be easily adapted to other organisms with an inducible expression system. Once transgenic animals are obtained, the entire experimental procedure, from collecting tissue samples to generating sequencing libraries, can be accomplished within 5 d.
Genome-wide investigation and expression analyses of WD40 protein family in the model plant foxtail millet (Setaria italica L.).

PubMed

Mishra, Awdhesh Kumar; Muthamilarasan, Mehanathan; Khan, Yusuf; Parida, Swarup Kumar; Prasad, Manoj

2014-01-01

WD40 proteins play a crucial role in diverse protein-protein interactions by acting as scaffolding molecules and thus assisting in the proper activity of proteins. Hence, systematic characterization and expression profiling of these WD40 genes in foxtail millet would enable us to understand the networks of WD40 proteins and their biological processes and gene functions. In the present study, a genome-wide survey was conducted and 225 potential WD40 genes were identified. Phylogenetic analysis categorized the WD40 proteins into 5 distinct sub-families (I-V). Gene Ontology annotation revealed the biological roles of the WD40 proteins along with their cellular components and molecular functions. In silico comparative mapping with sorghum, maize and rice demonstrated the orthologous relationships and chromosomal rearrangements including duplication, inversion and deletion of WD40 genes. Estimation of synonymous and non-synonymous substitution rates revealed their evolutionary significance in terms of gene duplication and divergence. Expression profiling against abiotic stresses provided novel insights into specific and/or overlapping expression patterns of SiWD40 genes. Three-dimensional structure prediction by homology modeling was performed to understand the molecular functions of WD40 proteins. Although recent findings have shown the importance of WD40 domains in acting as hubs for cellular networks during many biological processes, they have attracted less research attention than other common domains. Being among the most promiscuous interactors, WD40 domains are versatile in mediating critical cellular functions, and hence this genome-wide study, especially in the model crop foxtail millet, would serve as a blueprint for functional characterization of WD40s in millets and bioenergy grass species. In addition, the present analyses would also assist the research community in choosing the candidate WD40s for comprehensive studies towards crop improvement of millets and biofuel grasses.
Genome-Wide Investigation and Expression Analyses of WD40 Protein Family in the Model Plant Foxtail Millet (Setaria italica L.)

PubMed Central

Mishra, Awdhesh Kumar; Muthamilarasan, Mehanathan; Khan, Yusuf; Parida, Swarup Kumar; Prasad, Manoj

2014-01-01

WD40 proteins play a crucial role in diverse protein-protein interactions by acting as scaffolding molecules and thus assisting in the proper activity of proteins. Hence, systematic characterization and expression profiling of these WD40 genes in foxtail millet would enable us to understand the networks of WD40 proteins and their biological processes and gene functions. In the present study, a genome-wide survey was conducted and 225 potential WD40 genes were identified. Phylogenetic analysis categorized the WD40 proteins into 5 distinct sub-families (I–V). Gene Ontology annotation revealed the biological roles of the WD40 proteins along with their cellular components and molecular functions. In silico comparative mapping with sorghum, maize and rice demonstrated the orthologous relationships and chromosomal rearrangements including duplication, inversion and deletion of WD40 genes. Estimation of synonymous and non-synonymous substitution rates revealed their evolutionary significance in terms of gene duplication and divergence. Expression profiling against abiotic stresses provided novel insights into specific and/or overlapping expression patterns of SiWD40 genes. Three-dimensional structure prediction by homology modeling was performed to understand the molecular functions of WD40 proteins. Although recent findings have shown the importance of WD40 domains in acting as hubs for cellular networks during many biological processes, they have attracted less research attention than other common domains. Being among the most promiscuous interactors, WD40 domains are versatile in mediating critical cellular functions, and hence this genome-wide study, especially in the model crop foxtail millet, would serve as a blueprint for functional characterization of WD40s in millets and bioenergy grass species. In addition, the present analyses would also assist the research community in choosing the candidate WD40s for comprehensive studies towards crop improvement of millets and biofuel grasses. PMID:24466268
Genome structure of a Saccharomyces cerevisiae strain widely used in bioethanol production

PubMed Central

Argueso, Juan Lucas; Carazzolle, Marcelo F.; Mieczkowski, Piotr A.; Duarte, Fabiana M.; Netto, Osmar V.C.; Missawa, Silvia K.; Galzerani, Felipe; Costa, Gustavo G.L.; Vidal, Ramon O.; Noronha, Melline F.; Dominska, Margaret; Andrietta, Maria G.S.; Andrietta, Sílvio R.; Cunha, Anderson F.; Gomes, Luiz H.; Tavares, Flavio C.A.; Alcarde, André R.; Dietrich, Fred S.; McCusker, John H.; Petes, Thomas D.; Pereira, Gonçalo A.G.

2009-01-01

Bioethanol is a biofuel produced mainly from the fermentation of carbohydrates derived from agricultural feedstocks by the yeast Saccharomyces cerevisiae. One of the most widely adopted strains is PE-2, a heterothallic diploid naturally adapted to the sugar cane fermentation process used in Brazil. Here we report the molecular genetic analysis of a PE-2 derived diploid (JAY270), and the complete genome sequence of a haploid derivative (JAY291). The JAY270 genome is highly heterozygous (∼2 SNPs/kb) and has several structural polymorphisms between homologous chromosomes. These chromosomal rearrangements are confined to the peripheral regions of the chromosomes, with breakpoints within repetitive DNA sequences. Despite its complex karyotype, this diploid, when sporulated, had a high frequency of viable spores. Hybrid diploids formed by outcrossing with the laboratory strain S288c also displayed good spore viability. Thus, the rearrangements that exist near the ends of chromosomes do not impair meiosis, as they do not span regions that contain essential genes. This observation is consistent with a model in which the peripheral regions of chromosomes represent plastic domains of the genome that are free to recombine ectopically and experiment with alternative structures. We also explored features of the JAY270 and JAY291 genomes that help explain their high adaptation to industrial environments, exhibiting desirable phenotypes such as high ethanol and cell mass production and high temperature and oxidative stress tolerance. The genomic manipulation of such strains could enable the creation of a new generation of industrial organisms, ideally suited for use as delivery vehicles for future bioenergy technologies. PMID:19812109
Functional modules, mutational load and human genetic disease.

PubMed

Zaghloul, Norann A; Katsanis, Nicholas

2010-04-01

The ability to generate a massive amount of sequencing and genotyping data is transforming the study of human genetic disorders. Driven by such innovation, it is likely that whole exome and whole-genome resequencing will replace regionally focused approaches for gene discovery and clinical testing in the next few years. However, this opportunity brings a significant interpretative challenge to assigning function and phenotypic variance to common and rare alleles. Understanding the effect of individual mutations in the context of the remaining genomic variation represents a major challenge to our interpretation of disease. Here, we discuss the challenges of assigning mutation functionality and, drawing from the examples of ciliopathies as well as cohesinopathies and channelopathies, discuss possibilities for the functional modularization of the human genome. Functional modularization in addition to the development of physiologically relevant assays to test allele functionality will accelerate our understanding of disease architecture and enable the use of genome-wide sequence data for disease diagnosis and phenotypic prediction in individuals. Copyright 2010 Elsevier Ltd. All rights reserved.
Functional modules, mutational load and human genetic disease

PubMed Central

Zaghloul, Norann A.; Katsanis, Nicholas

2013-01-01

The ability to generate a massive amount of sequencing and genotyping data is transforming the study of human genetic disorders. Driven by such innovation, it is likely that whole exome and whole-genome resequencing will replace regionally focused approaches for gene discovery and clinical testing in the next few years. However, this opportunity brings a significant interpretative challenge to assigning function and phenotypic variance to common and rare alleles. Understanding the effect of individual mutations in the context of the remaining genomic variation represents a major challenge to our interpretation of disease. Here, we discuss the challenges of assigning mutation functionality and, drawing from the examples of ciliopathies as well as cohesinopathies and channelopathies, discuss possibilities for the functional modularization of the human genome. Functional modularization in addition to the development of physiologically-relevant assays to test allele functionality will accelerate our understanding of disease architecture and enable the use of genome-wide sequence data for disease diagnosis and phenotypic prediction in individuals. PMID:20226561
Genome-wide cross-amplification of domestic sheep microsatellites in bighorn sheep and mountain goats.

PubMed

Poissant, J; Shafer, A B A; Davis, C S; Mainguy, J; Hogg, J T; Côté, S D; Coltman, D W

2009-07-01

We tested for cross-species amplification of microsatellite loci located throughout the domestic sheep (Ovis aries) genome in two North American mountain ungulates (bighorn sheep, Ovis canadensis, and mountain goats, Oreamnos americanus). We identified 247 new polymorphic markers in bighorn sheep (≥ 3 alleles in one of two study populations) and 149 in mountain goats (≥ 2 alleles in a single study population) using 648 and 576 primer pairs, respectively. Our efforts increased the number of available polymorphic microsatellite markers to 327 for bighorn sheep and 180 for mountain goats. The average distance between successive polymorphic bighorn sheep and mountain goat markers inferred from the Australian domestic sheep genome linkage map (mean ± 1 SD) was 11.9 ± 9.2 and 15.8 ± 13.8 centimorgans, respectively. The development of genomic resources in these wildlife species enables future studies of the genetic architecture of trait variation. © 2009 Blackwell Publishing Ltd.
Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR library

PubMed Central

Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng

2017-01-01

CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements. PMID:27798563
A parallel genome-wide RNAi screening strategy to identify host proteins important for entry of Marburg virus and H5N1 influenza virus.

PubMed

Cheng, Han; Koning, Katie; O'Hearn, Aileen; Wang, Minxiu; Rumschlag-Booms, Emily; Varhegyi, Elizabeth; Rong, Lijun

2015-11-24

Genome-wide RNAi screening has been widely used to identify host proteins involved in replication and infection of different viruses, and numerous host factors are implicated in the replication cycles of these viruses, demonstrating the power of this approach. However, discrepancies in target identification for the same viruses by different groups suggest that high-throughput RNAi screening strategies need to be carefully designed, developed and optimized prior to large-scale screening. Two genome-wide RNAi screens were performed in parallel against the entry of pseudotyped Marburg viruses and avian influenza virus H5N1 utilizing an HIV-1 based surrogate system, to identify host factors which are important for virus entry. A comparative analysis approach was employed in data analysis, which alleviated systematic positional effects and reduced the number of false-positive virus-specific hits. The parallel nature of the strategy allows us to easily identify the host factors for a specific virus with a greatly reduced number of false positives in the initial screen, which is one of the major problems with high-throughput screening. The power of this strategy is illustrated by a genome-wide RNAi screen for identifying the host factors important for Marburg virus and/or avian influenza virus H5N1 as described in this study. This strategy is particularly useful for highly pathogenic viruses, since pseudotyping allows us to perform high-throughput screens in biosafety level 2 (BSL-2) containment instead of the BSL-3 or BSL-4 required for the infectious viruses, with alleviated safety concerns. The screening strategy together with the unique comparative analysis approach makes the data more suitable for hit selection and enables us to identify virus-specific hits with a much lower false positive rate.
Scientific Advances with Aspergillus Species that Are Used for Food and Biotech Applications.

PubMed

Biesebeke, Rob Te; Record, Erik

2008-01-01

Yeast and filamentous fungi have been used for centuries in diverse biotechnological processes. Fungal fermentation technology is traditionally used in relation to food production, such as for bread, beer, cheese, sake and soy sauce. Last century, the industrial application of yeast and filamentous fungi expanded rapidly, with excellent examples such as purified enzymes and secondary metabolites (e.g. antibiotics), which are used in a wide range of food as well as non-food industries. Research on protein and/or metabolite secretion by fungal species has focused on identifying bottlenecks in (post-) transcriptional regulation of protein production, metabolic rerouting, morphology and the transit of proteins through the secretion pathway. In past years, genome sequencing of some fungi (e.g. Aspergillus oryzae, Aspergillus niger) has been completed. The available genome sequences have enabled identification of genes and functionally important regions of the genome. This has directed research to focus on a post-genomics era in which transcriptomics, proteomics and metabolomics methodologies will help to explore the scientific relevance and industrial application of fungal genome sequences.
Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

PubMed Central

Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

2012-01-01

Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings: A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086
Genome-Wide Gene Set Analysis for Identification of Pathways Associated with Alcohol Dependence

PubMed Central

Biernacka, Joanna M.; Geske, Jennifer; Jenkins, Gregory D.; Colby, Colin; Rider, David N.; Karpyak, Victor M.; Choi, Doo-Sup; Fridley, Brooke L.

2013-01-01

It is believed that multiple genetic variants with small individual effects contribute to the risk of alcohol dependence. Such polygenic effects are difficult to detect in genome-wide association studies that test for association of the phenotype with each single nucleotide polymorphism (SNP) individually. To overcome this challenge, gene set analysis (GSA) methods that jointly test for the effects of pre-defined groups of genes have been proposed. Rather than testing for association between the phenotype and individual SNPs, these analyses evaluate the global evidence of association with a set of related genes enabling the identification of cellular or molecular pathways or biological processes that play a role in development of the disease. It is hoped that by aggregating the evidence of association for all available SNPs in a group of related genes, these approaches will have enhanced power to detect genetic associations with complex traits. We performed GSA using data from a genome-wide study of 1165 alcohol dependent cases and 1379 controls from the Study of Addiction: Genetics and Environment (SAGE), for all 200 pathways listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Results demonstrated a potential role of the “Synthesis and Degradation of Ketone Bodies” pathway. Our results also support the potential involvement of the “Neuroactive Ligand Receptor Interaction” pathway, which has previously been implicated in addictive disorders. These findings demonstrate the utility of GSA in the study of complex disease, and suggest specific directions for further research into the genetic architecture of alcohol dependence. PMID:22717047
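The idea of aggregating SNP-level evidence across a gene set can be sketched with Fisher's combined probability test. This is a generic illustration, not the gene set analysis method used in the study, which the abstract does not specify. Fisher's method assumes independent p-values, an assumption that SNPs in linkage disequilibrium violate, so published GSA methods typically add corrections such as permutation. The function name and the example p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has a closed form:
        P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0 < p <= 1) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_stat / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical SNP-level p-values for one gene set: none is individually
# striking, but together they give stronger evidence.
set_p = fisher_combined_p([0.04, 0.03, 0.05])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form survival function.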
A genome-wide shRNA screen identifies GAS1 as a novel melanoma metastasis suppressor gene.

PubMed

Gobeil, Stephane; Zhu, Xiaochun; Doillon, Charles J; Green, Michael R

2008-11-01

Metastasis suppressor genes inhibit one or more steps required for metastasis without affecting primary tumor formation. Due to the complexity of the metastatic process, the development of experimental approaches for identifying genes involved in metastasis prevention has been challenging. Here we describe a genome-wide RNAi screening strategy to identify candidate metastasis suppressor genes. Following expression in weakly metastatic B16-F0 mouse melanoma cells, shRNAs were selected based upon enhanced satellite colony formation in a three-dimensional cell culture system and confirmed in a mouse experimental metastasis assay. Using this approach we discovered 22 genes whose knockdown increased metastasis without affecting primary tumor growth. We focused on one of these genes, Gas1 (Growth arrest-specific 1), because we found that it was substantially down-regulated in highly metastatic B16-F10 melanoma cells, which contributed to the high metastatic potential of this mouse cell line. We further demonstrated that Gas1 has all the expected properties of a melanoma tumor suppressor including: suppression of metastasis in a spontaneous metastasis assay, promotion of apoptosis following dissemination of cells to secondary sites, and frequent down-regulation in human melanoma metastasis-derived cell lines and metastatic tumor samples. Thus, we developed a genome-wide shRNA screening strategy that enables the discovery of new metastasis suppressor genes.
Cell Lines Models of Drug Response: Successes and Lessons from this Pharmacogenomic Model

PubMed Central

Jack, J.; Rotroff, D.; Motsinger-Reif, A.

2015-01-01

A new standard for medicine is emerging that aims to improve individual drug responses through studying associations with genetic variations. This field, pharmacogenomics, is undergoing a rapid expansion due to a variety of technological advancements that are enabling higher throughput with reductions in cost. Here we review the advantages, limitations, and opportunities for using lymphoblastoid cell lines (LCL) as a model system for human pharmacogenomic studies. There is a wide range of publicly available resources with genome-wide data available for LCLs from both related and unrelated populations, removing the cost of genotyping for drug response studies. Furthermore, in contrast to human clinical trials or in vivo model systems, with high-throughput in vitro screening technologies, pharmacogenomics studies can easily be scaled to accommodate large sample sizes. An important component to leveraging genome-wide data in LCL models is association mapping. Several methods are discussed herein, and include multivariate concentration response modeling, issues with multiple testing, and successful examples of the ‘triangle model’ to identify candidate variants. Once candidate gene variants have been determined, their biological roles can be elucidated using pathway analyses and functionally confirmed using siRNA knockdown experiments. The wealth of genomics data being produced using related and unrelated populations is creating many exciting opportunities leading to new insights into the genetic contribution and heritability of drug response. PMID:25109794
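The multiple-testing issue mentioned in this abstract arises because association mapping tests many variants at once. One common way to address it is the Benjamini-Hochberg false discovery rate procedure, sketched below in Python. The abstract does not say which correction the review recommends, so the choice of procedure, the function name and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical variant-level p-values from an association scan.
calls = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```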
CONAN: copy number variation analysis software for genome-wide association studies

PubMed Central

2010-01-01

Background Genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) revolutionized our perception of the genetic regulation of complex traits and diseases. Copy number variations (CNVs) promise to shed additional light on the genetic basis of monogenic as well as complex diseases and phenotypes. Indeed, the number of detected associations between CNVs and certain phenotypes is constantly increasing. However, while several software packages support the determination of CNVs from SNP chip data, the downstream statistical inference of CNV-phenotype associations is still subject to complicated and inefficient in-house solutions, thus strongly limiting the performance of GWAS based on CNVs. Results CONAN is a freely available client-server software solution which provides an intuitive graphical user interface for categorizing, analyzing and associating CNVs with phenotypes. Moreover, CONAN assists the evaluation process by visualizing detected associations via Manhattan plots in order to enable a rapid identification of genome-wide significant CNV regions. Various file formats including the information on CNVs in population samples are supported as input data. Conclusions CONAN facilitates the performance of GWAS based on CNVs and the visual analysis of calculated results. CONAN provides a rapid, valid and straightforward software solution to identify genetic variation underlying the 'missing' heritability for complex traits that remains unexplained by recent GWAS. The freely available software can be downloaded at http://genepi-conan.i-med.ac.at. PMID:20546565
Emerging trends in the functional genomics of the abiotic stress response in crop plants.

PubMed

Vij, Shubha; Tyagi, Akhilesh K

2007-05-01

Plants are exposed to different abiotic stresses, such as water deficit, high temperature, salinity, cold, heavy metals and mechanical wounding, under field conditions. It is estimated that such stress conditions can potentially reduce the yield of crop plants by more than 50%. Investigations of the physiological, biochemical and molecular aspects of stress tolerance have been conducted to unravel the intrinsic mechanisms developed during evolution to mitigate against stress by plants. Before the advent of the genomics era, researchers primarily used a gene-by-gene approach to decipher the function of the genes involved in the abiotic stress response. However, abiotic stress tolerance is a complex trait and, although large numbers of genes have been identified to be involved in the abiotic stress response, there remain large gaps in our understanding of the trait. The availability of the genome sequences of certain important plant species has enabled the use of strategies, such as genome-wide expression profiling, to identify the genes associated with the stress response, followed by the verification of gene function by the analysis of mutants and transgenics. Certain components of both abscisic acid-dependent and -independent cascades involved in the stress response have already been identified. Information originating from the genome-wide analysis of abiotic stress tolerance will help to provide an insight into the stress-responsive network(s), and may allow the modification of this network to reduce the loss caused by stress and to increase agricultural productivity.
Bioinformatics challenges for genome-wide association studies.

PubMed

Moore, Jason H; Asselbergs, Folkert W; Williams, Scott M

2010-02-15

The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype-phenotype relationship that is characterized by significant heterogeneity and gene-gene and gene-environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods.
LAILAPS: the plant science search engine.

PubMed

Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias

2015-01-01

With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for a detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption

PubMed Central

2015-01-01

Background The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy is a widespread concern when sharing sensitive information in a cloud environment. Methods We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. Results The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides a significant performance gain without missing any significant SNPs in the chi-square association test using the aforementioned datasets. Conclusions Unlike many existing HME based studies, in which final results need to be computed by the data owner due to the lack of the secure division operation, the proposed FORESEE framework supports complete outsourcing to the cloud and outputs the final encrypted chi-square statistics. PMID:26733391

FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption.

PubMed

Zhang, Yuchen; Dai, Wenrui; Jiang, Xiaoqian; Xiong, Hongkai; Wang, Shuang

2015-01-01

The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy is a widespread concern when sharing sensitive information in a cloud environment. We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides a significant performance gain without missing any significant SNPs in the chi-square association test using the aforementioned datasets. Unlike many existing HME based studies, in which final results need to be computed by the data owner due to the lack of the secure division operation, the proposed FORESEE framework supports complete outsourcing to the cloud and outputs the final encrypted chi-square statistics.
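The chi-square statistic named in the FORESEE record can be illustrated in plaintext. The sketch below is not the FORESEE protocol and involves no encryption; it assumes a 2x2 table of allele counts (cases versus controls) and the standard Pearson formula, and the function name `allelic_chi_square` is hypothetical. The final division is presumably the step for which an encrypted implementation would need a secure division protocol.

```python
def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 degree of freedom) for a 2x2 allele-count table.

    case_a, case_b: counts of allele A and allele B among cases.
    ctrl_a, ctrl_b: counts of allele A and allele B among controls.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b          # total alleles in cases
    row_ctrl = ctrl_a + ctrl_b          # total alleles in controls
    col_a = case_a + ctrl_a             # total copies of allele A
    col_b = case_b + ctrl_b             # total copies of allele B
    denominator = row_case * row_ctrl * col_a * col_b
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    # Numerator and denominator are integers; only this last step divides.
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative counts only, not data from the cited study.
    print(allelic_chi_square(30, 70, 10, 90))
```

Keeping the numerator and denominator as separate integers until the last line mirrors why division is the awkward operation: everything before it needs only additions and multiplications.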
Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

PubMed Central

Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

2015-01-01

Background Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. Results We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. Conclusions This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. 
We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation. PMID:26501966
Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.

PubMed

Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

2015-01-01

Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. 
We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
Microsatellite genotyping and genome-wide single nucleotide polymorphism-based indices of Plasmodium falciparum diversity within clinical infections.

PubMed

Murray, Lee; Mobegi, Victor A; Duffy, Craig W; Assefa, Samuel A; Kwiatkowski, Dominic P; Laman, Eugene; Loua, Kovana M; Conway, David J

2016-05-12

In regions where malaria is endemic, individuals are often infected with multiple distinct parasite genotypes, a situation that may impact on evolution of parasite virulence and drug resistance. Most approaches to studying genotypic diversity have involved analysis of a modest number of polymorphic loci, although whole genome sequencing enables a broader characterisation of samples. PCR-based microsatellite typing of a panel of ten loci was performed on Plasmodium falciparum in 95 clinical isolates from a highly endemic area in the Republic of Guinea, to characterize within-isolate genetic diversity. Separately, single nucleotide polymorphism (SNP) data from genome-wide short-read sequences of the same samples were used to derive within-isolate fixation indices (Fws), an inverse measure of diversity within each isolate compared to overall local genetic diversity. The latter indices were compared with the microsatellite results, and also with indices derived by randomly sampling modest numbers of SNPs. As expected, the number of microsatellite loci with more than one allele in each isolate was highly significantly inversely correlated with the genome-wide Fws fixation index (r = -0.88, P < 0.001). However, the microsatellite analysis revealed that most isolates contained mixed genotypes, even those that had no detectable genome sequence heterogeneity. Random sampling of different numbers of SNPs showed that an Fws index derived from ten or more SNPs with minor allele frequencies of >10% had high correlation (r > 0.90) with the index derived using all SNPs. Different types of data give highly correlated indices of within-infection diversity, although PCR-based analysis detects low-level minority genotypes not apparent in bulk sequence analysis. When whole-genome data are not obtainable, quantitative assay of ten or more SNPs can yield a reasonably accurate estimate of the within-infection fixation index (Fws).
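A within-isolate fixation index of the kind described in this record can be sketched as one minus the ratio of within-isolate to population heterozygosity. The code below is a simplified illustration under stated assumptions, not the authors' pipeline: it assumes biallelic SNPs, takes heterozygosity at each SNP as 2p(1-p), and averages over SNPs without any binning by allele frequency. The function names are hypothetical.

```python
from statistics import mean


def heterozygosity(allele_freq):
    """Expected heterozygosity 2p(1-p) at a biallelic SNP with allele frequency p."""
    return 2.0 * allele_freq * (1.0 - allele_freq)


def fixation_index(within_isolate_freqs, population_freqs):
    """Simplified within-isolate fixation index: 1 - H_w / H_s.

    within_isolate_freqs: per-SNP allele frequencies estimated from the reads
        of one isolate (assumed input).
    population_freqs: per-SNP allele frequencies in the local population,
        in the same SNP order (assumed input).
    """
    if len(within_isolate_freqs) != len(population_freqs):
        raise ValueError("both frequency lists must cover the same SNPs")
    h_w = mean([heterozygosity(p) for p in within_isolate_freqs])
    h_s = mean([heterozygosity(p) for p in population_freqs])
    if h_s == 0:
        raise ValueError("population heterozygosity is zero; index undefined")
    return 1.0 - h_w / h_s


if __name__ == "__main__":
    # Illustrative frequencies only: one SNP is mixed within the isolate.
    print(fixation_index([0.5, 0.0], [0.5, 0.5]))
```

An isolate with a single genotype has no within-isolate heterozygosity and scores 1, while an isolate as diverse as the local population scores 0, which matches the "inverse measure of diversity" wording in the abstract.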
Power considerations for λ inflation factor in meta-analyses of genome-wide association studies.

PubMed

Georgiopoulos, Georgios; Evangelou, Evangelos

2016-05-19

The genomic control (GC) approach is extensively used to effectively control false positive signals due to population stratification in genome-wide association studies (GWAS). However, GC affects the statistical power of GWAS. The loss of power depends on the magnitude of the inflation factor (λ) that is used for GC. We simulated meta-analyses of different GWAS. Minor allele frequency (MAF) ranged from 0·001 to 0·5 and λ was sampled from two scenarios: (i) random scenario (empirically-derived distribution of real λ values) and (ii) selected scenario from simulation parameter modification. Adjustment for λ was considered under single correction (within study corrected standard errors) and double correction (additional λ corrected summary estimate). MAF was a pivotal determinant of observed power. In random λ scenario, double correction induced a symmetric power reduction in comparison to single correction. For MAF 1·2 and MAF >5%. Our results provide a quick but detailed index for power considerations of future meta-analyses of GWAS that enables a more flexible design from early steps based on the number of studies accumulated in different groups and the λ values observed in the single studies.
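The single and double corrections mentioned in this record can be sketched as follows. This is an illustrative outline under stated assumptions, not the simulation code of the study: λ is estimated as the median of the observed 1-df chi-square statistics divided by the theoretical median (about 0.4549), values of λ below 1 are treated as 1 (a common convention, assumed here), and studies are combined with a fixed-effect inverse-variance model. All function names are hypothetical.

```python
import math
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic control lambda: observed median statistic over the expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_corrected_se(se, lam):
    """Inflate a standard error by sqrt(lambda); lambda < 1 is treated as 1 (assumption)."""
    return se * math.sqrt(max(lam, 1.0))


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted summary effect and its standard error."""
    weights = [1.0 / (s * s) for s in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def meta_with_gc(betas, ses, study_lambdas, meta_lambda=None):
    """Single correction: per-study SEs are corrected before pooling.
    Double correction: the pooled SE is corrected again when meta_lambda is given."""
    corrected = [gc_corrected_se(s, lam) for s, lam in zip(ses, study_lambdas)]
    beta, se = fixed_effect_meta(betas, corrected)
    if meta_lambda is not None:
        se = gc_corrected_se(se, meta_lambda)
    return beta, se


if __name__ == "__main__":
    # Illustrative numbers only.
    print(meta_with_gc([0.10, 0.14], [0.05, 0.05], [1.1, 1.0]))
    print(meta_with_gc([0.10, 0.14], [0.05, 0.05], [1.1, 1.0], meta_lambda=1.05))
```

Because each correction only widens standard errors, a second correction can only lower the test statistic further, which is consistent with the power loss the abstract attributes to GC.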
Prostate Cancer Genomics: Recent Advances and the Prevailing Underrepresentation from Racial and Ethnic Minorities.

PubMed

Tan, Shyh-Han; Petrovics, Gyorgy; Srivastava, Shiv

2018-04-22

Prostate cancer (CaP) is the most commonly diagnosed non-cutaneous cancer and the second leading cause of male cancer deaths in the United States. Among African American (AA) men, CaP is the most prevalent malignancy, with disproportionately higher incidence and mortality rates. Even after discounting the influence of socioeconomic factors, the effect of molecular and genetic factors on racial disparity of CaP is evident. Earlier studies on the molecular basis for CaP disparity have focused on the influence of heritable mutations and single-nucleotide polymorphisms (SNPs). Most CaP susceptibility alleles identified based on genome-wide association studies (GWAS) were common, low-penetrance variants. Germline CaP-associated mutations that are highly penetrant, such as those found in HOXB13 and BRCA2 , are usually rare. More recently, genomic studies enabled by Next-Gen Sequencing (NGS) technologies have focused on the identification of somatic mutations that contribute to CaP tumorigenesis. These studies confirmed the high prevalence of ERG gene fusions and PTEN deletions among Caucasian Americans and identified novel somatic alterations in SPOP and FOXA1 genes in early stages of CaP. Individuals with African ancestry and other minorities are often underrepresented in these large-scale genomic studies, which are performed primarily using tumors from men of European ancestry. The insufficient number of specimens from AA men and other minority populations, together with the heterogeneity in the molecular etiology of CaP across populations, challenge the generalizability of findings from these projects. Efforts to close this gap by sequencing larger numbers of tumor specimens from more diverse populations, although still at an early stage, have discovered distinct genomic alterations. 
These research findings can have a direct impact on the diagnosis of CaP and the stratification of patients for treatment, and can help to address the disparity in incidence and mortality of CaP. This review examines the progress of understanding in CaP genetics and genomics and highlights the need to increase the representation from minority populations.
QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks

PubMed Central

Thibodeau, Asa; Márquez, Eladio J.; Luo, Oscar; Ruan, Yijun; Shin, Dong-Guk; Stitzel, Michael L.; Ucar, Duygu

2016-01-01

Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. AVAILABILITY: QuIN’s web server is available at http://quin.jax.org. QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database, and the source code is available under the GPLv3 license on GitHub: https://github.com/UcarLab/QuIN/. PMID:27336171
GeNemo: a search engine for web-based functional genomic data.

PubMed

Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

2016-07-08

A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Introduction to the fathead minnow genome browser and ...

EPA Pesticide Factsheets

Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minnow genomic sequence. This work is meant to extend the utility of the fathead minnow genome as a resource and enable the continued development of this species as a model organism. The fathead minnow (Pimephales promelas) is a laboratory model organism widely used in regulatory toxicity testing and ecotoxicology research. Despite the wealth of toxicological data for this organism, until recently genome scale information was lacking for the species, which limited the utility of the species for pathway-based toxicity testing and research. As part of an EPA Pathfinder Innovation Project, next generation sequencing was applied to generate a draft genome assembly, which was published in 2016. However, application of those genome-scale sequencing resources was still limited by the lack of available gene annotations for fathead minnow. Here we report on development of a first generation genome annotation for fathead minnow and the dissemination of that information through a web-based browser that makes it easy to search for genes of interest, extract the corresponding sequence, identify intron and exon boundaries and regulatory regions, and align the computationally predicted genes with other supporti
Epigenetics, chromatin and genome organization: recent advances from the ENCODE project.

PubMed

Siggens, L; Ekwall, K

2014-09-01

The organization of the genome into functional units, such as enhancers and active or repressed promoters, is associated with distinct patterns of DNA and histone modifications. The Encyclopedia of DNA Elements (ENCODE) project has advanced our understanding of the principles of genome, epigenome and chromatin organization, identifying hundreds of thousands of potential regulatory regions and transcription factor binding sites. Part of the ENCODE consortium, GENCODE, has annotated the human genome with novel transcripts including new noncoding RNAs and pseudogenes, highlighting transcriptional complexity. Many disease variants identified in genome-wide association studies are located within putative enhancer regions defined by the ENCODE project. Understanding the principles of chromatin and epigenome organization will help to identify new disease mechanisms, biomarkers and drug targets, particularly as ongoing epigenome mapping projects generate data for primary human cell types that play important roles in disease. © 2014 The Association for the Publication of the Journal of Internal Medicine.
Chapter 27 -- Breast Cancer Genomics, Section VI, Pathology and Biological Markers of Invasive Breast Cancer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Spellman, Paul T.; Heiser, Laura; Gray, Joe W.

2009-06-18

Breast cancer is predominantly a disease of the genome with cancers arising and progressing through accumulation of aberrations that alter the genome - by changing DNA sequence, copy number, and structure in ways that contribute to diverse aspects of cancer pathophysiology. Classic examples of genomic events that contribute to breast cancer pathophysiology include inherited mutations in BRCA1, BRCA2, TP53, and CHK2 that contribute to the initiation of breast cancer, amplification of ERBB2 (formerly HER2) and mutations of elements of the PI3-kinase pathway that activate aspects of epidermal growth factor receptor (EGFR) signaling and deletion of CDKN2A/B that contributes to cell cycle deregulation and genome instability. It is now apparent that accumulation of these aberrations is a time-dependent process that accelerates with age. Although American women living to an age of 85 have a 1 in 8 chance of developing breast cancer, cancer in women younger than 30 years is uncommon. This is consistent with a multistep cancer progression model whereby mutation and selection drive the tumor's development, analogous to traditional Darwinian evolution. In the case of cancer, the driving events are changes in sequence, copy number, and structure of DNA and alterations in chromatin structure or other epigenetic marks. Our understanding of the genetic, genomic, and epigenomic events that influence the development and progression of breast cancer is increasing at a remarkable rate through application of powerful analysis tools that enable genome-wide analysis of DNA sequence and structure, copy number, allelic loss, and epigenomic modification. Application of these techniques to elucidation of the nature and timing of these events is enriching our understanding of mechanisms that increase breast cancer susceptibility, enable tumor initiation and progression to metastatic disease, and determine therapeutic response or resistance. These studies also reveal the molecular differences between cancer and normal that may be exploited to therapeutic benefit or that provide targets for molecular assays that may enable early cancer detection, and predict individual disease progression or response to treatment. This chapter reviews current and future directions in genome analysis and summarizes studies that provide insights into breast cancer pathophysiology or that suggest strategies to improve breast cancer management.
Novel insights into the Elm Yellows phytoplasma genome and into the metagenome of Elm Yellows-infected elms

USDA-ARS's Scientific Manuscript database

In North America, American elms were historically present throughout the northeastern United States and southeastern Canada. The longevity of these trees, their resistance to the harsh urban environment, and their aesthetics led to their wide use in landscaping and streetscaping over several decade...
Genome-wide association analysis identifies candidate genes associated with iron deficiency chlorosis in soybean

USDA-ARS's Scientific Manuscript database

Iron deficiency chlorosis (IDC) is a significant yield-limiting problem in some of the major soybean production regions in the United States. Soybean plants display a variety of symptoms, ranging from slight yellowing of the leaves to interveinal chlorosis and sometimes it is followed by stunted gr...
GWAS meta-analysis and replication identifies three new susceptibility loci for ovarian cancer

PubMed Central

Pharoah, Paul D. P.; Tsai, Ya-Yu; Ramus, Susan J.; Phelan, Catherine M.; Goode, Ellen L.; Lawrenson, Kate; Price, Melissa; Fridley, Brooke L.; Tyrer, Jonathan P.; Shen, Howard; Weber, Rachel; Karevan, Rod; Larson, Melissa C.; Song, Honglin; Tessier, Daniel C.; Bacot, François; Vincent, Daniel; Cunningham, Julie M.; Dennis, Joe; Dicks, Ed; Aben, Katja K.; Anton-Culver, Hoda; Antonenkova, Natalia; Armasu, Sebastian M.; Baglietto, Laura; Bandera, Elisa V.; Beckmann, Matthias W.; Birrer, Michael J.; Bloom, Greg; Bogdanova, Natalia; Brenton, James D.; Brinton, Louise A.; Brooks-Wilson, Angela; Brown, Robert; Butzow, Ralf; Campbell, Ian; Carney, Michael E; Carvalho, Renato S.; Chang-Claude, Jenny; Chen, Y. Anne; Chen, Zhihua; Chow, Wong-Ho; Cicek, Mine S.; Coetzee, Gerhard; Cook, Linda S.; Cramer, Daniel W.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Despierre, Evelyn; Doherty, Jennifer A; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Edwards, Robert; Ekici, Arif B.; Fasching, Peter A.; Fenstermacher, David; Flanagan, James; Gao, Yu-Tang; Garcia-Closas, Montserrat; Gentry-Maharaj, Aleksandra; Giles, Graham; Gjyshi, Anxhela; Gore, Martin; Gronwald, Jacek; Guo, Qi; Halle, Mari K; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hillemanns, Peter; Hoatlin, Maureen; Høgdall, Estrid; Høgdall, Claus K.; Hosono, Satoyo; Jakubowska, Anna; Jensen, Allan; Kalli, Kimberly R.; Karlan, Beth Y.; Kelemen, Linda E.; Kiemeney, Lambertus A.; Kjaer, Susanne Krüger; Konecny, Gottfried E.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Nathan; Lee, Janet; Leminen, Arto; Lim, Boon Kiong; Lissowska, Jolanta; Lubiński, Jan; Lundvall, Lene; Lurie, Galina; Massuger, Leon F.A.G.; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Nakanishi, Toru; Narod, Steven A.; Ness, Roberta B.; Nevanlinna, Heli; Nickels, Stefan; Noushmehr, Houtan; Odunsi, Kunle; Olson, Sara; Orlow, Irene; Paul, James; Pejovic, Tanja; Pelttari, Liisa M; Permuth-Wey, Jenny; Pike, Malcolm C; Poole, Elizabeth M; Qu, Xiaotao; Risch, Harvey A.; Rodriguez-Rodriguez, Lorna; Rossing, Mary Anne; Rudolph, Anja; Runnebaum, Ingo; Rzepecka, Iwona K; Salvesen, Helga B.; Schwaab, Ira; Severi, Gianluca; Shen, Hui; Shridhar, Vijayalakshmi; Shu, Xiao-Ou; Sieh, Weiva; Southey, Melissa C.; Spellman, Paul; Tajima, Kazuo; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tworoger, Shelley S.; van Altena, Anne M.; Berg, David Van Den; Vergote, Ignace; Vierkant, Robert A.; Vitonis, Allison F.; Wang-Gohrke, Shan; Wentzensen, Nicolas; Whittemore, Alice S.; Wik, Elisabeth; Winterhoff, Boris; Woo, Yin Ling; Wu, Anna H; Yang, Hannah P.; Zheng, Wei; Ziogas, Argyrios; Zulkifli, Famida; Goodman, Marc T.; Hall, Per; Easton, Douglas F; Pearce, Celeste L; Berchuck, Andrew; Chenevix-Trench, Georgia; Iversen, Edwin; Monteiro, Alvaro N.A.; Gayther, Simon A.; Schildkraut, Joellen M.; Sellers, Thomas A.

2013-01-01

Genome-wide association studies (GWAS) have identified four susceptibility loci for epithelial ovarian cancer (EOC) with another two loci being close to genome-wide significance. We pooled data from a GWAS conducted in North America with another GWAS from the United Kingdom. We selected the top 24,551 SNPs for inclusion on the iCOGS custom genotyping array. Follow-up genotyping was carried out in 18,174 cases and 26,134 controls from 43 studies from the Ovarian Cancer Association Consortium. We validated the two loci at 3q25 and 17q21 previously near genome-wide significance and identified three novel loci associated with risk: two loci associated with all EOC subtypes, at 8q21 (rs11782652; P=5.5×10^-9) and 10p12 (rs1243180; P=1.8×10^-8), and another locus specific to the serous subtype at 17q12 (rs757210; P=8.1×10^-10). An integrated molecular analysis of genes and regulatory regions at these loci provided evidence for functional mechanisms underlying susceptibility that implicates CHMP4C in the pathogenesis of ovarian cancer. PMID:23535730
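As a rough illustration of what "genome-wide significance" means for the P-values quoted in this abstract, the sketch below compares the three novel loci with the conventional threshold of 5×10^-8. That cutoff is an assumption here: the abstract does not state which threshold the authors applied, and the helper name is purely illustrative.

```python
# Conventional genome-wide significance threshold (assumed; not stated in the
# abstract). It corresponds to 0.05 spread over roughly one million
# independent common-variant tests.
GENOME_WIDE_ALPHA = 5e-8

# P-values for the three novel loci, as reported in the abstract.
novel_loci = {
    "8q21 (rs11782652)": 5.5e-9,
    "10p12 (rs1243180)": 1.8e-8,
    "17q12 (rs757210)": 8.1e-10,
}


def is_genome_wide_significant(p_value, alpha=GENOME_WIDE_ALPHA):
    """Return True when a P-value falls below the chosen threshold."""
    return p_value < alpha


results = {locus: is_genome_wide_significant(p) for locus, p in novel_loci.items()}

for locus, significant in results.items():
    print(f"{locus}: P={novel_loci[locus]:.1e}, genome-wide significant: {significant}")
```

Under this assumed threshold, all three reported loci pass, which is consistent with the abstract describing them as newly identified risk loci.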
Genome-wide association study of handedness excludes simple genetic models

PubMed Central

Armour, J AL; Davison, A; McManus, I C

2014-01-01

Handedness is a human behavioural phenotype that appears to be congenital, and is often assumed to be inherited, but for which the developmental origin and underlying causation(s) have been elusive. Models of the genetic basis of variation in handedness have been proposed that fit different features of the observed resemblance between relatives, but none has been decisively tested or a corresponding causative locus identified. In this study, we applied data from well-characterised individuals studied at the London Twin Research Unit. Analysis of genome-wide SNP data from 3940 twins failed to identify any locus associated with handedness at a genome-wide level of significance. The most straightforward interpretation of our analyses is that they exclude the simplest formulations of the ‘right-shift’ model of Annett and the ‘dextral/chance’ model of McManus, although more complex modifications of those models are still compatible with our observations. For polygenic effects, our study is inadequately powered to reliably detect alleles with effect sizes corresponding to an odds ratio of 1.2, but should have good power to detect effects at an odds ratio of 2 or more. PMID:24065183
Aptazyme-embedded guide RNAs enable ligand-responsive genome editing and transcriptional activation

PubMed Central

Tang, Weixin; Hu, Johnny H.; Liu, David R.

2017-01-01

Programmable sequence-specific genome editing agents such as CRISPR-Cas9 have greatly advanced our ability to manipulate the human genome. Although canonical forms of genome-editing agents and programmable transcriptional regulators are constitutively active, precise temporal and spatial control over genome editing and transcriptional regulation activities would enable the more selective and potentially safer use of these powerful technologies. Here, by incorporating ligand-responsive self-cleaving catalytic RNAs (aptazymes) into guide RNAs, we developed a set of aptazyme-embedded guide RNAs that enable small molecule-controlled nuclease-mediated genome editing and small molecule-controlled base editing, as well as small molecule-dependent transcriptional activation in mammalian cells. PMID:28656978
Butterfly genomics eclosing.

PubMed

Beldade, P; McMillan, W O; Papanicolaou, A

2008-02-01

Technological and conceptual advances of the last decade have led to an explosion of genomic data and the emergence of new research avenues. Evolutionary and ecological functional genomics, with its focus on the genes that affect ecological success and adaptation in natural populations, benefits immensely from a phylogenetically widespread sampling of biological patterns and processes. Among those organisms outside established model systems, butterflies offer exceptional opportunities for multidisciplinary research on the processes generating and maintaining variation in ecologically relevant traits. Here we highlight research on wing color pattern variation in two groups of Nymphalid butterflies, the African species Bicyclus anynana (subfamily Satyrinae) and species of the South American genus Heliconius (subfamily Heliconiinae), which are emerging as important systems for studying the nature and origins of functional diversity. Growing genomic resources including genomic and cDNA libraries, dense genetic maps, high-density gene arrays, and genetic transformation techniques are extending current gene mapping and expression profiling analysis and enabling the next generation of research questions linking genes, development, form, and fitness. Efforts to develop such resources in Bicyclus and Heliconius underscore the general challenges facing the larger research community and highlight the need for a community-wide effort to extend ongoing functional genomic research on butterflies.
Travel- and Community-Based Transmission of Multidrug-Resistant Shigella sonnei Lineage among International Orthodox Jewish Communities

PubMed Central

Baker, Kate S.; Dallman, Timothy J.; Behar, Adi; Weill, François-Xavier; Gouali, Malika; Sobel, Jeremy; Fookes, Maria; Valinsky, Lea; Gal-Mor, Ohad; Connor, Thomas R.; Nissan, Israel; Bertrand, Sophie; Parkhill, Julian; Jenkins, Claire; Cohen, Dani

2016-01-01

Shigellae are sensitive indicator species for studying trends in the international transmission of antimicrobial-resistant Enterobacteriaceae. Orthodox Jewish communities (OJCs) are a known risk group for shigellosis; Shigella sonnei is cyclically epidemic in OJCs in Israel, and sporadic outbreaks occur in OJCs elsewhere. We generated whole-genome sequences for 437 isolates of S. sonnei from OJCs and non-OJCs collected over 22 years in Europe (the United Kingdom, France, and Belgium), the United States, Canada, and Israel and analyzed these within a known global genomic context. Through phylogenetic and genomic analysis, we showed that strains from outbreaks in OJCs outside of Israel are distinct from strains in the general population and relate to a single multidrug-resistant sublineage of S. sonnei that prevails in Israel. Further Bayesian phylogenetic analysis showed that this strain emerged approximately 30 years ago, demonstrating the speed at which antimicrobial drug–resistant pathogens can spread widely through geographically dispersed, but internationally connected, communities. PMID:27532625
Travel- and Community-Based Transmission of Multidrug-Resistant Shigella sonnei Lineage among International Orthodox Jewish Communities.

PubMed

Baker, Kate S; Dallman, Timothy J; Behar, Adi; Weill, François-Xavier; Gouali, Malika; Sobel, Jeremy; Fookes, Maria; Valinsky, Lea; Gal-Mor, Ohad; Connor, Thomas R; Nissan, Israel; Bertrand, Sophie; Parkhill, Julian; Jenkins, Claire; Cohen, Dani; Thomson, Nicholas R

2016-09-01

Shigellae are sensitive indicator species for studying trends in the international transmission of antimicrobial-resistant Enterobacteriaceae. Orthodox Jewish communities (OJCs) are a known risk group for shigellosis; Shigella sonnei is cyclically epidemic in OJCs in Israel, and sporadic outbreaks occur in OJCs elsewhere. We generated whole-genome sequences for 437 isolates of S. sonnei from OJCs and non-OJCs collected over 22 years in Europe (the United Kingdom, France, and Belgium), the United States, Canada, and Israel and analyzed these within a known global genomic context. Through phylogenetic and genomic analysis, we showed that strains from outbreaks in OJCs outside of Israel are distinct from strains in the general population and relate to a single multidrug-resistant sublineage of S. sonnei that prevails in Israel. Further Bayesian phylogenetic analysis showed that this strain emerged approximately 30 years ago, demonstrating the speed at which antimicrobial drug-resistant pathogens can spread widely through geographically dispersed, but internationally connected, communities.
Genomic and Histopathological Tissue Biomarkers That Predict Radiotherapy Response in Localised Prostate Cancer

PubMed Central

Wilkins, Anna; Dearnaley, David; Somaiah, Navita

2015-01-01

Localised prostate cancer, in particular, intermediate risk disease, has varied survival outcomes that cannot be predicted accurately using current clinical risk factors. External beam radiotherapy (EBRT) is one of the standard curative treatment options for localised disease and its efficacy is related to wide ranging aspects of tumour biology. Histopathological techniques including immunohistochemistry and a variety of genomic assays have been used to identify biomarkers of tumour proliferation, cell cycle checkpoints, hypoxia, DNA repair, apoptosis, and androgen synthesis, which predict response to radiotherapy. Global measures of genomic instability also show exciting capacity to predict survival outcomes following EBRT. There is also an urgent clinical need for biomarkers to predict the radiotherapy fraction sensitivity of different prostate tumours and preclinical studies point to possible candidates. Finally, the increased resolution of next generation sequencing (NGS) is likely to enable yet more precise molecular predictions of radiotherapy response and fraction sensitivity. PMID:26504789

PRESAGE: PRivacy-preserving gEnetic testing via SoftwAre Guard Extension.

PubMed

Chen, Feng; Wang, Chenghong; Dai, Wenrui; Jiang, Xiaoqian; Mohammed, Noman; Al Aziz, Md Momin; Sadat, Md Nazmus; Sahinalp, Cenk; Lauter, Kristin; Wang, Shuang

2017-07-26

Advances in DNA sequencing technologies have prompted a wide range of genomic applications to improve healthcare and facilitate biomedical research. However, privacy and security concerns have emerged as a challenge for utilizing cloud computing to handle sensitive genomic data. We present one of the first implementations of a Software Guard Extension (SGX)-based framework for securely outsourced genetic testing, which leverages multiple cryptographic protocols and a minimal perfect hash scheme to enable efficient and secure data storage and computation outsourcing. We compared the performance of the proposed PRESAGE framework with the state-of-the-art homomorphic encryption scheme, as well as the plaintext implementation. The experimental results demonstrated a significant performance improvement over the homomorphic encryption methods and a small computational overhead in comparison to the plaintext implementation. The proposed PRESAGE provides an alternative solution for secure and efficient genomic data outsourcing in an untrusted cloud by using a hybrid framework that combines secure hardware and multiple crypto protocols.
Single-cell RNA-sequencing: The future of genome biology is now

PubMed Central

Picelli, Simone

2017-01-01

Genome-wide single-cell analysis represents the ultimate frontier of genomics research. In particular, single-cell RNA-sequencing (scRNA-seq) studies have been boosted in the last few years by an explosion of new technologies enabling the study of the transcriptomic landscape of thousands of single cells in complex multicellular organisms. More sensitive and automated methods are being continuously developed and promise to deliver better data quality and higher throughput with less hands-on time. The outstanding amount of knowledge that is going to be gained from present and future studies will have a profound impact in many aspects of our society, from the introduction of truly tailored cancer treatments, to a better understanding of antibiotic resistance and host-pathogen interactions; from the discovery of the mechanisms regulating stem cell differentiation to the characterization of the early events of human embryogenesis. PMID:27442339
Whole genome DNA methylation: beyond genes silencing.

PubMed

Tirado-Magallanes, Roberto; Rebbani, Khadija; Lim, Ricky; Pradhan, Sriharsa; Benoukraf, Touati

2017-01-17

The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation at near base pair level resolution, far beyond that of the kilobase-long canonical CpG islands that initially revealed the biological relevance of this covalent DNA modification. The latest high-resolution studies have revealed a role for very punctual DNA methylation in chromatin plasticity, gene regulation and splicing. Here, we aim to outline the major recently discovered biological consequences of DNA methylation. We also discuss the necessity of tuning DNA methylation resolution to an adequate scale to ease the integration of methylome information with other chromatin features and transcription events such as gene expression, nucleosome positioning, transcription factor binding dynamics, gene splicing and genomic imprinting. Finally, our review sheds light on DNA methylation heterogeneity in cell populations and the different approaches used for its assessment, including the contribution of single cell DNA analysis technology.
Whole genome DNA methylation: beyond genes silencing

PubMed Central

Tirado-Magallanes, Roberto; Rebbani, Khadija; Lim, Ricky; Pradhan, Sriharsa; Benoukraf, Touati

2017-01-01

The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation at near base pair level resolution, far beyond that of the kilobase-long canonical CpG islands that initially revealed the biological relevance of this covalent DNA modification. The latest high-resolution studies have revealed a role for very punctual DNA methylation in chromatin plasticity, gene regulation and splicing. Here, we aim to outline the major recently discovered biological consequences of DNA methylation. We also discuss the necessity of tuning DNA methylation resolution to an adequate scale to ease the integration of methylome information with other chromatin features and transcription events such as gene expression, nucleosome positioning, transcription factor binding dynamics, gene splicing and genomic imprinting. Finally, our review sheds light on DNA methylation heterogeneity in cell populations and the different approaches used for its assessment, including the contribution of single cell DNA analysis technology. PMID:27895318
Bridging epigenomics and complex disease: the basics.

PubMed

Teperino, Raffaele; Lempradl, Adelheid; Pospisilik, J Andrew

2013-05-01

The DNA sequence largely defines gene expression and phenotype. However, it is becoming increasingly clear that an additional chromatin-based regulatory network imparts both stability and plasticity to genome output, modifying phenotype independently of the genetic blueprint. Indeed, alterations in this "epigenetic" control layer underlie, at least in part, the reason for monozygotic twins being discordant for disease. Functionally, this regulatory layer comprises post-translational modifications of DNA and histones, as well as small and large noncoding RNAs. Together these regulate gene expression by changing chromatin organization and DNA accessibility. Successive technological advances over the past decade have enabled researchers to map the chromatin state with increasing accuracy and comprehensiveness, catapulting genetic research into a genome-wide era. Here, aiming particularly at the genomics/epigenomics newcomer, we review the epigenetic basis that has helped drive the technological shift and how this progress is shaping our understanding of complex disease.
Genome-Wide Association Study Identifies Four Loci Associated with Eruption of Permanent Teeth

PubMed Central

Zhang, Hao; Shaffer, John R.; Hansen, Thomas; Esserlind, Ann-Louise; Boyd, Heather A.; Nohr, Ellen A.; Timpson, Nicholas J.; Fatemifar, Ghazaleh; Paternoster, Lavinia; Evans, David M.; Weyant, Robert J.; Levy, Steven M.; Lathrop, Mark; Smith, George Davey; Murray, Jeffrey C.; Olesen, Jes; Werge, Thomas; Marazita, Mary L.; Sørensen, Thorkild I. A.; Melbye, Mads

2011-01-01

The sequence and timing of permanent tooth eruption is thought to be highly heritable and can have important implications for the risk of malocclusion, crowding, and periodontal disease. We conducted a genome-wide association study of number of permanent teeth erupted between age 6 and 14 years, analyzed as age-adjusted standard deviation score averaged over multiple time points, based on childhood records for 5,104 women from the Danish National Birth Cohort. Four loci showed association at P<5×10^-8 and were replicated in four independent study groups from the United States and Denmark with a total of 3,762 individuals; all combined P-values were below 10^-11. Two loci agreed with previous findings in primary tooth eruption and were also known to influence height and breast cancer, respectively. The two other loci pointed to genomic regions without any previous significant genome-wide association study results. The intronic SNP rs7924176 in ADK could be linked to gene expression in monocytes. The combined effect of the four genetic variants was most pronounced between age 10 and 12 years, where children with 6 to 8 delayed tooth eruption alleles had on average 3.5 (95% confidence interval: 2.9–4.1) fewer permanent teeth than children with 0 or 1 of these alleles. PMID:21931568
Development and Evaluation of a Genome-Wide 6K SNP Array for Diploid Sweet Cherry and Tetraploid Sour Cherry

PubMed Central

Peace, Cameron; Bassil, Nahla; Main, Dorrie; Ficklin, Stephen; Rosyara, Umesh R.; Stegmeir, Travis; Sebolt, Audrey; Gilmore, Barbara; Lawley, Cindy; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Iezzoni, Amy

2012-01-01

High-throughput genome scans are important tools for genetic studies and breeding applications. Here, a 6K SNP array for use with the Illumina Infinium® system was developed for diploid sweet cherry (Prunus avium) and allotetraploid sour cherry (P. cerasus). This effort was led by RosBREED, a community initiative to enable marker-assisted breeding for rosaceous crops. Next-generation sequencing in diverse breeding germplasm provided 25 billion basepairs (Gb) of cherry DNA sequence from which were identified genome-wide SNPs for sweet cherry and for the two sour cherry subgenomes derived from sweet cherry (avium subgenome) and P. fruticosa (fruticosa subgenome). Anchoring to the peach genome sequence, recently released by the International Peach Genome Initiative, predicted relative physical locations of the 1.9 million putative SNPs detected, preliminarily filtered to 368,943 SNPs. Further filtering was guided by results of a 144-SNP subset examined with the Illumina GoldenGate® assay on 160 accessions. A 6K Infinium® II array was designed with SNPs evenly spaced genetically across the sweet and sour cherry genomes. SNPs were developed for each sour cherry subgenome by using minor allele frequency in the sour cherry detection panel to enrich for subgenome-specific SNPs followed by targeting to either subgenome according to alleles observed in sweet cherry. The array was evaluated using panels of sweet (n = 269) and sour (n = 330) cherry breeding germplasm. Approximately one third of array SNPs were informative for each crop. A total of 1825 polymorphic SNPs were verified in sweet cherry, 13% of these originally developed for sour cherry. Allele dosage was resolved for 2058 polymorphic SNPs in sour cherry, one third of these being originally developed for sweet cherry. 
This publicly available genomics resource represents a significant advance in cherry genome-scanning capability that will accelerate marker-locus-trait association discovery, genome structure investigation, and genetic diversity assessment in this diploid-tetraploid crop group. PMID:23284615
Population-scale whole genome sequencing identifies 271 highly polymorphic short tandem repeats from Japanese population.

PubMed

Hirata, Satoshi; Kojima, Kaname; Misawa, Kazuharu; Gervais, Olivier; Kawai, Yosuke; Nagasaki, Masao

2018-05-01

Forensic DNA typing is widely used to identify missing persons and plays a central role in forensic profiling. DNA typing usually uses capillary electrophoresis fragment analysis of PCR amplification products to detect the length of short tandem repeat (STR) markers. Here, we analyzed whole genome data from 1,070 Japanese individuals generated using massively parallel short-read sequencing of 162 paired-end bases. We analyzed 843,473 STR loci with two to six basepair repeat units and cataloged highly polymorphic STR loci in the Japanese population. To evaluate the performance of the cataloged STR loci, we compared 23 STR loci, widely used in forensic DNA typing, with capillary electrophoresis based STR genotyping results in the Japanese population. Seventeen loci had high correlations and high call rates. The other six loci had low call rates or low correlations due to the limitations of short-read sequencing technology, the bioinformatics tool used, or the complexity of repeat patterns. From these analyses, we also selected 218 STR loci with four basepair repeat units and 53 loci with five basepair repeat units that are suitable for both short-read sequencing and PCR-based technologies, and which would be candidates for actual forensic DNA typing in the Japanese population.
BioSMACK: a linux live CD for genome-wide association analyses.

PubMed

Hong, Chang Bum; Kim, Young Jin; Moon, Sanghoon; Shin, Young-Ah; Go, Min Jin; Kim, Dong-Joon; Lee, Jong-Young; Cho, Yoon Shin

2012-01-01

Recent advances in high-throughput genotyping technologies have enabled us to conduct a genome-wide association study (GWAS) on a large cohort. However, analyzing millions of single nucleotide polymorphisms (SNPs) is still a difficult task for researchers conducting a GWAS. Several difficulties such as compatibilities and dependencies are often encountered by researchers using analytical tools, during the installation of software. This is a huge obstacle to any research institute without computing facilities and specialists. Therefore, a proper research environment is an urgent need for researchers working on GWAS. We developed BioSMACK to provide a research environment for GWAS that requires no configuration and is easy to use. BioSMACK is based on the Ubuntu Live CD that offers a complete Linux-based operating system environment without installation. Moreover, we provide users with a GWAS manual consisting of a series of guidelines for GWAS and useful examples. BioSMACK is freely available at http://ksnp.cdc.go.kr/biosmack.
Joint Identification of Genetic Variants for Physical Activity in Korean Population

PubMed Central

Kim, Jayoun; Kim, Jaehee; Min, Haesook; Oh, Sohee; Kim, Yeonjung; Lee, Andy H.; Park, Taesung

2014-01-01

There has been limited research on genome-wide association with physical activity (PA). This study ascertained genetic associations between PA and 344,893 single nucleotide polymorphism (SNP) markers in 8842 Korean samples. PA data were obtained from a validated questionnaire that included information on PA intensity and duration. Metabolic equivalents of task (METs) were calculated to estimate the total daily PA level for each individual. In addition to single- and multiple-SNP association tests, a pathway enrichment analysis was performed to identify the biological significance of SNP markers. Although no SNP reached genome-wide significance in single-SNP association tests, 59 genetic variants mapped to 76 genes were identified via a multiple-SNP approach using a bootstrap selection stability measure. Pathway analysis for these 59 variants showed that maturity onset diabetes of the young (MODY) was enriched. Joint identification of SNPs could enable the identification of multiple SNPs with good predictive power for PA and a pathway enriched for PA. PMID:25026172
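To make the idea of a MET-based daily activity score concrete, here is a minimal sketch that multiplies intensity by duration and sums over activities. The intensity categories, their MET values, and the example day are all hypothetical; the abstract does not report the questionnaire's categories or the exact formula the authors used.

```python
# Hypothetical MET intensities per activity category. These values are
# illustrative only and are not taken from the study's questionnaire.
MET_VALUES = {"light": 2.5, "moderate": 4.0, "vigorous": 8.0}


def daily_met_hours(hours_by_intensity, met_values=MET_VALUES):
    """Sum intensity (MET) x duration (hours) over all reported activities."""
    return sum(met_values[level] * hours for level, hours in hours_by_intensity.items())


# A made-up day: 3 h light, 1 h moderate, 0.5 h vigorous activity.
example_day = {"light": 3.0, "moderate": 1.0, "vigorous": 0.5}
total = daily_met_hours(example_day)

print(f"Total daily PA: {total:.1f} MET-hours")
```

A score of this kind gives each individual a single quantitative phenotype, which is the form an association test against SNP genotypes would typically require.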
Parallel shRNA and CRISPR-Cas9 screens enable antiviral drug target identification.

PubMed

Deans, Richard M; Morgens, David W; Ökesli, Ayşe; Pillay, Sirika; Horlbeck, Max A; Kampmann, Martin; Gilbert, Luke A; Li, Amy; Mateo, Roberto; Smith, Mark; Glenn, Jeffrey S; Carette, Jan E; Khosla, Chaitan; Bassik, Michael C

2016-05-01

Broad-spectrum antiviral drugs targeting host processes could potentially treat a wide range of viruses while reducing the likelihood of emergent resistance. Despite great promise as therapeutics, such drugs remain largely elusive. Here we used parallel genome-wide high-coverage short hairpin RNA (shRNA) and clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 screens to identify the cellular target and mechanism of action of GSK983, a potent broad-spectrum antiviral with unexplained cytotoxicity. We found that GSK983 blocked cell proliferation and dengue virus replication by inhibiting the pyrimidine biosynthesis enzyme dihydroorotate dehydrogenase (DHODH). Guided by mechanistic insights from both genomic screens, we found that exogenous deoxycytidine markedly reduced GSK983 cytotoxicity but not antiviral activity, providing an attractive new approach to improve the therapeutic window of DHODH inhibitors against RNA viruses. Our results highlight the distinct advantages and limitations of each screening method for identifying drug targets, and demonstrate the utility of parallel knockdown and knockout screens for comprehensive probing of drug activity.
A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates

PubMed Central

Tachmazidou, Ioanna; Dedoussis, George; Southam, Lorraine; Farmaki, Aliki-Eleni; Ritchie, Graham R. S.; Xifara, Dionysia K.; Matchan, Angela; Hatzikotoulas, Konstantinos; Rayner, Nigel W.; Chen, Yuan; Pollin, Toni I.; O’Connell, Jeffrey R.; Yerges-Armstrong, Laura M.; Kiagiadaki, Chrysoula; Panoutsopoulou, Kalliope; Schwartzentruber, Jeremy; Moutsianas, Loukas; Tsafantakis, Emmanouil; Tyler-Smith, Chris; McVean, Gil; Xue, Yali; Zeggini, Eleftheria

2013-01-01

Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipid traits. We find genome-wide significant evidence for association of R19X, a functional variant in APOC3, with increased high-density lipoprotein and decreased triglyceride levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid traits signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance. PMID:24343240
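The frequencies quoted in this abstract can be related with a little arithmetic. The sketch below converts the heterozygote fraction into an approximate allele frequency and compares it with the upper bound given for outbred Europeans. It rests on two assumptions not stated in the abstract: that there are no homozygous carriers in the sample, and that the "<0.05% frequency" figure refers to allele frequency.

```python
def allele_frequency(het_fraction, hom_fraction=0.0):
    """Allele frequency from genotype fractions.

    Each heterozygote carries one copy and each homozygote two copies,
    out of two alleles per person.
    """
    return het_fraction / 2 + hom_fraction


# 3.8% of individuals in the isolate are heterozygous (from the abstract);
# homozygous carriers are assumed absent.
isolate_freq = allele_frequency(0.038)

# Upper bound on the frequency in outbred European populations (<0.05%).
outbred_upper_bound = 0.0005

# Because the outbred frequency is only bounded from above, this ratio is a
# lower bound on the enrichment in the isolate.
min_fold_enrichment = isolate_freq / outbred_upper_bound

print(f"Isolate allele frequency: {isolate_freq:.3%}")
print(f"Enrichment over outbred populations: at least {min_fold_enrichment:.0f}-fold")
```

Under these assumptions the allele frequency in the isolate is about 1.9%, at least roughly 38 times the outbred figure, which illustrates why a sample of 1,267 individuals can carry enough copies of the variant for an association test.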
A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.

PubMed

Tachmazidou, Ioanna; Dedoussis, George; Southam, Lorraine; Farmaki, Aliki-Eleni; Ritchie, Graham R S; Xifara, Dionysia K; Matchan, Angela; Hatzikotoulas, Konstantinos; Rayner, Nigel W; Chen, Yuan; Pollin, Toni I; O'Connell, Jeffrey R; Yerges-Armstrong, Laura M; Kiagiadaki, Chrysoula; Panoutsopoulou, Kalliope; Schwartzentruber, Jeremy; Moutsianas, Loukas; Tsafantakis, Emmanouil; Tyler-Smith, Chris; McVean, Gil; Xue, Yali; Zeggini, Eleftheria

2013-01-01

Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipid traits. We find genome-wide significant evidence for association of R19X, a functional variant in APOC3, with increased high-density lipoprotein and decreased triglyceride levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid traits signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance.
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits.

PubMed

van Heerwaarden, Joost; van Zanten, Martijn; Kruijer, Willem

2015-10-01

Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation.
New Era of Studying RNA Secondary Structure and Its Influence on Gene Regulation in Plants.

PubMed

Yang, Xiaofei; Yang, Minglei; Deng, Hongjing; Ding, Yiliang

2018-01-01

The dynamic structure of RNA plays a central role in post-transcriptional regulation of gene expression such as RNA maturation, degradation, and translation. With the rise of next-generation sequencing, the study of RNA structure has been transformed from in vitro low-throughput RNA structure probing methods to in vivo high-throughput RNA structure profiling. The development of these methods enables incremental studies on the function of RNA structure to be performed, revealing new insights into regulatory mechanisms of RNA structure in plants. Genome-wide RNA structure profiling allows us to investigate general RNA structural features over tens of thousands of mRNAs and to compare RNA structuromes between plant species. Here, we provide a comprehensive and up-to-date overview of: (i) RNA structure probing methods; (ii) the biological functions of RNA structure; (iii) genome-wide RNA structural features corresponding to their regulatory mechanisms; and (iv) RNA structurome evolution in plants.
First-generation linkage map of the gray, short-tailed opossum, Monodelphis domestica, reveals genome-wide reduction in female recombination rates.

PubMed Central

Samollow, Paul B; Kammerer, Candace M; Mahaney, Susan M; Schneider, Jennifer L; Westenberger, Scott J; VandeBerg, John L; Robinson, Edward S

2004-01-01

The gray, short-tailed opossum, Monodelphis domestica, is the most extensively used, laboratory-bred marsupial resource for basic biologic and biomedical research worldwide. To enhance the research utility of this species, we are building a linkage map, using both anonymous markers and functional gene loci, that will enable the localization of quantitative trait loci (QTL) and provide comparative information regarding the evolution of mammalian and other vertebrate genomes. The current map is composed of 83 loci distributed among eight autosomal linkage groups and the X chromosome. The autosomal linkage groups appear to encompass a very large portion of the genome, yet span a sex-average distance of only 633.0 cM, making this the most compact linkage map known among vertebrates. Most surprising, the male map is much larger than the female map (884.6 cM vs. 443.1 cM), a pattern contrary to that in eutherian mammals and other vertebrates. The finding of genome-wide reduction in female recombination in M. domestica, coupled with recombination data from two other, distantly related marsupial species, suggests that reduced female recombination might be a widespread metatherian attribute. We discuss possible explanations for reduced female recombination in marsupials as a consequence of the metatherian characteristic of determinate paternal X chromosome inactivation. PMID:15020427
Diversity and genomic insights into the uncultured Chloroflexi from the human microbiota.

PubMed

Campbell, Alisha G; Schwientek, Patrick; Vishnivetskaya, Tatiana; Woyke, Tanja; Levy, Shawn; Beall, Clifford J; Griffen, Ann; Leys, Eugene; Podar, Mircea

2014-09-01

Many microbial phyla that are widely distributed in open environments have few or no representatives within animal-associated microbiota. Among them, the Chloroflexi comprises taxonomically and physiologically diverse lineages adapted to a wide range of aquatic and terrestrial habitats. A distinct group of uncultured chloroflexi related to free-living anaerobic Anaerolineae inhabits the mammalian gastrointestinal tract and includes low-abundance human oral bacteria that appear to proliferate in periodontitis. Using a single-cell genomics approach, we obtained the first draft genomic reconstruction for these organisms and compared their inferred metabolic potential with free-living chloroflexi. Genomic data suggest that oral chloroflexi are anaerobic heterotrophs, encoding abundant carbohydrate transport and metabolism functionalities, similar to those seen in environmental Anaerolineae isolates. The presence of genes for a unique phosphotransferase system and N-acetylglucosamine metabolism suggests an important ecological niche for oral chloroflexi in scavenging material from lysed bacterial cells and the human tissue. The inferred ability to produce sialic acid for cell membrane decoration may enable them to evade the host defence system and colonize the subgingival space. As with other low abundance but persistent members of the microbiota, discerning community and host factors that influence the proliferation of oral chloroflexi may help understand the emergence of oral pathogens and the microbiota dynamics in health and disease states. © 2014 Society for Applied Microbiology and John Wiley & Sons Ltd.
Strong trans-Pacific break and local conservation units in the Galapagos shark (Carcharhinus galapagensis) revealed by genome-wide cytonuclear markers.

PubMed

Pazmiño, Diana A; Maes, Gregory E; Green, Madeline E; Simpfendorfer, Colin A; Hoyos-Padilla, E Mauricio; Duffy, Clinton J A; Meyer, Carl G; Kerwath, Sven E; Salinas-de-León, Pelayo; van Herwerden, Lynne

2018-05-01

The application of genome-wide cytonuclear molecular data to identify management and adaptive units at various spatio-temporal levels is particularly important for overharvested large predatory organisms, often characterized by smaller, localized populations. Despite being "near threatened", current understanding of habitat use and population structure of Carcharhinus galapagensis is limited to specific areas within its distribution. We evaluated population structure and connectivity across the Pacific Ocean using genome-wide single-nucleotide polymorphisms (~7200 SNPs) and mitochondrial control region sequences (945 bp) for 229 individuals. Neutral SNPs defined at least two genetically discrete geographic groups: an East Tropical Pacific (Mexico, east and west Galapagos Islands), and another central-west Pacific (Lord Howe Island, Middleton Reef, Norfolk Island, Elizabeth Reef, Kermadec, Hawaii and Southern Africa). More fine-grade population structure was suggested using outlier SNPs: west Pacific, Hawaii, Mexico, and Galapagos. Consistently, mtDNA pairwise ΦST defined three regional stocks: east, central and west Pacific. Compared to neutral SNPs (FST = 0.023-0.035), mtDNA exhibited more divergence (ΦST = 0.258-0.539) and high overall genetic diversity (h = 0.794 ± 0.014; π = 0.004 ± 0.000), consistent with the longstanding eastern Pacific barrier between the east and central-west Pacific. Hawaiian and Southern African populations group within the west Pacific cluster. Effective population sizes were moderate/high for east/west populations (738 and 3421, respectively). Insights into the biology, connectivity, genetic diversity, and population demographics inform improved conservation of this species, by delineating three to four conservation units across their Pacific distribution. Implementing such conservation management may be challenging, but is necessary to achieve long-term population resilience at basin and regional scales.
Ames Culture Chamber System: Enabling Model Organism Research Aboard the International Space Station

NASA Technical Reports Server (NTRS)

Steele, Marianne

2014-01-01

Understanding the genetic, physiological, and behavioral effects of spaceflight on living organisms and elucidating the molecular mechanisms that underlie these effects are high priorities for NASA. Certain organisms, known as model organisms, are widely studied to help researchers better understand how all biological systems function. Small model organisms such as nematodes, slime mold, bacteria, green algae, yeast, and moss can be used to study the effects of micro- and reduced gravity at both the cellular and systems level over multiple generations. Many model organisms have sequenced genomes and published data sets on their transcriptomes and proteomes that enable scientific investigations of the molecular mechanisms underlying the adaptations of these organisms to space flight.
Nitrogen economics of root foraging: Transitive closure of the nitrate–cytokinin relay and distinct systemic signaling for N supply vs. demand

PubMed Central

Ruffel, Sandrine; Krouk, Gabriel; Ristova, Daniela; Shasha, Dennis; Birnbaum, Kenneth D.; Coruzzi, Gloria M.

2011-01-01

As sessile organisms, root plasticity enables plants to forage for and acquire nutrients in a fluctuating underground environment. Here, we use genetic and genomic approaches in a “split-root” framework—in which physically isolated root systems of the same plant are challenged with different nitrogen (N) environments—to investigate how systemic signaling affects genome-wide reprogramming and root development. The integration of transcriptome and root phenotypes enables us to identify distinct mechanisms underlying “N economy” (i.e., N supply and demand) of plants as a system. Under nitrate-limited conditions, plant roots adopt an “active-foraging strategy”, characterized by lateral root outgrowth and a shared pattern of transcriptome reprogramming, in response to either local or distal nitrate deprivation. By contrast, in nitrate-replete conditions, plant roots adopt a “dormant strategy”, characterized by a repression of lateral root outgrowth and a shared pattern of transcriptome reprogramming, in response to either local or distal nitrate supply. Sentinel genes responding to systemic N signaling identified by genome-wide comparisons of heterogeneous vs. homogeneous split-root N treatments were used to probe systemic N responses in Arabidopsis mutants impaired in nitrate reduction and hormone synthesis and also in decapitated plants. This combined analysis identified genetically distinct systemic signaling underlying plant N economy: (i) N supply, corresponding to a long-distance systemic signaling triggered by nitrate sensing; and (ii) N demand, experimental support for the transitive closure of a previously inferred nitrate–cytokinin shoot–root relay system that reports the nitrate demand of the whole plant, promoting a compensatory root growth in nitrate-rich patches of heterogeneous soil. PMID:22025711

The Nuclear and Mitochondrial Genomes of the Facultatively Eusocial Orchid Bee Euglossa dilemma

PubMed Central

Brand, Philipp; Saleh, Nicholas; Pan, Hailin; Li, Cai; Kapheim, Karen M.; Ramírez, Santiago R.

2017-01-01

Bees provide indispensable pollination services to both agricultural crops and wild plant populations, and several species of bees have become important models for the study of learning and memory, plant–insect interactions, and social behavior. Orchid bees (Apidae: Euglossini) are especially important to the fields of pollination ecology, evolution, and species conservation. Here we report the nuclear and mitochondrial genome sequences of the orchid bee Euglossa dilemma Bembé & Eltz. E. dilemma was selected because it is widely distributed, highly abundant, and it was recently naturalized in the southeastern United States. We provide a high-quality assembly of the 3.3 Gb genome, and an official gene set of 15,904 gene annotations. We find high conservation of gene synteny with the honey bee throughout 80 MY of divergence time. This genomic resource represents the first draft genome of the orchid bee genus Euglossa, and the first draft orchid bee mitochondrial genome, thus representing a valuable resource to the research community. PMID:28701376
The Nuclear and Mitochondrial Genomes of the Facultatively Eusocial Orchid Bee Euglossa dilemma.

PubMed

Brand, Philipp; Saleh, Nicholas; Pan, Hailin; Li, Cai; Kapheim, Karen M; Ramírez, Santiago R

2017-09-07

Bees provide indispensable pollination services to both agricultural crops and wild plant populations, and several species of bees have become important models for the study of learning and memory, plant-insect interactions, and social behavior. Orchid bees (Apidae: Euglossini) are especially important to the fields of pollination ecology, evolution, and species conservation. Here we report the nuclear and mitochondrial genome sequences of the orchid bee Euglossa dilemma Bembé & Eltz. E. dilemma was selected because it is widely distributed, highly abundant, and it was recently naturalized in the southeastern United States. We provide a high-quality assembly of the 3.3 Gb genome, and an official gene set of 15,904 gene annotations. We find high conservation of gene synteny with the honey bee throughout 80 MY of divergence time. This genomic resource represents the first draft genome of the orchid bee genus Euglossa, and the first draft orchid bee mitochondrial genome, thus representing a valuable resource to the research community. Copyright © 2017 Brand et al.
CRISPR-FOCUS: A web server for designing focused CRISPR screening experiments.

PubMed

Cao, Qingyi; Ma, Jian; Chen, Chen-Hao; Xu, Han; Chen, Zhi; Li, Wei; Liu, X Shirley

2017-01-01

The recently developed CRISPR screen technology, based on the CRISPR/Cas9 genome editing system, enables genome-wide interrogation of gene functions in an efficient and cost-effective manner. Although many computational algorithms and web servers have been developed to design single-guide RNAs (sgRNAs) with high specificity and efficiency, algorithms specifically designed for conducting CRISPR screens are still lacking. Here we present CRISPR-FOCUS, a web-based platform to search and prioritize sgRNAs for CRISPR screen experiments. With official gene symbols or RefSeq IDs as the only mandatory input, CRISPR-FOCUS filters and prioritizes sgRNAs based on multiple criteria, including efficiency, specificity, sequence conservation, isoform structure, as well as genomic variations including Single Nucleotide Polymorphisms and cancer somatic mutations. CRISPR-FOCUS also provides pre-defined positive and negative control sgRNAs, as well as other necessary sequences in the construct (e.g., U6 promoters to drive sgRNA transcription and RNA scaffolds of the CRISPR/Cas9). These features allow users to synthesize oligonucleotides directly based on the output of CRISPR-FOCUS. Overall, CRISPR-FOCUS provides a rational and high-throughput approach for sgRNA library design that enables users to efficiently conduct a focused screen experiment targeting up to thousands of genes. (CRISPR-FOCUS is freely available at http://cistrome.org/crispr-focus/).
RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.

PubMed

Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A

2013-11-01

Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. The comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons are available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes.
RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.
Identification of a unique library of complex, but ordered, arrays of repetitive elements in the human genome and implication of their potential involvement in pathobiology.

PubMed

Lee, Kang-Hoon; Lee, Young-Kwan; Kwon, Deug-Nam; Chiu, Sophia; Chew, Victoria; Rah, Hyungchul; Kujawski, Gregory; Melhem, Ramzi; Hsu, Karen; Chung, Cecilia; Greenhalgh, David G; Cho, Kiho

2011-06-01

Approximately 2% of the human genome is reported to be occupied by genes. Various forms of repetitive elements (REs), both characterized and uncharacterized, are presumed to make up the vast majority of the rest of the genomes of human and other species. In conjunction with a comprehensive annotation of genes, information regarding components of genome biology, such as gene polymorphisms, non-coding RNAs, and certain REs, is found in human genome databases. However, the genome-wide profile of unique RE arrangements formed by different groups of REs has not been fully characterized yet. In this study, the entire human genome was subjected to an unbiased RE survey to establish a whole-genome profile of REs and their arrangements. Due to the limitation in query size within the bl2seq alignment program (National Center for Biotechnology Information [NCBI]) utilized for the RE survey, the entire NCBI reference human genome was fragmented into 6206 units of 0.5M nucleotides. A number of RE arrangements with varying complexities and patterns were identified throughout the genome. Each chromosome had unique profiles of RE arrangements and density, and high levels of RE density were measured near the centromere regions. Subsequently, 175 complex RE arrangements, which were selected throughout the genome, were subjected to a comparison analysis using five different human genome sequences. Interestingly, three of the five human genome databases shared exactly the same arrangement patterns and sequences for all 175 RE arrangement regions (a total of 12,765,625 nucleotides). The findings from this study demonstrate that a substantial fraction of REs in the human genome are clustered into various forms of ordered structures. Further investigations are needed to examine whether some of these ordered RE arrangements contribute to the human pathobiology as a functional genome unit. Copyright © 2011 Elsevier Inc. All rights reserved.
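The fragmentation step described in this abstract can be illustrated with a short sketch. It assumes that each chromosome is cut independently into consecutive windows of 500,000 nucleotides, with a shorter final window where the length is not a multiple of the unit size. The function name and the toy chromosome lengths are hypothetical and are not taken from the study.

```python
def fragment_coordinates(chrom_lengths, unit=500_000):
    """Return (chromosome, start, end) windows; coordinates are 0-based, end-exclusive.

    Assumption: chromosomes are fragmented one at a time, so a window never
    spans two chromosomes. The abstract does not state how boundaries were handled.
    """
    windows = []
    for name, length in chrom_lengths.items():
        for start in range(0, length, unit):
            windows.append((name, start, min(start + unit, length)))
    return windows


# Toy lengths for illustration only (not real chromosome sizes).
toy_lengths = {"chrA": 1_200_000, "chrB": 500_000}
windows = fragment_coordinates(toy_lengths)
for window in windows:
    print(window)
```

Each window could then be passed as a separate query to an alignment tool with a limited query size, which appears to be the motivation given in the abstract.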
The Plant Short-Chain Dehydrogenase (SDR) superfamily: genome-wide inventory and diversification patterns

PubMed Central

2012-01-01

Background Short-chain dehydrogenases/reductases (SDRs) form one of the largest and oldest NAD(P)(H) dependent oxidoreductase families. Despite a conserved ‘Rossmann-fold’ structure, members of the SDR superfamily exhibit low sequence similarities, which constituted a bottleneck in terms of identification. Recent classification methods, relying on hidden-Markov models (HMMs), improved identification and enabled the construction of a nomenclature. However, functional annotations of plant SDRs remain scarce. Results Wide-scale analyses were performed on ten plant genomes. The combination of hidden Markov model (HMM) based analyses and similarity searches led to the construction of an exhaustive inventory of plant SDR. With 68 to 315 members found in each analysed genome, the inventory confirmed the over-representation of SDRs in plants compared to animals, fungi and prokaryotes. The plant SDRs were first classified into three major types — ‘classical’, ‘extended’ and ‘divergent’ — but a minority (10% of the predicted SDRs) could not be classified into these general types (‘unknown’ or ‘atypical’ types). In a second step, we could categorize the vast majority of land plant SDRs into a set of 49 families. Out of these 49 families, 35 appeared early during evolution since they are commonly found through all the Green Lineage. Yet, some SDR families — tropinone reductase-like proteins (SDR65C), ‘ABA2-like’-NAD dehydrogenase (SDR110C), ‘salutaridine/menthone-reductase-like’ proteins (SDR114C), ‘dihydroflavonol 4-reductase’-like proteins (SDR108E) and ‘isoflavone-reductase-like’ (SDR460A) proteins — have undergone significant functional diversification within vascular plants since they diverged from Bryophytes. 
Interestingly, these diversified families are either involved in the secondary metabolism routes (terpenoids, alkaloids, phenolics) or participate in developmental processes (hormone biosynthesis or catabolism, flower development), in opposition to SDR families involved in primary metabolism which are poorly diversified. Conclusion The application of HMMs to plant genomes enabled us to identify 49 families that encompass all Angiosperms (‘higher plants’) SDRs, each family being sufficiently conserved to enable simpler analyses based only on overall sequence similarity. The multiplicity of SDRs in plant kingdom is mainly explained by the diversification of large families involved in different secondary metabolism pathways, suggesting that the chemical diversification that accompanied the emergence of vascular plants acted as a driving force for SDR evolution. PMID:23167570
Genome-wide comparison of medieval and modern Mycobacterium leprae.

PubMed

Schuenemann, Verena J; Singh, Pushpendra; Mendum, Thomas A; Krause-Kyora, Ben; Jäger, Günter; Bos, Kirsten I; Herbig, Alexander; Economou, Christos; Benjak, Andrej; Busso, Philippe; Nebel, Almut; Boldsen, Jesper L; Kjellström, Anna; Wu, Huihai; Stewart, Graham R; Taylor, G Michael; Bauer, Peter; Lee, Oona Y-C; Wu, Houdini H T; Minnikin, David E; Besra, Gurdyal S; Tucker, Katie; Roffey, Simon; Sow, Samba O; Cole, Stewart T; Nieselt, Kay; Krause, Johannes

2013-07-12

Leprosy was endemic in Europe until the Middle Ages. Using DNA array capture, we have obtained genome sequences of Mycobacterium leprae from skeletons of five medieval leprosy cases from the United Kingdom, Sweden, and Denmark. In one case, the DNA was so well preserved that full de novo assembly of the ancient bacterial genome could be achieved through shotgun sequencing alone. The ancient M. leprae sequences were compared with those of 11 modern strains, representing diverse genotypes and geographic origins. The comparisons revealed remarkable genomic conservation during the past 1000 years, a European origin for leprosy in the Americas, and the presence of an M. leprae genotype in medieval Europe now commonly associated with the Middle East. The exceptional preservation of M. leprae biomarkers, both DNA and mycolic acids, in ancient skeletons has major implications for palaeomicrobiology and human pathogen evolution.
Genome-wide survey of DNA-binding proteins in Arabidopsis thaliana: analysis of distribution and functions.

PubMed

Malhotra, Sony; Sowdhamini, Ramanathan

2013-08-01

The interaction of proteins with their respective DNA targets is known to control many high-fidelity cellular processes. Performing a comprehensive survey of the sequenced genomes for DNA-binding proteins (DBPs) will help in understanding their distribution and the associated functions in a particular genome. Availability of fully sequenced genome of Arabidopsis thaliana enables the review of distribution of DBPs in this model plant genome. We used profiles of both structure and sequence-based DNA-binding families, derived from PDB and PFam databases, to perform the survey. This resulted in 4471 proteins, identified as DNA-binding in Arabidopsis genome, which are distributed across 300 different PFam families. Apart from several plant-specific DNA-binding families, certain RING fingers and leucine zippers also had high representation. Our search protocol helped to assign DNA-binding property to several proteins that were previously marked as unknown, putative or hypothetical in function. The distribution of Arabidopsis genes having a role in plant DNA repair were particularly studied and noted for their functional mapping. The functions observed to be overrepresented in the plant genome harbour DNA-3-methyladenine glycosylase activity, alkylbase DNA N-glycosylase activity and DNA-(apurinic or apyrimidinic site) lyase activity, suggesting their role in specialized functions such as gene regulation and DNA repair.
Deciphering the distance to antibiotic resistance for the pneumococcus using genome sequencing data

PubMed Central

Mobegi, Fredrick M.; Cremers, Amelieke J. H.; de Jonge, Marien I.; Bentley, Stephen D.; van Hijum, Sacha A. F. T.; Zomer, Aldert

2017-01-01

Advances in genome sequencing technologies and genome-wide association studies (GWAS) have provided unprecedented insights into the molecular basis of microbial phenotypes and enabled the identification of the underlying genetic variants in real populations. However, utilization of genome sequencing in clinical phenotyping of bacteria is challenging due to the lack of reliable and accurate approaches. Here, we report a method for predicting microbial resistance patterns using genome sequencing data. We analyzed whole genome sequences of 1,680 Streptococcus pneumoniae isolates from four independent populations using GWAS and identified probable hotspots of genetic variation which correlate with phenotypes of resistance to essential classes of antibiotics. With the premise that accumulation of putative resistance-conferring SNPs, potentially in combination with specific resistance genes, precedes full resistance, we retrogressively surveyed the hotspot loci and quantified the number of SNPs and/or genes, which if accumulated would confer full resistance to an otherwise susceptible strain. We name this approach the ‘distance to resistance’. It can be used to identify the creep towards complete antibiotic resistance in bacteria using genome sequencing. This approach serves as a basis for the development of future sequencing-based methods for predicting resistance profiles of bacterial strains in hospital microbiology and public health settings. PMID:28205635
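One simple reading of the 'distance to resistance' idea is a count of resistance-associated variants that an isolate does not yet carry. The sketch below illustrates only that reading. The function name and the variant labels are hypothetical, and the published method may weight or combine SNPs and genes differently.

```python
def distance_to_resistance(isolate_variants, resistance_variants):
    """Count resistance-associated variants that the isolate has not yet acquired.

    Assumption: every listed variant contributes equally and all are needed
    for full resistance. This is an illustration, not the authors' scoring.
    """
    return len(set(resistance_variants) - set(isolate_variants))


# Hypothetical variant labels for one antibiotic class.
resistance_set = {"locus1:snpA", "locus1:snpB", "locus2:snpC", "geneX"}
susceptible_isolate = {"locus1:snpA"}
resistant_isolate = set(resistance_set)

print(distance_to_resistance(susceptible_isolate, resistance_set))
print(distance_to_resistance(resistant_isolate, resistance_set))
```

Under this reading, an isolate with a small but non-zero distance would be flagged as creeping towards resistance even though it still tests as susceptible.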
Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

PubMed

Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

2016-01-01

3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have acquired much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies, etc. Early realisation of proteins in the whole human genome that retain tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids.
Phylo_dCor: distance correlation as a novel metric for phylogenetic profiling.

PubMed

Sferra, Gabriella; Fratini, Federica; Ponzi, Marta; Pizzi, Elisabetta

2017-09-05

Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at a whole genome level in both Prokaryotes and Eukaryotes. For this reason it is considered one of the most promising methods. Here, we propose an improvement of phylogenetic profiling that enables handling of large genomic datasets and inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms phylogenetic profiling methods previously described based on the mutual information and Pearson's correlation as measures of profile similarity. In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
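To make the similarity measure concrete, the following is a minimal sketch of the sample distance correlation between two phylogenetic profiles. It assumes each profile is a numeric vector with one entry per genome (for example 1 for presence and 0 for absence). It is a plain, unparallelized illustration, not the authors' R implementation, and the function names are hypothetical.

```python
import math


def _centered_distances(values):
    """Double-center the pairwise absolute-distance matrix of a 1-D profile."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]  # equals column means (symmetric matrix)
    grand_mean = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length profiles, in [0, 1]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # a constant profile carries no information
    return math.sqrt(max(dcov2, 0.0) / denom)


# Toy presence/absence profiles across five genomes (illustrative only).
protein_a = [0, 1, 1, 0, 1]
protein_b = [0, 1, 1, 0, 1]
protein_c = [1, 1, 0, 0, 1]
print(distance_correlation(protein_a, protein_b))
print(distance_correlation(protein_a, protein_c))
```

The double loops make the cost quadratic in the number of genomes for every protein pair, which suggests why a parallelized implementation is useful for large datasets.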
Omics in the Arctic: Genome-enabled Contributions to Carbon Cycle Research in High-Latitude Ecosystems (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

ScienceCinema

Wullschleger, Stan

2018-02-13

Stan Wullschleger of Oak Ridge National Laboratory on "Omics in the Arctic: Genome-enabled Contributions to Carbon Cycle Research in High-Latitude Ecosystems" on March 22, 2012 at the 7th Annual Genomics of Energy & Environment Meeting in Walnut Creek, California.
Genetically contextual effects of smoking on genome wide DNA methylation.

PubMed

Dogan, Meeshanthini V; Beach, Steven R H; Philibert, Robert A

2017-09-01

Smoking is the leading cause of death in the United States. It exerts its effects by increasing susceptibility to a variety of complex disorders among those who smoke, and if pregnant, to their unborn children. In prior efforts to understand the epigenetic mechanisms through which this increased vulnerability is conveyed, a number of investigators have conducted genome wide methylation analyses. Unfortunately, secondary to methodological limitations, these studies were unable to examine methylation in gene regions with significant amounts of genetic variation. Using genome wide genetic and epigenetic data from the Framingham Heart Study, we re-examined the relationship of smoking status to genome wide methylation status. When only methylation status is considered, smoking was significantly associated with differential methylation in 310 genes that map to a variety of biological process and cellular differentiation pathways. However, when SNP effects on the magnitude of smoking associated methylation changes are also considered, cis and trans-interaction effects were noted at a total of 266 and 4353 genes with no marked enrichment for any biological pathways. Furthermore, the SNP variation participating in the significant interaction effects is enriched for loci previously associated with complex medical illnesses. The enlarged scope of the methylome shown to be affected by smoking may better explicate the mediational pathways linking smoking with a myriad of smoking related complex syndromes. Additionally, these results strongly suggest that combined epigenetic and genetic data analyses may be critical for a more complete understanding of the relationship between environmental variables, such as smoking, and pathophysiological outcomes. © 2017 Wiley Periodicals, Inc.
Genome-Wide Association Study of Personality Traits in the Long Life Family Study

PubMed Central

Bae, Harold T.; Sebastiani, Paola; Sun, Jenny X.; Andersen, Stacy L.; Daw, E. Warwick; Terracciano, Antonio; Ferrucci, Luigi; Perls, Thomas T.

2013-01-01

Personality traits have been shown to be associated with longevity and healthy aging. In order to discover novel genetic modifiers associated with personality traits as related to longevity, we performed a genome-wide association study (GWAS) on personality factors assessed by NEO-five-factor inventory in individuals enrolled in the Long Life Family Study (LLFS), a study of 583 families (N up to 4595) with clustering for longevity in the United States and Denmark. Three SNPs, in almost perfect LD, associated with agreeableness reached genome-wide significance (p < 10^−8) and replicated in an additional sample of 1279 LLFS subjects, although one (rs9650241) failed to replicate and the other two were not available in two independent replication cohorts, the Baltimore Longitudinal Study of Aging and the New England Centenarian Study. Based on 10,000,000 permutations, the empirical p-value of 2 × 10^−7 was observed for the genome-wide significant SNPs. Seventeen SNPs that reached marginal statistical significance in the two previous GWASs (p-value < 10^−4 and 10^−5) were also marginally significantly associated in this study (p-value < 0.05), although none of the associations passed the Bonferroni correction. In addition, we tested age-by-SNP interactions and found some significant associations. Since scores of personality traits in LLFS subjects change in the oldest ages, and genetic factors outweigh environmental factors to achieve extreme ages, these age-by-SNP interactions could be a proxy for complex gene–gene interactions affecting personality traits and longevity. PMID:23658558
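The two significance checks named in the abstract above (a permutation-based empirical p-value and a Bonferroni correction) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the add-one estimator is one common convention rather than the one the study necessarily used, and treating the 17 candidate SNPs as the number of tests is an assumption made only for the usage example.

```python
def empirical_p_value(observed_stat, permuted_stats):
    """Permutation p-value with the common add-one correction.

    Counts how many permuted test statistics are at least as extreme as
    the observed one; the +1 terms keep the estimate strictly positive.
    """
    exceed = sum(1 for s in permuted_stats if s >= observed_stat)
    return (exceed + 1) / (len(permuted_stats) + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical usage: with 17 tests at alpha = 0.05, a nominal
# p-value just under 0.05 does not pass the corrected threshold.
threshold = bonferroni_threshold(0.05, 17)
passes = 0.04 < threshold
```

Under that assumption the corrected threshold is 0.05 / 17, so an association that is only nominally significant (p-value < 0.05) can fail the correction, which matches the pattern the abstract reports.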
SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

PubMed Central

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlate with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
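The alignment-free idea in the abstract above, comparing genomes by their tetranucleotide frequencies, can be illustrated with a short sketch. It computes plain relative frequencies of the 256 possible 4-mers and a Euclidean distance between two profiles. The function names are hypothetical, and this is not the OUP statistic implemented by SWPhylo, which the abstract does not specify in detail.

```python
import math
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order, so profiles are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Relative frequency of each 4-mer over overlapping windows.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(profile_a, profile_b)))
```

A pairwise distance matrix built this way could be passed to any standard distance-based tree method; which method suits a given taxon set is left open here.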
Systematic Determination of Replication Activity Type Highlights Interconnections between Replication, Chromatin Structure and Nuclear Localization

PubMed Central

Polten, Andreas; Hezroni, Hadas; Eldar, Yonina C.; Meshorer, Eran; Yakhini, Zohar; Simon, Itamar

2012-01-01

DNA replication is a highly regulated process, with each genomic locus replicating at a distinct time of replication (ToR). Advances in ToR measurement technology enabled several genome-wide profiling studies that revealed tight associations between ToR and general genomic features and a remarkable ToR conservation in mammals. Genome wide studies further showed that at the hundreds kb-to-megabase scale the genome can be divided into constant ToR regions (CTRs) in which the replication process propagates at a faster pace due to the activation of multiple origins and temporal transition regions (TTRs) in which the replication process propagates at a slower pace. We developed a computational tool that assigns a ToR to every measured locus and determines its replication activity type (CTR versus TTR). Our algorithm, ARTO (Analysis of Replication Timing and Organization), uses signal processing methods to fit a constant piece-wise linear curve to the measured raw data. We tested our algorithm and provide performance and usability results. A Matlab implementation of ARTO is available at http://bioinfo.cs.technion.ac.il/people/zohar/ARTO/. Applying our algorithm to ToR data measured in multiple mouse and human samples allowed precise genome-wide ToR determination and replication activity type characterization. Analysis of the results highlighted the plasticity of the replication program. For example, we observed significant ToR differences in 10–25% of the genome when comparing different tissue types. Our analyses also provide evidence for activity type differences in up to 30% of the probes. Integration of the ToR data with multiple aspects of chromosome organization characteristics suggests that ToR plays a role in shaping the regional chromatin structure. Namely, repressive chromatin marks are associated with late ToR both in TTRs and CTRs. Finally, characterization of the differences between TTRs and CTRs, with matching ToR, revealed that TTRs are associated with compact chromatin and are located significantly closer to the nuclear envelope. Supplementary material is available. Raw and processed data were deposited in GEO (GSE17236). PMID:23145042
A GPU-Based Wide-Band Radio Spectrometer

NASA Astrophysics Data System (ADS)

Chennamangalam, Jayanth; Scott, Simon; Jones, Glenn; Chen, Hong; Ford, John; Kepley, Amanda; Lorimer, D. R.; Nie, Jun; Prestage, Richard; Roshi, D. Anish; Wagner, Mark; Werthimer, Dan

2014-12-01

The graphics processing unit has become an integral part of astronomical instrumentation, enabling high-performance online data reduction and accelerated online signal processing. In this paper, we describe a wide-band reconfigurable spectrometer built using an off-the-shelf graphics processing unit card. This spectrometer, when configured as a polyphase filter bank, supports a dual-polarisation bandwidth of up to 1.1 GHz (or a single-polarisation bandwidth of up to 2.2 GHz) on the latest generation of graphics processing units. On the other hand, when configured as a direct fast Fourier transform, the spectrometer supports a dual-polarisation bandwidth of up to 1.4 GHz (or a single-polarisation bandwidth of up to 2.8 GHz).
Emerging Science and Research Opportunities for Metals and Metallic Nanostructures: A Report on the NSF MMN Workshop

NASA Astrophysics Data System (ADS)

Pollock, Tresa; Handwerker, Carol

In the next decade, fundamental research in metals and metallic nanostructures (MMN) has the potential to continue to transform science into innovative materials, devices, and systems. This talk summarizes the findings of a workshop to identify emerging and potentially transformative research areas in MMN. The metals and metallic nanostructures (MMNs) workshop aimed to identify significant research trends, scientific fundamentals, and recent breakthroughs that can enable new or enhanced MMN performance, either alone or in a more complex materials system, for a wide range of applications. Additionally, the role that MMN research can play in high-priority research and development (R&D) areas such as the U.S. Materials Genome Initiative, the National Nanotechnology Initiative, the Advanced Manufacturing Initiative, and other similar initiatives that exist internationally was assessed. The workshop also addressed critical issues related to materials research instrumentation and the cyberinfrastructure for materials science research and education, as well as science, technology, engineering, and mathematics (STEM) workforce development, with emphasis on the United States but with an appreciation that similar challenges and opportunities for the materials community exist internationally.
Empirical Bayes method for reducing false discovery rates of correlation matrices with block diagonal structure.

PubMed

Pacini, Clare; Ajioka, James W; Micklem, Gos

2017-04-12

Correlation matrices are important in inferring relationships and networks between regulatory or signalling elements in biological systems. With currently available technology, sample sizes for experiments are typically small, meaning that these correlations can be difficult to estimate. At a genome-wide scale, estimation of correlation matrices can also be computationally demanding. We develop an empirical Bayes approach to improve covariance estimates for gene expression, where we assume the covariance matrix takes a block diagonal form. Our method shows lower false discovery rates than existing methods on simulated data. Applied to a real data set from Bacillus subtilis, we demonstrate its ability to detect known regulatory units and interactions between them. We demonstrate that, compared to existing methods, our method is able to find significant covariances and also to control false discovery rates, even when the sample size is small (n=10). The method can be used to find potential regulatory networks, and it may also be used as a pre-processing step for methods that calculate, for example, partial correlations, so enabling the inference of the causal and hierarchical structure of the networks.

Gene therapy: advances, challenges and perspectives

PubMed Central

Gonçalves, Giulliana Augusta Rangel; Paiva, Raquel de Melo Alves

2017-01-01

The ability to make site-specific modifications to the human genome has been an objective in medicine since the recognition of the gene as the basic unit of heredity. Thus, gene therapy is understood as the ability of genetic improvement through the correction of altered (mutated) genes or site-specific modifications that target therapeutic treatment. This therapy became possible through the advances of genetics and bioengineering that enabled manipulating vectors for delivery of extrachromosomal material to target cells. One of the main focuses of this technique is the optimization of delivery vehicles (vectors) that are mostly plasmids, nanostructured or viruses. The viruses are more often investigated due to their excellence of invading cells and inserting their genetic material. However, there is great concern regarding exacerbated immune responses and genome manipulation, especially in germ line cells. In vivo studies in somatic cells showed satisfactory results with approved protocols in clinical trials. These trials have been conducted in the United States, Europe, Australia and China. Recent biotechnological advances, such as induced pluripotent stem cells in patients with liver diseases, chimeric antigen receptor T-cell immunotherapy, and genomic editing by CRISPR/Cas9, are addressed in this review. PMID:29091160
Loci associated with resistance to stripe rust (Puccinia striiformis f. sp. tritici) in a core collection of spring wheat (Triticum aestivum)

USDA-ARS's Scientific Manuscript database

Stripe rust, caused by Puccinia striiformis Westend. f. sp. tritici Erikss. (Pst) remains one of the most significant diseases of wheat worldwide. We investigated stripe rust resistance by genome-wide association analysis (GWAS) in 959 spring wheat accessions from the United States Department of Agr...
De novo genome assembly of the fungal plant pathogen Pyrenophora semeniperda

Treesearch

Marcus M. Soliai; Susan E. Meyer; Joshua A. Udall; David E. Elzinga; Russell A. Hermansen; Paul M. Bodily; Aaron A. Hart; Craig E. Coleman

2014-01-01

Pyrenophora semeniperda (anamorph Drechslera campulata) is a necrotrophic fungal seed pathogen that has a wide host range within the Poaceae. One of its hosts is cheatgrass (Bromus tectorum), a species exotic to the United States that has invaded natural ecosystems of the Intermountain West. As a natural pathogen of cheatgrass, P. semeniperda has potential as a...
Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform.

PubMed

Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J; Davenport, Karen W; Bishop-Lilly, Kimberly A; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P; Chain, Patrick S G

2017-01-09

Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genome-wide copy number variation (CNV) detection in Nelore cattle reveals highly frequent variants in genome regions harboring QTLs affecting production traits.

PubMed

da Silva, Joaquim Manoel; Giachetto, Poliana Fernanda; da Silva, Luiz Otávio; Cintra, Leandro Carrijo; Paiva, Samuel Rezende; Yamagishi, Michel Eduardo Beleza; Caetano, Alexandre Rodrigues

2016-06-13

Copy number variations (CNVs) have been shown to account for substantial portions of observed genomic variation and have been associated with qualitative and quantitative traits and the onset of disease in a number of species. Information from high-resolution studies to detect, characterize and estimate population-specific variant frequencies will facilitate the incorporation of CNVs in genomic studies to identify genes affecting traits of importance. Genome-wide CNVs were detected in high-density single nucleotide polymorphism (SNP) genotyping data from 1,717 Nelore (Bos indicus) cattle, and in NGS data from eight key ancestral bulls. A total of 68,007 and 12,786 distinct CNVs were observed, respectively. Cross-comparisons of results obtained for the eight resequenced animals revealed that 92 % of the CNVs were observed in both datasets, while 62 % of all detected CNVs were observed to overlap with previously validated cattle copy number variant regions (CNVRs). Observed CNVs were used for obtaining breed-specific CNV frequencies and identification of CNVRs, which were subsequently used for gene annotation. A total of 688 of the detected CNVRs were observed to overlap with 286 non-redundant QTLs associated with important production traits in cattle. All of 34 CNVs previously reported to be associated with milk production traits in Holsteins were also observed in Nelore cattle. Comparisons of estimated frequencies of these CNVs in the two breeds revealed 14, 13, 6 and 14 regions in high (>20 %), low (<20 %) and divergent (NEL > HOL, NEL < HOL) frequencies, respectively. Obtained results significantly enriched the bovine CNV map and enabled the identification of variants that are potentially associated with traits under selection in Nelore cattle, particularly in genome regions harboring QTLs affecting production traits.
Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score.

PubMed

Lee, Hayan; Schatz, Michael C

2012-08-15

Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself. We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including known clinically relevant variations in these regions. The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net
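To make the notion of hard-to-map regions concrete, the sketch below computes a much simpler quantity than the GMS: the fraction of start positions whose exact k-mer occurs only once in the sequence. It is a rough proxy, assumed here purely for illustration. It ignores sequencing errors, paired ends and the weighting that the abstract attributes to the GMS, and the function name is hypothetical.

```python
from collections import Counter


def unique_kmer_fraction(genome, k):
    """Fraction of start positions whose k-mer occurs exactly once.

    Positions inside exact repeats of length >= k score as non-unique,
    which is the basic reason short reads cannot be placed there.
    """
    n_windows = len(genome) - k + 1
    if k <= 0 or n_windows <= 0:
        raise ValueError("k must be positive and no longer than the genome")
    counts = Counter(genome[i:i + k] for i in range(n_windows))
    unique = sum(1 for i in range(n_windows) if counts[genome[i:i + k]] == 1)
    return unique / n_windows
```

Increasing `k` (a longer read) generally raises this fraction, whereas sequencing the same repeat more deeply does not, which is consistent with the abstract's point that such errors cannot be overcome by increasing coverage.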
Genetic Competence Drives Genome Diversity in Bacillus subtilis

PubMed Central

Chevreux, Bastien; Serra, Cláudia R; Schyns, Ghislain; Henriques, Adriano O

2018-01-01

Prokaryote genomes are the result of a dynamic flux of genes, with increases achieved via horizontal gene transfer and reductions occurring through gene loss. The ecological and selective forces that drive this genomic flexibility vary across species. Bacillus subtilis is a naturally competent bacterium that occupies various environments, including plant-associated, soil, and marine niches, and the gut of both invertebrates and vertebrates. Here, we quantify the genomic diversity of B. subtilis and infer the genome dynamics that explain the high genetic and phenotypic diversity observed. Phylogenomic and comparative genomic analyses of 42 B. subtilis genomes uncover a remarkable genome diversity that translates into a core genome of 1,659 genes and an asymptotic pangenome growth rate of 57 new genes per new genome added. This diversity is due to a large proportion of low-frequency genes that are acquired from closely related species. We find no gene-loss bias among wild isolates, which explains why the cloud genome, 43% of the species pangenome, represents only a small proportion of each genome. We show that B. subtilis can acquire xenologous copies of core genes that propagate laterally among strains within a niche. While not excluding the contributions of other mechanisms, our results strongly suggest a process of gene acquisition that is largely driven by competence, where the long-term maintenance of acquired genes depends on local and global fitness effects. This competence-driven genomic diversity provides B. subtilis with its generalist character, enabling it to occupy a wide range of ecological niches and cycle through them. PMID:29272410
Characterization and expression profiling of glutathione S-transferases in the diamondback moth, Plutella xylostella (L.).

PubMed

You, Yanchun; Xie, Miao; Ren, Nana; Cheng, Xuemin; Li, Jianyu; Ma, Xiaoli; Zou, Minming; Vasseur, Liette; Gurr, Geoff M; You, Minsheng

2015-03-05

Glutathione S-transferases (GSTs) are multifunctional detoxification enzymes that play important roles in insects. The completion of several insect genome projects has enabled the identification and characterization of GST genes over recent years. This study presents a genome-wide investigation of the diamondback moth (DBM), Plutella xylostella, a species in which the GSTs are of special importance because this pest is highly resistant to many insecticides. A total of 22 putative cytosolic GSTs were identified from a published P. xylostella genome and grouped into 6 subclasses (with two unclassified). Delta, Epsilon and Omega GSTs were numerically superior with 5 genes for each of the subclasses. The resulting phylogenetic tree showed that the P. xylostella GSTs were all clustered into Lepidoptera-specific branches. Intron sites and phases as well as GSH binding sites were strongly conserved within each of the subclasses in the GSTs of P. xylostella. Transcriptome-, RNA-seq- and qRT-PCR-based analyses showed that the GST genes were developmental stage- and strain-specifically expressed. Most of the highly expressed genes in insecticide resistant strains were also predominantly expressed in the Malpighian tubules, midgut or epidermis. To date, this is the most comprehensive study on genome-wide identification, characterization and expression profiling of the GST family in P. xylostella. The diversified features and expression patterns of the GSTs are inferred to be associated with the capacity of this species to develop resistance to a wide range of pesticides and biological toxins. Our findings provide a base for functional research on specific GST genes, a better understanding of the evolution of insecticide resistance, and strategies for more sustainable management of the pest.
Genome-wide comparison of paired fresh frozen and formalin-fixed paraffin-embedded gliomas by custom BAC and oligonucleotide array comparative genomic hybridization: facilitating analysis of archival gliomas.

PubMed

Mohapatra, Gayatry; Engler, David A; Starbuck, Kristen D; Kim, James C; Bernay, Derek C; Scangas, George A; Rousseau, Audrey; Batchelor, Tracy T; Betensky, Rebecca A; Louis, David N

2011-04-01

Array comparative genomic hybridization (aCGH) is a powerful tool for detecting DNA copy number alterations (CNA). Because diffuse malignant gliomas are often sampled by small biopsies, formalin-fixed paraffin-embedded (FFPE) blocks are often the only tissue available for genetic analysis; FFPE tissues are also needed to study the intratumoral heterogeneity that characterizes these neoplasms. In this paper, we present a combination of evaluations and technical advances that provide strong support for the ready use of oligonucleotide aCGH on FFPE diffuse gliomas. We first compared aCGH using bacterial artificial chromosome (BAC) arrays in 45 paired frozen and FFPE gliomas, and demonstrate a high concordance rate between FFPE and frozen DNA in an individual clone-level analysis of sensitivity and specificity, assuring that under certain array conditions, frozen and FFPE DNA can perform nearly identically. However, because oligonucleotide arrays offer advantages to BAC arrays in genomic coverage and practical availability, we next developed a method of labeling DNA from FFPE tissue that allows efficient hybridization to oligonucleotide arrays. To demonstrate utility in FFPE tissues, we applied this approach to biphasic anaplastic oligoastrocytomas and demonstrate CNA differences between DNA obtained from the two components. Therefore, BAC and oligonucleotide aCGH can be sensitive and specific tools for detecting CNAs in FFPE DNA, and novel labeling techniques enable the routine use of oligonucleotide arrays for FFPE DNA. In combination, these advances should facilitate genome-wide analysis of rare, small and/or histologically heterogeneous gliomas from FFPE tissues.
Mammalian Synthetic Biology: Engineering Biological Systems.

PubMed

Black, Joshua B; Perez-Pinera, Pablo; Gersbach, Charles A

2017-06-21

The programming of new functions into mammalian cells has tremendous application in research and medicine. Continued improvements in the capacity to sequence and synthesize DNA have rapidly increased our understanding of mechanisms of gene function and regulation on a genome-wide scale and have expanded the set of genetic components available for programming cell biology. The invention of new research tools, including targetable DNA-binding systems such as CRISPR/Cas9 and sensor-actuator devices that can recognize and respond to diverse chemical, mechanical, and optical inputs, has enabled precise control of complex cellular behaviors at unprecedented spatial and temporal resolution. These tools have been critical for the expansion of synthetic biology techniques from prokaryotic and lower eukaryotic hosts to mammalian systems. Recent progress in the development of genome and epigenome editing tools and in the engineering of designer cells with programmable genetic circuits is expanding approaches to prevent, diagnose, and treat disease and to establish personalized theranostic strategies for next-generation medicines. This review summarizes the development of these enabling technologies and their application to transforming mammalian synthetic biology into a distinct field in research and medicine.
The histone shuffle: histone chaperones in an energetic dance

PubMed Central

Das, Chandrima; Tyler, Jessica K.; Churchill, Mair E.A.

2014-01-01

Our genetic information is tightly packaged into a rather ingenious nucleoprotein complex called chromatin in a manner that enables it to be rapidly accessed during genomic processes. Formation of the nucleosome, which is the fundamental unit of chromatin, occurs via a stepwise process that is reversed to enable the disassembly of nucleosomes. Histone chaperone proteins have prominent roles in facilitating these processes as well as in replacing old histones with new canonical histones or histone variants during the process of histone exchange. Recent structural, biophysical and biochemical studies have begun to shed light on the molecular mechanisms whereby histone chaperones promote chromatin assembly, disassembly and histone exchange to facilitate DNA replication, repair and transcription. PMID:20444609
TEAM: efficient two-locus epistasis tests in human genome-wide association study.

PubMed

Zhang, Xiang; Huang, Shunping; Zou, Fei; Wang, Wei

2010-06-15

As a promising tool for identifying genetic markers underlying phenotypic differences, the genome-wide association study (GWAS) has been extensively investigated in recent years. In GWAS, detecting epistasis (or gene-gene interaction) is preferable over single-locus study since many diseases are known to be complex traits. A brute force search is infeasible for epistasis detection at the genome-wide scale because of the intensive computational burden. Existing epistasis detection algorithms are designed for datasets consisting of homozygous markers and small sample sizes. In human studies, however, the genotype may be heterozygous, and the number of individuals can be up to thousands. Thus, existing methods are not readily applicable to human datasets. In this article, we propose an efficient algorithm, TEAM, which significantly speeds up epistasis detection for human GWAS. Our algorithm is exhaustive, i.e. it does not ignore any epistatic interaction. Utilizing the minimum spanning tree structure, the algorithm incrementally updates the contingency tables for epistatic tests without scanning all individuals. Our algorithm has broader applicability and is more efficient than existing methods for large-sample studies. It supports any statistical test that is based on contingency tables, and enables both family-wise error rate and false discovery rate controlling. Extensive experiments show that our algorithm only needs to examine a small portion of the individuals to update the contingency tables, and it achieves at least an order of magnitude speed up over the brute force approach.
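The contingency-table tests that the abstract refers to can be illustrated with the brute-force baseline that TEAM is said to improve on: for one SNP pair, every individual is scanned to fill a 9 x 2 table (joint genotype by case/control status), and a Pearson chi-square statistic is computed from it. This sketch does not implement TEAM's minimum-spanning-tree incremental update. The genotype coding (0, 1, 2), the binary phenotype and the function names are assumptions made for illustration.

```python
def two_locus_table(snp_a, snp_b, phenotype):
    """Build a 9 x 2 table: joint genotype (3 * a + b) by phenotype (0/1).

    Genotypes are assumed to be coded 0, 1, 2 (minor-allele counts).
    """
    table = [[0, 0] for _ in range(9)]
    for a, b, y in zip(snp_a, snp_b, phenotype):
        table[3 * a + b][y] += 1
    return table


def chi_square(table):
    """Pearson chi-square statistic; cells with zero expectation are skipped."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        return 0.0
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat
```

Repeating this full scan for every pair among M SNPs costs on the order of M^2 passes over all individuals, which is the burden that motivates sharing work between tables as the abstract describes.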
SQC: secure quality control for meta-analysis of genome-wide association studies.

PubMed

Huang, Zhicong; Lin, Huang; Fellay, Jacques; Kutalik, Zoltán; Hubaux, Jean-Pierre

2017-08-01

Due to the limited power of small-scale genome-wide association studies (GWAS), researchers tend to collaborate and establish a larger consortium in order to perform large-scale GWAS. Genome-wide association meta-analysis (GWAMA) is a statistical tool that aims to synthesize results from multiple independent studies to increase the statistical power and reduce false-positive findings of GWAS. However, it has been demonstrated that the aggregate data of individual studies are subject to inference attacks, hence privacy concerns arise when researchers share study data in GWAMA. In this article, we propose a secure quality control (SQC) protocol, which enables checking the quality of data in a privacy-preserving way without revealing sensitive information to a potential adversary. SQC employs state-of-the-art cryptographic and statistical techniques for privacy protection. We implement the solution in a meta-analysis pipeline with real data to demonstrate the efficiency and scalability on commodity machines. The distributed execution of SQC on a cluster of 128 cores for one million genetic variants takes less than one hour, which is a modest cost considering the 10-month time span usually observed for the completion of the QC procedure that includes timing of logistics. SQC is implemented in Java and is publicly available at https://github.com/acs6610987/secureqc. Contact: jean-pierre.hubaux@epfl.ch. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
The impact of computer science in molecular medicine: enabling high-throughput research.

PubMed

de la Iglesia, Diana; García-Remesal, Miguel; de la Calle, Guillermo; Kulikowski, Casimir; Sanz, Ferran; Maojo, Víctor

2013-01-01

The Human Genome Project and the explosion of high-throughput data have transformed the areas of molecular and personalized medicine, which are producing a wide range of studies and experimental results and providing new insights for developing medical applications. Research in many interdisciplinary fields is resulting in data repositories and computational tools that support a wide diversity of tasks: genome sequencing, genome-wide association studies, analysis of genotype-phenotype interactions, drug toxicity and side effects assessment, prediction of protein interactions and diseases, development of computational models, biomarker discovery, and many others. The authors of the present paper have developed several inventories covering tools, initiatives and studies in different computational fields related to molecular medicine: medical informatics, bioinformatics, clinical informatics and nanoinformatics. With these inventories, created by mining the scientific literature, we have carried out several reviews of these fields, providing researchers with a useful framework to locate, discover, search and integrate resources. In this paper we present an analysis of the state-of-the-art as it relates to computational resources for molecular medicine, based on results compiled in our inventories, as well as results extracted from a systematic review of the literature and other scientific media. The present review is based on the impact of their related publications and the available data and software resources for molecular medicine. It aims to provide information that can be useful to support ongoing research and work to improve diagnostics and therapeutics based on molecular-level insights.
Genome-wide association study identifies Loci and candidate genes for body composition and meat quality traits in Beijing-You chickens.

PubMed

Liu, Ranran; Sun, Yanfa; Zhao, Guiping; Wang, Fangjie; Wu, Dan; Zheng, Maiqing; Chen, Jilan; Zhang, Lei; Hu, Yaodong; Wen, Jie

2013-01-01

Body composition and meat quality traits are important economic traits of chickens. The development of high-throughput genotyping platforms and relevant statistical methods has enabled genome-wide association studies in chickens. In order to identify molecular markers and candidate genes associated with body composition and meat quality traits, genome-wide association studies were conducted using the Illumina 60 K SNP Beadchip to genotype 724 Beijing-You chickens. For each bird, a total of 16 traits were measured, including carcass weight (CW), eviscerated weight (EW), dressing percentage, breast muscle weight (BrW) and percentage (BrP), thigh muscle weight and percentage, abdominal fat weight and percentage, dry matter and intramuscular fat contents of breast and thigh muscle, ultimate pH, and shear force of the pectoralis major muscle at 100 d of age. The SNPs that were significantly associated with the phenotypic traits were identified using both simple (GLM) and compressed mixed linear (MLM) models. For nine of ten body composition traits studied, SNPs showing genome-wide significance (P<2.59E-6) have been identified. A consistent region on chicken (Gallus gallus) chromosome 4 (GGA4), including seven significant SNPs and four candidate genes (LCORL, LAP3, LDB2, TAPT1), was found to be associated with CW and EW. Another 0.65 Mb region on GGA3 for BrW and BrP was identified. After measuring the mRNA content in breast muscle for five genes located in this region, the changes in GJA1 expression were found to be consistent with those of breast muscle weight across development. It is highly possible that GJA1 is a functional gene for breast muscle development in chickens. For meat quality traits, several SNPs reaching suggestive association were identified and possible candidate genes with their functions were discussed.
Genome-wide analysis of ABA-responsive elements ABRE and CE3 reveals divergent patterns in Arabidopsis and rice

PubMed Central

Gómez-Porras, Judith L; Riaño-Pachón, Diego Mauricio; Dreyer, Ingo; Mayer, Jorge E; Mueller-Roeber, Bernd

2007-01-01

Background: In plants, complex regulatory mechanisms are at the core of physiological and developmental processes. The phytohormone abscisic acid (ABA) is involved in the regulation of various such processes, including stomatal closure, seed and bud dormancy, and physiological responses to cold, drought and salinity stress. The underlying tissue or plant-wide control circuits often include combinatorial gene regulatory mechanisms and networks that we are only beginning to unravel with the help of new molecular tools. The increasing availability of genomic sequences and gene expression data enables us to dissect ABA regulatory mechanisms at the individual gene expression level. In this paper we used an in-silico-based approach directed towards genome-wide prediction and identification of specific features of ABA-responsive elements. In particular we analysed the genome-wide occurrence and positional arrangements of two well-described ABA-responsive cis-regulatory elements (CREs), ABRE and CE3, in thale cress (Arabidopsis thaliana) and rice (Oryza sativa). Results: Our results show that Arabidopsis and rice use the ABA-responsive elements ABRE and CE3 distinctively. Earlier reports for various monocots have identified CE3 as a coupling element (CE) associated with ABRE. Surprisingly, we found that while ABRE is equally abundant in both species, CE3 is practically absent in Arabidopsis. ABRE-ABRE pairs are common in both genomes, suggesting that these can form functional ABA-responsive complexes (ABRCs) in Arabidopsis and rice. Furthermore, we detected distinct combinations, orientation patterns and DNA strand preferences of ABRE and CE3 motifs in rice gene promoters. Conclusion: Our computational analyses revealed distinct recruitment patterns of ABA-responsive CREs in upstream sequences of Arabidopsis and rice.
The apparent absence of CE3s in Arabidopsis suggests that another CE pairs with ABRE to establish a functional ABRC capable of interacting with transcription factors. Further studies will be needed to test whether the observed differences are extrapolatable to monocots and dicots in general, and to understand how they contribute to the fine-tuning of the hormonal response. The outcome of our investigation can now be used to direct future experimentation designed to further dissect the ABA-dependent regulatory networks. PMID:17672917
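The kind of motif survey described in this abstract (occurrence, position and strand of a cis-regulatory element in promoter sequences) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it does exact string matching only, the promoter string is made up, and the motif `ACGTG` is used as a stand-in for the ABRE core, which is an assumption not taken from the abstract.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_motif(promoter, motif):
    """Return (position, strand) for every exact motif hit on either strand.

    Positions are 0-based offsets on the forward strand. A hit on the
    reverse strand is reported where the motif's reverse complement
    appears in the forward sequence.
    """
    promoter, motif = promoter.upper(), motif.upper()
    rc = revcomp(motif)
    width = len(motif)
    hits = []
    for i in range(len(promoter) - width + 1):
        window = promoter[i:i + width]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:  # skip double-reporting palindromes
            hits.append((i, "-"))
    return hits

# Hypothetical promoter fragment and stand-in motif (both assumptions).
promoter = "TTACGTGAACACGTAA"
hits = find_motif(promoter, "ACGTG")
```

Counting such hits per promoter, and recording the spacing between hits of two different motifs, would give the occurrence and arrangement statistics the abstract refers to.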
Genome-wide analysis of ABA-responsive elements ABRE and CE3 reveals divergent patterns in Arabidopsis and rice.

PubMed

Gómez-Porras, Judith L; Riaño-Pachón, Diego Mauricio; Dreyer, Ingo; Mayer, Jorge E; Mueller-Roeber, Bernd

2007-08-01

In plants, complex regulatory mechanisms are at the core of physiological and developmental processes. The phytohormone abscisic acid (ABA) is involved in the regulation of various such processes, including stomatal closure, seed and bud dormancy, and physiological responses to cold, drought and salinity stress. The underlying tissue or plant-wide control circuits often include combinatorial gene regulatory mechanisms and networks that we are only beginning to unravel with the help of new molecular tools. The increasing availability of genomic sequences and gene expression data enables us to dissect ABA regulatory mechanisms at the individual gene expression level. In this paper we used an in-silico-based approach directed towards genome-wide prediction and identification of specific features of ABA-responsive elements. In particular we analysed the genome-wide occurrence and positional arrangements of two well-described ABA-responsive cis-regulatory elements (CREs), ABRE and CE3, in thale cress (Arabidopsis thaliana) and rice (Oryza sativa). Our results show that Arabidopsis and rice use the ABA-responsive elements ABRE and CE3 distinctively. Earlier reports for various monocots have identified CE3 as a coupling element (CE) associated with ABRE. Surprisingly, we found that while ABRE is equally abundant in both species, CE3 is practically absent in Arabidopsis. ABRE-ABRE pairs are common in both genomes, suggesting that these can form functional ABA-responsive complexes (ABRCs) in Arabidopsis and rice. Furthermore, we detected distinct combinations, orientation patterns and DNA strand preferences of ABRE and CE3 motifs in rice gene promoters. Our computational analyses revealed distinct recruitment patterns of ABA-responsive CREs in upstream sequences of Arabidopsis and rice. The apparent absence of CE3s in Arabidopsis suggests that another CE pairs with ABRE to establish a functional ABRC capable of interacting with transcription factors. 
Further studies will be needed to test whether the observed differences are extrapolatable to monocots and dicots in general, and to understand how they contribute to the fine-tuning of the hormonal response. The outcome of our investigation can now be used to direct future experimentation designed to further dissect the ABA-dependent regulatory networks.
"FACILS 2014: Microbially-driven facilitation systems in environmental biotechnology" (hereafter "FACILS") presented here by the European Commission (EC)-United States (US) Task Force on Biotechnology Research

DOE Office of Scientific and Technical Information (OSTI.GOV)

Methe, Barbara

As we enter the 21st century, the sustainability of the biosphere is a global challenge that can best be met with a global response. This includes how we train and promote our next generation of research scientists in the emerging arenas of genome-enabled biology and a bio-based economy. It is this fundamental issue that formed the motivation for designing and conducting a shortcourse entitled “FACILIS 2014: Microbially-driven facilitation systems in environmental biotechnology” (hereafter “FACILIS”) presented here by the European Commission (EC)-United States (US) Task Force on Biotechnology Research. This Working Group (WG) was established in 1994 under the umbrella of the US-EC Task Force on Biotechnology Research, a transatlantic collaborative group overseen by the US Office of Science and Technology Policy (OSTP) and the EC. The Environmental Biotechnology Working Group maintains several goals, including establishing research links between scientists in EU countries and the US and fostering the careers of junior scientists from both sides of the Atlantic to the global nature of scientific cooperation. To that end, a shortcourse was held at the University of Milan in Italy on July 12-25, 2014, organized around cross-cutting themes of genomic science and designed to attract a stellar group of interdisciplinary early career researchers. A total of 22 students, 10 from the US and 12 from the EU, participated. The course provided them with hands-on experience with the latest scientific methods in genomics and bioinformatics, using a format that combines lectures, laboratory research and field work with the final goal to enable researchers to finally turn data into knowledge.
The Applied Development of a Tiered Multilocus Sequence Typing (MLST) Scheme for Dichelobacter nodosus.

PubMed

Blanchard, Adam M; Jolley, Keith A; Maiden, Martin C J; Coffey, Tracey J; Maboni, Grazieli; Staley, Ceri E; Bollard, Nicola J; Warry, Andrew; Emes, Richard D; Davies, Peers L; Tötemeyer, Sabine

2018-01-01

Dichelobacter nodosus (D. nodosus) is the causative pathogen of ovine footrot, a disease that has a significant welfare and financial impact on the global sheep industry. Previous studies into the phylogenetics of D. nodosus have focused on Australia and Scandinavia, meaning the current diversity in the United Kingdom (U.K.) population and its relationship globally is poorly understood. Numerous epidemiological methods are available for bacterial typing; however, few account for whole genome diversity or provide the opportunity for future application of new computational techniques. Multilocus sequence typing (MLST) measures nucleotide variations within several loci with slow accumulation of variation to enable the designation of allele numbers to determine a sequence type. The usage of whole genome sequence data enables the application of MLST, but also core and whole genome MLST for higher levels of strain discrimination with a negligible increase in experimental cost. An MLST database was developed alongside a seven loci scheme using publicly available whole genome data from the sequence read archive. Sequence type designation and strain discrimination were compared to previously published data to ensure reproducibility. Multiple D. nodosus isolates from U.K. farms were directly compared to populations from other countries. The U.K. isolates define new clades within the global population of D. nodosus and predominantly consist of serogroups A, B and H; however, serogroups C, D, E, and I were also found. The scheme is publicly available at https://pubmlst.org/dnodosus/.
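The step from locus sequences to allele numbers to a sequence type, as described in the abstract above, can be sketched in a few lines. This is an illustration only: the locus names, sequences and numbering order are hypothetical, and a curated database such as the one the abstract refers to assigns its own allele and sequence type identifiers rather than numbering them by order of appearance.

```python
def assign_sequence_types(isolates, loci):
    """Assign allele numbers per locus and a sequence type per allele profile.

    isolates: dict mapping isolate name -> dict mapping locus -> sequence.
    loci:     ordered list of locus names that make up the scheme.
    Numbers are handed out in order of first appearance, starting at 1.
    Returns a dict mapping isolate name -> (allele profile, sequence type).
    """
    allele_ids = {locus: {} for locus in loci}  # locus -> {sequence: number}
    st_ids = {}                                 # profile tuple -> ST number
    result = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in loci:
            known = allele_ids[locus]
            # A sequence seen before keeps its number; a new one gets the next.
            profile.append(known.setdefault(seqs[locus], len(known) + 1))
        profile = tuple(profile)
        result[name] = (profile, st_ids.setdefault(profile, len(st_ids) + 1))
    return result

# Hypothetical two-locus scheme with three isolates.
loci = ["locusA", "locusB"]
isolates = {
    "iso1": {"locusA": "ACGT", "locusB": "GGCC"},
    "iso2": {"locusA": "ACGT", "locusB": "GGCA"},
    "iso3": {"locusA": "ACGT", "locusB": "GGCC"},
}
typed = assign_sequence_types(isolates, loci)
```

Here iso1 and iso3 share every allele and therefore the same sequence type, while iso2 differs at one locus and receives a new one. A seven-locus scheme works the same way with a longer profile.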
Privacy-preserving GWAS analysis on federated genomic datasets.

PubMed

Constable, Scott D; Tang, Yuzhe; Wang, Shuang; Jiang, Xiaoqian; Chapin, Steve

2015-01-01

The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, high-quality GWAS usually requires a large number of samples, which can grow beyond the capability of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution collaboration for effective GWAS, but it raises concerns about patient privacy and medical information confidentiality (as data are being exchanged across institutional boundaries), which becomes an inhibiting factor for practical use. We present a privacy-preserving GWAS framework on federated genomic datasets. Our method is to layer the GWAS computations on top of secure multi-party computation (MPC) systems. This approach allows two parties in a distributed system to mutually perform secure GWAS computations without exposing their private data outside. We demonstrate our technique by implementing a framework for minor allele frequency counting and χ² statistics calculation, two of the typical computations used in GWAS. For efficient prototyping, we use a state-of-the-art MPC framework, i.e., Portable Circuit Format (PCF). Our experimental results show promise in realizing both efficient and secure cross-institution GWAS computations.
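For orientation, the two computations named in the abstract above, minor allele frequency counting and a χ² statistic, look like this when written in the clear. This sketch contains no secure multi-party computation and does not use PCF; it only shows the arithmetic that such a framework would evaluate on protected inputs. The 0/1/2 genotype coding, the allelic 2×2 form of the test and all names are assumptions for illustration.

```python
def allele_counts(genotypes):
    """Return (tracked-allele count, other-allele count) for one group.

    Each genotype is the number of copies (0, 1 or 2) of the tracked allele
    carried by one diploid individual.
    """
    tracked = sum(genotypes)
    return tracked, 2 * len(genotypes) - tracked

def minor_allele_frequency(genotypes):
    """Frequency of the rarer of the two alleles (needs a non-empty group)."""
    tracked, other = allele_counts(genotypes)
    return min(tracked, other) / (tracked + other)

def allelic_chi_square(cases, controls):
    """Pearson chi-square statistic for the 2x2 allele-by-status table."""
    a, b = allele_counts(cases)     # case alleles: tracked, other
    c, d = allele_counts(controls)  # control alleles: tracked, other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                  # a row or column is empty: no signal
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical genotypes held by two sites (cases at one, controls at the other).
cases = [2, 1, 1, 0]
controls = [0, 0, 1, 0]
maf = minor_allele_frequency(cases + controls)
chi2 = allelic_chi_square(cases, controls)
```

In a federated setting each site would contribute only its allele counts, and the sums and the final statistic would be computed under the secure protocol rather than by pooling genotypes as done here.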

5C-ID: Increased resolution Chromosome-Conformation-Capture-Carbon-Copy with in situ 3C and double alternating primer design.

PubMed

Kim, Ji Hun; Titus, Katelyn R; Gong, Wanfeng; Beagan, Jonathan A; Cao, Zhendong; Phillips-Cremins, Jennifer E

2018-05-14

Mammalian genomes are folded in a hierarchy of compartments, topologically associating domains (TADs), subTADs, and looping interactions. Currently, there is a great need to evaluate the link between chromatin topology and genome function across many biological conditions and genetic perturbations. Hi-C can generate genome-wide maps of looping interactions but is intractable for high-throughput comparison of loops across multiple conditions due to the enormous number of reads (>6 Billion) required per library. Here, we describe 5C-ID, a new version of Chromosome-Conformation-Capture-Carbon-Copy (5C) with restriction digest and ligation performed in the nucleus (in situ Chromosome-Conformation-Capture (3C)) and ligation-mediated amplification performed with a double alternating primer design. We demonstrate that 5C-ID produces higher-resolution 3D genome folding maps with reduced spatial noise using markedly lower cell numbers than canonical 5C. 5C-ID enables the creation of high-resolution, high-coverage maps of chromatin loops in up to a 30 Megabase subset of the genome at a fraction of the cost of Hi-C. Copyright © 2018 Elsevier Inc. All rights reserved.
Visualization of RNA structure models within the Integrative Genomics Viewer.

PubMed

Busan, Steven; Weeks, Kevin M

2017-07-01

Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
The Neisseria meningitidis CRISPR-Cas9 System Enables Specific Genome Editing in Mammalian Cells.

PubMed

Lee, Ciaran M; Cradick, Thomas J; Bao, Gang

2016-03-01

The clustered regularly-interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) system from Streptococcus pyogenes (Spy) has been successfully adapted for RNA-guided genome editing in a wide range of organisms. However, numerous reports have indicated that Spy CRISPR-Cas9 systems may have significant off-target cleavage of genomic DNA sequences differing from the intended on-target site. Here, we report the performance of the Neisseria meningitidis (Nme) CRISPR-Cas9 system that requires a longer protospacer-adjacent motif for site-specific cleavage, and present a comparison between the Spy and Nme CRISPR-Cas9 systems targeting the same protospacer sequence. The results with the native crRNA and tracrRNA as well as a chimeric single guide RNA for the Nme CRISPR-Cas9 system were also compared. Our results suggest that, compared with the Spy system, the Nme CRISPR-Cas9 system has similar or lower on-target cleavage activity but a reduced overall off-target effect on a genomic level when sites containing three or fewer mismatches are considered. Thus, the Nme CRISPR-Cas9 system may represent a safer alternative for precision genome engineering applications.
Self-guided management of exome and whole-genome sequencing results: changing the results return model.

PubMed

Yu, Joon-Ho; Jamal, Seema M; Tabor, Holly K; Bamshad, Michael J

2013-09-01

Researchers and clinicians face the practical and ethical challenge of whether and how to offer for return the wide and varied scope of results available from individual exome sequencing and whole-genome sequencing. We argue that, rather than viewing individual exome sequencing and whole-genome sequencing as a test for which results need to be "returned," the technology should instead be framed as a dynamic resource of information from which results should be "managed" over the lifetime of an individual. We further suggest that individual exome sequencing and whole-genome sequencing results management is optimized using a self-guided approach that enables individuals to self-select among results offered for return in a convenient, confidential, personalized context that is responsive to their value system. This approach respects autonomy, allows individuals to maximize potential benefits of genomic information (beneficence) and minimize potential harms (nonmaleficence), and also preserves their right to an open future to the extent they desire or think is appropriate. We describe key challenges and advantages of such a self-guided management system and offer guidance on implementation using an information systems approach.
The Neisseria meningitidis CRISPR-Cas9 System Enables Specific Genome Editing in Mammalian Cells

PubMed Central

Lee, Ciaran M; Cradick, Thomas J; Bao, Gang

2016-01-01

The clustered regularly-interspaced short palindromic repeats (CRISPR)—CRISPR-associated (Cas) system from Streptococcus pyogenes (Spy) has been successfully adapted for RNA-guided genome editing in a wide range of organisms. However, numerous reports have indicated that Spy CRISPR-Cas9 systems may have significant off-target cleavage of genomic DNA sequences differing from the intended on-target site. Here, we report the performance of the Neisseria meningitidis (Nme) CRISPR-Cas9 system that requires a longer protospacer-adjacent motif for site-specific cleavage, and present a comparison between the Spy and Nme CRISPR-Cas9 systems targeting the same protospacer sequence. The results with the native crRNA and tracrRNA as well as a chimeric single guide RNA for the Nme CRISPR-Cas9 system were also compared. Our results suggest that, compared with the Spy system, the Nme CRISPR-Cas9 system has similar or lower on-target cleavage activity but a reduced overall off-target effect on a genomic level when sites containing three or fewer mismatches are considered. Thus, the Nme CRISPR-Cas9 system may represent a safer alternative for precision genome engineering applications. PMID:26782639
Path from schizophrenia genomics to biology: gene regulation and perturbation in neurons derived from induced pluripotent stem cells and genome editing.

PubMed

Duan, Jubao

2015-02-01

Schizophrenia (SZ) is a devastating mental disorder afflicting 1% of the population. Recent genome-wide association studies (GWASs) of SZ have identified >100 risk loci. However, the causal variants/genes and the causal mechanisms remain largely unknown, which hinders the translation of GWAS findings into disease biology and drug targets. Most risk variants are noncoding and thus likely regulate gene expression. A major mechanism of transcriptional regulation is chromatin remodeling, and open chromatin is a versatile predictor of regulatory sequences. MicroRNA-mediated post-transcriptional regulation plays an important role in SZ pathogenesis. Neurons differentiated from patient-specific induced pluripotent stem cells (iPSCs) provide an experimental model to characterize the genetic perturbation of regulatory variants that are often specific to cell type and/or developmental stage. The emerging genome-editing technology enables the creation of isogenic iPSCs and neurons to efficiently characterize the effects of SZ-associated regulatory variants on SZ-relevant molecular and cellular phenotypes involving dopaminergic, glutamatergic, and GABAergic neurotransmissions. SZ GWAS findings equipped with the emerging functional genomics approaches provide an unprecedented opportunity for understanding new disease biology and identifying novel drug targets.
Genotyping-by-sequencing enables linkage mapping in three octoploid cultivated strawberry families

PubMed Central

Salinas, Natalia; Tennessen, Jacob A.; Zurn, Jason D.; Sargent, Daniel James; Hancock, James; Bassil, Nahla V.

2017-01-01

Genotyping-by-sequencing (GBS) was used to survey genome-wide single-nucleotide polymorphisms (SNPs) in three biparental strawberry (Fragaria × ananassa) populations with the goal of evaluating this technique in a species with a complex octoploid genome. GBS sequence data were aligned to the F. vesca ‘Fvb’ reference genome in order to call SNPs. Numbers of polymorphic SNPs per population ranged from 1,163 to 3,190. Linkage maps consisting of 30–65 linkage groups were produced from the SNP sets derived from each parent. The linkage groups covered 99% of the Fvb reference genome, with three to seven linkage groups from a given parent aligned to any particular chromosome. A phylogenetic analysis performed using the POLiMAPS pipeline revealed linkage groups that were most similar to ancestral species F. vesca for each chromosome. Linkage groups that were most similar to a second ancestral species, F. iinumae, were only resolved for Fvb 4. The quantity of missing data and heterogeneity in genome coverage inherent in GBS complicated the analysis, but POLiMAPS resolved F. × ananassa chromosomal regions derived from diploid ancestor F. vesca. PMID:28875078
Ergatis: a web interface and scalable software system for bioinformatics workflows

PubMed Central

Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

2010-01-01

Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634
Demographic Divergence History of Pied Flycatcher and Collared Flycatcher Inferred from Whole-Genome Re-sequencing Data

PubMed Central

Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I.; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

2013-01-01

Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. 
Our study emphasizes the importance of using genome-wide data to unravel tangled demographic histories. Moreover, it constitutes one of the first examples of the inference of divergence history from genome-wide data in non-model species. PMID:24244198
Demographic divergence history of pied flycatcher and collared flycatcher inferred from whole-genome re-sequencing data.

PubMed

Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

2013-11-01

Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000-80,000) and census sizes (5-50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. 
Our study emphasizes the importance of using genome-wide data to unravel tangled demographic histories. Moreover, it constitutes one of the first examples of the inference of divergence history from genome-wide data in non-model species.
An Efficient Strategy Combining SSR Markers- and Advanced QTL-seq-driven QTL Mapping Unravels Candidate Genes Regulating Grain Weight in Rice

PubMed Central

Daware, Anurag; Das, Sweta; Srivastava, Rishi; Badoni, Saurabh; Singh, Ashok K.; Agarwal, Pinky; Parida, Swarup K.; Tyagi, Akhilesh K.

2016-01-01

Development and use of genome-wide informative simple sequence repeat (SSR) markers and novel integrated genomic strategies are vital to drive genomics-assisted breeding applications and for efficient dissection of quantitative trait loci (QTLs) underlying complex traits in rice. The present study developed 6244 genome-wide informative SSR markers exhibiting in silico fragment length polymorphism based on repeat-unit variations among genomic sequences of 11 indica, japonica, aus, and wild rice accessions. These markers were mapped on diverse coding and non-coding sequence components of known cloned/candidate genes annotated from 12 chromosomes and revealed a much higher amplification (97%) and polymorphic potential (88%) along with wider genetic/functional diversity level (16–74% with a mean 53%) especially among accessions belonging to indica cultivar group, suggesting their utility in large-scale genomics-assisted breeding applications in rice. A high-density 3791 SSR markers-anchored genetic linkage map (IR 64 × Sonasal) spanning 2060 cM total map-length with an average inter-marker distance of 0.54 cM was generated. This reference genetic map identified six major genomic regions harboring robust QTLs (31% combined phenotypic variation explained with a 5.7–8.7 LOD) governing grain weight on six rice chromosomes. One strong grain weight major QTL region (OsqGW5.1) was narrowed-down by integrating traditional QTL mapping with high-resolution QTL region-specific integrated SSR and single nucleotide polymorphism markers-based QTL-seq analysis and differential expression profiling. This led us to delineate two natural allelic variants in two known cis-regulatory elements (RAV1AAT and CARGCW8GAT) of glycosyl hydrolase and serine carboxypeptidase genes exhibiting pronounced seed-specific differential regulation in low (Sonasal) and high (IR 64) grain weight mapping parental accessions. 
Our genome-wide SSR marker resource (polymorphic within/between diverse cultivar groups) and integrated genomic strategy can efficiently scan functionally relevant potential molecular tags (markers, candidate genes and alleles) regulating complex agronomic traits (grain weight) and expedite marker-assisted genetic enhancement in rice. PMID:27833617
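The map density quoted in this record can be checked with simple arithmetic. The sketch below is only a sanity check on the published figures (2060 cM, 3791 markers); it assumes the average is taken as total length divided by marker count, which the abstract does not state explicitly, and the variable names are illustrative.

```python
# Figures quoted in the abstract above.
total_map_length_cm = 2060   # total map length in centimorgans (cM)
mapped_markers = 3791        # SSR markers anchored on the linkage map

# Assumption: the map is treated as one continuous stretch, so the
# average spacing is length / marker count. A per-chromosome calculation
# (markers - 1 intervals per linkage group) could differ slightly.
avg_spacing_cm = total_map_length_cm / mapped_markers

print(f"{avg_spacing_cm:.2f} cM per marker")
```

Under that assumption the ratio agrees with the 0.54 cM average inter-marker distance reported in the abstract.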
Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

PubMed Central

Seaver, Samuel M. D.; Bradbury, Louis M. T.; Frelin, Océane; Zarecki, Raphy; Ruppin, Eytan; Hanson, Andrew D.; Henry, Christopher S.

2015-01-01

There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possible generation of unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low expression genes to enable activity of a maximum number of reactions associated with high expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue-specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for the analysis of transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes. PMID:25806041
Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

DOE PAGES

Seaver, Samuel M.D.; Bradbury, Louis M.T.; Frelin, Océane; ...

2015-03-10

There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possible generation of unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low expression genes to enable activity of a maximum number of reactions associated with high expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue-specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for the analysis of transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes.
eXframe: reusable framework for storage, analysis and visualization of genomics experiments

PubMed Central

2011-01-01

Background: Genome-wide experiments are routinely conducted to measure gene expression, DNA-protein interactions and epigenetic status. Structured metadata for these experiments is imperative for a complete understanding of experimental conditions, to enable consistent data processing and to allow retrieval, comparison, and integration of experimental results. Even though several repositories have been developed for genomics data, only a few provide annotation of samples and assays using controlled vocabularies. Moreover, many of them are tailored for a single type of technology or measurement and do not support the integration of multiple data types. Results: We have developed eXframe, a reusable web-based framework for genomics experiments that provides 1) the ability to publish structured data compliant with accepted standards; 2) support for multiple data types including microarrays and next generation sequencing; and 3) query, analysis and visualization integration tools (enabled by consistent processing of the raw data and annotation of samples). It is available as open-source software. We present two case studies where this software is currently being used to build repositories of genomics experiments: one contains data from hematopoietic stem cells and another from Parkinson's disease patients. Conclusion: The web-based framework eXframe offers structured annotation of experiments as well as uniform processing and storage of molecular data from microarray and next generation sequencing platforms. The framework allows users to query and integrate information across species, technologies, measurement types and experimental conditions. Our framework is reusable and freely modifiable: other groups or institutions can deploy their own custom web-based repositories based on this software. It is interoperable with the most important data formats in this domain. We hope that other groups will not only use eXframe, but also contribute their own useful modifications. PMID:22103807
GIANT 2.0: genome-scale integrated analysis of gene networks in tissues.

PubMed

Wong, Aaron K; Krishnan, Arjun; Troyanskaya, Olga G

2018-05-25

GIANT2 (Genome-wide Integrated Analysis of gene Networks in Tissues) is an interactive web server that enables biomedical researchers to analyze their proteins and pathways of interest and generate hypotheses in the context of genome-scale functional maps of human tissues. The precise actions of genes are frequently dependent on their tissue context, yet direct assay of tissue-specific protein function and interactions remains infeasible in many normal human tissues and cell-types. With GIANT2, researchers can explore predicted tissue-specific functional roles of genes and reveal changes in those roles across tissues, all through interactive multi-network visualizations and analyses. Additionally, the NetWAS approach available through the server uses tissue-specific/cell-type networks predicted by GIANT2 to re-prioritize statistical associations from GWAS studies and identify disease-associated genes. GIANT2 predicts tissue-specific interactions by integrating diverse functional genomics data from now over 61 400 experiments for 283 diverse tissues and cell-types. GIANT2 does not require any registration or installation and is freely available for use at http://giant-v2.princeton.edu.
An integrated semiconductor device enabling non-optical genome sequencing.

PubMed

Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

2011-07-20

The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.
Cold adaptive traits revealed by comparative genomic analysis of the eurypsychrophile Rhodococcus sp. JG3 isolated from high elevation McMurdo Dry Valley permafrost, Antarctica.

PubMed

Goordial, Jacqueline; Raymond-Bouchard, Isabelle; Zolotarov, Yevgen; de Bethencourt, Luis; Ronholm, Jennifer; Shapiro, Nicole; Woyke, Tanja; Stromvik, Martina; Greer, Charles W; Bakermans, Corien; Whyte, Lyle

2016-02-01

The permafrost soils of the high elevation McMurdo Dry Valleys are the most cold, desiccating and oligotrophic on Earth. Rhodococcus sp. JG3 is one of very few bacterial isolates from Antarctic Dry Valley permafrost, and displays subzero growth down to -5°C. To understand how Rhodococcus sp. JG3 is able to survive extreme permafrost conditions and be metabolically active at subzero temperatures, we sequenced its genome and compared it to the genomes of 14 mesophilic rhodococci. Rhodococcus sp. JG3 possessed a higher copy number of genes for general stress response, UV protection and protection from cold shock, osmotic stress and oxidative stress. We characterized genome-wide molecular adaptations to cold, and identified genes that had amino acid compositions favourable for increased flexibility and functionality at low temperatures. Rhodococcus sp. JG3 possesses multiple complementary strategies which may enable its survival in some of the harshest permafrost on Earth. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Functional Genomics Using the Saccharomyces cerevisiae Yeast Deletion Collections.

PubMed

Nislow, Corey; Wong, Lai Hong; Lee, Amy Huei-Yi; Giaever, Guri

2016-09-01

Constructed by a consortium of 16 laboratories, the Saccharomyces genome-wide deletion collections have, for the past decade, provided a powerful, rapid, and inexpensive approach for functional profiling of the yeast genome. Loss-of-function deletion mutants were systematically created using a polymerase chain reaction (PCR)-based gene deletion strategy to generate a start-to-stop codon replacement of each open reading frame by homologous recombination. Each strain carries two molecular barcodes that serve as unique strain identifiers, enabling their growth to be analyzed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays or through the use of next-generation sequencing technologies. Functional profiling of the deletion collections, using either strain-by-strain or parallel assays, provides an unbiased approach to systematically survey the yeast genome. The Saccharomyces yeast deletion collections have proved immensely powerful in contributing to the understanding of gene function, including functional relationships between genes and genetic pathways in response to diverse genetic and environmental perturbations. © 2016 Cold Spring Harbor Laboratory Press.
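The parallel fitness readout described in this record can be illustrated with a short sketch. It assumes barcode read counts from a pooled culture before and after a perturbation and scores each strain by the log2 ratio of its normalized abundance. The function name, the pseudocount, and the strain labels are hypothetical and are not taken from the deletion-collection protocols.

```python
import math


def barcode_fitness(control_counts, treated_counts, pseudocount=1.0):
    """Return log2(treated / control) relative abundance per strain barcode.

    Counts are normalized by the pool total so that sequencing depth
    differences between the two samples do not dominate the score.
    """
    control_total = sum(control_counts.values())
    treated_total = sum(treated_counts.values())
    scores = {}
    for strain, control in control_counts.items():
        treated = treated_counts.get(strain, 0)
        # The pseudocount avoids log(0) for strains that drop out entirely.
        control_freq = (control + pseudocount) / control_total
        treated_freq = (treated + pseudocount) / treated_total
        scores[strain] = math.log2(treated_freq / control_freq)
    return scores


# Hypothetical counts for three placeholder deletion strains.
control = {"yfg1": 1000, "yfg2": 1000, "yfg3": 1000}
treated = {"yfg1": 1000, "yfg2": 250, "yfg3": 1750}
scores = barcode_fitness(control, treated)

for strain, score in sorted(scores.items()):
    print(strain, round(score, 2))
```

A negative score marks a strain that was depleted under the perturbation, which is the signal used to infer that the deleted gene contributes to fitness in that condition.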
Comparative genome-wide polymorphic microsatellite markers in Antarctic penguins through next generation sequencing

PubMed Central

Vianna, Juliana A.; Noll, Daly; Mura-Jornet, Isidora; Valenzuela-Guerra, Paulina; González-Acuña, Daniel; Navarro, Cristell; Loyola, David E.; Dantas, Gisele P. M.

2017-01-01

Microsatellites are valuable molecular markers for evolutionary and ecological studies. Next generation sequencing is responsible for the increasing number of microsatellites for non-model species. The Pygoscelis genus comprises three penguin species: Adélie (P. adeliae), Chinstrap (P. antarcticus) and Gentoo (P. papua), all distributed around Antarctica and the sub-Antarctic. The species have been affected differently by climate change, and the use of microsatellite markers will be crucial to monitor population dynamics. We characterized a large set of genome-wide microsatellites and evaluated polymorphisms in all three species. SOLiD reads were generated from the libraries of each species, identifying a large number of microsatellite loci: 33,677, 35,265 and 42,057 for P. adeliae, P. antarcticus and P. papua, respectively. A large number of dinucleotide (66,139), trinucleotide (29,490) and tetranucleotide (11,849) microsatellites are described. Microsatellite abundance, diversity and orthology were characterized in penguin genomes. We evaluated polymorphisms in 170 tetranucleotide loci, obtaining 34 polymorphic loci in at least one species and 15 polymorphic loci in all three species, which allows comparative studies to be performed. Polymorphic markers presented here enable a number of ecological, population, individual identification, parentage and evolutionary studies of Pygoscelis, with potential use in other penguin species. PMID:28898354
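As an illustration of what calling di-, tri- and tetranucleotide microsatellites involves, the sketch below scans a DNA string for short motifs repeated in tandem. It is a toy regular-expression approach under assumed settings (motif length 2 to 4 bp, at least five repeats); the record does not say which tool or thresholds the authors used, and the function name and test sequence are made up for the example.

```python
import re


def find_microsatellites(sequence, min_repeats=5):
    """Return (start, motif, repeat_count) for tandem repeats of 2-4 bp motifs."""
    # The lazy {2,4}? quantifier tries the shortest motif first, so a run of
    # ACAC... is reported as the dinucleotide AC rather than ACAC.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Made-up sequence containing one dinucleotide and one tetranucleotide repeat.
toy_sequence = "TTGG" + "AC" * 6 + "GGCT" + "GATA" * 5 + "CC"
hits = find_microsatellites(toy_sequence)
print(hits)
```

Real pipelines also have to handle imperfect repeats and sequencing errors, which a strict pattern like this one ignores.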
Red jungle fowl (Gallus gallus) as a model for studying the molecular mechanism of seasonal reproduction.

PubMed

Ono, Hiroko; Nakao, Nobuhiro; Yamamura, Takashi; Kinoshita, Keiji; Mizutani, Makoto; Namikawa, Takao; Iigo, Masayuki; Ebihara, Shizufumi; Yoshimura, Takashi

2009-06-01

Photoperiodism is an adaptation mechanism that enables animals to predict seasonal changes in the environment. Japanese quail is the best model organism for studying photoperiodism. Although the recent availability of chicken genome sequences has permitted the expansion from single gene to genome-wide transcriptional analysis in this organism, the photoperiodic response of the domestic chicken is less robust than that of the quail. Therefore, in the present study, we examined the photoperiodic response of the red jungle fowl (Gallus gallus), a predecessor of the domestic chicken, to test whether this animal could be developed as an ideal model for studying the molecular mechanisms of seasonal reproduction. When red jungle fowls were transferred from short-day- to long-day conditions, gonadal development and an increase in plasma LH concentration were observed. Furthermore, rapid induction of thyrotropin beta subunit, a master regulator of photoperiodism, was observed at 16 h after dawn on the first long day. In addition, the long-day condition induced the expression of type 2 deiodinase, the key output gene of photoperiodism. These results were consistent with the results obtained in quail and suggest that the red jungle fowl could be an ideal model animal for the genome-wide transcriptional analysis of photoperiodism.

Metabolites associated with adaptation of microorganisms to an acidophilic, metal-rich environment identified by stable-isotope-enabled metabolomics.

PubMed

Mosier, Annika C; Justice, Nicholas B; Bowen, Benjamin P; Baran, Richard; Thomas, Brian C; Northen, Trent R; Banfield, Jillian F

2013-03-12

Microorganisms grow under a remarkable range of extreme conditions. Environmental transcriptomic and proteomic studies have highlighted metabolic pathways active in extremophilic communities. However, metabolites directly linked to their physiology are less well defined because metabolomics methods lag behind other omics technologies due to a wide range of experimental complexities often associated with the environmental matrix. We identified key metabolites associated with acidophilic and metal-tolerant microorganisms using stable isotope labeling coupled with untargeted, high-resolution mass spectrometry. We observed >3,500 metabolic features in biofilms growing in pH ~0.9 acid mine drainage solutions containing millimolar concentrations of iron, sulfate, zinc, copper, and arsenic. Stable isotope labeling improved chemical formula prediction by >50% for larger metabolites (>250 atomic mass units), many of which were unrepresented in metabolic databases and may represent novel compounds. Taurine and hydroxyectoine were identified and likely provide protection from osmotic stress in the biofilms. Community genomic, transcriptomic, and proteomic data implicate fungi in taurine metabolism. Leptospirillum group II bacteria decrease production of ectoine and hydroxyectoine as biofilms mature, suggesting that biofilm structure provides some resistance to high metal and proton concentrations. The combination of taurine, ectoine, and hydroxyectoine may also constitute a sulfur, nitrogen, and carbon currency in the communities. Microbial communities are central to many critical global processes and yet remain enigmatic largely due to their complex and distributed metabolic interactions. Metabolomics has the possibility of providing mechanistic insights into the function and ecology of microbial communities. 
However, our limited knowledge of microbial metabolites, the difficulty of identifying metabolites from complex samples, and the inability to link metabolites directly to community members have proven to be major limitations in developing advances in systems interactions. Here, we show that combining stable-isotope-enabled metabolomics with genomics, transcriptomics, and proteomics can illuminate the ecology of microorganisms at the community scale.
Developmental pathways inferred from modularity, morphological integration and fluctuating asymmetry patterns in the human face.

PubMed

Quinto-Sánchez, Mirsha; Muñoz-Muñoz, Francesc; Gomez-Valdes, Jorge; Cintas, Celia; Navarro, Pablo; Cerqueira, Caio Cesar Silva de; Paschetta, Carolina; de Azevedo, Soledad; Ramallo, Virginia; Acuña-Alonzo, Victor; Adhikari, Kaustubh; Fuentes-Guajardo, Macarena; Hünemeier, Tábita; Everardo, Paola; de Avila, Francisco; Jaramillo, Claudia; Arias, Williams; Gallo, Carla; Poletti, Giovani; Bedoya, Gabriel; Bortolini, Maria Cátira; Canizales-Quinteros, Samuel; Rothhammer, Francisco; Rosique, Javier; Ruiz-Linares, Andres; Gonzalez-Jose, Rolando

2018-01-17

Facial asymmetries are usually measured and interpreted as proxies to developmental noise. However, analyses focused on its developmental and genetic architecture are scarce. To advance on this topic, studies based on a comprehensive and simultaneous analysis of modularity, morphological integration and facial asymmetries including both phenotypic and genomic information are needed. Here we explore several modularity hypotheses on a sample of Latin American mestizos, in order to test if modularity and integration patterns differ across several genomic ancestry backgrounds. To do so, 4104 individuals were analyzed using 3D photogrammetry reconstructions and a set of 34 facial landmarks placed on each individual. We found a pattern of modularity and integration that is conserved across sub-samples differing in their genomic ancestry background. Specifically, a signal of modularity based on functional demands and organization of the face is regularly observed across the whole sample. Our results shed more light on previous evidence obtained from Genome Wide Association Studies performed on the same samples, indicating the action of different genomic regions contributing to the expression of the nose and mouth facial phenotypes. Our results also indicate that large samples including phenotypic and genomic metadata enable a better understanding of the developmental and genetic architecture of craniofacial phenotypes.
Complex interplay between neutral and adaptive evolution shaped differential genomic background and disease susceptibility along the Italian peninsula.

PubMed

Sazzini, Marco; Gnecchi Ruscone, Guido Alberto; Giuliani, Cristina; Sarno, Stefania; Quagliariello, Andrea; De Fanti, Sara; Boattini, Alessio; Gentilini, Davide; Fiorito, Giovanni; Catanoso, Mariagrazia; Boiardi, Luigi; Croci, Stefania; Macchioni, Pierluigi; Mantovani, Vilma; Di Blasio, Anna Maria; Matullo, Giuseppe; Salvarani, Carlo; Franceschi, Claudio; Pettener, Davide; Garagnani, Paolo; Luiselli, Donata

2016-09-01

The Italian peninsula has long represented a natural hub for human migrations across the Mediterranean area, being involved in several prehistoric and historical population movements. Coupled with a patchy environmental landscape entailing different ecological/cultural selective pressures, this might have produced peculiar patterns of population structure and local adaptations responsible for heterogeneous genomic background of present-day Italians. To disentangle this complex scenario, genome-wide data from 780 Italian individuals were generated and set into the context of European/Mediterranean genomic diversity by comparison with genotypes from 50 populations. To maximize possibility of pinpointing functional genomic regions that have played adaptive roles during Italian natural history, our survey included also ~250,000 exomic markers and ~20,000 coding/regulatory variants with well-established clinical relevance. This enabled fine-grained dissection of Italian population structure through the identification of clusters of genetically homogeneous provinces and of genomic regions underlying their local adaptations. Description of such patterns disclosed crucial implications for understanding differential susceptibility to some inflammatory/autoimmune disorders, coronary artery disease and type 2 diabetes of diverse Italian subpopulations, suggesting the evolutionary causes that made some of them particularly exposed to the metabolic and immune challenges imposed by dietary and lifestyle shifts that involved western societies in the last centuries.
Genome-wide and gene-based association implicates FRMD6 in Alzheimer disease.

PubMed

Hong, Mun-Gwan; Reynolds, Chandra A; Feldman, Adina L; Kallin, Mikael; Lambert, Jean-Charles; Amouyel, Philippe; Ingelsson, Erik; Pedersen, Nancy L; Prince, Jonathan A

2012-03-01

Genome-wide association studies (GWAS) that allow for allelic heterogeneity may facilitate the discovery of novel genes not detectable by models that require replication of a single variant site. One strategy to accomplish this is to focus on genes rather than markers as units of association, and so potentially capture a spectrum of causal alleles that differ across populations. Here, we conducted a GWAS of Alzheimer disease (AD) in 2,586 Swedes and performed gene-based meta-analysis with three additional studies from France, Canada, and the United States, in total encompassing 4,259 cases and 8,284 controls. Implementing a newly designed gene-based algorithm, we identified two loci apart from the region around APOE that achieved study-wide significance in combined samples, the strongest finding being for FRMD6 on chromosome 14q (P = 2.6 × 10^-14) and a weaker signal for NARS2 that is immediately adjacent to GAB2 on chromosome 11q (P = 7.8 × 10^-9). Ontology-based pathway analyses revealed significant enrichment of genes involved in glycosylation. Results suggest that gene-based approaches that accommodate allelic heterogeneity in GWAS can provide a complementary avenue for gene discovery and may help to explain a portion of the missing heritability not detectable with single nucleotide polymorphisms (SNPs) derived from marker-specific meta-analysis. © 2011 Wiley Periodicals, Inc.
Novel Sources of Stripe Rust Resistance Identified by Genome-Wide Association Mapping in Ethiopian Durum Wheat (Triticum turgidum ssp. durum)

USDA-ARS's Scientific Manuscript database

Stripe rust of wheat, caused by Puccinia striiformis f. sp. tritici (Pst), is a global concern for wheat production and has been increasingly destructive in Ethiopia, as well as in the United States and many other countries. As Ethiopia has a long history of stripe rust epidemics, its native wheat ge...
Genome-wide analysis of the SPL/miR156 module and its interaction with the AP2/miR172 unit in barley

USDA-ARS's Scientific Manuscript database

The SQUAMOSA-promoter binding like (SPL) gene family encodes transcription factors shown in a number of species to influence plant growth and development, but information about these genes in barley is limited. This study identified 13 barley SPL genes, within five distinct groups, that are ortholog...
Extensive Local Gene Duplication and Functional Divergence among Paralogs in Atlantic Salmon

PubMed Central

Warren, Ian A.; Ciborowski, Kate L.; Casadei, Elisa; Hazlerigg, David G.; Martin, Sam; Jordan, William C.; Sumner, Seirian

2014-01-01

Many organisms can generate alternative phenotypes from the same genome, enabling individuals to exploit diverse and variable environments. A prevailing hypothesis is that such adaptation has been favored by gene duplication events, which generate redundant genomic material that may evolve divergent functions. Vertebrate examples of recent whole-genome duplications are sparse although one example is the salmonids, which have undergone a whole-genome duplication event within the last 100 Myr. The life-cycle of the Atlantic salmon, Salmo salar, depends on the ability to produce alternating phenotypes from the same genome, to facilitate migration and maintain its anadromous life history. Here, we investigate the hypothesis that genome-wide and local gene duplication events have contributed to the salmonid adaptation. We used high-throughput sequencing to characterize the transcriptomes of three key organs involved in regulating migration in S. salar: Brain, pituitary, and olfactory epithelium. We identified over 10,000 undescribed S. salar sequences and designed an analytic workflow to distinguish between paralogs originating from local gene duplication events or from whole-genome duplication events. These data reveal that substantial local gene duplications took place shortly after the whole-genome duplication event. Many of the identified paralog pairs have either diverged in function or become noncoding. Future functional genomics studies will reveal to what extent this rich source of divergence in genetic sequence is likely to have facilitated the evolution of extreme phenotypic plasticity required for an anadromous life-cycle. PMID:24951567
Genome-wide identification and evolutionary analysis of algal LPAT genes involved in TAG biosynthesis using bioinformatic approaches.

PubMed

Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar

2014-12-01

Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed LPAT1 and solLPAT genes to be ubiquitously present in nearly all genomes examined, whereas LPAT2 genes to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding a LPAT is not duplicated. Mapping of exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation.
Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq)

PubMed Central

Langley, Alexander R.; Gräf, Stefan; Smith, James C.; Krude, Torsten

2016-01-01

Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. PMID:27587586
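Concordance between origin-mapping methods, as compared in this record, is commonly expressed as the share of sites from one call set that overlap a site from another. The sketch below shows one simple way to compute such a fraction. It is not the authors' analysis: the interval representation, the function name, and the coordinates are all assumptions made for illustration.

```python
def fraction_overlapping(query, reference):
    """Fraction of query intervals that overlap at least one reference interval.

    Intervals are (chromosome, start, end) tuples with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in query:
        candidates = by_chrom.get(chrom, [])
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < r_end and r_start < end for r_start, r_end in candidates):
            hits += 1
    return hits / len(query) if query else 0.0


# Hypothetical origin calls from two methods; the coordinates are invented.
method_a_sites = [("chr1", 100, 600), ("chr1", 5000, 5400), ("chr2", 200, 700)]
method_b_sites = [("chr1", 500, 900), ("chr2", 1000, 1500)]

concordance = fraction_overlapping(method_a_sites, method_b_sites)
print(f"{concordance:.2f} of method A sites overlap a method B site")
```

The measure is asymmetric, so swapping the two call sets generally gives a different value. For genome-scale call sets a sorted or indexed interval structure would replace the linear scan used here.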
Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq).

PubMed

Langley, Alexander R; Gräf, Stefan; Smith, James C; Krude, Torsten

2016-12-01

Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
A survey of enabling technologies in synthetic biology

PubMed Central

2013-01-01

Background Realizing constructive applications of synthetic biology requires continued development of enabling technologies as well as policies and practices to ensure these technologies remain accessible for research. Broadly defined, enabling technologies for synthetic biology include any reagent or method that, alone or in combination with associated technologies, provides the means to generate any new research tool or application. Because applications of synthetic biology likely will embody multiple patented inventions, it will be important to create structures for managing intellectual property rights that best promote continued innovation. Monitoring the enabling technologies of synthetic biology will facilitate the systematic investigation of property rights coupled to these technologies and help shape policies and practices that impact the use, regulation, patenting, and licensing of these technologies. Results We conducted a survey among a self-identifying community of practitioners engaged in synthetic biology research to obtain their opinions and experiences with technologies that support the engineering of biological systems. Technologies widely used and considered enabling by survey participants included public and private registries of biological parts, standard methods for physical assembly of DNA constructs, genomic databases, software tools for search, alignment, analysis, and editing of DNA sequences, and commercial services for DNA synthesis and sequencing. Standards and methods supporting measurement, functional composition, and data exchange were less widely used though still considered enabling by a subset of survey participants. Conclusions The set of enabling technologies compiled from this survey provides insight into the many and varied technologies that support innovation in synthetic biology.
Many of these technologies are widely accessible for use, either by virtue of being in the public domain or through legal tools such as non-exclusive licensing. Access to some patent protected technologies is less clear and use of these technologies may be subject to restrictions imposed by material transfer agreements or other contract terms. We expect the technologies considered enabling for synthetic biology to change as the field advances. By monitoring the enabling technologies of synthetic biology and addressing the policies and practices that impact their development and use, our hope is that the field will be better able to realize its full potential. PMID:23663447
Efficient and secure outsourcing of genomic data storage.

PubMed

Sousa, João Sá; Lefebvre, Cédric; Huang, Zhicong; Raisaro, Jean Louis; Aguilar-Melchor, Carlos; Killijian, Marc-Olivier; Hubaux, Jean-Pierre

2017-07-26

Cloud computing is becoming the preferred solution for efficiently dealing with the increasing amount of genomic data. Yet, outsourcing the storage and processing of sensitive information, such as genomic data, comes with important concerns related to privacy and security. This calls for new sophisticated techniques that ensure data protection from untrusted cloud providers and that still enable researchers to obtain useful information. We present a novel privacy-preserving algorithm for fully outsourcing the storage of large genomic data files to a public cloud and enabling researchers to efficiently search for variants of interest. In order to protect data and query confidentiality from possible leakage, our solution exploits optimal encoding for genomic variants and combines it with homomorphic encryption and private information retrieval. Our proposed algorithm is implemented in C++ and was evaluated on real data as part of the 2016 iDash Genome Privacy-Protection Challenge. Results show that our solution outperforms the state-of-the-art solutions and enables researchers to search over millions of encrypted variants in a few seconds. Contrary to prior beliefs that sophisticated privacy-enhancing technologies (PETs) are impractical for real operational settings, our solution demonstrates that, in the case of genomic data, PETs are very efficient enablers.
Cloning, Assembly, and Modification of the Primary Human Cytomegalovirus Isolate Toledo by Yeast-Based Transformation-Associated Recombination.

PubMed

Vashee, Sanjay; Stockwell, Timothy B; Alperovich, Nina; Denisova, Evgeniya A; Gibson, Daniel G; Cady, Kyle C; Miller, Kristofer; Kannan, Krishna; Malouli, Daniel; Crawford, Lindsey B; Voorhies, Alexander A; Bruening, Eric; Caposio, Patrizia; Früh, Klaus

2017-01-01

Genetic engineering of cytomegalovirus (CMV) currently relies on generating a bacterial artificial chromosome (BAC) by introducing a bacterial origin of replication into the viral genome using in vivo recombination in virally infected tissue culture cells. However, this process is inefficient, results in adaptive mutations, and involves deletion of viral genes to avoid oversized genomes when inserting the BAC cassette. Moreover, BAC technology does not permit the simultaneous manipulation of multiple genome loci and cannot be used to construct synthetic genomes. To overcome these limitations, we adapted synthetic biology tools to clone CMV genomes in Saccharomyces cerevisiae. Using an early passage of the human CMV isolate Toledo, we first applied transformation-associated recombination (TAR) to clone 16 overlapping fragments covering the entire Toledo genome in Saccharomyces cerevisiae. Then, we assembled these fragments by TAR in a stepwise process until the entire genome was reconstituted in yeast. Since next-generation sequence analysis revealed that the low-passage-number isolate represented a mixture of parental and fibroblast-adapted genomes, we selectively modified individual DNA fragments of fibroblast-adapted Toledo (Toledo-F) and again used TAR assembly to recreate parental Toledo (Toledo-P). Linear, full-length HCMV genomes were transfected into human fibroblasts to recover virus. Unlike Toledo-F, Toledo-P displayed characteristics of primary isolates, including broad cellular tropism in vitro and the ability to establish latency and reactivation in humanized mice. Our novel strategy thus enables de novo cloning of CMV genomes, more-efficient genome-wide engineering, and the generation of viral genomes that are partially or completely derived from synthetic DNA.
IMPORTANCE The genomes of large DNA viruses, such as human cytomegalovirus (HCMV), are difficult to manipulate using current genetic tools, and at this time, it is not possible to obtain molecular clones of CMV without extensive tissue culture. To overcome these limitations, we used synthetic biology tools to capture genomic fragments from viral DNA and assemble full-length genomes in yeast. Using an early passage of the HCMV isolate Toledo containing a mixture of wild-type and tissue culture-adapted virus, we directly cloned the majority sequence and recreated the minority sequence by simultaneous modification of multiple genomic regions. Thus, our novel approach not only provides a paradigm to efficiently engineer HCMV and other large DNA viruses on a genome-wide scale but also facilitates the cloning and genetic manipulation of primary isolates and provides a pathway to generating entirely synthetic genomes.
Cloning, Assembly, and Modification of the Primary Human Cytomegalovirus Isolate Toledo by Yeast-Based Transformation-Associated Recombination

PubMed Central

Vashee, Sanjay; Stockwell, Timothy B.; Alperovich, Nina; Denisova, Evgeniya A.; Gibson, Daniel G.; Cady, Kyle C.; Miller, Kristofer; Kannan, Krishna; Malouli, Daniel; Crawford, Lindsey B.; Voorhies, Alexander A.; Bruening, Eric; Caposio, Patrizia

2017-01-01

ABSTRACT Genetic engineering of cytomegalovirus (CMV) currently relies on generating a bacterial artificial chromosome (BAC) by introducing a bacterial origin of replication into the viral genome using in vivo recombination in virally infected tissue culture cells. However, this process is inefficient, results in adaptive mutations, and involves deletion of viral genes to avoid oversized genomes when inserting the BAC cassette. Moreover, BAC technology does not permit the simultaneous manipulation of multiple genome loci and cannot be used to construct synthetic genomes. To overcome these limitations, we adapted synthetic biology tools to clone CMV genomes in Saccharomyces cerevisiae. Using an early passage of the human CMV isolate Toledo, we first applied transformation-associated recombination (TAR) to clone 16 overlapping fragments covering the entire Toledo genome in Saccharomyces cerevisiae. Then, we assembled these fragments by TAR in a stepwise process until the entire genome was reconstituted in yeast. Since next-generation sequence analysis revealed that the low-passage-number isolate represented a mixture of parental and fibroblast-adapted genomes, we selectively modified individual DNA fragments of fibroblast-adapted Toledo (Toledo-F) and again used TAR assembly to recreate parental Toledo (Toledo-P). Linear, full-length HCMV genomes were transfected into human fibroblasts to recover virus. Unlike Toledo-F, Toledo-P displayed characteristics of primary isolates, including broad cellular tropism in vitro and the ability to establish latency and reactivation in humanized mice. Our novel strategy thus enables de novo cloning of CMV genomes, more-efficient genome-wide engineering, and the generation of viral genomes that are partially or completely derived from synthetic DNA. 
IMPORTANCE The genomes of large DNA viruses, such as human cytomegalovirus (HCMV), are difficult to manipulate using current genetic tools, and at this time, it is not possible to obtain molecular clones of CMV without extensive tissue culture. To overcome these limitations, we used synthetic biology tools to capture genomic fragments from viral DNA and assemble full-length genomes in yeast. Using an early passage of the HCMV isolate Toledo containing a mixture of wild-type and tissue culture-adapted virus, we directly cloned the majority sequence and recreated the minority sequence by simultaneous modification of multiple genomic regions. Thus, our novel approach not only provides a paradigm to efficiently engineer HCMV and other large DNA viruses on a genome-wide scale but also facilitates the cloning and genetic manipulation of primary isolates and provides a pathway to generating entirely synthetic genomes. PMID:28989973
Finding Our Way through Phenotypes

PubMed Central

Deans, Andrew R.; Lewis, Suzanna E.; Huala, Eva; Anzaldo, Salvatore S.; Ashburner, Michael; Balhoff, James P.; Blackburn, David C.; Blake, Judith A.; Burleigh, J. Gordon; Chanet, Bruno; Cooper, Laurel D.; Courtot, Mélanie; Csösz, Sándor; Cui, Hong; Dahdul, Wasila; Das, Sandip; Dececchi, T. Alexander; Dettai, Agnes; Diogo, Rui; Druzinsky, Robert E.; Dumontier, Michel; Franz, Nico M.; Friedrich, Frank; Gkoutos, George V.; Haendel, Melissa; Harmon, Luke J.; Hayamizu, Terry F.; He, Yongqun; Hines, Heather M.; Ibrahim, Nizar; Jackson, Laura M.; Jaiswal, Pankaj; James-Zorn, Christina; Köhler, Sebastian; Lecointre, Guillaume; Lapp, Hilmar; Lawrence, Carolyn J.; Le Novère, Nicolas; Lundberg, John G.; Macklin, James; Mast, Austin R.; Midford, Peter E.; Mikó, István; Mungall, Christopher J.; Oellrich, Anika; Osumi-Sutherland, David; Parkinson, Helen; Ramírez, Martín J.; Richter, Stefan; Robinson, Peter N.; Ruttenberg, Alan; Schulz, Katja S.; Segerdell, Erik; Seltmann, Katja C.; Sharkey, Michael J.; Smith, Aaron D.; Smith, Barry; Specht, Chelsea D.; Squires, R. Burke; Thacker, Robert W.; Thessen, Anne; Fernandez-Triana, Jose; Vihinen, Mauno; Vize, Peter D.; Vogt, Lars; Wall, Christine E.; Walls, Ramona L.; Westerfeld, Monte; Wharton, Robert A.; Wirkner, Christian S.; Woolley, James B.; Yoder, Matthew J.; Zorn, Aaron M.; Mabee, Paula

2015-01-01

Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility. PMID:25562316
Finding our way through phenotypes.

PubMed

Deans, Andrew R; Lewis, Suzanna E; Huala, Eva; Anzaldo, Salvatore S; Ashburner, Michael; Balhoff, James P; Blackburn, David C; Blake, Judith A; Burleigh, J Gordon; Chanet, Bruno; Cooper, Laurel D; Courtot, Mélanie; Csösz, Sándor; Cui, Hong; Dahdul, Wasila; Das, Sandip; Dececchi, T Alexander; Dettai, Agnes; Diogo, Rui; Druzinsky, Robert E; Dumontier, Michel; Franz, Nico M; Friedrich, Frank; Gkoutos, George V; Haendel, Melissa; Harmon, Luke J; Hayamizu, Terry F; He, Yongqun; Hines, Heather M; Ibrahim, Nizar; Jackson, Laura M; Jaiswal, Pankaj; James-Zorn, Christina; Köhler, Sebastian; Lecointre, Guillaume; Lapp, Hilmar; Lawrence, Carolyn J; Le Novère, Nicolas; Lundberg, John G; Macklin, James; Mast, Austin R; Midford, Peter E; Mikó, István; Mungall, Christopher J; Oellrich, Anika; Osumi-Sutherland, David; Parkinson, Helen; Ramírez, Martín J; Richter, Stefan; Robinson, Peter N; Ruttenberg, Alan; Schulz, Katja S; Segerdell, Erik; Seltmann, Katja C; Sharkey, Michael J; Smith, Aaron D; Smith, Barry; Specht, Chelsea D; Squires, R Burke; Thacker, Robert W; Thessen, Anne; Fernandez-Triana, Jose; Vihinen, Mauno; Vize, Peter D; Vogt, Lars; Wall, Christine E; Walls, Ramona L; Westerfeld, Monte; Wharton, Robert A; Wirkner, Christian S; Woolley, James B; Yoder, Matthew J; Zorn, Aaron M; Mabee, Paula

2015-01-01

Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
The Genome of Winter Moth (Operophtera brumata) Provides a Genomic Perspective on Sexual Dimorphism and Phenology.

PubMed

Derks, Martijn F L; Smit, Sandra; Salis, Lucia; Schijlen, Elio; Bossers, Alex; Mateman, Christa; Pijl, Agata S; de Ridder, Dick; Groenen, Martien A M; Visser, Marcel E; Megens, Hendrik-Jan

2015-07-29

The winter moth (Operophtera brumata) belongs to one of the most species-rich families in Lepidoptera, the Geometridae (approximately 23,000 species). This family is of great economic importance as most species are herbivorous and capable of defoliating trees. Genome assembly of the winter moth allows the study of genes and gene families, such as the cytochrome P450 gene family, which is known to be vital in plant secondary metabolite detoxification and host-plant selection. It also enables exploration of the genomic basis for female brachyptery (wing reduction), a feature of sexual dimorphism in winter moth, and for seasonal timing, a trait extensively studied in this species. Here we present a reference genome for the winter moth, the first geometrid and largest sequenced Lepidopteran genome to date (638 Mb) including a set of 16,912 predicted protein-coding genes. This allowed us to assess the dynamics of evolution on a genome-wide scale using the P450 gene family. We also identified an expanded gene family potentially linked to female brachyptery, and annotated the genes involved in the circadian clock mechanism as main candidates for involvement in seasonal timing. The genome will contribute to Lepidopteran genomic resources and comparative genomics. In addition, the genome enhances our ability to understand the genetic and molecular basis of insect seasonal timing and thereby provides a reference for future evolutionary and population studies on the winter moth. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Towards development of new ornamental plants: status and progress in wide hybridization.

PubMed

Kuligowska, Katarzyna; Lütken, Henrik; Müller, Renate

2016-07-01

The present review provides insights into key findings on the hybridization process and the crucial factors affecting the adaptation of new technologies within wide hybridization of ornamental plants, and presents perspectives for further development of this strategy. Wide hybridization is one of the oldest breeding techniques and has contributed enormously to the development of modern plant cultivars. Within ornamental breeding, it represents the main source of genetic variation. During the long history of wide hybridization, a number of methods were implemented, allowing its evolution from a conventional breeding tool into a modern methodology. Nowadays, research on model plants and crop species increases our understanding of reproductive isolation among distant species and partly explains the background of the traditional approaches previously used for overcoming hybridization barriers. Characterization of parental plants and hybrids is performed using molecular and cytological techniques that strongly facilitate breeding processes. Molecular markers and sequencing technologies are used for the assessment of genetic relationships among plants, as genetic distance is typically depicted as one of the most important factors influencing cross-compatibility in hybridization processes. Furthermore, molecular marker systems are frequently applied for verification of the hybrid state of the progeny. Flow cytometry and genomic in situ hybridization are used in the assessment of hybridization partners and the characterization of hybrid progeny in relation to genome stabilization as well as genome recombination and introgression. In the future, new research and technologies are likely to provide more detailed information about the genes and pathways responsible for interspecific reproductive isolation. Ultimately, this knowledge will enable the development of strategies for obtaining compatible lines for hybrid production.
Recent developments in sequencing technologies and the availability of sequence data will also facilitate the creation of new molecular markers that will advance marker-assisted selection in the hybridization process.
Genome-wide analysis of Tol2 transposon reintegration in zebrafish.

PubMed

Kondrychyn, Igor; Garcia-Lecea, Marta; Emelyanov, Alexander; Parinov, Sergey; Korzh, Vladimir

2009-09-08

Tol2, a member of the hAT family of transposons, has become a useful tool for genetic manipulation of model animals, but information about its interactions with vertebrate genomes is still limited. Furthermore, published reports on Tol2 have mainly been based on random integration of the transposon system after co-injection of a plasmid DNA harboring the transposon and a transposase mRNA. It is important to understand how Tol2 would behave upon activation after integration into the genome. We performed a large-scale enhancer trap (ET) screen and generated 338 insertions of the Tol2 transposon-based ET cassette into the zebrafish genome. These insertions were generated by remobilizing the transposon from two different donor sites in two transgenic lines. We found that 39% of Tol2 insertions occurred in transcription units, mostly into introns. Analysis of the transposon target sites revealed no strict specificity at the DNA sequence level. However, Tol2 was prone to target AT-rich regions with weak palindromic consensus sequences centered at the insertion site. Our systematic analysis of sequential remobilizations of the Tol2 transposon from two independent sites within a vertebrate genome has revealed properties such as a tendency to integrate into transcription units and into AT-rich palindrome-like sequences. This information will influence the development of various applications involving DNA transposons and Tol2 in particular.
Three chromosomal rearrangements promote genomic divergence between migratory and stationary ecotypes of Atlantic cod.

PubMed

Berg, Paul R; Star, Bastiaan; Pampoulie, Christophe; Sodeland, Marte; Barth, Julia M I; Knutsen, Halvor; Jakobsen, Kjetill S; Jentoft, Sissel

2016-03-17

Identification of genome-wide patterns of divergence provides insight into how genomes are influenced by selection and can reveal the potential for local adaptation in spatially structured populations. In Atlantic cod - historically a major marine resource - Northeast-Arctic and Norwegian coastal cod are recognized by fundamental differences in migratory and non-migratory behavior, respectively. However, the genomic architecture underlying such behavioral ecotypes is unclear. Here, we have analyzed more than 8,000 polymorphic SNPs distributed throughout all 23 linkage groups and show that loci putatively under selection are localized within three distinct genomic regions, each several megabases long, covering approximately 4% of the Atlantic cod genome. These regions likely represent genomic inversions. The frequency of these distinct regions differs markedly between the ecotypes, which spawn in the vicinity of each other, in contrast to the low level of divergence in the rest of the genome. The observed patterns strongly suggest that these chromosomal rearrangements are instrumental in local adaptation and separation of Atlantic cod populations, leaving footprints of large genomic regions under selection. Our findings demonstrate the power of using genomic information in further understanding the population dynamics and defining management units in one of the world's most economically important marine resources.

Practical utilization of recombinant AAV vector reference standards: focus on vector genomes titration by free ITR qPCR.

PubMed

D'Costa, Susan; Blouin, Veronique; Broucque, Frederic; Penaud-Budloo, Magalie; François, Achille; Perez, Irene C; Le Bec, Christine; Moullier, Philippe; Snyder, Richard O; Ayuso, Eduard

2016-01-01

Clinical trials using recombinant adeno-associated virus (rAAV) vectors have demonstrated efficacy and a good safety profile. Although the field is advancing quickly, vector analytics and harmonization of dosage units are still a limitation for commercialization. AAV reference standard materials (RSMs) can help ensure product safety by controlling the consistency of assays used to characterize rAAV stocks. The most widely utilized unit of vector dosing is based on the encapsidated vector genome. Quantitative polymerase chain reaction (qPCR) is now the most common method to titer vector genomes (vg); however, significant inter- and intralaboratory variations have been documented using this technique. Here, RSMs and rAAV stocks were titered on the basis of an inverted terminal repeats (ITRs) sequence-specific qPCR and we found an artificial increase in vg titers using a widely utilized approach. The PCR error was introduced by using single-cut linearized plasmid as the standard curve. This bias was eliminated using plasmid standards linearized just outside the ITR region on each end to facilitate the melting of the palindromic ITR sequences during PCR. This new "Free-ITR" qPCR delivers vg titers that are consistent with titers obtained with transgene-specific qPCR and could be used to normalize in-house product-specific AAV vector standards and controls to the rAAV RSMs. The free-ITR method, including well-characterized controls, will help to calibrate doses to compare preclinical and clinical data in the field.
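The titration described above rests on standard-curve arithmetic: Ct values from a dilution series of linearized plasmid are regressed on log10 copy number, and the fitted line is inverted to read off the copies in an unknown sample. The sketch below shows only that generic calculation; it does not reproduce the free-ITR assay or any data from the study. The function names and the dilution-series values are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical plasmid standard: 10^3 to 10^7 copies per reaction.
standard_copies = [10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6, 10 ** 7]
standard_ct = [30.1, 26.8, 23.5, 20.2, 16.9]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
efficiency = amplification_efficiency(slope)
vg_per_reaction = copies_from_ct(23.5, slope, intercept)
```

Because the unknown is read against the plasmid curve, anything that shifts the standards' Ct values shifts every reported titer by the same factor, so a change in how the standard is prepared propagates directly into the vg titer.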
Applications of the pipeline environment for visual informatics and genomics computations

PubMed Central

2011-01-01

Background Contemporary informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols. Results This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges - graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls. Conclusions The LONI Pipeline environment http://pipeline.loni.ucla.edu provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. 
The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators - experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community. PMID:21791102
Systematic quantification of HDR and NHEJ reveals effects of locus, nuclease, and cell type on genome-editing.

PubMed

Miyaoka, Yuichiro; Berman, Jennifer R; Cooper, Samantha B; Mayerl, Steven J; Chan, Amanda H; Zhang, Bin; Karlin-Neumann, George A; Conklin, Bruce R

2016-03-31

Precise genome-editing relies on the repair of sequence-specific nuclease-induced DNA nicking or double-strand breaks (DSBs) by homology-directed repair (HDR). However, nonhomologous end-joining (NHEJ), an error-prone repair, acts concurrently, reducing the rate of high-fidelity edits. The identification of genome-editing conditions that favor HDR over NHEJ has been hindered by the lack of a simple method to measure HDR and NHEJ directly and simultaneously at endogenous loci. To overcome this challenge, we developed a novel, rapid, digital PCR-based assay that can simultaneously detect one HDR or NHEJ event out of 1,000 copies of the genome. Using this assay, we systematically monitored genome-editing outcomes of CRISPR-associated protein 9 (Cas9), Cas9 nickases, catalytically dead Cas9 fused to FokI, and transcription activator-like effector nuclease at three disease-associated endogenous gene loci in HEK293T cells, HeLa cells, and human induced pluripotent stem cells. Although it is widely thought that NHEJ generally occurs more often than HDR, we found that more HDR than NHEJ was induced under multiple conditions. Surprisingly, the HDR/NHEJ ratios were highly dependent on gene locus, nuclease platform, and cell type. The new assay system, and our findings based on it, will enable mechanistic studies of genome-editing and help improve genome-editing technology.
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia

PubMed Central

Maezato, Yukari; Wu, Yu-Wei; Romine, Margaret F.; Lindemann, Stephen R.

2015-01-01

To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set. PMID:26497460
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, William C.; Maezato, Yukari; Wu, Yu-Wei

2015-10-23

To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set.
Characterizing Phage Genomes for Therapeutic Applications

PubMed Central

Philipson, Casandra W.; Voegtly, Logan J.; Lueder, Matthew R.; Long, Kyle A.; Rice, Gregory K.; Frey, Kenneth G.; Biswas, Biswajit; Cer, Regina Z.; Hamilton, Theron; Bishop-Lilly, Kimberly A.

2018-01-01

Multi-drug resistance is increasing at alarming rates. The efficacy of phage therapy, treating bacterial infections with bacteriophages alone or in combination with traditional antibiotics, has been demonstrated in emergency cases in the United States and in other countries; however, it remains to be approved for widespread use in the US. One limiting factor is a lack of guidelines for assessing the genomic safety of phage candidates. We present the phage characterization workflow used by our team to generate data for submitting phages to the Food and Drug Administration (FDA) for authorized use. Essential analysis checkpoints and warnings are detailed for obtaining high-quality genomes, excluding undesirable candidates, rigorously assessing a phage genome for safety and evaluating sequencing contamination. This workflow has been developed in accordance with community standards for high-throughput sequencing of viral genomes as well as principles for ideal phages used for therapy. The feasibility and utility of the pipeline are demonstrated on two new phage genomes that meet all safety criteria. We propose these guidelines as a minimum standard for phages being submitted to the FDA for review as investigational new drug candidates. PMID:29642590
Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

PubMed

Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark

2015-01-01

Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy on some data sets of moderate size, RF still struggles in GWAS when selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node of a tree. This approach enables one to generate more accurate trees with a lower prediction error while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model.
Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing state-of-the-art random forests. The top 25 SNPs in the Parkinson data set were identified by the proposed model, including four interesting genes associated with neurological disorders. The presented approach has been shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail to detect. The new RF works well for data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experimental results demonstrated the effectiveness of the proposed RF model, which outperformed the state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods.
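The two-stage sampling described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' ts-RF implementation: the thresholds `cutoff` and `strong`, the `high_fraction` parameter, and both function names are hypothetical choices made for illustration, and the abstract does not say how the cut-off point or the sub-group boundary is actually derived.

```python
import random

def partition_snps(p_values, cutoff=0.05, strong=0.001):
    """Split SNP indices into (highly informative, weakly informative, irrelevant).

    cutoff and strong are illustrative thresholds, not values from the paper.
    """
    high, weak, irrelevant = [], [], []
    for idx, p in enumerate(p_values):
        if p > cutoff:
            irrelevant.append(idx)   # stage 1: discarded before sampling
        elif p <= strong:
            high.append(idx)         # stage 2: highly informative sub-group
        else:
            weak.append(idx)         # stage 2: weakly informative sub-group
    return high, weak, irrelevant

def sample_subspace(high, weak, size, rng, high_fraction=0.5):
    """Draw one feature subspace for a tree node from the two informative sub-groups.

    At least one highly informative SNP is included whenever any exists.
    """
    n_high = min(len(high), max(1, round(size * high_fraction)))
    n_weak = min(len(weak), size - n_high)
    return rng.sample(high, n_high) + rng.sample(weak, n_weak)

# Toy p-values for six SNPs (made-up numbers).
p_values = [0.0001, 0.2, 0.01, 0.0005, 0.04, 0.9]
high, weak, irrelevant = partition_snps(p_values)
subspace = sample_subspace(high, weak, size=3, rng=random.Random(0))
```

Irrelevant SNPs never enter a subspace, so each candidate split is chosen among informative features only, which is the property the abstract credits for the lower prediction error.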
The Glyphosate-Based Herbicide Roundup Does not Elevate Genome-Wide Mutagenesis of Escherichia coli.

PubMed

Tincher, Clayton; Long, Hongan; Behringer, Megan; Walker, Noah; Lynch, Michael

2017-10-05

Mutations induced by pollutants may promote pathogen evolution, for example by accelerating mutations conferring antibiotic resistance. Generally, evaluating the genome-wide mutagenic effects of long-term sublethal pollutant exposure at single-nucleotide resolution is extremely difficult. To overcome this technical barrier, we use the mutation accumulation/whole-genome sequencing (MA/WGS) method as a mutagenicity test, to quantitatively evaluate genome-wide mutagenesis of Escherichia coli after long-term exposure to a wide gradient of the glyphosate-based herbicide (GBH) Roundup Concentrate Plus. The genome-wide mutation rate decreases as GBH concentration increases, suggesting that even long-term GBH exposure does not compromise the genome stability of bacteria. Copyright © 2017 Tincher et al.
Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows.

PubMed

Sztromwasser, Pawel; Puntervoll, Pål; Petersen, Kjell

2011-07-26

Biological databases and computational biology tools are provided by research groups around the world, and made accessible on the Web. Combining these resources is a common practice in bioinformatics, but integration of heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has been commonly addressed in a pragmatic way, by tedious and error-prone scripting. Recently, however, a more reliable technique has been identified and proposed as the platform that would tie together bioinformatics resources, namely Web Services. In the last decade, Web Services have spread widely in bioinformatics and earned the title of recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data-partitioning lowers resource demands of services and increases their throughput, which in consequence allows in-silico experiments to be executed on genome scale using standard SOAP Web Services and workflows. As a proof-of-principle we annotated an RNA-seq dataset using a plain BPEL workflow engine.
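The data-partitioning idea can be sketched in a few lines of Python. This is only an illustration of the communication pattern under stated assumptions: `gc_content_service` is a hypothetical local stand-in for a remote SOAP operation, and `partition` and `run_partitioned` are illustrative names, not part of the authors' workflow.

```python
from typing import Callable, Iterable, Iterator, List

def partition(records: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive chunks of at most chunk_size records."""
    chunk: List[str] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk

def run_partitioned(records: Iterable[str],
                    service: Callable[[List[str]], List[float]],
                    chunk_size: int) -> Iterator[float]:
    """Send the data to the service one chunk at a time and stream results back."""
    for chunk in partition(records, chunk_size):
        yield from service(chunk)

def gc_content_service(chunk: List[str]) -> List[float]:
    """Hypothetical stand-in for a remote annotation call: GC fraction per sequence."""
    return [(seq.count("G") + seq.count("C")) / len(seq) for seq in chunk]

results = list(run_partitioned(["GGCC", "ATAT", "GCAT"], gc_content_service, chunk_size=2))
```

Because each request carries only one chunk, neither the orchestrator nor the service has to hold the full dataset in a single message, which is the resource saving the abstract reports.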
Aquaculture genomics, genetics and breeding in the United States: current status, challenges, and priorities for future research.

PubMed

Abdelrahman, Hisham; ElHady, Mohamed; Alcivar-Warren, Acacia; Allen, Standish; Al-Tobasei, Rafet; Bao, Lisui; Beck, Ben; Blackburn, Harvey; Bosworth, Brian; Buchanan, John; Chappell, Jesse; Daniels, William; Dong, Sheng; Dunham, Rex; Durland, Evan; Elaswad, Ahmed; Gomez-Chiarri, Marta; Gosh, Kamal; Guo, Ximing; Hackett, Perry; Hanson, Terry; Hedgecock, Dennis; Howard, Tiffany; Holland, Leigh; Jackson, Molly; Jin, Yulin; Khalil, Karim; Kocher, Thomas; Leeds, Tim; Li, Ning; Lindsey, Lauren; Liu, Shikai; Liu, Zhanjiang; Martin, Kyle; Novriadi, Romi; Odin, Ramjie; Palti, Yniv; Peatman, Eric; Proestou, Dina; Qin, Guyu; Reading, Benjamin; Rexroad, Caird; Roberts, Steven; Salem, Mohamed; Severin, Andrew; Shi, Huitong; Shoemaker, Craig; Stiles, Sheila; Tan, Suxu; Tang, Kathy F J; Thongda, Wilawan; Tiersch, Terrence; Tomasso, Joseph; Prabowo, Wendy Tri; Vallejo, Roger; van der Steen, Hein; Vo, Khoi; Waldbieser, Geoff; Wang, Hanping; Wang, Xiaozhu; Xiang, Jianhai; Yang, Yujia; Yant, Roger; Yuan, Zihao; Zeng, Qifan; Zhou, Tao

2017-02-20

Advancing the production efficiency and profitability of aquaculture is dependent upon the ability to utilize a diverse array of genetic resources. The ultimate goals of aquaculture genomics, genetics and breeding research are to enhance aquaculture production efficiency, sustainability, product quality, and profitability in support of the commercial sector and for the benefit of consumers. In order to achieve these goals, it is important to understand the genomic structure and organization of aquaculture species, and their genomic and phenomic variations, as well as the genetic basis of traits and their interrelationships. In addition, it is also important to understand the mechanisms of regulation and evolutionary conservation at the levels of genome, transcriptome, proteome, epigenome, and systems biology. With genomic information and information between the genomes and phenomes, technologies for marker/causal mutation-assisted selection, genome selection, and genome editing can be developed for applications in aquaculture. A set of genomic tools and resources must be made available including reference genome sequences and their annotations (including coding and non-coding regulatory elements), genome-wide polymorphic markers, efficient genotyping platforms, high-density and high-resolution linkage maps, and transcriptome resources including non-coding transcripts. Genomic and genetic control of important performance and production traits, such as disease resistance, feed conversion efficiency, growth rate, processing yield, behaviour, reproductive characteristics, and tolerance to environmental stressors like low dissolved oxygen, high or low water temperature and salinity, must be understood. QTL need to be identified, validated across strains, lines and populations, and their mechanisms of control understood. Causal gene(s) need to be identified. 
Genetic and epigenetic regulation of important aquaculture traits needs to be determined, and technologies for marker-assisted selection, causal gene/mutation-assisted selection, genome selection, and genome editing using CRISPR and other technologies must be developed, demonstrated to be applicable, and applied to aquaculture industries. Major progress has been made in aquaculture genomics for dozens of fish and shellfish species including the development of genetic linkage maps, physical maps, microarrays, single nucleotide polymorphism (SNP) arrays, transcriptome databases and various stages of genome reference sequences. This paper provides a general review of the current status, challenges and future research needs of aquaculture genomics, genetics, and breeding, with a focus on major aquaculture species in the United States: catfish, rainbow trout, Atlantic salmon, tilapia, striped bass, oysters, and shrimp. While the overall research priorities and the practical goals are similar across various aquaculture species, the current status in each species should dictate the next priority areas within the species. This paper is an output of the USDA Workshop for Aquaculture Genomics, Genetics, and Breeding held in late March 2016 in Auburn, Alabama, with participants from all parts of the United States.
SPlinted Ligation Adapter Tagging (SPLAT), a novel library preparation method for whole genome bisulphite sequencing

PubMed Central

Manlig, Erika; Wahlberg, Per

2017-01-01

Sodium bisulphite treatment of DNA combined with next generation sequencing (NGS) is a powerful combination for the interrogation of genome-wide DNA methylation profiles. Library preparation for whole genome bisulphite sequencing (WGBS) is challenging due to side effects of the bisulphite treatment, which lead to extensive DNA damage. Recently, a new generation of methods for bisulphite sequencing library preparation has been devised. They are based on initial bisulphite treatment of the DNA, followed by adaptor tagging of single stranded DNA fragments, and enable WGBS using low quantities of input DNA. In this study, we present a novel approach for quick and cost effective WGBS library preparation that is based on splinted adaptor tagging (SPLAT) of bisulphite-converted single-stranded DNA. Moreover, we validate SPLAT against three commercially available WGBS library preparation techniques, two of which are based on bisulphite treatment prior to adaptor tagging and one of which is a conventional WGBS method. PMID:27899585
Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication

PubMed Central

Wu, G. Albert; Prochnik, Simon; Jenkins, Jerry; Salse, Jerome; Hellsten, Uffe; Murat, Florent; Perrier, Xavier; Ruiz, Manuel; Scalabrin, Simone; Terol, Javier; Takita, Marco Aurélio; Labadie, Karine; Poulain, Julie; Couloux, Arnaud; Jabbari, Kamel; Cattonaro, Federica; Del Fabbro, Cristian; Pinosio, Sara; Zuccolo, Andrea; Chapman, Jarrod; Grimwood, Jane; Tadeo, Francisco R.; Estornell, Leandro H.; Muñoz-Sanz, Juan V.; Ibanez, Victoria; Herrero-Ortega, Amparo; Aleza, Pablo; Pérez-Pérez, Julián; Ramón, Daniel; Brunel, Dominique; Luro, François; Chen, Chunxian; Farmerie, William G.; Desany, Brian; Kodira, Chinnappa; Mohiuddin, Mohammed; Harkins, Tim; Fredrikson, Karin; Burns, Paul; Lomsadze, Alexandre; Borodovsky, Mark; Reforgiato, Giuseppe; Freitas-Astúa, Juliana; Quetier, Francis; Navarro, Luis; Roose, Mikeal; Wincker, Patrick; Schmutz, Jeremy; Morgante, Michele; Machado, Marcos Antonio; Talon, Manuel; Jaillon, Olivier; Ollitrault, Patrick; Gmitter, Frederick; Rokhsar, Daniel

2014-01-01

The domestication of citrus is poorly understood. Cultivated types are selections from, or hybrids of, wild progenitor species, whose identities and contributions remain controversial. By comparative analysis of a collection of citrus genomes, including a high quality haploid reference, we show that cultivated types were derived from two progenitor species. Though cultivated pummelos represent selections from a single progenitor species, C. maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species, C. reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, implying that wild mandarins were part of the early breeding germplasm. A wild “mandarin” from China exhibited substantial divergence from C. reticulata, suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and enables sequence-directed genetic improvement. PMID:24908277
The epigenomic interface between genome and environment in common complex diseases.

PubMed

Bell, Christopher G; Beck, Stephan

2010-12-01

The epigenome plays a pivotal role as the interface between genome and environment. True genome-wide assessments of epigenetic marks, such as DNA methylation (methylomes) or chromatin modifications (chromatinomes), are now possible, either through high-throughput arrays or increasingly by second-generation DNA sequencing methods. The ability to collect these data at this level of resolution enables us to pose detailed questions and interrogate this information with regard to changes that occur due to development, lineage and tissue-specificity, and significantly those caused by environmental influences, such as ageing, stress, diet, hormones or toxins. Common complex traits are under variable levels of genetic influence and, additionally, epigenetic effect. The detection of pathological epigenetic alterations will reveal additional insights into their aetiology and how possible environmental modulation of this mechanism may occur. Due to the reversibility of these marks, the potential for sequence-specific targeted therapeutics exists. This review surveys recent epigenomic advances and their current and prospective application to the study of common diseases.
A Transcription Activator-Like Effector (TALE) Toolbox for Genome Engineering

PubMed Central

Sanjana, Neville E.; Cong, Le; Zhou, Yang; Cunniff, Margaret M.; Feng, Guoping; Zhang, Feng

2013-01-01

Transcription activator-like effectors (TALEs) are a class of naturally occurring DNA binding proteins found in the plant pathogen Xanthomonas sp. The DNA binding domain of each TALE consists of tandem 34-amino acid repeat modules that can be rearranged according to a simple cipher to target new DNA sequences. Customized TALEs can be used for a wide variety of genome engineering applications, including transcriptional modulation and genome editing. Here we describe a toolbox for rapid construction of custom TALE transcription factors (TALE-TFs) and nucleases (TALENs) using a hierarchical ligation procedure. This toolbox facilitates affordable and rapid construction of custom TALE-TFs and TALENs within one week and can be easily scaled up to construct TALEs for multiple targets in parallel. We also provide details for testing the activity in mammalian cells of custom TALE-TFs and TALENs using, respectively, qRT-PCR and Surveyor nuclease. The TALE toolbox described here will enable a broad range of biological applications. PMID:22222791
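The "simple cipher" mentioned in this abstract can be made concrete with a short Python sketch. The mapping below is the widely reported repeat-variable diresidue (RVD) code (NI for A, HD for C, NG for T, NN for G, with NN also known to bind A); it is supplied here as background rather than taken from the abstract, and the function name `rvds_for_target` is hypothetical. The sketch ignores practical design constraints such as flanking-base preferences and repeat-array length.

```python
# Commonly cited RVD code: each 34-amino-acid repeat recognizes one base,
# and the two hypervariable residues (the RVD) determine which base.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}

def rvds_for_target(target: str) -> list:
    """Return the ordered list of RVDs for a DNA target sequence (5' to 3')."""
    rvds = []
    for base in target.upper():
        if base not in RVD_FOR_BASE:
            raise ValueError(f"unsupported base: {base!r}")
        rvds.append(RVD_FOR_BASE[base])
    return rvds

design = rvds_for_target("ACGTTGCA")
```

Each entry in `design` corresponds to one repeat module, so the length of the list equals the number of modules to assemble for that target.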
bwtool: a tool for bigWig files

PubMed Central

Pohl, Andy; Beato, Miguel

2014-01-01

BigWig files are a compressed, indexed, binary format for genome-wide signal data for calculations (e.g. GC percent) or experiments (e.g. ChIP-seq/RNA-seq read depth). bwtool is a tool designed to read bigWig files rapidly and efficiently, providing functionality for extracting data and summarizing it in several ways, globally or at specific regions. Additionally, the tool enables the conversion of the positions of signal data from one genome assembly to another, also known as ‘lifting’. We believe bwtool can be useful for the analyst frequently working with bigWig data, which is becoming a standard format to represent functional signals along genomes. The article includes supplementary examples of running the software. Availability and implementation: The C source code is freely available under the GNU public license v3 at http://cromatina.crg.eu/bwtool. Contact: andrew.pohl@crg.eu, andypohl@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24489365
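To show what summarizing signal at a specific region means, the following Python sketch computes a coverage-weighted mean over in-memory intervals. It does not read bigWig files and is not bwtool's implementation; the `(start, end, value)` tuple layout and the function name `region_mean` are assumptions made for illustration.

```python
def region_mean(intervals, start, end):
    """Mean signal over the half-open region [start, end).

    intervals: non-overlapping (start, end, value) tuples with half-open
    coordinates, similar in spirit to bedGraph rows. Bases without data are
    ignored; None is returned when the region has no data at all.
    """
    total = 0.0
    covered = 0
    for s, e, value in intervals:
        overlap = min(e, end) - max(s, start)
        if overlap > 0:
            total += value * overlap   # weight each value by the bases it covers
            covered += overlap
    return total / covered if covered else None

# Toy signal track (made-up values) with a gap between positions 20 and 30.
signal = [(0, 10, 1.0), (10, 20, 3.0), (30, 40, 5.0)]
mean_5_15 = region_mean(signal, 5, 15)
mean_gap = region_mean(signal, 20, 30)
```

Whether uncovered bases should be skipped or counted as zero is a design choice that changes the result; this sketch skips them.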
Using Genome Sequence to Enable the Design of Medicines and Chemical Probes.

PubMed

Angelbello, Alicia J; Chen, Jonathan L; Childs-Disney, Jessica L; Zhang, Peiyuan; Wang, Zi-Fu; Disney, Matthew D

2018-02-28

Rapid progress in genome sequencing technology has put us firmly into a postgenomic era. A key challenge in biomedical research is harnessing genome sequence to fulfill the promise of personalized medicine. This Review describes how genome sequencing has enabled the identification of disease-causing biomolecules and how these data have been converted into chemical probes of function, preclinical lead modalities, and ultimately U.S. Food and Drug Administration (FDA)-approved drugs. In particular, we focus on the use of oligonucleotide-based modalities to target disease-causing RNAs; small molecules that target DNA, RNA, or protein; the rational repurposing of known therapeutic modalities; and the advantages of pharmacogenetics. Lastly, we discuss the remaining challenges and opportunities in the direct utilization of genome sequence to enable design of medicines.
Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM)

PubMed Central

Skinnider, Michael A.; Dejong, Chris A.; Rees, Philip N.; Johnston, Chad W.; Li, Haoxin; Webster, Andrew L. H.; Wyatt, Morgan A.; Magarvey, Nathan A.

2015-01-01

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/. PMID:26442528
Genome-wide Fitness Profiles Reveal a Requirement for Autophagy During Yeast Fermentation

PubMed Central

Piggott, Nina; Cook, Michael A.; Tyers, Mike; Measday, Vivien

2011-01-01

The ability of cells to respond to environmental changes and adapt their metabolism enables cell survival under stressful conditions. The budding yeast Saccharomyces cerevisiae (S. cerevisiae) is particularly well adapted to the harsh conditions of anaerobic wine fermentation. However, S. cerevisiae gene function has not been previously systematically interrogated under conditions of industrial fermentation. We performed a genome-wide study of essential and nonessential S. cerevisiae gene requirements during grape juice fermentation to identify deletion strains that are either depleted or enriched within the viable fermentative population. Genes that function in autophagy and ubiquitin-proteasome degradation are required for optimal survival during fermentation, whereas genes that function in ribosome assembly and peroxisome biogenesis impair fitness during fermentation. We also uncover fermentation phenotypes for 139 uncharacterized genes with no previously known cellular function. We demonstrate that autophagy is induced early in wine fermentation in a nitrogen-replete environment, suggesting that autophagy may be triggered by other forms of stress that arise during fermentation. These results provide insights into the complex fermentation process and suggest possible means for improvement of industrial fermentation strains. PMID:22384346
Genome-wide analysis of differential transcriptional and epigenetic variability across human immune cell types.

PubMed

Ecker, Simone; Chen, Lu; Pancaldi, Vera; Bagger, Frederik O; Fernández, José María; Carrillo de Santa Pau, Enrique; Juan, David; Mann, Alice L; Watt, Stephen; Casale, Francesco Paolo; Sidiropoulos, Nikos; Rapin, Nicolas; Merkel, Angelika; Stunnenberg, Hendrik G; Stegle, Oliver; Frontini, Mattia; Downes, Kate; Pastinen, Tomi; Kuijpers, Taco W; Rico, Daniel; Valencia, Alfonso; Beck, Stephan; Soranzo, Nicole; Paul, Dirk S

2017-01-26

A healthy immune system requires immune cells that adapt rapidly to environmental challenges. This phenotypic plasticity can be mediated by transcriptional and epigenetic variability. We apply a novel analytical approach to measure and compare transcriptional and epigenetic variability genome-wide across CD14+CD16− monocytes, CD66b+CD16+ neutrophils, and CD4+CD45RA+ naïve T cells from the same 125 healthy individuals. We discover substantially increased variability in neutrophils compared to monocytes and T cells. In neutrophils, genes with hypervariable expression are found to be implicated in key immune pathways and are associated with cellular properties and environmental exposure. We also observe increased sex-specific gene expression differences in neutrophils. Neutrophil-specific DNA methylation hypervariable sites are enriched at dynamic chromatin regions and active enhancers. Our data highlight the importance of transcriptional and epigenetic variability for the key role of neutrophils as the first responders to inflammatory stimuli. We provide a resource to enable further functional studies into the plasticity of immune cells, which can be accessed from: http://blueprint-dev.bioinfo.cnio.es/WP10/hypervariability.
Enrichment methods provide a feasible approach to comprehensive and adequately powered investigations of the brain methylome

PubMed Central

Chan, Robin F.; Shabalin, Andrey A.; Xie, Lin Y.; Adkins, Daniel E.; Zhao, Min; Turecki, Gustavo; Clark, Shaunna L.; Aberg, Karolina A.

2017-01-01

Methylome-wide association studies are typically performed using microarray technologies that only assay a very small fraction of the CG methylome and entirely miss two forms of methylation that are common in brain and likely of particular relevance for neuroscience and psychiatric disorders. The alternative is to use whole genome bisulfite (WGB) sequencing, but this approach is not yet practically feasible with the sample sizes required for adequate statistical power. We argue for revisiting methylation enrichment methods that, provided optimal protocols are used, enable comprehensive, adequately powered and cost-effective genome-wide investigations of the brain methylome. To support our claim we use data showing that enrichment methods approximate the sensitivity obtained with WGB methods, with slightly better specificity. However, this performance is achieved at <5% of the reagent costs. Furthermore, because many more samples can be sequenced simultaneously, projects can be completed about 15 times faster. Currently the only viable option available for comprehensive brain methylome studies, enrichment methods may be critical for moving the field forward. PMID:28334972

High-throughput alternative splicing detection using dually constrained correspondence analysis (DCCA).

PubMed

Baty, Florent; Klingbiel, Dirk; Zappa, Francesco; Brutsche, Martin

2015-12-01

Alternative splicing is an important component of tumorigenesis. The recent advent of exon array technology enables the detection of alternative splicing at a genome-wide scale. The analysis of high-throughput alternative splicing is not yet standard and methodological developments are still needed. We propose a novel statistical approach, Dually Constrained Correspondence Analysis (DCCA), for the detection of splicing changes in exon array data. Using this methodology, we investigated the genome-wide alteration of alternative splicing in patients with non-small cell lung cancer treated by bevacizumab/erlotinib. Splicing candidates reveal a series of genes related to carcinogenesis (SFTPB), cell adhesion (STAB2, PCDH15, HABP2), tumor aggressiveness (ARNTL2), apoptosis, proliferation and differentiation (PDE4D, FLT3, IL1R2), cell invasion (ETV1), as well as tumor growth (OLFM4, FGF14), tumor necrosis (AFF3) or tumor suppression (TUSC3, CSMD1, RHOBTB2, SERPINB5), with indication of known alternative splicing in a majority of genes. DCCA facilitates the identification of putative biologically relevant alternative splicing events in high-throughput exon array data. Copyright © 2015 Elsevier Inc. All rights reserved.
Genome-wide association mapping identifies multiple loci for a canine SLE-related disease complex.

PubMed

Wilbe, Maria; Jokinen, Päivi; Truvé, Katarina; Seppala, Eija H; Karlsson, Elinor K; Biagi, Tara; Hughes, Angela; Bannasch, Danika; Andersson, Göran; Hansson-Hamlin, Helene; Lohi, Hannes; Lindblad-Toh, Kerstin

2010-03-01

The unique canine breed structure makes dogs an excellent model for studying genetic diseases. Within a dog breed, linkage disequilibrium is extensive, enabling genome-wide association (GWA) with only around 15,000 SNPs and fewer individuals than in human studies. Incidences of specific diseases are elevated in different breeds, indicating that a few genetic risk factors might have accumulated through drift or selective breeding. In this study, a GWA study with 81 affected dogs (cases) and 57 controls from the Nova Scotia duck tolling retriever breed identified five loci associated with a canine systemic lupus erythematosus (SLE)-related disease complex that includes both antinuclear antibody (ANA)-positive immune-mediated rheumatic disease (IMRD) and steroid-responsive meningitis-arteritis (SRMA). Fine mapping with twice as many dogs validated these loci. Our results indicate that the homogeneity of strong genetic risk factors within dog breeds allows multigenic disorders to be mapped with fewer than 100 cases and 100 controls, making dogs an excellent model in which to identify pathways involved in human complex diseases.
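A single-SNP case-control association test of the kind used in GWA studies can be sketched as follows. The abstract does not state which test statistic the authors used, so the allelic chi-square test below is an assumed, commonly used choice; the function name and the allele counts are hypothetical, not data from the study.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). For 1 df the p-value is erfc(sqrt(chi2 / 2)).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a zero margin carries no information about association
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# 81 cases and 57 controls give 162 and 114 alleles at a biallelic SNP.
# The split between alternate and reference alleles below is made up.
chi2, p_value = allelic_chi_square(case_alt=90, case_ref=72,
                                   control_alt=40, control_ref=74)
```

With roughly 15,000 SNPs tested, a multiple-testing correction is needed before calling a locus associated; a Bonferroni threshold of 0.05 / 15,000 is one conservative option, though the study may have used a different procedure.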
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits

PubMed Central

van Zanten, Martijn

2015-01-01

Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation. PMID:26496492
Structural Analysis of Biodiversity

PubMed Central

Sirovich, Lawrence; Stoeckle, Mark Y.; Zhang, Yu

2010-01-01

Large, recently available genomic databases cover a wide range of life forms, suggesting an opportunity for insights into the genetic structure of biodiversity. In this study we refine our recently described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17000 DNA barcode sequences covering 12 widely separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information density single-page displays. These results support application of indicator vectors for comparative analysis of large nucleotide data sets and raise the prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity. PMID:20195371
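One plausible reading of the indicator vector approach is sketched below in Python. It assumes aligned sequences of equal length, encodes each position as a four-element 0/1 block, averages the vectors within a group, and assigns a test sequence to the group with the largest dot product. The function names, the toy sequences, and the dot-product scoring rule are assumptions for illustration; the published algorithm may normalize or score differently.

```python
BASES = "ACGT"

def indicator_vector(seq):
    """Encode a sequence as a flat 0/1 vector with four entries per position."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def mean_vector(vectors):
    """Average indicator vectors; each entry becomes a per-position base frequency."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify(seq, group_means):
    """Assign seq to the group whose mean vector has the largest dot product with it."""
    v = indicator_vector(seq)
    return max(group_means,
               key=lambda name: sum(x * y for x, y in zip(v, group_means[name])))

# Toy aligned "barcodes" for two made-up groups.
groups = {"taxon_1": ["ACGT", "ACGA"], "taxon_2": ["TTGT", "TTGA"]}
group_means = {name: mean_vector([indicator_vector(s) for s in seqs])
               for name, seqs in groups.items()}
label = classify("ACGT", group_means)
```

Because the group mean stores the frequency of each base at each position, it retains the per-position character probabilities that the abstract contrasts with standard distance metrics.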
77 FR 43237 - Genome in a Bottle Consortium-Work Plan Review Workshop

Federal Register 2010, 2011, 2012, 2013, 2014

2012-07-24

... in human whole genome variant calls. A principal motivation for this consortium is to enable science-based regulatory oversight of clinical...
Genome-wide screening and identification of antigens for rickettsial vaccine development

USDA-ARS's Scientific Manuscript database

The capacity to identify immunogens for vaccine development by genome-wide screening has been markedly enhanced by the availability of complete microbial genome sequences coupled to rapid proteomic and bioinformatic analysis. Critical to this genome-wide screening is in vivo testing in the context o...
Chromosomal Mapping of Canine-Derived BAC Clones to the Red Fox and American Mink Genomes

PubMed Central

Vorobieva, Nadegda V.; Beklemisheva, Violetta R.; Johnson, Jennifer L.; Temnykh, Svetlana V.; Yudkin, Dmitry V.; Trut, Lyudmila N.; Andre, Catherine; Galibert, Francis; Aguirre, Gustavo D.; Acland, Gregory M.; Graphodatsky, Alexander S.

2009-01-01

High-quality sequencing of the dog (Canis lupus familiaris) genome has enabled enormous progress in genetic mapping of canine phenotypic variation. The red fox (Vulpes vulpes), another canid species, also exhibits a wide range of variation in coat color, morphology, and behavior. Although the fox genome has not yet been sequenced, canine genomic resources have been used to construct a meiotic linkage map of the red fox genome and begin genetic mapping in foxes. However, a more detailed gene-specific comparative map between the dog and fox genomes is required to establish gene order within homologous regions of dog and fox chromosomes and to refine breakpoints between homologous chromosomes of the 2 species. In the current study, we tested whether canine-derived gene–containing bacterial artificial chromosome (BAC) clones can be routinely used to build a gene-specific map of the red fox genome. Forty canine BAC clones were mapped to the red fox genome by fluorescence in situ hybridization (FISH). Each clone was uniquely assigned to a single fox chromosome, and the locations of 38 clones agreed with cytogenetic predictions. These results clearly demonstrate the utility of FISH mapping for construction of a whole-genome gene-specific map of the red fox. The further possibility of using canine BAC clones to map genes in the American mink (Mustela vison) genome was also explored. Much lower success was obtained for this more distantly related farm-bred species, although a few BAC clones were mapped to the predicted chromosomal locations. PMID:19546120
Chromosomal mapping of canine-derived BAC clones to the red fox and American mink genomes.

PubMed

Kukekova, Anna V; Vorobieva, Nadegda V; Beklemisheva, Violetta R; Johnson, Jennifer L; Temnykh, Svetlana V; Yudkin, Dmitry V; Trut, Lyudmila N; Andre, Catherine; Galibert, Francis; Aguirre, Gustavo D; Acland, Gregory M; Graphodatsky, Alexander S

2009-01-01

High-quality sequencing of the dog (Canis lupus familiaris) genome has enabled enormous progress in genetic mapping of canine phenotypic variation. The red fox (Vulpes vulpes), another canid species, also exhibits a wide range of variation in coat color, morphology, and behavior. Although the fox genome has not yet been sequenced, canine genomic resources have been used to construct a meiotic linkage map of the red fox genome and begin genetic mapping in foxes. However, a more detailed gene-specific comparative map between the dog and fox genomes is required to establish gene order within homologous regions of dog and fox chromosomes and to refine breakpoints between homologous chromosomes of the 2 species. In the current study, we tested whether canine-derived gene-containing bacterial artificial chromosome (BAC) clones can be routinely used to build a gene-specific map of the red fox genome. Forty canine BAC clones were mapped to the red fox genome by fluorescence in situ hybridization (FISH). Each clone was uniquely assigned to a single fox chromosome, and the locations of 38 clones agreed with cytogenetic predictions. These results clearly demonstrate the utility of FISH mapping for construction of a whole-genome gene-specific map of the red fox. The further possibility of using canine BAC clones to map genes in the American mink (Mustela vison) genome was also explored. Much lower success was obtained for this more distantly related farm-bred species, although a few BAC clones were mapped to the predicted chromosomal locations.
Horizon scanning for new genomic tests.

PubMed

Gwinn, Marta; Grossniklaus, Daurice A; Yu, Wei; Melillo, Stephanie; Wulf, Anja; Flome, Jennifer; Dotson, W David; Khoury, Muin J

2011-02-01

The development of health-related genomic tests is decentralized and dynamic, involving government, academic, and commercial entities. Consequently, it is not easy to determine which tests are in development, currently available, or discontinued. We developed and assessed the usefulness of a systematic approach to identifying new genomic tests on the Internet. We devised targeted queries of Web pages, newspaper articles, and blogs (Google Alerts) to identify new genomic tests. We finalized search and review procedures during a pilot phase that ended in March 2010. Queries continue to run daily and are compiled weekly; selected data are indexed in an online database, the Genomic Applications in Practice and Prevention Finder. After the pilot phase, our scan detected approximately two to three new genomic tests per week. Nearly two thirds of all tests (122/188, 65%) were related to cancer; only 6% were related to hereditary disorders. Although 88 (47%) of the tests, including 2 marketed directly to consumers, were commercially available, only 12 (6%) claimed United States Food and Drug Administration licensure. Systematic surveillance of the Internet provides information about genomic tests that can be used in combination with other resources to evaluate genomic tests. The Genomic Applications in Practice and Prevention Finder makes this information accessible to a wide group of stakeholders.
A preliminary qualitative exploration of dietitians' engagement with genetics and nutritional genomics: perspectives from international leaders.

PubMed

Li, Sherly X; Collins, Jorja; Lawson, Stephanie; Thomas, Jane; Truby, Helen; Whelan, Kevin; Palermo, Claire

2014-01-01

This qualitative study explored the underlying determinants of dietitians' current practice and attitudes about nutritional genomics. Sixteen semi-structured interviews were conducted with international leaders selected across each domain of dietetics practice from Australia (n=8) and the United Kingdom (n=8). Interviews explored knowledge, involvement, perceived role, and attitudes about the benefits and barriers of genetics and nutritional genomics. Interviews were transcribed and analysed using thematic analysis. Five key themes were identified: (i) acknowledgment that there are wide applications for nutritional genomics; (ii) a general lack of awareness of nutritional genomics that underlies a knowledge, skills, and confidence gap; (iii) dietitians are patient-orientated and thus are receptive to the public's needs; (iv) the legitimacy of commercialised nutritional genomics products and services; and (v) prioritisation of nutritional genomics amongst other practice-related commitments as well as the influence of the workplace setting. In order for healthcare services to prepare for the application of nutritional genomics, these social, political, attitudinal, and awareness issues amongst dietitians need to be addressed. Further education in nutritional genomics may help to build awareness, continued research is crucial in determining utility, whilst establishing a healthcare system that supports and rewards this approach may cultivate its adoption.
Genome-wide comparison of paired fresh frozen and formalin-fixed paraffin-embedded gliomas by custom BAC and oligonucleotide array comparative genomic hybridization: facilitating analysis of archival gliomas

PubMed Central

Mohapatra, Gayatry; Engler, David A.; Starbuck, Kristen D.; Kim, James C.; Bernay, Derek C.; Scangas, George A.; Rousseau, Audrey; Batchelor, Tracy T.; Betensky, Rebecca A.; Louis, David N.

2010-01-01

Molecular genetic analysis of cancer is rapidly evolving as a result of improvement in genomic technologies and the growing applicability of such analyses to clinical oncology. Array based comparative genomic hybridization (aCGH) is a powerful tool for detecting DNA copy number alterations (CNA), particularly in solid tumors, and has been applied to the study of malignant gliomas. In the clinical setting, however, gliomas are often sampled by small biopsies and thus formalin-fixed paraffin-embedded (FFPE) blocks are often the only tissue available for genetic analysis, especially for rare types of gliomas. Moreover, the biological basis for the marked intratumoral heterogeneity in gliomas is most readily addressed in FFPE material. Therefore, for gliomas, the ability to use DNA from FFPE tissue is essential for both clinical and research applications. In this study, we have constructed a custom bacterial artificial chromosome (BAC) array and show excellent sensitivity and specificity for detecting CNAs in a panel of paired frozen and FFPE glioma samples. Our study demonstrates a high concordance rate between CNAs detected in FFPE compared to frozen DNA. We have also developed a method of labeling DNA from FFPE tissue that allows efficient hybridization to oligonucleotide arrays. This labeling technique was applied to a panel of biphasic anaplastic oligoastrocytomas (AOA) to identify genetic changes unique to each component. Together, results from these studies suggest that BAC and oligonucleotide aCGH are sensitive tools for detecting CNAs in FFPE DNA, and can enable genome-wide analysis of rare, small and/or histologically heterogeneous gliomas. PMID:21080181
Genome-wide identification and characterization of NB-ARC resistant genes in wheat (Triticum aestivum L.) and their expression during leaf rust infection.

PubMed

Chandra, Saket; Kazmi, Andaleeb Z; Ahmed, Zainab; Roychowdhury, Gargi; Kumari, Veena; Kumar, Manish; Mukhopadhyay, Kunal

2017-07-01

NB-ARC domain-containing resistance genes from the wheat genome were identified, characterized and localized on chromosome arms that displayed differential yet positive response during incompatible and compatible leaf rust interactions. Wheat (Triticum aestivum L.) is an important cereal crop; however, its production is affected severely by numerous diseases including rusts. An efficient, cost-effective and ecologically viable approach to control pathogens is through host resistance. In wheat, high numbers of resistance loci are present but only a few have been identified and cloned. A comprehensive analysis of the NB-ARC-containing genes in the complete wheat genome was accomplished in this study. Complete NB-ARC encoding genes were mined from the Ensembl Plants database to predict 604 NB-ARC containing sequences using the HMM approach. Genome-wide analysis of orthologous clusters in the NB-ARC-containing sequences of wheat and other members of the Poaceae family revealed maximum homology with Oryza sativa indica and Brachypodium distachyon. The identification of overlap between orthologous clusters enabled the elucidation of the function and evolution of resistance proteins. The distributions of the NB-ARC domain-containing sequences were found to be balanced among the three wheat sub-genomes. Wheat chromosome arms 4AL and 7BL had the most NB-ARC domain-containing contigs. The spatio-temporal expression profiling studies exemplified the positive role of these genes in resistant and susceptible wheat plants during incompatible and compatible interaction in response to the leaf rust pathogen Puccinia triticina. Two NB-ARC domain-containing sequences were modelled in silico, cloned and sequenced to analyze their fine structures. The data obtained in this study will augment isolation, characterization and application of NB-ARC resistance genes in marker-assisted selection based breeding programs for improving rust resistance in wheat.
Genome-Wide Association Mapping for Intelligence in Military Working Dogs: Canine Cohort, Canine Intelligence Assessment Regimen, Genome-Wide Single Nucleotide Polymorphism (SNP) Typing, and Unsupervised Classification Algorithm for Genome-Wide Association Data Analysis

DTIC Science & Technology

2011-09-01

Almasy, L, Blangero, J. (2009) Human QTL linkage mapping. Genetica 136:333-340. Amos, CI. (2007) Successful design and conduct of genome-wide...quantitative trait loci. Genetica 136:237-243. Skol AD, Scott LJ, Abecasis GR, Boehnke M. (2006) Joint analysis is more efficient than replication
Emergent Self-Organized Criticality in Gene Expression Dynamics: Temporal Development of Global Phase Transition Revealed in a Cancer Cell Line

PubMed Central

Tsuchiya, Masa; Giuliani, Alessandro; Hashimoto, Midori; Erenpreisa, Jekaterina; Yoshikawa, Kenichi

2015-01-01

Background The underlying mechanism of dynamic control of the genome-wide expression is a fundamental issue in bioscience. We addressed it in terms of phase transition by a systemic approach based on both density analysis and characteristics of temporal fluctuation for the time-course mRNA expression in differentiating MCF-7 breast cancer cells. Methodology In a recent work, we suggested criticality as an essential aspect of dynamic control of genome-wide gene expression. Criticality was evident by a unimodal-bimodal transition through flattened unimodal expression profile. The flatness on the transition suggests the existence of a critical transition at which up- and down-regulated expression is balanced. Mean field (averaging) behavior of mRNAs based on the temporal expression changes reveals a sandpile type of transition in the flattened profile. Furthermore, around the transition, a self-similar unimodal-bimodal transition of the whole expression occurs in the density profile of an ensemble of mRNA expression. These singular and scaling behaviors identify the transition as the expression phase transition driven by self-organized criticality (SOC). Principal Findings Emergent properties of SOC through a mean field approach are revealed: i) SOC, as a form of genomic phase transition, consolidates distinct critical states of expression, ii) Coupling of coherent stochastic oscillations between critical states on different time-scales gives rise to SOC, and iii) Specific gene clusters (barcode genes) ranging in size from kbp to Mbp reveal similar SOC to genome-wide mRNA expression and ON-OFF synchronization to critical states. This suggests that the cooperative gene regulation of topological genome sub-units is mediated by the coherent phase transitions of megadomain-scaled conformations between compact and swollen chromatin states. 
Conclusion and Significance In summary, our study provides not only a systemic method to demonstrate SOC in whole-genome expression, but also introduces novel, physically grounded concepts for a breakthrough in the study of biological regulation. PMID:26067993
The Genetic Basis of Psoriasis

PubMed Central

Capon, Francesca

2017-01-01

Psoriasis is widely regarded as a multifactorial condition which is caused by the interaction between inherited susceptibility alleles and environmental triggers. In the last decade, technological advances have enabled substantial progress in the understanding of disease genetics. Genome-wide association studies have identified more than 60 disease susceptibility regions, highlighting the pathogenic involvement of genes related to Th17 cell activation. This pathway has now been targeted by a new generation of biologics that have shown great efficacy in clinical trials. At the same time, the study of rare variants of psoriasis has identified interleukin (IL)-36 cytokines as important amplifiers of Th17 signaling and promising targets for therapeutic intervention. Here, we review these exciting discoveries, which highlight the translational potential of genetic studies. PMID:29186830
Body Area Network BAN--a key infrastructure element for patient-centered medical applications.

PubMed

Schmidt, Robert; Norgall, Thomas; Mörsdorf, Joachim; Bernhard, Josef; von der Grün, Thomas

2002-01-01

The Body Area Network (BAN) concept enables wireless communication between several miniaturized, intelligent Body Sensor (or actor) Units (BSU) and a single Body Central Unit (BCU) worn at the human body. A separate wireless transmission link from the BCU to a network access point--using different technology--provides for online access to BAN data via usual network infrastructure. BAN is expected to become a basic infrastructure element for service-based electronic health assistance: By integrating patient-attached sensors and control of mobile dedicated actor units, the range of medical workflow can be extended by wireless patient monitoring and therapy support. Beyond clinical use, professional disease management environments, and private personal health assistance scenarios (without financial reimbursement by health agencies/insurance companies), BAN enables a wide range of health care applications and related services.
The detailed 3D multi-loop aggregate/rosette chromatin architecture and functional dynamic organization of the human and mouse genomes.

PubMed

Knoch, Tobias A; Wachsmuth, Malte; Kepper, Nick; Lesnussa, Michael; Abuseiris, Anis; Ali Imam, A M; Kolovos, Petros; Zuin, Jessica; Kockx, Christel E M; Brouwer, Rutger W W; van de Werken, Harmen J G; van IJcken, Wilfred F J; Wendt, Kerstin S; Grosveld, Frank G

2016-01-01

The dynamic three-dimensional chromatin architecture of genomes and its co-evolutionary connection to its function-the storage, expression, and replication of genetic information-is still one of the central issues in biology. Here, we describe the much debated 3D architecture of the human and mouse genomes from the nucleosomal to the megabase pair level by a novel approach combining selective high-throughput high-resolution chromosomal interaction capture ( T2C ), polymer simulations, and scaling analysis of the 3D architecture and the DNA sequence. The genome is compacted into a chromatin quasi-fibre with ~5 ± 1 nucleosomes/11 nm, folded into stable ~30-100 kbp loops forming stable loop aggregates/rosettes connected by similar sized linkers. Minor but significant variations in the architecture are seen between cell types and functional states. The architecture and the DNA sequence show very similar fine-structured multi-scaling behaviour confirming their co-evolution and the above. This architecture, its dynamics, and accessibility, balance stability and flexibility ensuring genome integrity and variation enabling gene expression/regulation by self-organization of (in)active units already in proximity. Our results agree with the heuristics of the field and allow "architectural sequencing" at a genome mechanics level to understand the inseparable systems genomic properties.
A hybrid BAC physical map of potato: a framework for sequencing a heterozygous genome

PubMed Central

2011-01-01

Background Potato is the world's third most important food crop, yet cultivar improvement and genomic research in general remain difficult because of the heterozygous and tetraploid nature of its genome. The development of physical map resources that can facilitate genomic analyses in potato has so far been very limited. Here we present the methods of construction and the general statistics of the first two genome-wide BAC physical maps of potato, which were made from the heterozygous diploid clone RH89-039-16 (RH). Results First, a gel electrophoresis-based physical map was made by AFLP fingerprinting of 64478 BAC clones, which were aligned into 4150 contigs with an estimated total length of 1361 Mb. Screening of BAC pools, followed by the KeyMaps in silico anchoring procedure, identified 1725 AFLP markers in the physical map, and 1252 BAC contigs were anchored to the ultradense potato genetic map. A second, sequence-tag-based physical map was constructed from 65919 whole genome profiling (WGP) BAC fingerprints and these were aligned into 3601 BAC contigs spanning 1396 Mb. The 39733 BAC clones that overlap between both physical maps provided anchors to 1127 contigs in the WGP physical map, and reduced the number of contigs to around 2800 in each map separately. Both physical maps were 1.64 times longer than the 850 Mb potato genome. Genome heterozygosity and incomplete merging of BAC contigs are two factors that can explain this map inflation. The contig information of both physical maps was united in a single table that describes the hybrid potato physical map. Conclusions The AFLP physical map has already been used by the Potato Genome Sequencing Consortium for sequencing 10% of the heterozygous genome of clone RH on a BAC-by-BAC basis. By layering a new WGP physical map on top of the AFLP physical map, a genetically anchored genome-wide framework of 322434 sequence tags has been created. 
This reference framework can be used for anchoring and ordering of genomic sequences of clone RH (and other potato genotypes), and opens the possibility to finish sequencing of the RH genome in a more efficient way via high throughput next generation approaches. PMID:22142254
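The map inflation discussed in this abstract is a simple ratio of physical map length to genome size. The short sketch below recomputes it from the figures quoted in the abstract (1361 Mb and 1396 Mb against an 850 Mb genome); the function name `inflation` is an illustrative choice, and the calculation assumes the quoted lengths are directly comparable to the genome size.

```python
GENOME_MB = 850  # potato genome size quoted in the abstract

# Estimated physical map lengths quoted in the abstract, in Mb.
MAP_LENGTHS_MB = {"AFLP": 1361, "WGP": 1396}


def inflation(map_mb, genome_mb):
    """Ratio of physical map length to genome size."""
    return map_mb / genome_mb


for name, length in MAP_LENGTHS_MB.items():
    print(name, round(inflation(length, GENOME_MB), 2))
```

With these inputs the WGP map reproduces the 1.64 factor reported in the abstract, while the AFLP map comes out slightly lower, at about 1.60.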
The response and recovery of the Arabidopsis thaliana transcriptome to phosphate starvation.

PubMed

Woo, Jongchan; MacPherson, Cameron Ross; Liu, Jun; Wang, Huan; Kiba, Takatoshi; Hannah, Matthew A; Wang, Xiu-Jie; Bajic, Vladimir B; Chua, Nam-Hai

2012-05-03

Over application of phosphate fertilizers in modern agriculture contaminates waterways and disrupts natural ecosystems. Nevertheless, this is a common practice among farmers, especially in developing countries as abundant fertilizers are believed to boost crop yields. The study of plant phosphate metabolism and its underlying genetic pathways is key to discovering methods of efficient fertilizer usage. The work presented here describes a genome-wide resource on the molecular dynamics underpinning the response and recovery in roots and shoots of Arabidopsis thaliana to phosphate-starvation. Genome-wide profiling by micro- and tiling-arrays (accessible from GEO: GSE34004) revealed minimal overlap between root and shoot transcriptomes suggesting two independent phosphate-starvation regulons. Novel gene expression patterns were detected for over 1000 candidates and were classified as either initial, persistent, or latent responders. Comparative analysis to AtGenExpress identified cohorts of genes co-regulated across multiple stimuli. The hormone ABA displayed a dominant role in regulating many phosphate-responsive candidates. Analysis of co-regulation enabled the determination of specific versus generic members of closely related gene families with respect to phosphate-starvation. Thus, among others, we showed that PHR1-regulated members of closely related phosphate-responsive families (PHT1;1, PHT1;7-9, SPX1-3, and PHO1;H1) display greater specificity to phosphate-starvation than their more generic counterparts. Our results uncover much larger, staged responses to phosphate-starvation than previously described. To our knowledge, this work describes the most complete genome-wide data on plant nutrient stress to-date.
Tips and tricks for the assembly of a Corynebacterium pseudotuberculosis genome using a semiconductor sequencer.

PubMed

Ramos, Rommel Thiago Jucá; Carneiro, Adriana Ribeiro; Soares, Siomar de Castro; dos Santos, Anderson Rodrigues; Almeida, Sintia; Guimarães, Luis; Figueira, Flávia; Barbosa, Eudes; Tauch, Andreas; Azevedo, Vasco; Silva, Artur

2013-03-01

New sequencing platforms have enabled rapid decoding of complete prokaryotic genomes at relatively low cost. The Ion Torrent platform is an example of these technologies, characterized by lower coverage, generating challenges for the genome assembly. One particular problem is the lack of genomes that enable reference-based assembly, such as the one used in the present study, Corynebacterium pseudotuberculosis biovar equi, which causes high economic losses in the US equine industry. The quality treatment strategy incorporated into the assembly pipeline enabled a 16-fold greater use of the sequencing data obtained compared with traditional quality filter approaches. Data preprocessing prior to the de novo assembly enabled the use of known methodologies in the next-generation sequencing data assembly. Moreover, manual curation was proved to be essential for ensuring a quality assembly, which was validated by comparative genomics with other species of the genus Corynebacterium. The present study presents a modus operandi that enables a greater and better use of data obtained from semiconductor sequencing for obtaining the complete genome from a prokaryotic microorganism, C. pseudotuberculosis, which is not a traditional biological model such as Escherichia coli. © 2012 The Authors. Published by Society for Applied Microbiology and Blackwell Publishing Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Using Semantic Web Technologies for Cohort Identification from Electronic Health Records for Clinical Research

PubMed Central

Pathak, Jyotishman; Kiefer, Richard C.; Chute, Christopher G.

2012-01-01

The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. One of the key requirements to perform GWAS is the identification of subject cohorts with accurate classification of disease phenotypes. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical data stored in electronic health records (EHRs) to accurately identify subjects with specific diseases for inclusion in cohort studies. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR data and enabling federated querying and inferencing via standardized Web protocols for identifying subjects with Diabetes Mellitus. Our study highlights the potential of using Web-scale data federation approaches to execute complex queries. PMID:22779040
Genome-wide Mapping of Cellular Protein–RNA Interactions Enabled by Chemical Crosslinking

PubMed Central

Li, Xiaoyu; Song, Jinghui; Yi, Chengqi

2014-01-01

RNA–protein interactions influence many biological processes. Identifying the binding sites of RNA-binding proteins (RBPs) remains one of the most fundamental and important challenges in the study of such interactions. Capturing RNA and RBPs via chemical crosslinking allows stringent purification procedures that remove most of the non-specific RNA and protein interactions. Two major types of chemical crosslinking strategies have been developed to date, i.e., UV-enabled crosslinking and enzymatic mechanism-based covalent capture. In this review, we compare such strategies and their current applications, with an emphasis on the technologies themselves rather than the biology that has been revealed. We hope such methods can benefit a broader audience and also encourage the development of new methods to study RNA–RBP interactions. PMID:24747191
Cronobacter, the emergent bacterial pathogen Enterobacter sakazakii comes of age; MLST and whole genome sequence analysis.

PubMed

Forsythe, Stephen J; Dickins, Benjamin; Jolley, Keith A

2014-12-16

Following the association of Cronobacter spp. with several publicized fatal outbreaks of meningitis and necrotising enterocolitis in neonatal intensive care units, the World Health Organization (WHO) in 2004 requested the establishment of a molecular typing scheme to enable the international control of the organism. This paper presents the application of Next Generation Sequencing (NGS) to Cronobacter which has led to the establishment of the Cronobacter PubMLST genome and sequence definition database (http://pubmlst.org/cronobacter/) containing over 1000 isolates with metadata along with the recognition of specific clonal lineages linked to neonatal meningitis and adult infections. Whole genome sequencing and multilocus sequence typing (MLST) support the formal recognition of the genus Cronobacter composed of seven species to replace the former single species Enterobacter sakazakii. Applying the 7-loci MLST scheme to 1007 strains revealed 298 definable sequence types, yet only C. sakazakii clonal complex 4 (CC4) was principally associated with neonatal meningitis. This clonal lineage has been confirmed using ribosomal-MLST (51-loci) and whole genome-MLST (1865 loci) to analyse 107 whole genomes via the Cronobacter PubMLST database. This database has enabled the retrospective analysis of historic cases and outbreaks following re-identification of those strains. The Cronobacter PubMLST database offers a central, open access, reliable sequence-based repository for researchers. It has the capacity to create new analysis schemes 'on the fly', and to integrate metadata (source, geographic distribution, clinical presentation). It is also expandable and adaptable to changes in taxonomy, and able to support the development of reliable detection methods of use to industry and regulatory authorities. Therefore it meets the WHO (2004) request for the establishment of a typing scheme for this emergent bacterial pathogen. 
Whole genome sequencing has additionally shown a range of potential virulence and environmental fitness traits which may account for the association of C. sakazakii CC4 pathogenicity, and propensity for neonatal CNS.
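The core of an MLST scheme is a lookup from an allelic profile (one allele number per locus) to a sequence type. The sketch below illustrates that lookup for a seven-locus scheme, together with a count of shared alleles of the kind commonly used to group related sequence types. The locus names, allele numbers, and sequence-type labels are placeholders invented for illustration and are not taken from the Cronobacter PubMLST database.

```python
from typing import Dict, Optional, Tuple

# Placeholder locus names for a seven-locus scheme (not the real Cronobacter loci).
LOCI = ("locus1", "locus2", "locus3", "locus4", "locus5", "locus6", "locus7")

# Invented profile table: tuple of allele numbers (in LOCI order) -> sequence type label.
PROFILES: Dict[Tuple[int, ...], str] = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 2, 1, 1, 1, 1): "ST-B",
    (5, 3, 2, 7, 4, 6, 9): "ST-C",
}


def sequence_type(alleles: Dict[str, int]) -> Optional[str]:
    """Look up the sequence type for an isolate; None means the profile is not in the table."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)


def shared_loci(p: Tuple[int, ...], q: Tuple[int, ...]) -> int:
    """Count loci at which two allelic profiles carry the same allele number."""
    return sum(1 for a, b in zip(p, q) if a == b)


isolate = {locus: 1 for locus in LOCI}
print(sequence_type(isolate))
print(shared_loci((1, 1, 1, 1, 1, 1, 1), (1, 1, 2, 1, 1, 1, 1)))
```

Ribosomal and whole-genome MLST follow the same pattern with longer locus lists, which is why a tuple-keyed table generalizes without changes to the lookup.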
Electronic medical records and genomics (eMERGE) network exploration in cataract: Several new potential susceptibility loci

PubMed Central

Verma, Shefali S.; Hall, Molly A.; Goodloe, Robert J.; Berg, Richard L.; Carrell, Dave S.; Carlson, Christopher S.; Chen, Lin; Crosslin, David R.; Denny, Joshua C.; Jarvik, Gail; Li, Rongling; Linneman, James G.; Pathak, Jyoti; Peissig, Peggy; Rasmussen, Luke V.; Ramirez, Andrea H.; Wang, Xiaoming; Wilke, Russell A.; Wolf, Wendy A.; Torstenson, Eric S.; Turner, Stephen D.; McCarty, Catherine A.

2014-01-01

Purpose Cataract is the leading cause of blindness in the world, and in the United States accounts for approximately 60% of Medicare costs related to vision. The purpose of this study was to identify genetic markers for age-related cataract through a genome-wide association study (GWAS). Methods In the electronic medical records and genomics (eMERGE) network, we ran an electronic phenotyping algorithm on individuals in each of five sites with electronic medical records linked to DNA biobanks. We performed a GWAS using 530,101 SNPs from the Illumina 660W-Quad in a total of 7,397 individuals (5,503 cases and 1,894 controls). We also performed an age-at-diagnosis case-only analysis. Results We identified several statistically significant associations with age-related cataract (45 SNPs) as well as age at diagnosis (44 SNPs). The 45 SNPs associated with cataract at p<1×10⁻⁵ are in several interesting genes, including ALDOB, MAP3K1, and MEF2C. All have potential biologic relationships with cataracts. Conclusions This is the first genome-wide association study of age-related cataract, and several regions of interest have been identified. The eMERGE network has pioneered the exploration of genomic associations in biobanks linked to electronic health records, and this study is another example of the utility of such resources. Explorations of age-related cataract including validation and replication of the association results identified herein are needed in future studies. PMID:25352737
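A single-SNP case-control comparison of the kind a GWAS repeats across all markers can be sketched as a 2×2 allelic chi-square test. The example below is a generic textbook test, not necessarily the model used by the eMERGE analysis; it uses the case and control counts and the 1×10⁻⁵ threshold from the abstract, while the allele counts and the function name `allelic_test` are invented for illustration.

```python
import math

CASES, CONTROLS = 5503, 1894   # sample sizes quoted in the abstract
THRESHOLD = 1e-5               # significance threshold quoted in the abstract


def allelic_test(case_minor, case_major, control_minor, control_major):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). For 1 degree of freedom the upper-tail probability
    of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Invented allele counts for one SNP: each diploid individual contributes two alleles.
chi2, p = allelic_test(3302, 2 * CASES - 3302, 758, 2 * CONTROLS - 758)
print(chi2, p, p < THRESHOLD)
```

A real analysis would usually also adjust for covariates such as age, sex, and ancestry, which this sketch leaves out.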
Analysis of Multiallelic CNVs by Emulsion Haplotype Fusion PCR.

PubMed

Tyson, Jess; Armour, John A L

2017-01-01

Emulsion-fusion PCR recovers long-range sequence information by combining products in cis from individual genomic DNA molecules. Emulsion droplets act as very numerous small reaction chambers in which different PCR products from a single genomic DNA molecule are condensed into short joint products, to unite sequences in cis from widely separated genomic sites. These products can therefore provide information about the arrangement of sequences and variants at a larger scale than established long-read sequencing methods. The method has been useful in defining the phase of variants in haplotypes, the typing of inversions, and determining the configuration of sequence variants in multiallelic CNVs. In this description we outline the rationale for the application of emulsion-fusion PCR methods to the analysis of multiallelic CNVs, and give practical details for our own implementation of the method in that context.
Phylogenomics of Brazilian epidemic isolates of Mycobacterium abscessus subsp. bolletii reveals relationships of global outbreak strains

PubMed Central

Davidson, Rebecca M.; Hasan, Nabeeh A.; de Moura, Vinicius Calado Nogueira; Duarte, Rafael Silva; Jackson, Mary; Strong, Michael

2013-01-01

Rapidly growing, non-tuberculous mycobacteria (NTM) in the Mycobacterium abscessus (MAB) species are emerging pathogens that cause various diseases including skin and respiratory infections. The species has undergone recent taxonomic nomenclature refinement, and is currently recognized as two subspecies, M. abscessus subsp. abscessus (MAB-A) and M. abscessus subsp. bolletii (MAB-B). The recently reported outbreaks of MAB-B in surgical patients in Brazil from 2004 to 2009 and in cystic fibrosis patients in the United Kingdom (UK) in 2006 to 2012 underscore the need to investigate the genetic diversity of clinical MAB strains. To this end, we sequenced the genomes of two Brazilian MAB-B epidemic isolates (CRM-0019 and CRM-0020) derived from an outbreak of skin infections in Rio de Janeiro, two unrelated MAB strains from patients with pulmonary infections in the United States (US) (NJH8 and NJH11) and one type MAB-B strain (CCUG 48898) and compared them to 25 publicly available genomes of globally diverse MAB strains. Genome-wide analyses of 27,598 core genome single nucleotide polymorphisms (SNPs) revealed that the two Brazilian-derived CRM strains are nearly indistinguishable from one another and are more closely related to UK outbreak isolates infecting CF patients than to strains from the US, Malaysia or France. Comparative genomic analyses of six closely related outbreak strains revealed geographic-specific large-scale insertion/deletion variation that corresponds to bacteriophage insertions and recombination hotspots. Our study integrates new genome sequence data with existing genomic information to explore the global diversity of infectious M. abscessus isolates and to compare clinically relevant outbreak strains from different continents. PMID:24055961
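To make the idea of comparing strains by core-genome SNPs concrete, the following sketch counts pairwise differences between concatenated SNP alleles. It is only an assumption about how such a comparison could be set up: the isolate names and allele strings are invented toy data, and the abstract does not say which tools or distance measures the authors used.

```python
from itertools import combinations

def snp_distance(a, b):
    """Number of core SNP positions at which two isolates differ."""
    if len(a) != len(b):
        raise ValueError("allele strings must cover the same core SNP positions")
    return sum(1 for x, y in zip(a, b) if x != y)

def pairwise_distances(alleles):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {
        (s, t): snp_distance(alleles[s], alleles[t])
        for s, t in combinations(sorted(alleles), 2)
    }

# Toy data: one character per core SNP position (a real analysis would
# have thousands of positions per isolate).
core_snps = {
    "isolate_1": "ACGTTGCA",
    "isolate_2": "ACGTTGCA",
    "isolate_3": "ATGTCGAA",
}

for pair, dist in pairwise_distances(core_snps).items():
    print(pair, dist)
```

Isolates with a distance of zero or near zero would be the "nearly indistinguishable" case described above; a phylogeny would normally be built from the full alignment rather than from raw distances alone.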
[Tale nucleases--new tool for genome editing].

PubMed

Glazkova, D V; Shipulin, G A

2014-01-01

The ability to introduce targeted changes in the genome of living cells or entire organisms enables researchers to meet the challenges of basic life sciences, biotechnology and medicine. Knockdown of target genes in zygotes makes it possible to investigate the functions of these genes in different organisms. Replacement of a single nucleotide in a DNA sequence makes it possible to correct mutations in genes and thus to cure hereditary diseases. Adding a transgene to specific genomic loci can be used in biotechnology to generate organisms with certain properties or cell lines for biopharmaceutical production. Such manipulations of gene sequences in their natural chromosomal context became possible after the emergence of the technology called "genome editing". This technology is based on the induction of a double-strand break in a specific genomic target DNA using endonucleases that recognize unique sequences in the genome, and on the subsequent restoration of DNA integrity by cellular repair mechanisms. A necessary tool for genome editing is a custom-designed endonuclease that is able to recognize selected sequences. The emergence of a new type of programmable endonuclease, constructed on the basis of bacterial proteins--TAL effectors (transcription activator-like effectors), has become an important stage in the development of the technology and has promoted the wide spread of genome editing. This article reviews the history of the discovery of TAL effectors and the creation of TALE nucleases, and describes their advantages over the zinc finger endonucleases that appeared earlier. A large section is devoted to a description of the genetic modifications that can be performed using genome editing.
Informed consent in direct-to-consumer personal genome testing: the outline of a model between specific and generic consent.

PubMed

Bunnik, Eline M; Janssens, A Cecile J W; Schermer, Maartje H N

2014-09-01

Broad genome-wide testing is increasingly finding its way to the public through the online direct-to-consumer marketing of so-called personal genome tests. Personal genome tests estimate genetic susceptibilities to multiple diseases and other phenotypic traits simultaneously. Providers commonly make use of Terms of Service agreements rather than informed consent procedures. However, to protect consumers from the potential physical, psychological and social harms associated with personal genome testing and to promote autonomous decision-making with regard to the testing offer, we argue that current practices of information provision are insufficient and that there is a place--and a need--for informed consent in personal genome testing, also when it is offered commercially. The increasing quantity, complexity and diversity of most testing offers, however, pose challenges for information provision and informed consent. Both specific and generic models for informed consent fail to meet its moral aims when applied to personal genome testing. Consumers should be enabled to know the limitations, risks and implications of personal genome testing and should be given control over the genetic information they do or do not wish to obtain. We present the outline of a new model for informed consent which can meet both the norm of providing sufficient information and the norm of providing understandable information. The model can be used for personal genome testing, but will also be applicable to other, future forms of broad genetic testing or screening in commercial and clinical settings. © 2012 John Wiley & Sons Ltd.
Web-based visual analysis for high-throughput genomics

PubMed Central

2013-01-01

Background Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. Results We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. Conclusions Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments. PMID:23758618
Smooth Muscle Cell Genome Browser: Enabling the Identification of Novel Serum Response Factor Target Genes

PubMed Central

Lee, Moon Young; Park, Chanjae; Berent, Robyn M.; Park, Paul J.; Fuchs, Robert; Syn, Hannah; Chin, Albert; Townsend, Jared; Benson, Craig C.; Redelman, Doug; Shen, Tsai-wei; Park, Jong Kun; Miano, Joseph M.; Sanders, Kenton M.; Ro, Seungil

2015-01-01

Genome-scale expression data on the absolute numbers of gene isoforms offers essential clues in cellular functions and biological processes. Smooth muscle cells (SMCs) perform a unique contractile function through expression of specific genes controlled by serum response factor (SRF), a transcription factor that binds to DNA sites known as the CArG boxes. To identify SRF-regulated genes specifically expressed in SMCs, we isolated SMC populations from mouse small intestine and colon, obtained their transcriptomes, and constructed an interactive SMC genome and CArGome browser. To our knowledge, this is the first online resource that provides a comprehensive library of all genetic transcripts expressed in primary SMCs. The browser also serves as the first genome-wide map of SRF binding sites. The browser analysis revealed novel SMC-specific transcriptional variants and SRF target genes, which provided new and unique insights into the cellular and biological functions of the cells in gastrointestinal (GI) physiology. The SRF target genes in SMCs, which were discovered in silico, were confirmed by proteomic analysis of SMC-specific Srf knockout mice. Our genome browser offers a new perspective into the alternative expression of genes in the context of SRF binding sites in SMCs and provides a valuable reference for future functional studies. PMID:26241044
MIPS: analysis and annotation of proteins from whole genomes in 2005

PubMed Central

Mewes, H. W.; Frishman, D.; Mayer, K. F. X.; Münsterkötter, M.; Noubibou, O.; Pagel, P.; Rattei, T.; Oesterheld, M.; Ruepp, A.; Stümpflen, V.

2006-01-01

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein–protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de). PMID:16381839
MIPS: analysis and annotation of proteins from whole genomes in 2005.

PubMed

Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V

2006-01-01

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).
A Genome-Wide Linkage Scan for Age at Menarche in Three Populations of European Descent

PubMed Central

Anderson, Carl A.; Zhu, Gu; Falchi, Mario; van den Berg, Stéphanie M.; Treloar, Susan A.; Spector, Timothy D.; Martin, Nicholas G.; Boomsma, Dorret I.; Visscher, Peter M.; Montgomery, Grant W.

2008-01-01

Context: Age at menarche (AAM) is an important trait both biologically and socially, a clearly defined event in female pubertal development, and has been associated with many clinically significant phenotypes. Objective: The objective of the study was to identify genetic loci influencing variation in AAM in large population-based samples from three countries. Design/Participants: Recalled AAM data were collected from 13,697 individuals and 4,899 pseudoindependent sister-pairs from three different populations (Australia, The Netherlands, and the United Kingdom) by mailed questionnaire or interview. Genome-wide variance components linkage analysis was implemented on each sample individually and in combination. Results: The mean, SD, and heritability of AAM across the three samples were 13.1 yr, 1.5 yr, and 0.69, respectively. No loci were detected that reached genome-wide significance in the combined analysis, but a suggestive locus was detected on chromosome 12 (logarithm of the odds = 2.0). Three loci of suggestive significance were seen in the U.K. sample on chromosomes 1, 4, and 18 (logarithm of the odds = 2.4, 2.2 and 3.2, respectively). Conclusions: There was no evidence for common highly penetrant variants influencing AAM. Linkage and association suggest that one trait locus for AAM is located on chromosome 12, but further studies are required to replicate these results. PMID:18647812
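For readers less used to logarithm-of-the-odds (LOD) scores, the sketch below converts the LOD values reported in the abstract into approximate pointwise p-values. It assumes the usual large-sample approximation for variance-components linkage, in which 2·ln(10)·LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom; the abstract itself does not state which approximation, if any, the authors applied, and the function name is hypothetical.

```python
import math

def lod_to_pvalue(lod):
    """Approximate pointwise p-value for a variance-components LOD score.

    Assumes 2*ln(10)*LOD is distributed as a 50:50 mixture of 0 and a
    chi-square with 1 df (an assumption, not taken from the abstract).
    """
    if lod <= 0:
        return 1.0
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of chi-square(1 df) is erfc(sqrt(x/2)); halve it for the mixture.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

# LOD scores quoted in the abstract (chromosomes 12, 1, 4 and 18).
for lod in (2.0, 2.4, 2.2, 3.2):
    print(f"LOD {lod:.1f} -> pointwise p ~ {lod_to_pvalue(lod):.2g}")
```

These are pointwise values, not corrected for the genome-wide scan, which is why LOD scores in this range are described as suggestive rather than significant.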
Genome-wide mapping in a house mouse hybrid zone reveals hybrid sterility loci and Dobzhansky-Muller interactions.

PubMed

Turner, Leslie M; Harr, Bettina

2014-12-09

Mapping hybrid defects in contact zones between incipient species can identify genomic regions contributing to reproductive isolation and reveal genetic mechanisms of speciation. The house mouse features a rare combination of sophisticated genetic tools and natural hybrid zones between subspecies. Male hybrids often show reduced fertility, a common reproductive barrier between incipient species. Laboratory crosses have identified sterility loci, but each encompasses hundreds of genes. We map genetic determinants of testis weight and testis gene expression using offspring of mice captured in a hybrid zone between M. musculus musculus and M. m. domesticus. Many generations of admixture enable high-resolution mapping of loci contributing to these sterility-related phenotypes. We identify complex interactions among sterility loci, suggesting multiple, non-independent genetic incompatibilities contribute to barriers to gene flow in the hybrid zone.
Recent progress in the genetics of spontaneously hypertensive rats.

PubMed

Pravenec, M; Křen, V; Landa, V; Mlejnek, P; Musilová, A; Šilhavý, J; Šimáková, M; Zídek, V

2014-01-01

The spontaneously hypertensive rat (SHR) is the most widely used animal model of essential hypertension and accompanying metabolic disturbances. Recent advances in sequencing of genomes of BN-Lx and SHR progenitors of the BXH/HXB recombinant inbred (RI) strains as well as accumulation of multiple data sets of intermediary phenotypes in the RI strains, including mRNA and microRNA abundance, quantitative metabolomics, proteomics, methylomics or histone modifications, will make it possible to systematically search for genetic variants involved in regulation of gene expression and in the etiology of complex pathophysiological traits. New advances in manipulation of the rat genome, including efficient transgenesis and gene targeting, will enable in vivo functional analyses of selected candidate genes to identify QTL at the molecular level or to provide insight into mechanisms whereby targeted genes affect pathophysiological traits in the SHR.
Dissection of genomic correlation matrices using multivariate factor analysis in dairy and dual-purpose cattle breeds

USDA-ARS's Scientific Manuscript database

SNP effects estimated in genomic selection programs allow for the prediction of direct genomic values (DGV) both at genome-wide and chromosomal level. As a consequence, genome-wide (G_GW) or chromosomal (G_CHR) correlation matrices between genomic predictions for different traits can be calculated. ...
HiView: an integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants.

PubMed

Xu, Zheng; Zhang, Guosheng; Duan, Qing; Chai, Shengjie; Zhang, Baqun; Wu, Cong; Jin, Fulai; Yue, Feng; Li, Yun; Hu, Ming

2016-03-11

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. However, most of them are located in the non-protein coding regions, and therefore it is challenging to hypothesize the functions of these non-coding GWAS variants. Recent large efforts such as the ENCODE and Roadmap Epigenomics projects have predicted a large number of regulatory elements. However, the target genes of these regulatory elements remain largely unknown. Chromatin conformation capture based technologies such as Hi-C can directly measure the chromatin interactions and have generated an increasingly comprehensive catalog of the interactome between the distal regulatory elements and their potential target genes. Leveraging such information revealed by Hi-C holds the promise of elucidating the functions of genetic variants in human diseases. In this work, we present HiView, the first integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants. HiView is able to display Hi-C data and statistical evidence for chromatin interactions in genomic regions surrounding any given GWAS variant, enabling straightforward visualization and interpretation. We believe that as the first GWAS variants-centered Hi-C genome browser, HiView is a useful tool guiding post-GWAS functional genomics studies. HiView is freely accessible at: http://www.unc.edu/~yunmli/HiView .
Whole genome sequences of a male and female supercentenarian, ages greater than 114 years.

PubMed

Sebastiani, Paola; Riva, Alberto; Montano, Monty; Pham, Phillip; Torkamani, Ali; Scherba, Eugene; Benson, Gary; Milton, Jacqueline N; Baldwin, Clinton T; Andersen, Stacy; Schork, Nicholas J; Steinberg, Martin H; Perls, Thomas T

2011-01-01

Supercentenarians (age 110+ years old) generally delay or escape age-related diseases and disability well beyond the age of 100 and this exceptional survival is likely to be influenced by a genetic predisposition that includes both common and rare genetic variants. In this report, we describe the complete genomic sequences of male and female supercentenarians, both age >114 years old. We show that: (1) the sequence variant spectrum of these two individuals' DNA sequences is largely comparable to existing non-supercentenarian genomes; (2) the two individuals do not appear to carry most of the well-established human longevity enabling variants already reported in the literature; (3) they have a comparable number of known disease-associated variants relative to most human genomes sequenced to-date; (4) approximately 1% of the variants these individuals possess are novel and may point to new genes involved in exceptional longevity; and (5) both individuals are enriched for coding variants near longevity-associated variants that we discovered through a large genome-wide association study. These analyses suggest that there are both common and rare longevity-associated variants that may counter the effects of disease-predisposing variants and extend lifespan. The continued analysis of the genomes of these and other rare individuals who have survived to extremely old ages should provide insight into the processes that contribute to the maintenance of health during extreme aging.
Whole Genome Sequences of a Male and Female Supercentenarian, Ages Greater than 114 Years

PubMed Central

Sebastiani, Paola; Riva, Alberto; Montano, Monty; Pham, Phillip; Torkamani, Ali; Scherba, Eugene; Benson, Gary; Milton, Jacqueline N.; Baldwin, Clinton T.; Andersen, Stacy; Schork, Nicholas J.; Steinberg, Martin H.; Perls, Thomas T.

2012-01-01

Supercentenarians (age 110+ years old) generally delay or escape age-related diseases and disability well beyond the age of 100 and this exceptional survival is likely to be influenced by a genetic predisposition that includes both common and rare genetic variants. In this report, we describe the complete genomic sequences of male and female supercentenarians, both age >114 years old. We show that: (1) the sequence variant spectrum of these two individuals’ DNA sequences is largely comparable to existing non-supercentenarian genomes; (2) the two individuals do not appear to carry most of the well-established human longevity enabling variants already reported in the literature; (3) they have a comparable number of known disease-associated variants relative to most human genomes sequenced to-date; (4) approximately 1% of the variants these individuals possess are novel and may point to new genes involved in exceptional longevity; and (5) both individuals are enriched for coding variants near longevity-associated variants that we discovered through a large genome-wide association study. These analyses suggest that there are both common and rare longevity-associated variants that may counter the effects of disease-predisposing variants and extend lifespan. The continued analysis of the genomes of these and other rare individuals who have survived to extremely old ages should provide insight into the processes that contribute to the maintenance of health during extreme aging. PMID:22303384
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

PubMed

Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

2015-01-01

In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

PubMed Central

Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.

2015-01-01

In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179
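The heptameric repeat described here, GGGNATC with N standing for any base, lends itself to a simple pattern search. The sketch below is a minimal, assumed approach using a regular expression to report tandem tracts of two or more units on one strand; the function name and the toy sequence are hypothetical, and the authors' actual search procedure is not given in the abstract.

```python
import re

UNIT = r"GGG[ACGT]ATC"                       # one GGGNATC unit (7 nt)
TRACT = re.compile(rf"(?:{UNIT}){{2,}}")     # two or more units in tandem

def find_repeat_tracts(seq):
    """Return (start, unit_count) for each tandem GGGNATC tract in seq.

    Only the given strand is scanned; a genome-wide scan would also need
    the reverse complement.
    """
    seq = seq.upper()
    return [(m.start(), len(m.group()) // 7) for m in TRACT.finditer(seq)]

# Toy sequence: a four-unit tract, a spacer, then a two-unit tract.
toy = "TTT" + "GGGAATC" * 4 + "ACGT" + "GGGTATCGGGCATC" + "AA"
for start, units in find_repeat_tracts(toy):
    print(f"tract at {start}: {units} units")
```

Tallying the unit counts over a whole genome would give the kind of distribution in which the preference for four units, matching the four G-tracts needed for G4 formation, could be checked.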
Homopolymer tail-mediated ligation PCR: a streamlined and highly efficient method for DNA cloning and library construction.

PubMed

Lazinski, David W; Camilli, Andrew

2013-01-01

The amplification of DNA fragments, cloned between user-defined 5' and 3' end sequences, is a prerequisite step in the use of many current applications including massively parallel sequencing (MPS). Here we describe an improved method, called homopolymer tail-mediated ligation PCR (HTML-PCR), that requires very little starting template, minimal hands-on effort, is cost-effective, and is suited for use in high-throughput and robotic methodologies. HTML-PCR starts with the addition of homopolymer tails of controlled lengths to the 3' termini of a double-stranded genomic template. The homopolymer tails enable the annealing-assisted ligation of a hybrid oligonucleotide to the template's recessed 5' ends. The hybrid oligonucleotide has a user-defined sequence at its 5' end. This primer, together with a second primer composed of a longer region complementary to the homopolymer tail and fused to a second 5' user-defined sequence, are used in a PCR reaction to generate the final product. The user-defined sequences can be varied to enable compatibility with a wide variety of downstream applications. We demonstrate our new method by constructing MPS libraries starting from nanogram and sub-nanogram quantities of Vibrio cholerae and Streptococcus pneumoniae genomic DNA.
Genome-wide characterization and expression analysis enables identification of abiotic stress-responsive MYB transcription factors in cassava (Manihot esculenta).

PubMed

Ruan, Meng-Bin; Guo, Xin; Wang, Bin; Yang, Yi-Ling; Li, Wen-Qi; Yu, Xiao-Ling; Zhang, Peng; Peng, Ming

2017-06-15

The myeloblastosis (MYB) transcription factor superfamily is the largest transcription factor family in plants, playing different roles during stress response. However, abiotic stress-responsive MYB transcription factors have not been systematically studied in cassava (Manihot esculenta), an important tropical tuber root crop. In this study, we used a genome-wide transcriptome analysis to predict 299 putative MeMYB genes in the cassava genome. Under drought and cold stresses, many MeMYB genes exhibited different expression patterns in cassava leaves, indicating that these genes might play a role in abiotic stress responses. We found that several stress-responsive MeMYB genes responded to abscisic acid (ABA) in cassava leaves. We characterize four MeMYBs, namely MeMYB1, MeMYB2, MeMYB4, and MeMYB9, as R2R3-MYB transcription factors. Furthermore, RNAi-driven repression of MeMYB2 resulted in drought and cold tolerance in transgenic cassava. Gene expression assays in wild-type and MeMYB2-RNAi cassava plants revealed that MeMYB2 may affect other MeMYBs as well as MeWRKYs under drought and cold stress, suggesting crosstalk between MYB and WRKY family genes under stress conditions in cassava. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
QuadBase2: web server for multiplexed guanine quadruplex mining and visualization

PubMed Central

Dhapola, Parashar; Chowdhury, Shantanu

2016-01-01

DNA guanine quadruplexes or G4s are non-canonical DNA secondary structures which affect genomic processes like replication, transcription and recombination. G4s are computationally identified by specific nucleotide motifs which are also called putative G4 (PG4) motifs. Despite the general relevance of these structures, there is currently no tool available that can allow batch queries and genome-wide analysis of these motifs in a user-friendly interface. QuadBase2 (quadbase.igib.res.in) presents a completely reinvented web server version of the previously published QuadBase database. QuadBase2 enables users to mine PG4 motifs in up to 178 eukaryotes through the EuQuad module. This module interfaces with the Ensembl Compara database to allow users to mine PG4 motifs in the orthologues of genes of interest across eukaryotes. PG4 motifs can be mined across genes and their promoter sequences in 1719 prokaryotes through the ProQuad module. This module includes a feature that allows genome-wide mining of PG4 motifs and their visualization as circular histograms. TetraplexFinder, the module for mining PG4 motifs in user-provided sequences, is now capable of handling up to 20 MB of data. QuadBase2 is a comprehensive PG4 motif mining tool that further expands the configurations and algorithms for mining PG4 motifs in a user-friendly way. PMID:27185890
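As an illustration of what mining PG4 motifs can look like in practice, the sketch below searches a sequence with a widely used motif definition: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This definition and its default parameters are an assumption for the example, not a statement of QuadBase2's own algorithms or settings, and the function name is hypothetical.

```python
import re

def find_pg4(seq, min_tract=3, max_loop=7):
    """Return (start, end) spans of putative G4 motifs on the given strand.

    Pattern: four G-runs of at least `min_tract` guanines, separated by
    loops of 1..`max_loop` bases. Matches are non-overlapping, and the
    C-rich (reverse) strand is not searched.
    """
    g_run = rf"G{{{min_tract},}}"
    loop = rf"[ACGT]{{1,{max_loop}}}"
    pattern = re.compile(rf"(?:{g_run}{loop}){{3}}{g_run}")
    return [m.span() for m in pattern.finditer(seq.upper())]

# Toy input: four GGG runs joined by TTA loops, flanked by AA on each side.
toy = "AAGGGTTAGGGTTAGGGTTAGGGAA"
print(find_pg4(toy))
```

Relaxing `min_tract` or `max_loop` changes how many candidate motifs are reported, which is the sort of configuration choice a dedicated mining tool would expose to the user.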
Functional assessment of human enhancer activities using whole-genome STARR-sequencing.

PubMed

Liu, Yuwen; Yu, Shan; Dhiman, Vineet K; Brunetti, Tonya; Eckart, Heather; White, Kevin P

2017-11-20

Genome-wide quantification of enhancer activity in the human genome has proven to be a challenging problem. Recent efforts have led to the development of powerful tools for enhancer quantification. However, because of genome size and complexity, these tools have yet to be applied to the whole human genome. In the current study, we use a human prostate cancer cell line, LNCaP as a model to perform whole human genome STARR-seq (WHG-STARR-seq) to reliably obtain an assessment of enhancer activity. This approach builds upon previously developed STARR-seq in the fly genome and CapSTARR-seq techniques in targeted human genomic regions. With an improved library preparation strategy, our approach greatly increases the library complexity per unit of starting material, which makes it feasible and cost-effective to explore the landscape of regulatory activity in the much larger human genome. In addition to our ability to identify active, accessible enhancers located in open chromatin regions, we can also detect sequences with the potential for enhancer activity that are located in inaccessible, closed chromatin regions. When treated with the histone deacetylase inhibitor, Trichostatin A, genes nearby this latter class of enhancers are up-regulated, demonstrating the potential for endogenous functionality of these regulatory elements. WHG-STARR-seq provides an improved approach to current pipelines for analysis of high complexity genomes to gain a better understanding of the intricacies of transcriptional regulation.
Defense Logistics Agency Disposition Services as a Supply Source: A DoD-Wide Opportunity

DTIC Science & Technology

2013-07-01

upon the economic benefits of reutilization. Reutilization already saves the DoD millions of dollars each year by enabling both internal and...cost, over the Internet at govliquidation.com, tents, boots, gasoline burners (stove/heating units), a medical suction apparatus, and bandages and
A synteny-based draft genome sequence of the forage grass Lolium perenne.

PubMed

Byrne, Stephen L; Nagy, Istvan; Pfeifer, Matthias; Armstead, Ian; Swain, Suresh; Studer, Bruno; Mayer, Klaus; Campbell, Jacqueline D; Czaban, Adrian; Hentrup, Stephan; Panitz, Frank; Bendixen, Christian; Hedegaard, Jakob; Caccamo, Mario; Asp, Torben

2015-11-01

Here we report the draft genome sequence of perennial ryegrass (Lolium perenne), an economically important forage and turf grass species that is widely cultivated in temperate regions worldwide. It is classified along with wheat, barley, oats and Brachypodium distachyon in the Pooideae sub-family of the grass family (Poaceae). Transcriptome data was used to identify 28,455 gene models, and we utilized macro-co-linearity between perennial ryegrass and barley, and synteny within the grass family, to establish a synteny-based linear gene order. The gametophytic self-incompatibility mechanism enables the pistil of a plant to reject self-pollen and therefore promote out-crossing. We have used the sequence assembly to characterize transcriptional changes in the stigma during pollination with both compatible and incompatible pollen. Characterization of the pollen transcriptome identified homologs to pollen allergens from a range of species, many of which were expressed to very high levels in mature pollen grains, and are potentially involved in the self-incompatibility mechanism. The genome sequence provides a valuable resource for future breeding efforts based on genomic prediction, and will accelerate the development of new varieties for more productive grasslands. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
Cas9, Cpf1 and C2c1/2/3―What's next?

PubMed Central

Yamamoto, Takashi; Sakuma, Tetsushi

2017-01-01

Since the rapid emergence of the clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein 9 (Cas9) system, developed as a genome engineering tool in 2012–2013, most researchers in the life science field have had a fixated interest in this fascinating technology. CRISPR-Cas9 is an RNA-guided DNA endonuclease system, which consists of Cas9 nuclease defining a few targeting bases via the protospacer adjacent motif, complexed with an easily customizable single guide RNA targeting a genomic sequence of around 20 bp. Although Streptococcus pyogenes Cas9 (SpCas9), one of the Cas9 proteins for which applications in genome engineering were first demonstrated, still has wide usage because of its high nuclease activity and broad targeting range, it has several limitations such as large molecular weight and potential off-target effects. In this commentary, we describe various improvements and alternatives of CRISPR-Cas systems, including engineered Cas9 variants, Cas9 homologs, and novel Cas proteins other than Cas9. These variations enable flexible genome engineering with high efficiency and specificity, orthogonal genetic control at multiple gene loci, gene knockdown, or fluorescence imaging of transcripts mediated by RNA targeting, and beyond. PMID:28140746
RSAT: regulatory sequence analysis tools.

PubMed

Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

2008-07-01

The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.
Cas9, Cpf1 and C2c1/2/3-What's next?

PubMed

Nakade, Shota; Yamamoto, Takashi; Sakuma, Tetsushi

2017-05-04

Since the rapid emergence of the clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein 9 (Cas9) system, developed as a genome engineering tool in 2012-2013, most researchers in the life science field have had a fixated interest in this fascinating technology. CRISPR-Cas9 is an RNA-guided DNA endonuclease system, which consists of Cas9 nuclease defining a few targeting bases via the protospacer adjacent motif, complexed with an easily customizable single guide RNA targeting a genomic sequence of around 20 bp. Although Streptococcus pyogenes Cas9 (SpCas9), one of the Cas9 proteins for which applications in genome engineering were first demonstrated, still has wide usage because of its high nuclease activity and broad targeting range, it has several limitations such as large molecular weight and potential off-target effects. In this commentary, we describe various improvements and alternatives of CRISPR-Cas systems, including engineered Cas9 variants, Cas9 homologs, and novel Cas proteins other than Cas9. These variations enable flexible genome engineering with high efficiency and specificity, orthogonal genetic control at multiple gene loci, gene knockdown, or fluorescence imaging of transcripts mediated by RNA targeting, and beyond.
Finding cancer driver mutations in the era of big data research.

PubMed

Poulos, Rebecca C; Wong, Jason W H

2018-04-02

In the last decade, the costs of genome sequencing have decreased considerably. The commencement of large-scale cancer sequencing projects has enabled cancer genomics to join the big data revolution. One of the challenges still facing cancer genomics research is determining which are the driver mutations in an individual cancer, as these contribute only a small subset of the overall mutation profile of a tumour. Focusing primarily on somatic single nucleotide mutations in this review, we consider both coding and non-coding driver mutations, and discuss how such mutations might be identified from cancer sequencing datasets. We describe some of the tools and database that are available for the annotation of somatic variants and the identification of cancer driver genes. We also address the use of genome-wide variation in mutation load to establish background mutation rates from which to identify driver mutations under positive selection. Finally, we describe the ways in which mutational signatures can act as clues for the identification of cancer drivers, as these mutations may cause, or arise from, certain mutational processes. By defining the molecular changes responsible for driving cancer development, new cancer treatment strategies may be developed or novel preventative measures proposed.
Pathogenesis of Helicobacter pylori-Related Gastroduodenal Diseases from Molecular Epidemiological Studies.

PubMed

Yamaoka, Yoshio

2012-01-01

Helicobacter pylori is a major human pathogen that infects the stomach and produces inflammation that is responsible for various gastroduodenal diseases. Despite the high prevalence of H. pylori infections in Africa and South Asia, the incidence of gastric cancer in these areas is much lower than in other countries. The incidence of gastric cancer also tends to decrease from north to south in East Asia. Data from molecular epidemiological studies show that this variation in different geographic areas could be explained in part by different types of H. pylori virulence factors, especially CagA, VacA, and OipA. H. pylori infection is thought to be involved in both gastric cancer and duodenal ulcer, which are at opposite ends of the disease spectrum. This discrepancy can also be explained in part by another H. pylori factor, DupA, as well as by CagA typing (East Asian type versus Western type). H. pylori has a genome of approximately 1,600 genes; therefore, there might be other novel virulence factors. Because genome wide analyses using whole-genome sequencing technology give a broad view of the genome of H. pylori, we hope that next-generation sequencers will enable us to efficiently investigate novel virulence factors.
In vivo genome editing in animals using AAV-CRISPR system: applications to translational research of human disease

PubMed Central

Lau, Cia-Hin; Suh, Yousin

2017-01-01

Adeno-associated virus (AAV) has shown promising therapeutic efficacy with a good safety profile in a wide range of animal models and human clinical trials. With the advent of clustered regularly interspaced short palindromic repeat (CRISPR)-based genome-editing technologies, AAV provides one of the most suitable viral vectors to package, deliver, and express CRISPR components for targeted gene editing. Recent discoveries of smaller Cas9 orthologues have enabled the packaging of Cas9 nuclease and its chimeric guide RNA into a single AAV delivery vehicle for robust in vivo genome editing. Here, we discuss how the combined use of small Cas9 orthologues, tissue-specific minimal promoters, AAV serotypes, and different routes of administration has advanced the development of efficient and precise in vivo genome editing and comprehensively review the various AAV-CRISPR systems that have been effectively used in animals. We then discuss the clinical implications and potential strategies to overcome off-target effects, immunogenicity, and toxicity associated with CRISPR components and AAV delivery vehicles. Finally, we discuss ongoing non-viral-based ex vivo gene therapy clinical trials to underscore the current challenges and future prospects of CRISPR/Cas9 delivery for human therapeutics. PMID:29333255
Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum

PubMed Central

Miles, Alistair; Iqbal, Zamin; Vauterin, Paul; Pearson, Richard; Campino, Susana; Theron, Michel; Gould, Kelda; Mead, Daniel; Drury, Eleanor; O'Brien, John; Ruano Rubio, Valentin; MacInnis, Bronwyn; Mwangi, Jonathan; Samarakoon, Upeka; Ranford-Cartwright, Lisa; Ferdig, Michael; Hayton, Karen; Su, Xin-zhuan; Wellems, Thomas; Rayner, Julian; McVean, Gil; Kwiatkowski, Dominic

2016-01-01

The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired. PMID:27531718
Systems metabolic engineering: genome-scale models and beyond.

PubMed

Blazeck, John; Alper, Hal

2010-07-01

The advent of high throughput genome-scale bioinformatics has led to an exponential increase in available cellular system data. Systems metabolic engineering attempts to use data-driven approaches--based on the data collected with high throughput technologies--to identify gene targets and optimize phenotypical properties on a systems level. Current systems metabolic engineering tools are limited for predicting and defining complex phenotypes such as chemical tolerances and other global, multigenic traits. The most pragmatic systems-based tool for metabolic engineering to arise is the in silico genome-scale metabolic reconstruction. This tool has seen wide adoption for modeling cell growth and predicting beneficial gene knockouts, and we examine here how this approach can be expanded for novel organisms. This review will highlight advances of the systems metabolic engineering approach with a focus on de novo development and use of genome-scale metabolic reconstructions for metabolic engineering applications. We will then discuss the challenges and prospects for this emerging field to enable model-based metabolic engineering. Specifically, we argue that current state-of-the-art systems metabolic engineering techniques represent a viable first step for improving product yield that still must be followed by combinatorial techniques or random strain mutagenesis to achieve optimal cellular systems.
Expression atlas and comparative coexpression network analyses reveal important genes involved in the formation of lignified cell wall in Brachypodium distachyon.

PubMed

Sibout, Richard; Proost, Sebastian; Hansen, Bjoern Oest; Vaid, Neha; Giorgi, Federico M; Ho-Yue-Kuang, Severine; Legée, Frédéric; Cézart, Laurent; Bouchabké-Coussa, Oumaya; Soulhat, Camille; Provart, Nicholas; Pasha, Asher; Le Bris, Philippe; Roujol, David; Hofte, Herman; Jamet, Elisabeth; Lapierre, Catherine; Persson, Staffan; Mutwil, Marek

2017-08-01

While Brachypodium distachyon (Brachypodium) is an emerging model for grasses, no expression atlas or gene coexpression network is available. Such tools are of high importance to provide insights into the function of Brachypodium genes. We present a detailed Brachypodium expression atlas, capturing gene expression in its major organs at different developmental stages. The data were integrated into a large-scale coexpression database ( www.gene2function.de), enabling identification of duplicated pathways and conserved processes across 10 plant species, thus allowing genome-wide inference of gene function. We highlight the importance of the atlas and the platform through the identification of duplicated cell wall modules, and show that a lignin biosynthesis module is conserved across angiosperms. We identified and functionally characterised a putative ferulate 5-hydroxylase gene through overexpression of it in Brachypodium, which resulted in an increase in lignin syringyl units and reduced lignin content of mature stems, and led to improved saccharification of the stem biomass. Our Brachypodium expression atlas thus provides a powerful resource to reveal functionally related genes, which may advance our understanding of important biological processes in grasses. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
In silico mining of putative microsatellite markers from whole genome sequence of water buffalo (Bubalus bubalis) and development of first BuffSatDB

PubMed Central

2013-01-01

Background: Although India has sequenced the water buffalo genome, its draft assembly is based on the cattle genome BTau 4.0, so de novo chromosome-wise assembly remains a major pending issue for the global community. The existing radiation hybrid of buffalo and the STRs reported here can be used further in the final gap plugging and “finishing” expected in de novo genome assembly. QTL and gene mapping need putative STRs mined from the buffalo genome at equal intervals on every chromosome. Such markers have a potential role in improving desirable characteristics such as high milk yield, disease resistance and high growth rate. STR mining from the whole genome and development of a user-friendly database are yet to be done to reap the benefit of the whole genome sequence. Description: By in silico microsatellite mining of the whole genome, we have developed the first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database, http://cabindb.iasri.res.in/buffsatdb/), a web-based relational database of 910529 microsatellite markers developed using PHP and MySQL. The microsatellite markers were generated using the MIcroSAtellite tool. The database offers a simple and systematic web-based search for customised retrieval of chromosome-wise and genome-wide microsatellites. Search is enabled by chromosome, motif type (mono- to hexa-), repeat motif and repeat kind (simple and composite). The search may be customised by limiting the location of STRs on a chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database has been further appended with Primer3 for primer design for the selected markers, enabling researchers to select markers of choice at a desired interval over the chromosome. The unique add-on of degenerate bases further helps in resolving the presence of degenerate bases in the current buffalo assembly.
Conclusion: As the first buffalo STR database in the world, this would not only pave the way to resolving the current assembly problem but should also be of immense use to the global community in QTL/gene mapping, which is critically required to increase knowledge in the endeavour to increase buffalo productivity, especially in third world countries where the rural economy is significantly dependent on buffalo productivity. PMID:23336431
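The in silico mining step described in this record can be sketched as a scan for tandem repeats of mono- to hexanucleotide motifs. This is a minimal illustration only: the minimum repeat counts in `MIN_REPEATS` are assumed values, the function name `find_ssrs` is hypothetical, and the code does not reproduce the MIcroSAtellite tool or detect composite repeats.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for simple repeats in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured unit followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter motif, e.g. "AA" or "ATAT";
            # those tracts are reported under the shorter motif instead.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

if __name__ == "__main__":
    demo = "CC" + "AT" * 7 + "GG" + "A" * 12 + "C"
    for start, motif, copies in find_ssrs(demo):
        print(start, motif, copies)
```

Restricting the returned tuples by start position would correspond to the chromosome-range filtering the abstract mentions.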
In silico mining of putative microsatellite markers from whole genome sequence of water buffalo (Bubalus bubalis) and development of first BuffSatDB.

PubMed

Sarika; Arora, Vasu; Iquebal, Mir Asif; Rai, Anil; Kumar, Dinesh

2013-01-19

Although India has sequenced the water buffalo genome, its draft assembly is based on the cattle genome BTau 4.0, so de novo chromosome-wise assembly remains a major pending issue for the global community. The existing radiation hybrid of buffalo and the STRs reported here can be used further in the final gap plugging and "finishing" expected in de novo genome assembly. QTL and gene mapping need putative STRs mined from the buffalo genome at equal intervals on every chromosome. Such markers have a potential role in improving desirable characteristics such as high milk yield, disease resistance and high growth rate. STR mining from the whole genome and development of a user-friendly database are yet to be done to reap the benefit of the whole genome sequence. By in silico microsatellite mining of the whole genome, we have developed the first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database, http://cabindb.iasri.res.in/buffsatdb/), a web-based relational database of 910529 microsatellite markers developed using PHP and MySQL. The microsatellite markers were generated using the MIcroSAtellite tool. The database offers a simple and systematic web-based search for customised retrieval of chromosome-wise and genome-wide microsatellites. Search is enabled by chromosome, motif type (mono- to hexa-), repeat motif and repeat kind (simple and composite). The search may be customised by limiting the location of STRs on a chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database has been further appended with Primer3 for primer design for the selected markers, enabling researchers to select markers of choice at a desired interval over the chromosome. The unique add-on of degenerate bases further helps in resolving the presence of degenerate bases in the current buffalo assembly.
As the first buffalo STR database in the world, this would not only pave the way to resolving the current assembly problem but should also be of immense use to the global community in QTL/gene mapping, which is critically required to increase knowledge in the endeavour to increase buffalo productivity, especially in third world countries where the rural economy is significantly dependent on buffalo productivity.
Dual tunneling-unit scanning tunneling microscope for length measurement based on crystalline lattice

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, H.; Higuchi, T.; Nishioki, N.

1997-01-01

A dual tunneling-unit scanning tunneling microscope (DTU STM) was developed for nm order length measurement with wide scan range. The crystalline lattice of highly oriented pyrolytic graphite (HOPG) was used as reference scale. A reference unit was set up on top of a test unit. The reference sample holder and the probe tip of test unit were attached to one single XY scanner on either surface, while the test sample holder was open. This enables simultaneous acquisition of wide images of HOPG and test sample. The length in test sample image was measured by counting the number of HOPG lattices. An inchworm actuator and an impact drive mechanism were introduced to roughly position probe tips. The XY scanner was designed to be elastic to eliminate image distortion. Some comparison experiments using two HOPG chips were carried out in air. The DTU STM is confirmed to be a stable and more powerful device for length measurement which has nanometer accuracy when covering a wide scan range up to several micrometers, and is capable of measuring comparatively large and heavy samples. © 1997 American Vacuum Society.
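The counting principle in this record reduces to simple arithmetic: a length equals the number of reference lattice periods spanned multiplied by the lattice period. The sketch below assumes the in-plane graphite lattice constant of about 0.246 nm as the period and ignores the dependence on scan direction relative to the lattice; the abstract does not give the value or the calibration the authors used, and the function name is hypothetical.

```python
# Assumed reference period: in-plane lattice constant of graphite, about 0.246 nm.
HOPG_LATTICE_NM = 0.246

def length_from_lattice_count(n_periods, lattice_nm=HOPG_LATTICE_NM):
    """Convert a count of HOPG lattice periods into a length in nanometres."""
    return n_periods * lattice_nm

if __name__ == "__main__":
    # 1000 counted periods correspond to roughly 246 nm under this assumption.
    print(length_from_lattice_count(1000))
```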
The science commons in health research: structure, function, and value.

PubMed

Cook-Deegan, Robert

The "science commons," knowledge that is widely accessible at low or no cost, is a uniquely important input to scientific advance and cumulative technological innovation. It is primarily, although not exclusively, funded by government and nonprofit sources. Much of it is produced at academic research centers, although some academic science is proprietary and some privately funded R&D enters the science commons. Science in general aspires to Mertonian norms of openness, universality, objectivity, and critical inquiry. The science commons diverges from proprietary science primarily in being open and being very broadly available. These features make the science commons particularly valuable for advancing knowledge, for training innovators who will ultimately work in both public and private sectors, and in providing a common stock of knowledge upon which all players, both public and private, can draw readily. Open science plays two important roles that proprietary R&D cannot: it enables practical benefits even in the absence of profitable markets for goods and services, and it lays a shared foundation for subsequent private R&D. The history of genomics in the period 1992-2004, covering two periods when genomic startup firms attracted significant private R&D investment, illustrates these features of how a science commons contributes value. Commercial interest in genomics was intense during this period. Fierce competition between private sector and public sector genomics programs was highly visible. Seemingly anomalous behavior, such as private firms funding "open science," can be explained by unusual business dynamics between established firms wanting to preserve a robust science commons to prevent startup firms from limiting established firms' freedom to operate. Deliberate policies to create and protect a large science commons were pursued by nonprofit and government funders of genomics research, such as the Wellcome Trust and National Institutes of Health. These policies were crucial to keeping genomic data and research tools widely available at low cost.
Progress of targeted genome modification approaches in higher plants.

PubMed

Cardi, Teodoro; Neal Stewart, C

2016-07-01

Transgene integration in plants is based on illegitimate recombination between non-homologous sequences. The low control of integration site and number of (trans/cis)gene copies might have negative consequences on the expression of transferred genes and their insertion within endogenous coding sequences. The first experiments conducted to use precise homologous recombination for gene integration commenced soon after the first demonstration that transgenic plants could be produced. Modern transgene targeting categories used in plant biology are: (a) homologous recombination-dependent gene targeting; (b) recombinase-mediated site-specific gene integration; (c) oligonucleotide-directed mutagenesis; (d) nuclease-mediated site-specific genome modifications. New tools enable precise gene replacement or stacking with exogenous sequences and targeted mutagenesis of endogenous sequences. The possibility to engineer chimeric designer nucleases, which are able to target virtually any genomic site, and use them for inducing double-strand breaks in host DNA creates new opportunities for both applied plant breeding and functional genomics. CRISPR is the most recent technology available for precise genome editing. Its rapid adoption in biological research is based on its inherent simplicity and efficacy. Its utilization, however, depends on available sequence information, especially for genome-wide analysis. We will review the approaches used for genome modification, specifically those for affecting gene integration and modification in higher plants. For each approach, the advantages and limitations will be noted. We also will speculate on how their actual commercial development and implementation in plant breeding will be affected by governmental regulations.
Ancient Recombination Events between Human Herpes Simplex Viruses

PubMed Central

Burrel, Sonia; Boutolleau, David; Ryu, Diane; Agut, Henri; Merkel, Kevin; Leendertz, Fabian H.

2017-01-01

Herpes simplex viruses 1 and 2 (HSV-1 and HSV-2) are seen as close relatives but also unambiguously considered as evolutionarily independent units. Here, we sequenced the genomes of 18 HSV-2 isolates characterized by divergent UL30 gene sequences to further elucidate the evolutionary history of this virus. Surprisingly, genome-wide recombination analyses showed that all HSV-2 genomes sequenced to date contain HSV-1 fragments. Using phylogenomic analyses, we could also show that two main HSV-2 lineages exist. One lineage is mostly restricted to sub-Saharan Africa whereas the other has reached a global distribution. Interestingly, only the worldwide lineage is characterized by ancient recombination events with HSV-1. Our findings highlight the complexity of HSV-2 evolution, a virus of putative zoonotic origin which later recombined with its human-adapted relative. They also suggest that coinfections with HSV-1 and 2 may have genomic and potentially functional consequences and should therefore be monitored more closely. PMID:28369565
GenomeGems: evaluation of genetic variability from deep sequencing data

PubMed Central

2012-01-01

Background: Detection of disease-causing mutations using Deep Sequencing technologies poses great challenges. In particular, organizing the great amount of sequences generated so that mutations, which might possibly be biologically relevant, are easily identified is a difficult task. Yet, for this assignment only limited automatic accessible tools exist. Findings: We developed GenomeGems to fill this gap by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions: GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151
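Comparing SNP calls across datasets, as this record describes, can be illustrated with plain set operations. The sketch below is not GenomeGems code: the representation of a SNP as a (chromosome, position, reference, alternate) tuple and the function name `compare_snps` are assumptions made for the example.

```python
def compare_snps(datasets):
    """Split SNP calls into those shared by all datasets and those unique to each.

    datasets maps a dataset name to an iterable of
    (chrom, pos, ref, alt) tuples (an assumed representation).
    """
    sets = {name: set(snps) for name, snps in datasets.items()}
    if not sets:
        return set(), {}
    shared = set.intersection(*sets.values())
    unique = {}
    for name, snps in sets.items():
        # Union of every other dataset's calls; empty if there is only one dataset.
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = snps - others
    return shared, unique

if __name__ == "__main__":
    calls = {
        "sample_a": [("chr1", 100, "A", "G"), ("chr2", 50, "C", "T")],
        "sample_b": [("chr1", 100, "A", "G"), ("chr3", 7, "G", "A")],
    }
    shared, unique = compare_snps(calls)
    print(sorted(shared))
    print({name: sorted(s) for name, s in unique.items()})
```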
After the revolution? Ethical and social challenges in 'personalized genomic medicine'

PubMed

Juengst, Eric T; Settersten, Richard A; Fishman, Jennifer R; McGowan, Michelle L

2012-06-01

Personalized genomic medicine (PGM) is a goal that currently unites a wide array of biomedical initiatives, and is promoted as a 'new paradigm for healthcare' by its champions. Its promissory virtues include individualized diagnosis and risk prediction, more effective prevention and health promotion, and patient empowerment. Beyond overcoming scientific and technological hurdles to realizing PGM, proponents may interpret and rank these promises differently, which carries ethical and social implications for the realization of PGM as an approach to healthcare. We examine competing visions of PGM's virtues and the directions in which they could take the field, in order to anticipate policy choices that may lie ahead for researchers, healthcare providers and the public.
Genome-wide association and genomic prediction of resistance to viral nervous necrosis in European sea bass (Dicentrarchus labrax) using RAD sequencing.

PubMed

Palaiokostas, Christos; Cariou, Sophie; Bestin, Anastasia; Bruant, Jean-Sebastien; Haffray, Pierrick; Morin, Thierry; Cabon, Joëlle; Allal, François; Vandeputte, Marc; Houston, Ross D

2018-06-08

European sea bass (Dicentrarchus labrax) is one of the most important species for European aquaculture. Viral nervous necrosis (VNN), commonly caused by the redspotted grouper nervous necrosis virus (RGNNV), can result in high levels of morbidity and mortality, mainly during the larval and juvenile stages of cultured sea bass. In the absence of efficient therapeutic treatments, selective breeding for host resistance offers a promising strategy to control this disease. Our study aimed at investigating genetic resistance to VNN and genomic-based approaches to improve disease resistance by selective breeding. A population of 1538 sea bass juveniles from a factorial cross between 48 sires and 17 dams was challenged with RGNNV, with mortalities and survivors being recorded and sampled for genotyping by the RAD sequencing approach. We used genome-wide genotype data from 9195 single nucleotide polymorphisms (SNPs) for downstream analysis. Estimates of heritability of survival on the underlying scale for the pedigree and genomic relationship matrices were 0.27 (HPD interval 95%: 0.14-0.40) and 0.43 (0.29-0.57), respectively. Classical genome-wide association analysis detected genome-wide significant quantitative trait loci (QTL) for resistance to VNN on chromosomes 3, 20 and 25 (unassigned scaffolds in the case of 'chromosome' 25) (P < 1e-06). Weighted genomic best linear unbiased predictor provided additional support for the QTL on chromosome 3 and suggested that it explained 4% of the additive genetic variation. Genomic prediction approaches were tested to investigate the potential of using genome-wide SNP data to estimate breeding values for resistance to VNN and showed that genomic prediction resulted in a 13% increase in successful classification of resistant and susceptible animals compared to pedigree-based methods, with Bayes A and Bayes B giving the highest predictive ability. 
Genome-wide significant QTL were identified but each with relatively small effects on the trait. Tests of genomic prediction suggested that incorporating genome-wide SNP data is likely to result in higher accuracy of estimated breeding values for resistance to VNN. RAD sequencing is an effective method for generating such genome-wide SNPs, and our findings highlight the potential of genomic selection to breed farmed European sea bass with improved resistance to VNN.
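The genomic relationship matrix mentioned in this abstract can be illustrated with a minimal sketch. The construction below follows VanRaden's first method, which is a common choice but an assumption here, since the abstract does not say which construction the study used. The function name and the toy genotypes are hypothetical and are not data from the study.

```python
def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from SNP genotypes.

    genotypes: list of rows (one per animal), each a list of allele
    counts coded 0, 1 or 2 (one entry per SNP).
    Assumes VanRaden's first method: G = Z Z' / (2 * sum(p * (1 - p))).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals)
             for j in range(n_snps)]
    # Centre each genotype by its expected value 2p.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_snps)]
               for row in genotypes]
    # Scale so that G is comparable to a pedigree relationship matrix.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(centred[i], centred[k])) / scale
             for k in range(n_animals)]
            for i in range(n_animals)]


# Hypothetical toy data: 3 animals genotyped at 4 SNPs.
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
```

In a GBLUP-style analysis a matrix of this kind would take the place of the pedigree relationship matrix in the mixed model, which is one way the two heritability estimates quoted above can differ.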
Integrated platform for genome-wide screening and construction of high-density genetic interaction maps in mammalian cells

PubMed Central

Kampmann, Martin; Bassik, Michael C.; Weissman, Jonathan S.

2013-01-01

A major challenge of the postgenomic era is to understand how human genes function together in normal and disease states. In microorganisms, high-density genetic interaction (GI) maps are a powerful tool to elucidate gene functions and pathways. We have developed an integrated methodology based on pooled shRNA screening in mammalian cells for genome-wide identification of genes with relevant phenotypes and systematic mapping of all GIs among them. We recently demonstrated the potential of this approach in an application to pathways controlling the susceptibility of human cells to the toxin ricin. Here we present the complete quantitative framework underlying our strategy, including experimental design, derivation of quantitative phenotypes from pooled screens, robust identification of hit genes using ultra-complex shRNA libraries, parallel measurement of tens of thousands of GIs from a single double-shRNA experiment, and construction of GI maps. We describe the general applicability of our strategy. Our pooled approach enables rapid screening of the same shRNA library in different cell lines and under different conditions to determine a range of different phenotypes. We illustrate this strategy here for single- and double-shRNA libraries. We compare the roles of genes for susceptibility to ricin and Shiga toxin in different human cell lines and reveal both toxin-specific and cell line-specific pathways. We also present GI maps based on growth and ricin-resistance phenotypes, and we demonstrate how such a comparative GI mapping strategy enables functional dissection of physical complexes and context-dependent pathways. PMID:23739767
Identification and profiling of novel microRNAs in the Brassica rapa genome based on small RNA deep sequencing

PubMed Central

2012-01-01

Background MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing. Results We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here. Conclusions This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. 
The finding of numerous novel miRNAs, many with few target genes and low expression levels, suggests the rapid evolution of miRNA genes. The development of a miRNA database, BraMRs, enables us to integrate miRNA identification, target prediction, and functional annotation of target genes. BraMRs will represent a valuable public resource with which to study the epigenetic control of B. rapa and other closely related Brassica species. The database is available at the following link: http://bramrs.rna.kr [1]. PMID:23163954
metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA.

PubMed

Dale, Ryan K; Matzat, Leah H; Lei, Elissa P

2014-08-01

Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA-protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. Better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP- and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data. Based on the metaseq-enabled analysis presented here, we propose a model where Shep associates with chromatin cotranscriptionally, then is recruited to insulator complexes in trans where it plays a negative role in insulator activity. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Sequencing small genomic targets with high efficiency and extreme accuracy

PubMed Central

Schmitt, Michael W.; Fox, Edward J.; Prindle, Marc J.; Reid-Bayliss, Kate S.; True, Lawrence D.; Radich, Jerald P.; Loeb, Lawrence A.

2015-01-01

The detection of minority variants in mixed samples demands methods for enrichment and accurate sequencing of small genomic intervals. We describe an efficient approach based on sequential rounds of hybridization with biotinylated oligonucleotides, enabling more than one-million fold enrichment of genomic regions of interest. In conjunction with error correcting double-stranded molecular tags, our approach enables the quantification of mutations in individual DNA molecules. PMID:25849638
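The idea of error correction with double-stranded molecular tags can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: reads are assumed to be already grouped by a shared tag, oriented to the same reference strand and of equal length, and the function names and the 0.6 agreement cutoff are hypothetical.

```python
from collections import Counter, defaultdict


def strand_consensus(reads, min_fraction=0.6):
    """Majority base per position among reads from one strand.

    Positions where no base reaches min_fraction are reported as 'N'.
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(tagged_reads, min_fraction=0.6):
    """Combine the two strand consensuses of each tagged molecule.

    tagged_reads: iterable of (tag, strand, sequence), strand '+' or '-'.
    A base is kept only if both strand consensuses agree on it, so an
    error present on one strand only is masked as 'N'.
    """
    families = defaultdict(lambda: {"+": [], "-": []})
    for tag, strand, sequence in tagged_reads:
        families[tag][strand].append(sequence)
    result = {}
    for tag, strands in families.items():
        if not strands["+"] or not strands["-"]:
            continue  # both strands are required for a duplex call
        plus = strand_consensus(strands["+"], min_fraction)
        minus = strand_consensus(strands["-"], min_fraction)
        result[tag] = "".join(p if p == m else "N"
                              for p, m in zip(plus, minus))
    return result


# Hypothetical toy reads.
toy_reads = [
    ("AAT", "+", "ACGT"), ("AAT", "+", "ACGT"), ("AAT", "+", "ACTT"),
    ("AAT", "-", "ACGT"), ("AAT", "-", "ACGT"),
    ("CCA", "+", "ACGT"), ("CCA", "+", "ACGT"),
    ("CCA", "-", "ACTT"), ("CCA", "-", "ACTT"),
    ("GGC", "+", "ACGA"),
]
duplex_calls = duplex_consensus(toy_reads)
```

In the toy input, the single discordant read in family AAT is outvoted within its strand, the strand-specific difference in family CCA is masked, and family GGC is dropped because only one strand was observed.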
Efficient CRISPR/Cas9-Based Genome Engineering in Human Pluripotent Stem Cells.

PubMed

Kime, Cody; Mandegar, Mohammad A; Srivastava, Deepak; Yamanaka, Shinya; Conklin, Bruce R; Rand, Tim A

2016-01-01

Human pluripotent stem cells (hPS cells) are rapidly emerging as a powerful tool for biomedical discovery. The advent of human induced pluripotent stem cells (hiPS cells) with human embryonic stem (hES)-cell-like properties has led to hPS cells with disease-specific genetic backgrounds for in vitro disease modeling and drug discovery as well as mechanistic and developmental studies. To fully realize this potential, it will be necessary to modify the genome of hPS cells with precision and flexibility. Pioneering experiments utilizing site-specific double-strand break (DSB)-mediated genome engineering tools, including zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), have paved the way to genome engineering in previously recalcitrant systems such as hPS cells. However, these methods are technically cumbersome and require significant expertise, which has limited adoption. A major recent advance involving the clustered regularly interspaced short palindromic repeats (CRISPR) endonuclease has dramatically simplified the effort required for genome engineering and will likely be adopted widely as the most rapid and flexible system for genome editing in hPS cells. In this unit, we describe commonly practiced methods for CRISPR endonuclease genomic editing of hPS cells into cell lines containing genomes altered by insertion/deletion (indel) mutagenesis or insertion of recombinant genomic DNA. Copyright © 2016 John Wiley & Sons, Inc.
Low incidence of SNVs and indels in trio genomes of Cas9-mediated multiplex edited sheep.

PubMed

Wang, Xiaolong; Liu, Jing; Niu, Yiyuan; Li, Yan; Zhou, Shiwei; Li, Chao; Ma, Baohua; Kou, Qifang; Petersen, Bjoern; Sonstegard, Tad; Huang, Xingxu; Jiang, Yu; Chen, Yulin

2018-05-25

The simplicity of the CRISPR/Cas9 system has enabled its widespread application in generating animal models, in functional genomic screening and in treating genetic and infectious diseases. However, unintended mutations produced by off-target CRISPR/Cas9 nuclease activity may lead to negative consequences. In particular, a very recent study reporting that gene editing can introduce hundreds of unintended mutations into the genome has attracted wide attention. To address these off-target concerns, characterization of CRISPR/Cas9-mediated off-target mutagenesis is urgently needed. Here we took advantage of our previously generated gene-edited sheep and performed family trio-based whole genome sequencing, which is capable of discriminating variants in the edited progeny that are inherited, naturally generated, or induced by genetic modification. Three family trios were re-sequenced at a high average depth of genomic coverage (~25.8×). After developing a pipeline to comprehensively analyze the sequence data for de novo single nucleotide variants, indels and structural variations, we found only a single unintended event, a 2.4 kb inversion induced at low incidence by site-specific double-strand breaks between two sgRNA targeting sites at the MSTN locus. We provide the first report on the fidelity of CRISPR-based modification for sheep genomes targeted simultaneously for gene breaks at three coding sequence locations. The trio-based sequencing approach revealed almost negligible off-target modifications, providing timely evidence for the safe application of genome editing in vivo with CRISPR/Cas9.
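The logic of a trio-based comparison can be sketched in a few lines. This is a minimal illustration, not the pipeline developed in the study: variants are represented as hypothetical (chromosome, position, ref, alt) tuples, and the 100 bp window around a target site is an arbitrary assumption.

```python
def classify_child_variants(child, sire, dam):
    """Split the variants of an edited animal by comparison with its parents.

    Each argument is a set of (chrom, pos, ref, alt) tuples.
    Returns (inherited, candidate_de_novo).
    """
    inherited = child & (sire | dam)
    candidate_de_novo = child - sire - dam
    return inherited, candidate_de_novo


def near_any_site(variant, sites, window=100):
    """True if the variant lies within `window` bp of a (chrom, pos) site."""
    chrom, pos = variant[0], variant[1]
    return any(chrom == c and abs(pos - p) <= window for c, p in sites)


# Hypothetical toy call sets and one hypothetical sgRNA target site.
sire = {("chr1", 100, "A", "G"), ("chr2", 500, "C", "T")}
dam = {("chr1", 100, "A", "G"), ("chr3", 900, "G", "A")}
child = {
    ("chr1", 100, "A", "G"),    # carried by both parents
    ("chr3", 900, "G", "A"),    # carried by the dam
    ("chr2", 5050, "T", "C"),   # absent from both parents, near the site
    ("chr4", 42, "G", "T"),     # absent from both parents, elsewhere
}
target_sites = [("chr2", 5000)]

inherited, candidate_de_novo = classify_child_variants(child, sire, dam)
possibly_edit_induced = {v for v in candidate_de_novo
                         if near_any_site(v, target_sites)}
likely_spontaneous = candidate_de_novo - possibly_edit_induced
```

Separating candidate de novo variants by proximity to target or predicted off-target sites is only one possible heuristic; real call sets also need filtering for sequencing and genotyping errors before such a comparison is meaningful.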
Cytoscape: the network visualization tool for GenomeSpace workflows.

PubMed

Demchak, Barry; Hull, Tim; Reich, Michael; Liefeld, Ted; Smoot, Michael; Ideker, Trey; Mesirov, Jill P

2014-01-01

Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September, 2013. PMID:25165537
Genome-Wide Linkage and Association Analysis Identifies Major Gene Loci for Guttural Pouch Tympany in Arabian and German Warmblood Horses

PubMed Central

Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar

2012-01-01

Equine guttural pouch tympany (GPT) is a hereditary condition affecting foals in their first months of life. Complex segregation analyses in Arabian and German warmblood horses indicated that a major gene is very likely involved. Genome-wide linkage and association analyses including a high density marker set of single nucleotide polymorphisms (SNPs) were performed to map the genomic region harbouring the potential major gene for GPT. A total of 85 Arabian and 373 German warmblood horses were genotyped on the Illumina equine SNP50 beadchip. Non-parametric multipoint linkage analyses showed genome-wide significance on horse chromosome (ECA) 3 for German warmblood at 16–26 Mb and 34–55 Mb and on ECA15 for Arabian at 64–65 Mb. Genome-wide association analyses confirmed the linked regions for both breeds. In Arabian, genome-wide association was detected at 64 Mb within the region with the highest linkage peak on ECA15. For German warmblood, signals for genome-wide association were close to the peak region of linkage at 52 Mb on ECA3. The odds ratio for the SNP with the highest genome-wide association was 0.12 for the Arabian. In conclusion, the refinement of the regions with the Illumina equine SNP50 beadchip is an important step towards unravelling the mutations responsible for GPT. PMID:22848553
High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

PubMed

Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

2016-10-01

Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. 
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant challenge to practical application. PMID:27792722
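The third step described in this abstract, choosing the most likely pair of alleles under a simple likelihood, can be illustrated with a toy diploid model. This sketch is not the HLA*PRG implementation: it assumes each read comes from either allele with probability 0.5, a uniform per-base error rate, and reads already placed at a known offset. The allele names, sequences, and function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def read_probability(read, allele_sequence, error_rate):
    """P(read | allele) under independent, uniform base errors.

    read: (start, bases), with start an offset into the allele sequence.
    """
    start, bases = read
    probability = 1.0
    for i, base in enumerate(bases):
        if allele_sequence[start + i] == base:
            probability *= 1 - error_rate
        else:
            probability *= error_rate / 3
    return probability


def most_likely_pair(reads, alleles, error_rate=0.01):
    """Return the allele pair maximising the diploid log-likelihood."""
    best_pair, best_log_likelihood = None, -math.inf
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        log_likelihood = sum(
            math.log(0.5 * read_probability(r, alleles[a], error_rate)
                     + 0.5 * read_probability(r, alleles[b], error_rate))
            for r in reads
        )
        if log_likelihood > best_log_likelihood:
            best_pair, best_log_likelihood = (a, b), log_likelihood
    return best_pair, best_log_likelihood


# Hypothetical toy alleles and reads.
toy_alleles = {
    "allele_1": "ACGTAC",
    "allele_2": "ACCTAC",
    "allele_3": "TCGTAA",
}
toy_reads = [(0, "ACGT"), (1, "CGTA"), (0, "ACCT"), (2, "CTAC")]
best_pair, best_ll = most_likely_pair(toy_reads, toy_alleles)
```

Homozygous pairs are included by using combinations with replacement. With realistic read counts the per-read probabilities would be kept in log space throughout to avoid underflow.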
Large-Scale Phylogenetic Classification of Fungal Chitin Synthases and Identification of a Putative Cell-Wall Metabolism Gene Cluster in Aspergillus Genomes

PubMed Central

Pacheco-Arjona, Jose Ramon; Ramirez-Prado, Jorge Humberto

2014-01-01

The cell wall is a protective and versatile structure found in all fungi. The component responsible for its rigidity is chitin, a product of chitin synthase (Chsp) enzymes. There are seven classes of chitin synthase genes (CHS), and the number and type encoded in fungal genomes vary considerably from one species to another. Previous Chsp sequence analyses studied them as individual units, regardless of genomic context. The identification of blocks of conserved genes between genomes can provide important clues about the interactions and localization of chitin synthases. In the present study, we carried out an in silico search for all putative Chsp encoded in 54 full fungal genomes, encompassing 21 orders from five phyla. Phylogenetic studies of these Chsp were able to confidently classify 347 out of the 369 Chsp identified (94%). Patterns in the distribution of Chsp related to taxonomy were identified, the most prominent being related to the type of fungal growth. More importantly, a synteny analysis for genomic blocks centered on class IV Chsp (the most abundant and widely distributed Chsp class) identified a putative cell wall metabolism gene cluster in members of the genus Aspergillus, the first such association reported for any fungal genome. PMID:25148134
Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression.

PubMed

Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Burgueño, Juan; Eskridge, Kent

2015-08-18

Most genomic-enabled prediction models developed so far assume that the response variable is continuous and normally distributed. The exception is the probit model, developed for ordered categorical phenotypes. In statistical applications, because of the easy implementation of the Bayesian probit ordinal regression (BPOR) model, Bayesian logistic ordinal regression (BLOR) is rarely implemented in the context of genomic-enabled prediction [sample size (n) is much smaller than the number of parameters (p)]. For this reason, in this paper we propose a BLOR model using the Pólya-Gamma data augmentation approach that produces a Gibbs sampler with full conditional distributions similar to those of the BPOR model and with the advantage that the BPOR model is a particular case of the BLOR model. We evaluated the proposed model by using simulation and two real data sets. Results indicate that our BLOR model is a good alternative for analyzing ordinal data in the context of genomic-enabled prediction with the probit or logit link. Copyright © 2015 Montesinos-López et al.
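The difference between the logit and probit links in an ordinal model can be made concrete with a short sketch of the category probabilities. This shows only the standard cumulative-link likelihood, P(Y = k) = F(γ_k − η) − F(γ_{k−1} − η), and not the Pólya-Gamma Gibbs sampler proposed in the paper. The sign convention, the threshold values, and the function names are assumptions for illustration.

```python
import math


def logistic_cdf(x):
    """Logit link: standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """Probit link: standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(eta, thresholds, cdf):
    """Probabilities of the ordered categories for linear predictor eta.

    thresholds: increasing cut points gamma_1 < ... < gamma_{K-1};
    the result has K entries that sum to 1.
    """
    cumulative = [cdf(t - eta) for t in thresholds] + [1.0]
    probabilities, previous = [], 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical thresholds for a three-category trait.
thresholds = [-1.0, 1.0]
logit_probs = category_probabilities(0.0, thresholds, logistic_cdf)
probit_probs = category_probabilities(0.0, thresholds, normal_cdf)
```

In a genomic-enabled setting eta would be the sum of marker effects for a line or animal; the two links differ mainly in how much probability they place in the extreme categories for the same eta and thresholds.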
A genome wide association study links glutamate receptor pathway to sporadic Creutzfeldt-Jakob disease risk.

PubMed

Sanchez-Juan, Pascual; Bishop, Matthew T; Kovacs, Gabor G; Calero, Miguel; Aulchenko, Yurii S; Ladogana, Anna; Boyd, Alison; Lewis, Victoria; Ponto, Claudia; Calero, Olga; Poleggi, Anna; Carracedo, Ángel; van der Lee, Sven J; Ströbel, Thomas; Rivadeneira, Fernando; Hofman, Albert; Haïk, Stéphane; Combarros, Onofre; Berciano, José; Uitterlinden, Andre G; Collins, Steven J; Budka, Herbert; Brandel, Jean-Philippe; Laplanche, Jean Louis; Pocchiari, Maurizio; Zerr, Inga; Knight, Richard S G; Will, Robert G; van Duijn, Cornelia M

2014-01-01

We performed a genome-wide association (GWA) study in 434 sporadic Creutzfeldt-Jakob disease (sCJD) patients and 1939 controls from the United Kingdom, Germany and The Netherlands. The findings were replicated in an independent sample of 1109 sCJD cases and 2264 controls provided by a multinational consortium. From the initial GWA analysis we selected 23 SNPs for further genotyping in 1109 sCJD cases from seven different countries. Five SNPs were significantly associated with sCJD after correction for multiple testing. Subsequently these five SNPs were genotyped in 2264 controls. The pooled analysis, including 1543 sCJD cases and 4203 controls, yielded two genome-wide significant results: rs6107516 (p-value = 7.62 × 10^-9), a variant tagging the prion protein gene (PRNP); and rs6951643 (p-value = 1.66 × 10^-8), tagging the Glutamate Receptor Metabotropic 8 gene (GRM8). Next we analysed the data stratifying by country of origin, combining samples from the pooled analysis with genotypes from the 1000 Genomes Project and imputed genotypes from the Rotterdam Study (total n = 12967). The meta-analysis of the results showed that rs6107516 (p-value = 3.00 × 10^-8) and rs6951643 (p-value = 3.91 × 10^-5) remained the two most significantly associated SNPs. Rs6951643 is located in an intronic region of GRM8, a gene that was additionally tagged by a cluster of 12 SNPs within our top 100 ranked results. GRM8 encodes mGluR8, a protein which belongs to the metabotropic glutamate receptor family, recently shown to be involved in the transduction of cellular signals triggered by the prion protein. Pathway enrichment analyses performed with both Ingenuity Pathway Analysis and ALIGATOR postulate glutamate receptor signalling as one of the main pathways associated with sCJD. In summary, we have detected GRM8 as a novel, non-PRNP, genome-wide significant marker associated with heightened disease risk, providing additional evidence supporting a role of glutamate receptors in sCJD pathogenesis.
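The sample sizes and p-values reported in this abstract can be checked with a few lines of arithmetic. The sketch below reads the reported exponents as negative powers of ten and assumes the conventional genome-wide significance threshold of 5 × 10^-8, which the abstract does not state explicitly; the variable names are hypothetical.

```python
# Conventional genome-wide significance threshold (an assumption here).
GENOME_WIDE_THRESHOLD = 5e-8

# Sample sizes as reported in the abstract.
cases = {"initial GWA": 434, "replication": 1109}
controls = {"initial GWA": 1939, "replication": 2264}
pooled_cases = sum(cases.values())
pooled_controls = sum(controls.values())

# P-values as reported for the pooled analysis and the meta-analysis.
pooled_p = {"rs6107516": 7.62e-9, "rs6951643": 1.66e-8}
meta_p = {"rs6107516": 3.00e-8, "rs6951643": 3.91e-5}


def genome_wide_significant(p_values, threshold=GENOME_WIDE_THRESHOLD):
    """Return the SNP identifiers whose p-value is below the threshold."""
    return sorted(snp for snp, p in p_values.items() if p < threshold)


pooled_hits = genome_wide_significant(pooled_p)
meta_hits = genome_wide_significant(meta_p)
```

Under this threshold both SNPs pass in the pooled analysis, while in the stratified meta-analysis only rs6107516 does, which is consistent with the abstract describing them there as the two most significantly associated SNPs rather than as genome-wide significant.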
Infrastructure for Personalized Medicine at Partners HealthCare

PubMed Central

Weiss, Scott T.; Shin, Meini Sumbada

2016-01-01

Partners HealthCare Personalized Medicine (PPM) is a center within the Partners HealthCare system (founded by Massachusetts General Hospital and Brigham and Women’s Hospital) whose mission is to utilize genetics and genomics to improve the care of patients in a cost-effective manner. PPM consists of five interconnected components: (1) Laboratory for Molecular Medicine (LMM), a CLIA laboratory performing genetic testing for patients world-wide; (2) Translational Genomics Core (TGC), a core laboratory providing genomic platforms for Partners investigators; (3) Partners Biobank, a biobank of samples (DNA, plasma and serum) for 50,000 consented Partners patients; (4) Biobank Portal, an IT infrastructure and viewer that brings genotypes, samples, and phenotypes (validated diagnoses, radiology, and clinical chemistry) from the electronic medical record together for Partners investigators. These components are united by (5) a common IT system that brings researchers, clinicians, and patients together for optimal research and patient care. PMID:26927187
